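# Backup / Restore Runbook (Pre-Prod & Prod)

## 1. Scope

This runbook covers the following stateful data of `quyun_v2`:

- PostgreSQL (primary business data)
- Object storage directory (local storage or S3-compatible objects)
- Snapshots of key runtime configuration (no plaintext secrets)

Goals of this runbook:

1. Backups run reliably
2. A restore can be completed in the pre-prod environment
3. RTO / RPO are verified through explicit steps

---
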
## 2. Preconditions

- Database backup privileges (`pg_dump` / `psql`)
- Read/write access to object storage (local directory or S3 API)
- A pre-prod environment that is available and version-compatible with production
- The following variables confirmed (example values):

```bash
export QY_DB_HOST=127.0.0.1
export QY_DB_PORT=5432
export QY_DB_NAME=quyun_v2
export QY_DB_USER=postgres
export QY_DB_PASSWORD='***'
```

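The preconditions above can be checked before any backup step runs. The following is a minimal sketch, not part of the original runbook: `require_vars` and `require_tools` are hypothetical helper names, and the tool list in the usage comment is an assumption based on the commands used later in this document.

```bash
# Hypothetical helpers: fail fast when a precondition is not met.
# Usage:
#   require_vars QY_DB_HOST QY_DB_PORT QY_DB_NAME QY_DB_USER QY_DB_PASSWORD &&
#   require_tools pg_dump pg_restore psql

# Succeeds only if every named variable is set and non-empty.
require_vars() (
  missing=0
  for name in "$@"; do
    # Indirect lookup: read the value of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing variable: $name" >&2
      missing=1
    fi
  done
  exit "$missing"
)

# Succeeds only if every named command is found on PATH.
require_tools() (
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  exit "$missing"
)
```
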
---

## 3. PostgreSQL Backup

### 3.1 Create the backup directory

```bash
mkdir -p /tmp/quyun-backup
```

### 3.2 Dump the database (custom format)

```bash
# -F c writes pg_dump's custom archive format, which pg_restore can read.
PGPASSWORD="$QY_DB_PASSWORD" \
pg_dump -h "$QY_DB_HOST" -p "$QY_DB_PORT" -U "$QY_DB_USER" \
  -F c -d "$QY_DB_NAME" \
  -f "/tmp/quyun-backup/${QY_DB_NAME}_$(date +%Y%m%d_%H%M%S).dump"
```

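Later steps refer to the dump by a placeholder file name. One way to avoid retyping the timestamped name is to compute the path once and reuse it. This is only a sketch: `backup_path` and `DUMP_FILE` are hypothetical names that do not appear in the runbook.

```bash
# Hypothetical helper: build a timestamped dump path in the naming scheme used above.
# Usage:
#   DUMP_FILE=$(backup_path /tmp/quyun-backup "$QY_DB_NAME")
#   then pass -f "$DUMP_FILE" to pg_dump and "$DUMP_FILE" to pg_restore.
backup_path() {
  printf '%s/%s_%s.dump\n' "$1" "$2" "$(date +%Y%m%d_%H%M%S)"
}
```
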
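### 3.3 Verify backup integrity

```bash
# List the archive's table of contents; this fails if the dump is unreadable.
pg_restore -l "/tmp/quyun-backup/<backup-file>.dump" >/tmp/quyun-backup/restore.list
```

Acceptance criteria: the command exits with code 0 and `restore.list` is non-empty.
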
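The two acceptance checks can be scripted so that a drill yields an explicit PASS or FAIL. A minimal sketch follows; `verify_dump` is a hypothetical helper, and writing the listing next to the dump (instead of to `restore.list`) is an assumption made so that several dumps can be checked side by side.

```bash
# Hypothetical helper: apply the acceptance criteria to one dump file.
# Usage: verify_dump "/tmp/quyun-backup/<backup-file>.dump"
verify_dump() (
  dump_file=$1
  # Assumption: store the listing beside the dump, e.g. foo.dump -> foo.list
  list_file="${dump_file%.dump}.list"

  # Criterion 1: pg_restore -l must exit with code 0.
  if ! pg_restore -l "$dump_file" >"$list_file"; then
    echo "FAIL: pg_restore -l exited non-zero for $dump_file" >&2
    exit 1
  fi
  # Criterion 2: the listing must be non-empty (-s: exists and size > 0).
  if [ ! -s "$list_file" ]; then
    echo "FAIL: $list_file is empty" >&2
    exit 1
  fi
  echo "PASS: $dump_file"
)
```
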
---

## 4. Object Storage Backup

### 4.1 Local storage (`Storage.Type=local`)

```bash
tar -czf "/tmp/quyun-backup/storage_$(date +%Y%m%d_%H%M%S).tar.gz" ./backend/storage
```

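The runbook does not state a check for the local archive, so the following is a suggestion only: list the archive to confirm it is readable and contains entries. `verify_archive` is a hypothetical helper name.

```bash
# Hypothetical helper: confirm a .tar.gz archive is readable and not empty.
# Usage: verify_archive "/tmp/quyun-backup/storage_<timestamp>.tar.gz"
verify_archive() (
  archive=$1
  # tar -tzf lists entries without extracting; it fails on an unreadable archive.
  if ! entries=$(tar -tzf "$archive"); then
    echo "FAIL: cannot list $archive" >&2
    exit 1
  fi
  if [ -z "$entries" ]; then
    echo "FAIL: $archive has no entries" >&2
    exit 1
  fi
  echo "PASS: $(printf '%s\n' "$entries" | wc -l | tr -d ' ') entries in $archive"
)
```
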
### 4.2 S3/MinIO (`Storage.Type=s3`)

Example using `mc` (MinIO Client):

```bash
mc alias set quyun-s3 http://127.0.0.1:9000 "$STORAGE_ACCESS_KEY" "$STORAGE_SECRET_KEY"
mc mirror quyun-s3/quyun-01 "/tmp/quyun-backup/s3_quyun-01_$(date +%Y%m%d_%H%M%S)"
```

Acceptance criteria: the target directory contains more than 0 files, and a sampled object is readable.

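The file-count and sample-read criteria can be checked with standard tools. This is a minimal sketch under stated assumptions: `verify_mirror` is a hypothetical helper, it samples only the first file that `find` returns, and it does not handle file names containing newlines.

```bash
# Hypothetical helper: check a mirrored directory against the acceptance criteria.
# Usage: verify_mirror "/tmp/quyun-backup/s3_quyun-01_<timestamp>"
verify_mirror() (
  dir=$1
  # Criterion 1: at least one regular file exists under the directory.
  count=$(find "$dir" -type f | wc -l | tr -d ' ')
  if [ "$count" -eq 0 ]; then
    echo "FAIL: no files under $dir" >&2
    exit 1
  fi
  # Criterion 2: read one sampled object end to end to prove it is readable.
  sample=$(find "$dir" -type f | head -n 1)
  if ! cat "$sample" >/dev/null; then
    echo "FAIL: cannot read $sample" >&2
    exit 1
  fi
  echo "PASS: $count files, sampled $sample"
)
```
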
---

## 5. Restore Procedure (Pre-Prod Drill)

### 5.1 Prepare the pre-prod database

```bash
# Recreate an empty ${QY_DB_NAME}_restore database as the restore target.
PGPASSWORD="$QY_DB_PASSWORD" \
psql -h "$QY_DB_HOST" -p "$QY_DB_PORT" -U "$QY_DB_USER" -d postgres \
  -c "DROP DATABASE IF EXISTS ${QY_DB_NAME}_restore;"

PGPASSWORD="$QY_DB_PASSWORD" \
psql -h "$QY_DB_HOST" -p "$QY_DB_PORT" -U "$QY_DB_USER" -d postgres \
  -c "CREATE DATABASE ${QY_DB_NAME}_restore;"
```

### 5.2 Restore the database

```bash
PGPASSWORD="$QY_DB_PASSWORD" \
pg_restore -h "$QY_DB_HOST" -p "$QY_DB_PORT" -U "$QY_DB_USER" \
  -d "${QY_DB_NAME}_restore" --clean --if-exists \
  "/tmp/quyun-backup/<backup-file>.dump"
```

### 5.3 Post-restore validation

```bash
PGPASSWORD="$QY_DB_PASSWORD" \
psql -h "$QY_DB_HOST" -p "$QY_DB_PORT" -U "$QY_DB_USER" -d "${QY_DB_NAME}_restore" \
  -c "SELECT COUNT(*) FROM users;"

PGPASSWORD="$QY_DB_PASSWORD" \
psql -h "$QY_DB_HOST" -p "$QY_DB_PORT" -U "$QY_DB_USER" -d "${QY_DB_NAME}_restore" \
  -c "SELECT COUNT(*) FROM audit_logs;"
```

Acceptance criteria:
- Core tables (`users`, `orders`, `audit_logs`, `contents`) hold a plausible number of rows
- Sampled business queries run without syntax or permission errors

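The commands above count two of the four core tables. The loop below is a sketch that covers all four; `check_core_tables` is a hypothetical helper, and treating "zero rows" as a failure is an assumption, since the runbook does not define what a plausible row count is.

```bash
# Hypothetical helper: print "<table> <row count>" for each core table.
# Fails when a query errors or a table is empty (assumed threshold: > 0 rows).
# Assumes the QY_DB_* variables from the Preconditions section are exported.
check_core_tables() (
  status=0
  for table in users orders audit_logs contents; do
    # -A (unaligned) and -t (tuples only) make psql print just the number.
    if ! rows=$(PGPASSWORD="$QY_DB_PASSWORD" psql -h "$QY_DB_HOST" -p "$QY_DB_PORT" \
        -U "$QY_DB_USER" -d "${QY_DB_NAME}_restore" -At \
        -c "SELECT COUNT(*) FROM ${table};"); then
      echo "FAIL: query on $table failed" >&2
      status=1
      continue
    fi
    echo "$table $rows"
    if [ "$rows" -eq 0 ]; then
      status=1
    fi
  done
  exit "$status"
)
```
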
---

## 6. Service Verification After Restore

After starting the service, run:

```bash
curl -f -sS http://127.0.0.1:18080/healthz
curl -f -sS http://127.0.0.1:18080/readyz
```

Acceptance criteria: both endpoints return 2xx.

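A service may need a few seconds after startup before it reports ready, so a single `curl` can fail spuriously. The sketch below polls an endpoint with a bounded number of attempts; `wait_for_ok` is a hypothetical helper, and the default of 10 attempts with a 3-second delay is an assumption, not a value from the runbook.

```bash
# Hypothetical helper: poll an endpoint until curl succeeds or attempts run out.
# Usage:
#   wait_for_ok http://127.0.0.1:18080/healthz &&
#   wait_for_ok http://127.0.0.1:18080/readyz
wait_for_ok() (
  url=$1
  attempts=${2:-10}   # assumed default
  delay=${3:-3}       # assumed default, in seconds
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP error responses (4xx/5xx).
    if curl -f -sS -o /dev/null "$url"; then
      echo "OK: $url (attempt $i)"
      exit 0
    fi
    # Sleep only when another attempt remains.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "FAIL: $url not healthy after $attempts attempts" >&2
  exit 1
)
```
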
---

## 7. RTO / RPO Recording

Record for every drill:
- Backup start/end time
- Restore start/end time
- Data validation result
- Incident / blockers

Suggested targets:
- RTO <= 30 minutes
- RPO <= 24 hours (daily backup baseline)

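The recorded timestamps can be compared with the targets mechanically. In this sketch, `check_window` is a hypothetical helper that takes Unix epoch seconds (for example from `date +%s`); using the restore start/end as the RTO window and the gap between the last backup and the simulated failure as the RPO window is an assumption about how the drill measures them.

```bash
# Hypothetical helper: compare an elapsed window with a limit, both in seconds.
# Usage:
#   check_window RTO "$restore_start" "$restore_end" 1800    # 30 minutes
#   check_window RPO "$backup_end" "$failure_time" 86400     # 24 hours
check_window() (
  label=$1
  start=$2
  end=$3
  limit=$4
  elapsed=$((end - start))
  if [ "$elapsed" -le "$limit" ]; then
    echo "PASS: $label ${elapsed}s (limit ${limit}s)"
  else
    echo "FAIL: $label ${elapsed}s (limit ${limit}s)"
    exit 1
  fi
)
```
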
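---

## 8. Failure Handling

- `pg_dump` fails: check network / permissions / disk space, then retry once
- `pg_restore` fails: keep the logs, roll back to the original pre-prod database, and do not proceed with an overwriting release
- Object restore fails: continue the drill only if the failure is on a non-blocking business path; otherwise abort
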
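Two of these rules, "retry once" and "keep the logs", lend themselves to small wrappers. The following is a sketch only: `retry_once` and `run_logged` are hypothetical helpers, and the log path in the usage comment is an example rather than a location the runbook prescribes.

```bash
# Hypothetical helpers for the failure-handling rules above.

# Run a command; if it fails, retry it exactly once.
# Usage: PGPASSWORD="$QY_DB_PASSWORD" retry_once pg_dump ...
retry_once() {
  if "$@"; then
    return 0
  fi
  echo "first attempt failed, retrying once: $*" >&2
  "$@"
}

# Run a command with stdout and stderr captured to a log file, so the log
# survives a failure. Returns the command's own exit code.
# Usage (example path): run_logged /tmp/quyun-backup/restore.log pg_restore ...
run_logged() (
  log_file=$1
  shift
  "$@" >"$log_file" 2>&1
)
```
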
---

## 9. Evidence Requirement

Archive every drill to:
- `docs/release-evidence/<date>.md`

At minimum, include:
1. Operator and time window
2. Commands and exit codes
3. Output of the core validation SQL
4. healthz/readyz results
5. Conclusion (PASS/FAIL)
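
To keep evidence files uniform, the five required items can be generated as a skeleton and filled in during the drill. This is a sketch: `evidence_template` is a hypothetical helper, and the heading text and layout are assumptions, since the runbook only lists which items must be present. Redirect its output to the evidence file for the drill date.

```bash
# Hypothetical helper: print an evidence skeleton covering the five required items.
# Usage: evidence_template <operator> <time-window> <PASS|FAIL>
evidence_template() {
  cat <<EOF
# Backup / Restore Drill Evidence

1. Operator: $1
   Time window: $2
2. Commands and exit codes:
3. Core validation SQL output:
4. healthz/readyz results:
5. Conclusion: $3
EOF
}
```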