Disaster Recovery
Backup strategy, recovery procedures, and failover documentation.
In This Section
- Backup Strategy - 3-2-1 backup approach with Ceph
- Recovery Runbooks - Step-by-step restore procedures
- Failover Procedures - HA failover and site recovery
Quick Reference
Storage Architecture
| Location | Type | Capacity | Purpose |
|---|---|---|---|
| ceph-pool | Ceph RBD | ~1.8 TB usable | Primary storage (3-way replication) |
| backup-storage | USB | 1.8 TB | Local vzdump backups |
| px5 /mnt/nvme-vmdata | NVMe | 500 GB | DR RBD exports |
| pikvm NFS | NFS | 2.7 TB | DR vzdump backups |
Recovery Priority
- hub2 (OVH Dedicated Server) - Central services hub (Traefik, Auth, APIs, Monitoring)
- CT1112 (PostgreSQL) - Database
- CT1113 (IoT Platform) - Production workloads
Emergency Commands
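# Fast rollback via Ceph snapshot (seconds).
# Rollback discards every change written to the disk after the snapshot was taken.
qm stop 1111
rbd snap rollback ceph-pool/vm-1111-disk-0@daily-20260103
qm start 1111
# List available snapshots
rbd snap ls ceph-pool/vm-1111-disk-0
# Restore from vzdump (minutes). --force overwrites the existing VM 1111.
# The glob must expand to exactly one archive; name the file explicitly if several exist.
qmrestore /mnt/backup-storage/dump/vzdump-qemu-1111-*.vma.zst 1111 \
--storage ceph-pool --force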
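When the dump directory holds more than one archive for a VM, the glob in the restore command above expands to several paths. The following is a minimal sketch, not part of the documented procedure, of one way to pick the newest archive first. The function name `latest_vzdump` is hypothetical, and the sketch assumes vzdump's default file naming, in which the embedded timestamp (`YYYY_MM_DD-HH_MM_SS`) sorts in chronological order.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the newest vzdump archive for a VMID.
# Assumption: archives use vzdump's default naming, e.g.
#   vzdump-qemu-1111-2026_01_03-02_00_01.vma.zst
# so a sorted listing is also a chronological one.
latest_vzdump() {
    local dir="$1" vmid="$2" newest="" f
    for f in "$dir"/vzdump-qemu-"$vmid"-*.vma.zst; do
        [ -e "$f" ] || continue   # the glob matched nothing
        newest="$f"               # globs expand in sorted order; keep the last
    done
    [ -n "$newest" ] || return 1  # no archive found for this VMID
    printf '%s\n' "$newest"
}

# Usage (left commented out because --force overwrites the VM):
# archive="$(latest_vzdump /mnt/backup-storage/dump 1111)" &&
#     qmrestore "$archive" 1111 --storage ceph-pool --force
```

Resolving the archive into a variable before calling `qmrestore` also makes it easy to print and confirm the chosen file during an incident.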
Data Protection Layers
Layer 1: Ceph Replication (size=3) → Automatic, all UK nodes
Layer 2: Ceph Snapshots → Daily at 02:00, 7-day retention
Layer 3: Vzdump Backups → Daily to USB + NFS
Layer 4: RBD Export to France → Daily at 01:00, cross-site DR
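As an illustration of the Layer 2 retention rule, the sketch below selects daily snapshots that have aged out. It is not the actual scheduled job, which this page does not show. The function name `expired_daily_snaps` is hypothetical, the `daily-YYYYMMDD` naming is inferred from the snapshot name used in the rollback example, and whether the 7-day window includes the cutoff day itself is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read snapshot names on stdin (one per line) and
# print the daily-YYYYMMDD snapshots dated strictly before the cutoff.
# Names that do not match the daily pattern are never selected.
expired_daily_snaps() {
    local cutoff="$1" name stamp
    while IFS= read -r name; do
        case "$name" in
            daily-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
                stamp="${name#daily-}"
                # YYYYMMDD values compare numerically in date order
                if [ "$stamp" -lt "$cutoff" ]; then
                    printf '%s\n' "$name"
                fi
                ;;
        esac
    done
}

# Usage sketch (commented out; assumes GNU date and that the snapshot
# name is the second column of `rbd snap ls` output):
# image=ceph-pool/vm-1111-disk-0
# cutoff="$(date -d '7 days ago' +%Y%m%d)"
# rbd snap ls "$image" | awk 'NR > 1 { print $2 }' |
#     expired_daily_snaps "$cutoff" |
#     while IFS= read -r snap; do rbd snap rm "$image@$snap"; done
```

Keeping the selection logic separate from the `rbd snap rm` call makes it possible to dry-run the filter and review the list before anything is deleted.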