Backup & Recovery¶
Comprehensive backup schedules and recovery procedures for the CharlieHub cluster.
Last Verified: 2026-02-04
Backup Architecture¶
3-2-1 Strategy¶
- 3 copies: Ceph (3x replication) + PBS (France) + vzdump (UK)
- 2 media types: Local NAS (UK) + PBS/NFS (France)
- 1 off-site: France site (PBS on pikvm-backup)
Backup Methods¶
| Method | Storage | Purpose | Transfer Size |
|---|---|---|---|
| PBS (Primary) | pbs-fr (France) | Incremental off-site | 1-5 GB/night |
| Vzdump (UK) | px3-nas | Fast UK restore | Full backup |
| Ceph Replication | ceph-pool | Live redundancy | Automatic |
PBS vs Vzdump
PBS backups are incremental and deduplicated: only data that changed since the previous backup is transferred. After the initial full backup, nightly transfers drop from 40-100 GB to 1-5 GB.
Backup Layers¶
| Layer | Method / Schedule | Location | Recovery Time | Purpose |
|---|---|---|---|---|
| Ceph Replication | Automatic 3x | UK nodes | Instant | Live redundancy |
| PBS Incremental | Daily 22:00-03:00 | pbs-fr (France) | 10-20 min | Primary off-site |
| UK Secondary | Daily 05:30 | px3-nas (NFS) | 5-10 min | Fast UK restore |
| PBS Weekly | Sunday 07:00 | pbs-fr | 10-20 min | Long-term archive |
Daily Schedule¶
All times in UTC. Node-staggered to prevent I/O contention.
PBS Jobs (Primary - Incremental)¶
| Time | Job | Node | Storage | Retention |
|---|---|---|---|---|
| 22:00 | pbs-px3-daily | px3 | pbs-fr | 7 daily, 4 weekly, 2 monthly |
| 00:30 | pbs-px2-daily | px2 | pbs-fr | 7 daily, 4 weekly, 2 monthly |
| 03:00 | pbs-px1-daily | px1 | pbs-fr | 7 daily, 4 weekly, 2 monthly |
Vzdump Jobs (UK Local)¶
| Time | Job | Node | Storage | Retention |
|---|---|---|---|---|
| 05:30 | uk-secondary | px2, px3 | px3-nas | 5 daily, 2 weekly |
Weekly Schedule¶
| Day | Time | Job | Storage | Retention |
|---|---|---|---|---|
| Sunday | 07:00 | pbs-weekly | pbs-fr | 8 weekly, 3 monthly |
Schedule Rationale¶
The backup schedule is node-staggered to prevent Ceph I/O contention:
22:00-00:00 px3-suzuka window (lightest workload, starts first)
00:30-02:30 px2-monza window
03:00-05:00 px1-silverstone window (production, last)
05:30-06:00 UK secondary backup (after PBS completes)
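When checking whether a task would collide with a backup, it can help to look the windows up programmatically. The following is a minimal sketch that maps a UTC time to the window listed above; the function name `backup_window` and the `idle` label are hypothetical, and the boundaries are copied from the schedule rather than read from the cluster.

```shell
# Map a UTC time (HH:MM) to the backup window that owns it.
# Boundaries are hard-coded from the schedule above (assumption: they are kept in sync by hand).
backup_window() (
    hh=${1%%:*}; mm=${1##*:}
    # Strip one leading zero so "08" or "09" is not read as invalid octal.
    hh=${hh#0}; mm=${mm#0}
    t=$(( ${hh:-0} * 60 + ${mm:-0} ))   # minutes since midnight
    if   [ "$t" -ge 1320 ];                    then echo "px3-suzuka"       # 22:00-00:00
    elif [ "$t" -ge 30 ]  && [ "$t" -lt 150 ]; then echo "px2-monza"        # 00:30-02:30
    elif [ "$t" -ge 180 ] && [ "$t" -lt 300 ]; then echo "px1-silverstone"  # 03:00-05:00
    elif [ "$t" -ge 330 ] && [ "$t" -lt 360 ]; then echo "uk-secondary"     # 05:30-06:00
    else                                            echo "idle"
    fi
)
```

For example, `backup_window "$(date -u +%H:%M)"` reports the window active right now.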
Proxmox Backup Server (PBS)¶
PBS runs on CT 5101 (pbs-fr) in France, providing incremental backups with deduplication.
PBS Details¶
| Parameter | Value |
|---|---|
| Container | CT 5101 on px5-lemans |
| IP | 10.35.1.101 |
| Web UI | https://10.35.1.101:8007 |
| Datastore | pbs-main (on pikvm-backup NFS) |
| PVE Storage | pbs-fr |
PBS Benefits¶
| Metric | Before (vzdump NFS) | After (PBS) |
|---|---|---|
| Nightly WAN transfer | 40-100 GB | 1-5 GB |
| Storage used | ~1.5 TB | ~500 GB |
| Restore time (France) | 30-60 min | 10-20 min |
| Resume on failure | No | Yes |
For full PBS documentation, see PBS Service.
Ceph Scrub Window¶
Ceph scrubs are restricted to 09:00-17:00 UTC (business hours) to avoid overlap with overnight backups.
# Current scrub settings
osd_scrub_begin_hour = 9
osd_scrub_end_hour = 17
osd_max_scrubs = 1
osd_scrub_load_threshold = 0.3
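The source does not say how these values are applied. One possible approach, assuming the cluster uses Ceph's centralized configuration database (`ceph config set` / `ceph config get`), is sketched below; the function names are hypothetical.

```shell
# Write the scrub settings shown above to the Ceph config database.
# Assumption: settings live in the config database rather than in ceph.conf.
apply_scrub_window() {
    ceph config set osd osd_scrub_begin_hour 9 &&
    ceph config set osd osd_scrub_end_hour 17 &&
    ceph config set osd osd_max_scrubs 1 &&
    ceph config set osd osd_scrub_load_threshold 0.3
}

# Read the window back and succeed only if it matches 9-17.
verify_scrub_window() (
    begin=$(ceph config get osd osd_scrub_begin_hour)
    end=$(ceph config get osd osd_scrub_end_hour)
    [ "$begin" = "9" ] && [ "$end" = "17" ]
)
```

Running `apply_scrub_window && verify_scrub_window` on any node with an admin keyring would apply and confirm the window.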
Vzdump Settings¶
Global vzdump settings in /etc/vzdump.conf (all nodes):
bwlimit: 80000 # KiB/s, roughly 80 MB/s max bandwidth
ionice: 7 # Lowest best-effort I/O priority
pigz: 2 # 2 pigz threads (gzip backups only; zstd threads are set with "zstd")
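Since the same settings are expected on all nodes, a small reader makes it easy to spot drift. This is a sketch only: `vzdump_setting` is a hypothetical helper, and it assumes the simple `key: value` layout shown above.

```shell
# Print the value of one key from a vzdump config file.
# $1 = key, $2 = config file (defaults to /etc/vzdump.conf)
vzdump_setting() {
    # Capture the value up to the first whitespace or "#"; keep the last match if the key repeats.
    sed -n "s/^$1:[[:space:]]*\([^#[:space:]]*\).*/\1/p" "${2:-/etc/vzdump.conf}" | tail -n 1
}
```

Comparing nodes could then look like `for n in px1 px2 px3; do ssh "$n" grep '^bwlimit' /etc/vzdump.conf; done`, or the function can be run locally on each node.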
Storage Locations¶
| Storage | Type | Location | Capacity | Purpose |
|---|---|---|---|---|
| pbs-fr | PBS | CT 5101 (France) | ~1.1 TB free | Primary off-site |
| px3-nas | NFS | 10.44.1.30 (UK) | 1.8 TB | Fast UK restore |
| ceph-pool | RBD | 5 OSDs UK | 8.9 TB | Live VM storage |
Recovery Procedures¶
Restore from PBS (Recommended)¶
# List available PBS backups
pvesm list pbs-fr --content backup | grep 1112
# Restore CT from PBS (add --force to overwrite an existing CT 1112)
pct restore 1112 pbs-fr:backup/ct/1112/2026-02-04T03:00:00Z --storage ceph-pool
# Restore VM from PBS (add --force to overwrite an existing VM 1111)
qmrestore pbs-fr:backup/vm/1111/2026-02-04T03:00:00Z 1111 --storage ceph-pool
# Restore to different VMID (test)
pct restore 9999 pbs-fr:backup/ct/1112/2026-02-04T03:00:00Z --storage ceph-pool --unique
Recovery time: 10-20 minutes
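The restore commands above hard-code a snapshot timestamp. A minimal sketch for picking the newest snapshot of a guest instead is shown here; `latest_pbs_snapshot` is a hypothetical helper, and it assumes `pvesm list` prints the volume ID in the first column.

```shell
# Print the volume ID of the newest backup for one guest.
# $1 = PVE storage ID (e.g. pbs-fr), $2 = guest ID
latest_pbs_snapshot() {
    pvesm list "$1" --content backup --vmid "$2" \
        | awk -v s="$1" 'index($1, s ":") == 1 { print $1 }' \
        | LC_ALL=C sort \
        | tail -n 1    # ISO 8601 timestamps sort chronologically as plain text
}
```

A container restore could then be written as `pct restore 1112 "$(latest_pbs_snapshot pbs-fr 1112)" --storage ceph-pool`.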
Restore from UK Secondary (Fastest)¶
# List available backups
ls -lh /mnt/pve/px3-nas/dump/ | grep 2912
# Restore the newest archive (the glob can match several retained dumps)
pct restore 2912 "$(ls -t /mnt/pve/px3-nas/dump/vzdump-lxc-2912-*.tar.zst | head -n 1)" --storage ceph-pool
Recovery time: 5-10 minutes
File-Level Restore (PBS Only)¶
PBS supports individual file restore without full VM recovery:
- Open PBS Web UI (https://10.35.1.101:8007)
- Navigate to Datastore > pbs-main > Content
- Select backup snapshot
- Click "Browse Files"
- Download individual files
Monitoring & Troubleshooting¶
Check Backup Status¶
# Check PBS storage
pvesm status | grep pbs-fr
# List PBS backups
pvesm list pbs-fr --content backup | tail -10
# Check vzdump logs
tail -n 100 /var/log/vzdump/*.log
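Listing backups shows that they exist, but not whether the newest one is recent. The sketch below computes a snapshot's age from the timestamp at the end of its volume ID. It assumes GNU `date` (as shipped with Debian-based Proxmox nodes); `snapshot_age_hours` is a hypothetical helper.

```shell
# Print the age of a PBS snapshot in whole hours.
# $1 = volume ID ending in an ISO 8601 UTC timestamp
# $2 = optional reference time (defaults to the current time)
snapshot_age_hours() (
    ts=${1##*/}                                   # keep only the trailing timestamp
    snap=$(date -u -d "$ts" +%s) || exit 1        # snapshot time as epoch seconds
    now=$(date -u -d "${2:-now}" +%s) || exit 1   # reference time as epoch seconds
    echo $(( (now - snap) / 3600 ))
)
```

With daily jobs, an age well above 24 hours for a guest suggests that a night was missed; the exact alert threshold is a local choice.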
Common Issues¶
PBS unreachable:
# Check PBS container
ssh px5 "pct status 5101"
# Check PBS service
ssh px5 "pct exec 5101 -- systemctl status proxmox-backup-proxy"
# UK backups (px3-nas) continue regardless
Ceph slow ops during backups:
# Check current slow ops
ceph health detail | grep slow
# Emergency: pause scrubs
ceph osd set noscrub
ceph osd set nodeep-scrub
# Re-enable scrubs once backups finish
ceph osd unset noscrub
ceph osd unset nodeep-scrub
Critical VMs¶
These guests have all protection layers, except 1118, which is stopped and no longer backed up:
| VMID | Name | PBS (France) | UK Secondary |
|---|---|---|---|
| 1112 | prod-database | ✅ | ✅ |
| 1113 | prod-iot-platform | ✅ | ✅ |
| 1118 | isp-monitor (STOPPED - migrated to Mint) | ❌ | ❌ |
| 3102 | homelab-monitor | ✅ | ✅ |
Retention Summary¶
| Storage | Daily | Weekly | Monthly |
|---|---|---|---|
| pbs-fr (PBS) | 7 | 4 | 2 |
| px3-nas | 5 | 2 | - |
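Proxmox expresses retention as a `prune-backups` string (for example on a vzdump job or a storage definition). The sketch below builds that string from the table's columns; `prune_opts` is a hypothetical helper, and where the retention is actually configured on this cluster is not stated in the source.

```shell
# Build a prune-backups option string from daily/weekly/monthly counts.
# $1 = daily, $2 = weekly, $3 = monthly; "-" or empty means "not kept".
prune_opts() (
    out=""
    for pair in "keep-daily=$1" "keep-weekly=$2" "keep-monthly=$3"; do
        val=${pair#*=}
        case "$val" in ''|-) continue ;; esac   # skip tiers with no retention
        out="${out:+$out,}$pair"                # comma-join the remaining tiers
    done
    echo "$out"
)
```

`prune_opts 7 4 2` yields the string for pbs-fr and `prune_opts 5 2 -` the one for px3-nas.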
Legacy Jobs (To Be Disabled)¶
These vzdump-to-NFS jobs are being replaced by PBS. They remain enabled during the parallel run and will be disabled after validation:
| Job | Replaced By | Status |
|---|---|---|
| pikvm-px1 | pbs-px1-daily | Parallel run |
| pikvm-px2 | pbs-px2-daily | Parallel run |
| pikvm-px3 | pbs-px3-daily | Parallel run |
| backup-55292acc | pbs-px1-daily | Parallel run |
| weekly-archive | pbs-weekly | Parallel run |
Parallel Run Period
During Feb 4-18, 2026, both old vzdump and new PBS jobs run simultaneously for validation. After successful validation, old vzdump jobs will be disabled.
Hub2 Backups¶
hub2 (central dedicated server) has a separate backup system using rsync over WireGuard VPN.
| Time | Target | Path | Retention |
|---|---|---|---|
| 03:00 UTC | px3 (UK) | /mnt/nas-backup/hub2-snapshots/ | 7 daily |
| 03:00 UTC | px5 (FR) | /mnt/pve/pikvm-backup/hub2-offsite/ | 7 daily |
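The source does not describe how the 7-daily retention is enforced for the hub2 snapshots. As an illustration only, the sketch below prunes a snapshot directory down to the newest N entries, assuming one subdirectory per day named `YYYY-MM-DD`; both the naming scheme and the `prune_snapshots` helper are hypothetical.

```shell
# Keep only the newest N date-named snapshot directories.
# $1 = snapshot root, $2 = number of snapshots to keep (default 7)
prune_snapshots() (
    cd "$1" || exit 1
    keep=${2:-7}
    # Date-named directories sort chronologically; list newest first and
    # delete everything after the first $keep entries.
    ls -1d 20??-??-?? 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while read -r dir; do
        rm -rf -- "./$dir"
    done
)
```

For example, `prune_snapshots /mnt/nas-backup/hub2-snapshots 7` would run after the nightly rsync completes.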
Related Documentation¶
- PBS Service - Full PBS documentation
- Backup Strategy
- Cron Schedules
- Storage Reference
Last updated: 2026-02-04