
Backup Strategy

CharlieHub uses a 3-2-1 backup strategy with PBS (Proxmox Backup Server) for efficient off-site backups.

Full backup documentation: Backup & Recovery
PBS documentation: PBS Service

Last Updated: 2026-02-04


Strategy Overview

| Requirement | Implementation |
|---|---|
| 3 copies | Ceph (5 OSDs, 3x replication) + PBS (France) + vzdump (UK) |
| 2 media types | Local NAS (UK) + PBS (France) |
| 1 off-site | PBS on pbs-fr (France) |

Backup Methods

| Method | Storage | Transfer | Recovery Time |
|---|---|---|---|
| PBS (Primary) | pbs-fr (France) | Incremental, 1-5 GB | 10-20 min |
| Vzdump (UK) | px3-nas | Full backup | 5-10 min |
| Ceph | ceph-pool | Automatic | Instant |

Why PBS?

PBS backups are incremental and deduplicated: after the initial full backup, only chunks the server does not already hold are sent. Nightly transfers drop from 40-100 GB to 1-5 GB, which makes WAN backups practical.
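The effect of chunk-level deduplication can be illustrated with a small sketch. This is not PBS code: the 4-byte chunk size, the `chunks_to_upload` helper and the in-memory `store` set are assumptions made for the demo, chosen only to show why a mostly unchanged disk image produces a small upload.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny for the demo, real chunks are far larger


def chunk_digests(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return the SHA-256 digest of each."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_upload(image: bytes, known: set[str]) -> list[str]:
    """Return digests of chunks the store lacks, recording them in `known`."""
    new = []
    for digest in chunk_digests(image):
        if digest not in known:
            known.add(digest)  # the store now holds this chunk
            new.append(digest)
    return new


store: set[str] = set()          # hypothetical stand-in for the remote chunk store
night1 = b"AAAABBBBCCCCDDDD"     # initial full backup: every chunk is new
night2 = b"AAAABBBBXXXXDDDD"     # next night: one chunk changed

first = chunks_to_upload(night1, store)
second = chunks_to_upload(night2, store)
print(len(first), len(second))
```

The first run uploads every chunk; the second uploads only the chunk that changed, which is the same mechanism that shrinks the nightly WAN transfer.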


Backup Schedule Summary

Backups are staggered by node to prevent Ceph I/O contention:

| Time Window | Node | Method | Storage |
|---|---|---|---|
| 22:00-00:00 | px3-suzuka | PBS | pbs-fr (France) |
| 00:30-02:30 | px2-monza | PBS | pbs-fr (France) |
| 03:00-05:00 | px1-silverstone | PBS | pbs-fr (France) |
| 05:30-06:00 | px2, px3 | Vzdump | px3-nas (UK) |

Ceph Scrub Window

Scrubs are restricted to 09:00-17:00 UTC, so they never overlap with the backup windows.
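A quick way to confirm that the windows above stay disjoint is to check them programmatically. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the backup windows are expressed in the same timezone as the scrub window, which the schedule table does not state.

```python
DAY = 24 * 60  # minutes in a day


def to_minutes(hhmm: str) -> int:
    """Convert 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def minutes_covered(start: str, end: str) -> set[int]:
    """Minutes of the day in [start, end), wrapping past midnight if needed."""
    s, e = to_minutes(start), to_minutes(end)
    if e <= s:
        e += DAY
    return {m % DAY for m in range(s, e)}


def find_overlaps(windows: dict[str, tuple[str, str]]) -> list[tuple[str, str]]:
    """Return every pair of window names that share at least one minute."""
    names = list(windows)
    clashes = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if minutes_covered(*windows[a]) & minutes_covered(*windows[b]):
                clashes.append((a, b))
    return clashes


# Windows copied from the schedule summary and the scrub window.
BACKUP_WINDOWS = {
    "px3-suzuka PBS": ("22:00", "00:00"),
    "px2-monza PBS": ("00:30", "02:30"),
    "px1-silverstone PBS": ("03:00", "05:00"),
    "px2/px3 vzdump": ("05:30", "06:00"),
}
all_windows = {**BACKUP_WINDOWS, "ceph scrub": ("09:00", "17:00")}

print(find_overlaps(all_windows))
```

An empty list means no two windows collide. Re-running the check after any schedule change catches accidental overlaps before they cause I/O contention.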


Data Protection Layers

┌─────────────────────────────────────────────────────────────────────────┐
│                         LAYER 1: CEPH REPLICATION                       │
│  All writes stored on 5 OSDs across 3 UK nodes (size=3 replication)    │
│  Recovery: Instant (data already available)                            │
│  RPO: 0 (synchronous)                                                  │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                    LAYER 2: PBS INCREMENTAL (FRANCE)                    │
│  Daily at 22:00-03:00 (node-staggered) to pbs-fr                       │
│  Deduplication reduces transfer to 1-5 GB/night                        │
│  Retention: 7 daily + 4 weekly + 2 monthly                             │
│  Recovery: 10-20 minutes                                               │
│  RPO: Up to 24 hours                                                   │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                      LAYER 3: UK LOCAL VZDUMP                           │
│  Daily at 05:30 (px2/px3) to px3-nas                                   │
│  Full backup for fast local restore                                    │
│  Recovery: 5-10 minutes                                                │
│  RPO: Up to 25 hours                                                   │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                      LAYER 4: PBS WEEKLY ARCHIVE                        │
│  Sunday at 07:00 to pbs-fr                                             │
│  Retention: 8 weekly, 3 monthly                                        │
│  RPO: Up to 7 days                                                     │
└─────────────────────────────────────────────────────────────────────────┘
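The Layer 2 retention of 7 daily + 4 weekly + 2 monthly can be sketched as a selection over snapshot dates. This is a simplified approximation written for illustration, not PBS's own pruning code: it assumes one snapshot per day, and the `select_kept` function and its parameters are hypothetical names. Each rule walks from newest to oldest and skips periods already covered by an earlier rule.

```python
from datetime import date, timedelta


def select_kept(snapshots, keep_daily, keep_weekly, keep_monthly):
    """Return the set of snapshot dates retained by daily/weekly/monthly rules."""
    ordered = sorted(set(snapshots), reverse=True)  # newest first
    kept = set()
    rules = [
        (keep_daily, lambda d: d.isoformat()),           # one bucket per day
        (keep_weekly, lambda d: d.isocalendar()[:2]),    # (ISO year, ISO week)
        (keep_monthly, lambda d: (d.year, d.month)),     # (year, month)
    ]
    for limit, bucket_of in rules:
        # Periods already represented by snapshots kept under earlier rules.
        seen = {bucket_of(d) for d in kept}
        count = 0
        for d in ordered:
            if count >= limit:
                break
            bucket = bucket_of(d)
            if bucket in seen:
                continue
            seen.add(bucket)
            kept.add(d)
            count += 1
    return kept


# 90 consecutive nightly snapshots ending on 2026-02-04.
end = date(2026, 2, 4)
snaps = [end - timedelta(days=i) for i in range(90)]
kept = select_kept(snaps, keep_daily=7, keep_weekly=4, keep_monthly=2)
print(len(kept), sorted(kept)[0], sorted(kept)[-1])
```

With these inputs the rules retain 13 snapshots in total: the last 7 days, then one snapshot for each of the 4 preceding ISO weeks, then one for each of the 2 preceding months.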

Recovery Time Objectives (RTO)

| Scenario | Method | RTO |
|---|---|---|
| VM data corruption | Ceph RBD rollback | < 1 minute |
| VM configuration error | Vzdump restore (UK) | 5-10 minutes |
| Single node failure | Ceph automatic (HA failover) | 1-2 minutes |
| UK secondary restore | From px3-nas | 5-10 minutes |
| PBS restore (France) | From pbs-fr | 10-20 minutes |
| UK site failure | Restore on px5 from PBS | 1-2 hours |

PBS Benefits

| Metric | Before (vzdump NFS) | After (PBS) |
|---|---|---|
| Nightly WAN transfer | 40-100 GB | 1-5 GB |
| Storage used | ~1.5 TB | ~500 GB |
| Restore time (France) | 30-60 min | 10-20 min |
| Resume on failure | No | Yes |
| Deduplication | No | Yes |
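To put the figures above in perspective, the reductions can be worked out directly. In the sketch below the 100 Mbit/s uplink is a hypothetical value chosen for illustration (the actual WAN bandwidth is not stated here), and sizes are treated as decimal gigabytes.

```python
def reduction_pct(before: float, after: float) -> float:
    """Percentage saved when going from `before` to `after`."""
    return (1 - after / before) * 100


def transfer_minutes(gigabytes: float, mbit_per_s: float) -> float:
    """Minutes needed to move `gigabytes` (decimal GB) at a sustained link rate."""
    megabits = gigabytes * 8000  # 1 GB = 8000 Mbit
    return megabits / mbit_per_s / 60


UPLINK_MBIT = 100  # hypothetical uplink speed, not taken from this document

worst_case = reduction_pct(before=40, after=5)     # smallest old transfer, largest new
best_case = reduction_pct(before=100, after=1)     # largest old transfer, smallest new
storage_saved = reduction_pct(before=1500, after=500)

print(f"WAN transfer reduction: {worst_case:.1f}% to {best_case:.1f}%")
print(f"Storage reduction: {storage_saved:.1f}%")
print(f"100 GB at {UPLINK_MBIT} Mbit/s: {transfer_minutes(100, UPLINK_MBIT):.0f} min")
print(f"5 GB at {UPLINK_MBIT} Mbit/s: {transfer_minutes(5, UPLINK_MBIT):.1f} min")
```

Under that bandwidth assumption, a full 100 GB nightly transfer would take over two hours, while a 5 GB incremental finishes in a few minutes, comfortably inside each node's two-hour window.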

Critical VMs

These VMs have all protection layers enabled:

| VMID | Name | Ceph | PBS (France) | UK Local |
|---|---|---|---|---|
| 1112 | prod-database | | | |
| 1113 | prod-iot-platform | | | |
| 1118 | isp-monitor (STOPPED - migrated to Mint) | | | |
| 3102 | homelab-monitor | | | |

Storage Summary

| Storage | Type | Location | Capacity | Purpose |
|---|---|---|---|---|
| pbs-fr | PBS | CT 5101 (France) | ~1.1 TB free | Primary off-site |
| px3-nas | NFS | UK | 1.8 TB | Fast UK restore |
| ceph-pool | RBD | 5 OSDs, UK | ~2.9 TB | Live VM storage |

Hub2 Backups

hub2 (central dedicated server) is backed up daily via rsync over WireGuard.

| Time | Target | Retention |
|---|---|---|
| 03:00 UTC | px3 (UK) | 7 daily |
| 03:00 UTC | px5 (FR) | 7 daily |
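One common way to get 7 daily copies out of rsync is dated snapshot directories hard-linked against the previous day with `--link-dest`. Whether the hub2 job works this way is not stated here, so treat the following as a sketch under assumptions: the paths, the `hub2:/srv/` source and both helper functions are hypothetical, and the job is assumed to run on the backup target and pull from hub2.

```python
from datetime import date, timedelta


def rsync_command(source, dest_root, today, previous=None):
    """Build an rsync argv that writes a dated snapshot under dest_root."""
    cmd = ["rsync", "-a", "--delete"]
    if previous is not None:
        # Unchanged files are hard-linked to yesterday's copy instead of re-stored.
        cmd.append(f"--link-dest={dest_root}/{previous.isoformat()}/")
    cmd += [source, f"{dest_root}/{today.isoformat()}/"]
    return cmd


def snapshots_to_prune(existing, keep=7):
    """Return the oldest snapshot dates beyond the newest `keep`."""
    ordered = sorted(existing)
    return ordered[:-keep] if keep > 0 else ordered


today = date(2026, 2, 4)
cmd = rsync_command("hub2:/srv/", "/backup/hub2", today, today - timedelta(days=1))
print(" ".join(cmd))

existing = [today - timedelta(days=i) for i in range(9)]  # 9 snapshots on disk
print(snapshots_to_prune(existing))                       # the 2 oldest dates
```

The command list can be passed to `subprocess.run` by a scheduled job; the pruning helper only reports which dated directories fall outside the 7-day retention and leaves deletion to the caller.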

For full details:
