Operations Guide
Day-to-day operational procedures for the CharlieHub cluster and hub2 services.
In This Section
Core Operations
- Daily Tasks - Common commands and checks
- VM/CT Management - Create, migrate, and manage VMs/CTs
- Backup & Recovery - Backup schedules and restore procedures
- Monitoring - Prometheus, Grafana, alerts
- Troubleshooting - Common issues and solutions
Service Management
- Traefik Deployment Mandate - Routing architecture and requirements
- Traefik Backup & Recovery - Configuration backup procedures
- GMC Reboot Runbook - GMC server recovery procedures
- Deployment Checklist - Service deployment verification
Security & Infrastructure
- Security Maintenance - Quarterly credential rotation, pre-commit hooks, emergency recovery
- Firewall Rules Persistence - iptables rules management and boot-time restoration
- Ceph Migration - Storage migration procedures
Critical Infrastructure Notes
Firewall Rules (iptables)
All firewall rules for WireGuard routing, service isolation, and security are persisted to disk and automatically restored on boot. See Firewall Rules Persistence for details.
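One way to confirm that the persisted rules still match what is loaded is to compare the saved file with the live ruleset. The following is a minimal sketch, not the documented procedure: the path `/etc/iptables/rules.v4` is an assumption (it is the usual location for the iptables plugin of `netfilter-persistent` on Debian-based systems), and the function names `normalize_rules` and `rules_drift` are hypothetical helpers introduced here for illustration.

```shell
# Strip comment lines (iptables-save writes timestamps there) and zero the
# packet/byte counters so two rule dumps can be compared on content alone.
normalize_rules() {
  grep -v '^#' | sed -E 's/\[[0-9]+:[0-9]+\]/[0:0]/g'
}

# Compare the saved rules file with the live ruleset.
# Returns 0 when they match, 1 (with a diff on stdout) when they differ,
# and 2 when either side could not be read.
rules_drift() {
  saved_file="${1:-/etc/iptables/rules.v4}"   # assumed path, adjust as needed
  tmp_dir=$(mktemp -d) || return 2

  # sudo -n fails instead of prompting, so the check never hangs.
  if ! sudo -n iptables-save > "$tmp_dir/raw" 2>/dev/null ||
     ! [ -r "$saved_file" ]; then
    rm -rf "$tmp_dir"
    return 2
  fi

  normalize_rules < "$saved_file"   > "$tmp_dir/saved"
  normalize_rules < "$tmp_dir/raw"  > "$tmp_dir/live"

  diff "$tmp_dir/saved" "$tmp_dir/live"
  status=$?
  rm -rf "$tmp_dir"
  return "$status"
}
```

Calling `rules_drift` with no arguments checks the assumed default path. Docker inserts its own rules at runtime, so some differences between the saved file and the live ruleset may be expected rather than a sign of a problem.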
Secret Rotation
Quarterly credential rotation is scheduled for the last Sunday of the final month of each quarter (March, June, September, December) at 02:00 UTC. See Security Maintenance.
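Because "last Sunday of the month" falls on a different date each quarter, it can help to compute the dates rather than look them up. The sketch below assumes GNU `date` (for the `-d` option); `last_sunday` and `quarterly_rotation_dates` are hypothetical helper names, not part of any existing tooling described in this guide.

```shell
# Print the last Sunday of a given month as YYYY-MM-DD (requires GNU date).
# Usage: last_sunday 2025 03
last_sunday() {
  year=$1
  month=$2
  # Last day of the month: first of the month, plus one month, minus one day.
  last_day=$(date -u -d "${year}-${month}-01 +1 month -1 day" +%Y-%m-%d)
  # Day of week for that date: 0 = Sunday ... 6 = Saturday.
  dow=$(date -u -d "$last_day" +%w)
  # Step back that many days to land on a Sunday.
  date -u -d "${last_day} -${dow} days" +%Y-%m-%d
}

# Print the four rotation slots for a year, one per line.
quarterly_rotation_dates() {
  for month in 03 06 09 12; do
    printf '%s 02:00 UTC\n' "$(last_sunday "$1" "$month")"
  done
}
```

For example, `quarterly_rotation_dates 2025` lists 30 March, 29 June, 28 September, and 28 December. Standard cron syntax has no direct way to say "last Sunday", so a job that runs every Sunday and compares the current date against `last_sunday` is one possible way to implement the schedule.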
Service Dependencies
- hub2 services depend on Docker network connectivity
- WireGuard routing depends on iptables rules
- Traefik routing depends on Docker service labels
- Prometheus metrics depend on exporter health
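The dependencies above can be spot-checked in one pass. This is a rough sketch under several assumptions that are not confirmed by this guide: that Traefik-routed services carry the label `traefik.enable=true`, and that Prometheus listens on `http://localhost:9090`. The helper names `check`, `has_traefik_containers`, and `run_dependency_checks` are hypothetical.

```shell
# Run a command quietly and report whether it succeeded.
check() {
  label=$1
  shift
  if "$@" >/dev/null 2>&1; then
    printf 'OK   %s\n' "$label"
  else
    printf 'FAIL %s\n' "$label"
    return 1
  fi
}

# True when at least one running container has the assumed Traefik label.
has_traefik_containers() {
  [ -n "$(docker ps -q --filter 'label=traefik.enable=true' 2>/dev/null)" ]
}

# Walk the dependency list and return the number of failed checks.
run_dependency_checks() {
  failures=0
  # Only confirms the Docker daemon answers and networks can be listed.
  check "Docker daemon and networks" docker network ls || failures=$((failures + 1))
  # sudo -n fails instead of prompting for a password.
  check "iptables DOCKER-USER chain" sudo -n iptables -S DOCKER-USER || failures=$((failures + 1))
  check "Traefik-labelled containers" has_traefik_containers || failures=$((failures + 1))
  # Assumed URL; this confirms Prometheus responds, not that every exporter is up.
  check "Prometheus responding" curl -fsS --max-time 5 http://localhost:9090/-/healthy || failures=$((failures + 1))
  return "$failures"
}
```

Run `run_dependency_checks` on hub2 to get one OK or FAIL line per dependency. A failing line only narrows down where to look; the linked pages remain the reference for actual recovery steps.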
Quick Reference
# Check cluster health
pvecm status
# List all VMs/CTs across cluster
pvesh get /cluster/resources --type vm
# Check storage status
pvesm status
# View running containers on hub2 (run from the Compose project directory)
docker compose ps
# Check iptables rules in the DOCKER-USER chain
sudo iptables -L DOCKER-USER -n -v
# View firewall rules persistence status
sudo systemctl status netfilter-persistent
Emergency Contacts & Escalation
- Service Connectivity Issues → Check Firewall Rules Persistence
- Container Issues → Check Daily Tasks
- Data Loss → Check Backup & Recovery
- Security Incident → Check Security Maintenance