Emergency Procedures
What to do when things go wrong.
API is Down, Database is Up
Symptom: Can't reach the API, but PostgreSQL is accessible
Safe approach: Modify the database directly, then document the change
# Step 1: Verify what's wrong
docker exec -i charliehub-postgres psql -U charliehub -d charliehub_domains \
-c "SELECT domain, status FROM domains WHERE status='active' LIMIT 5"
# Step 2: Make the fix in the database
docker exec -i charliehub-postgres psql -U charliehub -d charliehub_domains \
-c "UPDATE domains SET status='active' WHERE domain='critical-service.com'"
# Step 3: Regenerate configuration
docker exec charliehub_domain_manager_v3 python3 /app/services/traefik_generator.py
# Step 4: Verify Traefik reloaded
curl -s http://localhost:8091/api/http/routers | jq '.[] | select(.name | contains("critical"))'
# Step 5: Document the incident
echo "$(date): Emergency fix - updated domains via SQL due to API downtime" >> /var/log/emergency.log
Key: Database is the source of truth. Fixing there is safe.
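Step 5 is easy to skip under pressure, so it can help to wrap it in a small function. The sketch below is one possible shape and is not part of CharlieHub: `log_emergency` and the `EMERGENCY_LOG` override are hypothetical names, and only the default path `/var/log/emergency.log` comes from the procedure above.

```shell
# Hypothetical helper: append one timestamped entry to the emergency log.
# EMERGENCY_LOG is an assumed override; the default path is the one used in Step 5.
log_emergency() {
    local log_file="${EMERGENCY_LOG:-/var/log/emergency.log}"
    local operator="${USER:-unknown}"
    if [ "$#" -eq 0 ]; then
        echo "usage: log_emergency <message>" >&2
        return 2
    fi
    # UTC timestamp, operator name, then the message on a single line
    printf '%s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$operator" "$*" >> "$log_file"
}
```

Usage would look like `log_emergency "Emergency fix - updated domains via SQL due to API downtime"`. Recording the operator alongside the timestamp makes the later escalation conversation easier.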
Configuration Got Corrupted
Symptom: Routes are wrong, config is inconsistent
Recovery: Restore from snapshot
# CharlieHub takes automatic snapshots; list the newest first
ls -lt /opt/charliehub/traefik/config/history/ | head -5
# Find the last good snapshot (before corruption)
# Restore it
sudo cp /opt/charliehub/traefik/config/history/2026-02-12_15-01-12.yml \
/opt/charliehub/traefik/config/generated/routes.yml
# Verify Traefik reloaded with the good config (prints the number of routers)
curl -s http://localhost:8091/api/http/routers | jq 'length'
Then:
# Figure out what went wrong with the database
docker exec -i charliehub-postgres psql -U charliehub -d charliehub_domains \
-c "SELECT * FROM domains WHERE status='active' ORDER BY updated_at DESC LIMIT 10"
# Fix the database issue
curl -X PUT /api/domains/PROBLEM_ID -d '{...}'
# Regenerate
docker exec charliehub_domain_manager_v3 python3 /app/services/traefik_generator.py
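Picking "the last good snapshot" by eye is error-prone during an incident. As a sketch, assuming every file in the history directory follows the `YYYY-MM-DD_HH-MM-SS.yml` pattern of the example above (so that names sort chronologically), the hypothetical function `last_snapshot_before` prints the newest snapshot taken strictly before a given cutoff:

```shell
# Hypothetical helper: print the newest snapshot whose name sorts before the cutoff.
# Assumes snapshot names look like 2026-02-12_15-01-12.yml, so string order = time order.
last_snapshot_before() {
    local history_dir="$1" cutoff="$2" file
    for file in "$history_dir"/*.yml; do
        [ -e "$file" ] || continue          # glob matched nothing
        basename "$file" .yml
    done | LC_ALL=C sort | awk -v cutoff="$cutoff" -v dir="$history_dir" '
        # Appending "" forces a string comparison; input is sorted, so the last match is newest
        (($0 "") < (cutoff "")) { best = $0 }
        END { if (best != "") print dir "/" best ".yml"; else exit 1 }'
}
```

For example, `last_snapshot_before /opt/charliehub/traefik/config/history 2026-02-12_16-00-00` would print the most recent snapshot from before 16:00 on that day, and return a non-zero status if none exists.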
Someone Made Direct YAML Edits
Symptom: routes.yml was modified directly, so the changes are not reflected in the database
Recovery:
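# Step 1: Save the manual edits for review, then restore the generated file
# (run from inside the repository that tracks the file)
git diff -- /opt/charliehub/traefik/config/generated/routes.yml > /tmp/routes-manual-edits.diff
git checkout -- /opt/charliehub/traefik/config/generated/routes.yml
# Step 2: Figure out what they were trying to do
cat /tmp/routes-manual-edits.diff
git log --oneline -5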
# Step 3: Do it the right way
# Ask: "What change were you trying to make?"
# Answer: Use the API
# Step 4: Regenerate
docker exec charliehub_domain_manager_v3 python3 /app/services/traefik_generator.py
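Manual edits are easier to handle when they are noticed early. One possible check, assuming the newest file in the history directory is a byte-for-byte copy of the last generator output (this page does not confirm when snapshots are taken), is to compare the live file against that reference. `routes_unmodified` is a hypothetical name:

```shell
# Hypothetical check: exit 0 if the live file is byte-identical to the reference copy,
# 1 if it differs (possible manual edit), 2 if either file cannot be read.
routes_unmodified() {
    local live="$1" reference="$2"
    if [ ! -r "$live" ] || [ ! -r "$reference" ]; then
        echo "cannot read $live or $reference" >&2
        return 2
    fi
    cmp -s "$live" "$reference"
}
```

A cron job or monitoring hook could call `routes_unmodified <generated routes.yml> <newest snapshot> || echo "manual edit suspected"`. If the snapshot assumption does not hold in your setup, `git diff --quiet` against the tracked file serves the same purpose.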
Someone Inserted Directly into Database
Symptom: Route appears in database but not in Traefik
Recovery:
# Step 1: Find the bad row
docker exec -i charliehub-postgres psql -U charliehub -d charliehub_domains \
-c "SELECT * FROM domains WHERE domain='unknown-domain.com'"
# Step 2: Check if it's valid (constraints check)
# If INSERT succeeded despite being invalid, constraints are broken
# Step 3: Delete the bad row
docker exec -i charliehub-postgres psql -U charliehub -d charliehub_domains \
-c "DELETE FROM domains WHERE domain='unknown-domain.com'"
# Step 4: Verify constraints are working
# Try to insert an invalid row:
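docker exec -i charliehub-postgres psql -U charliehub -d charliehub_domains \
-c "INSERT INTO domains (protocol, cors_enabled, ...) VALUES ('tcp', true, ...)"
# Should get: ERROR:  new row for relation "domains" violates check constraint "..."
# If the INSERT succeeds instead, delete the test row and repair the constraint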
# Step 5: Educate the user
# "Use the API, it validates this stuff"
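The outcome of the Step 4 test insert can be checked mechanically rather than by reading the output. The sketch below assumes PostgreSQL's standard wording for CHECK failures ("violates check constraint"); `insert_was_rejected` is a hypothetical name.

```shell
# Hypothetical check: reads psql output on stdin and succeeds only if
# PostgreSQL reported a CHECK constraint violation.
insert_was_rejected() {
    grep -q 'violates check constraint'
}
```

In practice the Step 4 command would be piped into it with stderr included, for example `docker exec ... psql ... -c "INSERT ..." 2>&1 | insert_was_rejected && echo "constraints OK"`. A non-zero result means the invalid row may have been accepted and needs to be deleted.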
Traefik Won't Start
Symptom: docker logs charliehub-traefik shows errors
Common causes:
Bad YAML in routes.yml
# Validate the YAML
docker run --rm -v /opt/charliehub/traefik/config/generated:/data:ro \
mikefarah/yq eval '.' /data/routes.yml
# If it fails, restore snapshot
sudo cp /opt/charliehub/traefik/config/history/LAST_GOOD.yml \
/opt/charliehub/traefik/config/generated/routes.yml
# Restart Traefik
docker restart charliehub-traefik
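Overwriting `routes.yml` destroys the broken file, which is often the best evidence for the post-mortem. A minimal sketch of a safer restore follows; `restore_snapshot` and its backup-directory argument are hypothetical, and the backup is deliberately written outside the generated directory so Traefik does not pick it up.

```shell
# Hypothetical helper: keep a copy of the current (broken) file, then restore a snapshot.
# Usage: restore_snapshot <snapshot> <target> [backup_dir]   (backup_dir defaults to /tmp)
restore_snapshot() {
    local snapshot="$1" target="$2" backup_dir="${3:-/tmp}"
    # Refuse to restore from a missing or empty snapshot
    if [ ! -s "$snapshot" ]; then
        echo "snapshot missing or empty: $snapshot" >&2
        return 1
    fi
    # Preserve the current file for later investigation
    if [ -e "$target" ]; then
        cp -p "$target" "$backup_dir/$(basename "$target").broken.$(date +%Y%m%d%H%M%S)" || return 1
    fi
    cp "$snapshot" "$target"
}
```

The commands above use `sudo cp`, so in practice this would run from a shell with equivalent privileges, followed by `docker restart charliehub-traefik` as before.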
Configuration Constraint Violation
# Check the generated file for invalid configs
grep -A 5 "ERROR" /opt/charliehub/traefik/config/generated/routes.yml
# If the generator created bad config:
# 1. Find the bad domain entry
docker exec -i charliehub-postgres psql -U charliehub -d charliehub_domains \
-c "SELECT domain, status FROM domains WHERE status='active'"
# 2. Fix it via API
curl -X PUT /api/domains/ID -d '{...}'
# 3. Regenerate
docker exec charliehub_domain_manager_v3 python3 /app/services/traefik_generator.py
# 4. Restart Traefik
docker restart charliehub-traefik
Database Constraints Are Too Strict
Symptom: Valid-seeming config gets rejected by constraint
Options:
Option 1: The Constraint is Right, Config is Wrong
# Database constraint: TCP routes must have backend_host + backend_port
# You're trying: protocol='tcp' without backend_host
# This is correct behavior - your config is invalid
# Fix: Add backend_host, or change to protocol='http'
Option 2: The Constraint is Wrong
# The constraint is preventing legitimate config
# Solution: Update the constraint
# Step 1: Review the constraint
docker exec -i charliehub-postgres psql -U charliehub -d charliehub_domains \
-c "SELECT conname, pg_get_constraintdef(oid) FROM pg_constraint WHERE conrelid='domains'::regclass AND contype='c'"
# Step 2: Modify it (SQL, run inside psql; the transaction keeps the table from being left unconstrained)
BEGIN;
ALTER TABLE domains DROP CONSTRAINT bad_constraint;
ALTER TABLE domains ADD CONSTRAINT new_constraint CHECK (...);
COMMIT;
# Step 3: Document why you changed it
git commit -m "fix(database): Relaxed constraint X because Y"
# Step 4: Update standards documentation
# Edit /docs/standards/ and explain the new rule
Incident Checklist
When something goes wrong:
- [ ] Immediate: Identify symptoms
- [ ] Assess: Is API down? Database? Traefik?
- [ ] Stabilize: Restore from snapshot if needed
- [ ] Investigate: What went wrong?
- [ ] Fix: Make change at the source (database/API)
- [ ] Verify: Test that it works
- [ ] Prevent: How to prevent this in future?
- [ ] Document: Log what happened and why
- [ ] Post-mortem: Why did the safeguards not catch this?
Post-Incident Questions
After any incident, ask:
- Was this prevented by a safeguard? (Constraint, API validation, doc)
  - If no → Add the missing safeguard
- Was this caught early? (Git hook, pre-commit, log monitoring)
  - If no → Add monitoring
- Could it happen again? (Same root cause)
  - If yes → Fix the root cause, not the symptom
- Did we follow the standards? (10 commandments)
  - If no → Reinforce training on standards
When to Escalate
Contact your team lead if:
- You had to modify the database directly
- A constraint needed to be changed
- The API doesn't support what you need (needs extension)
- The documentation is unclear or contradictory
- You're about to bypass the system with a workaround
Don't: Just make a workaround and move on
Do: Escalate to get it extended properly
Key Principles
- Database is source of truth - Fix problems there
- API validates changes - Use it for normal operations
- Generate from source - Never edit output files
- Snapshots are backups - Know how to restore
- Constraints prevent corruption - Work with them, not around them
- Document incidents - Learn from what went wrong