# px3-suzuka Hardware Inventory
Node: px3-suzuka (10.44.1.30)
Role: NAS host + Ceph OSD.0 + Local VM host
Last Updated: 2026-02-07
## Storage Drives
px3 has 3 active physical storage drives (sdb removed 2026-02-07; replacement on order).
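A quick way to reproduce this inventory from the shell (a minimal sketch; the column selection is just one reasonable choice):

```bash
# List whole disks with model, serial, size, and transport;
# sdb should be absent and sdd should report TRAN=usb.
lsblk -d -o NAME,MODEL,SERIAL,SIZE,TRAN
```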
### 1. System Boot Drive (SATA SSD)
| Property | Value |
|---|---|
| Device | /dev/sda |
| Model | GIGABYTE GP-GSTFS31120GNTD |
| Serial | SN192508961210 |
| Type | SATA SSD |
| Interface | Direct SATA connection |
| Total Capacity | 120 GiB |
| Health | ✅ PASSED |
Logical Volumes:

- pve-root (~40 GB): root filesystem (/)
- pve-data (~63 GB): local-lvm thin pool (VMs)
- pve-swap (~8 GB): swap space

Proxmox Storage: `local` (host system) and `local-lvm` (VM disks)
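To spot-check the boot SSD's health beyond the PASSED verdict above (a sketch; assumes smartmontools is installed on px3):

```bash
# Overall SMART self-assessment for the boot SSD.
smartctl -H /dev/sda

# Wear/error attributes worth watching on a small SATA SSD.
smartctl -A /dev/sda | grep -Ei 'wear|reallocat|pending|crc'
```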
### 2. NAS Storage Drive — REPLACEMENT PENDING
!!! warning "Temporary Setup (since 2026-02-07)"
    The original NAS drive (WDC WD20EADS Caviar Green) was removed due to Ceph I/O contention. NAS is temporarily served from sdd (USB). A Samsung 870 EVO 2TB SSD replacement has been ordered.
Removed drive (sdb):
| Property | Value |
|---|---|
| Model | WDC WD20EADS-42R6B0 (Caviar Green) |
| Serial | WD-WCAVY3864795 |
| Type | HDD (mechanical), ~15 years old |
| Interface | SATA 1.5 Gb/s (SATA 2.6) |
| Reason for removal | I/O contention caused Ceph osd.0 suicide timeouts |
Replacement (on order):
| Property | Value |
|---|---|
| Model | Samsung SSD 870 EVO 2TB |
| Type | SATA SSD |
| Purpose | NAS primary storage |
| Runbook | px3:/root/SDB_REPLACEMENT_RUNBOOK.md |
See Ceph Migration Guide for incident details.
### 3. Ceph OSD Drive (SATA SSD)
| Property | Value |
|---|---|
| Device | /dev/sdc |
| Model | Samsung SSD 870 EVO 2TB |
| Serial | S754NX0Y200909M |
| Type | SATA SSD |
| Interface | SATA 6 Gb/s (SATA 3.3) |
| Total Capacity | 1.8 TiB |
| Health | ✅ PASSED |
Ceph Configuration:

- OSD: osd.0
- Hostname: px3-suzuka
- Weight: 1.81940
- Status: up
- Device: Samsung SSD 870 EVO 2TB
- BlueStore: Yes
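To confirm osd.0 is up with the weight listed above (a sketch using standard Ceph CLI commands, run from a node with admin credentials):

```bash
# CRUSH tree: osd.0 should appear under host px3-suzuka, weight ~1.81940, status "up".
ceph osd tree

# Device and BlueStore details as reported by the OSD itself.
ceph osd metadata 0
```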
Hardening (applied 2026-02-07):

- systemd restart policy: unlimited retries, 20 s cooldown
- Recovery throttling: `osd_recovery_max_active=1`, `osd_max_backfills=1`, `osd_recovery_sleep_ssd=0.1`
- See the Ceph OSD.0 Hardening section below
!!! danger "Do NOT partition or modify this drive — Ceph manages it entirely."
### 4. USB Drive — TEMPORARY NAS PRIMARY
!!! info "Temporary Primary (since 2026-02-07)"
    This USB drive is currently serving as the primary NAS storage while the SSD replacement is in transit. When the new SSD arrives, this drive reverts to backup mirror duty.
| Property | Value |
|---|---|
| Device | /dev/sdd |
| Model | WDC WD20NMVW-11AV3S0 (My Passport) |
| Serial | WD-WXH1E33WJNF6 |
| Type | HDD in USB 3.0 enclosure |
| Speed | ~72 MB/s read |
| Total Capacity | 1.8 TiB |
| UUID | b72b55f1-3434-403a-a055-ad60c8bb1416 |
| Health | ✅ PASSED |
Current mount (temporary): `/dev/sdd2` → `/mnt/nas-storage` (ext4, nofail)

fstab entry:

```
UUID=b72b55f1-3434-403a-a055-ad60c8bb1416 /mnt/nas-storage ext4 defaults,nofail,x-systemd.device-timeout=10s 0 2
```
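To verify the temporary mount and that the UUID matches the fstab entry (a sketch; run on px3):

```bash
# What is mounted at the NAS path, and with which options.
findmnt /mnt/nas-storage

# The partition's UUID should match the fstab line above.
blkid /dev/sdd2
```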
## Storage Summary Table
| Drive | Model | Interface | Size | Purpose | Status |
|---|---|---|---|---|---|
| sda | GIGABYTE GP-GSTFS | SATA | 120GB | System boot + local VMs | ✅ Active |
| sdb | ~~WDC WD20EADS~~ | ~~SATA 1.5~~ | ~~1.8T~~ | ~~NAS storage~~ | ❌ Removed (2026-02-07) |
| sdc | Samsung 870 EVO | SATA 6Gb/s | 1.8T | Ceph OSD.0 | ✅ Active |
| sdd | WD My Passport | USB 3.0 | 1.8T | Temporary NAS primary | ✅ Active |
| — | Samsung 870 EVO | SATA | 2TB | New NAS (sdb replacement) | 📦 On order |
## Ceph OSD.0 Hardening
Applied 2026-02-07 after osd.0 suicide timeout incident.
### systemd Restart Policy
Override at `/etc/systemd/system/ceph-osd@.service.d/restart-policy.conf`:

```ini
[Service]
StartLimitIntervalSec=0
StartLimitBurst=0
RestartSec=20
```
With the start-rate limit disabled, osd.0 is always restarted after a crash, with a 20-second cooldown between attempts.
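A drop-in like this only takes effect after a daemon reload; a minimal sketch for applying and verifying it (the property names are standard systemd unit properties):

```bash
# Pick up the new drop-in.
systemctl daemon-reload

# Confirm the effective restart settings on osd.0's unit instance.
systemctl show ceph-osd@0.service \
  -p Restart -p RestartUSec -p StartLimitBurst -p StartLimitIntervalUSec
```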
### Recovery Throttling
```bash
# Applied per-OSD (osd.0 only):
ceph config set osd.0 osd_recovery_max_active 1
ceph config set osd.0 osd_max_backfills 1
ceph config set osd.0 osd_recovery_sleep_ssd 0.1
```
Limits recovery I/O on osd.0 to prevent contention with NAS workloads on the shared SATA controller.
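To confirm the overrides are active (a sketch; `ceph config get` and `ceph config dump` read from the cluster's central config database):

```bash
# Per-option check for osd.0.
ceph config get osd.0 osd_recovery_max_active
ceph config get osd.0 osd_max_backfills
ceph config get osd.0 osd_recovery_sleep_ssd

# Or list every override stored for osd.0.
ceph config dump | grep 'osd\.0'
```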
!!! note "Post SSD-replacement"
    After the WD Green is replaced with an SSD, these throttling values can be relaxed. Wait one week of stable operation before removing:

    ```bash
    ceph config rm osd.0 osd_recovery_max_active
    ceph config rm osd.0 osd_recovery_sleep_ssd
    ```
### Backup I/O Priority
All NAS/backup cron jobs on px3 are wrapped with `ionice -c2 -n7 nice -n 19` (lowest best-effort I/O priority, lowest CPU priority):

```
0 * * * * ionice -c2 -n7 nice -n 19 /usr/local/bin/nas-tier-sync.sh
0 6 * * 0 ionice -c2 -n7 nice -n 19 /usr/local/bin/nas-offsite-backup.sh
0 3 * * * ionice -c2 -n7 nice -n 19 /usr/local/bin/nas-mirror-sync.sh
0 8 * * * ionice -c2 -n7 nice -n 19 /usr/local/bin/backup-verification.sh
```
vzdump is also configured with `ionice: 7` and `bwlimit: 80000` in `/etc/vzdump.conf`.
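For reference, the relevant `/etc/vzdump.conf` lines would look roughly like this (a sketch based on the two settings named above; `bwlimit` is specified in KiB/s, so 80000 is roughly 78 MiB/s):

```
# /etc/vzdump.conf (excerpt)
ionice: 7
bwlimit: 80000
```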
## LVM Layout
Volume Group: pve (120GB total)
| LV | Size | Mount |
|---|---|---|
| pve/root | ~40GB | / |
| pve/data | ~63GB | Local-LVM thin-pool |
| pve/swap | ~8GB | Swap |
Note: Very limited local storage (120GB total). VMs should prefer ceph-pool.
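To check actual allocation and thin-pool usage on the small boot drive (a sketch using standard LVM reporting commands):

```bash
# Volume group totals for pve: size, allocated, free.
vgs pve

# Per-LV sizes plus thin-pool data/metadata usage.
lvs -o lv_name,lv_size,data_percent,metadata_percent pve
```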
## Known Issues
- NAS on USB (temporary)
    - sdd USB drive serving NAS while SSD replacement ships
    - Performance: ~72 MB/s (adequate for current workloads)
    - Risk: USB is less reliable for sustained I/O; the SSD replacement resolves this
- Limited local storage
    - Only 120GB boot drive (vs 1.8TB NVMe on px1)
    - Prefer ceph-pool for new workloads (see the sketch below)
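As a sketch of what 'prefer ceph-pool' means in practice (the VM ID, name, sizes, and the vmbr0 bridge are placeholder assumptions; `ceph-pool` is the storage name used elsewhere in this doc):

```bash
# Create a VM whose disk lives on the Ceph pool instead of the 120GB local-lvm thin pool.
qm create 150 --name example-vm --memory 4096 --cores 2 \
  --scsi0 ceph-pool:32 --net0 virtio,bridge=vmbr0
```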