kener/docs/adr/0007-problem-first-overall-status.md

Overall Status is problem-first: DOWN > DEGRADED > MAINTENANCE > UP

Everywhere a set of monitor statuses collapses into one display status — the page banner (GetStatusSummary/GetStatusColor in src/lib/clientTools.ts), the per-day bar summaries, and the all-monitors _ badge and dot badge (GetLatestStatusActiveAll in src/lib/server/controllers/monitorsController.ts) — the same worst-wins ordering applies: DOWN > DEGRADED > MAINTENANCE > UP, with NO_DATA only when no monitor has any data at all.
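The worst-wins ordering above can be sketched as a severity table plus a single pass over the statuses. This is a minimal illustration, not the code in clientTools.ts or monitorsController.ts: the names Status, SEVERITY, and overallStatus are hypothetical, and the real functions may be shaped differently.

```typescript
// Hypothetical sketch: names and shapes are illustrative, not Kener's actual API.
type Status = "UP" | "DOWN" | "DEGRADED" | "MAINTENANCE" | "NO_DATA";

// Higher number wins. NO_DATA ranks lowest, so it survives only when
// no monitor contributed any other status.
const SEVERITY: Record<Status, number> = {
  NO_DATA: 0,
  UP: 1,
  MAINTENANCE: 2,
  DEGRADED: 3,
  DOWN: 4,
};

// Collapse a set of monitor statuses into one display status.
function overallStatus(statuses: readonly Status[]): Status {
  let worst: Status = "NO_DATA";
  for (const s of statuses) {
    if (SEVERITY[s] > SEVERITY[worst]) {
      worst = s;
    }
  }
  return worst;
}
```

With this shape, overallStatus(["UP", "MAINTENANCE", "DOWN"]) yields "DOWN", a fleet entirely under maintenance yields "MAINTENANCE", and an empty input yields "NO_DATA". Keeping the order in one table means the banner, the bar summaries, and the badges cannot drift apart as long as they all call the same function.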

Issue #717 exposed that the codebase had three independent answers to "what does maintenance mean when aggregating". The frontend banner checked maintenance first, so one monitor in a planned window reported "Under Maintenance" even while another monitor was hard DOWN: a real outage masked by planned work. The badge loop had no MAINTENANCE branch at all, so maintenance samples were silently skipped (DEGRADED+UP+MAINTENANCE → "Degraded", disagreeing with the banner) and a fleet entirely under maintenance fell through to "No Status Available". Group Monitor scoring, the third answer, counts maintenance as UP. Given the same monitors, the page and the badge told different stories, which breaks any automation that treats either one as the source of truth.
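The disagreement described above can be reproduced with two small functions. Both are reconstructions from the prose in this ADR, not the pre-fix source: legacyBanner and legacyBadge are hypothetical names, and the "Down" and "Up" labels are assumed for illustration (only "Under Maintenance", "Degraded", and "No Status Available" appear in the text).

```typescript
// Hypothetical reconstruction of the pre-fix behaviour, for illustration only.
type Status = "UP" | "DOWN" | "DEGRADED" | "MAINTENANCE";

const has = (statuses: readonly Status[], target: Status): boolean =>
  statuses.some((s) => s === target);

// Old banner logic as described: maintenance is checked before any problem state.
function legacyBanner(statuses: readonly Status[]): string {
  if (has(statuses, "MAINTENANCE")) return "Under Maintenance";
  if (has(statuses, "DOWN")) return "Down";
  if (has(statuses, "DEGRADED")) return "Degraded";
  if (has(statuses, "UP")) return "Up";
  return "No Status Available";
}

// Old badge loop as described: there is no MAINTENANCE branch,
// so maintenance samples are silently skipped.
function legacyBadge(statuses: readonly Status[]): string {
  let down = 0;
  let degraded = 0;
  let up = 0;
  for (const s of statuses) {
    if (s === "DOWN") down++;
    else if (s === "DEGRADED") degraded++;
    else if (s === "UP") up++;
    // MAINTENANCE matches no branch and is not counted.
  }
  if (down > 0) return "Down";
  if (degraded > 0) return "Degraded";
  if (up > 0) return "Up";
  return "No Status Available";
}
```

For the input ["DEGRADED", "UP", "MAINTENANCE"], this banner reports "Under Maintenance" while this badge reports "Degraded". For ["MAINTENANCE", "DOWN"], the banner hides the outage behind "Under Maintenance", and for a fleet entirely under maintenance the badge falls through to "No Status Available".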

Problem-first won because maintenance is planned and acknowledged while DOWN/DEGRADED are active problems users are hitting now; a status page that says "Under Maintenance" during an unrelated outage is lying in the reassuring direction. The rejected alternatives: canonicalizing the frontend's maintenance-first order (makes the masking bug the contract), and treating maintenance as UP at the display level for uniformity with group scoring (erases planned windows from badges that already have an "Under Maintenance" label). Group Monitor scoring deliberately keeps maintenance≈UP — it answers a different question (a member's planned work should not tank the group's derived status), whereas Overall Status answers "what should a visitor see". Also affirmed here rather than changed: the _ badge is site-wide (every ACTIVE, non-hidden monitor), not page-scoped, so on multi-page installs it legitimately need not match any single page's banner — a documentation fact, not a bug.