- layoutController/site-configurations: use strict boolean check instead of
Boolean() coercion for showInlineEvents so persisted "false" strings don't
flip the toggle
- (kener)/+page.svelte: collapse confusing triple negation !!! to single !
- NotificationsList: add aria-label/title to the icon-only events button
(+ "Open events page" en locale key)
- move NotificationEvent into shared $lib/types/notifications so client code
no longer imports from the server dashboardController; controller re-exports
it for backwards compatibility
- [page_path] and monitor pages: pass hideNotificationsPopover={showInlineEvents}
to ThemePlus so inline and popover event surfaces stay mutually exclusive
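The showInlineEvents fix in the first bullet can be illustrated with a minimal sketch. The helper name `isStrictlyTrue` is hypothetical and is not taken from the codebase; it only shows why `Boolean()` coercion misreads a persisted "false" string:

```typescript
// Hypothetical helper for illustration; the real check lives in
// layoutController/site-configurations and may be written differently.
function isStrictlyTrue(value: unknown): boolean {
  return value === true;
}

// A value that came back from storage as a string rather than a boolean.
const persisted: unknown = "false";

console.log(Boolean(persisted));        // true: any non-empty string is truthy
console.log(isStrictlyTrue(persisted)); // false: only the boolean true passes
console.log(isStrictlyTrue(true));      // true
```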
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

GET / was throwing KnexTimeoutError ("Timeout acquiring a connection")
in production. Root cause was the connection pool, not the database:
the single process (SvelteKit + cron scheduler + BullMQ workers) shared
one pool capped at 10, while one GET / fans out ~6 queries. A couple of
concurrent page loads (~12 queries), or a per-minute monitor burst
overlapping a load, exceeded the cap, and the queued acquires blew past
the 15s timeout. Postgres
itself had 97 free slots the whole time and no leak.

Split into two pools so background work can't starve page loads:
- web pool (DATABASE_POOL_MAX, default 10) serves HTTP requests
- worker pool (DATABASE_WORKER_POOL_MAX, default 5) serves background jobs

Routing is by execution context via AsyncLocalStorage: q.createWorker
(the single chokepoint all workers/schedulers flow through) runs each
processor inside a worker-pool context, and BaseRepository.knex resolves
the pool from that context, defaulting to the web pool. This keeps shared
controllers correct whether they run in a request or a job. SQLite has no
real pool and reuses a single connection, so the split is a no-op there.
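A minimal sketch of the context-based routing described above, assuming plain objects in place of the two knex instances. Every name here (`poolContext`, `resolvePool`, `runInWorkerContext`, `sharedController`) is hypothetical; only `AsyncLocalStorage` is the real Node.js API:

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

type PoolName = "web" | "worker";

// Carries the pool choice across async boundaries within one execution context.
const poolContext = new AsyncLocalStorage<PoolName>();

// Stand-ins for the two knex instances; the max values mirror the stated defaults.
const pools: Record<PoolName, { name: PoolName; max: number }> = {
  web: { name: "web", max: 10 },
  worker: { name: "worker", max: 5 },
};

// What a BaseRepository.knex-style getter might do: read the context and
// fall back to the web pool when no context has been set.
function resolvePool(): { name: PoolName; max: number } {
  return pools[poolContext.getStore() ?? "web"];
}

// What a createWorker-style wrapper might do: run each processor inside
// a worker-pool context.
function runInWorkerContext<T>(processor: () => Promise<T>): Promise<T> {
  return poolContext.run("worker", processor);
}

// A shared controller that does not know whether it runs in a request or a job.
async function sharedController(): Promise<PoolName> {
  await Promise.resolve(); // the context survives async hops
  return resolvePool().name;
}
```

Called directly, `sharedController()` resolves to `"web"`; called through `runInWorkerContext(sharedController)`, it resolves to `"worker"`, without the controller taking a pool argument.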
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ioredis v5 dropped the namespace merge on the default export, so
`Redis.RedisOptions` fails to compile with TS2702 (type used as a namespace).
Import the `RedisOptions` type by name instead.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The per-row status+note backfill is one logical confirmation flip; wrap the
read+updates in a knex transaction so a mid-loop failure can't leave the window
half-confirmed/half-held (coderabbit out-of-diff finding).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

- PATCH: confirmation_threshold:null resets to 1 (off); undefined keeps existing (Copilot)
- backfill note is per-row severity-aware: 'Down'/'Degraded confirmed after N…' (Copilot)
- enforce 1–60 at the data layer via clampConfirmationThreshold on insert/update,
covering all app write paths incl. the manage API (coderabbit)
- anchor via dedicated getLastObservedStatus query so a long incident/maintenance
window can no longer push the anchor out of the lookback and bypass damping (coderabbit)
- overlays fetched AFTER execute() and keyed by job ts, making the freeze gate
timestamp-safe and catching mid-check overlays (coderabbit + greptile)
- use Array.includes over indexOf!==-1 (greptile); refresh pendingHold doc (coderabbit)
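The PATCH semantics and the 1–60 clamp from the list above could look roughly like the sketch below. `clampConfirmationThreshold` is named in the commit, but its body here is an assumption (in particular the fallback for non-numeric input and the truncation of fractions), and `resolvePatchedThreshold` is a hypothetical helper:

```typescript
const MIN_THRESHOLD = 1; // 1 is the "off" value per the PATCH note above
const MAX_THRESHOLD = 60;

// Assumed behavior: non-finite input falls back to 1, fractions are truncated.
function clampConfirmationThreshold(value: unknown): number {
  const n = Number(value);
  if (!Number.isFinite(n)) return MIN_THRESHOLD;
  return Math.min(MAX_THRESHOLD, Math.max(MIN_THRESHOLD, Math.trunc(n)));
}

// Hypothetical PATCH resolver: undefined keeps the existing value,
// null resets to 1, anything else is clamped into 1..60.
function resolvePatchedThreshold(
  existing: number,
  patched: number | null | undefined,
): number {
  if (patched === undefined) return clampConfirmationThreshold(existing);
  if (patched === null) return MIN_THRESHOLD;
  return clampConfirmationThreshold(patched);
}

console.log(resolvePatchedThreshold(7, undefined)); // 7
console.log(resolvePatchedThreshold(7, null));      // 1
console.log(resolvePatchedThreshold(7, 100));       // 60
```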
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

New v4/monitors/grace-period.md covering behavior, config, API, interactions
(alerts/maintenance/NO_DATA/groups/heartbeat), and verification; linked from the
Monitors sidebar and the Monitors Overview related-docs.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Broaden the ignore from the single 0009 ADR to the entire docs/adr/ directory and
untrack the existing ADRs (0001-0008); files are kept on disk and remain in history.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

gitignore + untrack CONTEXT.md, docs/adr/0009-*, and docs/superpowers/ (files kept
on disk). Remove the 'ADR 0009' citations from code comments; issue references and the
pre-existing ADR 0005 citations are retained.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Held (pending) rows now keep the real error text tagged '| Status held during
grace period' instead of dropping it, so no diagnostic info is lost. On confirmation
the backfill appends '| Down confirmed after N consecutive checks' to the existing
text (pipe-separated) rather than overwriting it; recovery clears the error. Append
is per-row for cross-DB safety and idempotent on replay.
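A minimal sketch of a pipe-separated, replay-safe append, assuming the note is built in application code for each row rather than with a database-specific concat. The helper `appendNote` and the sample error text are hypothetical:

```typescript
const SEPARATOR = " | ";

// Append `note` to `existing` unless it is already one of the segments,
// so running the backfill twice leaves the text unchanged.
function appendNote(existing: string | null, note: string): string {
  if (!existing) return note;
  if (existing.split(SEPARATOR).includes(note)) return existing;
  return existing + SEPARATOR + note;
}

// Sample error text is made up for illustration.
const held = appendNote("connect ECONNREFUSED", "Status held during grace period");
const confirmed = appendNote(held, "Down confirmed after 3 consecutive checks");
const replayed = appendNote(confirmed, "Down confirmed after 3 consecutive checks");

console.log(confirmed);
// connect ECONNREFUSED | Status held during grace period | Down confirmed after 3 consecutive checks
console.log(replayed === confirmed); // true
```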
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

A pending (held) row was written with latency 0, losing the measured latency and
denting the latency chart during every grace window (and discarding a recovering
check's real latency). Keep the observed latency; only drop the error text so a
held row never shows a status-contradicting failure message.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Replaces getRecentObservedSamples with getRecentSamplesForConfirmation, which adds
INCIDENT/MAINTENANCE overlay rows and the `type` column to the result set so the
Confirmation Threshold resolver can detect freeze boundaries. MANUAL and DEFAULT
rows remain excluded (transparent). Adds OVERLAY_TYPES constant alongside the
existing OBSERVED_CHECK_TYPES. Updates dbimpl.ts declaration and binding to match.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>