Cold-start and recovery
The demo loses no irreplaceable state because the fixture set is deterministic and the only mutable state is the in-session walkthrough. Production loses no irreplaceable state because every regulated action persists before responding 2xx. This page documents the recovery rules for every cold-start scenario and the failure modes the system is designed to fall into.
Demo: what is persistent and what is not
useDemoStore (lib/state.ts) persists to localStorage under the key lao-demo-state. The partialize function selects which keys persist:

    {
      name: "lao-demo-state",
      storage: createJSONStorage(() => localStorage),
      partialize: (state) => ({
        skin: state.skin,
        // Persona, mode, walkthrough step, and live additions all reset
        // every session so the visitor always lands in scripted mode
        // with a clean fixture set.
      }),
    }

Persisted across reloads:
- skin (the active tenant)
Reset on every page load:
- persona (defaults to principal-compliance-officer)
- focusedArId (defaults to null)
- mode (defaults to scripted)
- walkthroughStep (defaults to 0)
- personaSwitchSeen (defaults to false)
- liveBreaches (defaults to empty array)
- liveMIReturns (defaults to empty array)
The visitor returning to the demo lands on the same skin they last viewed, but with a fresh walkthrough, no in-session writes, and the persona switch confirmation modal armed.
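The persist-then-reset behaviour can be sketched without Zustand as two pure functions. This is a simplified illustration, not the store's actual code: the DemoState shape is reconstructed from the field list above, the default skin value is a placeholder, and the real persist middleware wraps the stored value in its own envelope, which is omitted here.

```typescript
// Hypothetical state shape, reconstructed from the field list above.
interface DemoState {
  skin: string;
  persona: string;
  focusedArId: string | null;
  mode: string;
  walkthroughStep: number;
  personaSwitchSeen: boolean;
  liveBreaches: string[];
  liveMIReturns: string[];
}

// Fresh defaults for every page load. The skin default is a placeholder;
// the real default is not stated in this page.
function freshDefaults(): DemoState {
  return {
    skin: "default-skin",
    persona: "principal-compliance-officer",
    focusedArId: null,
    mode: "scripted",
    walkthroughStep: 0,
    personaSwitchSeen: false,
    liveBreaches: [],
    liveMIReturns: [],
  };
}

// Mirrors the partialize rule: only skin is written to storage.
function partialize(state: DemoState): Pick<DemoState, "skin"> {
  return { skin: state.skin };
}

// On load, the persisted slice is merged over fresh defaults,
// so every non-persisted key falls back to its default.
function rehydrate(persistedJson: string | null): DemoState {
  if (persistedJson === null) return freshDefaults();
  const persisted = JSON.parse(persistedJson) as Partial<Pick<DemoState, "skin">>;
  return { ...freshDefaults(), ...persisted };
}
```

Round-tripping a mid-walkthrough state through partialize and rehydrate keeps the skin and resets everything else, which is the behaviour described above.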
Demo recovery scenarios
Browser refresh mid-walkthrough
The visitor is on step 6 (AR-side MI return form). They refresh.
What happens:
- useDemoStore rehydrates. skin survives. Everything else resets.
- The route they were on (/demo/ar/mi) still resolves; the surface is independent of walkthrough step.
- walkthroughStep is 0; the walkthrough overlay is at step 0 (“Welcome to Oversight”), but the URL is /demo/ar/mi.
- The walkthrough-advancer component watches the path and bumps the step floor when the visitor lands on a known surface, so the step advances to 6 once the route matches.
The flow continues. No data was lost because nothing the visitor did was a regulated action; the MI return draft is held in the form’s local React state, not the store.
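The step-floor rule can be pictured as a lookup plus a maximum. The sketch below is an assumption about how such a component might work, not the walkthrough-advancer source; the table lists only the two routes this page names, and the function name is hypothetical.

```typescript
// Hypothetical route-to-step table; only routes named in this page are listed.
const stepFloorByPath = new Map<string, number>([
  ["/demo/ar/mi", 6],
  ["/demo/principal/breaches", 8],
]);

// Raise the step to the floor for a known surface; never move it backwards.
function advanceStep(currentStep: number, path: string): number {
  const floor = stepFloorByPath.get(path);
  return floor === undefined ? currentStep : Math.max(currentStep, floor);
}
```

Using a floor rather than an assignment means a visitor who is already past step 6 and navigates back to /demo/ar/mi keeps their later step.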
Browser refresh after filing a breach AR-side
The visitor filed a breach on step 7. They refresh before reaching the principal-side triage queue (step 8).
What happens:
- useDemoStore rehydrates. liveBreaches resets to empty.
- The walkthrough advances to step 8 once the visitor reaches /demo/principal/breaches.
- The triage queue renders fixture breaches only; the in-session breach is gone.
This is acceptable for a demo: the fixture set already contains a breach designed to look fresh on the queue. The walkthrough copy avoids referring to “the breach you just filed” as if continuity is guaranteed.
Demo reset
A “Reset demo” control in the chrome calls useDemoStore.resetWalkthrough():

    resetWalkthrough: () =>
      set({ walkthroughStep: 0, mode: "scripted", personaSwitchSeen: false }),

The visitor is bounced to step 0 with mode reset to scripted. liveBreaches and liveMIReturns are not cleared by this action; the more aggressive “Clear demo data” control clears them too.
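The difference between the two controls can be shown as two pure state transitions. This is a sketch under assumptions: clearDemoData is a hypothetical name for the “Clear demo data” action, and it is assumed to perform the walkthrough reset in addition to emptying the live arrays.

```typescript
interface WalkthroughState {
  walkthroughStep: number;
  mode: string;
  personaSwitchSeen: boolean;
  liveBreaches: string[];
  liveMIReturns: string[];
}

// Same fields as the resetWalkthrough action shown above; live data is untouched.
function resetWalkthrough(s: WalkthroughState): WalkthroughState {
  return { ...s, walkthroughStep: 0, mode: "scripted", personaSwitchSeen: false };
}

// Hypothetical name for the "Clear demo data" action: reset plus empty live arrays.
function clearDemoData(s: WalkthroughState): WalkthroughState {
  return { ...resetWalkthrough(s), liveBreaches: [], liveMIReturns: [] };
}
```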
localStorage disabled
A visitor with localStorage blocked (Safari Private mode default, some enterprise browser policies):
- createJSONStorage(() => localStorage) returns a storage that throws on setItem.
- Zustand’s persist middleware swallows the throw and continues with in-memory state.
- The visitor experiences the demo correctly within one tab session; closing the tab loses state.
No regression. The demo is designed not to require persistence.
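The same degrade-to-memory behaviour can be illustrated with a small wrapper. This is not Zustand's implementation; it is a standalone sketch with hypothetical names (StringStorage, createResilientStorage) showing how a throwing backing store can be tolerated.

```typescript
// Minimal structural type for the parts of Web Storage used here.
interface StringStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Wraps a backing store; writes always land in memory, and failures
// from the backing store are swallowed instead of reaching the caller.
function createResilientStorage(backing: StringStorage): StringStorage {
  const memory = new Map<string, string>();
  return {
    getItem(key: string): string | null {
      const inMemory = memory.get(key);
      if (inMemory !== undefined) return inMemory;
      try {
        return backing.getItem(key);
      } catch {
        return null; // blocked storage reads as "nothing persisted"
      }
    },
    setItem(key: string, value: string): void {
      memory.set(key, value);
      try {
        backing.setItem(key, value);
      } catch {
        // Blocked: keep only the in-memory copy for this tab session.
      }
    },
  };
}
```

In a browser the wrapper would be handed window.localStorage; the state then lives for the tab session only, matching the behaviour described above.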
Production: durability rules
Every regulated action persists before the API returns a 2xx response. The ordering constraint is:
1. Validate input (Zod, business rules, state-machine transition).
2. Open a transaction inside the tenant-scoped middleware (app.tenant_id GUC set).
3. Update the entity row.
4. Append the audit event with hash-chain link.
5. Trigger any side effects (risk recompute, deadline alerter, FCA bundle generation) idempotently.
6. Commit.
7. Return 2xx.
A crash between any of steps 3-6 rolls back the transaction. The client sees a 5xx and retries. The audit chain is never written without the entity update, and vice versa.
Side effects that escape the transaction boundary (sending email, generating a PDF) are queued via pgmq (Postgres-backed message queue) and consumed by workers. The queue insert is part of the transaction, so the side effect is durable as soon as the transaction commits.
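The ordering can be demonstrated with an in-memory stand-in for the database. This is only a sketch of the all-or-nothing property: a real handler would use a Postgres transaction and pgmq, and the Db shape, function names, and failAfterUpdate switch are all hypothetical.

```typescript
// In-memory stand-in for the tables touched by a regulated action.
interface Db {
  entities: Map<string, string>; // entity id -> status
  audit: string[];               // audit events
  queue: string[];               // queued side effects
}

// Stage all writes on a copy; swap them in only if work() returns normally.
function withTransaction<T>(db: Db, work: (tx: Db) => T): T {
  const tx: Db = {
    entities: new Map(db.entities),
    audit: [...db.audit],
    queue: [...db.queue],
  };
  const result = work(tx); // a throw here discards tx, so nothing persists
  db.entities = tx.entities;
  db.audit = tx.audit;
  db.queue = tx.queue;
  return result;
}

function handleRegulatedAction(db: Db, id: string, nextStatus: string, failAfterUpdate = false): number {
  if (nextStatus.length === 0) return 400;             // validate input
  try {
    withTransaction(db, (tx) => {                      // open transaction
      tx.entities.set(id, nextStatus);                 // update the entity row
      if (failAfterUpdate) throw new Error("simulated crash");
      tx.audit.push(`${id}:${nextStatus}`);            // append audit event (hash link omitted)
      tx.queue.push(`notify:${id}`);                   // enqueue side effect inside the transaction
    });                                                // commit
    return 200;                                        // respond only after commit
  } catch {
    return 500;                                        // rolled back; the client retries
  }
}
```

A simulated crash after the entity update leaves all three collections untouched, so the audit log and the entity never diverge.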
Production recovery scenarios
Server restart
What happens:
- In-flight requests fail with a 5xx; clients retry.
- Sessions are unaffected (rows in Postgres).
- Workers reconnect to pgmq and resume.
- The deadline alerter is idempotent: it dedupes by breach id and a “last alerted at” column.
- The risk recompute worker is idempotent: each recompute writes a new history row keyed by (arId, computedAt); duplicate triggers produce duplicate rows that the trajectory query handles.
No data loss. No duplicated side effects: email and SMS sends are deduped via the pgmq visibility timeout.
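One way to read the alerter's dedupe rule is sketched below. The minimum-interval parameter is an assumption (the page only says the alerter keys on breach id and a “last alerted at” column), and the row shape and function name are hypothetical.

```typescript
interface BreachAlertRow {
  breachId: string;
  lastAlertedAt: number | null; // epoch milliseconds, null if never alerted
}

// Sends at most one alert per breach within minIntervalMs.
// Returns true if an alert was sent, false if the trigger was a duplicate.
function alertIfDue(
  rows: Map<string, BreachAlertRow>,
  breachId: string,
  now: number,
  minIntervalMs: number,
  send: (breachId: string) => void,
): boolean {
  const row = rows.get(breachId);
  if (row !== undefined && row.lastAlertedAt !== null && now - row.lastAlertedAt < minIntervalMs) {
    return false; // already alerted recently: a replayed trigger is a no-op
  }
  send(breachId);
  rows.set(breachId, { breachId, lastAlertedAt: now });
  return true;
}
```

A worker that restarts and replays the same trigger hits the early return, so the recipient sees one alert.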
Browser refresh mid-flow
The user was halfway through a file-review workspace, with unsaved findings.
What happens:
- Inline saving is the default: every finding edit PATCHes /api/reviews/:id with If-Match. By the time the user refreshes, every completed finding edit is persisted.
- The unsaved finding (the one they were typing into when the refresh fired) is lost.
- The review remains in InProgress; the user resumes from where they left off, with the lost finding empty.
The UX cost is one re-typed finding. No regulated action is lost.
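The If-Match check behind inline saving follows standard HTTP conditional-request semantics. The sketch below assumes a version counter used as the ETag and a 412 response on mismatch; the Review shape and function name are hypothetical, and the page does not state which status the real route returns.

```typescript
interface Review {
  id: string;
  version: number;                  // incremented on every successful write
  findings: Record<string, string>; // finding id -> text
}

interface PatchResult {
  status: number;
  etag: string;
}

// Applies the edit only if the caller's If-Match value matches the current ETag.
function patchFinding(review: Review, ifMatch: string, findingId: string, text: string): PatchResult {
  const currentEtag = `"${review.version}"`;
  if (ifMatch !== currentEtag) {
    return { status: 412, etag: currentEtag }; // stale copy: reject, nothing written
  }
  review.findings[findingId] = text;
  review.version += 1;
  return { status: 200, etag: `"${review.version}"` };
}
```

The client keeps the ETag from each response and sends it with the next PATCH, so a stale tab cannot overwrite a newer finding.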
Expired session
The user’s session expired mid-action.
What happens:
- The next request returns 401.
- The client redirects to /sign-in?next=<current-path>.
- The user re-authenticates; the redirect lands them back on the surface they were on.
- Any unsaved form state is preserved by the client’s optimistic-state cache (React Query, with cacheTime exceeding the sign-in flow).
For step-up-gated actions (notify FCA, sign off annual review), an expired step-up token returns 403 with error.code === "step_up_required". The UI prompts re-step-up without losing the form.
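The two recovery paths reduce to a small decision function on the client. This is a sketch with hypothetical names; URL-encoding the next parameter is an assumption, since the page only shows the /sign-in?next=<current-path> shape.

```typescript
type AuthAction =
  | { kind: "proceed" }
  | { kind: "redirect"; to: string }
  | { kind: "step-up" };

// Decides what the client does next from the response status and error code.
function nextAuthAction(status: number, errorCode: string | undefined, currentPath: string): AuthAction {
  if (status === 401) {
    // Expired session: sign in again, then return to the same surface.
    return { kind: "redirect", to: `/sign-in?next=${encodeURIComponent(currentPath)}` };
  }
  if (status === 403 && errorCode === "step_up_required") {
    // Expired step-up token: prompt in place so the form is not lost.
    return { kind: "step-up" };
  }
  return { kind: "proceed" };
}
```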
Hash-chain mismatch (production-only failure mode)
The nightly integrity job recomputes every audit event’s hash and compares against the stored value. A mismatch indicates either a software bug or tampering.
What happens:
- The job pages the firm with a P1 incident.
- The audit log surface displays a banner: “Integrity check failed at 2026-05-08T03:14:00Z. Records up to are verified. Records after that point are under review. Contact support.”
- The firm cannot export the chain until the mismatch is resolved (forensic restore from backup, software fix, or both).
This is the safe failure mode: the system makes the failure visible rather than silently continuing.
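The integrity check can be sketched as a walk over the chain that recomputes each link. The hash below is FNV-1a, a non-cryptographic stand-in chosen only to keep the example dependency-free; a production chain would use a cryptographic hash. The event shape, the "genesis" seed, and the function names are hypothetical.

```typescript
// Non-cryptographic stand-in hash (32-bit FNV-1a), for illustration only.
function fnv1a(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

interface AuditEvent {
  payload: string;
  prevHash: string; // hash of the previous event, or "genesis" for the first
  hash: string;     // hash over prevHash and payload
}

function appendEvent(chain: AuditEvent[], payload: string): void {
  const last = chain[chain.length - 1];
  const prevHash = last ? last.hash : "genesis";
  chain.push({ payload, prevHash, hash: fnv1a(`${prevHash}|${payload}`) });
}

// Returns the index of the first event that fails verification, or -1.
function firstMismatch(chain: AuditEvent[]): number {
  let expectedPrev = "genesis";
  return chain.findIndex((event) => {
    const ok =
      event.prevHash === expectedPrev &&
      event.hash === fnv1a(`${event.prevHash}|${event.payload}`);
    expectedPrev = event.hash;
    return !ok;
  });
}
```

The returned index marks the boundary the banner describes: events before it are verified, events from it onward are under review.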
Idempotent reads
GET /api/ars, GET /api/breaches, etc. are idempotent. GET /api/audit uses cursor pagination that ignores writes after the cursor’s at value, so a long-running read is consistent. Two concurrent reads of the same audit page return the same rows.
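The cursor rule can be sketched as a filter on the snapshot time. The row and cursor shapes here are assumptions (the page names only the cursor's at value), and ids are assumed to increase monotonically.

```typescript
interface AuditRow {
  id: number; // assumed to increase monotonically
  at: number; // write time, epoch milliseconds
}

interface Cursor {
  at: number;      // snapshot time fixed when the read began
  afterId: number; // last id already returned
}

// Returns the next page, ignoring any row written after the cursor's at value.
function readPage(rows: AuditRow[], cursor: Cursor, limit: number): AuditRow[] {
  return rows
    .filter((row) => row.at <= cursor.at && row.id > cursor.afterId)
    .sort((a, b) => a.id - b.id)
    .slice(0, limit);
}
```

Because the snapshot time travels with the cursor, a row appended mid-read never shifts the pages that follow.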
Idempotent writes
Write routes that could be retried use the request’s Idempotency-Key header (UUID generated client-side) to dedupe. The handler stores (tenantId, idempotencyKey, response) in a 24-hour cache; a duplicate request returns the cached response.
POST /api/mi-returns is also idempotent on (tenantId, arId, period) regardless of the Idempotency-Key header. A second submission for the same period returns 409 with error.code === "period_already_submitted" and a link to the existing return.
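Both dedupe layers can be sketched in memory. The class and function names are hypothetical, the 201 status for a first submission is an assumption, and a real handler would back both structures with durable storage rather than a Map and a Set.

```typescript
interface ApiResponse {
  status: number;
  body: string;
}

const IDEMPOTENCY_TTL_MS = 24 * 60 * 60 * 1000; // the 24-hour window described above

class IdempotencyCache {
  private entries = new Map<string, { response: ApiResponse; storedAt: number }>();

  // Runs handler once per (tenantId, key) within the TTL; replays the stored response otherwise.
  run(tenantId: string, key: string, now: number, handler: () => ApiResponse): ApiResponse {
    const cacheKey = `${tenantId}:${key}`;
    const hit = this.entries.get(cacheKey);
    if (hit !== undefined && now - hit.storedAt < IDEMPOTENCY_TTL_MS) return hit.response;
    const response = handler();
    this.entries.set(cacheKey, { response, storedAt: now });
    return response;
  }
}

// Natural-key guard for MI returns, independent of the Idempotency-Key header.
function submitMiReturn(submitted: Set<string>, tenantId: string, arId: string, period: string): ApiResponse {
  const naturalKey = `${tenantId}|${arId}|${period}`;
  if (submitted.has(naturalKey)) return { status: 409, body: "period_already_submitted" };
  submitted.add(naturalKey);
  return { status: 201, body: "created" };
}
```

A retried request with the same key replays the first response, while a fresh key for an already-submitted period still receives the 409.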
Graceful expiry of in-flight UI
The walkthrough overlay, persona-switch confirmation modal, and toast notifications all expire on a timer. They never block navigation or input. A visitor who walks away mid-walkthrough returns to a quiet UI; the overlay re-anchors when they next interact.
The risk-score-explainer tooltip and the breach-deadline countdown re-render on a 1-second interval. Both are pure functions of the data they reference; the interval can stop and restart with no state loss.
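A countdown of this kind can be a pure function of the deadline and the current time, which is what makes the interval safe to stop and restart. The function name and the HH:MM:SS format are assumptions for illustration.

```typescript
// Pure function: the same deadline and clock reading always give the same label.
function formatCountdown(deadlineMs: number, nowMs: number): string {
  const remainingMs = Math.max(0, deadlineMs - nowMs); // clamp once the deadline passes
  const totalSeconds = Math.floor(remainingMs / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  const pad = (n: number): string => String(n).padStart(2, "0");
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds)}`;
}
```

The 1-second interval only needs to call this with the current time; no tick count or accumulated state is kept between renders.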
Production cold-start: app deployment
A new deployment goes live behind Vercel’s atomic-swap. The previous deployment serves until the new one is healthy. Health checks verify:
- GET /api/health returns 200 with database connectivity confirmed.
- GET /api/version returns the expected commit SHA.
- The migration check confirms every tenant-scoped table has RLS enabled.
A failed health check rolls back automatically. See Production hardening.
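The three checks combine into a single pass/fail decision. The sketch below only aggregates results that are assumed to have been gathered already; the HealthProbe shape and the failure labels are hypothetical, and the rollback itself is handled by the platform rather than by this code.

```typescript
interface HealthProbe {
  healthStatus: number;       // status from GET /api/health
  reportedSha: string;        // commit SHA from GET /api/version
  expectedSha: string;        // commit SHA the deployment was built from
  tablesWithoutRls: string[]; // tenant-scoped tables the migration check flagged
}

// Returns the names of failed checks; an empty array means the deployment is healthy.
function failedChecks(probe: HealthProbe): string[] {
  const failures: string[] = [];
  if (probe.healthStatus !== 200) failures.push("health");
  if (probe.reportedSha !== probe.expectedSha) failures.push("version");
  if (probe.tablesWithoutRls.length > 0) failures.push("rls");
  return failures;
}
```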