Phase 6 Design: Background Persistence
This document expands Phase 6 of ARCHITECTURE_RESILIENCE_REVIEW.md into a concrete design target.
Phases 1-5 separate durable state, coordination policy, render-facing snapshots, render-thread ownership, and live-state layering. Phase 6 should make disk persistence a background snapshot-writing concern instead of a synchronous side effect of mutations.
Status
- Phase 6 design package: proposed.
- Phase 6 implementation: not started.
- Current alignment:
RuntimeStore owns durable serialization, config, package metadata, preset IO, and persistence requests; CommittedLiveState owns the current committed/session layer state; and RuntimeCoordinator already publishes explicit persistence-request outcomes for persisted mutations. The remaining issue is that actual disk writes are still synchronous store work rather than queued, debounced, atomic background writes.
Current persistence footholds:
- RuntimeStore owns persistent runtime-state serialization, stack preset serialization, and durable file IO.
- CommittedLiveState owns current committed/session layer and parameter state.
- RuntimeCoordinatorResult::persistenceRequested exists as an explicit mutation outcome.
- RuntimeEventType::RuntimePersistenceRequested exists as the event-level persistence request.
- Phase 5 clarified which live-state mutations are durable, committed-live, transient automation, or render-local. Settled OSC commits are session-only by default and do not request persistence.
Why Phase 6 Exists
Synchronous persistence is a poor fit for live software. A mutation that changes state should not also have to block on filesystem timing, antivirus scans, slow disks, or transient IO failures. The app needs persistence to be reliable and observable, but not timing-sensitive.
The resilience review calls this out because SavePersistentState()-style behavior, where a mutation path writes to disk before returning, can create unnecessary stalls and make recovery harder to reason about.
Phase 6 should turn persistence into:
- request
- snapshot
- background write
- completion/failure observation
Goals
Phase 6 should establish:
- a queued persistence request path
- debounced/coalesced durable-state snapshot writes
- atomic file replacement for runtime-state saves where practical
- structured completion/failure reporting
- clear separation between state mutation and disk flush
- deterministic shutdown flushing policy
- tests for coalescing, snapshot selection, write failure, and shutdown behavior without rendering or DeckLink
Non-Goals
Phase 6 should not require:
- changing live-state layering rules
- changing DeckLink/backend lifecycle
- replacing stack preset semantics wholesale
- adding cloud sync or external storage
- building an unlimited historical state archive
- making every write async immediately if a narrow compatibility path still needs a synchronous result
Target Model
Phase 6 should make persistence a small pipeline:
RuntimeCoordinator accepts mutation
-> publishes/returns persistence request
-> PersistenceWriter captures a durable snapshot from RuntimeStore serialization
-> background worker debounces/coalesces writes
-> atomic write commits file
-> HealthTelemetry/runtime event records success or failure
The key rule is:
- RuntimeStore owns durable state and serialization
- CommittedLiveState owns current session state; only coordinator-approved durable snapshots should be persisted
- PersistenceWriter owns when and how snapshots are written
- RuntimeCoordinator owns whether a mutation requests persistence
Proposed Collaborators
PersistenceWriter
Owns the worker thread, queue, debounce timer, and write execution.
Responsibilities:
- accept persistence requests
- coalesce repeated runtime-state writes
- request/build a durable snapshot from RuntimeStore
- write to a temporary file and atomically replace the target
- report success/failure observations
- flush on shutdown according to policy
Non-responsibilities:
- deciding mutation validity
- owning durable in-memory state
- composing render snapshots
- blocking render/backend timing paths
PersistenceSnapshot
Immutable write input captured from durable state.
Responsibilities:
- contain serialized runtime-state text or structured data ready to serialize
- identify target path and snapshot generation
- preserve enough metadata for completion/failure diagnostics
Non-responsibilities:
- mutation policy
- file IO
PersistenceRequest
Small request object or event payload.
Expected fields:
- reason/action name
- target kind: runtime state, preset, config if later needed
- optional debounce key
- force/flush flag for explicit save operations
- generation or sequence
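A minimal sketch of how the request and snapshot shapes described above might look. Every name here (PersistenceTargetKind, the field names, MakeSnapshot, Supersedes) is a hypothetical illustration rather than an existing type in the codebase, and the real shapes may carry different fields.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Hypothetical target kinds, mirroring "runtime state, preset, config if later needed".
enum class PersistenceTargetKind { RuntimeState, StackPreset, RuntimeConfig };

// Small request payload: says that something should be saved, not how.
struct PersistenceRequest {
    std::string reason;                      // reason/action name
    PersistenceTargetKind target = PersistenceTargetKind::RuntimeState;
    std::optional<std::string> debounceKey;  // requests sharing a key may coalesce
    bool forceFlush = false;                 // explicit save: skip the debounce delay
    std::uint64_t generation = 0;            // increasing sequence number
};

// Immutable write input: already-serialized content plus diagnostics metadata.
struct PersistenceSnapshot {
    std::string serializedState;             // text produced by the store's serialization
    std::string targetPath;                  // file the writer should replace
    std::uint64_t generation = 0;            // generation of the originating request
    std::string reason;                      // carried through for failure reports
};

// Pairs serialized content with the metadata of the request that caused it.
inline PersistenceSnapshot MakeSnapshot(const PersistenceRequest& request,
                                        std::string serializedState,
                                        std::string targetPath) {
    assert(!targetPath.empty());
    PersistenceSnapshot snapshot;
    snapshot.serializedState = std::move(serializedState);
    snapshot.targetPath = std::move(targetPath);
    snapshot.generation = request.generation;
    snapshot.reason = request.reason;
    return snapshot;
}

// True when `incoming` should replace `pending` as the snapshot to write.
inline bool Supersedes(const PersistenceSnapshot& incoming,
                       const PersistenceSnapshot& pending) {
    return incoming.targetPath == pending.targetPath &&
           incoming.generation > pending.generation;
}
```

Keeping the snapshot a plain value type means the writer never needs to reach back into live state once it has been captured.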
Write Policy
Runtime State
Default policy:
- coalesce repeated requests
- debounce short bursts
- write newest snapshot
- report failures without blocking render/control paths
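The default policy above can be expressed as a single pending slot with an injected clock, which keeps the coalescing rule testable without threads or a filesystem. This is a sketch under assumptions: CoalescingSlot, Submit, and TakeIfDue are hypothetical names, and the trailing-edge debounce shown here (each new request restarts the quiet period) is only one possible reading of "debounce short bursts".

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical single-slot coalescer: a newer snapshot overwrites the pending one.
class CoalescingSlot {
public:
    explicit CoalescingSlot(std::chrono::milliseconds debounce) : mDebounce(debounce) {
        assert(debounce.count() >= 0);
    }

    // Record a snapshot and restart the debounce window. Stale generations are ignored.
    void Submit(std::string serialized, std::uint64_t generation, Clock::time_point now) {
        if (mPending && generation < mPending->generation) return;
        mPending = Pending{std::move(serialized), generation};
        mDueAt = now + mDebounce;
    }

    // Returns the newest snapshot once the burst has been quiet for the debounce interval.
    std::optional<std::string> TakeIfDue(Clock::time_point now) {
        if (!mPending || now < mDueAt) return std::nullopt;
        std::string out = std::move(mPending->serialized);
        mPending.reset();
        return out;
    }

    bool HasPending() const { return mPending.has_value(); }

private:
    struct Pending {
        std::string serialized;
        std::uint64_t generation;
    };

    std::chrono::milliseconds mDebounce;
    std::optional<Pending> mPending;
    Clock::time_point mDueAt{};
};
```

One design consideration: a pure trailing-edge debounce can postpone writes indefinitely under a continuous stream of mutations, so a real policy may also want a maximum delay.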
Stack Presets
Preset save is more operator-explicit than routine runtime-state persistence.
Initial policy options:
- keep preset save synchronous while runtime-state persistence becomes async
- or route preset writes through the same worker with a completion result for the caller
Conservative Phase 6 default:
- background runtime-state persistence first
- leave preset save/load synchronous unless the implementation has a clean completion path
Shutdown
Shutdown should explicitly decide:
- flush latest pending snapshot before exit
- skip flush if no pending durable change exists
- report the failure if the flush fails
- avoid indefinite hang on shutdown
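The shutdown decisions above could be sketched as one function with a bounded wait. The names (ShutdownFlushResult, FlushOnShutdown, writePending) are hypothetical, and detaching the write thread on timeout is an assumption made for illustration; a real implementation would need to decide what happens to a write that is still in flight when the process exits.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>

enum class ShutdownFlushResult { NothingPending, Flushed, Failed, TimedOut };

// `writePending` stands in for the writer's final write and returns true on success.
inline ShutdownFlushResult FlushOnShutdown(bool hasPending,
                                           std::function<bool()> writePending,
                                           std::chrono::milliseconds maxWait) {
    assert(maxWait.count() >= 0);
    if (!hasPending) return ShutdownFlushResult::NothingPending;  // skip the flush

    // packaged_task (unlike std::async) yields a future whose destructor does not block.
    std::packaged_task<bool()> task(std::move(writePending));
    std::future<bool> done = task.get_future();
    std::thread worker(std::move(task));

    if (done.wait_for(maxWait) != std::future_status::ready) {
        worker.detach();  // stop waiting; the caller reports the timeout
        return ShutdownFlushResult::TimedOut;
    }
    worker.join();

    try {
        return done.get() ? ShutdownFlushResult::Flushed : ShutdownFlushResult::Failed;
    } catch (...) {
        return ShutdownFlushResult::Failed;  // an exception in the write counts as failure
    }
}
```

Returning a distinct result for each outcome lets the caller publish the failure or timeout instead of exiting silently.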
Atomicity And Failure Handling
Runtime-state writes should prefer:
- serialize snapshot content in memory
- write to target.tmp
- flush/close file
- replace target atomically where platform support allows
- retain or report backup/failure context if replacement fails
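The write sequence above can be sketched with the C++17 standard library. WriteFileAtomically is a hypothetical helper name. Two caveats are assumptions worth stating: std::filesystem::rename replaces an existing target atomically on POSIX systems, but guarantees on other platforms depend on the implementation, and flushing a std::ofstream hands data to the OS without forcing it to disk, so protection against power loss would need a platform-specific sync call.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Writes `content` to "<target>.tmp", then renames it over `target`.
// Returns an empty string on success or an error description on failure.
inline std::string WriteFileAtomically(const fs::path& target, const std::string& content) {
    assert(!target.empty());
    fs::path temp = target;
    temp += ".tmp";

    {
        std::ofstream out(temp, std::ios::binary | std::ios::trunc);
        if (!out) return "could not open " + temp.string();
        out.write(content.data(), static_cast<std::streamsize>(content.size()));
        out.flush();
        if (!out) return "write failed for " + temp.string();
    }  // the stream closes here, before the rename

    std::error_code ec;
    fs::rename(temp, target, ec);  // replaces an existing target file
    if (ec) {
        std::error_code ignored;
        fs::remove(temp, ignored);  // best-effort cleanup; the old target is untouched
        return "replace failed: " + ec.message();
    }
    return {};
}
```

Because the previous file is only replaced at the rename step, a failed or interrupted write leaves the last good runtime state in place.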
Failures should not silently disappear. They should publish:
- persistence target
- reason/action
- error message
- whether a newer request is pending
- whether the app is still running with unsaved changes
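As an illustration, the published failure fields could travel in one small struct. PersistenceFailureReport and Describe are hypothetical names, and how the report actually reaches HealthTelemetry or the runtime event stream is left open here.

```cpp
#include <cassert>
#include <string>

// Hypothetical failure payload carrying the five fields listed above.
struct PersistenceFailureReport {
    std::string target;                // persistence target, e.g. the runtime-state path
    std::string reason;                // reason/action that requested the write
    std::string errorMessage;          // error text from the failed write
    bool newerRequestPending = false;  // a later request will retry the write anyway
    bool unsavedChanges = true;        // the app is still running with unsaved state
};

// Formats a report as one log-friendly line.
inline std::string Describe(const PersistenceFailureReport& r) {
    assert(!r.target.empty());
    std::string text = "persistence failed for " + r.target +
                       " (" + r.reason + "): " + r.errorMessage;
    if (r.newerRequestPending) text += "; newer request pending";
    if (r.unsavedChanges) text += "; unsaved changes remain";
    return text;
}
```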
Migration Plan
Step 1. Name Persistence Requests
Make request types and event payloads explicit enough that callers stop thinking in terms of direct disk writes.
Initial target:
- keep existing coordinator persistence decisions
- introduce a PersistenceRequest/PersistenceSnapshot shape
- document which requests are debounceable
Step 2. Extract Snapshot Writing From RuntimeStore
Move file-write mechanics behind a helper while keeping serialization ownership in RuntimeStore.
Initial target:
- RuntimeStore can build serialized runtime-state snapshots
- PersistenceWriter writes the snapshot
- existing synchronous save path can call through the writer/helper during transition
Step 3. Add Debounced Background Worker
Introduce a worker thread or queued task owner.
Initial target:
- repeated runtime-state requests coalesce
- worker writes only latest pending snapshot
- tests cover coalescing without filesystem where possible
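A minimal sketch of a worker with these properties, assuming a single pending slot and an injected write callback so that tests can avoid the filesystem. BackgroundWriter, Request, and Stop are hypothetical names. The debounce here is a fixed window that starts at the first request of a burst, which is a simplification rather than the design's settled policy.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>

class BackgroundWriter {
public:
    using WriteFn = std::function<void(const std::string&)>;

    BackgroundWriter(WriteFn write, std::chrono::milliseconds debounce)
        : mWrite(std::move(write)), mDebounce(debounce), mThread([this] { Run(); }) {
        assert(mWrite);
    }

    ~BackgroundWriter() { Stop(); }

    // Replace any pending snapshot with the newest one and wake the worker.
    void Request(std::string snapshot) {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            if (mStopping) return;  // no new requests after shutdown starts
            mPending = std::move(snapshot);
        }
        mWake.notify_one();
    }

    // Stop accepting requests, write whatever is still pending, then join.
    void Stop() {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            if (mStopping) return;
            mStopping = true;
        }
        mWake.notify_one();
        if (mThread.joinable()) mThread.join();
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mMutex);
        for (;;) {
            mWake.wait(lock, [this] { return mStopping || mPending.has_value(); });
            if (!mPending) return;  // stopping with nothing left to write
            if (!mStopping) {
                // Let the burst settle; a stop request cuts the wait short.
                mWake.wait_for(lock, mDebounce, [this] { return mStopping; });
            }
            std::string snapshot = std::move(*mPending);
            mPending.reset();
            lock.unlock();
            mWrite(snapshot);  // IO runs outside the lock
            lock.lock();
        }
    }

    WriteFn mWrite;
    std::chrono::milliseconds mDebounce;
    std::mutex mMutex;
    std::condition_variable mWake;
    std::optional<std::string> mPending;
    bool mStopping = false;
    std::thread mThread;  // declared last so it starts after the other members exist
};
```

Because the worker only ever holds the newest snapshot, it cannot become a second store of mutable truth. This version's Stop() waits without a bound, so a bounded shutdown wait would have to be layered on top.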
Step 4. Add Atomic Write And Failure Reporting
Make disk writes safer and observable.
Initial target:
- temp-file then replace
- failure returned/published with structured reason
- HealthTelemetry receives persistence warning state
Step 5. Wire Coordinator/Event Requests To Writer
Route RuntimePersistenceRequested or coordinator persistence outcomes into the writer.
Initial target:
- accepted durable mutations request persistence
- transient-only mutations do not
- runtime reload/preset policies remain explicit
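The routing rule above might reduce to a small guard between the coordinator outcome and the writer. CoordinatorResultSketch is a stand-in for the real RuntimeCoordinatorResult; only the persistenceRequested member is named in this design, so the accepted flag, RoutePersistence, and both callbacks are assumptions.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Stand-in for RuntimeCoordinatorResult; `accepted` is a hypothetical field.
struct CoordinatorResultSketch {
    bool accepted = false;
    bool persistenceRequested = false;
};

// Enqueues a snapshot only for accepted mutations that requested persistence.
// Returns true when a snapshot was handed to the writer.
inline bool RoutePersistence(const CoordinatorResultSketch& result,
                             const std::function<std::string()>& buildSnapshot,
                             const std::function<void(std::string)>& enqueue) {
    assert(buildSnapshot && enqueue);
    if (!result.accepted || !result.persistenceRequested) return false;
    enqueue(buildSnapshot());  // serialization stays with the store; writing with the writer
    return true;
}
```

Transient-only mutations never reach buildSnapshot in this arrangement, so they incur neither serialization nor write cost.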
Step 6. Define Shutdown Flush
Make app shutdown persistence behavior deterministic.
Initial target:
- stop accepting new requests
- flush latest pending snapshot with bounded wait
- report failure if flush fails
Testing Strategy
Recommended tests:
- repeated persistence requests coalesce into one write
- newest snapshot wins after multiple mutations
- transient-only mutation does not request persistence
- write failure records an error and keeps unsaved state visible
- shutdown flush writes pending snapshot
- shutdown with no pending request does not write
- preset save path remains explicit
- temp-file replacement success/failure is handled
Useful homes:
- RuntimeSubsystemTests for coordinator persistence outcomes
- a new PersistenceWriterTests target for worker/coalescing/write policy
- filesystem tests using a temporary directory for atomic write behavior
Risks
Data Loss Risk
Debouncing introduces a window where in-memory state is newer than disk. Shutdown flush and unsaved-state telemetry are the guardrails.
Complexity Risk
A persistence worker can become a hidden second store if it owns mutable truth. It should own snapshots and write policy only.
Blocking Shutdown Risk
Flushing forever on shutdown is not acceptable. Use bounded waits and visible failure reporting.
Preset Semantics Risk
Operator-triggered preset save often feels like it should complete before reporting success. Keep preset behavior explicit rather than silently changing it.
Phase 6 Exit Criteria
Phase 6 can be considered complete once the project can say:
- durable mutations enqueue persistence instead of directly writing from mutation paths
- runtime-state writes are debounced/coalesced
- writes use temp-file/replace or equivalent atomic policy
- persistence failures are reported through structured health/events
- transient/live-only mutations do not request persistence
- shutdown flush behavior is explicit and tested
- RuntimeStore remains durable-state/serialization owner, not worker policy owner
- persistence behavior has focused non-render tests
Open Questions
- Should preset save remain synchronous, or move behind a completion-based async request?
- What debounce interval is appropriate for routine runtime-state writes?
- Should failed persistence retry automatically, or wait for the next mutation/request?
- Should the app expose "unsaved changes" in the UI/health snapshot?
- Should runtime config writes share this worker, or stay separate?
Short Version
Phase 6 should make persistence boring, safe, and off the hot path.
Mutations update in-memory durable state. Persistence requests are queued and coalesced. A background writer saves atomic snapshots and reports failures. Render, backend callbacks, and control ingress should not pay filesystem costs.