video-shader-toys/docs/PHASE_6_BACKGROUND_PERSISTENCE_DESIGN.md
Aiden d332dceb5b
2026-05-11 19:25:29 +10:00

Phase 6 Design: Background Persistence

This document expands Phase 6 of ARCHITECTURE_RESILIENCE_REVIEW.md into a concrete design target.

Phases 1-5 separate durable state, coordination policy, render-facing snapshots, render-thread ownership, and live-state layering. Phase 6 should make disk persistence a background snapshot-writing concern instead of a synchronous side effect of mutations.

Status

  • Phase 6 design package: proposed.
  • Phase 6 implementation: not started.
  • Current alignment: RuntimeStore owns durable serialization, config, package metadata, preset IO, and persistence requests; CommittedLiveState owns the current committed/session layer state; and RuntimeCoordinator already publishes explicit persistence-request outcomes for persisted mutations. The remaining issue is that actual disk writes are still synchronous store work rather than queued, debounced, atomic background writes.

Current persistence footholds:

  • RuntimeStore owns persistent runtime-state serialization, stack preset serialization, and durable file IO.
  • CommittedLiveState owns current committed/session layer and parameter state.
  • RuntimeCoordinatorResult::persistenceRequested exists as an explicit mutation outcome.
  • RuntimeEventType::RuntimePersistenceRequested exists as the event-level persistence request.
  • Phase 5 clarified which live-state mutations are durable, committed-live, transient automation, or render-local. Settled OSC commits are session-only by default and do not request persistence.

Why Phase 6 Exists

Synchronous persistence is a poor fit for live software. A mutation that changes state should not also have to block on filesystem timing, antivirus scans, slow disks, or transient IO failures. The app needs persistence to be reliable and observable, but not timing-sensitive.

The resilience review calls this out because SavePersistentState()-style behavior can create unnecessary stalls and make recovery harder to reason about.

Phase 6 should turn persistence into:

  • request
  • snapshot
  • background write
  • completion/failure observation

Goals

Phase 6 should establish:

  • a queued persistence request path
  • debounced/coalesced durable-state snapshot writes
  • atomic file replacement for runtime-state saves where practical
  • structured completion/failure reporting
  • clear separation between state mutation and disk flush
  • deterministic shutdown flushing policy
  • tests for coalescing, snapshot selection, write failure, and shutdown behavior without rendering or DeckLink

Non-Goals

Phase 6 should not require:

  • changing live-state layering rules
  • changing DeckLink/backend lifecycle
  • replacing stack preset semantics wholesale
  • adding cloud sync or external storage
  • building an unlimited historical state archive
  • making every write async immediately if a narrow compatibility path still needs a synchronous result

Target Model

Phase 6 should make persistence a small pipeline:

RuntimeCoordinator accepts mutation
  -> publishes/returns persistence request
  -> PersistenceWriter captures a durable snapshot from RuntimeStore serialization
  -> background worker debounces/coalesces writes
  -> atomic write commits file
  -> HealthTelemetry/runtime event records success or failure

The key rule is:

  • RuntimeStore owns durable state and serialization
  • CommittedLiveState owns current session state; only coordinator-approved durable snapshots should be persisted
  • PersistenceWriter owns when and how snapshots are written
  • RuntimeCoordinator owns whether a mutation requests persistence

Proposed Collaborators

PersistenceWriter

Owns the worker thread, queue, debounce timer, and write execution.

Responsibilities:

  • accept persistence requests
  • coalesce repeated runtime-state writes
  • request/build a durable snapshot from RuntimeStore
  • write to a temporary file and atomically replace the target
  • report success/failure observations
  • flush on shutdown according to policy

Non-responsibilities:

  • deciding mutation validity
  • owning durable in-memory state
  • composing render snapshots
  • blocking render/backend timing paths

PersistenceSnapshot

Immutable write input captured from durable state.

Responsibilities:

  • contain serialized runtime-state text or structured data ready to serialize
  • identify target path and snapshot generation
  • preserve enough metadata for completion/failure diagnostics

Non-responsibilities:

  • mutation policy
  • file IO

PersistenceRequest

Small request object or event payload.

Expected fields:

  • reason/action name
  • target kind: runtime state, preset, or (if later needed) config
  • optional debounce key
  • force/flush flag for explicit save operations
  • generation or sequence
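
A minimal sketch of how the request and snapshot shapes might look, to make the field list concrete. Every name here (PersistenceTargetKind, the member names, CanCoalesce, Coalesce) is a hypothetical illustration rather than existing project code, and the coalescing rule is an assumption: two requests merge only when they share a target kind and a debounce key.

```cpp
#include <cstdint>
#include <filesystem>
#include <optional>
#include <string>

// Hypothetical target kinds mirroring the "target kind" field.
enum class PersistenceTargetKind { RuntimeState, StackPreset, RuntimeConfig };

// Hypothetical request payload: says why and what, never carries file contents.
struct PersistenceRequest {
    std::string reason;                      // reason/action name
    PersistenceTargetKind targetKind = PersistenceTargetKind::RuntimeState;
    std::optional<std::string> debounceKey;  // requests sharing a key may coalesce
    bool forceFlush = false;                 // explicit save: skip the debounce delay
    std::uint64_t sequence = 0;              // increasing request order
};

// Hypothetical immutable write input captured from durable state.
struct PersistenceSnapshot {
    std::filesystem::path targetPath;
    std::uint64_t generation = 0;
    std::string reason;          // kept for completion/failure diagnostics
    std::string serializedText;  // produced by RuntimeStore serialization
};

// Assumed rule: only keyed requests for the same target kind may be merged.
inline bool CanCoalesce(const PersistenceRequest& pending,
                        const PersistenceRequest& incoming) {
    return pending.targetKind == incoming.targetKind &&
           pending.debounceKey.has_value() &&
           pending.debounceKey == incoming.debounceKey;
}

// The newest request wins, but an explicit flush on either side is preserved.
inline PersistenceRequest Coalesce(const PersistenceRequest& pending,
                                   const PersistenceRequest& incoming) {
    PersistenceRequest merged =
        incoming.sequence >= pending.sequence ? incoming : pending;
    merged.forceFlush = pending.forceFlush || incoming.forceFlush;
    return merged;
}
```

Keeping the request free of serialized content means a burst of requests stays cheap; only the snapshot that is actually written needs to carry text.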

Write Policy

Runtime State

Default policy:

  • coalesce repeated requests
  • debounce short bursts
  • write newest snapshot
  • report failures without blocking render/control paths
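
The coalesce/debounce/newest-wins policy can be sketched without threads or file IO, which also makes it directly testable. CoalescingSlot and PendingWrite are hypothetical names, and the trailing-edge debounce (each new request restarts the quiet period) is one possible choice, not a settled decision.

```cpp
#include <chrono>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

using Clock = std::chrono::steady_clock;

// Hypothetical pending write: the newest serialized runtime-state snapshot.
struct PendingWrite {
    std::uint64_t generation = 0;
    std::string serializedText;
};

// Hypothetical debounce state, kept free of threads and IO. The caller
// supplies "now", so tests can drive time deterministically.
class CoalescingSlot {
public:
    explicit CoalescingSlot(Clock::duration quietPeriod)
        : quietPeriod_(quietPeriod) {}

    // The newest snapshot replaces any pending one and restarts the quiet period.
    void Submit(PendingWrite write, Clock::time_point now) {
        if (pending_) ++coalescedCount_;
        pending_ = std::move(write);
        dueAt_ = now + quietPeriod_;
    }

    // Returns the snapshot to write once the quiet period has elapsed.
    std::optional<PendingWrite> TakeIfDue(Clock::time_point now) {
        if (!pending_ || now < dueAt_) return std::nullopt;
        return TakeNow();
    }

    // For explicit saves and shutdown: ignore the debounce delay.
    std::optional<PendingWrite> TakeNow() {
        std::optional<PendingWrite> out = std::move(pending_);
        pending_.reset();  // a moved-from optional stays engaged, so clear it
        return out;
    }

    bool HasPending() const { return pending_.has_value(); }
    std::uint64_t CoalescedCount() const { return coalescedCount_; }

private:
    Clock::duration quietPeriod_;
    Clock::time_point dueAt_{};
    std::optional<PendingWrite> pending_;
    std::uint64_t coalescedCount_ = 0;
};
```

One caveat with this shape: a continuous stream of mutations keeps pushing the due time out, so a real implementation would probably also want a maximum delay before a write is forced.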

Stack Presets

Preset save is more operator-explicit than routine runtime-state persistence.

Initial policy options:

  • keep preset save synchronous while runtime-state persistence becomes async
  • or route preset writes through the same worker with a completion result for the caller

Conservative Phase 6 default:

  • background runtime-state persistence first
  • leave preset save/load synchronous unless the implementation has a clean completion path

Shutdown

Shutdown should explicitly decide:

  • flush latest pending snapshot before exit
  • skip flush if no pending durable change exists
  • report the write failure if the flush fails
  • avoid indefinite hang on shutdown
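
The shutdown decisions above could be expressed as a bounded wait with an explicit outcome. This is a sketch under assumptions: ShutdownFlushOutcome and AwaitShutdownFlush are hypothetical names, and the writer is assumed to expose a std::future<bool> that becomes ready with true when the final write succeeded.

```cpp
#include <chrono>
#include <future>

// Hypothetical outcomes matching the shutdown decisions listed above.
enum class ShutdownFlushOutcome { NothingPending, Flushed, WriteFailed, TimedOut };

// Waits at most maxWait for the final flush. flushDone is an assumed future
// that the writer fulfils with true on success and false on write failure.
// A future that is not ready in time (or is deferred) is reported as TimedOut.
inline ShutdownFlushOutcome AwaitShutdownFlush(bool hasPendingChange,
                                               std::future<bool>& flushDone,
                                               std::chrono::milliseconds maxWait) {
    if (!hasPendingChange) return ShutdownFlushOutcome::NothingPending;
    if (flushDone.wait_for(maxWait) != std::future_status::ready) {
        return ShutdownFlushOutcome::TimedOut;
    }
    return flushDone.get() ? ShutdownFlushOutcome::Flushed
                           : ShutdownFlushOutcome::WriteFailed;
}
```

Returning an enum rather than a bool keeps "timed out" distinct from "write failed", so the two cases can be reported differently. What happens to a worker that is still writing after a timeout is left open here.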

Atomicity And Failure Handling

Runtime-state writes should prefer:

  1. serialize snapshot content in memory
  2. write to target.tmp
  3. flush/close file
  4. replace target atomically where platform support allows
  5. retain or report backup/failure context if replacement fails
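
The sequence above can be sketched with the C++17 standard library. WriteResult and WriteFileAtomically are hypothetical names. The sketch assumes the temporary file sits next to the target so the rename stays on one volume, and it relies on std::filesystem::rename replacing an existing regular file, which is the behavior the standard describes but which in practice depends on the platform and library implementation.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical result type: success flag plus a message for diagnostics.
struct WriteResult {
    bool ok = false;
    std::string error;
};

// Writes content to "<target>.tmp", then renames it over the target.
inline WriteResult WriteFileAtomically(const std::filesystem::path& target,
                                       const std::string& content) {
    namespace fs = std::filesystem;
    fs::path temp = target;
    temp += ".tmp";  // same directory as the target

    {
        std::ofstream out(temp, std::ios::binary | std::ios::trunc);
        if (!out) return {false, "cannot open " + temp.string()};
        out.write(content.data(), static_cast<std::streamsize>(content.size()));
        out.close();  // flushes buffered bytes; sets failbit on error
        if (!out) {
            std::error_code ignored;
            fs::remove(temp, ignored);
            return {false, "cannot write " + temp.string()};
        }
    }

    std::error_code ec;
    fs::rename(temp, target, ec);  // replace step
    if (ec) {
        std::error_code ignored;
        fs::remove(temp, ignored);  // do not leave a stale temp file behind
        return {false, "replace failed: " + ec.message()};
    }
    return {true, {}};
}
```

Closing the stream hands the bytes to the operating system but does not force them to physical storage. If surviving power loss matters, a platform-level flush (for example FlushFileBuffers on Windows) would be needed before the rename. That is outside this sketch.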

Failures should not silently disappear. They should publish:

  • persistence target
  • reason/action
  • error message
  • whether a newer request is pending
  • whether the app is still running with unsaved changes
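
As an illustration, the published failure could be a small value type carrying exactly these fields. PersistenceFailure and DescribeFailure are hypothetical names, and the log-line format is invented for the example. How the value reaches HealthTelemetry or the runtime event stream is not shown.

```cpp
#include <string>

// Hypothetical failure observation carrying the fields listed above.
struct PersistenceFailure {
    std::string target;        // persistence target, e.g. a file name
    std::string reason;        // reason/action that requested the write
    std::string errorMessage;  // error text from the write attempt
    bool newerRequestPending = false;
    bool unsavedChanges = true;
};

// Formats the failure as one log-friendly line (format is illustrative only).
inline std::string DescribeFailure(const PersistenceFailure& failure) {
    std::string line = "persistence write failed";
    line += " target=" + failure.target;
    line += " reason=" + failure.reason;
    line += " error=\"" + failure.errorMessage + "\"";
    line += failure.newerRequestPending ? " newer_request=yes" : " newer_request=no";
    line += failure.unsavedChanges ? " unsaved=yes" : " unsaved=no";
    return line;
}
```

Carrying newerRequestPending lets an observer distinguish a failure that is about to be superseded from one that leaves the app running with unsaved changes until the next mutation.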

Migration Plan

Step 1. Name Persistence Requests

Make request types and event payloads explicit enough that callers stop thinking in terms of direct disk writes.

Initial target:

  • keep existing coordinator persistence decisions
  • introduce a PersistenceRequest/PersistenceSnapshot shape
  • document which requests are debounceable

Step 2. Extract Snapshot Writing From RuntimeStore

Move file-write mechanics behind a helper while keeping serialization ownership in RuntimeStore.

Initial target:

  • RuntimeStore can build serialized runtime-state snapshots
  • PersistenceWriter writes the snapshot
  • existing synchronous save path can call through the writer/helper during transition

Step 3. Add Debounced Background Worker

Introduce a worker thread or queued task owner.

Initial target:

  • repeated runtime-state requests coalesce
  • worker writes only latest pending snapshot
  • tests cover coalescing without touching the filesystem where possible
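
A worker along these lines might look like the following sketch. DebouncedWriter and its Sink callback are hypothetical. The sink stands in for the real file write so coalescing can be tested without a filesystem, and Stop() is assumed to write the latest pending snapshot immediately before joining the thread.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>

// Hypothetical debounced worker: keeps only the newest pending snapshot.
class DebouncedWriter {
public:
    using Sink = std::function<void(const std::string&)>;

    DebouncedWriter(std::chrono::milliseconds quietPeriod, Sink sink)
        : quietPeriod_(quietPeriod), sink_(std::move(sink)),
          worker_([this] { Run(); }) {}

    ~DebouncedWriter() { Stop(); }

    // Replaces any pending snapshot and restarts the quiet period.
    void Submit(std::string snapshot) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (stopping_) return;  // no new requests once shutdown began
            pending_ = std::move(snapshot);
            dueAt_ = std::chrono::steady_clock::now() + quietPeriod_;
        }
        wake_.notify_one();
    }

    // Flushes the latest pending snapshot (if any) and joins the worker.
    void Stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        if (worker_.joinable()) worker_.join();
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            if (!pending_) {
                if (stopping_) return;
                wake_.wait(lock);
                continue;
            }
            if (!stopping_ && std::chrono::steady_clock::now() < dueAt_) {
                const auto due = dueAt_;  // copy: Submit may change dueAt_
                wake_.wait_until(lock, due);
                continue;
            }
            std::string snapshot = std::move(*pending_);
            pending_.reset();
            lock.unlock();
            sink_(snapshot);  // the write happens outside the lock
            lock.lock();
        }
    }

    std::chrono::milliseconds quietPeriod_;
    Sink sink_;
    std::mutex mutex_;
    std::condition_variable wake_;
    std::optional<std::string> pending_;
    std::chrono::steady_clock::time_point dueAt_{};
    bool stopping_ = false;
    std::thread worker_;  // declared last so all other members exist first
};
```

Calling the sink outside the lock means a slow write does not block Submit. Stop() has no timeout in this sketch, so a bounded shutdown wait would have to be added around it.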

Step 4. Add Atomic Write And Failure Reporting

Make disk writes safer and observable.

Initial target:

  • temp-file then replace
  • failure returned/published with structured reason
  • HealthTelemetry receives persistence warning state

Step 5. Wire Coordinator/Event Requests To Writer

Route RuntimePersistenceRequested or coordinator persistence outcomes into the writer.

Initial target:

  • accepted durable mutations request persistence
  • transient-only mutations do not
  • runtime reload/preset policies remain explicit
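
The routing rule is small enough to state as code. This sketch uses stand-in types rather than the real RuntimeCoordinatorResult: MutationScope, MutationOutcome, ShouldRequestPersistence, and RouteToWriter are all hypothetical. The scope names follow the Phase 5 classification (durable, committed-live, transient automation, render-local).

```cpp
#include <functional>

// Hypothetical classification mirroring the Phase 5 live-state layers.
enum class MutationScope { Durable, CommittedLive, TransientAutomation, RenderLocal };

// Stand-in for a coordinator result; not the real RuntimeCoordinatorResult.
struct MutationOutcome {
    bool accepted = false;
    MutationScope scope = MutationScope::RenderLocal;
};

// Only accepted durable mutations ask for persistence.
inline bool ShouldRequestPersistence(const MutationOutcome& outcome) {
    return outcome.accepted && outcome.scope == MutationScope::Durable;
}

// Forwards to the writer's enqueue callback when persistence is requested.
// Returns true when a request was enqueued.
inline bool RouteToWriter(const MutationOutcome& outcome,
                          const std::function<void()>& enqueue) {
    if (!ShouldRequestPersistence(outcome)) return false;
    enqueue();
    return true;
}
```

Keeping the decision in one predicate leaves the coordinator as the only place that decides whether a mutation persists. The writer only ever sees requests that already passed it.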

Step 6. Define Shutdown Flush

Make app shutdown persistence behavior deterministic.

Initial target:

  • stop accepting new requests
  • flush latest pending snapshot with bounded wait
  • report failure if flush fails

Testing Strategy

Recommended tests:

  • repeated persistence requests coalesce into one write
  • newest snapshot wins after multiple mutations
  • transient-only mutation does not request persistence
  • write failure records an error and keeps unsaved state visible
  • shutdown flush writes pending snapshot
  • shutdown with no pending request does not write
  • preset save path remains explicit
  • temp-file replacement success/failure is handled

Useful homes:

  • RuntimeSubsystemTests for coordinator persistence outcomes
  • a new PersistenceWriterTests target for worker/coalescing/write policy
  • filesystem tests using a temporary directory for atomic write behavior

Risks

Data Loss Risk

Debouncing introduces a window where in-memory state is newer than disk. Shutdown flush and unsaved-state telemetry are the guardrails.

Complexity Risk

A persistence worker can become a hidden second store if it owns mutable truth. It should own snapshots and write policy only.

Blocking Shutdown Risk

Flushing forever on shutdown is not acceptable. Use bounded waits and visible failure reporting.

Preset Semantics Risk

Operator-triggered preset save often feels like it should complete before reporting success. Keep preset behavior explicit rather than silently changing it.

Phase 6 Exit Criteria

Phase 6 can be considered complete once the project can say:

  • durable mutations enqueue persistence instead of directly writing from mutation paths
  • runtime-state writes are debounced/coalesced
  • writes use temp-file/replace or equivalent atomic policy
  • persistence failures are reported through structured health/events
  • transient/live-only mutations do not request persistence
  • shutdown flush behavior is explicit and tested
  • RuntimeStore remains durable-state/serialization owner, not worker policy owner
  • persistence behavior has focused non-render tests

Open Questions

  • Should preset save remain synchronous, or move behind a completion-based async request?
  • What debounce interval is appropriate for routine runtime-state writes?
  • Should failed persistence retry automatically, or wait for the next mutation/request?
  • Should the app expose "unsaved changes" in the UI/health snapshot?
  • Should runtime config writes share this worker, or stay separate?

Short Version

Phase 6 should make persistence boring, safe, and off the hot path.

Mutations update in-memory durable state. Persistence requests are queued and coalesced. A background writer saves atomic snapshots and reports failures. Render, backend callbacks, and control ingress should not pay filesystem costs.