# Phase 1 Design: Subsystem Boundaries and Target Architecture This document expands Phase 1 of [ARCHITECTURE_RESILIENCE_REVIEW.md](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/docs/ARCHITECTURE_RESILIENCE_REVIEW.md) into a concrete target design. Its purpose is to define the long-term subsystem split before later phases introduce a full event model, split `RuntimeHost`, and move rendering onto a sole-owner render thread. The main goal of Phase 1 is not to immediately rewrite the app. It is to establish clear ownership boundaries so later refactors all move toward the same architecture instead of solving local problems in conflicting ways. ## Why Phase 1 Exists Today the app works, but too many responsibilities still converge in a few places: - `RuntimeHost` owns persistence, live layer state, shader package access, status reporting, and mutation entrypoints. - `OpenGLComposite` coordinates runtime setup, render state retrieval, shader rebuild handling, transient OSC overlay behavior, and video backend integration. - DeckLink callback-driven playout still reaches directly into render-facing work. - Background services rely on polling and shared mutable state more than explicit subsystem contracts. Those are exactly the kinds of overlaps that make timing issues, state regressions, and recovery edge cases harder to solve cleanly. Phase 1 creates a map for where each responsibility should eventually live. ## Design Goals The target architecture should optimize for: - live timing isolation - explicit state ownership - predictable recovery behavior - clear boundaries between persistent state and transient live state - easier testing of non-GL and non-hardware logic - fewer cross-thread shared mutable objects - a playout model that can evolve toward producer/consumer scheduling ## Non-Goals Phase 1 does not itself require: - replacing every direct call with events immediately - moving all rendering to a new thread yet - redesigning the shader contract again - changing DeckLink behavior in place - removing all existing classes before replacements exist This phase is the target design and the dependency rules. Later phases perform the actual extraction. ## Current Pressure Points The following current code paths are the strongest evidence for the split proposed here: - `RuntimeHost` is both store and live authority: - [RuntimeHost.h](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/runtime/RuntimeHost.h:15) - [RuntimeHost.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/runtime/RuntimeHost.cpp:726) - `OpenGLComposite` is both app orchestrator and render/runtime coordinator: - [OpenGLComposite.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/gl/OpenGLComposite.cpp:106) - [OpenGLComposite.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/gl/OpenGLComposite.cpp:283) - `RuntimeServices` mixes service orchestration with polling and deferred state work: - [RuntimeServices.h](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/control/RuntimeServices.h:46) - [RuntimeServices.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/control/RuntimeServices.cpp:194) - Playout is still callback-coupled to render-facing work: - [OpenGLVideoIOBridge.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/gl/pipeline/OpenGLVideoIOBridge.cpp:68) ## Target Subsystems The long-term architecture should converge on six primary subsystems: 1. `RuntimeStore` 2. `RuntimeCoordinator` 3. `RuntimeSnapshotProvider` 4. `ControlServices` 5. `RenderEngine` 6. `VideoBackend` 7. `HealthTelemetry` The split below is intentionally sharper than the current code. The point is to make ownership obvious. ## Subsystem Responsibilities ### `RuntimeStore` `RuntimeStore` owns persisted and operator-authored state. It is the source of truth for: - runtime config loaded from disk - persisted layer stack structure - persisted parameter values - stack preset serialization/deserialization - shader/package metadata that must survive across renders It should not be responsible for: - render-thread timing - GL resource lifetime - live transient overlays - hardware callback coordination - UI/websocket broadcasting policy Design rules: - disk I/O belongs here or in its dedicated writer helper - values here are authoritative for saved state - writes may be debounced later, but the data model itself belongs here ### `RuntimeCoordinator` `RuntimeCoordinator` is the mutation and policy layer. It is responsible for: - receiving valid mutation requests from controls, services, or automation - validating requested changes against shader definitions and config rules - resolving how persisted state, committed live state, and transient overlays should interact - requesting snapshot publication when state changes affect render - requesting persistence when stored state changes It should not be responsible for: - direct disk serialization details - direct GL work - hardware device lifecycle - polling loops Design rules: - all non-render mutations should eventually flow through this layer - this layer decides whether a change is persisted, transient, or both - this layer owns state policy, not device policy ### `RuntimeSnapshotProvider` `RuntimeSnapshotProvider` publishes render-facing snapshots. It is responsible for: - building immutable or near-immutable render snapshots - translating runtime state into render-ready structures - publishing versioned snapshots - serving the render side without large mutable shared locks It should not be responsible for: - deciding whether a mutation is allowed - directly applying UI/OSC requests - persistence - shader compilation orchestration Design rules: - render consumes snapshots, not live mutable store objects - snapshots should be cheap to read and explicit about version changes - dynamic frame-only values may still be attached later, but the snapshot shape should stay stable ### `ControlServices` `ControlServices` is the ingress boundary for non-render control sources. It is responsible for: - OSC receive and route resolution - REST/websocket/control UI ingress - file-watch or reload request ingress - translating external inputs into typed internal actions/events - low-cost buffering/coalescing where appropriate It should not be responsible for: - persistence decisions - render snapshot building - hardware playout policy - direct long-lived state ownership beyond ingress-specific queues Design rules: - external inputs enter here and are normalized before they touch core state - service-specific timing concerns stay here unless they affect whole-app policy - no service should directly mutate render-facing state structures ### `RenderEngine` `RenderEngine` is the owner of live rendering behavior. It is responsible for: - sole ownership of GL work in the target architecture - shader program lifecycle once compilation outputs are available - texture upload scheduling - render-pass execution - temporal history and shader feedback resources - transient render-only overlays - preview production as a subordinate output - output-frame production for the video backend It should not be responsible for: - persistence - user-facing control normalization - hardware discovery/configuration - high-level runtime mutation policy Design rules: - render consumes snapshots plus render-local transient state - render-local state is allowed if it stays render-local - preview must be treated as best-effort relative to playout ### `VideoBackend` `VideoBackend` owns input/output device lifecycle and playout policy. It is responsible for: - input device configuration and callbacks - output device configuration and callbacks - frame scheduling policy - buffer-pool ownership - playout headroom policy - input signal status - backend state transitions and recovery logic It should not be responsible for: - composing frames - owning GL contexts long-term - validating shader parameter changes - persistence Design rules: - this subsystem is the consumer of rendered output frames, not the owner of frame composition policy - it should evolve toward producer/consumer playout rather than callback-driven rendering - backend state should be explicit and reportable ### `HealthTelemetry` `HealthTelemetry` owns structured operational visibility. It is responsible for: - logging - warning/error counters - timing traces - subsystem health state - degraded-mode reporting - operator-visible health summaries It should not be responsible for: - deciding core app behavior - owning render or backend state - persistence policy Design rules: - all major subsystems publish health information here - health visibility should outlive UI connection state - modal dialogs should not be the main operational surface ## Target Dependency Rules The architecture should follow these rules as closely as possible. Allowed dependency directions: - `ControlServices -> RuntimeCoordinator` - `RuntimeCoordinator -> RuntimeStore` - `RuntimeCoordinator -> RuntimeSnapshotProvider` - `RuntimeCoordinator -> HealthTelemetry` - `RuntimeSnapshotProvider -> RuntimeStore` - `RenderEngine -> RuntimeSnapshotProvider` - `RenderEngine -> HealthTelemetry` - `VideoBackend -> RenderEngine` - `VideoBackend -> HealthTelemetry` Conditionally allowed during migration: - `ControlServices -> HealthTelemetry` - `ControlServices -> RuntimeStore` only through temporary compatibility shims Not allowed in the target design: - `RenderEngine -> RuntimeStore` - `RenderEngine -> ControlServices` - `VideoBackend -> RuntimeStore` - `ControlServices -> RenderEngine` for direct mutation - `RuntimeStore -> RenderEngine` - `HealthTelemetry -> any subsystem` for control flow The key principle is: - store owns durable data - coordinator owns mutation policy - snapshot provider owns render-facing state publication - render owns live GPU execution - backend owns device timing - telemetry observes all of them ## State Ownership Model The app has several different kinds of state, and Phase 1 should name them explicitly. ### Persisted State Owned by `RuntimeStore`. Examples: - layer stack structure - selected shader ids - saved parameter values - runtime host config - stack presets ### Committed Live State Owned logically by `RuntimeCoordinator`, stored in the store or a live-state companion depending on future implementation. Examples: - current operator-selected parameter values - current bypass state - current selected shader for each layer This is state that should normally survive until explicitly changed and can be persisted if policy says so. ### Transient Live Overlay State Owned by the subsystem that consumes it, not by the persisted store. Examples: - active OSC overlay targets while automation is flowing - shader feedback buffers - temporal history textures - queued input frames - in-flight preview state - playout queue state This is where many current issues come from. The design rule is: - transient state may influence output - transient state should not masquerade as persisted truth ### Health and Timing State Owned by `HealthTelemetry`. Examples: - frame pacing stats - render timing - late/dropped frame counters - queue depths - warning states ## Target Runtime Flow This section describes the intended long-term flow once later phases are in place. ### Control Mutation Flow 1. OSC/UI/file-watch input enters `ControlServices`. 2. `ControlServices` normalizes it into an internal action or event. 3. `RuntimeCoordinator` validates and classifies the action. 4. If the action changes durable state, `RuntimeStore` is updated. 5. If the action changes render-facing state, `RuntimeSnapshotProvider` publishes a new snapshot. 6. If the action requires persistence, a persistence request is queued. 7. Health/timing observations are emitted separately. ### Render Flow 1. `RenderEngine` consumes the latest published snapshot. 2. `RenderEngine` combines that snapshot with render-local transient state. 3. `RenderEngine` performs uploads, pass execution, feedback/history maintenance, and output production. 4. `RenderEngine` produces: - preview-ready output - video-backend-ready output frames - render timing and warning signals ### Video Output Flow Target long-term flow: 1. `RenderEngine` produces completed output frames ahead of demand. 2. `VideoBackend` consumes those frames from a bounded queue or ring buffer. 3. Device callbacks only drive dequeue/schedule/accounting behavior. 4. `HealthTelemetry` records queue depth, lateness, underruns, and recovery events. ### Reload / Shader Rebuild Flow 1. file-watch or manual reload enters through `ControlServices` 2. `RuntimeCoordinator` classifies the reload request 3. `RuntimeStore` and shader/package metadata are refreshed if needed 4. `RuntimeSnapshotProvider` republishes affected snapshot state 5. `RenderEngine` rebuilds render-local resources from the new snapshot/build outputs The important boundary here is that reload is not "a render concern that also touches persistence." It is a coordinated runtime concern with a render-local execution phase. ## Suggested Public Interfaces These are not final class signatures, but they show the shape the architecture should move toward. ### `RuntimeStore` Core responsibilities: - `LoadConfig()` - `LoadPersistentState()` - `SavePersistentStateSnapshot(...)` - `GetStoredLayerStack()` - `SetStoredLayerStack(...)` - `GetStackPresetNames()` - `SaveStackPreset(...)` - `LoadStackPreset(...)` ### `RuntimeCoordinator` Core responsibilities: - `ApplyControlMutation(...)` - `ApplyAutomationTarget(...)` - `ResetLayer(...)` - `RequestReload(...)` - `CommitOverlayState(...)` - `PublishSnapshotIfNeeded()` - `RequestPersistenceIfNeeded()` ### `RuntimeSnapshotProvider` Core responsibilities: - `BuildSnapshot(...)` - `GetLatestSnapshot()` - `GetSnapshotVersion()` - `PublishSnapshot(...)` ### `ControlServices` Core responsibilities: - `StartOscIngress(...)` - `StartWebControlIngress(...)` - `StartFileWatchIngress(...)` - `EnqueueControlAction(...)` - `DrainServiceEvents(...)` ### `RenderEngine` Core responsibilities: - `StartRenderLoop(...)` - `ConsumeSnapshot(...)` - `EnqueueInputFrame(...)` - `ProduceOutputFrame(...)` - `ResetRenderLocalState(...)` - `HandleRebuildOutputs(...)` ### `VideoBackend` Core responsibilities: - `ConfigureInput(...)` - `ConfigureOutput(...)` - `StartPlayout(...)` - `StopPlayout(...)` - `ConsumeRenderedFrame(...)` - `ReportBackendState(...)` ### `HealthTelemetry` Core responsibilities: - `RecordTimingSample(...)` - `RecordCounterDelta(...)` - `RaiseWarning(...)` - `ClearWarning(...)` - `AppendLogEntry(...)` - `BuildHealthSnapshot()` ## Mapping From Current Code to Target Subsystems This is not a one-to-one rename plan. It is a responsibility migration map. ### Current `RuntimeHost` Should eventually split across: - `RuntimeStore` - `RuntimeCoordinator` - `RuntimeSnapshotProvider` - parts of `HealthTelemetry` Likely examples: - config loading/saving -> `RuntimeStore` - layer stack mutation validation -> `RuntimeCoordinator` - render state building/versioning -> `RuntimeSnapshotProvider` - timing/status setters -> `HealthTelemetry` ### Current `RuntimeServices` Should eventually become mostly: - `ControlServices` - a small service-hosting shell Likely examples: - OSC ingress/coalescing -> `ControlServices` - file-watch ingress -> `ControlServices` - deferred service coordination now done by polling -> split between `ControlServices` and event-driven coordinator calls ### Current `OpenGLComposite` Should eventually split across: - application bootstrap shell - `RenderEngine` - orchestration glue that wires subsystems together Likely examples: - render-pass facing code -> `RenderEngine` - app/service/backend bootstrap -> composition root - runtime mutation API surface -> coordinator-facing adapter, not render owner ### Current `OpenGLVideoIOBridge` and `DeckLinkSession` Should eventually align more clearly under: - `VideoBackend` - `RenderEngine` Likely examples: - device callback and scheduling policy -> `VideoBackend` - GL upload/readback/render work -> `RenderEngine` ## Architectural Guardrails As later phases begin, these rules should be treated as guardrails. ### 1. No new cross-cutting state should be added to `RuntimeHost` If a new feature needs durable state, place it conceptually under `RuntimeStore`. If it needs render-local transient state, place it conceptually under `RenderEngine`. If it needs timing/status counters, place it conceptually under `HealthTelemetry`. ### 2. Render-local state should stay render-local Do not push shader feedback, temporal history, preview caches, or playout queues back into the store just to make them easy to reach from other systems. ### 3. Device callbacks should not become a dumping ground for app work Callback threads should converge toward signaling and queue management, not core rendering, persistence, or control mutation. ### 4. Persistence should not be used as a control synchronization mechanism Saving state is not how subsystems discover changes. Published snapshots and explicit events should handle that. ### 5. Health reporting should observe, not coordinate Telemetry systems may record warnings and degraded states, but they should not become the hidden control plane for the app. ## Migration Strategy Phase 1 is a design phase, but it should support incremental migration. Recommended order after this document: 1. Introduce names and interfaces before moving logic. 2. Create compatibility adapters around `RuntimeHost` rather than forcing a flag day. 3. Move read-only render snapshot publication out before moving all mutation logic. 4. Move service ingress boundaries out before removing the old polling shell. 5. Isolate timing/health setters from the core store as early as practical. This keeps progress measurable while reducing rewrite risk. ## Suggested Deliverables for Completing Phase 1 Phase 1 can reasonably be considered complete once the project has: - this subsystem-boundary design document - agreed subsystem names and responsibilities - agreed allowed dependency directions - explicit state categories: persisted, committed live, transient overlay, health/timing - a current-to-target responsibility map for `RuntimeHost`, `RuntimeServices`, `OpenGLComposite`, and backend/render bridge code - a decision that later phases will build against this target rather than inventing new boundaries ad hoc ## Open Questions For Later Phases These do not block Phase 1, but they should remain visible. - Should shader package registry ownership live entirely in `RuntimeStore`, or should compile-ready derived registry data move into the snapshot provider? - Should committed live state be stored directly in `RuntimeStore`, or split into store plus live-session state owned by the coordinator? - How much of shader build orchestration belongs to `RenderEngine` versus a separate build service? - At what phase should preview become fully decoupled from playout cadence? - Should persistence become its own `PersistenceWriter` subsystem in Phase 6, or remain an implementation detail under `RuntimeStore`? ## Short Version Phase 1 should establish one simple rule for the rest of the refactor: - durable state lives in the store - mutation policy lives in the coordinator - render-facing state is published as snapshots - external control sources enter through services - GL work belongs to render - hardware pacing belongs to the backend - health visibility belongs to telemetry If later phases keep to that rule, the architecture will become materially more resilient without needing another round of foundational boundary changes.