20 KiB
Phase 1 Design: Subsystem Boundaries and Target Architecture
This document expands Phase 1 of ARCHITECTURE_RESILIENCE_REVIEW.md into a concrete target design. Its purpose is to define the long-term subsystem split before later phases introduce a full event model, split RuntimeHost, and move rendering onto a sole-owner render thread.
The main goal of Phase 1 is not to immediately rewrite the app. It is to establish clear ownership boundaries so later refactors all move toward the same architecture instead of solving local problems in conflicting ways.
Why Phase 1 Exists
Today the app works, but too many responsibilities still converge in a few places:
RuntimeHostowns persistence, live layer state, shader package access, status reporting, and mutation entrypoints.OpenGLCompositecoordinates runtime setup, render state retrieval, shader rebuild handling, transient OSC overlay behavior, and video backend integration.- DeckLink callback-driven playout still reaches directly into render-facing work.
- Background services rely on polling and shared mutable state more than explicit subsystem contracts.
Those are exactly the kinds of overlaps that make timing issues, state regressions, and recovery edge cases harder to solve cleanly.
Phase 1 creates a map for where each responsibility should eventually live.
Design Goals
The target architecture should optimize for:
- live timing isolation
- explicit state ownership
- predictable recovery behavior
- clear boundaries between persistent state and transient live state
- easier testing of non-GL and non-hardware logic
- fewer cross-thread shared mutable objects
- a playout model that can evolve toward producer/consumer scheduling
Non-Goals
Phase 1 does not itself require:
- replacing every direct call with events immediately
- moving all rendering to a new thread yet
- redesigning the shader contract again
- changing DeckLink behavior in place
- removing all existing classes before replacements exist
This phase is the target design and the dependency rules. Later phases perform the actual extraction.
Current Pressure Points
The following current code paths are the strongest evidence for the split proposed here:
RuntimeHostis both store and live authority:OpenGLCompositeis both app orchestrator and render/runtime coordinator:RuntimeServicesmixes service orchestration with polling and deferred state work:- Playout is still callback-coupled to render-facing work:
Target Subsystems
The long-term architecture should converge on six primary subsystems:
RuntimeStoreRuntimeCoordinatorRuntimeSnapshotProviderControlServicesRenderEngineVideoBackendHealthTelemetry
The split below is intentionally sharper than the current code. The point is to make ownership obvious.
Subsystem Responsibilities
RuntimeStore
RuntimeStore owns persisted and operator-authored state.
It is the source of truth for:
- runtime config loaded from disk
- persisted layer stack structure
- persisted parameter values
- stack preset serialization/deserialization
- shader/package metadata that must survive across renders
It should not be responsible for:
- render-thread timing
- GL resource lifetime
- live transient overlays
- hardware callback coordination
- UI/websocket broadcasting policy
Design rules:
- disk I/O belongs here or in its dedicated writer helper
- values here are authoritative for saved state
- writes may be debounced later, but the data model itself belongs here
RuntimeCoordinator
RuntimeCoordinator is the mutation and policy layer.
It is responsible for:
- receiving valid mutation requests from controls, services, or automation
- validating requested changes against shader definitions and config rules
- resolving how persisted state, committed live state, and transient overlays should interact
- requesting snapshot publication when state changes affect render
- requesting persistence when stored state changes
It should not be responsible for:
- direct disk serialization details
- direct GL work
- hardware device lifecycle
- polling loops
Design rules:
- all non-render mutations should eventually flow through this layer
- this layer decides whether a change is persisted, transient, or both
- this layer owns state policy, not device policy
RuntimeSnapshotProvider
RuntimeSnapshotProvider publishes render-facing snapshots.
It is responsible for:
- building immutable or near-immutable render snapshots
- translating runtime state into render-ready structures
- publishing versioned snapshots
- serving the render side without large mutable shared locks
It should not be responsible for:
- deciding whether a mutation is allowed
- directly applying UI/OSC requests
- persistence
- shader compilation orchestration
Design rules:
- render consumes snapshots, not live mutable store objects
- snapshots should be cheap to read and explicit about version changes
- dynamic frame-only values may still be attached later, but the snapshot shape should stay stable
ControlServices
ControlServices is the ingress boundary for non-render control sources.
It is responsible for:
- OSC receive and route resolution
- REST/websocket/control UI ingress
- file-watch or reload request ingress
- translating external inputs into typed internal actions/events
- low-cost buffering/coalescing where appropriate
It should not be responsible for:
- persistence decisions
- render snapshot building
- hardware playout policy
- direct long-lived state ownership beyond ingress-specific queues
Design rules:
- external inputs enter here and are normalized before they touch core state
- service-specific timing concerns stay here unless they affect whole-app policy
- no service should directly mutate render-facing state structures
RenderEngine
RenderEngine is the owner of live rendering behavior.
It is responsible for:
- sole ownership of GL work in the target architecture
- shader program lifecycle once compilation outputs are available
- texture upload scheduling
- render-pass execution
- temporal history and shader feedback resources
- transient render-only overlays
- preview production as a subordinate output
- output-frame production for the video backend
It should not be responsible for:
- persistence
- user-facing control normalization
- hardware discovery/configuration
- high-level runtime mutation policy
Design rules:
- render consumes snapshots plus render-local transient state
- render-local state is allowed if it stays render-local
- preview must be treated as best-effort relative to playout
VideoBackend
VideoBackend owns input/output device lifecycle and playout policy.
It is responsible for:
- input device configuration and callbacks
- output device configuration and callbacks
- frame scheduling policy
- buffer-pool ownership
- playout headroom policy
- input signal status
- backend state transitions and recovery logic
It should not be responsible for:
- composing frames
- owning GL contexts long-term
- validating shader parameter changes
- persistence
Design rules:
- this subsystem is the consumer of rendered output frames, not the owner of frame composition policy
- it should evolve toward producer/consumer playout rather than callback-driven rendering
- backend state should be explicit and reportable
HealthTelemetry
HealthTelemetry owns structured operational visibility.
It is responsible for:
- logging
- warning/error counters
- timing traces
- subsystem health state
- degraded-mode reporting
- operator-visible health summaries
It should not be responsible for:
- deciding core app behavior
- owning render or backend state
- persistence policy
Design rules:
- all major subsystems publish health information here
- health visibility should outlive UI connection state
- modal dialogs should not be the main operational surface
Target Dependency Rules
The architecture should follow these rules as closely as possible.
Allowed dependency directions:
ControlServices -> RuntimeCoordinatorRuntimeCoordinator -> RuntimeStoreRuntimeCoordinator -> RuntimeSnapshotProviderRuntimeCoordinator -> HealthTelemetryRuntimeSnapshotProvider -> RuntimeStoreRenderEngine -> RuntimeSnapshotProviderRenderEngine -> HealthTelemetryVideoBackend -> RenderEngineVideoBackend -> HealthTelemetry
Conditionally allowed during migration:
ControlServices -> HealthTelemetryControlServices -> RuntimeStoreonly through temporary compatibility shims
Not allowed in the target design:
RenderEngine -> RuntimeStoreRenderEngine -> ControlServicesVideoBackend -> RuntimeStoreControlServices -> RenderEnginefor direct mutationRuntimeStore -> RenderEngineHealthTelemetry -> any subsystemfor control flow
The key principle is:
- store owns durable data
- coordinator owns mutation policy
- snapshot provider owns render-facing state publication
- render owns live GPU execution
- backend owns device timing
- telemetry observes all of them
State Ownership Model
The app has several different kinds of state, and Phase 1 should name them explicitly.
Persisted State
Owned by RuntimeStore.
Examples:
- layer stack structure
- selected shader ids
- saved parameter values
- runtime host config
- stack presets
Committed Live State
Owned logically by RuntimeCoordinator, stored in the store or a live-state companion depending on future implementation.
Examples:
- current operator-selected parameter values
- current bypass state
- current selected shader for each layer
This is state that should normally survive until explicitly changed and can be persisted if policy says so.
Transient Live Overlay State
Owned by the subsystem that consumes it, not by the persisted store.
Examples:
- active OSC overlay targets while automation is flowing
- shader feedback buffers
- temporal history textures
- queued input frames
- in-flight preview state
- playout queue state
This is where many current issues come from. The design rule is:
- transient state may influence output
- transient state should not masquerade as persisted truth
Health and Timing State
Owned by HealthTelemetry.
Examples:
- frame pacing stats
- render timing
- late/dropped frame counters
- queue depths
- warning states
Target Runtime Flow
This section describes the intended long-term flow once later phases are in place.
Control Mutation Flow
- OSC/UI/file-watch input enters
ControlServices. ControlServicesnormalizes it into an internal action or event.RuntimeCoordinatorvalidates and classifies the action.- If the action changes durable state,
RuntimeStoreis updated. - If the action changes render-facing state,
RuntimeSnapshotProviderpublishes a new snapshot. - If the action requires persistence, a persistence request is queued.
- Health/timing observations are emitted separately.
Render Flow
RenderEngineconsumes the latest published snapshot.RenderEnginecombines that snapshot with render-local transient state.RenderEngineperforms uploads, pass execution, feedback/history maintenance, and output production.RenderEngineproduces:- preview-ready output
- video-backend-ready output frames
- render timing and warning signals
Video Output Flow
Target long-term flow:
RenderEngineproduces completed output frames ahead of demand.VideoBackendconsumes those frames from a bounded queue or ring buffer.- Device callbacks only drive dequeue/schedule/accounting behavior.
HealthTelemetryrecords queue depth, lateness, underruns, and recovery events.
Reload / Shader Rebuild Flow
- file-watch or manual reload enters through
ControlServices RuntimeCoordinatorclassifies the reload requestRuntimeStoreand shader/package metadata are refreshed if neededRuntimeSnapshotProviderrepublishes affected snapshot stateRenderEnginerebuilds render-local resources from the new snapshot/build outputs
The important boundary here is that reload is not "a render concern that also touches persistence." It is a coordinated runtime concern with a render-local execution phase.
Suggested Public Interfaces
These are not final class signatures, but they show the shape the architecture should move toward.
RuntimeStore
Core responsibilities:
LoadConfig()LoadPersistentState()SavePersistentStateSnapshot(...)GetStoredLayerStack()SetStoredLayerStack(...)GetStackPresetNames()SaveStackPreset(...)LoadStackPreset(...)
RuntimeCoordinator
Core responsibilities:
ApplyControlMutation(...)ApplyAutomationTarget(...)ResetLayer(...)RequestReload(...)CommitOverlayState(...)PublishSnapshotIfNeeded()RequestPersistenceIfNeeded()
RuntimeSnapshotProvider
Core responsibilities:
BuildSnapshot(...)GetLatestSnapshot()GetSnapshotVersion()PublishSnapshot(...)
ControlServices
Core responsibilities:
StartOscIngress(...)StartWebControlIngress(...)StartFileWatchIngress(...)EnqueueControlAction(...)DrainServiceEvents(...)
RenderEngine
Core responsibilities:
StartRenderLoop(...)ConsumeSnapshot(...)EnqueueInputFrame(...)ProduceOutputFrame(...)ResetRenderLocalState(...)HandleRebuildOutputs(...)
VideoBackend
Core responsibilities:
ConfigureInput(...)ConfigureOutput(...)StartPlayout(...)StopPlayout(...)ConsumeRenderedFrame(...)ReportBackendState(...)
HealthTelemetry
Core responsibilities:
RecordTimingSample(...)RecordCounterDelta(...)RaiseWarning(...)ClearWarning(...)AppendLogEntry(...)BuildHealthSnapshot()
Mapping From Current Code to Target Subsystems
This is not a one-to-one rename plan. It is a responsibility migration map.
Current RuntimeHost
Should eventually split across:
RuntimeStoreRuntimeCoordinatorRuntimeSnapshotProvider- parts of
HealthTelemetry
Likely examples:
- config loading/saving ->
RuntimeStore - layer stack mutation validation ->
RuntimeCoordinator - render state building/versioning ->
RuntimeSnapshotProvider - timing/status setters ->
HealthTelemetry
Current RuntimeServices
Should eventually become mostly:
ControlServices- a small service-hosting shell
Likely examples:
- OSC ingress/coalescing ->
ControlServices - file-watch ingress ->
ControlServices - deferred service coordination now done by polling -> split between
ControlServicesand event-driven coordinator calls
Current OpenGLComposite
Should eventually split across:
- application bootstrap shell
RenderEngine- orchestration glue that wires subsystems together
Likely examples:
- render-pass facing code ->
RenderEngine - app/service/backend bootstrap -> composition root
- runtime mutation API surface -> coordinator-facing adapter, not render owner
Current OpenGLVideoIOBridge and DeckLinkSession
Should eventually align more clearly under:
VideoBackendRenderEngine
Likely examples:
- device callback and scheduling policy ->
VideoBackend - GL upload/readback/render work ->
RenderEngine
Architectural Guardrails
As later phases begin, these rules should be treated as guardrails.
1. No new cross-cutting state should be added to RuntimeHost
If a new feature needs durable state, place it conceptually under RuntimeStore.
If it needs render-local transient state, place it conceptually under RenderEngine.
If it needs timing/status counters, place it conceptually under HealthTelemetry.
2. Render-local state should stay render-local
Do not push shader feedback, temporal history, preview caches, or playout queues back into the store just to make them easy to reach from other systems.
3. Device callbacks should not become a dumping ground for app work
Callback threads should converge toward signaling and queue management, not core rendering, persistence, or control mutation.
4. Persistence should not be used as a control synchronization mechanism
Saving state is not how subsystems discover changes. Published snapshots and explicit events should handle that.
5. Health reporting should observe, not coordinate
Telemetry systems may record warnings and degraded states, but they should not become the hidden control plane for the app.
Migration Strategy
Phase 1 is a design phase, but it should support incremental migration.
Recommended order after this document:
- Introduce names and interfaces before moving logic.
- Create compatibility adapters around
RuntimeHostrather than forcing a flag day. - Move read-only render snapshot publication out before moving all mutation logic.
- Move service ingress boundaries out before removing the old polling shell.
- Isolate timing/health setters from the core store as early as practical.
This keeps progress measurable while reducing rewrite risk.
Suggested Deliverables for Completing Phase 1
Phase 1 can reasonably be considered complete once the project has:
- this subsystem-boundary design document
- agreed subsystem names and responsibilities
- agreed allowed dependency directions
- explicit state categories: persisted, committed live, transient overlay, health/timing
- a current-to-target responsibility map for
RuntimeHost,RuntimeServices,OpenGLComposite, and backend/render bridge code - a decision that later phases will build against this target rather than inventing new boundaries ad hoc
Open Questions For Later Phases
These do not block Phase 1, but they should remain visible.
- Should shader package registry ownership live entirely in
RuntimeStore, or should compile-ready derived registry data move into the snapshot provider? - Should committed live state be stored directly in
RuntimeStore, or split into store plus live-session state owned by the coordinator? - How much of shader build orchestration belongs to
RenderEngineversus a separate build service? - At what phase should preview become fully decoupled from playout cadence?
- Should persistence become its own
PersistenceWritersubsystem in Phase 6, or remain an implementation detail underRuntimeStore?
Short Version
Phase 1 should establish one simple rule for the rest of the refactor:
- durable state lives in the store
- mutation policy lives in the coordinator
- render-facing state is published as snapshots
- external control sources enter through services
- GL work belongs to render
- hardware pacing belongs to the backend
- health visibility belongs to telemetry
If later phases keep to that rule, the architecture will become materially more resilient without needing another round of foundational boundary changes.