Files
video-shader-toys/docs/PHASE_1_SUBSYSTEM_BOUNDARIES_DESIGN.md
Aiden 27dbb55f7b
All checks were successful
CI / React UI Build (push) Successful in 11s
CI / Native Windows Build And Tests (push) Successful in 2m36s
CI / Windows Release Package (push) Successful in 2m48s
OSC seperation
2026-05-11 00:26:59 +10:00

25 KiB

Phase 1 Design: Subsystem Boundaries and Target Architecture

This document expands Phase 1 of ARCHITECTURE_RESILIENCE_REVIEW.md into a concrete target design. Its purpose is to define the long-term subsystem split before later phases introduce a full event model, split RuntimeHost, and move rendering onto a sole-owner render thread.

The main goal of Phase 1 is not to immediately rewrite the app. It is to establish clear ownership boundaries so later refactors all move toward the same architecture instead of solving local problems in conflicting ways.

Why Phase 1 Exists

Today the app works, but too many responsibilities still converge in a few places:

  • RuntimeHost owns persistence, live layer state, shader package access, status reporting, and mutation entrypoints.
  • OpenGLComposite coordinates runtime setup, render state retrieval, shader rebuild handling, transient OSC overlay behavior, and video backend integration.
  • DeckLink callback-driven playout still reaches directly into render-facing work.
  • Background services rely on polling and shared mutable state more than explicit subsystem contracts.

Those are exactly the kinds of overlaps that make timing issues, state regressions, and recovery edge cases harder to solve cleanly.

Phase 1 creates a map for where each responsibility should eventually live.

Design Goals

The target architecture should optimize for:

  • live timing isolation
  • explicit state ownership
  • predictable recovery behavior
  • clear boundaries between persistent state and transient live state
  • easier testing of non-GL and non-hardware logic
  • fewer cross-thread shared mutable objects
  • a playout model that can evolve toward producer/consumer scheduling

Non-Goals

Phase 1 does not itself require:

  • replacing every direct call with events immediately
  • moving all rendering to a new thread yet
  • redesigning the shader contract again
  • changing DeckLink behavior in place
  • removing all existing classes before replacements exist

This phase is the target design and the dependency rules. Later phases perform the actual extraction.

Current Pressure Points

The following current code paths are the strongest evidence for the split proposed here:

Target Subsystems

The long-term architecture should converge on seven primary subsystems:

  1. RuntimeStore
  2. RuntimeCoordinator
  3. RuntimeSnapshotProvider
  4. ControlServices
  5. RenderEngine
  6. VideoBackend
  7. HealthTelemetry

The split below is intentionally sharper than the current code. The point is to make ownership obvious.

Subsystem-specific design notes that elaborate these boundaries live under docs/subsystems.

Phase 1 Document Set

This document is the parent note for the Phase 1 subsystem package. The bundle index and subsystem notes live here:

Current Implementation Foothold

The codebase now has an initial Phase 1 compatibility split in place:

These are still compatibility seams, not a completed subsystem extraction. Most of them continue to delegate heavily to RuntimeHost, OpenGLComposite, DeckLinkSession, and the existing bridge/pipeline classes. Their purpose is to give later Phase 1 work real code boundaries that can be expanded in parallel:

  • store-facing UI/runtime control calls in OpenGLCompositeRuntimeControls.cpp now route through RuntimeStore
  • runtime startup for path resolution, config load, persistent state load, and shader package scan now initializes through RuntimeStore
  • runtime/UI state JSON composition now lives in RuntimeStore instead of RuntimeHost
  • regular stored layer mutations and stack preset save/load now live in RuntimeStore instead of RuntimeHost public APIs
  • persisted OSC-by-control-key commits now apply through RuntimeStore, while RuntimeHost no longer exposes the old OSC smoothing/apply helper
  • mutation and reload policy now routes through RuntimeCoordinator
  • render-state and shader-build reads in OpenGLComposite.cpp, OpenGLShaderPrograms.cpp, and ShaderBuildQueue.cpp now route through RuntimeSnapshotProvider
  • render-state assembly, cached parameter refresh, and frame-context application now live in RuntimeSnapshotProvider instead of RuntimeHost public APIs
  • service ingress and polling coordination now route through ControlServices
  • timing and status writes now route through HealthTelemetry
  • HealthTelemetry now owns the live signal, video-I/O, and performance snapshots directly instead of RuntimeHost keeping those backing fields
  • render-side frame advancement and render-performance reporting now flow through RuntimeSnapshotProvider and HealthTelemetry instead of directly through RuntimeHost
  • live OSC overlay state and smoothing/commit decisions now live under RenderEngine instead of OpenGLComposite
  • OpenGLComposite now owns a RenderEngine seam for renderer, pipeline, render-pass, and shader-program responsibilities
  • OpenGLComposite now owns a VideoBackend seam for device/session ownership and callback wiring
  • OpenGLVideoIOBridge now acts as an explicit compatibility adapter between VideoBackend and RenderEngine, instead of OpenGLComposite directly owning both sides

That means the next parallel Phase 1 work can focus on moving responsibility behind these seams instead of first inventing them.

Subsystem Responsibilities

RuntimeStore

RuntimeStore owns persisted and operator-authored state.

It is the source of truth for:

  • runtime config loaded from disk
  • persisted layer stack structure
  • persisted parameter values
  • stack preset serialization/deserialization
  • shader/package metadata that must survive across renders

It should not be responsible for:

  • render-thread timing
  • GL resource lifetime
  • live transient overlays
  • hardware callback coordination
  • UI/websocket broadcasting policy

Design rules:

  • disk I/O belongs here or in its dedicated writer helper
  • values here are authoritative for saved state
  • writes may be debounced later, but the data model itself belongs here

RuntimeCoordinator

RuntimeCoordinator is the mutation and policy layer.

It is responsible for:

  • receiving valid mutation requests from controls, services, or automation
  • validating requested changes against shader definitions and config rules
  • resolving how persisted state, committed live state, and transient overlays should interact
  • requesting snapshot publication when state changes affect render
  • requesting persistence when stored state changes

It should not be responsible for:

  • direct disk serialization details
  • direct GL work
  • hardware device lifecycle
  • polling loops

Design rules:

  • all non-render mutations should eventually flow through this layer
  • this layer decides whether a change is persisted, transient, or both
  • this layer owns state policy, not device policy

RuntimeSnapshotProvider

RuntimeSnapshotProvider publishes render-facing snapshots.

It is responsible for:

  • building immutable or near-immutable render snapshots
  • translating runtime state into render-ready structures
  • publishing versioned snapshots
  • serving the render side without large mutable shared locks

It should not be responsible for:

  • deciding whether a mutation is allowed
  • directly applying UI/OSC requests
  • persistence
  • shader compilation orchestration

Design rules:

  • render consumes snapshots, not live mutable store objects
  • snapshots should be cheap to read and explicit about version changes
  • dynamic frame-only values may still be attached later, but the snapshot shape should stay stable

ControlServices

ControlServices is the ingress boundary for non-render control sources.

It is responsible for:

  • OSC receive and route resolution
  • REST/websocket/control UI ingress
  • file-watch or reload request ingress
  • translating external inputs into typed internal actions/events
  • low-cost buffering/coalescing where appropriate

It should not be responsible for:

  • persistence decisions
  • render snapshot building
  • hardware playout policy
  • direct long-lived state ownership beyond ingress-specific queues

Design rules:

  • external inputs enter here and are normalized before they touch core state
  • service-specific timing concerns stay here unless they affect whole-app policy
  • no service should directly mutate render-facing state structures

RenderEngine

RenderEngine is the owner of live rendering behavior.

It is responsible for:

  • sole ownership of GL work in the target architecture
  • shader program lifecycle once compilation outputs are available
  • texture upload scheduling
  • render-pass execution
  • temporal history and shader feedback resources
  • transient render-only overlays
  • preview production as a subordinate output
  • output-frame production for the video backend

It should not be responsible for:

  • persistence
  • user-facing control normalization
  • hardware discovery/configuration
  • high-level runtime mutation policy

Design rules:

  • render consumes snapshots plus render-local transient state
  • render-local state is allowed if it stays render-local
  • preview must be treated as best-effort relative to playout

VideoBackend

VideoBackend owns input/output device lifecycle and playout policy.

It is responsible for:

  • input device configuration and callbacks
  • output device configuration and callbacks
  • frame scheduling policy
  • buffer-pool ownership
  • playout headroom policy
  • input signal status
  • backend state transitions and recovery logic

It should not be responsible for:

  • composing frames
  • owning GL contexts long-term
  • validating shader parameter changes
  • persistence

Design rules:

  • this subsystem is the consumer of rendered output frames, not the owner of frame composition policy
  • it should evolve toward producer/consumer playout rather than callback-driven rendering
  • backend state should be explicit and reportable

HealthTelemetry

HealthTelemetry owns structured operational visibility.

It is responsible for:

  • logging
  • warning/error counters
  • timing traces
  • subsystem health state
  • degraded-mode reporting
  • operator-visible health summaries

It should not be responsible for:

  • deciding core app behavior
  • owning render or backend state
  • persistence policy

Design rules:

  • all major subsystems publish health information here
  • health visibility should outlive UI connection state
  • modal dialogs should not be the main operational surface

Target Dependency Rules

The architecture should follow these rules as closely as possible.

Allowed dependency directions:

  • ControlServices -> RuntimeCoordinator
  • RuntimeCoordinator -> RuntimeStore
  • RuntimeCoordinator -> RuntimeSnapshotProvider
  • RuntimeCoordinator -> HealthTelemetry
  • RuntimeSnapshotProvider -> RuntimeStore
  • RenderEngine -> RuntimeSnapshotProvider
  • RenderEngine -> HealthTelemetry
  • VideoBackend -> RenderEngine
  • VideoBackend -> HealthTelemetry

Conditionally allowed during migration:

  • ControlServices -> HealthTelemetry
  • ControlServices -> RuntimeStore only through temporary compatibility shims

Not allowed in the target design:

  • RenderEngine -> RuntimeStore
  • RenderEngine -> ControlServices
  • VideoBackend -> RuntimeStore
  • ControlServices -> RenderEngine for direct mutation
  • RuntimeStore -> RenderEngine
  • HealthTelemetry -> any subsystem for control flow

The key principle is:

  • store owns durable data
  • coordinator owns mutation policy
  • snapshot provider owns render-facing state publication
  • render owns live GPU execution
  • backend owns device timing
  • telemetry observes all of them

State Ownership Model

The app has several different kinds of state, and Phase 1 should name them explicitly.

Persisted State

Owned by RuntimeStore.

Examples:

  • layer stack structure
  • selected shader ids
  • saved parameter values
  • runtime host config
  • stack presets

Committed Live State

Owned logically by RuntimeCoordinator, stored in the store or a live-state companion depending on future implementation.

Examples:

  • current operator-selected parameter values
  • current bypass state
  • current selected shader for each layer

This is state that should normally survive until explicitly changed and can be persisted if policy says so.

Transient Live Overlay State

Owned by the subsystem that consumes it, not by the persisted store.

Examples:

  • active OSC overlay targets while automation is flowing
  • shader feedback buffers
  • temporal history textures
  • queued input frames
  • in-flight preview state
  • playout queue state

This is where many current issues come from. The design rule is:

  • transient state may influence output
  • transient state should not masquerade as persisted truth

Health and Timing State

Owned by HealthTelemetry.

Examples:

  • frame pacing stats
  • render timing
  • late/dropped frame counters
  • queue depths
  • warning states

Target Runtime Flow

This section describes the intended long-term flow once later phases are in place.

Control Mutation Flow

  1. OSC/UI/file-watch input enters ControlServices.
  2. ControlServices normalizes it into an internal action or event.
  3. RuntimeCoordinator validates and classifies the action.
  4. If the action changes durable state, RuntimeStore is updated.
  5. If the action changes render-facing state, RuntimeSnapshotProvider publishes a new snapshot.
  6. If the action requires persistence, a persistence request is queued.
  7. Health/timing observations are emitted separately.

Render Flow

  1. RenderEngine consumes the latest published snapshot.
  2. RenderEngine combines that snapshot with render-local transient state.
  3. RenderEngine performs uploads, pass execution, feedback/history maintenance, and output production.
  4. RenderEngine produces:
    • preview-ready output
    • video-backend-ready output frames
    • render timing and warning signals

Video Output Flow

Target long-term flow:

  1. RenderEngine produces completed output frames ahead of demand.
  2. VideoBackend consumes those frames from a bounded queue or ring buffer.
  3. Device callbacks only drive dequeue/schedule/accounting behavior.
  4. HealthTelemetry records queue depth, lateness, underruns, and recovery events.

Reload / Shader Rebuild Flow

  1. file-watch or manual reload enters through ControlServices
  2. RuntimeCoordinator classifies the reload request
  3. RuntimeStore and shader/package metadata are refreshed if needed
  4. RuntimeSnapshotProvider republishes affected snapshot state
  5. RenderEngine rebuilds render-local resources from the new snapshot/build outputs

The important boundary here is that reload is not "a render concern that also touches persistence." It is a coordinated runtime concern with a render-local execution phase.

Suggested Public Interfaces

These are not final class signatures, but they show the shape the architecture should move toward.

RuntimeStore

Core responsibilities:

  • LoadConfig()
  • LoadPersistentState()
  • SavePersistentStateSnapshot(...)
  • GetStoredLayerStack()
  • SetStoredLayerStack(...)
  • GetStackPresetNames()
  • SaveStackPreset(...)
  • LoadStackPreset(...)

RuntimeCoordinator

Core responsibilities:

  • ApplyControlMutation(...)
  • ApplyAutomationTarget(...)
  • ResetLayer(...)
  • RequestReload(...)
  • CommitOverlayState(...)
  • PublishSnapshotIfNeeded()
  • RequestPersistenceIfNeeded()

RuntimeSnapshotProvider

Core responsibilities:

  • BuildSnapshot(...)
  • GetLatestSnapshot()
  • GetSnapshotVersion()
  • PublishSnapshot(...)

ControlServices

Core responsibilities:

  • StartOscIngress(...)
  • StartWebControlIngress(...)
  • StartFileWatchIngress(...)
  • EnqueueControlAction(...)
  • DrainServiceEvents(...)

RenderEngine

Core responsibilities:

  • StartRenderLoop(...)
  • ConsumeSnapshot(...)
  • EnqueueInputFrame(...)
  • ProduceOutputFrame(...)
  • ResetRenderLocalState(...)
  • HandleRebuildOutputs(...)

VideoBackend

Core responsibilities:

  • ConfigureInput(...)
  • ConfigureOutput(...)
  • StartPlayout(...)
  • StopPlayout(...)
  • ConsumeRenderedFrame(...)
  • ReportBackendState(...)

HealthTelemetry

Core responsibilities:

  • RecordTimingSample(...)
  • RecordCounterDelta(...)
  • RaiseWarning(...)
  • ClearWarning(...)
  • AppendLogEntry(...)
  • BuildHealthSnapshot()

Mapping From Current Code to Target Subsystems

This is not a one-to-one rename plan. It is a responsibility migration map.

Current RuntimeHost

Should eventually split across:

  • RuntimeStore
  • RuntimeCoordinator
  • RuntimeSnapshotProvider
  • parts of HealthTelemetry

Likely examples:

  • config loading/saving -> RuntimeStore
  • layer stack mutation validation -> RuntimeCoordinator
  • render state building/versioning -> RuntimeSnapshotProvider
  • timing/status setters -> HealthTelemetry

Current RuntimeServices

Should eventually become mostly:

  • ControlServices
  • a small service-hosting shell

Likely examples:

  • OSC ingress/coalescing -> ControlServices
  • file-watch ingress -> ControlServices
  • deferred service coordination now done by polling -> split between ControlServices and event-driven coordinator calls

Current OpenGLComposite

Should eventually split across:

  • application bootstrap shell
  • RenderEngine
  • orchestration glue that wires subsystems together

Likely examples:

  • render-pass facing code -> RenderEngine
  • app/service/backend bootstrap -> composition root
  • runtime mutation API surface -> coordinator-facing adapter, not render owner

Current OpenGLVideoIOBridge and DeckLinkSession

Should eventually align more clearly under:

  • VideoBackend
  • RenderEngine

Likely examples:

  • device callback and scheduling policy -> VideoBackend
  • GL upload/readback/render work -> RenderEngine

Architectural Guardrails

As later phases begin, these rules should be treated as guardrails.

1. No new cross-cutting state should be added to RuntimeHost

If a new feature needs durable state, place it conceptually under RuntimeStore. If it needs render-local transient state, place it conceptually under RenderEngine. If it needs timing/status counters, place it conceptually under HealthTelemetry.

2. Render-local state should stay render-local

Do not push shader feedback, temporal history, preview caches, or playout queues back into the store just to make them easy to reach from other systems.

3. Device callbacks should not become a dumping ground for app work

Callback threads should converge toward signaling and queue management, not core rendering, persistence, or control mutation.

4. Persistence should not be used as a control synchronization mechanism

Saving state is not how subsystems discover changes. Published snapshots and explicit events should handle that.

5. Health reporting should observe, not coordinate

Telemetry systems may record warnings and degraded states, but they should not become the hidden control plane for the app.

Migration Strategy

Phase 1 is a design phase, but it should support incremental migration.

Recommended order after this document:

  1. Introduce names and interfaces before moving logic.
  2. Create compatibility adapters around RuntimeHost rather than forcing a flag day.
  3. Move read-only render snapshot publication out before moving all mutation logic.
  4. Move service ingress boundaries out before removing the old polling shell.
  5. Isolate timing/health setters from the core store as early as practical.

This keeps progress measurable while reducing rewrite risk.

Suggested Deliverables for Completing Phase 1

Phase 1 can reasonably be considered complete once the project has:

  • this subsystem-boundary design document
  • agreed subsystem names and responsibilities
  • agreed allowed dependency directions
  • explicit state categories: persisted, committed live, transient overlay, health/timing
  • a current-to-target responsibility map for RuntimeHost, RuntimeServices, OpenGLComposite, and backend/render bridge code
  • a decision that later phases will build against this target rather than inventing new boundaries ad hoc

Open Questions For Later Phases

These do not block Phase 1, but they should remain visible.

  • Should shader package registry ownership live entirely in RuntimeStore, or should compile-ready derived registry data move into the snapshot provider?
  • Should committed live state be stored directly in RuntimeStore, or split into store plus live-session state owned by the coordinator?
  • How much of shader build orchestration belongs to RenderEngine versus a separate build service?
  • At what phase should preview become fully decoupled from playout cadence?
  • Should persistence become its own PersistenceWriter subsystem in Phase 6, or remain an implementation detail under RuntimeStore?

Short Version

Phase 1 should establish one simple rule for the rest of the refactor:

  • durable state lives in the store
  • mutation policy lives in the coordinator
  • render-facing state is published as snapshots
  • external control sources enter through services
  • GL work belongs to render
  • hardware pacing belongs to the backend
  • health visibility belongs to telemetry

If later phases keep to that rule, the architecture will become materially more resilient without needing another round of foundational boundary changes.