Architecture Resilience Review

This note summarizes the main architectural improvements that would make the app more resilient during live use, especially around timing isolation, failure isolation, and recoverability.

Phase checklist:

  • Define subsystem boundaries and target architecture
  • Introduce an internal event model
  • Split RuntimeHost
  • Make the render thread the sole GL owner
  • Refactor live state layering into an explicit composition model
  • Move persistence onto a background snapshot writer
  • Make DeckLink/backend lifecycle explicit with a state machine
  • Add structured health, telemetry, and operational reporting

Timing Review

The recent OSC work removed several control-path stalls, but the app still has a few deeper timing characteristics that matter for live resilience:

  • output playout is still effectively render-on-demand from the DeckLink completion callback
  • output buffering and preroll are now larger, but the buffering model is still static and only loosely related to actual render cost
  • GPU readback is partly asynchronous, but the fallback path still returns to synchronous readback on any miss
  • preview presentation is still tied to the playout render path
  • background service timing still relies on coarse polling sleeps

Those points are important because they affect not just average performance, but how the app behaves under brief spikes, device jitter, or load bursts.

Key Findings

1. RuntimeHost is carrying too many responsibilities

RuntimeHost currently acts as:

  • config store
  • persistent state store
  • live parameter/state authority
  • shader package registry owner
  • status/telemetry sink
  • control mutation entrypoint

That makes it a single contention and failure domain. It is also why OSC and render timing issues repeatedly surfaced around shared state access.

Relevant code:

Recommended direction:

  • split persisted config/state from live render-facing state
  • separate status/telemetry updates from control mutation paths
  • make render consume snapshots rather than sharing a large mutable authority object
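
As a sketch of the snapshot direction above: the coordinator publishes an immutable snapshot, and the render path grabs the latest one once per frame, so no lock is held while rendering. Names such as RenderSnapshot and SnapshotProvider are illustrative, not taken from the codebase:

```cpp
#include <atomic>
#include <memory>
#include <vector>

// Hypothetical immutable render-facing snapshot; the fields are illustrative.
struct RenderSnapshot {
    std::vector<float> parameterValues;
    int layerCount = 0;
};

class SnapshotProvider {
public:
    // Coordinator side: build a complete snapshot, then publish it in one swap.
    void Publish(std::shared_ptr<const RenderSnapshot> next) {
        std::atomic_store(&current_, std::move(next));
    }

    // Render side: take the latest snapshot once per frame; nothing is locked
    // while the frame is rendered. (C++20 offers std::atomic<std::shared_ptr<T>>
    // as a non-deprecated equivalent of these free functions.)
    std::shared_ptr<const RenderSnapshot> Acquire() const {
        return std::atomic_load(&current_);
    }

private:
    std::shared_ptr<const RenderSnapshot> current_;
};
```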

2. OpenGL ownership is still centralized behind one shared lock

Even after recent timing improvements, preview, input upload, and playout rendering still rely on one shared GL context protected by one CRITICAL_SECTION.

Relevant code:

This is still a central choke point and limits timing isolation.

Recommended direction:

  • use one dedicated render thread as the sole GL owner
  • have input/output/control threads queue work instead of performing GL work directly
  • remove ad hoc GL use from callback threads
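
One way to enforce single-thread GL ownership is a command queue that any thread may post into but only the render thread drains. A minimal sketch with hypothetical names:

```cpp
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical command queue: any thread may enqueue GL-touching work as a
// closure, but only the render thread, as sole context owner, executes it.
class RenderCommandQueue {
public:
    void Enqueue(std::function<void()> command) {
        std::lock_guard<std::mutex> lock(mutex_);
        commands_.push(std::move(command));
    }

    // Called once per frame from the render thread, before rendering.
    void DrainOnRenderThread() {
        std::queue<std::function<void()>> pending;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending.swap(commands_);  // take the batch, release the lock fast
        }
        while (!pending.empty()) {
            pending.front()();  // the only place GL calls actually run
            pending.pop();
        }
    }

private:
    std::mutex mutex_;
    std::queue<std::function<void()>> commands_;
};
```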

3. Control flow is spread across polling and shared-memory patterns

RuntimeServices currently mixes:

  • file polling
  • deferred OSC commit handling
  • control service orchestration

OSC ingest, overlay application, and host sync are distributed across several components.

Relevant code:

Recommended direction:

  • introduce a small internal event pipeline or message bus
  • use typed events for OSC, reloads, persistence requests, and status changes
  • make timing ownership explicit per subsystem

Example event types:

  • OscParameterTargeted
  • RenderOverlaySettled
  • PersistStateRequested
  • ShaderReloadRequested
  • DeckLinkStatusChanged
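
A minimal sketch of what those typed events could look like, using std::variant for a closed event set; the payload fields are assumptions, not taken from the codebase:

```cpp
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical typed events mirroring the names above; payloads are illustrative.
struct OscParameterTargeted { std::string address; float value; };
struct RenderOverlaySettled { uint64_t frameIndex; };
struct PersistStateRequested {};
struct ShaderReloadRequested { std::string packageName; };
struct DeckLinkStatusChanged { bool outputRunning; };

using RuntimeEvent = std::variant<
    OscParameterTargeted,
    RenderOverlaySettled,
    PersistStateRequested,
    ShaderReloadRequested,
    DeckLinkStatusChanged>;

// Subscribers dispatch on the concrete type with std::visit.
inline void HandleEvent(const RuntimeEvent& event) {
    std::visit([](const auto& e) { (void)e; /* route to the owning subsystem */ },
               event);
}
```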

4. Error handling is still heavily UI-coupled

Failures are often surfaced via MessageBoxA, while background services mainly log with OutputDebugStringA.

Relevant code:

Neither pattern suits a live system: modal dialogs block the operator mid-show, and debug-channel logging is effectively invisible during normal operation.

Recommended direction:

  • introduce structured in-app error reporting
  • define severity levels and counters
  • prefer degraded runtime states over modal failure handling where possible
  • add a rolling log file for operational troubleshooting
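
A minimal sketch of the reporting direction, assuming a hypothetical HealthReporter and log path; a real implementation would buffer writes, rotate files, and hand disk I/O to a background thread:

```cpp
#include <array>
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical severity model and sink; names and the log path are illustrative.
enum class Severity { Info, Warning, Error, Fatal };

class HealthReporter {
public:
    // Count the event and append to a rolling log instead of raising a dialog.
    void Report(Severity severity, const std::string& subsystem,
                const std::string& message) {
        counters_[static_cast<size_t>(severity)]
            .fetch_add(1, std::memory_order_relaxed);
        if (std::FILE* log = std::fopen("runtime_health.log", "a")) {
            std::fprintf(log, "[sev=%d] %s: %s\n", static_cast<int>(severity),
                         subsystem.c_str(), message.c_str());
            std::fclose(log);
        }
    }

    uint64_t Count(Severity severity) const {
        return counters_[static_cast<size_t>(severity)]
            .load(std::memory_order_relaxed);
    }

private:
    std::array<std::atomic<uint64_t>, 4> counters_{};
};
```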

5. Live OSC overlay and persisted state are still separate concepts without a formal model

The current design works better now, but it still relies on hand-managed reconciliation between:

  • persisted parameter state in RuntimeHost
  • transient OSC overlay state in OpenGLComposite

Relevant code:

Recommended direction:

Formalize three layers of state:

  • base persisted state
  • operator/UI committed state
  • transient live automation overlay

Then render can always resolve:

  • final = base + committed + transient

That avoids special-case sync behavior becoming scattered across the code.
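
A sketch of that resolution rule, reading "+" as per-parameter override (later layers win) rather than arithmetic addition; the container and names are illustrative:

```cpp
#include <string>
#include <unordered_map>

// Hypothetical layered parameter store for the three layers above.
struct ParameterLayers {
    std::unordered_map<std::string, float> base;       // persisted state
    std::unordered_map<std::string, float> committed;  // operator/UI commits
    std::unordered_map<std::string, float> transient;  // live automation overlay
};

inline float ResolveFinal(const ParameterLayers& layers,
                          const std::string& name, float fallback) {
    // final = base, overridden by committed, overridden by transient
    if (auto it = layers.transient.find(name); it != layers.transient.end())
        return it->second;
    if (auto it = layers.committed.find(name); it != layers.committed.end())
        return it->second;
    if (auto it = layers.base.find(name); it != layers.base.end())
        return it->second;
    return fallback;
}
```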

6. DeckLink session lifecycle is not an explicit state machine

DeckLinkSession is driven by a sequence of imperative calls, but startup, preroll, running, degraded, and stopped are not represented as an explicit state machine.

Relevant code:

Recommended direction:

  • introduce explicit session states
  • define allowed transitions
  • centralize recovery behavior
  • make shutdown ordering and degraded-mode behavior more predictable

Timing-specific additions:

  • separate "device callback received" from "render the next output frame" so output cadence is not driven directly by the completion callback thread
  • make playout headroom configurable and adaptive instead of using a fixed compile-time preroll count
  • track an explicit backend health state such as running-steady, catching-up, late, and dropping

Relevant timing code:

Why this matters:

  • PlayoutFrameCompleted() currently begins an output frame, takes the shared GL path, renders, reads back, and schedules the next frame in one callback-driven flow.
  • VideoPlayoutScheduler::AccountForCompletionResult() currently reacts to both late and dropped frames by blindly advancing the schedule index by 2, which is simple but not especially robust.
  • kPrerollFrameCount is now 12, but DeckLinkSession::ConfigureOutput() still creates a fixed pool of 10 mutable output frames. That mismatch suggests the buffering model is not being sized from one coherent source of truth.

Recommended direction:

  • move playout to a producer/consumer model where a render worker fills output buffers ahead of the DeckLink callback
  • define buffer-pool sizing from one policy object, for example: preroll depth, minimum spare buffers, and allowed catch-up depth
  • replace fixed "skip two frames" recovery with measured lag accounting based on actual scheduled-versus-completed position
  • expose playout latency as a runtime setting or policy, rather than burying it in a constant
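
A sketch of the single policy object suggested above, using the document's current numbers as illustrative defaults:

```cpp
// Hypothetical single source of truth for playout buffering; values illustrative.
struct PlayoutPolicy {
    int prerollDepth    = 12;  // frames scheduled before playback starts
    int minSpareBuffers = 2;   // headroom the pool must keep beyond preroll
    int maxCatchUpDepth = 4;   // furthest the scheduler may jump after late/dropped frames

    // Pool size is derived, so preroll depth and pool size can no longer drift
    // apart the way kPrerollFrameCount (12) and the fixed pool of 10 have.
    int OutputFramePoolSize() const { return prerollDepth + minSpareBuffers; }
};
```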

6a. The current playout timing model is still callback-coupled

The app now has more headroom, but the next output frame is still produced directly in the scheduled-frame completion callback path.

Relevant code:

That means the completion callback is currently responsible for:

  • frame pacing accounting
  • acquiring the next output buffer
  • taking the GL critical section
  • rendering the composite
  • performing output readback
  • scheduling the next frame

This works when the app is comfortably within budget, but it makes deadline misses much harder to absorb gracefully.

Recommended direction:

  • make the DeckLink callback a lightweight notifier
  • have a dedicated playout worker or render worker keep an ahead-of-time queue of ready output frames
  • treat callback time as control-plane time, not render time

6b. A producer/consumer playout model would be a better long-term fit

The stronger architecture for this app is:

  • a render scheduler or dedicated render thread runs at the configured video cadence
  • rendering produces completed output frames ahead of need
  • those frames are placed into a bounded queue or ring buffer
  • the DeckLink side consumes already-prepared frames when callbacks indicate they are needed

That is a better fit than callback-driven rendering because it separates:

  • render timing
  • GL ownership
  • output-device timing
  • latency policy

In that model:

  • render is the producer
  • DeckLink is the timing consumer
  • the queue between them becomes the main place to manage latency versus resilience

Why this is preferable:

  • brief callback jitter is less likely to become a visible dropped frame
  • render spikes can be absorbed by queue headroom instead of immediately missing output deadlines
  • latency becomes an explicit policy choice rather than an incidental side effect of callback timing
  • queue depth, underruns, stale-frame reuse, and catch-up behavior become measurable and tunable

Recommended direction:

  • move toward a bounded producer/consumer playout queue
  • make queue depth and target headroom runtime policy, not compile-time constants
  • define explicit underrun behavior, for example:
    • reuse newest completed frame
    • reuse last scheduled frame
    • output black or degraded frame
  • keep DeckLink callbacks limited to dequeue/schedule/accounting work wherever possible
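
A minimal sketch of the bounded queue at the center of this model; OutputFrame stands in for whatever buffer type the app actually uses:

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>

// Hypothetical bounded frame queue between the render producer and the
// DeckLink consumer. Queue depth is the latency-versus-resilience knob.
template <typename OutputFrame>
class PlayoutQueue {
public:
    explicit PlayoutQueue(size_t depth) : depth_(depth) {}

    // Render worker: runs ahead of need and blocks only when the queue is
    // already full, which is exactly the headroom limit.
    void Push(OutputFrame frame) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return frames_.size() < depth_; });
        frames_.push_back(std::move(frame));
    }

    // DeckLink callback: never blocks. Returns nothing on underrun, at which
    // point the caller applies its explicit stale-frame policy.
    std::optional<OutputFrame> TryPop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.empty()) return std::nullopt;
        OutputFrame frame = std::move(frames_.front());
        frames_.pop_front();
        notFull_.notify_one();
        return frame;
    }

private:
    size_t depth_;
    std::mutex mutex_;
    std::condition_variable notFull_;
    std::deque<OutputFrame> frames_;
};
```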

7. Persistence should be more asynchronous and debounced

SavePersistentState() is still called directly from many update paths.

Relevant code:

Recent OSC work already reduced this problem for live automation, but the broader architecture would still benefit from:

  • a debounced persistence queue
  • atomic write-behind snapshots
  • clear separation between state mutation and disk flush

This improves both resilience and timing safety.
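
A sketch of a debounced write-behind pattern, with the debounce window and names chosen for illustration:

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical debounced snapshot writer: mutation paths call RequestSave(),
// and one background thread coalesces bursts of requests into a single
// delayed, atomic write. The 500 ms window is an illustrative choice.
class PersistenceWriter {
public:
    PersistenceWriter() : worker_([this] { Run(); }) {}

    ~PersistenceWriter() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            running_ = false;
        }
        wake_.notify_one();
        worker_.join();
    }

    void RequestSave() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            dirty_ = true;
        }
        wake_.notify_one();
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (running_) {
            wake_.wait(lock, [this] { return dirty_ || !running_; });
            if (!running_) break;
            // Debounce window: absorb further requests before touching disk.
            wake_.wait_for(lock, std::chrono::milliseconds(500));
            dirty_ = false;
            WriteSnapshotAtomically();  // e.g. write a temp file, then rename
        }
    }

    void WriteSnapshotAtomically() { /* serialize the stored state here */ }

    std::mutex mutex_;
    std::condition_variable wake_;
    bool running_ = true;
    bool dirty_ = false;
    std::thread worker_;
};
```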

8. Telemetry is useful, but still too coarse

The app already records render timing and playout pacing, which is a good foundation.

Relevant code:

Recommended direction:

Add lightweight tracing for:

  • input callback latency
  • input upload skip count
  • GL lock wait time
  • render queue depth
  • render time
  • pass build/compile latency
  • readback time
  • output scheduling lag
  • output queue depth
  • preroll depth versus spare-buffer depth
  • preview present cost and skipped-preview count
  • control queue depth
  • RuntimeHost lock contention

That would make future tuning and failure diagnosis much easier.
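
A sketch of how a few of these traces could be kept as lock-free counters; the field set and the instrumentation comment are illustrative:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical counter block covering a few of the traces above. Writers bump
// these from their own threads; a status page or log task samples periodically.
struct RuntimeTelemetry {
    std::atomic<uint64_t> inputUploadSkips{0};
    std::atomic<uint64_t> glLockWaitMicros{0};
    std::atomic<uint64_t> renderMicros{0};
    std::atomic<uint64_t> readbackMicros{0};
    std::atomic<uint64_t> outputSchedulingLagFrames{0};
    std::atomic<uint64_t> previewSkips{0};

    // Example instrumentation site: time a lock acquisition, then record it.
    //   auto start = std::chrono::steady_clock::now();
    //   EnterCriticalSection(&glLock);
    //   telemetry.glLockWaitMicros.fetch_add(
    //       std::chrono::duration_cast<std::chrono::microseconds>(
    //           std::chrono::steady_clock::now() - start).count(),
    //       std::memory_order_relaxed);
};
```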

Timing-specific observations from the current code:

  • render time is captured as one total number in OpenGLRenderPipeline.cpp, but not split into draw, pack, readback wait, readback copy, or preview present
  • frame pacing stats are recorded in OpenGLVideoIOBridge.cpp, but there is no explicit visibility into how much queued playout headroom remains
  • input uploads are intentionally skipped when the GL bridge is busy in OpenGLVideoIOBridge.cpp, but the app does not currently surface how often that is happening

8a. Preview and playout are still too close together

The desktop preview is rate-limited, but still presented from inside the render pipeline path.

Relevant code:

This means preview presentation can still consume time on the same path that is trying to meet output deadlines.

Recommended direction:

  • treat preview as best-effort and entirely subordinate to playout
  • move preview present to a separate presentation schedule fed from the latest completed render
  • record preview skips and preview present cost independently from playout timing

8b. Readback is improved, but still not fully deadline-safe

The async readback path is a good step, but the miss path still falls back to synchronous glReadPixels() and then flushes the async pipeline.

Relevant code:

That means a single late GPU fence can push the app back onto the most timing-sensitive path exactly when it is already under pressure.

Recommended direction:

  • increase readback instrumentation before changing policy again
  • consider deeper readback buffering or a true stale-frame reuse policy instead of immediate synchronous fallback
  • separate "freshest possible frame" policy from "never miss output deadline" policy and make that tradeoff explicit

8c. Background control and file-watch timing are still coarse

RuntimeServices::PollLoop() currently uses a 25 x Sleep(10) loop, which gives it a coarse ~250 ms cadence for file-watch polling and deferred OSC commit work.

Relevant code:

That is acceptable for non-critical background work, but it is still too blunt to be the long-term timing model for coordination-heavy runtime services.

Recommended direction:

  • replace coarse sleep polling with waitable events or condition-variable driven wakeups where practical
  • isolate truly background work from latency-sensitive control reconciliation
  • add separate metrics for queue age, not just queue depth
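
A portable sketch of the event-driven wakeup; on Win32 a native waitable event (CreateEvent plus WaitForSingleObject) would serve the same role. The timeout path preserves periodic file-watch polling while queued work is serviced immediately:

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical wakeup primitive for RuntimeServices; names are illustrative.
class ServiceWakeup {
public:
    void Signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_one();
    }

    // Returns true if woken by Signal(), false on the periodic timeout.
    bool WaitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        const bool woken = cv_.wait_for(lock, timeout, [this] { return signaled_; });
        signaled_ = false;
        return woken;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};
```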

Phased Roadmap

This roadmap is ordered by architectural dependency rather than by “quick wins.” The goal is to move the app toward clearer ownership boundaries and safer live behavior without doing later work on top of foundations that are likely to change again.

Phase 1. Define subsystem boundaries and target architecture

Before changing major internals, formalize the target responsibilities for each major part of the app.

Target split:

  • RuntimeStore
    • persisted config
    • persisted layer stack
    • preset persistence
  • RuntimeSnapshot
    • render-facing immutable or near-immutable snapshots
    • parameter values prepared for the render path
  • ControlServices
    • OSC ingress
    • web control ingress
    • reload/file-watch requests
    • commit/persist requests
  • RenderEngine
    • sole owner of live GL rendering
    • sole consumer of render snapshots plus transient overlays
  • VideoBackend
    • DeckLink input/output lifecycle
    • pacing and scheduling
  • Health/Telemetry
    • logging
    • counters
    • timing traces
    • degraded-state reporting

Why this phase comes first:

  • it prevents later refactors from reintroducing responsibility overlap
  • it gives names to the seams the later phases will build around
  • it reduces the risk of replacing one monolith with several poorly-defined ones

Suggested deliverables:

  • a short architecture diagram
  • a responsibility table for each subsystem
  • a list of allowed dependencies between subsystems
  • a dedicated Phase 1 design note:

Phase 2. Introduce an internal event model

Once subsystem boundaries are defined, introduce a typed event pipeline between them. This should happen before large state splits so the app has a stable coordination model.

Example event families:

  • control events
    • OscParameterTargeted
    • UiParameterCommitted
    • TriggerFired
  • runtime events
    • ShaderReloadRequested
    • PackagesRescanned
    • PersistStateRequested
  • render events
    • OverlayApplied
    • OverlaySettled
    • SnapshotPublished
  • backend events
    • InputSignalChanged
    • OutputLateFrameDetected
    • OutputDroppedFrameDetected
  • health events
    • SubsystemWarningRaised
    • SubsystemRecovered

Why this phase comes second:

  • it provides a migration path away from direct cross-calls
  • it makes ownership explicit before data structures are split apart
  • it lets you move one subsystem at a time without losing coordination

Suggested outcome:

  • the app stops relying on “shared object plus mutex plus polling” as the default coordination pattern

Phase 3. Split RuntimeHost into persistent state, render snapshot state, and service-facing coordination

After the event model exists, break apart RuntimeHost.

Recommended split:

  • RuntimeStore
    • owns config and saved layer data
    • handles serialization/deserialization
    • does not sit on the live render path
  • RuntimeCoordinator
    • resolves control actions
    • validates mutations
    • publishes new snapshots
    • bridges events between services and render
  • RuntimeSnapshotProvider
    • publishes immutable render snapshots
    • avoids large shared mutable structures on the render path

Why this phase comes before render-thread isolation:

  • render isolation is easier when the render thread consumes clean snapshots instead of a large mutable host object
  • otherwise the GL refactor still drags along too much shared state complexity

Primary design rule:

  • render should read snapshots
  • persistence should write stored state
  • services should request mutations through the coordinator

Phase 4. Make the render thread the sole GL owner

With state and coordination cleaner, move to a dedicated render-thread model.

Target behavior:

  • one thread owns the GL context
  • input callbacks never perform GL work directly
  • output callbacks never perform GL work directly
  • preview presentation, texture upload, render passes, readback, and output pack work are all issued by the render thread

Other threads should only:

  • enqueue new video frames
  • enqueue control updates
  • enqueue backend events
  • consume produced output buffers

Why this phase comes here:

  • it is much safer once state access and control coordination are no longer centered on RuntimeHost
  • it avoids coupling the render-thread refactor to storage and service refactors at the same time

Expected benefits:

  • less cross-thread GL contention
  • easier timing reasoning
  • much lower risk of callback-driven stalls
  • a clearer foundation for future GPU pipeline work

Phase 5. Refactor live state layering into an explicit composition model

Once rendering and snapshots are isolated, formalize how final parameter values are derived.

Recommended layers:

  • base persisted state
  • operator-committed live state
  • transient automation overlay

Render should derive final values from a clear composition rule such as:

  • final = base + committed + transient

Why this phase follows render isolation:

  • once render owns snapshot consumption, it becomes much easier to cleanly evaluate layered state without touching persistence or control services
  • it turns the current OSC overlay behavior into a first-class model instead of an implementation detail

Expected benefits:

  • fewer one-off sync rules
  • clearer behavior for OSC, UI changes, and automation
  • easier future expansion to presets, cues, or timed transitions

Phase 6. Move persistence onto a background snapshot writer

After the state model is explicit, persistence should become a background concern rather than a synchronous side effect of mutations.

Target behavior:

  • mutations update authoritative in-memory stored state
  • persistence requests are queued
  • disk writes are debounced and coalesced
  • writes are atomic and versioned where practical

Why this phase comes after state splitting:

  • otherwise persistence logic will need to be rewritten twice
  • it should operate on the new RuntimeStore model, not on the current mixed-responsibility object

Expected benefits:

  • less timing interference
  • better corruption resistance
  • cleaner restart/recovery semantics

Phase 7. Make DeckLink/backend lifecycle explicit with a state machine

Once the render and state layers are cleaner, refactor the video backend into an explicit lifecycle model.

Suggested states:

  • uninitialized
  • devices-discovered
  • configured
  • prerolling
  • running
  • degraded
  • stopping
  • stopped
  • failed
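
A sketch of how the allowed transitions could be centralized; the edge set below is illustrative and would need checking against the session's real behavior:

```cpp
// Hypothetical lifecycle states and transition table for the backend.
enum class BackendState {
    Uninitialized, DevicesDiscovered, Configured, Prerolling,
    Running, Degraded, Stopping, Stopped, Failed
};

inline bool IsAllowedTransition(BackendState from, BackendState to) {
    switch (from) {
        case BackendState::Uninitialized:     return to == BackendState::DevicesDiscovered
                                                  || to == BackendState::Failed;
        case BackendState::DevicesDiscovered: return to == BackendState::Configured
                                                  || to == BackendState::Failed;
        case BackendState::Configured:        return to == BackendState::Prerolling
                                                  || to == BackendState::Failed;
        case BackendState::Prerolling:        return to == BackendState::Running
                                                  || to == BackendState::Failed;
        case BackendState::Running:           return to == BackendState::Degraded
                                                  || to == BackendState::Stopping;
        case BackendState::Degraded:          return to == BackendState::Running
                                                  || to == BackendState::Stopping
                                                  || to == BackendState::Failed;
        case BackendState::Stopping:          return to == BackendState::Stopped;
        case BackendState::Stopped:           return false;  // terminal
        case BackendState::Failed:            return to == BackendState::Uninitialized;
    }
    return false;
}
```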

Why this phase belongs here:

  • the backend should integrate with the new event model
  • degraded/recovery behavior will be easier once rendering and state coordination are already more deterministic

Expected benefits:

  • safer startup/shutdown ordering
  • clearer recovery behavior
  • easier handling of missing input, dropped frames, or reconfiguration
  • a clearer place to own playout headroom policy, output queue sizing, and late-frame recovery behavior

Phase 8. Add structured health, telemetry, and operational reporting

This phase should happen after the main ownership changes so the telemetry can reflect the final architecture instead of a transient one.

Recommended coverage:

  • render queue depth
  • GL lock wait time, if any shared lock remains
  • input callback latency
  • input upload skip count
  • output scheduling lag
  • output queue depth and spare-buffer depth
  • readback timing
  • readback fence wait timing
  • synchronous readback fallback count
  • preview present timing and skipped-preview count
  • snapshot publish frequency
  • persistence queue depth
  • event queue depth
  • backend state transitions
  • warning/error counters per subsystem

Also replace modal-only error handling with:

  • structured in-app health state
  • severity-based logging
  • rolling log files
  • operator-visible degraded-state messages

Why this phase comes last:

  • it should instrument the architecture you intend to keep
  • otherwise instrumentation work gets invalidated by the refactor

If this is approached as a serious architecture program rather than opportunistic cleanup, the recommended order is:

  1. Define subsystem boundaries and target architecture.
  2. Introduce the internal event model.
  3. Split RuntimeHost.
  4. Make the render thread the sole GL owner.
  5. Formalize live state layering and composition.
  6. Move persistence to a background snapshot writer.
  7. Refactor DeckLink/backend lifecycle into an explicit state machine.
  8. Add structured telemetry, health reporting, and operational diagnostics.

Why This Order Makes Sense

This order tries to avoid doing foundational work twice.

  • The event model comes before major subsystem extraction so coordination patterns stabilize early.
  • RuntimeHost is split before render isolation so the render thread does not inherit the current monolithic state model.
  • Live state layering is formalized only after render ownership is clearer.
  • Persistence is moved later so it can target the final state model rather than the current one.
  • Telemetry is intentionally late so it instruments the architecture that survives the refactor.

Short Version

The app is in a much better place than it was before the OSC timing work, but the main remaining architectural risk is still shared ownership. Too many responsibilities converge on RuntimeHost and the shared GL path. The most sensible path forward is:

  1. define boundaries
  2. establish an event model
  3. split state ownership
  4. isolate rendering
  5. formalize layered live state
  6. background persistence
  7. explicit backend lifecycle
  8. health and telemetry

That sequence gives each later phase a cleaner foundation than the current app has today.