video-shader-toys/docs/CURRENT_SYSTEM_ARCHITECTURE.md
Aiden 2531d871e8
Doc cleanup
2026-05-12 01:37:20 +10:00

Current System Architecture

This document describes how the application currently works.

It replaces the phase-by-phase design trail as the best entry point for understanding the repo. The older phase documents remain useful history, but they mix implementation notes, experiments, and target designs. This document is organized by current runtime behavior and subsystem ownership instead.

Application Shape

The app is a live OpenGL compositor with DeckLink input/output, runtime control services, persistent layer-stack state, live state overlays, health telemetry, and a small internal event model.

At runtime the major subsystems are:

  • OpenGLComposite
  • RuntimeStore
  • RuntimeCoordinator
  • RuntimeSnapshotProvider
  • RuntimeServices
  • RuntimeUpdateController
  • RenderEngine
  • VideoBackend
  • DeckLinkSession
  • HealthTelemetry
  • RuntimeEventDispatcher
  • PersistenceWriter

The key architectural rule is:

  • runtime/control subsystems decide what state should exist
  • render subsystems decide how to draw that state
  • video subsystems decide how frames move to and from hardware
  • telemetry observes behavior without becoming a control plane

Process Startup

The Win32 app creates the window, chooses a pixel format, creates an OpenGL context, initializes COM, and constructs OpenGLComposite.

OpenGLComposite owns the high-level assembly of the runtime:

  • runtime store
  • runtime coordinator
  • runtime services
  • runtime update controller
  • render engine
  • video backend

Startup proceeds broadly as:

  1. COM and OpenGL are initialized by the Win32 app.
  2. OpenGLComposite::InitDeckLink() discovers/configures DeckLink and runtime state.
  3. Runtime services are started.
  4. Shader programs and GL resources are initialized.
  5. The render thread is started.
  6. The video backend starts output preroll and playback.

The normal VS Code debug launch currently sets:

VST_DISABLE_INPUT_CAPTURE=1

That disables DeckLink input capture for output-timing isolation while keeping the output path active.

Runtime State

RuntimeStore

RuntimeStore owns durable runtime data and file-backed state.

It owns:

  • runtime host configuration
  • stored layer stack data
  • persisted parameter values
  • stack presets
  • shader package catalog metadata
  • runtime state presentation data
  • persistence requests

It does not own render-thread resources, DeckLink timing, control ingress, or mutation policy.

CommittedLiveState

CommittedLiveState owns the current session/operator layer state: state that is committed and live but not necessarily persisted as the durable base state.

It gives the renderer and snapshot builder a named read model for current committed layer state.

RuntimeCoordinator

RuntimeCoordinator is the mutation policy boundary.

It validates and applies runtime mutations, classifies whether changes are persisted/committed/transient, emits persistence requests, and produces render reset/reload decisions.

It keeps mutation decisions out of:

  • the render engine
  • control services
  • video backend
  • telemetry
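
Here is a minimal sketch of the classification step. It assumes hypothetical Mutation, MutationDecision, MutationScope and RenderAction types; none of these names come from the codebase. The coordinator validates a request and assigns it persisted, committed or transient scope. Only persisted changes produce a persistence request. The render decision is returned to the caller rather than acted on inside the coordinator.

```cpp
#include <string>

// Hypothetical types for illustration; the real RuntimeCoordinator API may differ.
enum class MutationScope { Persisted, Committed, Transient };
enum class RenderAction { None, ReloadShaders };

struct Mutation {
    std::string layerId;
    std::string parameter;
    float value = 0.0f;
    bool fromLiveOverlay = false;  // e.g. OSC input meant as a transient overlay
    bool saveToStack = false;      // operator asked for a durable change
    bool changesShader = false;    // e.g. a different shader package selected
};

struct MutationDecision {
    bool accepted = false;
    MutationScope scope = MutationScope::Transient;
    bool enqueuePersistence = false;
    RenderAction renderAction = RenderAction::None;
};

// Validates and classifies a mutation. No GL work, device I/O or file writes
// happen here; the caller acts on the returned decision.
MutationDecision Classify(const Mutation& m) {
    MutationDecision d;
    if (m.layerId.empty() || m.parameter.empty()) {
        return d;  // rejected: the caller gets an immediate failure answer
    }
    d.accepted = true;
    if (m.fromLiveOverlay) {
        d.scope = MutationScope::Transient;
    } else if (m.saveToStack) {
        d.scope = MutationScope::Persisted;
        d.enqueuePersistence = true;  // handed to the background writer
    } else {
        d.scope = MutationScope::Committed;
    }
    d.renderAction = m.changesShader ? RenderAction::ReloadShaders : RenderAction::None;
    return d;
}
```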

RuntimeSnapshotProvider

RuntimeSnapshotProvider publishes render-facing snapshots.

It owns the currently published render snapshot and gives the render path a stable read boundary. Rendering does not read mutable store objects directly.
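
One common way to get a stable read boundary like this is to publish immutable snapshots behind a shared pointer. The sketch below assumes a hypothetical RenderSnapshot struct and a mutex-guarded std::shared_ptr swap. The real RuntimeSnapshotProvider may use a different mechanism.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical render-facing snapshot; the fields are illustrative only.
struct RenderSnapshot {
    unsigned long long version = 0;
    std::vector<std::string> layerShaders;
};

class SnapshotProviderSketch {
public:
    // Called by the runtime/control side after a committed change.
    void Publish(std::vector<std::string> layerShaders) {
        auto next = std::make_shared<RenderSnapshot>();
        next->layerShaders = std::move(layerShaders);
        std::lock_guard<std::mutex> lock(mutex_);
        next->version = current_ ? current_->version + 1 : 1;
        current_ = std::move(next);  // old snapshot lives on while readers hold it
    }

    // Called by the render path once per frame; the result is immutable.
    std::shared_ptr<const RenderSnapshot> Acquire() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return current_;
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const RenderSnapshot> current_;
};
```

A frame that acquired version N keeps drawing from version N even if version N+1 is published mid-frame. That is the property the render path needs.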

Live State And Layering

The current render state is built from named layers of state:

  • persisted layer/package/default state from the runtime store
  • committed live/session state
  • transient live overlays from OSC/control input
  • render-local state owned by the renderer

RuntimeStateLayerModel names these categories. RenderStateComposer and RuntimeLiveState combine live values into render-facing state.
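
The layering can be pictured as an override chain. This sketch assumes each category overrides the one listed before it: persisted, then committed, then transient overlay. The ordering above suggests this but does not state it outright. ParamMap and ComposeLayerParams are hypothetical names. Render-local state is left out because the renderer owns it directly.

```cpp
#include <map>
#include <string>

// Illustrative parameter map; the real layer model carries richer typed values.
using ParamMap = std::map<std::string, float>;

// Later categories override earlier ones: persisted defaults, then committed
// session values, then transient OSC/control overlays.
ParamMap ComposeLayerParams(const ParamMap& persisted,
                            const ParamMap& committed,
                            const ParamMap& transientOverlay) {
    ParamMap resolved = persisted;
    for (const auto& [name, value] : committed) resolved[name] = value;
    for (const auto& [name, value] : transientOverlay) resolved[name] = value;
    return resolved;
}
```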

RenderFrameInput and RenderFrameState are the frame contract:

  • RenderFrameInput describes what kind of frame is being built
  • RenderFrameState describes the resolved state used to draw that frame

The renderer should not ask global state systems which snapshot or layer state to use midway through drawing.

Control And Events

RuntimeServices

RuntimeServices owns runtime-facing services such as OSC/control integration and service lifecycle.

It connects control ingress to the coordinator and live-state bridge.

ControlServices

ControlServices handles OSC/control ingress, buffering, and polling/wake behavior.

It does not own runtime mutation policy. It normalizes ingress and asks the coordinator/runtime services to apply changes.

RuntimeEventDispatcher

The app uses typed runtime events for internal coordination and observation.

Events are used for:

  • runtime state broadcast requests
  • shader build lifecycle
  • backend state changes
  • input/output frame observations
  • timing samples
  • health and queue observations

Events say what happened. Commands/request methods still exist where a caller needs an immediate success/failure answer.
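
Here is a sketch of typed events using std::variant, assuming hypothetical ShaderBuildFinished, BackendStateChanged and TimingSample payloads. It dispatches synchronously for brevity. The app's RuntimeEventDispatcher may queue events instead; the telemetry section mentions event queue timing.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical event payloads; not the app's actual event types.
struct ShaderBuildFinished { std::string shader; bool ok; };
struct BackendStateChanged { std::string from, to; };
struct TimingSample { std::string domain; double milliseconds; };

using RuntimeEvent = std::variant<ShaderBuildFinished, BackendStateChanged, TimingSample>;

class EventDispatcherSketch {
public:
    using Handler = std::function<void(const RuntimeEvent&)>;

    void Subscribe(Handler handler) { handlers_.push_back(std::move(handler)); }

    // Events report what happened; the publisher gets no success/failure answer.
    void Publish(const RuntimeEvent& event) const {
        for (const auto& handler : handlers_) handler(event);
    }

private:
    std::vector<Handler> handlers_;
};
```

Observers pick out the event types they care about with std::get_if or std::visit. A caller that needs an answer uses a request method instead.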

Persistence

Persistence is handled by PersistenceWriter.

Runtime mutations can enqueue persistence requests without blocking the render/output path. Shutdown performs a bounded persistence flush.

The store owns durable state; the writer owns background write execution.
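
Here is a minimal sketch of the split between enqueueing and writing. It assumes a hypothetical PersistenceWriterSketch with a caller-supplied write function standing in for the real file I/O. Enqueue() only takes a short lock. Shutdown() waits a bounded time for the queue to drain.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

class PersistenceWriterSketch {
public:
    using WriteFn = std::function<void(const std::string&)>;

    explicit PersistenceWriterSketch(WriteFn write)
        : write_(std::move(write)), worker_([this] { Run(); }) {}

    ~PersistenceWriterSketch() { Shutdown(std::chrono::milliseconds(500)); }

    // Non-blocking for callers: a short lock, then the request is queued.
    void Enqueue(std::string request) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back(std::move(request));
        }
        work_.notify_one();
    }

    // Bounded flush: waits up to `timeout` for queued writes to finish, then
    // stops the worker. Returns true if everything was written in time.
    bool Shutdown(std::chrono::milliseconds timeout) {
        bool drained = false;
        {
            std::unique_lock<std::mutex> lock(mutex_);
            if (stopping_) return pending_.empty();  // already shut down
            drained = idle_.wait_for(lock, timeout, [this] { return pending_.empty() && !writing_; });
            stopping_ = true;
        }
        work_.notify_one();
        worker_.join();  // still waits for a write that is already in progress
        return drained;
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            work_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            if (stopping_) return;  // anything still queued is abandoned
            std::string request = std::move(pending_.front());
            pending_.pop_front();
            writing_ = true;
            lock.unlock();
            write_(request);  // slow file I/O runs outside the lock
            lock.lock();
            writing_ = false;
            if (pending_.empty()) idle_.notify_all();
        }
    }

    WriteFn write_;
    std::mutex mutex_;
    std::condition_variable work_;
    std::condition_variable idle_;
    std::deque<std::string> pending_;
    bool writing_ = false;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};
```

If the timeout expires, queued requests are abandoned. join() still waits for a write already in progress, so a real implementation might also need to bound individual writes.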

Render System

RenderEngine

RenderEngine owns normal runtime OpenGL work.

It starts a dedicated render thread and binds the GL context on that thread. Runtime GL work enters through render-thread requests or render command queues.

The render thread handles:

  • output frame rendering
  • input frame upload
  • preview present
  • screenshot capture
  • render-local resets
  • shader/rebuild application
  • temporal history and shader feedback resources

Startup initialization still happens before the render thread starts, while the app explicitly owns the GL context. After startup, normal runtime work is routed through RenderEngine.
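
The request routing can be sketched as a single FIFO executor, using a hypothetical RenderThreadSketch in place of RenderEngine. Post() models fire-and-forget work such as input upload. Invoke() models a synchronous request/response call such as output render: the caller blocks until every task queued ahead of it has run. That queueing is the source of the limitation described in the next section.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// One thread (which would own the GL context) drains one FIFO of tasks.
class RenderThreadSketch {
public:
    RenderThreadSketch() : thread_([this] { Run(); }) {}

    ~RenderThreadSketch() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        thread_.join();
    }

    // Fire-and-forget request (e.g. input upload, preview present).
    void Post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
    }

    // Synchronous request/response (e.g. output render).
    template <typename R>
    R Invoke(std::function<R()> task) {
        auto packaged = std::make_shared<std::packaged_task<R()>>(std::move(task));
        std::future<R> result = packaged->get_future();
        Post([packaged] { (*packaged)(); });
        return result.get();  // blocks behind everything already queued
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
            if (tasks_.empty()) return;  // stopping and fully drained
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            lock.unlock();
            task();  // GL calls would run here, on the context-owning thread
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::thread thread_;  // declared last: starts after the queue exists
};
```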

Current Render-Thread Limitation

The current render thread is a shared GL executor, not a pure output-only cadence thread.

This means output render can still be delayed by:

  • input upload work
  • preview present requests
  • screenshot capture
  • render reset commands
  • shader/resource update work
  • synchronous render-thread request queue wait

For output-timing diagnosis, input capture can be disabled with:

VST_DISABLE_INPUT_CAPTURE=1

When this flag is set, the backend skips DeckLink input configuration and start, and HasInputSource() reports false.
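
Here is a sketch of how such a flag might be read. It assumes a hypothetical parsing rule (present, non-empty and not "0") and a hypothetical InputSetupSketch class. The app's actual check for VST_DISABLE_INPUT_CAPTURE may be stricter or looser.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical rule: the flag counts as set when present, non-empty and not "0".
bool FlagValueIsSet(const char* value) {
    return value != nullptr && value[0] != '\0' && std::strcmp(value, "0") != 0;
}

bool InputCaptureDisabled() {
    return FlagValueIsSet(std::getenv("VST_DISABLE_INPUT_CAPTURE"));
}

// Illustrative backend fragment: input setup is skipped entirely when disabled.
class InputSetupSketch {
public:
    explicit InputSetupSketch(bool captureDisabled) : captureDisabled_(captureDisabled) {}

    void ConfigureAndStartInput() {
        if (captureDisabled_) return;  // no input configuration, no input stream start
        inputStarted_ = true;          // real code would configure and start DeckLink input here
    }

    bool HasInputSource() const { return inputStarted_; }

private:
    bool captureDisabled_;
    bool inputStarted_ = false;
};
```

At startup this would be constructed as `InputSetupSketch setup(InputCaptureDisabled());`, keeping the environment lookup out of the class so it stays easy to test.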

OpenGLRenderPipeline

OpenGLRenderPipeline draws the frame and performs output packing/readback.

The current output path:

  1. binds the composite framebuffer
  2. calls the render effect callback
  3. blits/composes into the output framebuffer
  4. packs the output for the configured pixel format
  5. flushes GL
  6. reads output into the provided system-memory output frame
  7. records render/readback timing

For BGRA8 output, the pipeline uses a BGRA-compatible pack framebuffer and async PBO readback by default.
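
Async PBO readback generally means a ring of pixel-pack buffers, whose contents reach system memory a frame or more after they are rendered. This sketch keeps only the ring bookkeeping. The ring depth, the PboRingSketch name and the rule that a full ring counts as a miss are assumptions. The comments name the OpenGL calls a real version would make at each step.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

template <std::size_t N>
class PboRingSketch {
public:
    // Picks the PBO for this frame's async read (glReadPixels into a buffer
    // bound to GL_PIXEL_PACK_BUFFER, followed by glFenceSync). Returns nullopt
    // if every PBO still holds an unconsumed readback (a "PBO miss").
    std::optional<std::size_t> BeginReadback(std::uint64_t frameNumber) {
        if (inFlight_ == N) return std::nullopt;
        std::size_t index = (head_ + inFlight_) % N;
        frames_[index] = frameNumber;
        ++inFlight_;
        return index;
    }

    // Consumes the oldest readback once the GPU reports it done (in GL, a
    // glClientWaitSync check before glMapBufferRange and the copy out).
    std::optional<std::uint64_t> TryConsumeOldest(bool gpuDone) {
        if (inFlight_ == 0 || !gpuDone) return std::nullopt;
        std::uint64_t frame = frames_[head_];
        head_ = (head_ + 1) % N;
        --inFlight_;
        return frame;
    }

private:
    std::array<std::uint64_t, N> frames_{};
    std::size_t head_ = 0;
    std::size_t inFlight_ = 0;
};
```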

Video Backend

VideoBackend

VideoBackend owns app-level video device lifecycle, output production, system-memory frame slots, and backend playout health.

It owns:

  • backend lifecycle state
  • output production worker
  • output completion worker
  • system-memory output frame pool
  • ready/completed output queue
  • render cadence controller
  • playout policy
  • output frame scheduling into VideoIODevice
  • backend timing and queue telemetry

It does not own GL drawing. It asks OpenGLVideoIOBridge / RenderEngine to render into system-memory output frames.

Lifecycle

The current backend lifecycle includes:

  • discovery
  • configuring
  • configured
  • prerolling
  • running
  • degraded
  • stopping
  • stopped
  • failed
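
The states can be modelled as an enum with an explicit transition check. The edges below are assumptions for illustration, because the document lists states, not transitions. BackendLifecycleSketch is a hypothetical name.

```cpp
enum class BackendState {
    Discovery, Configuring, Configured, Prerolling, Running,
    Degraded, Stopping, Stopped, Failed
};

// Assumed transition edges; the app's real rules may differ.
bool CanTransition(BackendState from, BackendState to) {
    using S = BackendState;
    if (to == S::Failed) return from != S::Stopped && from != S::Failed;
    switch (from) {
        case S::Discovery:   return to == S::Configuring;
        case S::Configuring: return to == S::Configured;
        case S::Configured:  return to == S::Prerolling || to == S::Stopping;
        case S::Prerolling:  return to == S::Running || to == S::Stopping;
        case S::Running:     return to == S::Degraded || to == S::Stopping;
        case S::Degraded:    return to == S::Running || to == S::Stopping;
        case S::Stopping:    return to == S::Stopped;
        case S::Stopped:     return false;
        case S::Failed:      return to == S::Stopping;
    }
    return false;
}

class BackendLifecycleSketch {
public:
    BackendState state() const { return state_; }

    // Rejects illegal moves instead of silently changing state.
    bool TryTransition(BackendState to) {
        if (!CanTransition(state_, to)) return false;
        state_ = to;
        return true;
    }

private:
    BackendState state_ = BackendState::Discovery;
};
```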

Startup now separates output schedule preparation from scheduled playback:

  1. prepare the DeckLink output schedule
  2. start output completion worker
  3. start output producer worker
  4. warm up rendered system-memory preroll frames
  5. optionally start input streams
  6. start DeckLink scheduled playback

Output Production

The output producer is cadence-driven.

RenderCadenceController tracks the selected output frame duration and decides when the producer should render another frame.

The render producer attempts to render one output frame per selected output tick. It does not speed up just because the DeckLink output buffer is empty.

If render/GPU work falls far enough behind the tick schedule, the cadence controller can skip late ticks according to policy.
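
Here is a sketch of tick-based cadence with late-tick skipping. It assumes a hypothetical CadenceSketch class and uses a maxLateTicks threshold as the skip policy. For 59.94 fps output the frame duration would be 1001/60000 s. The producer renders only when the clock has reached the next tick, never earlier, and jumps forward when it falls too far behind.

```cpp
#include <chrono>
#include <cstdint>

class CadenceSketch {
public:
    using Clock = std::chrono::steady_clock;

    CadenceSketch(Clock::duration frameDuration, Clock::time_point start, int maxLateTicks)
        : frame_(frameDuration), nextTick_(start), maxLateTicks_(maxLateTicks) {}

    // Returns true when a frame should be rendered at `now` and advances one tick.
    // Within the tolerance, late ticks are rendered back to back to catch up;
    // beyond it, the missed ticks are skipped and counted.
    bool ShouldRender(Clock::time_point now) {
        if (now < nextTick_) return false;          // never render ahead of cadence
        auto behind = (now - nextTick_) / frame_;   // whole ticks already missed
        if (behind > maxLateTicks_) {
            nextTick_ += behind * frame_;
            skipped_ += static_cast<std::uint64_t>(behind);
        }
        nextTick_ += frame_;
        return true;
    }

    Clock::time_point nextTick() const { return nextTick_; }
    std::uint64_t skippedTicks() const { return skipped_; }

private:
    Clock::duration frame_;
    Clock::time_point nextTick_;
    int maxLateTicks_;
    std::uint64_t skipped_ = 0;
};
```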

System-Memory Frame Pool

SystemOutputFramePool owns reusable system-memory output slots.

Slots have four states:

  • Free
  • Rendering
  • Completed
  • Scheduled

Completed-but-unscheduled frames are treated as a latest-N cache. If render cadence needs space and old completed frames have not been scheduled, the oldest unscheduled completed frame can be recycled.

Scheduled frames are protected until DeckLink reports completion.
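
Here is a sketch of the four slot states and the recycling rule. It assumes a hypothetical FramePoolSketch that tracks completion order with a sequence number. When no slot is Free, the oldest Completed slot is reused. Rendering and Scheduled slots are never taken.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

enum class SlotState { Free, Rendering, Completed, Scheduled };

class FramePoolSketch {
public:
    explicit FramePoolSketch(std::size_t slotCount) : slots_(slotCount) {}

    // Prefers a Free slot; otherwise recycles the oldest Completed
    // (unscheduled) slot. Returns nullopt if all slots are busy.
    std::optional<std::size_t> AcquireForRender() {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (slots_[i].state == SlotState::Free) return Begin(i);
        }
        std::optional<std::size_t> oldest;
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (slots_[i].state == SlotState::Completed &&
                (!oldest || slots_[i].completedSeq < slots_[*oldest].completedSeq)) {
                oldest = i;
            }
        }
        if (oldest) return Begin(*oldest);  // latest-N cache: drop stale work
        return std::nullopt;                // everything is rendering or scheduled
    }

    void MarkCompleted(std::size_t i) {
        slots_[i].state = SlotState::Completed;
        slots_[i].completedSeq = ++seq_;
    }
    void MarkScheduled(std::size_t i) { slots_[i].state = SlotState::Scheduled; }
    // Only a device completion returns a Scheduled slot to Free.
    void ReleaseAfterDeviceCompletion(std::size_t i) { slots_[i].state = SlotState::Free; }

private:
    struct Slot {
        SlotState state = SlotState::Free;
        std::uint64_t completedSeq = 0;
    };

    std::size_t Begin(std::size_t i) {
        slots_[i].state = SlotState::Rendering;
        return i;
    }

    std::vector<Slot> slots_;
    std::uint64_t seq_ = 0;
};
```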

Output Queue

RenderOutputQueue holds completed unscheduled output frames waiting to be scheduled.

It is bounded and latest-N:

  • pushing beyond capacity releases/drops the oldest ready frame
  • DropOldestFrame() is used when the frame pool needs to recycle old completed work
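
Here is a generic sketch of the bounded latest-N behaviour, assuming a hypothetical LatestNQueueSketch template with a capacity of at least one. It reuses the DropOldestFrame() name from the text, but the signature is an assumption. Dropped frames are returned so the caller can release their pool slots.

```cpp
#include <cstddef>
#include <deque>
#include <optional>
#include <utility>

template <typename Frame>
class LatestNQueueSketch {
public:
    explicit LatestNQueueSketch(std::size_t capacity) : capacity_(capacity) {}  // assumes capacity >= 1

    // Pushing into a full queue drops the oldest ready frame and returns it.
    std::optional<Frame> Push(Frame frame) {
        std::optional<Frame> dropped;
        if (ready_.size() == capacity_) dropped = DropOldestFrame();
        ready_.push_back(std::move(frame));
        return dropped;
    }

    // Also called when the frame pool needs to recycle old completed work.
    std::optional<Frame> DropOldestFrame() {
        if (ready_.empty()) return std::nullopt;
        Frame oldest = std::move(ready_.front());
        ready_.pop_front();
        return oldest;
    }

    // Scheduling takes frames oldest first.
    std::optional<Frame> PopForScheduling() { return DropOldestFrame(); }

    std::size_t size() const { return ready_.size(); }

private:
    std::size_t capacity_;
    std::deque<Frame> ready_;
};
```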

Scheduling

VideoBackend::ScheduleReadyOutputFramesToTarget() schedules completed system-memory frames up to the configured preroll/scheduled target.

The number of frames handed to DeckLink is capped by the current app-owned scheduled count. The actual DeckLink buffered-frame count is also recorded as telemetry.
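
The cap can be expressed as a small calculation, using a hypothetical FramesToSchedule helper. The number of ready frames handed to DeckLink in one pass is limited by the gap between the scheduled target and the app-owned scheduled count. The actual DeckLink buffered-frame count is deliberately not an input, because the text describes it as telemetry only.

```cpp
#include <algorithm>
#include <cstddef>

// How many ready frames to schedule in this pass, given the app-owned
// scheduled-slot count and the configured preroll/scheduled target.
std::size_t FramesToSchedule(std::size_t readyCount,
                             std::size_t scheduledCount,
                             std::size_t targetScheduled) {
    if (scheduledCount >= targetScheduled) return 0;  // already at or above target
    return std::min(readyCount, targetScheduled - scheduledCount);
}
```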

Completion Handling

DeckLink completion callbacks do not render.

The callback path reports completion into VideoBackend, which processes completions on a backend worker. Completion processing:

  • releases the system-memory slot by buffer pointer
  • records pacing
  • accounts for late/drop/flushed/completed result
  • records telemetry
  • wakes the output producer
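
Here is a sketch of a passive completion path. It assumes a hypothetical CompletionWorkerSketch with caller-supplied releaseSlot and wakeProducer hooks. The CompletionResult enum loosely mirrors the DeckLink SDK's completion results (completed, displayed late, dropped, flushed), but the names here are illustrative. The device callback only queues the buffer pointer and the result. The worker thread does the accounting and the slot release.

```cpp
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

enum class CompletionResult { Completed, DisplayedLate, Dropped, Flushed };

struct CompletionCounters {
    std::uint64_t completed = 0, late = 0, dropped = 0, flushed = 0;
};

class CompletionWorkerSketch {
public:
    CompletionWorkerSketch(std::function<void(void*)> releaseSlot, std::function<void()> wakeProducer)
        : releaseSlot_(std::move(releaseSlot)), wakeProducer_(std::move(wakeProducer)),
          worker_([this] { Run(); }) {}

    ~CompletionWorkerSketch() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // pending completions are drained first
    }

    // Called on the device callback thread: record and return, never render.
    void OnDeviceCompletion(void* buffer, CompletionResult result) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_.push_back({buffer, result});
        }
        cv_.notify_one();
    }

    CompletionCounters Counters() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return counters_;
    }

private:
    struct Pending { void* buffer; CompletionResult result; };

    void Run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            cv_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            if (pending_.empty()) return;  // stopping and drained
            Pending item = pending_.front();
            pending_.pop_front();
            switch (item.result) {
                case CompletionResult::Completed:     ++counters_.completed; break;
                case CompletionResult::DisplayedLate: ++counters_.late; break;
                case CompletionResult::Dropped:       ++counters_.dropped; break;
                case CompletionResult::Flushed:       ++counters_.flushed; break;
            }
            lock.unlock();
            releaseSlot_(item.buffer);  // the slot is found by its buffer pointer
            wakeProducer_();            // a slot may now be free for rendering
            lock.lock();
        }
    }

    std::function<void(void*)> releaseSlot_;
    std::function<void()> wakeProducer_;
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<Pending> pending_;
    CompletionCounters counters_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members
};
```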

DeckLinkSession

DeckLinkSession is the DeckLink implementation of VideoIODevice.

It owns:

  • DeckLink discovery
  • input/output mode selection
  • DeckLink input/output interfaces
  • keyer configuration
  • capture and playout delegates
  • schedule-time generation through VideoPlayoutScheduler
  • DeckLink frame scheduling
  • actual buffered-frame telemetry

For output, system-memory frames are scheduled through DeckLink CreateVideoFrameWithBuffer().

When a system-memory frame is scheduled, DeckLinkSession records a map from the DeckLink frame object back to the app-owned system-memory buffer pointer. On completion, the buffer pointer is returned so VideoBackend can release the matching slot.

DeckLinkSession calls GetBufferedVideoFrameCount() after schedule/completion where available.

Telemetry separates:

  • actual DeckLink buffered frames
  • app-owned scheduled system-memory slots
  • synthetic schedule/completion counters
  • late/drop/flushed completion results

Output Timing Experiments And Current Finding

The repo includes DeckLinkRenderCadenceProbe, a small standalone test app under:

apps/DeckLinkRenderCadenceProbe

The probe does not use the main runtime, shader system, preview path, input upload path, or shared render engine. It uses:

  • one OpenGL render thread with its own hidden GL context
  • simple BGRA8 motion rendering
  • async PBO readback
  • latest-N system-memory frame slots
  • a playout thread that feeds DeckLink
  • real rendered warmup before scheduled playback

The first hardware result was smooth at roughly 59.94/60 fps with:

  • renderFps near 59.9
  • scheduleFps near 59.9
  • DeckLink actual buffered frames stable at 4
  • no late frames
  • no dropped frames
  • no PBO misses
  • no completed-frame drops

That proves the clean architecture can work on the test machine. Remaining main-app timing issues are therefore likely integration/ownership issues in the main app rather than a fundamental DeckLink/OpenGL/BGRA8 limitation.

The highest-value current suspects are:

  • input upload sharing the output render thread
  • shared render-thread task queue contention
  • preview/screenshot work
  • runtime/render-state work on the output path

Health Telemetry

HealthTelemetry owns app-visible health and timing observations.

It records:

  • signal/input status
  • performance/render timing
  • event queue timing
  • backend lifecycle/playout state
  • output render queue wait
  • output render/readback timing
  • system-memory frame counts
  • actual DeckLink buffer depth
  • late/drop/flushed/completed frame counters
  • schedule-call timing/failure counts

Several hot-path telemetry calls use try-lock variants so observation does not become a major timing dependency.
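
Here is a sketch of a try-lock recording path, assuming a hypothetical TimingTelemetrySketch. A hot-path caller uses std::try_to_lock. When a reader holds the lock, the caller counts a skipped sample instead of waiting. Slower readers, such as a JSON presenter, take the lock normally.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

class TimingTelemetrySketch {
public:
    // Hot path (render/output threads): never waits for the telemetry lock.
    bool TryRecordRenderMs(double ms) {
        std::unique_lock<std::mutex> lock(mutex_, std::try_to_lock);
        if (!lock.owns_lock()) {
            skipped_.fetch_add(1, std::memory_order_relaxed);  // lose the sample, keep timing
            return false;
        }
        ++samples_;
        lastRenderMs_ = ms;
        if (ms > maxRenderMs_) maxRenderMs_ = ms;
        return true;
    }

    // Slow readers (e.g. a JSON state presenter) may block; they are off the hot path.
    template <typename Reader>
    void ReadLocked(Reader&& reader) const {
        std::lock_guard<std::mutex> lock(mutex_);
        reader(lastRenderMs_, maxRenderMs_, samples_);
    }

    double MaxRenderMs() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return maxRenderMs_;
    }

    std::uint64_t SkippedSamples() const { return skipped_.load(std::memory_order_relaxed); }

private:
    mutable std::mutex mutex_;
    double lastRenderMs_ = 0.0;
    double maxRenderMs_ = 0.0;
    std::uint64_t samples_ = 0;
    std::atomic<std::uint64_t> skipped_{0};
};
```

The trade-off is that telemetry can miss samples under contention. The skipped-sample counter makes that loss visible instead of hiding it.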

Runtime state presentation exposes telemetry through the runtime JSON/open API surface.

Preview And Screenshot

Preview is best-effort.

OpenGLComposite::paintGL() skips preview when the backend reports output pressure. Preview presentation is requested through the render thread.

Screenshot capture is also a render-thread request. It reads pixels from the output framebuffer, then writes the PNG file asynchronously after capture.

Both preview and screenshot share GL execution with output render, so they are secondary to output timing.

Output Readback Modes

The output readback path supports environment-selected modes:

VST_OUTPUT_READBACK_MODE=async_pbo
VST_OUTPUT_READBACK_MODE=sync
VST_OUTPUT_READBACK_MODE=cached_only

Default behavior is async_pbo.
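
Here is a sketch of the mode selection, using hypothetical ParseReadbackMode and ReadbackModeFromEnvironment helpers. The three accepted strings and the async_pbo default come from the text. Treating unrecognized values as async_pbo is an assumption.

```cpp
#include <cstdlib>
#include <string_view>

enum class ReadbackMode { AsyncPbo, Sync, CachedOnly };

// Unset or unrecognized values fall back to the default, async_pbo.
ReadbackMode ParseReadbackMode(const char* value) {
    if (value == nullptr) return ReadbackMode::AsyncPbo;
    const std::string_view v(value);
    if (v == "sync") return ReadbackMode::Sync;
    if (v == "cached_only") return ReadbackMode::CachedOnly;
    return ReadbackMode::AsyncPbo;
}

ReadbackMode ReadbackModeFromEnvironment() {
    return ParseReadbackMode(std::getenv("VST_OUTPUT_READBACK_MODE"));
}
```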

Experiment findings:

  • direct synchronous readback was slower on the sampled machine
  • cached-only recovered timing but is visually invalid for live motion
  • BGRA8 pack framebuffer plus async PBO removed the earlier large readback stall

Current Debug/Experiment Launches

VS Code launch configurations include:

  • Debug LoopThroughWithOpenGLCompositing
  • Debug LoopThroughWithOpenGLCompositing - sync readback experiment
  • Debug LoopThroughWithOpenGLCompositing - cached output experiment
  • Debug DeckLinkRenderCadenceProbe

The default main-app debug launch currently disables input capture with VST_DISABLE_INPUT_CAPTURE=1 so output timing can be tested without input upload interference.

Current Ownership Summary

| Area | Current Owner |
| --- | --- |
| Durable runtime config/state | RuntimeStore |
| Current committed live layer state | CommittedLiveState |
| Mutation validation/policy | RuntimeCoordinator |
| Render snapshot publication | RuntimeSnapshotProvider |
| OSC/control ingress | RuntimeServices / ControlServices |
| Internal event dispatch | RuntimeEventDispatcher |
| Background persistence writes | PersistenceWriter |
| GL context and normal GL work | RenderEngine render thread |
| Render-pass execution and output readback | OpenGLRenderPipeline |
| Device lifecycle and output production | VideoBackend |
| DeckLink API integration | DeckLinkSession |
| Operational health/timing | HealthTelemetry |

Current Runtime Flow Summary

Control Mutation

OSC/API/control input
  -> RuntimeServices / ControlServices
  -> RuntimeCoordinator
  -> RuntimeStore / CommittedLiveState / RuntimeLiveState
  -> RuntimeSnapshotProvider publication or live overlay update
  -> RuntimeEventDispatcher observations

Output Render

VideoBackend output producer
  -> RenderCadenceController tick
  -> SystemOutputFramePool acquire rendering slot
  -> OpenGLVideoIOBridge::RenderScheduledFrame
  -> RenderEngine::RequestOutputFrame
  -> render thread
  -> OpenGLRenderPipeline::RenderFrame
  -> system-memory output slot
  -> RenderOutputQueue completed frame

RenderOutputQueue completed frame
  -> VideoBackend schedules to target
  -> DeckLinkSession::ScheduleOutputFrame
  -> CreateVideoFrameWithBuffer
  -> ScheduleVideoFrame
  -> DeckLink playback
  -> completion callback
  -> VideoBackend completion worker
  -> release scheduled system-memory slot

Input Capture

When input capture is enabled:

DeckLink input callback
  -> VideoBackend::HandleInputFrame
  -> OpenGLVideoIOBridge::UploadInputFrame
  -> RenderEngine::QueueInputFrame
  -> render thread upload

When VST_DISABLE_INPUT_CAPTURE=1, this flow is skipped.

Known Current Constraints

  • The main app render thread still handles multiple kinds of GL work.
  • Output render still uses a synchronous request/response call into the render thread.
  • Input upload can contend with output render when input capture is enabled.
  • Preview and screenshot share the render thread.
  • Phase/experiment documents still exist as historical notes, but this document is the current architecture summary.

Practical Rules

  • Keep one owner for each kind of state.
  • Keep GL work on the render thread.
  • Keep DeckLink completion callbacks passive.
  • Treat completed unscheduled output frames as latest-N cache entries.
  • Protect scheduled output frames until DeckLink completion.
  • Keep output timing more important than preview/screenshot.
  • Measure timing by domain instead of adding fallback branches blindly.