Phase 7 Design: Backend Lifecycle And Playout

This document expands Phase 7 of ARCHITECTURE_RESILIENCE_REVIEW.md into a concrete design target.

Phase 4 made the render thread the sole owner of normal runtime GL work. Phase 7 Step 4 moved DeckLink completion processing onto a backend worker, so the callback no longer directly waits for render-thread output production. Phase 7 Step 5 added a bounded ready-frame queue inside that worker, so scheduling now consumes completed output frames and falls back explicitly on underrun. Phase 7 should make backend lifecycle, buffer policy, playout headroom, and recovery explicit.

Phase 5 clarified that live parameter layering stops at final render-state composition. Phase 7 should keep backend lifecycle, output queue ownership, buffer reuse, temporal/feedback resources, and stale-frame/underrun policy outside the persisted/committed/transient parameter model.

Status

  • Phase 7 design package: proposed.
  • Phase 7 implementation: complete.
  • Current alignment: VideoBackend, VideoIODevice, DeckLinkSession, VideoBackendLifecycle, and VideoPlayoutScheduler exist. Phase 4 removed callback-thread GL ownership; Step 4 moved completion processing onto a backend worker; Step 5 uses RenderOutputQueue as the ready-frame handoff inside that worker; Step 6 replaces fixed late/drop skip-ahead with measured recovery decisions; and Step 7 reports backend playout health through HealthTelemetry.

Current backend footholds:

  • VideoBackend wraps device discovery/configuration, start/stop, input callback handling, output completion handling, and telemetry publication.
  • DeckLinkSession owns DeckLink device handles, frame pool creation, preroll, keyer configuration, and scheduled playback.
  • VideoPlayoutPolicy names current frame pool, preroll, ready-frame, underrun, and catch-up policy defaults.
  • RenderOutputQueue provides the bounded ready-output-frame handoff and has pure queue tests.
  • VideoPlayoutScheduler owns schedule time generation, completion indexing, late/drop streaks, ready-queue pressure input, and measured recovery decisions.
  • OpenGLVideoIOBridge is the current adapter between VideoBackend and RenderEngine.
  • HealthTelemetry receives signal, render, pacing, lifecycle, queue, underrun, late/drop, and scheduler recovery observations.

Why Phase 7 Exists

The current output path works only while render/readback stays comfortably inside budget. A late render can make the callback late, which reduces device-side headroom, which makes the next callback more fragile.

The resilience review calls this the main remaining live-resilience risk after Phase 4:

  • output playout is still effectively filled on demand by a backend completion worker, but scheduling now consumes a bounded ready-frame queue
  • buffer pool size and preroll depth are not sourced from one policy
  • late/dropped recovery is a fixed skip rule
  • backend lifecycle is imperative rather than represented as explicit states

Phase 7 should separate hardware timing from render production.

Goals

Phase 7 should establish:

  • explicit backend lifecycle states and allowed transitions
  • one playout policy for frame pool size, preroll, headroom, and underrun behavior
  • a bounded producer/consumer output queue between render and DeckLink scheduling
  • lightweight DeckLink callbacks that dequeue/schedule/account rather than render
  • measured recovery from late/dropped frames
  • structured backend health reporting
  • tests for scheduler, queue, lifecycle, and underrun policy without DeckLink hardware

Non-Goals

Phase 7 should not require:

  • a new renderer
  • changing shader/state composition
  • changing committed-live or transient automation layering
  • replacing DeckLink support with multiple backends
  • full telemetry UI redesign
  • removing every synchronous API immediately
  • perfect adaptive latency policy in the first pass

Target Timing Model

The target model is producer/consumer playout:

RenderEngine/render scheduler produces completed output frames
  -> bounded ready-frame queue
  -> VideoBackend consumes ready frames
  -> DeckLink callback schedules already-prepared frames

The callback should not wait for rendering. It should:

  • record completion result
  • recycle/release completed buffers
  • dequeue a ready frame or apply underrun policy
  • schedule the next frame
  • publish backend timing/health observations

The queue contains rendered output-frame ownership and scheduling metadata, not live parameter state. Parameter composition should already be resolved before an output frame enters this playout boundary.
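
The consume side of that boundary can be sketched as below. This is an illustration only: ReadyFrame, IPlayoutDevice, and the telemetry hooks are stand-ins, not the project's real VideoBackend, RenderOutputQueue, or DeckLink types.

    #include <cstdint>
    #include <optional>

    // Illustrative types; the real backend works with DeckLink frame handles
    // and HealthTelemetry, not these stand-ins.
    struct ReadyFrame {
        void*    deviceFrame = nullptr;   // ownership of a completed, already-rendered output
        uint64_t renderIndex = 0;         // scheduling metadata, no live parameter state
    };

    struct IPlayoutDevice {
        virtual void RecycleCompletedFrame(void* frame) = 0;
        virtual void ScheduleFrame(void* frame, uint64_t scheduleIndex) = 0;
        virtual void ScheduleRepeatOfNewestFrame(uint64_t scheduleIndex) = 0;  // underrun fallback
        virtual ~IPlayoutDevice() = default;
    };

    // Callback-side work stays small: record, recycle, dequeue, schedule, observe.
    template <typename ReadyQueue, typename Telemetry>
    void OnOutputFrameCompleted(void* completedFrame, uint64_t nextScheduleIndex,
                                IPlayoutDevice& device, ReadyQueue& readyFrames,
                                Telemetry& telemetry)
    {
        device.RecycleCompletedFrame(completedFrame);              // return buffer to the pool

        if (std::optional<ReadyFrame> next = readyFrames.TryPop()) {
            device.ScheduleFrame(next->deviceFrame, nextScheduleIndex);
        } else {
            device.ScheduleRepeatOfNewestFrame(nextScheduleIndex); // explicit underrun policy
            telemetry.RecordUnderrun();
        }
        telemetry.RecordScheduled(nextScheduleIndex, readyFrames.Depth());
    }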

Target Lifecycle Model

Suggested backend states:

  1. Uninitialized
  2. Discovering
  3. Discovered
  4. Configuring
  5. Configured
  6. Prerolling
  7. Running
  8. Degraded
  9. Stopping
  10. Stopped
  11. Failed

Suggested transition rules:

  • Uninitialized -> Discovering
  • Discovering -> Discovered | Failed
  • Discovered -> Configuring | Stopped
  • Configuring -> Configured | Failed
  • Configured -> Prerolling | Stopped
  • Prerolling -> Running | Failed | Stopping
  • Running -> Degraded | Stopping | Failed
  • Degraded -> Running | Stopping | Failed
  • Stopping -> Stopped

The exact enum can change, but the lifecycle should become observable and testable.
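
As a sketch of what "observable and testable" can mean here, the transition table above can be expressed as one pure function. The enum values mirror the suggested states, but IsAllowedTransition is an illustrative name, not the real VideoBackendLifecycle API.

    // Illustrative only; the real VideoBackendLifecycle may differ in names and retry rules.
    enum class BackendState {
        Uninitialized, Discovering, Discovered, Configuring, Configured,
        Prerolling, Running, Degraded, Stopping, Stopped, Failed
    };

    inline bool IsAllowedTransition(BackendState from, BackendState to)
    {
        using S = BackendState;
        switch (from) {
        case S::Uninitialized: return to == S::Discovering;
        case S::Discovering:   return to == S::Discovered  || to == S::Failed;
        case S::Discovered:    return to == S::Configuring || to == S::Stopped;
        case S::Configuring:   return to == S::Configured  || to == S::Failed;
        case S::Configured:    return to == S::Prerolling  || to == S::Stopped;
        case S::Prerolling:    return to == S::Running  || to == S::Failed || to == S::Stopping;
        case S::Running:       return to == S::Degraded || to == S::Stopping || to == S::Failed;
        case S::Degraded:      return to == S::Running  || to == S::Stopping || to == S::Failed;
        case S::Stopping:      return to == S::Stopped;
        case S::Stopped:       return false;  // a retry path (e.g. back to Discovering) could be
        case S::Failed:        return false;  // added if the real lifecycle supports it
        }
        return false;
    }

With this shape, a request like Running -> Configuring is rejected rather than silently applied, which is exactly the kind of check lifecycle tests can make without DeckLink hardware.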

Proposed Collaborators

VideoBackendStateMachine

Pure or mostly pure lifecycle transition helper.

Responsibilities:

  • validate state transitions
  • produce transition observations
  • track failure reasons
  • keep start/stop/recovery behavior auditable

Non-responsibilities:

  • DeckLink API calls
  • rendering
  • persistence

PlayoutPolicy

Policy object for queue and timing behavior.

Expected fields:

  • target preroll frames
  • maximum ready frames
  • minimum spare device buffers
  • underrun behavior
  • maximum catch-up frames
  • adaptive headroom enabled/disabled
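
A sketch of that policy shape, with defaults picked from the candidate ranges later in this document. targetReadyFrames, maxReadyFrames, and lateOrDropCatchUpFrames appear elsewhere in this document as VideoPlayoutPolicy fields; the other names and the values shown are illustrative.

    #include <cstdint>

    enum class UnderrunBehavior { ReuseNewestCompleted, ReuseLastScheduled, ScheduleBlack };

    // Illustrative shape only; defaults are examples within the candidate ranges below.
    struct PlayoutPolicySketch {
        uint32_t         targetPrerollFrames      = 3;
        uint32_t         targetReadyFrames        = 2;   // candidate range 2-3
        uint32_t         maxReadyFrames           = 4;   // candidate range 3-5
        uint32_t         minSpareDeviceBuffers    = 1;
        UnderrunBehavior underrunBehavior         = UnderrunBehavior::ReuseNewestCompleted;
        uint32_t         lateOrDropCatchUpFrames  = 2;
        bool             adaptiveHeadroomEnabled  = false;
    };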

RenderOutputQueue

Bounded queue or ring for completed output frames.

Responsibilities:

  • accept completed render outputs
  • expose ready frames for scheduling
  • track depth, drops, stale reuse, and underruns
  • keep ownership/lifetime clear between render and backend
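
A minimal sketch of the behavior described here and confirmed in Step 3: overflow drops the oldest ready frame, and popping an empty queue counts as an underrun rather than an error. The class and method names are illustrative, not the real RenderOutputQueue API.

    #include <cstdint>
    #include <deque>
    #include <mutex>
    #include <optional>

    struct RenderOutputFrame {          // completed output + scheduling metadata only
        void*    deviceFrame = nullptr;
        uint64_t renderIndex = 0;
    };

    class BoundedReadyQueue {
    public:
        explicit BoundedReadyQueue(size_t capacity) : capacity_(capacity) {}

        void Push(RenderOutputFrame frame) {
            std::lock_guard<std::mutex> lock(mutex_);
            if (!frames_.empty() && frames_.size() >= capacity_) {
                frames_.pop_front();    // drop the oldest, keep the newest completed output
                ++dropped_;
            }
            frames_.push_back(frame);
            ++pushed_;
        }

        std::optional<RenderOutputFrame> TryPop() {
            std::lock_guard<std::mutex> lock(mutex_);
            if (frames_.empty()) { ++underruns_; return std::nullopt; }
            RenderOutputFrame front = frames_.front();
            frames_.pop_front();
            ++popped_;
            return front;
        }

        size_t Depth() const {
            std::lock_guard<std::mutex> lock(mutex_);
            return frames_.size();
        }

    private:
        mutable std::mutex mutex_;
        std::deque<RenderOutputFrame> frames_;
        size_t capacity_ = 1;           // assumed >= 1 in this sketch
        uint64_t pushed_ = 0, popped_ = 0, dropped_ = 0, underruns_ = 0;
    };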

OutputFramePool

Backend-owned device buffer pool.

Responsibilities:

  • own DeckLink mutable frames
  • expose available buffers for render/readback or scheduling
  • recycle completed frames
  • report spare-buffer depth
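
A sketch of that pool boundary, assuming device buffers are handed out as opaque handles (for DeckLink, something like IDeckLinkMutableVideoFrame pointers) and that acquisition fails rather than blocks while hardware still owns every buffer. Names are illustrative.

    #include <optional>
    #include <vector>

    // Illustrative pool over opaque device-frame handles.
    class OutputFramePoolSketch {
    public:
        explicit OutputFramePoolSketch(std::vector<void*> deviceFrames)
            : spare_(std::move(deviceFrames)) {}

        std::optional<void*> TryAcquire() {           // for render/readback or scheduling
            if (spare_.empty()) return std::nullopt;  // hardware still owns every buffer
            void* frame = spare_.back();
            spare_.pop_back();
            return frame;
        }

        void Recycle(void* completedFrame) { spare_.push_back(completedFrame); }

        size_t SpareDepth() const { return spare_.size(); }

    private:
        std::vector<void*> spare_;
    };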

PlayoutController

Coordinates policy, ready frames, device schedule times, and completion accounting.

Responsibilities:

  • preroll frames
  • schedule next frame
  • handle late/drop/completed/flushed results
  • apply underrun policy
  • publish timing state

Output Queue Policy

The initial output queue should be small and bounded.

Candidate defaults:

  • target ready frames: 2-3
  • max ready frames: 3-5
  • underrun: reuse last completed frame if available, otherwise black
  • late/drop: increase degraded counters and optionally increase headroom within limits

The exact numbers should be measured, but the policy should live in one place instead of being split across constants.

Underrun Policy

When no fresh rendered frame is available, options are:

  1. reuse newest completed frame
  2. reuse last scheduled frame
  3. schedule black/degraded frame
  4. skip/catch up schedule time

Phase 7 should pick one default and make it visible in telemetry. Reusing the newest completed frame is often the best first policy for live visual continuity, but key/fill behavior may require careful testing.
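
Whatever the default, the choice is easier to observe and test if it lives in one dispatch point. A hedged sketch of that dispatch, with illustrative names and a stand-in device type; option 4 (skipping schedule time) belongs to the scheduler's catch-up path rather than to this function.

    #include <cstdint>
    #include <optional>

    enum class UnderrunBehavior { ReuseNewestCompleted, ReuseLastScheduled, ScheduleBlack };

    // Illustrative dispatch over options 1-3 above; Device stands in for the backend seam.
    template <typename Device>
    void ApplyUnderrunPolicy(UnderrunBehavior behavior,
                             std::optional<void*> newestCompleted,
                             std::optional<void*> lastScheduled,
                             void* blackFrame,
                             uint64_t scheduleIndex,
                             Device& device)
    {
        switch (behavior) {
        case UnderrunBehavior::ReuseNewestCompleted:
            device.Schedule(newestCompleted.value_or(blackFrame), scheduleIndex);
            break;
        case UnderrunBehavior::ReuseLastScheduled:
            device.Schedule(lastScheduled.value_or(blackFrame), scheduleIndex);
            break;
        case UnderrunBehavior::ScheduleBlack:
            device.Schedule(blackFrame, scheduleIndex);
            break;
        }
        device.ReportUnderrun(behavior);   // make the fallback visible in telemetry
    }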

Migration Plan

Step 1. Name Lifecycle States

Introduce backend state enum and transition reporting without changing scheduling behavior much.

Initial target:

  • state changes are explicit
  • invalid transitions are detectable
  • tests cover allowed transitions

Current implementation:

  • VideoBackendLifecycle names backend states and validates allowed transitions.
  • VideoBackend applies lifecycle transitions around discovery, configuration, start, stop, degradation, failure, and resource release.
  • Existing BackendStateChangedEvent publication now uses lifecycle state names for backend lifecycle observations.
  • VideoBackendLifecycleTests cover allowed transitions, rejected invalid transitions, failure reasons, retry, and stable state names.

Step 2. Create Playout Policy Object

Unify fixed constants and scheduler assumptions.

Initial target:

  • frame pool size derives from policy
  • preroll count derives from policy
  • late/drop recovery reads policy

Current implementation:

  • VideoPlayoutPolicy defines current output frame pool, preroll, ready-frame, spare-buffer, underrun, catch-up, and adaptive-headroom settings.
  • DeckLinkSession uses the policy for output frame pool creation and preroll count.
  • VideoPlayoutScheduler stores the policy and uses lateOrDropCatchUpFrames instead of a hard-coded +2 recovery step.
  • VideoPlayoutSchedulerTests cover default compatibility behavior, policy-driven catch-up, and policy normalization.

Step 3. Add Ready Output Queue

Introduce a bounded queue for completed output frames.

Initial target:

  • pure queue tests
  • explicit depth/underrun metrics
  • no DeckLink dependency in queue tests

Current implementation:

  • RenderOutputQueue owns a bounded FIFO of RenderOutputFrame values.
  • The queue is configured from VideoPlayoutPolicy::maxReadyFrames.
  • Queue metrics report depth, capacity, pushed, popped, dropped, and underrun counts.
  • Overflow drops the oldest ready frame, preserving the newest completed output for scheduling.
  • RenderOutputQueueTests cover ordering, bounded overflow, underrun counting, and capacity shrink behavior without DeckLink hardware.

Step 4. Move Callback Toward Dequeue/Schedule

Stop producing frames directly in the completion callback path.

Transitional target:

  • callback wakes/schedules a backend worker
  • worker consumes ready frames

Final target:

  • callback only records, recycles, dequeues, schedules

Current implementation:

  • VideoBackend::HandleOutputFrameCompletion(...) now enqueues completion work and wakes an output-completion worker.
  • The output-completion worker drains pending completions and runs the existing render/schedule path.
  • This preserves behavior while removing the direct callback-thread wait on render-thread output production.
  • Step 5 now makes this worker consume ready frames from RenderOutputQueue; Step 4 remains the boundary that keeps output completion callbacks from doing render production directly.
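
A sketch of that callback-to-worker handoff, assuming a std::condition_variable-driven worker; the real VideoBackend worker and its completion record differ in detail, and the names here are illustrative.

    #include <condition_variable>
    #include <deque>
    #include <mutex>

    // Illustrative completion record; the real path carries DeckLink result codes.
    struct CompletionEvent {
        void* completedFrame = nullptr;
        int   completionResult = 0;
    };

    class CompletionWorkerQueue {
    public:
        // Called from the DeckLink completion callback: enqueue and wake, nothing heavy.
        void EnqueueFromCallback(CompletionEvent event) {
            {
                std::lock_guard<std::mutex> lock(mutex_);
                pending_.push_back(event);
            }
            wake_.notify_one();
        }

        // Called on the backend worker thread: block until work or shutdown.
        bool WaitAndDrain(std::deque<CompletionEvent>& out) {
            std::unique_lock<std::mutex> lock(mutex_);
            wake_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
            out.swap(pending_);
            return !stopping_;
        }

        void RequestStop() {
            { std::lock_guard<std::mutex> lock(mutex_); stopping_ = true; }
            wake_.notify_all();
        }

    private:
        std::mutex mutex_;
        std::condition_variable wake_;
        std::deque<CompletionEvent> pending_;
        bool stopping_ = false;
    };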

Step 5. Make Render Produce Ahead

Teach render/output code to keep the ready queue filled to target headroom.

Initial target:

  • render thread produces on demand until queue has target depth
  • callback does not synchronously wait for fresh render
  • stale/black fallback is explicit on underrun

Current implementation:

  • The backend output-completion worker fills RenderOutputQueue to VideoPlayoutPolicy::targetReadyFrames.
  • Scheduling now pops a ready frame from RenderOutputQueue instead of directly scheduling the freshly rendered frame.
  • If no ready frame can be produced, the worker schedules an explicit black fallback frame and reports degraded lifecycle state.
  • This is still demand-filled by the backend worker; a future pass can make render production more proactive or timer/pressure driven.
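
One worker iteration under these rules can be sketched as follows. Renderer, ReadyQueue, Device, and Telemetry are stand-ins for the real collaborators, not actual project interfaces; only the fill-to-target, pop, and black-fallback shape is taken from the description above.

    #include <cstdint>

    // Illustrative worker step: top up the ready queue, then schedule from it.
    template <typename Renderer, typename ReadyQueue, typename Device, typename Telemetry>
    void ProcessOneCompletion(Renderer& renderer, ReadyQueue& readyFrames,
                              Device& device, Telemetry& telemetry,
                              uint32_t targetReadyFrames, uint64_t scheduleIndex)
    {
        // Demand-fill: produce until the queue reaches the policy's target depth.
        while (readyFrames.Depth() < targetReadyFrames) {
            auto produced = renderer.TryProduceOutputFrame();   // render + readback
            if (!produced) break;                               // renderer cannot keep up right now
            readyFrames.Push(*produced);
        }

        if (auto ready = readyFrames.TryPop()) {
            device.Schedule(ready->deviceFrame, scheduleIndex);
        } else {
            device.Schedule(device.BlackFrame(), scheduleIndex); // explicit underrun fallback
            telemetry.ReportDegraded("playout underrun");
        }
    }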

Step 6. Replace Fixed Late/Drop Recovery

Replace fixed +2 schedule-index recovery with measured lag/headroom accounting.

Initial target:

  • track scheduled index, completed index, queue depth, late streak, drop streak
  • recovery decisions use measured lag

Current implementation:

  • VideoPlayoutRecoveryDecision reports completion result, completed index, scheduled index, ready queue depth, scheduled lead, measured lag, catch-up frames, late streak, and drop streak.
  • VideoPlayoutScheduler::AccountForCompletionResult(...) now accepts ready queue depth and returns a recovery decision.
  • Recovery is measured from late/drop streaks, scheduled lead, and ready queue pressure, then capped by VideoPlayoutPolicy::lateOrDropCatchUpFrames.
  • VideoBackend passes the current ready queue depth into the video device completion-accounting call.
  • VideoPlayoutSchedulerTests cover measured late recovery, measured drop recovery, policy caps, completed-index tracking, and streak clearing.
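
One way such a measured decision can be derived is sketched below. The inputs mirror what the scheduler tracks, and the cap uses the documented lateOrDropCatchUpFrames policy field, but the arithmetic, the targetScheduledLead parameter, and the names are illustrative rather than the actual VideoPlayoutScheduler logic.

    #include <algorithm>
    #include <cstdint>

    struct RecoveryInputs {
        uint64_t scheduledIndex = 0;   // last schedule time handed to the device
        uint64_t completedIndex = 0;   // index the device just reported complete
        uint32_t readyQueueDepth = 0;  // ready frames available right now
        uint32_t lateStreak = 0;
        uint32_t dropStreak = 0;
    };

    struct RecoveryDecision {
        int64_t  measuredLag = 0;      // how far scheduling has fallen behind the desired lead
        uint32_t catchUpFrames = 0;    // schedule indices to skip ahead
    };

    // Illustrative: recover only while a late/drop streak is active and the ready
    // queue is empty, scale with measured lag, and cap by policy instead of a fixed +2.
    inline RecoveryDecision DecideRecovery(const RecoveryInputs& in,
                                           uint32_t targetScheduledLead,   // assumption, not a documented field
                                           uint32_t lateOrDropCatchUpFrames)
    {
        RecoveryDecision out;
        const int64_t scheduledLead =
            static_cast<int64_t>(in.scheduledIndex) - static_cast<int64_t>(in.completedIndex);
        out.measuredLag =
            std::max<int64_t>(0, static_cast<int64_t>(targetScheduledLead) - scheduledLead);

        const bool struggling = (in.lateStreak > 0 || in.dropStreak > 0);
        if (struggling && out.measuredLag > 0 && in.readyQueueDepth == 0) {
            out.catchUpFrames = static_cast<uint32_t>(
                std::min<int64_t>(out.measuredLag, lateOrDropCatchUpFrames));
        }
        return out;
    }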

Step 7. Route Backend Health Structurally

Publish backend lifecycle, queue depth, underrun, late/drop, and degraded-state observations through HealthTelemetry.

Initial target:

  • backend lifecycle state is visible in health telemetry
  • ready queue depth, capacity, drops, and underruns are visible
  • late/drop streaks and scheduler recovery decisions are visible
  • runtime-state JSON exposes the backend playout health snapshot

Current implementation:

  • HealthTelemetry::BackendPlayoutSnapshot captures lifecycle state, completion result, ready queue metrics, scheduler indices, scheduled lead, measured lag, catch-up frames, late/drop streaks, aggregate late/drop/flushed counts, degraded state, and status message.
  • VideoBackend::RecordBackendPlayoutHealth(...) samples RenderOutputQueue metrics after each processed output completion and reports the latest scheduler recovery decision.
  • RuntimeStatePresenter publishes the snapshot as backendPlayout, including readyQueue and recovery sections.
  • HealthTelemetryTests cover backend playout health recording, try-record behavior, and inclusion in the full health snapshot.
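
For orientation, the snapshot fields listed above condense into roughly the following shape. This is an illustrative condensation only, not the real HealthTelemetry::BackendPlayoutSnapshot layout or its actual member names.

    #include <cstdint>
    #include <string>

    // Illustrative condensation of the fields described above.
    struct BackendPlayoutSnapshotSketch {
        std::string lifecycleState;        // e.g. "Running" or "Degraded"
        std::string lastCompletionResult;  // completed / late / dropped / flushed

        uint32_t readyQueueDepth = 0;
        uint32_t readyQueueCapacity = 0;
        uint64_t readyQueueDropped = 0;
        uint64_t readyQueueUnderruns = 0;

        uint64_t scheduledIndex = 0;
        uint64_t completedIndex = 0;
        int64_t  scheduledLead = 0;
        int64_t  measuredLag = 0;
        uint32_t catchUpFrames = 0;
        uint32_t lateStreak = 0;
        uint32_t dropStreak = 0;

        uint64_t totalLate = 0, totalDropped = 0, totalFlushed = 0;
        bool     degraded = false;
        std::string statusMessage;
    };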

Testing Strategy

Recommended tests:

  • allowed lifecycle transitions pass
  • invalid lifecycle transitions fail
  • playout policy derives frame pool/preroll sizes consistently
  • output queue preserves ordering
  • bounded output queue rejects/drops according to policy
  • underrun reuses last frame or black according to policy
  • late/drop accounting updates degraded state
  • scheduler catch-up uses measured lag, not fixed skip
  • stop drains/recycles device-frame ownership in pure fakes

Useful homes:

  • VideoPlayoutSchedulerTests for scheduler evolution
  • VideoIODeviceFakeTests for fake backend lifecycle
  • a new VideoBackendStateMachineTests
  • a new RenderOutputQueueTests
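
As an example of the hardware-free flavor these tests can take, here is a self-contained sketch using plain asserts and a tiny stand-in queue; the real tests exercise RenderOutputQueue and the actual scheduler types through whatever harness the project uses.

    #include <cassert>
    #include <deque>
    #include <optional>

    // Hardware-free stand-in for the ready-frame queue under test.
    struct TinyReadyQueue {
        size_t capacity = 1;
        std::deque<int> frames;
        int dropped = 0, underruns = 0;

        void Push(int frame) {
            if (frames.size() >= capacity) { frames.pop_front(); ++dropped; } // drop oldest
            frames.push_back(frame);
        }
        std::optional<int> TryPop() {
            if (frames.empty()) { ++underruns; return std::nullopt; }
            int f = frames.front(); frames.pop_front(); return f;
        }
    };

    int main() {
        TinyReadyQueue queue{ /*capacity=*/2 };

        // Bounded overflow keeps the newest completed frames.
        queue.Push(1); queue.Push(2); queue.Push(3);
        assert(queue.dropped == 1);
        assert(queue.TryPop().value() == 2);
        assert(queue.TryPop().value() == 3);

        // Popping an empty queue is counted as an underrun, not treated as an error.
        assert(!queue.TryPop().has_value());
        assert(queue.underruns == 1);
        return 0;
    }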

Risks

Latency Risk

More headroom means more latency. Phase 7 should make latency a visible policy choice.

Buffer Lifetime Risk

Render and backend will share ownership boundaries around output buffers. Frame ownership must be explicit to avoid reuse while hardware still owns a frame.

Underrun Policy Risk

Reusing stale frames can be visually acceptable, but wrong key/fill behavior may be worse than black. Test with real output.

Callback Thread Risk

Even after decoupling render, callback work must stay small and bounded.

Scope Risk

Backend lifecycle and playout queue are related, but either can grow large. Implement in small, testable slices.

Phase 7 Exit Criteria

Phase 7 can be considered complete once the project can say:

  • backend lifecycle states and transitions are explicit
  • playout policy owns preroll, pool size, headroom, and underrun behavior
  • output callbacks no longer synchronously wait for render production
  • render produces completed output frames into a bounded queue
  • underrun behavior is explicit and observable
  • late/drop recovery is measured rather than fixed skip-only
  • backend health reports lifecycle, queue, underrun, late, and dropped state
  • queue/lifecycle/scheduler behavior has non-DeckLink tests

Open Questions

  • What should the default ready-frame depth be at 30fps and 60fps?
  • Should underrun reuse last completed, last scheduled, or black?
  • Should output queue depth be user-configurable?
  • Should render cadence be driven by backend demand, a timer, or queue-fill pressure?
  • How should external keying influence stale-frame/black fallback?
  • Should input and output lifecycle states be separate endpoints under one backend shell?

Short Version

Phase 7 should stop making DeckLink callbacks wait for render.

Render produces ahead into a bounded queue. The backend consumes ready frames according to explicit lifecycle and playout policy. Queue depth, underruns, late frames, dropped frames, and degraded states become measured and visible.