# Phase 7 Design: Backend Lifecycle And Playout

This document expands Phase 7 of [ARCHITECTURE_RESILIENCE_REVIEW.md](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/docs/ARCHITECTURE_RESILIENCE_REVIEW.md) into a concrete design target.

Phase 4 made the render thread the sole owner of normal runtime GL work. Phase 7 Step 4 moved DeckLink completion processing onto a backend worker, so the callback no longer waits directly on render-thread output production. Phase 7 Step 5 added a bounded ready-frame queue inside that worker, so scheduling now consumes completed output frames and falls back explicitly on underrun. Phase 7 should make backend lifecycle, buffer policy, playout headroom, and recovery explicit.

Phase 5 clarified that live parameter layering stops at final render-state composition. Phase 7 should keep backend lifecycle, output queue ownership, buffer reuse, temporal/feedback resources, and stale-frame/underrun policy outside the persisted/committed/transient parameter model.

## Status

- Phase 7 design package: proposed.
- Phase 7 implementation: Step 6 complete.
- Current alignment: `VideoBackend`, `VideoIODevice`, `DeckLinkSession`, `VideoBackendLifecycle`, and `VideoPlayoutScheduler` exist. Phase 4 removed callback-thread GL ownership, Step 4 moved completion processing onto a backend worker, Step 5 uses `RenderOutputQueue` as the ready-frame handoff inside that worker, and Step 6 replaces fixed late/drop skip-ahead with measured recovery decisions.

Current backend footholds:

- `VideoBackend` wraps device discovery/configuration, start/stop, input callback handling, output completion handling, and telemetry publication.
- `DeckLinkSession` owns DeckLink device handles, frame pool creation, preroll, keyer configuration, and scheduled playback.
- `VideoPlayoutPolicy` names current frame pool, preroll, ready-frame, underrun, and catch-up policy defaults.
- `RenderOutputQueue` provides the bounded ready-output-frame handoff and has pure queue tests.
- `VideoPlayoutScheduler` owns schedule time generation, completion indexing, late/drop streaks, ready-queue pressure input, and measured recovery decisions.
- `OpenGLVideoIOBridge` is the current adapter between `VideoBackend` and `RenderEngine`.
- `HealthTelemetry` receives some signal, render, and pacing stats.

## Why Phase 7 Exists

The current output path works only while render/readback stays comfortably inside budget. A late render can make the callback late, which reduces device-side headroom, which makes the next callback more fragile. The resilience review calls this the main remaining live-resilience risk after Phase 4:

- output playout is still effectively filled on demand by a backend completion worker, though scheduling now consumes a bounded ready-frame queue
- buffer pool size and preroll depth are not sourced from one policy
- late/dropped recovery is a fixed skip rule
- backend lifecycle is imperative rather than represented as explicit states

Phase 7 should separate hardware timing from render production. The arithmetic sketch below shows how quickly sustained overruns consume fixed headroom.
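To make that cascade concrete, here is a back-of-envelope headroom calculation. Every number in it (60 fps output, three prerolled frames, a constant 5 ms render overrun) is an illustrative assumption chosen for the example, not a measured value from this project.

```cpp
// Illustrative headroom arithmetic for the cascade described above.
// All numbers are assumptions for the example, not project data.
#include <cstdio>

int main() {
    const double framePeriodMs = 1000.0 / 60.0;  // 60 fps output
    const int prerolledFrames = 3;               // device-side headroom at start
    const double overrunMsPerFrame = 5.0;        // render exceeds budget by this much

    // Each late render eats device-side headroom; once headroom is gone,
    // the completion callback itself starts running late.
    double headroomMs = prerolledFrames * framePeriodMs;  // ~50 ms
    int frames = 0;
    while (headroomMs > 0.0) {
        headroomMs -= overrunMsPerFrame;
        ++frames;
    }
    std::printf("~%d frames until device-side headroom is exhausted\n", frames);
    return 0;
}
```

Ten frames is only about a sixth of a second at 60 fps: modest but sustained overruns exhaust fixed preroll quickly, which is why Phase 7 treats headroom as an explicit policy choice rather than a side effect of preroll depth.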
## Goals

Phase 7 should establish:

- explicit backend lifecycle states and allowed transitions
- one playout policy for frame pool size, preroll, headroom, and underrun behavior
- a bounded producer/consumer output queue between render and DeckLink scheduling
- lightweight DeckLink callbacks that dequeue/schedule/account rather than render
- measured recovery from late/dropped frames
- structured backend health reporting
- tests for scheduler, queue, lifecycle, and underrun policy without DeckLink hardware

## Non-Goals

Phase 7 should not require:

- a new renderer
- changing shader/state composition
- changing committed-live or transient automation layering
- replacing DeckLink support with multiple backends
- full telemetry UI redesign
- removing every synchronous API immediately
- perfect adaptive latency policy in the first pass

## Target Timing Model

The target model is producer/consumer playout:

```text
RenderEngine/render scheduler produces completed output frames
  -> bounded ready-frame queue
  -> VideoBackend consumes ready frames
  -> DeckLink callback schedules already-prepared frames
```

The callback should not wait for rendering. It should:

- record completion result
- recycle/release completed buffers
- dequeue a ready frame or apply underrun policy
- schedule the next frame
- publish backend timing/health observations

The queue contains rendered output-frame ownership and scheduling metadata, not live parameter state. Parameter composition should already be resolved before an output frame enters this playout boundary. A sketch of this callback shape follows below.
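A minimal sketch of that callback shape, under stated assumptions: `IDeviceOutput`, `IReadyFrames`, `OnScheduledFrameCompleted`, and the stats fields are hypothetical names invented for this illustration. The real path runs through `VideoBackend`, `RenderOutputQueue`, and the DeckLink session, whose APIs are not shown here.

```cpp
#include <cstdint>
#include <optional>

struct OutputFrame {
    std::uint64_t frameIndex = 0;  // plus device buffer ownership in the real type
};

enum class CompletionResult { Completed, DisplayedLate, Dropped, Flushed };

// Stands in for the DeckLink-facing session; no real SDK calls are implied.
struct IDeviceOutput {
    virtual void Schedule(const OutputFrame& frame) = 0;
    virtual void Recycle(const OutputFrame& frame) = 0;
    virtual ~IDeviceOutput() = default;
};

// Stands in for the bounded ready-frame queue (RenderOutputQueue in this project).
struct IReadyFrames {
    virtual std::optional<OutputFrame> TryPop() = 0;
    virtual ~IReadyFrames() = default;
};

struct PlayoutStats {
    std::uint64_t completed = 0;
    std::uint64_t late = 0;
    std::uint64_t underruns = 0;
};

// Runs on (or is woken by) the completion callback. It never renders and
// never blocks on render: it consumes a prepared frame or falls back.
void OnScheduledFrameCompleted(CompletionResult result, OutputFrame finished,
                               IReadyFrames& ready, IDeviceOutput& device,
                               OutputFrame& lastGood, PlayoutStats& stats) {
    ++stats.completed;                       // record completion result
    if (result == CompletionResult::DisplayedLate) {
        ++stats.late;
    }
    device.Recycle(finished);                // return the buffer to the pool

    if (auto next = ready.TryPop()) {        // dequeue an already-rendered frame
        lastGood = *next;
        device.Schedule(*next);
    } else {
        ++stats.underruns;                   // explicit underrun policy:
        device.Schedule(lastGood);           // reuse the newest completed frame
    }
    // Timing/health observations would be published to HealthTelemetry here.
}
```

The underrun branch encodes the "reuse newest completed frame" default discussed later; swapping in a black-frame fallback or schedule-time catch-up would change only that branch.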
## Target Lifecycle Model

Suggested backend states:

1. `Uninitialized`
2. `Discovering`
3. `Discovered`
4. `Configuring`
5. `Configured`
6. `Prerolling`
7. `Running`
8. `Degraded`
9. `Stopping`
10. `Stopped`
11. `Failed`

Suggested transition rules:

- `Uninitialized -> Discovering`
- `Discovering -> Discovered | Failed`
- `Discovered -> Configuring | Stopped`
- `Configuring -> Configured | Failed`
- `Configured -> Prerolling | Stopped`
- `Prerolling -> Running | Failed | Stopping`
- `Running -> Degraded | Stopping | Failed`
- `Degraded -> Running | Stopping | Failed`
- `Stopping -> Stopped`

The exact enum can change, but the lifecycle should become observable and testable. One possible encoding is sketched below.
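The transition table above maps directly onto a pure helper. In this sketch, `BackendState` mirrors the suggested states and `IsAllowedTransition` is an illustrative name; the real `VideoBackendLifecycle` additionally tracks failure reasons and retry, which this sketch omits.

```cpp
#include <initializer_list>

// Suggested backend states, mirroring the transition table above.
enum class BackendState {
    Uninitialized, Discovering, Discovered, Configuring, Configured,
    Prerolling, Running, Degraded, Stopping, Stopped, Failed,
};

constexpr bool OneOf(BackendState state, std::initializer_list<BackendState> allowed) {
    for (BackendState candidate : allowed) {
        if (candidate == state) return true;
    }
    return false;
}

// Pure transition check: no DeckLink calls, no rendering, trivially testable.
constexpr bool IsAllowedTransition(BackendState from, BackendState to) {
    using S = BackendState;
    switch (from) {
        case S::Uninitialized: return to == S::Discovering;
        case S::Discovering:   return OneOf(to, {S::Discovered, S::Failed});
        case S::Discovered:    return OneOf(to, {S::Configuring, S::Stopped});
        case S::Configuring:   return OneOf(to, {S::Configured, S::Failed});
        case S::Configured:    return OneOf(to, {S::Prerolling, S::Stopped});
        case S::Prerolling:    return OneOf(to, {S::Running, S::Failed, S::Stopping});
        case S::Running:       return OneOf(to, {S::Degraded, S::Stopping, S::Failed});
        case S::Degraded:      return OneOf(to, {S::Running, S::Stopping, S::Failed});
        case S::Stopping:      return to == S::Stopped;
        case S::Stopped:                       // terminal in this sketch; the real
        case S::Failed:        return false;   // lifecycle also supports retry
    }
    return false;
}

static_assert(IsAllowedTransition(BackendState::Running, BackendState::Degraded), "");
static_assert(!IsAllowedTransition(BackendState::Stopping, BackendState::Running), "");
```

Because the helper is pure and `constexpr`, allowed and rejected transitions can be checked at compile time or in plain unit tests with no DeckLink dependency.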
## Proposed Collaborators

### `VideoBackendStateMachine`

Pure or mostly pure lifecycle transition helper.

Responsibilities:

- validate state transitions
- produce transition observations
- track failure reasons
- keep start/stop/recovery behavior auditable

Non-responsibilities:

- DeckLink API calls
- rendering
- persistence

### `PlayoutPolicy`

Policy object for queue and timing behavior.

Expected fields:

- target preroll frames
- maximum ready frames
- minimum spare device buffers
- underrun behavior
- maximum catch-up frames
- adaptive headroom enabled/disabled

### `RenderOutputQueue`

Bounded queue or ring for completed output frames.

Responsibilities:

- accept completed render outputs
- expose ready frames for scheduling
- track depth, drops, stale reuse, and underruns
- keep ownership/lifetime clear between render and backend

### `OutputFramePool`

Backend-owned device buffer pool.

Responsibilities:

- own DeckLink mutable frames
- expose available buffers for render/readback or scheduling
- recycle completed frames
- report spare-buffer depth

### `PlayoutController`

Coordinates policy, ready frames, device schedule times, and completion accounting.

Responsibilities:

- preroll frames
- schedule next frame
- handle late/drop/completed/flushed results
- apply underrun policy
- publish timing state

## Output Queue Policy

The initial output queue should be small and bounded. Candidate defaults:

- target ready frames: 2-3
- max ready frames: 3-5
- underrun: reuse last completed frame if available, otherwise black
- late/drop: increase degraded counters and optionally increase headroom within limits

The exact numbers should be measured, but the policy should live in one place instead of being split across constants.

## Underrun Policy

When no fresh rendered frame is available, the options are:

1. reuse newest completed frame
2. reuse last scheduled frame
3. schedule black/degraded frame
4. skip/catch up schedule time

Phase 7 should pick one default and make it visible in telemetry. Reusing the newest completed frame is often the best first policy for live visual continuity, but key/fill behavior may require careful testing.

## Migration Plan

### Step 1. Name Lifecycle States

Introduce a backend state enum and transition reporting without materially changing scheduling behavior.

Initial target:

- [x] state changes are explicit
- [x] invalid transitions are detectable
- [x] tests cover allowed transitions

Current implementation:

- `VideoBackendLifecycle` names backend states and validates allowed transitions.
- `VideoBackend` applies lifecycle transitions around discovery, configuration, start, stop, degradation, failure, and resource release.
- Existing `BackendStateChangedEvent` publication now uses lifecycle state names for backend lifecycle observations.
- `VideoBackendLifecycleTests` cover allowed transitions, rejected invalid transitions, failure reasons, retry, and stable state names.

### Step 2. Create Playout Policy Object

Unify fixed constants and scheduler assumptions.

Initial target:

- [x] frame pool size derives from policy
- [x] preroll count derives from policy
- [x] late/drop recovery reads policy

Current implementation:

- `VideoPlayoutPolicy` defines current output frame pool, preroll, ready-frame, spare-buffer, underrun, catch-up, and adaptive-headroom settings.
- `DeckLinkSession` uses the policy for output frame pool creation and preroll count.
- `VideoPlayoutScheduler` stores the policy and uses `lateOrDropCatchUpFrames` instead of a hard-coded `+2` recovery step.
- `VideoPlayoutSchedulerTests` cover default compatibility behavior, policy-driven catch-up, and policy normalization.

### Step 3. Add Ready Output Queue

Introduce a bounded queue for completed output frames.

Initial target:

- [x] pure queue tests
- [x] explicit depth/underrun metrics
- [x] no DeckLink dependency in queue tests

Current implementation:

- `RenderOutputQueue` owns a bounded FIFO of `RenderOutputFrame` values.
- The queue is configured from `VideoPlayoutPolicy::maxReadyFrames`.
- Queue metrics report depth, capacity, pushed, popped, dropped, and underrun counts.
- Overflow drops the oldest ready frame, preserving the newest completed output for scheduling.
- `RenderOutputQueueTests` cover ordering, bounded overflow, underrun counting, and capacity shrink behavior without DeckLink hardware.

A sketch of these queue semantics follows below.
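A pure sketch with those semantics, assuming a mutex-guarded `std::deque`: bounded depth, drop-oldest overflow, and counted underruns. Names and layout are illustrative; the real `RenderOutputQueue` takes its capacity from `VideoPlayoutPolicy::maxReadyFrames` and its API may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>

struct OutputFrame {
    std::uint64_t frameIndex = 0;  // buffer ownership lives here in the real type
};

// Illustrative bounded ready-frame handoff, shaped after the Step 3 notes.
class BoundedReadyQueue {
public:
    explicit BoundedReadyQueue(std::size_t capacity) : capacity_(capacity ? capacity : 1) {}

    void Push(OutputFrame frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.size() >= capacity_) {
            frames_.pop_front();  // overflow: drop the oldest, keep the newest
            ++dropped_;
        }
        frames_.push_back(frame);
        ++pushed_;
    }

    // A failed pop is an underrun; the caller applies the fallback policy.
    std::optional<OutputFrame> Pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.empty()) {
            ++underruns_;
            return std::nullopt;
        }
        OutputFrame frame = frames_.front();
        frames_.pop_front();
        ++popped_;
        return frame;
    }

    struct Metrics {
        std::size_t depth;
        std::size_t capacity;
        std::uint64_t pushed, popped, dropped, underruns;
    };

    Metrics Snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return {frames_.size(), capacity_, pushed_, popped_, dropped_, underruns_};
    }

private:
    mutable std::mutex mutex_;
    std::size_t capacity_;
    std::deque<OutputFrame> frames_;
    std::uint64_t pushed_ = 0, popped_ = 0, dropped_ = 0, underruns_ = 0;
};
```

Dropping the oldest frame on overflow biases the queue toward freshness: when render briefly outruns the device, scheduling keeps receiving the newest completed output instead of a growing backlog.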
### Step 4. Move Callback Toward Dequeue/Schedule

Stop producing frames directly in the completion callback path.

Transitional target:

- [x] callback wakes/schedules a backend worker
- [x] worker consumes ready frames

Final target:

- callback only records, recycles, dequeues, schedules

Current implementation:

- `VideoBackend::HandleOutputFrameCompletion(...)` now enqueues completion work and wakes an output-completion worker.
- The output-completion worker drains pending completions and runs the existing render/schedule path.
- This preserves behavior while removing the direct callback-thread wait on render-thread output production.
- Step 5 now makes this worker consume ready frames from `RenderOutputQueue`; Step 4 remains the boundary that keeps output completion callbacks from doing render production directly.

### Step 5. Make Render Produce Ahead

Teach render/output code to keep the ready queue filled to target headroom.

Initial target:

- [x] render thread produces on demand until queue has target depth
- [x] callback does not synchronously wait for fresh render
- [x] stale/black fallback is explicit on underrun

Current implementation:

- The backend output-completion worker fills `RenderOutputQueue` to `VideoPlayoutPolicy::targetReadyFrames`.
- Scheduling now pops a ready frame from `RenderOutputQueue` instead of directly scheduling the freshly rendered frame.
- If no ready frame can be produced, the worker schedules an explicit black fallback frame and reports degraded lifecycle state.
- This is still demand-filled by the backend worker; a future pass can make render production more proactive or timer/pressure driven.

### Step 6. Replace Fixed Late/Drop Recovery

Replace fixed `+2` schedule-index recovery with measured lag/headroom accounting.

Initial target:

- [x] track scheduled index, completed index, queue depth, late streak, drop streak
- [x] recovery decisions use measured lag

Current implementation:

- `VideoPlayoutRecoveryDecision` reports completion result, completed index, scheduled index, ready queue depth, scheduled lead, measured lag, catch-up frames, late streak, and drop streak.
- `VideoPlayoutScheduler::AccountForCompletionResult(...)` now accepts ready queue depth and returns a recovery decision.
- Recovery is measured from late/drop streaks, scheduled lead, and ready queue pressure, then capped by `VideoPlayoutPolicy::lateOrDropCatchUpFrames`.
- `VideoBackend` passes the current ready queue depth into the video device completion-accounting call.
- `VideoPlayoutSchedulerTests` cover measured late recovery, measured drop recovery, policy caps, completed-index tracking, and streak clearing.

### Step 7. Route Backend Health Structurally

Publish backend lifecycle, queue depth, underrun, late/drop, and degraded-state observations through `HealthTelemetry`.

## Testing Strategy

Recommended tests:

- allowed lifecycle transitions pass
- invalid lifecycle transitions fail
- playout policy derives frame pool/preroll sizes consistently
- output queue preserves ordering
- bounded output queue rejects/drops according to policy
- underrun reuses last frame or black according to policy
- late/drop accounting updates degraded state
- scheduler catch-up uses measured lag, not fixed skip
- stop drains/recycles device-frame ownership in pure fakes

Useful homes:

- `VideoPlayoutSchedulerTests` for scheduler evolution
- `VideoIODeviceFakeTests` for fake backend lifecycle
- `VideoBackendLifecycleTests` for lifecycle transitions (added in Step 1)
- `RenderOutputQueueTests` for queue behavior (added in Step 3)

## Risks

### Latency Risk

More headroom means more latency. Phase 7 should make latency a visible policy choice.

### Buffer Lifetime Risk

Render and backend will share ownership boundaries around output buffers. Frame ownership must be explicit to avoid reuse while hardware still owns a frame.

### Underrun Policy Risk

Reusing stale frames can be visually acceptable, but wrong key/fill behavior may be worse than black. Test with real output.

### Callback Thread Risk

Even after decoupling render, callback work must stay small and bounded.

### Scope Risk

Backend lifecycle and playout queue are related, but either can grow large. Implement in small, testable slices.

## Phase 7 Exit Criteria

Phase 7 can be considered complete once the project can say:

- [x] backend lifecycle states and transitions are explicit
- [x] playout policy owns preroll, pool size, headroom, and underrun behavior
- [x] output callbacks no longer synchronously wait for render production
- [x] render produces completed output frames into a bounded queue
- [x] underrun behavior is explicit and observable
- [x] late/drop recovery is measured rather than fixed skip-only
- [ ] backend health reports lifecycle, queue, underrun, late, and dropped state
- [ ] queue/lifecycle/scheduler behavior has non-DeckLink tests

## Open Questions

- What should the default ready-frame depth be at 30 fps and 60 fps?
- Should underrun reuse the last completed frame, the last scheduled frame, or black?
- Should output queue depth be user-configurable?
- Should render cadence be driven by backend demand, a timer, or queue-fill pressure?
- How should external keying influence stale-frame/black fallback?
- Should input and output lifecycle states be separate endpoints under one backend shell?

## Short Version

Phase 7 should stop making DeckLink callbacks wait for render. Render produces ahead into a bounded queue. The backend consumes ready frames according to explicit lifecycle and playout policy. Queue depth, underruns, late frames, dropped frames, and degraded states become measured and visible.