# Phase 7 Design: Backend Lifecycle And Playout
This document expands Phase 7 of [ARCHITECTURE_RESILIENCE_REVIEW.md](ARCHITECTURE_RESILIENCE_REVIEW.md) into a concrete design target.
Phase 4 made the render thread the sole owner of normal runtime GL work. Phase 7 Step 4 moved DeckLink completion processing onto a backend worker, so the callback no longer directly waits for render-thread output production. Phase 7 Step 5 added a bounded ready-frame queue inside that worker, so scheduling now consumes completed output frames and falls back explicitly on underrun. Phase 7 should make backend lifecycle, buffer policy, playout headroom, and recovery explicit.
Phase 5 clarified that live parameter layering stops at final render-state composition. Phase 7 should keep backend lifecycle, output queue ownership, buffer reuse, temporal/feedback resources, and stale-frame/underrun policy outside the persisted/committed/transient parameter model.
## Status
- Phase 7 design package: proposed.
- Phase 7 implementation: Step 6 complete.
- Current alignment: `VideoBackend`, `VideoIODevice`, `DeckLinkSession`, `VideoBackendLifecycle`, and `VideoPlayoutScheduler` exist. Phase 4 removed callback-thread GL ownership, Step 4 moved completion processing onto a backend worker, Step 5 uses `RenderOutputQueue` as the ready-frame handoff inside that worker, and Step 6 replaces fixed late/drop skip-ahead with measured recovery decisions.
Current backend footholds:
- `VideoBackend` wraps device discovery/configuration, start/stop, input callback handling, output completion handling, and telemetry publication.
- `DeckLinkSession` owns DeckLink device handles, frame pool creation, preroll, keyer configuration, and scheduled playback.
- `VideoPlayoutPolicy` names current frame pool, preroll, ready-frame, underrun, and catch-up policy defaults.
- `RenderOutputQueue` names the future bounded ready-output-frame handoff and has pure queue tests.
- `VideoPlayoutScheduler` owns schedule time generation, completion indexing, late/drop streaks, ready-queue pressure input, and measured recovery decisions.
- `OpenGLVideoIOBridge` is the current adapter between `VideoBackend` and `RenderEngine`.
- `HealthTelemetry` already receives a subset of signal, render, and pacing stats.
## Why Phase 7 Exists
The current output path works only while render/readback stays comfortably inside budget. A late render can make the callback late, which reduces device-side headroom, which makes the next callback more fragile.
The resilience review calls this the main remaining live-resilience risk after Phase 4:
- output playout is still effectively filled on demand by a backend completion worker, but scheduling now consumes a bounded ready-frame queue
- buffer pool size and preroll depth are not sourced from one policy
- late/dropped recovery is a fixed skip rule
- backend lifecycle is imperative rather than represented as explicit states
Phase 7 should separate hardware timing from render production.
## Goals
Phase 7 should establish:
- explicit backend lifecycle states and allowed transitions
- one playout policy for frame pool size, preroll, headroom, and underrun behavior
- a bounded producer/consumer output queue between render and DeckLink scheduling
- lightweight DeckLink callbacks that dequeue/schedule/account rather than render
- measured recovery from late/dropped frames
- structured backend health reporting
- tests for scheduler, queue, lifecycle, and underrun policy without DeckLink hardware
## Non-Goals
Phase 7 should not require:
- a new renderer
- changing shader/state composition
- changing committed-live or transient automation layering
- replacing DeckLink support with multiple backends
- full telemetry UI redesign
- removing every synchronous API immediately
- perfect adaptive latency policy in the first pass
## Target Timing Model
The target model is producer/consumer playout:
```text
RenderEngine/render scheduler produces completed output frames
-> bounded ready-frame queue
-> VideoBackend consumes ready frames
-> DeckLink callback schedules already-prepared frames
```
The callback should not wait for rendering. It should:
- record completion result
- recycle/release completed buffers
- dequeue a ready frame or apply underrun policy
- schedule the next frame
- publish backend timing/health observations
The queue contains rendered output-frame ownership and scheduling metadata, not live parameter state. Parameter composition should already be resolved before an output frame enters this playout boundary.
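The completion path above can be sketched as a small function. This is an illustrative, single-threaded sketch with hypothetical names (`ReadyFrame`, `FakeReadyQueue`, `OnFrameCompleted` are not codebase types); the real path also records completion results, recycles device buffers, and publishes telemetry:

```cpp
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical minimal types for illustration; the real codebase types
// (RenderOutputQueue, VideoPlayoutScheduler) carry more state.
struct ReadyFrame { uint64_t frameIndex; };

struct FakeReadyQueue {
    std::deque<ReadyFrame> frames;
    std::optional<ReadyFrame> Pop() {
        if (frames.empty()) return std::nullopt;
        ReadyFrame f = frames.front();
        frames.pop_front();
        return f;
    }
};

struct CallbackOutcome {
    bool scheduledFresh = false;  // a ready frame was dequeued and scheduled
    bool usedFallback = false;    // underrun policy supplied the frame
};

// Sketch of the lightweight completion path: record, recycle, dequeue or
// fall back, then schedule. No rendering happens on the callback thread.
CallbackOutcome OnFrameCompleted(FakeReadyQueue& readyQueue) {
    // recording the completion result and recycling the completed buffer
    // would happen here, before dequeuing
    if (std::optional<ReadyFrame> frame = readyQueue.Pop()) {
        // scheduler.ScheduleNext(*frame);
        return {true, false};
    }
    // underrun: reuse newest completed frame or schedule black, per policy
    return {false, true};
}
```

The point of the shape is that every branch is bounded: the callback either pops or falls back, and never blocks on render production.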
## Target Lifecycle Model
Suggested backend states:
1. `Uninitialized`
2. `Discovering`
3. `Discovered`
4. `Configuring`
5. `Configured`
6. `Prerolling`
7. `Running`
8. `Degraded`
9. `Stopping`
10. `Stopped`
11. `Failed`
Suggested transition rules:
- `Uninitialized -> Discovering`
- `Discovering -> Discovered | Failed`
- `Discovered -> Configuring | Stopped`
- `Configuring -> Configured | Failed`
- `Configured -> Prerolling | Stopped`
- `Prerolling -> Running | Failed | Stopping`
- `Running -> Degraded | Stopping | Failed`
- `Degraded -> Running | Stopping | Failed`
- `Stopping -> Stopped`
The exact enum can change, but the lifecycle should become observable and testable.
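The transition rules above are small enough to express as a pure table. The sketch below is illustrative, not the codebase's `VideoBackendLifecycle`; in particular it treats `Stopped` and `Failed` as terminal, while the real implementation also supports retry from failure:

```cpp
#include <cstdint>
#include <initializer_list>

// Hypothetical sketch of the transition table; the codebase's
// VideoBackendLifecycle is the authoritative version.
enum class BackendState : uint8_t {
    Uninitialized, Discovering, Discovered, Configuring, Configured,
    Prerolling, Running, Degraded, Stopping, Stopped, Failed
};

// Returns true when `to` is an allowed successor of `from`, mirroring the
// transition rules listed above.
bool IsAllowedTransition(BackendState from, BackendState to) {
    auto any = [to](std::initializer_list<BackendState> allowed) {
        for (BackendState s : allowed) if (s == to) return true;
        return false;
    };
    switch (from) {
        case BackendState::Uninitialized: return any({BackendState::Discovering});
        case BackendState::Discovering:   return any({BackendState::Discovered, BackendState::Failed});
        case BackendState::Discovered:    return any({BackendState::Configuring, BackendState::Stopped});
        case BackendState::Configuring:   return any({BackendState::Configured, BackendState::Failed});
        case BackendState::Configured:    return any({BackendState::Prerolling, BackendState::Stopped});
        case BackendState::Prerolling:    return any({BackendState::Running, BackendState::Failed, BackendState::Stopping});
        case BackendState::Running:       return any({BackendState::Degraded, BackendState::Stopping, BackendState::Failed});
        case BackendState::Degraded:      return any({BackendState::Running, BackendState::Stopping, BackendState::Failed});
        case BackendState::Stopping:      return any({BackendState::Stopped});
        default:                          return false;  // Stopped/Failed terminal in this sketch
    }
}
```

A pure function like this is what makes the lifecycle testable without DeckLink hardware: allowed and rejected transitions become table-driven assertions.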
## Proposed Collaborators
### `VideoBackendStateMachine`
Pure or mostly pure lifecycle transition helper.
Responsibilities:
- validate state transitions
- produce transition observations
- track failure reasons
- keep start/stop/recovery behavior auditable
Non-responsibilities:
- DeckLink API calls
- rendering
- persistence
### `PlayoutPolicy`
Policy object for queue and timing behavior.
Expected fields:
- target preroll frames
- maximum ready frames
- minimum spare device buffers
- underrun behavior
- maximum catch-up frames
- adaptive headroom enabled/disabled
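The field list above can be sketched as a plain value type. Names and defaults here are illustrative (the codebase's `VideoPlayoutPolicy` is authoritative); the defaults mirror the candidate ranges discussed under Output Queue Policy below:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch of the policy fields listed above.
enum class UnderrunBehavior { ReuseNewestCompleted, ReuseLastScheduled, ScheduleBlack };

struct PlayoutPolicy {
    uint32_t targetPrerollFrames = 3;
    uint32_t targetReadyFrames = 2;
    uint32_t maxReadyFrames = 4;
    uint32_t minSpareDeviceBuffers = 1;
    uint32_t lateOrDropCatchUpFrames = 2;
    bool adaptiveHeadroomEnabled = false;
    UnderrunBehavior underrun = UnderrunBehavior::ReuseNewestCompleted;

    // Normalization keeps dependent fields consistent: target ready depth
    // can never exceed queue capacity, and capacity is at least one.
    PlayoutPolicy Normalized() const {
        PlayoutPolicy p = *this;
        p.maxReadyFrames = std::max<uint32_t>(1, p.maxReadyFrames);
        p.targetReadyFrames = std::clamp<uint32_t>(p.targetReadyFrames, 1, p.maxReadyFrames);
        return p;
    }
};
```

Keeping normalization on the policy object itself means every consumer (pool creation, preroll, scheduler) reads the same already-consistent numbers.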
### `RenderOutputQueue`
Bounded queue or ring for completed output frames.
Responsibilities:
- accept completed render outputs
- expose ready frames for scheduling
- track depth, drops, stale reuse, and underruns
- keep ownership/lifetime clear between render and backend
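A minimal single-threaded sketch of the bounded queue follows, with the drop-oldest overflow behavior Step 3 describes. It is illustrative: the real `RenderOutputQueue` also carries buffer ownership and is accessed under the backend worker's synchronization:

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical stand-in for a completed render output.
struct OutputFrame { uint64_t frameIndex = 0; };

class BoundedReadyQueue {
public:
    explicit BoundedReadyQueue(size_t capacity) : capacity_(capacity ? capacity : 1) {}

    // Overflow drops the oldest ready frame so the newest completed output
    // stays available for scheduling.
    void Push(OutputFrame frame) {
        if (frames_.size() == capacity_) {
            frames_.pop_front();
            ++dropped_;
        }
        frames_.push_back(frame);
        ++pushed_;
    }

    // Empty pops are counted as underruns so the fallback path is visible
    // in metrics, not silent.
    std::optional<OutputFrame> Pop() {
        if (frames_.empty()) {
            ++underruns_;
            return std::nullopt;
        }
        OutputFrame f = frames_.front();
        frames_.pop_front();
        ++popped_;
        return f;
    }

    size_t Depth() const { return frames_.size(); }
    uint64_t Dropped() const { return dropped_; }
    uint64_t Underruns() const { return underruns_; }

private:
    size_t capacity_;
    std::deque<OutputFrame> frames_;
    uint64_t pushed_ = 0, popped_ = 0, dropped_ = 0, underruns_ = 0;
};
```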
### `OutputFramePool`
Backend-owned device buffer pool.
Responsibilities:
- own DeckLink mutable frames
- expose available buffers for render/readback or scheduling
- recycle completed frames
- report spare-buffer depth
### `PlayoutController`
Coordinates policy, ready frames, device schedule times, and completion accounting.
Responsibilities:
- preroll frames
- schedule next frame
- handle late/drop/completed/flushed results
- apply underrun policy
- publish timing state
## Output Queue Policy
The initial output queue should be small and bounded.
Candidate defaults:
- target ready frames: 2-3
- max ready frames: 3-5
- underrun: reuse last completed frame if available, otherwise black
- late/drop: increase degraded counters and optionally increase headroom within limits
The exact numbers should be measured, but the policy should live in one place instead of being split across constants.
## Underrun Policy
When no fresh rendered frame is available, options are:
1. reuse newest completed frame
2. reuse last scheduled frame
3. schedule black/degraded frame
4. skip/catch up schedule time
Phase 7 should pick one default and make it visible in telemetry. Reusing the newest completed frame is often the best first policy for live visual continuity, but key/fill behavior may require careful testing.
## Migration Plan
### Step 1. Name Lifecycle States
Introduce a backend state enum and transition reporting without materially changing scheduling behavior.
Initial target:
- [x] state changes are explicit
- [x] invalid transitions are detectable
- [x] tests cover allowed transitions
Current implementation:
- `VideoBackendLifecycle` names backend states and validates allowed transitions.
- `VideoBackend` applies lifecycle transitions around discovery, configuration, start, stop, degradation, failure, and resource release.
- Existing `BackendStateChangedEvent` publication now uses lifecycle state names for backend lifecycle observations.
- `VideoBackendLifecycleTests` cover allowed transitions, rejected invalid transitions, failure reasons, retry, and stable state names.
### Step 2. Create Playout Policy Object
Unify fixed constants and scheduler assumptions.
Initial target:
- [x] frame pool size derives from policy
- [x] preroll count derives from policy
- [x] late/drop recovery reads policy
Current implementation:
- `VideoPlayoutPolicy` defines current output frame pool, preroll, ready-frame, spare-buffer, underrun, catch-up, and adaptive-headroom settings.
- `DeckLinkSession` uses the policy for output frame pool creation and preroll count.
- `VideoPlayoutScheduler` stores the policy and uses `lateOrDropCatchUpFrames` instead of a hard-coded `+2` recovery step.
- `VideoPlayoutSchedulerTests` cover default compatibility behavior, policy-driven catch-up, and policy normalization.
### Step 3. Add Ready Output Queue
Introduce a bounded queue for completed output frames.
Initial target:
- [x] pure queue tests
- [x] explicit depth/underrun metrics
- [x] no DeckLink dependency in queue tests
Current implementation:
- `RenderOutputQueue` owns a bounded FIFO of `RenderOutputFrame` values.
- The queue is configured from `VideoPlayoutPolicy::maxReadyFrames`.
- Queue metrics report depth, capacity, pushed, popped, dropped, and underrun counts.
- Overflow drops the oldest ready frame, preserving the newest completed output for scheduling.
- `RenderOutputQueueTests` cover ordering, bounded overflow, underrun counting, and capacity shrink behavior without DeckLink hardware.
### Step 4. Move Callback Toward Dequeue/Schedule
Stop producing frames directly in the completion callback path.
Transitional target:
- [x] callback wakes/schedules a backend worker
- [x] worker consumes ready frames
Final target:
- callback only records, recycles, dequeues, schedules
Current implementation:
- `VideoBackend::HandleOutputFrameCompletion(...)` now enqueues completion work and wakes an output-completion worker.
- The output-completion worker drains pending completions and runs the existing render/schedule path.
- This preserves behavior while removing the direct callback-thread wait on render-thread output production.
- Step 5 now makes this worker consume ready frames from `RenderOutputQueue`; Step 4 remains the boundary that keeps output completion callbacks from doing render production directly.
### Step 5. Make Render Produce Ahead
Teach render/output code to keep the ready queue filled to target headroom.
Initial target:
- [x] render thread produces on demand until queue has target depth
- [x] callback does not synchronously wait for fresh render
- [x] stale/black fallback is explicit on underrun
Current implementation:
- The backend output-completion worker fills `RenderOutputQueue` to `VideoPlayoutPolicy::targetReadyFrames`.
- Scheduling now pops a ready frame from `RenderOutputQueue` instead of directly scheduling the freshly rendered frame.
- If no ready frame can be produced, the worker schedules an explicit black fallback frame and reports degraded lifecycle state.
- This is still demand-filled by the backend worker; a future pass can make render production more proactive or timer/pressure driven.
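The demand-fill loop the worker runs can be sketched as follows. `FillToTargetDepth` and `RenderedFrame` are hypothetical names; `renderFrame` stands in for the render/readback request, and the real worker hands ownership of pooled device buffers through `RenderOutputQueue`:

```cpp
#include <cstdint>
#include <deque>
#include <functional>
#include <optional>

// Hypothetical stand-in for a completed render output.
struct RenderedFrame { uint64_t frameIndex = 0; };

// Tops the ready queue up to the policy's target depth. Returns the number
// of frames produced; stops early if render fails so the caller can apply
// the explicit black fallback instead of blocking.
uint32_t FillToTargetDepth(std::deque<RenderedFrame>& readyQueue,
                           uint32_t targetReadyFrames,
                           const std::function<std::optional<RenderedFrame>()>& renderFrame) {
    uint32_t produced = 0;
    while (readyQueue.size() < targetReadyFrames) {
        std::optional<RenderedFrame> frame = renderFrame();
        if (!frame) break;  // underrun path: caller schedules black fallback
        readyQueue.push_back(*frame);
        ++produced;
    }
    return produced;
}
```

Making the fill loop a pure function of queue depth and a producer callback is also what would let a later pass drive it from a timer or queue-fill pressure instead of completion demand.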
### Step 6. Replace Fixed Late/Drop Recovery
Replace fixed `+2` schedule-index recovery with measured lag/headroom accounting.
Initial target:
- [x] track scheduled index, completed index, queue depth, late streak, drop streak
- [x] recovery decisions use measured lag
Current implementation:
- `VideoPlayoutRecoveryDecision` reports completion result, completed index, scheduled index, ready queue depth, scheduled lead, measured lag, catch-up frames, late streak, and drop streak.
- `VideoPlayoutScheduler::AccountForCompletionResult(...)` now accepts ready queue depth and returns a recovery decision.
- Recovery is measured from late/drop streaks, scheduled lead, and ready queue pressure, then capped by `VideoPlayoutPolicy::lateOrDropCatchUpFrames`.
- `VideoBackend` passes the current ready queue depth into the video device completion-accounting call.
- `VideoPlayoutSchedulerTests` cover measured late recovery, measured drop recovery, policy caps, completed-index tracking, and streak clearing.
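One plausible shape for the measured decision is sketched below. The formula is an assumption for illustration, not the codebase's exact rule: catch-up grows with the observed late/drop streak, shrinks when the ready queue still has depth to absorb lag, and is capped by policy, matching the inputs Step 6 lists:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical inputs mirroring the recovery-decision fields described
// above; field names are illustrative.
struct RecoveryInputs {
    uint32_t lateStreak = 0;
    uint32_t dropStreak = 0;
    uint32_t readyQueueDepth = 0;
    uint32_t lateOrDropCatchUpCap = 2;  // VideoPlayoutPolicy::lateOrDropCatchUpFrames
};

uint32_t MeasuredCatchUpFrames(const RecoveryInputs& in) {
    uint32_t streak = std::max(in.lateStreak, in.dropStreak);
    if (streak == 0) return 0;                      // on-time completion: no skip
    uint32_t lag = streak > in.readyQueueDepth      // queued frames absorb some lag
                       ? streak - in.readyQueueDepth
                       : 0;
    return std::min(lag, in.lateOrDropCatchUpCap);  // never exceed the policy cap
}
```

The contrast with the old fixed `+2` rule: a single late frame with a healthy queue produces no skip at all, while a sustained drop streak skips ahead only up to the policy cap.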
### Step 7. Route Backend Health Structurally
Publish backend lifecycle, queue depth, underrun, late/drop, and degraded-state observations through `HealthTelemetry`.
## Testing Strategy
Recommended tests:
- allowed lifecycle transitions pass
- invalid lifecycle transitions fail
- playout policy derives frame pool/preroll sizes consistently
- output queue preserves ordering
- bounded output queue rejects/drops according to policy
- underrun reuses last frame or black according to policy
- late/drop accounting updates degraded state
- scheduler catch-up uses measured lag, not fixed skip
- stop drains/recycles device-frame ownership in pure fakes
Useful homes:
- `VideoPlayoutSchedulerTests` for scheduler evolution
- `VideoIODeviceFakeTests` for fake backend lifecycle
- a new `VideoBackendStateMachineTests`
- a new `RenderOutputQueueTests`
## Risks
### Latency Risk
More headroom means more latency. Phase 7 should make latency a visible policy choice.
### Buffer Lifetime Risk
Render and backend will share ownership boundaries around output buffers. Frame ownership must be explicit to avoid reuse while hardware still owns a frame.
### Underrun Policy Risk
Reusing stale frames can be visually acceptable, but wrong key/fill behavior may be worse than black. Test with real output.
### Callback Thread Risk
Even after decoupling render, callback work must stay small and bounded.
### Scope Risk
Backend lifecycle and playout queue are related, but either can grow large. Implement in small, testable slices.
## Phase 7 Exit Criteria
Phase 7 can be considered complete once the project can say:
- [x] backend lifecycle states and transitions are explicit
- [x] playout policy owns preroll, pool size, headroom, and underrun behavior
- [x] output callbacks no longer synchronously wait for render production
- [x] render produces completed output frames into a bounded queue
- [x] underrun behavior is explicit and observable
- [x] late/drop recovery is measured rather than fixed skip-only
- [ ] backend health reports lifecycle, queue, underrun, late, and dropped state
- [ ] queue/lifecycle/scheduler behavior has non-DeckLink tests
## Open Questions
- What should the default ready-frame depth be at 30fps and 60fps?
- Should underrun reuse last completed, last scheduled, or black?
- Should output queue depth be user-configurable?
- Should render cadence be driven by backend demand, a timer, or queue-fill pressure?
- How should external keying influence stale-frame/black fallback?
- Should input and output lifecycle states be separate endpoints under one backend shell?
## Short Version
Phase 7 should stop making DeckLink callbacks wait for render.
Render produces ahead into a bounded queue. The backend consumes ready frames according to explicit lifecycle and playout policy. Queue depth, underruns, late frames, dropped frames, and degraded states become measured and visible.