video-shader-toys/docs/PHASE_7_BACKEND_LIFECYCLE_PLAYOUT_DESIGN.md
Aiden d332dceb5b
2026-05-11 19:25:29 +10:00


# Phase 7 Design: Backend Lifecycle And Playout
This document expands Phase 7 of [ARCHITECTURE_RESILIENCE_REVIEW.md](./ARCHITECTURE_RESILIENCE_REVIEW.md) into a concrete design target.
Phase 4 made the render thread the sole owner of normal runtime GL work, but output timing is still callback-coupled: DeckLink completion callbacks synchronously request render-thread output production before scheduling the next hardware frame. Phase 7 should make backend lifecycle, buffer policy, playout headroom, and recovery explicit.
Phase 5 clarified that live parameter layering stops at final render-state composition. Phase 7 should keep backend lifecycle, output queue ownership, buffer reuse, temporal/feedback resources, and stale-frame/underrun policy outside the persisted/committed/transient parameter model.
## Status
- Phase 7 design package: proposed.
- Phase 7 implementation: not started.
- Current alignment: `VideoBackend`, `VideoIODevice`, `DeckLinkSession`, and `VideoPlayoutScheduler` exist. Phase 4 removed callback-thread GL ownership, but the DeckLink completion path still waits for render-thread output production.
Current backend footholds:
- `VideoBackend` wraps device discovery/configuration, start/stop, input callback handling, output completion handling, and telemetry publication.
- `DeckLinkSession` owns DeckLink device handles, frame pool creation, preroll, keyer configuration, and scheduled playback.
- `VideoPlayoutScheduler` owns basic schedule time generation and simple late/drop skip-ahead behavior.
- `OpenGLVideoIOBridge` is the current adapter between `VideoBackend` and `RenderEngine`.
- `HealthTelemetry` receives a subset of signal, render, and pacing stats.
## Why Phase 7 Exists
The current output path works only while render/readback stays comfortably inside budget. A late render can make the callback late, which reduces device-side headroom, which makes the next callback more fragile.
The resilience review calls this the main remaining live-resilience risk after Phase 4:
- output playout is still effectively render-on-demand from the DeckLink completion callback
- buffer pool size and preroll depth are not sourced from one policy
- late/dropped recovery is a fixed skip rule
- backend lifecycle is imperative rather than represented as explicit states
Phase 7 should separate hardware timing from render production.
## Goals
Phase 7 should establish:
- explicit backend lifecycle states and allowed transitions
- one playout policy for frame pool size, preroll, headroom, and underrun behavior
- a bounded producer/consumer output queue between render and DeckLink scheduling
- lightweight DeckLink callbacks that dequeue/schedule/account rather than render
- measured recovery from late/dropped frames
- structured backend health reporting
- tests for scheduler, queue, lifecycle, and underrun policy without DeckLink hardware
## Non-Goals
Phase 7 should not require:
- a new renderer
- changing shader/state composition
- changing committed-live or transient automation layering
- replacing DeckLink support with multiple backends
- full telemetry UI redesign
- removing every synchronous API immediately
- perfect adaptive latency policy in the first pass
## Target Timing Model
The target model is producer/consumer playout:
```text
RenderEngine/render scheduler produces completed output frames
-> bounded ready-frame queue
-> VideoBackend consumes ready frames
-> DeckLink callback schedules already-prepared frames
```
The callback should not wait for rendering. It should:
- record completion result
- recycle/release completed buffers
- dequeue a ready frame or apply underrun policy
- schedule the next frame
- publish backend timing/health observations
The queue contains rendered output-frame ownership and scheduling metadata, not live parameter state. Parameter composition should already be resolved before an output frame enters this playout boundary.
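The intended callback shape can be sketched as a small, non-blocking function. This is illustrative only; `Backend`, `onFrameCompleted`, and the buffer-id representation are hypothetical stand-ins, not the project's actual API.

```cpp
#include <deque>
#include <optional>
#include <vector>

// Hypothetical sketch of a lightweight completion callback: it accounts,
// recycles, dequeues or falls back, and schedules -- it never renders.
struct Backend {
    std::deque<int> readyFrames;        // frames the render side completed
    std::vector<int> recycledBuffers;   // buffers returned to the pool
    std::optional<int> lastScheduled;   // kept for stale-frame reuse
    int underruns = 0;
    long long nextScheduleTime = 0;
    long long frameDuration = 1;

    // Returns the buffer to schedule next (-1 stands in for black).
    int onFrameCompleted(int completedBuffer) {
        recycledBuffers.push_back(completedBuffer);  // recycle/release
        int next;
        if (!readyFrames.empty()) {                  // dequeue a ready frame
            next = readyFrames.front();
            readyFrames.pop_front();
        } else {                                     // apply underrun policy
            ++underruns;
            next = lastScheduled.value_or(-1);       // stale reuse, else black
        }
        lastScheduled = next;
        nextScheduleTime += frameDuration;           // schedule the next frame
        return next;
    }
};
```

The key property is that every branch is bounded work: no GL calls, no waiting on the render thread.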
## Target Lifecycle Model
Suggested backend states:
1. `Uninitialized`
2. `Discovering`
3. `Discovered`
4. `Configuring`
5. `Configured`
6. `Prerolling`
7. `Running`
8. `Degraded`
9. `Stopping`
10. `Stopped`
11. `Failed`
Suggested transition rules:
- `Uninitialized -> Discovering`
- `Discovering -> Discovered | Failed`
- `Discovered -> Configuring | Stopped`
- `Configuring -> Configured | Failed`
- `Configured -> Prerolling | Stopped`
- `Prerolling -> Running | Failed | Stopping`
- `Running -> Degraded | Stopping | Failed`
- `Degraded -> Running | Stopping | Failed`
- `Stopping -> Stopped`
The exact enum can change, but the lifecycle should become observable and testable.
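The transition table above can be encoded as a pure function, which is what makes it testable without hardware. A minimal sketch, assuming the suggested enum (names may change when Phase 7 lands):

```cpp
// Hypothetical lifecycle enum matching the suggested states above.
enum class BackendState {
    Uninitialized, Discovering, Discovered, Configuring, Configured,
    Prerolling, Running, Degraded, Stopping, Stopped, Failed
};

// Pure transition check: no DeckLink calls, so it is trivially unit-testable.
bool transitionAllowed(BackendState from, BackendState to) {
    using S = BackendState;
    switch (from) {
        case S::Uninitialized: return to == S::Discovering;
        case S::Discovering:   return to == S::Discovered || to == S::Failed;
        case S::Discovered:    return to == S::Configuring || to == S::Stopped;
        case S::Configuring:   return to == S::Configured || to == S::Failed;
        case S::Configured:    return to == S::Prerolling || to == S::Stopped;
        case S::Prerolling:    return to == S::Running || to == S::Failed
                                   || to == S::Stopping;
        case S::Running:       return to == S::Degraded || to == S::Stopping
                                   || to == S::Failed;
        case S::Degraded:      return to == S::Running || to == S::Stopping
                                   || to == S::Failed;
        case S::Stopping:      return to == S::Stopped;
        default:               return false;  // Stopped/Failed are terminal here
    }
}
```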
## Proposed Collaborators
### `VideoBackendStateMachine`
Pure or mostly pure lifecycle transition helper.
Responsibilities:
- validate state transitions
- produce transition observations
- track failure reasons
- keep start/stop/recovery behavior auditable
Non-responsibilities:
- DeckLink API calls
- rendering
- persistence
### `PlayoutPolicy`
Policy object for queue and timing behavior.
Expected fields:
- target preroll frames
- maximum ready frames
- minimum spare device buffers
- underrun behavior
- maximum catch-up frames
- adaptive headroom enabled/disabled
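One possible shape for the policy object, with illustrative field names and placeholder defaults (the real numbers should come from measurement, per the Output Queue Policy section):

```cpp
// Hypothetical PlayoutPolicy sketch; defaults are placeholders, not tuned values.
enum class UnderrunBehavior { ReuseLastCompleted, ReuseLastScheduled, Black };

struct PlayoutPolicy {
    int targetPrerollFrames = 3;
    int maxReadyFrames = 4;
    int minSpareDeviceBuffers = 1;
    UnderrunBehavior underrun = UnderrunBehavior::ReuseLastCompleted;
    int maxCatchUpFrames = 2;
    bool adaptiveHeadroom = false;

    // One derivation point for the device frame pool size: enough buffers to
    // cover preroll, queued ready frames, and the spare margin.
    int framePoolSize() const {
        return targetPrerollFrames + maxReadyFrames + minSpareDeviceBuffers;
    }
};
```

Deriving the pool size here is the point of the object: pool depth, preroll, and headroom stop being independent constants that can drift apart.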
### `RenderOutputQueue`
Bounded queue or ring for completed output frames.
Responsibilities:
- accept completed render outputs
- expose ready frames for scheduling
- track depth, drops, stale reuse, and underruns
- keep ownership/lifetime clear between render and backend
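A bounded, non-blocking queue with drop/underrun accounting could look like the following. `FrameHandle` is a stand-in for whatever ownership token the render/backend boundary settles on; the class name mirrors the proposal but is otherwise a sketch.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>

using FrameHandle = int;  // placeholder for the real ownership token

class RenderOutputQueue {
public:
    explicit RenderOutputQueue(std::size_t maxDepth) : maxDepth_(maxDepth) {}

    // Producer (render thread): returns false and counts a drop when full,
    // so the render side never blocks the playout side.
    bool tryPush(FrameHandle f) {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.size() >= maxDepth_) { ++drops_; return false; }
        q_.push_back(f);
        return true;
    }

    // Consumer (backend worker/callback): never blocks; an empty result is
    // an underrun the caller's policy must handle.
    std::optional<FrameHandle> tryPop() {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) { ++underruns_; return std::nullopt; }
        FrameHandle f = q_.front();
        q_.pop_front();
        return f;
    }

    std::size_t depth() const { std::lock_guard<std::mutex> l(m_); return q_.size(); }
    int drops() const { std::lock_guard<std::mutex> l(m_); return drops_; }
    int underruns() const { std::lock_guard<std::mutex> l(m_); return underruns_; }

private:
    mutable std::mutex m_;
    std::deque<FrameHandle> q_;
    std::size_t maxDepth_;
    int drops_ = 0;
    int underruns_ = 0;
};
```

Because both sides are non-blocking, the queue's counters double as the telemetry signal: depth, drops, and underruns fall out of normal operation.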
### `OutputFramePool`
Backend-owned device buffer pool.
Responsibilities:
- own DeckLink mutable frames
- expose available buffers for render/readback or scheduling
- recycle completed frames
- report spare-buffer depth
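A minimal pool sketch, using integer indices in place of real `IDeckLinkMutableVideoFrame` handles; the class name mirrors the proposal but everything else here is illustrative:

```cpp
#include <optional>
#include <vector>

// Hypothetical pool sketch; real buffers would wrap DeckLink mutable frames.
class OutputFramePool {
public:
    explicit OutputFramePool(int size) : inUse_(size, false) {}

    // Hand out an available buffer for render/readback or scheduling.
    std::optional<int> acquire() {
        for (int i = 0; i < static_cast<int>(inUse_.size()); ++i)
            if (!inUse_[i]) { inUse_[i] = true; return i; }
        return std::nullopt;  // pool exhausted: caller must back off
    }

    // Completion accounting returns the buffer to the pool.
    void recycle(int i) { inUse_[i] = false; }

    // Spare-buffer depth, reported to telemetry as a headroom signal.
    int spareDepth() const {
        int spare = 0;
        for (bool used : inUse_) if (!used) ++spare;
        return spare;
    }

private:
    std::vector<bool> inUse_;
};
```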
### `PlayoutController`
Coordinates policy, ready frames, device schedule times, and completion accounting.
Responsibilities:
- preroll frames
- schedule next frame
- handle late/drop/completed/flushed results
- apply underrun policy
- publish timing state
## Output Queue Policy
The initial output queue should be small and bounded.
Candidate defaults:
- target ready frames: 2-3
- max ready frames: 3-5
- underrun: reuse last completed frame if available, otherwise black
- late/drop: increase degraded counters and optionally increase headroom within limits
The exact numbers should be measured, but the policy should live in one place instead of being split across constants.
## Underrun Policy
When no fresh rendered frame is available, options are:
1. reuse newest completed frame
2. reuse last scheduled frame
3. schedule black/degraded frame
4. skip/catch up schedule time
Phase 7 should pick one default and make it visible in telemetry. Reusing the newest completed frame is often the best first policy for live visual continuity, but key/fill behavior may require careful testing.
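Making the choice a named, pure function keeps it both visible and testable. A sketch, with hypothetical names (`FrameRef`, `underrunFallback`); an empty result means "schedule black":

```cpp
#include <optional>

enum class UnderrunBehavior { ReuseLastCompleted, ReuseLastScheduled, Black };

struct FrameRef { int id; };

// Pure underrun decision: which frame to schedule when nothing fresh is ready.
std::optional<FrameRef> underrunFallback(UnderrunBehavior policy,
                                         std::optional<FrameRef> lastCompleted,
                                         std::optional<FrameRef> lastScheduled) {
    switch (policy) {
        case UnderrunBehavior::ReuseLastCompleted: return lastCompleted;
        case UnderrunBehavior::ReuseLastScheduled: return lastScheduled;
        case UnderrunBehavior::Black:              return std::nullopt;
    }
    return std::nullopt;
}
```

Note that `ReuseLastCompleted` can still yield black early in a session, before any frame has completed, which is why the fallback chain matters.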
## Migration Plan
### Step 1. Name Lifecycle States
Introduce backend state enum and transition reporting without changing scheduling behavior much.
Initial target:
- state changes are explicit
- invalid transitions are detectable
- tests cover allowed transitions
### Step 2. Create Playout Policy Object
Unify fixed constants and scheduler assumptions.
Initial target:
- frame pool size derives from policy
- preroll count derives from policy
- late/drop recovery reads policy
### Step 3. Add Ready Output Queue
Introduce a bounded queue for completed output frames.
Initial target:
- pure queue tests
- explicit depth/underrun metrics
- no DeckLink dependency in queue tests
### Step 4. Move Callback Toward Dequeue/Schedule
Stop producing frames directly in the completion callback path.
Transitional target:
- callback wakes/schedules a backend worker
- worker consumes ready frames
Final target:
- callback only records, recycles, dequeues, schedules
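The transitional wake/consume split can be sketched with a condition variable. All names here are hypothetical; the incrementing counter stands in for the real dequeue-and-schedule work the worker would do.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Transitional-step sketch: the DeckLink callback only notifies; a backend
// worker thread does the actual ready-frame consumption off-callback.
class PlayoutWorker {
public:
    void notifyCompletion() {  // called from the completion callback
        { std::lock_guard<std::mutex> l(m_); ++pending_; }
        cv_.notify_one();
    }
    void stop() {
        { std::lock_guard<std::mutex> l(m_); stopping_ = true; }
        cv_.notify_one();
    }
    // Worker loop: wait for a completion tick, then consume/schedule.
    void run() {
        std::unique_lock<std::mutex> l(m_);
        while (true) {
            cv_.wait(l, [&] { return pending_ > 0 || stopping_; });
            if (pending_ == 0 && stopping_) return;  // drain before exiting
            --pending_;
            ++framesScheduled_;  // stand-in for dequeue + schedule work
        }
    }
    int framesScheduled() const {
        std::lock_guard<std::mutex> l(m_);
        return framesScheduled_;
    }

private:
    mutable std::mutex m_;
    std::condition_variable cv_;
    int pending_ = 0;
    bool stopping_ = false;
    int framesScheduled_ = 0;
};
```

The predicate-based `wait` handles spurious wakeups, and `stop()` drains pending completions before the worker exits, which previews the drain-on-stop behavior the tests should cover.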
### Step 5. Make Render Produce Ahead
Teach render/output code to keep the ready queue filled to target headroom.
Initial target:
- render thread produces on demand until queue has target depth
- callback does not synchronously wait for fresh render
- stale/black fallback is explicit on underrun
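The "produce until target depth" rule reduces to a small pure function. A sketch, with hypothetical names, bounded by the queue's maximum depth:

```cpp
// Queue-fill pacing sketch: how many frames the render side should produce
// this cycle to reach the policy's target headroom without exceeding bounds.
int framesToProduce(int currentDepth, int targetDepth, int maxDepth) {
    if (currentDepth >= targetDepth) return 0;  // enough headroom already
    int want = targetDepth - currentDepth;
    int room = maxDepth - currentDepth;         // never exceed the bound
    return want < room ? want : room;
}
```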
### Step 6. Replace Fixed Late/Drop Recovery
Replace fixed `+2` schedule-index recovery with measured lag/headroom accounting.
Initial target:
- track scheduled index, completed index, queue depth, late streak, drop streak
- recovery decisions use measured lag
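Measured catch-up can be expressed as: skip only as far as the observed lag requires, capped by policy. A sketch in frame-index units, with hypothetical names:

```cpp
#include <algorithm>

// Measured recovery sketch: derive the next schedule index from observed
// lag instead of a fixed +2 skip. All names are illustrative.
long long nextScheduleIndex(long long plannedIndex,
                            long long hardwareIndexNow,  // device clock, frames
                            int maxCatchUpFrames) {
    long long lag = hardwareIndexNow - plannedIndex;     // how far behind
    if (lag <= 0) return plannedIndex;                   // on time: no skip
    long long skip = std::min<long long>(lag, maxCatchUpFrames);
    return plannedIndex + skip;                          // bounded catch-up
}
```

An on-time stream never skips, a one-frame slip recovers by exactly one frame, and a large stall is capped so recovery stays smooth rather than jumping arbitrarily far ahead.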
### Step 7. Route Backend Health Structurally
Publish backend lifecycle, queue depth, underrun, late/drop, and degraded-state observations through `HealthTelemetry`.
## Testing Strategy
Recommended tests:
- allowed lifecycle transitions pass
- invalid lifecycle transitions fail
- playout policy derives frame pool/preroll sizes consistently
- output queue preserves ordering
- bounded output queue rejects/drops according to policy
- underrun reuses last frame or black according to policy
- late/drop accounting updates degraded state
- scheduler catch-up uses measured lag, not fixed skip
- stop drains/recycles device-frame ownership in pure fakes
Useful homes:
- `VideoPlayoutSchedulerTests` for scheduler evolution
- `VideoIODeviceFakeTests` for fake backend lifecycle
- a new `VideoBackendStateMachineTests`
- a new `RenderOutputQueueTests`
## Risks
### Latency Risk
More headroom means more latency. Phase 7 should make latency a visible policy choice.
### Buffer Lifetime Risk
Render and backend will share ownership boundaries around output buffers. Frame ownership must be explicit to avoid reuse while hardware still owns a frame.
### Underrun Policy Risk
Reusing stale frames can be visually acceptable, but wrong key/fill behavior may be worse than black. Test with real output.
### Callback Thread Risk
Even after decoupling render, callback work must stay small and bounded.
### Scope Risk
Backend lifecycle and playout queue are related, but either can grow large. Implement in small, testable slices.
## Phase 7 Exit Criteria
Phase 7 can be considered complete once the project can say:
- [ ] backend lifecycle states and transitions are explicit
- [ ] playout policy owns preroll, pool size, headroom, and underrun behavior
- [ ] output callbacks no longer synchronously wait for render production
- [ ] render produces completed output frames into a bounded queue
- [ ] underrun behavior is explicit and observable
- [ ] late/drop recovery is measured rather than fixed skip-only
- [ ] backend health reports lifecycle, queue, underrun, late, and dropped state
- [ ] queue/lifecycle/scheduler behavior has non-DeckLink tests
## Open Questions
- What should the default ready-frame depth be at 30fps and 60fps?
- Should underrun reuse last completed, last scheduled, or black?
- Should output queue depth be user-configurable?
- Should render cadence be driven by backend demand, a timer, or queue-fill pressure?
- How should external keying influence stale-frame/black fallback?
- Should input and output lifecycle states be separate endpoints under one backend shell?
## Short Version
Phase 7 should stop making DeckLink callbacks wait for render.
Render produces ahead into a bounded queue. The backend consumes ready frames according to explicit lifecycle and playout policy. Queue depth, underruns, late frames, dropped frames, and degraded states become measured and visible.