# Phase 7 Design: Backend Lifecycle And Playout

This document expands Phase 7 of [ARCHITECTURE_RESILIENCE_REVIEW.md](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/docs/ARCHITECTURE_RESILIENCE_REVIEW.md) into a concrete design target.

Phase 4 made the render thread the sole owner of normal runtime GL work, but output timing is still callback-coupled: DeckLink completion callbacks synchronously request render-thread output production before scheduling the next hardware frame. Phase 7 should make backend lifecycle, buffer policy, playout headroom, and recovery explicit.

Phase 5 clarified that live parameter layering stops at final render-state composition. Phase 7 should keep backend lifecycle, output queue ownership, buffer reuse, temporal/feedback resources, and stale-frame/underrun policy outside the persisted/committed/transient parameter model.

## Status

- Phase 7 design package: proposed.
- Phase 7 implementation: not started.
- Current alignment: `VideoBackend`, `VideoIODevice`, `DeckLinkSession`, and `VideoPlayoutScheduler` exist. Phase 4 removed callback-thread GL ownership, but the DeckLink completion path still waits for render-thread output production.

Current backend footholds:

- `VideoBackend` wraps device discovery/configuration, start/stop, input callback handling, output completion handling, and telemetry publication.
- `DeckLinkSession` owns DeckLink device handles, frame pool creation, preroll, keyer configuration, and scheduled playback.
- `VideoPlayoutScheduler` owns basic schedule time generation and simple late/drop skip-ahead behavior.
- `OpenGLVideoIOBridge` is the current adapter between `VideoBackend` and `RenderEngine`.
- `HealthTelemetry` receives some signal, render, and pacing stats.

## Why Phase 7 Exists

The current output path works only while render/readback stays comfortably inside budget. A late render makes the callback late, which reduces device-side headroom, which in turn makes the next callback more likely to run late.

The resilience review calls this the main remaining live-resilience risk after Phase 4:

- output playout is still effectively render-on-demand from the DeckLink completion callback
- buffer pool size and preroll depth are not sourced from one policy
- late/dropped recovery is a fixed skip rule
- backend lifecycle is imperative rather than represented as explicit states

Phase 7 should separate hardware timing from render production.

## Goals

Phase 7 should establish:

- explicit backend lifecycle states and allowed transitions
- one playout policy for frame pool size, preroll, headroom, and underrun behavior
- a bounded producer/consumer output queue between render and DeckLink scheduling
- lightweight DeckLink callbacks that dequeue/schedule/account rather than render
- measured recovery from late/dropped frames
- structured backend health reporting
- tests for scheduler, queue, lifecycle, and underrun policy without DeckLink hardware

## Non-Goals

Phase 7 should not require:

- a new renderer
- changing shader/state composition
- changing committed-live or transient automation layering
- replacing DeckLink support with multiple backends
- full telemetry UI redesign
- removing every synchronous API immediately
- perfect adaptive latency policy in the first pass

## Target Timing Model

The target model is producer/consumer playout:

```text
RenderEngine/render scheduler produces completed output frames
  -> bounded ready-frame queue
  -> VideoBackend consumes ready frames
  -> DeckLink callback schedules already-prepared frames
```

The callback should not wait for rendering. It should:

- record completion result
- recycle/release completed buffers
- dequeue a ready frame or apply underrun policy
- schedule the next frame
- publish backend timing/health observations

The queue contains rendered output-frame ownership and scheduling metadata, not live parameter state. Parameter composition should already be resolved before an output frame enters this playout boundary.

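
The callback steps listed above can be sketched without any DeckLink types. The following is a minimal, hardware-free illustration, not the project's implementation: `CompletionCallbackSketch`, `CompletionResult`, `CallbackStats`, and `kBlackFrameId` are hypothetical names, frames are reduced to integer ids, and buffer recycling and thread synchronization are omitted. It assumes that flushed completions arrive while stopping and therefore schedule nothing.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical stand-ins for illustration; none of these names exist in the codebase.
enum class CompletionResult { Completed, Late, Dropped, Flushed };

constexpr std::uint64_t kBlackFrameId = 0;  // assumed sentinel meaning "black frame"

struct CallbackStats {
    std::uint64_t completed = 0;
    std::uint64_t late = 0;
    std::uint64_t dropped = 0;
    std::uint64_t flushed = 0;
    std::uint64_t underruns = 0;
    std::uint64_t scheduled = 0;
};

class CompletionCallbackSketch {
public:
    // Render side: hand over an already-rendered frame id (must be nonzero).
    void PushReady(std::uint64_t frameId) {
        assert(frameId != kBlackFrameId);
        ready_.push_back(frameId);
    }

    // Completion path: record, dequeue or fall back, schedule. It never renders.
    // Returns the frame id that would be handed to the device.
    std::uint64_t OnFrameCompleted(CompletionResult result) {
        switch (result) {
        case CompletionResult::Completed: ++stats_.completed; break;
        case CompletionResult::Late:      ++stats_.late;      break;
        case CompletionResult::Dropped:   ++stats_.dropped;   break;
        case CompletionResult::Flushed:
            ++stats_.flushed;
            return kBlackFrameId;  // assumption: stopping, so nothing is rescheduled
        }
        // Underrun fallback: repeat the last scheduled frame (black at startup).
        std::uint64_t next = lastScheduled_;
        if (!ready_.empty()) {
            next = ready_.front();
            ready_.pop_front();
        } else {
            ++stats_.underruns;
        }
        lastScheduled_ = next;
        ++stats_.scheduled;
        return next;
    }

    const CallbackStats& Stats() const { return stats_; }

private:
    std::deque<std::uint64_t> ready_;
    std::uint64_t lastScheduled_ = kBlackFrameId;
    CallbackStats stats_;
};
```

The point of the sketch is that every branch is bounded: the callback either pops a prepared frame or applies a fallback, and it never blocks on the render thread.
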
## Target Lifecycle Model

Suggested backend states:

1. `Uninitialized`
2. `Discovering`
3. `Discovered`
4. `Configuring`
5. `Configured`
6. `Prerolling`
7. `Running`
8. `Degraded`
9. `Stopping`
10. `Stopped`
11. `Failed`

Suggested transition rules:

- `Uninitialized -> Discovering`
- `Discovering -> Discovered | Failed`
- `Discovered -> Configuring | Stopped`
- `Configuring -> Configured | Failed`
- `Configured -> Prerolling | Stopped`
- `Prerolling -> Running | Failed | Stopping`
- `Running -> Degraded | Stopping | Failed`
- `Degraded -> Running | Stopping | Failed`
- `Stopping -> Stopped`

The exact enum can change, but the lifecycle should become observable and testable.

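
One way to make the suggested states and transition rules testable is a pure transition table. The sketch below encodes exactly the rules listed above; `BackendState`, `IsTransitionAllowed`, and `VideoBackendStateMachineSketch` are illustrative names, and treating `Stopped` and `Failed` as terminal is an assumption, since the list gives them no outgoing transitions.

```cpp
#include <cassert>
#include <cstdint>

enum class BackendState : std::uint8_t {
    Uninitialized, Discovering, Discovered, Configuring, Configured,
    Prerolling, Running, Degraded, Stopping, Stopped, Failed
};

// Mirrors the suggested transition rules. Stopped and Failed have no listed
// outgoing transitions, so this sketch treats them as terminal (an assumption).
constexpr bool IsTransitionAllowed(BackendState from, BackendState to) {
    using S = BackendState;
    switch (from) {
    case S::Uninitialized: return to == S::Discovering;
    case S::Discovering:   return to == S::Discovered || to == S::Failed;
    case S::Discovered:    return to == S::Configuring || to == S::Stopped;
    case S::Configuring:   return to == S::Configured || to == S::Failed;
    case S::Configured:    return to == S::Prerolling || to == S::Stopped;
    case S::Prerolling:    return to == S::Running || to == S::Failed || to == S::Stopping;
    case S::Running:       return to == S::Degraded || to == S::Stopping || to == S::Failed;
    case S::Degraded:      return to == S::Running || to == S::Stopping || to == S::Failed;
    case S::Stopping:      return to == S::Stopped;
    case S::Stopped:
    case S::Failed:        return false;
    }
    return false;
}

class VideoBackendStateMachineSketch {
public:
    BackendState State() const { return state_; }
    std::uint32_t RejectedTransitions() const { return rejected_; }

    // Applies the transition if allowed; otherwise counts it and keeps the state.
    bool TransitionTo(BackendState next) {
        if (!IsTransitionAllowed(state_, next)) {
            ++rejected_;
            return false;
        }
        state_ = next;
        return true;
    }

private:
    BackendState state_ = BackendState::Uninitialized;
    std::uint32_t rejected_ = 0;
};
```

Because the table is `constexpr` and has no device dependency, both allowed and invalid transitions can be checked in unit tests without DeckLink hardware.
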
## Proposed Collaborators

### `VideoBackendStateMachine`

Pure or mostly pure lifecycle transition helper.

Responsibilities:

- validate state transitions
- produce transition observations
- track failure reasons
- keep start/stop/recovery behavior auditable

Non-responsibilities:

- DeckLink API calls
- rendering
- persistence

### `PlayoutPolicy`

Policy object for queue and timing behavior.

Expected fields:

- target preroll frames
- maximum ready frames
- minimum spare device buffers
- underrun behavior
- maximum catch-up frames
- adaptive headroom enabled/disabled

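
The expected fields map naturally onto a plain struct. The sketch below is one possible shape, not a settled API: the field names, the default values, the `UnderrunBehavior` enumerators, and in particular the pool-size formula (preroll plus ready queue plus spares) are assumptions chosen for illustration and should be replaced by measured values.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical enumerators; the document leaves the final set open.
enum class UnderrunBehavior { ReuseNewestCompleted, ReuseLastScheduled, Black, SkipAhead };

struct PlayoutPolicy {
    std::size_t targetPrerollFrames = 3;    // placeholder default
    std::size_t maxReadyFrames = 4;         // placeholder default
    std::size_t minSpareDeviceBuffers = 1;  // placeholder default
    UnderrunBehavior underrun = UnderrunBehavior::ReuseNewestCompleted;
    std::size_t maxCatchUpFrames = 2;       // placeholder default
    bool adaptiveHeadroom = false;
};

// A policy with no preroll or no queue capacity cannot sustain playout.
constexpr bool IsValid(const PlayoutPolicy& p) {
    return p.targetPrerollFrames > 0 && p.maxReadyFrames > 0;
}

// Assumed sizing rule: frames held by the device after preroll, plus frames
// waiting in the ready queue, plus spares for the render/readback in flight.
constexpr std::size_t RequiredPoolSize(const PlayoutPolicy& p) {
    return p.targetPrerollFrames + p.maxReadyFrames + p.minSpareDeviceBuffers;
}
```

Deriving the pool size from the same struct that supplies the preroll count is what keeps the two from drifting apart, whatever the final formula turns out to be.
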
### `RenderOutputQueue`

Bounded queue or ring for completed output frames.

Responsibilities:

- accept completed render outputs
- expose ready frames for scheduling
- track depth, drops, stale reuse, and underruns
- keep ownership/lifetime clear between render and backend

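
A bounded queue with explicit counters could look like the sketch below. It is an illustration under stated assumptions: `OutputFrame` carries only a frame index, overflow drops the oldest frame (the document does not fix whether overflow should drop or reject), and a mutex is used for simplicity even though a lock-free ring may suit the callback thread better.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>

struct OutputFrame {
    std::uint64_t frameIndex = 0;  // stand-in for buffer ownership + schedule metadata
};

struct QueueStats {
    std::uint64_t pushed = 0;
    std::uint64_t droppedOldest = 0;
    std::uint64_t underruns = 0;
};

class RenderOutputQueueSketch {
public:
    explicit RenderOutputQueueSketch(std::size_t capacity)
        : capacity_(capacity == 0 ? 1 : capacity) {}

    // Producer side. Assumed overflow rule: evict the oldest ready frame.
    void Push(OutputFrame frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.size() == capacity_) {
            frames_.pop_front();
            ++stats_.droppedOldest;
        }
        frames_.push_back(frame);
        ++stats_.pushed;
    }

    // Consumer side. Returns false and counts an underrun when empty.
    bool TryPop(OutputFrame& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.empty()) {
            ++stats_.underruns;
            return false;
        }
        out = frames_.front();
        frames_.pop_front();
        return true;
    }

    std::size_t Depth() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return frames_.size();
    }

    QueueStats Stats() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return stats_;
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::deque<OutputFrame> frames_;
    QueueStats stats_;
};
```

Nothing here depends on DeckLink, so ordering, bounding, and underrun counting can all be covered by pure tests.
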
### `OutputFramePool`

Backend-owned device buffer pool.

Responsibilities:

- own DeckLink mutable frames
- expose available buffers for render/readback or scheduling
- recycle completed frames
- report spare-buffer depth

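
Explicit ownership tracking is the part of the pool that can be prototyped without real device frames. The sketch below models each slot as being held by exactly one owner at a time; `OutputFramePoolSketch`, `SlotOwner`, and the three-owner model are assumptions for illustration, and real DeckLink frame objects are deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed ownership model: a slot is free, held by render/readback, or held by the device.
enum class SlotOwner { Free, Render, Device };

class OutputFramePoolSketch {
public:
    explicit OutputFramePoolSketch(std::size_t size) : owners_(size, SlotOwner::Free) {}

    // Returns a slot index for render/readback, or -1 when no slot is free.
    int AcquireForRender() {
        for (std::size_t i = 0; i < owners_.size(); ++i) {
            if (owners_[i] == SlotOwner::Free) {
                owners_[i] = SlotOwner::Render;
                return static_cast<int>(i);
            }
        }
        return -1;
    }

    // Render -> Device when the frame is scheduled.
    bool HandToDevice(int slot) { return Move(slot, SlotOwner::Render, SlotOwner::Device); }

    // Device -> Free when the completion callback reports the frame.
    bool RecycleCompleted(int slot) { return Move(slot, SlotOwner::Device, SlotOwner::Free); }

    std::size_t SpareCount() const {
        return static_cast<std::size_t>(
            std::count(owners_.begin(), owners_.end(), SlotOwner::Free));
    }

private:
    // Rejects any hand-off that does not start from the expected owner.
    bool Move(int slot, SlotOwner from, SlotOwner to) {
        if (slot < 0 || static_cast<std::size_t>(slot) >= owners_.size()) return false;
        if (owners_[static_cast<std::size_t>(slot)] != from) return false;
        owners_[static_cast<std::size_t>(slot)] = to;
        return true;
    }

    std::vector<SlotOwner> owners_;
};
```

Rejecting out-of-order hand-offs makes reuse of a frame the hardware still owns a detectable error rather than a silent one.
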
### `PlayoutController`

Coordinates policy, ready frames, device schedule times, and completion accounting.

Responsibilities:

- preroll frames
- schedule next frame
- handle late/drop/completed/flushed results
- apply underrun policy
- publish timing state

## Output Queue Policy

The initial output queue should be small and bounded.

Candidate defaults:

- target ready frames: 2-3
- max ready frames: 3-5
- underrun: reuse last completed frame if available, otherwise black
- late/drop: increase degraded counters and optionally increase headroom within limits

The exact numbers should be measured, but the policy should live in one place instead of being split across constants.

## Underrun Policy

When no fresh rendered frame is available, options are:

1. reuse newest completed frame
2. reuse last scheduled frame
3. schedule black/degraded frame
4. skip/catch up schedule time

Phase 7 should pick one default and make it visible in telemetry. Reusing the newest completed frame is often the best first policy for live visual continuity, but its effect on key/fill output needs careful testing.

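
The four options can be expressed as a pure decision function, which keeps the chosen default easy to test and to report. The sketch below is illustrative only: the enum and struct names are hypothetical, and falling back to black when the preferred frame is not retained is an assumption borrowed from the candidate default in the Output Queue Policy section.

```cpp
#include <cassert>

enum class UnderrunBehavior { ReuseNewestCompleted, ReuseLastScheduled, Black, SkipAhead };

enum class ScheduleAction {
    ScheduleFresh, RepeatNewestCompleted, RepeatLastScheduled, ScheduleBlack, SkipAhead
};

// What the backend still holds at the moment a frame must be scheduled.
struct FrameAvailability {
    bool freshReady = false;
    bool newestCompletedRetained = false;
    bool lastScheduledRetained = false;
};

inline ScheduleAction ChooseScheduleAction(UnderrunBehavior policy, const FrameAvailability& a) {
    if (a.freshReady) return ScheduleAction::ScheduleFresh;  // not an underrun
    switch (policy) {
    case UnderrunBehavior::ReuseNewestCompleted:
        return a.newestCompletedRetained ? ScheduleAction::RepeatNewestCompleted
                                         : ScheduleAction::ScheduleBlack;
    case UnderrunBehavior::ReuseLastScheduled:
        return a.lastScheduledRetained ? ScheduleAction::RepeatLastScheduled
                                       : ScheduleAction::ScheduleBlack;
    case UnderrunBehavior::Black:
        return ScheduleAction::ScheduleBlack;
    case UnderrunBehavior::SkipAhead:
        return ScheduleAction::SkipAhead;
    }
    return ScheduleAction::ScheduleBlack;
}
```

Returning an action rather than performing it lets telemetry record which fallback was taken on each underrun.
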
## Migration Plan

### Step 1. Name Lifecycle States

Introduce a backend state enum and transition reporting with minimal change to scheduling behavior.

Initial target:

- state changes are explicit
- invalid transitions are detectable
- tests cover allowed transitions

### Step 2. Create Playout Policy Object

Unify fixed constants and scheduler assumptions.

Initial target:

- frame pool size derives from policy
- preroll count derives from policy
- late/drop recovery reads policy

### Step 3. Add Ready Output Queue

Introduce a bounded queue for completed output frames.

Initial target:

- pure queue tests
- explicit depth/underrun metrics
- no DeckLink dependency in queue tests

### Step 4. Move Callback Toward Dequeue/Schedule

Stop producing frames directly in the completion callback path.

Transitional target:

- callback wakes/schedules a backend worker
- worker consumes ready frames

Final target:

- callback only records, recycles, dequeues, schedules

### Step 5. Make Render Produce Ahead

Teach render/output code to keep the ready queue filled to target headroom.

Initial target:

- render thread produces on demand until queue has target depth
- callback does not synchronously wait for fresh render
- stale/black fallback is explicit on underrun

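
The produce-ahead behavior reduces to a bounded fill loop. The sketch below is a minimal illustration: `FillToTarget` and its parameters are hypothetical, the render step is abstracted as a callable that reports success, and the per-pass cap is an assumption meant to keep the render thread responsive to its other work.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Produces frames until the queue would reach targetDepth, the per-pass cap is
// hit, or a render attempt fails. Returns the number of frames produced.
inline std::size_t FillToTarget(std::size_t currentDepth,
                                std::size_t targetDepth,
                                std::size_t maxPerPass,
                                const std::function<bool()>& renderOne) {
    std::size_t produced = 0;
    while (currentDepth + produced < targetDepth && produced < maxPerPass) {
        if (!renderOne()) break;  // e.g. no free buffer; leave fallback to the underrun policy
        ++produced;
    }
    return produced;
}
```

A render-thread wakeup would call this with the current queue depth and the policy's target, so the callback side only ever observes depth and never requests a render.
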
### Step 6. Replace Fixed Late/Drop Recovery

Replace fixed `+2` schedule-index recovery with measured lag/headroom accounting.

Initial target:

- track scheduled index, completed index, queue depth, late streak, drop streak
- recovery decisions use measured lag

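
One possible reading of "measured lag" is the distance between the device clock and the next schedule time, expressed in whole frames. The sketch below assumes that interpretation; `CatchUpFrames` is a hypothetical helper, all times are in the same arbitrary tick unit, and rounding the lag up to the next whole frame is a design choice rather than something the document specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Returns how many schedule indices to skip so the next scheduled frame lands
// at or after the current device time, capped by the policy's catch-up limit.
inline std::uint64_t CatchUpFrames(std::int64_t deviceTime,
                                   std::int64_t nextScheduleTime,
                                   std::int64_t frameDuration,
                                   std::uint64_t maxCatchUpFrames) {
    if (frameDuration <= 0 || deviceTime <= nextScheduleTime) return 0;  // not behind
    const std::int64_t behind = deviceTime - nextScheduleTime;
    // Round up: being even one tick behind costs a whole frame slot.
    const std::uint64_t lagFrames =
        static_cast<std::uint64_t>((behind + frameDuration - 1) / frameDuration);
    return std::min(lagFrames, maxCatchUpFrames);
}
```

With a frame duration of 1000 ticks, being 2500 ticks behind yields a skip of 3, or 2 if the policy caps catch-up at two frames. Unlike a fixed `+2`, the skip is zero when the schedule is not actually behind.
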
### Step 7. Route Backend Health Structurally

Publish backend lifecycle, queue depth, underrun, late/drop, and degraded-state observations through `HealthTelemetry`.

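
A structured observation could be as simple as a snapshot struct filled by a small accumulator. The sketch below is hypothetical throughout: `BackendHealthSnapshot` and `BackendHealthAccumulator` are invented names, the way such a snapshot would reach `HealthTelemetry` is not shown, and the rule that a streak of consecutive late or dropped completions marks the backend degraded is an assumption.

```cpp
#include <cassert>
#include <cstdint>

struct BackendHealthSnapshot {
    const char* lifecycleState = "Uninitialized";
    std::uint32_t queueDepth = 0;
    std::uint64_t underruns = 0;
    std::uint64_t lateFrames = 0;
    std::uint64_t droppedFrames = 0;
    bool degraded = false;
};

class BackendHealthAccumulator {
public:
    // degradeAfterBadStreak: assumed threshold of consecutive late/dropped completions.
    explicit BackendHealthAccumulator(std::uint32_t degradeAfterBadStreak)
        : threshold_(degradeAfterBadStreak) {}

    void SetLifecycleState(const char* name) { snapshot_.lifecycleState = name; }
    void SetQueueDepth(std::uint32_t depth) { snapshot_.queueDepth = depth; }
    void OnUnderrun() { ++snapshot_.underruns; }

    // A clean completion resets the streak and clears the degraded flag.
    void OnCompletion(bool late, bool dropped) {
        if (late) ++snapshot_.lateFrames;
        if (dropped) ++snapshot_.droppedFrames;
        badStreak_ = (late || dropped) ? badStreak_ + 1 : 0;
        snapshot_.degraded = threshold_ > 0 && badStreak_ >= threshold_;
    }

    BackendHealthSnapshot Snapshot() const { return snapshot_; }

private:
    std::uint32_t threshold_;
    std::uint32_t badStreak_ = 0;
    BackendHealthSnapshot snapshot_;
};
```

Publishing a whole snapshot at once keeps lifecycle, queue, and late/drop figures consistent with each other in any single report.
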
## Testing Strategy

Recommended tests:

- allowed lifecycle transitions pass
- invalid lifecycle transitions fail
- playout policy derives frame pool/preroll sizes consistently
- output queue preserves ordering
- bounded output queue rejects/drops according to policy
- underrun reuses last frame or black according to policy
- late/drop accounting updates degraded state
- scheduler catch-up uses measured lag, not fixed skip
- stop drains/recycles device-frame ownership in pure fakes

Useful homes:

- `VideoPlayoutSchedulerTests` for scheduler evolution
- `VideoIODeviceFakeTests` for fake backend lifecycle
- a new `VideoBackendStateMachineTests`
- a new `RenderOutputQueueTests`

## Risks

### Latency Risk

More headroom means more latency. Phase 7 should make latency a visible policy choice.

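
The trade-off is simple arithmetic: each queued frame adds one frame period of delay, so the added latency is roughly the queued frame count divided by the frame rate. The helper below is an illustrative calculation only; `QueuedLatencyMs` is a hypothetical name, and it counts only frames waiting in a queue, not device preroll or render time.

```cpp
#include <cassert>

// Added latency in milliseconds for a given number of queued frames.
constexpr double QueuedLatencyMs(unsigned queuedFrames, double framesPerSecond) {
    return framesPerSecond > 0.0 ? (queuedFrames * 1000.0) / framesPerSecond : 0.0;
}

static_assert(QueuedLatencyMs(3, 60.0) == 50.0, "3 frames at 60 fps add 50 ms");
static_assert(QueuedLatencyMs(3, 30.0) == 100.0, "3 frames at 30 fps add 100 ms");
```

The same ready-frame depth costs twice as much latency at 30 fps as at 60 fps, which is one reason a single default depth may not suit both rates.
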
### Buffer Lifetime Risk

Render and backend will share ownership boundaries around output buffers. Frame ownership must be explicit to avoid reuse while hardware still owns a frame.

### Underrun Policy Risk

Reusing stale frames can be visually acceptable, but wrong key/fill behavior may be worse than black. Test with real output.

### Callback Thread Risk

Even after decoupling render, callback work must stay small and bounded.

### Scope Risk

Backend lifecycle and playout queue are related, but either can grow large. Implement in small, testable slices.

## Phase 7 Exit Criteria

Phase 7 can be considered complete once the project can say:

- [ ] backend lifecycle states and transitions are explicit
- [ ] playout policy owns preroll, pool size, headroom, and underrun behavior
- [ ] output callbacks no longer synchronously wait for render production
- [ ] render produces completed output frames into a bounded queue
- [ ] underrun behavior is explicit and observable
- [ ] late/drop recovery is measured rather than fixed skip-only
- [ ] backend health reports lifecycle, queue, underrun, late, and dropped state
- [ ] queue/lifecycle/scheduler behavior has non-DeckLink tests

## Open Questions

- What should the default ready-frame depth be at 30fps and 60fps?
- Should underrun reuse last completed, last scheduled, or black?
- Should output queue depth be user-configurable?
- Should render cadence be driven by backend demand, a timer, or queue-fill pressure?
- How should external keying influence stale-frame/black fallback?
- Should input and output lifecycle states be separate endpoints under one backend shell?

## Short Version

Phase 7 should stop making DeckLink callbacks wait for render.

Render produces ahead into a bounded queue. The backend consumes ready frames according to explicit lifecycle and playout policy. Queue depth, underruns, late frames, dropped frames, and degraded states become measured and visible.