Phase 7 Design: Backend Lifecycle And Playout
This document expands Phase 7 of ARCHITECTURE_RESILIENCE_REVIEW.md into a concrete design target.
Phase 4 made the render thread the sole owner of normal runtime GL work, but output timing is still callback-coupled: DeckLink completion callbacks synchronously request render-thread output production before scheduling the next hardware frame. Phase 7 should make backend lifecycle, buffer policy, playout headroom, and recovery explicit.
Phase 5 clarified that live parameter layering stops at final render-state composition. Phase 7 should keep backend lifecycle, output queue ownership, buffer reuse, temporal/feedback resources, and stale-frame/underrun policy outside the persisted/committed/transient parameter model.
Status
- Phase 7 design package: proposed.
- Phase 7 implementation: Step 3 complete.
- Current alignment:
VideoBackend, VideoIODevice, DeckLinkSession, VideoBackendLifecycle, and VideoPlayoutScheduler exist. Phase 4 removed callback-thread GL ownership, but the DeckLink completion path still waits for render-thread output production.
Current backend footholds:
- VideoBackend wraps device discovery/configuration, start/stop, input callback handling, output completion handling, and telemetry publication.
- DeckLinkSession owns DeckLink device handles, frame pool creation, preroll, keyer configuration, and scheduled playback.
- VideoPlayoutPolicy names current frame pool, preroll, ready-frame, underrun, and catch-up policy defaults.
- RenderOutputQueue names the future bounded ready-output-frame handoff and has pure queue tests.
- VideoPlayoutScheduler owns basic schedule time generation and simple late/drop skip-ahead behavior.
- OpenGLVideoIOBridge is the current adapter between VideoBackend and RenderEngine.
- HealthTelemetry receives some signal, render, and pacing stats.
Why Phase 7 Exists
The current output path works only while render/readback stays comfortably inside budget. A late render can make the callback late, which reduces device-side headroom, which makes the next callback more fragile.
The resilience review calls this the main remaining live-resilience risk after Phase 4:
- output playout is still effectively render-on-demand from the DeckLink completion callback
- buffer pool size and preroll depth are not sourced from one policy
- late/dropped recovery is a fixed skip rule
- backend lifecycle is imperative rather than represented as explicit states
Phase 7 should separate hardware timing from render production.
Goals
Phase 7 should establish:
- explicit backend lifecycle states and allowed transitions
- one playout policy for frame pool size, preroll, headroom, and underrun behavior
- a bounded producer/consumer output queue between render and DeckLink scheduling
- lightweight DeckLink callbacks that dequeue/schedule/account rather than render
- measured recovery from late/dropped frames
- structured backend health reporting
- tests for scheduler, queue, lifecycle, and underrun policy without DeckLink hardware
Non-Goals
Phase 7 should not require:
- a new renderer
- changing shader/state composition
- changing committed-live or transient automation layering
- replacing DeckLink support with multiple backends
- full telemetry UI redesign
- removing every synchronous API immediately
- perfect adaptive latency policy in the first pass
Target Timing Model
The target model is producer/consumer playout:
RenderEngine/render scheduler produces completed output frames
-> bounded ready-frame queue
-> VideoBackend consumes ready frames
-> DeckLink callback schedules already-prepared frames
The callback should not wait for rendering. It should:
- record completion result
- recycle/release completed buffers
- dequeue a ready frame or apply underrun policy
- schedule the next frame
- publish backend timing/health observations
The queue contains rendered output-frame ownership and scheduling metadata, not live parameter state. Parameter composition should already be resolved before an output frame enters this playout boundary.
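The callback steps above can be sketched as a small pure function over injected hooks. All of the names below (ReadyFrame, PlayoutHooks, onFrameCompleted) are hypothetical stand-ins for the real VideoBackend/DeckLink collaborators; this is only a shape sketch, not the implementation.

```cpp
#include <functional>
#include <optional>

// Hypothetical frame type for illustration only.
struct ReadyFrame { int id = 0; };

// Injected collaborators so the callback logic stays testable without hardware.
struct PlayoutHooks {
    std::function<void(int)> recordCompletion;          // completion accounting
    std::function<void(int)> recycleBuffer;             // return buffer to pool
    std::function<std::optional<ReadyFrame>()> dequeue; // pop a ready frame
    std::function<void(ReadyFrame)> schedule;           // schedule on device
    std::function<ReadyFrame()> underrunFallback;       // stale/black frame
};

// The callback never renders: it only records, recycles, dequeues (or falls
// back per underrun policy), and schedules the next frame.
inline void onFrameCompleted(int completedId, PlayoutHooks& hooks) {
    hooks.recordCompletion(completedId);
    hooks.recycleBuffer(completedId);
    std::optional<ReadyFrame> next = hooks.dequeue();
    hooks.schedule(next ? *next : hooks.underrunFallback());
}
```

Because every collaborator is injected, this path can be exercised in pure tests with counting lambdas, matching the no-hardware testing goal.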
Target Lifecycle Model
Suggested backend states:
Uninitialized, Discovering, Discovered, Configuring, Configured, Prerolling, Running, Degraded, Stopping, Stopped, Failed
Suggested transition rules:
- Uninitialized -> Discovering
- Discovering -> Discovered | Failed
- Discovered -> Configuring | Stopped
- Configuring -> Configured | Failed
- Configured -> Prerolling | Stopped
- Prerolling -> Running | Failed | Stopping
- Running -> Degraded | Stopping | Failed
- Degraded -> Running | Stopping | Failed
- Stopping -> Stopped
The exact enum can change, but the lifecycle should become observable and testable.
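A minimal sketch of the transition table as a pure helper, using the suggested states above. The enum and function names are illustrative; the real VideoBackendLifecycle may structure this differently.

```cpp
// Suggested backend lifecycle states from this document.
enum class BackendState {
    Uninitialized, Discovering, Discovered, Configuring, Configured,
    Prerolling, Running, Degraded, Stopping, Stopped, Failed
};

// Returns true when `from -> to` matches the suggested transition rules.
// Pure and side-effect free, so it is trivially testable without DeckLink.
inline bool isAllowedTransition(BackendState from, BackendState to) {
    switch (from) {
    case BackendState::Uninitialized:
        return to == BackendState::Discovering;
    case BackendState::Discovering:
        return to == BackendState::Discovered || to == BackendState::Failed;
    case BackendState::Discovered:
        return to == BackendState::Configuring || to == BackendState::Stopped;
    case BackendState::Configuring:
        return to == BackendState::Configured || to == BackendState::Failed;
    case BackendState::Configured:
        return to == BackendState::Prerolling || to == BackendState::Stopped;
    case BackendState::Prerolling:
        return to == BackendState::Running || to == BackendState::Failed ||
               to == BackendState::Stopping;
    case BackendState::Running:
        return to == BackendState::Degraded || to == BackendState::Stopping ||
               to == BackendState::Failed;
    case BackendState::Degraded:
        return to == BackendState::Running || to == BackendState::Stopping ||
               to == BackendState::Failed;
    case BackendState::Stopping:
        return to == BackendState::Stopped;
    default:
        return false; // Stopped and Failed are terminal here.
    }
}
```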
Proposed Collaborators
VideoBackendStateMachine
Pure or mostly pure lifecycle transition helper.
Responsibilities:
- validate state transitions
- produce transition observations
- track failure reasons
- keep start/stop/recovery behavior auditable
Non-responsibilities:
- DeckLink API calls
- rendering
- persistence
PlayoutPolicy
Policy object for queue and timing behavior.
Expected fields:
- target preroll frames
- maximum ready frames
- minimum spare device buffers
- underrun behavior
- maximum catch-up frames
- adaptive headroom enabled/disabled
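The fields above could be gathered into one value object, for example (field names and defaults are illustrative, not the shipped VideoPlayoutPolicy):

```cpp
// Sketch of a single playout policy object. Defaults here are placeholders;
// the document says the real numbers should be measured.
struct PlayoutPolicy {
    int targetPrerollFrames = 3;   // frames scheduled before playback starts
    int maxReadyFrames = 4;        // bound on the ready-output queue
    int minSpareDeviceBuffers = 1; // buffers kept free for render/readback
    int maxCatchUpFrames = 2;      // limit on late/drop skip-ahead
    bool adaptiveHeadroom = false; // allow headroom growth under pressure

    // Clamp inconsistent values so derived sizes stay usable.
    PlayoutPolicy normalized() const {
        PlayoutPolicy p = *this;
        if (p.targetPrerollFrames < 1) p.targetPrerollFrames = 1;
        if (p.maxReadyFrames < 1) p.maxReadyFrames = 1;
        if (p.minSpareDeviceBuffers < 0) p.minSpareDeviceBuffers = 0;
        if (p.maxCatchUpFrames < 0) p.maxCatchUpFrames = 0;
        return p;
    }

    // Device frame pool must cover preroll plus spare buffers plus the
    // frame currently held by hardware (one illustrative derivation).
    int framePoolSize() const {
        return targetPrerollFrames + minSpareDeviceBuffers + 1;
    }
};
```

Deriving pool and preroll sizes from one object is the point: constants stop being split across DeckLinkSession, the scheduler, and the queue.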
RenderOutputQueue
Bounded queue or ring for completed output frames.
Responsibilities:
- accept completed render outputs
- expose ready frames for scheduling
- track depth, drops, stale reuse, and underruns
- keep ownership/lifetime clear between render and backend
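These responsibilities can be sketched as a bounded FIFO that counts drops and underruns. This is an assumed shape only; the real RenderOutputQueue tracks more metrics and manages frame ownership across the render/backend boundary.

```cpp
#include <cstddef>
#include <deque>
#include <optional>
#include <utility>

// Minimal bounded ready-frame queue sketch.
template <typename Frame>
class BoundedReadyQueue {
public:
    explicit BoundedReadyQueue(std::size_t capacity)
        : capacity_(capacity ? capacity : 1) {}

    // Overflow drops the oldest ready frame, keeping the newest output.
    void push(Frame frame) {
        if (frames_.size() == capacity_) {
            frames_.pop_front();
            ++dropped_;
        }
        frames_.push_back(std::move(frame));
    }

    // Empty pops are counted as underruns so policy stays measurable.
    std::optional<Frame> pop() {
        if (frames_.empty()) {
            ++underruns_;
            return std::nullopt;
        }
        Frame f = std::move(frames_.front());
        frames_.pop_front();
        return f;
    }

    std::size_t depth() const { return frames_.size(); }
    std::size_t dropped() const { return dropped_; }
    std::size_t underruns() const { return underruns_; }

private:
    std::deque<Frame> frames_;
    std::size_t capacity_;
    std::size_t dropped_ = 0;
    std::size_t underruns_ = 0;
};
```

Because the queue is pure data structure plus counters, it satisfies the "no DeckLink dependency in queue tests" goal directly.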
OutputFramePool
Backend-owned device buffer pool.
Responsibilities:
- own DeckLink mutable frames
- expose available buffers for render/readback or scheduling
- recycle completed frames
- report spare-buffer depth
PlayoutController
Coordinates policy, ready frames, device schedule times, and completion accounting.
Responsibilities:
- preroll frames
- schedule next frame
- handle late/drop/completed/flushed results
- apply underrun policy
- publish timing state
Output Queue Policy
The initial output queue should be small and bounded.
Candidate defaults:
- target ready frames: 2-3
- max ready frames: 3-5
- underrun: reuse last completed frame if available, otherwise black
- late/drop: increase degraded counters and optionally increase headroom within limits
The exact numbers should be measured, but the policy should live in one place instead of being split across constants.
Underrun Policy
When no fresh rendered frame is available, options are:
- reuse newest completed frame
- reuse last scheduled frame
- schedule black/degraded frame
- skip/catch up schedule time
Phase 7 should pick one default and make it visible in telemetry. Reusing the newest completed frame is often the best first policy for live visual continuity, but key/fill behavior may require careful testing.
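Making the default explicit could be as small as one decision function. The names below are hypothetical; this sketch assumes the "reuse newest completed frame, otherwise black" default discussed above.

```cpp
// Explicit, testable underrun decision (illustrative names).
enum class UnderrunAction { ScheduleFresh, ReuseLastCompleted, ScheduleBlack };

inline UnderrunAction chooseUnderrunAction(bool freshFrameReady,
                                           bool lastCompletedAvailable) {
    if (freshFrameReady) return UnderrunAction::ScheduleFresh;
    if (lastCompletedAvailable) return UnderrunAction::ReuseLastCompleted;
    return UnderrunAction::ScheduleBlack; // nothing reusable: degraded output
}
```

Keeping the choice in one function makes it easy to publish the taken branch to telemetry and to swap policies for key/fill testing.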
Migration Plan
Step 1. Name Lifecycle States
Introduce a backend state enum and transition reporting without materially changing scheduling behavior.
Initial target:
- state changes are explicit
- invalid transitions are detectable
- tests cover allowed transitions
Current implementation:
- VideoBackendLifecycle names backend states and validates allowed transitions.
- VideoBackend applies lifecycle transitions around discovery, configuration, start, stop, degradation, failure, and resource release.
- Existing BackendStateChangedEvent publication now uses lifecycle state names for backend lifecycle observations.
- VideoBackendLifecycleTests cover allowed transitions, rejected invalid transitions, failure reasons, retry, and stable state names.
Step 2. Create Playout Policy Object
Unify fixed constants and scheduler assumptions.
Initial target:
- frame pool size derives from policy
- preroll count derives from policy
- late/drop recovery reads policy
Current implementation:
- VideoPlayoutPolicy defines current output frame pool, preroll, ready-frame, spare-buffer, underrun, catch-up, and adaptive-headroom settings.
- DeckLinkSession uses the policy for output frame pool creation and preroll count.
- VideoPlayoutScheduler stores the policy and uses lateOrDropCatchUpFrames instead of a hard-coded +2 recovery step.
- VideoPlayoutSchedulerTests cover default compatibility behavior, policy-driven catch-up, and policy normalization.
Step 3. Add Ready Output Queue
Introduce a bounded queue for completed output frames.
Initial target:
- pure queue tests
- explicit depth/underrun metrics
- no DeckLink dependency in queue tests
Current implementation:
- RenderOutputQueue owns a bounded FIFO of RenderOutputFrame values.
- The queue is configured from VideoPlayoutPolicy::maxReadyFrames.
- Queue metrics report depth, capacity, pushed, popped, dropped, and underrun counts.
- Overflow drops the oldest ready frame, preserving the newest completed output for scheduling.
- RenderOutputQueueTests cover ordering, bounded overflow, underrun counting, and capacity shrink behavior without DeckLink hardware.
Step 4. Move Callback Toward Dequeue/Schedule
Stop producing frames directly in the completion callback path.
Transitional target:
- callback wakes/schedules a backend worker
- worker consumes ready frames
Final target:
- callback only records, recycles, dequeues, schedules
Step 5. Make Render Produce Ahead
Teach render/output code to keep the ready queue filled to target headroom.
Initial target:
- render thread produces on demand until queue has target depth
- callback does not synchronously wait for fresh render
- stale/black fallback is explicit on underrun
Step 6. Replace Fixed Late/Drop Recovery
Replace fixed +2 schedule-index recovery with measured lag/headroom accounting.
Initial target:
- track scheduled index, completed index, queue depth, late streak, drop streak
- recovery decisions use measured lag
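A measured catch-up decision might look like the following sketch: skip ahead by the observed lag between scheduled and completed indices, bounded by policy, rather than a fixed +2 (function and parameter names are illustrative).

```cpp
#include <algorithm>

// Catch-up derived from measured lag instead of a fixed skip. Returns how
// many schedule slots to advance, clamped to the policy limit.
inline long catchUpFrames(long scheduledIndex, long completedIndex,
                          long maxCatchUpFrames) {
    long lag = scheduledIndex - completedIndex;
    return std::clamp(lag, 0L, maxCatchUpFrames);
}
```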
Step 7. Route Backend Health Structurally
Publish backend lifecycle, queue depth, underrun, late/drop, and degraded-state observations through HealthTelemetry.
Testing Strategy
Recommended tests:
- allowed lifecycle transitions pass
- invalid lifecycle transitions fail
- playout policy derives frame pool/preroll sizes consistently
- output queue preserves ordering
- bounded output queue rejects/drops according to policy
- underrun reuses last frame or black according to policy
- late/drop accounting updates degraded state
- scheduler catch-up uses measured lag, not fixed skip
- stop drains/recycles device-frame ownership in pure fakes
Useful homes:
- VideoPlayoutSchedulerTests for scheduler evolution
- VideoIODeviceFakeTests for fake backend lifecycle
- a new VideoBackendStateMachineTests
- a new RenderOutputQueueTests
Risks
Latency Risk
More headroom means more latency. Phase 7 should make latency a visible policy choice.
Buffer Lifetime Risk
Render and backend will share ownership boundaries around output buffers. Frame ownership must be explicit to avoid reuse while hardware still owns a frame.
Underrun Policy Risk
Reusing stale frames can be visually acceptable, but wrong key/fill behavior may be worse than black. Test with real output.
Callback Thread Risk
Even after decoupling render, callback work must stay small and bounded.
Scope Risk
Backend lifecycle and playout queue are related, but either can grow large. Implement in small, testable slices.
Phase 7 Exit Criteria
Phase 7 can be considered complete once the project can say:
- backend lifecycle states and transitions are explicit
- playout policy owns preroll, pool size, headroom, and underrun behavior
- output callbacks no longer synchronously wait for render production
- render produces completed output frames into a bounded queue
- underrun behavior is explicit and observable
- late/drop recovery is measured rather than fixed skip-only
- backend health reports lifecycle, queue, underrun, late, and dropped state
- queue/lifecycle/scheduler behavior has non-DeckLink tests
Open Questions
- What should the default ready-frame depth be at 30 fps and 60 fps?
- Should underrun reuse last completed, last scheduled, or black?
- Should output queue depth be user-configurable?
- Should render cadence be driven by backend demand, a timer, or queue-fill pressure?
- How should external keying influence stale-frame/black fallback?
- Should input and output lifecycle states be separate endpoints under one backend shell?
Short Version
Phase 7 should stop making DeckLink callbacks wait for render.
Render produces ahead into a bounded queue. The backend consumes ready frames according to explicit lifecycle and playout policy. Queue depth, underruns, late frames, dropped frames, and degraded states become measured and visible.