# Phase 7 Design: Backend Lifecycle And Playout

This document expands Phase 7 of [ARCHITECTURE_RESILIENCE_REVIEW.md](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/docs/ARCHITECTURE_RESILIENCE_REVIEW.md) into a concrete design target.

Phase 4 made the render thread the sole owner of normal runtime GL work. Phase 7 Step 4 moved DeckLink completion processing onto a backend worker, so the callback no longer waits directly on render-thread output production. Phase 7 Step 5 added a bounded ready-frame queue inside that worker, so scheduling now consumes completed output frames and falls back explicitly on underrun. Phase 7 should make backend lifecycle, buffer policy, playout headroom, and recovery explicit.

Phase 5 clarified that live parameter layering stops at final render-state composition. Phase 7 should keep backend lifecycle, output queue ownership, buffer reuse, temporal/feedback resources, and stale-frame/underrun policy outside the persisted/committed/transient parameter model.

## Status

- Phase 7 design package: proposed.
- Phase 7 implementation: Step 6 complete.
- Current alignment: `VideoBackend`, `VideoIODevice`, `DeckLinkSession`, `VideoBackendLifecycle`, and `VideoPlayoutScheduler` exist. Phase 4 removed callback-thread GL ownership, Step 4 moved completion processing onto a backend worker, Step 5 uses `RenderOutputQueue` as the ready-frame handoff inside that worker, and Step 6 replaces fixed late/drop skip-ahead with measured recovery decisions.

Current backend footholds:

- `VideoBackend` wraps device discovery/configuration, start/stop, input callback handling, output completion handling, and telemetry publication.
- `DeckLinkSession` owns DeckLink device handles, frame pool creation, preroll, keyer configuration, and scheduled playback.
- `VideoPlayoutPolicy` names current frame pool, preroll, ready-frame, underrun, and catch-up policy defaults.
- `RenderOutputQueue` provides the bounded ready-output-frame handoff and has pure queue tests.
- `VideoPlayoutScheduler` owns schedule time generation, completion indexing, late/drop streaks, ready-queue pressure input, and measured recovery decisions.
- `OpenGLVideoIOBridge` is the current adapter between `VideoBackend` and `RenderEngine`.
- `HealthTelemetry` receives some signal, render, and pacing stats.

## Why Phase 7 Exists

The current output path works only while render/readback stays comfortably inside budget. A late render can make the callback late, which reduces device-side headroom, which makes the next callback more fragile. The resilience review calls this the main remaining live-resilience risk after Phase 4:

- output playout is still effectively filled on demand by a backend completion worker, though scheduling now consumes a bounded ready-frame queue
- buffer pool size and preroll depth are not sourced from one policy
- late/dropped recovery is a fixed skip rule
- backend lifecycle is imperative rather than represented as explicit states

Phase 7 should separate hardware timing from render production. The arithmetic sketch below shows how quickly sustained overruns consume fixed headroom.
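To make that cascade concrete, here is a back-of-envelope headroom calculation. Every number in it (60 fps output, three prerolled frames, a constant 5 ms render overrun) is an illustrative assumption chosen for the example, not a measured value from this project.

```cpp
// Illustrative headroom arithmetic for the cascade described above.
// All numbers are assumptions for the example, not project data.
#include <cstdio>

int main() {
    const double framePeriodMs = 1000.0 / 60.0;  // 60 fps output
    const int prerolledFrames = 3;               // device-side headroom at start
    const double overrunMsPerFrame = 5.0;        // render exceeds budget by this much

    // Each late render eats device-side headroom; once headroom is gone,
    // the completion callback itself starts running late.
    double headroomMs = prerolledFrames * framePeriodMs;  // ~50 ms
    int frames = 0;
    while (headroomMs > 0.0) {
        headroomMs -= overrunMsPerFrame;
        ++frames;
    }
    std::printf("~%d frames until device-side headroom is exhausted\n", frames);
    return 0;
}
```

Ten frames is only about a sixth of a second at 60 fps: modest but sustained overruns exhaust fixed preroll quickly, which is why Phase 7 treats headroom as an explicit policy choice rather than a side effect of preroll depth.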
## Goals

Phase 7 should establish:

- explicit backend lifecycle states and allowed transitions
- one playout policy for frame pool size, preroll, headroom, and underrun behavior
- a bounded producer/consumer output queue between render and DeckLink scheduling
- lightweight DeckLink callbacks that dequeue/schedule/account rather than render
- measured recovery from late/dropped frames
- structured backend health reporting
- tests for scheduler, queue, lifecycle, and underrun policy without DeckLink hardware

## Non-Goals

Phase 7 should not require:

- a new renderer
- changing shader/state composition
- changing committed-live or transient automation layering
- replacing DeckLink support with multiple backends
- full telemetry UI redesign
- removing every synchronous API immediately
- perfect adaptive latency policy in the first pass

## Target Timing Model

The target model is producer/consumer playout:

```text
RenderEngine/render scheduler produces completed output frames
  -> bounded ready-frame queue
  -> VideoBackend consumes ready frames
  -> DeckLink callback schedules already-prepared frames
```

The callback should not wait for rendering. It should:

- record completion result
- recycle/release completed buffers
- dequeue a ready frame or apply underrun policy
- schedule the next frame
- publish backend timing/health observations

The queue contains rendered output-frame ownership and scheduling metadata, not live parameter state. Parameter composition should already be resolved before an output frame enters this playout boundary. A sketch of this callback shape follows below.
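A minimal sketch of that callback shape, under stated assumptions: `IDeviceOutput`, `IReadyFrames`, `OnScheduledFrameCompleted`, and the stats fields are hypothetical names invented for this illustration. The real path runs through `VideoBackend`, `RenderOutputQueue`, and the DeckLink session, whose APIs are not shown here.

```cpp
#include <cstdint>
#include <optional>

struct OutputFrame {
    std::uint64_t frameIndex = 0;  // plus device buffer ownership in the real type
};

enum class CompletionResult { Completed, DisplayedLate, Dropped, Flushed };

// Stands in for the DeckLink-facing session; no real SDK calls are implied.
struct IDeviceOutput {
    virtual void Schedule(const OutputFrame& frame) = 0;
    virtual void Recycle(const OutputFrame& frame) = 0;
    virtual ~IDeviceOutput() = default;
};

// Stands in for the bounded ready-frame queue (RenderOutputQueue in this project).
struct IReadyFrames {
    virtual std::optional<OutputFrame> TryPop() = 0;
    virtual ~IReadyFrames() = default;
};

struct PlayoutStats {
    std::uint64_t completed = 0;
    std::uint64_t late = 0;
    std::uint64_t underruns = 0;
};

// Runs on (or is woken by) the completion callback. It never renders and
// never blocks on render: it consumes a prepared frame or falls back.
void OnScheduledFrameCompleted(CompletionResult result, OutputFrame finished,
                               IReadyFrames& ready, IDeviceOutput& device,
                               OutputFrame& lastGood, PlayoutStats& stats) {
    ++stats.completed;                       // record completion result
    if (result == CompletionResult::DisplayedLate) {
        ++stats.late;
    }
    device.Recycle(finished);                // return the buffer to the pool

    if (auto next = ready.TryPop()) {        // dequeue an already-rendered frame
        lastGood = *next;
        device.Schedule(*next);
    } else {
        ++stats.underruns;                   // explicit underrun policy:
        device.Schedule(lastGood);           // reuse the newest completed frame
    }
    // Timing/health observations would be published to HealthTelemetry here.
}
```

The underrun branch encodes the "reuse newest completed frame" default discussed later; swapping in a black-frame fallback or schedule-time catch-up would change only that branch.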
## Target Lifecycle Model

Suggested backend states:

1. `Uninitialized`
2. `Discovering`
3. `Discovered`
4. `Configuring`
5. `Configured`
6. `Prerolling`
7. `Running`
8. `Degraded`
9. `Stopping`
10. `Stopped`
11. `Failed`

Suggested transition rules:

- `Uninitialized -> Discovering`
- `Discovering -> Discovered | Failed`
- `Discovered -> Configuring | Stopped`
- `Configuring -> Configured | Failed`
- `Configured -> Prerolling | Stopped`
- `Prerolling -> Running | Failed | Stopping`
- `Running -> Degraded | Stopping | Failed`
- `Degraded -> Running | Stopping | Failed`
- `Stopping -> Stopped`

The exact enum can change, but the lifecycle should become observable and testable. One possible encoding is sketched below.
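The transition table above maps directly onto a pure helper. In this sketch, `BackendState` mirrors the suggested states and `IsAllowedTransition` is an illustrative name; the real `VideoBackendLifecycle` additionally tracks failure reasons and retry, which this sketch omits.

```cpp
#include <initializer_list>

// Suggested backend states, mirroring the transition table above.
enum class BackendState {
    Uninitialized, Discovering, Discovered, Configuring, Configured,
    Prerolling, Running, Degraded, Stopping, Stopped, Failed,
};

constexpr bool OneOf(BackendState state, std::initializer_list<BackendState> allowed) {
    for (BackendState candidate : allowed) {
        if (candidate == state) return true;
    }
    return false;
}

// Pure transition check: no DeckLink calls, no rendering, trivially testable.
constexpr bool IsAllowedTransition(BackendState from, BackendState to) {
    using S = BackendState;
    switch (from) {
        case S::Uninitialized: return to == S::Discovering;
        case S::Discovering:   return OneOf(to, {S::Discovered, S::Failed});
        case S::Discovered:    return OneOf(to, {S::Configuring, S::Stopped});
        case S::Configuring:   return OneOf(to, {S::Configured, S::Failed});
        case S::Configured:    return OneOf(to, {S::Prerolling, S::Stopped});
        case S::Prerolling:    return OneOf(to, {S::Running, S::Failed, S::Stopping});
        case S::Running:       return OneOf(to, {S::Degraded, S::Stopping, S::Failed});
        case S::Degraded:      return OneOf(to, {S::Running, S::Stopping, S::Failed});
        case S::Stopping:      return to == S::Stopped;
        case S::Stopped:                       // terminal in this sketch; the real
        case S::Failed:        return false;   // lifecycle also supports retry
    }
    return false;
}

static_assert(IsAllowedTransition(BackendState::Running, BackendState::Degraded), "");
static_assert(!IsAllowedTransition(BackendState::Stopping, BackendState::Running), "");
```

Because the helper is pure and `constexpr`, allowed and rejected transitions can be checked at compile time or in plain unit tests with no DeckLink dependency.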
## Proposed Collaborators

### `VideoBackendStateMachine`

Pure or mostly pure lifecycle transition helper.

Responsibilities:

- validate state transitions
- produce transition observations
- track failure reasons
- keep start/stop/recovery behavior auditable

Non-responsibilities:

- DeckLink API calls
- rendering
- persistence

### `PlayoutPolicy`

Policy object for queue and timing behavior.

Expected fields:

- target preroll frames
- maximum ready frames
- minimum spare device buffers
- underrun behavior
- maximum catch-up frames
- adaptive headroom enabled/disabled

### `RenderOutputQueue`

Bounded queue or ring for completed output frames.

Responsibilities:

- accept completed render outputs
- expose ready frames for scheduling
- track depth, drops, stale reuse, and underruns
- keep ownership/lifetime clear between render and backend

### `OutputFramePool`

Backend-owned device buffer pool.

Responsibilities:

- own DeckLink mutable frames
- expose available buffers for render/readback or scheduling
- recycle completed frames
- report spare-buffer depth

### `PlayoutController`

Coordinates policy, ready frames, device schedule times, and completion accounting.

Responsibilities:

- preroll frames
- schedule next frame
- handle late/drop/completed/flushed results
- apply underrun policy
- publish timing state

## Output Queue Policy

The initial output queue should be small and bounded. Candidate defaults:

- target ready frames: 2-3
- max ready frames: 3-5
- underrun: reuse last completed frame if available, otherwise black
- late/drop: increase degraded counters and optionally increase headroom within limits

The exact numbers should be measured, but the policy should live in one place instead of being split across constants.

## Underrun Policy

When no fresh rendered frame is available, the options are:

1. reuse newest completed frame
2. reuse last scheduled frame
3. schedule black/degraded frame
4. skip/catch up schedule time

Phase 7 should pick one default and make it visible in telemetry. Reusing the newest completed frame is often the best first policy for live visual continuity, but key/fill behavior may require careful testing.

## Migration Plan

### Step 1. Name Lifecycle States

Introduce a backend state enum and transition reporting without materially changing scheduling behavior.

Initial target:

- [x] state changes are explicit
- [x] invalid transitions are detectable
- [x] tests cover allowed transitions

Current implementation:

- `VideoBackendLifecycle` names backend states and validates allowed transitions.
- `VideoBackend` applies lifecycle transitions around discovery, configuration, start, stop, degradation, failure, and resource release.
- Existing `BackendStateChangedEvent` publication now uses lifecycle state names for backend lifecycle observations.
- `VideoBackendLifecycleTests` cover allowed transitions, rejected invalid transitions, failure reasons, retry, and stable state names.

### Step 2. Create Playout Policy Object

Unify fixed constants and scheduler assumptions.

Initial target:

- [x] frame pool size derives from policy
- [x] preroll count derives from policy
- [x] late/drop recovery reads policy

Current implementation:

- `VideoPlayoutPolicy` defines current output frame pool, preroll, ready-frame, spare-buffer, underrun, catch-up, and adaptive-headroom settings.
- `DeckLinkSession` uses the policy for output frame pool creation and preroll count.
- `VideoPlayoutScheduler` stores the policy and uses `lateOrDropCatchUpFrames` instead of a hard-coded `+2` recovery step.
- `VideoPlayoutSchedulerTests` cover default compatibility behavior, policy-driven catch-up, and policy normalization.

### Step 3. Add Ready Output Queue

Introduce a bounded queue for completed output frames.

Initial target:

- [x] pure queue tests
- [x] explicit depth/underrun metrics
- [x] no DeckLink dependency in queue tests

Current implementation:

- `RenderOutputQueue` owns a bounded FIFO of `RenderOutputFrame` values.
- The queue is configured from `VideoPlayoutPolicy::maxReadyFrames`.
- Queue metrics report depth, capacity, pushed, popped, dropped, and underrun counts.
- Overflow drops the oldest ready frame, preserving the newest completed output for scheduling.
- `RenderOutputQueueTests` cover ordering, bounded overflow, underrun counting, and capacity shrink behavior without DeckLink hardware.

A sketch of these queue semantics follows below.
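A pure sketch with those semantics, assuming a mutex-guarded `std::deque`: bounded depth, drop-oldest overflow, and counted underruns. Names and layout are illustrative; the real `RenderOutputQueue` takes its capacity from `VideoPlayoutPolicy::maxReadyFrames` and its API may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>

struct OutputFrame {
    std::uint64_t frameIndex = 0;  // buffer ownership lives here in the real type
};

// Illustrative bounded ready-frame handoff, shaped after the Step 3 notes.
class BoundedReadyQueue {
public:
    explicit BoundedReadyQueue(std::size_t capacity) : capacity_(capacity ? capacity : 1) {}

    void Push(OutputFrame frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.size() >= capacity_) {
            frames_.pop_front();  // overflow: drop the oldest, keep the newest
            ++dropped_;
        }
        frames_.push_back(frame);
        ++pushed_;
    }

    // A failed pop is an underrun; the caller applies the fallback policy.
    std::optional<OutputFrame> Pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.empty()) {
            ++underruns_;
            return std::nullopt;
        }
        OutputFrame frame = frames_.front();
        frames_.pop_front();
        ++popped_;
        return frame;
    }

    struct Metrics {
        std::size_t depth;
        std::size_t capacity;
        std::uint64_t pushed, popped, dropped, underruns;
    };

    Metrics Snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return {frames_.size(), capacity_, pushed_, popped_, dropped_, underruns_};
    }

private:
    mutable std::mutex mutex_;
    std::size_t capacity_;
    std::deque<OutputFrame> frames_;
    std::uint64_t pushed_ = 0, popped_ = 0, dropped_ = 0, underruns_ = 0;
};
```

Dropping the oldest frame on overflow biases the queue toward freshness: when render briefly outruns the device, scheduling keeps receiving the newest completed output instead of a growing backlog.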
### Step 4. Move Callback Toward Dequeue/Schedule

Stop producing frames directly in the completion callback path.

Transitional target:

- [x] callback wakes/schedules a backend worker
- [x] worker consumes ready frames

Final target:

- callback only records, recycles, dequeues, schedules

Current implementation:

- `VideoBackend::HandleOutputFrameCompletion(...)` now enqueues completion work and wakes an output-completion worker.
- The output-completion worker drains pending completions and runs the existing render/schedule path.
- This preserves behavior while removing the direct callback-thread wait on render-thread output production.
- Step 5 now makes this worker consume ready frames from `RenderOutputQueue`; Step 4 remains the boundary that keeps output completion callbacks from doing render production directly.

### Step 5. Make Render Produce Ahead

Teach render/output code to keep the ready queue filled to target headroom.

Initial target:

- [x] render thread produces on demand until queue has target depth
- [x] callback does not synchronously wait for fresh render
- [x] stale/black fallback is explicit on underrun

Current implementation:

- The backend output-completion worker fills `RenderOutputQueue` to `VideoPlayoutPolicy::targetReadyFrames`.
- Scheduling now pops a ready frame from `RenderOutputQueue` instead of directly scheduling the freshly rendered frame.
- If no ready frame can be produced, the worker schedules an explicit black fallback frame and reports degraded lifecycle state.
- This is still demand-filled by the backend worker; a future pass can make render production more proactive or timer/pressure driven.

### Step 6. Replace Fixed Late/Drop Recovery

Replace fixed `+2` schedule-index recovery with measured lag/headroom accounting.

Initial target:

- [x] track scheduled index, completed index, queue depth, late streak, drop streak
- [x] recovery decisions use measured lag

Current implementation:

- `VideoPlayoutRecoveryDecision` reports completion result, completed index, scheduled index, ready queue depth, scheduled lead, measured lag, catch-up frames, late streak, and drop streak.
- `VideoPlayoutScheduler::AccountForCompletionResult(...)` now accepts ready queue depth and returns a recovery decision.
- Recovery is measured from late/drop streaks, scheduled lead, and ready queue pressure, then capped by `VideoPlayoutPolicy::lateOrDropCatchUpFrames`.
- `VideoBackend` passes the current ready queue depth into the video device completion-accounting call.
- `VideoPlayoutSchedulerTests` cover measured late recovery, measured drop recovery, policy caps, completed-index tracking, and streak clearing.

### Step 7. Route Backend Health Structurally

Publish backend lifecycle, queue depth, underrun, late/drop, and degraded-state observations through `HealthTelemetry`.

## Testing Strategy

Recommended tests:

- allowed lifecycle transitions pass
- invalid lifecycle transitions fail
- playout policy derives frame pool/preroll sizes consistently
- output queue preserves ordering
- bounded output queue rejects/drops according to policy
- underrun reuses last frame or black according to policy
- late/drop accounting updates degraded state
- scheduler catch-up uses measured lag, not fixed skip
- stop drains/recycles device-frame ownership in pure fakes

Useful homes:

- `VideoPlayoutSchedulerTests` for scheduler evolution
- `VideoIODeviceFakeTests` for fake backend lifecycle
- `VideoBackendLifecycleTests` for lifecycle transitions (added in Step 1)
- `RenderOutputQueueTests` for queue behavior (added in Step 3)

## Risks

### Latency Risk

More headroom means more latency. Phase 7 should make latency a visible policy choice.

### Buffer Lifetime Risk

Render and backend will share ownership boundaries around output buffers. Frame ownership must be explicit to avoid reuse while hardware still owns a frame.

### Underrun Policy Risk

Reusing stale frames can be visually acceptable, but wrong key/fill behavior may be worse than black. Test with real output.

### Callback Thread Risk

Even after decoupling render, callback work must stay small and bounded.

### Scope Risk

Backend lifecycle and playout queue are related, but either can grow large. Implement in small, testable slices.

## Phase 7 Exit Criteria

Phase 7 can be considered complete once the project can say:

- [x] backend lifecycle states and transitions are explicit
- [x] playout policy owns preroll, pool size, headroom, and underrun behavior
- [x] output callbacks no longer synchronously wait for render production
- [x] render produces completed output frames into a bounded queue
- [x] underrun behavior is explicit and observable
- [x] late/drop recovery is measured rather than fixed skip-only
- [ ] backend health reports lifecycle, queue, underrun, late, and dropped state
- [ ] queue/lifecycle/scheduler behavior has non-DeckLink tests

## Open Questions

- What should the default ready-frame depth be at 30 fps and 60 fps?
- Should underrun reuse the last completed frame, the last scheduled frame, or black?
- Should output queue depth be user-configurable?
- Should render cadence be driven by backend demand, a timer, or queue-fill pressure?
- How should external keying influence stale-frame/black fallback?
- Should input and output lifecycle states be separate endpoints under one backend shell?

## Short Version

Phase 7 should stop making DeckLink callbacks wait for render. Render produces ahead into a bounded queue. The backend consumes ready frames according to explicit lifecycle and playout policy. Queue depth, underruns, late frames, dropped frames, and degraded states become measured and visible.