Stage 1 rewrite

Aiden
2026-05-12 00:52:33 +10:00
parent bf23cd880a
commit ac729dc2b9
20 changed files with 1047 additions and 25 deletions


@@ -0,0 +1,430 @@
# Phase 7.7: Render Cadence And Playout Separation Design
## Status
Proposed.

## Background
Phases 7.5 and 7.6 validated useful pieces individually:
- BGRA8 pack/readback can be fast enough on the current test machine.
- System-memory frame slots can be wrapped for DeckLink scheduling.
- A producer can keep frames ready and keep a small scheduled buffer filled.
But the experiments also showed that the current hybrid ownership model is fragile:
- completion-driven rendering caused app-ready starvation
- completion-time black fallback caused visible black flicker
- producer-side scheduling without a cadence target overfed the schedule timeline
- capping scheduled count helped, but completion and producer scheduling fought each other
- making completion passive exposed startup and scheduling-trigger gaps
- late/drop catch-up skipping created a smooth/freeze/smooth cadence
The lesson is that the app needs a larger architectural split, not more local recovery branches.
## Goal
Make the output path behave like two cooperating real-time systems:
```text
Render cadence thread
renders at the selected output cadence, for example 59.94 fps
writes completed frames into system-memory slots
DeckLink playout scheduler
keeps the device scheduled buffer topped up
consumes completed system-memory frames
never asks rendering to happen synchronously
```
The system-memory frame buffer becomes the contract between render timing and device timing.
## Non-Goals
- Do not hide failure by repeating frames as the primary strategy.
- Do not make DeckLink completion callbacks render frames.
- Do not use synthetic schedule-index catch-up as normal recovery.
- Do not change shader semantics or live-state semantics.
- Do not require v210/YUV packing in the first implementation.
- Do not pursue DVP/pinned-memory fast transfer as the main path on unsupported hardware.
## Target Architecture
### Current Problem Shape
The current Phase 7.5/7.6 implementation still has too many timing authorities:
- DeckLink completion callbacks release frames and influence scheduling
- the producer renders based on queue pressure
- the producer also schedules some frames
- `VideoPlayoutScheduler` advances synthetic stream-time indexes
- fallback behavior can schedule black when the app-ready queue is briefly empty
That means the system can be full and still look wrong, because "full" is not tied to one clear cadence owner.
### Target Shape
```text
RenderCadenceController
owns output frame tick: frame 0, 1, 2...
owns render target time
asks RenderEngine to render frame N
publishes completed frame N into PlayoutFrameStore
PlayoutFrameStore
owns free / rendering / completed / scheduled slots
tracks frame number, render time, completion time, and schedule state
exposes completed frames to DeckLink scheduler
DeckLinkPlayoutScheduler
owns DeckLink schedule time
tops up device buffered frames to target depth
consumes completed frames only
releases scheduled slots on completion callbacks
DeckLink completion callback
releases completed slots
records result and device timing
wakes scheduler
does not render
```
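As a sketch of that contract, the store can expose a small slot-handoff API. The class names below come from this design; every method name is a hypothetical illustration, not a committed interface:

```cpp
#include <cstdint>

// Hypothetical contract sketch: PlayoutFrameStore is the only object both
// timing domains touch. Method names are illustrative assumptions.
struct PlayoutFrameSlot;  // defined under Buffer Model below

class PlayoutFrameStore {
public:
    // Render cadence side: take a Free slot, hand it back as Completed.
    PlayoutFrameSlot* acquireFreeSlot();            // nullptr if none free
    void publishCompleted(PlayoutFrameSlot* slot);

    // DeckLink scheduler side: consume Completed frames, oldest first.
    PlayoutFrameSlot* popOldestCompleted();         // nullptr if none completed
    void markScheduled(PlayoutFrameSlot* slot, int64_t streamTime);

    // Completion side: return a Scheduled slot to Free, recording the result.
    void releaseScheduled(PlayoutFrameSlot* slot, int completionResult);
};
```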
## Cadence Model
The render side should be time-driven, not completion-driven.
For a 59.94 fps mode:
```text
frameDuration = 1001 / 60000 seconds
nextRenderTime = now
loop:
wait until nextRenderTime, or run immediately if behind
render frameIndex for nextRenderTime
read back into free system-memory slot
publish completed slot
frameIndex += 1
nextRenderTime += frameDuration
```
Rules:
- If the render thread is early, it waits/yields.
- If it is slightly late, it renders the next frame immediately and records lateness.
- If it is badly late, policy may skip render ticks before rendering the newest frame.
- Skipping render ticks is a render-cadence decision, not a DeckLink stream-time jump.
- DeckLink schedule time should remain continuous unless a deliberate device recovery policy says otherwise.
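A minimal C++ sketch of this loop, assuming a hypothetical `renderAndPublish` hook that renders frame N for its target time, reads back into a free system-memory slot, and publishes it:

```cpp
#include <chrono>
#include <functional>
#include <ratio>
#include <thread>

// Sketch only: the renderAndPublish hook and the 2-frame skip threshold are
// assumptions standing in for real render and policy code.
void runRenderCadence(
    std::function<void(long long, std::chrono::steady_clock::time_point)> renderAndPublish) {
    using clock = std::chrono::steady_clock;
    // 59.94 fps as an exact rational: one tick is 1001/60000 seconds.
    using FrameTicks = std::chrono::duration<long long, std::ratio<1001, 60000>>;
    const auto oneFrame = std::chrono::duration_cast<clock::duration>(FrameTicks(1));

    const auto start = clock::now();
    long long frameIndex = 0;

    for (;;) {
        // Derive each target from the start time so rounding never accumulates.
        const auto target = start + oneFrame * frameIndex;
        const auto now = clock::now();
        if (now < target) {
            std::this_thread::sleep_until(target);  // early: wait for the tick
        } else if (now - target > 2 * oneFrame) {
            // Badly late: skip render ticks (a cadence policy decision, not a
            // DeckLink stream-time jump) and resume at the newest tick.
            frameIndex = (now - start) / oneFrame;
            continue;
        }
        // Slightly late falls through: render immediately, record lateness.
        renderAndPublish(frameIndex, target);
        ++frameIndex;
    }
}
```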
## Buffer Model
Use a fixed system-memory slot pool.
Suggested starting values:
- completed-frame target: 2-4 frames
- DeckLink scheduled target: 4 frames for experiments
- total system slots: scheduled target + completed target + rendering spare + safety spare
For example:
```text
scheduled target: 4
completed target: 3
rendering/spare: 2
total slots: 9
```
Slot states:
- `Free`
- `Rendering`
- `Completed`
- `Scheduled`
Each slot should carry:
- frame index
- render target timestamp
- render completion timestamp
- pixel format
- row bytes and size
- schedule timestamp/index when scheduled
- completion result when released
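One possible slot layout is sketched below; field names are illustrative, and `deckLinkWrapper` assumes the Phase 7.5-style wrapper that lets a system-memory slot be scheduled as an `IDeckLinkVideoFrame`:

```cpp
#include "DeckLinkAPI.h"   // IDeckLinkVideoFrame, BMD types
#include <chrono>
#include <cstddef>

enum class SlotState { Free, Rendering, Completed, Scheduled };

// Illustrative slot layout; storage is allocated once and reused, never
// reallocated during playout.
struct PlayoutFrameSlot {
    SlotState state = SlotState::Free;

    // Identity and render timing.
    long long frameIndex = -1;
    std::chrono::steady_clock::time_point renderTargetTime;
    std::chrono::steady_clock::time_point renderCompletionTime;

    // Pixel storage.
    BMDPixelFormat pixelFormat = bmdFormat8BitBGRA;
    size_t rowBytes = 0;
    size_t sizeBytes = 0;
    void* pixels = nullptr;

    // DeckLink-side bookkeeping.
    IDeckLinkVideoFrame* deckLinkWrapper = nullptr;      // Phase 7.5-style wrapper
    BMDTimeValue scheduledStreamTime = -1;               // set when scheduled
    BMDOutputFrameCompletionResult completionResult{};   // set when released
};
```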
## Scheduling Model
The DeckLink scheduler should top up to a target device depth.
```text
on scheduler wake:
while actualDeckLinkBufferedFrames < targetScheduledFrames:
frame = completedStore.popOldestCompleted()
if no frame:
record completed-frame underrun
break
schedule frame at next continuous DeckLink stream time
```
Important:
- Use DeckLink `GetBufferedVideoFrameCount()` where available.
- Keep synthetic scheduled/completed indexes as diagnostics only.
- Do not infer device buffer depth from `mScheduledFrameIndex - mCompletedFrameIndex`.
- Do not schedule black because the app completed queue is momentarily empty while the device still has frames buffered.
- Use black only before the first valid frame or in explicit emergency fallback.
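Against the real DeckLink calls named above, the top-up could look like the sketch below. The store type and its helpers are assumptions from the earlier contract sketch; the frame duration and time scale are the mode's values (1001/60000 for 59.94 fps):

```cpp
#include "DeckLinkAPI.h"

// Sketch only: PlayoutFrameStore and its methods are hypothetical; the two
// DeckLink calls (GetBufferedVideoFrameCount, ScheduleVideoFrame) are real.
void topUpDeviceBuffer(IDeckLinkOutput* output,
                       PlayoutFrameStore& store,
                       uint32_t targetScheduledFrames,
                       BMDTimeValue frameDuration,    // e.g. 1001
                       BMDTimeScale timeScale,        // e.g. 60000
                       BMDTimeValue& nextStreamTime)  // continuous, owned here
{
    uint32_t buffered = 0;
    if (output->GetBufferedVideoFrameCount(&buffered) != S_OK)
        return;  // record the query failure in real code

    while (buffered < targetScheduledFrames) {
        PlayoutFrameSlot* slot = store.popOldestCompleted();
        if (!slot) {
            // Completed-frame underrun: record it; do not schedule black here.
            break;
        }
        if (output->ScheduleVideoFrame(slot->deckLinkWrapper, nextStreamTime,
                                       frameDuration, timeScale) != S_OK) {
            // Record the failure and return the slot to the store by policy.
            break;
        }
        store.markScheduled(slot, nextStreamTime);
        nextStreamTime += frameDuration;  // stream time stays continuous
        ++buffered;
    }
}
```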
## Thread Ownership
### Render Cadence Thread
Owns:
- render tick timing
- acquiring a free system-memory slot
- requesting render-thread output render/readback
- publishing completed frames
Does not own:
- DeckLink schedule time
- completion callback processing
- fallback black scheduling
### RenderEngine Render Thread
Owns:
- GL context
- input upload
- shader rendering
- output packing/readback
- preview present when allowed
Output render work should have priority over preview/screenshot work.
### DeckLink Scheduler Thread
Owns:
- schedule top-up policy
- DeckLink `ScheduleVideoFrame`
- device buffered-frame telemetry
- consuming completed frames
Does not own:
- rendering a missing frame
- running live-state composition directly
### Completion Callback / Worker
Owns:
- releasing scheduled system slots
- recording completion result
- waking scheduler and render cadence loops
Does not own:
- rendering
- scheduling fallback black during normal steady state
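A sketch of that callback against the real `IDeckLinkVideoOutputCallback` interface; COM `IUnknown` plumbing is elided, and the store/scheduler wake helpers are assumptions:

```cpp
#include "DeckLinkAPI.h"

class PlayoutCompletionCallback : public IDeckLinkVideoOutputCallback {
public:
    PlayoutCompletionCallback(PlayoutFrameStore& store,
                              DeckLinkPlayoutScheduler& scheduler,
                              RenderCadenceController& cadence)
        : store_(store), scheduler_(scheduler), cadence_(cadence) {}

    HRESULT STDMETHODCALLTYPE ScheduledFrameCompleted(
        IDeckLinkVideoFrame* completedFrame,
        BMDOutputFrameCompletionResult result) override {
        // Release the slot (assumed overload mapping the wrapper frame back
        // to its slot), record the result, and wake both loops. Never render.
        store_.releaseScheduled(completedFrame, result);
        scheduler_.wake();  // assumed helper
        cadence_.wake();    // assumed helper
        return S_OK;
    }

    HRESULT STDMETHODCALLTYPE ScheduledPlaybackHasStopped() override {
        return S_OK;
    }

    // IUnknown (QueryInterface/AddRef/Release) omitted for brevity.

private:
    PlayoutFrameStore& store_;
    DeckLinkPlayoutScheduler& scheduler_;
    RenderCadenceController& cadence_;
};
```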
## What Happens Under Stress
### Render Is Temporarily Late
- Completed-frame queue drains.
- DeckLink scheduled buffer drains.
- Telemetry shows render lateness and completed queue depth drop.
- If render catches up before device buffer reaches zero, output remains smooth.
### Render Cannot Sustain Cadence
- Completed-frame queue stays low.
- DeckLink buffer trends down.
- Late/drop telemetry increases.
- Policy may choose to skip render ticks, lower preview load, or enter degraded state.
### DeckLink Timing Jitters
- Scheduler tops up based on actual device buffered count.
- Render cadence continues independently.
- System-memory buffer absorbs short mismatch.
### UI Loses Focus
- Render cadence should continue.
- Preview present may be disabled or deprioritized.
- Output/render threads may need elevated priority.
- Device buffer telemetry should reveal whether Windows focus changes affect render cadence or only preview.
## Migration Plan
### Step 1: Add Real DeckLink Buffer Telemetry
Before more scheduling changes, measure the real device buffer.
Deliverables:
- call DeckLink `GetBufferedVideoFrameCount()` after schedule/completion where available
- expose `actualDeckLinkBufferedFrames`
- keep `scheduledLeadFrames` but label it synthetic/internal
- record schedule-call duration and failures
Exit criteria:
- runtime telemetry distinguishes app completed queue, system scheduled slots, synthetic lead, and actual DeckLink buffer depth
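A minimal probe for this step, run after each schedule call and completion callback; the telemetry sink and its method names are assumptions:

```cpp
// Sketch: record the real device buffer depth alongside the synthetic lead.
uint32_t buffered = 0;
if (output->GetBufferedVideoFrameCount(&buffered) == S_OK)
    telemetry.set("decklink.actualBufferedFrames", buffered);
else
    telemetry.increment("decklink.bufferQueryFailures");  // hypothetical counter
```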
### Step 2: Rename Existing Queues To Match Their Roles
Clarify vocabulary before rewriting behavior.
Deliverables:
- rename or document `RenderOutputQueue` as completed/unscheduled frame queue
- distinguish completed-frame depth from device scheduled depth
- update telemetry labels where possible
Exit criteria:
- logs no longer imply that `readyQueue.depth == 0` means DeckLink starvation
### Step 3: Introduce `RenderCadenceController`
Add a pure timing helper first.
Responsibilities:
- compute next render tick
- track frame duration
- report early/late/drift
- decide whether to render, wait, or skip render ticks
Tests:
- exact cadence advances
- late ticks are measured
- large lateness can skip according to policy
- no dependency on GL or DeckLink
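One possible shape for the core decision, kept pure so the tests above need no GL or DeckLink; all names here are illustrative:

```cpp
#include <chrono>

// Pure, unit-testable tick decision: wait when early, render when slightly
// late, skip ahead when lateness exceeds the policy threshold.
enum class TickAction { Wait, RenderNow, SkipAhead };

struct TickDecision {
    TickAction action;
    std::chrono::nanoseconds lateness;  // negative when early
};

inline TickDecision decideTick(std::chrono::nanoseconds now,
                               std::chrono::nanoseconds target,
                               std::chrono::nanoseconds frameDuration,
                               int skipThresholdFrames = 2) {
    const auto late = now - target;
    if (late < std::chrono::nanoseconds::zero())
        return {TickAction::Wait, late};
    if (late > skipThresholdFrames * frameDuration)
        return {TickAction::SkipAhead, late};
    return {TickAction::RenderNow, late};
}
```

A unit test can then feed synthetic `now`/`target` pairs and assert the `Wait`, `RenderNow`, and `SkipAhead` transitions at the policy boundaries.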
### Step 4: Move Output Production To Cadence Ticks
Replace queue-pressure-only production with cadence-driven production.
Initial behavior:
- render at selected output cadence
- produce into system-memory slots
- publish completed frames
- pause when completed queue is at max depth
Exit criteria:
- output rendering continues without DeckLink completions
- output rendering does not schedule DeckLink directly
### Step 5: Make DeckLink Scheduler A Separate Top-Up Loop
Create a scheduler loop that consumes completed frames.
Initial behavior:
- wake on completion, completed-frame publish, and periodic safety timer
- top up actual DeckLink buffer to target
- schedule only completed system-memory frames
- do not render or black-fill during normal steady state
Exit criteria:
- producer and DeckLink scheduler are separate loops
- one component owns schedule time
### Step 6: Remove Synthetic Catch-Up From Steady State
Disable catch-up frame skipping for proactive mode.
Replacement:
- render cadence may skip render ticks if the renderer is late
- completed queue may drop oldest or newest according to explicit policy
- DeckLink schedule time remains continuous
Exit criteria:
- scheduled stream time advances one frame per scheduled frame unless emergency recovery is explicitly enabled
### Step 7: Prioritize Output Render Work
Reduce render-thread interference.
Deliverables:
- output render commands outrank preview present
- preview skipped/deferred count is visible
- input upload timing is measured separately
- screenshot/readback cannot block output cadence unless explicitly requested
Exit criteria:
- focus changes and preview present do not drain playout buffer
### Step 8: Tune Thread Priority And Wait Strategy
Only after ownership is separated, tune scheduling.
Deliverables:
- set render cadence and DeckLink scheduler threads to appropriate Windows priorities
- avoid busy spinning
- use waitable timers or high-resolution waits where useful
- record wake jitter
Exit criteria:
- cadence jitter is measurable and bounded
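On Windows, one hedged option for the wait strategy is a high-resolution waitable timer. `CREATE_WAITABLE_TIMER_HIGH_RESOLUTION` needs Windows 10 1803 or later, so the sketch falls back to a plain timer:

```cpp
#include <windows.h>

// Sketch: prefer a high-resolution waitable timer, fall back when the flag
// is unsupported (pre-1803 Windows).
HANDLE createCadenceTimer() {
    HANDLE timer = CreateWaitableTimerExW(
        nullptr, nullptr,
        CREATE_WAITABLE_TIMER_MANUAL_RESET | CREATE_WAITABLE_TIMER_HIGH_RESOLUTION,
        TIMER_ALL_ACCESS);
    if (!timer)
        timer = CreateWaitableTimerExW(nullptr, nullptr,
                                       CREATE_WAITABLE_TIMER_MANUAL_RESET,
                                       TIMER_ALL_ACCESS);
    return timer;
}

void waitRelativeMicros(HANDLE timer, long long microsFromNow) {
    LARGE_INTEGER due;
    due.QuadPart = -(microsFromNow * 10);  // negative = relative, 100 ns units
    SetWaitableTimer(timer, &due, 0, nullptr, nullptr, FALSE);
    WaitForSingleObject(timer, INFINITE);
    // Record (actual wake - requested wake) as renderCadence wake jitter.
}
```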
## Telemetry
Add or clarify:
- `renderCadence.targetFps`
- `renderCadence.frameIndex`
- `renderCadence.lateMs`
- `renderCadence.maxLateMs`
- `renderCadence.skippedTicks`
- `completedFrames.depth`
- `completedFrames.capacity`
- `completedFrames.underruns`
- `systemMemory.free`
- `systemMemory.rendering`
- `systemMemory.completed`
- `systemMemory.scheduled`
- `decklink.actualBufferedFrames`
- `decklink.targetBufferedFrames`
- `decklink.scheduleCallMs`
- `decklink.scheduleFailures`
- `decklink.completionIntervalMs`
- `decklink.lateFrames`
- `decklink.droppedFrames`
- `scheduler.syntheticLeadFrames`
## Risks
- A cadence thread can render frames that DeckLink later drops if scheduling is wrong.
- Too much buffering adds latency.
- Too little buffering exposes Windows scheduling jitter.
- If output render and input upload still share one GL thread, render cadence can still be disturbed by uploads.
- Actual DeckLink buffer telemetry may differ from app-owned scheduled-slot counts.
## Exit Criteria
Phase 7.7 is complete when:
- output rendering is driven by a render cadence controller
- DeckLink completion callbacks do not render
- DeckLink scheduling is owned by a scheduler/top-up loop
- system-memory completed frames are the only contract between render and DeckLink scheduling
- real DeckLink buffered-frame count is visible
- synthetic schedule lead no longer drives normal recovery
- black fallback is startup/emergency only
- playback can be tested with 4-frame and larger buffers without changing ownership logic