Stage 1 rewrite
docs/PHASE_7_7_RENDER_CADENCE_PLAYOUT_DESIGN.md
@@ -0,0 +1,430 @@
# Phase 7.7: Render Cadence And Playout Separation Design

## Status

Proposed.

Phase 7.5 and 7.6 proved useful pieces individually:

- BGRA8 pack/readback can be fast enough on the current test machine.
- System-memory frame slots can be wrapped for DeckLink scheduling.
- A producer can keep frames ready and keep a small scheduled buffer filled.

But the experiments also showed that the current hybrid ownership model is fragile:

- completion-driven rendering caused app-ready starvation
- completion-time black fallback caused visible black flicker
- producer-side scheduling without a cadence target overfed the schedule timeline
- capping scheduled count helped, but completion and producer scheduling fought each other
- making completion passive exposed startup and scheduling-trigger gaps
- late/drop catch-up skipping created smooth/freeze/smooth cadence

The lesson is that the app needs a larger architectural split, not more local recovery branches.

## Goal

Make the output path behave like two cooperating real-time systems:

```text
Render cadence thread
  renders at the selected output cadence, for example 59.94 fps
  writes completed frames into system-memory slots

DeckLink playout scheduler
  keeps the device scheduled buffer topped up
  consumes completed system-memory frames
  never asks rendering to happen synchronously
```

The system-memory frame buffer becomes the contract between render timing and device timing.

## Non-Goals

- Do not hide failure by repeating frames as the primary strategy.
- Do not make DeckLink completion callbacks render frames.
- Do not use synthetic schedule-index catch-up as normal recovery.
- Do not change shader semantics or live-state semantics.
- Do not require v210/YUV packing in the first implementation.
- Do not pursue DVP/pinned-memory fast transfer as the main path on unsupported hardware.

## Target Architecture

### Current Problem Shape

The current Phase 7.5/7.6 implementation still has too many timing authorities:

- DeckLink completion callbacks release frames and influence scheduling
- the producer renders based on queue pressure
- the producer also schedules some frames
- `VideoPlayoutScheduler` advances synthetic stream-time indexes
- fallback behavior can schedule black when the app-ready queue is briefly empty

That means the system can be full and still look wrong, because "full" is not tied to one clear cadence owner.

### Target Shape

```text
RenderCadenceController
  owns output frame tick: frame 0, 1, 2...
  owns render target time
  asks RenderEngine to render frame N
  publishes completed frame N into PlayoutFrameStore

PlayoutFrameStore
  owns free / rendering / completed / scheduled slots
  tracks frame number, render time, completion time, and schedule state
  exposes completed frames to DeckLink scheduler

DeckLinkPlayoutScheduler
  owns DeckLink schedule time
  tops up device buffered frames to target depth
  consumes completed frames only
  releases scheduled slots on completion callbacks

DeckLink completion callback
  releases scheduled slots whose playout has completed
  records result and device timing
  wakes scheduler
  does not render
```

## Cadence Model

The render side should be time-driven, not completion-driven.

For a 59.94 fps mode:

```text
frameDuration = 1001 / 60000 seconds
nextRenderTime = now

loop:
  wait until nextRenderTime, or run immediately if behind
  render frameIndex for nextRenderTime
  read back into free system-memory slot
  publish completed slot
  frameIndex += 1
  nextRenderTime += frameDuration
```

Rules:

- If the render thread is early, it waits/yields.
- If it is slightly late, it renders the next frame immediately and records lateness.
- If it is badly late, policy may skip render ticks before rendering the newest frame.
- Skipping render ticks is a render-cadence decision, not a DeckLink stream-time jump.
- DeckLink schedule time should remain continuous unless a deliberate device recovery policy says otherwise.
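
The rules above could be captured in a small timing helper along the following lines. This is a minimal sketch, not the project's implementation: `TickAction`, `TickDecision`, `Decide`, `TargetTime`, and the `skipThreshold` parameter are hypothetical names introduced for illustration. Each tick's target time is derived from the start time and the rational frame duration, so rounding error does not accumulate across frames.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical sketch; names are illustrative and not taken from the codebase.
using Clock = std::chrono::steady_clock;
using Nanos = std::chrono::nanoseconds;
using TimePoint = std::chrono::time_point<Clock, Nanos>;

enum class TickAction { Wait, Render, SkipThenRender };

struct TickDecision {
    TickAction action;
    Nanos lateness;              // zero when the tick is not yet due
    std::uint64_t skippedTicks;  // render ticks abandoned by this decision
};

class RenderCadenceController {
public:
    // frameNum / frameDen is the frame duration in seconds, e.g. 1001 / 60000.
    RenderCadenceController(TimePoint start, std::uint64_t frameNum,
                            std::uint64_t frameDen, Nanos skipThreshold)
        : mStart(start), mNum(frameNum), mDen(frameDen), mSkipThreshold(skipThreshold) {}

    // Target time of tick N, computed from the start so rounding never accumulates.
    TimePoint TargetTime(std::uint64_t tick) const {
        const std::uint64_t total = tick * mNum;
        const std::uint64_t wholeSeconds = total / mDen;
        const std::uint64_t remainder = total % mDen;
        const std::int64_t ns = static_cast<std::int64_t>(
            wholeSeconds * 1000000000ULL + remainder * 1000000000ULL / mDen);
        return mStart + Nanos(ns);
    }

    // Early: wait. Slightly late: render now. Badly late: skip to the newest due tick.
    TickDecision Decide(TimePoint now) {
        const TimePoint target = TargetTime(mNextTick);
        if (now < target) return {TickAction::Wait, Nanos::zero(), 0};
        const Nanos late = now - target;
        if (late < mSkipThreshold) return {TickAction::Render, late, 0};
        std::uint64_t skipped = 0;
        while (TargetTime(mNextTick + 1) <= now) {
            ++mNextTick;
            ++skipped;
        }
        return {TickAction::SkipThenRender, late, skipped};
    }

    // Call after the frame for NextTick() has been rendered and published.
    void OnFrameRendered() { ++mNextTick; }
    std::uint64_t NextTick() const { return mNextTick; }

private:
    TimePoint mStart;
    std::uint64_t mNum;
    std::uint64_t mDen;
    Nanos mSkipThreshold;
    std::uint64_t mNextTick = 0;
};
```

Note that skipping only advances the render tick counter. Nothing in this helper touches DeckLink stream time, which matches the rule that a skipped render tick is not a stream-time jump.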

## Buffer Model

Use a fixed system-memory slot pool.

Suggested starting values:

- completed-frame target: 2-4 frames
- DeckLink scheduled target: 4 frames for experiments
- total system slots: scheduled target + completed target + rendering spare + safety spare

For example:

```text
scheduled target: 4
completed target: 3
rendering/spare: 2
total slots: 9
```

Slot states:

- `Free`
- `Rendering`
- `Completed`
- `Scheduled`

Each slot should carry:

- frame index
- render target timestamp
- render completion timestamp
- pixel format
- row bytes and size
- schedule timestamp/index when scheduled
- completion result when released
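
As an illustration of the slot life cycle, here is a minimal sketch of a fixed pool that only permits the transitions `Free` -> `Rendering` -> `Completed` -> `Scheduled` -> `Free`. The method names (`AcquireFree`, `PublishCompleted`, `PopOldestCompleted`, `Release`) are assumptions made for this example, and the slot carries only a frame index and pixel storage. The timestamps, pixel format, and completion result listed above are omitted for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical sketch; method names are illustrative, not the project's API.
enum class SlotState { Free, Rendering, Completed, Scheduled };

struct FrameSlot {
    SlotState state = SlotState::Free;
    std::uint64_t frameIndex = 0;
    std::vector<std::uint8_t> pixels;  // system-memory storage for one frame
};

class PlayoutFrameStore {
public:
    PlayoutFrameStore(std::size_t slotCount, std::size_t bytesPerSlot) : mSlots(slotCount) {
        for (FrameSlot& slot : mSlots) slot.pixels.resize(bytesPerSlot);
    }

    // Free -> Rendering. Returns nullopt when every slot is busy.
    std::optional<std::size_t> AcquireFree(std::uint64_t frameIndex) {
        std::lock_guard<std::mutex> lock(mMutex);
        for (std::size_t i = 0; i < mSlots.size(); ++i) {
            if (mSlots[i].state == SlotState::Free) {
                mSlots[i].state = SlotState::Rendering;
                mSlots[i].frameIndex = frameIndex;
                return i;
            }
        }
        return std::nullopt;
    }

    // Rendering -> Completed.
    bool PublishCompleted(std::size_t slot) {
        return Transition(slot, SlotState::Rendering, SlotState::Completed);
    }

    // Completed -> Scheduled, picking the lowest frame index to preserve playout order.
    std::optional<std::size_t> PopOldestCompleted() {
        std::lock_guard<std::mutex> lock(mMutex);
        std::optional<std::size_t> best;
        for (std::size_t i = 0; i < mSlots.size(); ++i) {
            if (mSlots[i].state != SlotState::Completed) continue;
            if (!best || mSlots[i].frameIndex < mSlots[*best].frameIndex) best = i;
        }
        if (best) mSlots[*best].state = SlotState::Scheduled;
        return best;
    }

    // Scheduled -> Free, intended to be called from the completion path.
    bool Release(std::size_t slot) {
        return Transition(slot, SlotState::Scheduled, SlotState::Free);
    }

    std::size_t Count(SlotState state) const {
        std::lock_guard<std::mutex> lock(mMutex);
        std::size_t n = 0;
        for (const FrameSlot& slot : mSlots) n += (slot.state == state) ? 1 : 0;
        return n;
    }

private:
    // Rejects any transition whose source state does not match.
    bool Transition(std::size_t slot, SlotState from, SlotState to) {
        std::lock_guard<std::mutex> lock(mMutex);
        if (slot >= mSlots.size() || mSlots[slot].state != from) return false;
        mSlots[slot].state = to;
        return true;
    }

    mutable std::mutex mMutex;
    std::vector<FrameSlot> mSlots;
};
```

With the example sizing above, `PlayoutFrameStore store(9, rowBytes * height)` would allocate all nine slots up front. `Count` maps directly onto the per-state slot gauges a telemetry layer would want.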

## Scheduling Model

The DeckLink scheduler should top up to a target device depth.

```text
on scheduler wake:
  while actualDeckLinkBufferedFrames < targetScheduledFrames:
    frame = completedStore.popOldestCompleted()
    if no frame:
      record completed-frame underrun
      break
    schedule frame at next continuous DeckLink stream time
```

Important:

- Use DeckLink `GetBufferedVideoFrameCount()` where available.
- Keep synthetic scheduled/completed indexes as diagnostics only.
- Do not infer device buffer depth from `mScheduledFrameIndex - mCompletedFrameIndex`.
- Do not schedule black because the app completed queue is momentarily empty while the device still has frames buffered.
- Use black only before the first valid frame or in explicit emergency fallback.
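
The top-up loop can be sketched against a stand-in device so the policy is testable without hardware. Everything here is an assumption for illustration: `FakeDevice`, `TopUpState`, and `TopUp` are hypothetical, and `FakeDevice::BufferedFrameCount` and `FakeDevice::Schedule` stand in for whatever wrapper the app puts around DeckLink's `GetBufferedVideoFrameCount()` and `ScheduleVideoFrame`. The loop reads the device's own count on every iteration, advances stream time by exactly one frame per scheduled frame, and records an underrun instead of scheduling black.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for the real device wrapper; used only to exercise the loop.
struct FakeDevice {
    std::uint32_t buffered = 0;
    std::vector<std::int64_t> scheduledTimes;

    std::uint32_t BufferedFrameCount() const { return buffered; }

    bool Schedule(std::uint64_t /*frameIndex*/, std::int64_t streamTime) {
        scheduledTimes.push_back(streamTime);
        ++buffered;
        return true;
    }
};

struct TopUpState {
    std::int64_t nextStreamTime = 0;  // in device time-scale units, e.g. 1/60000 s
    std::uint64_t underruns = 0;
    std::uint64_t scheduleFailures = 0;
};

// Schedules completed frames until the device reports the target depth.
// Returns the number of frames scheduled during this wake.
template <typename Device>
std::size_t TopUp(Device& device, std::deque<std::uint64_t>& completed,
                  std::uint32_t targetDepth, std::int64_t frameDuration,
                  TopUpState& state) {
    std::size_t scheduled = 0;
    while (device.BufferedFrameCount() < targetDepth) {
        if (completed.empty()) {
            ++state.underruns;  // record it; do not fill with black here
            break;
        }
        const std::uint64_t frame = completed.front();
        if (!device.Schedule(frame, state.nextStreamTime)) {
            ++state.scheduleFailures;
            break;  // leave the frame queued for the next wake
        }
        completed.pop_front();
        state.nextStreamTime += frameDuration;  // stream time stays continuous
        ++scheduled;
    }
    return scheduled;
}
```

For a 59.94 fps mode the caller would pass `frameDuration = 1001` with a device time scale of 60000, so the scheduled times run 0, 1001, 2002, and so on with no gaps.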

## Thread Ownership

### Render Cadence Thread

Owns:

- render tick timing
- acquiring a free system-memory slot
- requesting render-thread output render/readback
- publishing completed frames

Does not own:

- DeckLink schedule time
- completion callback processing
- fallback black scheduling

### RenderEngine Render Thread

Owns:

- GL context
- input upload
- shader rendering
- output packing/readback
- preview present when allowed

Output render work should have priority over preview/screenshot work.

### DeckLink Scheduler Thread

Owns:

- schedule top-up policy
- DeckLink `ScheduleVideoFrame`
- device buffered-frame telemetry
- consuming completed frames

Does not own:

- rendering a missing frame
- running live-state composition directly

### Completion Callback / Worker

Owns:

- releasing scheduled system slots
- recording completion result
- waking scheduler and render cadence loops

Does not own:

- rendering
- scheduling fallback black during normal steady state

## What Happens Under Stress

### Render Is Temporarily Late

- Completed-frame queue drains.
- DeckLink scheduled buffer drains.
- Telemetry shows render lateness and completed queue depth drop.
- If render catches up before device buffer reaches zero, output remains smooth.

### Render Cannot Sustain Cadence

- Completed-frame queue stays low.
- DeckLink buffer trends down.
- Late/drop telemetry increases.
- Policy may choose to skip render ticks, lower preview load, or enter degraded state.

### DeckLink Timing Jitters

- Scheduler tops up based on actual device buffered count.
- Render cadence continues independently.
- System-memory buffer absorbs short mismatch.

### UI Loses Focus

- Render cadence should continue.
- Preview present may be disabled or deprioritized.
- Output/render threads may need elevated priority.
- Device buffer telemetry should reveal whether Windows focus changes affect render cadence or only preview.

## Migration Plan

### Step 1: Add Real DeckLink Buffer Telemetry

Before more scheduling changes, measure the real device buffer.

Deliverables:

- call DeckLink `GetBufferedVideoFrameCount()` after schedule/completion where available
- expose `actualDeckLinkBufferedFrames`
- keep `scheduledLeadFrames` but label it synthetic/internal
- record schedule-call duration and failures

Exit criteria:

- runtime telemetry distinguishes app completed queue, system scheduled slots, synthetic lead, and actual DeckLink buffer depth

### Step 2: Rename Existing Queues To Match Their Roles

Clarify vocabulary before rewriting behavior.

Deliverables:

- rename or document `RenderOutputQueue` as completed/unscheduled frame queue
- distinguish completed-frame depth from device scheduled depth
- update telemetry labels where possible

Exit criteria:

- logs no longer imply `readyQueue.depth == 0` means DeckLink starvation

### Step 3: Introduce `RenderCadenceController`

Add a pure timing helper first.

Responsibilities:

- compute next render tick
- track frame duration
- report early/late/drift
- decide whether to render, wait, or skip render ticks

Tests:

- exact cadence advances
- late ticks are measured
- large lateness can skip according to policy
- no dependency on GL or DeckLink

### Step 4: Move Output Production To Cadence Ticks

Replace queue-pressure-only production with cadence-driven production.

Initial behavior:

- render at selected output cadence
- produce into system-memory slots
- publish completed frames
- pause when completed queue is at max depth

Exit criteria:

- output rendering continues without DeckLink completions
- output rendering does not schedule DeckLink directly

### Step 5: Make DeckLink Scheduler A Separate Top-Up Loop

Create a scheduler loop that consumes completed frames.

Initial behavior:

- wake on completion, completed-frame publish, and periodic safety timer
- top up actual DeckLink buffer to target
- schedule only completed system-memory frames
- do not render or black-fill during normal steady state

Exit criteria:

- producer and DeckLink scheduler are separate loops
- one component owns schedule time
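
One way to express the three wake sources is a small signal object that accumulates wake reasons and falls back to a timeout. The sketch below is an assumption, not existing code: `SchedulerWakeSignal`, the `WakeReason` flags, and the idea of returning a bitmask are all illustrative choices. The completion callback and the render cadence thread would call `Notify`, and the scheduler loop would call `Wait` and then run its top-up pass regardless of which reason woke it.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical wake reasons; combined as a bitmask so bursts of events coalesce.
enum WakeReason : std::uint32_t {
    kWakeNone = 0,
    kWakeCompletion = 1u << 0,      // a scheduled frame finished playout
    kWakeFramePublished = 1u << 1,  // the render side published a completed frame
    kWakeStop = 1u << 2,            // shutdown requested
    kWakeSafetyTimer = 1u << 3      // nothing arrived before the timeout
};

class SchedulerWakeSignal {
public:
    // Safe to call from any thread; does no scheduling work itself.
    void Notify(std::uint32_t reason) {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mPending |= reason;
        }
        mCondition.notify_one();
    }

    // Blocks until notified or until the safety timeout elapses.
    // Returns the accumulated reasons, or kWakeSafetyTimer on timeout.
    std::uint32_t Wait(std::chrono::milliseconds safetyTimeout) {
        std::unique_lock<std::mutex> lock(mMutex);
        mCondition.wait_for(lock, safetyTimeout, [this] { return mPending != 0; });
        const std::uint32_t reasons = (mPending != 0) ? mPending : kWakeSafetyTimer;
        mPending = 0;
        return reasons;
    }

private:
    std::mutex mMutex;
    std::condition_variable mCondition;
    std::uint32_t mPending = 0;
};
```

Because reasons are ORed together, several completions arriving between two scheduler passes cost a single wake. The top-up pass then reads the real device depth rather than counting events.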

### Step 6: Remove Synthetic Catch-Up From Steady State

Disable catch-up frame skipping for proactive mode.

Replacement:

- render cadence may skip render ticks if the renderer is late
- completed queue may drop oldest or newest according to explicit policy
- DeckLink schedule time remains continuous

Exit criteria:

- scheduled stream time advances one frame per scheduled frame unless emergency recovery is explicitly enabled
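
The explicit drop policy for the completed queue might look like the bounded queue below. This is a sketch under assumptions: `OverflowPolicy`, `CompletedFrameQueue`, and the choice to report the dropped frame index to the caller are all hypothetical. The point it illustrates is that dropping happens on the completed-frame side, where it can be counted, while the scheduler's stream time is never adjusted to compensate.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical policy names; the document only requires that the choice be explicit.
enum class OverflowPolicy { DropOldest, DropNewest };

class CompletedFrameQueue {
public:
    CompletedFrameQueue(std::size_t capacity, OverflowPolicy policy)
        : mCapacity(capacity), mPolicy(policy) {}

    // Adds a completed frame. Returns the frame index that was dropped, if any,
    // so the caller can return that frame's slot to the free pool.
    std::optional<std::uint64_t> Push(std::uint64_t frameIndex) {
        if (mFrames.size() < mCapacity) {
            mFrames.push_back(frameIndex);
            return std::nullopt;
        }
        ++mDropped;
        if (mPolicy == OverflowPolicy::DropNewest || mFrames.empty()) {
            return frameIndex;  // the incoming frame is discarded
        }
        const std::uint64_t oldest = mFrames.front();
        mFrames.pop_front();
        mFrames.push_back(frameIndex);
        return oldest;
    }

    // Removes and returns the oldest queued frame for scheduling.
    std::optional<std::uint64_t> PopOldest() {
        if (mFrames.empty()) return std::nullopt;
        const std::uint64_t frame = mFrames.front();
        mFrames.pop_front();
        return frame;
    }

    std::size_t Depth() const { return mFrames.size(); }
    std::uint64_t Dropped() const { return mDropped; }

private:
    std::size_t mCapacity;
    OverflowPolicy mPolicy;
    std::deque<std::uint64_t> mFrames;
    std::uint64_t mDropped = 0;
};
```

`DropOldest` favors freshness at the cost of a visible content jump, while `DropNewest` preserves continuity of what is already queued. Either way the drop count is visible, unlike a synthetic schedule-index jump.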

### Step 7: Prioritize Output Render Work

Reduce render-thread interference.

Deliverables:

- output render commands outrank preview present
- preview skipped/deferred count is visible
- input upload timing is measured separately
- screenshot/readback cannot block output cadence unless explicitly requested

Exit criteria:

- focus changes and preview present do not drain playout buffer
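
A simple way to make output commands outrank preview and screenshot work is a two-lane queue that always drains the output lane first and counts how often background work was made to wait. The sketch below is illustrative only: `RenderCommandKind`, `RenderCommand`, `RenderCommandQueue`, and `BackgroundDeferrals` are hypothetical names, and input upload is left out because its priority is not specified here.

```cpp
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical command kinds; only the output/background split matters for this sketch.
enum class RenderCommandKind { OutputFrame, PreviewPresent, Screenshot };

struct RenderCommand {
    RenderCommandKind kind;
    std::uint64_t id;  // caller-assigned identifier, used here only for tracing
};

class RenderCommandQueue {
public:
    void Push(const RenderCommand& command) {
        if (command.kind == RenderCommandKind::OutputFrame) {
            mOutput.push_back(command);
        } else {
            mBackground.push_back(command);
        }
    }

    // Output commands are always served first, in FIFO order.
    // Background commands run only when no output work is pending.
    std::optional<RenderCommand> Pop() {
        if (!mOutput.empty()) {
            if (!mBackground.empty()) ++mBackgroundDeferrals;
            const RenderCommand command = mOutput.front();
            mOutput.pop_front();
            return command;
        }
        if (!mBackground.empty()) {
            const RenderCommand command = mBackground.front();
            mBackground.pop_front();
            return command;
        }
        return std::nullopt;
    }

    // Number of times background work waited behind an output command.
    std::uint64_t BackgroundDeferrals() const { return mBackgroundDeferrals; }

private:
    std::deque<RenderCommand> mOutput;
    std::deque<RenderCommand> mBackground;
    std::uint64_t mBackgroundDeferrals = 0;
};
```

This sketch is single-threaded for clarity, so a real render-thread queue would need synchronization around `Push` and `Pop`. It also does not preempt a background command that is already running, which is why long screenshot readbacks would still need to be bounded separately.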

### Step 8: Tune Thread Priority And Wait Strategy

Only after ownership is separated, tune scheduling.

Deliverables:

- set render cadence and DeckLink scheduler threads to appropriate Windows priorities
- avoid busy spinning
- use waitable timers or high-resolution waits where useful
- record wake jitter

Exit criteria:

- cadence jitter is measurable and bounded
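
Recording wake jitter needs little more than the intended wake time and the time the thread actually resumed. The following sketch uses hypothetical names (`WakeJitterStats`, `SleepUntilAndRecord`) and uses `std::this_thread::sleep_until` as a portable stand-in for a Windows waitable timer, which is an assumption made so the example runs anywhere. The statistics part is independent of which wait primitive is eventually chosen.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <thread>

using Clock = std::chrono::steady_clock;
using Nanos = std::chrono::nanoseconds;

// Hypothetical helper: accumulates how late each wake was relative to its target.
class WakeJitterStats {
public:
    void Record(Clock::time_point target, Clock::time_point actual) {
        const Nanos late = std::chrono::duration_cast<Nanos>(actual - target);
        mMaxLate = std::max(mMaxLate, late);
        mTotalLate += std::max(late, Nanos::zero());  // early wakes count as zero
        ++mCount;
    }

    std::uint64_t Count() const { return mCount; }
    Nanos MaxLate() const { return mMaxLate; }

    Nanos MeanLate() const {
        if (mCount == 0) return Nanos::zero();
        return mTotalLate / static_cast<std::int64_t>(mCount);
    }

private:
    std::uint64_t mCount = 0;
    Nanos mMaxLate = Nanos::zero();
    Nanos mTotalLate = Nanos::zero();
};

// Blocking wait without spinning, followed by a jitter sample.
// A waitable timer or other high-resolution wait could replace sleep_until here.
inline void SleepUntilAndRecord(Clock::time_point target, WakeJitterStats& stats) {
    std::this_thread::sleep_until(target);
    stats.Record(target, Clock::now());
}
```

Comparing `MaxLate()` against the frame duration (about 16.68 ms at 59.94 fps) gives a direct reading of how much of each frame period is being lost to wake latency.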

## Telemetry

Add or clarify:

- `renderCadence.targetFps`
- `renderCadence.frameIndex`
- `renderCadence.lateMs`
- `renderCadence.maxLateMs`
- `renderCadence.skippedTicks`
- `completedFrames.depth`
- `completedFrames.capacity`
- `completedFrames.underruns`
- `systemMemory.free`
- `systemMemory.rendering`
- `systemMemory.completed`
- `systemMemory.scheduled`
- `decklink.actualBufferedFrames`
- `decklink.targetBufferedFrames`
- `decklink.scheduleCallMs`
- `decklink.scheduleFailures`
- `decklink.completionIntervalMs`
- `decklink.lateFrames`
- `decklink.droppedFrames`
- `scheduler.syntheticLeadFrames`

## Risks

- A cadence thread can render frames that DeckLink later drops if scheduling is wrong.
- Too much buffering adds latency.
- Too little buffering exposes Windows scheduling jitter.
- If output render and input upload still share one GL thread, render cadence can still be disturbed by uploads.
- Actual DeckLink buffer telemetry may differ from app-owned scheduled-slot counts.

## Exit Criteria

Phase 7.7 is complete when:

- output rendering is driven by a render cadence controller
- DeckLink completion callbacks do not render
- DeckLink scheduling is owned by a scheduler/top-up loop
- system-memory completed frames are the only contract between render and DeckLink scheduling
- real DeckLink buffered-frame count is visible
- synthetic schedule lead no longer drives normal recovery
- black fallback is startup/emergency only
- playback can be tested with 4-frame and larger buffers without changing ownership logic