Phase 7.7: Render Cadence And Playout Separation Design
Status
In progress.
Implemented so far:
- real DeckLink buffered-frame telemetry is exposed separately from synthetic scheduler lead
- pure RenderCadenceController exists with non-GL tests
- SystemOutputFramePool now exposes the Phase 7.7 state vocabulary: Free, Rendering, Completed, Scheduled
- the output producer now uses RenderCadenceController to render one output frame per cadence tick
- DeckLink scheduling remains a separate top-up pass capped by the configured preroll target
Phase 7.5 and 7.6 proved useful pieces individually:
- BGRA8 pack/readback can be fast enough on the current test machine.
- System-memory frame slots can be wrapped for DeckLink scheduling.
- A producer can keep frames ready and keep a small scheduled buffer filled.
But the experiments also showed that the current hybrid ownership model is fragile:
- completion-driven rendering caused app-ready starvation
- completion-time black fallback caused visible black flicker
- producer-side scheduling without a cadence target overfed the schedule timeline
- capping scheduled count helped, but completion and producer scheduling fought each other
- making completion passive exposed startup and scheduling-trigger gaps
- late/drop catch-up skipping created smooth/freeze/smooth cadence
The lesson is that the app needs a larger architectural split, not more local recovery branches.
Goal
Make the output path behave like two cooperating real-time systems:
Render cadence thread
renders at the selected output cadence, for example 59.94 fps
writes completed frames into system-memory slots
DeckLink playout scheduler
keeps the device scheduled buffer topped up
consumes completed system-memory frames
never asks rendering to happen synchronously
The system-memory frame buffer becomes the contract between render timing and device timing.
Core principle:
- The render cadence should be stable and boring.
- If the selected output mode is 59.94 fps, the render producer should attempt to render at 59.94 fps.
- It should not speed up just because the DeckLink buffer is empty.
- It should not slow down because DeckLink is full or because completed frames have not drained.
- Completed-but-unscheduled frames are a latest-N cache. Old completed frames may be dropped/recycled to keep rendering at cadence.
- Scheduled frames are protected until DeckLink completes them.
- The only normal reason for the render cadence to deviate is that rendering/GPU work itself overruns the frame budget.
Non-Goals
- Do not hide failure by repeating frames as the primary strategy.
- Do not make DeckLink completion callbacks render frames.
- Do not use synthetic schedule-index catch-up as normal recovery.
- Do not change shader semantics or live-state semantics.
- Do not require v210/YUV packing in the first implementation.
- Do not pursue DVP/pinned-memory fast transfer as the main path on unsupported hardware.
Target Architecture
Current Problem Shape
The current Phase 7.5/7.6 implementation still has too many timing authorities:
- DeckLink completion callbacks release frames and influence scheduling
- the producer renders based on queue pressure
- the producer also schedules some frames
- VideoPlayoutScheduler advances synthetic stream-time indexes
- fallback behavior can schedule black when the app-ready queue is briefly empty
That means the system can be full and still look wrong, because "full" is not tied to one clear cadence owner.
Target Shape
Startup / warmup
render cadence starts first
render thread produces warmup frames at the selected cadence
completed system-memory queue reaches warmup target
DeckLink preroll is scheduled from completed frames
DeckLink playback starts with a filled buffer
Steady state
RenderCadenceController
owns output frame tick: frame 0, 1, 2...
owns render target time
asks RenderEngine to render frame N
publishes completed frame N into PlayoutFrameStore
PlayoutFrameStore
owns free / rendering / completed / scheduled slots
tracks frame number, render time, completion time, and schedule state
exposes latest completed frames to DeckLink scheduler
may drop/recycle oldest unscheduled completed frames when render cadence needs space
DeckLinkPlayoutScheduler
owns DeckLink schedule time
tops up device buffered frames to target depth
consumes completed frames only
releases scheduled slots on completion callbacks
DeckLink completion callback
releases completed slots
records result and device timing
wakes scheduler
does not render
Cadence Model
The render side should be time-driven, not completion-driven.
For a 59.94 fps mode:
frameDuration = 1001 / 60000 seconds
nextRenderTime = now
loop:
wait until nextRenderTime, or run immediately if behind
render frameIndex for nextRenderTime
read back into free system-memory slot
publish completed slot
frameIndex += 1
nextRenderTime += frameDuration
Rules:
- If the render thread is early, it waits/yields.
- If it is slightly late, it renders the next frame immediately and records lateness.
- If it is badly late because render/GPU work overran the frame budget, policy may skip render ticks before rendering the newest frame.
- Skipping render ticks is an overrun policy, not a buffer-fill strategy.
- DeckLink schedule time should remain continuous unless a deliberate device recovery policy says otherwise.
Non-rule:
- The render producer must not render faster than the selected cadence to refill DeckLink.
- DeckLink should start only after warmup/preroll has filled enough completed frames.
- If the DeckLink buffer drains in steady state, that is a real timing failure to measure, not a signal for the render thread to sprint.
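The cadence arithmetic above can be sketched as a pure helper. This is a minimal illustration, assuming an integer microsecond clock; the type and field names are invented here, not the project's API:

```cpp
#include <cassert>
#include <cstdint>

// Illustrative sketch of the cadence arithmetic for a 59.94 fps mode.
// Frame duration is kept as the rational 1001/60000 s so that target
// times do not accumulate floating-point drift across long sessions.
struct CadenceClock {
    int64_t durNumSec = 1001;   // frame duration numerator, in seconds
    int64_t durDenSec = 60000;  // frame duration denominator
    int64_t startUs = 0;        // cadence start time, microseconds

    // Target render time for frame N, in microseconds.
    int64_t targetUs(int64_t frameIndex) const {
        return startUs + frameIndex * durNumSec * 1000000 / durDenSec;
    }

    // Positive result means the render thread is late for frame N.
    int64_t latenessUs(int64_t frameIndex, int64_t nowUs) const {
        return nowUs - targetUs(frameIndex);
    }
};
```

Computing each tick's target from the frame index, instead of repeatedly adding a rounded duration, keeps the cadence anchored to the start time: 60 frames land at exactly 1.001 s rather than 60 accumulated rounding errors later.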
Buffer Model
Use a fixed system-memory slot pool.
The completed portion of the pool is not a strict consume-before-render queue. It is a latest-N rendered-frame cache:
- render cadence writes one frame per selected output tick
- if completed-but-unscheduled frames are full, the oldest completed frame is disposable
- DeckLink scheduling consumes from the completed cache when it needs frames
- frames already scheduled to DeckLink are never recycled until completion
- if all slots are scheduled/in flight, cadence may miss because there is genuinely no safe system-memory target
Suggested starting values:
- completed-frame target: 2-4 frames
- DeckLink scheduled target: 4 frames for experiments
- total system slots: scheduled target + completed target + rendering spare + safety spare
For example:
scheduled target: 4
completed target: 3
rendering/spare: 2
total slots: 9
Slot states:
Free, Rendering, Completed, Scheduled
Each slot should carry:
- frame index
- render target timestamp
- render completion timestamp
- pixel format
- row bytes and size
- schedule timestamp/index when scheduled
- completion result when released
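As one possible shape for the per-slot record and the sizing rule above, here is a sketch assuming plain POD slots; every name is illustrative, not the project's actual type:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative slot record mirroring the fields listed above.
enum class SlotState { Free, Rendering, Completed, Scheduled };

struct PlayoutSlot {
    SlotState state = SlotState::Free;
    int64_t frameIndex = -1;          // render cadence frame number
    int64_t renderTargetUs = 0;       // render target timestamp
    int64_t renderDoneUs = 0;         // render completion timestamp
    uint32_t pixelFormat = 0;         // e.g. a BGRA8 fourcc
    std::size_t rowBytes = 0;
    std::size_t sizeBytes = 0;
    int64_t scheduledStreamTime = -1; // set when handed to DeckLink
    int completionResult = 0;         // recorded on completion callback
};

// Pool size follows the sizing rule in the text:
// scheduled target + completed target + rendering/safety spares.
constexpr std::size_t poolSize(std::size_t scheduledTarget,
                               std::size_t completedTarget,
                               std::size_t spares) {
    return scheduledTarget + completedTarget + spares;
}
```

With the example values from the text (scheduled 4, completed 3, rendering/spare 2) this yields the 9-slot total quoted above.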
Scheduling Model
The DeckLink scheduler should top up to a target device depth.
on scheduler wake:
while actualDeckLinkBufferedFrames < targetScheduledFrames:
frame = completedStore.popOldestCompleted()
if no frame:
record completed-frame underrun
break
schedule frame at next continuous DeckLink stream time
Important:
- Use DeckLink GetBufferedVideoFrameCount() where available.
- Keep synthetic scheduled/completed indexes as diagnostics only.
- Do not infer device buffer depth from mScheduledFrameIndex - mCompletedFrameIndex.
- Do not schedule black because the app completed queue is momentarily empty while the device still has frames buffered.
- Use black only before the first valid frame or in explicit emergency fallback.
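The top-up pass above can be sketched as a small pure function. The completed-frame store and the scheduling callback are stand-ins for illustration, not the DeckLink API:

```cpp
#include <deque>
#include <functional>

// Hypothetical top-up pass: schedule oldest completed frames until the
// real device buffer reaches the target depth, recording an underrun
// instead of rendering or black-filling when no completed frame exists.
struct TopUpResult {
    int scheduled = 0;
    bool underrun = false;
};

TopUpResult topUp(std::deque<int>& completedFrames,
                  int actualBufferedFrames,
                  int targetScheduledFrames,
                  const std::function<void(int)>& scheduleFrame) {
    TopUpResult r;
    while (actualBufferedFrames < targetScheduledFrames) {
        if (completedFrames.empty()) {
            r.underrun = true;  // completed-frame underrun: measure it, do not render
            break;
        }
        scheduleFrame(completedFrames.front()); // at next continuous stream time
        completedFrames.pop_front();
        ++actualBufferedFrames;
        ++r.scheduled;
    }
    return r;
}
```

The loop takes the device depth as an input rather than deriving it from internal counters, matching the rule above about trusting GetBufferedVideoFrameCount() over synthetic indexes.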
Thread Ownership
Render Cadence Thread
Owns:
- render tick timing
- acquiring a free system-memory slot
- requesting render-thread output render/readback
- publishing completed frames
Does not own:
- DeckLink schedule time
- completion callback processing
- fallback black scheduling
RenderEngine Render Thread
Owns:
- GL context
- input upload
- shader rendering
- output packing/readback
- preview present when allowed
Output render work should have priority over preview/screenshot work.
DeckLink Scheduler Thread
Owns:
- schedule top-up policy
- DeckLink
ScheduleVideoFrame - device buffered-frame telemetry
- consuming completed frames
Does not own:
- rendering a missing frame
- running live-state composition directly
Completion Callback / Worker
Owns:
- releasing scheduled system slots
- recording completion result
- waking scheduler and render cadence loops
Does not own:
- rendering
- scheduling fallback black during normal steady state
What Happens Under Stress
Render Is Temporarily Late
- Completed-frame queue drains.
- DeckLink scheduled buffer drains.
- Telemetry shows render lateness and completed queue depth drop.
- If render catches up before device buffer reaches zero, output remains smooth.
Render Cannot Sustain Cadence
- Completed-frame queue stays low.
- DeckLink buffer trends down.
- Late/drop telemetry increases.
- Policy may choose to skip render ticks, lower preview load, or enter degraded state.
DeckLink Timing Jitters
- Scheduler tops up based on actual device buffered count.
- Render cadence continues independently.
- System-memory buffer absorbs short mismatch.
UI Loses Focus
- Render cadence should continue.
- Preview present may be disabled or deprioritized.
- Output/render threads may need elevated priority.
- Device buffer telemetry should reveal whether Windows focus changes affect render cadence or only preview.
Migration Plan
Step 1: Add Real DeckLink Buffer Telemetry
Before more scheduling changes, measure the real device buffer.
Deliverables:
- call DeckLink GetBufferedVideoFrameCount() after schedule/completion where available
- expose actualDeckLinkBufferedFrames
- keep scheduledLeadFrames but label it synthetic/internal
- record schedule-call duration and failures
Exit criteria:
- runtime telemetry distinguishes app completed queue, system scheduled slots, synthetic lead, and actual DeckLink buffer depth
Step 2: Rename Existing Queues To Match Their Roles
Clarify vocabulary before rewriting behavior.
Deliverables:
- rename or document RenderOutputQueue as the completed/unscheduled frame queue
- distinguish completed-frame depth from device scheduled depth
- update telemetry labels where possible
Exit criteria:
- logs no longer imply readyQueue.depth == 0 means DeckLink starvation
Step 3: Introduce RenderCadenceController
Add a pure timing helper first.
Responsibilities:
- compute next render tick
- track frame duration
- report early/late/drift
- decide whether to render, wait, or skip render ticks
Tests:
- exact cadence advances
- late ticks are measured
- large lateness can skip according to policy
- no dependency on GL or DeckLink
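A sketch of the controller's tick decision, kept pure so it can be tested without GL or DeckLink. The threshold logic and every name here are placeholders; the real policy may differ:

```cpp
#include <cstdint>

// Illustrative pure decision function for one cadence tick.
enum class TickAction { Wait, Render, SkipAndRender };

TickAction decideTick(int64_t nowUs, int64_t nextRenderUs,
                      int64_t frameDurationUs, int maxSkippableTicks,
                      int* ticksToSkip) {
    *ticksToSkip = 0;
    if (nowUs < nextRenderUs)
        return TickAction::Wait;        // early: wait/yield until the tick
    const int64_t lateUs = nowUs - nextRenderUs;
    if (lateUs < frameDurationUs)
        return TickAction::Render;      // slightly late: render now, record lateness
    // Badly late: overrun policy may skip whole ticks, then render newest.
    const int64_t behind = lateUs / frameDurationUs;
    *ticksToSkip = static_cast<int>(
        behind < maxSkippableTicks ? behind : maxSkippableTicks);
    return TickAction::SkipAndRender;
}
```

Because the function takes time as a parameter, the three test cases listed above (exact cadence, measured lateness, policy-bounded skipping) reduce to plain assertions with no clocks or devices involved.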
Step 4: Move Output Production To Cadence Ticks
Replace queue-pressure-only production with cadence-driven production.
Initial behavior:
- render at selected output cadence
- produce into system-memory slots
- publish completed frames
- recycle/drop oldest unscheduled completed frames when cadence needs a slot
- only wait when every safe slot is scheduled/in flight
Exit criteria:
- output rendering continues without DeckLink completions
- output rendering does not schedule DeckLink directly
- completed-frame buffering behaves as latest-N, not consume-before-render
Step 4a: Add Warmup Before DeckLink Playback
DeckLink output should not start consuming before the render cadence has prepared an initial cushion.
Initial behavior:
- configure DeckLink output without starting scheduled playback
- start the render cadence producer
- render warmup frames at the selected cadence, not faster
- wait until completed-frame depth reaches targetWarmupFrames
- schedule those completed frames as DeckLink preroll
- call StartScheduledPlayback()
Exit criteria:
- startup does not require the render producer to catch up by rendering faster than cadence
- DeckLink begins playback with a real completed-frame buffer
- if warmup cannot fill within a bounded timeout, startup enters degraded state with telemetry
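The warmup gate can be expressed as a small pure step function, sketched here under the assumption of a bounded timeout; the names are invented for illustration:

```cpp
// Illustrative warmup gate: start playback only once the completed
// cushion exists, otherwise fall to a degraded start after a bounded
// timeout. The render cadence keeps its normal rate throughout.
enum class WarmupDecision { KeepWarming, StartPlayback, Degraded };

WarmupDecision warmupStep(int completedDepth, int targetWarmupFrames,
                          long elapsedMs, long timeoutMs) {
    if (completedDepth >= targetWarmupFrames)
        return WarmupDecision::StartPlayback;  // preroll, then StartScheduledPlayback()
    if (elapsedMs >= timeoutMs)
        return WarmupDecision::Degraded;       // record telemetry, degraded state
    return WarmupDecision::KeepWarming;        // keep rendering at cadence, not faster
}
```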
Step 5: Make DeckLink Scheduler A Separate Top-Up Loop
Create a scheduler loop that consumes completed frames.
Initial behavior:
- wake on completion, completed-frame publish, and periodic safety timer
- top up actual DeckLink buffer to target
- schedule only completed system-memory frames
- do not render or black-fill during normal steady state
Exit criteria:
- producer and DeckLink scheduler are separate loops
- one component owns schedule time
Step 6: Remove Synthetic Catch-Up From Steady State
Disable catch-up frame skipping for proactive mode.
Replacement:
- render cadence may skip render ticks if the renderer is late
- completed queue may drop oldest or newest according to explicit policy
- DeckLink schedule time remains continuous
Exit criteria:
- scheduled stream time advances one frame per scheduled frame unless emergency recovery is explicitly enabled
Step 7: Prioritize Output Render Work
Reduce render-thread interference.
Deliverables:
- output render commands outrank preview present
- preview skipped/deferred count is visible
- input upload timing is measured separately
- screenshot/readback cannot block output cadence unless explicitly requested
Exit criteria:
- focus changes and preview present do not drain playout buffer
Step 8: Tune Thread Priority And Wait Strategy
Only after ownership is separated, tune scheduling.
Deliverables:
- set render cadence and DeckLink scheduler threads to appropriate Windows priorities
- avoid busy spinning
- use waitable timers or high-resolution waits where useful
- record wake jitter
Exit criteria:
- cadence jitter is measurable and bounded
Telemetry
Add or clarify:
- renderCadence.targetFps
- renderCadence.frameIndex
- renderCadence.lateMs
- renderCadence.maxLateMs
- renderCadence.skippedTicks
- completedFrames.depth
- completedFrames.capacity
- completedFrames.underruns
- systemMemory.free
- systemMemory.rendering
- systemMemory.completed
- systemMemory.scheduled
- decklink.actualBufferedFrames
- decklink.targetBufferedFrames
- decklink.scheduleCallMs
- decklink.scheduleFailures
- decklink.completionIntervalMs
- decklink.lateFrames
- decklink.droppedFrames
- scheduler.syntheticLeadFrames
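One possible snapshot layout for these counters, sketched as a single plain struct so a whole telemetry frame can be captured atomically; the struct and its field names are illustrative, mirroring the labels above:

```cpp
#include <cstdint>

// Illustrative telemetry snapshot; fields mirror the labels listed above.
struct PlayoutTelemetry {
    double  renderCadenceTargetFps = 0.0;
    int64_t renderCadenceFrameIndex = 0;
    double  renderCadenceLateMs = 0.0;
    double  renderCadenceMaxLateMs = 0.0;
    int64_t renderCadenceSkippedTicks = 0;
    int     completedFramesDepth = 0;
    int     completedFramesCapacity = 0;
    int64_t completedFramesUnderruns = 0;
    int     systemMemoryFree = 0;
    int     systemMemoryRendering = 0;
    int     systemMemoryCompleted = 0;
    int     systemMemoryScheduled = 0;
    int     decklinkActualBufferedFrames = 0;
    int     decklinkTargetBufferedFrames = 0;
    double  decklinkScheduleCallMs = 0.0;
    int64_t decklinkScheduleFailures = 0;
    double  decklinkCompletionIntervalMs = 0.0;
    int64_t decklinkLateFrames = 0;
    int64_t decklinkDroppedFrames = 0;
    int     schedulerSyntheticLeadFrames = 0;
};
```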
Risks
- A cadence thread can render frames that DeckLink later drops if scheduling is wrong.
- Too much buffering adds latency.
- Too little buffering exposes Windows scheduling jitter.
- If output render and input upload still share one GL thread, render cadence can still be disturbed by uploads.
- Actual DeckLink buffer telemetry may differ from app-owned scheduled-slot counts.
Exit Criteria
Phase 7.7 is complete when:
- output rendering is driven by a render cadence controller
- DeckLink completion callbacks do not render
- DeckLink scheduling is owned by a scheduler/top-up loop
- system-memory completed frames are the only contract between render and DeckLink scheduling
- real DeckLink buffered-frame count is visible
- synthetic schedule lead no longer drives normal recovery
- black fallback is startup/emergency only
- playback can be tested with 4-frame and larger buffers without changing ownership logic