# Phase 7.7: Render Cadence And Playout Separation Design

## Status

In progress. Implemented so far:

- real DeckLink buffered-frame telemetry is exposed separately from synthetic scheduler lead
- pure `RenderCadenceController` exists with non-GL tests
- `SystemOutputFramePool` now exposes the Phase 7.7 state vocabulary: `Free`, `Rendering`, `Completed`, `Scheduled`
- the output producer now uses `RenderCadenceController` to render one output frame per cadence tick
- DeckLink scheduling remains a separate top-up pass capped by the configured preroll target

Phase 7.5 and 7.6 proved useful pieces individually:

- BGRA8 pack/readback can be fast enough on the current test machine.
- System-memory frame slots can be wrapped for DeckLink scheduling.
- A producer can keep frames ready and keep a small scheduled buffer filled.

But the experiments also showed that the current hybrid ownership model is fragile:

- completion-driven rendering caused app-ready starvation
- completion-time black fallback caused visible black flicker
- producer-side scheduling without a cadence target overfed the schedule timeline
- capping the scheduled count helped, but completion and producer scheduling fought each other
- making completion passive exposed startup and scheduling-trigger gaps
- late/drop catch-up skipping created a smooth/freeze/smooth cadence

The lesson is that the app needs a larger architectural split, not more local recovery branches.

## Goal

Make the output path behave like two cooperating real-time systems:

```text
Render cadence thread
    renders at the selected output cadence, for example 59.94 fps
    writes completed frames into system-memory slots

DeckLink playout scheduler
    keeps the device scheduled buffer topped up
    consumes completed system-memory frames
    never asks rendering to happen synchronously
```

The system-memory frame buffer becomes the contract between render timing and device timing.

Core principle:

- The render cadence should be stable and boring.
- If the selected output mode is 59.94 fps, the render producer should attempt to render at 59.94 fps.
- It should not speed up just because the DeckLink buffer is empty.
- It should not slow down because DeckLink is full or because completed frames have not drained.
- Completed-but-unscheduled frames are a latest-N cache. Old completed frames may be dropped/recycled to keep rendering at cadence.
- Scheduled frames are protected until DeckLink completes them.
- The only normal reason for the render cadence to deviate is that rendering/GPU work itself overruns the frame budget.

## Non-Goals

- Do not hide failure by repeating frames as the primary strategy.
- Do not make DeckLink completion callbacks render frames.
- Do not use synthetic schedule-index catch-up as normal recovery.
- Do not change shader semantics or live-state semantics.
- Do not require v210/YUV packing in the first implementation.
- Do not pursue DVP/pinned-memory fast transfer as the main path on unsupported hardware.

## Target Architecture

### Current Problem Shape

The current Phase 7.5/7.6 implementation still has too many timing authorities:

- DeckLink completion callbacks release frames and influence scheduling
- the producer renders based on queue pressure
- the producer also schedules some frames
- `VideoPlayoutScheduler` advances synthetic stream-time indexes
- fallback behavior can schedule black when the app-ready queue is briefly empty

That means the system can be full and still look wrong, because "full" is not tied to one clear cadence owner.

### Target Shape

```text
Startup / warmup
    render cadence starts first
    render thread produces warmup frames at the selected cadence
    completed system-memory queue reaches warmup target
    DeckLink preroll is scheduled from completed frames
    DeckLink playback starts with a filled buffer

Steady state
    RenderCadenceController
        owns output frame tick: frame 0, 1, 2...
        owns render target time
        asks RenderEngine to render frame N
        publishes completed frame N into PlayoutFrameStore

    PlayoutFrameStore
        owns free / rendering / completed / scheduled slots
        tracks frame number, render time, completion time, and schedule state
        exposes latest completed frames to DeckLink scheduler
        may drop/recycle oldest unscheduled completed frames when render cadence needs space

    DeckLinkPlayoutScheduler
        owns DeckLink schedule time
        tops up device buffered frames to target depth
        consumes completed frames only
        releases scheduled slots on completion callbacks

    DeckLink completion callback
        releases completed slots
        records result and device timing
        wakes scheduler
        does not render
```

## Cadence Model

The render side should be time-driven, not completion-driven. For a 59.94 fps mode:

```text
frameDuration = 1001 / 60000 seconds
nextRenderTime = now
loop:
    wait until nextRenderTime, or run immediately if behind
    render frameIndex for nextRenderTime
    read back into free system-memory slot
    publish completed slot
    frameIndex += 1
    nextRenderTime += frameDuration
```

Rules:

- If the render thread is early, it waits/yields.
- If it is slightly late, it renders the next frame immediately and records the lateness.
- If it is badly late because render/GPU work overran the frame budget, policy may skip render ticks before rendering the newest frame.
- Skipping render ticks is an overrun policy, not a buffer-fill strategy.
- DeckLink schedule time should remain continuous unless a deliberate device recovery policy says otherwise.

Non-rules:

- The render producer must not render faster than the selected cadence to refill DeckLink.
- DeckLink should start only after warmup/preroll has filled enough completed frames.
- If the DeckLink buffer drains in steady state, that is a real timing failure to measure, not a signal for the render thread to sprint.

## Buffer Model

Use a fixed system-memory slot pool. The completed portion of the pool is not a strict consume-before-render queue.
It is a latest-N rendered-frame cache:

- render cadence writes one frame per selected output tick
- if completed-but-unscheduled frames are full, the oldest completed frame is disposable
- DeckLink scheduling consumes from the completed cache when it needs frames
- frames already scheduled to DeckLink are never recycled until completion
- if all slots are scheduled/in flight, cadence may miss because there is genuinely no safe system-memory target

Suggested starting values:

- completed-frame target: 2-4 frames
- DeckLink scheduled target: 4 frames for experiments
- total system slots: scheduled target + completed target + rendering spare + safety spare

For example:

```text
scheduled target:  4
completed target:  3
rendering/spare:   2
total slots:       9
```

Slot states:

- `Free`
- `Rendering`
- `Completed`
- `Scheduled`

Each slot should carry:

- frame index
- render target timestamp
- render completion timestamp
- pixel format
- row bytes and size
- schedule timestamp/index when scheduled
- completion result when released

## Scheduling Model

The DeckLink scheduler should top up to a target device depth.

```text
on scheduler wake:
    while actualDeckLinkBufferedFrames < targetScheduledFrames:
        frame = completedStore.popOldestCompleted()
        if no frame:
            record completed-frame underrun
            break
        schedule frame at next continuous DeckLink stream time
```

Important:

- Use DeckLink `GetBufferedVideoFrameCount()` where available.
- Keep synthetic scheduled/completed indexes as diagnostics only.
- Do not infer device buffer depth from `mScheduledFrameIndex - mCompletedFrameIndex`.
- Do not schedule black because the app completed queue is momentarily empty while the device still has frames buffered.
- Use black only before the first valid frame or in explicit emergency fallback.
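The top-up pass above can be kept as a pure plan/execute split so the decision logic is testable without a device. The sketch below is illustrative, not the real scheduler: `CompletedFrame`, `TopUpResult`, and `planTopUp` are hypothetical names, and the real loop would feed the planned frames to DeckLink `ScheduleVideoFrame` at continuous stream times.

```cpp
#include <cstdint>
#include <deque>

// Hypothetical minimal view of a completed system-memory slot.
struct CompletedFrame {
    uint64_t frameIndex;
};

// Plan result: frames to schedule this wake, plus whether the completed
// store ran dry before the device target was reached.
struct TopUpResult {
    std::deque<CompletedFrame> toSchedule;
    bool underrun = false; // record completed-frame underrun; do not black-fill
};

// Decide which completed frames to schedule, given the actual device
// buffered count (from GetBufferedVideoFrameCount) and the target depth.
TopUpResult planTopUp(uint32_t actualBufferedFrames,
                      uint32_t targetScheduledFrames,
                      std::deque<CompletedFrame>& completedStore) {
    TopUpResult result;
    while (actualBufferedFrames + result.toSchedule.size() < targetScheduledFrames) {
        if (completedStore.empty()) {
            result.underrun = true; // telemetry, not a render request
            break;
        }
        // Oldest completed frame first, so device playout stays in frame order.
        result.toSchedule.push_back(completedStore.front());
        completedStore.pop_front();
    }
    return result;
}
```

Keeping the plan pure means the underrun case is a recorded measurement rather than a branch that renders or schedules black inline.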
## Thread Ownership

### Render Cadence Thread

Owns:

- render tick timing
- acquiring a free system-memory slot
- requesting render-thread output render/readback
- publishing completed frames

Does not own:

- DeckLink schedule time
- completion callback processing
- fallback black scheduling

### RenderEngine Render Thread

Owns:

- GL context
- input upload
- shader rendering
- output packing/readback
- preview present when allowed

Output render work should have priority over preview/screenshot work.

### DeckLink Scheduler Thread

Owns:

- schedule top-up policy
- DeckLink `ScheduleVideoFrame`
- device buffered-frame telemetry
- consuming completed frames

Does not own:

- rendering a missing frame
- running live-state composition directly

### Completion Callback / Worker

Owns:

- releasing scheduled system slots
- recording completion result
- waking scheduler and render cadence loops

Does not own:

- rendering
- scheduling fallback black during normal steady state

## What Happens Under Stress

### Render Is Temporarily Late

- Completed-frame queue drains.
- DeckLink scheduled buffer drains.
- Telemetry shows render lateness and a drop in completed-queue depth.
- If render catches up before the device buffer reaches zero, output remains smooth.

### Render Cannot Sustain Cadence

- Completed-frame queue stays low.
- DeckLink buffer trends down.
- Late/drop telemetry increases.
- Policy may choose to skip render ticks, lower preview load, or enter a degraded state.

### DeckLink Timing Jitters

- Scheduler tops up based on the actual device buffered count.
- Render cadence continues independently.
- The system-memory buffer absorbs short mismatches.

### UI Loses Focus

- Render cadence should continue.
- Preview present may be disabled or deprioritized.
- Output/render threads may need elevated priority.
- Device buffer telemetry should reveal whether Windows focus changes affect render cadence or only preview.
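One way to keep the render cadence thread's timing "stable and boring" is to derive each tick time from the frame index and the rational frame duration, so floating-point accumulation can never drift the cadence. This is a sketch under assumed names (`CadenceClock` and its fields are illustrative, not the real controller API); times are in microseconds.

```cpp
#include <cstdint>

// Illustrative fixed-step clock for the render cadence thread.
// For 59.94 fps the frame duration is the rational 1001/60000 s; deriving
// nextRenderTimeUs from frameIndex keeps ticks exact instead of summing a
// rounded per-frame duration.
struct CadenceClock {
    int64_t startUs = 0;      // cadence origin (e.g. warmup start)
    uint64_t frameIndex = 0;  // next frame to render
    int64_t num = 1001;       // frame duration numerator, seconds
    int64_t den = 60000;      // frame duration denominator

    // Target render time of the next frame. Multiplication stays within
    // int64 for any realistic session length.
    int64_t nextRenderTimeUs() const {
        return startUs + static_cast<int64_t>(frameIndex) * num * 1000000 / den;
    }

    // Lateness of "now" against the next tick; negative means early
    // (wait/yield), positive means late (render immediately, record it).
    int64_t latenessUs(int64_t nowUs) const {
        return nowUs - nextRenderTimeUs();
    }
};
```

With this shape, "slightly late" and "badly late" policies become comparisons on `latenessUs`, and the completion callbacks never need to touch the clock.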
## Migration Plan

### Step 1: Add Real DeckLink Buffer Telemetry

Before more scheduling changes, measure the real device buffer.

Deliverables:

- [x] call DeckLink `GetBufferedVideoFrameCount()` after schedule/completion where available
- [x] expose `actualDeckLinkBufferedFrames`
- [x] keep `scheduledLeadFrames` but label it synthetic/internal
- [x] record schedule-call duration and failures

Exit criteria:

- [x] runtime telemetry distinguishes app completed queue, system scheduled slots, synthetic lead, and actual DeckLink buffer depth

### Step 2: Rename Existing Queues To Match Their Roles

Clarify vocabulary before rewriting behavior.

Deliverables:

- rename or document `RenderOutputQueue` as the completed/unscheduled frame queue
- distinguish completed-frame depth from device scheduled depth
- update telemetry labels where possible

Exit criteria:

- logs no longer imply `readyQueue.depth == 0` means DeckLink starvation

### Step 3: Introduce `RenderCadenceController`

Add a pure timing helper first.

Responsibilities:

- [x] compute next render tick
- [x] track frame duration
- [x] report early/late/drift
- [x] decide whether to render, wait, or skip render ticks

Tests:

- [x] exact cadence advances
- [x] late ticks are measured
- [x] large lateness can skip according to policy
- [x] no dependency on GL or DeckLink

### Step 4: Move Output Production To Cadence Ticks

Replace queue-pressure-only production with cadence-driven production.
Initial behavior:

- [x] render at the selected output cadence
- [x] produce into system-memory slots
- [x] publish completed frames
- [x] recycle/drop oldest unscheduled completed frames when cadence needs a slot
- [ ] only wait when every safe slot is scheduled/in flight

Exit criteria:

- output rendering continues without DeckLink completions
- output rendering does not schedule DeckLink directly
- completed-frame buffering behaves as latest-N, not consume-before-render

### Step 4a: Add Warmup Before DeckLink Playback

DeckLink output should not start consuming before the render cadence has prepared an initial cushion.

Initial behavior:

- [x] configure DeckLink output without starting scheduled playback
- [x] start the render cadence producer
- [x] render warmup frames at the selected cadence, not faster
- [x] wait until scheduled preroll reaches `targetPrerollFrames`
- [x] schedule completed system-memory frames as DeckLink preroll
- [x] call `StartScheduledPlayback()`

Exit criteria:

- [x] startup does not require the render producer to catch up by rendering faster than cadence
- [x] DeckLink begins playback with a real rendered preroll buffer
- [x] if warmup cannot fill within a bounded timeout, startup enters a degraded state with telemetry

### Step 5: Make DeckLink Scheduler A Separate Top-Up Loop

Create a scheduler loop that consumes completed frames.

Initial behavior:

- wake on completion, completed-frame publish, and a periodic safety timer
- top up the actual DeckLink buffer to target
- schedule only completed system-memory frames
- do not render or black-fill during normal steady state

Exit criteria:

- producer and DeckLink scheduler are separate loops
- one component owns schedule time

### Step 6: Remove Synthetic Catch-Up From Steady State

Disable catch-up frame skipping for proactive mode.
Replacement:

- render cadence may skip render ticks if the renderer is late
- completed queue may drop oldest or newest according to explicit policy
- DeckLink schedule time remains continuous

Exit criteria:

- scheduled stream time advances one frame per scheduled frame unless emergency recovery is explicitly enabled

### Step 7: Prioritize Output Render Work

Reduce render-thread interference.

Deliverables:

- output render commands outrank preview present
- preview skipped/deferred count is visible
- input upload timing is measured separately
- screenshot/readback cannot block output cadence unless explicitly requested

Exit criteria:

- focus changes and preview present do not drain the playout buffer

### Step 8: Tune Thread Priority And Wait Strategy

Only after ownership is separated, tune scheduling.

Deliverables:

- set render cadence and DeckLink scheduler threads to appropriate Windows priorities
- avoid busy spinning
- use waitable timers or high-resolution waits where useful
- record wake jitter

Exit criteria:

- cadence jitter is measurable and bounded

## Telemetry

Add or clarify:

- `renderCadence.targetFps`
- `renderCadence.frameIndex`
- `renderCadence.lateMs`
- `renderCadence.maxLateMs`
- `renderCadence.skippedTicks`
- `completedFrames.depth`
- `completedFrames.capacity`
- `completedFrames.underruns`
- `systemMemory.free`
- `systemMemory.rendering`
- `systemMemory.completed`
- `systemMemory.scheduled`
- `decklink.actualBufferedFrames`
- `decklink.targetBufferedFrames`
- `decklink.scheduleCallMs`
- `decklink.scheduleFailures`
- `decklink.completionIntervalMs`
- `decklink.lateFrames`
- `decklink.droppedFrames`
- `scheduler.syntheticLeadFrames`

## Risks

- A cadence thread can render frames that DeckLink later drops if scheduling is wrong.
- Too much buffering adds latency.
- Too little buffering exposes Windows scheduling jitter.
- If output render and input upload still share one GL thread, render cadence can still be disturbed by uploads.
- Actual DeckLink buffer telemetry may differ from app-owned scheduled-slot counts.

## Exit Criteria

Phase 7.7 is complete when:

- output rendering is driven by a render cadence controller
- DeckLink completion callbacks do not render
- DeckLink scheduling is owned by a scheduler/top-up loop
- system-memory completed frames are the only contract between render and DeckLink scheduling
- real DeckLink buffered-frame count is visible
- synthetic schedule lead no longer drives normal recovery
- black fallback is startup/emergency only
- playback can be tested with 4-frame and larger buffers without changing ownership logic
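The render/wait/skip decision that `RenderCadenceController` owns can stay a pure function, which is what keeps the non-GL tests from Step 3 trivial. A minimal sketch under assumed names (`TickAction`, `decideTick`, and the `maxLateTicks` threshold are illustrative; the real policy would tune the threshold):

```cpp
#include <cstdint>

// Illustrative tick decision for the render cadence loop, not the real
// RenderCadenceController API. latenessUs is "now" minus the next tick time.
enum class TickAction {
    Wait,        // early: sleep/yield until the tick
    Render,      // on time or slightly late: render immediately, record lateness
    SkipToNewest // badly late: overrun policy skips ticks, renders the newest frame
};

TickAction decideTick(int64_t latenessUs, int64_t frameDurationUs,
                      int64_t maxLateTicks = 3) {
    if (latenessUs < 0)
        return TickAction::Wait;
    if (latenessUs < maxLateTicks * frameDurationUs)
        return TickAction::Render;
    // Skipping here is the overrun policy from the cadence model,
    // never a buffer-fill strategy.
    return TickAction::SkipToNewest;
}
```

Because the function takes plain integers, the Step 3 test cases (exact cadence, measured lateness, policy-gated skipping) need no GL or DeckLink fixtures.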