Phase 7.7: Render Cadence And Playout Separation Design

Status

In progress.

Implemented so far:

  • real DeckLink buffered-frame telemetry is exposed separately from synthetic scheduler lead
  • pure RenderCadenceController exists with non-GL tests
  • SystemOutputFramePool now exposes the Phase 7.7 state vocabulary: Free, Rendering, Completed, Scheduled
  • the output producer now uses RenderCadenceController to render one output frame per cadence tick
  • DeckLink scheduling remains a separate top-up pass capped by the configured preroll target

Phases 7.5 and 7.6 proved several useful pieces individually:

  • BGRA8 pack/readback can be fast enough on the current test machine.
  • System-memory frame slots can be wrapped for DeckLink scheduling.
  • A producer can keep frames ready and keep a small scheduled buffer filled.

But the experiments also showed that the current hybrid ownership model is fragile:

  • completion-driven rendering caused app-ready starvation
  • completion-time black fallback caused visible black flicker
  • producer-side scheduling without a cadence target overfed the schedule timeline
  • capping scheduled count helped, but completion and producer scheduling fought each other
  • making completion passive exposed startup and scheduling-trigger gaps
  • late/drop catch-up skipping produced a smooth/freeze/smooth cadence

The lesson is that the app needs a larger architectural split, not more local recovery branches.

Goal

Make the output path behave like two cooperating real-time systems:

Render cadence thread
  renders at the selected output cadence, for example 59.94 fps
  writes completed frames into system-memory slots

DeckLink playout scheduler
  keeps the device scheduled buffer topped up
  consumes completed system-memory frames
  never asks rendering to happen synchronously

The system-memory frame buffer becomes the contract between render timing and device timing.

Core principle:

  • The render cadence should be stable and boring.
  • If the selected output mode is 59.94 fps, the render producer should attempt to render at 59.94 fps.
  • It should not speed up just because the DeckLink buffer is empty.
  • It should not slow down because DeckLink is full or because completed frames have not drained.
  • Completed-but-unscheduled frames are a latest-N cache. Old completed frames may be dropped/recycled to keep rendering at cadence.
  • Scheduled frames are protected until DeckLink completes them.
  • The only normal reason for the render cadence to deviate is that rendering/GPU work itself overruns the frame budget.

Non-Goals

  • Do not hide failure by repeating frames as the primary strategy.
  • Do not make DeckLink completion callbacks render frames.
  • Do not use synthetic schedule-index catch-up as normal recovery.
  • Do not change shader semantics or live-state semantics.
  • Do not require v210/YUV packing in the first implementation.
  • Do not pursue DVP/pinned-memory fast transfer as the main path on unsupported hardware.

Target Architecture

Current Problem Shape

The current Phase 7.5/7.6 implementation still has too many timing authorities:

  • DeckLink completion callbacks release frames and influence scheduling
  • the producer renders based on queue pressure
  • the producer also schedules some frames
  • VideoPlayoutScheduler advances synthetic stream-time indexes
  • fallback behavior can schedule black when the app-ready queue is briefly empty

That means the system can be full and still look wrong, because "full" is not tied to one clear cadence owner.

Target Shape

Startup / warmup
  render cadence starts first
  render thread produces warmup frames at the selected cadence
  completed system-memory queue reaches warmup target
  DeckLink preroll is scheduled from completed frames
  DeckLink playback starts with a filled buffer

Steady state
RenderCadenceController
  owns output frame tick: frame 0, 1, 2...
  owns render target time
  asks RenderEngine to render frame N
  publishes completed frame N into PlayoutFrameStore

PlayoutFrameStore
  owns free / rendering / completed / scheduled slots
  tracks frame number, render time, completion time, and schedule state
  exposes latest completed frames to DeckLink scheduler
  may drop/recycle oldest unscheduled completed frames when render cadence needs space

DeckLinkPlayoutScheduler
  owns DeckLink schedule time
  tops up device buffered frames to target depth
  consumes completed frames only
  releases scheduled slots on completion callbacks

DeckLink completion callback
  releases completed slots
  records result and device timing
  wakes scheduler
  does not render
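
As a rough illustration of this split, the three components might expose interfaces along the following lines. This is a sketch only; the class and method names here are assumptions, not existing code.

#include <chrono>
#include <cstdint>

struct PlayoutSlot;  // system-memory frame slot, owned by the frame pool

// Owns the output frame tick and the render target time for each tick.
class RenderCadenceController {
public:
    std::chrono::steady_clock::time_point nextRenderTime() const;
    uint64_t currentFrameIndex() const;
    void advanceTick();  // called after frame N has been rendered and published
};

// Owns Free / Rendering / Completed / Scheduled slots and their metadata.
class PlayoutFrameStore {
public:
    PlayoutSlot* acquireFreeSlot();            // render cadence thread
    void publishCompleted(PlayoutSlot* slot);  // readback finished
    PlayoutSlot* popOldestCompleted();         // DeckLink playout scheduler
    void releaseScheduled(PlayoutSlot* slot);  // completion callback/worker
};

// Owns DeckLink schedule time and the top-up policy; never renders.
class DeckLinkPlayoutScheduler {
public:
    void onWake();  // top up the device buffered frames to the target depth
};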

Cadence Model

The render side should be time-driven, not completion-driven.

For a 59.94 fps mode:

frameDuration = 1001 / 60000 seconds
nextRenderTime = now

loop:
  wait until nextRenderTime, or run immediately if behind
  render frameIndex for nextRenderTime
  read back into free system-memory slot
  publish completed slot
  frameIndex += 1
  nextRenderTime += frameDuration

Rules:

  • If the render thread is early, it waits/yields.
  • If it is slightly late, it renders the next frame immediately and records lateness.
  • If it is badly late because render/GPU work overran the frame budget, policy may skip render ticks before rendering the newest frame.
  • Skipping render ticks is an overrun policy, not a buffer-fill strategy.
  • DeckLink schedule time should remain continuous unless a deliberate device recovery policy says otherwise.

Non-rules:

  • The render producer must not render faster than the selected cadence to refill DeckLink.
  • DeckLink should start only after warmup/preroll has filled enough completed frames.
  • If the DeckLink buffer drains in steady state, that is a real timing failure to measure, not a signal for the render thread to sprint.
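
A minimal C++ sketch of this loop, assuming a hypothetical renderAndPublish() helper for the render/readback/publish step and an assumed skip-threshold policy knob:

#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical helper: render frame N for its target time, read back into a
// free system-memory slot, and publish the completed slot.
void renderAndPublish(uint64_t frameIndex,
                      std::chrono::steady_clock::time_point targetTime);

void runRenderCadence(std::atomic<bool>& running)
{
    using clock = std::chrono::steady_clock;
    const auto frameDuration = std::chrono::nanoseconds(1001000000000LL / 60000);  // 1001/60000 s
    const auto skipThreshold = 2 * frameDuration;  // overrun policy knob (assumption)

    auto nextRenderTime = clock::now();
    uint64_t frameIndex = 0;

    while (running.load()) {
        std::this_thread::sleep_until(nextRenderTime);  // returns immediately if already late

        const auto lateness = clock::now() - nextRenderTime;
        if (lateness > skipThreshold) {
            // Badly late: skip whole render ticks instead of rendering a burst
            // to refill DeckLink; schedule time stays continuous elsewhere.
            const auto missed = lateness / frameDuration;
            frameIndex += static_cast<uint64_t>(missed);
            nextRenderTime += missed * frameDuration;
        }

        renderAndPublish(frameIndex, nextRenderTime);

        ++frameIndex;
        nextRenderTime += frameDuration;
    }
}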

Buffer Model

Use a fixed system-memory slot pool.

The completed portion of the pool is not a strict consume-before-render queue. It is a latest-N rendered-frame cache:

  • render cadence writes one frame per selected output tick
  • if completed-but-unscheduled frames are full, the oldest completed frame is disposable
  • DeckLink scheduling consumes from the completed cache when it needs frames
  • frames already scheduled to DeckLink are never recycled until completion
  • if all slots are scheduled/in flight, cadence may miss because there is genuinely no safe system-memory target

Suggested starting values:

  • completed-frame target: 2-4 frames
  • DeckLink scheduled target: 4 frames for experiments
  • total system slots: scheduled target + completed target + rendering spare + safety spare

For example:

scheduled target: 4
completed target: 3
rendering/spare: 2
total slots: 9

Slot states:

  • Free
  • Rendering
  • Completed
  • Scheduled

Each slot should carry:

  • frame index
  • render target timestamp
  • render completion timestamp
  • pixel format
  • row bytes and size
  • schedule timestamp/index when scheduled
  • completion result when released
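
A sketch of what each slot record could carry. Field names are illustrative assumptions, not the existing SystemOutputFramePool layout:

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

class IDeckLinkMutableVideoFrame;  // DeckLink wrapper around the slot memory

enum class SlotState { Free, Rendering, Completed, Scheduled };

struct PlayoutSlot {
    SlotState state = SlotState::Free;

    uint64_t frameIndex = 0;
    std::chrono::steady_clock::time_point renderTargetTime{};
    std::chrono::steady_clock::time_point renderCompleteTime{};

    uint32_t pixelFormat = 0;      // e.g. a BMDPixelFormat value for BGRA8
    size_t   rowBytes    = 0;
    std::vector<uint8_t> pixels;   // rowBytes * height

    IDeckLinkMutableVideoFrame* deckLinkFrame = nullptr;  // used when scheduling
    int64_t scheduledStreamTime = -1;  // set when handed to DeckLink
    int32_t completionResult    = 0;   // recorded when the device releases it
};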

Scheduling Model

The DeckLink scheduler should top up to a target device depth.

on scheduler wake:
  while actualDeckLinkBufferedFrames < targetScheduledFrames:
    frame = completedStore.popOldestCompleted()
    if no frame:
      record completed-frame underrun
      break
    schedule frame at next continuous DeckLink stream time

Important:

  • Use DeckLink GetBufferedVideoFrameCount() where available.
  • Keep synthetic scheduled/completed indexes as diagnostics only.
  • Do not infer device buffer depth from mScheduledFrameIndex - mCompletedFrameIndex.
  • Do not schedule black because the app completed queue is momentarily empty while the device still has frames buffered.
  • Use black only before the first valid frame or in explicit emergency fallback.
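
A hedged C++ sketch of that top-up pass. GetBufferedVideoFrameCount and ScheduleVideoFrame are real IDeckLinkOutput calls; the store and slot types are the illustrative ones sketched earlier:

// Assumes the DeckLink SDK header and the PlayoutFrameStore / PlayoutSlot
// sketches above are available.
void topUpScheduledFrames(IDeckLinkOutput* output, PlayoutFrameStore& store,
                          uint32_t targetScheduledFrames,
                          BMDTimeValue frameDuration, BMDTimeScale timeScale,
                          BMDTimeValue& nextStreamTime)
{
    uint32_t buffered = 0;
    if (output->GetBufferedVideoFrameCount(&buffered) != S_OK)
        return;  // real implementation: record the telemetry failure

    while (buffered < targetScheduledFrames) {
        PlayoutSlot* slot = store.popOldestCompleted();
        if (!slot) {
            // Completed-frame underrun: record it and wait for the next wake;
            // do not render and do not schedule black here.
            break;
        }
        if (output->ScheduleVideoFrame(slot->deckLinkFrame, nextStreamTime,
                                       frameDuration, timeScale) != S_OK) {
            // Record the schedule failure; slot recovery policy goes here.
            break;
        }
        slot->state = SlotState::Scheduled;
        slot->scheduledStreamTime = nextStreamTime;
        nextStreamTime += frameDuration;
        ++buffered;
    }
}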

Thread Ownership

Render Cadence Thread

Owns:

  • render tick timing
  • acquiring a free system-memory slot
  • requesting render-thread output render/readback
  • publishing completed frames

Does not own:

  • DeckLink schedule time
  • completion callback processing
  • fallback black scheduling

RenderEngine Render Thread

Owns:

  • GL context
  • input upload
  • shader rendering
  • output packing/readback
  • preview present when allowed

Output render work should have priority over preview/screenshot work.

DeckLink Playout Scheduler Thread

Owns:

  • schedule top-up policy
  • DeckLink ScheduleVideoFrame
  • device buffered-frame telemetry
  • consuming completed frames

Does not own:

  • rendering a missing frame
  • running live-state composition directly

Completion Callback / Worker

Owns:

  • releasing scheduled system slots
  • recording completion result
  • waking scheduler and render cadence loops

Does not own:

  • rendering
  • scheduling fallback black during normal steady state

What Happens Under Stress

Render Is Temporarily Late

  • Completed-frame queue drains.
  • DeckLink scheduled buffer drains.
  • Telemetry shows render lateness and completed queue depth drop.
  • If render catches up before device buffer reaches zero, output remains smooth.

Render Cannot Sustain Cadence

  • Completed-frame queue stays low.
  • DeckLink buffer trends down.
  • Late/drop telemetry increases.
  • Policy may choose to skip render ticks, lower preview load, or enter degraded state.
  • Scheduler tops up based on actual device buffered count.
  • Render cadence continues independently.
  • System-memory buffer absorbs short mismatch.

UI Loses Focus

  • Render cadence should continue.
  • Preview present may be disabled or deprioritized.
  • Output/render threads may need elevated priority.
  • Device buffer telemetry should reveal whether Windows focus changes affect render cadence or only preview.

Migration Plan

Step 1: Measure The Real DeckLink Buffer Depth

Before more scheduling changes, measure the real device buffer.

Deliverables:

  • call DeckLink GetBufferedVideoFrameCount() after schedule/completion where available
  • expose actualDeckLinkBufferedFrames
  • keep scheduledLeadFrames but label it synthetic/internal
  • record schedule-call duration and failures

Exit criteria:

  • runtime telemetry distinguishes app completed queue, system scheduled slots, synthetic lead, and actual DeckLink buffer depth
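
A small sketch of capturing both values side by side. mDeckLinkOutput and the telemetry field names are illustrative; mScheduledFrameIndex and mCompletedFrameIndex are the existing synthetic counters:

uint32_t actualBuffered = 0;
const bool haveActual =
    (mDeckLinkOutput->GetBufferedVideoFrameCount(&actualBuffered) == S_OK);

// Report both values; -1 marks "device did not answer".
telemetry.actualDeckLinkBufferedFrames = haveActual ? static_cast<int>(actualBuffered) : -1;
telemetry.syntheticLeadFrames =
    static_cast<int>(mScheduledFrameIndex - mCompletedFrameIndex);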

Step 2: Rename Existing Queues To Match Their Roles

Clarify vocabulary before rewriting behavior.

Deliverables:

  • rename or document RenderOutputQueue as completed/unscheduled frame queue
  • distinguish completed-frame depth from device scheduled depth
  • update telemetry labels where possible

Exit criteria:

  • logs no longer imply readyQueue.depth == 0 means DeckLink starvation

Step 3: Introduce RenderCadenceController

Add a pure timing helper first.

Responsibilities:

  • compute next render tick
  • track frame duration
  • report early/late/drift
  • decide whether to render, wait, or skip render ticks

Tests:

  • exact cadence advances
  • late ticks are measured
  • large lateness can skip according to policy
  • no dependency on GL or DeckLink
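
As an illustration of why this helper is testable without GL, a pure lateness classification of this kind can be exercised with plain asserts. This is a sketch, not the existing RenderCadenceController API:

#include <cassert>
#include <chrono>
#include <cstdint>

using ns = std::chrono::nanoseconds;

struct TickDecision { bool skip; uint64_t ticksToSkip; ns lateness; };

// Pure lateness/skip classification: testable without GL or DeckLink
// because time is passed in rather than read from a clock.
TickDecision classifyTick(ns lateness, ns frameDuration, ns skipThreshold)
{
    if (lateness <= skipThreshold)
        return {false, 0, lateness};
    return {true, static_cast<uint64_t>(lateness / frameDuration), lateness};
}

int main()
{
    const ns frame(16683333);        // 1001/60000 seconds
    const ns skipAfter = 2 * frame;  // example policy threshold

    assert(!classifyTick(ns(0), frame, skipAfter).skip);        // exactly on time
    assert(!classifyTick(ns(3000000), frame, skipAfter).skip);  // slightly late: render now
    assert(classifyTick(10 * frame, frame, skipAfter).skip);    // badly late: skip per policy
    assert(classifyTick(10 * frame, frame, skipAfter).ticksToSkip == 10);
    return 0;
}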

Step 4: Move Output Production To Cadence Ticks

Replace queue-pressure-only production with cadence-driven production.

Initial behavior:

  • render at selected output cadence
  • produce into system-memory slots
  • publish completed frames
  • recycle/drop oldest unscheduled completed frames when cadence needs a slot
  • only wait when every safe slot is scheduled/in flight

Exit criteria:

  • output rendering continues without DeckLink completions
  • output rendering does not schedule DeckLink directly
  • completed-frame buffering behaves as latest-N, not consume-before-render

Step 5: Warmup Preroll And A Separate DeckLink Scheduler Loop

DeckLink output should not start consuming before the render cadence has prepared an initial cushion.

Initial behavior:

  • configure DeckLink output without starting scheduled playback
  • start the render cadence producer
  • render warmup frames at the selected cadence, not faster
  • wait until completed-frame depth reaches targetWarmupFrames
  • schedule those completed frames as DeckLink preroll
  • call StartScheduledPlayback()

Exit criteria:

  • startup does not require the render producer to catch up by rendering faster than cadence
  • DeckLink begins playback with a real completed-frame buffer
  • if warmup cannot fill within a bounded timeout, startup enters degraded state with telemetry
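
A sketch of that startup order. EnableVideoOutput and StartScheduledPlayback are real IDeckLinkOutput calls; the producer, store, and scheduler types and their methods are illustrative assumptions:

// Assumes the DeckLink SDK header and the component sketches above.
bool startPlayout(IDeckLinkOutput* output, BMDDisplayMode mode,
                  RenderCadenceProducer& producer, PlayoutFrameStore& store,
                  DeckLinkPlayoutScheduler& scheduler,
                  uint32_t targetWarmupFrames, std::chrono::milliseconds warmupTimeout)
{
    // 1. Configure the device, but do not start scheduled playback yet.
    if (output->EnableVideoOutput(mode, bmdVideoOutputFlagDefault) != S_OK)
        return false;

    // 2. Start rendering at the selected cadence (never faster).
    producer.start();

    // 3. Bounded wait for the completed-frame cushion.
    if (!store.waitForCompletedDepth(targetWarmupFrames, warmupTimeout))
        return false;  // degraded startup: record telemetry, policy decides next step

    // 4. Preroll the completed frames, then start scheduled playback.
    scheduler.prerollFromCompleted();
    return output->StartScheduledPlayback(0, scheduler.timeScale(), 1.0) == S_OK;
}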

Create a scheduler loop that consumes completed frames.

Initial behavior:

  • wake on completion, completed-frame publish, and periodic safety timer
  • top up actual DeckLink buffer to target
  • schedule only completed system-memory frames
  • do not render or black-fill during normal steady state

Exit criteria:

  • producer and DeckLink scheduler are separate loops
  • one component owns schedule time

Step 6: Remove Synthetic Catch-Up From Steady State

Disable catch-up frame skipping for proactive mode.

Replacement:

  • render cadence may skip render ticks if the renderer is late
  • completed queue may drop oldest or newest according to explicit policy
  • DeckLink schedule time remains continuous

Exit criteria:

  • scheduled stream time advances one frame per scheduled frame unless emergency recovery is explicitly enabled

Step 7: Prioritize Output Render Work

Reduce render-thread interference.

Deliverables:

  • output render commands outrank preview present
  • preview skipped/deferred count is visible
  • input upload timing is measured separately
  • screenshot/readback cannot block output cadence unless explicitly requested

Exit criteria:

  • focus changes and preview present do not drain playout buffer

Step 8: Tune Thread Priority And Wait Strategy

Only after ownership is separated, tune scheduling.

Deliverables:

  • set render cadence and DeckLink scheduler threads to appropriate Windows priorities
  • avoid busy spinning
  • use waitable timers or high-resolution waits where useful
  • record wake jitter

Exit criteria:

  • cadence jitter is measurable and bounded
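
One candidate wait strategy on Windows is a high-resolution waitable timer. The sketch below assumes Windows 10 1803 or later for the high-resolution flag and falls back to a normal timer otherwise:

#include <windows.h>

// Create the cadence thread's timer, preferring the high-resolution flavor.
HANDLE makeCadenceTimer()
{
    HANDLE timer = CreateWaitableTimerExW(nullptr, nullptr,
                                          CREATE_WAITABLE_TIMER_HIGH_RESOLUTION,
                                          TIMER_ALL_ACCESS);
    if (!timer)  // older Windows: flag unsupported, fall back to a normal timer
        timer = CreateWaitableTimerExW(nullptr, nullptr, 0, TIMER_ALL_ACCESS);
    return timer;
}

// Relative wait in 100 ns units; a negative due time means "relative" to SetWaitableTimer.
bool waitFor100ns(HANDLE timer, long long relative100ns)
{
    LARGE_INTEGER due;
    due.QuadPart = -relative100ns;
    if (!SetWaitableTimer(timer, &due, 0, nullptr, nullptr, FALSE))
        return false;
    return WaitForSingleObject(timer, INFINITE) == WAIT_OBJECT_0;
}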

Telemetry

Add or clarify:

  • renderCadence.targetFps
  • renderCadence.frameIndex
  • renderCadence.lateMs
  • renderCadence.maxLateMs
  • renderCadence.skippedTicks
  • completedFrames.depth
  • completedFrames.capacity
  • completedFrames.underruns
  • systemMemory.free
  • systemMemory.rendering
  • systemMemory.completed
  • systemMemory.scheduled
  • decklink.actualBufferedFrames
  • decklink.targetBufferedFrames
  • decklink.scheduleCallMs
  • decklink.scheduleFailures
  • decklink.completionIntervalMs
  • decklink.lateFrames
  • decklink.droppedFrames
  • scheduler.syntheticLeadFrames
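
One way to keep these fields together is a single snapshot struct; the names below are illustrative and mirror the list above:

#include <cstdint>

struct PlayoutTelemetrySnapshot {
    // renderCadence.*
    double   renderTargetFps    = 0.0;
    uint64_t renderFrameIndex   = 0;
    double   renderLateMs       = 0.0;
    double   renderMaxLateMs    = 0.0;
    uint64_t renderSkippedTicks = 0;
    // completedFrames.* / systemMemory.*
    int      completedDepth = 0, completedCapacity = 0;
    uint64_t completedUnderruns = 0;
    int      slotsFree = 0, slotsRendering = 0, slotsCompleted = 0, slotsScheduled = 0;
    // decklink.*
    int      decklinkActualBuffered = -1, decklinkTargetBuffered = 0;
    double   decklinkScheduleCallMs = 0.0;
    uint64_t decklinkScheduleFailures = 0;
    double   decklinkCompletionIntervalMs = 0.0;
    uint64_t decklinkLateFrames = 0, decklinkDroppedFrames = 0;
    // scheduler.*
    int      syntheticLeadFrames = 0;
};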

Risks

  • A cadence thread can render frames that DeckLink later drops if scheduling is wrong.
  • Too much buffering adds latency.
  • Too little buffering exposes Windows scheduling jitter.
  • If output render and input upload still share one GL thread, render cadence can still be disturbed by uploads.
  • Actual DeckLink buffer telemetry may differ from app-owned scheduled-slot counts.

Exit Criteria

Phase 7.7 is complete when:

  • output rendering is driven by a render cadence controller
  • DeckLink completion callbacks do not render
  • DeckLink scheduling is owned by a scheduler/top-up loop
  • system-memory completed frames are the only contract between render and DeckLink scheduling
  • real DeckLink buffered-frame count is visible
  • synthetic schedule lead no longer drives normal recovery
  • black fallback is startup/emergency only
  • playback can be tested with 4-frame and larger buffers without changing ownership logic