From 3ffb562ff7502a0f364b6f6fb79b258a6d7aa029 Mon Sep 17 00:00:00 2001
From: Aiden <68633820+awils27@users.noreply.github.com>
Date: Wed, 13 May 2026 01:06:20 +1000
Subject: [PATCH] docs update

---
 apps/RenderCadenceCompositor/README.md  | 22 ++++++++++++-----
 docs/CURRENT_SYSTEM_ARCHITECTURE.md     | 13 ++++++----
 docs/DECKLINK_OPENGL_LESSONS_LEARNED.md | 33 +++++++++++++++++++------
 docs/NEW_RENDER_CADENCE_APP_PLAN.md     | 10 +++++---
 docs/RENDER_CADENCE_GOLDEN_RULES.md     | 10 ++++++--
 docs/RENDER_THREAD_OWNERSHIP_PLAN.md    |  2 +-
 6 files changed, 65 insertions(+), 25 deletions(-)

diff --git a/apps/RenderCadenceCompositor/README.md b/apps/RenderCadenceCompositor/README.md
index ba251d4..ad04edc 100644
--- a/apps/RenderCadenceCompositor/README.md
+++ b/apps/RenderCadenceCompositor/README.md
@@ -26,7 +26,7 @@ InputFrameMailbox
 
 SystemFrameExchange
   owns Free / Rendering / Completed / Scheduled slots
-  preserves completed output frames once they are waiting for playout
+  preserves completed output frames as a bounded FIFO reserve once they are waiting for playout
   protects scheduled frames until DeckLink completion
 
 DeckLinkOutputThread
@@ -52,8 +52,9 @@ Included now:
 - bounded three-frame input warmup before render cadence starts
 - render-thread-owned input texture upload
 - async PBO readback
-- latest-N system-memory frame exchange
+- bounded FIFO system-memory frame exchange
 - bounded completed-frame output preroll reserve before DeckLink playback, with DeckLink scheduled depth still targeted at four
+- conservative DeckLink schedule-lead telemetry and recovery
 - background Slang compile of `shaders/happy-accident`
 - app-owned display/render layer model for shader build readiness
 - app-owned submission of a completed shader artifact
@@ -266,7 +267,7 @@ The app samples telemetry once per second.
 Normal cadence samples are available through `GET /api/state` and are not printed to the console.
 
 The telemetry monitor only logs health events:
-- warning when DeckLink late/dropped-frame counters increase
+- warning when DeckLink late/dropped-frame counters increase, including schedule lead and recovery count
 - warning when schedule failures increase
 - error when the app/DeckLink output buffer is starved
 
@@ -285,8 +286,14 @@ Input telemetry:
 - `renderFrameMaxMs`: maximum observed render-thread draw duration for this process
 - `readbackQueueMs`: time spent queueing the most recent async BGRA8 PBO readback
 - `completedReadbackCopyMs`: time spent mapping/copying the most recent completed readback into system-memory frame storage
-- `completedDrops`: completed unscheduled system-memory frames dropped; expected to stay flat in the cadence compositor output path
+- `completedDrops`: oldest completed unscheduled system-memory frames dropped because the bounded completed reserve overflowed; this is an app-side reserve drop, not a DeckLink dropped-frame report
 - `acquireMisses`: times render/readback could not acquire a writable system-memory frame slot; completed frames waiting for playout are preserved instead of being displaced
+- `deckLinkScheduleLeadAvailable`: whether DeckLink schedule-lead telemetry was available for the latest sample
+- `deckLinkScheduleLeadFrames`: estimated distance between the DeckLink playback cursor and the next scheduled stream time
+- `deckLinkPlaybackFrameIndex`: latest sampled DeckLink playback frame index
+- `deckLinkNextScheduleFrameIndex`: next stream frame index the app intends to schedule
+- `deckLinkPlaybackStreamTime`: latest sampled DeckLink playback stream time in DeckLink time units
+- `deckLinkScheduleRealignments`: conservative schedule-cursor recoveries triggered by late/drop pressure or dangerously low lead
 - `inputConsumeMisses`: render ticks where no ready input frame was available to upload
 - `inputUploadMisses`: input texture upload attempts that reused the previous GL input texture
 - `inputReadyFrames`: ready input frames currently queued in `InputFrameMailbox`
@@ -303,7 +310,7 @@ Input telemetry:
 - `inputUnsupportedFrames`: input frames rejected before mailbox submission
 - `inputSubmitMisses`: input frames that could not be submitted to the mailbox
 
-Runtime shaders continue rendering when input is missing. If no mailbox frame has been uploaded yet, shader samplers use the runtime fallback source texture; once DeckLink input is flowing, shaders such as CRT and trigger-ripple sample the real/latest input through `gVideoInput`.
+Runtime shaders continue rendering when input is missing. If no mailbox frame has been uploaded yet, shader samplers use the runtime fallback source texture; once DeckLink input is flowing, shaders such as CRT and trigger-ripple sample the current input through `gVideoInput`.
 
 Healthy first-run signs:
 
@@ -312,6 +319,8 @@ Healthy first-run signs:
 - `scheduleFps` is close to the selected cadence after warmup
 - `scheduled` stays near 4
 - `decklinkBuffered` stays near 4 when available
+- `deckLinkScheduleLeadFrames` remains positive and stable when available
+- `deckLinkScheduleRealignments` does not increase continuously
 - `late` and `dropped` do not increase continuously
 - `scheduleFailures` does not increase
 - `shaderCommitted` becomes `1` after the background Happy Accident compile completes
@@ -386,6 +395,7 @@ Read:
 - render cadence and DeckLink schedule cadence both held roughly 60 fps
 - app scheduled depth stayed at 4
 - actual DeckLink buffered depth stayed at 4
+- DeckLink schedule lead remained positive during healthy playback
 - no late frames, dropped frames, or schedule failures were observed
 - completed poll misses were benign because playout remained fully fed
 
@@ -406,7 +416,7 @@ This app keeps the same core behavior but splits it into modules that can grow:
 - `platform/`: COM/Win32/hidden GL context support
 - `render/`: cadence thread, clock, and simple renderer
 - `frames/InputFrameMailbox`: non-blocking bounded FIFO CPU input handoff with contiguous-copy fast path for matching row strides
-- `render/InputFrameTexture`: render-thread-owned upload of the latest CPU input frame into GL, including raw UYVY8 decode into the shader-visible input texture
+- `render/InputFrameTexture`: render-thread-owned upload of the currently acquired CPU input frame into GL, including raw UYVY8 decode into the shader-visible input texture
 - `render/readback/`: PBO-backed BGRA8 readback and completed-frame publication
 - `render/runtime/RuntimeRenderScene`: render-thread-owned GL scene for ready runtime shader layers
 - `render/runtime/RuntimeShaderPrepareWorker`: shared-context runtime shader program compile/link worker
diff --git a/docs/CURRENT_SYSTEM_ARCHITECTURE.md b/docs/CURRENT_SYSTEM_ARCHITECTURE.md
index cdc460b..2e1af8f 100644
--- a/docs/CURRENT_SYSTEM_ARCHITECTURE.md
+++ b/docs/CURRENT_SYSTEM_ARCHITECTURE.md
@@ -10,7 +10,10 @@ The active plan for tightening render-thread ownership is:
 
 The plan for building a fresh modular app around the proven probe architecture is:
 
-- [New Render Cadence App Plan](NEW_RENDER_CADENCE_APP_PLAN.md)
+- [RenderCadenceCompositor README](../apps/RenderCadenceCompositor/README.md)
+- [Render Cadence Golden Rules](RENDER_CADENCE_GOLDEN_RULES.md)
+
+`NEW_RENDER_CADENCE_APP_PLAN.md` remains as historical planning context, but the README and golden rules are the current contract for the new cadence-first app.
 
 ## Application Shape
 
@@ -287,7 +290,7 @@ Slots have four states:
 - `Completed`
 - `Scheduled`
 
-Completed-but-unscheduled frames are treated as a latest-N cache. If render cadence needs space and old completed frames have not been scheduled, the oldest unscheduled completed frame can be recycled.
+In the current legacy app, completed-but-unscheduled frames are treated as a latest-N cache. The newer `RenderCadenceCompositor` uses a bounded FIFO completed reserve instead; see its README for the cadence-first contract.
 
 Scheduled frames are protected until DeckLink reports completion.
 
@@ -295,7 +298,7 @@ Scheduled frames are protected until DeckLink reports completion.
 
 `RenderOutputQueue` holds completed unscheduled output frames waiting to be scheduled.
 
-It is bounded and latest-N:
+In the legacy app it is bounded and latest-N:
 
 - pushing beyond capacity releases/drops the oldest ready frame
 - `DropOldestFrame()` is used when the frame pool needs to recycle old completed work
@@ -363,7 +366,7 @@ The probe does not use the main runtime, shader system, preview path, input uplo
 - one OpenGL render thread with its own hidden GL context
 - simple BGRA8 motion rendering
 - async PBO readback
-- latest-N system-memory frame slots
+- legacy latest-N system-memory frame slots; bounded FIFO completed reserve in `RenderCadenceCompositor`
 - a playout thread that feeds DeckLink
 - real rendered warmup before scheduled playback
 
@@ -531,7 +534,7 @@ When `VST_DISABLE_INPUT_CAPTURE=1`, this flow is skipped.
 - Keep one owner for each kind of state.
 - Keep GL work on the render thread.
 - Keep DeckLink completion callbacks passive.
-- Treat completed unscheduled output frames as latest-N cache entries.
+- In the legacy app, treat completed unscheduled output frames as latest-N cache entries; in `RenderCadenceCompositor`, preserve completed frames as a bounded FIFO reserve.
 - Protect scheduled output frames until DeckLink completion.
 - Keep output timing more important than preview/screenshot.
 - Measure timing by domain instead of adding fallback branches blindly.
diff --git a/docs/DECKLINK_OPENGL_LESSONS_LEARNED.md b/docs/DECKLINK_OPENGL_LESSONS_LEARNED.md
index f855d38..7135e1c 100644
--- a/docs/DECKLINK_OPENGL_LESSONS_LEARNED.md
+++ b/docs/DECKLINK_OPENGL_LESSONS_LEARNED.md
@@ -115,6 +115,24 @@ Lesson:
 - keep synthetic counters only as diagnostics
 - do not infer device health from internal stream indexes alone
 
+### Schedule Cursor Recovery Must Be Conservative
+
+The DeckLink schedule cursor should normally advance as a continuous stream timeline. Continuously realigning the next scheduled stream time to the sampled playback cursor can create its own timing fault: output may look like low FPS even when render and scheduling counters average 59.94/60 fps.
+
+What worked better:
+
+- use the exact DeckLink frame duration for the render cadence
+- keep healthy scheduling on a continuous stream cursor
+- measure schedule lead from DeckLink playback time versus the next schedule time
+- realign only after real pressure, such as a late/drop report or dangerously low measured lead
+- re-arm proactive realignment only after lead has recovered
+
+Lesson:
+
+- schedule recovery is an output-edge safety valve, not a per-frame timing policy
+- if recovery increments continuously, the recovery path has become the problem
+- include schedule lead and realignment count in telemetry/logs so drift is visible before guessing
+
 ### More Buffer Is Not Automatically Smoother
 
 Increasing DeckLink scheduled frames sometimes made the reported device buffer look healthier while visible motion still stuttered.
@@ -196,7 +214,7 @@ Lesson:
 
 - system-memory slots are the contract between render and playout
 - scheduled slots must not be recycled early
-- completed-but-unscheduled slots can be latest-N cache entries
+- completed-but-unscheduled slots should form a bounded FIFO reserve for playout
 
 ### Startup Needs Real Preroll
 
@@ -222,18 +240,18 @@ Lesson:
 
 The app has at least two important frame stores:
 
-- system-memory completed/latest-N frames
+- system-memory completed FIFO reserve frames
 - DeckLink scheduled/device buffer
 
 They have different ownership rules.
 
-Completed-but-unscheduled frames are disposable if a newer frame is available and cadence needs the slot.
+Completed-but-unscheduled frames should be a bounded FIFO reserve for playout. If that reserve overflows, dropping the oldest completed frame is an app-side reserve policy and should be counted separately from DeckLink dropped frames.
 
 Scheduled frames are not disposable because DeckLink may still read them.
 
 Lesson:
 
-- latest-N completed frames are a cache
+- completed frames waiting for playout are a bounded FIFO reserve
 - scheduled frames are owned by DeckLink until completion
 - keep metrics for both
 
@@ -246,7 +264,8 @@ That couples the clocks again.
 Lesson:
 
 - render cadence should keep rendering at selected cadence
-- if completed cache is full, recycle/drop the oldest unscheduled completed frame
+- render acquire should not evict completed frames that are waiting for playout
+- if the completed reserve overflows, drop/count the oldest unscheduled completed frame
 - only scheduled/in-flight saturation should prevent rendering to a safe slot
 
 ## Render Thread Lessons
@@ -340,7 +359,7 @@ The current direction is still sound:
 ```text
 Render cadence loop
   renders at selected output cadence
-  writes latest-N completed system-memory frames
+  writes completed system-memory frames into a bounded FIFO reserve
   never sprints to refill DeckLink
 
 Frame store
@@ -387,7 +406,7 @@ A full rewrite becomes attractive only if the current GL ownership model cannot
 - Render cadence is time-driven, not completion-driven.
 - DeckLink scheduling is device-buffer-driven, not render-driven.
 - Completion callbacks release and report; they do not render.
-- System-memory completed frames are latest-N cache entries.
+- System-memory completed frames are a bounded FIFO reserve.
 - Scheduled frames are protected until DeckLink completion.
 - Startup uses real rendered warmup/preroll.
 - Black fallback is degraded/error behavior, not steady-state behavior.
diff --git a/docs/NEW_RENDER_CADENCE_APP_PLAN.md b/docs/NEW_RENDER_CADENCE_APP_PLAN.md
index 2efbd9d..4168e19 100644
--- a/docs/NEW_RENDER_CADENCE_APP_PLAN.md
+++ b/docs/NEW_RENDER_CADENCE_APP_PLAN.md
@@ -1,5 +1,7 @@
 # New Render Cadence App Plan
 
+Status: historical implementation plan. `apps/RenderCadenceCompositor` now exists; use [apps/RenderCadenceCompositor/README.md](../apps/RenderCadenceCompositor/README.md) and [Render Cadence Golden Rules](RENDER_CADENCE_GOLDEN_RULES.md) as the current implementation contract.
+
 This plan describes a new application folder that rebuilds the output path from the proven `DeckLinkRenderCadenceProbe` architecture, but as a maintainable app foundation rather than a monolithic probe file.
 
 The first goal is not to port the current compositor feature set. The first goal is to reproduce the probe's smooth 59.94/60 fps DeckLink output with clean module boundaries, tests where possible, and a structure that can later accept the shader/runtime/control systems without compromising timing.
@@ -43,7 +45,7 @@ Render cadence thread
 
 System frame exchange
   -> owns Free / Rendering / Completed / Scheduled slots
-  -> latest-N semantics for completed unscheduled frames
+  -> bounded FIFO reserve for completed unscheduled frames
   -> protects scheduled frames until DeckLink completion
 
 DeckLink output thread
@@ -63,7 +65,7 @@ Everything else must fit around that spine.
 - Completion callbacks never render.
 - No synchronous render request exists in the output path.
 - Preview, screenshot, input upload, shader rebuild, and runtime control cannot run ahead of a due output frame.
-- Completed unscheduled frames are latest-N and disposable.
+- Completed unscheduled frames are a bounded FIFO reserve; overflow drops are counted separately from DeckLink drops.
 - Scheduled frames are protected until DeckLink completion.
 - Startup warms up real rendered frames before scheduled playback starts.
 
@@ -77,7 +79,7 @@ Keep these behaviors from `DeckLinkRenderCadenceProbe`:
 - PBO ring readback
 - non-blocking fence polling with zero timeout
 - system-memory slots with `Free`, `Rendering`, `Completed`, `Scheduled`
-- drop oldest completed unscheduled frame if render needs space
+- preserve completed frames waiting for playout; drop/count the oldest completed frame only if the bounded reserve overflows
 - DeckLink playout thread only schedules completed frames
 - warmup completed frames before `StartScheduledPlayback()`
 - one-line-per-second timing telemetry
@@ -430,7 +432,7 @@ Feature set:
 - simple motion renderer
 - BGRA8 only
 - PBO async readback
-- latest-N system-memory frame exchange
+- bounded FIFO system-memory frame exchange
 - warmup before playback
 - one-line telemetry
 
diff --git a/docs/RENDER_CADENCE_GOLDEN_RULES.md b/docs/RENDER_CADENCE_GOLDEN_RULES.md
index 768b17f..6e55e22 100644
--- a/docs/RENDER_CADENCE_GOLDEN_RULES.md
+++ b/docs/RENDER_CADENCE_GOLDEN_RULES.md
@@ -48,6 +48,7 @@ The output/scheduling side may:
 - release frames after DeckLink completion
 - report late/dropped/schedule telemetry
 - record app-side poll misses
+- conservatively realign the DeckLink schedule cursor after measured timing pressure
 
 It must not:
 
@@ -55,6 +56,7 @@ It must not:
 - invoke GL
 - compile shaders
 - block the render cadence waiting for DeckLink
+- continuously rewrite healthy scheduled timestamps
 
 If no completed frame is available, record the miss and keep the ownership boundary intact.
 
@@ -93,9 +95,11 @@ Short mutex use for exchanging small already-prepared objects is acceptable. Hol
 
 ## 6. System Memory Frames Are A Handoff, Not A Render Driver
 
-The system-memory frame exchange stores the latest rendered frames and protects frames scheduled to DeckLink.
+The system-memory frame exchange stores completed frames as a bounded FIFO reserve and protects frames scheduled to DeckLink.
 
-It may drop old completed, unscheduled frames when the render thread needs a free slot. It must never force the render thread to wait for the output side to consume a frame.
+Render acquire must not evict completed frames that are waiting for playout, and it must never force the render thread to wait for the output side to consume a frame.
+
+If the completed reserve overflows, the exchange may drop the oldest completed, unscheduled frame and record `completedDrops`. That is an app-side reserve drop, not a DeckLink dropped frame.
 
 ## 7. Startup Uses Warmup, Not Burst Rendering
 
@@ -114,6 +118,8 @@ Good examples:
 - `completedPollMisses`
 - `scheduleFailures`
 - `decklinkBuffered`
+- `deckLinkScheduleLeadFrames`
+- `deckLinkScheduleRealignments`
 - `inputCaptureFps`
 - `inputSubmitMs`
 - `inputUploadMs`
diff --git a/docs/RENDER_THREAD_OWNERSHIP_PLAN.md b/docs/RENDER_THREAD_OWNERSHIP_PLAN.md
index 6298124..7e6a6b2 100644
--- a/docs/RENDER_THREAD_OWNERSHIP_PLAN.md
+++ b/docs/RENDER_THREAD_OWNERSHIP_PLAN.md
@@ -98,7 +98,7 @@ render cadence thread
   -> samples latest render input/state
   -> renders one frame
   -> queues async readback/copies completed readback into system-memory slot
-  -> publishes completed frame to latest-N output buffer
+  -> publishes completed frame to bounded FIFO output reserve
 
 video output thread
   -> consumes completed system-memory frames
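
The bounded FIFO reserve contract introduced across these docs can be illustrated with a small single-process model: render acquire never evicts completed frames waiting for playout, an overflow drops the oldest completed frame and counts it in `completedDrops`, and scheduled frames stay protected until DeckLink completion. This is a sketch under stated assumptions, not the app's implementation. `FrameExchangeSketch`, its method names, and the explicit slot-count/reserve-limit parameters are hypothetical and do not describe the real `SystemFrameExchange` API. A single short-held mutex stands in for whatever synchronization the real exchange uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical model of the slot rules above; not the real SystemFrameExchange API.
enum class SlotState { Free, Rendering, Completed, Scheduled };

class FrameExchangeSketch {
public:
    FrameExchangeSketch(std::size_t slotCount, std::size_t completedReserve)
        : states_(slotCount, SlotState::Free), reserveLimit_(completedReserve) {
        // Leave slots outside the reserve for rendering and scheduled frames.
        assert(completedReserve > 0 && completedReserve < slotCount);
    }

    // Render/readback side: claim a Free slot. Completed frames are never evicted here.
    std::optional<std::size_t> AcquireForRender() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (std::size_t i = 0; i < states_.size(); ++i) {
            if (states_[i] == SlotState::Free) {
                states_[i] = SlotState::Rendering;
                return i;
            }
        }
        ++acquireMisses_;  // count the miss and return; nothing here waits on the output side
        return std::nullopt;
    }

    // Render/readback side: append a finished slot to the FIFO reserve.
    void PublishCompleted(std::size_t slot) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(states_[slot] == SlotState::Rendering);
        states_[slot] = SlotState::Completed;
        completed_.push_back(slot);
        if (completed_.size() > reserveLimit_) {
            // Reserve overflow: free the oldest completed, unscheduled frame.
            states_[completed_.front()] = SlotState::Free;
            completed_.pop_front();
            ++completedDrops_;  // app-side reserve drop, not a DeckLink dropped frame
        }
    }

    // Output side: schedule the oldest completed frame (FIFO order).
    std::optional<std::size_t> TakeForSchedule() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (completed_.empty()) {
            ++completedPollMisses_;
            return std::nullopt;
        }
        const std::size_t slot = completed_.front();
        completed_.pop_front();
        states_[slot] = SlotState::Scheduled;  // protected until DeckLink completion
        return slot;
    }

    // Output side: DeckLink reported completion, so the slot can be reused.
    void ReleaseScheduled(std::size_t slot) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(states_[slot] == SlotState::Scheduled);
        states_[slot] = SlotState::Free;
    }

    std::uint64_t AcquireMisses() const { std::lock_guard<std::mutex> lock(mutex_); return acquireMisses_; }
    std::uint64_t CompletedDrops() const { std::lock_guard<std::mutex> lock(mutex_); return completedDrops_; }
    std::uint64_t CompletedPollMisses() const { std::lock_guard<std::mutex> lock(mutex_); return completedPollMisses_; }

private:
    mutable std::mutex mutex_;
    std::vector<SlotState> states_;
    std::deque<std::size_t> completed_;
    std::size_t reserveLimit_;
    std::uint64_t acquireMisses_ = 0;
    std::uint64_t completedDrops_ = 0;
    std::uint64_t completedPollMisses_ = 0;
};
```

In this model an exhausted slot pool shows up as an `acquireMisses` increment while the waiting completed frames keep their FIFO order; only an overflow of the reserve itself drops (and counts) the oldest completed frame. That is the distinction the patch draws between the legacy latest-N cache and the cadence-first reserve.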
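
The conservative schedule-cursor recovery described in the lessons-learned and golden-rules hunks can likewise be modelled as a continuous frame counter plus a hysteresis check on measured lead. Everything named here is an assumption for illustration: `ScheduleCursorSketch`, the `targetLead`/`minLead`/`rearmLead` thresholds, and the idea that playback time arrives as an integer stream time in the same units as the frame duration (for example 1001 units per frame at a 60000 time scale for 59.94 fps). The patch does not state the app's real thresholds or how it samples DeckLink playback time.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of conservative schedule-cursor recovery; thresholds are illustrative.
class ScheduleCursorSketch {
public:
    ScheduleCursorSketch(std::int64_t frameDuration, std::int64_t targetLead,
                         std::int64_t minLead, std::int64_t rearmLead)
        : frameDuration_(frameDuration), targetLead_(targetLead),
          minLead_(minLead), rearmLead_(rearmLead) {
        assert(frameDuration > 0 && minLead < rearmLead && rearmLead <= targetLead);
    }

    // Healthy path: stream time advances by exactly one frame duration per scheduled frame.
    std::int64_t NextStreamTime() const { return nextFrameIndex_ * frameDuration_; }
    void AdvanceAfterSchedule() { ++nextFrameIndex_; }

    // Called with a sampled playback stream time (same units as frameDuration).
    // Returns true only when the cursor was actually realigned.
    bool OnTelemetry(std::int64_t playbackStreamTime, bool lateOrDropReported) {
        const std::int64_t playbackFrameIndex = playbackStreamTime / frameDuration_;
        lead_ = nextFrameIndex_ - playbackFrameIndex;

        const bool lowLead = armed_ && lead_ < minLead_;
        if (lead_ >= rearmLead_) {
            armed_ = true;  // lead has recovered, so proactive realignment is allowed again
        }
        if (!lateOrDropReported && !lowLead) {
            return false;  // no pressure: leave the continuous cursor untouched
        }

        const std::int64_t realigned = playbackFrameIndex + targetLead_;
        if (realigned <= nextFrameIndex_) {
            return false;  // never move the cursor backwards over already-scheduled frames
        }
        nextFrameIndex_ = realigned;
        lead_ = targetLead_;
        armed_ = false;  // disarm proactive realignment until measured lead recovers
        ++realignments_;
        return true;
    }

    std::int64_t LeadFrames() const { return lead_; }
    std::int64_t NextFrameIndex() const { return nextFrameIndex_; }
    std::uint64_t Realignments() const { return realignments_; }

private:
    std::int64_t frameDuration_;
    std::int64_t targetLead_;
    std::int64_t minLead_;
    std::int64_t rearmLead_;
    std::int64_t nextFrameIndex_ = 0;
    std::int64_t lead_ = 0;
    bool armed_ = true;
    std::uint64_t realignments_ = 0;
};
```

The gap between `minLead` and `rearmLead` is what keeps recovery from turning into a per-frame timing policy in this model: after one realignment, low lead alone cannot trigger another until the measured lead has climbed back to the re-arm level, while a real late/drop report still can. A `Realignments()` count that climbs every sample would correspond to the failure mode the lessons-learned section warns about.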