Preroll update
All checks were successful
CI / React UI Build (push) Successful in 11s
CI / Native Windows Build And Tests (push) Successful in 2m29s
CI / Windows Release Package (push) Successful in 2m30s

Aiden
2026-05-10 22:30:47 +10:00
parent c8a4bd4c7b
commit c38c22834d
7 changed files with 198 additions and 7 deletions

View File

@@ -7,4 +7,4 @@ constexpr GLuint kDecodedVideoTextureUnit = 1;
constexpr GLuint kSourceHistoryTextureUnitBase = 2;
constexpr GLuint kPackedVideoTextureUnit = 2;
constexpr GLuint kGlobalParamsBindingPoint = 0;
-constexpr unsigned kPrerollFrameCount = 8;
+constexpr unsigned kPrerollFrameCount = 12;

View File

@@ -2,6 +2,29 @@
This note summarizes the main architectural improvements that would make the app more resilient during live use, especially around timing isolation, failure isolation, and recoverability.
Phase checklist:
- [ ] Define subsystem boundaries and target architecture
- [ ] Introduce an internal event model
- [ ] Split `RuntimeHost`
- [ ] Make the render thread the sole GL owner
- [ ] Refactor live state layering into an explicit composition model
- [ ] Move persistence onto a background snapshot writer
- [ ] Make DeckLink/backend lifecycle explicit with a state machine
- [ ] Add structured health, telemetry, and operational reporting
## Timing Review
The recent OSC work removed several control-path stalls, but the app still has a few deeper timing characteristics that matter for live resilience:
- output playout is still effectively render-on-demand from the DeckLink completion callback
- output buffering and preroll are now larger, but the buffering model is still static and only loosely related to actual render cost
- GPU readback is partly asynchronous, but the fallback path still returns to synchronous readback on any miss
- preview presentation is still tied to the playout render path
- background service timing still relies on coarse polling sleeps
Those points are important because they affect not just average performance, but how the app behaves under brief spikes, device jitter, or load bursts.
## Key Findings
### 1. `RuntimeHost` is carrying too many responsibilities
@@ -133,6 +156,97 @@ Recommended direction:
- centralize recovery behavior
- make shutdown ordering and degraded-mode behavior more predictable
Timing-specific additions:
- separate "device callback received" from "render the next output frame" so output cadence is not driven directly by the completion callback thread
- make playout headroom configurable and adaptive instead of using a fixed compile-time preroll count
- track an explicit backend health state such as `running-steady`, `catching-up`, `late`, and `dropping`
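The health states above could be sketched as a small classifier. This is a hypothetical illustration, not existing code: the enum, the thresholds, and `ClassifyBackendHealth()` are all assumptions about how such a state might be derived from schedule accounting.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical backend health states matching the names in the note.
enum class BackendHealth : std::uint8_t {
    RunningSteady,  // completions arriving on schedule
    CatchingUp,     // behind, but closing the gap
    Late,           // behind and not recovering yet
    Dropping        // frames being abandoned to recover
};

// Illustrative classifier: derive health from how many frames behind the
// schedule the output is, and whether any frames were dropped in the last
// accounting window. Thresholds here are placeholders, not tuned values.
inline BackendHealth ClassifyBackendHealth(int framesBehind, int recentDrops) {
    if (recentDrops > 0)  return BackendHealth::Dropping;
    if (framesBehind > 2) return BackendHealth::Late;
    if (framesBehind > 0) return BackendHealth::CatchingUp;
    return BackendHealth::RunningSteady;
}

inline std::string_view ToString(BackendHealth h) {
    switch (h) {
        case BackendHealth::RunningSteady: return "running-steady";
        case BackendHealth::CatchingUp:    return "catching-up";
        case BackendHealth::Late:          return "late";
        case BackendHealth::Dropping:      return "dropping";
    }
    return "unknown";
}
```

Keeping the classification pure like this would also make the state transitions directly unit-testable, independent of any device callback.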
Relevant timing code:
- [OpenGLVideoIOBridge.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/gl/pipeline/OpenGLVideoIOBridge.cpp:86)
- [DeckLinkSession.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/videoio/decklink/DeckLinkSession.cpp:420)
- [DeckLinkSession.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/videoio/decklink/DeckLinkSession.cpp:487)
- [VideoPlayoutScheduler.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/videoio/VideoPlayoutScheduler.cpp:26)
Why this matters:
- `PlayoutFrameCompleted()` currently begins an output frame, takes the shared GL path, renders, reads back, and schedules the next frame in one callback-driven flow.
- `VideoPlayoutScheduler::AccountForCompletionResult()` currently reacts to both late and dropped frames by blindly advancing the schedule index by `2`, which is simple but not especially robust.
- `kPrerollFrameCount` is now `12`, but `DeckLinkSession::ConfigureOutput()` still creates a fixed pool of `10` mutable output frames. That mismatch suggests the buffering model is not being sized from one coherent source of truth.
Recommended direction:
- move playout to a producer/consumer model where a render worker fills output buffers ahead of the DeckLink callback
- define buffer-pool sizing from one policy object, for example: preroll depth, minimum spare buffers, and allowed catch-up depth
- replace fixed "skip two frames" recovery with measured lag accounting based on actual scheduled-versus-completed position
- expose playout latency as a runtime setting or policy, rather than burying it in a constant
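The "one policy object" idea above could look like the following sketch. The struct and field names are hypothetical, but the point stands: if `kPrerollFrameCount` and the DeckLink output frame pool are both derived from one place, the 12-versus-10 mismatch described above cannot happen by drift.

```cpp
// Hypothetical policy object that is the single source of truth for
// output buffering. All names and defaults are illustrative.
struct PlayoutBufferPolicy {
    unsigned prerollDepth   = 12;  // frames scheduled before starting playback
    unsigned minSpareFrames = 2;   // frames kept free for the render worker
    unsigned catchUpDepth   = 2;   // extra frames allowed when running late

    // The output frame pool must cover everything that can be in flight,
    // so its size is computed here rather than hard-coded elsewhere.
    unsigned PoolSize() const {
        return prerollDepth + minSpareFrames + catchUpDepth;
    }
};
```

Both `ConfigureOutput()` and the preroll loop would then read from the same `PlayoutBufferPolicy` instance instead of separate compile-time constants.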
### 6a. The current playout timing model is still callback-coupled
The app now has more headroom, but the next output frame is still produced directly in the scheduled-frame completion callback path.
Relevant code:
- [OpenGLVideoIOBridge.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/gl/pipeline/OpenGLVideoIOBridge.cpp:86)
- [DeckLinkFrameTransfer.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/videoio/decklink/DeckLinkFrameTransfer.cpp:53)
That means the completion callback is currently responsible for:
- frame pacing accounting
- acquiring the next output buffer
- taking the GL critical section
- rendering the composite
- performing output readback
- scheduling the next frame
This works when the app is comfortably within budget, but it makes deadline misses much harder to absorb gracefully.
Recommended direction:
- make the DeckLink callback a lightweight notifier
- have a dedicated playout worker or render worker keep an ahead-of-time queue of ready output frames
- treat callback time as control-plane time, not render time
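The "lightweight notifier" shape could be sketched as below. This is an assumption about the target design, not current code: the completion callback only records the completion and wakes a worker, and all GL work moves to the worker thread.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// Hypothetical wakeup primitive: the DeckLink completion callback does
// control-plane work only; the playout worker does the rendering.
class PlayoutWakeup {
public:
    // Called from the device completion callback. Cheap and non-blocking
    // apart from the notify itself.
    void NotifyFrameCompleted() {
        { std::lock_guard<std::mutex> lock(m_); ++pending_; }
        cv_.notify_one();
    }

    // Called from the playout worker: blocks until at least one completion
    // is pending, then consumes them all and returns the count.
    unsigned WaitForCompletions() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return pending_ > 0; });
        unsigned n = pending_;
        pending_ = 0;
        return n;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    unsigned pending_ = 0;
};
```

With this split, a slow render delays the worker, not the device callback thread, which is exactly the isolation the section argues for.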
### 6b. A producer/consumer playout model would be a better long-term fit
The stronger architecture for this app is:
- a render scheduler or dedicated render thread runs at the configured video cadence
- rendering produces completed output frames ahead of need
- those frames are placed into a bounded queue or ring buffer
- the DeckLink side consumes already-prepared frames when callbacks indicate they are needed
That is a better fit than callback-driven rendering because it separates:
- render timing
- GL ownership
- output-device timing
- latency policy
In that model:
- render is the producer
- DeckLink is the timing consumer
- the queue between them becomes the main place to manage latency versus resilience
Why this is preferable:
- brief callback jitter is less likely to become a visible dropped frame
- render spikes can be absorbed by queue headroom instead of immediately missing output deadlines
- latency becomes an explicit policy choice rather than an incidental side effect of callback timing
- queue depth, underruns, stale-frame reuse, and catch-up behavior become measurable and tunable
Recommended direction:
- move toward a bounded producer/consumer playout queue
- make queue depth and target headroom runtime policy, not compile-time constants
- define explicit underrun behavior, for example:
- reuse newest completed frame
- reuse last scheduled frame
- output black or degraded frame
- keep DeckLink callbacks limited to dequeue/schedule/accounting work wherever possible
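A minimal sketch of the bounded queue with the first underrun behavior listed above (reuse newest completed frame). Everything here is illustrative: `FrameId` stands in for whatever handle the real frame pool uses, and the class name is hypothetical.

```cpp
#include <deque>
#include <mutex>
#include <optional>

using FrameId = int;  // placeholder for a real output-frame handle

// Hypothetical bounded producer/consumer playout queue. The render worker
// is the producer; the DeckLink callback path is the consumer.
class PlayoutFrameQueue {
public:
    explicit PlayoutFrameQueue(std::size_t depth) : depth_(depth) {}

    // Producer: returns false when the queue is full, so latency headroom
    // is bounded instead of growing without limit.
    bool Push(FrameId frame) {
        std::lock_guard<std::mutex> lock(m_);
        if (queue_.size() >= depth_) return false;
        queue_.push_back(frame);
        lastCompleted_ = frame;  // remembered for underrun reuse
        return true;
    }

    // Consumer: pops the next ready frame, or on underrun reuses the newest
    // completed frame rather than missing the output deadline. Returns
    // nullopt only before any frame has ever been produced.
    std::optional<FrameId> PopOrReuse(bool& underrun) {
        std::lock_guard<std::mutex> lock(m_);
        if (!queue_.empty()) {
            underrun = false;
            FrameId f = queue_.front();
            queue_.pop_front();
            return f;
        }
        underrun = true;
        return lastCompleted_;
    }

private:
    std::size_t depth_;
    std::mutex m_;
    std::deque<FrameId> queue_;
    std::optional<FrameId> lastCompleted_;
};
```

The `underrun` flag is what makes the behavior measurable: every reuse becomes a counted event rather than a silent repeat of the last frame.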
### 7. Persistence should be more asynchronous and debounced
`SavePersistentState()` is still called directly from many update paths.
@@ -163,15 +277,77 @@ Recommended direction:
Add lightweight tracing for:
- input callback latency
- input upload skip count
- GL lock wait time
- render queue depth
- render time
- pass build/compile latency
- readback time
- output scheduling lag
- output queue depth
- preroll depth versus spare-buffer depth
- preview present cost and skipped-preview count
- control queue depth
- `RuntimeHost` lock contention
That would make future tuning and failure diagnosis much easier.
Timing-specific observations from the current code:
- render time is captured as one total number in [OpenGLRenderPipeline.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/gl/pipeline/OpenGLRenderPipeline.cpp:24), but not split into draw, pack, readback wait, readback copy, or preview present
- frame pacing stats are recorded in [OpenGLVideoIOBridge.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/gl/pipeline/OpenGLVideoIOBridge.cpp:17), but there is no explicit visibility into how much queued playout headroom remains
- input uploads are intentionally skipped when the GL bridge is busy in [OpenGLVideoIOBridge.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/gl/pipeline/OpenGLVideoIOBridge.cpp:60), but the app does not currently surface how often that is happening
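The stage split and skip counters described above could be captured with something as small as the following sketch. The field names are hypothetical; the shape just shows that the existing single render-time number can remain as the sum of the new stages.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical per-frame timing breakdown: the stages the note asks for
// instead of one total render time.
struct FrameTimings {
    std::chrono::microseconds draw{0};
    std::chrono::microseconds pack{0};
    std::chrono::microseconds readbackWait{0};
    std::chrono::microseconds readbackCopy{0};
    std::chrono::microseconds previewPresent{0};
};

// Hypothetical counters for events that are currently invisible.
struct PlayoutCounters {
    std::uint64_t inputUploadsSkipped = 0;   // GL bridge busy on input callback
    std::uint64_t previewsSkipped = 0;       // preview dropped to protect playout
    std::uint64_t syncReadbackFallbacks = 0; // fell back to glReadPixels()
    std::uint64_t underruns = 0;             // output reused a stale frame
};

// The old aggregate number stays available as the sum of the stages,
// so existing logging keeps working while the breakdown is added.
inline std::chrono::microseconds TotalRenderTime(const FrameTimings& t) {
    return t.draw + t.pack + t.readbackWait + t.readbackCopy + t.previewPresent;
}
```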
### 8a. Preview and playout are still too close together
The desktop preview is rate-limited, but still presented from inside the render pipeline path.
Relevant code:
- [OpenGLRenderPipeline.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/gl/pipeline/OpenGLRenderPipeline.cpp:54)
- [OpenGLComposite.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/gl/OpenGLComposite.cpp:235)
This means preview presentation can still consume time on the same path that is trying to meet output deadlines.
Recommended direction:
- treat preview as best-effort and entirely subordinate to playout
- move preview present to a separate presentation schedule fed from the latest completed render
- record preview skips and preview present cost independently from playout timing
### 8b. Readback is improved, but still not fully deadline-safe
The async readback path is a good step, but the miss path still falls back to synchronous `glReadPixels()` and then flushes the async pipeline.
Relevant code:
- [OpenGLRenderPipeline.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/gl/pipeline/OpenGLRenderPipeline.cpp:150)
- [OpenGLRenderPipeline.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/gl/pipeline/OpenGLRenderPipeline.cpp:228)
That means a single late GPU fence can push the app back onto the most timing-sensitive path exactly when it is already under pressure.
Recommended direction:
- increase readback instrumentation before changing policy again
- consider deeper readback buffering or a true stale-frame reuse policy instead of immediate synchronous fallback
- separate "freshest possible frame" policy from "never miss output deadline" policy and make that tradeoff explicit
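The policy separation recommended above can be made explicit as a pure decision function, kept apart from the GL calls that act on it. This is a sketch of an assumed target policy, not the current behavior: the enum names and `DecideReadback()` are illustrative.

```cpp
// Hypothetical "never miss the output deadline" readback policy.
// Instead of falling back to synchronous readback on any fence miss,
// classify the fence state and prefer reusing a completed frame.
enum class FenceState { Signaled, Pending };

enum class ReadbackAction {
    CopyFreshFrame,      // fence signaled: map the buffer, copy new pixels
    ReuseStaleFrame,     // fence pending but an older frame exists: ship it
    SynchronousFallback  // no completed frame exists yet: the only blocking case
};

inline ReadbackAction DecideReadback(FenceState fence, bool haveCompletedFrame) {
    if (fence == FenceState::Signaled) return ReadbackAction::CopyFreshFrame;
    if (haveCompletedFrame)            return ReadbackAction::ReuseStaleFrame;
    return ReadbackAction::SynchronousFallback;
}
```

With the decision isolated like this, the "freshest possible frame" versus "never miss the deadline" tradeoff becomes a one-line change with a testable truth table, rather than logic buried in the readback path.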
### 8c. Background control and file-watch timing are still coarse
`RuntimeServices::PollLoop()` currently uses a `25 x Sleep(10)` loop, which gives it a coarse `~250 ms` cadence for file-watch polling and deferred OSC commit work.
Relevant code:
- [RuntimeServices.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/control/RuntimeServices.cpp:245)
That is acceptable for non-critical background work, but it is still too blunt to be the long-term timing model for coordination-heavy runtime services.
Recommended direction:
- replace coarse sleep polling with waitable events or condition-variable driven wakeups where practical
- isolate truly background work from latency-sensitive control reconciliation
- add separate metrics for queue age, not just queue depth
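A condition-variable replacement for the `25 x Sleep(10)` loop could be as small as the sketch below. The class is hypothetical: the old coarse interval becomes the worst-case wait, while queued work wakes the service immediately.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical waitable wakeup for RuntimeServices-style background work.
class ServiceWakeup {
public:
    // Called by any thread that enqueues work for the service.
    void SignalWork() {
        { std::lock_guard<std::mutex> lock(m_); workPending_ = true; }
        cv_.notify_one();
    }

    // Called by the service loop. Returns true if woken by SignalWork(),
    // false on timeout; maxWait plays the role of the old polling period,
    // but only as an upper bound rather than a fixed latency.
    bool WaitForWork(std::chrono::milliseconds maxWait) {
        std::unique_lock<std::mutex> lock(m_);
        bool woken = cv_.wait_for(lock, maxWait, [this] { return workPending_; });
        workPending_ = false;
        return woken;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool workPending_ = false;
};
```

File-watch polling can keep the timeout path, while deferred OSC commits take the signaled path and stop paying the ~250 ms cadence tax.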
## Phased Roadmap
This roadmap is ordered by architectural dependency rather than by “quick wins.” The goal is to move the app toward clearer ownership boundaries and safer live behavior without doing later work on top of foundations that are likely to change again.
@@ -387,6 +563,7 @@ Expected benefits:
- safer startup/shutdown ordering
- clearer recovery behavior
- easier handling of missing input, dropped frames, or reconfiguration
- a clearer place to own playout headroom policy, output queue sizing, and late-frame recovery behavior
### Phase 8. Add structured health, telemetry, and operational reporting
@@ -397,8 +574,13 @@ Recommended coverage:
- render queue depth
- GL lock wait time, if any shared lock remains
- input callback latency
- input upload skip count
- output scheduling lag
- output queue depth and spare-buffer depth
- readback timing
- readback fence wait timing
- synchronous readback fallback count
- preview present timing and skipped-preview count
- snapshot publish frequency
- persistence queue depth
- event queue depth

View File

@@ -105,6 +105,13 @@ Optional fields:
Parameter objects may also include an optional `description` string. The control UI displays it as one-line helper text with the full text available on hover, so use it for short operational guidance rather than long documentation.
Metadata conventions:
- Keep `name` short, human-facing, and in title case.
- Keep `category` consistent with existing library groups such as `Color`, `Transform`, `Projection`, `Temporal`, `Scopes & Guides`, `Utility`, `Feedback`, and `Calibration`.
- Keep `description` to one clear sentence in present tense that explains what the shader does for an operator.
- Avoid placeholder, joke, or overly implementation-heavy descriptions unless the shader is intentionally a diagnostic or broken example.
Shader-visible identifiers must be valid Slang-style identifiers:
- `entryPoint`
@@ -276,13 +283,15 @@ On the first frame, or after a reset, `sampleFeedback` returns transparent black
Feedback resets when:
- layers are added, removed, or reordered
- a layer bypass state changes
- a layer changes shader
- the layer itself is removed
- a shader is reloaded or recompiled
- render dimensions change
- the app restarts
Ordinary stack add/remove/reorder operations on other layers are intended to preserve feedback state for unchanged feedback-enabled layers.
So feedback should be treated as live runtime state, not durable saved state.
## Slang Entry Point

View File

@@ -1,7 +1,7 @@
{
"id": "feedback-data-blocks",
"name": "Feedback Data Blocks",
-"description": "Demonstrates using the feedback surface as coarse data storage by reserving eight 3x3 texel cells for sampled colors and a hidden metadata cell for timed or trigger-driven refresh state.",
+"description": "Demonstrates coarse shader-local data storage by reserving eight 3x3 feedback cells for sampled colors and one hidden metadata cell for refresh state.",
"category": "Feedback",
"entryPoint": "storeProbeData",
"passes": [

View File

@@ -1,7 +1,7 @@
{
"id": "feedback-highlight-accumulator",
"name": "Feedback Background Memory",
-"description": "Demonstrates writable full-frame shader feedback by learning a persistent per-pixel background model, then comparing the live frame against that learned plate. This cannot be reproduced by only reading ordinary history frames because the shader writes its own evolving state back each frame.",
+"description": "Learns a persistent per-pixel background plate in shader-local feedback and compares the live frame against that evolving full-frame state.",
"category": "Feedback",
"entryPoint": "updateBackgroundModel",
"passes": [

View File

@@ -1,7 +1,7 @@
{
"id": "white-balance-correction",
"name": "White Balance Correction",
-"description": "Operator-friendly tint, color balance, and exposure correction intended to pair with the white match probe.",
+"description": "Provides operator-friendly warm/cool, green/magenta, and exposure correction intended to pair with the White Match Probe.",
"category": "Color",
"entryPoint": "shadeVideo",
"parameters": [

View File

@@ -1,8 +1,8 @@
{
"id": "white-match-probe",
"name": "White Match Probe",
-"description": "Samples a movable box, stores a reference color on trigger using shader-local feedback, and compares the current sample against the held reference for camera matching.",
+"description": "Samples a movable box, stores a reference color on trigger using shader-local feedback, and compares the current sample against a captured or manual reference for camera matching.",
-"category": "Utility",
+"category": "Scopes & Guides",
"entryPoint": "storeReferenceState",
"passes": [
{