# Render Thread Ownership Plan

This plan describes how to make the main compositor behave like the successful `DeckLinkRenderCadenceProbe`: one render cadence owner, one GL context owner, no unrelated work able to interrupt output frame production.

The goal is not just "all GL calls happen on one thread". The current app mostly does that during runtime already. The real goal is:

- the output render thread owns its GL context for its whole lifetime
- output cadence is driven by the render thread, not by DeckLink completion timing
- non-output GL work cannot sit ahead of output frames
- callers cannot block the render thread while waiting for synchronous answers
- DeckLink scheduling consumes completed system-memory frames and never causes rendering

## Current Risk Points

The current main app still has several ways to interrupt output cadence.

### Shared GL Executor

`RenderEngine` owns the GL context during runtime, but it acts as a general task executor. The same queue/path can run:

- output frame render
- input upload
- preview present
- screenshot capture
- render resets
- shader/program commits
- resource resize
- state clearing

That means output frames are not guaranteed to be the next GL work item at the selected frame time.

### Synchronous Output Render Request

`VideoBackend` drives output production from its output producer thread, then calls:

```text
VideoBackend
  -> OpenGLVideoIOBridge::RenderScheduledFrame
  -> RenderEngine::RequestOutputFrame
  -> TryInvokeOnRenderThread
```

That makes output production a request/response interaction. The producer waits for the render thread, and the render thread is still shared with other work.

### Input Upload Shares Output Context

DeckLink input capture currently flows into:

```text
VideoBackend::HandleInputFrame
  -> OpenGLVideoIOBridge::UploadInputFrame
  -> RenderEngine::QueueInputFrame
  -> render thread upload
```

Even with coalescing, input upload can consume render-thread time and GPU bandwidth directly before output rendering.

### Preview And Screenshot Share Output Context

Preview and screenshot are lower-priority features, but today they still execute on the render thread.

Preview is best-effort at the caller side, but once queued it can still occupy the same context. Screenshot capture can be more expensive because it performs readback and CPU-side image preparation.

### Startup Context Ownership Is Transitional

The Win32 startup path creates and binds the GL context before `RenderEngine::StartRenderThread()`. That is acceptable as a transitional state, but the final model should make context ownership explicit:

- bootstrap thread creates the window/context
- bootstrap thread releases it
- render thread binds it
- only render thread initializes GL resources
- only render thread destroys GL resources

### Render Callback Re-enters App State

`OpenGLRenderPipeline::RenderFrame()` calls a callback into `OpenGLComposite::renderEffect()`. That callback builds `RenderFrameInput`, resolves frame state, drains runtime live state, and then calls back into `RenderEngine` to draw the prepared frame.

This works, but it means the output render path still reaches up into app/runtime code at frame time.

## Target Runtime Shape

The main app should match this ownership model:

```text
runtime/control threads
  -> publish snapshots, live overlays, reset requests, shader-build results
  -> never call GL

render cadence thread
  -> sole owner of output GL context
  -> wakes at selected render cadence
  -> samples latest render input/state
  -> renders one frame
  -> queues async readback/copies completed readback into system-memory slot
  -> publishes completed frame to bounded FIFO output reserve

video output thread
  -> consumes completed system-memory frames
  -> schedules DeckLink frames to target buffer depth
  -> processes completion results
  -> never calls GL

optional input upload path
  -> writes latest input frame into CPU-side latest-frame buffer
  -> render thread imports/uploads at a controlled point in its frame

preview/screenshot path
  -> consumes already-rendered output/system-memory frame when possible
  -> never interrupts output render cadence
```

## Non-Negotiable Rules

- The render thread never waits for DeckLink.
- DeckLink callbacks never render.
- Runtime/control threads never directly execute GL.
- Preview and screenshot never execute ahead of output frames.
- Input upload is never a separate urgent GL task ahead of output render.
- Shader/resource commits are applied only at a frame boundary.
- Telemetry on the hot path must be lock-light or try-lock only.
- The render thread cadence does not speed up to refill buffers.
- If output work overruns, the render thread records the overrun and resumes the selected cadence policy.

## Implementation Plan

### 1. Add Thread/Context Ownership Guards

Add explicit render-thread ownership checks around all GL entry points.

Deliverables:

- `RenderEngine` exposes `IsOnRenderThread()` for assertions/tests.
- GL-facing classes get debug-only owner checks where practical.
- wrong-thread GL access becomes a counted telemetry warning, not just `OutputDebugStringA`.
- tests cover that public request methods do not execute GL directly.
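
The ownership check described in these deliverables could look roughly like the sketch below. `RenderThreadGuard` and every method other than `IsOnRenderThread()` are hypothetical names chosen for illustration; the real `RenderEngine` may track ownership differently. The sketch assumes the render thread records itself once after binding the context, and that wrong-thread calls should be counted rather than only logged.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical helper: remembers which thread owns the GL context and counts
// wrong-thread accesses so they can be reported through telemetry.
class RenderThreadGuard {
public:
    // Called once by the render thread right after it binds the GL context.
    void BindToCurrentThread() {
        owner_.store(std::this_thread::get_id(), std::memory_order_release);
    }

    // True only on the thread that called BindToCurrentThread().
    // Before any bind, no thread is considered the owner.
    bool IsOnRenderThread() const {
        return std::this_thread::get_id() == owner_.load(std::memory_order_acquire);
    }

    // Release-build guard for GL-facing entry points: returns true when the
    // caller may proceed, otherwise bumps a counter and returns false.
    bool CheckOwner() {
        if (IsOnRenderThread()) return true;
        wrongThreadAccesses_.fetch_add(1, std::memory_order_relaxed);
        return false;
    }

    // Debug-only owner check for private render-thread-only methods.
    void AssertOwner() const { assert(IsOnRenderThread()); }

    std::uint64_t WrongThreadAccessCount() const {
        return wrongThreadAccesses_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<std::thread::id> owner_{};
    std::atomic<std::uint64_t> wrongThreadAccesses_{0};
};
```

A GL-facing method would call `CheckOwner()` first and return early on `false`, which turns a wrong-thread call into a counted event instead of inline GL work.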

Acceptance:

- every `RenderEngine` public method is classified as either request-only, lifecycle-only, or render-thread-only.
- render-thread-only methods are private or guarded.
- no normal runtime caller can accidentally invoke GL work inline.

### 2. Move GL Initialization Fully Onto The Render Thread

Start the render thread before compiling shaders and initializing GL resources.

Current startup does:

```text
InitOpenGLState()
  -> CompileDecodeShader
  -> CompileOutputPackShader
  -> InitializeResources
  -> CompileLayerPrograms
StartRenderThread()
```

Move toward:

```text
create context on Win32 thread
release context on Win32 thread
StartRenderThread()
render thread binds context
render thread initializes extensions, shaders, resources
```

Deliverables:

- a single `RenderEngine::StartAndInitialize(RenderInitializationConfig)` path.
- GL extension resolution happens on the render thread.
- shader/resource initialization is a render-thread startup phase.
- `RenderEngine` destructor only destroys resources on the render thread.

Acceptance:

- after `StartRenderThread()`, no non-render thread binds or uses the app GL context.
- shutdown order is deterministic: stop video output, stop render cadence, destroy GL resources, release context.

### 3. Replace Synchronous Output Render Requests With Render-Owned Cadence

Move output cadence out of `VideoBackend` and into the render system.

Current:

```text
VideoBackend output producer
  -> cadence tick
  -> acquire output slot
  -> synchronous render-thread request
```

Target:

```text
RenderEngine output cadence loop
  -> cadence tick
  -> acquire/free output slot through a non-blocking frame-sink interface
  -> render frame
  -> publish completed frame
```

Deliverables:

- introduce `RenderedFrameSink` or similar interface owned by video output.
- render thread pulls/claims a free system-memory slot without waiting.
- if no free slot exists, render thread drops/recycles the oldest unscheduled completed frame or records backpressure without blocking.
- remove `RenderEngine::RequestOutputFrame()` from the steady-state output path.

Acceptance:

- output rendering continues even if DeckLink completion is delayed.
- no `std::future` wait exists in the output cadence path.
- `VideoBackend` no longer owns the producer render loop; it owns scheduling/completion only.

### 4. Make The Render Thread A Frame Loop, Not A Task Queue

Keep a command mailbox, but process it only at safe frame-boundary points.

Frame loop:

```text
while running:
  wait until next render timestamp
  apply bounded frame-boundary commands
  sample latest frame input/state
  upload latest input frame if enabled and budget allows
  render output frame
  queue/consume readback
  publish completed frame
  record timings
```

Command classes:

- frame-boundary commands: reset temporal history, reset shader feedback, commit prepared shader programs
- background/low-priority commands: preview, screenshot, diagnostic readback
- non-GL commands: state publication, telemetry, persistence

Deliverables:

- replace FIFO render task queue with a priority/mailbox model.
- output cadence is the loop's main clock.
- commands have budget classes and max work per frame.
- long commands are deferred rather than blocking the current output tick.

Acceptance:

- preview/screenshot cannot run immediately before a due output frame.
- reset/shader work is applied between frames and measured.
- output render starts within a small jitter window when the GPU is not overrun.

### 5. Move Input Capture To A CPU Latest-Frame Buffer

Input capture should not enqueue independent GL upload tasks.
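
As a rough illustration of latest-frame semantics, the sketch below shows one possible shape for a CPU-side input mailbox. The method names, the `CpuInputFrame` struct, and the version counter are assumptions made for this example, not the app's actual types. The capture side overwrites any pending frame, and the render side uses a try-lock so it never waits on the capture callback.

```cpp
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical CPU-side frame; the real layout depends on the capture format.
struct CpuInputFrame {
    std::vector<std::uint8_t> pixels;
    std::uint64_t version = 0;  // assigned by the mailbox on publish
};

class InputFrameMailbox {
public:
    // Capture-callback side: replace whatever was pending and return quickly.
    // Older unconsumed frames are simply overwritten (latest-frame semantics).
    void Publish(std::vector<std::uint8_t> pixels) {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_.pixels = std::move(pixels);
        latest_.version = ++nextVersion_;
    }

    // Render-thread side, called at a frame boundary. Copies out the latest
    // frame only if it is newer than lastSeenVersion. Returns false without
    // waiting if the lock is busy or nothing new has arrived, in which case
    // the caller keeps using its previously uploaded texture.
    bool TryTakeNewer(std::uint64_t lastSeenVersion, CpuInputFrame& out) {
        std::unique_lock<std::mutex> lock(mutex_, std::try_to_lock);
        if (!lock.owns_lock() || latest_.version <= lastSeenVersion) {
            return false;
        }
        out = latest_;
        return true;
    }

private:
    std::mutex mutex_;
    CpuInputFrame latest_;
    std::uint64_t nextVersion_ = 0;
};
```

The copy in `TryTakeNewer` keeps the example short; a real implementation would more likely swap buffers to avoid per-frame allocation.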

Target:

```text
DeckLink input callback
  -> copy/coalesce latest CPU input frame
  -> return quickly

render thread frame boundary
  -> if input version changed, upload latest frame
  -> render using last successfully uploaded input texture
```

Deliverables:

- introduce `InputFrameMailbox` with latest-frame semantics.
- remove `RenderEngine::QueueInputFrame()` from the callback path.
- render thread owns the upload moment.
- if upload would exceed budget, render thread can reuse the previous input texture and record an input-upload skip.

Acceptance:

- input capture enabled does not create arbitrary render-thread tasks.
- output cadence remains stable when input frames arrive.
- telemetry separates input-frame arrival, upload count, upload skips, and upload cost.

### 6. Move Preview To A Consumer Path

Preview should consume the latest completed output image instead of asking the output GL context to present.

Options:

- CPU preview from latest system-memory output frame.
- a separate preview GL context fed asynchronously from completed frames.
- a low-priority render-thread blit only when output has measurable slack.

Recommended first step:

- use latest system-memory BGRA8 output for the window preview.

Deliverables:

- preview reads from latest completed/scheduled output frame copy.
- `TryPresentPreview()` no longer queues GL work on the output render thread.
- preview FPS throttling remains caller-side.

Acceptance:

- forcing preview cannot delay output rendering.
- minimizing/focusing the window does not affect output cadence.

### 7. Move Screenshot To Completed Frame Capture

Screenshot should capture from the latest completed output frame unless an explicit "exact render capture" mode is requested.

Deliverables:

- screenshot request reads the latest system-memory output frame.
- PNG write remains async.
- optional diagnostic exact-GL screenshot is disabled during live output or explicitly marked disruptive.
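
One way a screenshot (or preview) consumer could read the latest completed frame without touching GL is sketched below. `CompletedFrame` and `LatestCompletedFrame` are hypothetical names, and the sketch assumes the published frame is an immutable copy rather than a pooled slot that must be returned to the output path.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical completed output frame held in system memory.
struct CompletedFrame {
    std::uint64_t frameIndex = 0;
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> bgra;  // width * height * 4 bytes
};

class LatestCompletedFrame {
public:
    // Output side: publish an immutable completed frame. Only a pointer is
    // exchanged under the lock, so the critical section stays short.
    void Publish(std::shared_ptr<const CompletedFrame> frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = std::move(frame);
    }

    // Consumer side (screenshot/preview): take a reference to the latest
    // frame. The frame stays alive for the consumer even if a newer frame is
    // published while the PNG is still being written asynchronously.
    std::shared_ptr<const CompletedFrame> Snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return latest_;
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const CompletedFrame> latest_;
};
```

If completed frames live in a fixed slot pool instead, holding a `shared_ptr` would pin a slot, so the consumer would need to copy the pixels out before releasing it.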

Acceptance:

- screenshot request does not call `glReadPixels` on the output render context during steady-state playout.

### 8. Make Shader Commits Frame-Boundary Work

Prepared shader builds are CPU/background work; GL program commit is still GL work.

Deliverables:

- shader build queue produces `PreparedShaderBuild`.
- render thread sees latest pending prepared build at a frame boundary.
- commit is applied only between frames.
- expensive commits can temporarily enter a measured "render reconfigure" state.

Acceptance:

- shader commits do not interleave midway through output render.
- output timing telemetry records commit duration separately from normal render duration.

### 9. Split Output Scheduling From Rendering Completely

`VideoBackend` should become a playout/scheduling owner, not a render producer.

Target:

```text
RenderEngine
  -> produces completed frames at render cadence

VideoBackend
  -> schedules completed frames up to target DeckLink depth
  -> processes completions
  -> releases scheduled slots
```

Deliverables:

- `VideoBackend` owns `SystemOutputFramePool`, or a new `SystemFrameExchange` owns it between render/video.
- render thread publishes completed frames into the exchange.
- video output thread schedules from the exchange.
- no render calls exist in completion handling or scheduling paths.

Acceptance:

- DeckLink buffer depth changes cannot directly cause render-thread wakeups except through non-blocking availability signals.
- render cadence can be tested without DeckLink by using a fake frame sink.
- video scheduling can be tested without GL by using synthetic frames.

### 10. Preserve The Probe As The Reference Contract

The `DeckLinkRenderCadenceProbe` is now the control sample.

Deliverables:

- document which main-app components correspond to the probe components.
- add a small regression checklist:
  - render FPS near target
  - schedule FPS near target
  - DeckLink buffered frames stable
  - no late/drop frames
  - no PBO misses or readback stalls
  - focus/minimize does not change output cadence

Acceptance:

- after each migration step, compare the main app telemetry against the probe's known-good behavior.

## Suggested Order Of Work

1. Add ownership guards and classify render methods.
2. Move GL initialization/destruction fully onto the render thread.
3. Introduce a render-owned cadence loop behind a feature flag.
4. Add a frame-sink/exchange interface between render and video.
5. Move output production from `VideoBackend` to the render cadence loop.
6. Convert input upload to latest-frame mailbox semantics.
7. Move preview to completed-frame consumption.
8. Move screenshot to completed-frame capture.
9. Convert shader commits/resets to frame-boundary mailbox commands.
10. Remove old synchronous output render request path.

## Feature Flags During Migration

Use flags only to keep testing safe, not as long-term compatibility layers.

Suggested flags:

```text
VST_RENDER_CADENCE_OWNER=render_thread
VST_DISABLE_INPUT_CAPTURE=1
VST_PREVIEW_SOURCE=system_frame
VST_SCREENSHOT_SOURCE=system_frame
```

Remove each flag once the new behavior is proven and becomes the only supported path.

## Telemetry Needed

Add or preserve counters for:

- render tick jitter
- render tick overrun
- output render duration
- GL command mailbox depth by class
- frame-boundary command duration
- input upload duration and skips
- readback queue/consume duration
- completed system-memory frame depth
- scheduled DeckLink frame depth
- DeckLink actual buffered frames
- preview frames consumed
- screenshot requests served from system memory

The key metric is whether output render starts on time. Buffer depth alone is not enough; a full buffer can still contain stale or repeated frames.
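
A minimal sketch of how render tick jitter and overrun might be derived from timestamps is shown below. `RenderTickStats` and its fields are hypothetical, and the definitions are assumptions for this example: jitter is the absolute difference between the scheduled and actual render start, and an overrun is a tick whose work finishes after the next tick's scheduled start.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>

using Clock = std::chrono::steady_clock;
using Micros = std::chrono::microseconds;

// Hypothetical per-thread counters; only the render thread writes them.
struct RenderTickStats {
    std::uint64_t ticks = 0;
    std::uint64_t overruns = 0;
    Micros maxStartJitter{0};

    // scheduledStart: when the cadence says this tick should begin.
    // actualStart / actualEnd: when output rendering really began and ended.
    // framePeriod: the selected cadence interval.
    void Record(Clock::time_point scheduledStart, Clock::time_point actualStart,
                Clock::time_point actualEnd, Micros framePeriod) {
        ++ticks;

        // Start jitter: how far the real start drifted from the schedule.
        auto jitter = std::chrono::duration_cast<Micros>(actualStart - scheduledStart);
        if (jitter < Micros::zero()) jitter = -jitter;
        maxStartJitter = std::max(maxStartJitter, jitter);

        // Overrun: the tick was still working when the next tick was due.
        if (actualEnd - scheduledStart > framePeriod) ++overruns;
    }
};
```

Because these counters are written by a single thread, they need no lock on the hot path; a reporting thread would read a periodically published copy.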

## Completion Definition

This work is complete when:

- the output render thread owns the app GL context from initialization through shutdown
- output rendering is driven by the render thread's selected frame cadence
- no non-output task can run ahead of a due output frame
- `VideoBackend` never asks the render thread to render synchronously
- DeckLink scheduling consumes already completed system-memory frames
- input upload, preview, screenshot, shader commits, and resets are all frame-boundary, mailbox, or consumer-side operations
- main-app telemetry approaches the cadence probe behavior under the same output mode