video-shader-toys/docs/PHASE_4_RENDER_THREAD_OWNERSHIP_DESIGN.md
Aiden 761df3b2d0
Phase 4 complete
2026-05-11 18:39:02 +10:00

Phase 4 Design: Render Thread Ownership

This document expands Phase 4 of ARCHITECTURE_RESILIENCE_REVIEW.md into a concrete design target.

Phase 1 named the subsystems. Phase 2 added the typed event substrate. Phase 3 made render-facing live state explicit through RuntimeLiveState, RenderStateComposer, RenderFrameInput, RenderFrameState, RenderFrameStateResolver, and RuntimeServiceLiveBridge. Phase 4 can now focus on the core timing-risk boundary: making one render thread the only owner of OpenGL work.

Status

  • Phase 4 design package: implemented.
  • Phase 4 implementation: complete for GL ownership. RenderEngine starts a dedicated render thread, owns the GL context during normal runtime work, and exposes queue/request entrypoints for input upload, output render, preview presentation, screenshot capture, shader rebuild application, and render-local resets.
  • Current alignment: normal runtime GL work is routed through the render thread after startup. Startup initialization still runs before the render thread starts while the app explicitly owns the context, and shutdown now stops DeckLink/backend work before destroying render-thread GL resources and deleting the context.

Current GL ownership footholds:

  • RenderEngine owns GL resources, a dedicated render thread, synchronous request/response for output frames, a small render command mailbox, named render-thread helper methods, and wrong-thread diagnostics for those helpers.
  • RenderFrameInput / RenderFrameState provide the frame-state contract that a render thread can consume.
  • RenderFrameStateResolver prepares the render-facing layer state before drawing.
  • OpenGLVideoIOBridge calls RenderEngine::QueueInputFrame(...) from the input path and RenderEngine::RequestOutputFrame(...) from the output path.
  • OpenGLComposite::paintGL(...), screenshot capture, input upload, and output rendering enter render work through explicit RenderEngine requests. After OpenGLComposite::Start() starts the render thread, those requests do not bind the GL context on the caller thread.

Why Phase 4 Exists

The resilience review identifies shared GL ownership as the main remaining timing and failure-isolation risk. In the pre-Phase 4 design, the shared context lock protects correctness, but it does not isolate timing:

  • input callbacks can attempt texture upload
  • output callbacks can trigger frame rendering and readback
  • preview paint can enter the same GL context
  • screenshot capture can enter the same GL context
  • the DeckLink completion path is still too close to render work

That means brief input, preview, readback, or callback stalls can still collide on the most timing-sensitive path.

Phase 4 should turn GL from a shared resource guarded by a lock into a resource owned by one thread with explicit queues and handoff points.

Goals

Phase 4 should establish:

  • one render thread as the sole long-lived owner of the GL context
  • non-render threads enqueue work instead of binding the GL context
  • input upload requests are accepted and executed by the render thread
  • output frame rendering is requested or scheduled through render-owned work
  • preview and screenshot requests become render-thread commands or consumers
  • RenderFrameInput / RenderFrameState become the stable data contract for frame production
  • GL context entrypoints are reduced to render-thread-only code paths
  • tests for queue semantics and request coalescing without requiring DeckLink hardware, plus explicit lifecycle ordering in code

Non-Goals

Phase 4 should not require:

  • the final producer/consumer playout queue for DeckLink
  • the final DeckLink lifecycle state machine
  • replacing the async readback policy
  • implementing background persistence
  • completing Phase 5's deeper live-state layering
  • replacing every UI or backend API at once

Those are later phases or follow-on work. Phase 4 is about making GL ownership deterministic first.

Current GL Entry Points

The current code paths that matter most are:

| Entry point | Current behavior | Phase 4 direction |
| --- | --- | --- |
| RenderEngine::QueueInputFrame(...) | copies the latest input frame into the render mailbox and returns without waiting for GL | render thread uploads latest input without callback-owned GL |
| RenderEngine::RequestOutputFrame(...) | synchronous output request; after render-thread startup it queues output render work and waits for render-thread completion with timeout/failure reporting | render thread executes output frame production |
| RenderEngine::TryPresentPreview(...) | best-effort request; callers queue preview presentation and return | render thread consumes latest completed frame for preview |
| RenderEngine::RequestScreenshotCapture(...) | queues screenshot capture and async disk write completion | screenshot capture is a render-thread command |
| OpenGLVideoIOBridge::UploadInputFrame(...) | copies the latest input frame into the render mailbox and returns without waiting for GL | render thread uploads the latest queued input frame |
| OpenGLVideoIOBridge::RenderScheduledFrame(...) | requests render-thread output production and reports success/failure to the backend | consume render-produced output without callback-owned GL |

Target Ownership Model

Render Thread

The render thread should own:

  • wglMakeCurrent(...) for the rendering context
  • all GL resource creation/destruction
  • input texture upload
  • pass execution
  • output pack conversion
  • async readback buffers and fences
  • preview presentation or preview frame publication
  • screenshot readback
  • temporal history and feedback resources

Other Threads

Other threads may:

  • enqueue input frames or replace the latest input frame
  • publish control/runtime/backend events
  • request shader build application
  • request render-local resets
  • request screenshots
  • consume ready output frames or receive completion notifications

Other threads should not:

  • call GL directly
  • bind or unbind the render context
  • wait on GL fences directly
  • mutate render-local resource state

Proposed Collaborators

RenderThread

Owns the OS thread, wakeup primitive, lifecycle, and render-loop execution.

Responsibilities:

  • start and stop the render thread
  • bind the GL context for the thread lifetime or render-loop lifetime
  • drain render commands
  • execute frame production work
  • publish lifecycle and failure observations

Non-responsibilities:

  • runtime mutation policy
  • DeckLink scheduling policy
  • durable persistence

RenderCommandQueue

Small bounded queue or command mailbox for render-thread work.

Current implementation:

  • RenderCommandQueue exists as a pure C++ mailbox helper.
  • Preview present and screenshot capture requests use latest-value coalescing.
  • Input upload requests use latest-value coalescing with owned frame bytes copied at enqueue time.
  • Output frame requests use FIFO semantics so scheduled output demand is not collapsed.
  • Render-local reset requests coalesce to the strongest pending reset scope.
  • Output frame requests use synchronous request/response through the render thread as the remaining transitional playout bridge.
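The three queueing policies above can be illustrated with a small mailbox. This is a minimal sketch, not the project's RenderCommandQueue: the class name CommandMailboxSketch, the ResetScope values and their ordering, and the OutputRequest payload are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical reset scopes, ordered weakest to strongest (assumed ordering).
enum class ResetScope : int { None = 0, TemporalHistory = 1, ShaderFeedback = 2, All = 3 };

// Hypothetical output request payload; the real one is not shown in this document.
struct OutputRequest { std::uint64_t frameIndex; };

// Everything that was pending when the render thread drained the mailbox.
struct DrainedCommands {
    bool previewRequested = false;
    ResetScope reset = ResetScope::None;
    std::deque<OutputRequest> outputs;
};

class CommandMailboxSketch {
public:
    // Latest-value: repeated preview requests collapse into one pending flag.
    void RequestPreview() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (pending_.previewRequested) ++coalescedPreviews_;
        pending_.previewRequested = true;
    }
    // Coalesce to the strongest pending reset scope.
    void RequestReset(ResetScope scope) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.reset = std::max(pending_.reset, scope);
    }
    // FIFO: every output request is kept, in arrival order.
    void RequestOutput(OutputRequest request) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.outputs.push_back(request);
    }
    // Render thread: take all pending work in one step and leave the mailbox empty.
    DrainedCommands Drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        DrainedCommands drained;
        std::swap(drained, pending_);
        return drained;
    }
    std::uint64_t CoalescedPreviews() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return coalescedPreviews_;
    }

private:
    mutable std::mutex mutex_;
    DrainedCommands pending_;
    std::uint64_t coalescedPreviews_ = 0;
};
```

Because the sketch has no GL dependency, the same shape can be exercised in a plain unit test: three preview requests drain as one, two output requests drain as two in order.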

Possible commands:

  • UploadInputFrame
  • RenderOutputFrame
  • PrepareFrameState
  • ApplyShaderBuild
  • ResetTemporalHistory
  • ResetShaderFeedback
  • PresentPreview
  • CaptureScreenshot
  • Stop

High-rate commands should be coalesced where appropriate. Input frames should likely be latest-value rather than unbounded FIFO.

RenderFrameCoordinator

Optional helper that combines Phase 3's frame contract with render-thread execution.

Responsibilities:

  • build or receive RenderFrameInput
  • call RuntimeServiceLiveBridge and RenderFrameStateResolver
  • hand RenderFrameState to RenderEngine

This can begin as a thin helper. The important part is that it keeps frame-state preparation explicit when renderEffect() stops being called directly from the callback path.

RenderOutputMailbox

Optional transitional bridge for output frames.

Responsibilities:

  • hold the latest completed output frame or a small bounded set
  • let backend code consume output without owning GL
  • report underrun/stale-frame reuse observations

This may be a late Phase 4 step or a Phase 7 playout-policy step. At minimum, Phase 4 should avoid designing the render thread in a way that prevents this mailbox from being added later.
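One possible shape for such a mailbox is sketched below, assuming a single latest-frame slot rather than a bounded set. OutputMailboxSketch, OutputFrame, and the two counters are hypothetical names chosen for illustration; nothing here is taken from the project's code.

```cpp
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical completed output frame: an index plus owned pixel bytes.
struct OutputFrame {
    std::uint64_t frameIndex;
    std::vector<std::uint8_t> pixels;
};

class OutputMailboxSketch {
public:
    // Render thread: publish the newest completed frame.
    void Publish(OutputFrame frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = std::move(frame);
        hasFrame_ = true;
        fresh_ = true;
    }
    // Backend thread: copy out the latest frame without touching GL.
    // Returns false on underrun (nothing has been published yet).
    bool Consume(OutputFrame& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!hasFrame_) { ++underruns_; return false; }
        if (!fresh_) ++staleReuses_;  // same frame handed out again
        fresh_ = false;
        out = latest_;
        return true;
    }
    std::uint64_t Underruns() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return underruns_;
    }
    std::uint64_t StaleReuses() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return staleReuses_;
    }

private:
    mutable std::mutex mutex_;
    OutputFrame latest_{0, {}};
    bool hasFrame_ = false;
    bool fresh_ = false;
    std::uint64_t underruns_ = 0;
    std::uint64_t staleReuses_ = 0;
};
```

The copy in Consume keeps the sketch simple; a real playout path would more likely hand over buffers from a small pool to avoid per-frame allocation.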

Threading Contract

Phase 4 should make thread ownership visible in APIs.

Candidate naming:

  • RenderEngine::StartRenderThread(...)
  • RenderEngine::StopRenderThread()
  • RenderEngine::EnqueueInputFrame(...)
  • RenderEngine::RequestOutputFrame(...)
  • RenderEngine::RequestPreviewPresent(...)
  • RenderEngine::RequestScreenshot(...)

Render-thread-only methods should be private or clearly named:

  • RenderEngine::UploadInputFrameOnRenderThread(...)
  • RenderEngine::RenderOutputFrameOnRenderThread(...)
  • RenderEngine::CaptureOutputFrameRgbaTopDownOnRenderThread(...)

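The public runtime entrypoints now use queue/request language, although the implemented names differ slightly from the candidates above: RenderEngine::QueueInputFrame(...), RenderEngine::TryPresentPreview(...), and RenderEngine::RequestScreenshotCapture(...). RequestOutputFrame(...) remains synchronous so the existing DeckLink callback path can keep producing an output frame while Phase 7's producer/consumer playout queue is still future work.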
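A wrong-thread diagnostic for the render-thread-only helpers can be as small as a stored thread id and a counter. The sketch below is an assumption about how such a check might look; RenderThreadGuardSketch and its method names are hypothetical, and the real diagnostics may log differently.

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

class RenderThreadGuardSketch {
public:
    // Called once on the render thread, before any other thread runs a check.
    void BindToCurrentThread() { owner_ = std::this_thread::get_id(); }

    // Returns true when called on the bound thread. Otherwise logs the
    // offending method name, counts the violation, and returns false.
    bool CheckRenderThread(const char* method) {
        if (std::this_thread::get_id() == owner_) return true;
        ++wrongThreadCalls_;
        std::fprintf(stderr, "wrong-thread call: %s\n", method);
        return false;
    }

    int WrongThreadCalls() const { return wrongThreadCalls_.load(); }

private:
    std::thread::id owner_;  // written once at bind time, read-only afterwards
    std::atomic<int> wrongThreadCalls_{0};
};
```

A helper such as UploadInputFrameOnRenderThread(...) would call CheckRenderThread at its top and return early on failure. Whether a violation should only log, or also assert, is left open by this document.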

Frame Production Shape

A target render-thread frame should look like:

  1. wake for input, output demand, preview demand, shader build, reset, screenshot, or stop
  2. drain bounded render commands
  3. coalesce to the latest input frame and latest control/live state
  4. build RenderFrameInput
  5. prepare RenderFrameState
  6. upload accepted input frame
  7. render layer stack
  8. pack output if needed
  9. stage readback or output buffer
  10. publish preview/screenshot/output completion as needed
  11. record timing and queue metrics

The exact cadence can remain demand-driven initially. The architectural win is that the demand wakes the render thread rather than borrowing GL from the caller.
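The demand-driven wakeup can be sketched with a condition variable and a bitmask of wake reasons. This is an illustration only: RenderLoopSketch, the WakeReason flags, and the produceFrame callback are hypothetical, and steps 3 to 11 are collapsed into that single callback.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical wake reasons, combined as bit flags.
enum WakeReason : std::uint32_t {
    kWakeInput = 1u << 0,
    kWakeOutput = 1u << 1,
    kWakePreview = 1u << 2,
    kWakeStop = 1u << 3,
};

class RenderLoopSketch {
public:
    explicit RenderLoopSketch(std::function<void(std::uint32_t)> produceFrame)
        : produceFrame_(std::move(produceFrame)), thread_([this] { Run(); }) {}

    ~RenderLoopSketch() {
        Wake(kWakeStop);
        thread_.join();
    }

    // Any thread: record why the render thread should wake, then notify it.
    void Wake(std::uint32_t reasons) {
        assert(reasons != 0);
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_ |= reasons;  // repeated reasons coalesce into one bit
        }
        cv_.notify_one();
    }

private:
    void Run() {
        for (;;) {
            std::uint32_t reasons = 0;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return pending_ != 0; });
                reasons = pending_;  // drain everything that accumulated
                pending_ = 0;
            }
            const std::uint32_t work = reasons & ~static_cast<std::uint32_t>(kWakeStop);
            if (work != 0) produceFrame_(work);  // frame production runs here
            if (reasons & kWakeStop) return;     // pending work is finished first
        }
    }

    std::function<void(std::uint32_t)> produceFrame_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::uint32_t pending_ = 0;
    std::thread thread_;  // declared last so the other members exist before Run()
};
```

Callers only set bits and notify; they never run frame production themselves, which is the property the paragraph above describes.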

Migration Plan

Step 1. Name Render-Thread-Only Methods

Split existing direct GL methods into public request methods and private render-thread methods while keeping behavior as close to unchanged as possible.

Initial target:

  • keep current synchronous behavior where callers need a result
  • move GL bodies into clearly render-thread-owned helpers for upload, output render, preview presentation, and screenshot readback
  • make future queue migration mechanical

Step 2. Add Render Command Queue

Introduce a small queue/mailbox for render commands.

Start with low-risk commands:

  • preview present request
  • screenshot request
  • render-local reset requests
  • input upload request
  • output render request

Adding the queue and its wakeup behavior is not enough on its own: callbacks keep borrowing the GL context until the dedicated render thread from Step 3 exists.

Step 3. Start A Dedicated Render Thread

Create the render thread and make it own context binding.

  • create a dedicated render thread owned by RenderEngine
  • bind the existing GL context on the render thread for normal runtime work
  • stop the render thread before GL context destruction
  • keep transitional synchronous request/response for output frames
  • remove normal runtime dependence on the shared GL CRITICAL_SECTION
  • add timeout/failure behavior for render-thread requests

Transitional behavior still allows synchronous request/response for output frames. Render-thread requests now fail fast if they cannot begin within the request timeout. If a task has already started and runs over budget, the overrun is logged and the caller keeps waiting until the task completes safely. The important change is that the caller waits for render-thread completion rather than taking the GL context itself.
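The timeout rule distinguishes a request that never began from one that is already running. One way to express that distinction, assuming a small per-request state flag, is sketched below. RenderRequestSketch, ExecuteRequest, and WaitForRequest are hypothetical names, not the project's API.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdio>
#include <functional>
#include <future>
#include <memory>

enum class RequestState : int { Pending, Started, Cancelled };

// Hypothetical request record shared between the caller and the render thread.
struct RenderRequestSketch {
    std::atomic<RequestState> state{RequestState::Pending};
    std::promise<bool> done;
    std::function<bool()> work;  // returns true when a frame was produced
};

// Render-thread side: run the work unless the caller already gave up.
inline void ExecuteRequest(const std::shared_ptr<RenderRequestSketch>& r) {
    assert(r && r->work);
    RequestState expected = RequestState::Pending;
    if (!r->state.compare_exchange_strong(expected, RequestState::Started)) {
        return;  // cancelled before it began; skip the work entirely
    }
    r->done.set_value(r->work());
}

// Caller side: call once per request.
inline bool WaitForRequest(const std::shared_ptr<RenderRequestSketch>& r,
                           std::chrono::milliseconds timeout) {
    std::future<bool> result = r->done.get_future();
    if (result.wait_for(timeout) == std::future_status::ready) return result.get();

    RequestState expected = RequestState::Pending;
    if (r->state.compare_exchange_strong(expected, RequestState::Cancelled)) {
        return false;  // never began within the timeout: fail fast
    }
    // Already started: abandoning it mid-task would be unsafe, so log and wait.
    std::fprintf(stderr, "render request over budget; waiting for completion\n");
    return result.get();
}
```

The compare-and-swap on the state flag means exactly one side wins: either the render thread starts the task, or the caller cancels it, never both.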

Step 4. Move Input Upload To The Render Thread

Change OpenGLVideoIOBridge::UploadInputFrame(...) so it enqueues or replaces the latest input frame.

Policy targets:

  • bounded memory
  • latest-frame wins under load
  • input upload skip count is observable through render command coalescing metrics
  • input callback never waits for GL

Current implementation: OpenGLVideoIOBridge::UploadInputFrame(...) calls RenderEngine::QueueInputFrame(...), which copies the input bytes into the latest-value render mailbox and schedules one bounded render-thread wakeup to upload the newest pending frame.
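The latest-value input policy can be sketched as a single owned buffer that is overwritten at enqueue time. LatestInputMailboxSketch and its method names are hypothetical; the return value of Queue stands in for the "schedule one wakeup" decision, on the assumption that a wakeup is only needed when no frame was already pending.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

class LatestInputMailboxSketch {
public:
    // Input callback: copy the bytes and return without waiting for GL.
    // Returns true when no frame was pending, i.e. a wakeup should be scheduled.
    bool Queue(const std::uint8_t* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        std::lock_guard<std::mutex> lock(mutex_);
        const bool needsWake = !pending_;
        if (pending_) ++replaced_;        // older frame loses under load
        bytes_.assign(data, data + size); // owned copy; caller's buffer is free to reuse
        pending_ = true;
        return needsWake;
    }

    // Render thread: take the newest pending frame, if any.
    bool TakeLatest(std::vector<std::uint8_t>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!pending_) return false;
        out.swap(bytes_);  // swap instead of copy so buffers are recycled
        pending_ = false;
        return true;
    }

    std::uint64_t Replaced() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return replaced_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::uint8_t> bytes_;
    bool pending_ = false;
    std::uint64_t replaced_ = 0;
};
```

Memory stays bounded at one pending frame, and the replaced count gives the observable skip metric listed in the policy targets.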

Step 5. Move Output Rendering To The Render Thread

Change OpenGLVideoIOBridge::RenderScheduledFrame(...) so it requests render-thread output production or consumes a completed render-thread output.

Transitional option:

  • synchronous request/response through the render thread

Better follow-up:

  • render ahead into a bounded output queue and let backend callbacks consume ready frames

Current implementation: OpenGLVideoIOBridge::RenderScheduledFrame(...) calls RenderEngine::RequestOutputFrame(...) and returns whether the render-thread request produced an output frame. VideoBackend skips scheduling that frame when render production fails or times out.

Step 6. Decouple Preview And Screenshot Requests

Preview should become best-effort:

  • request preview presentation from the render thread
  • skip/coalesce when render is busy or output deadline pressure is high
  • record preview skips through render command coalescing metrics

Screenshot should become:

  • queued render-thread capture request
  • async disk write remains outside render thread

Current implementation: OpenGLComposite::RequestScreenshot(...) builds the output path, queues RenderEngine::RequestScreenshotCapture(...), and the render thread captures pixels before handing them to the existing async PNG writer. Preview presentation is a latest-value best-effort render command that is queued behind output render work, even when requested from the render pipeline.

Step 7. Remove Shared GL Lock From Normal Paths

Once all GL entrypoints are render-thread-owned:

  • remove normal dependence on pMutex for render correctness
  • keep diagnostics that detect wrong-thread render-thread helper calls
  • leave only lifecycle context binding where needed

Current implementation: OpenGLComposite no longer owns or passes a shared CRITICAL_SECTION, and RenderEngine no longer has caller-thread GL fallback paths for preview, input upload, output render, or screenshot capture. Runtime callers must go through the render thread; pre-start direct GL fallback is limited to startup initialization while the app explicitly owns the context.

Shutdown Order

Current shutdown order is explicit in code:

  1. OpenGLComposite::Stop() stops runtime services so control/OSC work stops entering the runtime.
  2. VideoBackend::Stop() stops DeckLink streams/playout so input and output callbacks stop requesting render work.
  3. RenderEngine::StopRenderThread() destroys GL resources on the render thread, signals the render thread to stop, joins it, and unbinds the context on render-thread exit.
  4. WM_DESTROY deletes OpenGLComposite, unbinds the window context, and deletes the GL context.

This order is build-tested, and RenderCommandQueue behavior is covered by non-GL unit tests. It still benefits from a real-window/DeckLink shutdown smoke test, but the code path is explicit enough for Phase 4's design exit.
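Short of a real-window smoke test, the first three steps could be pinned by a non-GL unit test if the stop calls were injected through a seam. The sketch below assumes such a seam; ShutdownStepsSketch, RunShutdown, and RecordShutdownOrder are hypothetical and do not exist in the project. Step 4 (WM_DESTROY) is outside its scope.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical seam: each stop step is injected so a test can observe the order.
struct ShutdownStepsSketch {
    std::function<void()> stopRuntimeServices;  // step 1
    std::function<void()> stopVideoBackend;     // step 2
    std::function<void()> stopRenderThread;     // step 3
};

// Runs the steps in the order listed above.
inline void RunShutdown(const ShutdownStepsSketch& steps) {
    steps.stopRuntimeServices();
    steps.stopVideoBackend();
    steps.stopRenderThread();
}

// Test helper: replace each step with a recorder and return the observed order.
inline std::vector<std::string> RecordShutdownOrder() {
    std::vector<std::string> order;
    ShutdownStepsSketch steps;
    steps.stopRuntimeServices = [&order] { order.push_back("runtime-services"); };
    steps.stopVideoBackend = [&order] { order.push_back("video-backend"); };
    steps.stopRenderThread = [&order] { order.push_back("render-thread"); };
    RunShutdown(steps);
    return order;
}
```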

Testing Strategy

Phase 4 tests should avoid hardware where possible.

Recommended tests:

  • render command queue preserves FIFO for non-coalesced commands
  • latest-input mailbox drops older frames under load
  • shutdown path stops backend callbacks before stopping and joining the render thread
  • screenshot request receives one completion or failure
  • output render request reports failure if render thread is stopped
  • render reset commands coalesce where expected
  • wrong-thread render-only diagnostics are present on private render-thread helpers

Existing useful homes:

  • RuntimeEventTypeTests for new render/backend observations
  • RuntimeSubsystemTests for pure request/coalescing helpers
  • a future RenderThreadTests target if render-thread lifecycle is extracted behind a non-GL test seam

Manual verification will still be needed for:

  • real DeckLink input/output
  • preview interaction
  • screenshot capture
  • shader reload while rendering
  • real window/context shutdown

Telemetry Added During Phase 4

Phase 4 should add minimal metrics while moving ownership:

  • render command queue depth
  • input frames accepted, replaced, and dropped
  • render-thread wake reason counts
  • render-thread frame duration
  • output request latency
  • preview request skipped count
  • screenshot request success/failure count
  • wrong-thread GL call diagnostics if practical

Full operational reporting remains Phase 8, but these metrics make the threading migration debuggable.
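As an illustration of how small these metrics can be, the sketch below accumulates count, total, and worst-case duration for something like output request latency. LatencyStatsSketch and TimeInto are hypothetical names. The struct is not thread-safe on its own and assumes it is updated from one thread or under an existing lock.

```cpp
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical accumulator for one latency metric.
struct LatencyStatsSketch {
    std::uint64_t count = 0;
    std::chrono::nanoseconds total{0};
    std::chrono::nanoseconds worst{0};

    void Record(std::chrono::nanoseconds d) {
        ++count;
        total += d;
        if (d > worst) worst = d;
    }

    std::chrono::nanoseconds Mean() const {
        if (count == 0) return std::chrono::nanoseconds{0};
        return total / static_cast<std::chrono::nanoseconds::rep>(count);
    }
};

// Times one call with a monotonic clock and records the result.
template <typename Fn>
void TimeInto(LatencyStatsSketch& stats, Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto elapsed = std::chrono::steady_clock::now() - start;
    stats.Record(std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed));
}
```

Wrapping the caller-side wait of an output request in TimeInto would measure the latency the callback actually observes, including time spent queued.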

Risks

Deadlock Risk

Synchronous request/response shims can deadlock if the caller is already on the render thread or holds a lock the render thread needs. Phase 4 should keep request waits narrow and add render-thread detection early.
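The first deadlock case, a synchronous request issued from the render thread itself, can be avoided by running the work inline when the caller is already on that thread. The sketch below shows the idea with a generic worker. InlineOrQueueSketch is a hypothetical stand-in, not RenderEngine, and it does not address the second case of a caller holding a lock the render thread needs.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

class InlineOrQueueSketch {
public:
    InlineOrQueueSketch() : thread_([this] { Run(); }) {}

    ~InlineOrQueueSketch() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        thread_.join();
    }

    // Runs fn on the worker ("render") thread and waits for it to finish.
    void InvokeAndWait(std::function<void()> fn) {
        assert(fn);
        if (std::this_thread::get_id() == thread_.get_id()) {
            fn();  // already on the render thread: queueing and waiting would deadlock
            return;
        }
        std::packaged_task<void()> task(std::move(fn));
        std::future<void> done = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        cv_.notify_one();
        done.get();
    }

private:
    void Run() {
        for (;;) {
            std::packaged_task<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stop requested and nothing left to run
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            task();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::packaged_task<void()>> tasks_;
    bool stop_ = false;
    std::thread thread_;  // declared last so the other members exist before Run()
};
```

A nested call such as a render-thread task that itself calls InvokeAndWait completes immediately under this rule, where a naive queue-and-wait would hang forever.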

Latency Risk

Moving work through queues can hide latency. Queue depth and output request latency should be measured from the first migration step.

Lifetime Risk

Moving context ownership changes startup and shutdown order. The render thread must stop before GL resources or window/context handles are destroyed.

Callback Pressure Risk

If DeckLink callbacks wait too long for render-thread work, Phase 4 may improve GL ownership but still leave callback timing fragile. A synchronous bridge is acceptable as a transition, but the design should keep the path open for producer/consumer playout.

Preview Coupling Risk

Preview can remain a hidden budget consumer if it stays in the output frame path. Phase 4 should keep preview explicitly best-effort, even if physical decoupling continues later.

Phase 4 Exit Criteria

Phase 4 can be considered complete once the project can say:

  • one render thread owns the GL context during normal operation
  • input callbacks do not bind GL or wait on GL upload
  • output callbacks do not bind GL directly
  • preview and screenshot requests enter render through explicit render-thread requests
  • RenderFrameInput / RenderFrameState remain the frame-state contract
  • normal frame production no longer depends on a shared GL CRITICAL_SECTION
  • render-thread queue/mailbox behavior has non-GL tests
  • shutdown order is explicit and tested or manually verified

Open Questions

  • What exact producer/consumer output queue shape should replace the current synchronous output request in Phase 7?
  • Should preview present on the render thread, or should render publish a preview image/texture to a separate presenter?
  • Should wrong-thread GL access eventually escalate from debug diagnostics to structured telemetry or assertions?

Short Version

Phase 4 should make GL ownership boring and deterministic.

One render thread owns the context. Other threads submit work or consume results. Input upload, frame rendering, readback, preview, and screenshot capture all move behind render-thread entrypoints. Output production remains a synchronous request/response bridge for now, but the app no longer relies on callback and UI paths borrowing the GL context under one shared lock.