aiden/video-shader-toys

Aiden 761df3b2d0

Phase 4 complete

2026-05-11 18:39:02 +10:00

Phase 4 Design: Render Thread Ownership

This document expands Phase 4 of ARCHITECTURE_RESILIENCE_REVIEW.md into a concrete design target.

Phase 1 named the subsystems. Phase 2 added the typed event substrate. Phase 3 made render-facing live state explicit through RuntimeLiveState, RenderStateComposer, RenderFrameInput, RenderFrameState, RenderFrameStateResolver, and RuntimeServiceLiveBridge. Phase 4 can now focus on the core timing-risk boundary: making one render thread the only owner of OpenGL work.

Status

Phase 4 design package: implemented.
Phase 4 implementation: complete for GL ownership. RenderEngine starts a dedicated render thread, owns the GL context during normal runtime work, and exposes queue/request entrypoints for input upload, output render, preview presentation, screenshot capture, shader rebuild application, and render-local resets.
Current alignment: normal runtime GL work is routed through the render thread after startup. Startup initialization still runs before the render thread starts while the app explicitly owns the context, and shutdown now stops DeckLink/backend work before destroying render-thread GL resources and deleting the context.

Current GL ownership footholds:

RenderEngine owns GL resources, a dedicated render thread, synchronous request/response for output frames, a small render command mailbox, named render-thread helper methods, and wrong-thread diagnostics for those helpers.
RenderFrameInput / RenderFrameState provide the frame-state contract that a render thread can consume.
RenderFrameStateResolver prepares the render-facing layer state before drawing.
OpenGLVideoIOBridge calls RenderEngine::QueueInputFrame(...) from the input path and RenderEngine::RequestOutputFrame(...) from the output path.
OpenGLComposite::paintGL(...), screenshot capture, input upload, and output rendering enter render work through explicit RenderEngine requests. After OpenGLComposite::Start() starts the render thread, those requests do not bind the GL context on the caller thread.

Why Phase 4 Exists

The resilience review identified shared GL ownership as the main remaining timing and failure-isolation risk. Before Phase 4, the shared context lock protected correctness, but it did not isolate timing:

input callbacks could attempt texture upload
output callbacks could trigger frame rendering and readback
preview paint could enter the same GL context
screenshot capture could enter the same GL context
the DeckLink completion path was still too close to render work

That meant brief input, preview, readback, or callback stalls could collide on the most timing-sensitive path.

Phase 4 should turn GL from a shared resource guarded by a lock into a resource owned by one thread with explicit queues and handoff points.

Goals

Phase 4 should establish:

one render thread as the sole long-lived owner of the GL context
non-render threads enqueue work instead of binding the GL context
input upload requests are accepted and executed by the render thread
output frame rendering is requested or scheduled through render-owned work
preview and screenshot requests become render-thread commands or consumers
RenderFrameInput / RenderFrameState become the stable data contract for frame production
GL context entrypoints are reduced to render-thread-only code paths
tests for queue semantics and request coalescing without requiring DeckLink hardware, plus explicit lifecycle ordering in code

Non-Goals

Phase 4 should not require:

the final producer/consumer playout queue for DeckLink
the final DeckLink lifecycle state machine
replacing the async readback policy
implementing background persistence
completing Phase 5's deeper live-state layering
replacing every UI or backend API at once

Those are later phases or follow-on work. Phase 4 is about making GL ownership deterministic first.

Current GL Entry Points

The current code paths that matter most are:

| Entry point | Current behavior | Phase 4 direction |
| --- | --- | --- |
| `RenderEngine::QueueInputFrame(...)` | copies the latest input frame into the render mailbox and returns without waiting for GL | render thread uploads latest input without callback-owned GL |
| `RenderEngine::RequestOutputFrame(...)` | synchronous output request; after render-thread startup it queues output render work and waits for render-thread completion with timeout/failure reporting | render thread executes output frame production |
| `RenderEngine::TryPresentPreview(...)` | best-effort request; callers queue preview presentation and return | render thread consumes latest completed frame for preview |
| `RenderEngine::RequestScreenshotCapture(...)` | queues screenshot capture and async disk write completion | screenshot capture is a render-thread command |
| `OpenGLVideoIOBridge::UploadInputFrame(...)` | copies the latest input frame into the render mailbox and returns without waiting for GL | render thread uploads the latest queued input frame |
| `OpenGLVideoIOBridge::RenderScheduledFrame(...)` | requests render-thread output production and reports success/failure to the backend | consume render-produced output without callback-owned GL |

Target Ownership Model

Render Thread

The render thread should own:

wglMakeCurrent(...) for the rendering context
all GL resource creation/destruction
input texture upload
pass execution
output pack conversion
async readback buffers and fences
preview presentation or preview frame publication
screenshot readback
temporal history and feedback resources

Other Threads

Other threads may:

enqueue input frames or replace the latest input frame
publish control/runtime/backend events
request shader build application
request render-local resets
request screenshots
consume ready output frames or receive completion notifications

Other threads should not:

call GL directly
bind or unbind the render context
wait on GL fences directly
mutate render-local resource state

Proposed Collaborators

`RenderThread`

Owns the OS thread, wakeup primitive, lifecycle, and render-loop execution.

Responsibilities:

start and stop the render thread
bind the GL context for the thread lifetime or render-loop lifetime
drain render commands
execute frame production work
publish lifecycle and failure observations

Non-responsibilities:

runtime mutation policy
DeckLink scheduling policy
durable persistence

`RenderCommandQueue`

Small bounded queue or command mailbox for render-thread work.

Current implementation:

RenderCommandQueue exists as a pure C++ mailbox helper.
Preview present and screenshot capture requests use latest-value coalescing.
Input upload requests use latest-value coalescing with owned frame bytes copied at enqueue time.
Output frame requests use FIFO semantics so scheduled output demand is not collapsed.
Render-local reset requests coalesce to the strongest pending reset scope.
Output frame requests use synchronous request/response through the render thread as the remaining transitional playout bridge.

Possible commands:

UploadInputFrame
RenderOutputFrame
PrepareFrameState
ApplyShaderBuild
ResetTemporalHistory
ResetShaderFeedback
PresentPreview
CaptureScreenshot
Stop

High-rate commands should be coalesced where appropriate. Input frames should likely be latest-value rather than unbounded FIFO.
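
The coalescing rules listed above can be sketched as a small mutex-guarded mailbox. This is an illustration only, not the project's RenderCommandQueue: the class name `CommandMailboxSketch`, the `ResetScope` values, and their weakest-to-strongest ordering are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <mutex>

// Hypothetical reset scopes, ordered weakest to strongest for illustration only.
enum class ResetScope { None = 0, ShaderFeedback = 1, TemporalHistory = 2, All = 3 };

class CommandMailboxSketch {
public:
    // Latest-value: repeated preview requests collapse into one pending flag.
    void RequestPreviewPresent() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (previewPending_) ++coalescedPreviews_;
        previewPending_ = true;
    }

    // Resets coalesce to the strongest scope requested since the last drain.
    void RequestReset(ResetScope scope) {
        std::lock_guard<std::mutex> lock(mutex_);
        pendingReset_ = std::max(pendingReset_, scope);
    }

    // FIFO: every output request is kept so scheduled demand is not collapsed.
    void RequestOutputFrame(std::uint64_t requestId) {
        std::lock_guard<std::mutex> lock(mutex_);
        outputRequests_.push_back(requestId);
    }

    // Drain side, intended to be called by the render thread.
    bool TakePreviewPresent() {
        std::lock_guard<std::mutex> lock(mutex_);
        const bool pending = previewPending_;
        previewPending_ = false;
        return pending;
    }

    ResetScope TakeReset() {
        std::lock_guard<std::mutex> lock(mutex_);
        const ResetScope scope = pendingReset_;
        pendingReset_ = ResetScope::None;
        return scope;
    }

    // Returns false when no output request is queued.
    bool TakeOutputFrame(std::uint64_t& requestId) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (outputRequests_.empty()) return false;
        requestId = outputRequests_.front();
        outputRequests_.pop_front();
        return true;
    }

    std::uint64_t CoalescedPreviews() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return coalescedPreviews_;
    }

private:
    mutable std::mutex mutex_;
    bool previewPending_ = false;
    ResetScope pendingReset_ = ResetScope::None;
    std::deque<std::uint64_t> outputRequests_;
    std::uint64_t coalescedPreviews_ = 0;
};
```

Keeping the coalesced-request counter inside the mailbox is one way to make skipped work observable without touching GL, which fits the non-GL unit tests described later.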

`RenderFrameCoordinator`

Optional helper that combines Phase 3's frame contract with render-thread execution.

Responsibilities:

build or receive RenderFrameInput
call RuntimeServiceLiveBridge and RenderFrameStateResolver
hand RenderFrameState to RenderEngine

This can begin as a thin helper. The important part is that it keeps frame-state preparation explicit when renderEffect() stops being called directly from the callback path.

`RenderOutputMailbox`

Optional transitional bridge for output frames.

Responsibilities:

hold the latest completed output frame or a small bounded set
let backend code consume output without owning GL
report underrun/stale-frame reuse observations

This may be a late Phase 4 step or a Phase 7 playout-policy step. At minimum, Phase 4 should avoid designing the render thread in a way that prevents this mailbox from being added later.
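
One possible shape for such a mailbox, holding only the latest completed frame, is sketched below. The names `OutputFrameSketch` and `OutputMailboxSketch` are hypothetical, and the single-slot policy is an assumption; a small bounded set would follow the same pattern.

```cpp
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical CPU-side output frame; the real frame type is not shown here.
struct OutputFrameSketch {
    std::uint64_t frameIndex = 0;
    std::vector<std::uint8_t> pixels;
};

class OutputMailboxSketch {
public:
    // Render thread: publish a completed frame, replacing any older one.
    void Publish(OutputFrameSketch frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = std::move(frame);
        hasFrame_ = true;
        fresh_ = true;
    }

    // Backend thread: copy out the latest frame without touching GL.
    // Returns false (and counts an underrun) if nothing was ever published.
    bool Consume(OutputFrameSketch& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!hasFrame_) {
            ++underruns_;
            return false;
        }
        if (!fresh_) ++staleReuses_;  // same frame handed out again
        fresh_ = false;
        out = latest_;
        return true;
    }

    std::uint64_t Underruns() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return underruns_;
    }

    std::uint64_t StaleReuses() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return staleReuses_;
    }

private:
    mutable std::mutex mutex_;
    OutputFrameSketch latest_;
    bool hasFrame_ = false;
    bool fresh_ = false;
    std::uint64_t underruns_ = 0;
    std::uint64_t staleReuses_ = 0;
};
```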

Threading Contract

Phase 4 should make thread ownership visible in APIs.

Candidate naming:

RenderEngine::StartRenderThread(...)
RenderEngine::StopRenderThread()
RenderEngine::EnqueueInputFrame(...)
RenderEngine::RequestOutputFrame(...)
RenderEngine::RequestPreviewPresent(...)
RenderEngine::RequestScreenshot(...)

Render-thread-only methods should be private or clearly named:

RenderEngine::UploadInputFrameOnRenderThread(...)
RenderEngine::RenderOutputFrameOnRenderThread(...)
RenderEngine::CaptureOutputFrameRgbaTopDownOnRenderThread(...)

The public runtime entrypoints now use queue/request language: QueueInputFrame(...), RequestOutputFrame(...), TryPresentPreview(...), and RequestScreenshotCapture(...). RequestOutputFrame(...) remains synchronous so that the existing DeckLink callback path can keep producing output frames until Phase 7's producer/consumer playout queue exists.
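
A wrong-thread diagnostic for render-thread-only helpers could look roughly like the following. This is a sketch under assumptions: `RenderThreadGuardSketch` and `UploadInputFrameOnRenderThreadSketch` are hypothetical names, and logging to stderr stands in for whatever diagnostic channel the project actually uses.

```cpp
#include <cstdio>
#include <mutex>
#include <thread>

class RenderThreadGuardSketch {
public:
    // Called once by the render thread after it binds the GL context.
    void BindToCurrentThread() {
        std::lock_guard<std::mutex> lock(mutex_);
        owner_ = std::this_thread::get_id();
    }

    // Returns true on the owning thread; otherwise logs and counts a violation.
    bool CheckOnRenderThread(const char* where) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (std::this_thread::get_id() == owner_) return true;
        ++violations_;
        std::fprintf(stderr, "wrong-thread render call: %s\n", where);
        return false;
    }

    unsigned Violations() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return violations_;
    }

private:
    mutable std::mutex mutex_;
    std::thread::id owner_;  // default-constructed id matches no running thread
    unsigned violations_ = 0;
};

// Hypothetical render-thread-only helper guarded by the check.
bool UploadInputFrameOnRenderThreadSketch(RenderThreadGuardSketch& guard) {
    if (!guard.CheckOnRenderThread("UploadInputFrameOnRenderThread")) return false;
    // The GL texture upload would happen here.
    return true;
}
```

Refusing the work instead of asserting keeps release builds running while still counting the violation, which leaves room to escalate to assertions or structured telemetry later.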

Frame Production Shape

A target render-thread frame should look like:

1. wake for input, output demand, preview demand, shader build, reset, screenshot, or stop
2. drain bounded render commands
3. coalesce to the latest input frame and latest control/live state
4. build RenderFrameInput
5. prepare RenderFrameState
6. upload accepted input frame
7. render layer stack
8. pack output if needed
9. stage readback or output buffer
10. publish preview/screenshot/output completion as needed
11. record timing and queue metrics

The exact cadence can remain demand-driven initially. The architectural win is that the demand wakes the render thread rather than borrowing GL from the caller.
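
A demand-driven loop of this kind can be sketched with a condition variable and a bitmask of wake reasons. The names below (`WakeReason`, `RenderWakeSketch`, `RunRenderLoopSketch`) are hypothetical, the set of reasons is reduced to four for brevity, and the frame steps are recorded as strings instead of performing GL work.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical wake reasons; callers OR them together.
enum WakeReason : std::uint32_t {
    WakeInput = 1u << 0,
    WakeOutput = 1u << 1,
    WakePreview = 1u << 2,
    WakeStop = 1u << 3,
};

class RenderWakeSketch {
public:
    // Any thread: record demand and wake the render thread.
    void Signal(std::uint32_t reasons) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_ |= reasons;
        }
        cv_.notify_one();
    }

    // Render thread: block until there is demand, then take all of it at once.
    std::uint32_t WaitAndTake() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return pending_ != 0; });
        const std::uint32_t reasons = pending_;
        pending_ = 0;
        return reasons;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::uint32_t pending_ = 0;
};

// Runs until WakeStop is seen and returns the steps taken, in order.
// Pending work that arrived together with the stop is still handled first.
std::vector<std::string> RunRenderLoopSketch(RenderWakeSketch& wake) {
    std::vector<std::string> steps;
    for (;;) {
        const std::uint32_t reasons = wake.WaitAndTake();
        if (reasons & WakeInput) steps.push_back("upload latest input");
        if (reasons & WakeOutput) steps.push_back("render and stage output");
        if (reasons & WakePreview) steps.push_back("present preview");
        if (reasons & WakeStop) break;
    }
    return steps;
}
```

Because reasons accumulate in a bitmask, several signals that arrive while the thread is busy collapse into one loop pass, and preview is handled after output work within that pass.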

Migration Plan

Step 1. Name Render-Thread-Only Methods

Split existing direct GL methods into public request methods and private render-thread methods without changing behavior much.

Initial target:

keep current synchronous behavior where callers need a result
move GL bodies into clearly render-thread-owned helpers for upload, output render, preview presentation, and screenshot readback
make future queue migration mechanical

Step 2. Add Render Command Queue

Introduce a small queue/mailbox for render commands.

Start with low-risk commands:

preview present request
screenshot request
render-local reset requests
input upload request
output render request

A queue alone does not stop callbacks from borrowing the GL context; that also requires the dedicated render thread introduced in Step 3.

Step 3. Start A Dedicated Render Thread

Create the render thread and make it own context binding.

create a dedicated render thread owned by RenderEngine
bind the existing GL context on the render thread for normal runtime work
stop the render thread before GL context destruction
keep transitional synchronous request/response for output frames
remove normal runtime dependence on the shared GL CRITICAL_SECTION
add timeout/failure behavior for render-thread requests

Transitional behavior still allows synchronous request/response for output frames. Render-thread requests now fail fast if they cannot begin within the request timeout. If a task has already started when the timeout expires, the over-budget task is logged and the caller waits for it to complete safely. The important change is that the caller waits for render-thread completion rather than taking the GL context itself.
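
The fail-fast-before-start, wait-after-start policy can be sketched as a small request object shared between caller and render thread. `RenderRequestSketch` and `RequestOutcome` are hypothetical names, and the sketch omits the over-budget logging; real code would also need shared ownership (for example `std::shared_ptr`) so an abandoned request outlives the caller.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

enum class RequestOutcome { Succeeded, Failed, TimedOutBeforeStart };

class RenderRequestSketch {
public:
    // Render thread: claim the request. Returns false if the caller already
    // gave up, in which case the work should be skipped.
    bool TryBegin() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (state_ != State::Queued) return false;
        state_ = State::Running;
        cv_.notify_all();
        return true;
    }

    // Render thread: report the result of work that was begun.
    void Complete(bool ok) {
        std::lock_guard<std::mutex> lock(mutex_);
        ok_ = ok;
        state_ = State::Done;
        cv_.notify_all();
    }

    // Caller thread: fail fast if the work has not started within
    // startTimeout; once it has started, wait until it finishes.
    RequestOutcome Wait(std::chrono::milliseconds startTimeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        const bool started = cv_.wait_for(
            lock, startTimeout, [this] { return state_ != State::Queued; });
        if (!started) {
            state_ = State::Abandoned;
            return RequestOutcome::TimedOutBeforeStart;
        }
        cv_.wait(lock, [this] { return state_ == State::Done; });
        return ok_ ? RequestOutcome::Succeeded : RequestOutcome::Failed;
    }

private:
    enum class State { Queued, Running, Done, Abandoned };
    std::mutex mutex_;
    std::condition_variable cv_;
    State state_ = State::Queued;
    bool ok_ = false;
};
```

Marking the request as abandoned under the same lock that `TryBegin()` takes means a request is either cancelled before it starts or waited on to completion, never interrupted mid-render.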

Step 4. Move Input Upload To The Render Thread

Change OpenGLVideoIOBridge::UploadInputFrame(...) so it enqueues or replaces the latest input frame.

Policy targets:

bounded memory
latest-frame wins under load
input upload skip count is observable through render command coalescing metrics
input callback never waits for GL

Current implementation: OpenGLVideoIOBridge::UploadInputFrame(...) calls RenderEngine::QueueInputFrame(...), which copies the input bytes into the latest-value render mailbox and schedules one bounded render-thread wakeup to upload the newest pending frame.
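
The latest-value policy with bytes copied at enqueue time can be illustrated as follows. This is a simplified sketch, not the code behind RenderEngine::QueueInputFrame(...): `LatestInputMailboxSketch` is a hypothetical name and frame metadata such as dimensions and pixel format is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

class LatestInputMailboxSketch {
public:
    // Input callback: copy the bytes and return immediately. An older frame
    // that was never uploaded is overwritten and counted as replaced.
    void Queue(const std::uint8_t* data, std::size_t size) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (pending_) ++replaced_;
        bytes_.assign(data, data + size);
        pending_ = true;
    }

    // Render thread: take ownership of the newest pending frame, if any.
    bool Take(std::vector<std::uint8_t>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!pending_) return false;
        out.swap(bytes_);
        pending_ = false;
        return true;
    }

    std::uint64_t Replaced() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return replaced_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::uint8_t> bytes_;
    bool pending_ = false;
    std::uint64_t replaced_ = 0;
};
```

Memory stays bounded because at most one frame is pending, and swapping buffers in `Take()` lets the render thread hand its previous vector back for reuse instead of allocating per frame.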

Step 5. Move Output Rendering To The Render Thread

Change OpenGLVideoIOBridge::RenderScheduledFrame(...) so it requests render-thread output production or consumes a completed render-thread output.

Transitional option:

synchronous request/response through the render thread

Better follow-up:

render ahead into a bounded output queue and let backend callbacks consume ready frames

Current implementation: OpenGLVideoIOBridge::RenderScheduledFrame(...) calls RenderEngine::RequestOutputFrame(...) and returns whether the render-thread request produced an output frame. VideoBackend skips scheduling that frame when render production fails or times out.

Step 6. Decouple Preview And Screenshot Requests

Preview should become best-effort:

request preview presentation from the render thread
skip/coalesce when render is busy or output deadline pressure is high
record preview skips through render command coalescing metrics

Screenshot should become:

queued render-thread capture request
async disk write remains outside render thread

Current implementation: OpenGLComposite::RequestScreenshot(...) builds the output path, queues RenderEngine::RequestScreenshotCapture(...), and the render thread captures pixels before handing them to the existing async PNG writer. Preview presentation is a latest-value best-effort render command that is queued behind output render work, even when requested from the render pipeline.

Step 7. Remove Shared GL Lock From Normal Paths

Once all GL entrypoints are render-thread-owned:

remove normal dependence on pMutex for render correctness
keep diagnostics that detect wrong-thread render-thread helper calls
leave only lifecycle context binding where needed

Current implementation: OpenGLComposite no longer owns or passes a shared CRITICAL_SECTION, and RenderEngine no longer has caller-thread GL fallback paths for preview, input upload, output render, or screenshot capture. Runtime callers must go through the render thread; pre-start direct GL fallback is limited to startup initialization while the app explicitly owns the context.

Shutdown Order

Current shutdown order is explicit in code:

1. OpenGLComposite::Stop() stops runtime services so control/OSC work stops entering the runtime.
2. VideoBackend::Stop() stops DeckLink streams/playout so input and output callbacks stop requesting render work.
3. RenderEngine::StopRenderThread() destroys GL resources on the render thread, signals the render thread to stop, joins it, and unbinds the context on render-thread exit.
4. WM_DESTROY deletes OpenGLComposite, unbinds the window context, and deletes the GL context.

This order is build-tested, and RenderCommandQueue behavior is covered by non-GL unit tests. It still benefits from a real-window/DeckLink shutdown smoke test, but the code path is explicit enough for Phase 4's design exit.
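
The render-thread part of that ordering could be exercised behind a non-GL seam along these lines. `RenderThreadLifetimeSketch` is a hypothetical stand-in that records lifecycle steps as strings instead of calling GL or WGL, and it collapses the resource-destruction request and the stop signal into a single flag for brevity.

```cpp
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

class RenderThreadLifetimeSketch {
public:
    ~RenderThreadLifetimeSketch() { Stop(); }

    void Start() {
        thread_ = std::thread([this] { Run(); });
    }

    // Signals stop and joins. Returns only after render-thread teardown ran,
    // so the caller may delete the context afterwards.
    void Stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        if (thread_.joinable()) thread_.join();
    }

    // Safe to read once Stop() has returned (join orders the writes).
    const std::vector<std::string>& Log() const { return log_; }

private:
    void Run() {
        log_.push_back("bind context");
        {
            std::unique_lock<std::mutex> lock(mutex_);
            cv_.wait(lock, [this] { return stop_; });
        }
        // Teardown happens on the thread that owns the context.
        log_.push_back("destroy GL resources");
        log_.push_back("unbind context");
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    bool stop_ = false;
    std::thread thread_;
    std::vector<std::string> log_;
};
```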

Testing Strategy

Phase 4 tests should avoid hardware where possible.

Recommended tests:

render command queue preserves FIFO for non-coalesced commands
latest-input mailbox drops older frames under load
shutdown path stops backend callbacks before stopping and joining the render thread
screenshot request receives one completion or failure
output render request reports failure if render thread is stopped
render reset commands coalesce where expected
wrong-thread render-only diagnostics are present on private render-thread helpers

Existing useful homes:

RuntimeEventTypeTests for new render/backend observations
RuntimeSubsystemTests for pure request/coalescing helpers
a future RenderThreadTests target if render-thread lifecycle is extracted behind a non-GL test seam

Manual verification will still be needed for:

real DeckLink input/output
preview interaction
screenshot capture
shader reload while rendering
real window/context shutdown

Telemetry Added During Phase 4

Phase 4 should add minimal metrics while moving ownership:

render command queue depth
input frames accepted, replaced, and dropped
render-thread wake reason counts
render-thread frame duration
output request latency
preview request skipped count
screenshot request success/failure count
wrong-thread GL call diagnostics if practical

Full operational reporting remains Phase 8, but these metrics make the threading migration debuggable.
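
A few of these counters can be kept as lock-free atomics so that recording them never stalls the render thread. The struct below is a hypothetical sketch; the field names and the choice to track a running maximum for output latency are assumptions, not the project's telemetry schema.

```cpp
#include <atomic>
#include <cstdint>

struct RenderThreadMetricsSketch {
    std::atomic<std::uint64_t> inputFramesAccepted{0};
    std::atomic<std::uint64_t> inputFramesReplaced{0};
    std::atomic<std::uint64_t> previewRequestsSkipped{0};
    std::atomic<std::uint64_t> maxOutputLatencyMicros{0};

    // Raise the recorded maximum if this sample is larger. The loop retries
    // when another thread updates the value between the load and the exchange.
    void RecordOutputLatency(std::uint64_t micros) {
        std::uint64_t seen = maxOutputLatencyMicros.load(std::memory_order_relaxed);
        while (micros > seen &&
               !maxOutputLatencyMicros.compare_exchange_weak(
                   seen, micros, std::memory_order_relaxed)) {
            // compare_exchange_weak refreshed `seen`; the condition re-checks it.
        }
    }
};
```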

Risks

Deadlock Risk

Synchronous request/response shims can deadlock if the caller is already on the render thread or holds a lock the render thread needs. Phase 4 should keep request waits narrow and add render-thread detection early.

Latency Risk

Moving work through queues can hide latency. Queue depth and output request latency should be measured from the first migration step.

Lifetime Risk

Moving context ownership changes startup and shutdown order. The render thread must stop before GL resources or window/context handles are destroyed.

Callback Pressure Risk

If DeckLink callbacks wait too long for render-thread work, Phase 4 may improve GL ownership but still leave callback timing fragile. A synchronous bridge is acceptable as a transition, but the design should keep the path open for producer/consumer playout.

Preview Coupling Risk

Preview can remain a hidden budget consumer if it stays in the output frame path. Phase 4 should keep preview explicitly best-effort, even if physical decoupling continues later.

Phase 4 Exit Criteria

Phase 4 can be considered complete once the project can say:

one render thread owns the GL context during normal operation
input callbacks do not bind GL or wait on GL upload
output callbacks do not bind GL directly
preview and screenshot requests enter render through explicit render-thread requests
RenderFrameInput / RenderFrameState remain the frame-state contract
normal frame production no longer depends on a shared GL CRITICAL_SECTION
render-thread queue/mailbox behavior has non-GL tests
shutdown order is explicit and tested or manually verified

Open Questions

What exact producer/consumer output queue shape should replace the current synchronous output request in Phase 7?
Should preview present on the render thread, or should render publish a preview image/texture to a separate presenter?
Should wrong-thread GL access eventually escalate from debug diagnostics to structured telemetry or assertions?

Short Version

Phase 4 should make GL ownership boring and deterministic.

One render thread owns the context. Other threads submit work or consume results. Input upload, frame rendering, readback, preview, and screenshot capture all move behind render-thread entrypoints. Output production remains a synchronous request/response bridge for now, but the app no longer relies on callback and UI paths borrowing the GL context under one shared lock.
