# Phase 4 Design: Render Thread Ownership

This document expands Phase 4 of [ARCHITECTURE_RESILIENCE_REVIEW.md](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/docs/ARCHITECTURE_RESILIENCE_REVIEW.md) into a concrete design target.

Phase 1 named the subsystems. Phase 2 added the typed event substrate. Phase 3 made render-facing live state explicit through `RuntimeLiveState`, `RenderStateComposer`, `RenderFrameInput`, `RenderFrameState`, `RenderFrameStateResolver`, and `RuntimeServiceLiveBridge`. Phase 4 can now focus on the core timing-risk boundary: making one render thread the only owner of OpenGL work.

## Status

- Phase 4 design package: proposed.
- Phase 4 implementation: Step 3 started. The existing synchronous `RenderEngine` entrypoints delegate their GL bodies to named `...OnRenderThread(...)` helpers. Preview, screenshot, render-reset, input-upload, and output-render requests pass through a small `RenderCommandQueue` compatibility mailbox. `RenderEngine` now starts a dedicated render thread for normal runtime GL work.
- Current alignment: the repo has a named frame-state contract and cleaner render-state preparation. Normal runtime GL work is routed through the render thread after startup, while startup initialization still runs before the render thread is started.

Current GL ownership footholds:

- `RenderEngine` owns GL resources, a dedicated render thread, the current synchronous compatibility shims, a small render command mailbox, and named render-thread helper methods.
- `RenderFrameInput` / `RenderFrameState` provide the frame-state contract that a render thread can consume.
- `RenderFrameStateResolver` prepares the render-facing layer state before drawing.
- `OpenGLVideoIOBridge` still calls `RenderEngine::TryUploadInputFrame(...)` from the input path and `RenderEngine::RenderOutputFrame(...)` from the output path.
- `OpenGLComposite::paintGL(...)`, screenshot capture, input upload, and output rendering still call synchronous `RenderEngine` methods, but those methods now invoke render-thread work once `OpenGLComposite::Start()` has started the render thread.

## Why Phase 4 Exists

The resilience review identifies shared GL ownership as the main remaining timing and failure-isolation risk. Today the shared context lock protects correctness, but it does not isolate timing:

- input callbacks can attempt texture upload
- output callbacks can trigger frame rendering and readback
- preview paint can enter the same GL context
- screenshot capture can enter the same GL context
- the DeckLink completion path is still too close to render work

That means brief input, preview, readback, or callback stalls can still collide on the most timing-sensitive path.

Phase 4 should turn GL from a shared resource guarded by a lock into a resource owned by one thread with explicit queues and handoff points.

## Goals

Phase 4 should establish:

- one render thread as the sole long-lived owner of the GL context
- non-render threads that enqueue work instead of binding the GL context
- input upload requests that are accepted and executed by the render thread
- output frame rendering that is requested or scheduled through render-owned work
- preview and screenshot requests as render-thread commands or consumers
- `RenderFrameInput` / `RenderFrameState` as the stable data contract for frame production
- GL context entrypoints reduced to render-thread-only code paths
- tests for queue semantics, request coalescing, and lifecycle behavior without requiring DeckLink hardware

## Non-Goals

Phase 4 should not require:

- the final producer/consumer playout queue for DeckLink
- the final DeckLink lifecycle state machine
- replacing the async readback policy
- implementing background persistence
- completing Phase 5's deeper live-state layering
- replacing every UI or backend API at once

Those are later phases or follow-on work. Phase 4 is about making GL ownership deterministic first.

## Current GL Entry Points

The current code paths that matter most are:

| Entry point | Current behavior | Phase 4 direction |
| --- | --- | --- |
| `RenderEngine::TryUploadInputFrame(...)` | synchronous compatibility shim; after render-thread startup it queues input upload work and waits for render-thread completion | enqueue latest input frame; render thread uploads without callback-owned GL |
| `RenderEngine::RenderOutputFrame(...)` | synchronous compatibility shim; after render-thread startup it queues output render work and waits for render-thread completion | render thread executes output frame production |
| `RenderEngine::TryPresentPreview(...)` | synchronous compatibility shim; after render-thread startup it queues preview presentation and waits for render-thread completion | render thread or preview presenter consumes latest completed frame |
| `RenderEngine::CaptureOutputFrameRgbaTopDown(...)` | synchronous compatibility shim; after render-thread startup it queues screenshot readback and waits for render-thread completion | screenshot request becomes render-thread command |
| `OpenGLVideoIOBridge::UploadInputFrame(...)` | calls render upload directly | push input frame into render queue/mailbox |
| `OpenGLVideoIOBridge::RenderScheduledFrame(...)` | calls render output directly from backend path | request/consume render-produced output without callback-owned GL |

## Target Ownership Model

### Render Thread

The render thread should own:

- `wglMakeCurrent(...)` for the rendering context
- all GL resource creation/destruction
- input texture upload
- pass execution
- output pack conversion
- async readback buffers and fences
- preview presentation or preview frame publication
- screenshot readback
- temporal history and feedback resources

### Other Threads

Other threads may:

- enqueue input frames or replace the latest input frame
- publish control/runtime/backend events
- request shader build application
- request render-local resets
- request screenshots
- consume ready output frames or receive completion notifications

Other threads should not:

- call GL directly
- bind or unbind the render context
- wait on GL fences directly
- mutate render-local resource state

## Proposed Collaborators

### `RenderThread`

Owns the OS thread, wakeup primitive, lifecycle, and render-loop execution.

Responsibilities:

- start and stop the render thread
- bind the GL context for the thread lifetime or render-loop lifetime
- drain render commands
- execute frame production work
- publish lifecycle and failure observations

Non-responsibilities:

- runtime mutation policy
- DeckLink scheduling policy
- durable persistence

### `RenderCommandQueue`

Small bounded queue or command mailbox for render-thread work.

Current implementation:

- `RenderCommandQueue` exists as a pure C++ mailbox helper.
- Preview present and screenshot capture requests use latest-value coalescing.
- Input upload requests use latest-value coalescing. During the compatibility phase the request is still drained immediately while the caller waits, so the caller's input frame memory stays valid; an enqueue-and-return upload path will need copied or otherwise owned frame storage.
- Output frame requests use FIFO semantics so scheduled output demand is not collapsed.
- Render-local reset requests coalesce to the strongest pending reset scope.
- The synchronous compatibility shims submit queued work to the render thread and wait for completion once the render thread is running.

Possible commands:

- `UploadInputFrame`
- `RenderOutputFrame`
- `PrepareFrameState`
- `ApplyShaderBuild`
- `ResetTemporalHistory`
- `ResetShaderFeedback`
- `PresentPreview`
- `CaptureScreenshot`
- `Stop`

High-rate commands should be coalesced where appropriate. Input frames should likely be latest-value rather than unbounded FIFO.
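
The coalescing rules above can be illustrated with a minimal sketch. `SketchCommandMailbox`, `PreviewRequest`, `OutputRequest`, and the `ResetScope` values are hypothetical names chosen for illustration; this is not the repository's actual `RenderCommandQueue` interface, and the ordering of reset scopes is an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>

// Hypothetical reset scopes, ordered weakest to strongest.
enum class ResetScope { None, ShaderFeedback, TemporalHistory, All };

struct PreviewRequest { int width; int height; };    // hypothetical payload
struct OutputRequest { std::uint64_t frameIndex; };  // hypothetical payload

class SketchCommandMailbox {
public:
    // Latest-value: a newer preview request replaces an undrained older one.
    void RequestPreview(PreviewRequest request) {
        std::lock_guard<std::mutex> lock(mutex_);
        preview_ = request;
    }

    // FIFO: every output request is kept, in arrival order.
    void RequestOutput(OutputRequest request) {
        std::lock_guard<std::mutex> lock(mutex_);
        output_.push_back(request);
    }

    // Strongest-wins: pending resets collapse to the widest requested scope.
    void RequestReset(ResetScope scope) {
        std::lock_guard<std::mutex> lock(mutex_);
        reset_ = std::max(reset_, scope);
    }

    std::optional<PreviewRequest> TakePreview() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::optional<PreviewRequest> taken = preview_;
        preview_.reset();
        return taken;
    }

    std::optional<OutputRequest> TakeOutput() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (output_.empty()) return std::nullopt;
        OutputRequest front = output_.front();
        output_.pop_front();
        return front;
    }

    ResetScope TakeReset() {
        std::lock_guard<std::mutex> lock(mutex_);
        ResetScope taken = reset_;
        reset_ = ResetScope::None;
        return taken;
    }

private:
    std::mutex mutex_;
    std::optional<PreviewRequest> preview_;
    std::deque<OutputRequest> output_;
    ResetScope reset_ = ResetScope::None;
};
```

The sketch leaves the output FIFO unbounded for brevity. A bounded version would need an explicit policy for what happens when the capacity is reached.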

### `RenderFrameCoordinator`

Optional helper that combines Phase 3's frame contract with render-thread execution.

Responsibilities:

- build or receive `RenderFrameInput`
- call `RuntimeServiceLiveBridge` and `RenderFrameStateResolver`
- hand `RenderFrameState` to `RenderEngine`

This can begin as a thin helper. The important part is that it keeps frame-state preparation explicit when `renderEffect()` stops being called directly from the callback path.

### `RenderOutputMailbox`

Optional transitional bridge for output frames.

Responsibilities:

- hold the latest completed output frame or a small bounded set
- let backend code consume output without owning GL
- report underrun/stale-frame reuse observations

This may be a late Phase 4 step or a Phase 7 playout-policy step. At a minimum, Phase 4 should avoid designing the render thread in a way that prevents adding this mailbox later.

## Threading Contract

Phase 4 should make thread ownership visible in APIs.

Candidate naming:

- `RenderEngine::StartRenderThread(...)`
- `RenderEngine::StopRenderThread()`
- `RenderEngine::EnqueueInputFrame(...)`
- `RenderEngine::RequestOutputFrame(...)`
- `RenderEngine::RequestPreviewPresent(...)`
- `RenderEngine::RequestScreenshot(...)`

Render-thread-only methods should be private or clearly named:

- `RenderEngine::UploadInputFrameOnRenderThread(...)`
- `RenderEngine::RenderOutputFrameOnRenderThread(...)`
- `RenderEngine::CaptureOutputFrameRgbaTopDownOnRenderThread(...)`

The current `TryUploadInputFrame`, `RenderOutputFrame`, `TryPresentPreview`, and `CaptureOutputFrameRgbaTopDown` methods can remain as compatibility shims during migration, but their implementations should move toward enqueue-and-wait or enqueue-and-return behavior instead of binding GL directly from the caller's thread.
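
As a rough illustration of the enqueue-and-wait shape, the sketch below routes a public entrypoint through a worker thread and blocks on a `std::future` for the result. `SketchRenderThread`, `SketchEngine`, the `Invoke` helper, and the `int frameId` parameter are all hypothetical; the real `RenderEngine` signatures are not shown in this document.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for the render thread: runs posted tasks in order.
class SketchRenderThread {
public:
    SketchRenderThread() : worker_([this] { Run(); }) {}
    ~SketchRenderThread() {
        Post(nullptr);  // an empty task is the stop signal in this sketch
        worker_.join();
    }

    void Post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push_back(std::move(task));
        }
        wake_.notify_one();
    }

    // Enqueue-and-wait: the caller blocks for the result but never runs the work itself.
    template <typename Fn>
    auto Invoke(Fn fn) -> decltype(fn()) {
        using Result = decltype(fn());
        auto task = std::make_shared<std::packaged_task<Result()>>(std::move(fn));
        std::future<Result> done = task->get_future();
        Post([task] { (*task)(); });
        return done.get();
    }

private:
    void Run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return !tasks_.empty(); });
                task = std::move(tasks_.front());
                tasks_.pop_front();
            }
            if (!task) return;  // stop signal
            task();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<std::function<void()>> tasks_;
    std::thread worker_;  // declared last so the queue exists before Run() starts
};

// Hypothetical shim: the public method delegates to an ...OnRenderThread body.
class SketchEngine {
public:
    bool TryUploadInputFrame(int frameId) {
        return renderThread_.Invoke(
            [this, frameId] { return UploadInputFrameOnRenderThread(frameId); });
    }
    std::thread::id LastUploadThread() const { return lastUploadThread_; }

private:
    bool UploadInputFrameOnRenderThread(int frameId) {
        lastUploadThread_ = std::this_thread::get_id();  // GL upload would happen here
        return frameId >= 0;
    }

    std::thread::id lastUploadThread_;
    SketchRenderThread renderThread_;  // destroyed first, so it joins before other members go away
};
```

An enqueue-and-return variant would drop the `done.get()` wait and report completion through an event or callback instead.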

## Frame Production Shape

A target render-thread frame should look like:

1. wake for input, output demand, preview demand, shader build, reset, screenshot, or stop
2. drain bounded render commands
3. coalesce to the latest input frame and latest control/live state
4. build `RenderFrameInput`
5. prepare `RenderFrameState`
6. upload accepted input frame
7. render layer stack
8. pack output if needed
9. stage readback or output buffer
10. publish preview/screenshot/output completion as needed
11. record timing and queue metrics

The exact cadence can remain demand-driven initially. The architectural win is that the demand wakes the render thread rather than borrowing GL from the caller.
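
A demand-driven loop of this shape might look like the following sketch. `SketchRenderLoop` and the `WakeReason` flags are hypothetical, and the frame-production steps are reduced to a counter so the example stays free of GL. Output demand is counted rather than collapsed, matching the FIFO treatment of output requests, and a stop request is honored only after already-queued output demand has been served.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical wake reasons, combined as bit flags (latest-value style).
enum WakeReason : std::uint32_t {
    kWakeInput = 1u << 0,
    kWakePreview = 1u << 1,
    kWakeStop = 1u << 2,
};

class SketchRenderLoop {
public:
    SketchRenderLoop() : thread_([this] { Run(); }) {}
    ~SketchRenderLoop() { Stop(); }

    void Wake(std::uint32_t reasons) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_ |= reasons;
        }
        wake_.notify_one();
    }

    // Output demand is counted, not collapsed, so no scheduled frame is lost.
    void RequestOutput() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++outputDemand_;
        }
        wake_.notify_one();
    }

    void Stop() {
        Wake(kWakeStop);
        if (thread_.joinable()) thread_.join();
    }

    std::uint64_t FramesProduced() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return frames_;
    }

private:
    void Run() {
        // A real loop would bind the GL context here, before the first iteration.
        for (;;) {
            std::uint32_t reasons = 0;
            std::uint32_t outputs = 0;
            {
                // Steps 1-3: wake, drain, and coalesce pending demand.
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return pending_ != 0 || outputDemand_ != 0; });
                reasons = pending_;
                pending_ = 0;
                outputs = outputDemand_;
                outputDemand_ = 0;
            }
            for (std::uint32_t i = 0; i < outputs; ++i) {
                // Steps 4-11 would run here: build frame input, resolve frame state,
                // upload input, render, pack, stage readback, publish, record metrics.
                std::lock_guard<std::mutex> lock(mutex_);
                ++frames_;
            }
            if (reasons & kWakeStop) return;
        }
    }

    mutable std::mutex mutex_;
    std::condition_variable wake_;
    std::uint32_t pending_ = 0;
    std::uint32_t outputDemand_ = 0;
    std::uint64_t frames_ = 0;
    std::thread thread_;  // declared last so the state above exists before Run() starts
};
```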

## Migration Plan

### Step 1. Name Render-Thread-Only Methods

Split existing direct GL methods into public request methods and private render-thread methods without changing behavior much.

Initial target:

- [x] keep current synchronous behavior where callers need a result
- [x] move GL bodies into clearly render-thread-owned helpers for upload, output render, preview presentation, and screenshot readback
- [x] make future queue migration mechanical

### Step 2. Add Render Command Queue

Introduce a small queue/mailbox for render commands.

Start with low-risk commands:

- [x] preview present request
- [x] screenshot request
- [x] render-local reset requests
- [x] input upload request
- [x] output render request

The queue alone does not stop callbacks from borrowing the GL context; that also requires the dedicated render thread and its wakeup behavior, which Step 3 introduces.

### Step 3. Start A Dedicated Render Thread

Create the render thread and make it own context binding.

- [x] create a dedicated render thread owned by `RenderEngine`
- [x] bind the existing GL context on the render thread for normal runtime work
- [x] stop the render thread before GL context destruction
- [x] keep transitional synchronous request/response for output frames
- [x] remove normal runtime dependence on the shared GL `CRITICAL_SECTION`
- [x] add timeout/failure behavior for render-thread requests

Transitional behavior still allows synchronous request/response for output frames. Render-thread requests now fail fast if they cannot begin within the request timeout. If a task has already started and runs over budget, the overrun is logged and the caller keeps waiting until the task completes safely. The important change is that the caller waits for render-thread completion rather than taking the GL context itself.
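
One way to get this fail-fast-before-start, wait-after-start behavior is a small per-request state machine shared between the caller and the render thread. The sketch below is an assumption about how such a handshake could be built; `SketchRequest`, `RequestState`, `RequestResult`, and both functions are hypothetical names, not the repository's implementation.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <future>
#include <memory>

enum class RequestState { Pending, Started, Cancelled };
enum class RequestResult { Completed, CompletedOverBudget, TimedOutBeforeStart };

// Shared between the caller and the render thread for one request.
struct SketchRequest {
    std::atomic<RequestState> state{RequestState::Pending};
    std::function<void()> work;
    std::promise<void> finished;
};

// Render-thread side: claim the request unless the caller already gave up.
inline void RunRequestOnRenderThread(const std::shared_ptr<SketchRequest>& request) {
    RequestState expected = RequestState::Pending;
    if (!request->state.compare_exchange_strong(expected, RequestState::Started)) {
        return;  // cancelled before start: the work is skipped entirely
    }
    request->work();
    request->finished.set_value();
}

// Caller side: wait up to `timeout`, then try to cancel a request that has not begun.
inline RequestResult WaitForRequest(const std::shared_ptr<SketchRequest>& request,
                                    std::future<void>& finished,
                                    std::chrono::milliseconds timeout) {
    if (finished.wait_for(timeout) == std::future_status::ready) {
        return RequestResult::Completed;
    }
    RequestState expected = RequestState::Pending;
    if (request->state.compare_exchange_strong(expected, RequestState::Cancelled)) {
        return RequestResult::TimedOutBeforeStart;  // safe: the work will never run
    }
    // The work already started and may reference caller-owned memory, so the
    // caller has to wait for it. A real implementation would log the overrun here.
    finished.wait();
    return RequestResult::CompletedOverBudget;
}
```

The compare-and-swap ensures that exactly one side wins the race between "caller gives up" and "render thread begins", which is what makes returning early safe.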

### Step 4. Move Input Upload To The Render Thread

Change `OpenGLVideoIOBridge::UploadInputFrame(...)` so it enqueues or replaces the latest input frame.

Policy targets:

- bounded memory
- latest-frame wins under load
- input upload skip count is observable
- input callback never waits for GL
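
These policy targets can be sketched as a single-slot mailbox that copies the frame at publish time. `LatestInputMailbox` and `OwnedInputFrame` are hypothetical names, and copying into a `std::vector` is only one possible ownership strategy; the document leaves the copy-versus-reference question open.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical owned frame: pixels are copied so the backend can reuse its buffer.
struct OwnedInputFrame {
    std::uint64_t sequence;
    std::vector<std::uint8_t> pixels;
};

class LatestInputMailbox {
public:
    // Called from the input callback. Copies the frame and returns without waiting for GL.
    void Publish(std::uint64_t sequence, const std::uint8_t* data, std::size_t size) {
        OwnedInputFrame frame{sequence, std::vector<std::uint8_t>(data, data + size)};
        std::lock_guard<std::mutex> lock(mutex_);
        if (slot_) ++replaced_;  // an undrained frame is being overwritten
        slot_ = std::move(frame);
    }

    // Called from the render thread. Returns the newest frame, if any, and empties the slot.
    std::optional<OwnedInputFrame> Take() {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::exchange(slot_, std::nullopt);
    }

    // Observable skip count: how many frames were replaced before upload.
    std::uint64_t ReplacedCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return replaced_;
    }

private:
    mutable std::mutex mutex_;
    std::optional<OwnedInputFrame> slot_;
    std::uint64_t replaced_ = 0;
};
```

Memory stays bounded at one pending frame. A production version would likely recycle pixel buffers instead of allocating on every publish.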

### Step 5. Move Output Rendering To The Render Thread

Change `OpenGLVideoIOBridge::RenderScheduledFrame(...)` so it requests render-thread output production or consumes a completed render-thread output.

Transitional option:

- synchronous request/response through the render thread

Better follow-up:

- render ahead into a bounded output queue and let backend callbacks consume ready frames
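
A bounded ready-frame queue for the follow-up option could take roughly this shape. `BoundedReadyQueue` and `ReadyFrame` are hypothetical, and repeating the last delivered frame on underrun is an assumed policy used here only to show where an underrun observation would be recorded.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>

struct ReadyFrame { std::uint64_t frameIndex; };  // hypothetical payload

class BoundedReadyQueue {
public:
    explicit BoundedReadyQueue(std::size_t capacity) : capacity_(capacity) {}

    // Render thread: returns false when full so the renderer stops rendering ahead.
    bool TryPush(ReadyFrame frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.size() >= capacity_) return false;
        frames_.push_back(frame);
        return true;
    }

    // Backend callback: never blocks. On underrun it counts the event and
    // returns the last delivered frame again (empty if nothing was ever delivered).
    std::optional<ReadyFrame> Pop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.empty()) {
            ++underruns_;
            return last_;
        }
        last_ = frames_.front();
        frames_.pop_front();
        return last_;
    }

    std::uint64_t Underruns() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return underruns_;
    }

private:
    mutable std::mutex mutex_;
    std::size_t capacity_;
    std::deque<ReadyFrame> frames_;
    std::optional<ReadyFrame> last_;
    std::uint64_t underruns_ = 0;
};
```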

### Step 6. Decouple Preview And Screenshot Requests

Preview should become best-effort:

- request preview presentation from the render thread
- skip when render is busy or output deadline pressure is high
- record preview skips

Screenshot should become:

- queued render-thread capture request
- async disk write remains outside render thread

### Step 7. Remove Shared GL Lock From Normal Paths

Once all GL entrypoints are render-thread-owned:

- remove normal dependence on the shared GL `CRITICAL_SECTION` (`pMutex`) for render correctness
- keep assertions or diagnostics that detect wrong-thread GL calls
- leave only lifecycle synchronization where needed
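
A wrong-thread diagnostic can be as small as a recorded owner thread id plus a violation counter. The sketch below uses a hypothetical `RenderThreadGuard`; whether a violation should assert, log, or only count is left open, and this version only counts.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

class RenderThreadGuard {
public:
    // Called once on the render thread, right after it binds the GL context.
    void BindToCurrentThread() { owner_.store(std::this_thread::get_id()); }

    bool IsOwnerThread() const { return owner_.load() == std::this_thread::get_id(); }

    // Returns true on the owner thread; otherwise records a violation for telemetry.
    bool Check() {
        if (IsOwnerThread()) return true;
        violations_.fetch_add(1, std::memory_order_relaxed);
        return false;
    }

    std::uint64_t Violations() const { return violations_.load(std::memory_order_relaxed); }

private:
    // A default-constructed id represents "no thread", so every check fails until bound.
    std::atomic<std::thread::id> owner_{std::thread::id{}};
    std::atomic<std::uint64_t> violations_{0};
};
```

Each `...OnRenderThread(...)` helper could call `Check()` on entry, which keeps the diagnostic cheap enough to leave enabled in release builds.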

## Testing Strategy

Phase 4 tests should avoid hardware where possible.

Recommended tests:

- render command queue preserves FIFO for non-coalesced commands
- latest-input mailbox drops older frames under load
- stop command wakes and drains the render thread
- screenshot request receives one completion or failure
- output render request reports timeout/failure if render thread is stopped
- render reset commands coalesce where expected
- render-thread-only methods are not reachable through the public API

Existing useful homes:

- `RuntimeEventTypeTests` for new render/backend observations
- `RuntimeSubsystemTests` for pure request/coalescing helpers
- a new `RenderThreadTests` target for queue/mailbox/lifecycle helpers that do not require GL

Manual verification will still be needed for:

- real DeckLink input/output
- preview interaction
- screenshot capture
- shader reload while rendering

## Telemetry Added During Phase 4

Phase 4 should add minimal metrics while moving ownership:

- render command queue depth
- input frames accepted, replaced, and dropped
- render-thread wake reason counts
- render-thread frame duration
- output request latency
- preview request skipped count
- screenshot request success/failure count
- wrong-thread GL call diagnostics if practical

Full operational reporting remains Phase 8, but these metrics make the threading migration debuggable.

## Risks

### Deadlock Risk

Synchronous request/response shims can deadlock if the caller is already on the render thread or holds a lock the render thread needs. Phase 4 should keep request waits narrow and add render-thread detection early.
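
Render-thread detection for the first case can be sketched as a dispatcher that runs work inline when the caller is already the render thread. `SketchDispatcher` and its `post` hook are hypothetical; `post` stands in for whatever hands a task to the render thread's queue.

```cpp
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical dispatcher: `post` hands a task to the render thread's queue.
struct SketchDispatcher {
    std::thread::id renderThreadId;
    std::function<void(std::function<void()>)> post;

    void InvokeAndWait(std::function<void()> work) const {
        if (std::this_thread::get_id() == renderThreadId) {
            // Already on the render thread: queueing and then waiting would
            // block the only thread able to run the task.
            work();
            return;
        }
        auto done = std::make_shared<std::promise<void>>();
        std::future<void> finished = done->get_future();
        post([work = std::move(work), done] {
            work();
            done->set_value();
        });
        finished.wait();
    }
};
```

This only covers re-entrant calls. A caller that waits while holding a lock the render thread needs can still deadlock, so request waits should happen outside such locks.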

### Latency Risk

Moving work through queues can hide latency. Queue depth and output request latency should be measured from the first migration step.

### Lifetime Risk

Moving context ownership changes startup and shutdown order. The render thread must stop before GL resources or window/context handles are destroyed.

### Callback Pressure Risk

If DeckLink callbacks wait too long for render-thread work, Phase 4 may improve GL ownership but still leave callback timing fragile. A synchronous bridge is acceptable as a transition, but the design should keep the path open for producer/consumer playout.

### Preview Coupling Risk

Preview can remain a hidden budget consumer if it stays in the output frame path. Phase 4 should keep preview explicitly best-effort, even if physical decoupling continues later.

## Phase 4 Exit Criteria

Phase 4 can be considered complete once the project can say:

- [ ] one render thread owns the GL context during normal operation
- [ ] input callbacks do not bind GL or wait on GL upload
- [ ] output callbacks do not bind GL directly
- [ ] preview and screenshot requests enter render through explicit render-thread requests
- [ ] `RenderFrameInput` / `RenderFrameState` remain the frame-state contract
- [ ] normal frame production no longer depends on a shared GL `CRITICAL_SECTION`
- [ ] render-thread queue/mailbox behavior has non-GL tests
- [ ] shutdown order is explicit and tested or manually verified

## Open Questions

- Should the first output migration be synchronous request/response, or should Phase 4 go directly to a small ready-frame queue?
- Should the render thread own `RuntimeServiceLiveBridge` calls, or should frame state be prepared just before enqueue?
- How much input frame memory should be copied at enqueue time versus referenced from backend-owned buffers?
- Should preview present on the render thread, or should render publish a preview image/texture to a separate presenter?
- What timeout should output callbacks use if the render thread cannot produce a frame in time?
- Should wrong-thread GL access be enforced with assertions, telemetry, or both?

## Short Version

Phase 4 should make GL ownership boring and deterministic.

One render thread owns the context. Other threads submit work or consume results. Input upload, frame rendering, readback, preview, and screenshot capture all move behind render-thread entrypoints. The first implementation can be transitional and partly synchronous, but after Phase 4 the app should no longer rely on callback and UI paths borrowing the GL context under one shared lock.