# Phase 4 Design: Render Thread Ownership

This document expands Phase 4 of [ARCHITECTURE_RESILIENCE_REVIEW.md](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/docs/ARCHITECTURE_RESILIENCE_REVIEW.md) into a concrete design target.

Phase 1 named the subsystems. Phase 2 added the typed event substrate. Phase 3 made render-facing live state explicit through `RuntimeLiveState`, `RenderStateComposer`, `RenderFrameInput`, `RenderFrameState`, `RenderFrameStateResolver`, and `RuntimeServiceLiveBridge`. Phase 4 can now focus on the core timing-risk boundary: making one render thread the only owner of OpenGL work.

## Status

- Phase 4 design package: proposed.
- Phase 4 implementation: Step 3 started. The existing synchronous `RenderEngine` entrypoints delegate their GL bodies to named `...OnRenderThread(...)` helpers, preview/screenshot/render-reset/input-upload/output-render requests pass through a small `RenderCommandQueue` compatibility mailbox, and `RenderEngine` now starts a dedicated render thread for normal runtime GL work.
- Current alignment: the repo has a named frame-state contract and cleaner render-state preparation. Normal runtime GL work is routed through the render thread after startup, while startup initialization still runs before the render thread is started.

Current GL ownership footholds:

- `RenderEngine` owns GL resources, a dedicated render thread, the current synchronous compatibility shims, a small render command mailbox, and named render-thread helper methods.
- `RenderFrameInput` / `RenderFrameState` provide the frame-state contract that a render thread can consume.
- `RenderFrameStateResolver` prepares the render-facing layer state before drawing.
- `OpenGLVideoIOBridge` still calls `RenderEngine::TryUploadInputFrame(...)` from the input path and `RenderEngine::RenderOutputFrame(...)` from the output path.
- `OpenGLComposite::paintGL(...)`, screenshot capture, input upload, and output rendering still call synchronous `RenderEngine` methods, but those methods now invoke render-thread work once `OpenGLComposite::Start()` has started the render thread.

## Why Phase 4 Exists

The resilience review identifies shared GL ownership as the main remaining timing and failure-isolation risk. Today the shared context lock protects correctness, but it does not isolate timing:

- input callbacks can attempt texture upload
- output callbacks can trigger frame rendering and readback
- preview paint can enter the same GL context
- screenshot capture can enter the same GL context
- the DeckLink completion path is still too close to render work

That means brief input, preview, readback, or callback stalls can still collide on the most timing-sensitive path.

Phase 4 should turn GL from a shared resource guarded by a lock into a resource owned by one thread with explicit queues and handoff points.

## Goals

Phase 4 should establish:

- one render thread as the sole long-lived owner of the GL context
- non-render threads enqueue work instead of binding the GL context
- input upload requests are accepted and executed by the render thread
- output frame rendering is requested or scheduled through render-owned work
- preview and screenshot requests become render-thread commands or consumers
- `RenderFrameInput` / `RenderFrameState` become the stable data contract for frame production
- GL context entrypoints are reduced to render-thread-only code paths
- tests for queue semantics, request coalescing, and lifecycle behavior without requiring DeckLink hardware

## Non-Goals

Phase 4 should not require:

- the final producer/consumer playout queue for DeckLink
- the final DeckLink lifecycle state machine
- replacing the async readback policy
- implementing background persistence
- completing Phase 5's deeper live-state layering
- replacing every UI or backend API at once

Those are later phases or follow-on work. Phase 4 is about making GL ownership deterministic first.

## Current GL Entry Points

The current code paths that matter most are:

| Entry point | Current behavior | Phase 4 direction |
| --- | --- | --- |
| `RenderEngine::TryUploadInputFrame(...)` | synchronous compatibility shim; after render-thread startup it queues input upload work and waits for render-thread completion | enqueue latest input frame; render thread uploads without callback-owned GL |
| `RenderEngine::RenderOutputFrame(...)` | synchronous compatibility shim; after render-thread startup it queues output render work and waits for render-thread completion | render thread executes output frame production |
| `RenderEngine::TryPresentPreview(...)` | synchronous compatibility shim; after render-thread startup it queues preview presentation and waits for render-thread completion | render thread or preview presenter consumes latest completed frame |
| `RenderEngine::CaptureOutputFrameRgbaTopDown(...)` | synchronous compatibility shim; after render-thread startup it queues screenshot readback and waits for render-thread completion | screenshot request becomes render-thread command |
| `OpenGLVideoIOBridge::UploadInputFrame(...)` | calls render upload directly | push input frame into render queue/mailbox |
| `OpenGLVideoIOBridge::RenderScheduledFrame(...)` | calls render output directly from backend path | request/consume render-produced output without callback-owned GL |

## Target Ownership Model

### Render Thread

The render thread should own:

- `wglMakeCurrent(...)` for the rendering context
- all GL resource creation/destruction
- input texture upload
- pass execution
- output pack conversion
- async readback buffers and fences
- preview presentation or preview frame publication
- screenshot readback
- temporal history and feedback resources

### Other Threads

Other threads may:

- enqueue input frames or replace the latest input frame
- publish control/runtime/backend events
- request shader build application
- request render-local resets
- request screenshots
- consume ready output frames or receive completion notifications

Other threads should not:

- call GL directly
- bind or unbind the render context
- wait on GL fences directly
- mutate render-local resource state

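One way to back the "should not" list with a cheap runtime check is to record which thread currently owns the context and verify it inside render-thread-only methods. The following is a minimal sketch in standard C++; `RenderThreadOwnership` and `AssertOnRenderThread` are hypothetical names used for illustration, not existing project types.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not part of the codebase): remembers which thread
// owns the GL context so render-thread-only code can check its caller.
class RenderThreadOwnership {
public:
    // Call on the render thread right after it binds the GL context.
    void BindToCurrentThread() { owner_.store(std::this_thread::get_id()); }

    // Call on the render thread just before it releases the context.
    void Release() { owner_.store(std::thread::id{}); }

    // True only when the calling thread is the recorded owner.
    bool IsCurrentThreadOwner() const {
        return owner_.load() == std::this_thread::get_id();
    }

private:
    // A default-constructed id never matches a running thread,
    // so it doubles as the "no owner" value.
    std::atomic<std::thread::id> owner_{std::thread::id{}};
};

// Debug-build guard for the top of a render-thread-only method.
inline void AssertOnRenderThread(const RenderThreadOwnership& ownership) {
    assert(ownership.IsCurrentThreadOwner() && "GL work called off the render thread");
}
```

A helper such as `UploadInputFrameOnRenderThread(...)` could start with `AssertOnRenderThread(ownership)`; a release build might increment a diagnostic counter instead of asserting.
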
## Proposed Collaborators

### `RenderThread`

Owns the OS thread, wakeup primitive, lifecycle, and render-loop execution.

Responsibilities:

- start and stop the render thread
- bind the GL context for the thread lifetime or render-loop lifetime
- drain render commands
- execute frame production work
- publish lifecycle and failure observations

Non-responsibilities:

- runtime mutation policy
- DeckLink scheduling policy
- durable persistence

### `RenderCommandQueue`

Small bounded queue or command mailbox for render-thread work.

Current implementation:

- `RenderCommandQueue` exists as a pure C++ mailbox helper.
- Preview present and screenshot capture requests use latest-value coalescing.
- Input upload requests use latest-value coalescing. During the compatibility phase the input frame memory is still drained immediately; a real render thread will need copied or otherwise owned frame storage.
- Output frame requests use FIFO semantics so scheduled output demand is not collapsed.
- Render-local reset requests coalesce to the strongest pending reset scope.
- The synchronous compatibility shims submit queued work to the render thread and wait for completion once the render thread is running.

Possible commands:

- `UploadInputFrame`
- `RenderOutputFrame`
- `PrepareFrameState`
- `ApplyShaderBuild`
- `ResetTemporalHistory`
- `ResetShaderFeedback`
- `PresentPreview`
- `CaptureScreenshot`
- `Stop`

High-rate commands should be coalesced where appropriate. Input frames should likely be latest-value rather than unbounded FIFO.

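The three coalescing policies described above (latest-value, bounded FIFO, strongest reset scope) can be illustrated with a small mailbox. This is a sketch under assumptions, not the actual `RenderCommandQueue`: the type names, the `ResetScope` ordering, and the capacity parameter are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>

// Assumed ordering for illustration: a larger value is a "stronger" reset.
enum class ResetScope { None = 0, ShaderFeedback = 1, FeedbackAndHistory = 2 };

struct OutputRequest { std::uint64_t frameIndex; };

// Everything the render thread picks up in one drain.
struct DrainedCommands {
    bool previewRequested = false;             // latest-value: many requests, one flag
    ResetScope reset = ResetScope::None;       // strongest pending scope wins
    std::deque<OutputRequest> outputRequests;  // FIFO: demand is never collapsed
};

class CommandMailboxSketch {
public:
    explicit CommandMailboxSketch(std::size_t outputCapacity)
        : outputCapacity_(outputCapacity) {
        assert(outputCapacity_ > 0);
    }

    void RequestPreview() {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.previewRequested = true;
    }

    void RequestReset(ResetScope scope) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.reset = std::max(pending_.reset, scope);
    }

    // Returns false when the bounded FIFO is full so the caller can observe it.
    bool RequestOutput(OutputRequest request) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (pending_.outputRequests.size() >= outputCapacity_) return false;
        pending_.outputRequests.push_back(request);
        return true;
    }

    // Render thread: take everything pending and leave the mailbox empty.
    DrainedCommands Drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::exchange(pending_, DrainedCommands{});
    }

private:
    std::mutex mutex_;
    std::size_t outputCapacity_;
    DrainedCommands pending_;
};
```

Keeping the policies in one pure C++ type like this means they can be unit-tested without a GL context or DeckLink hardware.
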
### `RenderFrameCoordinator`

Optional helper that combines Phase 3's frame contract with render-thread execution.

Responsibilities:

- build or receive `RenderFrameInput`
- call `RuntimeServiceLiveBridge` and `RenderFrameStateResolver`
- hand `RenderFrameState` to `RenderEngine`

This can begin as a thin helper. The important part is that it keeps frame-state preparation explicit when `renderEffect()` stops being called directly from the callback path.

### `RenderOutputMailbox`

Optional transitional bridge for output frames.

Responsibilities:

- hold the latest completed output frame or a small bounded set
- let backend code consume output without owning GL
- report underrun/stale-frame reuse observations

This may be a late Phase 4 step or a Phase 7 playout-policy step. At minimum, Phase 4 should avoid designing the render thread in a way that rules out adding this mailbox later.

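As a rough illustration of the responsibilities listed for `RenderOutputMailbox`, the sketch below holds a single latest frame and counts underruns and stale reuses. It assumes CPU-side pixel storage and a single slot; `OutputFrame`, `OutputMailboxSketch`, and the counter names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical CPU-side frame; the real output buffer type may differ.
struct OutputFrame {
    std::uint64_t frameIndex = 0;
    std::vector<std::uint8_t> pixels;
};

class OutputMailboxSketch {
public:
    // Render thread: publish a completed frame, replacing the previous one.
    void Publish(OutputFrame frame) {
        assert(!frame.pixels.empty());  // an empty frame is a bug in this sketch
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = std::move(frame);
        hasFrame_ = true;
        fresh_ = true;
    }

    // Backend thread: copy out the newest frame without touching GL.
    // Returns false (and counts an underrun) if nothing was ever published.
    bool Consume(OutputFrame& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!hasFrame_) {
            ++underruns_;
            return false;
        }
        if (!fresh_) ++staleReuses_;  // same frame handed out again
        fresh_ = false;
        out = latest_;
        return true;
    }

    std::uint64_t underruns() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return underruns_;
    }

    std::uint64_t staleReuses() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return staleReuses_;
    }

private:
    mutable std::mutex mutex_;
    OutputFrame latest_;
    bool hasFrame_ = false;
    bool fresh_ = false;
    std::uint64_t underruns_ = 0;
    std::uint64_t staleReuses_ = 0;
};
```

A bounded set of frames would replace the single slot with a small ring, but the consume-side contract (no GL, observable underruns) would stay the same.
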
## Threading Contract

Phase 4 should make thread ownership visible in APIs.

Candidate naming:

- `RenderEngine::StartRenderThread(...)`
- `RenderEngine::StopRenderThread()`
- `RenderEngine::EnqueueInputFrame(...)`
- `RenderEngine::RequestOutputFrame(...)`
- `RenderEngine::RequestPreviewPresent(...)`
- `RenderEngine::RequestScreenshot(...)`

Render-thread-only methods should be private or clearly named:

- `RenderEngine::UploadInputFrameOnRenderThread(...)`
- `RenderEngine::RenderOutputFrameOnRenderThread(...)`
- `RenderEngine::CaptureOutputFrameRgbaTopDownOnRenderThread(...)`

The current `TryUploadInputFrame`, `RenderOutputFrame`, `TryPresentPreview`, and `CaptureOutputFrameRgbaTopDown` methods can remain as compatibility shims during migration, but their implementations should move toward enqueue-and-wait or enqueue-and-return behavior instead of binding GL directly from the caller's thread.

## Frame Production Shape

A target render-thread frame should look like:

1. wake for input, output demand, preview demand, shader build, reset, screenshot, or stop
2. drain bounded render commands
3. coalesce to the latest input frame and latest control/live state
4. build `RenderFrameInput`
5. prepare `RenderFrameState`
6. upload accepted input frame
7. render layer stack
8. pack output if needed
9. stage readback or output buffer
10. publish preview/screenshot/output completion as needed
11. record timing and queue metrics

The exact cadence can remain demand-driven initially. The architectural win is that the demand wakes the render thread rather than borrowing GL from the caller.

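The demand-driven wake described above can be sketched with a condition variable and a bitmask of wake reasons. This is a simplified illustration: the GL steps are stubbed out as counters, only three wake reasons are modeled, and `RenderLoopSketch` and the `kWake...` constants are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical wake reasons; a real loop would also cover preview,
// shader build, reset, and screenshot demand.
enum WakeReason : unsigned {
    kWakeInput = 1u << 0,
    kWakeOutputDemand = 1u << 1,
    kWakeStop = 1u << 2,
};

class RenderLoopSketch {
public:
    ~RenderLoopSketch() { Stop(); }

    // Call once. The new thread is where the GL context would be bound.
    void Start() { thread_ = std::thread([this] { Run(); }); }

    // Requests a stop, then joins. Pending work is still drained first.
    void Stop() {
        if (!thread_.joinable()) return;
        Wake(kWakeStop);
        thread_.join();
    }

    // Any thread: record why the render thread should wake, then wake it.
    void Wake(unsigned reasons) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            pending_ |= reasons;
        }
        cv_.notify_one();
    }

    // Only read these after Stop(); they are written on the loop thread.
    int inputUploads() const { return inputUploads_; }
    int framesRendered() const { return framesRendered_; }

private:
    void Run() {
        for (;;) {
            unsigned reasons = 0;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return pending_ != 0; });
                reasons = std::exchange(pending_, 0u);  // drain all reasons at once
            }
            if (reasons & kWakeInput) ++inputUploads_;          // stands in for input upload
            if (reasons & kWakeOutputDemand) ++framesRendered_; // stands in for frame production
            if (reasons & kWakeStop) return;                    // stop after handling the rest
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    unsigned pending_ = 0;
    int inputUploads_ = 0;
    int framesRendered_ = 0;
    std::thread thread_;
};
```

Because reasons accumulate in a bitmask, several wakes that arrive before the loop runs collapse into one iteration, which matches the coalescing intent of steps 2 and 3.
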
## Migration Plan

### Step 1. Name Render-Thread-Only Methods

Split existing direct GL methods into public request methods and private render-thread methods with minimal behavior change.

Initial target:

- [x] keep current synchronous behavior where callers need a result
- [x] move GL bodies into clearly render-thread-owned helpers for upload, output render, preview presentation, and screenshot readback
- [x] make future queue migration mechanical

### Step 2. Add Render Command Queue

Introduce a small queue/mailbox for render commands.

Start with low-risk commands:

- [x] preview present request
- [x] screenshot request
- [x] render-local reset requests
- [x] input upload request
- [x] output render request

The queue and wakeup behavior still need the dedicated render thread before the callbacks stop borrowing the GL context.

### Step 3. Start A Dedicated Render Thread

Create the render thread and make it own context binding.

- [x] create a dedicated render thread owned by `RenderEngine`
- [x] bind the existing GL context on the render thread for normal runtime work
- [x] stop the render thread before GL context destruction
- [x] keep transitional synchronous request/response for output frames
- [x] remove normal runtime dependence on the shared GL `CRITICAL_SECTION`
- [x] add timeout/failure behavior for render-thread requests

Transitional behavior still allows synchronous request/response for output frames. Render-thread requests now fail fast if they cannot begin within the request timeout. If a task has already started and runs over budget, the request logs that and then waits for safe completion. The important change is that the caller waits for render-thread completion rather than taking the GL context itself.

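The "fail fast before start, wait for completion after start" behavior can be modeled as a small state machine shared by the caller and the render thread. The sketch below is one possible shape, not the project's implementation; `RenderRequestSketch`, its states, and `RequestResult` are hypothetical, and the over-budget logging is only marked with a comment.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

enum class RequestResult { Completed, TimedOutBeforeStart };

// One in-flight request. In real code this object would need shared
// ownership so it outlives both the caller and the render thread.
class RenderRequestSketch {
public:
    // Render thread: claim the request. Returns false if the caller gave up,
    // in which case the render thread must skip the work.
    bool TryBegin() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (state_ != State::Pending) return false;
        state_ = State::Running;
        cv_.notify_all();
        return true;
    }

    // Render thread: mark a claimed request as finished.
    void Finish() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(state_ == State::Running);
            state_ = State::Done;
        }
        cv_.notify_all();
    }

    // Caller thread: fail fast if the work has not started within
    // startTimeout; once it has started, wait until it is safely done.
    RequestResult Wait(std::chrono::milliseconds startTimeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        const bool started = cv_.wait_for(
            lock, startTimeout, [this] { return state_ != State::Pending; });
        if (!started) {
            state_ = State::Cancelled;  // the render thread will see this in TryBegin
            return RequestResult::TimedOutBeforeStart;
        }
        // An over-budget warning could be logged here before the final wait.
        cv_.wait(lock, [this] { return state_ == State::Done; });
        return RequestResult::Completed;
    }

private:
    enum class State { Pending, Running, Done, Cancelled };
    std::mutex mutex_;
    std::condition_variable cv_;
    State state_ = State::Pending;
};
```

Cancelling only while the request is still pending means the caller never abandons memory that a running render task may still be reading or writing.
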
### Step 4. Move Input Upload To The Render Thread

Change `OpenGLVideoIOBridge::UploadInputFrame(...)` so it enqueues or replaces the latest input frame.

Policy targets:

- bounded memory
- latest-frame wins under load
- input upload skip count is observable
- input callback never waits for GL

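A single-slot mailbox with owned storage is one way to meet these policy targets. The sketch below assumes the callback copies the frame bytes at submit time (the copy-versus-reference choice is still an open question in this design); `LatestInputMailboxSketch` and its counters are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

class LatestInputMailboxSketch {
public:
    // Input callback: copy the frame into owned storage and return.
    // It only takes a short mutex and never touches GL.
    void Submit(const std::uint8_t* data, std::size_t size) {
        assert(data != nullptr || size == 0);
        std::lock_guard<std::mutex> lock(mutex_);
        if (hasFrame_) ++replaced_;  // an older frame was never uploaded
        frame_.assign(data, data + size);
        hasFrame_ = true;
        ++accepted_;
    }

    // Render thread: take the newest frame, if one is waiting.
    // Swapping lets the two buffers recycle their allocations.
    bool TryTake(std::vector<std::uint8_t>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!hasFrame_) return false;
        out.swap(frame_);
        hasFrame_ = false;
        return true;
    }

    std::uint64_t accepted() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return accepted_;
    }

    std::uint64_t replaced() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return replaced_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::uint8_t> frame_;  // one slot keeps memory bounded
    bool hasFrame_ = false;
    std::uint64_t accepted_ = 0;
    std::uint64_t replaced_ = 0;
};
```

The `replaced` counter gives the observable skip count: every increment is an input frame that was overwritten before the render thread uploaded it.
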
### Step 5. Move Output Rendering To The Render Thread

Change `OpenGLVideoIOBridge::RenderScheduledFrame(...)` so it requests render-thread output production or consumes a completed render-thread output.

Transitional option:

- synchronous request/response through the render thread

Better follow-up:

- render ahead into a bounded output queue and let backend callbacks consume ready frames

### Step 6. Decouple Preview And Screenshot Requests

Preview should become best-effort:

- request preview presentation from the render thread
- skip when render is busy or output deadline pressure is high
- record preview skips

Screenshot should become:

- queued render-thread capture request
- async disk write remains outside render thread

### Step 7. Remove Shared GL Lock From Normal Paths

Once all GL entrypoints are render-thread-owned:

- remove normal dependence on `pMutex` for render correctness
- keep assertions or diagnostics that detect wrong-thread GL calls
- leave only lifecycle synchronization where needed

## Testing Strategy

Phase 4 tests should avoid hardware where possible.

Recommended tests:

- render command queue preserves FIFO for non-coalesced commands
- latest-input mailbox drops older frames under load
- stop command wakes and drains the render thread
- screenshot request receives exactly one completion or failure
- output render request reports timeout/failure if render thread is stopped
- render reset commands coalesce where expected
- render-thread-only methods are not publicly reachable from other threads

Existing useful homes:

- `RuntimeEventTypeTests` for new render/backend observations
- `RuntimeSubsystemTests` for pure request/coalescing helpers
- a new `RenderThreadTests` target for queue/mailbox/lifecycle helpers that do not require GL

Manual verification will still be needed for:

- real DeckLink input/output
- preview interaction
- screenshot capture
- shader reload while rendering

## Telemetry Added During Phase 4

Phase 4 should add minimal metrics while moving ownership:

- render command queue depth
- input frames accepted, replaced, and dropped
- render-thread wake reason counts
- render-thread frame duration
- output request latency
- preview request skipped count
- screenshot request success/failure count
- wrong-thread GL call diagnostics if practical

Full operational reporting remains Phase 8, but these metrics make the threading migration debuggable.

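A handful of lock-free counters is usually enough for this level of telemetry. The sketch below covers only a subset of the metrics listed and uses hypothetical names throughout; it assumes relaxed atomics are acceptable because the values are diagnostic rather than used for synchronization.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical metrics block; any thread may update it without a lock.
struct RenderThreadMetricsSketch {
    std::atomic<std::uint64_t> inputAccepted{0};
    std::atomic<std::uint64_t> inputReplaced{0};
    std::atomic<std::uint64_t> previewSkipped{0};
    std::atomic<std::uint64_t> wrongThreadGlCalls{0};
    std::atomic<std::uint64_t> maxQueueDepth{0};

    // Track the high-water mark of the render command queue.
    void ObserveQueueDepth(std::uint64_t depth) {
        std::uint64_t previous = maxQueueDepth.load(std::memory_order_relaxed);
        // Retry until our depth is stored or another thread stored a larger one.
        while (depth > previous &&
               !maxQueueDepth.compare_exchange_weak(
                   previous, depth, std::memory_order_relaxed)) {
        }
    }
};
```

A reporting path could snapshot these values periodically; durations such as frame time and output request latency would need a similar max or histogram helper.
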
## Risks

### Deadlock Risk

Synchronous request/response shims can deadlock if the caller is already on the render thread or holds a lock the render thread needs. Phase 4 should keep request waits narrow and add render-thread detection early.

### Latency Risk

Moving work through queues can hide latency. Queue depth and output request latency should be measured from the first migration step.

### Lifetime Risk

Moving context ownership changes startup and shutdown order. The render thread must stop before GL resources or window/context handles are destroyed.

### Callback Pressure Risk

If DeckLink callbacks wait too long for render-thread work, Phase 4 may improve GL ownership but still leave callback timing fragile. A synchronous bridge is acceptable as a transition, but the design should keep the path open for producer/consumer playout.

### Preview Coupling Risk

Preview can remain a hidden budget consumer if it stays in the output frame path. Phase 4 should keep preview explicitly best-effort, even if physical decoupling continues later.

## Phase 4 Exit Criteria

Phase 4 can be considered complete once the project can say:

- [ ] one render thread owns the GL context during normal operation
- [ ] input callbacks do not bind GL or wait on GL upload
- [ ] output callbacks do not bind GL directly
- [ ] preview and screenshot requests enter render through explicit render-thread requests
- [ ] `RenderFrameInput` / `RenderFrameState` remain the frame-state contract
- [ ] normal frame production no longer depends on a shared GL `CRITICAL_SECTION`
- [ ] render-thread queue/mailbox behavior has non-GL tests
- [ ] shutdown order is explicit and tested or manually verified

## Open Questions

- Should the first output migration be synchronous request/response, or should Phase 4 go directly to a small ready-frame queue?
- Should the render thread own `RuntimeServiceLiveBridge` calls, or should frame state be prepared just before enqueue?
- How much input frame memory should be copied at enqueue time versus referenced from backend-owned buffers?
- Should preview present on the render thread, or should render publish a preview image/texture to a separate presenter?
- What timeout should output callbacks use if the render thread cannot produce a frame in time?
- Should wrong-thread GL access be enforced with assertions, telemetry, or both?

## Short Version

Phase 4 should make GL ownership boring and deterministic.

One render thread owns the context. Other threads submit work or consume results. Input upload, frame rendering, readback, preview, and screenshot capture all move behind render-thread entrypoints. The first implementation can be transitional and partly synchronous, but after Phase 4 the app should no longer rely on callback and UI paths borrowing the GL context under one shared lock.