# Render Thread Ownership Plan

This plan describes how to make the main compositor behave like the successful `DeckLinkRenderCadenceProbe`: one render cadence owner, one GL context owner, and no unrelated work that can interrupt output frame production.

The goal is not just "all GL calls happen on one thread". The current app already mostly does that at runtime. The real goal is:

- the output render thread owns its GL context for its whole lifetime
- output cadence is driven by the render thread, not by DeckLink completion timing
- non-output GL work cannot sit ahead of output frames
- callers cannot block the render thread while waiting for synchronous answers
- DeckLink scheduling consumes completed system-memory frames and never causes rendering

## Current Risk Points

The current main app still has several ways to interrupt output cadence.

### Shared GL Executor

`RenderEngine` owns the GL context during runtime, but it acts as a general task executor.

The same queue/path can run:

- output frame render
- input upload
- preview present
- screenshot capture
- render resets
- shader/program commits
- resource resize
- state clearing

That means output frames are not guaranteed to be the next GL work item at the selected frame time.

### Synchronous Output Render Request

`VideoBackend` drives output production from its output producer thread, then calls:

```text
VideoBackend
  -> OpenGLVideoIOBridge::RenderScheduledFrame
  -> RenderEngine::RequestOutputFrame
  -> TryInvokeOnRenderThread
```

That makes output production a request/response interaction. The producer waits for the render thread, and the render thread is still shared with other work.

### Input Upload Shares Output Context

DeckLink input capture currently flows into:

```text
VideoBackend::HandleInputFrame
  -> OpenGLVideoIOBridge::UploadInputFrame
  -> RenderEngine::QueueInputFrame
  -> render thread upload
```

Even with coalescing, input upload can consume render-thread time and GPU bandwidth directly before output rendering.

### Preview And Screenshot Share Output Context

Preview and screenshot are lower-priority features, but today they still execute on the render thread.

Preview is best-effort at the caller side, but once queued it can still occupy the same context. Screenshot capture can be more expensive because it performs readback and CPU-side image preparation.

### Startup Context Ownership Is Transitional

The Win32 startup path creates and binds the GL context before `RenderEngine::StartRenderThread()`.

That is acceptable as a transitional state, but the final model should make context ownership explicit:

- bootstrap thread creates the window/context
- bootstrap thread releases it
- render thread binds it
- only render thread initializes GL resources
- only render thread destroys GL resources

### Render Callback Re-enters App State

`OpenGLRenderPipeline::RenderFrame()` calls a callback into `OpenGLComposite::renderEffect()`.

That callback builds `RenderFrameInput`, resolves frame state, drains runtime live state, and then calls back into `RenderEngine` to draw the prepared frame.

This works, but it means the output render path still reaches up into app/runtime code at frame time.

## Target Runtime Shape

The main app should match this ownership model:

```text
runtime/control threads
  -> publish snapshots, live overlays, reset requests, shader-build results
  -> never call GL

render cadence thread
  -> sole owner of output GL context
  -> wakes at selected render cadence
  -> samples latest render input/state
  -> renders one frame
  -> queues async readback/copies completed readback into system-memory slot
  -> publishes completed frame to bounded FIFO output reserve

video output thread
  -> consumes completed system-memory frames
  -> schedules DeckLink frames to target buffer depth
  -> processes completion results
  -> never calls GL

optional input upload path
  -> writes latest input frame into CPU-side latest-frame buffer
  -> render thread imports/uploads at a controlled point in its frame

preview/screenshot path
  -> consumes already-rendered output/system-memory frame when possible
  -> never interrupts output render cadence
```

## Non-Negotiable Rules

- The render thread never waits for DeckLink.
- DeckLink callbacks never render.
- Runtime/control threads never directly execute GL.
- Preview and screenshot never execute ahead of output frames.
- Input upload is never a separate urgent GL task ahead of output render.
- Shader/resource commits are applied only at a frame boundary.
- Telemetry on the hot path must be lock-light or try-lock only.
- The render thread cadence does not speed up to refill buffers.
- If output work overruns, the render thread records the overrun and resumes the selected cadence policy.
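
The last two rules can be expressed as a fixed-grid deadline calculator. The following is a minimal sketch, assuming a `std::chrono::steady_clock` cadence with a constant frame period; `CadenceClock` and its members are hypothetical names, not existing app code. It shows one possible overrun policy: ticks whose deadlines have already passed are skipped and counted, rather than rendered back-to-back to refill buffers.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical cadence helper: deadlines sit on a fixed grid
// (start + n * period), so a late tick never shifts later deadlines.
class CadenceClock {
public:
    using Clock = std::chrono::steady_clock;

    CadenceClock(Clock::time_point start, Clock::duration period)
        : start_(start), period_(period) {}

    // Deadline the render thread should wait for before its next frame.
    Clock::time_point NextDeadline() const { return start_ + period_ * nextTick_; }

    // Call after finishing the current tick's frame at time `now`. Any tick
    // whose deadline is already in the past is skipped and counted as an
    // overrun; the loop never renders extra frames to catch up.
    void CompleteTick(Clock::time_point now) {
        ++nextTick_;
        while (NextDeadline() < now) {
            ++nextTick_;
            ++overrunTicks_;
        }
    }

    std::int64_t NextTickIndex() const { return nextTick_; }
    std::int64_t OverrunTicks() const { return overrunTicks_; }

private:
    Clock::time_point start_;
    Clock::duration period_;
    std::int64_t nextTick_ = 0;
    std::int64_t overrunTicks_ = 0;
};
```

A frame that finishes exactly on the next deadline keeps that tick, because only deadlines strictly in the past are skipped. Rendering a missed tick late instead of skipping it would be an equally valid policy; the point is that the loop never bursts.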

## Implementation Plan

### 1. Add Thread/Context Ownership Guards

Add explicit render-thread ownership checks around all GL entry points.

Deliverables:

- `RenderEngine` exposes `IsOnRenderThread()` for assertions/tests.
- GL-facing classes get debug-only owner checks where practical.
- wrong-thread GL access becomes a counted telemetry warning, not just `OutputDebugStringA`.
- tests verify that public request methods do not execute GL directly.

Acceptance:

- every `RenderEngine` public method is classified as either request-only, lifecycle-only, or render-thread-only.
- render-thread-only methods are private or guarded.
- no normal runtime caller can accidentally invoke GL work inline.
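
A minimal sketch of the ownership guard, assuming the render thread records its own `std::thread::id` right after binding the context. `RenderThreadOwner`, `CheckRenderThread()`, and the counter are hypothetical; the existing `RenderEngine` may structure this differently. Wrong-thread calls increment a relaxed atomic counter that telemetry can read, which covers the "counted telemetry warning" deliverable.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical guard: remembers which thread currently owns the GL context and
// counts wrong-thread calls so they show up in telemetry, not only in a log.
class RenderThreadOwner {
public:
    // Render thread, immediately after binding the context.
    void BindToCurrentThread() {
        owner_.store(std::this_thread::get_id(), std::memory_order_release);
    }

    // Render thread, immediately before releasing the context.
    void Unbind() { owner_.store(std::thread::id{}, std::memory_order_release); }

    bool IsOnRenderThread() const {
        return owner_.load(std::memory_order_acquire) == std::this_thread::get_id();
    }

    // Entry check for render-thread-only methods. Returns false and counts a
    // warning when called from another thread, so the caller can skip GL work.
    bool CheckRenderThread() {
        if (IsOnRenderThread()) return true;
        wrongThreadAccesses_.fetch_add(1, std::memory_order_relaxed);
        return false;
    }

    std::uint64_t WrongThreadAccesses() const {
        return wrongThreadAccesses_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<std::thread::id> owner_{std::thread::id{}};
    std::atomic<std::uint64_t> wrongThreadAccesses_{0};
};
```

A render-thread-only method would then begin with `if (!owner.CheckRenderThread()) return;`, so a misrouted call is counted and skipped instead of touching a context the caller does not own.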

### 2. Move GL Initialization Fully Onto The Render Thread

Start the render thread before compiling shaders and initializing GL resources.

Current startup does:

```text
InitOpenGLState()
  -> CompileDecodeShader
  -> CompileOutputPackShader
  -> InitializeResources
  -> CompileLayerPrograms
StartRenderThread()
```

Move toward:

```text
create context on Win32 thread
release context on Win32 thread
StartRenderThread()
render thread binds context
render thread initializes extensions, shaders, resources
```

Deliverables:

- a single `RenderEngine::StartAndInitialize(RenderInitializationConfig)` path.
- GL extension resolution happens on the render thread.
- shader/resource initialization is a render-thread startup phase.
- `RenderEngine` destructor only destroys resources on the render thread.

Acceptance:

- after `StartRenderThread()`, no non-render thread binds or uses the app GL context.
- shutdown order is deterministic: stop video output, stop render cadence, destroy GL resources, release context.
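
The startup and shutdown ordering above can be sketched with a host that runs every context step on the render thread and lets the caller wait only for the one-time initialization result. This assumes the platform calls (binding the Win32 GL context, resolving extensions, compiling shaders) are injected as callbacks; `RenderThreadHost` and `RenderLifecycleHooks` are hypothetical stand-ins for `RenderEngine::StartAndInitialize(RenderInitializationConfig)`.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical platform hooks; real ones would make the Win32 GL context
// current, resolve extensions, compile shaders, and create resources.
struct RenderLifecycleHooks {
    std::function<bool()> bindContext;
    std::function<bool()> initializeGl;
    std::function<void()> destroyGl;
    std::function<void()> releaseContext;
};

class RenderThreadHost {
public:
    // Starts the render thread and blocks only for the one-time startup result.
    bool StartAndInitialize(RenderLifecycleHooks hooks) {
        hooks_ = std::move(hooks);
        std::promise<bool> ready;
        std::future<bool> result = ready.get_future();
        thread_ = std::thread([this, p = std::move(ready)]() mutable { Run(p); });
        return result.get();
    }

    // Deterministic shutdown: GL resources are destroyed and the context is
    // released on the render thread before join() returns.
    void Stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopRequested_ = true;
        }
        wake_.notify_all();
        if (thread_.joinable()) thread_.join();
    }

    ~RenderThreadHost() { Stop(); }

private:
    void Run(std::promise<bool>& ready) {
        const bool bound = hooks_.bindContext();
        const bool ok = bound && hooks_.initializeGl();
        ready.set_value(ok);
        if (ok) {
            // Placeholder for the frame loop: idle until shutdown is requested.
            std::unique_lock<std::mutex> lock(mutex_);
            wake_.wait(lock, [this] { return stopRequested_; });
        }
        if (bound) {
            hooks_.destroyGl();
            hooks_.releaseContext();
        }
    }

    RenderLifecycleHooks hooks_;
    std::thread thread_;
    std::mutex mutex_;
    std::condition_variable wake_;
    bool stopRequested_ = false;
};
```

The `std::future` wait here is a startup handshake only; it is not part of the output cadence path.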

### 3. Replace Synchronous Output Render Requests With Render-Owned Cadence

Move output cadence out of `VideoBackend` and into the render system.

Current:

```text
VideoBackend output producer
  -> cadence tick
  -> acquire output slot
  -> synchronous render-thread request
```

Target:

```text
RenderEngine output cadence loop
  -> cadence tick
  -> acquire/free output slot through a non-blocking frame-sink interface
  -> render frame
  -> publish completed frame
```

Deliverables:

- introduce `RenderedFrameSink` or similar interface owned by video output.
- render thread pulls/claims a free system-memory slot without waiting.
- if no free slot exists, render thread drops/recycles the oldest unscheduled completed frame or records backpressure without blocking.
- remove `RenderEngine::RequestOutputFrame()` from the steady-state output path.

Acceptance:

- output rendering continues even if DeckLink completion is delayed.
- no `std::future` wait exists in the output cadence path.
- `VideoBackend` no longer owns the producer render loop; it owns scheduling/completion only.
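
One way to give the render thread a non-blocking claim on system-memory slots is a small slot exchange with explicit states. The following is a minimal sketch, assuming a fixed slot count and a mutex held only for a few state updates (a lock-free ring could replace it later); `SystemFrameExchange` and its methods are hypothetical. When no slot is free it recycles the oldest completed-but-unscheduled frame, and it records backpressure only when every slot is rendering or scheduled.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <vector>

class SystemFrameExchange {
public:
    enum class SlotState { Free, Rendering, Completed, Scheduled };

    explicit SystemFrameExchange(std::size_t slotCount)
        : states_(slotCount, SlotState::Free), sequence_(slotCount, 0) {}

    // Render thread: claim a slot to render into; never waits on video output.
    std::optional<std::size_t> TryAcquireForRender() {
        std::lock_guard<std::mutex> lock(mutex_);
        for (std::size_t i = 0; i < states_.size(); ++i) {
            if (states_[i] == SlotState::Free) return Claim(i);
        }
        // No free slot: recycle the oldest completed-but-unscheduled frame.
        if (std::optional<std::size_t> oldest = OldestCompleted()) {
            ++droppedFrames_;
            return Claim(*oldest);
        }
        ++backpressureEvents_;  // every slot is rendering or scheduled
        return std::nullopt;
    }

    // Render thread: the frame in `slot` is finished and ready to schedule.
    void PublishCompleted(std::size_t slot) {
        std::lock_guard<std::mutex> lock(mutex_);
        states_[slot] = SlotState::Completed;
        sequence_[slot] = ++nextSequence_;
    }

    // Video output thread: take the oldest completed frame for scheduling.
    std::optional<std::size_t> TryTakeForSchedule() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::optional<std::size_t> oldest = OldestCompleted();
        if (oldest) states_[*oldest] = SlotState::Scheduled;
        return oldest;
    }

    // Video output thread: DeckLink reported completion for this frame.
    void ReleaseScheduled(std::size_t slot) {
        std::lock_guard<std::mutex> lock(mutex_);
        states_[slot] = SlotState::Free;
    }

    std::uint64_t DroppedFrames() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return droppedFrames_;
    }
    std::uint64_t BackpressureEvents() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return backpressureEvents_;
    }

private:
    std::size_t Claim(std::size_t slot) {
        states_[slot] = SlotState::Rendering;
        return slot;
    }

    std::optional<std::size_t> OldestCompleted() const {
        std::optional<std::size_t> oldest;
        for (std::size_t i = 0; i < states_.size(); ++i) {
            if (states_[i] == SlotState::Completed &&
                (!oldest || sequence_[i] < sequence_[*oldest])) {
                oldest = i;
            }
        }
        return oldest;
    }

    mutable std::mutex mutex_;
    std::vector<SlotState> states_;
    std::vector<std::uint64_t> sequence_;
    std::uint64_t nextSequence_ = 0;
    std::uint64_t droppedFrames_ = 0;
    std::uint64_t backpressureEvents_ = 0;
};
```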

### 4. Make The Render Thread A Frame Loop, Not A Task Queue

Keep a command mailbox, but process it only at safe frame-boundary points.

Frame loop:

```text
while running:
    wait until next render timestamp
    apply bounded frame-boundary commands
    sample latest frame input/state
    upload latest input frame if enabled and budget allows
    render output frame
    queue/consume readback
    publish completed frame
    record timings
```

Command classes:

- frame-boundary commands: reset temporal history, reset shader feedback, commit prepared shader programs
- background/low-priority commands: preview, screenshot, diagnostic readback
- non-GL commands: state publication, telemetry, persistence

Deliverables:

- replace FIFO render task queue with a priority/mailbox model.
- output cadence is the loop's main clock.
- commands have budget classes and max work per frame.
- long commands are deferred rather than blocking the current output tick.

Acceptance:

- preview/screenshot cannot run immediately before a due output frame.
- reset/shader work is applied between frames and measured.
- output render starts within a small jitter window when the GPU is not overrun.
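
The command classes and per-frame budgets can be sketched as a two-queue mailbox that the render thread drains only at defined points in its frame. This is a minimal sketch, assuming commands are plain `std::function<void()>` objects and that the frame loop measures slack before the next deadline; `RenderCommandMailbox` and the budget parameters are hypothetical. Non-GL commands are left out on the assumption that they are handled off the render thread.

```cpp
#include <chrono>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

class RenderCommandMailbox {
public:
    enum class CommandClass { FrameBoundary, Background };
    using Command = std::function<void()>;

    // Any thread: enqueue a command; nothing runs until the render thread drains it.
    void Post(CommandClass cls, Command command) {
        std::lock_guard<std::mutex> lock(mutex_);
        QueueFor(cls).push_back(std::move(command));
    }

    // Render thread, between frames: run at most `maxCommands` frame-boundary
    // commands (resets, shader commits). The rest wait for the next boundary.
    std::size_t RunFrameBoundary(std::size_t maxCommands) {
        std::size_t ran = 0;
        Command command;
        while (ran < maxCommands && TryPop(CommandClass::FrameBoundary, command)) {
            command();
            ++ran;
        }
        return ran;
    }

    // Render thread, after the output frame is published: run one background
    // command (preview, diagnostics) only if measured slack is large enough.
    bool RunBackgroundIfSlack(std::chrono::microseconds slack,
                              std::chrono::microseconds minimumSlack) {
        if (slack < minimumSlack) return false;
        Command command;
        if (!TryPop(CommandClass::Background, command)) return false;
        command();
        return true;
    }

    // Mailbox depth by class, for telemetry.
    std::size_t Depth(CommandClass cls) {
        std::lock_guard<std::mutex> lock(mutex_);
        return QueueFor(cls).size();
    }

private:
    std::deque<Command>& QueueFor(CommandClass cls) {
        return cls == CommandClass::FrameBoundary ? frameBoundary_ : background_;
    }

    // Pops under the lock; the command itself runs outside it.
    bool TryPop(CommandClass cls, Command& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        std::deque<Command>& queue = QueueFor(cls);
        if (queue.empty()) return false;
        out = std::move(queue.front());
        queue.pop_front();
        return true;
    }

    std::mutex mutex_;
    std::deque<Command> frameBoundary_;
    std::deque<Command> background_;
};
```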

### 5. Move Input Capture To A CPU Latest-Frame Buffer

Input capture should not enqueue independent GL upload tasks.

Target:

```text
DeckLink input callback
  -> copy/coalesce latest CPU input frame
  -> return quickly

render thread frame boundary
  -> if input version changed, upload latest frame
  -> render using last successfully uploaded input texture
```

Deliverables:

- introduce `InputFrameMailbox` with latest-frame semantics.
- remove `RenderEngine::QueueInputFrame()` from the callback path.
- render thread owns the upload moment.
- if upload would exceed budget, render thread can reuse the previous input texture and record an input-upload skip.

Acceptance:

- enabling input capture does not create arbitrary render-thread tasks.
- output cadence remains stable when input frames arrive.
- telemetry separates input-frame arrival, upload count, upload skips, and upload cost.
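
A latest-frame mailbox can be as small as one buffer plus a version counter. The following is a minimal sketch, assuming input frames arrive as contiguous byte buffers, a single consumer, and that the render thread keeps its own last-seen version; `InputFrameMailbox` matches the name proposed above, but its methods here are hypothetical. The callback only copies bytes and returns; the render thread decides when, or whether, to upload.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

class InputFrameMailbox {
public:
    // DeckLink input callback: copy the bytes and return. An unread older
    // frame is overwritten (coalesced), never queued.
    void Write(const std::uint8_t* data, std::size_t size) {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_.assign(data, data + size);
        ++version_;
    }

    // Render thread (single consumer), at its chosen upload point: if a newer
    // frame exists, swap it into `out`, update `lastSeenVersion`, return true.
    bool TakeIfNewer(std::uint64_t& lastSeenVersion, std::vector<std::uint8_t>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (version_ == lastSeenVersion) return false;
        out.swap(latest_);  // hand the frame over without a second copy
        lastSeenVersion = version_;
        return true;
    }

    // Total frames delivered by the callback (arrivals, not uploads).
    std::uint64_t Arrivals() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return version_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::uint8_t> latest_;
    std::uint64_t version_ = 0;
};
```

Because `TakeIfNewer` swaps buffers instead of copying, the two vectors ping-pong and the callback's `assign` reuses existing capacity in steady state. Upload counts and budget skips would be tracked on the render side, separately from `Arrivals()`.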

### 6. Move Preview To A Consumer Path

Preview should consume the latest completed output image instead of asking the output GL context to present.

Options:

- CPU preview from latest system-memory output frame.
- a separate preview GL context fed asynchronously from completed frames.
- a low-priority render-thread blit only when output has measurable slack.

Recommended first step:

- use latest system-memory BGRA8 output for the window preview.

Deliverables:

- preview reads from a copy of the latest completed/scheduled output frame.
- `TryPresentPreview()` no longer queues GL work on the output render thread.
- preview FPS throttling remains caller-side.

Acceptance:

- forcing preview cannot delay output rendering.
- minimizing/focusing the window does not affect output cadence.
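
For the recommended first step, the preview only needs a shared, immutable snapshot of a recent BGRA8 output frame. The following is a minimal sketch, assuming the video output thread makes the copy at the caller-side preview rate from a frame it already owns; `PreviewFrame`, `LatestPreviewFrame`, and `MakePreviewSnapshot` are hypothetical. The lock guards only a pointer swap, and a reader that still holds an older snapshot keeps it alive through `std::shared_ptr`.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical immutable BGRA8 snapshot for the window preview.
struct PreviewFrame {
    int width = 0;
    int height = 0;
    std::uint64_t frameIndex = 0;
    std::vector<std::uint8_t> bgra;  // width * height * 4 bytes
};

// Copies a completed system-memory output frame into a new snapshot.
inline std::shared_ptr<const PreviewFrame> MakePreviewSnapshot(
    const std::uint8_t* bgra, int width, int height, std::uint64_t frameIndex) {
    auto frame = std::make_shared<PreviewFrame>();
    frame->width = width;
    frame->height = height;
    frame->frameIndex = frameIndex;
    frame->bgra.assign(bgra, bgra + static_cast<std::size_t>(width) * height * 4);
    return frame;
}

class LatestPreviewFrame {
public:
    // Video output thread: publish a snapshot.
    void Publish(std::shared_ptr<const PreviewFrame> frame) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            latest_.swap(frame);
        }
        // `frame` now holds the previous snapshot and is released here,
        // outside the lock.
    }

    // UI thread: fetch the newest snapshot (may be null). No GL is involved.
    std::shared_ptr<const PreviewFrame> Latest() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return latest_;
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const PreviewFrame> latest_;
};
```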

### 7. Move Screenshot To Completed Frame Capture

Screenshot should capture from the latest completed output frame unless an explicit "exact render capture" mode is requested.

Deliverables:

- screenshot request reads the latest system-memory output frame.
- PNG write remains async.
- optional diagnostic exact-GL screenshot is disabled during live output or explicitly marked disruptive.

Acceptance:

- screenshot request does not call `glReadPixels` on the output render context during steady-state playout.
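
Serving a screenshot from system memory then reduces to taking the latest completed frame and handing it to an async writer. This is a minimal sketch, assuming the latest frame is already available as an immutable `std::shared_ptr` snapshot; `CapturedFrame`, `PngWriter`, and `RequestScreenshot` are hypothetical, and the PNG encoder is left as an injected callback rather than a specific library.

```cpp
#include <cstdint>
#include <functional>
#include <future>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical snapshot of a completed system-memory output frame.
struct CapturedFrame {
    int width = 0;
    int height = 0;
    std::vector<std::uint8_t> bgra;
};

// Placeholder for whatever PNG encoder the app uses.
using PngWriter = std::function<bool(const CapturedFrame&, const std::string& path)>;

// No GL call happens here; encoding and file I/O run on a separate async task.
inline std::future<bool> RequestScreenshot(std::shared_ptr<const CapturedFrame> latest,
                                           std::string path, PngWriter writer) {
    if (!latest) {
        // No completed frame yet: report failure instead of falling back to GL readback.
        std::promise<bool> none;
        none.set_value(false);
        return none.get_future();
    }
    return std::async(std::launch::async,
                      [frame = std::move(latest), path = std::move(path),
                       writer = std::move(writer)] { return writer(*frame, path); });
}
```

Returning `false` when no frame has completed yet, rather than calling `glReadPixels`, keeps the output render context untouched even before the first frame is published.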

### 8. Make Shader Commits Frame-Boundary Work

Prepared shader builds are CPU/background work; GL program commit is still GL work.

Deliverables:

- shader build queue produces `PreparedShaderBuild`.
- render thread sees latest pending prepared build at a frame boundary.
- commit is applied only between frames.
- expensive commits can temporarily enter a measured "render reconfigure" state.

Acceptance:

- shader commits do not interleave midway through output render.
- output timing telemetry records commit duration separately from normal render duration.
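
The frame-boundary commit can be sketched as a single latest-wins slot between the shader build thread and the render thread. This is a minimal sketch, assuming a stand-in `PreparedShaderBuild` with placeholder fields and that the actual GL work is passed in as a callback; `PendingShaderCommit` is hypothetical. Commit time is measured on its own so it can be reported apart from normal render duration.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <mutex>
#include <optional>
#include <string>
#include <utility>

// Stand-in for the app's prepared build; these fields are placeholders.
struct PreparedShaderBuild {
    std::uint64_t generation = 0;
    std::string programKey;
};

class PendingShaderCommit {
public:
    // Shader build thread: replace any older build that was never committed.
    void Offer(PreparedShaderBuild build) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (pending_) ++supersededBuilds_;
        pending_ = std::move(build);
    }

    // Render thread, between frames only: commit the newest build, if any, and
    // report the commit time separately from normal render duration.
    bool CommitAtFrameBoundary(const std::function<void(const PreparedShaderBuild&)>& commitGl,
                               std::chrono::nanoseconds& commitDuration) {
        std::optional<PreparedShaderBuild> build;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            build.swap(pending_);
        }
        if (!build) return false;
        const auto start = std::chrono::steady_clock::now();
        commitGl(*build);  // program link/bind work would run here, on the render thread
        commitDuration = std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now() - start);
        return true;
    }

    std::uint64_t SupersededBuilds() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return supersededBuilds_;
    }

private:
    mutable std::mutex mutex_;
    std::optional<PreparedShaderBuild> pending_;
    std::uint64_t supersededBuilds_ = 0;
};
```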

### 9. Split Output Scheduling From Rendering Completely

`VideoBackend` should become a playout/scheduling owner, not a render producer.

Target:

```text
RenderEngine
  -> produces completed frames at render cadence

VideoBackend
  -> schedules completed frames up to target DeckLink depth
  -> processes completions
  -> releases scheduled slots
```

Deliverables:

- `VideoBackend` owns `SystemOutputFramePool`, or a new `SystemFrameExchange` owns it between render/video.
- render thread publishes completed frames into the exchange.
- video output thread schedules from the exchange.
- no render calls exist in completion handling or scheduling paths.

Acceptance:

- DeckLink buffer depth changes cannot directly cause render-thread wakeups except through non-blocking availability signals.
- render cadence can be tested without DeckLink by using a fake frame sink.
- video scheduling can be tested without GL by using synthetic frames.
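
The scheduling side can be tested without GL along the lines the acceptance criteria describe. The following is a minimal sketch, assuming frames are identified by plain integers and that DeckLink scheduling is hidden behind a small interface; `CompletedFrameSource`, `PlayoutDevice`, `ScheduleToTargetDepth`, and the two test doubles are hypothetical and do not mirror the DeckLink SDK API. One pass tops the device up to the target depth from already-completed frames and stops when none are available.

```cpp
#include <cstddef>
#include <deque>
#include <optional>

// Hypothetical video-side view of the exchange: completed frames only.
struct CompletedFrameSource {
    virtual ~CompletedFrameSource() = default;
    virtual std::optional<int> TryTakeCompleted() = 0;  // frame id; never blocks
};

// Hypothetical stand-in for the DeckLink scheduling calls.
struct PlayoutDevice {
    virtual ~PlayoutDevice() = default;
    virtual std::size_t BufferedFrames() const = 0;
    virtual void Schedule(int frameId) = 0;
};

// One pass of the video output thread. Nothing here renders: if no completed
// frame is available, the pass simply stops short of the target depth.
inline std::size_t ScheduleToTargetDepth(CompletedFrameSource& source,
                                         PlayoutDevice& device,
                                         std::size_t targetDepth) {
    std::size_t scheduled = 0;
    while (device.BufferedFrames() < targetDepth) {
        std::optional<int> frame = source.TryTakeCompleted();
        if (!frame) break;
        device.Schedule(*frame);
        ++scheduled;
    }
    return scheduled;
}

// Test doubles: synthetic completed frames and a fake device queue.
struct SyntheticFrameSource : CompletedFrameSource {
    std::deque<int> frames;
    std::optional<int> TryTakeCompleted() override {
        if (frames.empty()) return std::nullopt;
        const int id = frames.front();
        frames.pop_front();
        return id;
    }
};

struct FakePlayoutDevice : PlayoutDevice {
    std::deque<int> queued;
    std::size_t BufferedFrames() const override { return queued.size(); }
    void Schedule(int frameId) override { queued.push_back(frameId); }
};
```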

### 10. Preserve The Probe As The Reference Contract

The `DeckLinkRenderCadenceProbe` is now the control sample.

Deliverables:

- document which main-app components correspond to the probe components.
- add a small regression checklist:
  - render FPS near target
  - schedule FPS near target
  - DeckLink buffered frames stable
  - no late/drop frames
  - no PBO misses or readback stalls
  - focus/minimize does not change output cadence

Acceptance:

- after each migration step, compare the main app telemetry against the probe's known-good behavior.

## Suggested Order Of Work

1. Add ownership guards and classify render methods.
2. Move GL initialization/destruction fully onto the render thread.
3. Introduce a render-owned cadence loop behind a feature flag.
4. Add a frame-sink/exchange interface between render and video.
5. Move output production from `VideoBackend` to the render cadence loop.
6. Convert input upload to latest-frame mailbox semantics.
7. Move preview to completed-frame consumption.
8. Move screenshot to completed-frame capture.
9. Convert shader commits/resets to frame-boundary mailbox commands.
10. Remove old synchronous output render request path.

## Feature Flags During Migration

Use flags only to keep testing safe, not as long-term compatibility layers.

Suggested flags:

```text
VST_RENDER_CADENCE_OWNER=render_thread
VST_DISABLE_INPUT_CAPTURE=1
VST_PREVIEW_SOURCE=system_frame
VST_SCREENSHOT_SOURCE=system_frame
```

Remove each flag once the new behavior is proven and becomes the only supported path.
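
Reading the flags once at startup keeps the checks off the hot path. The following is a minimal sketch, assuming each flag is enabled only by the exact value listed above and that anything else keeps the existing behavior; `RenderMigrationFlags` and `ReadRenderMigrationFlags` are hypothetical. The environment lookup is injectable so the parsing can be tested without modifying the process environment.

```cpp
#include <cstdlib>
#include <functional>
#include <string>

// Hypothetical parsed view of the migration flags; each defaults to the old path.
struct RenderMigrationFlags {
    bool renderThreadOwnsCadence = false;    // VST_RENDER_CADENCE_OWNER=render_thread
    bool disableInputCapture = false;        // VST_DISABLE_INPUT_CAPTURE=1
    bool previewFromSystemFrame = false;     // VST_PREVIEW_SOURCE=system_frame
    bool screenshotFromSystemFrame = false;  // VST_SCREENSHOT_SOURCE=system_frame
};

using EnvLookup = std::function<const char*(const char*)>;

inline bool EnvEquals(const EnvLookup& lookup, const char* name, const char* expected) {
    const char* value = lookup(name);
    return value != nullptr && std::string(value) == expected;
}

// Read once at startup; unset or unrecognized values keep the existing behavior.
inline RenderMigrationFlags ReadRenderMigrationFlags(
    const EnvLookup& lookup = [](const char* name) -> const char* { return std::getenv(name); }) {
    RenderMigrationFlags flags;
    flags.renderThreadOwnsCadence = EnvEquals(lookup, "VST_RENDER_CADENCE_OWNER", "render_thread");
    flags.disableInputCapture = EnvEquals(lookup, "VST_DISABLE_INPUT_CAPTURE", "1");
    flags.previewFromSystemFrame = EnvEquals(lookup, "VST_PREVIEW_SOURCE", "system_frame");
    flags.screenshotFromSystemFrame = EnvEquals(lookup, "VST_SCREENSHOT_SOURCE", "system_frame");
    return flags;
}
```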

## Telemetry Needed

Add or preserve counters for:

- render tick jitter
- render tick overrun
- output render duration
- GL command mailbox depth by class
- frame-boundary command duration
- input upload duration and skips
- readback queue/consume duration
- completed system-memory frame depth
- scheduled DeckLink frame depth
- DeckLink actual buffered frames
- preview frames consumed
- screenshot requests served from system memory

The key metric is whether output render starts on time. Buffer depth alone is not enough; a full buffer can still contain stale or repeated frames.
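
Because the key metric is render start time, the hot-path counters can focus on lateness against each tick's deadline. The following is a minimal sketch, assuming the frame loop already knows each deadline; `RenderStartTelemetry` and its late threshold are hypothetical. It uses only relaxed atomics, in line with the lock-light telemetry rule, so a reader may see slightly stale values but never torn ones.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

class RenderStartTelemetry {
public:
    explicit RenderStartTelemetry(std::chrono::microseconds lateThreshold)
        : lateThresholdUs_(lateThreshold.count()) {}

    // Render thread: how late this render started versus its deadline.
    // Early starts (negative lateness) are clamped to zero.
    void RecordRenderStart(std::chrono::microseconds lateness) {
        const std::int64_t us = lateness.count() < 0 ? 0 : lateness.count();
        ticks_.fetch_add(1, std::memory_order_relaxed);
        if (us > lateThresholdUs_) lateStarts_.fetch_add(1, std::memory_order_relaxed);
        // Lock-free max update; also correct if more than one thread records.
        std::int64_t previous = maxLatenessUs_.load(std::memory_order_relaxed);
        while (us > previous &&
               !maxLatenessUs_.compare_exchange_weak(previous, us, std::memory_order_relaxed)) {
        }
    }

    std::uint64_t Ticks() const { return ticks_.load(std::memory_order_relaxed); }
    std::uint64_t LateStarts() const { return lateStarts_.load(std::memory_order_relaxed); }
    std::int64_t MaxLatenessUs() const { return maxLatenessUs_.load(std::memory_order_relaxed); }

private:
    const std::int64_t lateThresholdUs_;
    std::atomic<std::uint64_t> ticks_{0};
    std::atomic<std::uint64_t> lateStarts_{0};
    std::atomic<std::int64_t> maxLatenessUs_{0};
};
```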

## Completion Definition

This work is complete when:

- the output render thread owns the app GL context from initialization through shutdown
- output rendering is driven by the render thread's selected frame cadence
- no non-output task can run ahead of a due output frame
- `VideoBackend` never asks the render thread to render synchronously
- DeckLink scheduling consumes already completed system-memory frames
- input upload, preview, screenshot, shader commits, and resets are all frame-boundary, mailbox, or consumer-side operations
- main-app telemetry approaches the cadence probe behavior under the same output mode