V2 working
docs/RENDER_THREAD_OWNERSHIP_PLAN.md
# Render Thread Ownership Plan

This plan describes how to make the main compositor behave like the successful `DeckLinkRenderCadenceProbe`: one render cadence owner, one GL context owner, no unrelated work able to interrupt output frame production.

The goal is not just "all GL calls happen on one thread". The current app mostly does that during runtime already. The real goal is:

- the output render thread owns its GL context for its whole lifetime
- output cadence is driven by the render thread, not by DeckLink completion timing
- non-output GL work cannot sit ahead of output frames
- callers cannot block the render thread while waiting for synchronous answers
- DeckLink scheduling consumes completed system-memory frames and never causes rendering

## Current Risk Points

The current main app still has several ways to interrupt output cadence.

### Shared GL Executor

`RenderEngine` owns the GL context during runtime, but it acts as a general task executor.

The same queue/path can run:

- output frame render
- input upload
- preview present
- screenshot capture
- render resets
- shader/program commits
- resource resize
- state clearing

That means output frames are not guaranteed to be the next GL work item at the selected frame time.

### Synchronous Output Render Request

`VideoBackend` drives output production from its output producer thread, then calls:

```text
VideoBackend
  -> OpenGLVideoIOBridge::RenderScheduledFrame
  -> RenderEngine::RequestOutputFrame
  -> TryInvokeOnRenderThread
```

That makes output production a request/response interaction. The producer waits for the render thread, and the render thread is still shared with other work.

### Input Upload Shares Output Context

DeckLink input capture currently flows into:

```text
VideoBackend::HandleInputFrame
  -> OpenGLVideoIOBridge::UploadInputFrame
  -> RenderEngine::QueueInputFrame
  -> render thread upload
```

Even with coalescing, input upload can consume render-thread time and GPU bandwidth directly before output rendering.

### Preview And Screenshot Share Output Context

Preview and screenshot are lower-priority features, but today they still execute on the render thread.

Preview is best-effort at the caller side, but once queued it can still occupy the same context. Screenshot capture can be more expensive because it performs readback and CPU-side image preparation.

### Startup Context Ownership Is Transitional

The Win32 startup path creates and binds the GL context before `RenderEngine::StartRenderThread()`.

That is acceptable as a transitional state, but the final model should make context ownership explicit:

- bootstrap thread creates the window/context
- bootstrap thread releases it
- render thread binds it
- only render thread initializes GL resources
- only render thread destroys GL resources

### Render Callback Re-enters App State

`OpenGLRenderPipeline::RenderFrame()` calls a callback into `OpenGLComposite::renderEffect()`.

That callback builds `RenderFrameInput`, resolves frame state, drains runtime live state, and then calls back into `RenderEngine` to draw the prepared frame.

This works, but it means the output render path still reaches up into app/runtime code at frame time.

## Target Runtime Shape

The main app should match this ownership model:

```text
runtime/control threads
  -> publish snapshots, live overlays, reset requests, shader-build results
  -> never call GL

render cadence thread
  -> sole owner of output GL context
  -> wakes at selected render cadence
  -> samples latest render input/state
  -> renders one frame
  -> queues async readback/copies completed readback into system-memory slot
  -> publishes completed frame to latest-N output buffer

video output thread
  -> consumes completed system-memory frames
  -> schedules DeckLink frames to target buffer depth
  -> processes completion results
  -> never calls GL

optional input upload path
  -> writes latest input frame into CPU-side latest-frame buffer
  -> render thread imports/uploads at a controlled point in its frame

preview/screenshot path
  -> consumes already-rendered output/system-memory frame when possible
  -> never interrupts output render cadence
```

## Non-Negotiable Rules

- The render thread never waits for DeckLink.
- DeckLink callbacks never render.
- Runtime/control threads never directly execute GL.
- Preview and screenshot never execute ahead of output frames.
- Input upload is never a separate urgent GL task ahead of output render.
- Shader/resource commits are applied only at a frame boundary.
- Telemetry on the hot path must be lock-light or try-lock only.
- The render thread cadence does not speed up to refill buffers.
- If output work overruns, the render thread records the overrun and resumes the selected cadence policy.
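
The last two rules can be illustrated with a small deadline helper. This is a minimal sketch, not code from the app: `CadenceClock` and its methods are hypothetical names, and "skip ticks that were fully missed, never render extra frames to catch up" is only one assumed reading of "the selected cadence policy".

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not from the plan): keeps render deadlines on a fixed
// grid. Skipping fully missed ticks is an assumed policy.
class CadenceClock {
public:
    using Clock = std::chrono::steady_clock;

    // frameInterval is assumed to be positive.
    CadenceClock(Clock::time_point start, Clock::duration frameInterval)
        : mNext(start + frameInterval), mInterval(frameInterval) {}

    // Time the render thread should sleep until before starting a frame.
    Clock::time_point NextDeadline() const { return mNext; }

    // Call once per rendered frame. The deadline only ever advances by whole
    // intervals, so an overrun is recorded and never repaid with extra frames.
    void FrameFinished(Clock::time_point now) {
        mNext += mInterval;
        if (now > mNext) {
            const auto missed = (now - mNext) / mInterval;  // whole ticks already gone
            mSkippedTicks += static_cast<std::uint64_t>(missed);
            mNext += mInterval * missed;
        }
    }

    std::uint64_t SkippedTicks() const { return mSkippedTicks; }

private:
    Clock::time_point mNext;
    Clock::duration mInterval;
    std::uint64_t mSkippedTicks = 0;
};
```

A render loop built on this would call `std::this_thread::sleep_until(clock.NextDeadline())`, render one frame, then call `FrameFinished(Clock::now())`. A deadline that is already in the past simply means the next frame starts late; the grid itself does not drift.
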

## Implementation Plan

### 1. Add Thread/Context Ownership Guards

Add explicit render-thread ownership checks around all GL entry points.

Deliverables:

- `RenderEngine` exposes `IsOnRenderThread()` for assertions/tests.
- GL-facing classes get debug-only owner checks where practical.
- wrong-thread GL access becomes a counted telemetry warning, not just `OutputDebugStringA`.
- tests cover that public request methods do not execute GL directly.

Acceptance:

- every `RenderEngine` public method is classified as either request-only, lifecycle-only, or render-thread-only.
- render-thread-only methods are private or guarded.
- no normal runtime caller can accidentally invoke GL work inline.
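
One possible shape for the ownership guard is sketched below. Only `IsOnRenderThread()` is named in the plan; `RenderThreadOwner`, `BindToCurrentThread()`, `CheckGlAccess()`, and the counter are hypothetical, and the real `RenderEngine` may track its owner differently.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical guard object; the plan only names IsOnRenderThread().
class RenderThreadOwner {
public:
    // Call once from the render thread when it takes over the GL context.
    void BindToCurrentThread() {
        mOwner.store(std::this_thread::get_id(), std::memory_order_release);
    }

    bool IsOnRenderThread() const {
        return mOwner.load(std::memory_order_acquire) == std::this_thread::get_id();
    }

    // Returns true when the caller may touch GL. A wrong-thread call is
    // counted so telemetry can report it instead of only logging it.
    bool CheckGlAccess() {
        if (IsOnRenderThread()) {
            return true;
        }
        mWrongThreadAccesses.fetch_add(1, std::memory_order_relaxed);
        return false;
    }

    std::uint64_t WrongThreadAccesses() const {
        return mWrongThreadAccesses.load(std::memory_order_relaxed);
    }

private:
    std::atomic<std::thread::id> mOwner{};  // default id means "no owner yet"
    std::atomic<std::uint64_t> mWrongThreadAccesses{0};
};
```

GL-facing methods could start with `if (!owner.CheckGlAccess()) return;` in release builds and an assertion in debug builds. The counter uses relaxed atomics, which keeps the check compatible with the "lock-light" telemetry rule.
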

### 2. Move GL Initialization Fully Onto The Render Thread

Start the render thread before compiling shaders and initializing GL resources.

Current startup does:

```text
InitOpenGLState()
  -> CompileDecodeShader
  -> CompileOutputPackShader
  -> InitializeResources
  -> CompileLayerPrograms
StartRenderThread()
```

Move toward:

```text
create context on Win32 thread
release context on Win32 thread
StartRenderThread()
render thread binds context
render thread initializes extensions, shaders, resources
```

Deliverables:

- a single `RenderEngine::StartAndInitialize(RenderInitializationConfig)` path.
- GL extension resolution happens on the render thread.
- shader/resource initialization is a render-thread startup phase.
- `RenderEngine` destructor only destroys resources on the render thread.

Acceptance:

- after `StartRenderThread()`, no non-render thread binds or uses the app GL context.
- shutdown order is deterministic: stop video output, stop render cadence, destroy GL resources, release context.

### 3. Replace Synchronous Output Render Requests With Render-Owned Cadence

Move output cadence out of `VideoBackend` and into the render system.

Current:

```text
VideoBackend output producer
  -> cadence tick
  -> acquire output slot
  -> synchronous render-thread request
```

Target:

```text
RenderEngine output cadence loop
  -> cadence tick
  -> acquire/free output slot through a non-blocking frame-sink interface
  -> render frame
  -> publish completed frame
```

Deliverables:

- introduce `RenderedFrameSink` or similar interface owned by video output.
- render thread pulls/claims a free system-memory slot without waiting.
- if no free slot exists, render thread drops/recycles the oldest unscheduled completed frame or records backpressure without blocking.
- remove `RenderEngine::RequestOutputFrame()` from the steady-state output path.

Acceptance:

- output rendering continues even if DeckLink completion is delayed.
- no `std::future` wait exists in the output cadence path.
- `VideoBackend` no longer owns the producer render loop; it owns scheduling/completion only.
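
The frame-sink deliverables could look roughly like the sketch below. `RenderedFrameSink` is the name proposed above; every method name, the slot-index representation, and the `LatestNFrameSink` class are assumptions made for illustration. Slots are plain indices here; a real pool would also carry pixel buffers.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <optional>

// Interface name from the plan; method names are hypothetical.
class RenderedFrameSink {
public:
    virtual ~RenderedFrameSink() = default;
    // Render thread: claim a slot to render into. Never waits on a consumer.
    virtual std::optional<std::size_t> TryAcquireFreeSlot() = 0;
    // Render thread: hand a finished frame to the video output side.
    virtual void PublishCompleted(std::size_t slot) = 0;
};

// Assumed implementation: recycle the oldest unscheduled completed frame when
// no slot is free, otherwise record backpressure and return nothing.
class LatestNFrameSink final : public RenderedFrameSink {
public:
    explicit LatestNFrameSink(std::size_t slotCount) {
        for (std::size_t i = 0; i < slotCount; ++i) mFree.push_back(i);
    }

    std::optional<std::size_t> TryAcquireFreeSlot() override {
        std::lock_guard<std::mutex> lock(mMutex);  // short critical section only
        if (mFree.empty()) {
            if (mCompleted.empty()) {
                ++mBackpressureCount;  // every slot is rendering or scheduled
                return std::nullopt;
            }
            mFree.push_back(mCompleted.front());  // drop the oldest completed frame
            mCompleted.pop_front();
            ++mDroppedCount;
        }
        const std::size_t slot = mFree.back();
        mFree.pop_back();
        return slot;
    }

    void PublishCompleted(std::size_t slot) override {
        std::lock_guard<std::mutex> lock(mMutex);
        mCompleted.push_back(slot);
    }

    // Video output thread: take the oldest completed frame for scheduling.
    std::optional<std::size_t> TryTakeCompleted() {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mCompleted.empty()) return std::nullopt;
        const std::size_t slot = mCompleted.front();
        mCompleted.pop_front();
        return slot;
    }

    // Video output thread: return a slot once its scheduled frame has completed.
    void ReleaseScheduled(std::size_t slot) {
        std::lock_guard<std::mutex> lock(mMutex);
        mFree.push_back(slot);
    }

    std::uint64_t DroppedCount() const { std::lock_guard<std::mutex> l(mMutex); return mDroppedCount; }
    std::uint64_t BackpressureCount() const { std::lock_guard<std::mutex> l(mMutex); return mBackpressureCount; }

private:
    mutable std::mutex mMutex;
    std::deque<std::size_t> mFree;
    std::deque<std::size_t> mCompleted;
    std::uint64_t mDroppedCount = 0;
    std::uint64_t mBackpressureCount = 0;
};
```

The render side never waits on a condition variable or future; the only shared state is guarded by a mutex held for a few pointer-sized operations. A lock-free queue would be an alternative if even that proves measurable.
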

### 4. Make The Render Thread A Frame Loop, Not A Task Queue

Keep a command mailbox, but process it only at safe frame-boundary points.

Frame loop:

```text
while running:
  wait until next render timestamp
  apply bounded frame-boundary commands
  sample latest frame input/state
  upload latest input frame if enabled and budget allows
  render output frame
  queue/consume readback
  publish completed frame
  record timings
```

Command classes:

- frame-boundary commands: reset temporal history, reset shader feedback, commit prepared shader programs
- background/low-priority commands: preview, screenshot, diagnostic readback
- non-GL commands: state publication, telemetry, persistence

Deliverables:

- replace FIFO render task queue with a priority/mailbox model.
- output cadence is the loop's main clock.
- commands have budget classes and max work per frame.
- long commands are deferred rather than blocking the current output tick.

Acceptance:

- preview/screenshot cannot run immediately before a due output frame.
- reset/shader work is applied between frames and measured.
- output render starts within a small jitter window when the GPU is not overrun.
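
The "apply bounded frame-boundary commands" step could be backed by a mailbox like the one sketched here. `FrameBoundaryMailbox`, `Post`, and `DrainBounded` are hypothetical names, and the budget is a simple per-frame command count; the plan's "budget classes" would need a richer scheme (per-class queues or a time budget).

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical mailbox: commands are posted from any thread but only run
// when the render thread drains them between frames.
class FrameBoundaryMailbox {
public:
    using Command = std::function<void()>;

    // Any thread: enqueue only, never executes the command inline.
    void Post(Command command) {
        std::lock_guard<std::mutex> lock(mMutex);
        mPending.push_back(std::move(command));
    }

    // Render thread, at a frame boundary: run at most maxCommands and leave
    // the rest queued for a later frame. Returns how many commands ran.
    std::size_t DrainBounded(std::size_t maxCommands) {
        std::size_t ran = 0;
        while (ran < maxCommands) {
            Command next;
            {
                std::lock_guard<std::mutex> lock(mMutex);
                if (mPending.empty()) break;
                next = std::move(mPending.front());
                mPending.pop_front();
            }
            next();  // run outside the lock so posters are not held up by GL work
            ++ran;
        }
        return ran;
    }

    std::size_t PendingCount() const {
        std::lock_guard<std::mutex> lock(mMutex);
        return mPending.size();
    }

private:
    mutable std::mutex mMutex;
    std::deque<Command> mPending;
};
```

Because `Post` never runs the command, a caller on a control thread cannot pull GL work onto its own thread or in front of a due output frame; the render loop decides when, and how much, to drain.
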

### 5. Move Input Capture To A CPU Latest-Frame Buffer

Input capture should not enqueue independent GL upload tasks.

Target:

```text
DeckLink input callback
  -> copy/coalesce latest CPU input frame
  -> return quickly

render thread frame boundary
  -> if input version changed, upload latest frame
  -> render using last successfully uploaded input texture
```

Deliverables:

- introduce `InputFrameMailbox` with latest-frame semantics.
- remove `RenderEngine::QueueInputFrame()` from the callback path.
- render thread owns the upload moment.
- if upload would exceed budget, render thread can reuse the previous input texture and record an input-upload skip.

Acceptance:

- input capture enabled does not create arbitrary render-thread tasks.
- output cadence remains stable when input frames arrive.
- telemetry separates input-frame arrival, upload count, upload skips, and upload cost.
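
A minimal sketch of latest-frame semantics follows. `InputFrameMailbox` is the name proposed above; the version counter, the method names, and the byte-vector frame representation are assumptions. A real implementation would carry format, stride, and timestamps alongside the pixels.

```cpp
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Name from the plan; the API shown here is hypothetical.
class InputFrameMailbox {
public:
    // Capture callback: overwrite whatever frame is waiting and return quickly.
    void Publish(std::vector<std::uint8_t> pixels) {
        std::lock_guard<std::mutex> lock(mMutex);
        mLatest = std::move(pixels);
        ++mVersion;
    }

    // Render thread, at a frame boundary: if a newer frame exists, swap it
    // into `out`, update `lastSeenVersion`, and return true. Otherwise leave
    // both untouched so the previous input texture keeps being used.
    bool TryTakeNewer(std::uint64_t& lastSeenVersion, std::vector<std::uint8_t>& out) {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mVersion == lastSeenVersion) {
            return false;
        }
        out.swap(mLatest);  // hands the old buffer back for reuse, avoids a copy
        lastSeenVersion = mVersion;
        return true;
    }

private:
    std::mutex mMutex;
    std::vector<std::uint8_t> mLatest;
    std::uint64_t mVersion = 0;
};
```

Frames published between two render ticks coalesce into one upload, so the number of uploads is bounded by the render cadence rather than by the input rate. If the upload budget is exhausted, the render thread can skip the `TryTakeNewer` call for that frame and record a skip.
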

### 6. Move Preview To A Consumer Path

Preview should consume the latest completed output image instead of asking the output GL context to present.

Options:

- CPU preview from latest system-memory output frame.
- a separate preview GL context fed asynchronously from completed frames.
- a low-priority render-thread blit only when output has measurable slack.

Recommended first step:

- use latest system-memory BGRA8 output for the window preview.

Deliverables:

- preview reads from latest completed/scheduled output frame copy.
- `TryPresentPreview()` no longer queues GL work on the output render thread.
- preview FPS throttling remains caller-side.

Acceptance:

- forcing preview cannot delay output rendering.
- minimizing/focusing the window does not affect output cadence.

### 7. Move Screenshot To Completed Frame Capture

Screenshot should capture from the latest completed output frame unless an explicit "exact render capture" mode is requested.

Deliverables:

- screenshot request reads the latest system-memory output frame.
- PNG write remains async.
- optional diagnostic exact-GL screenshot is disabled during live output or explicitly marked disruptive.

Acceptance:

- screenshot request does not call `glReadPixels` on the output render context during steady-state playout.

### 8. Make Shader Commits Frame-Boundary Work

Prepared shader builds are CPU/background work; GL program commit is still GL work.

Deliverables:

- shader build queue produces `PreparedShaderBuild`.
- render thread sees latest pending prepared build at a frame boundary.
- commit is applied only between frames.
- expensive commits can temporarily enter a measured "render reconfigure" state.

Acceptance:

- shader commits do not interleave midway through output render.
- output timing telemetry records commit duration separately from normal render duration.
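
The hand-off from the build queue to the render thread might be sketched as a single-slot "latest pending build" holder plus a timed commit step. `PreparedShaderBuild` is named above, but its contents here (a label string) are a placeholder, and `PendingShaderBuildSlot` and `CommitPendingBuild` are hypothetical. The commit callback stands in for the real GL program swap.

```cpp
#include <chrono>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Placeholder payload; the real struct would hold compiled sources/binaries.
struct PreparedShaderBuild {
    std::string label;
};

// Hypothetical single-slot holder: a newer build replaces an uncommitted one.
class PendingShaderBuildSlot {
public:
    // Build thread: publish the newest prepared build.
    void Publish(std::unique_ptr<PreparedShaderBuild> build) {
        std::lock_guard<std::mutex> lock(mMutex);
        mPending = std::move(build);
    }

    // Render thread, at a frame boundary: take the pending build, or nullptr.
    std::unique_ptr<PreparedShaderBuild> TakePending() {
        std::lock_guard<std::mutex> lock(mMutex);
        return std::move(mPending);  // leaves the slot empty
    }

private:
    std::mutex mMutex;
    std::unique_ptr<PreparedShaderBuild> mPending;
};

// Applies the pending build, if any, and returns how long the commit took so
// it can be reported separately from normal render duration.
template <typename CommitFn>
std::chrono::steady_clock::duration CommitPendingBuild(PendingShaderBuildSlot& slot,
                                                       CommitFn&& commit) {
    std::unique_ptr<PreparedShaderBuild> build = slot.TakePending();
    if (!build) {
        return std::chrono::steady_clock::duration::zero();
    }
    const auto start = std::chrono::steady_clock::now();
    commit(*build);  // GL work: must only be called on the render thread
    return std::chrono::steady_clock::now() - start;
}
```

Since the render thread calls `CommitPendingBuild` only between frames, a commit cannot interleave with an output render, and intermediate builds that were superseded before the next boundary are never committed at all.
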

### 9. Split Output Scheduling From Rendering Completely

`VideoBackend` should become a playout/scheduling owner, not a render producer.

Target:

```text
RenderEngine
  -> produces completed frames at render cadence

VideoBackend
  -> schedules completed frames up to target DeckLink depth
  -> processes completions
  -> releases scheduled slots
```

Deliverables:

- `VideoBackend` owns `SystemOutputFramePool`, or a new `SystemFrameExchange` owns it between render/video.
- render thread publishes completed frames into the exchange.
- video output thread schedules from the exchange.
- no render calls exist in completion handling or scheduling paths.

Acceptance:

- DeckLink buffer depth changes cannot directly cause render-thread wakeups except through non-blocking availability signals.
- render cadence can be tested without DeckLink by using a fake frame sink.
- video scheduling can be tested without GL by using synthetic frames.

### 10. Preserve The Probe As The Reference Contract

The `DeckLinkRenderCadenceProbe` is now the control sample.

Deliverables:

- document which main-app components correspond to the probe components.
- add a small regression checklist:
  - render FPS near target
  - schedule FPS near target
  - DeckLink buffered frames stable
  - no late/drop frames
  - no PBO misses or readback stalls
  - focus/minimize does not change output cadence

Acceptance:

- after each migration step, compare the main app telemetry against the probe's known-good behavior.

## Suggested Order Of Work

1. Add ownership guards and classify render methods.
2. Move GL initialization/destruction fully onto the render thread.
3. Introduce a render-owned cadence loop behind a feature flag.
4. Add a frame-sink/exchange interface between render and video.
5. Move output production from `VideoBackend` to the render cadence loop.
6. Convert input upload to latest-frame mailbox semantics.
7. Move preview to completed-frame consumption.
8. Move screenshot to completed-frame capture.
9. Convert shader commits/resets to frame-boundary mailbox commands.
10. Remove old synchronous output render request path.

## Feature Flags During Migration

Use flags only to keep testing safe, not as long-term compatibility layers.

Suggested flags:

```text
VST_RENDER_CADENCE_OWNER=render_thread
VST_DISABLE_INPUT_CAPTURE=1
VST_PREVIEW_SOURCE=system_frame
VST_SCREENSHOT_SOURCE=system_frame
```

Remove each flag once the new behavior is proven and becomes the only supported path.
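
If these flags are read as environment variables (an assumption; the plan does not say how they are supplied), parsing them could be as small as the sketch below. `MigrationFlags`, `ParseMigrationFlags`, and `LoadMigrationFlags` are hypothetical names, and any value other than the one listed above is treated as "use the existing path".

```cpp
#include <cstdlib>
#include <functional>
#include <string>

// Hypothetical flag set mirroring the four suggested flags.
struct MigrationFlags {
    bool renderThreadOwnsCadence = false;
    bool inputCaptureDisabled = false;
    bool previewFromSystemFrame = false;
    bool screenshotFromSystemFrame = false;
};

// Lookup is injected so tests can supply values without touching the process
// environment. It returns nullptr when a variable is unset.
using EnvLookup = std::function<const char*(const char*)>;

inline bool FlagEquals(const EnvLookup& lookup, const char* name, const char* expected) {
    const char* value = lookup(name);
    return value != nullptr && std::string(value) == expected;
}

inline MigrationFlags ParseMigrationFlags(const EnvLookup& lookup) {
    MigrationFlags flags;
    flags.renderThreadOwnsCadence = FlagEquals(lookup, "VST_RENDER_CADENCE_OWNER", "render_thread");
    flags.inputCaptureDisabled = FlagEquals(lookup, "VST_DISABLE_INPUT_CAPTURE", "1");
    flags.previewFromSystemFrame = FlagEquals(lookup, "VST_PREVIEW_SOURCE", "system_frame");
    flags.screenshotFromSystemFrame = FlagEquals(lookup, "VST_SCREENSHOT_SOURCE", "system_frame");
    return flags;
}

// Production entry point: read from the real environment once at startup.
inline MigrationFlags LoadMigrationFlags() {
    return ParseMigrationFlags([](const char* name) -> const char* { return std::getenv(name); });
}
```

Reading the flags once at startup into a plain struct keeps them out of the hot path and makes each one easy to delete when its migration step is finished.
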

## Telemetry Needed

Add or preserve counters for:

- render tick jitter
- render tick overrun
- output render duration
- GL command mailbox depth by class
- frame-boundary command duration
- input upload duration and skips
- readback queue/consume duration
- completed system-memory frame depth
- scheduled DeckLink frame depth
- DeckLink actual buffered frames
- preview frames consumed
- screenshot requests served from system memory

The key metric is whether output render starts on time. Buffer depth alone is not enough; a full buffer can still contain stale or repeated frames.
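
The "starts on time" metric could be captured with a few relaxed atomics, which fits the lock-light rule for hot-path telemetry. This is a sketch under assumptions: `RenderTickTelemetry`, its method names, and the single lateness tolerance are all hypothetical, and the existing telemetry system may already provide equivalents.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <cstdint>

// Hypothetical hot-path recorder: no locks, only relaxed atomics.
class RenderTickTelemetry {
public:
    using Clock = std::chrono::steady_clock;

    explicit RenderTickTelemetry(std::chrono::nanoseconds lateTolerance)
        : mToleranceNs(lateTolerance.count()) {}

    // Render thread: call when an output render actually begins.
    void RecordStart(Clock::time_point deadline, Clock::time_point actualStart) {
        const auto late =
            std::chrono::duration_cast<std::chrono::nanoseconds>(actualStart - deadline);
        const std::int64_t lateNs = std::max<std::int64_t>(0, late.count());  // early counts as 0

        mTicks.fetch_add(1, std::memory_order_relaxed);
        if (lateNs > mToleranceNs) {
            mLateStarts.fetch_add(1, std::memory_order_relaxed);
        }
        // Lock-free running maximum.
        std::int64_t previous = mMaxLateNs.load(std::memory_order_relaxed);
        while (lateNs > previous &&
               !mMaxLateNs.compare_exchange_weak(previous, lateNs, std::memory_order_relaxed)) {
        }
    }

    std::uint64_t Ticks() const { return mTicks.load(std::memory_order_relaxed); }
    std::uint64_t LateStarts() const { return mLateStarts.load(std::memory_order_relaxed); }
    std::int64_t MaxLateNs() const { return mMaxLateNs.load(std::memory_order_relaxed); }

private:
    const std::int64_t mToleranceNs;
    std::atomic<std::uint64_t> mTicks{0};
    std::atomic<std::uint64_t> mLateStarts{0};
    std::atomic<std::int64_t> mMaxLateNs{0};
};
```

A reader thread can sample the three counters at any time without blocking the render thread. A late-start ratio near zero alongside a stable buffer depth is a stronger signal than buffer depth on its own.
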

## Completion Definition

This work is complete when:

- the output render thread owns the app GL context from initialization through shutdown
- output rendering is driven by the render thread's selected frame cadence
- no non-output task can run ahead of a due output frame
- `VideoBackend` never asks the render thread to render synchronously
- DeckLink scheduling consumes already completed system-memory frames
- input upload, preview, screenshot, shader commits, and resets are all frame-boundary, mailbox, or consumer-side operations
- main-app telemetry approaches the cadence probe behavior under the same output mode