V2 working
@@ -4,6 +4,14 @@ This document describes how the application currently works.
|
||||
|
||||
It replaces the phase-by-phase design trail as the best entry point for understanding the repo. The older phase documents remain useful history, but they mix implementation notes, experiments, and target designs. This document is organized by current runtime behavior and subsystem ownership instead.
|
||||
|
||||
The active plan for tightening render-thread ownership is:
|
||||
|
||||
- [Render Thread Ownership Plan](RENDER_THREAD_OWNERSHIP_PLAN.md)
|
||||
|
||||
The plan for building a fresh modular app around the proven probe architecture is:
|
||||
|
||||
- [New Render Cadence App Plan](NEW_RENDER_CADENCE_APP_PLAN.md)
|
||||
|
||||
## Application Shape
|
||||
|
||||
The app is a live OpenGL compositor with DeckLink input/output, runtime control services, persistent layer-stack state, live state overlays, health telemetry, and a small internal event model.
|
||||
|
||||
557
docs/NEW_RENDER_CADENCE_APP_PLAN.md
Normal file
@@ -0,0 +1,557 @@
|
||||
# New Render Cadence App Plan
|
||||
|
||||
This plan describes a new application folder that rebuilds the output path from the proven `DeckLinkRenderCadenceProbe` architecture, but as a maintainable app foundation rather than a monolithic probe file.
|
||||
|
||||
The first goal is not to port the current compositor feature set. The first goal is to reproduce the probe's smooth 59.94/60 fps DeckLink output with clean module boundaries, tests where possible, and a structure that can later accept the shader/runtime/control systems without compromising timing.
|
||||
|
||||
## Working Name
|
||||
|
||||
Suggested folder:
|
||||
|
||||
```text
|
||||
apps/RenderCadenceCompositor
|
||||
```
|
||||
|
||||
Suggested executable:
|
||||
|
||||
```text
|
||||
RenderCadenceCompositor
|
||||
```
|
||||
|
||||
The existing app remains intact:
|
||||
|
||||
```text
|
||||
apps/LoopThroughWithOpenGLCompositing
|
||||
```
|
||||
|
||||
The probe remains the control sample:
|
||||
|
||||
```text
|
||||
apps/DeckLinkRenderCadenceProbe
|
||||
```
|
||||
|
||||
## Design Principle
|
||||
|
||||
The app is built around one spine:
|
||||
|
||||
```text
Render cadence thread
  -> owns GL context
  -> renders at selected frame cadence
  -> performs async BGRA8 readback
  -> publishes completed system-memory frames

System frame exchange
  -> owns Free / Rendering / Completed / Scheduled slots
  -> latest-N semantics for completed unscheduled frames
  -> protects scheduled frames until DeckLink completion

DeckLink output thread
  -> consumes completed frames
  -> schedules to target buffer depth
  -> releases scheduled frames on completion
  -> never renders
```
|
||||
|
||||
Everything else must fit around that spine.
|
||||
|
||||
## Non-Negotiable Rules
|
||||
|
||||
- The render thread owns its GL context from initialization to shutdown.
|
||||
- The render thread is driven by selected render cadence, not DeckLink demand.
|
||||
- DeckLink scheduling never calls render code.
|
||||
- Completion callbacks never render.
|
||||
- No synchronous render request exists in the output path.
|
||||
- Preview, screenshot, input upload, shader rebuild, and runtime control cannot run ahead of a due output frame.
|
||||
- Completed unscheduled frames are latest-N and disposable.
|
||||
- Scheduled frames are protected until DeckLink completion.
|
||||
- Startup warms up real rendered frames before scheduled playback starts.
|
||||
|
||||
## Borrow From The Probe
|
||||
|
||||
Keep these behaviors from `DeckLinkRenderCadenceProbe`:
|
||||
|
||||
- hidden OpenGL context owned by the render thread
|
||||
- simple render loop with `nextRenderTime`
|
||||
- BGRA8 render target
|
||||
- PBO ring readback
|
||||
- non-blocking fence polling with zero timeout
|
||||
- system-memory slots with `Free`, `Rendering`, `Completed`, `Scheduled`
|
||||
- drop oldest completed unscheduled frame if render needs space
|
||||
- DeckLink playout thread only schedules completed frames
|
||||
- warmup completed frames before `StartScheduledPlayback()`
|
||||
- one-line-per-second timing telemetry
|
||||
|
||||
## Do Not Borrow Directly
|
||||
|
||||
The probe is deliberately compact. Do not carry over these probe limitations into the new app:
|
||||
|
||||
- one huge `.cpp` file
|
||||
- hard-coded output mode as permanent behavior
|
||||
- render pattern, frame store, PBO logic, DeckLink playout, COM setup, and telemetry mixed together
|
||||
- no reusable interfaces
|
||||
- no unit-testable non-GL core
|
||||
|
||||
## Proposed Folder Structure
|
||||
|
||||
```text
apps/RenderCadenceCompositor/
  README.md
  RenderCadenceCompositor.cpp

  app/
    RenderCadenceApp.cpp
    RenderCadenceApp.h
    AppConfig.cpp
    AppConfig.h

  platform/
    ComInit.cpp
    ComInit.h
    HiddenGlWindow.cpp
    HiddenGlWindow.h
    Win32Console.cpp
    Win32Console.h

  render/
    RenderThread.cpp
    RenderThread.h
    RenderCadenceClock.cpp
    RenderCadenceClock.h
    SimpleMotionRenderer.cpp
    SimpleMotionRenderer.h
    Bgra8ReadbackPipeline.cpp
    Bgra8ReadbackPipeline.h
    PboReadbackRing.cpp
    PboReadbackRing.h

  frames/
    SystemFrameExchange.cpp
    SystemFrameExchange.h
    SystemFrameTypes.h

  video/
    DeckLinkOutput.cpp
    DeckLinkOutput.h
    DeckLinkOutputThread.cpp
    DeckLinkOutputThread.h

  telemetry/
    CadenceTelemetry.cpp
    CadenceTelemetry.h
    TelemetryPrinter.cpp
    TelemetryPrinter.h
```
|
||||
|
||||
The new app can reuse selected existing source files from the current app at first:
|
||||
|
||||
- `videoio/decklink/DeckLinkSession.*`
|
||||
- `videoio/decklink/DeckLinkDisplayMode.*`
|
||||
- `videoio/decklink/DeckLinkVideoIOFormat.*`
|
||||
- `videoio/decklink/DeckLinkFrameTransfer.*`
|
||||
- `videoio/VideoIOFormat.*`
|
||||
- `videoio/VideoIOTypes.h`
|
||||
- `videoio/VideoPlayoutScheduler.*`
|
||||
- `gl/renderer/GLExtensions.*`
|
||||
|
||||
Longer term, shared code should move into common libraries, but the first version can link these files directly to avoid a big build-system refactor.
|
||||
|
||||
## Module Responsibilities
|
||||
|
||||
### `RenderCadenceApp`
|
||||
|
||||
Owns top-level startup/shutdown sequencing.
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- initialize COM
|
||||
- discover/select DeckLink output
|
||||
- create frame exchange
|
||||
- start render thread
|
||||
- wait for completed-frame warmup
|
||||
- start DeckLink output thread
|
||||
- wait for scheduled buffer warmup
|
||||
- start DeckLink scheduled playback
|
||||
- start telemetry printer
|
||||
- stop in reverse order
|
||||
|
||||
It should not contain OpenGL drawing code, frame slot policy, or DeckLink scheduling loops.
|
||||
|
||||
### `AppConfig`
|
||||
|
||||
Owns runtime settings for the initial app.
|
||||
|
||||
Initial settings:
|
||||
|
||||
- output mode preference
|
||||
- output width/height validation
|
||||
- frame buffer capacity
|
||||
- PBO depth
|
||||
- warmup completed-frame count
|
||||
- target DeckLink scheduled depth
|
||||
- telemetry interval
|
||||
|
||||
Initial values should match the successful probe:
|
||||
|
||||
```text
|
||||
systemFrameSlots = 12
|
||||
pboDepth = 6
|
||||
warmupFrames = 4
|
||||
targetDeckLinkBufferedFrames = 4
|
||||
pixelFormat = BGRA8
|
||||
```
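
As a rough illustration, the initial values could live in a plain struct with a small validation helper. The field names mirror the listing above; `ValidateAppConfig` and its rules are assumptions made for this sketch, not behavior taken from the probe.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Defaults copied from the probe values listed above.
struct AppConfig {
    std::size_t systemFrameSlots = 12;
    std::size_t pboDepth = 6;
    std::size_t warmupFrames = 4;
    std::size_t targetDeckLinkBufferedFrames = 4;
    std::string pixelFormat = "BGRA8";
};

// Hypothetical helper: returns an empty string when the config is usable,
// otherwise a short reason. The rules below are illustrative assumptions.
inline std::string ValidateAppConfig(const AppConfig& c) {
    if (c.pixelFormat != "BGRA8") return "only BGRA8 is supported in the first version";
    if (c.pboDepth == 0) return "pboDepth must be at least 1";
    if (c.warmupFrames == 0) return "warmupFrames must be at least 1";
    // Scheduled frames are protected, so the exchange needs room for the
    // whole scheduled depth plus at least one slot to render into.
    if (c.systemFrameSlots <= c.targetDeckLinkBufferedFrames)
        return "systemFrameSlots must exceed targetDeckLinkBufferedFrames";
    if (c.warmupFrames > c.systemFrameSlots)
        return "warmupFrames cannot exceed systemFrameSlots";
    return {};
}
```

Validating once at startup keeps configuration mistakes out of the timing-sensitive threads.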
|
||||
|
||||
### `HiddenGlWindow`
|
||||
|
||||
Owns hidden Win32 window, device context, and OpenGL context creation.
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- create hidden window with `CS_OWNDC`
|
||||
- choose/set pixel format
|
||||
- create `HGLRC`
|
||||
- expose `MakeCurrent()` and `ClearCurrent()`
|
||||
- destroy context/window safely
|
||||
|
||||
Only `RenderThread` should call `MakeCurrent()` after startup.
|
||||
|
||||
### `RenderThread`
|
||||
|
||||
Owns the render loop and GL context for its full lifetime.
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- create/bind hidden GL context
|
||||
- resolve GL extensions
|
||||
- initialize renderer/readback pipeline
|
||||
- run cadence loop
|
||||
- render one frame when due
|
||||
- queue PBO readback
|
||||
- consume completed PBOs into `SystemFrameExchange`
|
||||
- record telemetry
|
||||
- destroy GL resources on the render thread
|
||||
|
||||
It must not:
|
||||
|
||||
- wait for DeckLink
|
||||
- schedule DeckLink frames
|
||||
- block waiting for a system frame slot when a completed unscheduled frame can be dropped instead
|
||||
- accept arbitrary GL tasks ahead of output frames
|
||||
|
||||
### `RenderCadenceClock`
|
||||
|
||||
Small, testable cadence helper.
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- track target frame duration
|
||||
- return whether a render is due
|
||||
- compute sleep duration
|
||||
- detect overrun/skipped ticks
|
||||
- never speed up to fill buffers
|
||||
|
||||
This should be unit tested without GL.
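
A minimal sketch of such a helper is shown below, assuming time is injected by the caller so tests never sleep. The method names and the overrun policy (skip to the next tick that is still in the future rather than rendering back-to-back) are assumptions for illustration; the probe's exact policy may differ.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

class RenderCadenceClock {
public:
    using Clock = std::chrono::steady_clock;
    using Duration = Clock::duration;
    using TimePoint = Clock::time_point;

    RenderCadenceClock(Duration frameDuration, TimePoint start)
        : frameDuration_(frameDuration), nextRenderTime_(start) {
        assert(frameDuration > Duration::zero());
    }

    bool IsDue(TimePoint now) const { return now >= nextRenderTime_; }

    // How long the caller may sleep before the next tick (zero if already due).
    Duration SleepDuration(TimePoint now) const {
        return IsDue(now) ? Duration::zero() : nextRenderTime_ - now;
    }

    // Call after a frame has been rendered. Advances by one frame duration.
    // If the render overran past later ticks, those ticks are counted as
    // skipped and the schedule jumps forward; it never renders faster than
    // the cadence to catch up.
    void MarkRendered(TimePoint now) {
        nextRenderTime_ += frameDuration_;
        if (now >= nextRenderTime_) {
            const auto missed = (now - nextRenderTime_) / frameDuration_ + 1;
            skippedTicks_ += static_cast<std::uint64_t>(missed);
            nextRenderTime_ += frameDuration_ * missed;
            ++overruns_;
        }
    }

    std::uint64_t SkippedTicks() const { return skippedTicks_; }
    std::uint64_t Overruns() const { return overruns_; }

private:
    Duration frameDuration_;
    TimePoint nextRenderTime_;
    std::uint64_t skippedTicks_ = 0;
    std::uint64_t overruns_ = 0;
};
```

Because the clock only sees time points passed in by the caller, a unit test can step through an overrun scenario deterministically.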
|
||||
|
||||
### `SimpleMotionRenderer`
|
||||
|
||||
First renderer only.
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- render obvious smooth motion and color changes
|
||||
- produce BGRA8-compatible framebuffer content
|
||||
- make dropped/repeated frames visually obvious
|
||||
|
||||
This intentionally avoids shader-package/runtime complexity.
|
||||
|
||||
### `Bgra8ReadbackPipeline`
|
||||
|
||||
Owns output framebuffer and BGRA8 readback orchestration.
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- configure render target dimensions
|
||||
- render into an RGBA8/BGRA-compatible texture
|
||||
- coordinate `PboReadbackRing`
|
||||
- publish completed frames into `SystemFrameExchange`
|
||||
|
||||
### `PboReadbackRing`
|
||||
|
||||
Owns PBO/fence state.
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- queue readback into the next free PBO slot
|
||||
- poll completed fences with zero timeout
|
||||
- map/copy completed PBOs into provided system-memory slots
|
||||
- count PBO misses
|
||||
- clean up fences/PBOs on render thread
|
||||
|
||||
This is GL-backed, but the state model should be small and easy to reason about.
|
||||
|
||||
### `SystemFrameExchange`
|
||||
|
||||
The central handoff between render and video.
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- own system-memory frame buffers
|
||||
- track slot states: `Free`, `Rendering`, `Completed`, `Scheduled`
|
||||
- provide `AcquireForRender()`
|
||||
- provide `PublishCompleted()`
|
||||
- provide `ConsumeCompletedForSchedule()`
|
||||
- provide `ReleaseScheduledByBytes()`
|
||||
- drop oldest completed unscheduled frame when render needs a slot
|
||||
- expose metrics
|
||||
|
||||
This should be unit tested heavily.
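
The slot lifecycle can be sketched without GL or DeckLink, as below. This is an illustration under stated assumptions: slots are addressed by index (the plan's `ReleaseScheduledByBytes()` presumably looks a slot up by buffer pointer instead), `ReleaseScheduled()` is a stand-in name, a single mutex guards all state, and stale-handle/generation checks are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <vector>

enum class SlotState { Free, Rendering, Completed, Scheduled };

class SystemFrameExchange {
public:
    SystemFrameExchange(std::size_t slotCount, std::size_t bytesPerFrame) : slots_(slotCount) {
        for (Slot& slot : slots_) slot.bytes.resize(bytesPerFrame);
    }

    // Never blocks: prefers a Free slot, otherwise recycles the oldest
    // Completed (unscheduled) frame. Scheduled slots are never taken.
    std::optional<std::size_t> AcquireForRender() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::optional<std::size_t> pick = FindFree();
        if (!pick) {
            pick = FindOldestCompleted();
            if (pick) ++completedDrops_;
        }
        if (!pick) {
            ++acquireMisses_;
            return std::nullopt;
        }
        slots_[*pick].state = SlotState::Rendering;
        return pick;
    }

    void PublishCompleted(std::size_t index) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(slots_[index].state == SlotState::Rendering);
        slots_[index].state = SlotState::Completed;
        slots_[index].sequence = ++nextSequence_;
    }

    // Hands out the oldest completed frame so playout keeps render order.
    std::optional<std::size_t> ConsumeCompletedForSchedule() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::optional<std::size_t> pick = FindOldestCompleted();
        if (pick) slots_[*pick].state = SlotState::Scheduled;
        return pick;
    }

    // Called on DeckLink completion, or after a failed schedule attempt.
    void ReleaseScheduled(std::size_t index) {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(slots_[index].state == SlotState::Scheduled);
        slots_[index].state = SlotState::Free;
    }

    std::uint8_t* Bytes(std::size_t index) { return slots_[index].bytes.data(); }
    std::uint64_t CompletedDrops() const { std::lock_guard<std::mutex> lock(mutex_); return completedDrops_; }
    std::uint64_t AcquireMisses() const { std::lock_guard<std::mutex> lock(mutex_); return acquireMisses_; }

private:
    struct Slot {
        SlotState state = SlotState::Free;
        std::uint64_t sequence = 0;  // publish order, used to find the oldest
        std::vector<std::uint8_t> bytes;
    };

    std::optional<std::size_t> FindFree() const {
        for (std::size_t i = 0; i < slots_.size(); ++i)
            if (slots_[i].state == SlotState::Free) return i;
        return std::nullopt;
    }

    std::optional<std::size_t> FindOldestCompleted() const {
        std::optional<std::size_t> oldest;
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (slots_[i].state != SlotState::Completed) continue;
            if (!oldest || slots_[i].sequence < slots_[*oldest].sequence) oldest = i;
        }
        return oldest;
    }

    mutable std::mutex mutex_;
    std::vector<Slot> slots_;
    std::uint64_t nextSequence_ = 0;
    std::uint64_t completedDrops_ = 0;
    std::uint64_t acquireMisses_ = 0;
};
```

The key property to test is that `AcquireForRender()` only fails when every slot is `Rendering` or `Scheduled`, and that it counts each recycled completed frame as a drop.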
|
||||
|
||||
### `DeckLinkOutput`
|
||||
|
||||
Thin wrapper around `DeckLinkSession` for output-only use.
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- discover/select output mode
|
||||
- configure output callback
|
||||
- prepare output schedule
|
||||
- schedule app-owned system-memory frames
|
||||
- start scheduled playback
|
||||
- stop/release resources
|
||||
- expose actual DeckLink buffered count
|
||||
|
||||
No input support in the first version.
|
||||
|
||||
### `DeckLinkOutputThread`
|
||||
|
||||
Owns playout scheduling loop.
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- keep scheduled depth near target
|
||||
- consume completed frames from `SystemFrameExchange`
|
||||
- schedule them through `DeckLinkOutput`
|
||||
- release frame if scheduling fails
|
||||
- sleep briefly when scheduled buffer is full or no completed frame exists
|
||||
|
||||
It must not render.
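
One iteration of the scheduling loop might look like the sketch below. `RunOutputStep`, `StepResult`, and the two fakes are hypothetical names for illustration; the real thread would call the actual exchange and `DeckLinkOutput` and sleep briefly on `BufferFull` or `NoCompletedFrame`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>

enum class StepResult { Scheduled, BufferFull, NoCompletedFrame, ScheduleFailed };

// One pass of the playout loop: top up toward targetDepth, never render.
template <class Exchange, class Output>
StepResult RunOutputStep(Exchange& exchange, Output& output, std::size_t targetDepth) {
    assert(targetDepth > 0);
    if (output.BufferedFrameCount() >= targetDepth) return StepResult::BufferFull;
    std::optional<std::size_t> slot = exchange.ConsumeCompletedForSchedule();
    if (!slot) return StepResult::NoCompletedFrame;
    if (!output.Schedule(*slot)) {
        exchange.ReleaseScheduled(*slot);  // give the slot back on failure
        return StepResult::ScheduleFailed;
    }
    return StepResult::Scheduled;
}

// GL-free, DeckLink-free stand-ins used only for testing the loop logic.
struct FakeExchange {
    std::deque<std::size_t> completed;
    std::size_t released = 0;
    std::optional<std::size_t> ConsumeCompletedForSchedule() {
        if (completed.empty()) return std::nullopt;
        const std::size_t slot = completed.front();
        completed.pop_front();
        return slot;
    }
    void ReleaseScheduled(std::size_t) { ++released; }
};

struct FakeOutput {
    std::size_t buffered = 0;
    bool failNext = false;
    std::size_t BufferedFrameCount() const { return buffered; }
    bool Schedule(std::size_t) {
        if (failNext) { failNext = false; return false; }
        ++buffered;
        return true;
    }
};
```

Keeping the step free of sleeps and device calls lets the scheduling policy be exercised with synthetic frames.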
|
||||
|
||||
### `CadenceTelemetry`
|
||||
|
||||
Owns counters, not policy.
|
||||
|
||||
Initial counters:
|
||||
|
||||
- rendered frames
|
||||
- completed readback frames
|
||||
- scheduled frames
|
||||
- completion count
|
||||
- completed-frame drops
|
||||
- acquire misses
|
||||
- schedule underruns
|
||||
- PBO queue misses
|
||||
- DeckLink late count
|
||||
- DeckLink dropped count
|
||||
- free/rendering/completed/scheduled slot counts
|
||||
- actual DeckLink buffered frames
|
||||
|
||||
### `TelemetryPrinter`
|
||||
|
||||
Prints one stable line per interval, matching the probe where possible.
|
||||
|
||||
Example:
|
||||
|
||||
```text
|
||||
renderFps=59.9 scheduleFps=59.9 free=7 completed=1 scheduled=4 drops=0 pboMiss=0 completions=119 late=0 dropped=0 decklinkBuffered=4
|
||||
```
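
A possible formatter for that line is sketched here. The `TelemetrySnapshot` struct and its field names are assumptions for this example; only the output keys come from the sample line above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical snapshot type; values are sampled once per telemetry interval.
struct TelemetrySnapshot {
    double renderFps = 0.0;
    double scheduleFps = 0.0;
    unsigned freeSlots = 0;
    unsigned completedSlots = 0;
    unsigned scheduledSlots = 0;
    unsigned long long drops = 0;
    unsigned long long pboMisses = 0;
    unsigned long long completions = 0;
    unsigned long long late = 0;
    unsigned long long dropped = 0;
    unsigned decklinkBuffered = 0;
};

// Formats one stable key=value line so successive lines are easy to diff.
inline std::string FormatTelemetryLine(const TelemetrySnapshot& s) {
    char buffer[256];
    const int written = std::snprintf(
        buffer, sizeof(buffer),
        "renderFps=%.1f scheduleFps=%.1f free=%u completed=%u scheduled=%u "
        "drops=%llu pboMiss=%llu completions=%llu late=%llu dropped=%llu decklinkBuffered=%u",
        s.renderFps, s.scheduleFps, s.freeSlots, s.completedSlots, s.scheduledSlots,
        s.drops, s.pboMisses, s.completions, s.late, s.dropped, s.decklinkBuffered);
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof(buffer));
    (void)written;  // silences unused-variable warnings when asserts are off
    return buffer;
}
```

Keeping the key order fixed makes it straightforward to compare the new app's output against the probe's line by line.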
|
||||
|
||||
## Startup Sequence
|
||||
|
||||
Target first-version startup:
|
||||
|
||||
```text
main
  -> parse AppConfig
  -> initialize COM
  -> DeckLinkOutput discover/select/configure output
  -> DeckLinkOutput prepare output schedule
  -> create SystemFrameExchange
  -> start RenderThread
  -> wait for completed frame warmup
  -> start DeckLinkOutputThread
  -> wait for scheduled depth warmup
  -> DeckLinkOutput start scheduled playback
  -> start TelemetryPrinter
  -> wait for Enter
```
|
||||
|
||||
Shutdown:
|
||||
|
||||
```text
stop TelemetryPrinter
stop DeckLinkOutputThread
DeckLinkOutput stop playback
stop RenderThread
DeckLinkOutput release resources
release COM
```
|
||||
|
||||
## First Milestone: Modular Probe Equivalent
|
||||
|
||||
This is the only goal for the initial implementation.
|
||||
|
||||
Feature set:
|
||||
|
||||
- console app
|
||||
- output-only DeckLink
|
||||
- no input
|
||||
- hidden GL context
|
||||
- simple motion renderer
|
||||
- BGRA8 only
|
||||
- PBO async readback
|
||||
- latest-N system-memory frame exchange
|
||||
- warmup before playback
|
||||
- one-line telemetry
|
||||
|
||||
Acceptance:
|
||||
|
||||
- visible DeckLink output is smooth
|
||||
- `renderFps` near selected cadence
|
||||
- `scheduleFps` near selected cadence
|
||||
- scheduled count and DeckLink buffered count stable around 4
- late/dropped counts do not keep increasing
- PBO miss count does not keep increasing
|
||||
- behavior matches or exceeds `DeckLinkRenderCadenceProbe`
|
||||
|
||||
## Second Milestone: Testable Core
|
||||
|
||||
Before porting compositor features, add tests for non-GL/non-DeckLink pieces.
|
||||
|
||||
Test targets:
|
||||
|
||||
- `SystemFrameExchangeTests`
|
||||
- `RenderCadenceClockTests`
|
||||
- `CadenceTelemetryTests`
|
||||
|
||||
Important cases:
|
||||
|
||||
- slot lifecycle transitions
|
||||
- scheduled slots are protected
|
||||
- completed unscheduled frames can be dropped
|
||||
- stale handles/generations are rejected
|
||||
- cadence does not speed up to refill buffers
|
||||
- cadence records overrun/skipped ticks
|
||||
|
||||
## Third Milestone: Replace Simple Renderer With Render Interface
|
||||
|
||||
Add an interface around frame rendering:
|
||||
|
||||
```text
IRenderScene
  -> InitializeGl()
  -> RenderFrame(frameIndex, time)
  -> ShutdownGl()
```
|
||||
|
||||
The first implementation remains `SimpleMotionRenderer`.
|
||||
|
||||
This creates the insertion point for shader-package rendering later without changing timing/scheduling.
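
In C++ the interface could take roughly the following shape. The parameter types, the `bool` return from `InitializeGl()`, and the `CountingScene`/`RunScene` helpers are assumptions for this sketch; only the three method names come from the outline above.

```cpp
#include <cassert>
#include <cstdint>

class IRenderScene {
public:
    virtual ~IRenderScene() = default;
    // All three calls are expected to happen on the render thread only.
    virtual bool InitializeGl() = 0;
    virtual void RenderFrame(std::uint64_t frameIndex, double timeSeconds) = 0;
    virtual void ShutdownGl() = 0;
};

// GL-free stand-in so the render loop can be exercised in unit tests.
class CountingScene final : public IRenderScene {
public:
    bool InitializeGl() override { initialized = true; return true; }
    void RenderFrame(std::uint64_t frameIndex, double /*timeSeconds*/) override {
        assert(initialized && !shutDown);
        lastFrameIndex = frameIndex;
        ++framesRendered;
    }
    void ShutdownGl() override { shutDown = true; }

    bool initialized = false;
    bool shutDown = false;
    std::uint64_t framesRendered = 0;
    std::uint64_t lastFrameIndex = 0;
};

// Hypothetical driver showing the call order the render thread would follow.
inline bool RunScene(IRenderScene& scene, std::uint64_t frameCount, double frameSeconds) {
    if (!scene.InitializeGl()) return false;
    for (std::uint64_t i = 0; i < frameCount; ++i)
        scene.RenderFrame(i, static_cast<double>(i) * frameSeconds);
    scene.ShutdownGl();
    return true;
}
```

`SimpleMotionRenderer` would implement the same interface, so swapping in a shader-package scene later does not touch cadence or scheduling code.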
|
||||
|
||||
## Fourth Milestone: Begin Porting Current App Features
|
||||
|
||||
Port only after the modular probe equivalent is stable.
|
||||
|
||||
Suggested order:
|
||||
|
||||
1. shader package compile/load
|
||||
2. render pass/layer stack drawing
|
||||
3. runtime snapshot input to renderer
|
||||
4. live state overlays
|
||||
5. control services
|
||||
6. persistence/runtime store
|
||||
7. preview from system-memory frames
|
||||
8. screenshot from system-memory frames
|
||||
9. input capture via CPU latest-frame mailbox
|
||||
|
||||
Each port must preserve the rule that the render thread cadence is primary.
|
||||
|
||||
## What Not To Port Early
|
||||
|
||||
Do not port these until the output spine is proven:
|
||||
|
||||
- DeckLink input
|
||||
- preview GL presentation
|
||||
- screenshot GL readback
|
||||
- HTTP/OSC control services
|
||||
- shader hot reload
|
||||
- persistence
|
||||
- runtime state JSON/OpenAPI
|
||||
- complex telemetry/event dispatch
|
||||
|
||||
These are useful, but they are exactly the kinds of features that can accidentally reintroduce timing coupling.
|
||||
|
||||
## Build Plan
|
||||
|
||||
Initial CMake can follow the probe pattern:
|
||||
|
||||
```cmake
set(RENDER_CADENCE_APP_DIR "${CMAKE_CURRENT_SOURCE_DIR}/apps/RenderCadenceCompositor")

add_executable(RenderCadenceCompositor
  # selected shared DeckLink/video/gl support files
  # new modular app files
)
```
|
||||
|
||||
Later, shared source should be split into libraries:
|
||||
|
||||
```text
|
||||
video_shader_decklink
|
||||
video_shader_videoio
|
||||
video_shader_gl_support
|
||||
render_cadence_core
|
||||
```
|
||||
|
||||
Avoid doing that library split before the first modular app works.
|
||||
|
||||
## VS Code Launch
|
||||
|
||||
Add a separate launch profile:
|
||||
|
||||
```text
|
||||
Debug RenderCadenceCompositor
|
||||
```
|
||||
|
||||
Run it as a console app so telemetry remains visible.
|
||||
|
||||
## Documentation
|
||||
|
||||
Add:
|
||||
|
||||
```text
|
||||
apps/RenderCadenceCompositor/README.md
|
||||
```
|
||||
|
||||
The README should record:
|
||||
|
||||
- intended architecture
|
||||
- build/run instructions
|
||||
- expected telemetry
|
||||
- test result notes
|
||||
- differences from the old app
|
||||
- differences from the probe
|
||||
|
||||
## Success Criteria Before Porting More Features
|
||||
|
||||
Do not start feature porting until the new app can run with:
|
||||
|
||||
- stable smooth DeckLink output
|
||||
- stable target scheduled depth
|
||||
- stable actual DeckLink buffered count
|
||||
- no regular visible freezes
|
||||
- no steady PBO misses
|
||||
- no steadily increasing late/dropped completions
|
||||
- focus/minimize changes do not affect output cadence
|
||||
- clean shutdown without hangs
|
||||
|
||||
This gives us a clean foundation. Once this is true, every feature added later has to prove it does not damage the spine.
|
||||
448
docs/RENDER_THREAD_OWNERSHIP_PLAN.md
Normal file
@@ -0,0 +1,448 @@
|
||||
# Render Thread Ownership Plan
|
||||
|
||||
This plan describes how to make the main compositor behave like the successful `DeckLinkRenderCadenceProbe`: one render cadence owner, one GL context owner, no unrelated work able to interrupt output frame production.
|
||||
|
||||
The goal is not just "all GL calls happen on one thread". The current app mostly does that during runtime already. The real goal is:
|
||||
|
||||
- the output render thread owns its GL context for its whole lifetime
|
||||
- output cadence is driven by the render thread, not by DeckLink completion timing
|
||||
- non-output GL work cannot sit ahead of output frames
|
||||
- callers cannot block the render thread while waiting for synchronous answers
|
||||
- DeckLink scheduling consumes completed system-memory frames and never causes rendering
|
||||
|
||||
## Current Risk Points
|
||||
|
||||
The current main app still has several ways to interrupt output cadence.
|
||||
|
||||
### Shared GL Executor
|
||||
|
||||
`RenderEngine` owns the GL context during runtime, but it acts as a general task executor.
|
||||
|
||||
The same queue/path can run:
|
||||
|
||||
- output frame render
|
||||
- input upload
|
||||
- preview present
|
||||
- screenshot capture
|
||||
- render resets
|
||||
- shader/program commits
|
||||
- resource resize
|
||||
- state clearing
|
||||
|
||||
That means output frames are not guaranteed to be the next GL work item at the selected frame time.
|
||||
|
||||
### Synchronous Output Render Request
|
||||
|
||||
`VideoBackend` drives output production from its output producer thread, then calls:
|
||||
|
||||
```text
VideoBackend
  -> OpenGLVideoIOBridge::RenderScheduledFrame
  -> RenderEngine::RequestOutputFrame
  -> TryInvokeOnRenderThread
```
|
||||
|
||||
That makes output production a request/response interaction. The producer waits for the render thread, and the render thread is still shared with other work.
|
||||
|
||||
### Input Upload Shares Output Context
|
||||
|
||||
DeckLink input capture currently flows into:
|
||||
|
||||
```text
VideoBackend::HandleInputFrame
  -> OpenGLVideoIOBridge::UploadInputFrame
  -> RenderEngine::QueueInputFrame
  -> render thread upload
```
|
||||
|
||||
Even with coalescing, input upload can consume render-thread time and GPU bandwidth directly before output rendering.
|
||||
|
||||
### Preview And Screenshot Share Output Context
|
||||
|
||||
Preview and screenshot are lower-priority features, but today they still execute on the render thread.
|
||||
|
||||
Preview is best-effort at the caller side, but once queued it can still occupy the same context. Screenshot capture can be more expensive because it performs readback and CPU-side image preparation.
|
||||
|
||||
### Startup Context Ownership Is Transitional
|
||||
|
||||
The Win32 startup path creates and binds the GL context before `RenderEngine::StartRenderThread()`.
|
||||
|
||||
That is acceptable as a transitional state, but the final model should make context ownership explicit:
|
||||
|
||||
- bootstrap thread creates the window/context
|
||||
- bootstrap thread releases it
|
||||
- render thread binds it
|
||||
- only render thread initializes GL resources
|
||||
- only render thread destroys GL resources
|
||||
|
||||
### Render Callback Re-enters App State
|
||||
|
||||
`OpenGLRenderPipeline::RenderFrame()` calls a callback into `OpenGLComposite::renderEffect()`.
|
||||
|
||||
That callback builds `RenderFrameInput`, resolves frame state, drains runtime live state, and then calls back into `RenderEngine` to draw the prepared frame.
|
||||
|
||||
This works, but it means the output render path still reaches up into app/runtime code at frame time.
|
||||
|
||||
## Target Runtime Shape
|
||||
|
||||
The main app should match this ownership model:
|
||||
|
||||
```text
runtime/control threads
  -> publish snapshots, live overlays, reset requests, shader-build results
  -> never call GL

render cadence thread
  -> sole owner of output GL context
  -> wakes at selected render cadence
  -> samples latest render input/state
  -> renders one frame
  -> queues async readback/copies completed readback into system-memory slot
  -> publishes completed frame to latest-N output buffer

video output thread
  -> consumes completed system-memory frames
  -> schedules DeckLink frames to target buffer depth
  -> processes completion results
  -> never calls GL

optional input upload path
  -> writes latest input frame into CPU-side latest-frame buffer
  -> render thread imports/uploads at a controlled point in its frame

preview/screenshot path
  -> consumes already-rendered output/system-memory frame when possible
  -> never interrupts output render cadence
```
|
||||
|
||||
## Non-Negotiable Rules
|
||||
|
||||
- The render thread never waits for DeckLink.
|
||||
- DeckLink callbacks never render.
|
||||
- Runtime/control threads never directly execute GL.
|
||||
- Preview and screenshot never execute ahead of output frames.
|
||||
- Input upload is never a separate urgent GL task ahead of output render.
|
||||
- Shader/resource commits are applied only at a frame boundary.
|
||||
- Telemetry on the hot path must be lock-light or try-lock only.
|
||||
- The render thread cadence does not speed up to refill buffers.
|
||||
- If output work overruns, the render thread records the overrun and resumes the selected cadence policy.
|
||||
|
||||
## Implementation Plan
|
||||
|
||||
### 1. Add Thread/Context Ownership Guards
|
||||
|
||||
Add explicit render-thread ownership checks around all GL entry points.
|
||||
|
||||
Deliverables:
|
||||
|
||||
- `RenderEngine` exposes `IsOnRenderThread()` for assertions/tests.
|
||||
- GL-facing classes get debug-only owner checks where practical.
|
||||
- wrong-thread GL access becomes a counted telemetry warning, not just `OutputDebugStringA`.
|
||||
- tests cover that public request methods do not execute GL directly.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- every `RenderEngine` public method is classified as either request-only, lifecycle-only, or render-thread-only.
|
||||
- render-thread-only methods are private or guarded.
|
||||
- no normal runtime caller can accidentally invoke GL work inline.
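
A guard along these lines could back `IsOnRenderThread()` and the counted wrong-thread warning. `RenderThreadGuard`, `BindToCurrentThread()`, and `CheckGlAccess()` are hypothetical names for this sketch, not existing `RenderEngine` members.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

class RenderThreadGuard {
public:
    // Called once by the render thread right after it binds the GL context.
    void BindToCurrentThread() { owner_.store(std::this_thread::get_id()); }

    bool IsOnRenderThread() const { return owner_.load() == std::this_thread::get_id(); }

    // Returns true when GL access is allowed. Wrong-thread access is counted
    // so telemetry can report it instead of relying on debug output alone.
    bool CheckGlAccess() {
        if (IsOnRenderThread()) return true;
        wrongThreadAccesses_.fetch_add(1, std::memory_order_relaxed);
        return false;
    }

    std::uint64_t WrongThreadAccessCount() const {
        return wrongThreadAccesses_.load(std::memory_order_relaxed);
    }

private:
    // A default-constructed id matches no running thread, so every check
    // fails until the render thread binds itself.
    std::atomic<std::thread::id> owner_{std::thread::id{}};
    std::atomic<std::uint64_t> wrongThreadAccesses_{0};
};
```

GL-facing methods would call `CheckGlAccess()` (or assert on `IsOnRenderThread()` in debug builds) at their entry points.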
|
||||
|
||||
### 2. Move GL Initialization Fully Onto The Render Thread
|
||||
|
||||
Start the render thread before compiling shaders and initializing GL resources.
|
||||
|
||||
Current startup does:
|
||||
|
||||
```text
InitOpenGLState()
  -> CompileDecodeShader
  -> CompileOutputPackShader
  -> InitializeResources
  -> CompileLayerPrograms
StartRenderThread()
```
|
||||
|
||||
Move toward:
|
||||
|
||||
```text
create context on Win32 thread
release context on Win32 thread
StartRenderThread()
render thread binds context
render thread initializes extensions, shaders, resources
```
|
||||
|
||||
Deliverables:
|
||||
|
||||
- a single `RenderEngine::StartAndInitialize(RenderInitializationConfig)` path.
|
||||
- GL extension resolution happens on the render thread.
|
||||
- shader/resource initialization is a render-thread startup phase.
|
||||
- `RenderEngine` destructor only destroys resources on the render thread.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- after `StartRenderThread()`, no non-render thread binds or uses the app GL context.
|
||||
- shutdown order is deterministic: stop video output, stop render cadence, destroy GL resources, release context.
|
||||
|
||||
### 3. Replace Synchronous Output Render Requests With Render-Owned Cadence
|
||||
|
||||
Move output cadence out of `VideoBackend` and into the render system.
|
||||
|
||||
Current:
|
||||
|
||||
```text
VideoBackend output producer
  -> cadence tick
  -> acquire output slot
  -> synchronous render-thread request
```
|
||||
|
||||
Target:
|
||||
|
||||
```text
RenderEngine output cadence loop
  -> cadence tick
  -> acquire/free output slot through a non-blocking frame-sink interface
  -> render frame
  -> publish completed frame
```
|
||||
|
||||
Deliverables:
|
||||
|
||||
- introduce `RenderedFrameSink` or similar interface owned by video output.
|
||||
- render thread pulls/claims a free system-memory slot without waiting.
|
||||
- if no free slot exists, render thread drops/recycles the oldest unscheduled completed frame or records backpressure without blocking.
|
||||
- remove `RenderEngine::RequestOutputFrame()` from the steady-state output path.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- output rendering continues even if DeckLink completion is delayed.
|
||||
- no `std::future` wait exists in the output cadence path.
|
||||
- `VideoBackend` no longer owns the producer render loop; it owns scheduling/completion only.
|
||||
|
||||
### 4. Make The Render Thread A Frame Loop, Not A Task Queue
|
||||
|
||||
Keep a command mailbox, but process it only at safe frame-boundary points.
|
||||
|
||||
Frame loop:
|
||||
|
||||
```text
while running:
  wait until next render timestamp
  apply bounded frame-boundary commands
  sample latest frame input/state
  upload latest input frame if enabled and budget allows
  render output frame
  queue/consume readback
  publish completed frame
  record timings
```
|
||||
|
||||
Command classes:
|
||||
|
||||
- frame-boundary commands: reset temporal history, reset shader feedback, commit prepared shader programs
|
||||
- background/low-priority commands: preview, screenshot, diagnostic readback
|
||||
- non-GL commands: state publication, telemetry, persistence
|
||||
|
||||
Deliverables:
|
||||
|
||||
- replace FIFO render task queue with a priority/mailbox model.
|
||||
- output cadence is the loop's main clock.
|
||||
- commands have budget classes and max work per frame.
|
||||
- long commands are deferred rather than blocking the current output tick.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- preview/screenshot cannot run immediately before a due output frame.
|
||||
- reset/shader work is applied between frames and measured.
|
||||
- output render starts within a small jitter window when the GPU is not overrun.
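
The "apply bounded frame-boundary commands" step could be backed by a mailbox like the one sketched below. `FrameBoundaryMailbox` is a hypothetical name, and the budget here is a plain per-frame command count; the plan's budget classes and time-based limits are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

class FrameBoundaryMailbox {
public:
    using Command = std::function<void()>;

    // Any thread may post; nothing runs until the render thread drains.
    void Post(Command command) {
        assert(command);
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.push_back(std::move(command));
    }

    // Called by the render thread between frames. Runs at most maxCommands;
    // anything left over waits for the next frame boundary.
    std::size_t Drain(std::size_t maxCommands) {
        std::size_t ran = 0;
        while (ran < maxCommands) {
            Command command;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (pending_.empty()) break;
                command = std::move(pending_.front());
                pending_.pop_front();
            }
            command();  // run outside the lock so posters never wait on it
            ++ran;
        }
        return ran;
    }

    std::size_t PendingCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return pending_.size();
    }

private:
    mutable std::mutex mutex_;
    std::deque<Command> pending_;
};
```

With this shape, a burst of reset or commit requests is spread over several frame boundaries instead of delaying a single output tick.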
|
||||
|
||||
### 5. Move Input Capture To A CPU Latest-Frame Buffer
|
||||
|
||||
Input capture should not enqueue independent GL upload tasks.
|
||||
|
||||
Target:
|
||||
|
||||
```text
DeckLink input callback
  -> copy/coalesce latest CPU input frame
  -> return quickly

render thread frame boundary
  -> if input version changed, upload latest frame
  -> render using last successfully uploaded input texture
```
|
||||
|
||||
Deliverables:
|
||||
|
||||
- introduce `InputFrameMailbox` with latest-frame semantics.
|
||||
- remove `RenderEngine::QueueInputFrame()` from the callback path.
|
||||
- render thread owns the upload moment.
|
||||
- if upload would exceed budget, render thread can reuse the previous input texture and record an input-upload skip.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- input capture enabled does not create arbitrary render-thread tasks.
|
||||
- output cadence remains stable when input frames arrive.
|
||||
- telemetry separates input-frame arrival, upload count, upload skips, and upload cost.
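
A latest-frame mailbox with a version counter is one way to get these semantics. The sketch below is illustrative: `Publish()`, `TakeIfNewer()`, and the `InputFrame` struct are assumed names, and it copies the frame under a mutex, which a real implementation might replace with buffer swapping to avoid copying full video frames.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

struct InputFrame {
    std::uint64_t version = 0;  // 0 means "no frame published yet"
    std::vector<std::uint8_t> bytes;
};

class InputFrameMailbox {
public:
    // Capture side: overwrite whatever is pending, so older frames that
    // were never uploaded are coalesced away.
    void Publish(std::vector<std::uint8_t> bytes) {
        assert(!bytes.empty());
        std::lock_guard<std::mutex> lock(mutex_);
        latest_.bytes = std::move(bytes);
        latest_.version = ++nextVersion_;
        ++arrivals_;
    }

    // Render side, at a frame boundary: returns the latest frame only if it
    // is newer than the version the renderer last uploaded.
    std::optional<InputFrame> TakeIfNewer(std::uint64_t lastUploadedVersion) const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (latest_.version == 0 || latest_.version <= lastUploadedVersion) return std::nullopt;
        return latest_;
    }

    std::uint64_t Arrivals() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return arrivals_;
    }

private:
    mutable std::mutex mutex_;
    InputFrame latest_;
    std::uint64_t nextVersion_ = 0;
    std::uint64_t arrivals_ = 0;
};
```

The render thread keeps the last uploaded version itself, which lets it skip an upload when over budget and retry at the next frame boundary without losing the newest frame.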
|
||||
|
||||
### 6. Move Preview To A Consumer Path
|
||||
|
||||
Preview should consume the latest completed output image instead of asking the output GL context to present.
|
||||
|
||||
Options:
|
||||
|
||||
- CPU preview from latest system-memory output frame.
|
||||
- a separate preview GL context fed asynchronously from completed frames.
|
||||
- a low-priority render-thread blit only when output has measurable slack.
|
||||
|
||||
Recommended first step:
|
||||
|
||||
- use latest system-memory BGRA8 output for the window preview.
|
||||
|
||||
Deliverables:
|
||||
|
||||
- preview reads from latest completed/scheduled output frame copy.
|
||||
- `TryPresentPreview()` no longer queues GL work on the output render thread.
|
||||
- preview FPS throttling remains caller-side.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- forcing preview cannot delay output rendering.
|
||||
- minimizing/focusing the window does not affect output cadence.
|
||||
|
||||
### 7. Move Screenshot To Completed Frame Capture
|
||||
|
||||
Screenshot should capture from the latest completed output frame unless an explicit "exact render capture" mode is requested.
|
||||
|
||||
Deliverables:
|
||||
|
||||
- screenshot request reads the latest system-memory output frame.
|
||||
- PNG write remains async.
|
||||
- optional diagnostic exact-GL screenshot is disabled during live output or explicitly marked disruptive.
|
||||
|
||||
Acceptance:
|
||||
|
||||
- screenshot request does not call `glReadPixels` on the output render context during steady-state playout.

### 8. Make Shader Commits Frame-Boundary Work

Prepared shader builds are CPU/background work; GL program commit is still GL work.

Deliverables:

- shader build queue produces `PreparedShaderBuild`.
- render thread sees latest pending prepared build at a frame boundary.
- commit is applied only between frames.
- expensive commits can temporarily enter a measured "render reconfigure" state.

Acceptance:

- shader commits do not interleave midway through output render.
- output timing telemetry records commit duration separately from normal render duration.
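
One way to picture the frame-boundary rule is a single pending-build slot that the render loop drains before each frame. This is a sketch under assumptions: the real contents of `PreparedShaderBuild` are not shown in this plan, and `PendingShaderBuildSlot`, `FrameTiming`, and `RunFrame` are hypothetical names. The GL commit and the output render are passed in as callbacks so the ordering and timing logic can be exercised without a GL context.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Placeholder for the background build result; the real fields are not part of this sketch.
struct PreparedShaderBuild {
    std::string name;
};

// Holds only the newest pending build. A later Submit supersedes an uncommitted one.
class PendingShaderBuildSlot {
public:
    void Submit(PreparedShaderBuild build) {
        auto next = std::make_unique<PreparedShaderBuild>(std::move(build));
        std::lock_guard<std::mutex> lock(mutex_);
        pending_ = std::move(next);
    }

    // Returns the pending build (or nullptr) and leaves the slot empty.
    std::unique_ptr<PreparedShaderBuild> Take() {
        std::lock_guard<std::mutex> lock(mutex_);
        return std::move(pending_);
    }

private:
    std::mutex mutex_;
    std::unique_ptr<PreparedShaderBuild> pending_;
};

struct FrameTiming {
    std::chrono::nanoseconds commitDuration{0};
    std::chrono::nanoseconds renderDuration{0};
    bool committed = false;
};

// One render-loop iteration: any pending commit runs strictly before the output
// render, and the two durations are recorded separately.
inline FrameTiming RunFrame(PendingShaderBuildSlot& slot,
                            const std::function<void(const PreparedShaderBuild&)>& commitGl,
                            const std::function<void()>& renderOutput) {
    assert(commitGl && renderOutput);
    using Clock = std::chrono::steady_clock;
    FrameTiming timing;

    if (auto build = slot.Take()) {
        const auto start = Clock::now();
        commitGl(*build);
        timing.commitDuration =
            std::chrono::duration_cast<std::chrono::nanoseconds>(Clock::now() - start);
        timing.committed = true;
    }

    const auto start = Clock::now();
    renderOutput();
    timing.renderDuration =
        std::chrono::duration_cast<std::chrono::nanoseconds>(Clock::now() - start);
    return timing;
}
```

Because the slot is drained only at the top of `RunFrame`, a build submitted while `renderOutput` is running waits for the next frame boundary rather than interleaving with the current render.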

### 9. Split Output Scheduling From Rendering Completely

`VideoBackend` should become a playout/scheduling owner, not a render producer.

Target:

```text
RenderEngine
  -> produces completed frames at render cadence

VideoBackend
  -> schedules completed frames up to target DeckLink depth
  -> processes completions
  -> releases scheduled slots
```

Deliverables:

- `VideoBackend` owns `SystemOutputFramePool`, or a new `SystemFrameExchange` owns it between render/video.
- render thread publishes completed frames into the exchange.
- video output thread schedules from the exchange.
- no render calls exist in completion handling or scheduling paths.

Acceptance:

- DeckLink buffer depth changes cannot directly cause render-thread wakeups except through non-blocking availability signals.
- render cadence can be tested without DeckLink by using a fake frame sink.
- video scheduling can be tested without GL by using synthetic frames.
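
The render/video split above can be sketched as a small sink interface plus a bounded exchange. The names here (`CompletedFrame`, `IFrameSink`, `BoundedFrameExchange`) are hypothetical and are not the plan's `SystemFrameExchange` API. The full-queue policy is also an assumption: this sketch rejects the newest frame and counts the rejection, whereas a real exchange might prefer to drop the oldest frame instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical completed frame in system memory.
struct CompletedFrame {
    std::uint64_t frameIndex;
    std::vector<std::uint8_t> pixels;
};

// What the render side sees. A test can implement this with a fake sink.
class IFrameSink {
public:
    virtual ~IFrameSink() = default;
    // Must not block. Returns false when the frame was not accepted.
    virtual bool TryPublish(CompletedFrame frame) = 0;
};

// Bounded queue between the render thread (producer) and the video thread (consumer).
class BoundedFrameExchange : public IFrameSink {
public:
    explicit BoundedFrameExchange(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    bool TryPublish(CompletedFrame frame) override {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.size() >= capacity_) {
            ++rejected_;  // assumed policy: reject the newest frame when full
            return false;
        }
        frames_.push_back(std::move(frame));
        return true;
    }

    // Called by the scheduling side. Returns false when no completed frame is waiting.
    bool TryTakeOldest(CompletedFrame& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (frames_.empty()) {
            return false;
        }
        out = std::move(frames_.front());
        frames_.pop_front();
        return true;
    }

    std::size_t Depth() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return frames_.size();
    }

    std::uint64_t RejectedCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return rejected_;
    }

private:
    const std::size_t capacity_;
    mutable std::mutex mutex_;
    std::deque<CompletedFrame> frames_;
    std::uint64_t rejected_ = 0;
};
```

With this shape, a cadence test can hand the render loop any `IFrameSink` without DeckLink present, and a scheduling test can fill the exchange with synthetic `CompletedFrame` values without creating a GL context.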

### 10. Preserve The Probe As The Reference Contract

The `DeckLinkRenderCadenceProbe` is now the control sample.

Deliverables:

- document which main-app components correspond to the probe components.
- add a small regression checklist:
  - render FPS near target
  - schedule FPS near target
  - DeckLink buffered frames stable
  - no late/drop frames
  - no PBO misses or readback stalls
  - focus/minimize does not change output cadence

Acceptance:

- after each migration step, compare the main app telemetry against the probe's known-good behavior.
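
Most of the regression checklist above can be expressed as a small comparison function over a telemetry snapshot. This is an illustrative sketch only: `CadenceSnapshot`, `CadenceTolerance`, and `CheckCadence` are hypothetical names, the snapshot fields are assumptions about what the telemetry exposes, and the tolerances are supplied by the caller because this plan does not define numeric thresholds. The focus/minimize item is a manual check and is not covered here.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical telemetry snapshot taken over one measurement window.
struct CadenceSnapshot {
    double renderFps;
    double scheduleFps;
    int minBufferedFrames;  // lowest DeckLink buffered-frame count seen in the window
    int maxBufferedFrames;  // highest DeckLink buffered-frame count seen in the window
    long lateFrames;
    long droppedFrames;
    long pboMisses;
};

// Caller-chosen limits; the plan itself does not specify values.
struct CadenceTolerance {
    double fpsDelta;          // allowed |fps - target|
    int bufferedFrameSpread;  // allowed (max - min) buffered frames
};

// Returns one message per failed checklist item; an empty result means all passed.
inline std::vector<std::string> CheckCadence(const CadenceSnapshot& s,
                                             double targetFps,
                                             const CadenceTolerance& tol) {
    assert(targetFps > 0.0);
    std::vector<std::string> failures;
    if (std::fabs(s.renderFps - targetFps) > tol.fpsDelta) {
        failures.push_back("render FPS not near target");
    }
    if (std::fabs(s.scheduleFps - targetFps) > tol.fpsDelta) {
        failures.push_back("schedule FPS not near target");
    }
    if (s.maxBufferedFrames - s.minBufferedFrames > tol.bufferedFrameSpread) {
        failures.push_back("DeckLink buffered frames not stable");
    }
    if (s.lateFrames != 0 || s.droppedFrames != 0) {
        failures.push_back("late or dropped frames reported");
    }
    if (s.pboMisses != 0) {
        failures.push_back("PBO misses reported");
    }
    return failures;
}
```

Running the same function against a snapshot from the probe and a snapshot from the main app, under the same output mode, gives a like-for-like comparison after each migration step.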

## Suggested Order Of Work

1. Add ownership guards and classify render methods.
2. Move GL initialization/destruction fully onto the render thread.
3. Introduce a render-owned cadence loop behind a feature flag.
4. Add a frame-sink/exchange interface between render and video.
5. Move output production from `VideoBackend` to the render cadence loop.
6. Convert input upload to latest-frame mailbox semantics.
7. Move preview to completed-frame consumption.
8. Move screenshot to completed-frame capture.
9. Convert shader commits/resets to frame-boundary mailbox commands.
10. Remove old synchronous output render request path.

## Feature Flags During Migration

Use flags only to keep testing safe, not as long-term compatibility layers.

Suggested flags:

```text
VST_RENDER_CADENCE_OWNER=render_thread
VST_DISABLE_INPUT_CAPTURE=1
VST_PREVIEW_SOURCE=system_frame
VST_SCREENSHOT_SOURCE=system_frame
```

Remove each flag once the new behavior is proven and becomes the only supported path.
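
If the suggested flags are read as environment variables (an assumption; the plan does not say how they are delivered), loading them once at startup could look like the sketch below. `MigrationFlags`, `EnvLookup`, `ReadProcessEnv`, and `LoadMigrationFlags` are hypothetical names. The lookup is injectable so the parsing can be tested without touching the process environment.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <string>

// Hypothetical parsed view of the migration flags. Every flag defaults to the old path.
struct MigrationFlags {
    bool renderThreadOwnsCadence = false;
    bool disableInputCapture = false;
    bool previewFromSystemFrame = false;
    bool screenshotFromSystemFrame = false;
};

using EnvLookup = std::function<std::string(const char*)>;

// Default lookup: reads the process environment; an unset variable yields "".
inline std::string ReadProcessEnv(const char* name) {
    assert(name != nullptr);
    const char* value = std::getenv(name);
    return value != nullptr ? std::string(value) : std::string();
}

// Only the exact values listed in the plan enable a flag; anything else keeps the default.
inline MigrationFlags LoadMigrationFlags(const EnvLookup& lookup = ReadProcessEnv) {
    MigrationFlags flags;
    flags.renderThreadOwnsCadence = lookup("VST_RENDER_CADENCE_OWNER") == "render_thread";
    flags.disableInputCapture = lookup("VST_DISABLE_INPUT_CAPTURE") == "1";
    flags.previewFromSystemFrame = lookup("VST_PREVIEW_SOURCE") == "system_frame";
    flags.screenshotFromSystemFrame = lookup("VST_SCREENSHOT_SOURCE") == "system_frame";
    return flags;
}
```

Keeping all flags in one struct makes removal straightforward: once a new path is the only supported one, the field and its branch can be deleted together.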

## Telemetry Needed

Add or preserve counters for:

- render tick jitter
- render tick overrun
- output render duration
- GL command mailbox depth by class
- frame-boundary command duration
- input upload duration and skips
- readback queue/consume duration
- completed system-memory frame depth
- scheduled DeckLink frame depth
- DeckLink actual buffered frames
- preview frames consumed
- screenshot requests served from system memory

The key metric is whether output render starts on time. Buffer depth alone is not enough; a full buffer can still contain stale or repeated frames.
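
The "starts on time" metric can be made concrete with a small monitor that compares each tick's scheduled start against its actual start. This is a sketch with assumed definitions: start lateness is `actualStart - scheduledStart`, jitter is its absolute value, and an overrun is a render that finishes after the next tick was due. `RenderTickStats` and `RenderTickMonitor` are hypothetical names, and times are passed as offsets from an arbitrary epoch so the logic is deterministic in tests.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

using Nanos = std::chrono::nanoseconds;

struct RenderTickStats {
    std::uint64_t ticks = 0;
    std::uint64_t overruns = 0;    // renders that ended after the next tick was due
    Nanos maxStartLateness{0};     // worst late start observed
    Nanos totalAbsStartJitter{0};  // sum of |actualStart - scheduledStart|
};

class RenderTickMonitor {
public:
    explicit RenderTickMonitor(Nanos frameDuration) : frameDuration_(frameDuration) {
        assert(frameDuration > Nanos::zero());
    }

    // All arguments are offsets from the same epoch (for example, render-loop start).
    void Record(Nanos scheduledStart, Nanos actualStart, Nanos actualEnd) {
        const Nanos delta = actualStart - scheduledStart;
        const Nanos absDelta = delta < Nanos::zero() ? -delta : delta;

        stats_.ticks += 1;
        stats_.totalAbsStartJitter += absDelta;
        if (delta > stats_.maxStartLateness) {
            stats_.maxStartLateness = delta;
        }
        if (actualEnd > scheduledStart + frameDuration_) {
            stats_.overruns += 1;
        }
    }

    const RenderTickStats& Stats() const { return stats_; }

private:
    Nanos frameDuration_;
    RenderTickStats stats_;
};
```

For 59.94 fps output the frame duration is 1001/60000 s, roughly 16,683,333 ns. A monitor built with that value reports a healthy buffer with late starts as unhealthy, which is the case buffer depth alone would miss.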

## Completion Definition

This work is complete when:

- the output render thread owns the app GL context from initialization through shutdown
- output rendering is driven by the render thread's selected frame cadence
- no non-output task can run ahead of a due output frame
- `VideoBackend` never asks the render thread to render synchronously
- DeckLink scheduling consumes already completed system-memory frames
- input upload, preview, screenshot, shader commits, and resets are all frame-boundary, mailbox, or consumer-side operations
- main-app telemetry approaches the cadence probe behavior under the same output mode