V2 working

Aiden
2026-05-12 01:59:02 +10:00
parent 2531d871e8
commit e0ca548ef5
32 changed files with 3492 additions and 0 deletions

@@ -4,6 +4,14 @@ This document describes how the application currently works.
It replaces the phase-by-phase design trail as the best entry point for understanding the repo. The older phase documents remain useful history, but they mix implementation notes, experiments, and target designs. This document is organized by current runtime behavior and subsystem ownership instead.
The active plan for tightening render-thread ownership is:
- [Render Thread Ownership Plan](RENDER_THREAD_OWNERSHIP_PLAN.md)
The plan for building a fresh modular app around the proven probe architecture is:
- [New Render Cadence App Plan](NEW_RENDER_CADENCE_APP_PLAN.md)
## Application Shape
The app is a live OpenGL compositor with DeckLink input/output, runtime control services, persistent layer-stack state, live state overlays, health telemetry, and a small internal event model.

@@ -0,0 +1,557 @@
# New Render Cadence App Plan
This plan describes a new application folder that rebuilds the output path from the proven `DeckLinkRenderCadenceProbe` architecture, but as a maintainable app foundation rather than a monolithic probe file.
The first goal is not to port the current compositor feature set. The first goal is to reproduce the probe's smooth 59.94/60 fps DeckLink output with clean module boundaries, tests where possible, and a structure that can later accept the shader/runtime/control systems without compromising timing.
## Working Name
Suggested folder:
```text
apps/RenderCadenceCompositor
```
Suggested executable:
```text
RenderCadenceCompositor
```
The existing app remains intact:
```text
apps/LoopThroughWithOpenGLCompositing
```
The probe remains the control sample:
```text
apps/DeckLinkRenderCadenceProbe
```
## Design Principle
The app is built around one spine:
```text
Render cadence thread
  -> owns GL context
  -> renders at selected frame cadence
  -> performs async BGRA8 readback
  -> publishes completed system-memory frames

System frame exchange
  -> owns Free / Rendering / Completed / Scheduled slots
  -> latest-N semantics for completed unscheduled frames
  -> protects scheduled frames until DeckLink completion

DeckLink output thread
  -> consumes completed frames
  -> schedules to target buffer depth
  -> releases scheduled frames on completion
  -> never renders
```
Everything else must fit around that spine.
## Non-Negotiable Rules
- The render thread owns its GL context from initialization to shutdown.
- The render thread is driven by selected render cadence, not DeckLink demand.
- DeckLink scheduling never calls render code.
- Completion callbacks never render.
- No synchronous render request exists in the output path.
- Preview, screenshot, input upload, shader rebuild, and runtime control cannot run ahead of a due output frame.
- Completed unscheduled frames are latest-N and disposable.
- Scheduled frames are protected until DeckLink completion.
- Startup warms up real rendered frames before scheduled playback starts.
## Borrow From The Probe
Keep these behaviors from `DeckLinkRenderCadenceProbe`:
- hidden OpenGL context owned by the render thread
- simple render loop with `nextRenderTime`
- BGRA8 render target
- PBO ring readback
- non-blocking fence polling with zero timeout
- system-memory slots with `Free`, `Rendering`, `Completed`, `Scheduled`
- drop oldest completed unscheduled frame if render needs space
- DeckLink playout thread only schedules completed frames
- warmup completed frames before `StartScheduledPlayback()`
- one-line-per-second timing telemetry
## Do Not Borrow Directly
The probe is deliberately compact. Do not carry over these probe limitations into the new app:
- one huge `.cpp` file
- hard-coded output mode as permanent behavior
- render pattern, frame store, PBO logic, DeckLink playout, COM setup, and telemetry mixed together
- no reusable interfaces
- no unit-testable non-GL core
## Proposed Folder Structure
```text
apps/RenderCadenceCompositor/
  README.md
  RenderCadenceCompositor.cpp
  app/
    RenderCadenceApp.cpp
    RenderCadenceApp.h
    AppConfig.cpp
    AppConfig.h
  platform/
    ComInit.cpp
    ComInit.h
    HiddenGlWindow.cpp
    HiddenGlWindow.h
    Win32Console.cpp
    Win32Console.h
  render/
    RenderThread.cpp
    RenderThread.h
    RenderCadenceClock.cpp
    RenderCadenceClock.h
    SimpleMotionRenderer.cpp
    SimpleMotionRenderer.h
    Bgra8ReadbackPipeline.cpp
    Bgra8ReadbackPipeline.h
    PboReadbackRing.cpp
    PboReadbackRing.h
  frames/
    SystemFrameExchange.cpp
    SystemFrameExchange.h
    SystemFrameTypes.h
  video/
    DeckLinkOutput.cpp
    DeckLinkOutput.h
    DeckLinkOutputThread.cpp
    DeckLinkOutputThread.h
  telemetry/
    CadenceTelemetry.cpp
    CadenceTelemetry.h
    TelemetryPrinter.cpp
    TelemetryPrinter.h
The new app can reuse selected existing source files from the current app at first:
- `videoio/decklink/DeckLinkSession.*`
- `videoio/decklink/DeckLinkDisplayMode.*`
- `videoio/decklink/DeckLinkVideoIOFormat.*`
- `videoio/decklink/DeckLinkFrameTransfer.*`
- `videoio/VideoIOFormat.*`
- `videoio/VideoIOTypes.h`
- `videoio/VideoPlayoutScheduler.*`
- `gl/renderer/GLExtensions.*`
Longer term, shared code should move into common libraries, but the first version can link these files directly to avoid a big build-system refactor.
## Module Responsibilities
### `RenderCadenceApp`
Owns top-level startup/shutdown sequencing.
Responsibilities:
- initialize COM
- discover/select DeckLink output
- create frame exchange
- start render thread
- wait for completed-frame warmup
- start DeckLink output thread
- wait for scheduled buffer warmup
- start DeckLink scheduled playback
- start telemetry printer
- stop in reverse order
It should not contain OpenGL drawing code, frame slot policy, or DeckLink scheduling loops.
### `AppConfig`
Owns runtime settings for the initial app.
Initial settings:
- output mode preference
- output width/height validation
- frame buffer capacity
- PBO depth
- warmup completed-frame count
- target DeckLink scheduled depth
- telemetry interval
Initial values should match the successful probe:
```text
systemFrameSlots = 12
pboDepth = 6
warmupFrames = 4
targetDeckLinkBufferedFrames = 4
pixelFormat = BGRA8
```
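As a rough illustration, the initial values above could live in a small settings struct with a validation helper. This is a minimal sketch: the field names mirror the list above, but `ValidateAppConfig` and the specific checks it performs are assumptions, not rules stated by this plan.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical settings struct. Defaults match the probe values listed above.
struct AppConfig {
    std::size_t systemFrameSlots = 12;
    std::size_t pboDepth = 6;
    std::size_t warmupFrames = 4;
    std::size_t targetDeckLinkBufferedFrames = 4;
    std::string pixelFormat = "BGRA8";
};

// Returns an empty string when the config looks usable, otherwise a reason.
// The checks below are illustrative assumptions.
inline std::string ValidateAppConfig(const AppConfig& config) {
    if (config.pixelFormat != "BGRA8") {
        return "only BGRA8 is supported in the first version";
    }
    if (config.pboDepth == 0) {
        return "pboDepth must be at least 1";
    }
    if (config.warmupFrames == 0) {
        return "warmupFrames must be at least 1";
    }
    // Scheduled frames stay protected in system-memory slots, so the pool
    // needs room for the scheduled depth plus at least one slot to render into.
    if (config.systemFrameSlots < config.targetDeckLinkBufferedFrames + 1) {
        return "systemFrameSlots is too small for the target scheduled depth";
    }
    if (config.warmupFrames > config.systemFrameSlots) {
        return "warmupFrames cannot exceed systemFrameSlots";
    }
    return {};
}
```

Validating once at startup keeps the render and output threads free of configuration checks.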
### `HiddenGlWindow`
Owns hidden Win32 window, device context, and OpenGL context creation.
Responsibilities:
- create hidden window with `CS_OWNDC`
- choose/set pixel format
- create `HGLRC`
- expose `MakeCurrent()` and `ClearCurrent()`
- destroy context/window safely
Only `RenderThread` should call `MakeCurrent()` after startup.
### `RenderThread`
Owns the render loop and GL context for its full lifetime.
Responsibilities:
- create/bind hidden GL context
- resolve GL extensions
- initialize renderer/readback pipeline
- run cadence loop
- render one frame when due
- queue PBO readback
- consume completed PBOs into `SystemFrameExchange`
- record telemetry
- destroy GL resources on the render thread
It must not:
- wait for DeckLink
- schedule DeckLink frames
- block waiting for a system frame slot when a completed unscheduled frame can be dropped instead
- accept arbitrary GL tasks ahead of output frames
### `RenderCadenceClock`
Small, testable cadence helper.
Responsibilities:
- track target frame duration
- return whether a render is due
- compute sleep duration
- detect overrun/skipped ticks
- never speed up to fill buffers
This should be unit tested without GL.
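A minimal sketch of such a helper follows. Time is passed in explicitly so tests can drive it without sleeping. The method names and the overrun policy (skip every tick that has already passed, so render starts stay on the original grid) are assumptions; the plan only requires that the clock never speeds up to refill buffers.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Illustrative cadence helper. Names and policy details are assumptions.
class RenderCadenceClock {
public:
    using Clock = std::chrono::steady_clock;
    using Duration = Clock::duration;
    using TimePoint = Clock::time_point;

    RenderCadenceClock(Duration frameDuration, TimePoint firstRenderTime)
        : frameDuration_(frameDuration), nextRenderTime_(firstRenderTime) {
        assert(frameDuration_ > Duration::zero());
    }

    bool IsDue(TimePoint now) const { return now >= nextRenderTime_; }

    Duration SleepDuration(TimePoint now) const {
        return IsDue(now) ? Duration::zero() : nextRenderTime_ - now;
    }

    // Call when a frame finishes rendering. Advances one tick, then skips
    // every further tick that has already passed, so the loop never renders
    // a burst to catch up. Returns the number of ticks skipped.
    std::uint64_t MarkRendered(TimePoint now) {
        nextRenderTime_ += frameDuration_;
        std::uint64_t skipped = 0;
        while (nextRenderTime_ <= now) {
            nextRenderTime_ += frameDuration_;
            ++skipped;
        }
        if (skipped > 0) {
            ++overruns_;
        }
        skippedTicks_ += skipped;
        return skipped;
    }

    std::uint64_t Overruns() const { return overruns_; }
    std::uint64_t SkippedTicks() const { return skippedTicks_; }

private:
    Duration frameDuration_;
    TimePoint nextRenderTime_;
    std::uint64_t overruns_ = 0;
    std::uint64_t skippedTicks_ = 0;
};
```

Because the clock takes no dependency on GL or DeckLink, the overrun and skipped-tick cases can be covered with plain time points.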
### `SimpleMotionRenderer`
First renderer only.
Responsibilities:
- render obvious smooth motion and color changes
- produce BGRA8-compatible framebuffer content
- make dropped/repeated frames visually obvious
This intentionally avoids shader-package/runtime complexity.
### `Bgra8ReadbackPipeline`
Owns output framebuffer and BGRA8 readback orchestration.
Responsibilities:
- configure render target dimensions
- render into an RGBA8/BGRA-compatible texture
- coordinate `PboReadbackRing`
- publish completed frames into `SystemFrameExchange`
### `PboReadbackRing`
Owns PBO/fence state.
Responsibilities:
- queue readback into the next free PBO slot
- poll completed fences with zero timeout
- map/copy completed PBOs into provided system-memory slots
- count PBO misses
- clean up fences/PBOs on render thread
This is GL-backed, but the state model should be small and easy to reason about.
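One way to keep the state model small is to separate the ring bookkeeping from the GL calls. The sketch below is an assumption about how that split could look: `PboRingState` and its methods are hypothetical names, and the `isSignaled` callback stands in for a zero-timeout fence check such as `glClientWaitSync(fence, 0, 0)` on the render thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// GL-free bookkeeping for a PBO readback ring (hypothetical helper).
class PboRingState {
public:
    explicit PboRingState(std::size_t depth) : inFlight_(depth, false) {
        assert(depth > 0);
    }

    // Returns the slot to issue the next readback into. If that slot is
    // still in flight, counts a miss and returns nothing; it never waits.
    std::optional<std::size_t> TryQueue() {
        if (inFlight_[writeIndex_]) {
            ++queueMisses_;
            return std::nullopt;
        }
        const std::size_t slot = writeIndex_;
        inFlight_[slot] = true;
        writeIndex_ = (writeIndex_ + 1) % inFlight_.size();
        ++pending_;
        return slot;
    }

    // Polls only the oldest in-flight slot. isSignaled(slot) must be a
    // non-blocking check; the caller maps and copies the PBO on success.
    std::optional<std::size_t> TryConsume(
        const std::function<bool(std::size_t)>& isSignaled) {
        if (pending_ == 0 || !isSignaled(readIndex_)) {
            return std::nullopt;
        }
        const std::size_t slot = readIndex_;
        inFlight_[slot] = false;
        readIndex_ = (readIndex_ + 1) % inFlight_.size();
        --pending_;
        return slot;
    }

    std::uint64_t QueueMisses() const { return queueMisses_; }
    std::size_t Pending() const { return pending_; }

private:
    std::vector<bool> inFlight_;
    std::size_t writeIndex_ = 0;
    std::size_t readIndex_ = 0;
    std::size_t pending_ = 0;
    std::uint64_t queueMisses_ = 0;
};
```

With this split, the queue-miss and ordering behavior can be tested without a GL context, while the GL-backed class only translates slots into PBO and fence handles.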
### `SystemFrameExchange`
The central handoff between render and video.
Responsibilities:
- own system-memory frame buffers
- track slot states: `Free`, `Rendering`, `Completed`, `Scheduled`
- provide `AcquireForRender()`
- provide `PublishCompleted()`
- provide `ConsumeCompletedForSchedule()`
- provide `ReleaseScheduledByBytes()`
- drop oldest completed unscheduled frame when render needs a slot
- expose metrics
This should be unit tested heavily.
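The slot policy above can be sketched as a small mutex-guarded state machine. This is an illustrative outline, not the final class: slot handles are plain indices, completed frames are consumed oldest-first, and the metric accessors are hypothetical. Generation or stale-handle checks are left out for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <vector>

enum class SlotState { Free, Rendering, Completed, Scheduled };

class SystemFrameExchange {
public:
    SystemFrameExchange(std::size_t slotCount, std::size_t bytesPerFrame)
        : slots_(slotCount) {
        for (Slot& slot : slots_) slot.bytes.resize(bytesPerFrame);
    }

    // Render side, never blocks: take a Free slot, else recycle the oldest
    // Completed (unscheduled) slot. Scheduled slots are never taken.
    std::optional<std::size_t> AcquireForRender() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::optional<std::size_t> pick = FindLocked(SlotState::Free);
        if (!pick) {
            pick = FindLocked(SlotState::Completed);
            if (pick) ++completedDrops_;
        }
        if (!pick) {
            ++acquireMisses_;
            return std::nullopt;
        }
        slots_[*pick].state = SlotState::Rendering;
        return pick;
    }

    void PublishCompleted(std::size_t index) {
        std::lock_guard<std::mutex> lock(mutex_);
        Slot& slot = slots_[index];
        if (slot.state != SlotState::Rendering) return;
        slot.state = SlotState::Completed;
        slot.sequence = ++nextSequence_;
    }

    // Output side: the oldest completed frame becomes Scheduled.
    std::optional<std::size_t> ConsumeCompletedForSchedule() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::optional<std::size_t> pick = FindLocked(SlotState::Completed);
        if (pick) slots_[*pick].state = SlotState::Scheduled;
        return pick;
    }

    // Completion side: free the scheduled slot that owns this buffer.
    bool ReleaseScheduledByBytes(const void* bytes) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (Slot& slot : slots_) {
            if (slot.state == SlotState::Scheduled && slot.bytes.data() == bytes) {
                slot.state = SlotState::Free;
                return true;
            }
        }
        return false;
    }

    std::uint8_t* Bytes(std::size_t index) { return slots_[index].bytes.data(); }
    std::uint64_t CompletedDrops() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return completedDrops_;
    }
    std::uint64_t AcquireMisses() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return acquireMisses_;
    }

private:
    struct Slot {
        SlotState state = SlotState::Free;
        std::uint64_t sequence = 0;  // publish order, used to find the oldest
        std::vector<std::uint8_t> bytes;
    };

    // Returns the matching slot with the lowest sequence number.
    std::optional<std::size_t> FindLocked(SlotState wanted) const {
        std::optional<std::size_t> best;
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (slots_[i].state != wanted) continue;
            if (!best || slots_[i].sequence < slots_[*best].sequence) best = i;
        }
        return best;
    }

    mutable std::mutex mutex_;
    std::vector<Slot> slots_;
    std::uint64_t nextSequence_ = 0;
    std::uint64_t completedDrops_ = 0;
    std::uint64_t acquireMisses_ = 0;
};
```

The key property to test is that `AcquireForRender()` only ever fails when every slot is `Rendering` or `Scheduled`.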
### `DeckLinkOutput`
Thin wrapper around `DeckLinkSession` for output-only use.
Responsibilities:
- discover/select output mode
- configure output callback
- prepare output schedule
- schedule app-owned system-memory frames
- start scheduled playback
- stop/release resources
- expose actual DeckLink buffered count
No input support in the first version.
### `DeckLinkOutputThread`
Owns playout scheduling loop.
Responsibilities:
- keep scheduled depth near target
- consume completed frames from `SystemFrameExchange`
- schedule them through `DeckLinkOutput`
- release frame if scheduling fails
- sleep briefly when scheduled buffer is full or no completed frame exists
It must not render.
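One pass of that scheduling loop might look like the sketch below. The two interfaces are hypothetical seams standing in for `SystemFrameExchange` and `DeckLinkOutput`, and the fakes exist only so the policy can be exercised without hardware. The real thread would sleep briefly after each pass.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>

// Hypothetical seams; slot handles are plain indices in this sketch.
struct ICompletedFrameSource {
    virtual ~ICompletedFrameSource() = default;
    virtual std::optional<std::size_t> ConsumeCompletedForSchedule() = 0;
    virtual void ReleaseScheduled(std::size_t slot) = 0;
};

struct IScheduledOutput {
    virtual ~IScheduledOutput() = default;
    virtual std::size_t BufferedFrameCount() const = 0;
    virtual bool ScheduleFrame(std::size_t slot) = 0;
};

struct ScheduleStepResult {
    std::size_t scheduled = 0;
    std::size_t failed = 0;
    bool underrun = false;  // depth below target but no completed frame ready
};

// Tops the output up to targetDepth using only completed frames.
// It never renders and never waits for a frame to appear.
inline ScheduleStepResult ScheduleToTargetDepth(ICompletedFrameSource& frames,
                                                IScheduledOutput& output,
                                                std::size_t targetDepth) {
    ScheduleStepResult result;
    while (output.BufferedFrameCount() < targetDepth) {
        std::optional<std::size_t> slot = frames.ConsumeCompletedForSchedule();
        if (!slot) {
            result.underrun = true;
            break;
        }
        if (output.ScheduleFrame(*slot)) {
            ++result.scheduled;
        } else {
            frames.ReleaseScheduled(*slot);  // give the slot back on failure
            ++result.failed;
        }
    }
    return result;
}

// Test doubles (illustrative only).
struct FakeFrames final : ICompletedFrameSource {
    std::deque<std::size_t> completed;
    std::size_t released = 0;
    std::optional<std::size_t> ConsumeCompletedForSchedule() override {
        if (completed.empty()) return std::nullopt;
        const std::size_t slot = completed.front();
        completed.pop_front();
        return slot;
    }
    void ReleaseScheduled(std::size_t) override { ++released; }
};

struct FakeOutput final : IScheduledOutput {
    std::size_t buffered = 0;
    bool failNext = false;
    std::size_t BufferedFrameCount() const override { return buffered; }
    bool ScheduleFrame(std::size_t) override {
        if (failNext) {
            failNext = false;
            return false;
        }
        ++buffered;
        return true;
    }
};
```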
### `CadenceTelemetry`
Owns counters, not policy.
Initial counters:
- rendered frames
- completed readback frames
- scheduled frames
- completion count
- completed-frame drops
- acquire misses
- schedule underruns
- PBO queue misses
- DeckLink late count
- DeckLink dropped count
- free/rendering/completed/scheduled slot counts
- actual DeckLink buffered frames
### `TelemetryPrinter`
Prints one stable line per interval, matching the probe where possible.
Example:
```text
renderFps=59.9 scheduleFps=59.9 free=7 completed=1 scheduled=4 drops=0 pboMiss=0 completions=119 late=0 dropped=0 decklinkBuffered=4
```
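A formatter for that line can stay independent of how the counters are collected. The sketch below assumes a hypothetical `TelemetrySnapshot` struct; only the key names are taken from the example line above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical snapshot of the counters shown in the example line.
struct TelemetrySnapshot {
    double renderFps;
    double scheduleFps;
    unsigned freeSlots;
    unsigned completedSlots;
    unsigned scheduledSlots;
    unsigned long long drops;
    unsigned long long pboMisses;
    unsigned long long completions;
    unsigned long long late;
    unsigned long long dropped;
    unsigned decklinkBuffered;
};

// Formats one stable, single-line record per telemetry interval.
inline std::string FormatTelemetryLine(const TelemetrySnapshot& t) {
    char buffer[256];
    std::snprintf(buffer, sizeof(buffer),
                  "renderFps=%.1f scheduleFps=%.1f free=%u completed=%u "
                  "scheduled=%u drops=%llu pboMiss=%llu completions=%llu "
                  "late=%llu dropped=%llu decklinkBuffered=%u",
                  t.renderFps, t.scheduleFps, t.freeSlots, t.completedSlots,
                  t.scheduledSlots, t.drops, t.pboMisses, t.completions,
                  t.late, t.dropped, t.decklinkBuffered);
    return buffer;
}
```

Keeping the key order fixed makes successive lines easy to compare by eye or with simple text tools.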
## Startup Sequence
Target first-version startup:
```text
main
-> parse AppConfig
-> initialize COM
-> DeckLinkOutput discover/select/configure output
-> DeckLinkOutput prepare output schedule
-> create SystemFrameExchange
-> start RenderThread
-> wait for completed frame warmup
-> start DeckLinkOutputThread
-> wait for scheduled depth warmup
-> DeckLinkOutput start scheduled playback
-> start TelemetryPrinter
-> wait for Enter
```
Shutdown:
```text
stop TelemetryPrinter
stop DeckLinkOutputThread
DeckLinkOutput stop playback
stop RenderThread
DeckLinkOutput release resources
release COM
```
## First Milestone: Modular Probe Equivalent
This is the only goal for the initial implementation.
Feature set:
- console app
- output-only DeckLink
- no input
- hidden GL context
- simple motion renderer
- BGRA8 only
- PBO async readback
- latest-N system-memory frame exchange
- warmup before playback
- one-line telemetry
Acceptance:
- visible DeckLink output is smooth
- `renderFps` near selected cadence
- `scheduleFps` near selected cadence
- scheduled count and DeckLink buffered count stable around 4
- no continuously increasing late/dropped counts
- no sustained PBO misses
- behavior matches or exceeds `DeckLinkRenderCadenceProbe`
## Second Milestone: Testable Core
Before porting compositor features, add tests for non-GL/non-DeckLink pieces.
Test targets:
- `SystemFrameExchangeTests`
- `RenderCadenceClockTests`
- `CadenceTelemetryTests`
Important cases:
- slot lifecycle transitions
- scheduled slots are protected
- completed unscheduled frames can be dropped
- stale handles/generations are rejected
- cadence does not speed up to refill buffers
- cadence records overrun/skipped ticks
## Third Milestone: Replace Simple Renderer With Render Interface
Add an interface around frame rendering:
```text
IRenderScene
-> InitializeGl()
-> RenderFrame(frameIndex, time)
-> ShutdownGl()
```
The first implementation remains `SimpleMotionRenderer`.
This creates the insertion point for shader-package rendering later without changing timing/scheduling.
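In C++ the interface could be as small as the sketch below. The parameter types (a 64-bit frame index and a time in seconds) and the `bool` return from `InitializeGl()` are assumptions, and `RecordingScene` is a test stand-in that records calls instead of issuing GL commands.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the render interface; all methods run on the render thread.
class IRenderScene {
public:
    virtual ~IRenderScene() = default;
    virtual bool InitializeGl() = 0;
    virtual void RenderFrame(std::uint64_t frameIndex, double timeSeconds) = 0;
    virtual void ShutdownGl() = 0;
};

// Illustrative stand-in that issues no GL calls.
class RecordingScene final : public IRenderScene {
public:
    bool InitializeGl() override {
        initialized = true;
        return true;
    }
    void RenderFrame(std::uint64_t frameIndex, double timeSeconds) override {
        ++framesRendered;
        lastFrameIndex = frameIndex;
        lastTimeSeconds = timeSeconds;
    }
    void ShutdownGl() override { shutDown = true; }

    bool initialized = false;
    bool shutDown = false;
    std::uint64_t framesRendered = 0;
    std::uint64_t lastFrameIndex = 0;
    double lastTimeSeconds = 0.0;
};

// How a render thread might drive a scene, with cadence waits omitted.
inline void RunFrames(IRenderScene& scene, std::uint64_t frameCount,
                      double frameSeconds) {
    if (!scene.InitializeGl()) return;
    for (std::uint64_t i = 0; i < frameCount; ++i) {
        scene.RenderFrame(i, static_cast<double>(i) * frameSeconds);
    }
    scene.ShutdownGl();
}
```

Since the scene never sees the cadence clock or the frame exchange, swapping `SimpleMotionRenderer` for a shader-package renderer should not touch timing code.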
## Fourth Milestone: Begin Porting Current App Features
Port only after the modular probe equivalent is stable.
Suggested order:
1. shader package compile/load
2. render pass/layer stack drawing
3. runtime snapshot input to renderer
4. live state overlays
5. control services
6. persistence/runtime store
7. preview from system-memory frames
8. screenshot from system-memory frames
9. input capture via CPU latest-frame mailbox
Each port must preserve the rule that the render thread cadence is primary.
## What Not To Port Early
Do not port these until the output spine is proven:
- DeckLink input
- preview GL presentation
- screenshot GL readback
- HTTP/OSC control services
- shader hot reload
- persistence
- runtime state JSON/open API
- complex telemetry/event dispatch
These are useful, but they are exactly the kinds of features that can accidentally reintroduce timing coupling.
## Build Plan
Initial CMake can follow the probe pattern:
```cmake
set(RENDER_CADENCE_APP_DIR "${CMAKE_CURRENT_SOURCE_DIR}/apps/RenderCadenceCompositor")
add_executable(RenderCadenceCompositor
# selected shared DeckLink/video/gl support files
# new modular app files
)
```
Later, shared source should be split into libraries:
```text
video_shader_decklink
video_shader_videoio
video_shader_gl_support
render_cadence_core
```
Avoid doing that library split before the first modular app works.
## VS Code Launch
Add a separate launch profile:
```text
Debug RenderCadenceCompositor
```
Run it as a console app so telemetry remains visible.
## Documentation
Add:
```text
apps/RenderCadenceCompositor/README.md
```
The README should record:
- intended architecture
- build/run instructions
- expected telemetry
- test result notes
- differences from the old app
- differences from the probe
## Success Criteria Before Porting More Features
Do not start feature porting until the new app can run with:
- stable smooth DeckLink output
- stable target scheduled depth
- stable actual DeckLink buffered count
- no regular visible freezes
- no steady PBO misses
- no steadily increasing late/dropped completions
- focus/minimize changes do not affect output cadence
- clean shutdown without hangs
This gives us a clean foundation. Once this is true, every feature added later has to prove it does not damage the spine.

@@ -0,0 +1,448 @@
# Render Thread Ownership Plan
This plan describes how to make the main compositor behave like the successful `DeckLinkRenderCadenceProbe`: one render cadence owner, one GL context owner, no unrelated work able to interrupt output frame production.
The goal is not just "all GL calls happen on one thread". The current app mostly does that during runtime already. The real goal is:
- the output render thread owns its GL context for its whole lifetime
- output cadence is driven by the render thread, not by DeckLink completion timing
- non-output GL work cannot sit ahead of output frames
- callers cannot block the render thread while waiting for synchronous answers
- DeckLink scheduling consumes completed system-memory frames and never causes rendering
## Current Risk Points
The current main app still has several ways to interrupt output cadence.
### Shared GL Executor
`RenderEngine` owns the GL context during runtime, but it acts as a general task executor.
The same queue/path can run:
- output frame render
- input upload
- preview present
- screenshot capture
- render resets
- shader/program commits
- resource resize
- state clearing
That means output frames are not guaranteed to be the next GL work item at the selected frame time.
### Synchronous Output Render Request
`VideoBackend` drives output production from its output producer thread, then calls:
```text
VideoBackend
-> OpenGLVideoIOBridge::RenderScheduledFrame
-> RenderEngine::RequestOutputFrame
-> TryInvokeOnRenderThread
```
That makes output production a request/response interaction. The producer waits for the render thread, and the render thread is still shared with other work.
### Input Upload Shares Output Context
DeckLink input capture currently flows into:
```text
VideoBackend::HandleInputFrame
-> OpenGLVideoIOBridge::UploadInputFrame
-> RenderEngine::QueueInputFrame
-> render thread upload
```
Even with coalescing, input upload can consume render-thread time and GPU bandwidth directly before output rendering.
### Preview And Screenshot Share Output Context
Preview and screenshot are lower-priority features, but today they still execute on the render thread.
Preview is best-effort at the caller side, but once queued it can still occupy the same context. Screenshot capture can be more expensive because it performs readback and CPU-side image preparation.
### Startup Context Ownership Is Transitional
The Win32 startup path creates and binds the GL context before `RenderEngine::StartRenderThread()`.
That is acceptable as a transitional state, but the final model should make context ownership explicit:
- bootstrap thread creates the window/context
- bootstrap thread releases it
- render thread binds it
- only render thread initializes GL resources
- only render thread destroys GL resources
### Render Callback Re-enters App State
`OpenGLRenderPipeline::RenderFrame()` calls a callback into `OpenGLComposite::renderEffect()`.
That callback builds `RenderFrameInput`, resolves frame state, drains runtime live state, and then calls back into `RenderEngine` to draw the prepared frame.
This works, but it means the output render path still reaches up into app/runtime code at frame time.
## Target Runtime Shape
The main app should match this ownership model:
```text
runtime/control threads
  -> publish snapshots, live overlays, reset requests, shader-build results
  -> never call GL

render cadence thread
  -> sole owner of output GL context
  -> wakes at selected render cadence
  -> samples latest render input/state
  -> renders one frame
  -> queues async readback/copies completed readback into system-memory slot
  -> publishes completed frame to latest-N output buffer

video output thread
  -> consumes completed system-memory frames
  -> schedules DeckLink frames to target buffer depth
  -> processes completion results
  -> never calls GL

optional input upload path
  -> writes latest input frame into CPU-side latest-frame buffer
  -> render thread imports/uploads at a controlled point in its frame

preview/screenshot path
  -> consumes already-rendered output/system-memory frame when possible
  -> never interrupts output render cadence
```
## Non-Negotiable Rules
- The render thread never waits for DeckLink.
- DeckLink callbacks never render.
- Runtime/control threads never directly execute GL.
- Preview and screenshot never execute ahead of output frames.
- Input upload is never a separate urgent GL task ahead of output render.
- Shader/resource commits are applied only at a frame boundary.
- Telemetry on the hot path must be lock-light or try-lock only.
- The render thread cadence does not speed up to refill buffers.
- If output work overruns, the render thread records the overrun and resumes the selected cadence policy.
## Implementation Plan
### 1. Add Thread/Context Ownership Guards
Add explicit render-thread ownership checks around all GL entry points.
Deliverables:
- `RenderEngine` exposes `IsOnRenderThread()` for assertions/tests.
- GL-facing classes get debug-only owner checks where practical.
- wrong-thread GL access becomes a counted telemetry warning, not just `OutputDebugStringA`.
- tests cover that public request methods do not execute GL directly.
Acceptance:
- every `RenderEngine` public method is classified as either request-only, lifecycle-only, or render-thread-only.
- render-thread-only methods are private or guarded.
- no normal runtime caller can accidentally invoke GL work inline.
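A guard of this kind can be very small. The sketch below is one possible shape, not the existing `RenderEngine` API: `RenderThreadGuard` and its method names are hypothetical, and the caller id is a parameter only so tests can simulate a foreign thread without spawning one.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical ownership guard for GL-facing classes.
class RenderThreadGuard {
public:
    // Called once by the render thread after it binds the GL context.
    void BindOwner(std::thread::id owner = std::this_thread::get_id()) {
        owner_.store(owner);
    }

    bool IsOnRenderThread(
        std::thread::id caller = std::this_thread::get_id()) const {
        return caller == owner_.load();
    }

    // Returns true when the caller owns the context. A wrong-thread call is
    // counted so telemetry can report it, rather than only being logged.
    bool CheckOwner(std::thread::id caller = std::this_thread::get_id()) {
        if (IsOnRenderThread(caller)) {
            return true;
        }
        wrongThreadAccesses_.fetch_add(1, std::memory_order_relaxed);
        return false;
    }

    std::uint64_t WrongThreadAccesses() const {
        return wrongThreadAccesses_.load(std::memory_order_relaxed);
    }

private:
    std::atomic<std::thread::id> owner_{std::thread::id{}};
    std::atomic<std::uint64_t> wrongThreadAccesses_{0};
};
```

A render-thread-only method would call `CheckOwner()` first and return early on failure, which keeps the violation visible in telemetry without crashing a live output.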
### 2. Move GL Initialization Fully Onto The Render Thread
Start the render thread before compiling shaders and initializing GL resources.
Current startup does:
```text
InitOpenGLState()
  -> CompileDecodeShader
  -> CompileOutputPackShader
  -> InitializeResources
  -> CompileLayerPrograms
StartRenderThread()
```
Move toward:
```text
create context on Win32 thread
release context on Win32 thread
StartRenderThread()
render thread binds context
render thread initializes extensions, shaders, resources
```
Deliverables:
- a single `RenderEngine::StartAndInitialize(RenderInitializationConfig)` path.
- GL extension resolution happens on the render thread.
- shader/resource initialization is a render-thread startup phase.
- `RenderEngine` destructor only destroys resources on the render thread.
Acceptance:
- after `StartRenderThread()`, no non-render thread binds or uses the app GL context.
- shutdown order is deterministic: stop video output, stop render cadence, destroy GL resources, release context.
### 3. Replace Synchronous Output Render Requests With Render-Owned Cadence
Move output cadence out of `VideoBackend` and into the render system.
Current:
```text
VideoBackend output producer
-> cadence tick
-> acquire output slot
-> synchronous render-thread request
```
Target:
```text
RenderEngine output cadence loop
-> cadence tick
-> acquire a free output slot through a non-blocking frame-sink interface
-> render frame
-> publish completed frame
```
Deliverables:
- introduce `RenderedFrameSink` or similar interface owned by video output.
- render thread pulls/claims a free system-memory slot without waiting.
- if no free slot exists, render thread drops/recycles the oldest unscheduled completed frame or records backpressure without blocking.
- remove `RenderEngine::RequestOutputFrame()` from the steady-state output path.
Acceptance:
- output rendering continues even if DeckLink completion is delayed.
- no `std::future` wait exists in the output cadence path.
- `VideoBackend` no longer owns the producer render loop; it owns scheduling/completion only.
### 4. Make The Render Thread A Frame Loop, Not A Task Queue
Keep a command mailbox, but process it only at safe frame-boundary points.
Frame loop:
```text
while running:
    wait until next render timestamp
    apply bounded frame-boundary commands
    sample latest frame input/state
    upload latest input frame if enabled and budget allows
    render output frame
    queue/consume readback
    publish completed frame
    record timings
```
Command classes:
- frame-boundary commands: reset temporal history, reset shader feedback, commit prepared shader programs
- background/low-priority commands: preview, screenshot, diagnostic readback
- non-GL commands: state publication, telemetry, persistence
Deliverables:
- replace FIFO render task queue with a priority/mailbox model.
- output cadence is the loop's main clock.
- commands have budget classes and max work per frame.
- long commands are deferred rather than blocking the current output tick.
Acceptance:
- preview/screenshot cannot run immediately before a due output frame.
- reset/shader work is applied between frames and measured.
- output render starts within a small jitter window when the GPU is not overrun.
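The mailbox idea can be sketched as two queues that the render loop drains only at points it chooses. This is a simplified illustration: `RenderCommandMailbox` is a hypothetical name, the budget is a plain command count rather than a time budget, and non-GL commands are assumed to stay off this mailbox entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

enum class CommandClass { FrameBoundary, LowPriority };

// Hypothetical mailbox; producers post, only the render thread runs.
class RenderCommandMailbox {
public:
    using Command = std::function<void()>;

    void Post(CommandClass commandClass, Command command) {
        std::lock_guard<std::mutex> lock(mutex_);
        QueueFor(commandClass).push_back(std::move(command));
    }

    // Render thread, at a frame boundary: run at most maxCommands.
    // Anything left over stays queued for a later frame.
    std::size_t RunFrameBoundary(std::size_t maxCommands) {
        return Run(CommandClass::FrameBoundary, maxCommands);
    }

    // Render thread, only when the loop has measured slack after publishing.
    std::size_t RunLowPriority(std::size_t maxCommands) {
        return Run(CommandClass::LowPriority, maxCommands);
    }

    std::size_t Depth(CommandClass commandClass) {
        std::lock_guard<std::mutex> lock(mutex_);
        return QueueFor(commandClass).size();
    }

private:
    std::deque<Command>& QueueFor(CommandClass commandClass) {
        return commandClass == CommandClass::FrameBoundary ? boundary_
                                                           : lowPriority_;
    }

    std::size_t Run(CommandClass commandClass, std::size_t maxCommands) {
        std::size_t ran = 0;
        while (ran < maxCommands) {
            Command command;
            {
                std::lock_guard<std::mutex> lock(mutex_);
                std::deque<Command>& queue = QueueFor(commandClass);
                if (queue.empty()) break;
                command = std::move(queue.front());
                queue.pop_front();
            }
            command();  // run outside the lock so producers never wait on GL work
            ++ran;
        }
        return ran;
    }

    std::mutex mutex_;
    std::deque<Command> boundary_;
    std::deque<Command> lowPriority_;
};
```

`Depth()` per class maps directly onto the "GL command mailbox depth by class" counter, so deferred work is visible rather than silently queued.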
### 5. Move Input Capture To A CPU Latest-Frame Buffer
Input capture should not enqueue independent GL upload tasks.
Target:
```text
DeckLink input callback
  -> copy/coalesce latest CPU input frame
  -> return quickly

render thread frame boundary
  -> if input version changed, upload latest frame
  -> render using last successfully uploaded input texture
```
Deliverables:
- introduce `InputFrameMailbox` with latest-frame semantics.
- remove `RenderEngine::QueueInputFrame()` from the callback path.
- render thread owns the upload moment.
- if upload would exceed budget, render thread can reuse the previous input texture and record an input-upload skip.
Acceptance:
- input capture enabled does not create arbitrary render-thread tasks.
- output cadence remains stable when input frames arrive.
- telemetry separates input-frame arrival, upload count, upload skips, and upload cost.
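Latest-frame semantics can be sketched with a version counter. The class below is a minimal assumption of what `InputFrameMailbox` might look like: the method names are hypothetical, and a real version would likely avoid the extra copy by swapping buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical CPU-side mailbox that keeps only the newest input frame.
class InputFrameMailbox {
public:
    // Capture callback side: overwrite any unread frame and return quickly.
    void Publish(const std::uint8_t* data, std::size_t size) {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_.assign(data, data + size);
        ++version_;
    }

    // Render thread side, at a frame boundary: copies the newest frame into
    // out only if it is newer than lastSeenVersion. Returns false when there
    // is nothing new, so the previous input texture can be reused.
    bool TakeIfNewer(std::uint64_t& lastSeenVersion,
                     std::vector<std::uint8_t>& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (version_ == lastSeenVersion) {
            return false;
        }
        out = latest_;
        lastSeenVersion = version_;
        return true;
    }

    // Total frames published; comparing this against the upload count gives
    // the number of input frames that were coalesced away.
    std::uint64_t PublishedCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return version_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::uint8_t> latest_;
    std::uint64_t version_ = 0;
};
```

Because the render thread decides when to call `TakeIfNewer()`, it can skip the call on a frame that is already over budget and record an input-upload skip instead.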
### 6. Move Preview To A Consumer Path
Preview should consume the latest completed output image instead of asking the output GL context to present.
Options:
- CPU preview from latest system-memory output frame.
- a separate preview GL context fed asynchronously from completed frames.
- a low-priority render-thread blit only when output has measurable slack.
Recommended first step:
- use latest system-memory BGRA8 output for the window preview.
Deliverables:
- preview reads from latest completed/scheduled output frame copy.
- `TryPresentPreview()` no longer queues GL work on the output render thread.
- preview FPS throttling remains caller-side.
Acceptance:
- forcing preview cannot delay output rendering.
- minimizing/focusing the window does not affect output cadence.
### 7. Move Screenshot To Completed Frame Capture
Screenshot should capture from the latest completed output frame unless an explicit "exact render capture" mode is requested.
Deliverables:
- screenshot request reads the latest system-memory output frame.
- PNG write remains async.
- optional diagnostic exact-GL screenshot is disabled during live output or explicitly marked disruptive.
Acceptance:
- screenshot request does not call `glReadPixels` on the output render context during steady-state playout.
### 8. Make Shader Commits Frame-Boundary Work
Prepared shader builds are CPU/background work; GL program commit is still GL work.
Deliverables:
- shader build queue produces `PreparedShaderBuild`.
- render thread sees latest pending prepared build at a frame boundary.
- commit is applied only between frames.
- expensive commits can temporarily enter a measured "render reconfigure" state.
Acceptance:
- shader commits do not interleave midway through output render.
- output timing telemetry records commit duration separately from normal render duration.
### 9. Split Output Scheduling From Rendering Completely
`VideoBackend` should become a playout/scheduling owner, not a render producer.
Target:
```text
RenderEngine
  -> produces completed frames at render cadence

VideoBackend
  -> schedules completed frames up to target DeckLink depth
  -> processes completions
  -> releases scheduled slots
```
Deliverables:
- `VideoBackend` owns `SystemOutputFramePool`, or a new `SystemFrameExchange` owns it between render/video.
- render thread publishes completed frames into the exchange.
- video output thread schedules from the exchange.
- no render calls exist in completion handling or scheduling paths.
Acceptance:
- DeckLink buffer depth changes cannot directly cause render-thread wakeups except through non-blocking availability signals.
- render cadence can be tested without DeckLink by using a fake frame sink.
- video scheduling can be tested without GL by using synthetic frames.
### 10. Preserve The Probe As The Reference Contract
The `DeckLinkRenderCadenceProbe` is now the control sample.
Deliverables:
- document which main-app components correspond to the probe components.
- add a small regression checklist:
  - render FPS near target
  - schedule FPS near target
  - DeckLink buffered frames stable
  - no late/dropped frames
  - no PBO misses or readback stalls
  - focus/minimize does not change output cadence
Acceptance:
- after each migration step, compare the main app telemetry against the probe's known-good behavior.
## Suggested Order Of Work
1. Add ownership guards and classify render methods.
2. Move GL initialization/destruction fully onto the render thread.
3. Introduce a render-owned cadence loop behind a feature flag.
4. Add a frame-sink/exchange interface between render and video.
5. Move output production from `VideoBackend` to the render cadence loop.
6. Convert input upload to latest-frame mailbox semantics.
7. Move preview to completed-frame consumption.
8. Move screenshot to completed-frame capture.
9. Convert shader commits/resets to frame-boundary mailbox commands.
10. Remove old synchronous output render request path.
## Feature Flags During Migration
Use flags only to keep testing safe, not as long-term compatibility layers.
Suggested flags:
```text
VST_RENDER_CADENCE_OWNER=render_thread
VST_DISABLE_INPUT_CAPTURE=1
VST_PREVIEW_SOURCE=system_frame
VST_SCREENSHOT_SOURCE=system_frame
```
Remove each flag once the new behavior is proven and becomes the only supported path.
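If the flags are read from environment variables, which the names suggest but this plan does not state, parsing could be kept in one place as sketched below. `MigrationFlags` and the helper names are hypothetical, and a flag counts as enabled only when its value exactly matches the one listed above.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <functional>

// Hypothetical parsed view of the migration flags.
struct MigrationFlags {
    bool renderThreadOwnsCadence = false;
    bool inputCaptureDisabled = false;
    bool previewFromSystemFrame = false;
    bool screenshotFromSystemFrame = false;
};

// Lookup is injected so tests do not depend on the process environment.
using EnvLookup = std::function<const char*(const char*)>;

inline bool FlagEquals(const EnvLookup& lookup, const char* name,
                       const char* expected) {
    const char* value = lookup(name);
    return value != nullptr && std::strcmp(value, expected) == 0;
}

inline MigrationFlags ParseMigrationFlags(const EnvLookup& lookup) {
    MigrationFlags flags;
    flags.renderThreadOwnsCadence =
        FlagEquals(lookup, "VST_RENDER_CADENCE_OWNER", "render_thread");
    flags.inputCaptureDisabled =
        FlagEquals(lookup, "VST_DISABLE_INPUT_CAPTURE", "1");
    flags.previewFromSystemFrame =
        FlagEquals(lookup, "VST_PREVIEW_SOURCE", "system_frame");
    flags.screenshotFromSystemFrame =
        FlagEquals(lookup, "VST_SCREENSHOT_SOURCE", "system_frame");
    return flags;
}

// Reads the flags once at startup from the real environment.
inline MigrationFlags ReadMigrationFlagsFromEnvironment() {
    return ParseMigrationFlags(
        [](const char* name) -> const char* { return std::getenv(name); });
}
```

Reading the flags once and passing the struct down makes each flag easy to delete later: removing the field breaks the build at every remaining use.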
## Telemetry Needed
Add or preserve counters for:
- render tick jitter
- render tick overrun
- output render duration
- GL command mailbox depth by class
- frame-boundary command duration
- input upload duration and skips
- readback queue/consume duration
- completed system-memory frame depth
- scheduled DeckLink frame depth
- DeckLink actual buffered frames
- preview frames consumed
- screenshot requests served from system memory
The key metric is whether output render starts on time. Buffer depth alone is not enough; a full buffer can still contain stale or repeated frames.
## Completion Definition
This work is complete when:
- the output render thread owns the app GL context from initialization through shutdown
- output rendering is driven by the render thread's selected frame cadence
- no non-output task can run ahead of a due output frame
- `VideoBackend` never asks the render thread to render synchronously
- DeckLink scheduling consumes already completed system-memory frames
- input upload, preview, screenshot, shader commits, and resets are all frame-boundary, mailbox, or consumer-side operations
- main-app telemetry approaches the cadence probe behavior under the same output mode