Improvement
@@ -163,3 +163,53 @@ Read:
Removing the ongoing GPU readback recovers output timing immediately. The direct cause of the Phase 7.5 playback collapse is the per-frame GPU-to-CPU readback cost, not DeckLink frame acquisition, output frame end-access, PBO allocation, fence waiting, or the CPU copy.

The internal ready queue depth stays low while DeckLink reports a healthy device buffer, which suggests that the ready queue acts as a short staging queue rather than as the full device playout buffer. The next fix should prioritize avoiding a blocking readback on every output frame rather than only increasing the internal ready queue depth.

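One common way to avoid a blocking readback on every output frame is to issue each readback into a ring of pixel buffers and map a slot only after a few newer frames have been queued behind it. The sketch below models just the slot bookkeeping in plain C++, with the OpenGL calls left as comments. `ReadbackRing`, its methods, and the lag parameter are hypothetical names for illustration and are not taken from the project.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping for a ring of readback slots (one PBO per slot).
// Only the index logic is modeled; the GL calls appear as comments.
class ReadbackRing {
public:
    static constexpr int kNone = -1;

    explicit ReadbackRing(std::size_t depth) : inFlight_(depth, false) {}

    // Start an async readback for the current frame. Returns the slot used,
    // or kNone if the next slot has not been consumed yet (ring is full).
    int issue() {
        if (inFlight_[writeIndex_]) return kNone;
        const std::size_t slot = writeIndex_;
        // Real code would bind this slot's PBO, call glReadPixels with a
        // null offset, and create a fence here.
        inFlight_[slot] = true;
        writeIndex_ = (writeIndex_ + 1) % inFlight_.size();
        ++pending_;
        return static_cast<int>(slot);
    }

    // Consume the oldest readback only once at least `minLag` newer frames
    // have been issued behind it, so the transfer has had time to finish.
    int consume(std::size_t minLag) {
        if (pending_ <= minLag) return kNone;  // oldest slot is still too fresh
        const std::size_t slot = readIndex_;
        // Real code would check the fence, then map and copy the PBO here.
        inFlight_[slot] = false;
        readIndex_ = (readIndex_ + 1) % inFlight_.size();
        --pending_;
        return static_cast<int>(slot);
    }

private:
    std::vector<bool> inFlight_;
    std::size_t writeIndex_ = 0;
    std::size_t readIndex_ = 0;
    std::size_t pending_ = 0;
};
```

With a lag of one frame, the render thread never maps the buffer it has just queued, which is the property that keeps the readback from stalling the frame that issued it.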
## Experiment 4: BGRA8 pack framebuffer async readback

Status: sampled

Date: 2026-05-11

Change:

- The output path now packs/blits the final output into a BGRA8-compatible framebuffer before readback.
- Async readback reads from the pack framebuffer using `GL_BGRA` / `GL_UNSIGNED_INT_8_8_8_8_REV`.
- The deeper async PBO ring remains active.

Question:

Does making the GPU output/readback format match the DeckLink BGRA8 scheduling format reduce the driver-side `glReadPixels` stall?

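The format pairing in the change list can be illustrated with plain integer packing. With `GL_BGRA` and `GL_UNSIGNED_INT_8_8_8_8_REV`, OpenGL treats each pixel as one 32-bit word with blue in the lowest 8 bits and alpha in the highest, which on a little-endian host sits in memory as the bytes B, G, R, A. The sketch below shows that layout without any GL dependency. The function names are hypothetical, and the assumption that the DeckLink BGRA8 frame expects this same byte order on the target machine should be checked against the DeckLink SDK documentation.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pack one pixel the way GL_BGRA + GL_UNSIGNED_INT_8_8_8_8_REV lays it out:
// the first component (blue) occupies the least significant byte.
inline std::uint32_t packBgra8Rev(std::uint8_t b, std::uint8_t g,
                                  std::uint8_t r, std::uint8_t a) {
    return (static_cast<std::uint32_t>(a) << 24) |
           (static_cast<std::uint32_t>(r) << 16) |
           (static_cast<std::uint32_t>(g) << 8) |
           static_cast<std::uint32_t>(b);
}

// Byte order of that word in memory on a little-endian host, computed with
// shifts so the example does not depend on the machine that builds it.
inline std::array<std::uint8_t, 4> littleEndianBytes(std::uint32_t word) {
    return {{static_cast<std::uint8_t>(word & 0xFF),
             static_cast<std::uint8_t>((word >> 8) & 0xFF),
             static_cast<std::uint8_t>((word >> 16) & 0xFF),
             static_cast<std::uint8_t>((word >> 24) & 0xFF)}};
}
```

If the framebuffer already stores pixels in this order, the driver can in principle copy them out without a per-pixel conversion, which is the hypothesis the question above is testing.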
User-visible result:

- Long pauses appear to be gone.
- Playback still stutters, but the stutters look limited to a few frames rather than multi-second freezes.

Telemetry summary:

- Throughput recovered to roughly real time in the sampled window.
- Over 5 seconds, the app pushed and popped 305 output frames (about 61 frames per second).
- `asyncQueueReadPixelsMs` dropped from the earlier 8-14 ms range to roughly 0.05-0.13 ms in the representative samples.
- `renderMs` usually sat around 2-5 ms in the sampled burst.
- Late and dropped frame counts did not increase during the 5-second delta sample.
- The ready queue still repeatedly touched 0 and accumulated underruns, which matches the remaining short stutters.

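As a quick check on the throughput figures, the frame count can be turned into a rate and a per-frame time budget, and the readback timings can be expressed as a share of that budget. The helper names below are hypothetical; the inputs are the numbers reported in the summary.

```cpp
#include <cassert>
#include <cmath>

// Average output rate over a sampled window.
inline double framesPerSecond(int frames, double seconds) {
    return frames / seconds;
}

// Time available per frame at a given rate, in milliseconds.
inline double frameBudgetMs(double fps) {
    return 1000.0 / fps;
}

// Fraction of the per-frame budget consumed by one stage.
inline double shareOfBudget(double stageMs, double fps) {
    return stageMs / frameBudgetMs(fps);
}
```

At 305 frames in 5 seconds the rate is 61 frames per second, which leaves roughly 16.4 ms per frame. On that basis the earlier 8-14 ms readback call consumed about half to most of the budget, while 0.05-0.13 ms is under 1% of it.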
Representative samples:

| readyDepth | renderMs | smoothedRenderMs | drawMs | mapMs | copyMs | asyncQueueReadPixelsMs | queueWaitMs |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | 4.855 | 9.494 | 0.570 | 0.234 | 0.822 | 0.128 | 0.026 |
| 0 | 1.957 | 9.041 | 0.468 | 0.139 | 0.604 | 0.048 | 0.016 |
| 0 | 3.366 | 5.879 | 0.513 | 1.166 | 0.692 | 0.129 | 0.022 |
| 0 | 5.208 | 6.492 | 2.209 | 1.358 | 0.714 | 0.090 | 0.061 |
| 0 | 2.957 | 8.852 | 0.537 | 1.041 | 0.547 | 0.087 | 0.040 |

Five-second delta:

| pushed | popped | ready underruns | zero-depth samples | late delta | dropped delta | scheduled lead |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 305 | 305 | 129 | 671 | 0 | 0 | 20 |

Read:

The main readback stall appears to have come from the previous format/path combination rather than from unavoidable BGRA8 bandwidth. The remaining problem now looks like cadence and buffering: the producer can average real-time throughput again, but the ready queue still runs empty often enough to create visible short stutters.
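The difference between average throughput and cadence can be shown with a toy queue model. The sketch below is a simplified illustration, not the application's queue: a consumer pops once per fixed period, a producer pushes at given tick times, and a pop that finds the queue empty is counted as an underrun. `QueueStats` and `simulate` are hypothetical names, and the model does not attempt to reproduce the counters in the table above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct QueueStats {
    int pushed = 0;
    int popped = 0;
    int underruns = 0;
};

// pushTimes: sorted tick times at which the producer pushes one frame.
// The consumer attempts a pop at ticks period, 2*period, ..., popAttempts*period.
inline QueueStats simulate(const std::vector<int>& pushTimes, int period,
                           int popAttempts) {
    QueueStats stats;
    std::size_t next = 0;
    int depth = 0;
    for (int i = 1; i <= popAttempts; ++i) {
        const int now = i * period;
        // Deliver every frame the producer finished up to this tick.
        while (next < pushTimes.size() && pushTimes[next] <= now) {
            ++depth;
            ++stats.pushed;
            ++next;
        }
        if (depth > 0) {
            --depth;
            ++stats.popped;
        } else {
            ++stats.underruns;  // consumer found the queue empty
        }
    }
    return stats;
}
```

With a period of 10 ticks and six pop attempts, evenly spaced pushes at 5, 15, 25, 35, 45, 55 give no underruns. Pushing the same six frames at 5, 28, 29, 45, 58, 59 delivers the same count in the same window but leaves the queue empty at tick 20, which is the kind of short gap the ready queue depth would need to absorb.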