Finished phase 1
Some checks failed
CI / React UI Build (push) Successful in 10s
CI / Native Windows Build And Tests (push) Successful in 2m18s
CI / Windows Release Package (push) Has been cancelled

This commit is contained in:
Aiden
2026-05-11 02:32:13 +10:00
parent 9cbb5d8004
commit 41677b71ec
14 changed files with 325 additions and 95 deletions


@@ -4,7 +4,7 @@ This document expands the `HealthTelemetry` subsystem introduced in [PHASE_1_SUB
`HealthTelemetry` is the subsystem that owns operational visibility for the app. Its purpose is to gather health state, warnings, counters, logs, and timing observations from the other subsystems and publish them in a structured way without becoming a second control plane.
Today, those responsibilities are fragmented across `RuntimeHost` status setters, ad hoc `OutputDebugStringA` calls, callback-local warnings, and UI-facing runtime-state payloads. The result is that the app can often detect problems, but it does not yet have one clear place that answers:
Before the Phase 1 runtime split, those responsibilities were fragmented across `RuntimeHost` status setters, ad hoc `OutputDebugStringA` calls, callback-local warnings, and UI-facing runtime-state payloads. The result was that the app could often detect problems, but did not yet have one clear place that answered:
- what is healthy right now
- what is degraded right now
@@ -16,14 +16,14 @@ Today, those responsibilities are fragmented across `RuntimeHost` status setters
## Why This Subsystem Exists
The current code already contains meaningful health and timing signals, but they are spread through unrelated ownership domains:
The codebase already contains meaningful health and timing signals, but some are still spread through unrelated ownership domains:
- `RuntimeHost` stores signal and timing status:
- previous `RuntimeHost` status fields stored signal and timing status:
- `RuntimeHost.h`
- `RuntimeHost.cpp`
- `RuntimeHost.cpp`
- `RuntimeHost.cpp`
- render and bridge code report timing by writing back into `RuntimeHost`:
- render and bridge code historically reported timing by writing back into `RuntimeHost`:
- [OpenGLRenderPipeline.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/gl/pipeline/OpenGLRenderPipeline.cpp:50)
- [OpenGLVideoIOBridge.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/gl/pipeline/OpenGLVideoIOBridge.cpp:49)
- backend warning paths still log directly:
@@ -42,7 +42,7 @@ This creates several recurring problems:
- logging is mostly text-first instead of structured-first
- recovery behavior is hard to audit because the app does not retain a coherent health snapshot
`HealthTelemetry` exists so later phases can move timing and health concerns out of `RuntimeHost`, out of callback-local logging, and into one subsystem whose only job is observation and reporting.
`HealthTelemetry` exists so timing and health concerns have one subsystem whose only job is observation and reporting, instead of drifting back into runtime storage, callback-local logging, or UI payload assembly.
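To make the "observation and reporting only" boundary concrete, here is a minimal sketch of what such a surface could look like. This is not the project's actual API: the `HealthTelemetry` name comes from this document, but the `Health` enum, `ReportHealth`, and the `HealthSnapshot` shape are illustrative assumptions.

```cpp
#include <map>
#include <mutex>
#include <string>

// Illustrative sketch only: producers report state, consumers read a
// coherent copy. The subsystem never acts on what it observes.
enum class Health { Healthy, Degraded, Failed };

struct HealthSnapshot {
    std::map<std::string, Health> components;  // component -> last reported state
    bool AnyDegraded() const {
        for (const auto& entry : components)
            if (entry.second != Health::Healthy) return true;
        return false;
    }
};

class HealthTelemetry {
public:
    // Producers (render, bridge, backend) push observations in.
    void ReportHealth(const std::string& component, Health h) {
        std::lock_guard<std::mutex> lock(mutex_);
        state_[component] = h;
    }
    // Consumers (UI, logging) read a snapshot, not live fields.
    HealthSnapshot Snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return HealthSnapshot{state_};
    }
private:
    mutable std::mutex mutex_;
    std::map<std::string, Health> state_;
};
```

The key property is that the snapshot answers "what is healthy / degraded right now" in one place, without the telemetry object ever issuing commands back to the subsystems it observes.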
## Design Goals
@@ -373,9 +373,9 @@ Expected observations:
The current codebase already contains several telemetry responsibilities that should migrate here.
### `RuntimeHost` Status Setters
### Previous `RuntimeHost` Status Setters
These are the clearest existing candidates:
These were the clearest initial migration candidates:
- `SetSignalStatus(...)`
- `TrySetSignalStatus(...)`
@@ -391,7 +391,7 @@ See:
- `RuntimeHost.cpp`
- `RuntimeHost.cpp`
In the target architecture, this kind of state should no longer sit on the same object that owns persistent layer truth.
In the target architecture, this kind of state should not sit on the same object that owns persistent layer truth.
### Render Timing Production
@@ -403,7 +403,7 @@ That timing sample should conceptually become:
- `RenderEngine -> HealthTelemetry::RecordTimingSample(...)`
not:
not the old pattern:
- `RenderEngine -> RuntimeHost::TrySetPerformanceStats(...)`
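The directional change above can be sketched as a producer call. `RecordTimingSample` is the name this document uses; the `TimingSample` fields and the minimal sink below are illustrative stand-ins, not the real classes.

```cpp
#include <chrono>
#include <cstddef>
#include <mutex>
#include <vector>

// Illustrative timing sample; the field set is an assumption.
struct TimingSample {
    double frame_ms = 0.0;
};

// Minimal stand-in for the telemetry sink named in this document.
class HealthTelemetry {
public:
    void RecordTimingSample(const TimingSample& s) {
        std::lock_guard<std::mutex> lock(mutex_);
        samples_.push_back(s);
    }
    std::size_t SampleCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return samples_.size();
    }
private:
    mutable std::mutex mutex_;
    std::vector<TimingSample> samples_;
};

// A render step reports its timing to telemetry instead of writing
// back into runtime storage (the old RuntimeHost pattern).
void RenderOneFrame(HealthTelemetry& telemetry) {
    auto start = std::chrono::steady_clock::now();
    // ... draw work would happen here ...
    auto end = std::chrono::steady_clock::now();
    TimingSample s;
    s.frame_ms = std::chrono::duration<double, std::milli>(end - start).count();
    telemetry.RecordTimingSample(s);
}
```

The render engine only needs a reference to the telemetry sink; it no longer knows anything about how or where the sample is stored.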
@@ -468,7 +468,7 @@ So the design should assume:
- bounded memory
- no long-held global mutex that callbacks and render both depend on
Phase 1 does not require lock-free implementation, but it does require the architecture to avoid recreating the `RuntimeHost` problem where health writes share the same lock as durable state and render-facing concerns.
Phase 1 does not require lock-free implementation, but it does require the architecture to avoid recreating the old problem where health writes share the same lock as durable state and render-facing concerns.
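Under these constraints, one way to keep writes cheap and memory bounded is a fixed-capacity ring of samples behind a telemetry-only mutex. This is a sketch under those assumptions, not the project's chosen implementation:

```cpp
#include <array>
#include <cstddef>
#include <mutex>

// Fixed-capacity ring of frame-time samples. Memory is bounded by N,
// and the mutex guards telemetry only -- never durable state or
// render-facing data -- so producers never contend on an unrelated lock.
// (Illustrative; the real subsystem may pick other structures.)
template <std::size_t N>
class TimingRing {
public:
    void Push(double frame_ms) {
        std::lock_guard<std::mutex> lock(mutex_);
        samples_[head_] = frame_ms;
        head_ = (head_ + 1) % N;
        if (count_ < N) ++count_;
    }
    // Average over whatever is currently retained.
    double Average() const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == 0) return 0.0;
        double sum = 0.0;
        for (std::size_t i = 0; i < count_; ++i) sum += samples_[i];
        return sum / static_cast<double>(count_);
    }
    std::size_t Count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_;
    }
private:
    mutable std::mutex mutex_;
    std::array<double, N> samples_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```

Because old samples are overwritten in place, a stalled or absent reader can never cause unbounded growth, which is exactly the failure mode the bounded-memory requirement rules out.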
Practical expectations:
@@ -494,13 +494,13 @@ Initial responsibilities:
The first implementation can still be backed by simple in-memory structures.
### Step 2: Move New Observations Off `RuntimeHost`
### Step 2: Keep New Observations Off Runtime Storage
Before removing old setters, route new health-style work into `HealthTelemetry` instead of adding more `RuntimeHost` status fields.
Route new health-style work into `HealthTelemetry` instead of adding more status fields to runtime storage.
This prevents the old status surface from growing during migration.
### Step 3: Replace `RuntimeHost` Status Setters With Telemetry Producers
### Step 3: Replace Legacy Status Setters With Telemetry Producers
Refactor:
@@ -641,4 +641,4 @@ It should not:
- coordinate recovery actions
- become a replacement for the render or backend policy layers
If this boundary holds, later phases can remove timing and warning state from `RuntimeHost` and move toward a much more diagnosable live system.
If this boundary holds, later phases can keep moving toward a much more diagnosable live system without putting timing and warning state back into runtime storage.