Finished phase 1
Some checks failed
CI / React UI Build (push) Successful in 10s
CI / Native Windows Build And Tests (push) Successful in 2m18s
CI / Windows Release Package (push) Has been cancelled

This commit is contained in:
Aiden
2026-05-11 02:32:13 +10:00
parent 9cbb5d8004
commit 41677b71ec
14 changed files with 325 additions and 95 deletions


@@ -4,7 +4,7 @@ This document expands the `HealthTelemetry` subsystem introduced in [PHASE_1_SUB
`HealthTelemetry` is the subsystem that owns operational visibility for the app. Its purpose is to gather health state, warnings, counters, logs, and timing observations from the other subsystems and publish them in a structured way without becoming a second control plane.
Today, those responsibilities are fragmented across `RuntimeHost` status setters, ad hoc `OutputDebugStringA` calls, callback-local warnings, and UI-facing runtime-state payloads. The result is that the app can often detect problems, but it does not yet have one clear place that answers:
Before the Phase 1 runtime split, those responsibilities were fragmented across `RuntimeHost` status setters, ad hoc `OutputDebugStringA` calls, callback-local warnings, and UI-facing runtime-state payloads. The result was that the app could often detect problems, but did not yet have one clear place that answered:
- what is healthy right now
- what is degraded right now
@@ -16,14 +16,14 @@ Today, those responsibilities are fragmented across `RuntimeHost` status setters
## Why This Subsystem Exists
The current code already contains meaningful health and timing signals, but they are spread through unrelated ownership domains:
The codebase already contains meaningful health and timing signals, but some are still spread through unrelated ownership domains:
- `RuntimeHost` stores signal and timing status:
- previous `RuntimeHost` status fields stored signal and timing status:
- `RuntimeHost.h`
- `RuntimeHost.cpp`
- `RuntimeHost.cpp`
- `RuntimeHost.cpp`
- render and bridge code report timing by writing back into `RuntimeHost`:
- render and bridge code historically reported timing by writing back into `RuntimeHost`:
- [OpenGLRenderPipeline.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/gl/pipeline/OpenGLRenderPipeline.cpp:50)
- [OpenGLVideoIOBridge.cpp](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/apps/LoopThroughWithOpenGLCompositing/gl/pipeline/OpenGLVideoIOBridge.cpp:49)
- backend warning paths still log directly:
@@ -42,7 +42,7 @@ This creates several recurring problems:
- logging is mostly text-first instead of structured-first
- recovery behavior is hard to audit because the app does not retain a coherent health snapshot
`HealthTelemetry` exists so later phases can move timing and health concerns out of `RuntimeHost`, out of callback-local logging, and into one subsystem whose only job is observation and reporting.
`HealthTelemetry` exists so timing and health concerns have one subsystem whose only job is observation and reporting, instead of drifting back into runtime storage, callback-local logging, or UI payload assembly.
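To make the "observation and reporting only" boundary concrete, here is a minimal sketch of what such a surface could look like. This is not the project's actual API: the `HealthTelemetry` name comes from this document, but the `Health` enum, `ReportHealth`, and the `HealthSnapshot` shape are illustrative assumptions.

```cpp
#include <map>
#include <mutex>
#include <string>

// Illustrative sketch only: producers report state, consumers read a
// coherent copy. The subsystem never acts on what it observes.
enum class Health { Healthy, Degraded, Failed };

struct HealthSnapshot {
    std::map<std::string, Health> components;  // component -> last reported state
    bool AnyDegraded() const {
        for (const auto& entry : components)
            if (entry.second != Health::Healthy) return true;
        return false;
    }
};

class HealthTelemetry {
public:
    // Producers (render, bridge, backend) push observations in.
    void ReportHealth(const std::string& component, Health h) {
        std::lock_guard<std::mutex> lock(mutex_);
        state_[component] = h;
    }
    // Consumers (UI, logging) read a snapshot, not live fields.
    HealthSnapshot Snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return HealthSnapshot{state_};
    }
private:
    mutable std::mutex mutex_;
    std::map<std::string, Health> state_;
};
```

The key property is that the snapshot answers "what is healthy / degraded right now" in one place, without the telemetry object ever issuing commands back to the subsystems it observes.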
## Design Goals
@@ -373,9 +373,9 @@ Expected observations:
The current codebase already contains several telemetry responsibilities that should migrate here.
### `RuntimeHost` Status Setters
### Previous `RuntimeHost` Status Setters
These are the clearest existing candidates:
These were the clearest initial migration candidates:
- `SetSignalStatus(...)`
- `TrySetSignalStatus(...)`
@@ -391,7 +391,7 @@ See:
- `RuntimeHost.cpp`
- `RuntimeHost.cpp`
In the target architecture, this kind of state should no longer sit on the same object that owns persistent layer truth.
In the target architecture, this kind of state should not sit on the same object that owns persistent layer truth.
### Render Timing Production
@@ -403,7 +403,7 @@ That timing sample should conceptually become:
- `RenderEngine -> HealthTelemetry::RecordTimingSample(...)`
not:
not the old pattern:
- `RenderEngine -> RuntimeHost::TrySetPerformanceStats(...)`
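The directional change above can be sketched as a producer call. `RecordTimingSample` is the name this document uses; the `TimingSample` fields and the minimal sink below are illustrative stand-ins, not the real classes.

```cpp
#include <chrono>
#include <cstddef>
#include <mutex>
#include <vector>

// Illustrative timing sample; the field set is an assumption.
struct TimingSample {
    double frame_ms = 0.0;
};

// Minimal stand-in for the telemetry sink named in this document.
class HealthTelemetry {
public:
    void RecordTimingSample(const TimingSample& s) {
        std::lock_guard<std::mutex> lock(mutex_);
        samples_.push_back(s);
    }
    std::size_t SampleCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return samples_.size();
    }
private:
    mutable std::mutex mutex_;
    std::vector<TimingSample> samples_;
};

// A render step reports its timing to telemetry instead of writing
// back into runtime storage (the old RuntimeHost pattern).
void RenderOneFrame(HealthTelemetry& telemetry) {
    auto start = std::chrono::steady_clock::now();
    // ... draw work would happen here ...
    auto end = std::chrono::steady_clock::now();
    TimingSample s;
    s.frame_ms = std::chrono::duration<double, std::milli>(end - start).count();
    telemetry.RecordTimingSample(s);
}
```

The render engine only needs a reference to the telemetry sink; it no longer knows anything about how or where the sample is stored.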
@@ -468,7 +468,7 @@ So the design should assume:
- bounded memory
- no long-held global mutex that callbacks and render both depend on
Phase 1 does not require lock-free implementation, but it does require the architecture to avoid recreating the `RuntimeHost` problem where health writes share the same lock as durable state and render-facing concerns.
Phase 1 does not require lock-free implementation, but it does require the architecture to avoid recreating the old problem where health writes share the same lock as durable state and render-facing concerns.
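Under these constraints, one way to keep writes cheap and memory bounded is a fixed-capacity ring of samples behind a telemetry-only mutex. This is a sketch under those assumptions, not the project's chosen implementation:

```cpp
#include <array>
#include <cstddef>
#include <mutex>

// Fixed-capacity ring of frame-time samples. Memory is bounded by N,
// and the mutex guards telemetry only -- never durable state or
// render-facing data -- so producers never contend on an unrelated lock.
// (Illustrative; the real subsystem may pick other structures.)
template <std::size_t N>
class TimingRing {
public:
    void Push(double frame_ms) {
        std::lock_guard<std::mutex> lock(mutex_);
        samples_[head_] = frame_ms;
        head_ = (head_ + 1) % N;
        if (count_ < N) ++count_;
    }
    // Average over whatever is currently retained.
    double Average() const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (count_ == 0) return 0.0;
        double sum = 0.0;
        for (std::size_t i = 0; i < count_; ++i) sum += samples_[i];
        return sum / static_cast<double>(count_);
    }
    std::size_t Count() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_;
    }
private:
    mutable std::mutex mutex_;
    std::array<double, N> samples_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};
```

Because old samples are overwritten in place, a stalled or absent reader can never cause unbounded growth, which is exactly the failure mode the bounded-memory requirement rules out.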
Practical expectations:
@@ -494,13 +494,13 @@ Initial responsibilities:
The first implementation can still be backed by simple in-memory structures.
### Step 2: Move New Observations Off `RuntimeHost`
### Step 2: Keep New Observations Off Runtime Storage
Before removing old setters, route new health-style work into `HealthTelemetry` instead of adding more `RuntimeHost` status fields.
Route new health-style work into `HealthTelemetry` instead of adding more status fields to runtime storage.
This prevents the old status surface from growing during migration.
### Step 3: Replace `RuntimeHost` Status Setters With Telemetry Producers
### Step 3: Replace Legacy Status Setters With Telemetry Producers
Refactor:
@@ -641,4 +641,4 @@ It should not:
- coordinate recovery actions
- become a replacement for the render or backend policy layers
If this boundary holds, later phases can remove timing and warning state from `RuntimeHost` and move toward a much more diagnosable live system.
If this boundary holds, later phases can keep moving toward a much more diagnosable live system without putting timing and warning state back into runtime storage.