# Phase 2 Design: Internal Event Model
This document expands Phase 2 of [ARCHITECTURE_RESILIENCE_REVIEW.md](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/docs/ARCHITECTURE_RESILIENCE_REVIEW.md) into a concrete design target.
Phase 1 established the subsystem vocabulary and moved the runtime path behind clearer collaborators. Phase 2 should now give those subsystems a safer way to coordinate than direct cross-calls, shared mutable result queues, and coarse polling loops.
## Status
- Phase 2 design package: proposed.
- Phase 2 implementation: not started.
The current repo already has useful footholds:
- `ControlServices` owns OSC/web/file-watch ingress and queues service-side work.
- `RuntimeCoordinator` owns mutation validation, classification, and coordinator result policy.
- `RuntimeUpdateController` applies coordinator outcomes and bridges toward render, shader builds, broadcasts, and backend state.
- `RuntimeSnapshotProvider` publishes render-facing snapshots.
- `HealthTelemetry` owns status/timing snapshots.
Those are good boundaries. The Phase 2 job is to stop using "poll, drain, then interpret side effects" as the main coordination style between them.
## Why Phase 2 Exists
The resilience review calls out three timing and ownership problems that an event model can directly improve:
- background service timing still relies on coarse sleeps and polling
- control, reload, persistence, and render-update work still travel through mixed shared state and result queues
- later render/backend refactors need a stable coordination model before they move more work across threads
The goal is not to make the app fully asynchronous in one pass. It is to introduce typed internal events so each subsystem can publish what happened without knowing who will react or how many downstream effects are needed.
## Goals
Phase 2 should establish:
- a small typed event vocabulary for control, runtime, render, backend, persistence, and health coordination
- one app-owned event pump or dispatcher that can route events deterministically
- bounded queues with clear ownership and no unbounded background growth
- wakeup-driven service coordination where practical, replacing coarse polling as the default shape
- explicit event-to-command boundaries so events do not become hidden global mutation APIs
- tests for event ordering, coalescing, rejection, and dispatch side effects
## Non-Goals
Phase 2 should not require:
- a dedicated render thread yet
- a full actor system
- lock-free queues everywhere
- background persistence implementation
- a complete DeckLink state machine
- final live-state layering
- replacing every direct call in one change
Those are later phases. Phase 2 provides the coordination substrate they can build on.
## Current Coordination Shape
The current runtime is much cleaner than before Phase 1, but coordination is still mostly pull-based:
- `ControlServices::PollLoop(...)` drains pending OSC commits, polls runtime file changes, queues `RuntimeCoordinatorResult` objects, then sleeps.
- `RuntimeUpdateController::ProcessRuntimeWork()` consumes queued coordinator results, applies them, and then checks whether a prepared shader build is ready.
- `RuntimeCoordinatorResult` carries many downstream effects: shader build request, compile status update, transient OSC clear, runtime-state broadcast, committed-state mode, render reset scope.
- shader-build readiness is polled from the app update path.
- runtime-state broadcasts are requested by direct calls rather than by an event publication contract.
This works, but it keeps timing behavior implicit. Phase 2 should make those transitions visible as typed events.
## Event Model Principles
### Events say what happened
Events should describe facts:
- `OscValueReceived`
- `RuntimeMutationAccepted`
- `RuntimeMutationRejected`
- `ShaderReloadRequested`
- `ShaderBuildPrepared`
- `ShaderBuildFailed`
- `RenderSnapshotPublished`
- `RuntimeStateBroadcastRequested`
They should not be vague commands like "do everything needed now."
### Commands request intent
Some work is still naturally command-shaped:
- "apply this parameter mutation"
- "request shader reload"
- "save this stack preset"
- "start backend output"
Commands enter an owner subsystem. Events leave a subsystem after the owner has accepted, rejected, or completed work.
### One owner mutates each state category
Events must not become a way to bypass Phase 1 ownership:
- `RuntimeCoordinator` remains the owner of mutation policy.
- `RuntimeStore` remains the owner of durable state.
- `RuntimeSnapshotProvider` remains the owner of render snapshot publication.
- `RenderEngine` remains the owner of render-local transient state.
- `VideoBackend` remains the owner of device lifecycle and pacing.
- `HealthTelemetry` observes and reports, but does not coordinate behavior.
### Event handlers should be small
Handlers should translate events into owner calls or follow-up events. They should not accumulate hidden long-lived state unless that state belongs to the handler's subsystem.
### Queues must be bounded or coalesced
High-rate control traffic can arrive faster than the app should process every individual sample. Phase 2 should preserve the useful current behavior of coalescing OSC updates by route, but make the coalescing policy explicit.
## Event Families
### Control Events
Produced by `ControlServices`.
Examples:
- `OscValueReceived`
- `OscValueCoalesced`
- `OscCommitRequested`
- `HttpControlMutationRequested`
- `WebSocketClientConnected`
- `RuntimeStateBroadcastRequested`
- `FileChangeDetected`
- `ManualReloadRequested`
Primary consumers:
- `RuntimeCoordinator`
- `HealthTelemetry`
- later, a persistence writer or diagnostics publisher
### Runtime Events
Produced by `RuntimeCoordinator`, `RuntimeStore`, and snapshot publication code.
Examples:
- `RuntimeMutationAccepted`
- `RuntimeMutationRejected`
- `RuntimeStateChanged`
- `RuntimePersistenceRequested`
- `RuntimeReloadRequested`
- `ShaderPackagesChanged`
- `RenderSnapshotPublishRequested`
- `RuntimeStatePresentationChanged`
Primary consumers:
- `RuntimeSnapshotProvider`
- `RenderEngine`
- `ControlServices`
- `HealthTelemetry`
- later, `PersistenceWriter`
### Shader Build Events
Produced by shader build orchestration and render-side build application.
Examples:
- `ShaderBuildRequested`
- `ShaderBuildPrepared`
- `ShaderBuildApplied`
- `ShaderBuildFailed`
- `CompileStatusChanged`
Primary consumers:
- `RenderEngine`
- `RuntimeCoordinator`
- `ControlServices`
- `HealthTelemetry`
### Render Events
Produced by `RenderEngine` and `RuntimeSnapshotProvider`.
Examples:
- `RenderSnapshotPublished`
- `RenderResetRequested`
- `RenderResetApplied`
- `OscOverlayApplied`
- `OscOverlaySettled`
- `FrameRendered`
- `PreviewFrameAvailable`
Primary consumers:
- `RenderEngine`
- `ControlServices`
- `VideoBackend`
- `HealthTelemetry`
### Backend Events
Produced by `VideoBackend` and backend adapters.
Examples:
- `InputSignalChanged`
- `InputFrameArrived`
- `OutputFrameScheduled`
- `OutputFrameCompleted`
- `OutputLateFrameDetected`
- `OutputDroppedFrameDetected`
- `BackendStateChanged`
Primary consumers:
- `RenderEngine`
- `HealthTelemetry`
- later, backend lifecycle state machine handlers
### Health Events
Produced by all major subsystems.
Examples:
- `SubsystemWarningRaised`
- `SubsystemWarningCleared`
- `SubsystemRecovered`
- `TimingSampleRecorded`
- `QueueDepthChanged`
Primary consumer:
- `HealthTelemetry`
Health events should be observational. They should not be required for core behavior to proceed.
## Event Envelope
A practical initial event envelope can stay simple:
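```cpp
#include <chrono>
#include <cstdint>
#include <string>
#include <variant>

// The payload structs named in the variant below are assumed to be defined
// elsewhere, one per event family.

enum class RuntimeEventType
{
    OscCommitRequested,
    RuntimeMutationAccepted,
    RuntimeMutationRejected,
    RuntimeReloadRequested,
    ShaderBuildRequested,
    ShaderBuildPrepared,
    ShaderBuildFailed,
    RenderSnapshotPublishRequested,
    RenderSnapshotPublished,
    RuntimeStateBroadcastRequested,
    BackendStateChanged,
    SubsystemWarningRaised
};

struct RuntimeEvent
{
    RuntimeEventType type;
    uint64_t sequence = 0;                            // makes event order observable
    std::chrono::steady_clock::time_point createdAt;  // enables queue-age telemetry
    std::string source;                               // publishing subsystem
    std::variant<
        OscCommitRequestedEvent,
        RuntimeMutationEvent,
        ShaderBuildEvent,
        RenderSnapshotEvent,
        BackendEvent,
        HealthEvent> payload;                         // typed payload, one alternative per family
};
```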
The exact C++ names can change. The key design requirements are:
- event type is explicit
- event order is observable
- source subsystem is recorded
- payload is typed, not a bag of optional strings
- timestamps exist for queue-age telemetry
- failures are events too, not just debug strings
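As a rough illustration of why a typed payload and a creation timestamp are useful, the sketch below dispatches on a `std::variant` payload with `std::visit` and computes queue age at dispatch time. `SketchEvent`, `MutationPayload`, `ShaderBuildPayload`, `QueueAge`, and `Describe` are hypothetical names invented for this example, not types from the repo.

```cpp
#include <chrono>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical, trimmed-down payloads; real payload structs would carry more fields.
struct MutationPayload { std::string parameter; bool accepted = false; };
struct ShaderBuildPayload { std::string message; bool succeeded = false; };

struct SketchEvent
{
    uint64_t sequence = 0;
    std::chrono::steady_clock::time_point createdAt;
    std::string source;
    std::variant<MutationPayload, ShaderBuildPayload> payload;
};

// Queue age: how long the event waited between publication and dispatch.
inline std::chrono::milliseconds QueueAge(const SketchEvent& event,
                                          std::chrono::steady_clock::time_point now)
{
    return std::chrono::duration_cast<std::chrono::milliseconds>(now - event.createdAt);
}

// Typed dispatch: each payload alternative gets its own branch, resolved at compile time.
inline std::string Describe(const SketchEvent& event)
{
    return std::visit([&](const auto& payload) -> std::string {
        using T = std::decay_t<decltype(payload)>;
        if constexpr (std::is_same_v<T, MutationPayload>)
            return event.source + ": mutation " + (payload.accepted ? "accepted" : "rejected");
        else
            return event.source + ": shader build " + (payload.succeeded ? "prepared" : "failed");
    }, event.payload);
}
```

Because the payload is a closed set of alternatives, adding a new event family forces every visitor to be revisited, which is the opposite of a bag of optional strings.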
## Event Bus Shape
Phase 2 does not need a large framework. A small app-owned dispatcher is enough.
Suggested components:
- `RuntimeEventBus`
  - owns queues
  - assigns sequence numbers
  - exposes `Publish(...)`
  - exposes `Drain(...)` or `DispatchPending(...)`
- `RuntimeEventHandler`
  - narrow handler interface or function callback
  - registered by subsystem/composition root
- `RuntimeEventQueue`
  - bounded FIFO for ordinary events
  - coalesced map for latest-value events such as high-rate OSC
- `RuntimeEventMetrics`
  - queue depth
  - oldest event age
  - dropped/coalesced counts
Initial implementation can be single-process and mostly single-dispatch-thread. The important part is that event publication and event handling become explicit.
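A minimal single-threaded sketch of that shape might look like the following. `SketchEventBus` and `BusEvent` are hypothetical stand-ins; the sketch assumes one dispatch thread, omits locking and bounding, and uses a string name where the real envelope would carry a typed payload.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal event; the real envelope would carry a typed payload.
struct BusEvent
{
    uint64_t sequence = 0;
    std::string name;
};

class SketchEventBus
{
public:
    using Handler = std::function<void(const BusEvent&)>;

    // Handlers are registered up front, for example by the composition root.
    void Subscribe(Handler handler) { mHandlers.push_back(std::move(handler)); }

    // Publication only enqueues; it assigns the sequence number and returns it.
    uint64_t Publish(std::string name)
    {
        mPending.push_back({ ++mLastSequence, std::move(name) });
        return mPending.back().sequence;
    }

    // Dispatches in publication order. Events published by handlers during the
    // drain are appended and dispatched in the same call.
    std::size_t DispatchPending()
    {
        std::size_t dispatched = 0;
        while (!mPending.empty())
        {
            BusEvent event = std::move(mPending.front());
            mPending.pop_front(); // pop first so handlers may safely publish
            for (const Handler& handler : mHandlers)
                handler(event);
            ++dispatched;
        }
        return dispatched;
    }

private:
    std::deque<BusEvent> mPending;
    std::vector<Handler> mHandlers;
    uint64_t mLastSequence = 0;
};
```

Separating `Publish` from `DispatchPending` is what makes publication and handling explicit: producers never run handler code on their own call stack.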
### Dispatcher Ownership Decision
The first concrete implementation uses one app-owned `RuntimeEventDispatcher`.
Ownership:
- `OpenGLComposite` owns the dispatcher as part of the current composition root.
References:
- `RuntimeServices` receives the dispatcher and passes it to `ControlServices`.
- `RuntimeCoordinator` receives the dispatcher so coordinator outcomes can become explicit events.
- `RuntimeUpdateController` receives the dispatcher so it can become the first effect/apply handler.
This is intentionally a composition-root dependency, not a new subsystem dependency. Subsystems should not construct their own dispatchers, and future tests should use `RuntimeEventTestHarness` rather than creating ad hoc event plumbing.
The dispatcher should move out of `OpenGLComposite` only if a later application-shell/composition-root object replaces `OpenGLComposite` as the owner of subsystem wiring.
## Queue Policy
Not every event deserves the same queue semantics.
### FIFO Events
Use FIFO for events where every item matters:
- mutation accepted/rejected
- shader build prepared/applied/failed
- backend state changed
- warning raised/cleared
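One way to keep a FIFO bounded is sketched below. What to do on overflow is a policy decision this design leaves open; the sketch assumes the newest item is rejected and counted so the pressure stays visible. `BoundedFifo` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <utility>

// Hypothetical bounded FIFO: rejects new items when full and counts the rejections.
template <typename Event>
class BoundedFifo
{
public:
    explicit BoundedFifo(std::size_t capacity) : mCapacity(capacity) { assert(capacity > 0); }

    // Returns false when the queue is full; the caller decides how to report the drop.
    bool TryPush(Event event)
    {
        if (mItems.size() >= mCapacity)
        {
            ++mDropped;
            return false;
        }
        mItems.push_back(std::move(event));
        return true;
    }

    // Returns the oldest item, or std::nullopt when the queue is empty.
    std::optional<Event> TryPop()
    {
        if (mItems.empty())
            return std::nullopt;
        Event event = std::move(mItems.front());
        mItems.pop_front();
        return event;
    }

    std::size_t Depth() const { return mItems.size(); }
    std::size_t Dropped() const { return mDropped; }

private:
    std::size_t mCapacity;
    std::size_t mDropped = 0;
    std::deque<Event> mItems;
};
```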
### Coalesced Events
Use coalescing for high-rate latest-value flows:
- OSC parameter target updates by route
- runtime-state broadcast requests
- file-change reload requests during a burst
- queue-depth telemetry
Coalesced events should record how many updates were collapsed so telemetry can show pressure.
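A latest-value queue with a collapsed count could be sketched as follows. `CoalescingQueue` and `CoalescedEntry` are hypothetical names, and the string key stands in for whatever stable key the real policy uses (an OSC route, a metric key, a path).

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical latest-value entry: the newest payload plus how many updates it replaced.
template <typename Value>
struct CoalescedEntry
{
    std::string key;
    Value latest{};
    uint64_t collapsed = 0;
};

template <typename Value>
class CoalescingQueue
{
public:
    // Publishing to an existing key overwrites the value and bumps the collapsed count.
    void Publish(const std::string& key, Value value)
    {
        auto found = mEntries.find(key);
        if (found == mEntries.end())
        {
            mEntries.emplace(key, CoalescedEntry<Value>{ key, std::move(value), 0 });
        }
        else
        {
            found->second.latest = std::move(value);
            ++found->second.collapsed;
        }
    }

    // Drains one entry per key; std::map yields a deterministic, sorted-by-key order.
    std::vector<CoalescedEntry<Value>> Drain()
    {
        std::vector<CoalescedEntry<Value>> drained;
        for (auto& item : mEntries)
            drained.push_back(std::move(item.second));
        mEntries.clear();
        return drained;
    }

    std::size_t Depth() const { return mEntries.size(); }

private:
    std::map<std::string, CoalescedEntry<Value>> mEntries;
};
```

Depth here is bounded by the number of distinct keys rather than by the publication rate, which is the property that makes coalescing safe for high-rate sources.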
### Synchronous Boundaries
Some calls may remain synchronous during Phase 2:
- UI/API mutation calls that need an immediate success/error response
- startup configuration failures
- shutdown ordering
- tests
The rule is that synchronous calls should still publish events for accepted/rejected/completed work, so the rest of the app does not need to infer side effects from the call path.
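The rule can be illustrated with a small sketch in which the owner returns a direct result and publishes the outcome as a side effect. `SketchCoordinator`, `MutationResult`, and `MutationEvent` are hypothetical, the range check is an illustrative validation rule only, and a plain vector stands in for the event bus.

```cpp
#include <string>
#include <vector>

// Hypothetical result returned directly to the caller (for example an HTTP handler).
struct MutationResult
{
    bool accepted = false;
    std::string error;
};

// Hypothetical published fact; the real events would be RuntimeMutationAccepted/Rejected.
struct MutationEvent
{
    bool accepted = false;
    std::string parameter;
};

class SketchCoordinator
{
public:
    explicit SketchCoordinator(std::vector<MutationEvent>& published) : mPublished(published) {}

    // The owner decides first; publication is a side effect of that decision,
    // and the caller's response never waits on any handler.
    MutationResult ApplyParameter(const std::string& parameter, double value)
    {
        const bool accepted = value >= 0.0 && value <= 1.0; // illustrative rule only
        mPublished.push_back({ accepted, parameter });
        if (accepted)
            return { true, "" };
        return { false, "value out of range" };
    }

private:
    std::vector<MutationEvent>& mPublished; // stand-in for the event bus
};
```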
## Event Bridge Policy
This section is the implementation rulebook for converting existing direct calls and result queues into events. Future Phase 2 lanes should use this table unless they deliberately update the policy here first.
### Bridge Categories
| Bridge category | Use when | Queue shape | Handler expectation |
| --- | --- | --- | --- |
| `fifo-fact` | every occurrence matters and must be observed in order | bounded FIFO | handler consumes each event exactly once |
| `coalesced-latest` | only the latest value per key matters | bounded coalescing queue | handler consumes the latest event and telemetry records collapsed count |
| `sync-command-with-event` | caller needs an immediate success/error result | direct owner call plus follow-up event publication | handler must not be required for the caller's response |
| `observation-only` | event is telemetry/diagnostic and must not drive core behavior | FIFO or coalesced depending on rate | handler failure must never block app behavior |
| `compatibility-poll` | source cannot yet publish an event directly | temporary poll adapter publishes typed events | poll interval should shrink or become wakeup-driven over Phase 2 |
### Current Bridge Decisions
| Current flow | First Phase 2 bridge | Event(s) | Queue policy |
| --- | --- | --- | --- |
| OSC latest-value updates | `ControlServices` ingress bridge | `OscValueReceived`, optional `OscValueCoalesced` | `coalesced-latest` by route key |
| OSC commit after settle | `ControlServices -> RuntimeCoordinator` bridge | `OscCommitRequested`, then `RuntimeMutationAccepted` or `RuntimeMutationRejected` | commit request `coalesced-latest` by route key; mutation result `fifo-fact` |
| HTTP/UI mutation needing response | direct call into `RuntimeCoordinator` | `RuntimeMutationAccepted` or `RuntimeMutationRejected` after the synchronous response path | `sync-command-with-event` |
| runtime-state broadcast request | presentation/broadcast bridge | `RuntimeStatePresentationChanged`, `RuntimeStateBroadcastRequested` | `coalesced-latest` by event type or reason family |
| manual reload button | control ingress bridge | `ManualReloadRequested`, then `RuntimeReloadRequested` | `fifo-fact` for manual request; reload execution may coalesce |
| file watcher changes | file-watch bridge | `FileChangeDetected`, then `RuntimeReloadRequested` | `coalesced-latest` by path, then coalesced reload request |
| runtime store poll fallback | compatibility poll adapter | `ShaderPackagesChanged`, `RuntimeReloadRequested`, or warning event | `compatibility-poll` until file events fully replace polling |
| shader build request | runtime/render bridge | `ShaderBuildRequested` | `coalesced-latest` by input dimensions and preserve-feedback flag |
| shader build ready/failure/apply | shader build lifecycle bridge | `ShaderBuildPrepared`, `ShaderBuildFailed`, `ShaderBuildApplied`, `CompileStatusChanged` | `fifo-fact` |
| render snapshot publication | snapshot bridge | `RenderSnapshotPublishRequested`, `RenderSnapshotPublished` | request may coalesce by output dimensions; published event is `fifo-fact` |
| render reset request/application | render bridge | `RenderResetRequested`, `RenderResetApplied` | `fifo-fact` |
| input signal changes | backend observation bridge | `InputSignalChanged` | `coalesced-latest` by signal lane |
| output late/dropped/completed frames | backend timing bridge | `OutputFrameCompleted`, `OutputLateFrameDetected`, `OutputDroppedFrameDetected` | late/dropped `fifo-fact`; high-rate completed frames may become `observation-only` coalesced metrics |
| warnings and recovery | telemetry bridge | `SubsystemWarningRaised`, `SubsystemWarningCleared`, `SubsystemRecovered` | `fifo-fact` for lifecycle transitions |
| queue depth/timing samples | telemetry metrics bridge | `QueueDepthChanged`, `TimingSampleRecorded` | `coalesced-latest` by metric key |
### Bridge Rules
- A bridge may translate an old direct call into an owner command, but it must publish the accepted/rejected/completed event that describes the outcome.
- A bridge must not mutate state owned by another subsystem just because it handles that subsystem's event.
- A coalesced event must have a stable key in code and a documented policy here.
- A FIFO event should be cheap enough that retaining every occurrence is useful. If not, turn it into a coalesced metric before putting it on a hot path.
- A synchronous bridge must treat event publication as a side effect of the owner decision, not as the mechanism that produces the direct caller's response.
- A compatibility poll adapter should be named as temporary in code so it does not become the new long-term coordination model.
- Handler failure should be reported through telemetry and dispatch metrics. It should not throw back across subsystem boundaries.
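The last rule, that handler failure is reported rather than rethrown, could be sketched like this. `DispatchGuarded`, `DispatchMetrics`, and `SketchHandler` are hypothetical names; the real metrics would feed `HealthTelemetry`.

```cpp
#include <cstddef>
#include <exception>
#include <functional>
#include <string>
#include <vector>

// Hypothetical dispatch metrics for this sketch.
struct DispatchMetrics
{
    std::size_t handled = 0;
    std::size_t handlerFailures = 0;
    std::string lastFailure;
};

using SketchHandler = std::function<void(const std::string& eventName)>;

// Runs every handler; an exception from one handler is recorded, not rethrown,
// so the remaining handlers still see the event.
inline void DispatchGuarded(const std::string& eventName,
                            const std::vector<SketchHandler>& handlers,
                            DispatchMetrics& metrics)
{
    for (const SketchHandler& handler : handlers)
    {
        try
        {
            handler(eventName);
            ++metrics.handled;
        }
        catch (const std::exception& error)
        {
            ++metrics.handlerFailures;
            metrics.lastFailure = error.what();
        }
        catch (...)
        {
            ++metrics.handlerFailures;
            metrics.lastFailure = "unknown handler failure";
        }
    }
}
```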
### First Integration Recommendation
The safest first behavior-changing bridge is `RuntimeStateBroadcastRequested`.
It is low risk because:
- it is already a side effect of many coordinator outcomes
- duplicate requests are naturally coalescable
- the handler can call the existing `ControlServices::BroadcastState()` path
- success can be verified through existing UI behavior and event tests
After that, the next bridge should be `ShaderBuildRequested`, because it already behaves like a queued side effect and has clear follow-up events.
## Target Flow Examples
### OSC Parameter Update
1. `OscServer` decodes a packet.
2. `ControlServices` publishes or coalesces `OscValueReceived`.
3. The dispatcher routes the event to the render-overlay path or coordinator policy, depending on whether the value is transient or committing.
4. `RuntimeCoordinator` publishes `RuntimeMutationAccepted` or `RuntimeMutationRejected` for committed changes.
5. Accepted committed changes publish `RenderSnapshotPublishRequested` and `RuntimePersistenceRequested` as needed.
6. `ControlServices` receives `RuntimeStateBroadcastRequested` or a presentation-changed event and broadcasts at its own cadence.
### File Reload
1. File-watch or manual reload produces `FileChangeDetected` or `ManualReloadRequested`.
2. `ControlServices` coalesces reload bursts into one `RuntimeReloadRequested`.
3. `RuntimeCoordinator` classifies the reload.
4. Package/store refresh produces `ShaderPackagesChanged` if package metadata changed.
5. Coordinator publishes `ShaderBuildRequested`.
6. Shader build completion publishes `ShaderBuildPrepared` or `ShaderBuildFailed`.
7. Render applies the ready build and publishes `ShaderBuildApplied`.
### Runtime State Broadcast
1. A mutation or reload publishes `RuntimeStatePresentationChanged`.
2. `ControlServices` coalesces this into a broadcast request.
3. The broadcast path asks `RuntimeStatePresenter` for the current presentation read model.
4. `HealthTelemetry` records broadcast count, failures, and queue age.
### Backend Signal Change
1. Backend adapter detects input signal change.
2. `VideoBackend` publishes `InputSignalChanged`.
3. `HealthTelemetry` records the new signal status.
4. Later phases may let the backend lifecycle state machine react to the same event.
## Migration Plan
### Step 1. Add Event Types And A Minimal Dispatcher
Introduce:
- `RuntimeEvent`
- `RuntimeEventType`
- typed payload structs for the smallest useful event family
- `RuntimeEventBus` or equivalent dispatcher
Start with events that do not change behavior:
- `RuntimeStateBroadcastRequested`
- `ShaderBuildRequested`
- `RuntimeMutationRejected`
- simple health/log observations
### Step 2. Convert `RuntimeUpdateController` Into An Event Handler
`RuntimeUpdateController` is already close to an event effect applier. Phase 2 should narrow it into a handler for:
- coordinator outcome events
- shader build readiness events
- snapshot publication requests
- broadcast requests
The class should stop being the place that polls every source of work.
### Step 3. Replace `ControlServices::PollLoop` Sleep With Wakeups
Keep coalescing, but replace the fixed `25 x Sleep(10)` cadence with:
- a condition variable or waitable event
- wakeups when OSC commit work arrives
- wakeups when file/reload work arrives
- a fallback timer only for compatibility polling that cannot yet be evented
This is the most direct Phase 2 timing win.
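A portable sketch of the wakeup shape, using `std::condition_variable` rather than a Win32 waitable event, is shown below. `ServiceWakeup` is a hypothetical helper, not an existing class; producers would call `Notify()` when OSC commit or reload work arrives, and the service loop would call `WaitForWork(...)` in place of the fixed sleep.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical wakeup primitive for a service loop.
class ServiceWakeup
{
public:
    // Called by producers when work arrives.
    void Notify()
    {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            ++mPendingSignals;
        }
        mCondition.notify_one();
    }

    // Returns the number of signals consumed, or 0 when the fallback timeout elapsed.
    // The timeout exists only for compatibility polling that cannot yet be evented.
    std::size_t WaitForWork(std::chrono::milliseconds fallbackTimeout)
    {
        std::unique_lock<std::mutex> lock(mMutex);
        // The predicate guards against spurious wakeups and missed notifications.
        mCondition.wait_for(lock, fallbackTimeout, [this] { return mPendingSignals > 0; });
        const std::size_t consumed = mPendingSignals;
        mPendingSignals = 0;
        return consumed;
    }

private:
    std::mutex mMutex;
    std::condition_variable mCondition;
    std::size_t mPendingSignals = 0;
};
```

Counting signals under the mutex means a notification that arrives before the loop starts waiting is not lost, so the loop returns immediately when work is already pending.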
### Step 4. Route Shader Build Lifecycle Through Events
Turn the current request/apply/failure/success path into explicit events:
- `ShaderBuildRequested`
- `ShaderBuildPrepared`
- `ShaderBuildFailed`
- `ShaderBuildApplied`
- `CompileStatusChanged`
This should preserve the current off-frame-path compile behavior while making readiness visible.
### Step 5. Route Runtime Broadcasts Through Events
Replace direct "broadcast now" decisions with:
- `RuntimeStatePresentationChanged`
- `RuntimeStateBroadcastRequested`
- `RuntimeStateBroadcastCompleted`
- `RuntimeStateBroadcastFailed`
This keeps UI delivery in `ControlServices` while keeping presentation ownership in the runtime presentation layer.
### Step 6. Add Event Metrics
Before using the event system for hotter paths, add metrics:
- event queue depth
- oldest event age
- event dispatch duration
- coalesced event count
- dropped event count
- handler failure count
These should feed `HealthTelemetry`.
## Dependency Rules
Allowed:
- producers publish events to the bus
- the composition root registers handlers
- handlers call owner subsystem APIs
- `HealthTelemetry` observes event metrics and failures
Avoid:
- subsystems subscribing directly to each other in constructors
- event handlers mutating state outside their owner subsystem
- using one global event payload with many nullable fields
- making render hot paths block on the event bus
- requiring health/telemetry event delivery for core behavior
The dispatcher is coordination infrastructure, not a new domain owner.
## Testing Strategy
Phase 2 should add tests that do not require GL, DeckLink, or network sockets.
Recommended tests:
- FIFO events dispatch in sequence order
- coalesced events keep the latest payload and count collapsed updates
- rejected mutations publish rejection events without downstream snapshot/build events
- accepted parameter mutations publish the expected follow-up event set
- file reload bursts collapse into one reload request
- handler failures are reported as health/log events
- queue depth and oldest-event-age metrics update predictably
The existing runtime subsystem tests are a good home for the first pure event model tests, or a new `RuntimeEventTests.cpp` target can be added if the event layer grows enough.
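The "rejected mutations publish no downstream events" test could take roughly the following shape. `EventRecorder` and `PublishMutationOutcome` are hypothetical stand-ins written for this sketch; they do not reflect the API of `RuntimeEventTestHarness`, and the follow-up event set shown is illustrative.

```cpp
#include <string>
#include <vector>

// Hypothetical recorder: captures published event names in order.
struct EventRecorder
{
    std::vector<std::string> names;

    void Publish(const std::string& name) { names.push_back(name); }

    bool Saw(const std::string& name) const
    {
        for (const std::string& seen : names)
            if (seen == name)
                return true;
        return false;
    }
};

// Stand-in for the coordinator's publication policy; the follow-up set is illustrative.
inline void PublishMutationOutcome(bool accepted, EventRecorder& recorder)
{
    if (!accepted)
    {
        recorder.Publish("RuntimeMutationRejected");
        return; // no snapshot or build follow-ups for rejected work
    }
    recorder.Publish("RuntimeMutationAccepted");
    recorder.Publish("RenderSnapshotPublishRequested");
    recorder.Publish("RuntimeStateBroadcastRequested");
}
```

A test then drives the policy with a rejected mutation and asserts that the recorder saw `RuntimeMutationRejected` and nothing else, with no GL, DeckLink, or sockets involved.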
## Phase 2 Exit Criteria
Phase 2 can be considered complete once the project can say:
- there is a typed internal event envelope and dispatcher
- `ControlServices` emits typed events for OSC commits, broadcast requests, and reload/file-change work
- `RuntimeCoordinator` publishes explicit accepted/rejected/follow-up events instead of callers interpreting broad result objects everywhere
- `RuntimeUpdateController` handles events rather than polling all runtime work sources directly
- shader build request/readiness/failure/application is represented as events
- runtime-state broadcasts are event-driven and coalesced
- event queues expose depth, age, coalescing, and failure metrics
- coarse sleep polling is no longer the default coordination model for service work
## Open Questions For Implementation
- Should the first dispatcher be single-threaded and pumped by the app loop, or should `ControlServices` own a dedicated service event thread?
- Should high-rate OSC transient overlay events go through the same bus, or should only commit/settle events enter the bus initially?
- Should event payloads use `std::variant`, type-erased handlers, or separate strongly typed queues per family?
- How much of `RuntimeCoordinatorResult` should survive as an internal helper versus being replaced by explicit events?
- Should persistence requests be represented in Phase 2 even though the background writer lands later?
- Should backend callback events be introduced now as observation-only events, or wait until the backend state-machine phase?
## Short Version
Phase 2 should give the app a typed nervous system.
- external inputs become typed events
- owner subsystems still make decisions
- decisions publish explicit outcomes
- follow-up work is routed by handlers, not inferred from scattered call paths
- high-rate work is bounded or coalesced
- timing and queue pressure become observable
If this boundary holds, later render-thread, persistence, backend, and telemetry work can move independently without returning to shared-object polling as the default coordination model.