phase 2
524
docs/PHASE_2_INTERNAL_EVENT_MODEL_DESIGN.md
Normal file
@@ -0,0 +1,524 @@
# Phase 2 Design: Internal Event Model

This document expands Phase 2 of [ARCHITECTURE_RESILIENCE_REVIEW.md](/c:/Users/Aiden/Documents/GitHub/video-shader-toys/docs/ARCHITECTURE_RESILIENCE_REVIEW.md) into a concrete design target.

Phase 1 established the subsystem vocabulary and moved the runtime path behind clearer collaborators. Phase 2 should now give those subsystems a safer way to coordinate than direct cross-calls, shared mutable result queues, and coarse polling loops.

## Status

- Phase 2 design package: proposed.
- Phase 2 implementation: not started.

The current repo already has useful footholds:

- `ControlServices` owns OSC/web/file-watch ingress and queues service-side work.
- `RuntimeCoordinator` owns mutation validation, classification, and coordinator result policy.
- `RuntimeUpdateController` applies coordinator outcomes and bridges toward render, shader builds, broadcasts, and backend state.
- `RuntimeSnapshotProvider` publishes render-facing snapshots.
- `HealthTelemetry` owns status/timing snapshots.

Those are good boundaries. The Phase 2 job is to stop using "poll, drain, then interpret side effects" as the main coordination style between them.

## Why Phase 2 Exists

The resilience review calls out three timing and ownership problems that an event model can directly improve:

- background service timing still relies on coarse sleeps and polling
- control, reload, persistence, and render-update work still travel through mixed shared state and result queues
- later render/backend refactors need a stable coordination model before they move more work across threads

The goal is not to make the app fully asynchronous in one pass. It is to introduce typed internal events so each subsystem can publish what happened without knowing who will react or how many downstream effects are needed.

## Goals

Phase 2 should establish:

- a small typed event vocabulary for control, runtime, render, backend, persistence, and health coordination
- one app-owned event pump or dispatcher that can route events deterministically
- bounded queues with clear ownership and no unbounded background growth
- wakeup-driven service coordination where practical, replacing coarse polling as the default shape
- explicit event-to-command boundaries so events do not become hidden global mutation APIs
- tests for event ordering, coalescing, rejection, and dispatch side effects

## Non-Goals

Phase 2 should not require:

- a dedicated render thread yet
- a full actor system
- lock-free queues everywhere
- background persistence implementation
- a complete DeckLink state machine
- final live-state layering
- replacing every direct call in one change

Those are later phases. Phase 2 provides the coordination substrate they can build on.

## Current Coordination Shape

The current runtime is much cleaner than before Phase 1, but coordination is still mostly pull-based:

- `ControlServices::PollLoop(...)` drains pending OSC commits, polls runtime file changes, queues `RuntimeCoordinatorResult` objects, then sleeps.
- `RuntimeUpdateController::ProcessRuntimeWork()` consumes queued coordinator results, applies them, and then checks whether a prepared shader build is ready.
- `RuntimeCoordinatorResult` carries many downstream effects: shader build request, compile status update, transient OSC clear, runtime-state broadcast, committed-state mode, render reset scope.
- shader-build readiness is polled from the app update path.
- runtime-state broadcasts are requested by direct calls rather than by an event publication contract.

This works, but it keeps timing behavior implicit. Phase 2 should make those transitions visible as typed events.

## Event Model Principles

### Events say what happened

Events should describe facts:

- `OscValueReceived`
- `RuntimeMutationAccepted`
- `RuntimeMutationRejected`
- `ShaderReloadRequested`
- `ShaderBuildPrepared`
- `ShaderBuildFailed`
- `RenderSnapshotPublished`
- `RuntimeStateBroadcastRequested`

They should not be vague commands like "do everything needed now."

### Commands request intent

Some work is still naturally command-shaped:

- "apply this parameter mutation"
- "request shader reload"
- "save this stack preset"
- "start backend output"

Commands enter an owner subsystem. Events leave a subsystem after the owner has accepted, rejected, or completed work.

### One owner mutates each state category

Events must not become a way to bypass Phase 1 ownership:

- `RuntimeCoordinator` remains the owner of mutation policy.
- `RuntimeStore` remains the owner of durable state.
- `RuntimeSnapshotProvider` remains the owner of render snapshot publication.
- `RenderEngine` remains the owner of render-local transient state.
- `VideoBackend` remains the owner of device lifecycle and pacing.
- `HealthTelemetry` observes and reports, but does not coordinate behavior.

### Event handlers should be small

Handlers should translate events into owner calls or follow-up events. They should not accumulate hidden long-lived state unless that state belongs to the handler's subsystem.

### Queues must be bounded or coalesced

High-rate control traffic can arrive faster than the app should process every individual sample. Phase 2 should preserve the useful current behavior of coalescing OSC updates by route, but make the coalescing policy explicit.

## Event Families

### Control Events

Produced by `ControlServices`.

Examples:

- `OscValueReceived`
- `OscValueCoalesced`
- `OscCommitRequested`
- `HttpControlMutationRequested`
- `WebSocketClientConnected`
- `RuntimeStateBroadcastRequested`
- `FileChangeDetected`
- `ManualReloadRequested`

Primary consumers:

- `RuntimeCoordinator`
- `HealthTelemetry`
- later, a persistence writer or diagnostics publisher

### Runtime Events

Produced by `RuntimeCoordinator`, `RuntimeStore`, and snapshot publication code.

Examples:

- `RuntimeMutationAccepted`
- `RuntimeMutationRejected`
- `RuntimeStateChanged`
- `RuntimePersistenceRequested`
- `RuntimeReloadRequested`
- `ShaderPackagesChanged`
- `RenderSnapshotPublishRequested`
- `RuntimeStatePresentationChanged`

Primary consumers:

- `RuntimeSnapshotProvider`
- `RenderEngine`
- `ControlServices`
- `HealthTelemetry`
- later, `PersistenceWriter`

### Shader Build Events

Produced by shader build orchestration and render-side build application.

Examples:

- `ShaderBuildRequested`
- `ShaderBuildPrepared`
- `ShaderBuildApplied`
- `ShaderBuildFailed`
- `CompileStatusChanged`

Primary consumers:

- `RenderEngine`
- `RuntimeCoordinator`
- `ControlServices`
- `HealthTelemetry`

### Render Events

Produced by `RenderEngine` and `RuntimeSnapshotProvider`.

Examples:

- `RenderSnapshotPublished`
- `RenderResetRequested`
- `RenderResetApplied`
- `OscOverlayApplied`
- `OscOverlaySettled`
- `FrameRendered`
- `PreviewFrameAvailable`

Primary consumers:

- `RenderEngine`
- `ControlServices`
- `VideoBackend`
- `HealthTelemetry`

### Backend Events

Produced by `VideoBackend` and backend adapters.

Examples:

- `InputSignalChanged`
- `InputFrameArrived`
- `OutputFrameScheduled`
- `OutputFrameCompleted`
- `OutputLateFrameDetected`
- `OutputDroppedFrameDetected`
- `BackendStateChanged`

Primary consumers:

- `RenderEngine`
- `HealthTelemetry`
- later, backend lifecycle state machine handlers

### Health Events

Produced by all major subsystems.

Examples:

- `SubsystemWarningRaised`
- `SubsystemWarningCleared`
- `SubsystemRecovered`
- `TimingSampleRecorded`
- `QueueDepthChanged`

Primary consumer:

- `HealthTelemetry`

Health events should be observational. They should not be required for core behavior to proceed.

## Event Envelope

A practical initial event envelope can stay simple:

```cpp
#include <chrono>
#include <cstdint>
#include <string>
#include <variant>

// The payload structs named in the variant below (one per event family)
// are assumed to be declared before this point.

enum class RuntimeEventType
{
    OscCommitRequested,
    RuntimeMutationAccepted,
    RuntimeMutationRejected,
    RuntimeReloadRequested,
    ShaderBuildRequested,
    ShaderBuildPrepared,
    ShaderBuildFailed,
    RenderSnapshotPublishRequested,
    RenderSnapshotPublished,
    RuntimeStateBroadcastRequested,
    BackendStateChanged,
    SubsystemWarningRaised
};

struct RuntimeEvent
{
    RuntimeEventType type;
    std::uint64_t sequence = 0;                       // makes event order observable
    std::chrono::steady_clock::time_point createdAt;  // basis for queue-age telemetry
    std::string source;                               // publishing subsystem
    std::variant<
        OscCommitRequestedEvent,
        RuntimeMutationEvent,
        ShaderBuildEvent,
        RenderSnapshotEvent,
        BackendEvent,
        HealthEvent> payload;                         // typed, one alternative per family
};
```

The exact C++ names can change. The key design requirements are:

- event type is explicit
- event order is observable
- source subsystem is recorded
- payload is typed, not a bag of optional strings
- timestamps exist for queue-age telemetry
- failures are events too, not just debug strings

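
As a rough illustration of the requirements above, the sketch below visits a typed payload with `std::visit` and derives queue age from `createdAt`. The `Envelope`, `MutationRejected`, and `BuildFailed` types and their fields are hypothetical stand-ins chosen for the example; they are not the project's actual payload structs.

```cpp
#include <chrono>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical payloads; real structs would carry the project's own fields.
struct MutationRejected { std::string route; std::string reason; };
struct BuildFailed { std::string log; };

// Hypothetical, trimmed-down envelope used only for this example.
struct Envelope
{
    std::uint64_t sequence = 0;
    std::chrono::steady_clock::time_point createdAt;
    std::string source;
    std::variant<MutationRejected, BuildFailed> payload;
};

// Queue age in milliseconds, measured against a caller-supplied "now"
// so that tests do not depend on the wall clock.
inline std::int64_t QueueAgeMs(const Envelope& e, std::chrono::steady_clock::time_point now)
{
    return std::chrono::duration_cast<std::chrono::milliseconds>(now - e.createdAt).count();
}

// Failures are events too: build a one-line description from the typed payload.
inline std::string Describe(const Envelope& e)
{
    return std::visit([&](const auto& p) -> std::string {
        using T = std::decay_t<decltype(p)>;
        if constexpr (std::is_same_v<T, MutationRejected>)
            return e.source + ": rejected " + p.route + " (" + p.reason + ")";
        else
            return e.source + ": build failed (" + p.log + ")";
    }, e.payload);
}
```

Because the payload is a variant rather than a set of optional strings, adding a new alternative forces every exhaustive visitor to be revisited at compile time.
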

## Event Bus Shape

Phase 2 does not need a large framework. A small app-owned dispatcher is enough.

Suggested components:

- `RuntimeEventBus`
  - owns queues
  - assigns sequence numbers
  - exposes `Publish(...)`
  - exposes `Drain(...)` or `DispatchPending(...)`
- `RuntimeEventHandler`
  - narrow handler interface or function callback
  - registered by subsystem/composition root
- `RuntimeEventQueue`
  - bounded FIFO for ordinary events
  - coalesced map for latest-value events such as high-rate OSC
- `RuntimeEventMetrics`
  - queue depth
  - oldest event age
  - dropped/coalesced counts

Initial implementation can be single-process and mostly single-dispatch-thread. The important part is that event publication and event handling become explicit.

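
One possible shape for such a dispatcher is sketched below: a bounded FIFO that assigns sequence numbers in `Publish` and invokes registered handlers from `DispatchPending` on the pumping thread. `MiniEventBus` and `BusEvent` are hypothetical names, and the event carries only a name string to keep the example short; a real bus would carry the typed envelope.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified event for illustration only.
struct BusEvent
{
    std::uint64_t sequence = 0;
    std::string name;
};

class MiniEventBus
{
public:
    using Handler = std::function<void(const BusEvent&)>;

    explicit MiniEventBus(std::size_t capacity) : mCapacity(capacity) {}

    // Intended to be called by the composition root before dispatch starts
    // (not synchronized against DispatchPending).
    void Subscribe(Handler handler) { mHandlers.push_back(std::move(handler)); }

    // Thread-safe. Returns false and counts a drop when the bounded queue is full.
    bool Publish(std::string name)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mQueue.size() >= mCapacity)
        {
            ++mDropped;
            return false;
        }
        mQueue.push_back(BusEvent{ ++mNextSequence, std::move(name) });
        return true;
    }

    // Takes the pending batch under the lock, then runs handlers in FIFO order
    // without holding it, so a handler may publish follow-up events.
    std::size_t DispatchPending()
    {
        std::deque<BusEvent> batch;
        {
            std::lock_guard<std::mutex> lock(mMutex);
            batch.swap(mQueue);
        }
        for (const BusEvent& event : batch)
            for (const Handler& handler : mHandlers)
                handler(event);
        return batch.size();
    }

    std::size_t Dropped() const
    {
        std::lock_guard<std::mutex> lock(mMutex);
        return mDropped;
    }

private:
    mutable std::mutex mMutex;
    std::deque<BusEvent> mQueue;
    std::vector<Handler> mHandlers;
    std::size_t mCapacity;
    std::size_t mDropped = 0;
    std::uint64_t mNextSequence = 0;
};
```

Events published from inside a handler land in the next `DispatchPending` call, which keeps each pump bounded; whether that is the right trade-off is one of the open implementation questions.
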

## Queue Policy

Not every event deserves the same queue semantics.

### FIFO Events

Use FIFO for events where every item matters:

- mutation accepted/rejected
- shader build completed/failed
- backend state changed
- warning raised/cleared

### Coalesced Events

Use coalescing for high-rate latest-value flows:

- OSC parameter target updates by route
- runtime-state broadcast requests
- file-change reload requests during a burst
- queue-depth telemetry

Coalesced events should record how many updates were collapsed so telemetry can show pressure.

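
A minimal sketch of latest-value coalescing by route, with a collapse counter per entry, might look like the following. `OscCoalescer` and `CoalescedOscValue` are hypothetical names, the value is assumed to be a single float, and the class is not thread-safe on its own; a real implementation would sit behind whatever lock or thread the owning queue uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical latest-value entry for one OSC route.
struct CoalescedOscValue
{
    std::string route;
    float value = 0.0f;
    std::uint32_t collapsed = 0; // updates overwritten before this entry was drained
};

class OscCoalescer
{
public:
    // Keeps only the newest value per route and counts what it overwrote.
    void Push(const std::string& route, float value)
    {
        auto it = mLatest.find(route);
        if (it == mLatest.end())
        {
            mLatest.emplace(route, CoalescedOscValue{ route, value, 0 });
            return;
        }
        it->second.value = value;
        ++it->second.collapsed;
    }

    // Hands back one entry per route (in route order) and resets the map.
    std::vector<CoalescedOscValue> Drain()
    {
        std::vector<CoalescedOscValue> out;
        out.reserve(mLatest.size());
        for (auto& entry : mLatest)
            out.push_back(std::move(entry.second));
        mLatest.clear();
        return out;
    }

    std::size_t PendingRoutes() const { return mLatest.size(); }

private:
    std::map<std::string, CoalescedOscValue> mLatest;
};
```

Memory use is bounded by the number of distinct routes rather than by the packet rate, and the `collapsed` count gives telemetry a direct measure of pressure.
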

### Synchronous Boundaries

Some calls may remain synchronous during Phase 2:

- UI/API mutation calls that need an immediate success/error response
- startup configuration failures
- shutdown ordering
- tests

The rule is that synchronous calls should still publish events for accepted/rejected/completed work, so the rest of the app does not need to infer side effects from the call path.

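
The rule can be illustrated with a small sketch in which a synchronous command returns an immediate answer and also publishes the same outcome. `ParameterOwner`, `MutationOutcome`, and the `[0, 1]` range check are hypothetical and exist only to make the example concrete; the publisher callback stands in for the event bus.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical outcome event; a real payload would be a typed struct per family.
struct MutationOutcome
{
    bool accepted = false;
    std::string route;
    std::string reason; // empty when accepted
};

class ParameterOwner
{
public:
    using Publisher = std::function<void(const MutationOutcome&)>;

    explicit ParameterOwner(Publisher publish) : mPublish(std::move(publish)) {}

    // Synchronous command: the caller gets an immediate result, and the same
    // outcome is published so other subsystems need not infer it from the call path.
    bool ApplyNormalized(const std::string& route, float value)
    {
        MutationOutcome outcome;
        outcome.route = route;
        outcome.accepted = (value >= 0.0f && value <= 1.0f);
        if (outcome.accepted)
            mValue = value;
        else
            outcome.reason = "value outside [0, 1]";
        mPublish(outcome);
        return outcome.accepted;
    }

    float Value() const { return mValue; }

private:
    Publisher mPublish;
    float mValue = 0.0f;
};
```

Only the owner mutates its state; the published outcome is a fact about what the owner decided, not a request for someone else to finish the job.
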

## Target Flow Examples

### OSC Parameter Update

1. `OscServer` decodes a packet.
2. `ControlServices` publishes or coalesces `OscValueReceived`.
3. The dispatcher routes the event to the render-overlay path or coordinator policy, depending on whether the value is transient or committing.
4. `RuntimeCoordinator` publishes `RuntimeMutationAccepted` or `RuntimeMutationRejected` for committed changes.
5. Accepted committed changes publish `RenderSnapshotPublishRequested` and `RuntimePersistenceRequested` as needed.
6. `ControlServices` receives `RuntimeStateBroadcastRequested` or a presentation-changed event and broadcasts at its own cadence.

### File Reload

1. File-watch or manual reload produces `FileChangeDetected` or `ManualReloadRequested`.
2. `ControlServices` coalesces reload bursts into one `RuntimeReloadRequested`.
3. `RuntimeCoordinator` classifies the reload.
4. Package/store refresh produces `ShaderPackagesChanged` if package metadata changed.
5. Coordinator publishes `ShaderBuildRequested`.
6. Shader build completion publishes `ShaderBuildPrepared` or `ShaderBuildFailed`.
7. Render applies the ready build and publishes `ShaderBuildApplied`.

### Runtime State Broadcast

1. A mutation or reload publishes `RuntimeStatePresentationChanged`.
2. `ControlServices` coalesces this into a broadcast request.
3. The broadcast path asks `RuntimeStatePresenter` for the current presentation read model.
4. `HealthTelemetry` records broadcast count, failures, and queue age.

### Backend Signal Change

1. Backend adapter detects input signal change.
2. `VideoBackend` publishes `InputSignalChanged`.
3. `HealthTelemetry` records the new signal status.
4. Later phases may let the backend lifecycle state machine react to the same event.

## Migration Plan

### Step 1. Add Event Types And A Minimal Dispatcher

Introduce:

- `RuntimeEvent`
- `RuntimeEventType`
- typed payload structs for the smallest useful event family
- `RuntimeEventBus` or equivalent dispatcher

Start with events that do not change behavior:

- `RuntimeStateBroadcastRequested`
- `ShaderBuildRequested`
- `RuntimeMutationRejected`
- simple health/log observations

### Step 2. Convert `RuntimeUpdateController` Into An Event Handler

`RuntimeUpdateController` is already close to an event effect applier. Phase 2 should narrow it into a handler for:

- coordinator outcome events
- shader build readiness events
- snapshot publication requests
- broadcast requests

The class should stop being the place that polls every source of work.

### Step 3. Replace `ControlServices::PollLoop` Sleep With Wakeups

Keep coalescing, but replace the fixed `25 x Sleep(10)` cadence with:

- a condition variable or waitable event
- wakeups when OSC commit work arrives
- wakeups when file/reload work arrives
- a fallback timer only for compatibility polling that cannot yet be evented

This is the most direct Phase 2 timing win.

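
A wakeup with a fallback timer could be sketched with `std::condition_variable::wait_for` as below. `ServiceWakeup` is a hypothetical helper, not an existing class in the repo, and the fallback interval is left to the caller.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical wake signal for a service loop.
class ServiceWakeup
{
public:
    // Called by producers, for example when an OSC commit is queued
    // or a file change is detected.
    void Notify()
    {
        {
            std::lock_guard<std::mutex> lock(mMutex);
            mPending = true;
        }
        mCondition.notify_one();
    }

    // Blocks until work is signalled or the fallback interval elapses.
    // Returns true when woken by a signal, false on fallback timeout.
    // The predicate guards against spurious wakeups and missed notifications.
    bool WaitForWork(std::chrono::milliseconds fallback)
    {
        std::unique_lock<std::mutex> lock(mMutex);
        const bool signalled = mCondition.wait_for(lock, fallback, [this] { return mPending; });
        mPending = false;
        return signalled;
    }

private:
    std::mutex mMutex;
    std::condition_variable mCondition;
    bool mPending = false;
};
```

A service loop would call `WaitForWork` once per iteration and treat a `false` return as the cue to run only the compatibility polling that cannot yet be evented.
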

### Step 4. Route Shader Build Lifecycle Through Events

Turn the current request/apply/failure/success path into explicit events:

- `ShaderBuildRequested`
- `ShaderBuildPrepared`
- `ShaderBuildFailed`
- `ShaderBuildApplied`
- `CompileStatusChanged`

This should preserve the current off-frame-path compile behavior while making readiness visible.

### Step 5. Route Runtime Broadcasts Through Events

Replace direct "broadcast now" decisions with:

- `RuntimeStatePresentationChanged`
- `RuntimeStateBroadcastRequested`
- `RuntimeStateBroadcastCompleted`
- `RuntimeStateBroadcastFailed`

This keeps UI delivery in `ControlServices` while keeping presentation ownership in the runtime presentation layer.

### Step 6. Add Event Metrics

Before using the event system for hotter paths, add metrics:

- event queue depth
- oldest event age
- event dispatch duration
- coalesced event count
- dropped event count
- handler failure count

These should feed `HealthTelemetry`.

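
The metrics listed above could be gathered by a small tracker along these lines. `EventMetricsTracker` and `EventQueueMetrics` are hypothetical names, the tracker assumes a single FIFO queue, and it is not thread-safe by itself; how the snapshot reaches `HealthTelemetry` is left open.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical snapshot a dispatcher could hand to HealthTelemetry.
struct EventQueueMetrics
{
    std::size_t depth = 0;
    std::int64_t oldestAgeMs = 0;
    std::int64_t maxDispatchUs = 0;
    std::uint64_t coalesced = 0;
    std::uint64_t dropped = 0;
    std::uint64_t handlerFailures = 0;
};

class EventMetricsTracker
{
public:
    using Clock = std::chrono::steady_clock;

    // Records the creation time of an event entering the FIFO.
    void OnEnqueued(Clock::time_point createdAt) { mCreated.push_back(createdAt); }

    // Removes the oldest pending event and tracks the slowest dispatch seen.
    void OnDispatched(std::chrono::microseconds duration)
    {
        if (!mCreated.empty())
            mCreated.pop_front();
        mMaxDispatchUs = std::max<std::int64_t>(mMaxDispatchUs, duration.count());
    }

    void OnCoalesced() { ++mCoalesced; }
    void OnDropped() { ++mDropped; }
    void OnHandlerFailure() { ++mHandlerFailures; }

    // "now" is supplied by the caller so tests stay deterministic.
    EventQueueMetrics Snapshot(Clock::time_point now) const
    {
        EventQueueMetrics m;
        m.depth = mCreated.size();
        if (!mCreated.empty())
            m.oldestAgeMs = std::chrono::duration_cast<std::chrono::milliseconds>(
                now - mCreated.front()).count();
        m.maxDispatchUs = mMaxDispatchUs;
        m.coalesced = mCoalesced;
        m.dropped = mDropped;
        m.handlerFailures = mHandlerFailures;
        return m;
    }

private:
    std::deque<Clock::time_point> mCreated;
    std::int64_t mMaxDispatchUs = 0;
    std::uint64_t mCoalesced = 0;
    std::uint64_t mDropped = 0;
    std::uint64_t mHandlerFailures = 0;
};
```
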

## Dependency Rules

Allowed:

- producers publish events to the bus
- the composition root registers handlers
- handlers call owner subsystem APIs
- `HealthTelemetry` observes event metrics and failures

Avoid:

- subsystems subscribing directly to each other in constructors
- event handlers mutating state outside their owner subsystem
- using one global event payload with many nullable fields
- making render hot paths block on the event bus
- requiring health/telemetry event delivery for core behavior

The dispatcher is coordination infrastructure, not a new domain owner.

## Testing Strategy

Phase 2 should add tests that do not require GL, DeckLink, or network sockets.

Recommended tests:

- FIFO events dispatch in sequence order
- coalesced events keep the latest payload and count collapsed updates
- rejected mutations publish rejection events without downstream snapshot/build events
- accepted parameter mutations publish the expected follow-up event set
- file reload bursts collapse into one reload request
- handler failures are reported as health/log events
- queue depth and oldest-event-age metrics update predictably

The existing runtime subsystem tests are a good home for the first pure event model tests, or a new `RuntimeEventTests.cpp` target can be added if the event layer grows enough.

## Phase 2 Exit Criteria

Phase 2 can be considered complete once the project can say:

- there is a typed internal event envelope and dispatcher
- `ControlServices` emits typed events for OSC commits, broadcast requests, and reload/file-change work
- `RuntimeCoordinator` publishes explicit accepted/rejected/follow-up events instead of callers interpreting broad result objects everywhere
- `RuntimeUpdateController` handles events rather than polling all runtime work sources directly
- shader build request/readiness/failure/application is represented as events
- runtime-state broadcasts are event-driven and coalesced
- event queues expose depth, age, coalescing, and failure metrics
- coarse sleep polling is no longer the default coordination model for service work

## Open Questions For Implementation

- Should the first dispatcher be single-threaded and pumped by the app loop, or should `ControlServices` own a dedicated service event thread?
- Should high-rate OSC transient overlay events go through the same bus, or should only commit/settle events enter the bus initially?
- Should event payloads use `std::variant`, type-erased handlers, or separate strongly typed queues per family?
- How much of `RuntimeCoordinatorResult` should survive as an internal helper versus being replaced by explicit events?
- Should persistence requests be represented in Phase 2 even though the background writer lands later?
- Should backend callback events be introduced now as observation-only events, or wait until the backend state-machine phase?

## Short Version

Phase 2 should give the app a typed nervous system.

- external inputs become typed events
- owner subsystems still make decisions
- decisions publish explicit outcomes
- follow-up work is routed by handlers, not inferred from scattered call paths
- high-rate work is bounded or coalesced
- timing and queue pressure become observable

If this boundary holds, later render-thread, persistence, backend, and telemetry work can move independently without returning to shared-object polling as the default coordination model.