System architecture showing live audio ingestion, Gemini multimodal reasoning, and autonomous intervention layer.

Cognitive load spike visualized in real time with burnout risk and agent-triggered intervention state.
Inspiration
Clinicians operate under intense cognitive and emotional pressure, yet are trained to ignore their own fatigue. Burnout and cognitive overload are silent contributors to medical error, but existing tools only measure stress after the fact through surveys or retrospective analytics. We wanted to build a system that could sense overload as it happens and intervene before performance degrades.
What it does
Resonance Live is a real-time cognitive load and burnout detection system for clinicians. It listens to live audio during consultations and estimates cognitive load based on prosodic features like speech rate, vocal energy, and temporal stress patterns. When sustained overload is detected, the system transitions into a low-stimulation UI mode and recommends recovery buffers — acting as a safety layer for clinician wellbeing.
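To give a feel for the kind of prosodic signals involved, here is a minimal NumPy sketch that computes two simple proxies, RMS energy and zero-crossing rate, from a mono audio frame. The function name and the choice of features are illustrative only, not the exact features the system uses.

```python
import numpy as np

def prosodic_proxies(frame: np.ndarray) -> dict:
    """Compute simple prosodic proxies from a mono float32 audio frame in [-1, 1]."""
    # Vocal energy: root-mean-square amplitude of the frame.
    rms_energy = float(np.sqrt(np.mean(frame ** 2)))

    # Zero-crossing rate as a crude articulation/tempo proxy; a fuller system
    # would track syllable rate and pause structure over time.
    signs = np.signbit(frame).astype(np.int8)
    zero_crossing_rate = float(np.abs(np.diff(signs)).sum()) / len(frame)

    return {"rms_energy": rms_energy, "zero_crossing_rate": zero_crossing_rate}
```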
How we built it
The system is designed as a three-layer architecture:
- Observation Layer: Streams live microphone audio and performs voice activity detection (a sketch of this layer follows the list).
- Reasoning Layer: Uses Gemini’s multimodal reasoning to infer cognitive load directly from audio features, bypassing traditional speech-to-text pipelines.
- Action Layer: An autonomous agent that maintains short-term memory of clinician state and triggers interventions only when sustained overload is detected.
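A minimal sketch of how the observation layer could wire sounddevice into Silero VAD is below; the chunk size, sample rate, and generator name are illustrative, and error handling is omitted.

```python
import sounddevice as sd
import torch

# Silero VAD from torch.hub; the streaming model scores 512-sample chunks at 16 kHz.
vad_model, _ = torch.hub.load("snakers4/silero-vad", "silero_vad")

SAMPLE_RATE = 16000
CHUNK = 512

def speech_chunks():
    """Yield (audio_chunk, speech_probability) pairs from the default microphone."""
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1,
                        dtype="float32", blocksize=CHUNK) as stream:
        while True:
            frame, _ = stream.read(CHUNK)                 # frame shape: (CHUNK, 1)
            chunk = torch.from_numpy(frame[:, 0].copy())  # mono tensor for the VAD
            speech_prob = vad_model(chunk, SAMPLE_RATE).item()
            yield frame[:, 0], speech_prob
```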
A FastAPI WebSocket server streams cognitive load updates to a live dashboard, visualizing trends, burnout risk, and session replay in real time.
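A stripped-down version of that WebSocket endpoint could look like the following; the route path, update rate, and payload fields are placeholders rather than the project's actual schema.

```python
import asyncio
import random

from fastapi import FastAPI, WebSocket

app = FastAPI()

def current_load_estimate() -> float:
    """Stand-in for the reasoning layer's latest cognitive-load score in [0, 1]."""
    return random.random()

@app.websocket("/ws/load")
async def cognitive_load_feed(websocket: WebSocket):
    """Push cognitive-load updates to the dashboard roughly once per second."""
    await websocket.accept()
    while True:
        await websocket.send_json({"cognitive_load": current_load_estimate()})
        await asyncio.sleep(1.0)
```

Served with uvicorn like any FastAPI app, the dashboard subscribes to this feed to drive its live charts.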
Challenges we ran into
Real-time multimodal processing is hard to get right under tight latency constraints. Designing a system that reacts quickly without becoming noisy or intrusive required careful tuning of thresholds and temporal smoothing. We also had to balance building a compelling demo against being honest about the limitations of early-stage prosodic inference.
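To make the smoothing and threshold idea concrete, here is one possible shape of the trigger logic; the alpha, threshold, and sustain values are illustrative defaults, not the tuned numbers from the project.

```python
class OverloadDetector:
    """Exponential smoothing plus a dwell-time requirement before intervening."""

    def __init__(self, alpha=0.2, threshold=0.7, sustain_updates=30):
        self.alpha = alpha                      # smoothing factor for the load signal
        self.threshold = threshold              # smoothed load above this counts as overload
        self.sustain_updates = sustain_updates  # consecutive updates required to trigger
        self.smoothed = 0.0
        self.above = 0

    def update(self, raw_load: float) -> bool:
        """Feed one raw load estimate; return True only on sustained overload."""
        self.smoothed = self.alpha * raw_load + (1 - self.alpha) * self.smoothed
        self.above = self.above + 1 if self.smoothed > self.threshold else 0
        return self.above >= self.sustain_updates
```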
What we learned
We learned how to design agentic AI systems that go beyond dashboards and take meaningful action. We also gained hands-on experience with real-time audio pipelines, low-latency multimodal inference, and building user interfaces that adapt to human cognitive state rather than demanding more attention.
Built With
- chart.js
- fastapi
- gemini-api (multimodal reasoning)
- html/css
- numpy
- python
- silero-vad
- sounddevice
- websockets