Inspiration

Clinicians operate under intense cognitive and emotional pressure, yet are trained to ignore their own fatigue. Burnout and cognitive overload are silent contributors to medical error, but existing tools only measure stress after the fact through surveys or retrospective analytics. We wanted to build a system that could sense overload as it happens and intervene before performance degrades.

What it does

Resonance Live is a real-time cognitive load and burnout detection system for clinicians. It listens to live audio during consultations and estimates cognitive load based on prosodic features like speech rate, vocal energy, and temporal stress patterns. When sustained overload is detected, the system transitions into a low-stimulation UI mode and recommends recovery buffers — acting as a safety layer for clinician wellbeing.
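
To make "prosodic features" concrete, here is a minimal sketch of the kind of per-frame descriptors involved. The function name, frame sizes, and voicing threshold are illustrative assumptions, not our production pipeline:

```python
import numpy as np

def prosodic_features(frame: np.ndarray, sample_rate: int = 16000) -> dict:
    """Rough prosodic descriptors for one mono float32 audio frame."""
    # Vocal energy: root-mean-square amplitude of the frame.
    energy = float(np.sqrt(np.mean(frame ** 2)))

    # Crude speech-rate / pausing proxy: fraction of 20 ms sub-windows whose
    # energy exceeds a fixed (assumed) voicing threshold.
    win = int(0.02 * sample_rate)
    n = len(frame) // win
    if n == 0:
        return {"energy": energy, "voiced_ratio": 0.0}
    sub = frame[: n * win].reshape(n, win)
    sub_energy = np.sqrt(np.mean(sub ** 2, axis=1))
    voiced_ratio = float(np.mean(sub_energy > 0.01))

    return {"energy": energy, "voiced_ratio": voiced_ratio}
```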

How we built it

The system is designed as a three-layer architecture:

  • Observation Layer: Streams live microphone audio and performs voice activity detection.
  • Reasoning Layer: Uses Gemini’s multimodal reasoning to infer cognitive load directly from audio features, bypassing traditional speech-to-text pipelines.
  • Action Layer: An autonomous agent that maintains short-term memory of clinician state and triggers interventions only when sustained overload is detected (sketched below).
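
The Action Layer's "sustained overload" rule boils down to a few lines. Here is a simplified sketch; the window length and threshold are placeholder values, and the real agent tracks richer state:

```python
from collections import deque

class ActionLayer:
    """Short-term memory of load estimates; intervene only on sustained overload."""

    def __init__(self, window: int = 12, threshold: float = 0.7):
        self.memory = deque(maxlen=window)  # last N cognitive-load scores in [0, 1]
        self.threshold = threshold

    def update(self, load_score: float) -> bool:
        """Record a new estimate and return True if an intervention should fire."""
        self.memory.append(load_score)
        if len(self.memory) < self.memory.maxlen:
            return False  # not enough evidence yet
        # Fire only when the rolling average stays above the threshold,
        # so a single spiky estimate never flips the UI.
        return sum(self.memory) / len(self.memory) >= self.threshold
```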

A FastAPI WebSocket server streams cognitive load updates to a live dashboard that visualizes trends and burnout risk in real time and supports session replay.
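
A stripped-down version of that endpoint might look like the following. The route path, payload shape, and the placeholder update source are assumptions for illustration, not the actual server:

```python
import asyncio
import random
import time

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

async def next_load_update() -> dict:
    """Placeholder source; the real server pulls estimates from the Reasoning Layer."""
    await asyncio.sleep(1.0)
    return {"t": time.time(), "load": round(random.random(), 2)}

@app.websocket("/ws/cognitive-load")
async def stream_cognitive_load(ws: WebSocket):
    await ws.accept()
    try:
        while True:
            # Push each new cognitive-load estimate to the dashboard as JSON.
            await ws.send_json(await next_load_update())
    except WebSocketDisconnect:
        pass  # dashboard closed the connection
```

Run it with, for example, `uvicorn server:app` (module name assumed) and point the dashboard's WebSocket client at `/ws/cognitive-load`.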

Challenges we ran into

Real-time multimodal processing is hard to get right under tight latency constraints. Designing a system that reacts quickly without becoming noisy or intrusive required careful tuning of thresholds and temporal smoothing. We also had to balance building a compelling demo with being honest about the limitations of early-stage prosodic inference.
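
As a rough illustration of what we mean by temporal smoothing, the sketch below combines an exponential moving average with enter/release hysteresis so brief spikes never toggle the low-stimulation mode. The smoothing factor and thresholds are placeholders, not our tuned values:

```python
class SmoothedTrigger:
    """EMA-smoothed load with enter/release hysteresis for the low-stimulation mode."""

    def __init__(self, alpha: float = 0.2, enter: float = 0.75, release: float = 0.55):
        self.alpha = alpha      # smoothing factor: higher reacts faster but is noisier
        self.enter = enter      # smoothed load required to enter overload mode
        self.release = release  # smoothed load required to leave it again
        self.ema = 0.0
        self.active = False

    def step(self, raw_load: float) -> bool:
        """Feed one raw load estimate in [0, 1]; return whether overload mode is on."""
        self.ema = self.alpha * raw_load + (1 - self.alpha) * self.ema
        if not self.active and self.ema >= self.enter:
            self.active = True
        elif self.active and self.ema <= self.release:
            self.active = False
        return self.active
```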

What we learned

We learned how to design agentic AI systems that go beyond dashboards and take meaningful action. We also gained hands-on experience with real-time audio pipelines, low-latency multimodal inference, and building user interfaces that adapt to human cognitive state rather than demanding more attention.

Built With

Gemini, FastAPI, WebSockets, Python