Inspiration (October 2024)

Alzheimer’s gradually erodes recognition and conversational continuity, creating emotional distance even when family members are deeply present. We wanted to preserve interaction, not just information, so we designed a system that recreates familiar conversational dynamics while remaining safe, interpretable, and human-in-the-loop.


What It Does

DEAR is a voice-first, multi-agent system that simulates familiar conversations between Alzheimer’s patients and their family members.

Family members upload short voice samples, which are used to generate personalized voice agents. When a call is initiated, DEAR orchestrates a live conversation where:

  • A voice agent handles natural speech interaction
  • A conversation agent manages dialogue state
  • A monitor agent tracks confusion, repetition, and escalation signals
  • A nurse agent intervenes when thresholds are crossed
  • A RAG agent retrieves only contextually relevant memory fragments

The result is a conversation that feels familiar, adaptive, and emotionally grounded — without automating diagnosis or replacing clinicians.
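The monitor-and-escalate loop above can be sketched in a few lines. This is an illustrative assumption, not DEAR's actual code: the cue phrases, the `CONFUSION_LIMIT` threshold, and the class names are all hypothetical stand-ins for the real model-based signal detection.

```python
CONFUSION_LIMIT = 3  # assumed threshold: confusion signals before escalation


class MonitorAgent:
    """Tracks confusion and repetition signals across turns (naive sketch)."""

    def __init__(self):
        self.confusion_events = 0

    def observe(self, turn: str) -> None:
        # Keyword cues stand in for real model-based confusion scoring.
        if any(cue in turn.lower() for cue in ("who are you", "i don't remember", "again?")):
            self.confusion_events += 1

    def should_escalate(self) -> bool:
        return self.confusion_events >= CONFUSION_LIMIT


class NurseAgent:
    """Intervenes when the monitor's threshold is crossed."""

    def intervene(self) -> str:
        return "nurse: redirecting conversation and notifying caregiver"


monitor, nurse = MonitorAgent(), NurseAgent()
for turn in ["who are you?", "i don't remember that", "who are you again?"]:
    monitor.observe(turn)

action = nurse.intervene() if monitor.should_escalate() else "continue"
```

The key property is that escalation is an explicit threshold check, not something buried inside a prompt.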


How We Built It (Technical)

System Architecture

At its core, DEAR functions like a doubly-linked conversational OS:

  • Each interaction node links backward to prior context and forward to possible next states
  • Memory, family identity, and conversational triggers form a knowledge graph traversed as a linked list
  • Agents pass control explicitly, not implicitly, enabling safe escalation and rollback

This mirrors OS-level scheduling: no single agent “owns” the conversation.
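A minimal sketch of those doubly-linked interaction nodes, assuming hypothetical names (`ConversationNode`, `owner`): each node records which agent produced it and keeps explicit back/forward links, so rollback is just a backward walk.

```python
from dataclasses import dataclass


@dataclass
class ConversationNode:
    utterance: str
    owner: str  # which agent produced this state (explicit, not implicit)
    prev: "ConversationNode | None" = None
    next: "ConversationNode | None" = None


def append(tail: ConversationNode, utterance: str, owner: str) -> ConversationNode:
    """Link a new state after `tail`, preserving backward links for rollback."""
    node = ConversationNode(utterance, owner, prev=tail)
    tail.next = node
    return node


head = ConversationNode("Hi Mom, it's me.", owner="voice")
tail = append(head, "Oh, hello dear!", owner="conversation")
tail = append(tail, "Do you remember our trip to the lake?", owner="voice")

# Rollback: walk backward to the prior safe state.
rollback = tail.prev
```

Because ownership travels with each node, no agent can silently take over the conversation.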

Frontend

  • Next.js + TailwindCSS
  • Voice recording UI streams audio directly to backend endpoints
  • Successful uploads trigger agent-availability notifications and guide users into live calls

Backend

  • Flask API layer
  • Handles:

    • Audio ingestion (/handleFamilyAgentCreation)
    • Voice embedding generation
    • Agent provisioning
    • Call lifecycle management (/initiateConversation, /endConversation)
  • Verified via 200-status responses and logged embeddings (see proof)
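A hedged sketch of the Flask layer around the endpoints named above. The handler bodies are hypothetical stubs; in the real system they would persist audio, call Cartesia for a voice embedding, and drive VAPI call lifecycle.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/handleFamilyAgentCreation", methods=["POST"])
def handle_family_agent_creation():
    audio = request.files.get("audio")  # raw voice sample from the upload UI
    # Real system: store audio, generate a Cartesia embedding,
    # then provision a VAPI agent bound to that voice.
    return jsonify({"status": "agent_created"}), 200


@app.route("/initiateConversation", methods=["POST"])
def initiate_conversation():
    # Real system: start an outbound VAPI call using the cloned voice.
    return jsonify({"status": "call_started"}), 200


@app.route("/endConversation", methods=["POST"])
def end_conversation():
    return jsonify({"status": "call_ended"}), 200
```

The 200-status verification mentioned above corresponds to exercising these routes end to end.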

Voice & Agents

  • Cartesia: voice cloning + embeddings
  • VAPI: outbound calls + agent orchestration
  • Speech-to-Text → LLM → Text-to-Speech pipeline for reliability under hackathon constraints
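The STT → LLM → TTS relay can be expressed as three composed stages. Each function below is a stub standing in for the real service call (STT provider, LLM, Cartesia TTS); only the composition pattern is the point.

```python
def speech_to_text(audio_bytes: bytes) -> str:
    # Stand-in for a real STT call; pretend the "audio" is already text.
    return audio_bytes.decode("utf-8")


def llm_reply(transcript: str, persona: str) -> str:
    # Stand-in for an LLM call conditioned on the family member's persona.
    return f"[{persona}] I heard you say: {transcript}"


def text_to_speech(text: str) -> bytes:
    # Stand-in for Cartesia TTS using the cloned voice embedding.
    return text.encode("utf-8")


def relay_turn(audio_in: bytes, persona: str) -> bytes:
    """One conversational turn: transcribe, reason, then re-synthesize."""
    return text_to_speech(llm_reply(speech_to_text(audio_in), persona))


out = relay_turn(b"hello", persona="daughter")
```

Keeping each stage a plain function made failures easy to localize under hackathon time pressure.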

Data Model (Proof-Linked)

The database schema forms the backbone of the linked system:

  • Core table: patientsfamilymemory
  • Each memory node stores:

    • embeddings
    • triggers
    • timestamps
    • confusion flags

This enables selective retrieval rather than global recall, which is crucial for safety and emotional coherence.
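Selective retrieval over those memory nodes might look like the sketch below. Field names mirror the schema bullets above; the cosine scoring, the top-k cutoff, and the rule of skipping confusion-flagged nodes are illustrative assumptions.

```python
import math
import time
from dataclasses import dataclass


@dataclass
class MemoryNode:
    embedding: list[float]
    triggers: list[str]
    timestamp: float
    confusion_flag: bool = False


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def retrieve(nodes: list[MemoryNode], query: list[float], k: int = 2) -> list[MemoryNode]:
    """Selective retrieval: skip confusion-flagged memories, return top-k by similarity."""
    safe = [n for n in nodes if not n.confusion_flag]
    return sorted(safe, key=lambda n: cosine(n.embedding, query), reverse=True)[:k]


nodes = [
    MemoryNode([1.0, 0.0], ["lake trip"], time.time()),
    MemoryNode([0.0, 1.0], ["hospital"], time.time(), confusion_flag=True),
    MemoryNode([0.9, 0.1], ["birthday"], time.time()),
]
hits = retrieve(nodes, query=[1.0, 0.0], k=2)
```

Filtering before ranking is what keeps a distressing memory from ever entering the candidate set.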


Key Technical Insight

We intentionally modeled conversational memory as a linked structure, not a flat vector store.

This allows:

  • Controlled traversal
  • Bounded context windows
  • Explicit escalation paths
  • Interpretability of agent decisions

In other words: conversation as state, not vibes.
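The bounded-context-window idea reduces to a capped backward walk over the linked turns. A minimal sketch, assuming a hypothetical `Turn` type: instead of feeding the whole history to the LLM, collect at most `window` prior turns.

```python
class Turn:
    def __init__(self, text: str, prev: "Turn | None" = None):
        self.text, self.prev = text, prev


def bounded_context(node: "Turn | None", window: int = 2) -> list[str]:
    """Collect up to `window` turns ending at `node`, oldest first."""
    out: list[str] = []
    while node is not None and len(out) < window:
        out.append(node.text)
        node = node.prev
    return list(reversed(out))


t1 = Turn("Hi Mom.")
t2 = Turn("Hello dear!", prev=t1)
t3 = Turn("How was lunch?", prev=t2)

ctx = bounded_context(t3, window=2)
```

The bound makes both cost and behavior predictable: the model can only see what the traversal hands it.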


Challenges

  • Managing multi-agent coordination without race conditions
  • Avoiding over-automation in a sensitive clinical context
  • Balancing system ambition with a hackathon timeline

We solved these by reducing the system to explicit links and handoffs — fewer hidden abstractions, more guarantees.


Accomplishments

  • End-to-end working prototype: UI → backend → voice → live call
  • Verified agent creation and embedding pipelines
  • Designed a scalable, safety-aware conversational architecture
  • Demonstrated real multi-agent orchestration under time pressure

What We Learned

  • Multi-agent systems behave more like operating systems than chatbots
  • Linked structures outperform monolithic context windows in safety-critical domains
  • Voice UX forces architectural honesty — latency and state errors are immediately visible

What’s Next

  • Transition to real-time Speech-to-Speech
  • Expand memory graph traversal logic
  • Longitudinal emotional state tracking (with human oversight)
  • Clinician-configurable thresholds and escalation policies

Tech Stack

  • Next.js, TailwindCSS — frontend
  • Flask — backend orchestration layer
  • Cartesia — voice cloning & embeddings
  • VAPI — agent creation & outbound calling
  • LLMs + RAG — conversational reasoning & memory retrieval

Built With

  • agent
  • api
  • backend
  • cloning
  • cartesia
  • engineering
  • flask
  • frontend
  • next.js
  • prompt
  • tailwindcss
  • vapi
  • voice