Inspiration (October 2024)

Alzheimer’s gradually erodes recognition and conversational continuity, creating emotional distance even when family members are deeply present. We wanted to preserve interaction, not just information, so we designed a system that recreates familiar conversational dynamics while remaining safe, interpretable, and human-in-the-loop.


What It Does

DEAR is a voice-first, multi-agent system that simulates familiar conversations between Alzheimer’s patients and their family members.

Family members upload short voice samples, which are used to generate personalized voice agents. When a call is initiated, DEAR orchestrates a live conversation where:

  • A voice agent handles natural speech interaction
  • A conversation agent manages dialogue state
  • A monitor agent tracks confusion, repetition, and escalation signals
  • A nurse agent intervenes when thresholds are crossed
  • A RAG agent retrieves only contextually relevant memory fragments

The result is a conversation that feels familiar, adaptive, and emotionally grounded — without automating diagnosis or replacing clinicians.
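The monitor-and-escalate loop above can be sketched in a few lines. This is an illustrative assumption, not DEAR's actual code: the cue phrases, the `CONFUSION_LIMIT` threshold, and the class names are all hypothetical stand-ins for the real model-based signal detection.

```python
CONFUSION_LIMIT = 3  # assumed threshold: confusion signals before escalation


class MonitorAgent:
    """Tracks confusion and repetition signals across turns (naive sketch)."""

    def __init__(self):
        self.confusion_events = 0

    def observe(self, turn: str) -> None:
        # Keyword cues stand in for real model-based confusion scoring.
        if any(cue in turn.lower() for cue in ("who are you", "i don't remember", "again?")):
            self.confusion_events += 1

    def should_escalate(self) -> bool:
        return self.confusion_events >= CONFUSION_LIMIT


class NurseAgent:
    """Intervenes when the monitor's threshold is crossed."""

    def intervene(self) -> str:
        return "nurse: redirecting conversation and notifying caregiver"


monitor, nurse = MonitorAgent(), NurseAgent()
for turn in ["who are you?", "i don't remember that", "who are you again?"]:
    monitor.observe(turn)

action = nurse.intervene() if monitor.should_escalate() else "continue"
```

The key property is that escalation is an explicit threshold check, not something buried inside a prompt.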


How We Built It (Technical)

System Architecture

At its core, DEAR functions like a doubly-linked conversational OS:

  • Each interaction node links backward to prior context and forward to possible next states
  • Memory, family identity, and conversational triggers form a knowledge graph traversed as a linked list
  • Agents pass control explicitly, not implicitly, enabling safe escalation and rollback

This mirrors OS-level scheduling: no single agent “owns” the conversation.
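A minimal sketch of those doubly-linked interaction nodes, assuming hypothetical names (`ConversationNode`, `owner`): each node records which agent produced it and keeps explicit back/forward links, so rollback is just a backward walk.

```python
from dataclasses import dataclass


@dataclass
class ConversationNode:
    utterance: str
    owner: str  # which agent produced this state (explicit, not implicit)
    prev: "ConversationNode | None" = None
    next: "ConversationNode | None" = None


def append(tail: ConversationNode, utterance: str, owner: str) -> ConversationNode:
    """Link a new state after `tail`, preserving backward links for rollback."""
    node = ConversationNode(utterance, owner, prev=tail)
    tail.next = node
    return node


head = ConversationNode("Hi Mom, it's me.", owner="voice")
tail = append(head, "Oh, hello dear!", owner="conversation")
tail = append(tail, "Do you remember our trip to the lake?", owner="voice")

# Rollback: walk backward to the prior safe state.
rollback = tail.prev
```

Because ownership travels with each node, no agent can silently take over the conversation.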

Frontend

  • Next.js + TailwindCSS
  • Voice recording UI streams audio directly to backend endpoints
  • Successful uploads trigger agent-availability notifications and guide users into live calls

Backend

  • Flask API layer
  • Handles:

    • Audio ingestion (/handleFamilyAgentCreation)
    • Voice embedding generation
    • Agent provisioning
    • Call lifecycle management (/initiateConversation, /endConversation)
  • Verified via 200-status responses and logged embeddings (see proof)
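A hedged sketch of the Flask layer around the endpoints named above. The handler bodies are hypothetical stubs; in the real system they would persist audio, call Cartesia for a voice embedding, and drive VAPI call lifecycle.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/handleFamilyAgentCreation", methods=["POST"])
def handle_family_agent_creation():
    audio = request.files.get("audio")  # raw voice sample from the upload UI
    # Real system: store audio, generate a Cartesia embedding,
    # then provision a VAPI agent bound to that voice.
    return jsonify({"status": "agent_created"}), 200


@app.route("/initiateConversation", methods=["POST"])
def initiate_conversation():
    # Real system: start an outbound VAPI call using the cloned voice.
    return jsonify({"status": "call_started"}), 200


@app.route("/endConversation", methods=["POST"])
def end_conversation():
    return jsonify({"status": "call_ended"}), 200
```

The 200-status verification mentioned above corresponds to exercising these routes end to end.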

Voice & Agents

  • Cartesia: voice cloning + embeddings
  • VAPI: outbound calls + agent orchestration
  • Speech-to-Text → LLM → Text-to-Speech pipeline for reliability under hackathon constraints
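The STT → LLM → TTS relay can be expressed as three composed stages. Each function below is a stub standing in for the real service call (STT provider, LLM, Cartesia TTS); only the composition pattern is the point.

```python
def speech_to_text(audio_bytes: bytes) -> str:
    # Stand-in for a real STT call; pretend the "audio" is already text.
    return audio_bytes.decode("utf-8")


def llm_reply(transcript: str, persona: str) -> str:
    # Stand-in for an LLM call conditioned on the family member's persona.
    return f"[{persona}] I heard you say: {transcript}"


def text_to_speech(text: str) -> bytes:
    # Stand-in for Cartesia TTS using the cloned voice embedding.
    return text.encode("utf-8")


def relay_turn(audio_in: bytes, persona: str) -> bytes:
    """One conversational turn: transcribe, reason, then re-synthesize."""
    return text_to_speech(llm_reply(speech_to_text(audio_in), persona))


out = relay_turn(b"hello", persona="daughter")
```

Keeping each stage a plain function made failures easy to localize under hackathon time pressure.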

Data Model (Proof-Linked)

The database schema forms the backbone of the linked system:

  • Core table: patientsfamilymemory
  • Each memory node stores:

    • embeddings
    • triggers
    • timestamps
    • confusion flags

This enables selective retrieval rather than global recall, which is crucial for safety and emotional coherence.
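Selective retrieval over those memory nodes might look like the sketch below. Field names mirror the schema bullets above; the cosine scoring, the top-k cutoff, and the rule of skipping confusion-flagged nodes are illustrative assumptions.

```python
import math
import time
from dataclasses import dataclass


@dataclass
class MemoryNode:
    embedding: list[float]
    triggers: list[str]
    timestamp: float
    confusion_flag: bool = False


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def retrieve(nodes: list[MemoryNode], query: list[float], k: int = 2) -> list[MemoryNode]:
    """Selective retrieval: skip confusion-flagged memories, return top-k by similarity."""
    safe = [n for n in nodes if not n.confusion_flag]
    return sorted(safe, key=lambda n: cosine(n.embedding, query), reverse=True)[:k]


nodes = [
    MemoryNode([1.0, 0.0], ["lake trip"], time.time()),
    MemoryNode([0.0, 1.0], ["hospital"], time.time(), confusion_flag=True),
    MemoryNode([0.9, 0.1], ["birthday"], time.time()),
]
hits = retrieve(nodes, query=[1.0, 0.0], k=2)
```

Filtering before ranking is what keeps a distressing memory from ever entering the candidate set.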


Key Technical Insight

We intentionally modeled conversational memory as a linked structure, not a flat vector store.

This allows:

  • Controlled traversal
  • Bounded context windows
  • Explicit escalation paths
  • Interpretability of agent decisions

In other words: conversation as state, not vibes.
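The bounded-context-window idea reduces to a capped backward walk over the linked turns. A minimal sketch, assuming a hypothetical `Turn` type: instead of feeding the whole history to the LLM, collect at most `window` prior turns.

```python
class Turn:
    def __init__(self, text: str, prev: "Turn | None" = None):
        self.text, self.prev = text, prev


def bounded_context(node: "Turn | None", window: int = 2) -> list[str]:
    """Collect up to `window` turns ending at `node`, oldest first."""
    out: list[str] = []
    while node is not None and len(out) < window:
        out.append(node.text)
        node = node.prev
    return list(reversed(out))


t1 = Turn("Hi Mom.")
t2 = Turn("Hello dear!", prev=t1)
t3 = Turn("How was lunch?", prev=t2)

ctx = bounded_context(t3, window=2)
```

The bound makes both cost and behavior predictable: the model can only see what the traversal hands it.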


Challenges

  • Managing multi-agent coordination without race conditions
  • Avoiding over-automation in a sensitive clinical context
  • Balancing system ambition with a hackathon timeline

We solved these by reducing the system to explicit links and handoffs — fewer hidden abstractions, more guarantees.


Accomplishments

  • End-to-end working prototype: UI → backend → voice → live call
  • Verified agent creation and embedding pipelines
  • Designed a scalable, safety-aware conversational architecture
  • Demonstrated real multi-agent orchestration under time pressure

What We Learned

  • Multi-agent systems behave more like operating systems than chatbots
  • Linked structures outperform monolithic context windows in safety-critical domains
  • Voice UX forces architectural honesty — latency and state errors are immediately visible

What’s Next

  • Transition to real-time Speech-to-Speech
  • Expand memory graph traversal logic
  • Longitudinal emotional state tracking (with human oversight)
  • Clinician-configurable thresholds and escalation policies

Tech Stack

  • Next.js, TailwindCSS — frontend
  • Flask — backend orchestration layer
  • Cartesia — voice cloning & embeddings
  • VAPI — agent creation & outbound calling
  • LLMs + RAG — conversational reasoning & memory retrieval

Built With

  • agent
  • api
  • backend
  • cloning
  • cartesia
  • engineering
  • flask
  • frontend
  • next.js
  • prompt
  • tailwindcss
  • vapi
  • voice