🎯 Buster - AI Voice Agent Platform

Built for the Langflow Hacking Agents Hackathon

🧠 Inspiration

We wanted to build an AI voice agent that doesn't just chat but actually gets things done for you. Booking restaurants, handling customer service, or making business calls shouldn't require your constant attention. We asked: what if your AI could call people itself and handle complex conversations autonomously, with memory and context?

🤖 What it does

Buster is an intelligent voice agent platform that makes autonomous phone calls using ElevenLabs Conversational AI and Twilio. It features:

Smart Voice Calls: Makes real phone calls with natural conversation flow using ElevenLabs
Memory & Context: Uses Mem0 for persistent memory across conversations, so it remembers previous calls and personalizes interactions
DTMF Navigation: Automatically detects and navigates answering machines, voicemail systems, and automated phone menus
Multi-Modal Communication: Integrates WhatsApp, email, and calendar for comprehensive communication
Real-Time Status Tracking: Live status updates and call transcripts with MongoDB persistence
Brain-Powered Context: Injects conversation history and user preferences directly into the AI agent's brain

🛠️ How we built it

Backend Architecture:

ElevenLabs Conversational AI: Real-time voice synthesis and natural language processing
Twilio Programmable Voice: WebSocket-based audio streaming and phone call management
Fastify Server: High-performance Node.js backend with WebSocket support
Mem0 Integration: Vector-based memory system for persistent conversation context
MongoDB: Call status tracking and data persistence
DTMF Handler: Automated navigation of phone systems and voicemail
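
To make the Twilio-to-ElevenLabs audio path more concrete, here is a minimal sketch of the message translation a bridge like this needs. The Twilio side follows the Media Streams event envelope (`start`, `media`, `stop`). The `BridgeState` type, the function names, and the `user_audio_chunk` field on the agent side are assumptions for illustration and should be checked against the current ElevenLabs WebSocket documentation; this is not the project's actual handler.

```typescript
// Subset of Twilio Media Streams messages this sketch handles.
type TwilioEvent =
  | { event: "start"; start: { streamSid: string; callSid: string } }
  | { event: "media"; media: { payload: string } } // payload is base64 audio
  | { event: "stop" };

// Hypothetical bridge state; a real handler would likely track more.
interface BridgeState {
  streamSid: string | null;
  closed: boolean;
}

// Outbound message for the agent socket. The field name is an assumption.
type AgentMessage = { user_audio_chunk: string };

// Pure translation step: takes one raw Twilio message and returns the new
// state plus an optional message to forward to the voice agent.
function handleTwilioEvent(
  state: BridgeState,
  raw: string
): { state: BridgeState; toAgent: AgentMessage | null } {
  const msg = JSON.parse(raw) as TwilioEvent;
  switch (msg.event) {
    case "start":
      // Remember the stream id; it is needed to send audio back to the caller.
      return { state: { ...state, streamSid: msg.start.streamSid }, toAgent: null };
    case "media":
      if (state.closed) return { state, toAgent: null };
      return { state, toAgent: { user_audio_chunk: msg.media.payload } };
    case "stop":
      return { state: { ...state, closed: true }, toAgent: null };
    default:
      // Other events (for example "connected" or "mark") are ignored here.
      return { state, toAgent: null };
  }
}

// Wraps agent audio (base64) in the envelope Twilio expects on the stream.
function toTwilioMedia(streamSid: string, audioBase64: string): string {
  return JSON.stringify({ event: "media", streamSid, media: { payload: audioBase64 } });
}
```

Keeping the translation pure (no sockets inside it) makes it easy to unit test; the Fastify WebSocket handlers would only call these functions and write the results to the other connection.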

Frontend:

Next.js 15: Modern React frontend with TypeScript
Tailwind CSS + Radix UI: Beautiful, accessible interface components
Zustand: Efficient state management for call status and transcripts
Real-time Updates: Live call monitoring and status polling
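
As a rough illustration of the status polling described above, the sketch below merges a polled snapshot into the view state that a store such as Zustand could hold. The status values, the `StatusSnapshot` payload, and the function names are hypothetical; the project's real endpoint and store shape may differ.

```typescript
type CallStatus = "queued" | "ringing" | "in-progress" | "completed" | "failed";

interface TranscriptLine {
  index: number; // position of the line within the call
  speaker: "agent" | "callee";
  text: string;
}

interface CallView {
  status: CallStatus;
  transcript: TranscriptLine[];
}

// Hypothetical payload returned by a status endpoint on each poll.
interface StatusSnapshot {
  status: CallStatus;
  transcript: TranscriptLine[];
}

// Merges a snapshot into the current view without duplicating transcript
// lines that an earlier poll already delivered.
function applySnapshot(view: CallView, snap: StatusSnapshot): CallView {
  const seen = new Set(view.transcript.map((line) => line.index));
  const fresh = snap.transcript.filter((line) => !seen.has(line.index));
  return {
    status: snap.status,
    transcript: [...view.transcript, ...fresh].sort((a, b) => a.index - b.index),
  };
}

// Polling can stop once the call has reached a terminal status.
function shouldKeepPolling(status: CallStatus): boolean {
  return status !== "completed" && status !== "failed";
}
```

A store action would call `applySnapshot` with each poll result and clear its timer once `shouldKeepPolling` returns false.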

Memory System:

Mem0 Vector Database: Stores conversation history and user preferences
Context Injection: Personalizes each call with relevant past interactions
Brain Enhancement: Generates customized greetings and responses based on memory
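
The context injection step can be sketched as a small prompt builder. This assumes memories have already been retrieved (for example from a Mem0 search) and reduced to text plus a relevance score; the `MemoryItem` shape, the function names, and the limits are illustrative assumptions, not Mem0's API or the project's actual code.

```typescript
// Hypothetical shape of a retrieved memory; real records carry more fields.
interface MemoryItem {
  text: string;
  score: number; // higher means more relevant
}

// Builds a short context block from the most relevant memories, bounded by
// item count and character budget so the agent prompt stays small.
function buildContextBlock(memories: MemoryItem[], maxItems = 5, maxChars = 600): string {
  const ranked = [...memories].sort((a, b) => b.score - a.score).slice(0, maxItems);
  const lines: string[] = [];
  let used = 0;
  for (const memory of ranked) {
    const line = `- ${memory.text.trim()}`;
    if (used + line.length > maxChars) break;
    lines.push(line);
    used += line.length + 1; // +1 for the newline separator
  }
  if (lines.length === 0) return "";
  return ["Known context from previous calls:", ...lines].join("\n");
}

// Appends the context block to the base prompt, or returns the prompt
// unchanged when there is nothing to add.
function personalizePrompt(basePrompt: string, memories: MemoryItem[]): string {
  const context = buildContextBlock(memories);
  return context ? `${basePrompt}\n\n${context}` : basePrompt;
}
```

Capping the block by both count and length is a design choice to keep call setup predictable; the personalized prompt would then be passed to the agent when the conversation is initiated.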

Infrastructure:

Google Cloud Run: Scalable deployment with Docker containers
GitHub Actions: Automated CI/CD with Claude Code integration
Proxy Architecture: Seamless frontend-backend communication

⚠️ Challenges we ran into

Real-time Audio Streaming: Synchronizing WebSocket connections between Twilio and ElevenLabs while maintaining low latency
Memory Context Injection: Figuring out how to inject conversation history directly into the ElevenLabs agent's brain without breaking the conversation flow
DTMF Navigation: Building reliable detection and navigation of complex phone menu systems
Status Synchronization: Coordinating status updates across multiple services (orchestrator, status checker, frontend)
MongoDB Cursor Issues: Debugging backend database connection and query handling
Cross-Platform Integration: Managing OAuth and API integrations for Gmail, Calendar, and WhatsApp
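
For the DTMF navigation challenge listed above, one simple starting point is to parse a transcribed menu prompt and pick the digit whose label matches the call's goal. The sketch below only understands the "press N for X" and "press N to X" phrasing, and every name in it is hypothetical; real phone menus vary far more than this.

```typescript
interface MenuOption {
  digit: string; // "0"-"9", "*" or "#"
  label: string; // what the menu says the key does
}

// Maps spoken key names to the DTMF symbol.
function normalizeKey(key: string): string {
  const lower = key.toLowerCase();
  if (lower === "star") return "*";
  if (lower === "pound") return "#";
  return lower;
}

// Extracts options from phrases such as "press 1 for sales".
function parseMenu(prompt: string): MenuOption[] {
  const pattern = /press (\d|star|pound) (?:for|to) ([^,.;]+)/gi;
  const options: MenuOption[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(prompt)) !== null) {
    options.push({ digit: normalizeKey(match[1]), label: match[2].trim() });
  }
  return options;
}

// Returns the first digit whose label mentions one of the goal keywords,
// or null when nothing matches and the caller should fall back.
function chooseDigit(options: MenuOption[], keywords: string[]): string | null {
  const wanted = keywords.map((k) => k.toLowerCase());
  for (const option of options) {
    const label = option.label.toLowerCase();
    if (wanted.some((k) => label.includes(k))) return option.digit;
  }
  return null;
}
```

For example, `chooseDigit(parseMenu(transcript), ["reservation", "booking"])` would yield the key to send, and a `null` result could trigger a fallback such as waiting for the menu to repeat.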

🏆 Accomplishments that we're proud of

Successful Live Calls: Buster placed real phone calls and held natural conversations
Memory-Powered Personalization: Built a working memory system that remembers users across calls and personalizes each interaction
Complex System Integration: Connected 6+ different APIs and services into a cohesive platform
DTMF Automation: Implemented answering machine detection and automated menu navigation
Real-time Architecture: Built a robust WebSocket-based system handling live audio streams
Production-Ready Deployment: Full CI/CD pipeline with Google Cloud deployment

📚 What we learned

Audio Processing is Complex: Real-time voice communication requires careful latency management and error handling
Memory Makes All the Difference: In our experience, users trust an AI agent noticeably more when it remembers previous interactions
Phone Systems Are Inconsistent: Every voicemail system and phone menu works differently, so navigation requires adaptive DTMF strategies
WebSocket Coordination: Managing multiple concurrent WebSocket connections requires robust error handling
Vector Databases for Context: Mem0's approach to conversation memory is powerful for maintaining context across sessions
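
On the WebSocket coordination point above, one common piece of error handling is a capped exponential backoff with jitter between reconnect attempts. This is a generic sketch under assumed defaults (250 ms base, 8 s cap, 5 attempts), not a description of how Buster actually reconnects.

```typescript
// Computes the wait before reconnect attempt `attempt` (0-based).
// The delay doubles per attempt up to `capMs`, then is scaled by a random
// factor in [0, 1) so that many sockets do not retry in lockstep.
function backoffDelayMs(
  attempt: number,
  random: () => number = Math.random,
  baseMs = 250,
  capMs = 8000
): number {
  const exponential = baseMs * Math.pow(2, Math.max(0, attempt));
  return Math.floor(random() * Math.min(capMs, exponential));
}

// Stops retrying after a fixed number of attempts so a dead connection
// fails the call instead of retrying forever.
function shouldRetry(attempt: number, maxAttempts = 5): boolean {
  return attempt < maxAttempts;
}
```

The `random` parameter is injectable only to make the function testable; in a live call the retry budget would need to stay small, since long gaps in audio are worse than ending the call cleanly.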

🚀 What's next for Buster

Enterprise Integration: Add Salesforce, HubSpot, and other CRM integrations for business use cases
Advanced DTMF: Machine learning-based phone system recognition and navigation
Voice Cloning: User-specific voice cloning for truly personalized calls
Proactive Calling: Scheduled callbacks, appointment reminders, and follow-up campaigns
Analytics Dashboard: Call success rates, conversation analytics, and performance metrics
Multi-Language Support: Expand beyond English with localized voice models
API Marketplace: Allow third-party developers to build custom call workflows

🔧 Technical Implementation

Core Technologies:

ElevenLabs Conversational AI - Real-time voice synthesis and conversation
Twilio Programmable Voice - Phone call infrastructure and WebSocket streaming
Mem0 - Vector-based memory and context management
Fastify + WebSocket - High-performance real-time backend
Next.js 15 - Modern React frontend with TypeScript
MongoDB - Call status and data persistence
Google Cloud Run - Scalable container deployment