🎯 Buster - AI Voice Agent Platform

Built for the Langflow Hacking Agents Hackathon

🧠 Inspiration

We wanted to build an AI voice agent that doesn't just chat; it actually does things for you. Booking restaurants, handling customer service, or making business calls shouldn't require your constant attention. We thought: what if your AI could literally call people and handle complex conversations autonomously, with memory and context?

🤖 What it does

Buster is an intelligent voice agent platform that makes autonomous phone calls using ElevenLabs Conversational AI and Twilio. It features:

  • Smart Voice Calls: Makes real phone calls with natural conversation flow using ElevenLabs
  • Memory & Context: Uses Mem0 for persistent memory across conversations - remembers previous calls and personalizes interactions
  • DTMF Navigation: Automatically detects and navigates answering machines, voicemail systems, and automated phone menus
  • Multi-Modal Communication: Integrates WhatsApp, email, and calendar alongside voice calls
  • Real-Time Status Tracking: Live status updates and call transcripts with MongoDB persistence
  • Brain-Powered Context: Injects conversation history and user preferences directly into the AI agent's brain
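
The DTMF navigation feature has to turn a spoken phone menu into a key press. Below is a minimal sketch of one way to do that. It assumes the menu prompt has already been transcribed to text. It extracts "press N for X" options and picks the one whose wording best overlaps the call's goal. `parseMenuOptions`, `chooseDigit`, and the prefix-matching heuristic are hypothetical stand-ins, not Buster's actual DTMF handler. Sending the tone itself is left out.

```typescript
// Hypothetical sketch: choose which key to press from a transcribed phone-menu prompt.
// Only the common "press <digit> for/to <option>" phrasing is handled.

interface MenuOption {
  digit: string; // key to press: 0-9, * or #
  label: string; // what the menu said that key is for
}

// Pull every "press N for X" / "press N to X" option out of the prompt text.
function parseMenuOptions(prompt: string): MenuOption[] {
  const pattern = /press\s+([0-9*#])\s+(?:for|to)\s+([^.,;]+)/g;
  const text = prompt.toLowerCase();
  const options: MenuOption[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    options.push({ digit: match[1], label: match[2].trim() });
  }
  return options;
}

// Words longer than two letters, so "a", "to", "my" don't count as matches.
function keywords(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter((w) => w.length > 2);
}

// Score each option by how many goal keywords share a prefix with a label word
// ("reservation" matches "reservations"); return the best digit, or null if nothing fits.
function chooseDigit(prompt: string, goal: string): string | null {
  const goalWords = keywords(goal);
  let best: MenuOption | null = null;
  let bestScore = 0;
  for (const option of parseMenuOptions(prompt)) {
    const labelWords = keywords(option.label);
    const score = goalWords.filter((g) =>
      labelWords.some((l) => l.indexOf(g) === 0 || g.indexOf(l) === 0)
    ).length;
    if (score > bestScore) {
      best = option;
      bestScore = score;
    }
  }
  return best ? best.digit : null;
}

const menu =
  "Thank you for calling. Press 1 for reservations, press 2 for billing questions, " +
  "press 0 to speak with an operator.";

console.log(chooseDigit(menu, "Book a dinner reservation for two")); // → 1
console.log(chooseDigit(menu, "Question about my billing")); // → 2
console.log(chooseDigit(menu, "Cancel my subscription")); // → null
```

Keyword overlap is deliberately crude. It misses phrasings like "For billing, press 2" and synonyms, which is the kind of gap the ML-based menu recognition on the roadmap would address.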

🛠️ How we built it

Backend Architecture:

  • ElevenLabs Conversational AI: Real-time voice synthesis and natural language processing
  • Twilio Programmable Voice: WebSocket-based audio streaming and phone call management
  • Fastify Server: High-performance Node.js backend with WebSocket support
  • Mem0 Integration: Vector-based memory system for persistent conversation context
  • MongoDB: Call status tracking and data persistence
  • DTMF Handler: Automated navigation of phone systems and voicemail
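
The Twilio and ElevenLabs integration is a relay between two WebSockets: Twilio's media stream carries the phone leg, and the ElevenLabs conversation socket carries the agent. The sketch below shows only the message routing. The socket plumbing (the Fastify WebSocket route, opening the ElevenLabs connection) is left out. `fromTwilio`, `fromElevenLabs`, and `Outbound` are hypothetical names. The frame fields are assumed from the two services' published message formats and should be checked against current docs. The sketch also assumes the agent is configured for 8 kHz μ-law audio, so frames pass through without transcoding.

```typescript
// Hypothetical sketch of the message routing inside a Twilio <-> ElevenLabs audio bridge.
// Field names are assumed from Twilio Media Streams and the ElevenLabs Conversational AI
// WebSocket protocol; verify them against current docs.
// Assumes the agent uses 8 kHz mu-law audio, so payloads pass through untouched.

interface Outbound {
  to: "twilio" | "elevenlabs"; // which socket the frame should be written to
  data: string; // serialized JSON frame
}

// Twilio -> ElevenLabs: forward each caller audio frame as a user audio chunk.
function fromTwilio(raw: string): Outbound | null {
  const msg = JSON.parse(raw);
  if (msg.event === "media" && msg.media && msg.media.payload) {
    return { to: "elevenlabs", data: JSON.stringify({ user_audio_chunk: msg.media.payload }) };
  }
  return null; // "connected", "start", "stop", "mark": bookkeeping, nothing to forward
}

// ElevenLabs -> Twilio (agent speech, barge-in) or straight back to ElevenLabs (keepalive).
function fromElevenLabs(raw: string, streamSid: string): Outbound | null {
  const msg = JSON.parse(raw);
  switch (msg.type) {
    case "audio":
      return {
        to: "twilio",
        data: JSON.stringify({ event: "media", streamSid, media: { payload: msg.audio_event.audio_base_64 } }),
      };
    case "interruption":
      // The caller started talking over the agent: flush audio Twilio has already buffered.
      return { to: "twilio", data: JSON.stringify({ event: "clear", streamSid }) };
    case "ping":
      return { to: "elevenlabs", data: JSON.stringify({ type: "pong", event_id: msg.ping_event.event_id }) };
    default:
      return null; // transcripts, agent responses, metadata: log or store, don't forward
  }
}

// Example: one caller audio frame and one agent audio frame.
const streamSid = "MZ0123"; // placeholder; Twilio supplies the real one in its "start" event
console.log(fromTwilio(JSON.stringify({ event: "media", streamSid, media: { payload: "AAEC" } })));
console.log(fromElevenLabs(JSON.stringify({ type: "audio", audio_event: { audio_base_64: "AwQF" } }), streamSid));
```

Handling the interruption event is part of the latency problem. Without the `clear` message, Twilio keeps playing agent audio it has already buffered after the caller starts speaking.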

Frontend:

  • Next.js 15: Modern React frontend with TypeScript
  • Tailwind CSS + Radix UI: Beautiful, accessible interface components
  • Zustand: Efficient state management for call status and transcripts
  • Real-time Updates: Live call monitoring and status polling
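
Status polling stays cheap with a backoff schedule: poll quickly while the call status is changing, slow down while it holds steady, and stop at a terminal state. The sketch below assumes statuses modeled on Twilio's call status values. `nextPoll` and its default intervals are hypothetical. The actual fetch and the Zustand store update are left out.

```typescript
// Hypothetical sketch: adaptive polling interval for the frontend's call-status view.
// Status names are Twilio's call status values; Buster's own status set may differ.
type CallStatus =
  | "queued" | "ringing" | "in-progress"
  | "completed" | "busy" | "failed" | "no-answer" | "canceled";

const TERMINAL: CallStatus[] = ["completed", "busy", "failed", "no-answer", "canceled"];

interface PollStep {
  done: boolean; // stop polling: the call has ended
  delayMs: number; // wait this long before the next status request
}

// Poll fast right after a change, double the wait while nothing changes,
// never wait longer than maxMs, and stop once the call is over.
function nextPoll(
  prev: CallStatus | null,
  current: CallStatus,
  prevDelayMs: number,
  minMs = 1000,
  maxMs = 8000,
): PollStep {
  if (TERMINAL.indexOf(current) !== -1) return { done: true, delayMs: 0 };
  if (current !== prev) return { done: false, delayMs: minMs };
  return { done: false, delayMs: Math.min(prevDelayMs * 2, maxMs) };
}

// Simulate the statuses a poller might observe over one call.
const observed: CallStatus[] = ["queued", "ringing", "ringing", "ringing", "in-progress", "in-progress", "completed"];
const delays: number[] = [];
let lastStatus: CallStatus | null = null;
let lastDelay = 1000;
for (const current of observed) {
  const step = nextPoll(lastStatus, current, lastDelay);
  if (step.done) break;
  delays.push(step.delayMs);
  lastDelay = step.delayMs;
  lastStatus = current;
}
console.log(delays); // → [1000, 1000, 2000, 4000, 1000, 2000]
```

Backing off while nothing changes keeps a long call from generating a request every second. Resetting to the minimum after each change keeps the UI responsive at the moments that matter: ringing, answered, and ended.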

Memory System:

  • Mem0 Vector Database: Stores conversation history and user preferences
  • Context Injection: Personalizes each call with relevant past interactions
  • Brain Enhancement: Generates customized greetings and responses based on memory
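
Context injection can be reduced to a formatting step: take the memories retrieved for the person being called, keep the relevant ones, and turn them into extra agent instructions plus a personalized opening line. The sketch below is a minimal version. The `MemoryHit` shape is a hypothetical stand-in for a memory-search result (Mem0's real response format may differ). `buildCallContext`, its thresholds, and the greeting wording are illustrative. How the result reaches the ElevenLabs agent, for example through a per-conversation prompt override, is not shown.

```typescript
// Hypothetical shape of a memory returned by a search over past conversations;
// the actual Mem0 response format may differ.
interface MemoryHit {
  memory: string; // the remembered fact, e.g. "Prefers window seats"
  score: number; // relevance to the current call, higher is better
}

interface CallContext {
  promptAddendum: string; // appended to the agent's instructions for this call
  firstMessage: string; // what the agent says when the call connects
}

// Keep only the most relevant memories, format them for the agent's prompt,
// and personalize the greeting when there is history to draw on.
function buildCallContext(name: string, hits: MemoryHit[], maxItems = 5, minScore = 0.3): CallContext {
  const relevant = hits
    .filter((h) => h.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, maxItems);

  if (relevant.length === 0) {
    return {
      promptAddendum: "You have no record of previous calls with " + name + ".",
      firstMessage: `Hi ${name}, this is Buster.`,
    };
  }
  return {
    promptAddendum:
      "What you remember about " + name + " from previous calls:\n" +
      relevant.map((h) => "- " + h.memory).join("\n"),
    firstMessage: `Hi ${name}, it's Buster again.`,
  };
}

const ctx = buildCallContext("Sam", [
  { memory: "Prefers window seats", score: 0.9 },
  { memory: "Asked about parking once", score: 0.1 },
  { memory: "Vegetarian", score: 0.75 },
]);
console.log(ctx.promptAddendum);
// What you remember about Sam from previous calls:
// - Prefers window seats
// - Vegetarian
console.log(ctx.firstMessage); // → Hi Sam, it's Buster again.
```

Filtering by score and capping the count keeps the injected block short. Past context then informs the agent without crowding out its instructions for the current call.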

Infrastructure:

  • Google Cloud Run: Scalable deployment with Docker containers
  • GitHub Actions: Automated CI/CD with Claude Code integration
  • Proxy Architecture: Seamless frontend-backend communication

⚠️ Challenges we ran into

  • Real-time Audio Streaming: Synchronizing WebSocket connections between Twilio and ElevenLabs while maintaining low latency
  • Memory Context Injection: Figuring out how to inject conversation history directly into the ElevenLabs agent's brain without breaking the conversation flow
  • DTMF Navigation: Building reliable detection and navigation of complex phone menu systems
  • Status Synchronization: Coordinating status updates across multiple services (orchestrator, status checker, frontend)
  • MongoDB Cursor Issues: Debugging backend database connection and query handling
  • Cross-Platform Integration: Managing OAuth and API integrations for Gmail, Calendar, and WhatsApp
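
One common way to coordinate status updates from several services is to make the merge monotonic. Each call stage gets a rank, stale or out-of-order reports can't move a call backwards, and a finished call stays finished. The sketch below is a minimal version that ranks Twilio's call status names. `mergeStatus` and the `StatusUpdate` fields are hypothetical, not Buster's actual MongoDB schema.

```typescript
// Hypothetical sketch: merging call-status reports from several services so that
// late or out-of-order updates can't move a call backwards.
// Stage ranks use Twilio's call status names; all terminal statuses share the top rank.
const RANK = {
  queued: 0,
  ringing: 1,
  "in-progress": 2,
  completed: 3,
  busy: 3,
  failed: 3,
  "no-answer": 3,
  canceled: 3,
};
const TERMINAL_RANK = 3;

type CallStatus = keyof typeof RANK;

interface StatusUpdate {
  callSid: string;
  status: CallStatus;
  source: string; // which service reported it, e.g. "orchestrator"
  at: number; // when the reporting service observed the status (epoch ms)
}

function mergeStatus(current: StatusUpdate | undefined, incoming: StatusUpdate): StatusUpdate {
  if (current === undefined) return incoming;
  const cur = RANK[current.status];
  const next = RANK[incoming.status];
  if (cur === TERMINAL_RANK) return current; // a finished call stays finished
  if (next > cur) return incoming; // forward progress always wins
  if (next === cur && incoming.at > current.at) return incoming; // fresher report of the same stage
  return current; // stale or backwards: ignore
}

// Updates as they arrive, deliberately out of order.
const updates: StatusUpdate[] = [
  { callSid: "CA1", status: "queued", source: "orchestrator", at: 1 },
  { callSid: "CA1", status: "in-progress", source: "status-checker", at: 3 },
  { callSid: "CA1", status: "ringing", source: "orchestrator", at: 2 },
  { callSid: "CA1", status: "completed", source: "status-checker", at: 5 },
  { callSid: "CA1", status: "in-progress", source: "orchestrator", at: 4 },
];
let latest: StatusUpdate | undefined = undefined;
for (const u of updates) latest = mergeStatus(latest, u);
console.log(latest && latest.status); // → completed
```

Ranking stages instead of trusting arrival order lets the orchestrator and the status checker both write freely. The tie-break on `at` assumes the services' clocks are roughly in sync.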

πŸ† Accomplishments that we're proud of

  • Successful Live Calls: Buster made real phone calls and held natural conversations
  • Memory-Powered Personalization: Built a working memory system that remembers users across calls and personalizes each interaction
  • Complex System Integration: Connected 6+ different APIs and services into a cohesive platform
  • DTMF Automation: Implemented answering machine detection and automated menu navigation
  • Real-time Architecture: Built a robust WebSocket-based system handling live audio streams
  • Production-Ready Deployment: Full CI/CD pipeline with Google Cloud deployment
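
The write-up does not say whether answering machine detection relies on Twilio's built-in feature. If it does, the call's `AnsweredBy` value can drive what the agent does next. The sketch below maps the `AnsweredBy` values as listed in Twilio's documentation. The `CallAction` names and the choice to treat `unknown` as a person are assumptions, not Buster's documented behavior.

```typescript
// Values Twilio reports in AnsweredBy when answering machine detection is enabled.
type AnsweredBy =
  | "human" | "unknown" | "fax"
  | "machine_start" | "machine_end_beep" | "machine_end_silence" | "machine_end_other";

// Hypothetical actions for illustration; these names are not from Buster's code.
type CallAction = "start_conversation" | "leave_voicemail" | "wait_for_beep" | "hang_up";

function actionFor(answeredBy: AnsweredBy): CallAction {
  switch (answeredBy) {
    case "human":
    case "unknown": // assumption: safer to talk than to hang up on a real person
      return "start_conversation";
    case "machine_end_beep":
    case "machine_end_silence":
    case "machine_end_other":
      return "leave_voicemail"; // greeting has finished, so the message won't be cut off
    case "machine_start":
      return "wait_for_beep"; // machine detected before its greeting ended
    case "fax":
      return "hang_up";
  }
}

console.log(actionFor("machine_end_beep")); // → leave_voicemail
console.log(actionFor("unknown")); // → start_conversation
```

Waiting for the greeting to finish before speaking keeps the voicemail from being cut off mid-message.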

📚 What we learned

  • Audio Processing is Complex: Real-time voice communication requires careful latency management and error handling
  • Memory Makes All the Difference: Users trust AI agents significantly more when they remember previous interactions
  • Phone Systems Are Inconsistent: Every voicemail system and phone menu works differently - requires adaptive DTMF strategies
  • WebSocket Coordination: Managing multiple concurrent WebSocket connections requires robust error handling
  • Vector Databases for Context: Mem0's approach to conversation memory is powerful for maintaining context across sessions

🚀 What's next for Buster

  • Enterprise Integration: Add Salesforce, HubSpot, and other CRM integrations for business use cases
  • Advanced DTMF: Machine-learning-based phone system recognition and navigation
  • Voice Cloning: User-specific voice cloning for truly personalized calls
  • Proactive Calling: Scheduled callbacks, appointment reminders, and follow-up campaigns
  • Analytics Dashboard: Call success rates, conversation analytics, and performance metrics
  • Multi-Language Support: Expand beyond English with localized voice models
  • API Marketplace: Allow third-party developers to build custom call workflows

🔧 Technical Implementation

Core Technologies:

  • ElevenLabs Conversational AI - Real-time voice synthesis and conversation
  • Twilio Programmable Voice - Phone call infrastructure and WebSocket streaming
  • Mem0 - Vector-based memory and context management
  • Fastify + WebSocket - High-performance real-time backend
  • Next.js 15 - Modern React frontend with TypeScript
  • MongoDB - Call status and data persistence
  • Google Cloud Run - Scalable container deployment

Key Features:

  • Real-time WebSocket audio streaming
  • Persistent conversation memory across calls
  • Automated DTMF navigation for phone systems
  • Multi-modal communication (voice, WhatsApp, email)
  • Live status tracking and transcription
  • Production-ready deployment infrastructure
