Buster - AI Voice Agent Platform
Built for the Langflow Hacking Agents Hackathon
Inspiration
We wanted to build an AI voice agent that doesn't just chat but actually does things for you. Booking restaurants, handling customer service, or making business calls shouldn't require your constant attention. We thought: what if your AI could literally call people and handle complex conversations autonomously, with memory and context?
What it does
Buster is an intelligent voice agent platform that makes autonomous phone calls using ElevenLabs Conversational AI and Twilio. It features:
- Smart Voice Calls: Makes real phone calls with natural conversation flow using ElevenLabs
- Memory & Context: Uses Mem0 for persistent memory across conversations - remembers previous calls and personalizes interactions
- DTMF Navigation: Automatically detects and navigates answering machines, voicemail systems, and automated phone menus
- Multi-Modal Communication: Integrates WhatsApp, email, and calendar for comprehensive communication
- Real-Time Status Tracking: Live status updates and call transcripts with MongoDB persistence
- Brain-Powered Context: Injects conversation history and user preferences directly into the AI agent's brain
How we built it
Backend Architecture:
- ElevenLabs Conversational AI: Real-time voice synthesis and natural language processing
- Twilio Programmable Voice: WebSocket-based audio streaming and phone call management
- Fastify Server: High-performance Node.js backend with WebSocket support
- Mem0 Integration: Vector-based memory system for persistent conversation context
- MongoDB: Call status tracking and data persistence
- DTMF Handler: Automated navigation of phone systems and voicemail
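The bridge between the Twilio stream and the voice agent mostly comes down to translating messages in both directions. Here is a minimal sketch of that translation step, not the project's actual code. Twilio Media Streams delivers JSON messages whose `media.payload` field holds base64-encoded audio; the agent-side message shape (`user_audio_chunk`) and the function names `twilio_to_agent` and `agent_to_twilio` are assumptions made for illustration and should be checked against the ElevenLabs documentation.

```python
import json


def twilio_to_agent(raw_message):
    """Convert one Twilio Media Streams message into an agent-bound message.

    Returns None for messages that carry no audio.
    """
    msg = json.loads(raw_message)
    if msg.get("event") != "media":
        # "connected", "start", "stop" and similar events have no audio payload.
        return None
    payload = msg["media"]["payload"]  # base64-encoded audio from Twilio
    # Assumed agent-side message shape (hypothetical, verify before use).
    return json.dumps({"user_audio_chunk": payload})


def agent_to_twilio(audio_base64, stream_sid):
    """Wrap agent audio in a media message addressed to the given Twilio stream."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": audio_base64},
    })
```

In a real server these two functions would sit inside the WebSocket handlers, one per direction, so the audio is forwarded without being decoded.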
Frontend:
- Next.js 15: Modern React frontend with TypeScript
- Tailwind CSS + Radix UI: Beautiful, accessible interface components
- Zustand: Efficient state management for call status and transcripts
- Real-time Updates: Live call monitoring and status polling
Memory System:
- Mem0 Vector Database: Stores conversation history and user preferences
- Context Injection: Personalizes each call with relevant past interactions
- Brain Enhancement: Generates customized greetings and responses based on memory
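To show what context injection could look like, here is a small sketch that turns retrieved memories into text placed in front of the agent's instructions. It makes no Mem0 calls: the memory records are assumed to arrive as plain dictionaries with `text` and `score` keys, and `build_call_context` is a hypothetical helper, not part of any library.

```python
def build_call_context(user_name, memories, max_items=3):
    """Format the highest-scoring memories as text to prepend to the agent prompt.

    Each memory is a hypothetical {"text": str, "score": float} dictionary.
    """
    # Keep only the most relevant memories so the prompt stays short.
    ranked = sorted(memories, key=lambda m: m["score"], reverse=True)[:max_items]
    if not ranked:
        return f"You are calling on behalf of {user_name}. No prior history is available."
    lines = [f"- {m['text']}" for m in ranked]
    return (
        f"You are calling on behalf of {user_name}.\n"
        "Relevant facts from earlier calls:\n" + "\n".join(lines)
    )
```

Capping the number of memories is one simple way to stop a growing history from crowding out the instructions for the current call.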
Infrastructure:
- Google Cloud Run: Scalable deployment with Docker containers
- GitHub Actions: Automated CI/CD with Claude Code integration
- Proxy Architecture: Seamless frontend-backend communication
Challenges we ran into
- Real-time Audio Streaming: Synchronizing WebSocket connections between Twilio and ElevenLabs while maintaining low latency
- Memory Context Injection: Figuring out how to inject conversation history directly into the ElevenLabs agent's brain without breaking the conversation flow
- DTMF Navigation: Building reliable detection and navigation of complex phone menu systems
- Status Synchronization: Coordinating status updates across multiple services (orchestrator, status checker, frontend)
- MongoDB Cursor Issues: Debugging backend database connection and query handling
- Cross-Platform Integration: Managing OAuth and API integrations for Gmail, Calendar, and WhatsApp
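When several services report on the same call, their updates can arrive out of order. One common way to handle this, sketched below, is to rank the statuses and accept only updates that move a call forward. The status names and the `merge_status` helper are hypothetical; the statuses Buster actually uses are not listed here.

```python
# Hypothetical call lifecycle; "completed" and "failed" are both terminal.
STATUS_RANK = {
    "queued": 0,
    "ringing": 1,
    "in_progress": 2,
    "completed": 3,
    "failed": 3,
}


def merge_status(current, incoming):
    """Accept an update only if it advances the call; ignore stale reports."""
    if STATUS_RANK[incoming] > STATUS_RANK[current]:
        return incoming
    return current
```

With this rule a late "ringing" report from a slow service cannot overwrite "in_progress", and a terminal status stays fixed once it has been recorded.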
Accomplishments that we're proud of
- Successful Live Calls: Buster successfully made real phone calls and had natural conversations
- Memory-Powered Personalization: Built a working memory system that remembers users across calls and personalizes each interaction
- Complex System Integration: Connected 6+ different APIs and services into a cohesive platform
- DTMF Automation: Implemented answering machine detection and automated menu navigation
- Real-time Architecture: Built a robust WebSocket-based system handling live audio streams
- Production-Ready Deployment: Full CI/CD pipeline with Google Cloud deployment
What we learned
- Audio Processing is Complex: Real-time voice communication requires careful latency management and error handling
- Memory Makes All the Difference: Users trust an AI agent significantly more when it remembers previous interactions
- Phone Systems Are Inconsistent: Every voicemail system and phone menu works differently, so navigation requires adaptive DTMF strategies
- WebSocket Coordination: Managing multiple concurrent WebSocket connections requires robust error handling
- Vector Databases for Context: Mem0's approach to conversation memory is powerful for maintaining context across sessions
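Because phone menus are worded differently, a DTMF strategy needs to handle more than one phrasing. The sketch below reads a transcribed menu prompt, extracts the options from two common phrasings, and picks the key that matches the goal of the call. It assumes a punctuated transcript, and `parse_menu`, `choose_key`, and the fallback key `"0"` are illustrative choices rather than Buster's real handler.

```python
import re

# "press 1 for reservations"
PRESS_THEN_LABEL = re.compile(r"\bpress ([0-9*#]) (?:for|to) ([^.,;]+)", re.IGNORECASE)
# "for reservations, press 1"
LABEL_THEN_PRESS = re.compile(r"\b(?:for|to) ([^.,;]+?),? press ([0-9*#])", re.IGNORECASE)


def parse_menu(transcript):
    """Return {option label: DTMF key} extracted from a menu transcript."""
    pairs = [(label, key) for key, label in PRESS_THEN_LABEL.findall(transcript)]
    if not pairs:
        # Fall back to the other phrasing only when the first finds nothing,
        # so a mixed sentence is not parsed twice with conflicting results.
        pairs = LABEL_THEN_PRESS.findall(transcript)
    return {label.strip().lower(): key for label, key in pairs}


def choose_key(transcript, goal_keywords, fallback="0"):
    """Pick the key whose label mentions a goal keyword, else the fallback key."""
    for label, key in parse_menu(transcript).items():
        if any(word in label for word in goal_keywords):
            return key
    return fallback
```

For example, `choose_key("For reservations, press 1. For billing, press 2.", ["billing"])` returns `"2"`. Unpunctuated transcripts and menus that use other phrasings would need additional patterns.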
What's next for Buster
- Enterprise Integration: Add Salesforce, HubSpot, and other CRM integrations for business use cases
- Advanced DTMF: Machine learning-based phone system recognition and navigation
- Voice Cloning: User-specific voice cloning for truly personalized calls
- Proactive Calling: Scheduled callbacks, appointment reminders, and follow-up campaigns
- Analytics Dashboard: Call success rates, conversation analytics, and performance metrics
- Multi-Language Support: Expand beyond English with localized voice models
- API Marketplace: Allow third-party developers to build custom call workflows
Technical Implementation
Core Technologies:
- ElevenLabs Conversational AI - Real-time voice synthesis and conversation
- Twilio Programmable Voice - Phone call infrastructure and WebSocket streaming
- Mem0 - Vector-based memory and context management
- Fastify + WebSocket - High-performance real-time backend
- Next.js 15 - Modern React frontend with TypeScript
- MongoDB - Call status and data persistence
- Google Cloud Run - Scalable container deployment
Key Features:
- Real-time WebSocket audio streaming
- Persistent conversation memory across calls
- Automated DTMF navigation for phone systems
- Multi-modal communication (voice, WhatsApp, email)
- Live status tracking and transcription
- Production-ready deployment infrastructure
Built With
- docker
- dtmf
- elevenlabs-api
- fastapi
- fastify
- github-actions
- gmail-api
- google-calendar-api
- google-cloud
- google-cloud-run
- javascript
- mem0
- mongodb
- next.js
- ngrok
- node.js
- oauth-2.0
- python
- radix-ui
- react
- server-sent-events
- tailwind-css
- twilio-voice-api
- typescript
- vector-database
- vercel
- webrtc
- websocket
- whatsapp-api
- zustand