Inspiration

Too often, we find ourselves in situations where we want help but are too afraid to ask for it. The shyness, guilt, and shame of not being able to speak up accumulate and can take a heavy toll on people's mental health. I was inspired to tackle this problem so that anyone can share their concerns and seek help conversationally without having to say a single word.

What it does

Echo connects directly to your phone, allowing you to call anyone with a single prompt describing your concern. Common examples of people you can call include hotlines, your therapist, or trusted professionals (but you can call anyone!). The agentic AI speaks on your behalf and records the call, with the recipient's permission, so that it can relay the recipient's response back to you. After the call, a written transcript and a recreated audio conversation (.wav file), generated by Gemini TTS from that transcript, are available so you can revisit the conversation whenever you like.

How I built it

For the frontend, I used Next.js with JavaScript and TailwindCSS for fast styling and a responsive UI. Twilio handles phone control. VAPI and OpenAI's GPT-4o mini power the agentic AI representative. Express.js handles all backend functions, such as the calling feature and all logging. The Gemini API's gemini-2.5-flash-preview-tts model creates a multi-speaker text-to-speech conversation from the .txt transcript (handled through localStorage).
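As a rough illustration of the transcript-to-speech step, the sketch below splits a plain-text transcript into speaker turns and assembles a script that could be sent to a multi-speaker text-to-speech model. The transcript format (one `Speaker: text` turn per line), the speaker labels `Agent` and `Recipient`, and the function names are all assumptions made for the example, not details of Echo's actual code.

```javascript
// Hypothetical transcript format: each turn starts with "Agent:" or "Recipient:".
// Lines without a known label are treated as a continuation of the previous turn.
function parseTranscript(transcript, speakers = ["Agent", "Recipient"]) {
  const turns = [];
  for (const rawLine of transcript.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line) continue; // skip blank lines
    const speaker = speakers.find((name) => line.startsWith(name + ":"));
    if (speaker) {
      turns.push({ speaker, text: line.slice(speaker.length + 1).trim() });
    } else if (turns.length > 0) {
      turns[turns.length - 1].text += " " + line;
    }
  }
  return turns;
}

// Join the turns back into a labelled script for the text-to-speech prompt.
// The instruction wording here is illustrative only.
function buildTtsPrompt(turns) {
  const script = turns.map((t) => `${t.speaker}: ${t.text}`).join("\n");
  return `Read this phone conversation aloud:\n${script}`;
}

const sample = [
  "Agent: Hi, I'm calling on behalf of a client.",
  "Recipient: Sure, go ahead.",
  "We are open until 5:30 pm.",
  "",
  "Agent: Thank you.",
].join("\n");

console.log(buildTtsPrompt(parseTranscript(sample)));
```

Matching only known speaker labels keeps a line such as "We are open until 5:30 pm." from being mistaken for a new speaker just because it contains a colon.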

Challenges I ran into

  • Live call streaming: This was my first time using WebSockets (through ws for Node.js), and I ran into many issues, especially with live audio manipulation. That forced me to pivot to transcribing the audio and recreating the conversation.
  • Twilio implementation: There was a steep learning curve in the initial steps of setting up Twilio as the phone carrier.
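Part of the difficulty with audio is that raw samples are not directly playable. As a minimal sketch of one piece of the recreation step, the function below wraps raw PCM audio in a WAV header using only Node.js built-ins. It assumes the text-to-speech step returns uncompressed 16-bit PCM; the 24 kHz mono defaults and the name `pcmToWav` are assumptions for the example rather than confirmed details of the project.

```javascript
// Wrap raw PCM samples in a minimal 44-byte WAV (RIFF) header.
// Defaults (24 kHz, mono, 16-bit) are assumed, not taken from the project.
function pcmToWav(pcm, { sampleRate = 24000, channels = 1, bitsPerSample = 16 } = {}) {
  const blockAlign = channels * (bitsPerSample / 8); // bytes per sample frame
  const byteRate = sampleRate * blockAlign;          // bytes per second
  const header = Buffer.alloc(44);
  header.write("RIFF", 0, "ascii");
  header.writeUInt32LE(36 + pcm.length, 4);  // file size minus the first 8 bytes
  header.write("WAVE", 8, "ascii");
  header.write("fmt ", 12, "ascii");
  header.writeUInt32LE(16, 16);              // size of the fmt chunk
  header.writeUInt16LE(1, 20);               // audio format 1 = uncompressed PCM
  header.writeUInt16LE(channels, 22);
  header.writeUInt32LE(sampleRate, 24);
  header.writeUInt32LE(byteRate, 28);
  header.writeUInt16LE(blockAlign, 32);
  header.writeUInt16LE(bitsPerSample, 34);
  header.write("data", 36, "ascii");
  header.writeUInt32LE(pcm.length, 40);      // size of the audio data in bytes
  return Buffer.concat([header, pcm]);
}

// Example: one second of silence at the assumed format.
const silence = Buffer.alloc(24000 * 2);
const wav = pcmToWav(silence);
console.log(wav.length); // 44-byte header plus 48000 bytes of audio
```

The resulting buffer could then be written to disk with `fs.writeFileSync` or returned from the backend as a download.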

Accomplishments that I'm proud of

  • Successful prompt engineering: I factored in many parameters, from user background noise to pause length to sequential dialogue, to produce a relatable, human-like conversation.
  • No noticeable delays: I initially worried about lag in the conversation, but the Express.js backend kept everything smooth and fast.

What I learned

  • Audio manipulation is stupidly difficult
  • Think twice before making API calls without first checking how many tokens they will consume
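One lightweight way to act on the token lesson is to estimate a prompt's size before sending it. The sketch below uses the common rule of thumb of roughly four characters per token for English text. That ratio is only an approximation, and the function names and budget are hypothetical; a provider's own tokenizer or usage report gives the real number.

```javascript
// Rough heuristic only: about 4 characters per token for English text.
function estimateTokens(text, charsPerToken = 4) {
  return Math.ceil(text.length / charsPerToken);
}

// Returns true when the estimated size fits the given (hypothetical) budget.
function withinBudget(text, maxTokens) {
  return estimateTokens(text) <= maxTokens;
}

const prompt = "Please call my therapist and ask to move Friday's session.";
console.log(estimateTokens(prompt), withinBudget(prompt, 100));
```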

What's next for Echo

  • Live call streaming: An opt-in to hear the conversation live (listening to the recreated conversation afterward remains an option)
  • Prompt checks: Security measures to detect misuse or abuse in prompts. Next steps would also include encryption and data privacy protections.
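A first pass at prompt checks could be a simple screen that runs before any call is placed. The sketch below is one possible shape for it, not a planned design: the length limit, the blocked patterns, and the name `checkPrompt` are all placeholders, and keyword matching alone would not be enough to catch real abuse.

```javascript
// Hypothetical limit and patterns; a real deployment would need a reviewed policy.
const MAX_PROMPT_LENGTH = 500;
const BLOCKED_PATTERNS = [/\bprank\b/i, /\bharass/i, /\bthreaten/i];

// Returns { ok, reason } so the caller can decide whether to place the call.
function checkPrompt(prompt) {
  const text = prompt.trim();
  if (text.length === 0) return { ok: false, reason: "empty" };
  if (text.length > MAX_PROMPT_LENGTH) return { ok: false, reason: "too_long" };
  if (BLOCKED_PATTERNS.some((pattern) => pattern.test(text))) {
    return { ok: false, reason: "blocked_pattern" };
  }
  return { ok: true, reason: null };
}

console.log(checkPrompt("Please ask my therapist to reschedule Friday's session."));
console.log(checkPrompt("Prank call my neighbor."));
```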
