Speech Coversational AI

Inspiration

We wanted to make communication easier and more interactive by combining speech and AI in multiple languages. The idea was to create an app where you can talk to it, see your words converted into text, and hear the AI respond back with voice—all in a natural, conversational way. Inspired by advancements in AI and tools like SambaNova cloud having Meta-Llama-3.1-8B-Instruct , Whisper and gTTS models, we aimed to build something that feels simple yet powerful, helping people communicate and connect seamlessly with AI with voice to voice. We can also integrate it to other applications to make it real time.

What it does

Speech Coversational AI is a dynamic AI-powered application that:

Accepts speech input directly through voice recording or audio file uploads.

Transcribes the audio into text using OpenAI Whisper.

Processes the transcribed text to generate thoughtful responses using the * SambaNova Meta-Llama-3.1-8B-Instruct * model.

Converts the AI-generated responses back into speech using Google Text-to-Speech (gTTS) for a complete conversational loop.

Supports multilingual input and output, enabling interaction in various languages.

The application is designed to provide an intuitive and engaging user experience, ideal for personal assistants, learning aids, or real-time communication tools.

How we built it

Backend:

Whisper for high-accuracy speech-to-text transcription. SambaNova Meta Llama 3.1 8b Instruct Model to generate responses. Google Text-to-Speech (gTTS) for converting text back into speech.