Inspiration
In a world where language differences can be a barrier to communication, we wanted to build a bridge that connects people regardless of the language they speak: a tool that makes every conversation inclusive and understandable for all participants.
What it does
Imagine Person A speaks English and Person B speaks German, and they hop on a video call. With Reslate, A speaks English and B hears A's own voice, but in German. The same approach also works for pre-recorded videos.
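The write-up does not spell out the flow from A's speech to B's audio step by step. The stack listed under "How we built it" includes browser speech-to-text, Google Translate, OpenAI TTS and OpenVoice. From that, one plausible per-sentence flow is:

1. Translate the recognized text.
2. Speak it with a base TTS voice.
3. Convert that audio to the speaker's tone color. This two-stage design (base speaker, then tone color conversion) is the one OpenVoice is built around.

The sketch below assumes that flow. Every stage (`translate`, `synthesize`, `convert_tone`) is a hypothetical callable supplied by the caller, not a real SDK call.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical stage signatures (assumptions, not real library APIs):
Translate = Callable[[str, str], str]        # (text, target_lang) -> translated text
Synthesize = Callable[[str, str], bytes]     # (text, lang) -> speech in a base TTS voice
ConvertTone = Callable[[bytes, str], bytes]  # (base_audio, speaker_id) -> speech in the speaker's voice


@dataclass
class ReslatePipeline:
    translate: Translate
    synthesize: Synthesize
    convert_tone: ConvertTone

    def render(self, transcript: str, speaker_id: str, target_lang: str) -> bytes:
        """Turn one recognized sentence into cloned-voice audio in target_lang."""
        translated = self.translate(transcript, target_lang)
        base_audio = self.synthesize(translated, target_lang)
        return self.convert_tone(base_audio, speaker_id)

    def fan_out(self, transcript: str, speaker_id: str,
                listeners: Dict[str, str]) -> Dict[str, bytes]:
        """Render once per distinct target language and share it among listeners.

        `listeners` maps a listener id to that listener's language code.
        """
        by_lang = {lang: self.render(transcript, speaker_id, lang)
                   for lang in set(listeners.values())}
        return {listener: by_lang[lang] for listener, lang in listeners.items()}
```

Grouping by language means a call with several German-speaking listeners pays for translation and cloning once, not once per listener.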
1. Voice Cloning Model
Reslate takes a sample of the user's voice and re-synthesizes their speech in any target language. The listener hears the speaker's own voice rather than a generic one, which makes the experience more immersive.
2. Tonality Classification Model
Our lightweight custom audio classification model matches the tonality of the user's voice to a base TTS voice. That base voice is the starting point for the voice clone and keeps it clear in every target language.
3. Two fine-tuned LLMs (Prediction Model)
Our fast LLMs, fine-tuned on common conversation datasets, predict short sentence completions. This lets us pre-generate cloned-voice audio before the person even finishes speaking, which keeps latency very low.
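The write-up does not document the tonality classification model's features or the set of base voices it chooses from. The sketch below makes three assumptions:

- Each voice is summarized by one hypothetical feature, its mean pitch in Hz.
- There are only two hypothetical base voices.
- The classifier is a linear SVM trained in pure Python with the Pegasos subgradient method. This stands in for whatever Support Vector Classifier implementation the team actually used.

```python
from typing import List, Sequence, Tuple

# Hypothetical labels for two base TTS voices; the real voice set is not documented.
LOW, HIGH = "base_voice_low", "base_voice_high"


def features(mean_pitch_hz: float) -> List[float]:
    """Map an assumed mean-pitch feature to [scaled pitch, constant bias term]."""
    return [(mean_pitch_hz - 165.0) / 50.0, 1.0]


def train_linear_svm(samples: Sequence[Tuple[List[float], int]],
                     lam: float = 0.1, epochs: int = 100) -> List[float]:
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 / -1. Samples are visited in a fixed cyclic order
    (instead of randomly) so the result is deterministic.
    """
    w = [0.0] * len(samples[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in samples:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Regularizer step: shrink every weight toward zero.
            w = [(1.0 - eta * lam) * wi for wi in w]
            # Hinge step: move toward the sample only if it violates the margin.
            if margin < 1.0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def pick_base_voice(w: List[float], mean_pitch_hz: float) -> str:
    """Choose the base voice on the side of the learned hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, features(mean_pitch_hz)))
    return HIGH if score >= 0 else LOW
```

A model this small trains and predicts in microseconds, which fits the "ultra-light" goal. With real audio, the single pitch feature would be replaced by whatever descriptors the team extracted.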
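The write-up does not say how predicted completions are matched against what was actually said. The sketch below shows one way to realize pre-generation, under these assumptions:

- Each interim transcript goes to a predictor that returns candidate completions.
- Audio for each candidate is started in a thread pool. Translation plus voice cloning is wrapped in a single `synthesize` callable.
- When the sentence is finalized, a matching candidate's audio is reused. Otherwise the sentence is synthesized from scratch.

`predict`, `synthesize` and the exact-match rule (after light normalization) are all hypothetical.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List


class SpeculativeSynthesizer:
    """Pre-generates audio for predicted sentence completions (sketch).

    `predict` and `synthesize` are hypothetical stand-ins for the fine-tuned
    LLMs and the translate-plus-clone step; they are injected, not real APIs.
    """

    def __init__(self, predict: Callable[[str], List[str]],
                 synthesize: Callable[[str], bytes], max_workers: int = 4):
        self._predict = predict
        self._synthesize = synthesize
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending: Dict[str, Future] = {}

    @staticmethod
    def _normalize(text: str) -> str:
        # Compare case-insensitively, ignoring trailing punctuation and extra spaces.
        return " ".join(text.lower().strip(" .!?").split())

    def on_partial(self, partial: str) -> None:
        """Called with an interim transcript while the speaker is still talking."""
        for completion in self._predict(partial):
            key = self._normalize(completion)
            if key not in self._pending:
                self._pending[key] = self._pool.submit(self._synthesize, completion)

    def on_final(self, final: str) -> bytes:
        """Called with the finished sentence; reuse a speculative result if one matches."""
        future = self._pending.pop(self._normalize(final), None)
        # Cancel leftover guesses that have not started yet; running ones finish unused.
        for stale in self._pending.values():
            stale.cancel()
        self._pending.clear()
        if future is not None:
            return future.result()  # hit: synthesis began before the speaker finished
        return self._synthesize(final)  # miss: fall back to synthesizing now

    def close(self) -> None:
        self._pool.shutdown(wait=True)
```

Exact matching is the simplest and safest policy because it never plays words that were not said. A looser policy could raise the hit rate, but at the risk of speaking an incorrect guess.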
How we built it
- React, Next.js, and Chakra UI for the front-end
- PeerJS and WebRTC for video call functionality
- OpenVoice ToneColorEncoder + OpenAI TTS for voice cloning
- Google Translate API for translation
- Chrome Web Speech API for speech-to-text
- Hugging Face + FLAN-T5 + generics_kb dataset for LLM fine-tuning
- Mistral 7B for additional sentence prediction
- FastAPI + Python for the back-end and parallel computing
- ngrok for exposing our locally hosted services to the web
- Supabase Auth and database for storing user and session data
- Support Vector Classifier + Python for classifying the base TTS voice
- FFmpeg for video editing
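The write-up names the model (FLAN-T5) and the dataset (generics_kb) but not how training examples were built. For short sentence completion, one plausible scheme is to cut each sentence at every word boundary and train the seq2seq model to produce the next few words from the prefix. The sketch below assumes that scheme. `min_prefix` and `max_completion` are hypothetical knobs, and the example sentence in the usage note is made up, not taken from generics_kb.

```python
from typing import Iterable, List, Tuple


def completion_pairs(sentences: Iterable[str], min_prefix: int = 2,
                     max_completion: int = 4) -> List[Tuple[str, str]]:
    """Split each sentence into (prefix, short completion) training pairs.

    For every cut point that leaves at least `min_prefix` words of context,
    the model input is the prefix and the target is the next
    `max_completion` words (or fewer, if the sentence ends first).
    """
    pairs: List[Tuple[str, str]] = []
    for sentence in sentences:
        words = sentence.split()
        for cut in range(min_prefix, len(words)):
            prefix = " ".join(words[:cut])
            completion = " ".join(words[cut:cut + max_completion])
            pairs.append((prefix, completion))
    return pairs
```

For "Cats like to sleep in the sun.", the first pair is ("Cats like", "to sleep in the"). The resulting pairs could then be tokenized and fed to a standard seq2seq fine-tuning loop. Keeping targets short matches the goal of predicting only the next few words.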
Challenges we ran into
- Integrating all our platforms in under 24 hours
- Training and testing many custom ML models
- Finding a voice cloning technology that sounds good, supports many languages, and generates audio quickly
Accomplishments that we're proud of
- Achieving near real-time translation through sentence prediction, audio pre-generation, and parallel processing
- Building voice cloning that sounds like the speaker while keeping latency low
- Getting the platform built in time and working smoothly
What we learned
- Voice cloning
- Fine-tuning LLMs
- Working with video streaming and sockets
What's next for Reslate
- Working towards lower latency
- A more feature-rich video call platform (screen sharing, recording, etc.)
Built With
- chakraui
- gcloud
- huggingface
- mistral
- nextjs
- ngrok
- openvoice
- peerjs
- python
- react
- supabase
- typescript
- webrtc