Inspiration
Language learning apps teach vocabulary and grammar, but they miss what makes someone truly understood: timing, rhythm, and prosody. I've seen talented professionals struggle in international settings not because they lack words, but because their delivery feels unnatural. The pause before a response, the pitch of a question, the rhythm of casual conversation—these are what build trust and connection across cultures. I wanted to build a tool that addresses this invisible gap.
What it does
Timing Coach is an AI-powered speech coach that helps you master natural timing in 15 languages across 8 real-world situations. You select a scenario (such as introductions, meetings, or travel) and a delivery style (such as friendly, humble, or playful), then practice with a live AI coach powered by Gemini's native audio capabilities. The coach demonstrates phrases with native timing, listens to your attempt, and gives feedback in English. A visual waveform overlay shows where your rhythm matches the coach's pattern and where it doesn't. You can drag to align the two waveforms and scroll to zoom, which makes prosody tangible and learnable.
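To illustrate the drag-to-align and scroll-to-zoom interaction, here is a minimal sketch of the view math such an overlay might use. It is not the project's actual code: the `WaveformView` shape and the helper names `timeToX`, `zoomAt`, and `dragBy` are hypothetical, and it assumes a simple linear time-to-pixel mapping.

```typescript
// Hypothetical view state for one waveform trace (an assumption, not the app's real model).
interface WaveformView {
  pixelsPerSecond: number; // zoom level
  offsetSeconds: number;   // time shown at x = 0
}

// Map a time in seconds to a horizontal pixel position.
function timeToX(view: WaveformView, timeSeconds: number): number {
  return (timeSeconds - view.offsetSeconds) * view.pixelsPerSecond;
}

// Zoom by `factor` while keeping the time under the cursor fixed on screen.
function zoomAt(view: WaveformView, cursorX: number, factor: number): WaveformView {
  const timeAtCursor = view.offsetSeconds + cursorX / view.pixelsPerSecond;
  const pixelsPerSecond = view.pixelsPerSecond * factor;
  return {
    pixelsPerSecond,
    offsetSeconds: timeAtCursor - cursorX / pixelsPerSecond,
  };
}

// Shift the trace by a drag distance in pixels (positive moves it to the right).
function dragBy(view: WaveformView, deltaX: number): WaveformView {
  return {
    ...view,
    offsetSeconds: view.offsetSeconds - deltaX / view.pixelsPerSecond,
  };
}
```

In a browser, `zoomAt` would typically be driven by a `wheel` event and `dragBy` by pointer movement, with the learner's trace and the coach's trace each holding their own offset so one can be slid against the other.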
How we built it
I built Timing Coach using Vite for rapid development, integrating Gemini 2.5 Flash for text generation and Gemini 2.5 Flash Native Audio for real-time voice coaching. The Web Audio API handles bidirectional audio streaming, capturing user speech and playing back coach demonstrations. The prosody visualization analyzes audio waveforms in real time and displays amplitude patterns that users can manipulate to understand timing differences. Firebase Hosting provides global deployment. The architecture prioritizes low-latency interaction, which is critical for natural conversation practice, while supporting 15 languages and multiple personality modes through careful prompt engineering.
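As a rough illustration of how amplitude patterns can be derived from audio and compared, the sketch below reduces raw samples to an RMS envelope and estimates how many frames the learner lags the coach. This is an assumption about one workable approach, not the project's implementation; `rmsEnvelope` and `bestLag` are hypothetical names, and the plain cross-correlation used here is a stand-in for whatever analysis the app actually performs.

```typescript
// Hypothetical helper: reduce raw PCM samples to one RMS amplitude value per frame.
function rmsEnvelope(samples: Float32Array, frameSize: number): number[] {
  if (frameSize <= 0) throw new Error("frameSize must be positive");
  const envelope: number[] = [];
  for (let start = 0; start < samples.length; start += frameSize) {
    const end = Math.min(start + frameSize, samples.length);
    let sumSquares = 0;
    for (let i = start; i < end; i++) {
      sumSquares += samples[i] * samples[i];
    }
    envelope.push(Math.sqrt(sumSquares / (end - start)));
  }
  return envelope;
}

// Hypothetical helper: find the frame shift (within +/- maxLag) that maximizes the
// cross-correlation of the two envelopes. A positive result means the learner's
// pattern occurs later than the coach's.
function bestLag(coach: number[], learner: number[], maxLag: number): number {
  let best = 0;
  let bestScore = -Infinity;
  for (let lag = -maxLag; lag <= maxLag; lag++) {
    let score = 0;
    for (let i = 0; i < coach.length; i++) {
      const j = i + lag;
      if (j >= 0 && j < learner.length) {
        score += coach[i] * learner[j];
      }
    }
    if (score > bestScore) {
      bestScore = score;
      best = lag;
    }
  }
  return best;
}
```

In a browser, the samples could come from `AudioBuffer.getChannelData(0)`, and the resulting envelope is what would be drawn as the overlay. Multiplying the lag by the frame duration gives the timing offset in seconds.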
Challenges we ran into
The biggest challenge was making audio feedback feel natural and immediate. Early versions had noticeable latency that broke the conversational flow. I optimized audio buffering and stream handling to reduce lag. Visualizing prosody in a way that was both accurate and intuitive took several iterations—users needed to see rhythm patterns without getting overwhelmed by technical detail. Balancing the coach's personality (helpful but not patronizing) across languages and contexts required extensive prompt refinement. Finally, ensuring the system gracefully handled diverse accents and speech patterns without being overly prescriptive was a delicate calibration.
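One common way to cut gaps and lag in streamed playback is to schedule each incoming chunk to start exactly when the previous one ends, with only a small lead time when the queue has run dry. The sketch below shows that idea in isolation; it is an assumption about the kind of buffering involved, not the optimization actually used in Timing Coach. `PlaybackScheduler` and its `leadTime` parameter are hypothetical.

```typescript
interface ScheduledChunk {
  startTime: number; // seconds on the audio clock
  duration: number;  // seconds
}

// Hypothetical scheduler: keeps chunks back to back and only adds a small
// lead time when playback has fallen behind the clock.
class PlaybackScheduler {
  private nextStart = 0;
  private readonly leadTime: number;

  constructor(leadTime: number = 0.05) {
    this.leadTime = leadTime;
  }

  // `now` is the current audio clock time; `duration` is the chunk length.
  schedule(now: number, duration: number): ScheduledChunk {
    // Either continue directly after the previous chunk, or restart
    // slightly in the future if the queue has drained.
    const startTime = Math.max(this.nextStart, now + this.leadTime);
    this.nextStart = startTime + duration;
    return { startTime, duration };
  }
}
```

With the Web Audio API, `now` would be `AudioContext.currentTime` and the returned `startTime` would be passed to `AudioBufferSourceNode.start()`. A smaller `leadTime` lowers latency but leaves less slack for network jitter.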
Accomplishments that we're proud of
I'm proud that Timing Coach addresses something most language tools ignore entirely. The real-time audio coaching works smoothly with genuine conversational flow. The waveform visualization makes an abstract concept—prosody—concrete and actionable. Supporting 15 languages with 8 situations and 6 delivery styles creates a rich practice environment that mirrors real-world diversity. The system provides constructive feedback without being discouraging, which is crucial for learning. And it's genuinely useful for the growing population of remote workers, international professionals, and travelers who need to sound natural across borders.
What we learned
I learned that timing and prosody are even more critical to communication than I initially thought—users often don't realize their timing is off until they see it visualized. Building for natural conversation requires obsessive attention to latency and flow. Prompt engineering for voice AI is fundamentally different from text—you're shaping personality, pace, and tone simultaneously. I also discovered that learners need both audio and visual feedback; the combination is far more effective than either alone. Finally, creating inclusive, non-judgmental feedback is just as important as technical accuracy.
What's next for Timing Coach
Next, I want to add conversational practice where users engage in multi-turn dialogues, not just phrase repetition. Spaced repetition would help users retain timing patterns long-term. I'd love to integrate speech analysis that tracks improvement over time and suggests personalized focus areas. Adding more niche situations (job interviews, customer service, dating) would increase relevance. Community features—sharing phrases, comparing rhythms—could create peer learning. Finally, I want to explore accessibility features for users with speech differences, ensuring Timing Coach helps everyone communicate more naturally across borders.