🧠 Inspiration
In meetings, we often found ourselves googling terms instead of engaging with the discussion. Whether it was technical jargon, cultural references, or niche acronyms, something always slipped by—especially for newcomers or non-experts. We wanted to build something that empowers everyone to stay in the conversation.
💡 What it does
SpeakSnap listens to live meetings (powered by Jitsi and Azure Speech-to-Text), identifies context-specific or complex terms using Gemini AI, and generates brief, understandable explanations. These are then shown as non-intrusive popups on the frontend, allowing participants to keep up without breaking the flow. If a speaker drops an acronym like "RAG" mid-discussion, for example, a short definition appears on screen without interrupting anyone.
🛠 How we built it
- Core (Python): Uses Gemini via LangChain to process transcribed speech and extract terms that need explanation. It also generates a session-level summary and sentiment analysis.
- Suite (JS): Handles both the backend (Node.js + Express + WebSocket for the real-time flow) and the frontend (React + Vite) that displays the popups during meetings.
- Speech-to-text is handled by Azure, and user sessions are logged in MongoDB.
- We used Jitsi for real-time video/audio capture during meetings.

Sketches of the three main stages (transcription, term extraction, and the real-time push) follow.
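First, transcription. This is a minimal sketch using the azure-cognitiveservices-speech SDK; in the real app the audio comes from the Jitsi meeting rather than the default microphone, and the credentials and handler below are placeholders, not our repo code.

```python
import os

import azure.cognitiveservices.speech as speechsdk

speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["AZURE_SPEECH_KEY"],
    region=os.environ["AZURE_SPEECH_REGION"],
)
# Default microphone input for the sketch; SpeakSnap feeds in Jitsi meeting audio.
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

def on_recognized(evt: speechsdk.SpeechRecognitionEventArgs) -> None:
    # Forward each finalized utterance to the Core for term extraction.
    if evt.result.reason == speechsdk.ResultReason.RecognizedSpeech:
        print("transcript chunk:", evt.result.text)

recognizer.recognized.connect(on_recognized)
recognizer.start_continuous_recognition()  # non-blocking; keep the process alive
```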
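Next, term extraction in the Core. A hedged sketch assuming the langchain-google-genai integration; the model name, prompt, and schema are illustrative rather than the actual implementation, and GOOGLE_API_KEY must be set in the environment.

```python
from pydantic import BaseModel, Field
from langchain_google_genai import ChatGoogleGenerativeAI


class Term(BaseModel):
    term: str = Field(description="Jargon, acronym, or reference worth explaining")
    explanation: str = Field(description="A one- or two-sentence plain-language gloss")


class ExtractionResult(BaseModel):
    terms: list[Term]


# Model name is illustrative; structured output keeps responses parseable.
llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0)
extractor = llm.with_structured_output(ExtractionResult)


def extract_terms(transcript_chunk: str) -> list[Term]:
    """Ask Gemini for context-specific terms in a chunk of transcribed speech."""
    result = extractor.invoke(
        "Identify technical jargon, niche acronyms, or cultural references in "
        "this meeting transcript and give each a brief, plain explanation:\n\n"
        + transcript_chunk
    )
    return result.terms
```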
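Finally, the real-time push. The Suite's relay is actually Node.js + Express + WebSocket; to keep these examples in one language, here is an equivalent Python sketch of the message flow using the websockets package, with illustrative payload fields.

```python
import asyncio
import json

import websockets

CLIENTS: set = set()

async def handler(ws):
    # Track each connected meeting participant until they disconnect.
    CLIENTS.add(ws)
    try:
        await ws.wait_closed()
    finally:
        CLIENTS.discard(ws)

async def broadcast_explanation(term: str, explanation: str) -> None:
    # The React frontend renders each message as a non-intrusive popup.
    message = json.dumps(
        {"type": "explanation", "term": term, "explanation": explanation}
    )
    for ws in list(CLIENTS):
        await ws.send(message)

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())
```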
🚧 Challenges we ran into
- Managing real-time audio transcription with low latency.
- Aligning AI-generated results with what the user is actively seeing and hearing.
- Gemini's response unpredictability required strict prompt tuning and validation logic (a validation sketch follows this list).
- Coordinating between multiple tech stacks across the Core and Suite repos.
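The validation logic boiled down to defensive parsing: accept only items that fit the expected schema and drop or retry the rest. A minimal sketch with Pydantic, where the schema and field names are illustrative:

```python
import json

from pydantic import BaseModel, ValidationError

class Term(BaseModel):
    term: str
    explanation: str

def parse_gemini_response(raw: str) -> list[Term]:
    """Return only well-formed term entries; silently drop anything invalid."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # trigger a retry upstream rather than show a broken popup
    valid = []
    for item in items if isinstance(items, list) else []:
        try:
            valid.append(Term.model_validate(item))
        except ValidationError:
            continue
    return valid
```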
🏆 Accomplishments that we're proud of
- Built a cross-stack AI assistant that works in real time during actual meetings.
- Achieved accurate contextual term extraction, even with casual or domain-heavy conversations.
- Created an end-to-end system where frontend popups feel natural and helpful instead of intrusive.
📚 What we learned
- How to combine speech recognition, LLMs, and real-time frontend communication.
- Deeper understanding of prompt design, response schema validation, and language model limitations.
- Real-world testing taught us how important context and latency are in user-facing AI tools.
🚀 What's next for SpeakSnap
- Launch full deployment (currently only the API is hosted).
- Add speaker identification and better personalization.
- Integrate multilingual support and audio playback for explanations.
- Explore use cases in classrooms, mental health support groups, and public discourse moderation.
Built With
- azure-stt
- python
- express.js
- gemini-api
- jitsi
- langchain
- mongodb
- node.js
- react
- vite
- websocket