BabelFish: Instant Messaging wastes less time now!

Inspiration

As active members of online communities, student groups, and remote teams, we live in chat applications like Discord and Slack. We love the immediate connection, but we’re constantly drowning in a sea of messages. Key decisions, important files, and critical feedback get buried within hours. That one server with 200 messages a day is fun when I was a part of the discussion, but sucks when I'm not. But I know there's important information there, like weekend plans from my friend group. What if we could have an intelligent copilot that understood the full context of a conversation, including text, images, and history?

What It Does

BabelFish is a Chrome extension that uses the Google Gemini API to act as an context-aware assistant for your online conversations. Instead of just searching, you can ask.

BabelFish provides three core functionalities:

Multimodal Q&A: It can see what you see. A user can highlight a conversation that includes both text and images and ask a question. BabelFish sends both the text and visual data to to get a comprehensive answer that understands the full context. For example: "Summarize the team's feedback on this specific UI design screenshot."

Long-Context Summarization: Catch up on a week's worth of discussion in seconds. BabelFish can process enormous chat histories to provide high-level summaries, track the evolution of a decision, or find a specific piece of information mentioned days ago.

Actionable Insights via Function Calling: BabelFish transforms passive conversation into active intelligence. It uses Gemini's Function Calling to identify real-world entities mentioned in a chat, call external APIs for more information (e.g., getting Google Maps data for a mentioned restaurant), and return structured, useful data directly to the user.

How We Built It

Our architecture is a clean, three-part system designed for speed and power.

Frontend: Chrome Extension: Built with standard JavaScript, HTML, and CSS, the extension is responsible for the user interface and, most importantly, for intelligently scraping conversation context (text, image URLs, and message metadata) directly from the active webpage.

Backend: Python & Flask: We chose Flask for its asynchronous capabilities, allowing us to handle API calls to Google without blocking. This lightweight server acts as a secure intermediary: it receives the context from the extension, formats it into a valid request for the Gemini API, and processes the response to send back to the user.

The Brain: Google Gemini API: This is the core of our project. We specifically used: Gemini 2.5 Pro to handle requests requiring a vast context window.

Function Calling to enable our agent to interact with external tools and APIs.

Challenges We Ran Into

Our biggest challenge was reliably extracting clean, structured data from dynamic web applications. Modern chat apps are complex, and writing a JavaScript scraper that could consistently identify message boundaries, authors, timestamps, and embedded image URLs was a significant engineering hurdle. It didn't help that chrome extensions are really not super accessible in terms of example applications.

Additionally, crafting the perfect multimodal prompts for Gemini was an iterative process. We had to experiment extensively to learn how to best combine system instructions, unstructured chat history, and image data in a single request to get the most accurate and relevant insights from the model.

Accomplishments That We're Proud Of

We are incredibly proud of creating a fully functional, end-to-end prototype in such a short time. The first time we asked a question about a screenshot and received a perfectly nuanced summary from Gemini was our "wow" moment. It proved our core concept was not just possible, usable even outside the hackathon. We successfully built a seamless pipeline from a user's browser, through our backend, and back to the user in seconds.

What We Learned

This hackathon an awesome learning experience! We learnt a lot about creating an multi step agentic workflow, providing tools to LLMs, and creating a cohesive pipeline to pass and retrieve data.

What's Next for BabelFish

BabelFish has a clear path forward. Our next steps would be to expand Platform Support - Adapt the extension to work seamlessly with other major platforms like Slack, Microsoft Teams, and Telegram.

Share this project:

Updates