Inspiration

Drive-thru lines move fast, but language barriers slow everything down (and it’s stressful for both sides). We’ve seen workers repeat orders 3-4 times, customers feel awkward, and mistakes happen.

Language barriers are linked to operational inefficiency and lost productivity (time spent clarifying, correcting mistakes), which is a real economic cost even when it doesn’t show up as “lost sales” on a financial report. One frequently cited macro figure is that limited English skills of foreign-born workers cost U.S. corporations $65B annually in lost productivity (all industries), which suggests language-related costs can be high.

https://www.sciencedirect.com/science/article/pii/S0001691824000933

We wanted a kiosk-style, multilingual drive-thru AI that talks like a real cashier, stays menu-accurate, and keeps the kitchen perfectly in sync. By handling end-to-end ordering, the system allows each branch to operate with one fewer staff member during peak hours, reducing labour costs while speeding up service.

Our goal is not only to make ordering anywhere in the world feel like home (thanks to our multilingual AI agent) but also to save time, reduce errors, and unlock billions in operational savings for the fast-food industry.

What it does

Multilingual: takes drive-thru orders by voice in multiple languages.

Menu-aware: offers only what’s on our menu, tracks items, suggests promotions and add-ons, and lists the running total.

Order confirmation: reads the order back and asks, “Is this correct?” before completing, ensuring accurate orders every time.

Personality: welcoming, energetic, and professional every time, keeping the conversation menu-related.

Person detection: greets customers and starts orders through person detection in an integrated sensor/camera pipeline.

Multiple orders: keeps track of several concurrent orders and provides an organized list for the kitchen.

Kitchen mode: broadcasts live order updates via Socket.IO, so staff see changes instantly.
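The menu-aware behavior above can be sketched as a small grounding layer: the agent may only add items that exist in the menu, and the running total is recomputed after every change. This is a minimal illustration, not our production code; the item names and prices (in cents) are made up for the example.

```javascript
// Illustrative menu, prices in integer cents to avoid float rounding.
// (These items and prices are examples, not the real menu.)
const MENU = {
  cheeseburger: 599,
  fries: 249,
  cola: 199,
};

function createOrder() {
  return { items: [], total: 0 };
}

function addItem(order, name, qty = 1) {
  const key = name.toLowerCase();
  if (!(key in MENU)) {
    // Off-menu request: reject instead of letting the model guess.
    return { ok: false, reason: `"${name}" is not on the menu` };
  }
  order.items.push({ name: key, qty, price: MENU[key] });
  // Recompute the total from scratch after every change.
  order.total = order.items.reduce((sum, it) => sum + it.qty * it.price, 0);
  return { ok: true, total: order.total };
}

const order = createOrder();
addItem(order, 'Cheeseburger');      // accepted
addItem(order, 'Fries', 2);          // accepted
const rejected = addItem(order, 'Sushi'); // rejected: off-menu
```

Keeping prices in integer cents sidesteps floating-point drift when the agent reads the total back to the customer.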

How we built it

Frontend (React + Vite + Tailwind): kiosk UI, conversation log, order panel, menu display, kitchen display.

Backend (Node.js + Express): order state, signed session URL endpoint for the voice agent, and order update endpoints.

Real-time updates (Socket.IO): pushes order changes to kitchen-display clients.

ElevenLabs Conversational AI Agent: handles the “voice pipeline” (STT + conversational reasoning + TTS) so it feels natural and fast.

Gemini 2.5 Flash: Used by the ElevenLabs agent to make logical decisions about suggestions and human interactions.

TensorFlow: Used for person detection and starting the order process.
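Between the detector and the voice agent sits a small gating step: raw per-frame detections are noisy, so the agent should only start listening after several consecutive “person present” frames and stop after the car has clearly left. The sketch below shows one way to debounce that signal; the frame thresholds are illustrative assumptions, not our tuned values.

```javascript
// Debounce per-frame person detections into a stable listen/idle state.
// onFrames / offFrames are illustrative thresholds.
function createPresenceGate({ onFrames = 3, offFrames = 10 } = {}) {
  let seen = 0;     // consecutive frames with a person detected
  let missed = 0;   // consecutive frames without one
  let listening = false;

  return function onFrame(personDetected) {
    if (personDetected) {
      seen += 1;
      missed = 0;
      if (!listening && seen >= onFrames) listening = true;  // start order
    } else {
      missed += 1;
      seen = 0;
      if (listening && missed >= offFrames) listening = false; // end session
    }
    return listening;
  };
}

const gate = createPresenceGate();
// A single noisy frame doesn't trigger; three in a row do.
[false, true, true, true].forEach((d) => gate(d));
```

This keeps a flickering detection (a pedestrian walking past, a dropped frame) from waking the agent or cutting a customer off mid-order.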

Challenges we ran into

Perfecting the Gemini and ElevenLabs interaction while keeping both fully menu-aware and aligned was a major challenge.
Getting the AI to consistently call the update-order tool with the right JSON shape took a lot of iteration. Without strict grounding, conversational models hallucinated items or drifted off-menu. We had to ensure the AI stayed professional, followed guidelines, and referenced only verified menu data, while still responding naturally and tracking the order in real time.
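One way to enforce the “right JSON shape” described above is to validate every tool-call payload before it touches order state, rejecting anything malformed or off-menu. The field names (`action`, `item`, `qty`) and menu items below are illustrative assumptions, not our actual tool schema.

```javascript
// Validate an agent's "update order" tool call before applying it.
// Field names and menu contents here are examples only.
const MENU_ITEMS = new Set(['cheeseburger', 'fries', 'cola']);
const ACTIONS = new Set(['add', 'remove', 'set_qty']);

function validateToolCall(payload) {
  if (typeof payload !== 'object' || payload === null) {
    return { valid: false, errors: ['payload must be an object'] };
  }
  const errors = [];
  if (!ACTIONS.has(payload.action)) {
    errors.push(`unknown action: ${payload.action}`);
  }
  if (!MENU_ITEMS.has(String(payload.item).toLowerCase())) {
    // Grounding check: the model may only reference verified menu items.
    errors.push(`off-menu item: ${payload.item}`);
  }
  if (!Number.isInteger(payload.qty) || payload.qty < 1) {
    errors.push('qty must be a positive integer');
  }
  return { valid: errors.length === 0, errors };
}
```

Rejected calls can be surfaced back to the agent as a correction prompt, so a hallucinated item turns into a polite “we don’t have that” instead of a wrong order reaching the kitchen.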

Accomplishments that we're proud of

Firstly, we’re proud of consistently pushing through strategy and debugging challenges as a team throughout our product development process. One example is properly integrating ElevenLabs tech into our drive-thru agent so it excels at its purpose as an automated multilingual order taker. Secondly, we’re proud of our perseverance in bringing seemingly impossible features to fruition: for example, we incorporated a person-detecting camera that activates or deactivates the agent’s listening based on whether a car is in sight. We’re also proud that we got real-time kitchen updates working, so it’s not just “a chatbot” but an actual ordering workflow that tackles global language-barrier problems, one that fast-food chains could readily adopt to recover wasted time and save the industry billions.

What we learnt

Building thru.ai taught us that real projects evolve. We started with Python and Flutter, experimented with the Gemini LLM, and ultimately pivoted to a Node.js + React stack with ElevenLabs and an integrated Gemini model for voice AI, learning that choosing the right tools matters more than sticking with the first ones. We learned how to wire up WebSocket-based conversational AI so a voice agent can actually modify live orders, how to coordinate real-time state across multiple clients, and how to bring all of these pieces (computer vision, voice AI, a REST API, and a reactive frontend) together into one cohesive full-stack system. Most importantly, we learned that building something ambitious means iterating, overcoming obstacles, pivoting, cleaning up failed experiments, and collaborating across branches until it all comes together. This hackathon proved to us that, with united determination and supportive teamwork, even the seemingly impossible can be done within a tight time limit.

What's next for thru.ai

Run the MVP on a Raspberry Pi kiosk setup (mic + speaker + screen) for a real stall demo.

Add payment integration and multilingual receipts.

Deploy to 5 pilot locations for real-world validation.

Measure revenue lift and order-accuracy improvements.

Expand menu tooling so restaurants can update prices and items without touching code.

We proved in this hackathon that language barriers in food service aren’t insurmountable. They’re an opportunity: adopting this makes restaurants more inclusive, more profitable, and more resilient to demographic shifts.

Built With
