G-3 Project Description

Inspiration

We were frustrated by switching between multiple Google apps (Calendar, Gmail, Drive, Maps, etc.) to complete simple tasks. We wanted a single conversational interface that could access all Google services and understand context across them—like asking "How long to my next meeting?" and getting an answer that checks your calendar, calculates travel time, and suggests when to leave—all in one conversation.

What it does

G-3 is an intelligent Google workspace assistant that unifies 20+ Google services into one natural language interface. Users can ask questions or give commands in plain English, and the assistant automatically:

  • Manages your schedule: View, create, edit, and delete calendar events; calculate travel times to meetings
  • Handles communications: Search and read Gmail, find contacts, create Google Meet links
  • Creates content: Generate Google Docs, Slides, and Sheets on the fly
  • Organizes work: Manage Google Tasks, create and search Google Keep notes, find files in Drive
  • Provides information: Search the web, get weather forecasts, find nearby places, check air quality, get timezone info
  • Manages forms: Create Google Forms with questions, edit questions, view responses
  • Finds entertainment: Search YouTube videos, get video details, access playlists

The assistant uses MongoDB to remember conversation context, so follow-up questions like "that meeting" or "the document I created" work naturally. It also features voice input (Web Speech API) and text-to-speech output (ElevenLabs) for hands-free interaction.

How we built it

Frontend: React with a chat interface that supports voice input and audio playback

Backend: Express.js server with modular service architecture

AI Engine: Google Gemini 2.5 Flash with function calling to route queries to the right services
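To ground how the routing works: Gemini's function calling takes a set of tool declarations and returns a structured call naming the tool and its arguments, which the backend then dispatches to the matching service. A minimal sketch, where the tool name, parameters, and handlers are illustrative placeholders rather than the project's actual definitions:

```javascript
// Hedged sketch: one hypothetical tool declaration in the shape Gemini's
// function-calling API expects. Names and parameters are illustrative.
const getTravelTimeTool = {
  name: "get_travel_time",
  description:
    "Estimate travel time from the user's current location to a destination.",
  parameters: {
    type: "object",
    properties: {
      destination: { type: "string", description: "Address or place name" },
      mode: {
        type: "string",
        enum: ["driving", "transit", "walking"],
        description: "Travel mode",
      },
    },
    required: ["destination"],
  },
};

// Minimal router: given the function call Gemini returns, dispatch to the
// matching service handler (handlers here are placeholders).
function routeFunctionCall(call, handlers) {
  const handler = handlers[call.name];
  if (!handler) throw new Error(`Unknown tool: ${call.name}`);
  return handler(call.args);
}
```

With 20+ services, each service module registers its handlers into one map, so adding a new integration means adding a declaration plus a handler rather than touching the router.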

Context Management: MongoDB stores conversation history, user context, and preferences to maintain continuity across sessions
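The context store can be pictured as one document per session, with entities tagged on each turn so vague follow-ups like "that meeting" resolve to the most recently mentioned item. This is a hedged sketch, assuming a schema of our own invention; the actual field names and resolution logic may differ:

```javascript
// Hedged sketch of a per-session context document stored in MongoDB;
// field names (history, entities, etc.) are assumptions, not the real schema.
const sessionContext = {
  sessionId: "abc123",
  history: [
    { role: "user", text: "When is my next meeting?" },
    {
      role: "assistant",
      text: "Design review at 3pm.",
      entities: [{ type: "event", id: "evt_42", label: "Design review" }],
    },
  ],
};

// Resolve a vague reference ("that meeting", "the document") to the most
// recently mentioned entity of the matching type, scanning history backwards.
function resolveReference(context, type) {
  for (let i = context.history.length - 1; i >= 0; i--) {
    const match = (context.history[i].entities || []).find((e) => e.type === type);
    if (match) return match;
  }
  return null;
}
```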

Google APIs Integration: 20+ service integrations including Calendar, Gmail, Drive, Docs, Sheets, Slides, Maps, Tasks, Contacts, Meet, Keep, Forms, YouTube, Places, Timezone, Weather, Air Quality, and Google Search

Speech: ElevenLabs API for natural-sounding text-to-speech responses

Architecture: Model Context Protocol (MCP) for structured communication between the AI and services
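MCP frames every message as JSON-RPC 2.0, so a tool invocation from the AI layer to a service looks roughly like the following; the tool name and arguments are illustrative:

```javascript
// Hedged sketch: build an MCP tools/call request in JSON-RPC 2.0 framing.
// The tool name and argument shape here are examples, not the project's own.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}
```

Because every service speaks the same envelope, logging and debugging one request pipeline covers all integrations.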

Authentication: OAuth 2.0 for secure Google account access
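The OAuth flow starts by sending the user to Google's consent screen with the scopes the assistant needs. A minimal sketch of building that URL, where the client ID, redirect URI, and scope list are placeholders rather than the project's actual values:

```javascript
// Hedged sketch: construct the Google OAuth 2.0 consent URL for the
// authorization-code flow. All parameter values below are placeholders.
function buildConsentUrl(clientId, redirectUri, scopes) {
  const url = new URL("https://accounts.google.com/o/oauth2/v2/auth");
  url.searchParams.set("client_id", clientId);
  url.searchParams.set("redirect_uri", redirectUri);
  url.searchParams.set("response_type", "code"); // authorization-code flow
  url.searchParams.set("scope", scopes.join(" ")); // space-delimited scopes
  url.searchParams.set("access_type", "offline"); // request a refresh token
  return url.toString();
}
```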

Challenges we ran into

  1. API complexity: Each Google service has different authentication, rate limits, and data formats. Managing 20+ integrations required careful error handling and retry logic.

  2. Context management: Making the assistant remember previous conversations and understand references like "that meeting" or "the document" required building a robust context storage system in MongoDB.

  3. Function calling with Gemini: Getting Gemini to reliably choose the right tools for complex queries took significant prompt engineering and system instruction refinement.

  4. Voice integration: Syncing Web Speech API (client-side) with ElevenLabs TTS (server-side) while maintaining conversation flow was tricky.

  5. Rate limiting: Google APIs have strict rate limits. We had to implement smart caching and request batching to avoid hitting limits during demos.

  6. Error handling: When one service fails, the assistant should still provide useful information from other services. Building graceful degradation was essential.

  7. MongoDB connection issues: Initial setup with MongoDB Atlas required troubleshooting SSL/TLS, IP whitelisting, and connection pooling.
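The retry logic from (1) and the rate-limit handling from (5) both come down to the same pattern: wrap each outbound API call in a retry loop with exponential backoff. A minimal sketch, assuming illustrative delay and retry values:

```javascript
// Hedged sketch of a retry-with-exponential-backoff wrapper for outbound
// API calls; the base delay and retry count here are illustrative.
function backoffDelay(attempt, baseMs = 250) {
  return baseMs * 2 ** attempt; // 250ms, 500ms, 1s, 2s, ...
}

async function withRetry(fn, retries = 3, baseMs = 250) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err; // out of retries: surface the error
      await new Promise((r) => setTimeout(r, backoffDelay(attempt, baseMs)));
    }
  }
}
```

In practice a real wrapper would also inspect the error (retrying 429/5xx but not 401/403), since retrying an auth failure only burns quota.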

Accomplishments that we're proud of

  1. Full CRUD operations: The assistant can create, read, update, and delete across all integrated services—not just viewing data.

  2. Context-aware conversations: MongoDB-powered context means the assistant remembers your previous queries, events discussed, and preferences across sessions.

  3. 20+ service integrations: Successfully integrated Calendar, Gmail, Drive, Docs, Slides, Sheets, Maps, Tasks, Contacts, Meet, Keep, Forms, YouTube, Weather, Places, Timezone, Air Quality, Google Search, and more.

  4. Intelligent routing: Gemini automatically determines which services to call and combines results for complex queries like "What's my schedule and how long to get to my 3pm meeting?"

  5. Voice-first experience: Complete voice input/output pipeline using Web Speech API and ElevenLabs for natural conversation.

  6. MCP protocol implementation: Built a structured protocol layer for reliable AI-service communication.

  7. Production-ready architecture: Modular service design makes it easy to add new Google services or features.

What we learned

  1. Function calling with LLMs: Gemini's function calling is powerful but requires careful tool definitions and system instructions to work reliably.

  2. Context is king: Storing conversation history and context in MongoDB dramatically improved the assistant's ability to handle follow-up questions and maintain continuity.

  3. API integration patterns: Each Google service has unique quirks. Building a consistent abstraction layer across all services taught us a lot about API design.

  4. Error resilience: Building systems that degrade gracefully when services fail is crucial for user experience.

  5. Voice UX challenges: Voice interfaces need different UX patterns than text—audio feedback, clear error messages, and handling interruptions.

  6. MCP protocol benefits: Using a structured protocol (MCP) made it easier to add new services and debug issues compared to ad-hoc integrations.

  7. Rate limit management: Proactive caching and request optimization are essential when working with multiple rate-limited APIs.
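The proactive caching in (7) can be as simple as a small TTL cache in front of the rate-limited APIs. A minimal sketch with an injectable clock so expiry is testable; keys and TTLs are illustrative:

```javascript
// Hedged sketch of a TTL cache for rate-limited API responses. The clock
// function is injectable for testing; key names and TTLs are examples.
class TtlCache {
  constructor(ttlMs, now = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.store = new Map(); // key -> { value, expiresAt }
  }
  set(key, value) {
    this.store.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
  get(key) {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.store.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }
}
```

Slow-moving data (weather, timezone lookups) tolerates long TTLs, while calendar and email need short ones, so per-service TTLs matter more than the cache itself.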

What's next for G-3

  1. Enhanced NLP: Improve query understanding for more natural, conversational interactions and better handling of ambiguous requests.

  2. Multi-user support: Add team workspaces where multiple users can collaborate through the assistant.

  3. Automation workflows: Let users create custom workflows like "Every Monday, create a weekly report from my calendar and send it via email."

  4. Advanced calendar features: Smart meeting scheduling that considers travel time, suggests optimal meeting times, and automatically reschedules based on conflicts.

  5. Email composition: Full email sending capabilities with smart drafting and scheduling.

  6. Document editing: Direct editing of Google Docs and Sheets through natural language commands.

  7. Integration expansion: Add more Google services (Photos, Classroom, Analytics) and third-party integrations (Slack, Notion, etc.).

  8. Mobile app: Native iOS and Android apps for on-the-go access.

  9. Voice improvements: Better voice recognition accuracy, support for multiple languages, and custom voice training.

  10. Analytics dashboard: Insights into productivity patterns, most-used services, and time saved through automation.
