Gemini integration
MotionPitch is powered by a multi-modal orchestration of the Gemini 3 ecosystem, serving as the project’s cognitive and creative engine. At the core, Gemini 3 Pro acts as the "Lead Architect." Rather than simply generating text, we utilize Search Grounding to anchor presentations in real-time global data and Code Execution to perform live mathematical simulations for market analysis slides. To ensure our backend remains resilient, we leverage Structured Outputs, forcing the model to adhere to strict JSON schemas that eliminate parsing errors during complex data handoffs.
On the visual front, Gemini 3 Image serves as our "Art Director." It interprets the context provided by the Pro model to generate high-fidelity, brand-consistent visuals for every slide. By utilizing the google-genai SDK, we chain these models into an automated pipeline: the Pro model defines the narrative and data, which then seeds the Image model’s creative prompts. This synergy allows MotionPitch to move beyond static templates, creating a dynamic flow where Gemini handles everything from factual research to aesthetic design. By integrating these specialized tools, we’ve transformed Gemini from a chatbot into a comprehensive, end-to-end production studio.
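As a rough sketch, the Structured Outputs handoff between the Pro model and the Image model might look like this. The schema fields and function names below are illustrative, not MotionPitch's actual code; with the google-genai SDK, a schema like this would be enforced via the response configuration so the planning model returns JSON that always parses cleanly.

```python
import json
from dataclasses import dataclass

# Illustrative slide schema; MotionPitch's real response schema may differ.
# The planning model is constrained to emit JSON matching this shape,
# which is what eliminates parsing errors during the Pro -> Image handoff.
@dataclass
class Slide:
    title: str
    bullets: list
    image_prompt: str  # seeds the Image model's creative prompt

def parse_plan(raw_json: str) -> list:
    """Validate the Pro model's structured output before the Image step."""
    plan = json.loads(raw_json)
    return [
        Slide(s["title"], s["bullets"], s["image_prompt"])
        for s in plan["slides"]
    ]
```

Because the schema is enforced at generation time, `parse_plan` never has to recover from malformed JSON; the image pipeline can simply iterate over each slide's `image_prompt`.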
Inspiration
Did you know? According to statistics, 77% of students and professionals report severe burnout due to administrative workload and deadline pressure.
The Reality: We are drowning in data but starving for time. Between term papers and slide decks, the average creator spends 4+ hours on formatting alone, leading to decision fatigue, chronic stress, and "blank canvas" paralysis.
With the release of the Gemini 3 Family and Veo 3, we saw an opportunity to change the medium entirely. We wanted to move from "Slide Decks" to "Cinematic Experiences." We asked ourselves: What if an AI didn't just write slides, but acted as a Researcher, Data Analyst, Art Director, and Cinematographer all at once?
That was the birth of MotionPitch.
What it does
MotionPitch is an agentic presentation architect that transforms a simple topic (or a PDF/URL) into a fully animated, data-backed presentation.
- Agentic Planning: It doesn't just guess content. It uses Gemini 3 Pro with Search Grounding to find real-time facts (e.g., 2026 market trends) and Code Execution to calculate accurate market sizing data.
- Cinematic Visuals: Instead of searching for stock photos, it generates brand-consistent art using Gemini 3 Image.
- The "Veo" Effect: This is our additional feature. The user can toggle "Cinematic Mode," which uses Veo 3.1 to take the static generated images and animate them (Image-to-Video), creating moving, high-definition backgrounds that grab attention.
How we built it
We built a robust Flask (Python) backend using the latest google-genai SDK to orchestrate a complex multi-model pipeline:
- The Brain (Gemini 3 Pro): We used Structured Outputs (JSON Schema) to force the model to return valid data structures, eliminating parsing errors. We enabled Tools (Google Search & Code Execution) to ensure the slide content is factual and mathematically correct.
- The Eyes (Gemini 3 Image): We use a batching system (ThreadPoolExecutor) to generate high-fidelity visuals for all slides in parallel.
- The Motion (Veo 3.1): We implemented an asynchronous polling system. When the user enables video, the app sends the generated image to Veo 3.1 with a specific motion prompt (e.g., "Cinematic slow pan, 4k") and waits for the render.
- The Database: We moved away from flat files to SQLAlchemy (SQLite) to manage user sessions and complex presentation relationships.
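The parallel image step above can be sketched as follows. Here `generate_fn` is a stand-in for the real per-slide Gemini 3 Image request; it is injected as a parameter so the batching logic is shown on its own:

```python
from concurrent.futures import ThreadPoolExecutor

def generate_all_images(prompts, generate_fn, max_workers=4):
    """Render every slide's visual in parallel.

    generate_fn is the per-slide image call (in MotionPitch this would
    wrap the google-genai image request). executor.map preserves input
    order, so results line up with the slide order from the planning
    stage even though the requests complete out of order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(generate_fn, prompts))
```

With a handful of workers, a ten-slide deck's visuals render in roughly the time of the slowest request rather than the sum of all of them.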
Challenges we ran into
- The "Veo Wait": High-quality video generation takes time (~60 seconds per clip). In a web context, this usually leads to timeouts. We solved this by implementing WebSockets (Socket.io) to stream real-time logs (e.g., "🎥 Veo is rendering...") to the frontend, turning the wait time into a transparent "terminal" experience for the user.
- Hallucinations vs. Reality: Early versions made up statistics. We fixed this by strictly enforcing Code Execution tools within the Gemini 3 Pro planning stage.
- JSON Instability: Large text prompts often broke the JSON format. Switching to Gemini 3's native Structured Outputs solved this completely, eliminating parsing failures in our backend.
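The "Veo Wait" pattern described above (poll until the long-running render completes while streaming progress to the user) can be sketched like this. `poll_fn` and `emit_log` are stand-ins for the real Veo operation-status check and the Socket.io broadcast:

```python
import time

def await_render(poll_fn, emit_log, interval=1.0, timeout=300.0):
    """Block until a long-running video render finishes.

    poll_fn() returns the video result once ready, or None while still
    rendering (a stand-in for checking the Veo operation's status).
    emit_log pushes a progress line to the frontend (in MotionPitch,
    over Socket.io), so the user sees a live "terminal" instead of a
    frozen page during the ~60-second render."""
    waited = 0.0
    while waited < timeout:
        result = poll_fn()
        if result is not None:
            emit_log("✅ Render complete")
            return result
        emit_log("🎥 Veo is rendering...")
        time.sleep(interval)
        waited += interval
    raise TimeoutError("Video render did not finish in time")
```

Keeping the poll loop on the server and streaming the log lines out over a WebSocket is what turns a would-be HTTP timeout into a transparent wait.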
Accomplishments that we're proud of
- True Multi-Model Orchestration: We aren't just calling one API. We are chaining Text → Code → Image → Video in a single workflow.
- Factuality: Building a system that searches the live web for presentation content instead of relying on training data cut-offs.
- The UI: Successfully creating a "Matrix-style" logging window that visualizes the AI's "thought process" in real-time.
- Veo Integration: Successfully implementing the Image-to-Video flow to maintain visual consistency across the slides.
What we learned
- Agents > Wrappers: Give the model tools (Search, Code), and it performs dramatically better than just giving it a prompt.
- Latency Management: UX is just as important as AI. Handling long-running generation jobs requires good asynchronous architecture.
- The Power of Video: Adding motion to a slide deck fundamentally changes how professional it feels.
What's next for MotionPitch
- Voice Narration: Integrating Google's text-to-speech to narrate the slides automatically.
- Export Options: Generating an .mp4 file of the entire presentation for easy sharing.
- Vector Search: Implementing RAG on the SQLite database so users can query their past presentations semantically.
Built With
- css3
- gemini3
- html5
- javascript
- python
- veo3.1
