Inspiration
Most people can't automate their work because automation tools require programming skills. You need to understand APIs, data mapping, and conditional logic. Meanwhile, those same people teach their coworkers how to do things all the time by just showing them. We wanted to make automation work the way humans naturally teach: by demonstration, not configuration.
What it does
Snappier turns screen recordings into automated workflows in 3 steps:
Record — Perform your workflow once in the browser. Our Chrome extension captures everything (e.g., scrape LinkedIn → add to Google Sheets → send Slack message)
Generate — AI watches the video and builds the automation. Gemini extracts what you did, Claude figures out the API calls, and we map it to 100+ SaaS tools via Composio
Run — One click executes the entire workflow across your connected apps. Run it again anytime, or schedule it automatically
No drag-and-drop builders. No field mapping. Just show us once and we automate it forever.
How we built it
Frontend: Next.js 15 + React 19 + TypeScript + Tailwind CSS + React Flow (interactive workflow graphs)
AI Pipeline:
- Google Gemini (gemini-3-pro-preview) for multimodal video understanding. Analyzes recordings frame-by-frame to extract structured steps with intent, confidence scores, and timestamps (see the sketch after this list)
- Claude Haiku for fast intent summarization and noise filtering to remove accidental clicks
- Claude Sonnet for workflow execution reasoning with tool-use calling
- OpenAI Whisper for transcribing the audio of narrated workflows
- Vercel AI SDK as unified interface for all LLMs with structured output (Zod schemas)
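A hedged sketch of the extraction call, assuming the Vercel AI SDK's `generateObject` with the Google provider. The schema fields mirror the description above, but their exact names and the file-part shape are illustrative, not Snappier's actual code:

```ts
import { generateObject } from "ai";
import { google } from "@ai-sdk/google";
import { z } from "zod";

// Assumed step shape: intent, action, confidence, and a timestamp offset.
const Step = z.object({
  intent: z.string(),                   // e.g. "open the contact's profile"
  action: z.string(),                   // e.g. "click", "type", "navigate"
  confidence: z.number().min(0).max(1),
  timestampMs: z.number(),              // offset into the recording
});

export async function extractSteps(videoBytes: Uint8Array) {
  const { object } = await generateObject({
    model: google("gemini-3-pro-preview"),
    schema: z.object({ steps: z.array(Step) }),
    temperature: 0.2, // low temperature: this is extraction, not creativity
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Extract the workflow steps the user performs in this screen recording." },
          { type: "file", data: videoBytes, mimeType: "video/webm" },
        ],
      },
    ],
  });
  return object.steps;
}
```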
Integrations:
- Composio provides 100+ pre-built SaaS tool integrations with OAuth and tool-calling APIs
- Cloudflare R2 for scalable video storage
- Clerk for user authentication
Chrome Extension: Manifest V3 extension that captures screen video (WebM) + granular DOM events (clicks, keyboard, navigation)
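A minimal sketch of what the content-script side of that capture could look like; the event shape, selector builder, and message channel name are assumptions, not the extension's real API (assumes @types/chrome):

```ts
// Hypothetical recorded-event shape; timestamps are aligned later with
// the video's timeline.
type RecordedEvent = {
  kind: "click" | "keydown";
  selector?: string; // CSS path to the target element
  key?: string;
  url: string;
  ts: number; // epoch ms
};

// Simplified selector builder: prefer an id, else tag + nth-of-type chain.
function cssPath(el: Element): string {
  if (el.id) return `#${el.id}`;
  const parent = el.parentElement;
  if (!parent) return el.tagName.toLowerCase();
  const index =
    Array.from(parent.children).filter((c) => c.tagName === el.tagName).indexOf(el) + 1;
  return `${cssPath(parent)} > ${el.tagName.toLowerCase()}:nth-of-type(${index})`;
}

function emit(event: RecordedEvent) {
  chrome.runtime.sendMessage({ type: "snappier:event", event }); // channel name assumed
}

// Capture phase so page handlers can't swallow the events.
document.addEventListener(
  "click",
  (e) => emit({ kind: "click", selector: cssPath(e.target as Element), url: location.href, ts: Date.now() }),
  true
);
document.addEventListener(
  "keydown",
  (e) => emit({ kind: "keydown", key: e.key, url: location.href, ts: Date.now() }),
  true
);
```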
Architecture: Multi-modal step extraction (AI vision + DOM events + audio) → Intent summarization → Tool detection via URL pattern matching → Workflow generation → Composio execution with full audit logs
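The "tool detection via URL pattern matching" stage can be pictured as a small lookup table; the patterns and tool slugs below are illustrative, not the actual mapping:

```ts
// Hypothetical URL→tool table; the real mapping covers 100+ Composio tools.
const TOOL_PATTERNS: Array<{ tool: string; pattern: RegExp }> = [
  { tool: "GOOGLESHEETS", pattern: /docs\.google\.com\/spreadsheets/ },
  { tool: "SLACK",        pattern: /app\.slack\.com/ },
  { tool: "LINKEDIN",     pattern: /(^|\.)linkedin\.com/ },
];

function detectTool(url: string): string | undefined {
  return TOOL_PATTERNS.find(({ pattern }) => pattern.test(url))?.tool;
}

// detectTool("https://docs.google.com/spreadsheets/d/abc") === "GOOGLESHEETS"
```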
Challenges we ran into
Video understanding reliability – Getting Gemini to consistently extract structured steps from noisy screen recordings took extensive prompt engineering and low-temperature sampling (0.2)
Intent vs. action gap – Users click buttons in the browser, but workflows need to call APIs. Bridging "clicked the blue button" to "send_email via Gmail API" required semantic reasoning with Claude
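A hedged sketch of that bridge, assuming the Vercel AI SDK's tool calling with the Anthropic provider. In Snappier the tool definitions come pre-built from Composio, so the inline GMAIL_SEND_EMAIL tool and the model id here are stand-ins:

```ts
import { generateText, tool } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

// One stand-in tool; in Snappier these come from Composio's catalog.
const sendEmail = tool({
  description: "Send an email via Gmail",
  parameters: z.object({
    to: z.string(),
    subject: z.string(),
    body: z.string(),
  }),
  execute: async (args) => ({ status: "sent", ...args }), // stub executor
});

// "clicked the blue button" (intent text) → concrete API-level tool call.
export async function realizeIntent(intent: string) {
  const { toolCalls } = await generateText({
    model: anthropic("claude-3-5-sonnet-latest"), // exact model id illustrative
    tools: { GMAIL_SEND_EMAIL: sendEmail },
    prompt: `Call the tool that best accomplishes this recorded intent: ${intent}`,
  });
  return toolCalls; // e.g. [{ toolName: "GMAIL_SEND_EMAIL", args: {...} }]
}
```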
Noise filtering – We didn't realize how many accidental clicks, scrolls, and micro-adjustments users make. Our AI had to learn to ignore 80% of recorded actions
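A sketch of how that filter might look with Claude Haiku behind `generateObject`; the prompt, schema, and model id are assumptions:

```ts
import { generateObject } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

// Ask the model which recorded actions actually advance the goal;
// everything else (stray clicks, scrolls, micro-adjustments) is dropped.
export async function filterNoise(steps: Array<{ action: string; intent: string }>) {
  const { object } = await generateObject({
    model: anthropic("claude-3-5-haiku-latest"), // exact model id illustrative
    schema: z.object({ keepIndices: z.array(z.number()) }),
    temperature: 0.2,
    prompt:
      "Return the indices of actions that advance the user's goal; ignore accidental ones:\n" +
      steps.map((s, i) => `${i}. ${s.action}: ${s.intent}`).join("\n"),
  });
  return object.keepIndices.map((i) => steps[i]);
}
```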
OAuth at scale – Managing connections across 100+ SaaS tools (each with different auth flows) was only feasible thanks to Composio's abstraction layer
Real-time workflow editing – Keeping the React Flow graph, step list, and video player synchronized while the user edits was a complex state management challenge
Accomplishments that we're proud of
Multi-modal AI fusion – We combined video, DOM events, and audio into a single coherent understanding. Each modality validates the others (if Gemini sees a click AND the DOM recorded a click, high confidence)
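In code, that cross-validation can be as simple as boosting a vision-derived step's confidence when a DOM event of the same kind lands nearby in time; the window and boost values below are made up:

```ts
type VisionStep = { action: string; timestampMs: number; confidence: number };
type DomEvent = { kind: string; ts: number };

// If the extension's DOM log corroborates what Gemini saw, trust it more.
function fuseConfidence(step: VisionStep, domEvents: DomEvent[]): VisionStep {
  const corroborated = domEvents.some(
    (e) => e.kind === step.action && Math.abs(e.ts - step.timestampMs) < 500 // 500 ms window (assumed)
  );
  return corroborated
    ? { ...step, confidence: Math.min(1, step.confidence + 0.3) }
    : step;
}
```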
End-to-end working demo – From recording to execution, the full pipeline works. You can actually record a workflow and run it against real SaaS APIs
Intent-aware automation – We don't just replay mechanical actions. Snappier understands what you were trying to accomplish and finds the best API-level way to do it
Control flow detection – Our AI detects loops ("for each row in this spreadsheet...") and conditionals, not just linear sequences
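One way to represent detected control flow in the workflow graph, sketched as Zod schemas; the field names are assumptions based on the description, not the actual types:

```ts
import { z } from "zod";

const LoopNode = z.object({
  type: z.literal("loop"),
  over: z.string(),               // e.g. "each row in the selected spreadsheet"
  bodySteps: z.array(z.number()), // indices of steps repeated per item
});

const ConditionalNode = z.object({
  type: z.literal("conditional"),
  condition: z.string(),          // e.g. "the email column is non-empty"
  thenSteps: z.array(z.number()),
  elseSteps: z.array(z.number()).optional(),
});

export const ControlFlowNode = z.discriminatedUnion("type", [LoopNode, ConditionalNode]);
```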
Production-ready architecture – Cloudflare R2 storage, Clerk auth, comprehensive test suite (Vitest), type-safe throughout
What we learned
Multimodal > unimodal – Video analysis alone misses context. DOM events alone can't infer intent. Audio alone is ambiguous. Together, they're powerful.
Low-temperature LLMs are essential – For structured extraction, we needed temperature=0.2 or lower. Creativity kills reliability here.
Tool abstractions unlock scale – Building 100+ integrations from scratch would've been impossible. Composio's tool-calling abstraction made it feasible.
Users are noisy – Humans don't realize how many random clicks, back-buttons, and scrolls they do. Filtering signal from noise is 80% of the challenge.
Video-first is fundamentally different – Teaching by demonstration feels more natural than form-based builders. It's how we teach humans; why not computers?
What's next for Snappier
- Runtime control flow – Execute loops and conditionals, not just detect them
- Scheduled workflows – Cron-style automation (run every morning, every week, etc.)
- Team collaboration – Share workflow libraries across organizations
- Workflow marketplace – Discover and remix workflows from the community
- Mobile recording – iOS/Android screen recording support
- Advanced parameter inference – Better extraction of dynamic values (names, emails, dates) from recordings
- Multi-step editing – Visual graph editor to rearrange, add, or remove steps post-recording
Built With
- ai
- anthropic-claude (workflow planning & execution)
- api
- chrome
- clerk
- cloudflare
- composio
- extensions
- framer-motion
- google-gemini (video understanding)
- gemini
- openai-whisper (transcription)
- r2
- react-19
- react-flow
- saas
- sdk
- tailwind-css
- typescript
- vercel
- vitest
- zod