Inspiration
Education is a universal journey, but it often feels like a solitary one. We were inspired by the countless late nights students spend staring at complex diagrams or dense textbooks, wishing they had a knowledgeable friend by their side to break it all down. VisionStudy was born from the desire to bridge the gap between "seeing" a problem and "understanding" it. We wanted to leverage the power of multimodal AI to transform any camera-equipped device into a 24/7 tutor that could see what a student sees and explain it in a way that truly clicks.
What it does
VisionStudy is more than just a camera app; it's an intelligent study environment.
- Multimodal Document Support: Beyond photos, students can upload existing PDF textbooks and documents. The app uses a dedicated DocumentService to handle secure file picking and persistent storage, making it a flexible repository for all study materials.
- Study Buddy Chat: Engage in a deep, interactive brainstorming session. Whether it's a complex equation like E = mc² or a historical event, your AI companion is ready to discuss it.
- Voice-First Interaction: With "Hold to Release" recording and text-to-speech, students can learn hands-free, making it accessible and natural.
- Automated Organization: It doesn't just analyze; it organizes. The app automatically detects the subject matter (e.g., "Organic Chemistry") and files it into folders for easy retrieval.
- Smart Quizzing: The "Quiz Me" feature turns static content into active learning by generating personalized questions based on your specific study material.
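The Automated Organization feature can be pictured with a small sketch. It is written in Python for brevity rather than in the app's Dart, and every name in it (`StudyLibrary`, `file_scan`, `normalize_subject`) is hypothetical. The subject label is assumed to arrive as a plain string from the AI's analysis, which is not shown here.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


def normalize_subject(label: str) -> str:
    """Collapse whitespace and casing so near-identical labels share one folder."""
    cleaned = " ".join(label.split())
    return cleaned.title() if cleaned else "Uncategorized"


@dataclass
class StudyLibrary:
    """Hypothetical in-memory stand-in for the app's folder structure."""

    folders: dict[str, list[str]] = field(default_factory=lambda: defaultdict(list))

    def file_scan(self, detected_subject: str, title: str) -> str:
        """File a scan under its detected subject and return the folder name."""
        folder = normalize_subject(detected_subject)
        self.folders[folder].append(title)
        return folder


library = StudyLibrary()
library.file_scan("organic chemistry", "Alkene reactions")
library.file_scan("Organic  Chemistry", "SN1 vs SN2")
library.file_scan("", "Untitled scan")  # no subject detected
```

Both chemistry scans end up in a single "Organic Chemistry" folder, and the scan with no detected subject falls back to "Uncategorized". Normalizing the label before filing is one way to keep slightly different model outputs from creating duplicate folders.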
How we built it
We chose Flutter for its rich UI capabilities and cross-platform reach, ensuring a premium feel on both Android and iOS.
- AI Core: At its heart, VisionStudy uses the google_generative_ai package to tap into Gemini 2.5 Flash. We implemented specific system instructions to ensure the AI acts as a patient, encouraging tutor that can interpret both visual and text-based documents.
- Document Architecture: We built a robust document management system using file_picker and path_provider. This allows the app not only to analyze files on the fly but also to copy them into the app's permanent storage for long-term access.
- Voice System: We integrated speech_to_text and flutter_tts to create a seamless verbal learning loop.
- Data Persistence: We used local storage models to track study history, scan counts, and folder structures, allowing students to build a library of their knowledge over time.
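The copy-into-permanent-storage step from Document Architecture can be sketched as follows. This is Python standing in for the Dart code: `import_document` and the `documents` subfolder are hypothetical, and `app_dir` plays the role that a directory obtained through path_provider would play in the app. The renaming rule for duplicate file names is an assumption, not something the project describes.

```python
import shutil
import tempfile
from pathlib import Path


def import_document(picked: Path, app_dir: Path) -> Path:
    """Copy a picked file into app-owned storage and return the stored path."""
    target_dir = app_dir / "documents"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / picked.name
    counter = 1
    # Avoid overwriting an earlier import that has the same file name.
    while target.exists():
        target = target_dir / f"{picked.stem}_{counter}{picked.suffix}"
        counter += 1
    shutil.copy2(picked, target)
    return target


# Demo with temporary directories in place of a real picker and app folder.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "picked" / "notes.pdf"
    source.parent.mkdir()
    source.write_bytes(b"%PDF-1.4 demo")
    app_dir = Path(tmp) / "app"
    first = import_document(source, app_dir)
    second = import_document(source, app_dir)
    stored_names = sorted(p.name for p in (app_dir / "documents").iterdir())
```

Copying rather than referencing the picked path matters because a path handed back by a file picker may point at a temporary or cached location that is not guaranteed to persist.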
Challenges we ran into
- Voice Synchronization: Ensuring the "Hold to Release" gesture felt responsive while accurately capturing the end of speech was a significant hurdle. We implemented custom logic to "pull" the final result from the speech service upon release, so that no words were lost.
- Prompt Engineering: Crafting the right tutor persona required constant iteration. We had to balance technical accuracy with a tone that was accessible and motivating.
- Asynchronous Flows: Managing the transition from initial image analysis to a long-running chat session while maintaining a smooth UI state required careful handling of Flutter's Future and Stream APIs.
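The "pull the final result on release" idea from Voice Synchronization can be illustrated with a toy model. It is a Python sketch, not the app's Dart code: `FakeSpeechService` and `HoldToTalkController` are hypothetical, and the premise that partial-result callbacks can lag behind what the recognizer has actually heard is an assumption made for the illustration.

```python
class FakeSpeechService:
    """Toy recognizer whose callbacks lag one word behind what it has heard."""

    def __init__(self, on_partial):
        self._on_partial = on_partial
        self._heard = []

    def hear(self, word):
        # Notify the listener with everything heard before this word,
        # simulating a callback that has not caught up yet.
        if self._heard:
            self._on_partial(" ".join(self._heard))
        self._heard.append(word)

    def stop(self):
        """Stop listening and return the complete transcript."""
        return " ".join(self._heard)


class HoldToTalkController:
    """Hypothetical press/release handler for a hold-to-talk button."""

    def __init__(self):
        self.last_partial = ""
        self.service = None

    def on_press(self):
        self.last_partial = ""
        self.service = FakeSpeechService(self._handle_partial)

    def _handle_partial(self, text):
        self.last_partial = text

    def on_release(self):
        # Pull the final transcript from the service instead of trusting
        # the most recent callback, which may be missing the last words.
        return self.service.stop()


controller = HoldToTalkController()
controller.on_press()
for word in "explain amperes law".split():
    controller.service.hear(word)
transcript = controller.on_release()
```

In this run `controller.last_partial` ends as "explain amperes", while `transcript` holds all three words, which is the gap that pulling on release is meant to close.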
Accomplishments that we're proud of
- Interactive Persona: We successfully created a "Brainstorm with AI" experience that feels like a conversation, not just a search query.
- Multimodal Consistency: The bridge between visual data (images/PDFs) and verbal communication (voice) feels cohesive and intuitive.
- Performance: Leveraging Gemini 2.5 Flash allowed us to achieve near-instant analysis, which is crucial for maintaining a student's focus.
What we learned
Building VisionStudy taught us that AI is most powerful when it's contextual. A generic chatbot is helpful, but a chatbot that "sees" your specific physics problem and can guide you through the derivation of Ampere's Law is transformative. We also deepened our expertise in Flutter's gesture systems and asynchronous state management.
What's next for VisionStudy
- Collaborative Sessions: We want to allow students to share their "Study Folders" and AI conversations with classmates.
- AR Integration: Projecting explanations directly onto physical textbooks through the camera view.
- Extended Memory: Implementing a system to help the AI remember a student's specific learning style and weak areas across different study sessions.
Built With
- camera
- dart
- filepicker.io
- flutter
- fluttertts
- gemini
- googlefonts
- hive
- sharedpreferences
- speechtotext