Inspiration
The spark for Le-Vision comes from a startling statistic: nearly 20 million Americans—8% of the U.S. population—live with visual impairments. This number grows even more significant among older adults, for whom vision loss is a leading cause of reduced independence. We asked ourselves: in an age of unprecedented technological advancement, how can we leverage AI to dramatically improve the lives of people with visual impairments?
What it does
Le-Vision is an AI-powered application that transforms visual information into clear, spoken guidance for visually impaired individuals. Using a smartphone's camera and advanced AI models, including Pixtral, Le-Vision provides real-time assistance for navigating environments, identifying objects, and performing daily tasks. Interaction is entirely spoken: users verbally describe their needs or ask questions about their surroundings.
How we built it
Surroundings Analysis Mechanism: We built a sophisticated system to capture and understand the user's surroundings. This mechanism continuously feeds Pixtral, our core AI model, with real-time visual information about the environment.
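A minimal sketch of what that capture-and-feed step might look like (the helper name and the OpenCV capture flow are our illustration, not the exact app code):

```python
import base64

def frame_to_data_url(jpeg_bytes: bytes) -> str:
    """Encode a captured JPEG frame as a base64 data URL that a
    vision model such as Pixtral can accept as an image input."""
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    return f"data:image/jpeg;base64,{encoded}"

# On the phone, frames would come from the camera stream, e.g. with OpenCV:
#   ok, frame = cv2.VideoCapture(0).read()
#   ok, buf = cv2.imencode(".jpg", frame)
#   image_url = frame_to_data_url(buf.tobytes())
```

Each encoded frame can then be attached to a request to the model alongside the user's question.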
Speech-to-Text Conversion: We implemented a local speech-to-text conversion system. This allows users to interact with Le-Vision using natural spoken language, which is then converted to text on the device itself, ensuring privacy and reducing latency.
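In rough outline, assuming the open-source Whisper package as the local model (the `clean_transcript` helper is our own illustration):

```python
def clean_transcript(text: str) -> str:
    """Normalize a raw transcript: Whisper often pads the text with
    leading/trailing whitespace and stray line breaks."""
    return " ".join(text.split())

# Local transcription with the open-source Whisper package might look like:
#   import whisper
#   model = whisper.load_model("base")        # small enough to run locally
#   result = model.transcribe("request.wav")  # returns {"text": ...}
#   user_text = clean_transcript(result["text"])
```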
Multimodal Input Integration: We created a system that combines the converted text from user speech with real-time visual information. This multimodal approach provides Pixtral with a comprehensive understanding of both the user's intentions and their physical environment.
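Combining the two inputs into one request could look roughly like this; the message shape follows Mistral's multimodal chat format as we understand it, and the function name and system prompt are illustrative:

```python
def build_pixtral_messages(user_text: str, image_data_url: str) -> list[dict]:
    """Pair the transcribed spoken request with the latest camera frame
    in a single multimodal chat message."""
    return [
        {"role": "system",
         "content": "You are a navigation assistant for a visually "
                    "impaired user. Give short, concrete spoken directions."},
        {"role": "user",
         "content": [
             {"type": "text", "text": user_text},
             {"type": "image_url", "image_url": image_data_url},
         ]},
    ]

# These messages would then go to a Pixtral endpoint, e.g. via the
# mistralai client:
#   client.chat.complete(model="pixtral-12b-2409", messages=messages)
```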
Dynamic Guidance Generation: Using the integrated multimodal input, Pixtral generates contextual guidance. This guidance is constantly updated based on changes in the environment and the user's progress towards their goal.
Adaptive Instruction System: We developed a mechanism that converts Pixtral's output into clear, step-by-step audio instructions. These instructions are dynamically adjusted based on the user's actions and environmental changes, ensuring continuous, relevant support.
User-Friendly Interface: We designed an intuitive application that serves as the interface between the user and our system. This app manages the input/output processes and presents the audio guidance to the user.
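The steps above form a continuous loop. A sketch of how it might be structured (the function names are hypothetical; the history-trimming helper shows one way to keep the model's context bounded as frames stream in):

```python
def trim_history(messages: list[dict], max_turns: int = 4) -> list[dict]:
    """Keep the system prompt plus only the most recent exchanges, so the
    stream of camera frames does not exhaust the model's context window."""
    system, rest = messages[:1], messages[1:]
    return system + rest[-2 * max_turns:]

# The guidance loop, in outline (all names hypothetical):
#   while navigating:
#       frame = capture_frame()
#       history.append(user_message(goal, frame))
#       reply = query_pixtral(trim_history(history))
#       history.append(reply)
#       speak(reply_text(reply))   # text-to-speech delivers the instruction
```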
Challenges we ran into
- Achieving real-time processing while maintaining accuracy was a significant technical hurdle.
- Contextual understanding and memory: Teaching Pixtral to understand and prioritize relevant information in complex, dynamic environments proved to be a significant challenge. Each setting presents unique obstacles and important elements that the AI must identify and interpret correctly.
- Balancing the demands of continuous video processing against the model's limited context length.
Accomplishments that we're proud of
We're particularly proud of Le-Vision's ability to navigate the hackathon venue and successfully locate critical points of interest. Our system has demonstrated proficiency in finding:
- Fire exits, enhancing safety for visually impaired users
- Water sources, ensuring users can stay hydrated
- Coffee stations, helping users locate refreshments
- Restrooms, aiding in locating essential facilities
These accomplishments showcase Le-Vision's potential to significantly improve the independence and quality of life for visually impaired individuals in various environments.
What we learned
Our journey with Le-Vision has been an incredible learning experience:
- We gained deep insights into the power of Pixtral and its capabilities in interpreting complex visual environments.
- We developed a profound understanding of the challenges involved in navigating spaces as a visually impaired person.
- We learned the intricacies of creating AI systems that can provide real-time, context-aware assistance.
- We discovered the importance of user-centric design in creating solutions that truly enhance independence for visually impaired individuals.
What's next for Le-Vision
Looking ahead, we have exciting plans for Le-Vision's future:
- App Optimization: We aim to make our app more sleek and optimized for better performance and user experience.
- On-Device Processing: With the rapid advancement of AI models, we expect that in about a year, we'll be able to run Le-Vision fully locally on smartphone devices. This will allow for:
- Enhanced privacy, as all processing will occur on the user's device
- Reduced latency, leading to even more real-time assistance
- Offline functionality, enabling use in areas with poor network connectivity
- Expanded Feature Set: We plan to continually add new features and capabilities based on user feedback and technological advancements.
- Accessibility Improvements: We'll work on making the app even more accessible, potentially integrating with other assistive technologies.
By focusing on these areas, we hope to make Le-Vision an indispensable tool for visually impaired individuals, significantly enhancing their independence and quality of life.
Built With
- mistral
- pixtral
- python
- react
- typescript
- whisper