Inspiration
We wanted to explore how far on-device AI can go without relying on the cloud. Most AI assistants depend heavily on internet connectivity, raising concerns about latency, privacy, and reliability. CoreDex was inspired by the idea of building a fully offline, multimodal AI system that can see, understand, and speak entirely on an iPhone, while also serving as a foundational framework for future AI-powered applications.
What it does
CoreDex is an offline multimodal AI assistant for iOS. It uses the live camera to detect real-world objects in real time, understands contextual information about them through on-device language reasoning, and delivers intelligent explanations via voice interaction.
It combines computer vision, reasoning, and speech into one seamless pipeline, all running locally without any internet dependency. The architecture is modular, allowing custom Core ML models to be integrated for different domains.
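To illustrate the modular design, here is a minimal sketch of how the three stages could sit behind protocol boundaries so that different Core ML detection models or reasoning backends can be swapped in. The protocol and type names (ObjectDetector, Reasoner, SpeechOutput, CoreDexPipeline) are hypothetical and not taken from the actual codebase.

```swift
import Foundation
import CoreVideo

// Hypothetical protocol boundaries for the three pipeline stages.
protocol ObjectDetector {
    /// Returns human-readable labels for objects in a camera frame.
    func detectLabels(in frame: CVPixelBuffer) async throws -> [String]
}

protocol Reasoner {
    /// Produces a short contextual explanation for the detected labels.
    func explain(labels: [String]) async throws -> String
}

protocol SpeechOutput {
    /// Speaks the explanation aloud.
    func speak(_ text: String)
}

// The pipeline only depends on the protocols, keeping it domain-agnostic:
// a custom Core ML model for a new domain only needs to conform to ObjectDetector.
struct CoreDexPipeline {
    let detector: ObjectDetector
    let reasoner: Reasoner
    let speech: SpeechOutput

    func process(frame: CVPixelBuffer) async throws {
        let labels = try await detector.detectLabels(in: frame)
        guard !labels.isEmpty else { return }
        let explanation = try await reasoner.explain(labels: labels)
        speech.speak(explanation)
    }
}
```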
How we built it
We built CoreDex using Swift and SwiftUI for a fully native iOS experience.
For vision, we integrated YOLO-based models converted to Core ML for real-time object detection, accelerated by the Apple Neural Engine.
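A minimal sketch of how a converted YOLO Core ML model can be wrapped in a Vision request is shown below. The YOLOv8n class name is a placeholder for whichever converted model is bundled in the app; setting the compute units to .all lets Core ML schedule work on the Neural Engine where available.

```swift
import Vision
import CoreML

// Hypothetical wrapper around a YOLO model converted to Core ML.
// "YOLOv8n" stands in for the Xcode-generated class of the bundled .mlmodel.
final class YOLODetector {
    private let request: VNCoreMLRequest

    init() throws {
        let config = MLModelConfiguration()
        config.computeUnits = .all        // allow the Neural Engine when available
        let coreMLModel = try YOLOv8n(configuration: config).model
        let visionModel = try VNCoreMLModel(for: coreMLModel)
        request = VNCoreMLRequest(model: visionModel)
        request.imageCropAndScaleOption = .scaleFill
    }

    /// Runs detection on a single camera frame and returns label/confidence pairs.
    func detect(in pixelBuffer: CVPixelBuffer) throws -> [(label: String, confidence: Float)] {
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .right)
        try handler.perform([request])
        let observations = request.results as? [VNRecognizedObjectObservation] ?? []
        return observations.compactMap { observation in
            guard let top = observation.labels.first else { return nil }
            return (top.identifier, top.confidence)
        }
    }
}
```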
The detected object labels are passed into an on-device language reasoning module to generate contextual explanations.
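For the reasoning step, the detected labels can be folded into a prompt for the on-device system language model. The sketch below assumes Apple's FoundationModels framework (listed under Built With); the instructions and prompt wording are illustrative, and the exact API calls should be treated as an approximation of that framework rather than CoreDex's actual code.

```swift
import FoundationModels

// Hypothetical reasoning module: turns detected labels into a short,
// speech-friendly explanation using the on-device system language model.
struct OnDeviceReasoner {
    private let session = LanguageModelSession(
        instructions: "You are CoreDex, an offline assistant. Explain detected objects in two concise sentences."
    )

    func explain(labels: [String]) async throws -> String {
        let prompt = "The camera currently sees: \(labels.joined(separator: ", ")). "
            + "Briefly explain what these objects are and how they might be used."
        let response = try await session.respond(to: prompt)
        return response.content
    }
}
```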
We used AVFoundation for camera and audio handling, Apple’s Speech framework for speech-to-text, and AVSpeechSynthesizer for natural text-to-speech responses.
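A brief sketch of the voice side follows, kept fully offline by forcing on-device speech recognition and using the system synthesizer. Error handling, authorization, and audio-session setup are omitted, and the VoiceIO type is a hypothetical helper rather than the app's real class.

```swift
import Speech
import AVFoundation

// Hypothetical voice I/O helper mirroring the description above.
final class VoiceIO {
    private let synthesizer = AVSpeechSynthesizer()

    /// Speaks a response aloud using the built-in text-to-speech engine.
    func speak(_ text: String) {
        let utterance = AVSpeechUtterance(string: text)
        utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
        utterance.rate = AVSpeechUtteranceDefaultSpeechRate
        synthesizer.speak(utterance)
    }

    /// Creates a speech-to-text request that never leaves the device.
    func makeOfflineRecognitionRequest() -> SFSpeechAudioBufferRecognitionRequest {
        let request = SFSpeechAudioBufferRecognitionRequest()
        request.requiresOnDeviceRecognition = true   // keeps recognition fully offline
        request.shouldReportPartialResults = true
        return request
    }
}
```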
The entire system is optimized to run fully offline with low latency.
Challenges we ran into
One of the biggest challenges was maintaining real time performance while running detection and reasoning entirely on device. Balancing inference speed, memory usage, and UI responsiveness required careful optimization.
Another challenge was designing a clean multimodal pipeline where vision outputs seamlessly flow into reasoning and then voice generation without noticeable delay.
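One common way to keep the UI responsive while heavy inference runs on-device is to drop camera frames whenever the previous pass is still in flight. The sketch below is a hedged illustration of that pattern, not CoreDex's actual implementation; the FrameGate name and queue label are made up.

```swift
import Foundation
import CoreVideo

// Illustrative frame gate: at most one detection-and-reasoning pass runs at a time;
// frames arriving while a pass is in flight are dropped, so capture and UI never stall.
final class FrameGate {
    private let lock = NSLock()
    private var isBusy = false

    /// Called for every camera frame; `work` is the full vision -> reasoning -> speech pass.
    func submit(_ frame: CVPixelBuffer, work: @escaping (CVPixelBuffer) async -> Void) {
        lock.lock()
        guard !isBusy else { lock.unlock(); return }   // drop frame, a pass is already running
        isBusy = true
        lock.unlock()

        Task {
            await work(frame)
            self.lock.lock()
            self.isBusy = false
            self.lock.unlock()
        }
    }
}
```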
Accomplishments that we're proud of
We successfully built a fully offline multimodal AI assistant that integrates vision, reasoning, and voice in real time on an iPhone.
We achieved low-latency, hardware-accelerated inference without any cloud dependency.
Most importantly, we designed CoreDex as a reusable foundation, enabling it to be extended into domain-specific AI applications in the future.
What we learned
We gained deep insights into optimizing on device AI pipelines and leveraging Apple’s hardware acceleration effectively.
We learned how to design scalable multimodal architectures that remain modular and extensible.
Most importantly, we learned that powerful AI experiences don’t always need the cloud; thoughtful system design can unlock robust offline intelligence.
What's next for CoreDex
Next, we plan to expand CoreDex with support for custom, domain-specific detection models and deeper contextual reasoning.
We aim to explore on-device fine-tuning, richer visual context integration, and hybrid offline/online reasoning modes for advanced use cases.
Ultimately, we envision CoreDex becoming a foundational AI framework that powers specialized assistants across education, accessibility, field work, and beyond.
Built With
- avfoundation
- avspeechsynthesizer
- coreml
- foundationmodels
- swift
- swiftui
- yolo