Inspiration
Digital forensics investigations often hinge on identifying and tracking key technological devices—laptops, flash drives, crypto wallets, modems, and more. These objects may appear only briefly, be partially obscured, or go unnoticed during chaotic site visits. Traditional methods of documenting evidence often rely on manual photography, memory, or reviewing hours of footage after the fact. We wanted to change that.
We were inspired by the need to bring real-time intelligence and semantic understanding to the digital forensic process—right from the investigator’s point of view. With emerging smart eyewear like Meta Ray-Bans and the growing capability of on-device AI, we saw an opportunity to build a system that sees what matters, keeps what matters, and lets investigators focus on the investigation—not on the camera.
What It Does
See With Me is a digital forensics tool that uses smart glasses, computer vision, and AI to identify, track, and catalog objects of interest in real-time or from previously recorded video.
Here’s how it works:
- Live footage is captured through Meta Ray-Bans and streamed via Instagram Live into our processing pipeline.
- A fine-tuned YOLOv11n model detects specific high-value digital objects such as laptops, external drives, modems, phones, and more.
- Using a custom algorithm, we monitor the bounding box movements of detected objects. When significant movement is detected, the relevant frames are saved.
- Each keyframe is passed through an image analysis LLM to generate detailed, context-aware captions and red-flag alerts.
- Custom video uploads are processed in the same way—breaking the video into frames, analyzing each frame through the object detection pipeline, and generating leads from each significant keyframe.
- A searchable interface allows investigators to filter and find frames based on object detection results.
- The Super Search function lets investigators query across all keyframes, identifying patterns such as objects appearing together in specific contexts.
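The delta-based keyframe selection described above can be sketched roughly as follows. This is a minimal illustration, not our exact implementation: the `(x1, y1, x2, y2)` box format, the center-distance metric, and the 25-pixel threshold are all illustrative assumptions.

```python
def box_center(box):
    """Center (cx, cy) of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def moved_significantly(prev_box, curr_box, threshold=25.0):
    """True if the box center shifted more than `threshold` pixels."""
    (px, py), (cx, cy) = box_center(prev_box), box_center(curr_box)
    return ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5 > threshold

def select_keyframes(tracks, threshold=25.0):
    """tracks: list of (frame_idx, object_id, box) detections.
    Returns the frame indices where any tracked object moved more than
    `threshold` pixels since its last retained position."""
    last_box = {}  # object_id -> box at the last retained frame
    keyframes = []
    for frame_idx, obj_id, box in tracks:
        prev = last_box.get(obj_id)
        if prev is None or moved_significantly(prev, box, threshold):
            keyframes.append(frame_idx)
            last_box[obj_id] = box
    return sorted(set(keyframes))
```

Comparing against the last *retained* position (rather than the immediately previous frame) is what keeps slow, creeping movement from slipping under the threshold forever.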
How We Built It
The system is designed to handle both real-time and pre-recorded footage, integrating multiple technologies to achieve accurate detection and analysis:
- Meta Ray-Bans stream live footage to Instagram Live, feeding directly into our backend.
- A custom YOLOv11n model has been fine-tuned with over 1,700 annotated images of cybercrime-related objects to ensure high accuracy in detection.
- A delta-based bounding box algorithm filters out redundant or irrelevant frames, ensuring that only key changes in object position are retained.
- The LLM-based image analysis provides forensic insights, generating captions that highlight potential red flags or unusual object groupings.
- Custom video upload functionality allows users to analyze archived footage, breaking the video into frames that go through the same object detection and semantic analysis pipeline.
- The backend is powered by FastAPI, with a React frontend that enables easy searching and exploration of the analyzed frames.
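A rough sketch of how the upload pipeline ties these stages together. The `detect_objects` and `caption_frame` hooks stand in for the fine-tuned YOLOv11n model and the captioning LLM (not reproduced here), and the 2-samples-per-second rate is an illustrative assumption, not our production setting.

```python
def sample_indices(total_frames, video_fps, samples_per_second=2):
    """Evenly spaced frame indices so footage is analyzed at a fixed
    rate instead of the full frame rate."""
    if total_frames <= 0 or video_fps <= 0:
        return []
    stride = max(1, round(video_fps / samples_per_second))
    return list(range(0, total_frames, stride))

def process_video(frames, fps, detect_objects, caption_frame):
    """Run detection on sampled frames and caption only the frames
    where something was detected. Returns a list of lead records."""
    leads = []
    for idx in sample_indices(len(frames), fps):
        detections = detect_objects(frames[idx])  # e.g. YOLO inference
        if detections:
            leads.append({
                "frame": idx,
                "objects": [d["label"] for d in detections],
                "caption": caption_frame(frames[idx], detections),
            })
    return leads
```

Captioning only frames with detections keeps the expensive LLM call off empty footage, which matters for both latency and cost.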
What Makes It Special
What sets See With Me apart is the combination of real-time object detection, intelligent frame retention, and semantic image analysis. It’s a system that doesn’t just detect objects—it understands what those objects are and their potential relevance in a forensic investigation.
Key features that make our solution unique:
- Real-time and custom video analysis: Whether you’re on the scene with smart glasses or reviewing old footage, our system ensures that significant frames are captured and analyzed.
- Advanced object detection: The fine-tuned YOLOv11n model specifically targets cyber-relevant objects, making it far more focused than generic detectors.
- Intelligent frame filtering: The delta-based algorithm intelligently selects frames where significant object movement occurs, reducing irrelevant data and saving time.
- LLM-generated insights: Each keyframe receives detailed captions and potential forensic leads, making it easier for investigators to spot red flags.
- Powerful search and analysis tools: The built-in search and Super Search functions allow investigators to find and cross-reference objects, making large-scale investigations more manageable.
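The core of the Super Search idea (finding frames where objects appear together) can be sketched as an inverted index over per-keyframe detections. The data shapes here are illustrative assumptions, not our actual schema.

```python
from collections import defaultdict

def build_index(keyframes):
    """keyframes: {frame_id: set of detected labels}.
    Returns an inverted index: label -> set of frame_ids."""
    index = defaultdict(set)
    for frame_id, labels in keyframes.items():
        for label in labels:
            index[label].add(frame_id)
    return index

def super_search(index, labels):
    """Frames in which *all* of the given labels co-occur."""
    if not labels:
        return set()
    frame_sets = [index.get(label, set()) for label in labels]
    return set.intersection(*frame_sets)
```

Intersecting per-label frame sets makes multi-object queries ("laptop and external drive in the same frame") cheap even as the keyframe database grows.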
Challenges We Ran Into
No project comes without its hurdles. For See With Me, we faced a few key challenges:
- Object detection accuracy: Fine-tuning the YOLOv11n model to detect niche, cybercrime-related objects, such as specific hard drives and USB devices, required a highly customized dataset of over 1,700 annotated images.
- Frame filtering efficiency: Ensuring that only the most relevant frames were kept involved balancing the sensitivity of our delta algorithm—too sensitive, and we'd keep too many frames; not sensitive enough, and we'd miss critical moments.
- Semantic image analysis: Crafting effective LLM prompts that generated useful forensic insights (e.g., red flags, suspicious object interactions) was an iterative process, with many tweaks to ensure the output was truly useful.
- Real-time performance: Optimizing the pipeline for low latency, so live detection stays seamless while large video files are still processed efficiently, was one of our biggest technical challenges.
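The prompt iteration mentioned above converged on structured, context-rich instructions rather than a single open-ended question. A sketch of the kind of prompt builder we mean (the wording and structure here are illustrative, not our production prompt):

```python
def build_forensic_prompt(labels, scene_note=""):
    """Assemble a captioning prompt that asks the vision LLM for a
    description plus explicit red-flag analysis of detected objects."""
    object_list = ", ".join(sorted(set(labels))) or "no tagged objects"
    prompt = (
        "You are assisting a digital forensics investigator.\n"
        f"Detected objects in this frame: {object_list}.\n"
    )
    if scene_note:
        prompt += f"Scene context: {scene_note}\n"
    prompt += (
        "1. Describe the frame in one sentence.\n"
        "2. Flag any suspicious groupings (e.g. many storage devices "
        "together, concealed hardware).\n"
        "3. Answer 'RED FLAG' or 'NO FLAG' on the final line."
    )
    return prompt
```

Forcing a fixed final-line verdict is one of the tweaks that made outputs machine-parseable and consistently useful, instead of free-form prose.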
Accomplishments We're Proud Of
Despite the challenges, we’ve created something truly functional and impactful:
- Real-time object detection that works seamlessly through Meta Ray-Bans.
- Custom video upload support that processes user-provided footage with the same object detection and semantic analysis pipeline.
- Delta-based frame filtering that intelligently reduces noise and ensures that only relevant, action-oriented frames are retained.
- LLM-based captioning that generates detailed and meaningful forensic leads for each identified object.
- Searchable database of identified objects across keyframes, with a powerful Super Search tool to cross-reference multiple objects and scenarios.
What We Learned
Throughout the development process, we learned a great deal:
- Object detection can always be improved: While the YOLOv11n model worked well, fine-tuning for niche objects required us to build a specialized dataset and continually evaluate detection performance.
- Real-time processing requires trade-offs: Achieving real-time performance without compromising the accuracy of object detection or image analysis demanded careful, constant tuning.
- Semantic AI needs careful prompting: Generating useful, actionable captions from images required careful tuning of the image analysis prompts to extract forensic insights effectively.
- User needs shape the design: The most important feature was the ability to search and filter keyframes based on real-world investigative needs, which required us to focus on intuitive interfaces and fast processing.
What’s Next
We’re just getting started:
- Field testing with cybersecurity professionals and law enforcement to refine the tool based on real-world feedback.
- Video source expansion to include other footage types, such as drone cameras and mobile recordings.
- Smarter multi-frame stitching to analyze patterns and relationships between objects over time.
- Mobile app for on-the-go review of analyzed footage, keyframes, and forensic insights.
- Integrating case management tools to allow for better tagging, notes, and report generation.
The future of See With Me lies in enhancing its capabilities to bridge the gap between real-time and retrospective forensic analysis, empowering investigators with a smart, efficient way to track digital evidence.