video: video.intoai.de

Inspiration

Leading AI companies are building foundation models and pushing AI research to the next level. Computer vision research has recently seen new approaches such as Segment Anything and DensePose from Meta, or YOLOv7.

Researchers cannot use these models out of the box; they have to write custom code first. This is because the models do not solve an industry problem directly. They only yield segmentation masks or bounding boxes.

In other disciplines, code interpreters such as the one integrated into ChatGPT can help.

Unfortunately, these code interpreters are not able to operate complex computer vision models.

We are on a mission to change this!

Check out our live demo at intoai.de.

What it does

With VisionIntelligence, you can use the latest approaches from computer vision research through a chat interface. This lets you get work done faster and more easily than ever before.

How we built it

We deployed a variety of foundation models in the cloud, for example models for classification, detection, segmentation, pose estimation, and flow. Next, we developed a dense library that acts as the translator between these models and the LLM. We then injected knowledge of this library into GPT-4, which uses the model endpoints and a Python sandbox environment to do everything shown in the video. Lastly, we adapted a frontend and added more options for visualization.
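To illustrate the "translator" idea, here is a minimal sketch of how a library could describe its functions to an LLM. It is not our actual code: the registry, the `tool` decorator, and the functions `detect_objects` and `segment_video` are hypothetical names chosen for this example, and the function bodies are placeholders rather than calls to real endpoints.

```python
import inspect
from typing import Callable, Dict

# Hypothetical registry; none of these names come from the real library.
REGISTRY: Dict[str, Callable] = {}


def tool(fn: Callable) -> Callable:
    """Register a function so it can be advertised to the LLM."""
    REGISTRY[fn.__name__] = fn
    return fn


@tool
def detect_objects(image_url: str, threshold: float = 0.5) -> list:
    """Return bounding boxes for the objects found in an image."""
    # Placeholder: a real version would call a deployed detection endpoint.
    return [{"label": "placeholder", "box": [0, 0, 1, 1], "score": threshold}]


@tool
def segment_video(video_url: str, prompt: str) -> dict:
    """Return per-frame masks for the object described by the prompt."""
    # Placeholder: a real version would call a deployed segmentation endpoint.
    return {"video": video_url, "prompt": prompt, "masks": []}


def describe_tools() -> str:
    """Build the text block that could be placed in the LLM's system prompt."""
    lines = []
    for name, fn in sorted(REGISTRY.items()):
        signature = inspect.signature(fn)
        doc = inspect.getdoc(fn) or ""
        lines.append(f"{name}{signature}: {doc}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(describe_tools())
```

The output of `describe_tools()` is one line per function with its signature and docstring, which is the kind of compact description an LLM can use to write code against the library inside a sandbox.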

Challenges we ran into

Complexity of the idea. It took a long time until we were able to run complex functions on our infrastructure, e.g., to segment a video with just one single function call.

Language models like GPT-3.5 and GPT-4 are not perfect at all! We had to do quite a bit of engineering to make sure our solution works seamlessly even when GPT-4, for example, hallucinates functions or inputs!
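One way to guard against hallucinated functions or inputs is to validate every call the model proposes before running it, and to send a readable error back to the model otherwise. The sketch below shows that idea only; `TOOLS`, `crop`, and `check_call` are hypothetical names for this example and do not describe our actual implementation.

```python
import difflib
import inspect
from typing import Optional


def crop(image_url: str, box: list) -> str:
    """Hypothetical tool used only for this illustration."""
    return f"{image_url}#crop={box}"


# Hypothetical table of functions the model is allowed to call.
TOOLS = {"crop": crop}


def check_call(name: str, kwargs: dict) -> Optional[str]:
    """Return None if the call is valid, else an error message for the LLM."""
    if name not in TOOLS:
        # Suggest the closest known function name, if there is one.
        close = difflib.get_close_matches(name, list(TOOLS), n=1)
        hint = f" Did you mean '{close[0]}'?" if close else ""
        return f"Unknown function '{name}'.{hint}"
    try:
        # bind() raises TypeError on missing or unexpected arguments.
        inspect.signature(TOOLS[name]).bind(**kwargs)
    except TypeError as exc:
        return f"Bad arguments for '{name}': {exc}"
    return None


if __name__ == "__main__":
    print(check_call("crop", {"image_url": "a.png", "box": [0, 0, 1, 1]}))
    print(check_call("crop_image", {"image_url": "a.png"}))
    print(check_call("crop", {"image": "a.png"}))
```

Returning the error as text, instead of raising it, makes it easy to append the message to the conversation so the model can correct its own call on the next turn.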

Accomplishments that we're proud of

We are proud to make computer vision research more accessible and to bring the latest research from other disciplines into this field. Foundation models and LLMs that can call functions and interpret code are still quite new and experimental.

What we learned

Everything is possible with a great team.

What's next for VisionIntelligence

Make it accessible to others and add more models and functions!
