Inspiration
We were inspired by the 3Blue1Brown YouTube channel, which offers extremely high-quality content supplemented by captivating, comprehensive animations. Unfortunately, few educators have access to animations of that caliber. Common bottlenecks include a lack of programming experience, a lack of video-editing skills, the cost of licensed software tools, and the substantial time commitment required to develop animations.
What it does
A user inputs a block of text into the program: a textbook chapter, a lesson plan, a video script, or any other text document. MathMatic uses GPT in the backend to identify key features in the text that can be animated. For each feature, it determines the visual elements, transitions, motion patterns, and timing information needed to render that animation. It then uses the Manim framework to generate a script for each animation.
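The per-feature output described above can be modeled as a small structured record. The sketch below is illustrative only: the field names and the stand-in `extract_features` function (which splits on sentences instead of calling GPT) are our assumptions, not the actual pipeline.

```python
from dataclasses import dataclass

# Hypothetical data model for one animatable feature extracted from the text.
# Field names are illustrative; the real pipeline's schema may differ.
@dataclass
class AnimationSpec:
    concept: str                # the idea to animate, e.g. "unit circle"
    visual_elements: list       # Manim objects the scene should draw
    transitions: list           # e.g. ["FadeIn", "Transform"]
    duration_s: float           # rough timing for the clip

def extract_features(text: str) -> list:
    """Stand-in for the GPT call: treat each sentence as one animatable
    feature. The real system sends the text to GPT and parses its
    structured response instead."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [
        AnimationSpec(concept=s, visual_elements=["Text"],
                      transitions=["FadeIn"], duration_s=3.0)
        for s in sentences
    ]

specs = extract_features("The derivative is a slope. Integrals measure area.")
print(len(specs), "-", specs[0].concept)
```

Each `AnimationSpec` would then be handed to the script-generation step, which turns it into a Manim scene.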
How we built it
We implemented a tree-based data structure to track the prompts and responses, since the input text can often be broken into smaller components, each with a specific focus. These components branch into further queries as children of the root node. Each node in the tree makes a query to GPT and interprets the output: either a text-based description that adds clarity for deeper nodes, or an actual Manim script to be debugged and compiled. Finally, we compile all of the generated code into MP4 files of the animations described by the text.
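The tree described above can be sketched as follows. This is a minimal illustration under our own assumptions: the class and method names are hypothetical, and `fake_gpt` stands in for the real GPT call.

```python
# Minimal sketch of the prompt/response tree. Interior nodes produce
# clarifying text for their children; leaf nodes produce Manim scripts.
class PromptNode:
    def __init__(self, prompt, is_leaf=False):
        self.prompt = prompt
        self.is_leaf = is_leaf
        self.children = []
        self.response = None

    def add_child(self, node):
        self.children.append(node)
        return node

    def run(self, query):
        """Query this node, then recurse into children, collecting
        every leaf's script for compilation."""
        self.response = query(self.prompt)
        scripts = [self.response] if self.is_leaf else []
        for child in self.children:
            scripts.extend(child.run(query))
        return scripts

# Stub in place of a real GPT call.
fake_gpt = lambda prompt: f"# script for: {prompt}"

root = PromptNode("full lesson text")
root.add_child(PromptNode("animate the unit circle", is_leaf=True))
root.add_child(PromptNode("animate sine as a projection", is_leaf=True))
scripts = root.run(fake_gpt)
print(len(scripts))  # two leaf scripts collected from the tree
```

In the real system, each leaf's script would then be debugged, compiled with Manim, and rendered to an MP4.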
Challenges we ran into
Our main challenges involved prompt engineering. Prompting LLMs is difficult because the quality and consistency of the output are unpredictable; the LLM is effectively a black-box function. We had to carefully engineer the workflow so that every query to GPT was specific enough to keep it from veering into tangents or extrapolating unknowns. We chose our diction deliberately: through trial and error, we discovered patterns in how GPT responded to certain words. We ended up with highly refined, carefully drafted prompts that guide the program to the desired result.
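To illustrate the kind of constraint-heavy prompting described above, here is a hypothetical template. The wording is our own example, not the team's actual prompt; it shows the pattern of scoping the query tightly and forbidding tangents.

```python
# Illustrative prompt template: every clause narrows what the model may do,
# to keep the response on-topic and machine-parsable.
PROMPT_TEMPLATE = (
    "You are generating a Manim scene. "
    "Animate ONLY the following concept: {concept}. "
    "Output a single Python class named Scene1 and nothing else. "
    "Do not explain your code. Do not introduce concepts not listed here."
)

def build_prompt(concept: str) -> str:
    return PROMPT_TEMPLATE.format(concept=concept)

prompt = build_prompt("the Pythagorean theorem")
print(prompt)
```

Restricting the output format ("a single Python class named Scene1 and nothing else") is what makes the response safe to feed directly into the compile step.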
Accomplishments that we're proud of
We are proud of the tree-based architecture. Through our research, we discovered that when we throw a large problem at GPT all at once, it struggles; however, it can use its insights to break the large problem into subproblems, which it then solves with much higher accuracy and reliability. Our tree-based structure captured this behavior effectively, both from a logical standpoint and under the tight performance constraints we were working within.
What we learned
We learned the challenges of deploying LLMs in production-ready, real-world solutions. As individual users of ChatGPT, we had never delved into the intricacies of integrating LLMs into larger systems. Prompts and questions posed to an LLM must be drafted meticulously to get consistent results.
What's next for MathMatic
We plan to let users give feedback and suggest improvements for the specific animations the program returns. Our goal is to build a highly sophisticated model that can reliably render results for educators around the world. We also want to explore potential users of our technology and help professors at Georgia Tech supplement their lectures with animations more easily.