Inspiration

Designing good exams is surprisingly hard. While question generation is often automated, assessment design—balancing difficulty, syllabus coverage, clarity, Bloom's Taxonomy alignment, and grading fairness—is still largely manual and time-consuming. We were inspired by the idea that modern reasoning models shouldn’t just generate text, but actively reason about assessment structure, the way an educator would. This led us to explore whether Gemini 3 could be used not as a prompt-completion tool, but as the core logic engine for exam design.

What it does

AI Exam Engine is an intelligent exam-design platform that converts high-level exam configurations—such as syllabus scope, difficulty distribution, and question types—into fully structured exams. It generates questions, answers, step-by-step solutions, and marking rubrics, with strong support for STEM subjects using LaTeX-formatted mathematics (e.g., $E=mc^2$, $\int_0^1 x^2 \, dx$). The system emphasizes reasoning-first generation rather than static prompt outputs.
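
To give a sense of the inputs involved, here is a minimal sketch of what such an exam configuration could look like; the field names and structure are illustrative assumptions, not the engine's actual schema.

```python
from dataclasses import dataclass

# Illustrative only: field names and defaults are assumptions,
# not the AI Exam Engine's real configuration schema.
@dataclass
class ExamConfig:
    syllabus_topics: list[str]         # e.g. ["calculus: definite integrals"]
    difficulty_mix: dict[str, float]   # e.g. {"easy": 0.3, "medium": 0.5, "hard": 0.2}
    question_types: list[str]          # e.g. ["mcq", "short_answer", "long_form"]
    total_marks: int = 100
    latex_math: bool = True            # render STEM notation as LaTeX

config = ExamConfig(
    syllabus_topics=["calculus: definite integrals", "mechanics: work and energy"],
    difficulty_mix={"easy": 0.3, "medium": 0.5, "hard": 0.2},
    question_types=["mcq", "short_answer"],
)
```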

How we built it

The platform is built around Gemini 3, integrated through structured APIs. Exam generation is broken into modular stages: question generation, answer synthesis, rubric creation, and validation. Each stage includes automated quality checks for syllabus alignment, logical consistency, and LaTeX correctness. When outputs fail validation, targeted retry calls are made to Gemini 3, enabling self-correction rather than full regeneration. Gemini 3’s low latency makes this multi-pass pipeline viable in near real time.
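
A simplified sketch of that validate-and-retry stage is below; `call_gemini` and the individual checks are hypothetical stand-ins for the actual pipeline components.

```python
# Sketch of the per-item validation loop. call_gemini and the validator
# functions are hypothetical placeholders, not the project's real code.
MAX_RETRIES = 3

def call_gemini(prompt: str) -> str:
    """Placeholder for the real Gemini API call used by the pipeline."""
    raise NotImplementedError("wire up the Gemini client here")

def generate_with_validation(prompt: str, validators: list) -> str:
    """Generate a draft, run quality checks, and retry with targeted feedback."""
    draft = call_gemini(prompt)
    for _ in range(MAX_RETRIES):
        failures = [msg for check in validators if (msg := check(draft))]
        if not failures:
            return draft  # every check passed
        # Ask the model to fix only the flagged issues rather than regenerate everything.
        feedback = "Revise the draft to fix these issues:\n- " + "\n- ".join(failures)
        draft = call_gemini(f"{prompt}\n\n{feedback}\n\nPrevious draft:\n{draft}")
    raise RuntimeError("Item failed validation after retries")
```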

Challenges we ran into

The biggest challenge was ensuring reliability. LLM-generated exams must be precise—small logical or formatting errors can invalidate questions. Handling LaTeX edge cases, enforcing consistent difficulty, and preventing concept repetition required careful validation logic and retry strategies. Designing feedback prompts that helped Gemini 3 correct mistakes rather than introduce new ones was also non-trivial.
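
As a flavour of the validation logic involved, a lightweight LaTeX sanity check might look like the heuristic below; this is a rough illustration, not the project's actual validator.

```python
import re

def check_latex(text: str) -> str | None:
    """Rough heuristics for common LaTeX slips in generated questions.
    Returns an error message, or None if the text looks fine."""
    if text.count("$") % 2 != 0:
        return "Unbalanced $ math delimiters"
    if text.count("{") != text.count("}"):
        return "Unbalanced braces"
    # Flag a commonly mistyped command as an example of targeted feedback.
    if re.search(r"\\fract\b", text):
        return r"Unknown command '\fract' (did you mean '\frac'?)"
    return None

print(check_latex(r"Evaluate $\int_0^1 x^2 \, dx$"))  # None -> passes
print(check_latex(r"Evaluate $\int_0^1 x^2 \, dx"))   # unbalanced delimiter
```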

Accomplishments that we're proud of

  • Built a feedback-driven exam generation pipeline instead of a single-shot LLM flow
  • Achieved reliable LaTeX-heavy STEM question generation
  • Designed explainable outputs with clear grading rubrics

What we learned

We learned that reasoning models like Gemini 3 are most powerful when treated as collaborators, not generators. Structured inputs, explicit constraints, and targeted retries dramatically improve output quality. We also learned that low latency is essential when building iterative, quality-controlled AI systems.

What's next for AI Exam Engine

Next, we plan to add adaptive exams, multimodal assessment and validation, deeper rubric analytics, and educator-in-the-loop controls. We also want to expand multimodal inputs further and explore personalized assessments driven by learner performance—continuing to push Gemini 3 beyond generation into true educational reasoning.