AI Agent Evaluations
Performance results of AI coding agents on Next.js code generation and migration tasks, measuring success rate and execution time.
Last run date: February 10, 2026
Agent Performance Results
| Model | Agent | Total Evals | Success Rate |
|---|---|---|---|
| GPT 5.3 Codex (xhigh) | Codex | 20 | 90% |
| Claude Opus 4.6 | Claude Code | 20 | 80% |
| Gemini 3.0 Pro Preview | OpenCode | 20 | 75% |
| Claude Sonnet 4.5 | Claude Code | 20 | 55% |
| Kat Coder Pro V1 | OpenCode | 20 | 45% |
| Devstral 2 | OpenCode | 20 | 40% |
| GPT 5.2 Codex | Unknown | 20 | 40% |
| Minimax M2.1 | OpenCode | 20 | 40% |
| GPT 5.2 Codex (xhigh) | Codex | 20 | 35% |
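
The page does not say how the success rate is calculated. As a minimal sketch, assuming it is simply passed evals divided by total evals, the percentages can be converted back into pass counts. The helper names `passed_count` and `success_rate` are hypothetical and are not part of the evaluation harness.

```python
def success_rate(passed: int, total: int) -> float:
    """Return the success rate as a percentage.

    Assumption: success rate = passed evals / total evals.
    The source page does not define the metric.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * passed / total


def passed_count(total: int, rate_pct: float) -> int:
    """Recover the number of passed evals from a reported percentage."""
    return round(total * rate_pct / 100)


# A few (model, total evals, success rate %) rows copied from the table above.
rows = [
    ("GPT 5.3 Codex (xhigh)", 20, 90),
    ("Claude Opus 4.6", 20, 80),
    ("Claude Sonnet 4.5", 20, 55),
]

for model, total, rate in rows:
    print(f"{model}: {passed_count(total, rate)}/{total} passed")
```

With 20 evals per model, each eval moves the rate by 5 percentage points, so small gaps between adjacent rows amount to only one or two tasks.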