AI Agent Evaluations

Performance results of AI coding agents on Next.js code generation and migration tasks, measuring success rate and execution time.

Last run date: February 10, 2026

Agent Performance Results

| Model | Agent | Total Evals | Success Rate |
|---|---|---|---|
| GPT 5.3 Codex (xhigh) | Codex | 20 | 90% |
| Claude Opus 4.6 | Claude Code | 20 | 80% |
| Gemini 3.0 Pro Preview | OpenCode | 20 | 75% |
| Claude Sonnet 4.5 | Claude Code | 20 | 55% |
| Kat Coder Pro V1 | OpenCode | 20 | 45% |
| Devstral 2 | OpenCode | 20 | 40% |
| GPT 5.2 Codex | Unknown | 20 | 40% |
| Minimax M2.1 | OpenCode | 20 | 40% |
| GPT 5.2 Codex (xhigh) | Codex | 20 | 35% |
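
To make the percentages easier to read as counts, the sketch below converts each success rate in the results above into an implied number of passing evals. It assumes that success rate means passed evals divided by total evals; the page does not state the formula, so that definition is an assumption. The names `RESULTS`, `implied_passes`, and `summarize` are illustrative and not part of the evaluation project.

```python
# Rows copied from the results above: (model, agent, total evals, success rate in %).
RESULTS = [
    ("GPT 5.3 Codex (xhigh)", "Codex", 20, 90),
    ("Claude Opus 4.6", "Claude Code", 20, 80),
    ("Gemini 3.0 Pro Preview", "OpenCode", 20, 75),
    ("Claude Sonnet 4.5", "Claude Code", 20, 55),
    ("Kat Coder Pro V1", "OpenCode", 20, 45),
    ("Devstral 2", "OpenCode", 20, 40),
    ("GPT 5.2 Codex", "Unknown", 20, 40),
    ("Minimax M2.1", "OpenCode", 20, 40),
    ("GPT 5.2 Codex (xhigh)", "Codex", 20, 35),
]


def implied_passes(total_evals: int, success_rate_pct: int) -> int:
    """Passing-eval count implied by a percentage.

    Assumption: success rate = passed evals / total evals.
    """
    return round(total_evals * success_rate_pct / 100)


def summarize(rows):
    """Return (model, agent, implied passes, total evals) for each row."""
    return [
        (model, agent, implied_passes(total, pct), total)
        for model, agent, total, pct in rows
    ]


if __name__ == "__main__":
    for model, agent, passed, total in summarize(RESULTS):
        print(f"{model} ({agent}): {passed}/{total}")
```

With 20 evals per row, each eval accounts for 5 percentage points, so under this assumption every listed rate corresponds to a whole number of passing evals.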