About Surgeon

What inspired us

CI failures eat hours. Logs, stack traces, and past incidents are scattered across tools. We've all spent 45+ minutes grepping logs, rerunning tests, and guessing at fixes. We wanted an agent that could do the detective work—find similar failures, spot patterns, and ship a fix—so engineers spend time building, not debugging pipelines.

What we learned

We learned how Elastic Agent Builder turns context into action: hybrid search for semantic + keyword retrieval over logs, ES|QL for piped analytics (failure spikes, runner correlation, correlated changes), and MCP tools for external actions (GitHub, Slack). The multi-step flow—retrieve → analyze → plan → act → verify—maps cleanly onto how humans actually debug CI, but faster.
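The retrieve → analyze → plan → act → verify flow can be pictured as a chain of small steps that pass one shared state object along. The sketch below is only an illustration of that shape, not Surgeon's actual code: the `Investigation` class, the step bodies, and the strings they produce are all hypothetical stand-ins for the real search, ES|QL, and MCP calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Investigation:
    """State carried through the five steps (hypothetical structure)."""
    failure: str
    evidence: List[str] = field(default_factory=list)
    findings: List[str] = field(default_factory=list)
    plan: str = ""
    actions: List[str] = field(default_factory=list)
    verified: bool = False


Step = Callable[[Investigation], Investigation]


def retrieve(inv: Investigation) -> Investigation:
    # Stand-in for hybrid search over logs and runbooks.
    inv.evidence.append(f"similar failure for: {inv.failure}")
    return inv


def analyze(inv: Investigation) -> Investigation:
    # Stand-in for ES|QL analytics over the retrieved evidence.
    inv.findings.append(f"{len(inv.evidence)} related incident(s) found")
    return inv


def plan(inv: Investigation) -> Investigation:
    # Choose an action only when the analysis produced findings.
    inv.plan = "open a PR with the candidate fix" if inv.findings else "escalate to a human"
    return inv


def act(inv: Investigation) -> Investigation:
    # Stand-in for MCP tool calls (GitHub, Slack); here we only record the plan.
    inv.actions.append(inv.plan)
    return inv


def verify(inv: Investigation) -> Investigation:
    # Placeholder check: mark as verified once at least one action was recorded.
    inv.verified = bool(inv.actions)
    return inv


PIPELINE: List[Step] = [retrieve, analyze, plan, act, verify]


def run(failure: str) -> Investigation:
    """Run every step in order, threading the same state through each one."""
    inv = Investigation(failure=failure)
    for step in PIPELINE:
        inv = step(inv)
    return inv
```

Keeping each step a plain function over one state object makes it easy to swap a stub for a real tool call without touching the other steps.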

How we built it

  1. Indexes — We defined ci_runs, ci_logs, and repo_knowledge in Elasticsearch to store run metadata, log chunks with error signatures, and runbooks/postmortems.

  2. Tools — We created Index Search tools for hybrid retrieval over logs and knowledge, and ES|QL tools for failure-spike analysis, runner correlation, and top-correlated file changes.

  3. Agent — We configured the Surgeon agent with a system prompt that orchestrates retrieval, ES|QL analytics, and action planning.

  4. MCP server — We built an MCP server exposing GitHub (create branch, open PR) and Slack (post evidence report) tools so Surgeon can ship fixes and notify teams.

  5. Deploy pipeline — We wrote scripts to create the indexes, ingest sample data, and deploy the tools and agent to Kibana via the Agent Builder API.
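To make step 1 more concrete, here is a rough sketch of what the three index definitions could look like as create-index request bodies. Only the index names (ci_runs, ci_logs, repo_knowledge) come from the project; every field name below is an assumption chosen to match the descriptions above, and the real mappings may differ.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical field names; only the three index names are from the project.
INDEX_MAPPINGS: Dict[str, Dict[str, Any]] = {
    # Run metadata
    "ci_runs": {
        "properties": {
            "run_id": {"type": "keyword"},
            "status": {"type": "keyword"},
            "runner": {"type": "keyword"},
            "started_at": {"type": "date"},
        }
    },
    # Log chunks with error signatures
    "ci_logs": {
        "properties": {
            "run_id": {"type": "keyword"},
            "error_signature": {"type": "keyword"},
            "message": {"type": "text"},
        }
    },
    # Runbooks and postmortems
    "repo_knowledge": {
        "properties": {
            "title": {"type": "text"},
            "kind": {"type": "keyword"},  # e.g. "runbook" or "postmortem"
            "body": {"type": "text"},
        }
    },
}


def create_index_requests(
    mappings: Dict[str, Dict[str, Any]],
) -> List[Tuple[str, Dict[str, Any]]]:
    """Pair each index name with a body shaped like a create-index request."""
    return [(name, {"mappings": mapping}) for name, mapping in mappings.items()]
```

A deploy script could loop over `create_index_requests(INDEX_MAPPINGS)` and send each body to the cluster before ingesting sample data.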

Challenges we faced

  • Agent Builder API quirks — The tools API rejected the name and required fields we assumed from docs; we had to trim payloads to the exact schema (id, type, description, tags, configuration).

  • ES|QL params — Parameterized time windows such as NOW() - ?days_back day triggered validation errors. We switched to a literal 7-day window and simplified param usage.

  • Tool updates — The PUT endpoint for updating existing tools behaved differently than POST. We handled conflicts by skipping re-deploy of existing tools instead of forcing updates.
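The three workarounds above can be sketched as small helpers. This is an illustration under stated assumptions rather than the project's deploy script: the function names, the `create` callback (a stand-in for the POST call to the tools API), and the example tool values are hypothetical. Only the allowed key list and the literal time-window form come from the notes above.

```python
from typing import Any, Callable, Dict, Iterable, List, Set

# Keys the tools API accepted, per the notes above.
ALLOWED_TOOL_KEYS = ("id", "type", "description", "tags", "configuration")


def trim_tool_payload(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Drop every key that is not in the accepted schema."""
    return {key: tool[key] for key in ALLOWED_TOOL_KEYS if key in tool}


def window_clause(days_back: int) -> str:
    """Render the time window as a literal instead of an ES|QL parameter."""
    if isinstance(days_back, bool) or not isinstance(days_back, int) or days_back <= 0:
        raise ValueError("days_back must be a positive integer")
    return f"NOW() - {days_back} day"


def deploy_tools(
    tools: Iterable[Dict[str, Any]],
    existing_ids: Set[str],
    create: Callable[[Dict[str, Any]], None],
) -> Dict[str, List[str]]:
    """Create tools that are not deployed yet; skip the ones that already exist.

    `create` is a hypothetical callback wrapping the POST request.
    """
    result: Dict[str, List[str]] = {"created": [], "skipped": []}
    for tool in tools:
        payload = trim_tool_payload(tool)
        if payload["id"] in existing_ids:
            # Skip instead of forcing an update through PUT.
            result["skipped"].append(payload["id"])
            continue
        create(payload)
        result["created"].append(payload["id"])
    return result
```

Rendering the window as a literal is only reasonable here because the value is validated as a positive integer before it is formatted into the query text.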
