DocOps Agent

DocOps Agent - AI Document Conflict Detection | Elasticsearch Agent Builder Hackathon

Inspiration

Every enterprise has a hidden compliance problem: policy documents that contradict each other.

Your Security Policy says passwords need 16 characters. Your Employee Handbook says 12. Which one is right? Which one are employees actually following?

Finding these contradictions traditionally takes 50+ hours of manual review. We built DocOps Agent to do it in under 2 minutes.

What it does

DocOps Agent is an intelligent document analysis platform that:

Detects conflicts between policy documents using semantic analysis
Shows side-by-side comparisons with visual diff highlighting
Suggests AI-generated fixes, each with a confidence score and an estimated time to resolve (e.g., 85% confidence, 5 min to resolve)
Tracks resolution workflow from detection to verification
Generates compliance reports exportable as Markdown, Excel, or PDF

In our demo: 25 documents analyzed → 66 conflicts detected → 99.9% time saved (under 2 minutes, compared with the 50+ hours a manual review traditionally takes).

How we built it

Multi-step Agent Reasoning: Unlike a simple chatbot that answers from a single prompt, DocOps Agent reasons through document analysis step by step using six tools:

search_documents - Hybrid search combining BM25 + dense vectors
analyze_conflicts - Semantic comparison across document sections
get_suggestions - AI-generated remediation recommendations
track_resolution - Alert lifecycle management
generate_report - Automated compliance reporting
check_staleness - Document freshness analysis
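
To make the multi-step idea concrete, here is a minimal sketch of how tools like these could be registered by name and run as a plan. It is an illustration only, not the project's actual orchestration code: the `tool` decorator, the shared `state` dict, `run_plan`, and the placeholder tool bodies are all assumptions, and only three of the six tool names are stubbed out.

```python
from typing import Callable, Dict, List

Tool = Callable[[dict], dict]
TOOLS: Dict[str, Tool] = {}  # hypothetical registry: tool name -> callable


def tool(name: str) -> Callable[[Tool], Tool]:
    """Register a function under a tool name."""
    def register(fn: Tool) -> Tool:
        TOOLS[name] = fn
        return fn
    return register


@tool("search_documents")
def search_documents(state: dict) -> dict:
    # Placeholder: a real version would query Elasticsearch.
    state["sections"] = [
        ("Security Policy", "Passwords need 16 characters."),
        ("Employee Handbook", "Passwords need 12 characters."),
    ]
    return state


@tool("analyze_conflicts")
def analyze_conflicts(state: dict) -> dict:
    # Placeholder: a real version would compare sections semantically.
    (doc_a, text_a), (doc_b, text_b) = state["sections"]
    state["conflicts"] = [(doc_a, doc_b)] if text_a != text_b else []
    return state


@tool("generate_report")
def generate_report(state: dict) -> dict:
    state["report"] = f"{len(state['conflicts'])} conflict(s) found"
    return state


def run_plan(plan: List[str]) -> dict:
    """Run the named tools in order, passing shared state and recording a trace."""
    state: dict = {"trace": []}
    for name in plan:
        state = TOOLS[name](state)
        state["trace"].append(name)
    return state


result = run_plan(["search_documents", "analyze_conflicts", "generate_report"])
print(result["report"])
```

In a real agent the plan would be chosen by the LLM at each step rather than fixed in advance; the fixed list here just keeps the sketch short.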

Elasticsearch Powers Everything:

Hybrid search for intelligent document retrieval
Aggregations for real-time analytics dashboards
Vector embeddings for semantic similarity detection
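
As a rough sketch of what the first two items might look like as Elasticsearch search request bodies: the first combines a BM25 `match` query with a top-level `knn` clause, the second counts conflicts per severity with a `terms` aggregation. The field names (`content`, `content_embedding`, `severity`) and the helper function names are assumptions for illustration, not the project's actual mapping, and the bodies would still need to be sent with an Elasticsearch client.

```python
import json
from typing import List


def build_hybrid_search(query_text: str, query_vector: List[float], k: int = 10) -> dict:
    """Search body combining a BM25 match query with an approximate kNN clause."""
    return {
        # Lexical (BM25) side; "content" is an assumed text field.
        "query": {"match": {"content": query_text}},
        # Dense-vector side; "content_embedding" is an assumed dense_vector field.
        "knn": {
            "field": "content_embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": 10 * k,
        },
        "size": k,
    }


def build_severity_aggregation() -> dict:
    """Aggregation-only body: bucket documents by an assumed keyword field."""
    return {
        "size": 0,  # return buckets only, no hits
        "aggs": {"by_severity": {"terms": {"field": "severity"}}},
    }


# Toy 3-dimensional vector; a real embedding would come from a model.
search_body = build_hybrid_search("password minimum length", [0.1, 0.2, 0.3], k=5)
agg_body = build_severity_aggregation()
print(json.dumps(search_body, indent=2))
print(json.dumps(agg_body, indent=2))
```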

Tech Stack:

Streamlit for the interactive UI
Elasticsearch for search, indexing, and aggregations
Python for agent orchestration
LLM integration for reasoning and suggestions

Challenges we faced

Semantic vs. Literal Conflicts: "minimum 16 characters" and "at least 12 characters" describe the same requirement in different words, so keyword matching alone can miss that they conflict. We solved this with hybrid search combining keyword matching and vector similarity.
Confidence Calibration: How confident should the AI be when suggesting fixes? We implemented authority-based scoring that considers document recency and policy hierarchy.
Alert Fatigue: Initial versions flagged too many non-issues. We added severity classification and topic clustering to surface only actionable conflicts.
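
One way authority-based scoring of the kind described under Confidence Calibration could work is sketched below. This is an illustration under stated assumptions, not our exact formula: the `PolicyDoc` fields, the hierarchy ranks, the one-year recency half-life, and the 0.6/0.4 weighting are all hypothetical choices.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass
class PolicyDoc:
    name: str
    hierarchy_rank: int  # 1 = highest authority (assumed convention)
    last_updated: date


def authority_score(doc: PolicyDoc, today: date, max_rank: int = 3,
                    half_life_days: float = 365.0,
                    hierarchy_weight: float = 0.6) -> float:
    """Blend policy hierarchy and recency into a score in (0, 1]."""
    hierarchy = (max_rank - doc.hierarchy_rank + 1) / max_rank
    age_days = max((today - doc.last_updated).days, 0)
    recency = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
    return hierarchy_weight * hierarchy + (1 - hierarchy_weight) * recency


def suggest_winner(a: PolicyDoc, b: PolicyDoc, today: date) -> Tuple[str, float]:
    """Pick the more authoritative document and a confidence between 0.5 and 1."""
    score_a = authority_score(a, today)
    score_b = authority_score(b, today)
    winner = a if score_a >= score_b else b
    confidence = max(score_a, score_b) / (score_a + score_b)
    return winner.name, confidence


# Illustrative inputs only; the ranks and dates are made up for the example.
today = date(2024, 1, 1)
security = PolicyDoc("Security Policy", hierarchy_rank=1, last_updated=date(2024, 1, 1))
handbook = PolicyDoc("Employee Handbook", hierarchy_rank=3, last_updated=date(2024, 1, 1))
winner, confidence = suggest_winner(security, handbook, today)
print(winner, round(confidence, 3))
```

Dividing the winning score by the sum keeps confidence at 0.5 when the two documents are equally authoritative and pushes it toward 1 as the gap widens, which is one simple way to avoid overstating certainty on close calls.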

What we learned

Elasticsearch's aggregation framework is incredibly powerful for real-time document analytics
For complex tasks like conflict detection, multi-step agent reasoning worked far better for us than single-prompt approaches
The gap between "finding problems" and "suggesting solutions" is where real value lives