Inspiration
Duplicate records are a persistent problem in enterprise systems such as CRM, KYC, customer onboarding, and support platforms. Traditional deduplication systems often behave like black boxes—flagging or merging records without explaining why a decision was made. This lack of transparency leads to mistrust, manual rework, audit challenges, and the risk of incorrect merges. We were inspired to build an explainable, agent‑driven approach that not only detects duplicates but also clearly explains the reasoning behind each decision, helping data quality teams make faster and more confident decisions.
What it does
The Explainable Duplicate Detection Agent is a multi‑step AI agent built using Elastic Agent Builder that automates duplicate record analysis and decision support. The agent:
- Accepts a natural-language request such as “Check duplicates for Sai Praneeth”
- Searches Elasticsearch for potential duplicate records
- Retrieves full document details for comparison
- Analyzes similarity across multiple fields (name, phone, email, address)
- Calculates a confidence score for each duplicate match
- Explains why records are considered duplicates
- Recommends an action: MERGE, REVIEW, or IGNORE
- Can simulate a merge and generate a master (golden) record with full audit details
This significantly reduces manual effort while improving trust and explainability in deduplication workflows.
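As an illustration of the last two capabilities, here is a minimal Python sketch of how a confidence score could map to a recommendation and how a merge could be simulated without writing anything back. The thresholds, the function names `recommend` and `simulate_merge`, and the "prefer the primary record's non-empty value" rule are all assumptions made for this example; the project does not state its actual cut-offs or merge policy.

```python
from typing import Any, Dict

# Hypothetical cut-offs, chosen only for illustration.
MERGE_THRESHOLD = 0.85
REVIEW_THRESHOLD = 0.60


def recommend(confidence: float) -> str:
    """Map a confidence score in [0, 1] to MERGE, REVIEW, or IGNORE."""
    if confidence >= MERGE_THRESHOLD:
        return "MERGE"
    if confidence >= REVIEW_THRESHOLD:
        return "REVIEW"
    return "IGNORE"


def simulate_merge(primary: Dict[str, Any], duplicate: Dict[str, Any]) -> Dict[str, Any]:
    """Build a master (golden) record in memory and record where each value came from.

    Assumed policy: keep the primary record's value when it is non-empty,
    otherwise fall back to the duplicate's value.
    """
    master: Dict[str, Any] = {}
    audit = []
    for key in sorted(set(primary) | set(duplicate)):
        primary_value = primary.get(key)
        duplicate_value = duplicate.get(key)
        if primary_value:
            master[key] = primary_value
            source = "primary"
        else:
            master[key] = duplicate_value
            source = "duplicate"
        audit.append(
            {
                "field": key,
                "source": source,
                "primary_value": primary_value,
                "duplicate_value": duplicate_value,
            }
        )
    return {"master": master, "audit": audit}
```

Because `simulate_merge` only returns a dictionary, a reviewer can inspect the proposed master record and its per-field audit trail before any real merge is approved.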
How we built it
We built the project entirely on the Elastic Stack using the following components:
- Elasticsearch to store customer records and perform similarity searches
- Elastic Agent Builder to create a custom multi-step AI agent
- Custom Agent Builder Tool (find_duplicates) to query Elasticsearch using fuzzy search and relevance scoring
- Built-in Agent Builder tools such as platform.core.get_document_by_id to retrieve full documents
- Multi-step reasoning logic inside the agent to:
  - Identify relevant indices
  - Execute the duplicate search tool
  - Fetch full records
  - Compare attributes
  - Generate explanations and recommendations
The agent maintains conversational context, allowing follow‑up questions such as “Why was this record marked as high confidence?” or “Show the merged master record.”
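To make the fuzzy search step more concrete, the following Python sketch builds an Elasticsearch Query DSL body of the kind a duplicate search could send. It is not the definition of the find_duplicates tool, which is not shown here. The field names (`name`, `email`, `phone`), the boost values, and the helper name `build_duplicate_query` are assumptions for illustration only.

```python
from typing import Any, Dict, Optional


def build_duplicate_query(
    name: str,
    email: Optional[str] = None,
    phone: Optional[str] = None,
    size: int = 10,
) -> Dict[str, Any]:
    """Return a search request body that ranks likely duplicates by relevance.

    The name clause tolerates typos via fuzziness; email and phone clauses
    are optional and boosted because they are assumed to be stronger signals.
    """
    should = [
        {"match": {"name": {"query": name, "fuzziness": "AUTO"}}},
    ]
    if email:
        should.append({"match": {"email": {"query": email, "boost": 2.0}}})
    if phone:
        should.append({"match": {"phone": {"query": phone, "boost": 2.0}}})
    return {
        "size": size,
        "query": {
            "bool": {
                "should": should,
                # At least one clause must match for a record to be a candidate.
                "minimum_should_match": 1,
            }
        },
    }
```

The returned dictionary can be serialized to JSON and sent to the search API of whichever index holds the customer records. The `_score` of each hit then serves as a first relevance signal before the agent fetches and compares full documents.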
Challenges we ran into
- Balancing confidence scoring: Determining how much weight to assign to different attributes (e.g., phone vs. name vs. address) required iteration and tuning.
- Ensuring explainability: Making the agent’s reasoning clear and defensible was more important than just returning a similarity score.
- Tool orchestration: Designing the agent to reliably call the right tools in the correct order (search → fetch → analyze) required careful instruction design.
- Avoiding false positives: We had to ensure that minor variations (such as address granularity or name initials) did not incorrectly reduce confidence when stronger signals existed.
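The weighting and explainability challenges above can be illustrated with a small Python sketch of a weighted confidence score that also returns per-field reasons. The weights, the use of `difflib.SequenceMatcher` for text similarity, and the rule of skipping fields that are missing from either record are assumptions chosen for this example, not the project's tuned values or actual scoring logic.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, List, Tuple

# Hypothetical weights: strong identifiers count more than free-text fields.
WEIGHTS = {"phone": 0.35, "email": 0.35, "name": 0.20, "address": 0.10}


def _normalize(value: Any) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(str(value).lower().split())


def field_similarity(field: str, a: Any, b: Any) -> float:
    """Exact match for identifiers, character-level ratio for free text."""
    a_norm, b_norm = _normalize(a), _normalize(b)
    if field in ("phone", "email"):
        return 1.0 if a_norm == b_norm else 0.0
    return SequenceMatcher(None, a_norm, b_norm).ratio()


def confidence(rec_a: Dict[str, Any], rec_b: Dict[str, Any]) -> Tuple[float, List[str]]:
    """Return a score in [0, 1] plus human-readable reasons for each field.

    Fields missing from either record are skipped and the remaining weights
    are renormalized, so an absent address does not lower the score.
    """
    weighted_sum = 0.0
    weight_used = 0.0
    reasons: List[str] = []
    for field, weight in WEIGHTS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if not a or not b:
            continue
        similarity = field_similarity(field, a, b)
        weighted_sum += weight * similarity
        weight_used += weight
        reasons.append(f"{field}: similarity {similarity:.2f} (weight {weight:.2f})")
    score = weighted_sum / weight_used if weight_used else 0.0
    return score, reasons
```

Returning the reasons alongside the score is what lets an agent answer a follow-up such as “Why was this record marked as high confidence?” with the per-field evidence rather than a bare number.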
Accomplishments that we're proud of
- Built a fully functional multi-step agent using Elastic Agent Builder
- Successfully integrated custom tools + Elasticsearch data
- Delivered clear, human-readable explanations for duplicate decisions
- Implemented merge simulation with master record generation
- Created a solution that is auditable, deterministic, and enterprise-ready
- Demonstrated a real productivity improvement for data quality teams
What we learned
- Explainability is just as important as accuracy in real-world AI systems
- Agent-based workflows are a natural fit for data quality and operational tasks
- Elasticsearch is not just a search engine; it is a powerful reasoning and retrieval platform for agentic applications
- Clear agent instructions dramatically improve tool usage reliability
- Multi-step reasoning builds more trust than single-prompt AI answers
What's next for Explainable Duplicate Detection Agent for Data Quality
- Add Elastic Workflows to automate merges after approval
- Introduce region- and domain-specific dedupe rules
- Store audit logs and decision history in Elasticsearch
- Add time-series monitoring for duplicate trends
Built With
- elastic-agent-builder
- elastic-cloud
- elasticsearch
- elasticsearch-search-api
- es|ql
- json
- kibana
- llm-based
- rest-apis