Skip to content

rituraj981/ETL_Anomaly_Detection-using-Python

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Python ETL & Anomaly Detection

Overview

This project consolidates orders, payments, and refunds data, performs ETL cleaning, computes daily metrics, and detects anomalies in e-commerce transactions.

Requirements

  • Python 3.9+
  • pandas
  • openpyxl (for reading Excel files)
  • argparse (built-in)

Install dependencies:

pip install pandas openpyxl

Folder Structure

ETL_Anomaly_Detection/
├── dataset/
│   ├── orders.xlsx
│   ├── payments.xlsx
│   └── refunds.xlsx
├── outputs/           # Auto-generated
├── etl_anomaly.py     # Main script
├── demo_outputs.py    # get demo outputs
└── README.md

How to Run

Basic run command:

python etl_anomaly.py --orders dataset/orders.xlsx --payments dataset/payments.xlsx --refunds dataset/refunds.xlsx

Optional arguments:

  • --outdir : specify output folder (default: outputs)
  • --date-from YYYY-MM-DD : filter orders from this date
  • --date-to YYYY-MM-DD : filter orders until this date
  • --offhours "HH:MM-HH:MM" : flag orders in off-hours
  • --weekend : flag orders on weekend

Example with all options:

python etl_anomaly.py --orders dataset/orders.xlsx --payments dataset/payments.xlsx --refunds dataset/refunds.xlsx --outdir outputs --date-from 2025-06-15 --date-to 2025-07-15 --offhours "21:00-09:00" --weekend

or

suggested commands:

----------------------------
Basic Run (default output folder):

python etl_anomaly.py --orders dataset/orders.xlsx --payments dataset/payments.xlsx --refunds dataset/refunds.xlsx
----------------------------
Specify Output Folder:

  python etl_anomaly.py --orders dataset/orders.xlsx --payments dataset/payments.xlsx --refunds dataset/refunds.xlsx --outdir outputs_custom
----------------------------
Filter by Date Range:

  python etl_anomaly.py --orders dataset/orders.xlsx --payments dataset/payments.xlsx --refunds dataset/refunds.xlsx --date-from 2025-06-15 --date-to 2025-07-15
----------------------------
Enable Off-Hours Detection:

  python etl_anomaly.py --orders dataset/orders.xlsx --payments dataset/payments.xlsx --refunds dataset/refunds.xlsx --offhours "21:00-09:00"
----------------------------
Enable Weekend Detection:

  python etl_anomaly.py --orders dataset/orders.xlsx --payments dataset/payments.xlsx --refunds dataset/refunds.xlsx --weekend
----------------------------
Full Command (All Options Combined):

  python etl_anomaly.py --orders dataset/orders.xlsx --payments dataset/payments.xlsx --refunds dataset/refunds.xlsx --outdir outputs --date-from 2025-06-15 --date-to 2025-07-15 --offhours "21:00-09:00" --weekend
----------------------------

## Outputs
After running, `outputs/` will contain:
- `summary.csv` : daily metrics and anomaly counts
- `anomalies.csv` : detailed anomalies per order
- `report.json` : structured JSON report with detailed anomalies and metadata


## Demo outputs
Basic run command:
```bash
python demo_outputs.py

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages