An AI-powered assistant for classifying multi-modal documents into Public, Confidential, Highly Sensitive, or Unsafe categories.
- Multi-modal preprocessing for text, PDFs, and images with optional OCR.
- Configurable prompt tree library leveraging LangChain/LlamaIndex orchestration patterns.
- Dual-LLM cross verification with safety keyword detection (sketched below).
- Citation generation and evidence tracking.
- Human-in-the-loop feedback persistence powered by SQLite and SQLAlchemy.
- FastAPI backend with Streamlit dashboard for interactive and batch workflows.
- Automated report generation in JSON and PDF formats.
- Dockerized deployment, comprehensive tests, and formatting/linting via Makefile.
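
The dual-LLM cross verification drives the final label. Below is a minimal sketch of the idea, assuming a keyword pre-screen and a "stricter label wins" tie-break; the function names, keyword list, and escalation rule are illustrative assumptions, not this repo's actual modules:

```python
# Hypothetical sketch of dual-LLM cross verification with a safety keyword
# pre-check; the real pipeline in src/ is config-driven and may differ.
LABELS = ["Public", "Confidential", "Highly Sensitive", "Unsafe"]
SAFETY_KEYWORDS = {"exploit", "malware", "weapon"}  # assumed example list


def classify_with_model(model_name: str, text: str) -> str:
    """Single LLM classification pass (see the Ollama call sketch below)."""
    raise NotImplementedError


def cross_verify(text: str) -> str:
    # Keyword screen short-circuits to Unsafe before any LLM call.
    if any(kw in text.lower() for kw in SAFETY_KEYWORDS):
        return "Unsafe"
    primary = classify_with_model("risk-primary-8b", text)
    secondary = classify_with_model("risk-secondary-13b", text)
    # Agreement is accepted; disagreement escalates to the stricter label.
    if primary == secondary:
        return primary
    return max(primary, secondary, key=LABELS.index)
```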
```bash
python -m venv .datathon_env
source .datathon_env/bin/activate
pip install -r requirements.txt
```

- Install Ollama on the host where this backend runs and start the daemon with `ollama serve` (it listens on `http://127.0.0.1:11434` by default).
- Pull the models referenced in `config/config.yaml`, for example:

  ```bash
  ollama pull llama3.1:8b
  ollama pull llama3.1:13b
  ```

- Optional: verify the service is reachable with `curl http://127.0.0.1:11434/api/tags`.
- Adjust `config/config.yaml` if you need a different model name, base URL, or generation options. The default template configures two local models:

  ```yaml
  ollama:
    base_url: http://127.0.0.1:11434
    models:
      primary:
        name: risk-primary-8b
        model: llama3.1:8b
      secondary:
        name: risk-secondary-13b
        model: llama3.1:13b
  ```

- Restart `uvicorn` (or the Docker container) so the new configuration is loaded. The backend will then call the local Ollama HTTP API for both classifier passes (see the sketch after this list).
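
To confirm both local models answer before wiring them into the classifier, a small script against Ollama's standard `/api/generate` endpoint works; the prompt and model tags below are just the defaults from the config above:

```python
import requests

OLLAMA_URL = "http://127.0.0.1:11434"


def ollama_generate(model: str, prompt: str) -> str:
    # Non-streaming call to the Ollama REST API; the JSON body fields
    # (model, prompt, stream) and the "response" key are standard Ollama.
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]


for model in ("llama3.1:8b", "llama3.1:13b"):
    print(model, "->", ollama_generate(model, "Reply with the word OK."))
```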
- Authenticate the Ollama CLI (one time):

  ```bash
  ollama signin
  ```

- Register the cloud models you plan to call (examples):

  ```bash
  ollama pull gpt-oss:120b-cloud
  ollama pull deepseek-v3.1:671b-cloud
  ```

- Copy your Ollama Cloud API key from https://ollama.com.
- Export it before starting the backend (or place it in your shell profile):

  ```bash
  export OLLAMA_API_KEY="sk_live_xxx"
  ```

- The default `config/config.yaml` already points to `https://api.ollama.ai` and reads the key via `api_key_env: OLLAMA_API_KEY`, so no further edits are required. If you prefer to hardcode the key instead, replace the env entry with `api_key: "sk_live_xxx"` (only do this for local testing; a sketch of how this lookup works follows this list).
- Restart `uvicorn src.main:app --reload`. The classifier will now send requests to Ollama Cloud using your key while retaining the same dual-model workflow.
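
For reference, here is a minimal sketch of how the `api_key_env` indirection can be resolved when loading the config; it assumes the cloud template adds `api_key_env` (or `api_key`) under the `ollama` block, and the repo's actual schema handling may differ:

```python
import os

import yaml  # PyYAML

with open("config/config.yaml") as fh:
    cfg = yaml.safe_load(fh)

ollama_cfg = cfg["ollama"]
# Prefer a hardcoded api_key; otherwise read the env var named by api_key_env.
api_key = ollama_cfg.get("api_key") or os.environ[
    ollama_cfg.get("api_key_env", "OLLAMA_API_KEY")
]
headers = {"Authorization": f"Bearer {api_key}"}  # sent with each cloud request
```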
Run the backend and the Streamlit dashboard:

```bash
uvicorn src.main:app --reload
streamlit run ui_dashboard/app.py
```

Run the tests and the formatting/linting targets:

```bash
make test
make lint
make format
```

Build and run the Docker image:

```bash
docker build -t doc-assistant .
docker run -p 8000:8000 doc-assistant
```

See `demo.ipynb` for an end-to-end walkthrough of the sample test cases (TC1–TC5).
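
For a quick smoke test once the backend is running, something like the snippet below can exercise the API; the `/classify` route and payload shape are assumptions, so check the generated FastAPI docs at `http://localhost:8000/docs` for the real schema:

```python
import requests

# Hypothetical endpoint and payload; consult /docs for the actual routes.
resp = requests.post(
    "http://localhost:8000/classify",
    json={"text": "Quarterly revenue figures, internal use only."},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```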