Byeongjun Moon, 2024, bj.moon@usc.edu
A machine learning-powered tool that analyzes GitHub repository documentation quality and suggests improvements using state-of-the-art language models.
- Analyzes README quality using DistilBERT for semantic understanding
- Evaluates code-documentation alignment using CodeBERT
- Generates enhanced documentation using OPT-125m (optimized for Apple Silicon)
- Provides detailed section-by-section analysis with quality scores
- Suggests actionable improvements based on best practices
- Provides a web interface for easy interaction
src/
├── models/ # Core ML models and analyzers
│ ├── code_documentation_analyzer.py # CodeBERT-based code analysis
│ ├── quality_enhancer.py # OPT-125m-based enhancement
│ ├── unified_scorer.py # Combined scoring system
├── trainers/ # Model training scripts
└── main.py # Application entry point
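The internals of `unified_scorer.py` are not shown here. As a rough sketch only, one plausible way to merge per-model scores into a single value is a weighted average. The function name `combine_scores`, the score keys, and the weights below are hypothetical and are not taken from the repository.

```python
from typing import Dict


def combine_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of component scores, clamped to [0, 1].

    Hypothetical helper: the names and weights are illustrative assumptions.
    """
    # Only use components that have both a score and a positive weight.
    used = {k: w for k, w in weights.items() if k in scores and w > 0}
    if not used:
        raise ValueError("no overlapping components between scores and weights")
    total_weight = sum(used.values())
    weighted_sum = sum(scores[k] * w for k, w in used.items())
    # Clamp in case a component score falls outside the expected range.
    return max(0.0, min(1.0, weighted_sum / total_weight))


if __name__ == "__main__":
    example = combine_scores(
        {"readme_quality": 0.8, "code_alignment": 0.6},
        {"readme_quality": 0.5, "code_alignment": 0.5},
    )
    print(f"combined score: {example:.2f}")
```

Normalizing by the total weight means the weights do not have to sum to 1, which keeps the sketch usable when a component is missing for a given repository.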
- Clone the repository:
    git clone https://github.com/yourusername/docugen.git
    cd docugen
- Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt
- Set up environment variables:
    cp .env.example .env
    # Edit .env with your GitHub token
- Start the web interface:
    python src/main.py
- Open your browser and navigate to:
    http://127.0.0.1:7862
- Enter a GitHub repository URL to analyze
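To illustrate what happens with the repository URL entered in the last step, here is a minimal sketch of how such a URL could be split into an owner and repository name, and how the token from `.env` might be attached to GitHub API requests. The helper names `parse_repo_url` and `auth_headers` and the variable name `GITHUB_TOKEN` are assumptions; check `.env.example` for the name the project actually expects.

```python
import os
from typing import Dict, Tuple
from urllib.parse import urlparse


def parse_repo_url(url: str) -> Tuple[str, str]:
    """Split a GitHub repository URL into (owner, repo)."""
    parsed = urlparse(url.strip())
    if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
        raise ValueError(f"not a GitHub URL: {url!r}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected https://github.com/<owner>/<repo>, got {url!r}")
    owner, repo = parts[0], parts[1]
    # Drop the trailing ".git" used in clone URLs.
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


def auth_headers() -> Dict[str, str]:
    """Build request headers, adding the token only if one is set.

    GITHUB_TOKEN is an assumed variable name, not confirmed by the project.
    """
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


if __name__ == "__main__":
    print(parse_repo_url("https://github.com/yourusername/docugen.git"))
```

Validating the host up front gives a clear error for non-GitHub URLs before any network request is made.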
- Used for semantic analysis of documentation content
- Evaluates code-documentation alignment
- Measures documentation completeness and quality
- Analyzes docstrings and code comments
- Lightweight model optimized for Apple Silicon
- Generates contextual documentation improvements
- Memory-efficient operation
- Enhanced performance on M1/M2 chips
- Custom model for quality scoring
- Fine-tuned on high-quality documentation examples
- Evaluates clarity, completeness, and structure
- Provides section-specific quality metrics
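Section-specific metrics require the README to be split into sections first. The following is a minimal, model-free sketch of that step, assuming the README uses Markdown `#` headings. The names `split_sections` and `section_metrics` are hypothetical, and the simple counts stand in for the model-based scores described above rather than reproducing them.

```python
import re
from typing import Dict, List, Optional, Tuple

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
LIST_ITEM = re.compile(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", flags=re.M)


def split_sections(markdown: str) -> List[Tuple[str, str]]:
    """Split Markdown text into (heading, body) pairs using '#' headings."""
    sections: List[Tuple[str, str]] = []
    title: Optional[str] = None  # None marks text before the first heading
    body: List[str] = []
    in_fence = False

    def flush() -> None:
        text = "\n".join(body).strip()
        # Keep headed sections even when empty; skip an empty preamble.
        if title is not None or text:
            sections.append((title or "(preamble)", text))

    for line in markdown.splitlines():
        # Track fenced code so '# comment' lines inside it are not headings.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            title, body = match.group(2), []
        else:
            body.append(line)
    flush()
    return sections


def section_metrics(body: str) -> Dict[str, int]:
    """Very rough indicators for one section (illustrative only)."""
    return {
        "words": len(body.split()),
        "code_blocks": body.count("```") // 2,
        "list_items": len(LIST_ITEM.findall(body)),
    }


if __name__ == "__main__":
    readme = "Intro\n\n# Install\n- pip install x\n\n## Usage\nRun it.\n"
    for heading, text in split_sections(readme):
        print(heading, section_metrics(text))
```

Each `(heading, body)` pair could then be passed to whichever scorer is in use, so that scores are reported per section instead of for the whole file.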
- Add your GitHub token to .env
- Run the data collection script:
    python src/models/getting_data.py
- Train the README quality model:
    python src/trainers/train_readme_model.py
- Run the main script to start the web interface:
    python src/main.py
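Since data collection depends on the token in `.env`, it can help to fail early when the token is missing. The sketch below parses a simple `KEY=VALUE` file using only the standard library. The project may load its configuration differently (for example with a dedicated dotenv package), and `read_env_file`, `require_token`, and the variable name `GITHUB_TOKEN` are assumptions.

```python
import os
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and '#' comments."""
    values: Dict[str, str] = {}
    file = Path(path)
    if not file.exists():
        return values
    for raw in file.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def require_token(env: Dict[str, str], name: str = "GITHUB_TOKEN") -> str:
    """Return the token from the parsed file or the process environment."""
    token = env.get(name) or os.environ.get(name, "")
    if not token:
        raise RuntimeError(f"{name} is not set; add it to .env before collecting data")
    return token


if __name__ == "__main__":
    try:
        require_token(read_env_file())
        print("GitHub token found")
    except RuntimeError as err:
        print(err)
```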