- Put the dataset in '02/src' as 'Spotify_Dataset_V3.csv'.
- Set up a Python 3 virtual environment and install the following packages: pandas, numpy, matplotlib, scikit-learn, seaborn, imbalanced-learn (imported as imblearn), xgboost, spotipy. pickle is part of the Python standard library and does not need to be installed.
- Run the individual '.ipynb' notebooks in '02/src': 'Data Processing.ipynb' for data processing, 'EDA.ipynb' for exploratory data analysis, 'modelCreation.ipynb' for model setup and performance evaluation, and 'futurePredictions.ipynb' for obtaining the model's predictions on unseen examples.
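
Before opening the notebooks, it can help to confirm that the files named in the steps above are where the notebooks expect them. The following is a minimal sketch, not part of the project: `missing_files` and `EXPECTED_FILES` are hypothetical names, and the script assumes it is run from the repository root so that '02/src' resolves correctly.

```python
from pathlib import Path

# File names as given in the steps above.
EXPECTED_FILES = [
    "Spotify_Dataset_V3.csv",
    "Data Processing.ipynb",
    "EDA.ipynb",
    "modelCreation.ipynb",
    "futurePredictions.ipynb",
]


def missing_files(src_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in src_dir."""
    src = Path(src_dir)
    return [name for name in expected if not (src / name).is_file()]


if __name__ == "__main__":
    # Assumption: the current working directory is the repository root.
    missing = missing_files("02/src")
    if missing:
        print("Missing from 02/src:", ", ".join(missing))
    else:
        print("All expected files found in 02/src.")
```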
Versions used: python 3.11, pandas 2.0.3, numpy 1.23.5, matplotlib 3.7.2, scikit-learn 1.2.2, seaborn 0.12.2, imbalanced-learn 0.10.1, pickle 0.7.5, xgboost 1.7.3, spotipy 2.23.0
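
To check whether an existing environment matches the versions listed above, something like the sketch below could be used. It relies on `importlib.metadata` from the standard library; `PINNED` and `check_versions` are hypothetical names introduced here for illustration. pickle is omitted on the assumption that it refers to the standard-library module, which has no separately installed distribution to query.

```python
from importlib import metadata

# Versions copied from the list above. pickle is left out (assumption: it is
# the standard-library module, so there is no installed distribution to check).
PINNED = {
    "pandas": "2.0.3",
    "numpy": "1.23.5",
    "matplotlib": "3.7.2",
    "scikit-learn": "1.2.2",
    "seaborn": "0.12.2",
    "imbalanced-learn": "0.10.1",
    "xgboost": "1.7.3",
    "spotipy": "2.23.0",
}


def check_versions(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every package that differs.

    `found` is None when the package is not installed.
    """
    problems = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

A mismatch does not necessarily mean the notebooks will fail; the check only reports where the environment differs from the listed versions.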
Exploration of a Spotify dataset: finding trends in song popularity and building a model to predict a song's popularity from its characteristics.
You can find the dataset under this link