- Link to BIRD-SQL Landing Page
To recreate the `ic_bird.json` and `ic_bird_dbs.sql` files, do the following:
- Download the `dev` dataset from the BIRD-SQL landing page, or from here.
- Run the script below, passing the local path to the BIRD-SQL dataset as an argument:
  `python transform.py <Path to BIRD dataset>`
- You can also run this script on the
The script makes a few simple adjustments to the BIRD-SQL dataset to create an InterCode-compatible version.
To create the `ic_bird.json` task instances, the `transform.py` script iterates through the `dev.json` file and performs the following steps:
- Renames the following fields:
  - `question` → `query`
  - `db_id` → `db`
- Retrieves each task instance's corresponding solution from the `dev_gold.sql` file and saves it under the `gold` key.
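The two steps above can be sketched roughly as follows. This is a minimal illustration, not the actual `transform.py` code: it assumes `dev.json` is a JSON list whose entries line up, in order, with the non-blank lines of `dev_gold.sql`, and that each gold line has the form `<SQL>\t<db_id>`. The function name `build_task_instances` is hypothetical.

```python
import json
from pathlib import Path


def build_task_instances(bird_root):
    """Hypothetical sketch of the ic_bird.json step.

    Assumes dev.json is a list of task dicts aligned with the non-blank
    lines of dev_gold.sql, where each gold line is "<SQL>\t<db_id>".
    """
    root = Path(bird_root)
    records = json.loads((root / "dev.json").read_text())
    # Drop blank lines so that gold lines align with task records by index
    gold_lines = [
        line for line in (root / "dev_gold.sql").read_text().splitlines()
        if line.strip()
    ]
    if len(records) != len(gold_lines):
        raise ValueError("dev.json and dev_gold.sql have different lengths")

    instances = []
    for record, gold_line in zip(records, gold_lines):
        instance = dict(record)
        instance["query"] = instance.pop("question")  # question -> query
        instance["db"] = instance.pop("db_id")        # db_id -> db
        # Keep only the SQL; the trailing tab-separated db_id is dropped
        instance["gold"] = gold_line.rsplit("\t", 1)[0]
        instances.append(instance)
    return instances
```

The result could then be written out with `json.dump(..., indent=2)`. Fields the sketch does not rename are passed through unchanged.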
To create the `ic_bird_dbs.sql` database file, the `transform.py` script performs the following steps:
- Creates a mapping of each database to its tables based on the `.csv` files within the `dev_databases` folder.
- Iterates through the mapping to create a single `ic_bird_dbs.sql` file containing the procedure for creating all databases and tables in a MySQL-compatible format.
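Those two steps might look something like the sketch below. It is illustrative only and makes several assumptions: each database is a sub-folder of `dev_databases`, each table is represented by a `<table>.csv` file somewhere beneath that folder, and the per-table `CREATE TABLE`/`INSERT` statements come from a caller-supplied function. The names `map_databases_to_tables`, `render_mysql_script`, and `table_ddl` are hypothetical and do not come from `transform.py`.

```python
from pathlib import Path
from typing import Callable, Dict, List


def map_databases_to_tables(databases_dir: str) -> Dict[str, List[str]]:
    """Map each database folder to the table names implied by its .csv files.

    Assumes one sub-folder per database, with one <table>.csv per table
    somewhere beneath it.
    """
    mapping: Dict[str, List[str]] = {}
    for db_dir in sorted(p for p in Path(databases_dir).iterdir() if p.is_dir()):
        tables = sorted(f.stem for f in db_dir.rglob("*.csv"))
        if tables:
            mapping[db_dir.name] = tables
    return mapping


def render_mysql_script(
    mapping: Dict[str, List[str]],
    table_ddl: Callable[[str, str], str],
) -> str:
    """Concatenate statements for every database into one MySQL script.

    table_ddl is a hypothetical callable returning the statements that
    create (and populate) a given table of a given database.
    """
    lines: List[str] = []
    for db, tables in mapping.items():
        # Backticks quote identifiers in MySQL
        lines.append(f"CREATE DATABASE IF NOT EXISTS `{db}`;")
        lines.append(f"USE `{db}`;")
        for table in tables:
            lines.append(table_ddl(db, table))
    return "\n".join(lines) + "\n"
```

Keeping the table DDL behind a callable separates the folder walk from the SQLite-to-MySQL conversion, which is the part most likely to need per-database fixes.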
The `ic_bird.json` and `ic_bird_dbs.sql` files currently cover the task instances and tables for the `debit_card_specializing`, `superhero`, and `toxicology` databases. Once minor SQLite-to-MySQL conversion issues are resolved, we plan to migrate more of the BIRD-SQL dataset to be InterCode compatible.
Note that the transformation procedure can also be applied to the `train` dataset (linked here) by manually changing any `dev` references in the `transform.py` script to `train`.
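As an alternative to hand-editing the `dev` references, the input paths could be derived from a split name. The sketch below is hypothetical (`split_paths` is not part of `transform.py`) and assumes the train split mirrors the dev layout, with each `dev` prefix replaced by `train` (`train.json`, `train_gold.sql`, `train_databases/`).

```python
from pathlib import Path
from typing import Dict


def split_paths(bird_root: str, split: str = "dev") -> Dict[str, Path]:
    """Return the input paths for a BIRD split ("dev" or "train").

    Assumes the train split uses the same file layout as dev, with the
    "dev" prefix replaced by "train".
    """
    if split not in ("dev", "train"):
        raise ValueError(f"unknown split: {split!r}")
    root = Path(bird_root)
    return {
        "tasks": root / f"{split}.json",
        "gold": root / f"{split}_gold.sql",
        "databases": root / f"{split}_databases",
    }
```

For example, `split_paths("/data/bird", "train")["gold"]` points at `/data/bird/train_gold.sql`. Validating the split name up front means a typo fails immediately rather than as a missing-file error deeper in the script.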