BIRD-SQL Dataset Transformation

Link to BIRD-SQL Landing Page

Set Up

To recreate the ic_bird.json and ic_bird_dbs.sql files, do the following:

Download the dev dataset from the BIRD-SQL landing page, or from here.
Run the script below, passing the local path to the BIRD-SQL dataset as an argument:

python transform.py <Path to BIRD dataset>

You can also run this script on the

Dataset Description

The script makes a few simple adjustments to the BIRD-SQL dataset to create an InterCode-compatible version.

To create the ic_bird.json task instances, the transform.py script iterates through the dev.json file and performs the following steps:

Changes the names of the following fields:
- question → query
- db_id → db
Retrieves each task instance's corresponding solution from the dev_gold.sql file and saves it to the gold key.
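The two steps above can be sketched as follows. This is a minimal illustration, not the actual contents of transform.py: the function name `to_intercode` and the sample record are hypothetical, and the sketch assumes that each line of dev_gold.sql holds the SQL statement followed by a tab and the database id, in the same order as the entries of dev.json.

```python
import json


def to_intercode(records, gold_lines):
    """Rename fields and attach the gold SQL to each task instance."""
    converted = []
    for record, gold_line in zip(records, gold_lines):
        item = dict(record)  # copy so the input record is left untouched
        item["query"] = item.pop("question")  # question -> query
        item["db"] = item.pop("db_id")        # db_id -> db
        # Assumption: a gold line looks like "<SQL>\t<db_id>"; keep only the SQL.
        item["gold"] = gold_line.rstrip("\n").split("\t")[0]
        converted.append(item)
    return converted


# Made-up sample data, for illustration only.
sample_records = [{"question": "How many heroes are there?", "db_id": "superhero"}]
sample_gold = ["SELECT COUNT(*) FROM superhero\tsuperhero\n"]

converted = to_intercode(sample_records, sample_gold)
print(json.dumps(converted, indent=2))
```

Pairing the two inputs with `zip` relies on the ordering assumption stated above; if the files were not aligned, the gold statements would need to be matched by some other key.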

To create the ic_bird_dbs.sql database, the transform.py script performs the following steps:

Creates a mapping from each database to its tables, based on the .csv files within the dev_databases folder.
Iterates through the mapping to write a single ic_bird_dbs.sql file containing the statements that create every database and its tables in a MySQL-compatible format.
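A rough sketch of the mapping-then-render idea is shown below. It is an assumption-heavy illustration rather than the real script: the function names are hypothetical, it assumes that each first-level folder under dev_databases is one database and that each .csv file stem is a table name, and it only emits a skeleton, since the column definitions and row data are not described here. The folder and file names in the demo are made up.

```python
import tempfile
from pathlib import Path


def map_databases_to_tables(root):
    """Map each database folder name to the table names found as .csv files."""
    mapping = {}
    for csv_path in sorted(Path(root).rglob("*.csv")):
        # Assumption: the first path component under root is the database name.
        db_name = csv_path.relative_to(root).parts[0]
        mapping.setdefault(db_name, []).append(csv_path.stem)
    return mapping


def render_skeleton(mapping):
    """Render one SQL script that creates every database in the mapping."""
    lines = []
    for db_name, tables in mapping.items():
        lines.append(f"CREATE DATABASE IF NOT EXISTS `{db_name}`;")
        lines.append(f"USE `{db_name}`;")
        for table in tables:
            # Placeholder only: real CREATE TABLE / INSERT statements go here.
            lines.append(f"-- table: {table}")
    return "\n".join(lines) + "\n"


# Demo on a throwaway directory with a made-up layout.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("superhero/database_description/superpower.csv",
                "toxicology/database_description/atom.csv"):
        path = Path(tmp) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("column_name\n")
    mapping = map_databases_to_tables(tmp)
    script = render_skeleton(mapping)

print(script)
```

Backtick-quoted identifiers are MySQL syntax, which is why the skeleton uses them instead of double quotes.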

The ic_bird.json and ic_bird_dbs.sql files currently reflect the task instances and tables for the debit_card_specializing, superhero, and toxicology databases. Once minor SQLite-to-MySQL conversion issues are resolved, we plan to migrate more of the BIRD-SQL dataset to an InterCode-compatible format.

Note that the transformation procedure can also be applied to the train dataset (linked here) by manually changing every dev reference in the transform.py script to train.
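One way to avoid editing each reference by hand would be to derive the file names from a split name. The helper below is a hypothetical sketch and is not part of transform.py; it assumes the train split follows the same naming pattern as the dev files mentioned above (dev.json, dev_gold.sql, dev_databases).

```python
from pathlib import Path


def split_paths(root, split="dev"):
    """Build the expected input paths for a given split (hypothetical helper)."""
    root = Path(root)
    return {
        "instances": root / f"{split}.json",       # task instances
        "gold": root / f"{split}_gold.sql",        # gold SQL statements
        "databases": root / f"{split}_databases",  # folder holding the databases
    }


dev_paths = split_paths("bird_data")
train_paths = split_paths("bird_data", split="train")
print(train_paths["instances"].name)
```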
