Data License Notice
Data in data/ is adapted from the Spider dataset (Yu et al., 2018),
which is distributed under CC BY-SA 4.0.
We retrieved question/SQL pairs from the xlangai/spider HuggingFace mirror
and SQLite databases from the taoyds/spider GitHub mirror, then curated a
10-database subset, derived gold answers by executing the gold SQL, and
generated SFT trajectories from those artifacts.
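As a rough illustration of the "derived gold answers by executing the gold SQL" step, the sketch below runs a query against a SQLite database and records the returned rows. It is not the project's actual script: the function names `derive_gold_answer` and `build_demo_db`, the `singer` table, and its rows are all hypothetical stand-ins for a real Spider database.

```python
import sqlite3


def derive_gold_answer(conn, gold_sql):
    """Execute a gold SQL query and return all result rows as a list of tuples."""
    cursor = conn.execute(gold_sql)
    return cursor.fetchall()


def build_demo_db():
    """Create a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)"
    )
    conn.executemany(
        "INSERT INTO singer (name, age) VALUES (?, ?)",
        [("Ann", 31), ("Bo", 45), ("Cy", 27)],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    # The gold answer is simply whatever the gold SQL returns on the database.
    print(derive_gold_answer(conn, "SELECT count(*) FROM singer"))
    conn.close()
```

With a real database file, the connection would instead be opened with `sqlite3.connect(path_to_sqlite_file)`; the path layout is left unspecified here because this notice does not state it.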
Derived data in data/ is shared under CC BY-SA 4.0.
Software code is licensed separately under MIT (see LICENSE).
References:
- Spider dataset: https://yale-lily.github.io/spider
- Yu et al. (2018). Spider: A Large-Scale Human-Labeled Dataset for Complex
  and Cross-Domain Semantic Parsing and Text-to-SQL Task. EMNLP.
- xlangai/spider on HuggingFace: https://huggingface.co/datasets/xlangai/spider
- taoyds/spider on GitHub: https://github.com/taoyds/spider