Data License Notice

Data in data/ is adapted from the Spider dataset (Yu et al., 2018),
distributed under CC BY-SA 4.0.

We retrieved question/SQL pairs from the xlangai/spider HuggingFace mirror
and SQLite databases from the taoyds/spider GitHub mirror, then curated a
10-database subset, derived gold answers by executing the gold SQL, and
generated SFT trajectories from those artifacts.

Derived data in data/ is shared under CC BY-SA 4.0.
Software code is licensed separately under MIT (see LICENSE).

References:
- Spider dataset: https://yale-lily.github.io/spider
- Yu et al. (2018). Spider: A Large-Scale Human-Labeled Dataset for Complex
  and Cross-Domain Semantic Parsing and Text-to-SQL Task. EMNLP.
- xlangai/spider on HuggingFace: https://huggingface.co/datasets/xlangai/spider
- taoyds/spider on GitHub: https://github.com/taoyds/spider