---
license: mit
tags:
- tool-use
- evaluation
- play2prompt
- stabletoolbench
---

# Play2Prompt (P2P) StableToolBench Evaluation Pipeline

Replicates the experimental conditions of the [Play2Prompt](https://aclanthology.org/2025.findings-acl.1347/) paper on [StableToolBench](https://arxiv.org/abs/2403.07714) using Llama-3.1-8B-Instruct.

**Designed for extensibility**: The four conditions are controlled by two pluggable components: tool descriptions and in-context examples. To test your own description types, drop replacement files into `p2p_data/descriptions/` and `p2p_data/examples/`.

See the `pipeline/` directory for all source code.
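
To illustrate how two pluggable components can yield the evaluation conditions, the sketch below pairs every file found in a descriptions directory with every file found in an examples directory. It is a minimal illustration, not the pipeline's actual loader: the helper names `list_variants` and `enumerate_conditions` are hypothetical, and it assumes that each file in a directory represents one variant of that component. Check the code in `pipeline/` for the real file layout and naming.

```python
from itertools import product
from pathlib import Path


def list_variants(directory: Path) -> list[str]:
    """Return the sorted stems of visible files in `directory`.

    Assumption: one file per variant. Returns an empty list if the
    directory does not exist.
    """
    if not directory.is_dir():
        return []
    return sorted(
        p.stem
        for p in directory.iterdir()
        if p.is_file() and not p.name.startswith(".")
    )


def enumerate_conditions(data_root: Path) -> list[tuple[str, str]]:
    """Pair every description variant with every example variant."""
    descriptions = list_variants(data_root / "descriptions")
    examples = list_variants(data_root / "examples")
    return list(product(descriptions, examples))


if __name__ == "__main__":
    # Directory names are taken from the README; file contents are not parsed.
    for description, example in enumerate_conditions(Path("p2p_data")):
        print(f"descriptions={description} examples={example}")
```

Under this assumption, two description variants and two example variants give four conditions, and adding one more file to either directory extends the grid without changes to the enumeration code.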