---
license: mit
tags:
- tool-use
- evaluation
- play2prompt
- stabletoolbench
---

# Play2Prompt (P2P) StableToolBench Evaluation Pipeline

Replicates the experimental conditions of the [Play2Prompt](https://aclanthology.org/2025.findings-acl.1347/) paper on [StableToolBench](https://arxiv.org/abs/2403.07714) using Llama-3.1-8B-Instruct.

**Designed for extensibility**: The 4 conditions are controlled by two pluggable components: tool descriptions and in-context examples. To test your own description types, drop replacement files into `p2p_data/descriptions/` and `p2p_data/examples/`.

See the `pipeline/` directory for all source code.
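
As a rough illustration of how two pluggable components can yield four conditions, the sketch below enumerates every combination of a description variant and an example variant and resolves each to a file path. Only the two directory names come from this README. The variant labels (`original`, `p2p`, `none`), the one-JSON-file-per-variant layout, and the `Condition` and `build_conditions` names are assumptions made for illustration, so check `pipeline/` for the actual names and file format.

```python
from dataclasses import dataclass
from itertools import product
from pathlib import Path
from typing import List, Optional, Sequence

# Directory names are taken from the README; everything else is hypothetical.
DESCRIPTIONS_DIR = Path("p2p_data") / "descriptions"
EXAMPLES_DIR = Path("p2p_data") / "examples"

# Assumed variant labels; the real pipeline may use different ones.
DESCRIPTION_VARIANTS = ("original", "p2p")
EXAMPLE_VARIANTS = ("none", "p2p")


@dataclass(frozen=True)
class Condition:
    """One evaluation condition: a description variant plus an example variant."""

    description_variant: str
    example_variant: str

    @property
    def name(self) -> str:
        return f"desc-{self.description_variant}_ex-{self.example_variant}"

    def description_path(self) -> Path:
        # Assumed layout: one JSON file per description variant.
        return DESCRIPTIONS_DIR / f"{self.description_variant}.json"

    def examples_path(self) -> Optional[Path]:
        # "none" is treated here as "run without in-context examples".
        if self.example_variant == "none":
            return None
        return EXAMPLES_DIR / f"{self.example_variant}.json"


def build_conditions(
    description_variants: Sequence[str] = DESCRIPTION_VARIANTS,
    example_variants: Sequence[str] = EXAMPLE_VARIANTS,
) -> List[Condition]:
    """Return every (description, example) combination as a Condition."""
    return [Condition(d, e) for d, e in product(description_variants, example_variants)]


if __name__ == "__main__":
    for cond in build_conditions():
        print(cond.name, cond.description_path(), cond.examples_path())
```

Under these assumptions, adding a new description type means adding one more label, for example `build_conditions(("original", "p2p", "mine"))`, which extends the grid to six conditions without touching the rest of the code.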