---
license: mit
tags:
- tool-use
- evaluation
- play2prompt
- stabletoolbench
---
# Play2Prompt (P2P) StableToolBench Evaluation Pipeline
Replicates the [Play2Prompt](https://aclanthology.org/2025.findings-acl.1347/) paper conditions on [StableToolBench](https://arxiv.org/abs/2403.07714) using Llama-3.1-8B-Instruct.
**Designed for extensibility**: The four conditions are controlled by two pluggable components: tool descriptions and in-context examples. To test your own description types, drop replacement files into `p2p_data/descriptions/` and `p2p_data/examples/`.
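As a rough illustration of how two pluggable components can yield four conditions, the sketch below enumerates every on/off combination of the two. It assumes the four conditions form a 2 x 2 grid, which this README does not spell out. The function name `build_conditions`, the dictionary keys, and the condition labels are hypothetical and are not taken from `pipeline/`; only the two directory paths come from the text above.

```python
from itertools import product
from pathlib import Path

# Directory names come from the README; everything else here is illustrative.
DESCRIPTIONS_DIR = Path("p2p_data/descriptions")
EXAMPLES_DIR = Path("p2p_data/examples")


def build_conditions():
    """Enumerate the assumed 2 x 2 grid of (descriptions, examples) settings."""
    conditions = []
    for swap_descriptions, add_examples in product([False, True], repeat=2):
        # Hypothetical label scheme, used only to tell the conditions apart.
        desc_label = "replaced" if swap_descriptions else "original"
        ex_label = "with-examples" if add_examples else "no-examples"
        conditions.append(
            {
                "name": f"{desc_label}_{ex_label}",
                # None means "use whatever the benchmark ships by default".
                "descriptions_dir": DESCRIPTIONS_DIR if swap_descriptions else None,
                "examples_dir": EXAMPLES_DIR if add_examples else None,
            }
        )
    return conditions


if __name__ == "__main__":
    for condition in build_conditions():
        print(condition["name"])
```

Keeping each component as an independent toggle means a new description type only changes what sits in one directory, while the grid of conditions stays the same.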
See the `pipeline/` directory for all source code.