metadata
license: mit
tags:
- tool-use
- evaluation
- play2prompt
- stabletoolbench
Play2Prompt (P2P) StableToolBench Evaluation Pipeline
Replicates the Play2Prompt paper conditions on StableToolBench using Llama-3.1-8B-Instruct.
Designed for extensibility: the four conditions are controlled by two pluggable components, tool descriptions and in-context examples. To test your own description types, drop replacement files into p2p_data/descriptions/ and p2p_data/examples/.
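As a rough illustration of how two pluggable components can yield four conditions, the sketch below enumerates a 2 x 2 grid. It is not taken from the pipeline source. The setting labels ("original", "p2p", "none"), the function name build_conditions, and the idea that each component is simply switched on or off are assumptions made for illustration. Only the two directory paths come from this README.

```python
from itertools import product
from pathlib import Path

# Assumption: each component has two settings, so 2 x 2 = 4 conditions.
# These labels are hypothetical and may not match the pipeline's own names.
DESCRIPTION_SETTINGS = ("original", "p2p")
EXAMPLE_SETTINGS = ("none", "p2p")


def build_conditions(data_root="p2p_data"):
    """Enumerate every (descriptions, examples) combination.

    A directory is attached only when the corresponding component is
    switched to its replacement setting; otherwise the entry is None.
    """
    root = Path(data_root)
    conditions = []
    for desc, ex in product(DESCRIPTION_SETTINGS, EXAMPLE_SETTINGS):
        conditions.append({
            "name": f"desc-{desc}__ex-{ex}",
            # Replacement descriptions are read from p2p_data/descriptions/.
            "descriptions_dir": root / "descriptions" if desc != "original" else None,
            # In-context examples are read from p2p_data/examples/.
            "examples_dir": root / "examples" if ex != "none" else None,
        })
    return conditions


if __name__ == "__main__":
    for cond in build_conditions():
        print(cond["name"], cond["descriptions_dir"], cond["examples_dir"])
```

Under this reading, swapping in your own description type means changing only the files in p2p_data/descriptions/, while the examples axis stays independent.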
See the pipeline/ directory for all source code.