---
license: mit
tags:
- tool-use
- evaluation
- play2prompt
- stabletoolbench
---

# Play2Prompt (P2P) StableToolBench Evaluation Pipeline

Replicates the experimental conditions of the [Play2Prompt](https://aclanthology.org/2025.findings-acl.1347/) paper on [StableToolBench](https://arxiv.org/abs/2403.07714) using Llama-3.1-8B-Instruct.

**Designed for extensibility**: the four conditions are controlled by two pluggable components, tool descriptions and in-context examples. To test your own description types, place replacement files in `p2p_data/descriptions/` and `p2p_data/examples/`.
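
The sketch below shows one way two pluggable components could combine into evaluation conditions. It makes several assumptions and is not the pipeline's actual code. The JSON file layout, the variant names `original` and `p2p`, the helper functions, and the 2×2 grid of conditions are all hypothetical. Consult `pipeline/` for the real loaders.

```python
from __future__ import annotations

import json
from itertools import product
from pathlib import Path

# Hypothetical layout (an assumption, not confirmed by the pipeline source):
#   p2p_data/descriptions/<variant>.json -> {"tool_name": "description", ...}
#   p2p_data/examples/<variant>.json     -> ["example 1", "example 2", ...]
DATA_ROOT = Path("p2p_data")


def load_descriptions(variant: str, root: Path = DATA_ROOT) -> dict[str, str]:
    """Load a mapping from tool name to description text for one variant."""
    path = root / "descriptions" / f"{variant}.json"
    return json.loads(path.read_text(encoding="utf-8"))


def load_examples(variant: str | None, root: Path = DATA_ROOT) -> list[str]:
    """Load in-context examples. None means the no-examples (zero-shot) setting."""
    if variant is None:
        return []
    path = root / "examples" / f"{variant}.json"
    return json.loads(path.read_text(encoding="utf-8"))


def build_system_prompt(descriptions: dict[str, str], examples: list[str]) -> str:
    """Assemble a system prompt from tool descriptions and optional examples."""
    lines = ["You can call the following tools:"]
    # Sort by tool name so the prompt is deterministic across runs.
    lines += [f"- {name}: {desc}" for name, desc in sorted(descriptions.items())]
    if examples:
        lines.append("Examples:")
        lines += examples
    return "\n".join(lines)


def conditions(desc_variants: list[str], example_variants: list[str | None]):
    """Enumerate every (description variant, example variant) pair as one condition."""
    return list(product(desc_variants, example_variants))
```

With `conditions(["original", "p2p"], [None, "p2p"])`, this yields four (description, example) pairs. Adding a new `<variant>.json` to either directory adds a variant without changing the prompt-building code.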

See the `pipeline/` directory for all source code.