Dwootton/p2p-stabletoolbench


	---
	license: mit
	tags:
	- tool-use
	- evaluation
	- play2prompt
	- stabletoolbench
	---

	# Play2Prompt (P2P) StableToolBench Evaluation Pipeline

	This pipeline replicates the experimental conditions of the [Play2Prompt](https://aclanthology.org/2025.findings-acl.1347/) paper on [StableToolBench](https://arxiv.org/abs/2403.07714), using Llama-3.1-8B-Instruct as the model.

	The pipeline is designed for extensibility: the four conditions are controlled by two pluggable components, tool descriptions and in-context examples. To test your own description types, drop replacement files into `p2p_data/descriptions/` and `p2p_data/examples/`.
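
As a rough illustration of how two pluggable components can yield four conditions, here is a minimal sketch. It assumes the conditions are the 2×2 combinations of a description variant and an examples variant. The variant labels (`original`, `p2p`, `none`), the `.json` suffix, and the helper names are hypothetical and are not taken from the code in `pipeline/`; only the two directory paths come from this README.

```python
from itertools import product
from pathlib import Path

# Directories named in this README.
DESCRIPTIONS_DIR = Path("p2p_data/descriptions")
EXAMPLES_DIR = Path("p2p_data/examples")

# Hypothetical variant labels; the names the pipeline actually uses may differ.
DESCRIPTION_VARIANTS = ["original", "p2p"]
EXAMPLE_VARIANTS = ["none", "p2p"]


def enumerate_conditions(description_variants, example_variants):
    """Return one dict per (descriptions, examples) pairing."""
    return [
        {"descriptions": d, "examples": e}
        for d, e in product(description_variants, example_variants)
    ]


def variant_path(base_dir, variant, suffix=".json"):
    """Build a file path for a variant; the naming scheme is an assumption."""
    return base_dir / f"{variant}{suffix}"


conditions = enumerate_conditions(DESCRIPTION_VARIANTS, EXAMPLE_VARIANTS)
for condition in conditions:
    print(condition["descriptions"], condition["examples"])
```

Under this assumption, adding a custom description type amounts to appending one label to `DESCRIPTION_VARIANTS` and placing the matching file under `p2p_data/descriptions/`. Check the source in `pipeline/` for the file names and formats it actually expects.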

	See the `pipeline/` directory for all source code.