HF Hub Community Tool Challenge Pack (gpt-oss)
Assume commands are run from the repo root.
Run command (single prompt)
fast-agent go \
--no-env \
--model gpt-oss \
--agent-cards .fast-agent/tool-cards \
--agent hf_hub_community \
--results /tmp/hf_hub_one.json \
-m "<PROMPT>"
Batch loop (manual)
while IFS= read -r p; do
[ -z "$p" ] && continue
printf '\n=== PROMPT ===\n%s\n\n' "$p"
fast-agent go \
--no-env \
--model gpt-oss \
--agent-cards .fast-agent/tool-cards \
--agent hf_hub_community \
--results /tmp/hf_hub_loop.json \
-m "$p"
done < scripts/hf_hub_community_challenges.txt
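Note the shell loop writes every run to the same /tmp/hf_hub_loop.json, so each iteration overwrites the last result. A Python sketch of a per-prompt variant is below; run_challenges, cmd_prefix, and the hf_hub_NN.json naming are illustrative, not part of the repo's tooling:

```python
import subprocess
from pathlib import Path

def run_challenges(prompts_path, results_dir, cmd_prefix):
    """Run cmd_prefix once per non-empty prompt line, one results file each."""
    results_dir = Path(results_dir)
    results_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for line in Path(prompts_path).read_text().splitlines():
        prompt = line.strip()
        if not prompt:
            continue  # skip blank lines, mirroring the shell loop
        # One results file per prompt instead of one shared file
        out = results_dir / f"hf_hub_{len(written) + 1:02d}.json"
        subprocess.run(cmd_prefix + ["--results", str(out), "-m", prompt],
                       check=True)
        written.append(out)
    return written
```

Here cmd_prefix would be the fixed part of the command above, e.g. ["fast-agent", "go", "--no-env", "--model", "gpt-oss", "--agent-cards", ".fast-agent/tool-cards", "--agent", "hf_hub_community"].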
Scoring (0-2 each)
- Endpoint/tool correctness
- Efficiency (pagination/filtering/projection)
- Multi-step reasoning
- Safety compliance
- Output clarity
Total per challenge: /10
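The rubric above (five criteria, 0-2 each, /10 total) can be sketched as a small helper; the CRITERIA names and challenge_total are illustrative, not taken from the scorer script:

```python
# Criterion names mirror the rubric list above.
CRITERIA = (
    "endpoint_correctness",
    "efficiency",
    "multi_step_reasoning",
    "safety_compliance",
    "output_clarity",
)

def challenge_total(scores):
    """Sum per-criterion scores (each 0-2) into the /10 challenge total."""
    for name in CRITERIA:
        if not 0 <= scores[name] <= 2:
            raise ValueError(f"{name} must be 0-2, got {scores[name]}")
    return sum(scores[name] for name in CRITERIA)
```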
Automated scorer
Run all challenges:
python scripts/score_hf_hub_community_challenges.py
Run a subset (e.g., 1-3):
python scripts/score_hf_hub_community_challenges.py --start 1 --end 3
Outputs:
- JSON: docs/hf_hub_community_challenge_report.json
- Markdown: docs/hf_hub_community_challenge_report.md
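A quick way to eyeball the JSON report is a summary helper like the one below. The schema here is an assumption (a list of {"challenge": ..., "total": ...} entries); adjust the field names to whatever the scorer actually emits:

```python
import json
from pathlib import Path

def report_summary(path):
    """Return (challenge count, mean total) from an assumed report layout."""
    entries = json.loads(Path(path).read_text())
    # "total" is the assumed per-challenge /10 score field
    totals = [e["total"] for e in entries]
    return len(totals), sum(totals) / len(totals)
```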