
# HF Hub Community Tool Challenge Pack (gpt-oss)

Assume commands are run from the repo root.

## Run command (single prompt)

```sh
fast-agent go \
  --no-env \
  --model gpt-oss \
  --agent-cards .fast-agent/tool-cards \
  --agent hf_hub_community \
  --results /tmp/hf_hub_one.json \
  -m "<PROMPT>"
```

## Batch loop (manual)

```sh
while IFS= read -r p; do
  [ -z "$p" ] && continue
  printf '\n=== PROMPT ===\n%s\n\n' "$p"
  fast-agent go \
    --no-env \
    --model gpt-oss \
    --agent-cards .fast-agent/tool-cards \
    --agent hf_hub_community \
    --results /tmp/hf_hub_loop.json \
    -m "$p"
done < scripts/hf_hub_community_challenges.txt
```
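The shell loop above can also be sketched in Python, which makes the blank-line skip explicit and easier to extend. This is a minimal sketch, not part of the repo: `read_prompts` and `run_prompt` are hypothetical helpers, and `run_prompt` simply wraps the same `fast-agent` invocation shown above.

```python
import subprocess
from pathlib import Path


def read_prompts(path):
    """Yield non-empty prompt lines, mirroring the shell loop's blank-line skip."""
    for line in Path(path).read_text().splitlines():
        if line.strip():
            yield line


def run_prompt(prompt, results="/tmp/hf_hub_loop.json"):
    """Hypothetical wrapper around the fast-agent CLI call shown above."""
    return subprocess.run(
        ["fast-agent", "go",
         "--no-env",
         "--model", "gpt-oss",
         "--agent-cards", ".fast-agent/tool-cards",
         "--agent", "hf_hub_community",
         "--results", results,
         "-m", prompt],
        check=False,
    )


if __name__ == "__main__":
    prompts_file = Path("scripts/hf_hub_community_challenges.txt")
    if prompts_file.exists():
        for p in read_prompts(prompts_file):
            print(f"\n=== PROMPT ===\n{p}\n")
            run_prompt(p)
```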

## Scoring (0-2 each)

- Endpoint/tool correctness
- Efficiency (pagination/filtering/projection)
- Multi-step reasoning
- Safety compliance
- Output clarity

Total per challenge: 10 points (five criteria × 2).
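Under the rubric above, a per-challenge total is a simple sum of the five criterion scores. A minimal sketch (the snake_case criterion keys are an assumption, derived from the list above; the real scorer may name them differently):

```python
# Criterion keys assumed from the rubric above (hypothetical names).
CRITERIA = [
    "endpoint_correctness",
    "efficiency",
    "multi_step_reasoning",
    "safety_compliance",
    "output_clarity",
]


def challenge_total(scores):
    """Sum the five 0-2 criterion scores into the per-challenge /10 total.

    Raises ValueError if a criterion is missing or out of range.
    """
    total = 0
    for name in CRITERIA:
        value = scores[name]
        if not 0 <= value <= 2:
            raise ValueError(f"{name} must be in 0..2, got {value}")
        total += value
    return total
```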

## Automated scorer

Run all challenges:

```sh
python scripts/score_hf_hub_community_challenges.py
```

Run a subset (e.g., challenges 1-3):

```sh
python scripts/score_hf_hub_community_challenges.py --start 1 --end 3
```
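The `--start`/`--end` flags presumably select a 1-indexed, inclusive slice of the challenge list. A minimal sketch of how such flags might be parsed and applied; the actual script's implementation may differ:

```python
import argparse


def parse_range(argv):
    """Parse hypothetical --start/--end flags (1-indexed, inclusive)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--start", type=int, default=1)
    parser.add_argument("--end", type=int, default=None)
    return parser.parse_args(argv)


def select_challenges(challenges, start, end):
    """Return the 1-indexed, inclusive [start, end] slice of challenges."""
    end = len(challenges) if end is None else end
    return challenges[start - 1:end]
```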

Outputs:

- JSON: `docs/hf_hub_community_challenge_report.json`
- Markdown: `docs/hf_hub_community_challenge_report.md`