
# HF Hub Community Tool Challenge Pack (gpt-oss)

Assume commands are run from the repo root.

## Run command (single prompt)

```sh
fast-agent go \
  --no-env \
  --model gpt-oss \
  --agent-cards .fast-agent/tool-cards \
  --agent hf_hub_community \
  --results /tmp/hf_hub_one.json \
  -m "<PROMPT>"
```

## Batch loop (manual)

```sh
while IFS= read -r p; do
  [ -z "$p" ] && continue
  printf '\n=== PROMPT ===\n%s\n\n' "$p"
  fast-agent go \
    --no-env \
    --model gpt-oss \
    --agent-cards .fast-agent/tool-cards \
    --agent hf_hub_community \
    --results /tmp/hf_hub_loop.json \
    -m "$p"
done < scripts/hf_hub_community_challenges.txt
```
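The shell loop above can also be sketched in Python, which makes the blank-line skip explicit and easier to extend. This is a minimal sketch, not part of the repo: `read_prompts` and `run_prompt` are hypothetical helpers, and `run_prompt` simply wraps the same `fast-agent` invocation shown above.

```python
import subprocess
from pathlib import Path


def read_prompts(path):
    """Yield non-empty prompt lines, mirroring the shell loop's blank-line skip."""
    for line in Path(path).read_text().splitlines():
        if line.strip():
            yield line


def run_prompt(prompt, results="/tmp/hf_hub_loop.json"):
    """Hypothetical wrapper around the fast-agent CLI call shown above."""
    return subprocess.run(
        ["fast-agent", "go",
         "--no-env",
         "--model", "gpt-oss",
         "--agent-cards", ".fast-agent/tool-cards",
         "--agent", "hf_hub_community",
         "--results", results,
         "-m", prompt],
        check=False,
    )


if __name__ == "__main__":
    prompts_file = Path("scripts/hf_hub_community_challenges.txt")
    if prompts_file.exists():
        for p in read_prompts(prompts_file):
            print(f"\n=== PROMPT ===\n{p}\n")
            run_prompt(p)
```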

## Scoring (0-2 each)

- Endpoint/tool correctness
- Efficiency (pagination/filtering/projection)
- Multi-step reasoning
- Safety compliance
- Output clarity

Total per challenge: 10 points (five criteria × 2).
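Under the rubric above, a per-challenge total is a simple sum of the five criterion scores. A minimal sketch (the snake_case criterion keys are an assumption, derived from the list above; the real scorer may name them differently):

```python
# Criterion keys assumed from the rubric above (hypothetical names).
CRITERIA = [
    "endpoint_correctness",
    "efficiency",
    "multi_step_reasoning",
    "safety_compliance",
    "output_clarity",
]


def challenge_total(scores):
    """Sum the five 0-2 criterion scores into the per-challenge /10 total.

    Raises ValueError if a criterion is missing or out of range.
    """
    total = 0
    for name in CRITERIA:
        value = scores[name]
        if not 0 <= value <= 2:
            raise ValueError(f"{name} must be in 0..2, got {value}")
        total += value
    return total
```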

## Automated scorer

Run all challenges:

```sh
python scripts/score_hf_hub_community_challenges.py
```

Run a subset (e.g., challenges 1-3):

```sh
python scripts/score_hf_hub_community_challenges.py --start 1 --end 3
```
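The `--start`/`--end` flags presumably select a 1-indexed, inclusive slice of the challenge list. A minimal sketch of how such flags might be parsed and applied; the actual script's implementation may differ:

```python
import argparse


def parse_range(argv):
    """Parse hypothetical --start/--end flags (1-indexed, inclusive)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--start", type=int, default=1)
    parser.add_argument("--end", type=int, default=None)
    return parser.parse_args(argv)


def select_challenges(challenges, start, end):
    """Return the 1-indexed, inclusive [start, end] slice of challenges."""
    end = len(challenges) if end is None else end
    return challenges[start - 1:end]
```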

Outputs:

- JSON: `docs/hf_hub_community_challenge_report.json`
- Markdown: `docs/hf_hub_community_challenge_report.md`