HF Router verification before shipping the Space
The judges will overwhelmingly run the demo through the Hugging Face
Router path. Before deploying any change that touches the connection
panel, the provider abstraction, or the server's /interactive/*
routes, walk this checklist.
A. Automated (no token required)
```
cd physix-live
source .venv/bin/activate   # or however you activate
python -m pytest tests/test_providers.py tests/test_providers_hf.py -v
```
Expected: 20 passed. These pin:

- The HF Router base URL (`https://router.huggingface.co/v1`).
- That the OpenAI SDK client is constructed with the visitor's `api_key`, the canonical `base_url`, and a `User-Agent` header.
- That the `HF_TOKEN` env-var fallback works when the panel field is empty, but only for HF URLs (no leakage to third-party providers).
- That a provider rejecting `response_format={"type": "json_object"}` with a 400 transparently retries without it.
- That `401`, `404`, connection errors, and timeouts each surface a hint that points at the right remediation step.
- That a full episode through `/interactive/sessions/{id}/llm-step` passes the visitor's `base_url + model + api_key` byte-for-byte to the OpenAI SDK call.
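The JSON-mode retry pinned above can be sketched as a small wrapper. This is an illustrative sketch, not the provider layer's actual code: `create` stands in for any callable with the OpenAI `chat.completions.create` signature, and the `status_code` attribute mirrors how the SDK's API errors expose HTTP status.

```python
def complete_with_json_fallback(create, **kwargs):
    """Try JSON mode first; if the provider rejects response_format
    with a 400, retry the same call without it.

    `create` is any callable with the OpenAI chat.completions.create
    signature (hypothetical shim for illustration)."""
    try:
        return create(response_format={"type": "json_object"}, **kwargs)
    except Exception as exc:
        # Only swallow a provider 400; 401s, 404s, and timeouts must
        # keep their own remediation hints.
        if getattr(exc, "status_code", None) != 400:
            raise
        return create(**kwargs)
```

The key design point the tests pin: the fallback must be silent to the caller, and must not mask unrelated failures.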
B. Real-network probe (HF_TOKEN required)
```
# Terminal 1: the demo backend.
python -m physix.server.app --host 127.0.0.1 --port 8000

# Terminal 2: the verifier.
export HF_TOKEN=hf_...   # needs the 'Make calls to Inference Providers' scope.
python scripts/verify_hf_router.py
```
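The verifier's per-model classification can be sketched as follows. Again `create` is a stand-in for the SDK's `chat.completions.create`, and the status strings mirror the verifier's report lines; the real probe lives in `scripts/verify_hf_router.py`.

```python
def probe_model(create, model):
    """Attempt a 4-token completion and classify the outcome the way
    the verifier reports it. `create` stands in for the OpenAI SDK's
    chat.completions.create (hypothetical shim for illustration)."""
    try:
        create(model=model,
               messages=[{"role": "user", "content": "ping"}],
               max_tokens=4)
        return "OK"
    except Exception as exc:
        if getattr(exc, "status_code", None) == 404:
            return "NOT SERVED (404)"
        return "ERROR"
```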
What to look for:
- Step 1: HF_TOKEN check. Must end with `✓ HF_TOKEN is valid and has Inference Providers scope`. If it fails, fix the token scope at https://huggingface.co/settings/tokens before going further.
- Step 2: model probes. Each of the four HF models we suggest in the connection panel gets a 4-token completion attempt. Any model reported as `NOT SERVED (404)` will appear broken in the demo unless you fix it on the model card (see "If the trained model 404s" below).
- Step 3: live episode. The verifier drives one real PhysiX episode. Per-turn output should look like:

```
turn 1: match=0.42 format=1.00 total=0.46 (3.2s)  equation: 'd2y/dt2 = -9.81'
turn 2: match=0.81 format=1.00 total=0.69 (2.8s)  equation: 'd2y/dt2 = -9.81 + 0.04 * vy**2'
```

`format=1.00` on every turn means the prompt + parser + verifier pipeline is healthy. `format=0.00` means the model returned something unparseable, usually a hint that `response_format={"type": "json_object"}` was silently ignored and the model wasn't SFT-warmed. That is acceptable for raw-Qwen baselines, not for the trained PhysiX model.
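The format check behind `format=1.00` can be sketched as a strict JSON parse. The `equation` field name matches the verifier output above, but this is an illustrative sketch; the real parser lives in the PhysiX pipeline.

```python
import json

def format_score(raw):
    """Return 1.0 when the reply parses as a JSON object carrying a
    non-empty string under 'equation', else 0.0 (a sketch of the
    format reward; the real scorer lives in the verifier)."""
    try:
        obj = json.loads(raw)
    except (TypeError, ValueError):
        return 0.0
    eq = obj.get("equation") if isinstance(obj, dict) else None
    return 1.0 if isinstance(eq, str) and eq.strip() else 0.0
```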
C. Browser walkthrough
With both backend and frontend running (make dev):
Panel renders four endpoints. Open the page; both the A and B panels show the same four-option Endpoint dropdown: `Ollama (localhost:11434)` · `Hugging Face Router` · `OpenAI` · `Custom`. The default is `Hugging Face Router` on both sides.

Model field adapts per endpoint.

- With Hugging Face Router selected, the Model field is a text input. Click in it: a datalist of four suggestions appears (PhysiX RL, PhysiX SFT, untuned Qwen 3B, larger Qwen 7B).
- Switch to Ollama: the Model field becomes a hard select. If `ollama serve` is running, it lists installed tags; if not, it shows an amber-bordered fallback input with the canonical `qwen2.5:3b-instruct` placeholder.
- Switch to OpenAI: text input + datalist of `gpt-4o-mini`, `gpt-4o`, `gpt-4.1-mini`.
- Switch to Custom: text input with no suggestions, plus a new Custom base URL field appears.
API key persistence. Type a key into side A's panel, refresh the page: the key reappears (per-`base_url` `localStorage` key). Switch the endpoint to OpenAI: the key shown is not the HF one (each base URL has its own slot). Switch back to HF Router: the HF key reappears.

One real run end-to-end. Paste your HF token, leave A pointed at `Pratyush-01/physix-3b-rl` and B at `Qwen/Qwen2.5-3B-Instruct`, pick "Free Fall" from the system dropdown, and hit ▶ Run side-by-side. Expected within ~30s:
- Both panels show a trajectory plot with predicted overlays.
- Per-side reward strip ticks turn-by-turn.
- Once both finish, the "Scoreboard" banner at the bottom shows `Winner: A` (the trained model should beat raw Qwen on Free Fall after a few turns; if it doesn't, that's a real signal worth investigating, not a UI bug).
401 surfaces correctly. Clear the API key on side A and re-run. The A column shows an amber error row containing "'Make calls to Inference Providers' fine-grained permission". Side B (with its own key still set) keeps running normally — the two sides are independent.
404 surfaces correctly. Type a clearly bogus model id like `does-not-exist/foo` and run. The error row points at the model card's "Deploy → Inference Providers" panel.
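The 401 and 404 checks above boil down to a status-to-hint mapping. A minimal sketch; the exact wording is what the provider tests in section A pin, and the strings here are assumptions for illustration:

```python
def remediation_hint(status):
    """Map an HTTP failure status to the hint the error row should
    surface (illustrative wording, not the pinned strings)."""
    if status == 401:
        return ("Check that your token has the 'Make calls to "
                "Inference Providers' fine-grained permission.")
    if status == 404:
        return ("Model not served: open the model card and enable a "
                "provider under Deploy → Inference Providers.")
    return "Could not reach the endpoint; check the base URL."
```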
If the trained model 404s
Pratyush-01/physix-3b-rl is public, but HF Inference Providers
only serves models that at least one provider has loaded. If
verify_hf_router.py reports it as NOT SERVED, fix it before
shipping:
- Open the model card at https://huggingface.co/Pratyush-01/physix-3b-rl.
- Click Deploy → Inference Providers. Pick at least one provider that supports custom Qwen2.5-3B fine-tunes — Featherless and Together are the most reliable for this model size.
- Wait ~5 minutes for the provider to warm up the weights.
- Re-run `python scripts/verify_hf_router.py`: the `NOT SERVED` line should now read `OK`.
If no provider will load it, the fallback demo story is still strong: keep side A on the SFT checkpoint (`Pratyush-01/physix-3b-sft-merged`) or the untuned Qwen, and explain in the writeup that the trained checkpoint runs locally via Ollama / vLLM. The `:fastest` model-id suffix is also worth trying; HF will route to whichever provider can serve the model first.
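The fallback order above can be made mechanical: probe candidates in preference order and take the first one served. A sketch; the candidate list and ordering here are assumptions drawn from this section, and `is_served` wraps the verifier's 4-token probe.

```python
CANDIDATES = [
    "Pratyush-01/physix-3b-rl",          # trained RL checkpoint
    "Pratyush-01/physix-3b-rl:fastest",  # let HF pick the provider
    "Pratyush-01/physix-3b-sft-merged",  # SFT fallback
    "Qwen/Qwen2.5-3B-Instruct",          # untuned baseline
]

def first_served(is_served, candidates=CANDIDATES):
    """Return the first model id the router will serve, or None.
    `is_served` is a predicate, e.g. a wrapper around the verifier's
    4-token completion probe."""
    for model in candidates:
        if is_served(model):
            return model
    return None
```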
Pre-deploy gate
Don't merge to main if any of these regress:

- `pytest tests/test_providers.py tests/test_providers_hf.py` is green.
- `python scripts/verify_hf_router.py --skip-episode` reports ≥ 1 served model in the suggestion list.
- At least one of `Pratyush-01/physix-3b-rl` or `Pratyush-01/physix-3b-sft-merged` is served (the comparison story collapses if neither trained variant works).
- The browser walkthrough in section C above completes without surprises.