Space: Pratyush-01/physix-live


Pratyush-01 committed 12 days ago

Commit 7f40db3 (verified), 1 parent: 8225d8a

cleanup: trim verbose comments, drop dead code, fix stale tests, proper Dockerfile + .gitignore

Files changed (12)

.gitignore +31 -0
Dockerfile +14 -51
__init__.py +5 -15
client.py +1 -7
frontend/src/components/OpenEnvExplorerPane.tsx +6 -7
frontend/src/components/PhysixInferStatus.tsx +2 -7
models.py +1 -7
scripts/space_app.py +7 -22
scripts/verify_hf_router.py +1 -2
tests/test_interactive_api.py +2 -6
tests/test_providers_hf.py +0 -1
train/README.md +7 -13

.gitignore ADDED

	@@ -0,0 +1,31 @@

+# Python
+__pycache__/
+*.py[cod]
+*.egg-info/
+.pytest_cache/
+.ruff_cache/
+.mypy_cache/
+.venv/
+build/
+dist/
+# Frontend
+frontend/node_modules/
+frontend/dist/
+frontend/dist-ts-build/
+frontend/tsconfig.tsbuildinfo
+frontend/.vite/
+# OS / editor
+.DS_Store
+*.swp
+.vscode/
+.idea/
+# Local env / secrets
+.env
+.env.local
+# Local artifacts from running training scripts
+/tmp_*/
+*.log
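
Reviewer note on the `*.py[cod]` entry above: the bracket expression is a character class, so one line covers `.pyc`, `.pyo`, and `.pyd` while leaving plain `.py` sources tracked. The sketch below checks that with Python's `fnmatch`, which only approximates the glob part of gitignore matching (it does not model gitignore's leading-slash or trailing-slash rules). The file names are made up for illustration.

```python
from fnmatch import fnmatchcase

# Pattern copied from the new .gitignore; the names below are hypothetical.
pattern = "*.py[cod]"
names = ["module.pyc", "module.pyo", "module.pyd", "module.py", "notes.pyx"]

# fnmatchcase does case-sensitive shell-style matching on a single name.
ignored = [name for name in names if fnmatchcase(name, pattern)]
print(ignored)
```

Only the three compiled-artifact names match; `module.py` and `notes.pyx` do not, because the class `[cod]` requires exactly one of those characters after `.py`.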

Dockerfile CHANGED

@@ -1,59 +1,32 @@
-# PhysiX-Live demo Space — CPU-only env + UI.
 #
-# What this Space hosts:
 #
-#   :7860  uvicorn _space_app:app
-#          ├─ /reset, /step                 (OpenEnv stateless API)
-#          ├─ /interactive/*                (browser session API)
-#          ├─ /web/                         (built React SPA)
-#          └─ /interactive/.../llm-step      (LLM-driven episode)
-#
-# What this Space does NOT host:
-#   * Inference. The demo is CPU-only — no torch, no vLLM, no GPU. When
-#     the UI calls `/interactive/.../llm-step` the server forwards to
-#     whatever OpenAI-compatible base URL the browser handed us
-#     (HF Router, OpenAI, Ollama, or our sister L4 Space at
-#     `Pratyush-01/physix-infer` for the trained 3B + Qwen baseline).
-#
-# Why a separate inference Space:
-#   Keeps this CPU image tiny (sub-second cold-start) so the demo URL
-#   never feels like it's stalled. The L4 Space pays GPU rates only
-#   while it's actually serving requests — its `sleep_time=300s` shuts
-#   it down between sessions. Two Spaces, two failure surfaces; if
-#   inference is broken the verifier-only demo (Custom URL → Ollama
-#   etc.) still works.
 ############################
 # Stage 1: build the SPA
 ############################
-# WORKDIR renamed (was /build) to break HF BuildKit's poisoned cache.
-# The previous /build mount kept a stale pnpm symlink at
-#   /build/node_modules/@types/katex
-# from an earlier failed deploy, and every subsequent `COPY frontend/ ./`
-# blew up with `cannot copy to non-directory`. Switching paths gets us
-# a fresh cache bucket; nothing in the project depends on /build itself.
 FROM node:20-alpine AS frontend
 WORKDIR /spa
 RUN corepack enable
-# Copy ALL of frontend/ first — including package.json/pnpm-lock.yaml —
-# THEN install. Order matters: install runs ON TOP OF the source tree
-# instead of the source tree being overlaid on top of a pre-installed
-# node_modules, eliminating the directory-vs-symlink collision class
-# of failure entirely.
 COPY frontend/ ./
-# Same-origin API fetches (relative paths). The Space serves both API and UI.
 ENV VITE_PHYSIX_API_URL=""
-# Cache-bust marker. Bump when an SPA change isn't taking on the Space.
-# physix-spa-rebuild: 10
 RUN pnpm install --frozen-lockfile \
     && pnpm exec tsc -b \
     && pnpm exec vite build --base=/web/
 ############################
-# Stage 2: runtime (FastAPI + SPA)
 ############################
 FROM python:3.11-slim AS runtime
@@ -67,16 +40,13 @@ ENV PYTHONUNBUFFERED=1 \
     PHYSIX_HOST=0.0.0.0 \
     PHYSIX_CORS_ORIGINS=*
-# curl for healthchecks; the slim image has neither curl nor build tools
-# by default. Everything else (numpy, scipy, sympy) is a wheel install.
 RUN apt-get update \
     && apt-get install -y --no-install-recommends curl \
     && rm -rf /var/lib/apt/lists/*
 WORKDIR /app
-# Pin the server-side runtime stack. NO torch / unsloth / trl here —
-# this Space never trains and never runs a model locally.
 RUN pip install \
         "openenv-core[core]>=0.2.2" \
         "numpy>=1.24" \
@@ -88,27 +58,20 @@ RUN pip install \
         "openai>=1.40" \
         "requests>=2.31"
-COPY pyproject.toml ./
 COPY physix ./physix
-COPY README.md ./
 RUN pip install --no-deps -e .
-# Built SPA from stage 1.
 COPY --from=frontend /spa/dist /app/static
-# Space wrapper — mounts the React SPA at /web/, registers / -> /web/
-# redirect (OpenEnv's create_fastapi_app doesn't add one for us).
 COPY scripts/space_app.py /app/_space_app.py
-# Pre-create writable dirs. HF Spaces runs containers as a non-root UID
-# with no /etc/passwd entry, so any cache path under $HOME must exist
-# and be world-writable BEFORE the runtime user shows up.
 RUN mkdir -p "$HOME" "$HF_HOME" "$XDG_CACHE_HOME" \
     && chmod -R 0777 /tmp /app
 EXPOSE 7860
-# /health is OpenEnv's stock endpoint and turns 200 once uvicorn binds.
 HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
     CMD curl -fsS "http://127.0.0.1:${PORT}/health" || exit 1

+# PhysiX-Live demo Space — CPU-only env + UI on :7860
 #
+#   uvicorn _space_app:app
+#     ├─ /reset, /step      OpenEnv stateless API
+#     ├─ /interactive/*     browser session API + LLM-step
+#     └─ /web/              built React SPA
 #
+# No torch / vLLM / GPU here. LLM inference is forwarded to whatever
+# OpenAI-compatible base URL the browser provides (HF Router, OpenAI,
+# Ollama, or our sister L4 Space `Pratyush-01/physix-infer`).
 ############################
 # Stage 1: build the SPA
 ############################
 FROM node:20-alpine AS frontend
 WORKDIR /spa
 RUN corepack enable
 COPY frontend/ ./
+# Same-origin API fetches at runtime (Space serves both API and UI).
 ENV VITE_PHYSIX_API_URL=""
 RUN pnpm install --frozen-lockfile \
     && pnpm exec tsc -b \
     && pnpm exec vite build --base=/web/
 ############################
+# Stage 2: runtime
 ############################
 FROM python:3.11-slim AS runtime
     PHYSIX_HOST=0.0.0.0 \
     PHYSIX_CORS_ORIGINS=*
 RUN apt-get update \
     && apt-get install -y --no-install-recommends curl \
     && rm -rf /var/lib/apt/lists/*
 WORKDIR /app
+# Inference deps only — no torch / unsloth / trl. Training runs on HF Jobs.
 RUN pip install \
         "openenv-core[core]>=0.2.2" \
         "numpy>=1.24" \
         "openai>=1.40" \
         "requests>=2.31"
+COPY pyproject.toml README.md ./
 COPY physix ./physix
 RUN pip install --no-deps -e .
 COPY --from=frontend /spa/dist /app/static
 COPY scripts/space_app.py /app/_space_app.py
+# HF Spaces runs as a non-root UID with no /etc/passwd; pre-create
+# writable cache dirs so $HOME-based caches work.
 RUN mkdir -p "$HOME" "$HF_HOME" "$XDG_CACHE_HOME" \
     && chmod -R 0777 /tmp /app
 EXPOSE 7860
 HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
     CMD curl -fsS "http://127.0.0.1:${PORT}/health" || exit 1

__init__.py CHANGED

@@ -1,19 +1,9 @@
-"""OpenEnv root package shim.
-The OpenEnv CLI's ``validate_env_structure`` expects ``__init__.py``,
-``client.py``, and ``models.py`` to exist at the env-directory root —
-the convention where "the env *is* the package". Our actual code lives
-under ``physix/`` so the wheel builds cleanly and tests / training
-imports stay at ``from physix.* import ...``. These root shims simply
-re-export from ``physix`` so OpenEnv's auto-discovery (which Pascal-
-cases ``name: physix`` to ``Physix*`` in client.py) finds the symbols
-it expects without duplicating any implementation.
-NOTE: Inside the deployed Space the package installed via
-``pip install -e .`` is ``physix`` (see ``[tool.hatch.build.targets.wheel]``
-in pyproject.toml). These root files are *only* loaded by the OpenEnv
-CLI's local validator and by users who import the env-directory as a
-package; they are never imported at runtime by the FastAPI server.
 """
 from physix import (  # noqa: F401

+"""OpenEnv root package shim — re-exports the public API from ``physix``.
+OpenEnv's CLI validator expects ``__init__.py``, ``client.py``, and
+``models.py`` at the env-directory root. The real implementation lives
+under ``physix/``; these root files are thin re-exports so the wheel
+build and runtime imports stay at ``from physix.* import ...``.
 """
 from physix import (  # noqa: F401

client.py CHANGED

@@ -1,10 +1,4 @@
-"""OpenEnv root client shim — re-exports ``physix.client``.
-OpenEnv's CLI validator and auto-discovery expect ``client.py`` at the
-env-directory root. The real implementation lives in
-``physix/client.py``; this file just re-exports it so the OpenEnv
-contract is satisfied without duplicating any code.
-"""
 from physix.client import PhysiXEnv, PhysixEnv  # noqa: F401


+"""OpenEnv root client shim — re-exports ``physix.client``."""
 from physix.client import PhysiXEnv, PhysixEnv  # noqa: F401

frontend/src/components/OpenEnvExplorerPane.tsx CHANGED

@@ -46,7 +46,7 @@ const DEFAULT_EQUATION = "d2y/dt2 = -9.81";
 const DEFAULT_PARAMS_JSON = "{}";
 const DEFAULT_RATIONALE = "Free-fall under gravity.";
-// Same shape used by RunWithLlmPane / ComparePane so the OpenEnv tab
 // renders the identical reward layout when no step has run yet.
 const ZERO_REWARD: RewardBreakdown = {
   match: 0,
@@ -985,12 +985,11 @@ function ReferenceCard({
 // ---------------- reward display ----------------
 //
-// Kept in sync with the duplicate in RunWithLlmPane / ComparePane so all
-// three tabs render the same layout: the four trainable reward
-// components on top (match / progress / simplicity / format) and the
-// three diagnostic-only sub-scores (shape / freq / amplitude) on the
-// bottom labelled "diag". The diag row exists because R² collapses to
-// zero on small phase shifts, which makes match=0 misleading on its
 // own; shape/freq/amplitude give partial credit for "visual closeness"
 // without ever feeding into the reward total or the trainer.

 const DEFAULT_PARAMS_JSON = "{}";
 const DEFAULT_RATIONALE = "Free-fall under gravity.";
+// Same zero-reward shape used by RunWithLlmPane so the OpenEnv tab
 // renders the identical reward layout when no step has run yet.
 const ZERO_REWARD: RewardBreakdown = {
   match: 0,
 // ---------------- reward display ----------------
 //
+// Kept visually in sync with RunWithLlmPane: four trainable reward
+// components on top (match / progress / simplicity / format) and three
+// diagnostic-only sub-scores (shape / freq / amplitude) on the bottom
+// labelled "diag". Diag exists because R² collapses to zero on small
+// phase shifts, which makes match=0 misleading on its
 // own; shape/freq/amplitude give partial credit for "visual closeness"
 // without ever feeding into the reward total or the trainer.

frontend/src/components/PhysixInferStatus.tsx CHANGED

@@ -69,13 +69,8 @@ interface ProbeResult {
   hitContainer: boolean;
 }
-// Module-level dedup. The Compare pane mounts TWO copies of this
-// component (one per side), and without coalescing they'd each fire
-// their own `/health` GET every 15 s — pointless duplicate load on
-// the GPU Space's edge. We share a single in-flight promise across
-// concurrent callers and cache the last successful result for a
-// short window so the second mount on the same tick reuses the
-// first probe's answer instead of issuing its own.
 let inFlight: Promise<ProbeResult> | null = null;
 let lastResult: { result: ProbeResult; at: number } | null = null;
 const SHARED_RESULT_WINDOW_MS = 5_000;

   hitContainer: boolean;
 }
+// Module-level dedup. Multiple mounts share a single in-flight `/health`
+// probe and cache the last successful result for a short window.
 let inFlight: Promise<ProbeResult> | null = null;
 let lastResult: { result: ProbeResult; at: number } | null = null;
 const SHARED_RESULT_WINDOW_MS = 5_000;
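
Reviewer note on the dedup comment above: the trimmed comment describes two mechanisms, a shared in-flight probe and a short-lived cache of the last successful result. The TypeScript body is not part of this diff, so the following is only a Python `asyncio` sketch of the same pattern under stated assumptions. `SharedProbe`, `fake_probe`, and `demo` are hypothetical names; only the 5-second window comes from the diff.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional

# Mirrors SHARED_RESULT_WINDOW_MS = 5_000 from the diff, in seconds.
SHARED_RESULT_WINDOW_S = 5.0


class SharedProbe:
    """Coalesce concurrent probes and reuse a recent successful result."""

    def __init__(self, probe: Callable[[], Awaitable[dict]]) -> None:
        self._probe = probe
        self._in_flight: Optional[asyncio.Future] = None
        self._last: Optional[tuple] = None  # (result, monotonic timestamp)

    async def get(self) -> dict:
        # 1. Serve from cache if the last success is still fresh.
        if self._last is not None:
            result, at = self._last
            if time.monotonic() - at < SHARED_RESULT_WINDOW_S:
                return result
        # 2. Otherwise join the probe already running, or start one.
        if self._in_flight is None:
            self._in_flight = asyncio.ensure_future(self._run())
        return await self._in_flight

    async def _run(self) -> dict:
        try:
            result = await self._probe()
            # Only reached when the probe did not raise.
            self._last = (result, time.monotonic())
            return result
        finally:
            # Clear the slot so a later call can start a fresh probe.
            self._in_flight = None


async def demo() -> int:
    calls = 0

    async def fake_probe() -> dict:  # stand-in for the real /health request
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return {"ok": True}

    shared = SharedProbe(fake_probe)
    await asyncio.gather(shared.get(), shared.get())  # two concurrent mounts
    await shared.get()  # later call inside the cache window
    return calls


probe_calls = asyncio.run(demo())
print(probe_calls)
```

In this sketch the two concurrent callers share one task, and the third call is answered from the cache, so the underlying probe runs once. A probe that raises is not cached, which matches the "last successful result" wording in the comment.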

models.py CHANGED

@@ -1,10 +1,4 @@
-"""OpenEnv root models shim — re-exports ``physix.models``.
-OpenEnv's CLI validator expects ``models.py`` at the env-directory
-root. The real Pydantic schemas live in ``physix/models.py``; this
-file re-exports them so OpenEnv's auto-discovery finds them under
-the env-name-derived path.
-"""
 from physix.models import (  # noqa: F401
     CONVERGENCE_THRESHOLD,

+"""OpenEnv root models shim — re-exports ``physix.models``."""
 from physix.models import (  # noqa: F401
     CONVERGENCE_THRESHOLD,

scripts/space_app.py CHANGED

@@ -1,23 +1,11 @@
 """Space entrypoint: physix.server.app:app + static UI mount.
-Imported at runtime by the Dockerfile's CMD via ``uvicorn _space_app:app``.
-What this wrapper adds on top of ``physix.server.app:app``:
-  1. ``GET /``            -> 302 to ``/web/`` (so the bare Space URL
-                             doesn't 404 — OpenEnv's ``create_fastapi_app``
-                             does NOT add a root redirect; that's only in
-                             the higher-level wrapper which mounts Gradio
-                             at ``/web`` and would clobber our React SPA).
-  2. ``GET /web``         -> 302 to ``/web/`` (same reason; users hit the
-                             no-trailing-slash variant from outside links).
-  3. ``StaticFiles`` mount at ``/web/`` serving the built Vite SPA. The
-     vite build was run with ``--base=/web/`` so all asset URLs in the
-     emitted ``index.html`` already include the prefix.
-Kept as a real .py file (not a heredoc inside the Dockerfile) so any
-syntax error is caught by the build's static analysis rather than at
-runtime — saved several deploy-fail loops in earlier iterations.
 """
 from __future__ import annotations
@@ -44,10 +32,7 @@ async def _web_no_slash_redirect() -> RedirectResponse:
 if _STATIC_DIR.is_dir():
     # html=True makes StaticFiles serve index.html for directory hits and
-    # fall back to it for unknown sub-paths (so client-side React routing
-    # works). Mounted last so registered API routes (/web/metadata,
-    # /web/reset, /web/step from OpenEnv; /interactive/* from physix)
-    # always win over the static handler.
     app.mount(
         "/web",
         StaticFiles(directory=str(_STATIC_DIR), html=True),

 """Space entrypoint: physix.server.app:app + static UI mount.
+Adds two things on top of `physix.server.app:app`:
+1. `GET /` and `GET /web` -> 302 to `/web/` (the OpenEnv app doesn't
+   ship a root redirect, so the bare Space URL would otherwise 404).
+2. `StaticFiles` mount at `/web/` for the built Vite SPA (Vite is
+   built with `--base=/web/` so asset URLs already include the prefix).
 """
 from __future__ import annotations
 if _STATIC_DIR.is_dir():
     # html=True makes StaticFiles serve index.html for directory hits and
+    # fall back to it for unknown sub-paths so client-side routing works.
     app.mount(
         "/web",
         StaticFiles(directory=str(_STATIC_DIR), html=True),

scripts/verify_hf_router.py CHANGED

@@ -30,7 +30,6 @@ from __future__ import annotations
 import argparse
 import asyncio
-import json
 import os
 import sys
 from dataclasses import dataclass
@@ -132,7 +131,7 @@ def check_token() -> str:
         print(response.text[:500], file=sys.stderr)
         sys.exit(1)
-    print(_green(f"✓ HF_TOKEN is valid and has Inference Providers scope."))
     return token

 import argparse
 import asyncio
 import os
 import sys
 from dataclasses import dataclass
         print(response.text[:500], file=sys.stderr)
         sys.exit(1)
+    print(_green("✓ HF_TOKEN is valid and has Inference Providers scope."))
     return token

tests/test_interactive_api.py CHANGED

@@ -89,15 +89,11 @@ def test_systems_endpoint_returns_supported_systems_in_order(
     returned_ids = [row["system_id"] for row in catalogue]
     assert returned_ids == list(SUPPORTED_SYSTEMS)
-    # The demo intentionally exposes all registered systems including
-    # tier-3 (``projectile_drag``, ``charged_b_field``) so visitors can
-    # stress-test the verifier on systems the model never trained on —
-    # that's the generalisation showcase.
     system_ids = set(returned_ids)
     assert "free_fall" in system_ids
     assert "damped_spring" in system_ids
-    assert "projectile_drag" in system_ids
-    assert "charged_b_field" in system_ids
 # --- Local model catalogue ---

     returned_ids = [row["system_id"] for row in catalogue]
     assert returned_ids == list(SUPPORTED_SYSTEMS)
+    # The 3 systems we trained on must all be exposed.
     system_ids = set(returned_ids)
     assert "free_fall" in system_ids
+    assert "simple_pendulum" in system_ids
     assert "damped_spring" in system_ids
 # --- Local model catalogue ---

tests/test_providers_hf.py CHANGED

@@ -27,7 +27,6 @@ from unittest.mock import MagicMock, patch
 import openai
 import pytest
-from fastapi import FastAPI
 from fastapi.middleware.cors import CORSMiddleware
 from fastapi.testclient import TestClient
 from openenv.core.env_server import create_fastapi_app

 import openai
 import pytest
 from fastapi.middleware.cors import CORSMiddleware
 from fastapi.testclient import TestClient
 from openenv.core.env_server import create_fastapi_app

train/README.md CHANGED

@@ -4,19 +4,13 @@ This folder contains the scripts that launch SFT → GRPO training for the
 [PhysiX OpenEnv](../) on **Hugging Face Jobs**, plus a self-contained
 **Colab notebook** judges can re-run.
-> This used to be a separate `physix-train/` repo / training Space
-> (Dockerfile + `train.sh`). We migrated to HF Jobs because it queues,
-> doesn't pay for idle time, and reuses the upstream Unsloth image
-> directly. The Docker artifacts have been removed and the launcher
-> moved into the env repo, so there's now one repo, one Space.
 ## Files
 | File | What it does |
 |------|--------------|
-| [`physix_train_colab.ipynb`](physix_train_colab.ipynb) | End-to-end SFT → GRPO in one notebook. Built on **OpenEnv + Unsloth + TRL**. T4/L4 for `1.5b` profile, A100 for `3b`. |
-| [`submit.py`](submit.py) | Submit a job to HF Jobs via `HfApi.run_uv_job` (the CLI hangs intermittently on whoami; this path is reliable). |
-| [`job_train.py`](job_train.py) | Multi-system training driver (6 in-distribution systems). Runs *inside* the HF Jobs container. PEP 723 inline deps. |
 | [`job_train_single.py`](job_train_single.py) | Single-system variant (defaults to `damped_spring`) — focused reward signal, easier to read curves. |
 | [`sync-plots.sh`](sync-plots.sh) | Pull committed loss/reward PNGs from the model repo into `../docs/plots/` so they ship with the env Space. |
@@ -35,15 +29,15 @@ export WANDB_API_KEY=wandb_v1_...
 python submit.py
 ```
-Defaults: l40sx1 ($1.80/hr), 3 h timeout, source mounted from
-`hf://datasets/Pratyush-01/physix-live-src:/physix-live`.
 ## Run in Colab
 Open [`physix_train_colab.ipynb`](physix_train_colab.ipynb) on a Colab
 GPU runtime. The notebook installs the same dependency set as the cloud
-job, fetches the source from the HF dataset, runs SFT then GRPO, and
-plots loss + reward + per-component reward curves at the end.
 ## Pipeline cost (l40sx1, 3B profile)

 [PhysiX OpenEnv](../) on **Hugging Face Jobs**, plus a self-contained
 **Colab notebook** judges can re-run.
 ## Files
 | File | What it does |
 |------|--------------|
+| [`physix_train_colab.ipynb`](physix_train_colab.ipynb) | End-to-end SFT → GRPO in one notebook. Built on **OpenEnv + Unsloth + TRL**. T4/L4 for `1.5b` profile, L4/A100 for `3b`. |
+| [`submit.py`](submit.py) | Submit a job to HF Jobs via `HfApi.run_uv_job`. |
+| [`job_train.py`](job_train.py) | Training driver across the 3 trained systems. Runs *inside* the HF Jobs container. PEP 723 inline deps. |
 | [`job_train_single.py`](job_train_single.py) | Single-system variant (defaults to `damped_spring`) — focused reward signal, easier to read curves. |
 | [`sync-plots.sh`](sync-plots.sh) | Pull committed loss/reward PNGs from the model repo into `../docs/plots/` so they ship with the env Space. |
 python submit.py
 ```
+Defaults: l40sx1, 3 h timeout. Source is fetched at job-start by
+`_stage_physix_live()` directly from this Hugging Face Space repo.
 ## Run in Colab
 Open [`physix_train_colab.ipynb`](physix_train_colab.ipynb) on a Colab
 GPU runtime. The notebook installs the same dependency set as the cloud
+job, fetches the source from this Hugging Face Space, runs SFT then
+GRPO, and plots loss + reward curves at the end.
 ## Pipeline cost (l40sx1, 3B profile)