Mihir1107 committed on
Commit c90be96 · 1 Parent(s): fd61835

Complete OpenEnv hackathon submission


- Add pyproject.toml + uv.lock for openenv validate compliance
- Add server/app.py shim (importlib) for [project.scripts] entry point
- Rename baseline.py → inference.py (WebSocket-based LLM agent)
- Fix env.py: guaranteed label flip, noise trap, shaped reward, warmup
- Fix models.py: strategy_weights validator
- Fix server.py: /ws WebSocket, /grader with episode persistence, Reward model
- Fix openenv.yaml: sync medium/hard success criteria with grader
- Add .gitignore: exclude .DS_Store, __pycache__, .claude/
- Add websockets to requirements.txt
- Update README: HF frontmatter, actual baseline scores, full API docs
openenv validate: [OK] DataSelectEnv: Ready for multi-mode deployment

Files changed (13)
  1. .gitignore +31 -0
  2. README.md +133 -0
  3. baseline.py +0 -204
  4. env.py +45 -19
  5. inference.py +280 -0
  6. models.py +7 -1
  7. openenv.yaml +5 -4
  8. pyproject.toml +30 -0
  9. requirements.txt +2 -1
  10. server.py +185 -39
  11. server/__init__.py +0 -0
  12. server/app.py +39 -0
  13. uv.lock +0 -0
.gitignore ADDED
@@ -0,0 +1,31 @@
+ # macOS
+ .DS_Store
+ .AppleDouble
+ .LSOverride
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *.pyo
+ *.pyd
+ .Python
+ *.egg-info/
+ dist/
+ build/
+ .eggs/
+
+ # Environments
+ .env
+ .venv
+ env/
+ venv/
+
+ # Claude Code memory (local tooling only, not part of submission)
+ .claude/
+
+ # IDE
+ .vscode/
+ .idea/
+
+ # OS
+ Thumbs.db
README.md CHANGED
@@ -1 +1,134 @@
+ ---
+ title: DataSelectEnv
+ emoji: 🤖
+ colorFrom: blue
+ colorTo: green
+ sdk: docker
+ pinned: false
+ tags:
+ - openenv
+ ---
+
  # DataSelectEnv — OpenEnv Environment for Data Curation in ML Training
+
+ ## Description
+
+ DataSelectEnv is an [OpenEnv](https://github.com/meta-pytorch/OpenEnv)-compliant reinforcement learning environment for the Meta PyTorch OpenEnv Hackathon. The environment models a core problem in real-world machine learning: given a large pool of candidate training examples — some clean, some mislabelled, all with a cost to acquire — which samples should an agent select to maximise model quality under a fixed labelling budget?
+
+ We implement a real incremental training loop using SGDClassifier: agents must select training data under budget constraints that capture realistic ML dynamics, including diminishing returns, redundancy penalties, and noise sensitivity. The agent observes the current classifier's validation performance, an estimate of remaining pool noise, training-set diversity, and the budget left, then decides how many samples to select and how to weight the available sampling strategies. Three difficulty tiers (easy / medium / hard) vary the noise level, budget, and time horizon to test whether agents can adapt their strategy to the environment's constraints.
+
+ ---
+
+ ## Observation Space
+
+ | Field | Type | Range | Description |
+ |---|---|---|---|
+ | `remaining_budget` | int | [0, budget] | Samples remaining in the selection budget |
+ | `diversity_score` | float | [0, ∞) | Std-dev of current training set features (proxy for diversity) |
+ | `noise_estimate` | float | [0, 1] | Fraction of noisy samples still remaining in the pool |
+ | `current_performance` | float | [0, 1] | Validation score: 1 / (1 + log_loss) |
+ | `samples_available` | int | [0, ~1100] | Number of unlabelled samples left in the pool |
+
+ ---
+
+ ## Action Space
+
+ | Field | Type | Values | Description |
+ |---|---|---|---|
+ | `action_type` | string | `select_batch`, `stop` | Select a batch of data or end the episode early |
+ | `batch_size` | int | ≥ 0 | Number of samples to select this step |
+ | `strategy_weights.uncertainty` | float | ≥ 0 | Weight for uncertainty sampling (highest-entropy samples) |
+ | `strategy_weights.diversity` | float | ≥ 0 | Weight for diversity sampling (farthest from training centroid) |
+ | `strategy_weights.random` | float | ≥ 0 | Weight for uniform random sampling |
+
+ Strategy weights are normalised internally and do not need to sum to 1.
+
+ ---
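The normalisation step can be sketched in a few lines (a minimal illustration with an assumed helper name; it mirrors the `total = sum(...) + 1e-8` normalisation visible in the env.py diff below):

```python
def normalize_weights(weights: dict) -> dict:
    """Rescale strategy weights to sum to ~1; epsilon guards an all-zero dict."""
    total = sum(weights.values()) + 1e-8
    return {k: v / total for k, v in weights.items()}

# Weights need not sum to 1 on input...
w = normalize_weights({"uncertainty": 2.0, "diversity": 1.0, "random": 1.0})
# ...but after normalisation uncertainty carries ~0.5 of the probability mass
```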
+
+ ## Tasks
+
+ | Name | flip_y | Budget | max_steps | Success criteria | Expected random score |
+ |---|---|---|---|---|---|
+ | `easy` | 0.05 | 300 | 15 | performance > 0.55 | ~0.60 |
+ | `medium` | 0.25 | 150 | 12 | performance > 0.52 AND avg noise ratio < 0.30 | ~0.40 |
+ | `hard` | 0.30 | 100 | 8 | performance > 0.53 (+ budget efficiency) | ~0.30 |
+
+ ---
+
+ ## Reward Function
+
+ ```
+ gain = (new_performance - old_performance) * 5.0
+      + 0.05 * mean_distance(selected_batch, train_centroid)  # diversity bonus
+
+ if redundancy > 0.8: gain *= 0.5          # redundancy penalty
+ if new_performance > 0.85: gain *= 0.7    # diminishing-returns cap
+
+ noise_scale = 1.0 + 2.0 * flip_y          # 1.1 easy | 1.5 medium | 1.6 hard
+ noise_penalty = noise_scale * noise_ratio_of_selected_batch
+
+ reward = gain
+        - 0.01 * batch_size                # budget cost
+        - 0.3 * redundancy                 # cosine similarity to training centroid
+        - noise_penalty
+        + 0.15                             # constant baseline term
+ ```
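The shaped reward from this commit's env.py can be sketched as a standalone function (illustrative only: the batch is passed as plain lists and the function name is invented; the env itself works on NumPy arrays):

```python
import math

def shaped_reward(old_perf, new_perf, batch, centroid,
                  redundancy, noise_ratio, batch_size, flip_y):
    """Sketch of the shaped reward in this commit's env.py (not the exact code)."""
    gain = (new_perf - old_perf) * 5.0
    # distance-based diversity bonus: mean Euclidean distance to the centroid
    dists = [math.dist(x, centroid) for x in batch]
    gain += 0.05 * (sum(dists) / len(dists))
    if redundancy > 0.8:
        gain *= 0.5          # redundancy penalty
    if new_perf > 0.85:
        gain *= 0.7          # diminishing-returns cap
    noise_penalty = (1.0 + 2.0 * flip_y) * noise_ratio
    return gain - 0.01 * batch_size - 0.3 * redundancy - noise_penalty + 0.15

# A clean, on-centroid batch that improves performance earns a positive reward:
r = shaped_reward(0.50, 0.60, [[0.0, 0.0]], [0.0, 0.0],
                  redundancy=0.0, noise_ratio=0.0, batch_size=10, flip_y=0.05)
# r = 0.5 - 0.1 + 0.15 = 0.55
```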
+
+ ---
+
+ ## Setup
+
+ ### Local (pip)
+
+ ```bash
+ pip install -r requirements.txt
+ python server.py  # starts on http://localhost:7860
+ ```
+
+ ### Docker
+
+ ```bash
+ docker build -t dataselectenv .
+ docker run -p 7860:7860 dataselectenv
+ ```
+
+ ### Run inference (LLM agent)
+
+ ```bash
+ export HF_TOKEN=hf_...  # or OPENAI_API_KEY=sk-...
+ export API_BASE_URL=https://api-inference.huggingface.co/v1  # optional
+ export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct  # optional
+
+ python inference.py --host http://localhost:7860
+ ```
+
+ ### API quick-start
+
+ ```bash
+ # Reset an episode
+ curl -X POST http://localhost:7860/reset \
+   -H "Content-Type: application/json" \
+   -d '{"task_id": "easy", "seed": 42}'
+
+ # Take a step
+ curl -X POST http://localhost:7860/step \
+   -H "Content-Type: application/json" \
+   -d '{"action": {"action_type": "select_batch", "batch_size": 10,
+        "strategy_weights": {"uncertainty": 0.4, "diversity": 0.4, "random": 0.2}}}'
+
+ # Run the built-in baseline and get reproducible scores
+ curl http://localhost:7860/baseline
+ ```
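The same HTTP flow as a minimal standard-library Python client (a sketch: it assumes a server on localhost:7860 and the request/response shapes shown in the curl examples above):

```python
import json
from urllib import request

HOST = "http://localhost:7860"  # assumption: local server, as in the Setup section

def post_json(url: str, payload: dict) -> dict:
    """POST a JSON body and decode the JSON response."""
    req = request.Request(url, data=json.dumps(payload).encode(),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

def run_episode(task_id: str = "easy", seed: int = 42) -> float:
    """Play one episode with fixed balanced weights; return final performance."""
    data = post_json(f"{HOST}/reset", {"task_id": task_id, "seed": seed})
    obs, done = data["observation"], False
    while not done:
        action = {"action_type": "select_batch", "batch_size": 10,
                  "strategy_weights": {"uncertainty": 0.4,
                                       "diversity": 0.4, "random": 0.2}}
        result = post_json(f"{HOST}/step", {"action": action})
        obs, done = result["observation"], result["done"]
    return obs["current_performance"]
```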
+
+ ---
+
+ ## Baseline Scores
+
+ Scores below are from the fixed balanced agent (`uncertainty=0.4, diversity=0.4, random=0.2`, seed=42) run via `GET /baseline`.
+
+ | Task | Score | Passed | Final performance |
+ |---|---|---|---|
+ | easy | 0.7020 | ✅ | 0.6904 |
+ | medium | 0.6600 | ✅ | 0.6569 |
+ | hard | 0.4174 | ✅ | 0.6176 |
baseline.py DELETED
@@ -1,204 +0,0 @@
- """
- baseline.py — Baseline inference script for DataSelectEnv
-
- Uses the OpenAI API client (as required by OpenEnv spec) to run an LLM
- agent against all 3 tasks and produce reproducible scores.
-
- Usage:
-     export OPENAI_API_KEY=sk-...
-     python baseline.py [--host http://localhost:7860]
-
- The agent is given the current observation as a JSON prompt and asked
- to return an action. This tests whether an LLM can navigate the
- data-curation environment without any fine-tuning.
- """
-
- import argparse
- import json
- import os
- import sys
-
- import requests
- from openai import OpenAI
-
- # ---------------------------------------------------------------------------
- # Config
- # ---------------------------------------------------------------------------
-
- DEFAULT_HOST = os.environ.get("ENV_HOST", "http://localhost:7860")
- SEED = 42
- TASKS = ["easy", "medium", "hard"]
-
- SYSTEM_PROMPT = """You are an intelligent data curation agent.
-
- Your goal is to select high-quality training data from a pool to improve a machine learning model.
- At each step you observe the current state and must choose a data selection strategy.
-
- You will receive a JSON observation with these fields:
- - remaining_budget: how many samples you can still select
- - diversity_score: current diversity of training set (higher = more diverse)
- - noise_estimate: estimated fraction of noisy samples remaining in pool
- - current_performance: current model validation performance (higher = better)
- - samples_available: number of samples left in the pool
-
- You must respond with ONLY a valid JSON action in this exact format:
- {
-   "action_type": "select_batch",
-   "batch_size": <integer between 5 and 20>,
-   "strategy_weights": {
-     "uncertainty": <float 0-1>,
-     "diversity": <float 0-1>,
-     "random": <float 0-1>
-   }
- }
-
- Rules:
- - Weights do not need to sum to 1 (they are normalized automatically)
- - If noise_estimate is high (>0.2), reduce uncertainty weight and increase diversity weight
- - If diversity_score is low (<0.5), increase diversity weight
- - If remaining_budget is low (<30), use smaller batch sizes
- - You may use "action_type": "stop" to end early if performance is already good
- - Respond with ONLY the JSON object, no explanation."""
-
-
- def query_llm(client: OpenAI, observation: dict) -> dict:
-     """Ask the LLM to produce an action given the current observation."""
-     user_msg = f"Current observation:\n{json.dumps(observation, indent=2)}\n\nWhat action do you take?"
-
-     response = client.chat.completions.create(
-         model="gpt-4o-mini",
-         messages=[
-             {"role": "system", "content": SYSTEM_PROMPT},
-             {"role": "user", "content": user_msg},
-         ],
-         temperature=0.0,  # deterministic
-         max_tokens=200,
-     )
-
-     raw = response.choices[0].message.content.strip()
-
-     # Strip markdown fences if the model wraps the JSON
-     if raw.startswith("```"):
-         raw = raw.split("```")[1]
-         if raw.startswith("json"):
-             raw = raw[4:]
-     raw = raw.strip()
-
-     return json.loads(raw)
-
-
- def run_task(host: str, client: OpenAI, task_id: str) -> dict:
-     """Run one full episode on a task and return the grader result."""
-     print(f"\n{'='*50}")
-     print(f"Task: {task_id.upper()}")
-     print(f"{'='*50}")
-
-     # Reset
-     r = requests.post(f"{host}/reset", json={"task_id": task_id, "seed": SEED})
-     r.raise_for_status()
-     data = r.json()
-     episode_id = data["episode_id"]
-     obs = data["observation"]
-     print(f"Episode ID: {episode_id}")
-     print(f"Initial obs: {obs}")
-
-     step = 0
-     total_reward = 0.0
-
-     while True:
-         step += 1
-
-         # Get action from LLM
-         try:
-             action = query_llm(client, obs)
-         except (json.JSONDecodeError, KeyError) as e:
-             print(f"  Step {step}: LLM returned invalid JSON ({e}), using fallback action")
-             action = {
-                 "action_type": "select_batch",
-                 "batch_size": 10,
-                 "strategy_weights": {"uncertainty": 0.4, "diversity": 0.4, "random": 0.2},
-             }
-
-         # Execute action
-         r = requests.post(f"{host}/step", json={"action": action})
-         r.raise_for_status()
-         result = r.json()
-
-         obs = result["observation"]
-         reward = result["reward"]
-         done = result["done"]
-         total_reward += reward
-
-         print(f"  Step {step:2d} | perf={obs['current_performance']:.4f} "
-               f"budget={obs['remaining_budget']:3d} reward={reward:+.4f} "
-               f"noise_est={obs['noise_estimate']:.3f}")
-
-         if done:
-             break
-
-     print(f"\nEpisode done after {step} steps | total_reward={total_reward:.4f}")
-     print(f"Final performance: {obs['current_performance']:.4f}")
-
-     # Grade
-     r = requests.post(f"{host}/grader", json={"episode_id": episode_id, "task_id": task_id})
-     r.raise_for_status()
-     grade = r.json()
-
-     print(f"Score: {grade['score']:.4f}")
-     print(f"Passed: {grade['passed']}")
-     print(f"Details: {grade['breakdown']}")
-
-     return {
-         "task_id": task_id,
-         "score": grade["score"],
-         "passed": grade["passed"],
-         "breakdown": grade["breakdown"],
-         "steps": step,
-         "total_reward": round(total_reward, 4),
-         "final_performance": obs["current_performance"],
-     }
-
-
- def main():
-     parser = argparse.ArgumentParser(description="DataSelectEnv baseline inference script")
-     parser.add_argument("--host", default=DEFAULT_HOST, help="Environment server URL")
-     args = parser.parse_args()
-
-     api_key = os.environ.get("OPENAI_API_KEY")
-     if not api_key:
-         print("ERROR: OPENAI_API_KEY environment variable not set.")
-         sys.exit(1)
-
-     client = OpenAI(api_key=api_key)
-
-     # Health check
-     try:
-         r = requests.get(f"{args.host}/health", timeout=5)
-         r.raise_for_status()
-         print(f"Connected to {args.host} — {r.json()}")
-     except Exception as e:
-         print(f"ERROR: Could not reach environment at {args.host}: {e}")
-         sys.exit(1)
-
-     # Run all tasks
-     results = {}
-     for task_id in TASKS:
-         results[task_id] = run_task(args.host, client, task_id)
-
-     # Summary
-     print(f"\n{'='*50}")
-     print("BASELINE RESULTS SUMMARY")
-     print(f"{'='*50}")
-     print(f"{'Task':<10} {'Score':<8} {'Passed':<8} {'Final Perf':<12} {'Steps'}")
-     print("-" * 50)
-     for task_id, r in results.items():
-         print(f"{task_id:<10} {r['score']:<8.4f} {str(r['passed']):<8} "
-               f"{r['final_performance']:<12.4f} {r['steps']}")
-
-     overall = sum(r["score"] for r in results.values()) / len(results)
-     print(f"\nOverall mean score: {overall:.4f}")
-     print(json.dumps({"baseline_results": results, "mean_score": round(overall, 4)}, indent=2))
-
-
- if __name__ == "__main__":
-     main()
env.py CHANGED
@@ -1,3 +1,4 @@
+ import math
  import numpy as np
  import random

@@ -7,7 +8,7 @@ from sklearn.metrics import log_loss
  from sklearn.preprocessing import StandardScaler

  from models import Observation, Action, EnvState
- from sampling import sample_uncertainty, sample_diversity, sample_random, entropy, sim_to_noisy
+ from sampling import sample_uncertainty, sample_diversity, sample_random, entropy
  from reward import mean_cosine, running_mean

@@ -57,7 +58,7 @@ class DataSelectEnv:
      n_informative=5,
      n_redundant=5,
      n_clusters_per_class=2,
-     class_sep=1.5,
+     class_sep=1.0,
      flip_y=0.1,
      random_state=42,  # dataset skeleton is fixed; noise injection varies by seed
  )
@@ -75,9 +76,11 @@
  flip_prob = self.cfg["data"].get("flip_y", 0.1)
  noise_mask = np.random.rand(len(y_pool)) < flip_prob
  y_pool_noisy = y_pool.copy()
- random_labels = np.random.randint(0, 2, size=np.sum(noise_mask))
- y_pool_noisy[noise_mask] = random_labels
- X_pool[noise_mask] += np.random.normal(0, 0.5, X_pool[noise_mask].shape)
+ # Guaranteed label flip (1-y), not random assignment.
+ # Random assignment gives the correct label 50% of the time, halving
+ # effective noise. Flipping guarantees every noise_mask sample is wrong.
+ y_pool_noisy[noise_mask] = 1 - y_pool[noise_mask]
+ X_pool[noise_mask] += np.random.normal(0, 0.1, X_pool[noise_mask].shape)

  # Fresh model every episode
  self.model = SGDClassifier(
@@ -90,7 +93,7 @@
  )

  # Warm start on seed data
- for _ in range(2):
+ for _ in range(10):
      self.model.partial_fit(X_seed, y_seed, classes=np.unique(y))

  self._episode_state = DatasetState(
@@ -129,23 +132,34 @@
  total = sum(w.values()) + 1e-8
  w = {k: v / total for k, v in w.items()}

+ min_b = self.cfg.get("min_batch", 1)
  b = min(action.batch_size, s.budget)
+ if b < min_b and action.action_type != "stop":
+     b = min_b  # enforce minimum; prevents single-sample gaming
  if b <= 0:
      return self._obs(), -0.01, False, {"error": "empty batch"}

  if action.action_type == "stop":
-     return self._obs(), 0.05 * s.budget, True, {}
+     perf_threshold = self.cfg.get("stop_threshold", 0.60)
+     if s.performance >= perf_threshold:
+         stop_reward = 0.05 * s.budget
+     else:
+         stop_reward = -1.0
+     return self._obs(), stop_reward, True, {}

  # Uncertainty + noise trap
  proba_pool = self.model.predict_proba(s.X_pool)
  H = entropy(proba_pool)

- # Force noisy samples to look extremely uncertain (the trap)
- noise_boost = s.noise_mask.astype(float) * 0.4
-
- # Penalize structurally via similarity to noisy centroid
- sim_noise = sim_to_noisy(s.X_pool, s.noisy_centroid)
- H_adj = H + noise_boost - (self.cfg["alpha"] * 3.0) * sim_noise
+ # Noise trap: boost entropy of noisy samples so uncertainty sampling is
+ # attracted to them. Capped at 0.55 (< log(2) ≈ 0.693 max binary entropy)
+ # so clean uncertain samples can still compete — trap misleads rather
+ # than completely overrides, keeping uncertainty a near-miss on hard.
+ max_entropy = math.log(2)  # ≈ 0.693 for binary classifier
+ flip_prob = self.cfg["data"].get("flip_y", 0.1)
+ boost_raw = 0.1 + flip_prob * 2.0
+ noise_boost = s.noise_mask.astype(float) * min(boost_raw, 0.55)
+ H_adj = H + noise_boost

  # Sampling
  n_u = int(b * w.get("uncertainty", 0))
@@ -163,7 +177,7 @@

  Xb, yb = s.X_pool[idx], s.y_pool[idx]
  selected_noise = s.noise_mask[idx]
- noise_ratio = float(np.mean(selected_noise)) if s.steps >= self.WARMUP else 0.0
+ noise_ratio = float(np.mean(selected_noise))

  # Remove selected samples from pool — keep noise_mask in sync
  keep = np.ones(len(s.X_pool), dtype=bool)
@@ -180,11 +194,18 @@
  new = self._score()

  # ----------------------------------------------------------------
- # Reward design — DO NOT modify, balance is tuned
+ # Reward design
  # ----------------------------------------------------------------
  gain = (new - old) * 5.0
- diversity_bonus = np.std(Xb)
- gain += 0.2 * diversity_bonus
+
+ # Distance-based diversity bonus: rewards batches that cover regions
+ # far from existing training data. Diversity sampling scores high
+ # (~0.25), random scores average (~0.22), uncertainty scores low
+ # (~0.15) because boundary samples cluster near the centroid.
+ diversity_bonus = float(np.mean(
+     np.linalg.norm(Xb - s.train_centroid, axis=1)
+ )) * 0.05
+ gain += diversity_bonus

  redundancy = mean_cosine(Xb, s.train_centroid)
  if redundancy > 0.8:
@@ -192,9 +213,14 @@
      gain *= 0.5
  if new > 0.85:
      gain *= 0.7

- noise_penalty = 0.4 * noise_ratio
+ # Noise penalty scales with task difficulty: easy is forgiving,
+ # hard severely punishes noisy selections.
+ flip_prob = self.cfg["data"].get("flip_y", 0.1)
+ noise_scale = 1.0 + flip_prob * 2.0  # 1.1 easy | 1.5 medium | 1.6 hard
+ noise_penalty = noise_scale * noise_ratio
  reward = gain - 0.01 * b - 0.3 * redundancy - noise_penalty
- reward += 0.2 * (new - old)  # small alignment bonus
+ reward += 0.15  # baseline: keeps reward in mixed-sign territory so
+                 # RL agents receive positive signal for good steps
  # ----------------------------------------------------------------

  # Update state
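The "guaranteed label flip" rationale in the env.py hunk above can be checked numerically in isolation (a standalone pure-Python sketch, not the environment's NumPy code): random reassignment leaves roughly half of the "noisy" labels accidentally correct, while `1 - y` makes every one of them wrong.

```python
import random

random.seed(0)
n = 10_000
y = [random.randint(0, 1) for _ in range(n)]        # true binary labels
mask = [random.random() < 0.25 for _ in range(n)]   # 25% marked as noisy
n_masked = sum(mask)

# Random reassignment: a masked label stays correct ~50% of the time
y_rand = [random.randint(0, 1) if m else yi for yi, m in zip(y, mask)]
wrong_rand = sum(a != b for a, b, m in zip(y_rand, y, mask) if m)

# Guaranteed flip (1 - y): every masked label becomes wrong
y_flip = [1 - yi if m else yi for yi, m in zip(y, mask)]
wrong_flip = sum(a != b for a, b, m in zip(y_flip, y, mask) if m)

print(wrong_rand / n_masked)  # ~0.5: effective noise is halved
print(wrong_flip / n_masked)  # 1.0: effective noise equals flip_y
```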
inference.py ADDED
@@ -0,0 +1,280 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ inference.py β€” WebSocket-based inference script for DataSelectEnv
3
+
4
+ Connects to the environment via WebSocket (/ws) β€” the required transport
5
+ on HF Spaces where HTTP /reset and /step are not accessible.
6
+
7
+ Usage:
8
+ export HF_TOKEN=hf_... # or OPENAI_API_KEY=sk-...
9
+ export ENV_HOST=https://your-space.hf.space # or http://localhost:7860
10
+ export API_BASE_URL=https://api-inference.huggingface.co/v1 # optional
11
+ export MODEL_NAME=meta-llama/Llama-3.1-8B-Instruct # optional
12
+ python inference.py [--host URL]
13
+
14
+ Runs all 3 tasks sequentially using one WebSocket connection per task,
15
+ calls POST /grader after each episode, prints scores and final summary.
16
+ Designed to complete in under 20 minutes on 2 vCPU / 8 GB RAM.
17
+ """
18
+
19
+ import argparse
20
+ import asyncio
21
+ import json
22
+ import os
23
+ import sys
24
+
25
+ import requests
26
+ import websockets
27
+ from openai import OpenAI
28
+
29
+ # ---------------------------------------------------------------------------
30
+ # Config β€” all overridable via environment variables
31
+ # ---------------------------------------------------------------------------
32
+
33
+ DEFAULT_HOST = os.environ.get("ENV_HOST", "http://localhost:7860")
34
+ API_BASE_URL = os.environ.get("API_BASE_URL", "https://api.openai.com/v1")
35
+ MODEL_NAME = os.environ.get("MODEL_NAME", "gpt-4o-mini")
36
+ SEED = 42
37
+ TASKS = ["easy", "medium", "hard"]
38
+ FALLBACK_ACTION = {
39
+ "action_type": "select_batch",
40
+ "batch_size": 10,
41
+ "strategy_weights": {"uncertainty": 0.3, "diversity": 0.5, "random": 0.2},
42
+ }
43
+
44
+ SYSTEM_PROMPT = """You are an intelligent data curation agent.
45
+
46
+ Your goal is to select high-quality training data from a noisy pool to improve
47
+ a machine learning classifier. At each step you observe the current state and
48
+ must choose a data selection strategy.
49
+
50
+ Observation fields:
51
+ - remaining_budget: samples you can still select (integer)
52
+ - diversity_score: std-dev of current training set features (higher = more diverse)
53
+ - noise_estimate: fraction of noisy (mislabelled) samples remaining in pool
54
+ - current_performance: validation score = 1/(1+log_loss), range [0,1]
55
+ - samples_available: unlabelled samples remaining in the pool
56
+
57
+ Respond with ONLY a valid JSON action in this exact format:
58
+ {
59
+ "action_type": "select_batch",
60
+ "batch_size": <integer 5–20>,
61
+ "strategy_weights": {
62
+ "uncertainty": <float 0–1>,
63
+ "diversity": <float 0–1>,
64
+ "random": <float 0–1>
65
+ }
66
+ }
67
+
68
+ Strategy rules:
69
+ - Weights are normalized automatically (no need to sum to 1)
70
+ - noise_estimate > 0.2 β†’ lower uncertainty weight, raise diversity weight
71
+ - noise_estimate > 0.4 β†’ set uncertainty near 0, maximize diversity
72
+ - diversity_score < 0.5 β†’ increase diversity weight
73
+ - remaining_budget < 30 β†’ reduce batch_size to 5
74
+ - You may use "action_type": "stop" with batch_size 0 only when
75
+ current_performance > 0.65 AND remaining_budget < 20
76
+ - Respond with ONLY the JSON object, no explanation, no markdown fences."""
77
+
78
+
79
+ # ---------------------------------------------------------------------------
80
+ # LLM helper
81
+ # ---------------------------------------------------------------------------
82
+
83
+ def query_llm(client: OpenAI, observation: dict) -> dict:
84
+ """Ask the LLM to produce an action given the current observation."""
85
+ user_msg = (
86
+ f"Current observation:\n{json.dumps(observation, indent=2)}\n\n"
87
+ "What action do you take?"
88
+ )
89
+ response = client.chat.completions.create(
90
+ model=MODEL_NAME,
91
+ messages=[
92
+ {"role": "system", "content": SYSTEM_PROMPT},
93
+ {"role": "user", "content": user_msg},
94
+ ],
95
+ temperature=0.0,
96
+ max_tokens=200,
97
+ )
98
+ raw = response.choices[0].message.content.strip()
99
+ # Strip markdown fences if model wraps JSON
100
+ if raw.startswith("```"):
101
+ raw = raw.split("```")[1]
102
+ if raw.startswith("json"):
103
+ raw = raw[4:]
104
+ return json.loads(raw.strip())
105
+
106
+
107
+ # ---------------------------------------------------------------------------
108
+ # WebSocket episode runner
109
+ # ---------------------------------------------------------------------------
110
+
111
+ def http_base(host: str) -> str:
112
+ """Return HTTP base URL (strip trailing slash)."""
113
+ return host.rstrip("/")
114
+
115
+
116
+ def ws_url(host: str) -> str:
117
+ """Convert http(s):// base URL to ws(s):// WebSocket URL."""
118
+ base = http_base(host)
119
+ if base.startswith("https://"):
120
+ return "wss://" + base[len("https://"):] + "/ws"
121
+ if base.startswith("http://"):
122
+ return "ws://" + base[len("http://"):] + "/ws"
123
+ return base + "/ws"
124
+
125
+
126
+ async def run_task_ws(host: str, client: OpenAI, task_id: str) -> dict:
127
+ """
128
+ Run one full episode for task_id over a WebSocket connection.
129
+ Returns the grader result dict.
130
+ """
131
+ print(f"\n{'='*52}")
132
+ print(f" Task: {task_id.upper()}")
133
+ print(f"{'='*52}")
134
+
135
+ url = ws_url(host)
136
+ print(f" Connecting to {url} ...")
137
+
138
+ async with websockets.connect(url, open_timeout=30, ping_interval=20) as ws:
139
+
140
+ # ── reset ────────────────────────────────────────────────────────
141
+ await ws.send(json.dumps({
142
+ "type": "reset",
143
+ "data": {"task_id": task_id, "seed": SEED},
144
+ }))
145
+ resp = json.loads(await ws.recv())
146
+ if resp["type"] == "error":
147
+ raise RuntimeError(f"reset error: {resp['data']['message']}")
148
+
149
+ episode_id = resp["data"]["episode_id"]
150
+ obs = resp["data"]["observation"]
151
+ print(f" Episode ID: {episode_id}")
152
+ print(f" Initial obs: {obs}")
153
+
154
+ step = 0
155
+ total_reward = 0.0
156
+ done = False
157
+
158
+ # ── step loop ────────────────────────────────────────────────────
159
+ while not done:
160
+ step += 1
161
+
162
+ # Get action from LLM (with fallback on parse error)
163
+ try:
164
+ action = query_llm(client, obs)
165
+ # Validate required keys are present
166
+ assert "action_type" in action
167
+ assert "batch_size" in action
168
+ assert "strategy_weights" in action
169
+ except Exception as e:
170
+ print(f" Step {step}: LLM parse error ({e}), using fallback")
171
+ action = FALLBACK_ACTION
172
+
173
+ await ws.send(json.dumps({"type": "step", "data": action}))
174
+ resp = json.loads(await ws.recv())
175
+
176
+ if resp["type"] == "error":
177
+ print(f" Step {step}: server error: {resp['data']['message']}")
178
+ break
179
+
180
+ data = resp["data"]
181
+ obs = data["observation"]
182
+ # reward is wrapped in {"value": float} per Reward model
183
+ raw_reward = data["reward"]
184
+ reward = raw_reward["value"] if isinstance(raw_reward, dict) else float(raw_reward)
185
+ done = data["done"]
186
+ total_reward += reward
187
+
188
+ print(
189
+ f" Step {step:2d} | perf={obs['current_performance']:.4f} "
190
+ f"budget={obs['remaining_budget']:3d} "
191
+ f"reward={reward:+.4f} "
192
+ f"noise_est={obs['noise_estimate']:.3f}"
193
+ )
194
+
195
+ # ── close WebSocket cleanly ───────────────────────────────────────
196
+ await ws.send(json.dumps({"type": "close", "data": {}}))
197
+ try:
198
+ await asyncio.wait_for(ws.recv(), timeout=2.0)
199
+ except (asyncio.TimeoutError, websockets.exceptions.ConnectionClosed):
200
+ pass
201
+
202
+ print(f"\n Episode done after {step} steps | total_reward={total_reward:.4f}")
203
+ print(f" Final performance: {obs['current_performance']:.4f}")
204
+
205
+ # ── grade via HTTP (grader endpoint doesn't need WebSocket) ──────────
206
+ r = requests.post(
207
+ f"{http_base(host)}/grader",
208
+ json={"episode_id": episode_id, "task_id": task_id},
209
+ timeout=15,
210
+ )
211
+ r.raise_for_status()
212
+ grade = r.json()
213
+
214
+ print(f" Score: {grade['score']:.4f}")
215
+ print(f" Passed: {grade['passed']}")
216
+ print(f" Details: {grade['breakdown']}")
217
+
218
+ return {
219
+ "task_id": task_id,
220
+ "score": grade["score"],
221
+ "passed": grade["passed"],
222
+ "breakdown": grade["breakdown"],
223
+ "steps": step,
224
+ "total_reward": round(total_reward, 4),
225
+ "final_performance": obs["current_performance"],
226
+ }
227
+
228
+
229
+ # ---------------------------------------------------------------------------
230
+ # Main
231
+ # ---------------------------------------------------------------------------
232
+
233
+ async def amain(host: str, client: OpenAI) -> None:
234
+ results = {}
235
+ for task_id in TASKS:
236
+ results[task_id] = await run_task_ws(host, client, task_id)
237
+
238
+ print(f"\n{'='*52}")
239
+ print(" INFERENCE RESULTS SUMMARY")
240
+ print(f"{'='*52}")
241
+ print(f"{'Task':<10} {'Score':<8} {'Passed':<8} {'Final Perf':<12} {'Steps'}")
242
+ print("-" * 52)
243
+ for task_id, r in results.items():
244
+ print(
245
+ f"{task_id:<10} {r['score']:<8.4f} {str(r['passed']):<8} "
246
+ f"{r['final_performance']:<12.4f} {r['steps']}"
247
+ )
248
+
249
+ overall = sum(r["score"] for r in results.values()) / len(results)
250
+ print(f"\nOverall mean score: {overall:.4f}")
251
+ print(json.dumps({"results": results, "mean_score": round(overall, 4)}, indent=2))
252
+
253
+
254
+ def main() -> None:
255
+ parser = argparse.ArgumentParser(description="DataSelectEnv WebSocket inference script")
256
+ parser.add_argument("--host", default=DEFAULT_HOST,
257
+ help="Environment server base URL (http or https)")
258
+ args = parser.parse_args()
259
+
260
+ api_key = os.getenv("HF_TOKEN") or os.getenv("OPENAI_API_KEY")
261
+ if not api_key:
262
+ print("ERROR: Set HF_TOKEN or OPENAI_API_KEY environment variable.")
263
+ sys.exit(1)
264
+
265
+ client = OpenAI(api_key=api_key, base_url=API_BASE_URL)
266
+
267
+ # Health check over HTTP
268
+ try:
269
+ r = requests.get(f"{http_base(args.host)}/health", timeout=10)
270
+ r.raise_for_status()
271
+ print(f"Connected to {args.host} - {r.json()}")
272
+ except Exception as e:
273
+ print(f"ERROR: Could not reach environment at {args.host}: {e}")
274
+ sys.exit(1)
275
+
276
+ asyncio.run(amain(args.host, client))
277
+
278
+
279
+ if __name__ == "__main__":
280
+ main()
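The reward handling above tolerates both payload shapes: the new `/step` response wraps the reward as `{"value": float}` (the `Reward` model), while a bare float is still accepted. That tolerant parse, isolated as a small sketch:

```python
from typing import Union

def unwrap_reward(raw: Union[dict, float, int]) -> float:
    # New server responses send {"value": x}; fall back to a bare number otherwise.
    if isinstance(raw, dict):
        return float(raw["value"])
    return float(raw)

print(unwrap_reward({"value": 0.125}))  # envelope form from the updated /step
print(unwrap_reward(-0.03))             # bare float from older payloads
```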
models.py CHANGED
@@ -1,4 +1,4 @@
1
- from pydantic import BaseModel, Field
2
  from typing import Dict, Literal, Optional
3
 
4
 
@@ -15,6 +15,12 @@ class Action(BaseModel):
15
  batch_size: int = Field(ge=0)
16
  strategy_weights: Dict[str, float]
17
 
 
 
 
 
 
 
18
 
19
  class Reward(BaseModel):
20
  value: float
 
1
+ from pydantic import BaseModel, Field, field_validator
2
  from typing import Dict, Literal, Optional
3
 
4
 
 
15
  batch_size: int = Field(ge=0)
16
  strategy_weights: Dict[str, float]
17
 
18
+ @field_validator('strategy_weights')
+ @classmethod
19
+ def weights_not_empty(cls, v):
20
+ if not v:
21
+ raise ValueError('strategy_weights cannot be empty')
22
+ return v
23
+
24
 
25
  class Reward(BaseModel):
26
  value: float
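The new validator rejects an empty `strategy_weights` mapping at parse time, before the environment ever sees the action. The same guard in dependency-free Python (a sketch of the behavior only, not the pydantic machinery):

```python
def check_strategy_weights(weights: dict) -> dict:
    # Mirrors the validator: an empty mapping is a client error, raised eagerly.
    if not weights:
        raise ValueError("strategy_weights cannot be empty")
    return weights

check_strategy_weights({"uncertainty": 0.6, "diversity": 0.4})  # passes through unchanged
try:
    check_strategy_weights({})
except ValueError as exc:
    print(f"rejected: {exc}")
```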
openenv.yaml CHANGED
@@ -78,16 +78,16 @@ tasks:
78
  description: >
79
  High noise (flip_y=0.25), budget=150, max_steps=12.
80
  Agent must reach performance > 0.52 while keeping average
81
- noise selection rate below 0.30.
82
- success_criteria: "current_performance > 0.52 AND avg_noise_ratio < 0.30"
83
 
84
  - id: hard
85
  difficulty: hard
86
  description: >
87
  High noise (flip_y=0.30), tight budget=100, max_steps=8.
88
- Agent must hit performance > 0.53 efficiently.
89
  Grader scores performance and budget efficiency jointly.
90
- success_criteria: "current_performance > 0.53, scored jointly with efficiency"
91
 
92
  reward:
93
  type: continuous
@@ -98,6 +98,7 @@ reward:
98
  Provides dense signal throughout the episode, not just at termination.
99
 
100
  endpoints:
 
101
  reset: POST /reset
102
  step: POST /step
103
  state: GET /state
 
78
  description: >
79
  High noise (flip_y=0.25), budget=150, max_steps=12.
80
  Agent must reach performance > 0.52 while keeping average
81
+ noise selection rate below 0.50.
82
+ success_criteria: "current_performance > 0.52 AND avg_noise_ratio < 0.50"
83
 
84
  - id: hard
85
  difficulty: hard
86
  description: >
87
  High noise (flip_y=0.30), tight budget=100, max_steps=8.
88
+ Agent must hit performance > 0.58 efficiently.
89
  Grader scores performance and budget efficiency jointly.
90
+ success_criteria: "current_performance > 0.58, scored jointly with efficiency"
91
 
92
  reward:
93
  type: continuous
 
98
  Provides dense signal throughout the episode, not just at termination.
99
 
100
  endpoints:
101
+ websocket: WS /ws # primary transport; required on HF Spaces
102
  reset: POST /reset
103
  step: POST /step
104
  state: GET /state
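The medium and hard success criteria above map onto the server's grader arithmetic (thresholds taken from the `_grade()` changes in this commit's server.py); a plain-Python sketch, with clipping written out via `min`/`max` in place of `np.clip`:

```python
def clip01(x: float) -> float:
    # Equivalent of np.clip(x, 0.0, 1.0) without the numpy dependency.
    return max(0.0, min(1.0, x))

def grade_medium(perf: float, avg_noise: float) -> tuple[float, bool]:
    # 60% performance, 40% noise avoidance; noise_score hits zero at avg_noise >= 0.50.
    perf_score = clip01((perf - 0.42) / (0.62 - 0.42))
    noise_score = clip01(1.0 - avg_noise / 0.50)
    return round(0.6 * perf_score + 0.4 * noise_score, 4), (perf > 0.52 and avg_noise < 0.50)

def grade_hard(perf: float, budget_used: int, budget_total: int = 100) -> tuple[float, bool]:
    # 65% performance, 35% budget efficiency; no grace offset on efficiency.
    perf_score = clip01((perf - 0.50) / (0.72 - 0.50))
    efficiency = clip01(1.0 - budget_used / budget_total)
    return round(0.65 * perf_score + 0.35 * efficiency, 4), perf > 0.58

print(grade_medium(0.56, 0.20))  # perf_score ~0.70, noise_score 0.60
print(grade_hard(0.61, 60))
```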
pyproject.toml ADDED
@@ -0,0 +1,30 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ [build-system]
2
+ requires = ["setuptools>=68", "wheel"]
3
+ build-backend = "setuptools.build_meta"
4
+
5
+ [project]
6
+ name = "dataselectenv"
7
+ version = "0.1.0"
8
+ description = "RL environment for optimal data selection under cost, noise, and diversity constraints"
9
+ requires-python = ">=3.10"
10
+ dependencies = [
11
+ "scikit-learn",
12
+ "numpy",
13
+ "pydantic",
14
+ "fastapi",
15
+ "uvicorn[standard]",
16
+ "openai",
17
+ "websockets",
18
+ "python-dotenv",
19
+ "openenv-core>=0.2.0",
20
+ ]
21
+
22
+ [project.scripts]
23
+ server = "server.app:main"
24
+
25
+ [project.optional-dependencies]
26
+ dev = ["pytest", "httpx"]
27
+
28
+ [tool.openenv]
29
+ env_id = "DataSelectEnv-v0"
30
+ entry_point = "server.app:app"
requirements.txt CHANGED
@@ -4,4 +4,5 @@ pydantic==2.7.4
4
  numpy==1.26.4
5
  scikit-learn==1.5.1
6
  openai==1.40.0
7
- requests==2.32.3
 
 
4
  numpy==1.26.4
5
  scikit-learn==1.5.1
6
  openai==1.40.0
7
+ requests==2.32.3
8
+ websockets>=12.0
server.py CHANGED
@@ -20,12 +20,12 @@ import uuid
20
  from typing import Any, Dict, Optional
21
 
22
  import numpy as np
23
- from fastapi import FastAPI, HTTPException
24
  from fastapi.middleware.cors import CORSMiddleware
25
  from pydantic import BaseModel
26
 
27
  from env import DataSelectEnv
28
- from models import Action, EnvState, Observation
29
 
30
  # ---------------------------------------------------------------------------
31
  # App
@@ -63,6 +63,7 @@ BASE_CFG = {
63
  "budget": 300,
64
  "max_steps": 15,
65
  "alpha": 0.2,
 
66
  }
67
 
68
  # ---------------------------------------------------------------------------
@@ -80,9 +81,10 @@ TASKS = {
80
  ),
81
  "success_criteria": "current_performance > 0.55 at episode end",
82
  "cfg_overrides": {
83
- "data": {"flip_y": 0.05},
84
- "budget": 300,
85
- "max_steps": 15,
 
86
  },
87
  },
88
  "medium": {
@@ -91,13 +93,14 @@ TASKS = {
91
  "description": (
92
  "High noise (flip_y=0.25), budget=150, max_steps=12. "
93
  "Agent must reach performance > 0.52 while keeping average "
94
- "noise selection rate below 0.30. Uncertainty-only strategies fail."
95
  ),
96
- "success_criteria": "current_performance > 0.52 AND avg noise_ratio < 0.30",
97
  "cfg_overrides": {
98
- "data": {"flip_y": 0.25},
99
- "budget": 150,
100
- "max_steps": 12,
 
101
  },
102
  },
103
  "hard": {
@@ -105,15 +108,16 @@ TASKS = {
105
  "difficulty": "hard",
106
  "description": (
107
  "High noise (flip_y=0.30), tight budget=100, max_steps=8. "
108
- "Agent must hit performance > 0.53 efficiently. "
109
  "Grader scores performance and budget efficiency jointly. "
110
  "Requires precise noise-aware + diversity-aware strategy."
111
  ),
112
- "success_criteria": "performance > 0.53, scored jointly with budget efficiency",
113
  "cfg_overrides": {
114
- "data": {"flip_y": 0.30},
115
- "budget": 100,
116
- "max_steps": 8,
 
117
  },
118
  },
119
  }
@@ -159,6 +163,9 @@ class EpisodeStore:
159
 
160
  store = EpisodeStore()
161
 
 
 
 
162
  # ---------------------------------------------------------------------------
163
  # Request / response schemas
164
  # ---------------------------------------------------------------------------
@@ -199,19 +206,19 @@ def _grade(task_id: str, obs: Observation, noise_ratios: list, cfg: dict) -> Gra
199
  perf = obs.current_performance
200
 
201
  if task_id == "easy":
202
- # Single dimension: raw performance
203
- score = float(np.clip((perf - 0.45) / (0.65 - 0.45), 0.0, 1.0))
204
- passed = perf > 0.55
205
  breakdown: Dict[str, Any] = {"performance_score": round(score, 4)}
206
 
207
  elif task_id == "medium":
208
  avg_noise = float(np.mean(noise_ratios)) if noise_ratios else 1.0
209
  # Performance sub-score
210
  perf_score = float(np.clip((perf - 0.42) / (0.62 - 0.42), 0.0, 1.0))
211
- # Noise avoidance sub-score: full marks at 0 noise, zero at >=0.30
212
- noise_score = float(np.clip(1.0 - avg_noise / 0.30, 0.0, 1.0))
213
  score = round(0.6 * perf_score + 0.4 * noise_score, 4)
214
- passed = perf > 0.52 and avg_noise < 0.30
215
  breakdown = {
216
  "performance_score": round(perf_score, 4),
217
  "noise_score": round(noise_score, 4),
@@ -221,12 +228,12 @@ def _grade(task_id: str, obs: Observation, noise_ratios: list, cfg: dict) -> Gra
221
  else: # hard
222
  budget_total = cfg["budget"]
223
  budget_used = budget_total - obs.remaining_budget
224
- perf_score = float(np.clip((perf - 0.43) / (0.63 - 0.43), 0.0, 1.0))
225
- # Efficiency: reward finishing with budget left; +0.2 grace so
226
- # spending most of the budget still gets partial efficiency credit
227
- efficiency = float(np.clip(1.0 - budget_used / budget_total + 0.2, 0.0, 1.0))
228
  score = round(0.65 * perf_score + 0.35 * efficiency, 4)
229
- passed = perf > 0.53
230
  breakdown = {
231
  "performance_score": round(perf_score, 4),
232
  "efficiency_score": round(efficiency, 4),
@@ -311,11 +318,19 @@ def step(req: StepRequest):
311
  if "noise_ratio" in info:
312
  store.noise_ratios.append(info["noise_ratio"])
313
 
 
 
 
 
 
 
 
 
314
  return {
315
  "episode_id": store.episode_id,
316
  "step": store.step_count,
317
  "observation": obs.model_dump(),
318
- "reward": round(float(reward), 6),
319
  "done": done,
320
  "info": info,
321
  }
@@ -360,28 +375,40 @@ def tasks():
360
  @app.post("/grader")
361
  def grader(req: GraderRequest):
362
  """
363
- Score the most recently completed episode.
364
 
365
  Body: { "episode_id": "...", "task_id": "easy|medium|hard" }
366
- episode_id must match the active episode; episode must be done.
367
  """
368
- if store.episode_id != req.episode_id:
369
- raise HTTPException(
370
- status_code=400,
371
- detail="episode_id does not match the current episode.",
372
- )
373
- if not store.done:
374
  raise HTTPException(
375
  status_code=400,
376
- detail="Episode is not finished. Keep stepping until done=True.",
377
  )
378
- if req.task_id not in TASKS:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
379
  raise HTTPException(
380
  status_code=400,
381
- detail=f"Unknown task_id '{req.task_id}'.",
382
  )
383
 
384
- return _grade(req.task_id, store.final_obs, store.noise_ratios, store._cfg)
385
 
386
 
387
  @app.get("/baseline")
@@ -435,6 +462,125 @@ def baseline():
435
  }
436
 
437
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
438
  # ---------------------------------------------------------------------------
439
  # Entry point
440
  # ---------------------------------------------------------------------------
 
20
  from typing import Any, Dict, Optional
21
 
22
  import numpy as np
23
+ from fastapi import FastAPI, HTTPException, WebSocket, WebSocketDisconnect
24
  from fastapi.middleware.cors import CORSMiddleware
25
  from pydantic import BaseModel
26
 
27
  from env import DataSelectEnv
28
+ from models import Action, EnvState, Observation, Reward
29
 
30
  # ---------------------------------------------------------------------------
31
  # App
 
63
  "budget": 300,
64
  "max_steps": 15,
65
  "alpha": 0.2,
66
+ "min_batch": 5,
67
  }
68
 
69
  # ---------------------------------------------------------------------------
 
81
  ),
82
  "success_criteria": "current_performance > 0.55 at episode end",
83
  "cfg_overrides": {
84
+ "data": {"flip_y": 0.05},
85
+ "budget": 300,
86
+ "max_steps": 15,
87
+ "stop_threshold": 0.60,
88
  },
89
  },
90
  "medium": {
 
93
  "description": (
94
  "High noise (flip_y=0.25), budget=150, max_steps=12. "
95
  "Agent must reach performance > 0.52 while keeping average "
96
+ "noise selection rate below 0.50. Uncertainty-only strategies fail."
97
  ),
98
+ "success_criteria": "current_performance > 0.52 AND avg noise_ratio < 0.50",
99
  "cfg_overrides": {
100
+ "data": {"flip_y": 0.25},
101
+ "budget": 150,
102
+ "max_steps": 12,
103
+ "stop_threshold": 0.57,
104
  },
105
  },
106
  "hard": {
 
108
  "difficulty": "hard",
109
  "description": (
110
  "High noise (flip_y=0.30), tight budget=100, max_steps=8. "
111
+ "Agent must hit performance > 0.58 efficiently. "
112
  "Grader scores performance and budget efficiency jointly. "
113
  "Requires precise noise-aware + diversity-aware strategy."
114
  ),
115
+ "success_criteria": "performance > 0.58, scored jointly with budget efficiency",
116
  "cfg_overrides": {
117
+ "data": {"flip_y": 0.30},
118
+ "budget": 100,
119
+ "max_steps": 8,
120
+ "stop_threshold": 0.62,
121
  },
122
  },
123
  }
 
163
 
164
  store = EpisodeStore()
165
 
166
+ # Completed episodes keyed by episode_id so /grader works after a subsequent reset()
167
+ _completed: Dict[str, Dict[str, Any]] = {}
168
+
169
  # ---------------------------------------------------------------------------
170
  # Request / response schemas
171
  # ---------------------------------------------------------------------------
 
206
  perf = obs.current_performance
207
 
208
  if task_id == "easy":
209
+ # Single dimension: raw performance; range [0.55, 0.75] avoids saturation
210
+ score = float(np.clip((perf - 0.55) / (0.75 - 0.55), 0.0, 1.0))
211
+ passed = perf > 0.62
212
  breakdown: Dict[str, Any] = {"performance_score": round(score, 4)}
213
 
214
  elif task_id == "medium":
215
  avg_noise = float(np.mean(noise_ratios)) if noise_ratios else 1.0
216
  # Performance sub-score
217
  perf_score = float(np.clip((perf - 0.42) / (0.62 - 0.42), 0.0, 1.0))
218
+ # Noise avoidance sub-score: full marks at 0 noise, zero at >=0.50
219
+ noise_score = float(np.clip(1.0 - avg_noise / 0.50, 0.0, 1.0))
220
  score = round(0.6 * perf_score + 0.4 * noise_score, 4)
221
+ passed = perf > 0.52 and avg_noise < 0.50
222
  breakdown = {
223
  "performance_score": round(perf_score, 4),
224
  "noise_score": round(noise_score, 4),
 
228
  else: # hard
229
  budget_total = cfg["budget"]
230
  budget_used = budget_total - obs.remaining_budget
231
+ perf_score = float(np.clip((perf - 0.50) / (0.72 - 0.50), 0.0, 1.0))
232
+ # Efficiency: fraction of budget saved; no grace offset so it
233
+ # actually varies (0.0 = all spent, 1.0 = nothing spent)
234
+ efficiency = float(np.clip(1.0 - budget_used / budget_total, 0.0, 1.0))
235
  score = round(0.65 * perf_score + 0.35 * efficiency, 4)
236
+ passed = perf > 0.58
237
  breakdown = {
238
  "performance_score": round(perf_score, 4),
239
  "efficiency_score": round(efficiency, 4),
 
318
  if "noise_ratio" in info:
319
  store.noise_ratios.append(info["noise_ratio"])
320
 
321
+ if done:
322
+ _completed[store.episode_id] = {
323
+ "final_obs": obs,
324
+ "noise_ratios": list(store.noise_ratios),
325
+ "cfg": store._cfg,
326
+ "task_id": store.task_id,
327
+ }
328
+
329
  return {
330
  "episode_id": store.episode_id,
331
  "step": store.step_count,
332
  "observation": obs.model_dump(),
333
+ "reward": Reward(value=round(float(reward), 6)).model_dump(),
334
  "done": done,
335
  "info": info,
336
  }
 
375
  @app.post("/grader")
376
  def grader(req: GraderRequest):
377
  """
378
+ Score a completed episode.
379
 
380
  Body: { "episode_id": "...", "task_id": "easy|medium|hard" }
381
+ Works even after a subsequent reset(); looks up by episode_id.
382
  """
383
+ if req.task_id not in TASKS:
 
 
 
 
 
384
  raise HTTPException(
385
  status_code=400,
386
+ detail=f"Unknown task_id '{req.task_id}'.",
387
  )
388
+
389
+ record = _completed.get(req.episode_id)
390
+ if record is None:
391
+ # Fall back to the active episode if it matches and is done
392
+ if store.episode_id == req.episode_id and store.done:
393
+ record = {
394
+ "final_obs": store.final_obs,
395
+ "noise_ratios": store.noise_ratios,
396
+ "cfg": store._cfg,
397
+ "task_id": store.task_id,
398
+ }
399
+ else:
400
+ raise HTTPException(
401
+ status_code=404,
402
+ detail="episode_id not found or episode is not finished yet.",
403
+ )
404
+
405
+ if req.task_id != record["task_id"]:
406
  raise HTTPException(
407
  status_code=400,
408
+ detail=f"task_id mismatch: episode was '{record['task_id']}', got '{req.task_id}'.",
409
  )
410
 
411
+ return _grade(req.task_id, record["final_obs"], record["noise_ratios"], record["cfg"])
412
 
413
 
414
  @app.get("/baseline")
 
462
  }
463
 
464
 
465
+ # ---------------------------------------------------------------------------
466
+ # WebSocket endpoint (required by OpenEnv spec); primary client transport on
467
+ # HF Spaces (HTTP /reset and /step are inaccessible after deployment there).
468
+ #
469
+ # Protocol: every message is {"type": str, "data": dict}
470
+ # Client β†’ server types: "reset", "step", "state", "close"
471
+ # Server β†’ client types: mirrors client type on success, "error" on failure
472
+ # ---------------------------------------------------------------------------
473
+
474
+ @app.websocket("/ws")
475
+ async def websocket_endpoint(websocket: WebSocket):
476
+ await websocket.accept()
477
+
478
+ # Per-connection isolated state (no shared store)
479
+ ws_env: DataSelectEnv | None = None
480
+ ws_cfg: dict | None = None
481
+ ws_episode_id: str | None = None
482
+ ws_task_id: str | None = None
483
+ ws_noise_ratios: list = []
484
+ ws_done: bool = False
485
+ ws_final_obs: Observation | None = None
486
+
487
+ async def send_error(message: str, code: str = "error") -> None:
488
+ await websocket.send_json({"type": "error", "data": {"message": message, "code": code}})
489
+
490
+ try:
491
+ while True:
492
+ raw = await websocket.receive_json()
493
+ msg_type = raw.get("type")
494
+ msg_data = raw.get("data", {})
495
+
496
+ # ── reset ─────────────────────────────────────────────────────
497
+ if msg_type == "reset":
498
+ tid = msg_data.get("task_id", "easy")
499
+ seed = int(msg_data.get("seed", 42))
500
+
501
+ if tid not in TASKS:
502
+ await send_error(
503
+ f"Unknown task_id '{tid}'. Valid: {list(TASKS.keys())}",
504
+ "invalid_task",
505
+ )
506
+ continue
507
+
508
+ ws_cfg = _build_cfg(tid)
509
+ ws_env = DataSelectEnv(ws_cfg, seed=seed)
510
+ obs = ws_env.reset()
511
+ ws_task_id = tid
512
+ ws_episode_id = str(uuid.uuid4())
513
+ ws_noise_ratios = []
514
+ ws_done = False
515
+ ws_final_obs = obs
516
+
517
+ await websocket.send_json({
518
+ "type": "reset",
519
+ "data": {
520
+ "episode_id": ws_episode_id,
521
+ "task_id": ws_task_id,
522
+ "observation": obs.model_dump(),
523
+ "reward": 0.0,
524
+ "done": False,
525
+ },
526
+ })
527
+
528
+ # ── step ──────────────────────────────────────────────────────
529
+ elif msg_type == "step":
530
+ if ws_env is None or ws_done:
531
+ await send_error("No active episode. Send a reset message first.", "no_episode")
532
+ continue
533
+
534
+ try:
535
+ action = Action(**msg_data)
536
+ except Exception as exc:
537
+ await send_error(f"Invalid action: {exc}", "invalid_action")
538
+ continue
539
+
540
+ obs, reward, done, info = ws_env.step(action)
541
+ ws_done = done
542
+ ws_final_obs = obs
543
+ if "noise_ratio" in info:
544
+ ws_noise_ratios.append(info["noise_ratio"])
545
+
546
+ await websocket.send_json({
547
+ "type": "step",
548
+ "data": {
549
+ "episode_id": ws_episode_id,
550
+ "observation": obs.model_dump(),
551
+ "reward": round(float(reward), 6),
552
+ "done": done,
553
+ "info": info,
554
+ },
555
+ })
556
+
557
+ # ── state ─────────────────────────────────────────────────────
558
+ elif msg_type == "state":
559
+ if ws_env is None:
560
+ state_data = {
561
+ "step_count": 0, "remaining_budget": None,
562
+ "current_performance": None, "pool_size": None, "done": False,
563
+ }
564
+ else:
565
+ state_data = ws_env.get_state().model_dump()
566
+
567
+ await websocket.send_json({
568
+ "type": "state",
569
+ "data": {"episode_id": ws_episode_id, "task_id": ws_task_id, **state_data},
570
+ })
571
+
572
+ # ── close ─────────────────────────────────────────────────────
573
+ elif msg_type == "close":
574
+ await websocket.send_json({"type": "close", "data": {}})
575
+ break
576
+
577
+ else:
578
+ await send_error(f"Unknown message type '{msg_type}'", "unknown_type")
579
+
580
+ except WebSocketDisconnect:
581
+ pass # client disconnected cleanly
582
+
583
+
584
  # ---------------------------------------------------------------------------
585
  # Entry point
586
  # ---------------------------------------------------------------------------
server/__init__.py ADDED
File without changes
server/app.py ADDED
@@ -0,0 +1,39 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ """
2
+ server/app.py β€” OpenEnv entry point shim.
3
+
4
+ Loads the root-level server.py by absolute file path so that the
5
+ server/ package and the root server.py file can coexist without
6
+ Python's import system preferring the package over the module.
7
+ """
8
+ import importlib.util
9
+ import os
10
+ import sys
11
+
12
+ # Add project root to path so root-level modules (env, models, etc.) resolve
13
+ _root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
14
+ if _root not in sys.path:
15
+ sys.path.insert(0, _root)
16
+
17
+ # Load root server.py by file path, registered under a private module name
18
+ # to avoid collision with the `server` package name.
19
+ _spec = importlib.util.spec_from_file_location(
20
+ "_dataselectenv_server",
21
+ os.path.join(_root, "server.py"),
22
+ )
23
+ _mod = importlib.util.module_from_spec(_spec)
24
+ sys.modules["_dataselectenv_server"] = _mod
25
+ _spec.loader.exec_module(_mod)
26
+
27
+ # Re-export the FastAPI app β€” this is what openenv and uvicorn look for.
28
+ app = _mod.app
29
+
30
+
31
+ def main() -> None:
32
+ """Entry point required by openenv validate and [project.scripts]."""
33
+ import uvicorn
34
+ port = int(os.environ.get("PORT", 7860))
35
+ uvicorn.run(app, host="0.0.0.0", port=port, reload=False)
36
+
37
+
38
+ if __name__ == "__main__":
39
+ main()
uv.lock ADDED
The diff for this file is too large to render. See raw diff