InosLihka Claude Opus 4.7 (1M context) committed on
Commit 105973d · Parent: bb2a9c7

client: surface ALL observation fields (was dropping deltas, anomalies, last_action, step_history)


Before this fix, an external user calling the deployed HF Space via this client got
a strictly worse view than the server returned: per-meter deltas, anomalies,
last_action, and the full step_history were silently dropped during JSON
parsing. This violated the 'client/server separation done right' criterion
in the hackathon rubric (training/WhatMakesAGoodSubmission.md): the client
was pretending those fields didn't exist.

Fix: _parse_result now reconstructs the full RhythmObservation, including
all five *_delta fields, last_action, and step_history with full StepRecord
fidelity (including the new vitality/cognition/progress/serenity/connection
anomaly fields added in the iteration-4 commit).

Also adds the hackathon submission criteria doc to the repo for reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
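
For context, a minimal sketch of what an external agent can now read from a step result after this fix. It assumes the RhythmEnv client and models in this repo; the Space URL, the `base_url` constructor argument, and the `ActionType` enum name are illustrative assumptions, not confirmed by this commit:

```python
# Sketch only: the URL, base_url kwarg, and ActionType name are assumptions.
from client import RhythmEnv
from models import RhythmAction, ActionType  # ActionType is a hypothetical name

env = RhythmEnv(base_url="https://your-space.hf.space")  # hypothetical Space URL
result = env.step(RhythmAction(action_type=ActionType.REST))  # hypothetical action

obs = result.observation
print(obs.vitality_delta, obs.serenity_delta)  # per-meter deltas, now surfaced
print(obs.last_action)                         # the action the server just applied
for rec in obs.step_history:                   # full StepRecord fidelity
    print(rec.step, rec.action, rec.reward, rec.vitality_anomaly)
```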

Files changed (2)
  1. client.py +39 -3
  2. training/WhatMakesAGoodSubmission.md +57 -0
client.py CHANGED
@@ -18,9 +18,9 @@ from openenv.core.client_types import StepResult
 from openenv.core.env_client import EnvClient
 
 try:
-    from .models import RhythmAction, RhythmObservation, RhythmState
+    from .models import RhythmAction, RhythmObservation, RhythmState, StepRecord
 except ImportError:
-    from models import RhythmAction, RhythmObservation, RhythmState
+    from models import RhythmAction, RhythmObservation, RhythmState, StepRecord
 
 
 class RhythmEnv(EnvClient[RhythmAction, RhythmObservation, RhythmState]):
@@ -38,9 +38,36 @@ class RhythmEnv(EnvClient[RhythmAction, RhythmObservation, RhythmState]):
         return {"action_type": action.action_type.value}
 
     def _parse_result(self, payload: Dict[str, Any]) -> StepResult[RhythmObservation]:
-        """Parse server response into StepResult[RhythmObservation]."""
+        """Parse server response into StepResult[RhythmObservation].
+
+        Surfaces ALL observation fields the server returns, including the
+        per-meter deltas, anomalies (in step_history), last_action, and the
+        full step history. Without these, an external agent connecting to the
+        server can't see the meta-RL signals it needs to infer the profile.
+        """
         obs_data = payload.get("observation", {})
 
+        # Reconstruct step_history with full StepRecord fidelity
+        step_history_raw = obs_data.get("step_history", []) or []
+        step_history = [
+            StepRecord(
+                step=h.get("step", 0),
+                action=h.get("action", ""),
+                reward=h.get("reward", 0.0),
+                vitality_delta=h.get("vitality_delta", 0.0),
+                cognition_delta=h.get("cognition_delta", 0.0),
+                progress_delta=h.get("progress_delta", 0.0),
+                serenity_delta=h.get("serenity_delta", 0.0),
+                connection_delta=h.get("connection_delta", 0.0),
+                vitality_anomaly=h.get("vitality_anomaly", 0.0),
+                cognition_anomaly=h.get("cognition_anomaly", 0.0),
+                progress_anomaly=h.get("progress_anomaly", 0.0),
+                serenity_anomaly=h.get("serenity_anomaly", 0.0),
+                connection_anomaly=h.get("connection_anomaly", 0.0),
+            )
+            for h in step_history_raw
+        ]
+
         observation = RhythmObservation(
             timestep=obs_data.get("timestep", 0),
             day=obs_data.get("day", 0),
@@ -56,6 +83,15 @@ class RhythmEnv(EnvClient[RhythmAction, RhythmObservation, RhythmState]):
             done=payload.get("done", False),
             reward=payload.get("reward", 0.0),
             metadata=obs_data.get("metadata", {}),
+            # Per-meter deltas from THIS step (was being silently dropped)
+            vitality_delta=obs_data.get("vitality_delta", 0.0),
+            cognition_delta=obs_data.get("cognition_delta", 0.0),
+            progress_delta=obs_data.get("progress_delta", 0.0),
+            serenity_delta=obs_data.get("serenity_delta", 0.0),
+            connection_delta=obs_data.get("connection_delta", 0.0),
+            last_action=obs_data.get("last_action"),
+            # Rolling history with anomalies (the meta-RL signal)
+            step_history=step_history,
         )
 
         return StepResult(
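
The defensive `.get(..., default)` calls and the `or []` guard keep the client tolerant of older servers that omit these fields or send `step_history: null`. For reference, a hypothetical server payload (all values invented, non-meter fields trimmed) of the shape `_parse_result` now consumes in full:

```python
# Illustrative payload only; values are made up, and the other meters follow
# the same *_delta / *_anomaly pattern as vitality below.
payload = {
    "observation": {
        "timestep": 7,
        "day": 1,
        "vitality_delta": -2.0,      # previously dropped during parsing
        "last_action": "deep_work",  # hypothetical action name
        "step_history": [
            {
                "step": 6,
                "action": "deep_work",
                "reward": 0.4,
                "vitality_delta": -2.0,
                "vitality_anomaly": 1.5,  # the meta-RL signal
            },
        ],
        "metadata": {},
    },
    "reward": 0.4,
    "done": False,
}
```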
training/WhatMakesAGoodSubmission.md ADDED
@@ -0,0 +1,57 @@
+# What makes a submission stand out:
+Pick an ambitious, original problem
+The themes (problems) are deliberately open. Use them as launching pads, not boxes. Judges have seen a lot of chess, snake, tic-tac-toe, and grid-world clones. To score well on innovation,
+you need a genuinely fresh angle. Some questions to ask yourself:
+Does this environment exist to teach an LLM something it currently can’t do well?
+Is the domain underexplored in RL/LLM training?
+Could a researcher write a paper about training on this?
+
+Design a reward signal that actually teaches
+A great environment has a reward function that:
+Provides a rich, informative signal (not just 0/1 at the end)
+Captures something hard to measure in a clever way
+Uses OpenEnv’s Rubric system thoughtfully (composable rubrics > monolithic scoring)
+Is hard to game; an agent that exploits the reward without solving the task should not get high scores
+
+Show real training, end to end
+The bar isn’t “training script exists.” The bar is “training script runs against the environment, the
+agent learns, and you can show it.” Concretely:
+Your training loop should connect to your environment (not a static dataset)
+Train long enough that the curves mean something
+Compare a trained agent vs. a random/untrained baseline, quantitatively and/or qualitatively
+Include the plots and numbers in your README and writeup
+
+Make your plots readable
+Reviewers spend seconds, not minutes, on each plot. Help them out:
+Label both axes (e.g. “training step” / “episode” on x, “reward” / “loss” on y) and include units where they apply
+Save plots as .png or .jpg and commit them to the repo; don’t leave them only in a Colab cell or a deleted WandB run (if you ran via WandB, please include the link to that specific run of your plots)
+Embed the key plots in your README with a one-line caption explaining what each one shows. If you have multiple runs (baseline vs. trained, ablations, etc.), put them on the same axes so the comparison is obvious
+
+
+Tell a story, not an API doc
+Your README, blog, and pitch should answer:
+Problem) what capability gap or interesting domain are you targeting?
+Environment) what does the agent see, do, and get rewarded for?
+Results) what changed after training? Show it.
+Why does it matter) who would care, and why?
+
+A reviewer should be able to read your README in 3-5 minutes and want to try your
+environment.
+
+NOTE: If you have a video, HF post, or anything else interesting, please make sure that it’s linked
+from your README.
+
+
+Engineer it cleanly (table stakes)
+Engineering quality matters less than ambition, but sloppy work hurts. Make sure you:
+Use OpenEnv’s Environment / MCPEnvironment base classes properly
+Respect the client / server separation (clients should never import server internals)
+Follow the standard Gym-style API (reset, step, state)
+Have a valid openenv.yaml manifest
+Don’t use reserved tool names (reset, step, state, close) for MCP tools
+
+Final Note
+Judges are looking for environments that push the frontier of what we can train LLMs to do. Be
+ambitious. Pick a problem you find genuinely interesting; that almost always produces better
+work than chasing what you think judges want. Good luck.
+
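
Given the rubric’s client/server separation criterion, one cheap way to guard against this class of bug in the future is a fidelity check in the client’s tests. A sketch, assuming RhythmObservation is a dataclass (OpenEnv-style models typically are; this commit does not show its definition):

```python
import dataclasses

def assert_no_dropped_fields(payload: dict, observation) -> None:
    """Fail if the server sent an observation key the client doesn't surface."""
    surfaced = {f.name for f in dataclasses.fields(observation)}
    dropped = set(payload.get("observation", {})) - surfaced
    assert not dropped, f"client drops server fields: {sorted(dropped)}"
```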