refactor: rewrite blog around product vision; fix UI for Gradio 6
Blog now opens with the concrete product scenario (watch/calendar/sleep
tracker inputs, Accept-or-Ignore as rewards) and frames RhythmEnv as the
training curriculum for the inference skill the product needs.
UI: replace ASCII meter bars with HTML progress bars, add week calendar
grid and live matplotlib meter trajectory chart, merge configure+play
into one tab. Fix Gradio 6 incompatibilities (theme kwarg, show_copy_button).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- docs/blog_post.md +53 -67
- ui/app.py +297 -243
docs/blog_post.md
CHANGED
@@ -1,123 +1,109 @@
# Teaching an AI to Know You (Without Asking)

- ## The
- - It can't observe your actual behavioral responses to recommendations
- - It runs in the cloud, costs per query, and can't be always-on or private
- - Most users can't accurately describe their own patterns anyway
- What we actually need is a small model — something that can run cheaply, frequently, and eventually on-device — that builds up a model of you from *how you respond*, not from what you say about yourself. That's the gap RhythmEnv is designed to train for.
- I work on AI at Microsoft. One thing I kept running into building assistant features was the gap between what users *say* they want and what actually helps them. People are bad at introspecting their own patterns. The introvert who says "I don't mind meetings" because they've
- Preference forms capture what people believe about themselves.
- ## The
- ## The
- Here's the part
- - The extrovert values **connection** above all (75% weight). A week full of meaningful social interactions is a great week, even if they didn't get much work done.
- - The workaholic values **progress** above all (70% weight). Deep productive work is the whole point. Everything else is secondary.
- ##
- RhythmEnv simulates one week in a person's life
- Five
- - **Vitality** — physical energy. Sleep
  - **Cognition** — mental sharpness. Peaks in the morning for some, evening for others.
  - **Progress** — career momentum. Only goes up when you work.
  - **Serenity** — inner calm. Meditation helps. Overwork kills it.
- - **Connection** — relationship health. Decays passively every time slot.
- After every action, meters shift. The agent sees the new meter values and gets a reward. That reward is the hidden weighted sum of what changed — and the weights are different for every person type.
- ## Why Identical Actions Produce Different Results
- The trait modifiers change how actions physically affect the person, not just how rewards are computed.
- Tell the introvert to socialize: their vitality drops 3× faster than normal. Their body physically rejects it. Tell the extrovert the same thing: barely any drain. They could socialize all day.
- Tell the introvert to meditate: they get a bonus +0.10 serenity on top of the base effect. Alone time is their recharge mechanism. Tell the workaholic the same thing: their serenity *drops* by 0.10, because idle activities make them anxious.
- The
- |---------|-------------|--------|--------------|--------|
- | Introvert | MEDITATE | +1.76 | SOCIALIZE | +0.03 |
- | Extrovert | FAMILY_TIME | +2.63 | ME_TIME | −0.42 |
- | Workaholic | DEEP_WORK | +1.57 | ME_TIME | −0.27 |
- The
- The

---

**Links:**
- [Live Environment on HF Spaces](https://huggingface.co/spaces/InosLihka/rhythm_env)
- [Training Notebook (Colab)](../training/RhythmEnv_GRPO_Training.ipynb)
- - [Source Code

*Built for the Meta OpenEnv Hackathon Grand Finale, April 2026.*
# Teaching an AI to Know You (Without Asking)

Imagine this. It's 2pm. You had deep work blocked on your calendar. Your AI assistant sends you a nudge:

> *"I know you planned Deep Work now, but your focus metrics just dropped below 20%. If you push through, you'll likely spend 3 hours on something that would take 1 hour at peak. Take a 20-minute rest first — I'll remind you when your window opens."*

You tap Accept or Ignore. Either way, the agent just learned something about you.

That's the product vision. But there's a problem nobody has solved cleanly: how does the AI know that rest-then-work is the right call *for you specifically*, and not just generically good advice?

## The gap that everyone papers over

Most AI assistants give the same advice to everyone. They know best practices — sleep enough, work in the morning, don't skip exercise. That's useful for nobody who isn't already average.

The people who give you genuinely good advice about your life have learned you over time. A great EA, a close friend, a good coach — none of them sat you down with a questionnaire. They watched how you responded to things. They noticed that you're wrecked after back-to-back meetings even when you say you're fine. That you do your sharpest thinking before anyone else is online. That skipping one workout makes you irritable by Wednesday.

I work on AI at Microsoft. One thing I kept running into building assistant features was the gap between what users *say* they want and what actually helps them. People are bad at introspecting their own patterns. The introvert who says "I don't mind meetings" because they've normalised the drain. The workaholic who checks "I value work-life balance" because they know they should.

Preference forms capture what people believe about themselves. Behaviour reveals what's actually true.

## The real-world input problem

You wouldn't manually type "I am at 40% energy." That's a chore nobody does.

The real input comes from devices you already carry. Your watch sends resting heart rate and HRV — that's Vitality and Serenity. Your calendar sends meeting density and deadline proximity — that's Progress pressure. Your sleep tracker sends last night's data — that's Cognition. Your phone knows whether you've been social or isolated.

The agent never asks how you feel. It reads what your devices already know.

And the reward signal? It comes from you, passively. Every time the agent makes a recommendation and you Accept or Ignore it, that choice is data. Accept means "yes, that was the right read." Ignore means "you got something wrong about me." Over hundreds of those micro-interactions, the agent builds a precise model of who you are — not the person you describe yourself to be.
## The foundational problem: teaching the inference skill to a small model

Here's the hard part. A frontier model like GPT-4 can already do decent personalised planning if you describe yourself in the prompt. But that doesn't work at scale:

- You have to describe yourself every single session
- The model can't observe your actual responses to its recommendations
- It runs in the cloud, costs per query, can't be always-on or private
- Most users can't accurately describe their own patterns anyway

What the real product needs is a small model — one that can run cheaply, close to you, eventually on-device — that builds up a model of you from *how you respond*, not from what you say about yourself.

That's the inference skill we're training. **RhythmEnv is the curriculum.**
## How the training environment works

RhythmEnv simulates one week in a person's life — 7 days, 4 time slots each, 28 decisions. Each decision is an activity: deep work, exercise, sleep, meditation, family time, socialising. Ten options.

Five meters track the person's state:

- **Vitality** — physical energy. Sleep fills it. Work drains it.
- **Cognition** — mental sharpness. Peaks in the morning for some, evening for others.
- **Progress** — career momentum. Only goes up when you work.
- **Serenity** — inner calm. Meditation helps. Overwork kills it.
- **Connection** — relationship health. Decays passively every time slot. Ignore it and it quietly drops.
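The episode mechanics above can be pictured in a few lines. This is a minimal sketch, with illustrative deltas and an assumed decay rate; the actual `server.rhythm_environment` dynamics are richer:

```python
from dataclasses import dataclass, field

METERS = ["vitality", "cognition", "progress", "serenity", "connection"]
MAX_STEPS = 28  # 7 days x 4 slots

@dataclass
class MiniRhythmEnv:
    """Toy version of the episode loop: 28 slots, 5 meters clamped to [0, 1]."""
    meters: dict = field(default_factory=lambda: {m: 0.5 for m in METERS})
    timestep: int = 0

    def step(self, effects: dict) -> dict:
        # Apply the chosen action's per-meter deltas.
        for m, delta in effects.items():
            self.meters[m] += delta
        # Connection decays passively every time slot (rate is illustrative).
        self.meters["connection"] -= 0.02
        # Keep every meter inside [0, 1].
        for m in METERS:
            self.meters[m] = min(1.0, max(0.0, self.meters[m]))
        self.timestep += 1
        return {"meters": dict(self.meters), "done": self.timestep >= MAX_STEPS}

env = MiniRhythmEnv()
obs = env.step({"progress": +0.10, "vitality": -0.05})  # one DEEP_WORK-like slot
print(round(obs["meters"]["connection"], 2))  # prints 0.48 — untouched meter still decayed
```

Note that connection dropped even though the action never touched it; that passive decay is what quietly punishes an agent that ignores the meter all week.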
Hidden underneath is a personality profile. The agent can't see it. It controls both *what the person values* (their hidden reward weights) and *how actions physically affect them* (their hidden trait modifiers).

Three profiles, wildly different hidden mechanics:

The **introvert morning person** values serenity above everything (60% of their score). Socialising drains their vitality three times faster than the base rate. Meditating gives them a bonus +0.10 serenity on top of the base effect. Deep work in the morning gives double progress. The agent discovers: *mornings are sacred, social events are costly, alone time heals.*

The **extrovert night owl** values connection above everything (75%). Socialising barely costs any vitality — they could do it all day. Deep work in the morning gives only 40% of expected output. The same work in the evening gives 1.8× output. The agent discovers: *protect the mornings for rest, do cognitive work at night, keep socialising high.*

The **workaholic stoic** values progress above everything (70%). Productive work actually *recovers* vitality for them — output is energising. Idle activities like leisure or passive rest drain their serenity — the guilt is real. The agent discovers: *keep working, rest only when vitality is critical, never let idle time accumulate.*
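The hidden scoring itself is just a weighted sum of meter changes with per-profile weights. A sketch of that mechanism, where only the headline weights (0.60 / 0.75 / 0.70) come from the text and every other number is an illustrative stand-in for the environment's hidden values:

```python
# Hypothetical per-profile reward weights. The headline weight of each
# profile matches the post; the minor weights are made up for illustration.
WEIGHTS = {
    "introvert_morning":   {"serenity": 0.60, "vitality": 0.15, "cognition": 0.10,
                            "progress": 0.10, "connection": 0.05},
    "extrovert_night_owl": {"connection": 0.75, "vitality": 0.10, "cognition": 0.05,
                            "progress": 0.05, "serenity": 0.05},
    "workaholic_stoic":    {"progress": 0.70, "cognition": 0.15, "vitality": 0.10,
                            "serenity": 0.03, "connection": 0.02},
}

def hidden_reward(profile: str, meter_deltas: dict) -> float:
    """The agent never sees this: same deltas, different score per profile."""
    w = WEIGHTS[profile]
    return sum(w[m] * d for m, d in meter_deltas.items())

# A SOCIALIZE-like step: connection up, serenity and vitality down a bit.
deltas = {"connection": +0.20, "serenity": -0.05, "vitality": -0.05}
for p in WEIGHTS:
    print(p, round(hidden_reward(p, deltas), 3))
```

The same deltas score positive for the extrovert and negative for the other two, which is exactly the asymmetry the agent has to discover.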
## What the agent must figure out

The agent sees meters, time of day, and a reward signal. It doesn't see the profile, the trait values, or the reward weights.

Same action, same starting state — completely different reward depending on who you're helping:

| Profile | DEEP_WORK reward (step 1) |
|---|---|
| Workaholic | +1.57 |
| Introvert | +0.32 |
| Extrovert | −0.39 |

The extrovert gets a *negative* reward from deep work first thing — because it gives zero connection, and connection is 75% of their score.

A good agent should probe in the first few steps, read the unexpected meter changes, infer the hidden profile, and adapt its strategy for the rest of the week. This is the same skill the real product needs: detect who you are from how you respond, not from what you tell it.
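One illustrative way to picture that probe-then-infer structure (not the trained model's actual mechanism): keep a fit score per candidate profile and prefer whichever best explains the rewards observed so far. All predicted values below are toy numbers:

```python
# Toy inference loop: compare observed rewards against each candidate
# profile's predicted reward; the lowest squared error wins.
def infer_profile(observations, predict_reward, profiles):
    """observations: list of (action, observed_reward) pairs."""
    errors = {p: 0.0 for p in profiles}
    for action, reward in observations:
        for p in profiles:
            errors[p] += (predict_reward(p, action) - reward) ** 2
    return min(errors, key=errors.get)

# Hypothetical per-profile reward predictions for two probe actions.
PRED = {
    ("introvert", "MEDITATE"): 1.8,   ("introvert", "DEEP_WORK"): 0.3,
    ("extrovert", "MEDITATE"): -0.4,  ("extrovert", "DEEP_WORK"): -0.4,
    ("workaholic", "MEDITATE"): -0.3, ("workaholic", "DEEP_WORK"): 1.6,
}
predict = lambda p, a: PRED[(p, a)]

# Two probes: deep work scored high, meditation scored slightly negative.
observed = [("DEEP_WORK", 1.57), ("MEDITATE", -0.27)]
print(infer_profile(observed, predict, ["introvert", "extrovert", "workaholic"]))
# prints workaholic
```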
## The training pipeline

We train using GRPO — Group Relative Policy Optimization. For each game state, generate multiple candidate actions, score them all against the real environment, update the model to prefer the higher-scoring ones. The environment is the critic.

The model is Qwen 2.5-3B with 4-bit quantization and LoRA. Small enough to train on a free Colab T4. Small enough to eventually run at the edge.
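The "group relative" part fits in a few lines: each candidate's advantage is its environment score normalised against its own sampled group, so no learned value network is needed. A simplified sketch of the standard GRPO advantage computation, not the training notebook's exact code:

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantage: normalise each candidate's reward by the mean
    and population std of its own group (eps guards zero variance)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Four candidate actions sampled for one state, scored by the environment:
adv = group_relative_advantages([1.57, 0.32, -0.39, 0.80])
best = max(range(len(adv)), key=lambda i: adv[i])
print(best)  # prints 0 — the update pushes probability toward this candidate
```

Advantages in each group sum to (nearly) zero: the policy is pushed toward candidates that beat their siblings, not toward any absolute reward level.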
The heuristic baseline — fixed rules, treats everyone the same — scores around 0.76–0.82. Sleep when vitality is low. Meditate when serenity drops. Socialise when connection falls. Reasonable advice for anyone. Wrong advice for someone specifically.
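Those fixed rules fit in a dozen lines. A sketch of that one-size-fits-all policy, modelled on the rules just described (thresholds are illustrative; the repo's actual `heuristic_action` may differ):

```python
# One-size-fits-all baseline: threshold rules on the meters, identical for
# every hidden profile. Threshold values here are made up for illustration.
def baseline_action(meters: dict) -> str:
    if meters["vitality"] < 0.30:
        return "SLEEP"        # sleep when vitality is low
    if meters["serenity"] < 0.35:
        return "MEDITATE"     # meditate when serenity drops
    if meters["connection"] < 0.40:
        return "SOCIALIZE"    # socialise when connection falls
    return "DEEP_WORK"        # otherwise, make progress

print(baseline_action({"vitality": 0.8, "cognition": 0.7, "progress": 0.4,
                       "serenity": 0.6, "connection": 0.25}))  # prints SOCIALIZE
```

For an introvert that SOCIALIZE recommendation is exactly the expensive mistake the hidden trait modifiers punish.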
A trained agent that discovers the hidden personality should beat the heuristic by doing something qualitatively different: the introvert's week should look nothing like the extrovert's week. That differentiation is the signal that real inference is happening.

## Why simulation first

Everything here is simulated. The person doesn't exist. The meters aren't biometric readings. This is standard practice — robotics RL trains in simulation before deploying on hardware. The simulator is the curriculum, not the product.

The inference skill transfers. An agent that learns to detect "this person's vitality drops 3× faster from social events than expected" from simulated reward signals learns the *structure* of the problem. When the medium changes — when vitality comes from HRV instead of a formula — the skill of reading differential responses still applies.

The Accept/Ignore loop in the real product is the same reward signal, made human. Every time you ignore a recommendation, you're telling the agent: "you read me wrong." Every Accept says: "that was right." Over enough interactions, the model converges on your hidden profile without you ever having to describe it.

No questionnaire. No settings page. Just devices watching, signals flowing, and an agent that gets better at knowing you every week.

---

**Links:**
- [Live Environment on HF Spaces](https://huggingface.co/spaces/InosLihka/rhythm_env)
- [Training Notebook (Colab)](../training/RhythmEnv_GRPO_Training.ipynb)
- [Source Code](https://huggingface.co/spaces/InosLihka/rhythm_env)

*Built for the Meta OpenEnv Hackathon Grand Finale, April 2026.*
ui/app.py
CHANGED
@@ -1,5 +1,5 @@
- RhythmEnv Visual Explorer — Life Simulator
@@ -7,127 +7,191 @@ import sys
- SLOT_NAMES
- def
-     day_name = DAY_NAMES[obs.day] if obs.day < 7 else f"Day {obs.day
-         f"Step {obs.timestep}/{MAX_STEPS} | "
-         f"{obs.remaining_steps} steps left{event_line}"
-     bars = [
-         _meter_bar(obs.vitality, "Vitality"),
-         _meter_bar(obs.cognition, "Cognition"),
-         _meter_bar(obs.progress, "Progress"),
-         _meter_bar(obs.serenity, "Serenity"),
-         _meter_bar(obs.connection, "Connection"),
-     ]
-     return header + "\n\n" + "\n".join(bars)
- def format_reward_breakdown(breakdown: dict) -> str:
-     if not breakdown:
-         return "—"
-     lines = []
-     for k, v in breakdown.items():
-         sign = "+" if v >= 0 else ""
-         lines.append(f"  {k:<25} {sign}{v:.4f}")
-     return "\n".join(lines)
- def show_action_effects() -> str:
-     header = f"{'Action':<15}" + "".join(f" {m[:3]:>6}" for m in METERS)
-     lines = [header, "─" * 52]
-     for action, effects in ACTION_EFFECTS.items():
-         row = f"{action:<15}"
-         for m in METERS:
-             row += f" {effects[m]:>+6.2f}"
-         lines.append(row)
-     return "\n".join(lines)
- def
-     lines.append(f"  • Connection passive decay: −{p['connection_decay_rate']}/step")
-     return "\n".join(lines)
- # Tab 1
def reset_episode(profile_name: str, seed_str: str):
-     global _last_obs, _step_log
@@ -135,126 +199,149 @@ def reset_episode(profile_name: str, seed_str: str):
-     meters_text = format_meters(_last_obs)
-     log = (
-         f"Episode started.\n"
-         f"  Profile : {env._profile['name']}\n"
-         f"  Seed    : {seed}\n\n"
-         "Choose an action and press Take Step, or use an auto-run button."
-     return meters_text, log, "—", "—", False
- # ---------------------------------------------------------------------------
- # Tab 2 — Run Episode
- # ---------------------------------------------------------------------------
def take_action(action_str: str):
-     global _last_obs, _step_log
-         return "⚠️
-     env
-     obs = env.step(RhythmAction(action_type=action_type))
-     sign = "+" if obs.reward >= 0 else ""
-     step_line = (
-         f"Step {obs.timestep:>2} | {action_str:<15} | "
-         f"reward {sign}{obs.reward:.4f} | "
-         f"V:{obs.vitality:.2f} Co:{obs.cognition:.2f} "
-         f"P:{obs.progress:.2f} S:{obs.serenity:.2f} Cn:{obs.connection:.2f}"
-     )
-     if obs.active_event:
-         step_line += f" ⚡{obs.active_event}"
- def
-     global _last_obs, _step_log
-     from training.inference_eval import heuristic_action
-     _last_obs
-     _step_log
-     total_reward = 0.0
-         if strategy == "heuristic"
-             action_type = heuristic_action(obs)
-         else:
-             action_type = rng.choice(all_actions)
-         line = (
-             f"Step {obs.timestep:>2} | {action_type.value.upper():<15} | "
-             f"reward {sign}{obs.reward:.4f} | "
-             f"V:{obs.vitality:.2f} Co:{obs.cognition:.2f} "
-             f"P:{obs.progress:.2f} S:{obs.serenity:.2f} Cn:{obs.connection:.2f}"
-         )
-         if obs.active_event:
-             line += f" ⚡{obs.active_event}"
-         logs.append(line)
-     _step_log = logs
- def
-     return
@@ -265,108 +352,70 @@ with gr.Blocks(title="RhythmEnv — Life Simulator") as demo:
-     "**Can
-     # ── Tab 1:
-     with gr.TabItem("
-         gr.Markdown("### Start a new episode")
-             choices=PROFILE_NAMES,
-         seed_in
-         reset_btn = gr.Button("▶ Reset Episode", variant="primary")
-         gr.Markdown("---")
-             "| Profile | Core
-             "| `introvert_morning` | Recharges alone, peaks at dawn |
-             "Social drain
-             "| `extrovert_night_owl` | Energised by people, peaks at night |
-             "Morning is a penalty zone
-             "| `workaholic_stoic` | Finds meaning in output, resilient |
-             "Idle time drains serenity
-         reset_btn.click(
-             reset_episode,
-             inputs=[profile_dd, seed_in],
-             outputs=[meters_out, log_out, breakdown_t1, score_t1, done_flag],
-         )
-         gr.Markdown("### Manual control — or use the auto-run buttons for a full episode")
-             choices=ACTION_NAMES,
-             label="Action",
-         step_btn = gr.Button("▶
-             value="introvert_morning",
-             label="Profile (for auto-run)",
-         seed_in2 = gr.Textbox(label="Seed (for auto-run)", value="42", scale=1)
-         meters_display = gr.Textbox(label="Meters", lines=9, interactive=False)
-         score_display = gr.Textbox(label="Step Reward / Final Score", interactive=False)
-         log_display = gr.Textbox(label="Step Log", lines=20, interactive=False)
-         breakdown_display = gr.Textbox(label="Last Reward Breakdown", lines=8, interactive=False)
-         step_btn.click(
-             take_action,
-             inputs=[action_dd],
-             outputs=[meters_display, log_display, breakdown_display, score_display, done_flag],
-         )
-         heuristic_btn.click(
-             run_heuristic_episode,
-             inputs=[profile_dd2, seed_in2],
-             outputs=[meters_display, log_display, breakdown_display, score_display],
-         )
-         random_btn.click(
-             run_random_episode,
-             inputs=[profile_dd2, seed_in2],
-             outputs=[meters_display, log_display, breakdown_display, score_display],
-     # ── Tab
-     with gr.TabItem("
-         "Profile modifiers are applied on top — invisibly."
-     gr.Textbox(
-         value=show_action_effects(),
-         lines=14, interactive=False, label="",
@@ -374,11 +423,16 @@ with gr.Blocks(title="RhythmEnv — Life Simulator") as demo:
-     gr.Textbox(
if __name__ == "__main__":
-     demo.launch(server_port=7862, share=False, theme=gr.themes.
|
|
|
|
| 1 |
"""
|
| 2 |
+
RhythmEnv Visual Explorer β Life Simulator v2
|
| 3 |
Run: python ui/app.py
|
| 4 |
"""
|
| 5 |
|
|
|
|
| 7 |
import os
|
| 8 |
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
|
| 9 |
|
| 10 |
+
import matplotlib
|
| 11 |
+
matplotlib.use("Agg")
|
| 12 |
+
import matplotlib.pyplot as plt
|
| 13 |
+
import matplotlib.patches as mpatches
|
| 14 |
+
|
| 15 |
import gradio as gr
|
| 16 |
from server.rhythm_environment import (
|
| 17 |
RhythmEnvironment, MAX_STEPS, METERS, ACTION_EFFECTS, PROFILES
|
| 18 |
)
|
| 19 |
from models import RhythmAction, ActionType
|
| 20 |
|
| 21 |
+
SLOT_NAMES = ["Morning", "Afternoon", "Evening", "Night"]
|
| 22 |
+
SLOT_ICONS = ["π
", "βοΈ", "π", "π"]
|
| 23 |
+
DAY_NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
|
| 24 |
PROFILE_NAMES = ["introvert_morning", "extrovert_night_owl", "workaholic_stoic", "random"]
|
| 25 |
ACTION_NAMES = [at.value.upper() for at in ActionType]
|
| 26 |
|
| 27 |
+
METER_COLORS = {
|
| 28 |
+
"vitality": "#3b82f6",
|
| 29 |
+
"cognition": "#8b5cf6",
|
| 30 |
+
"progress": "#22c55e",
|
| 31 |
+
"serenity": "#14b8a6",
|
| 32 |
+
"connection": "#f97316",
|
| 33 |
+
}
|
| 34 |
+
|
| 35 |
# ---------------------------------------------------------------------------
|
| 36 |
+
# Global session state
|
| 37 |
# ---------------------------------------------------------------------------
|
| 38 |
|
| 39 |
+
_env = None
|
| 40 |
+
_last_obs = None
|
| 41 |
+
_step_log = []
|
| 42 |
+
_meter_history = [] # list of {meter: value} per step
|
| 43 |
+
_completed_slots = [] # (day, slot) pairs already acted on
|
| 44 |
+
|
| 45 |
+
def get_env():
|
| 46 |
+
global _env
|
| 47 |
+
if _env is None:
|
| 48 |
+
_env = RhythmEnvironment()
|
| 49 |
+
return _env
|
| 50 |
+
|
| 51 |
+
# ---------------------------------------------------------------------------
|
| 52 |
+
# HTML β colored meter bars
|
| 53 |
+
# ---------------------------------------------------------------------------
|
| 54 |
|
| 55 |
+
def _bar_color(v: float) -> str:
|
| 56 |
+
if v < 0.20:
|
| 57 |
+
return "#ef4444"
|
| 58 |
+
if v < 0.40:
|
| 59 |
+
return "#f59e0b"
|
| 60 |
+
return "#22c55e"
|
| 61 |
|
| 62 |
+
def format_meters_html(obs) -> str:
|
| 63 |
+
day_name = DAY_NAMES[obs.day] if obs.day < 7 else f"Day {obs.day+1}"
|
| 64 |
slot_name = SLOT_NAMES[obs.slot] if obs.slot < 4 else f"Slot {obs.slot}"
|
| 65 |
+
event_bit = (
|
| 66 |
+
f'<span style="color:#f59e0b;margin-left:8px">β‘ {obs.active_event}</span>'
|
| 67 |
+
if obs.active_event else ""
|
|
|
|
|
|
|
| 68 |
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 69 |
|
| 70 |
+
html = f"""
|
| 71 |
+
<div style="background:#f9fafb;border-radius:10px;padding:14px 16px;font-family:monospace">
|
| 72 |
+
<div style="font-size:13px;color:#6b7280;margin-bottom:10px">
|
| 73 |
+
π
<b>{day_name} {slot_name}</b>
|
| 74 |
+
Β· Step {obs.timestep}/{MAX_STEPS}
|
| 75 |
+
Β· {obs.remaining_steps} steps left
|
| 76 |
+
{event_bit}
|
| 77 |
+
</div>
|
| 78 |
+
"""
|
| 79 |
+
|
| 80 |
+
for meter in METERS:
|
| 81 |
+
val = getattr(obs, meter)
|
| 82 |
+
pct = int(val * 100)
|
| 83 |
+
color = _bar_color(val)
|
| 84 |
+
dot = METER_COLORS[meter]
|
| 85 |
+
html += f"""
|
| 86 |
+
<div style="display:flex;align-items:center;gap:8px;margin:5px 0">
|
| 87 |
+
<span style="width:10px;height:10px;border-radius:50%;background:{dot};display:inline-block;flex-shrink:0"></span>
|
| 88 |
+
<span style="width:80px;font-size:12px;color:#374151">{meter.capitalize()}</span>
|
| 89 |
+
<div style="flex:1;background:#e5e7eb;border-radius:6px;height:16px;overflow:hidden;max-width:260px">
|
| 90 |
+
<div style="width:{pct}%;background:{color};height:16px;border-radius:6px;transition:width 0.25s"></div>
|
| 91 |
+
</div>
|
| 92 |
+
<span style="width:36px;font-size:12px;color:#374151;text-align:right">{val:.2f}</span>
|
| 93 |
+
</div>"""
|
| 94 |
+
|
| 95 |
+
html += "\n </div>"
|
| 96 |
+
return html
|
| 97 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# ---------------------------------------------------------------------------
# HTML – week calendar grid
# ---------------------------------------------------------------------------

def format_week_grid(obs) -> str:
    html = """
    <div style="background:#f9fafb;border-radius:10px;padding:12px 16px;font-family:monospace;margin-top:8px">
      <div style="font-size:12px;color:#6b7280;margin-bottom:8px">Week Progress</div>
      <table style="border-collapse:separate;border-spacing:3px;width:100%">
        <tr>
          <td style="width:24px"></td>"""

    for day in DAY_NAMES:
        html += f'<td style="text-align:center;font-size:11px;color:#9ca3af;padding:1px 3px">{day}</td>'
    html += "</tr>"

    current_step = obs.timestep  # 0-based: next step to take
    # timestep goes 0–27; obs.timestep is the step about to be taken
    # slots completed = those < current_step
    for slot_idx, icon in enumerate(SLOT_ICONS):
        html += f'<tr><td style="font-size:12px;text-align:center">{icon}</td>'
        for day_idx in range(7):
            step_num = day_idx * 4 + slot_idx
            if step_num < current_step:
                cell = "✅"
                bg = "#d1fae5"
            elif step_num == current_step and not obs.done:
                cell = "🔵"
                bg = "#dbeafe"
            else:
                cell = "·"
                bg = "transparent"
            html += f'<td style="text-align:center;background:{bg};border-radius:3px;padding:1px 3px;font-size:13px">{cell}</td>'
        html += "</tr>"

    html += "</table></div>"
    return html
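The grid addresses the 28 weekly slots with `step_num = day_idx * 4 + slot_idx`; the inverse is a single `divmod`. A small sketch of that mapping (the helper names are ours, not part of the app):

```python
# The week grid lays out 7 days x 4 slots = 28 steps.

def step_to_day_slot(step: int) -> tuple[int, int]:
    """Inverse of the grid's step numbering: step -> (day_idx, slot_idx)."""
    return divmod(step, 4)

def day_slot_to_step(day_idx: int, slot_idx: int) -> int:
    """Forward mapping used when drawing each cell."""
    return day_idx * 4 + slot_idx
```

This is why the grid loops slot-major but compares against a single scalar `current_step`: the two coordinate systems round-trip exactly.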
# ---------------------------------------------------------------------------
# Matplotlib – meter trajectory chart
# ---------------------------------------------------------------------------

def make_chart(history: list) -> plt.Figure:
    fig, ax = plt.subplots(figsize=(7, 3.5))
    fig.patch.set_facecolor("#f9fafb")
    ax.set_facecolor("#f9fafb")

    if history:
        steps = list(range(len(history)))
        for meter, color in METER_COLORS.items():
            vals = [h[meter] for h in history]
            ax.plot(steps, vals, color=color, linewidth=2.0, label=meter.capitalize(), solid_capstyle="round")
    ax.axhline(y=0.20, color="#ef4444", linestyle="--", linewidth=0.8, alpha=0.4)

    patches = [mpatches.Patch(color=c, label=m.capitalize()) for m, c in METER_COLORS.items()]
    ax.legend(handles=patches, loc="upper right", fontsize=8, ncol=2,
              framealpha=0.7, edgecolor="#e5e7eb")
    ax.set_xlim(0, MAX_STEPS)
    ax.set_ylim(-0.02, 1.08)
    ax.set_xlabel("Step (1 step = 1 time slot)", fontsize=9, color="#6b7280")
    ax.set_ylabel("Meter value", fontsize=9, color="#6b7280")
    ax.set_title("Life Meters Over the Week", fontsize=11, color="#374151", pad=8)
    ax.tick_params(labelsize=8, colors="#9ca3af")
    for spine in ax.spines.values():
        spine.set_edgecolor("#e5e7eb")
    ax.grid(True, alpha=0.3, color="#d1d5db")
    plt.tight_layout(pad=1.2)
    return fig
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

def _snap(obs):
    return {m: getattr(obs, m) for m in METERS}


def _step_line(obs, action_name: str) -> str:
    sign = "+" if obs.reward >= 0 else ""
    day = DAY_NAMES[obs.day] if obs.day < 7 else f"D{obs.day}"
    slot = SLOT_NAMES[obs.slot] if obs.slot < 4 else f"S{obs.slot}"
    line = f"Step {obs.timestep:>2} [{day} {slot}] {action_name:<15} {sign}{obs.reward:.3f}"
    if obs.active_event:
        line += f" ⚡{obs.active_event}"
    return line
# ---------------------------------------------------------------------------
# Tab 1 callbacks
# ---------------------------------------------------------------------------

OUTPUTS_COUNT = 5  # meters_html, week_grid, chart, log, score


def reset_episode(profile_name: str, seed_str: str):
    global _last_obs, _step_log, _meter_history

    try:
        seed = int(seed_str.strip()) if seed_str.strip() else 42
    except ValueError:
        seed = 42

    env = get_env()
    _last_obs = env.reset(seed=seed) if profile_name == "random" else env.reset(seed=seed, profile=profile_name)

    _step_log = [f"▶ Profile: {env._profile['name']} | Seed: {seed} | 28 steps to go"]
    _meter_history = [_snap(_last_obs)]

    return (
        format_meters_html(_last_obs),
        format_week_grid(_last_obs),
        make_chart(_meter_history),
        "\n".join(_step_log),
        "–",
    )
def take_action(action_str: str):
    global _last_obs, _step_log, _meter_history

    if _last_obs is None:
        return "⚠️ Reset the episode first.", "", make_chart([]), "–", "–"
    if _last_obs.done:
        return (
            format_meters_html(_last_obs),
            format_week_grid(_last_obs),
            make_chart(_meter_history),
            "\n".join(_step_log[-22:]),
            "Episode done – press Reset to play again.",
        )

    env = get_env()
    obs = env.step(RhythmAction(action_type=ActionType(action_str.lower())))
    _last_obs = obs
    _meter_history.append(_snap(obs))
    _step_log.append(_step_line(obs, action_str))

    if obs.done:
        final = obs.reward_breakdown.get("final_score", 0.0)
        _step_log.append("─" * 52)
        _step_log.append(f"✅ Final score: {final:.4f}")

    score = (
        f"Final: {obs.reward_breakdown.get('final_score', 0.0):.4f}"
        if obs.done else f"Step reward: {obs.reward:+.4f}"
    )
    return (
        format_meters_html(obs),
        format_week_grid(obs),
        make_chart(_meter_history),
        "\n".join(_step_log[-22:]),
        score,
    )
def _run_auto(profile_name: str, seed_str: str, strategy: str):
    global _last_obs, _step_log, _meter_history
    import random as _random
    from training.inference_eval import heuristic_action

    try:
        seed = int(seed_str.strip()) if seed_str.strip() else 42
    except ValueError:
        seed = 42

    rng = _random.Random(seed + 999)
    all_actions = list(ActionType)
    env = get_env()

    obs = env.reset(seed=seed) if profile_name == "random" else env.reset(seed=seed, profile=profile_name)
    _last_obs = obs
    _step_log = [f"▶ Auto-run ({strategy}) | Profile: {env._profile['name']} | Seed: {seed}"]
    _meter_history = [_snap(obs)]

    while not obs.done:
        action_type = heuristic_action(obs) if strategy == "heuristic" else rng.choice(all_actions)
        obs = env.step(RhythmAction(action_type=action_type))
        _last_obs = obs
        _meter_history.append(_snap(obs))
        _step_log.append(_step_line(obs, action_type.value.upper()))

    final = obs.reward_breakdown.get("final_score", 0.0)
    _step_log += ["─" * 52, f"✅ Final score: {final:.4f}"]

    return (
        format_meters_html(obs),
        format_week_grid(obs),
        make_chart(_meter_history),
        "\n".join(_step_log[-25:]),
        f"Final: {final:.4f}",
    )


def run_heuristic(p, s): return _run_auto(p, s, "heuristic")
def run_random(p, s): return _run_auto(p, s, "random")
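`_run_auto` is the standard rollout loop: pick an action from the current observation, step the environment, record the result. A self-contained sketch of the same loop against a toy stand-in (`ToyEnv` and `toy_heuristic` are hypothetical simplifications; the real `heuristic_action` lives in `training.inference_eval` and is not shown in this diff):

```python
from dataclasses import dataclass

@dataclass
class ToyObs:
    timestep: int = 0
    vitality: float = 1.0
    done: bool = False

class ToyEnv:
    """Minimal stand-in for RhythmEnv: vitality drains each step, week = 28 slots."""
    def reset(self) -> ToyObs:
        self.obs = ToyObs()
        return self.obs

    def step(self, action: str) -> ToyObs:
        o = self.obs
        drain = 0.02 if action == "REST" else 0.08  # resting drains less
        o.vitality = max(0.0, o.vitality - drain)
        o.timestep += 1
        o.done = o.timestep >= 28
        return o

def toy_heuristic(obs: ToyObs) -> str:
    # Rest when vitality is low, otherwise do deep work.
    return "REST" if obs.vitality < 0.3 else "DEEP_WORK"

env = ToyEnv()
obs = env.reset()
while not obs.done:
    obs = env.step(toy_heuristic(obs))
```

The same loop shape works for the random baseline: swap `toy_heuristic(obs)` for a seeded `random.choice` over the action set, as `_run_auto` does.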
# ---------------------------------------------------------------------------
# Reference tab helpers
# ---------------------------------------------------------------------------

def show_action_effects() -> str:
    header = f"{'Action':<15}" + "".join(f" {m[:3]:>6}" for m in METERS)
    lines = [header, "─" * 52]
    for action, effects in ACTION_EFFECTS.items():
        row = f"{action:<15}"
        for m in METERS:
            row += f" {effects[m]:>+6.2f}"
        lines.append(row)
    return "\n".join(lines)


def show_profiles() -> str:
    lines = []
    for p in PROFILES:
        weights = p["reward_weights"]
        lines += [f"\n{'─' * 52}", f" {p['name'].upper()}", f"{'─' * 52}"]
        lines.append("  Reward weights (hidden from agent):")
        for m, w in weights.items():
            bar = "█" * int(w * 20)
            lines.append(f"    {m:<12} {bar:<20} {w:.0%}")
        lines.append("\n  Key hidden modifiers:")
        if p.get("morning_cognition_bonus"):
            lines.append(f"    • Morning: cognition/progress ×{p['morning_cognition_bonus']} (peak window)")
        if p.get("evening_night_cognition_bonus"):
            lines.append(f"    • Evening/Night: cognition/progress ×{p['evening_night_cognition_bonus']} (peak zone)")
        if p.get("morning_penalty"):
            lines.append(f"    • Morning: cognition/progress ×{p['morning_penalty']} (groggy zone)")
        sv = p.get("social_vitality_multiplier", 1.0)
        if sv != 1.0:
            lines.append(f"    • Social vitality drain ×{sv}")
        if p.get("binge_shame"):
            lines.append("    • Binge watch: shame spiral –0.15 serenity")
        if p.get("progress_serenity_bonus"):
            lines.append(f"    • Work gives serenity +{p['progress_serenity_bonus']} (meaning)")
        if p.get("idle_serenity_decay"):
            lines.append(f"    • Idle drains serenity –{p['idle_serenity_decay']} (guilt)")
        if p.get("work_vitality_recovery"):
            lines.append(f"    • Work recovers vitality +{p['work_vitality_recovery']} (energized)")
        if p.get("solo_serenity_bonus"):
            lines.append(f"    • Solo time gives serenity +{p['solo_serenity_bonus']} (recharge)")
        scm = p.get("social_connection_multiplier", 1.0)
        if scm != 1.0:
            lines.append(f"    • Social connection ×{scm}")
        lines.append(f"    • Connection passive decay: –{p['connection_decay_rate']}/step")
    return "\n".join(lines)
# ---------------------------------------------------------------------------

    gr.Markdown(
        "# RhythmEnv – Life Simulator\n"
        "**Can a lightweight AI learn who you are – without being told?**\n\n"
        "Balance 5 life meters across a 7-day week. "
        "A hidden personality profile secretly changes how every action affects you. "
        "The agent must infer who you are from reward signals alone."
    )

    with gr.Tabs():

        # ── Tab 1: Play ──────────────────────────────────────────────────────
        with gr.TabItem("▶ Play"):

            with gr.Row():
                profile_dd = gr.Dropdown(
                    choices=PROFILE_NAMES, value="introvert_morning",
                    label="Hidden Profile (visible here for demo – agent cannot see this)",
                    scale=3,
                )
                seed_in = gr.Textbox(label="Seed", value="42", scale=1)
                reset_btn = gr.Button("↻ Reset", variant="primary", scale=1)

            gr.Markdown(
                "| Profile | Core trait | What the agent must discover |\n"
                "|---|---|---|\n"
                "| `introvert_morning` | Recharges alone, peaks at dawn |"
                " Social drain ×3 · Morning deep work gives ×2 progress |\n"
                "| `extrovert_night_owl` | Energised by people, peaks at night |"
                " Morning is a penalty zone · Social gives ×2 connection |\n"
                "| `workaholic_stoic` | Finds meaning in output, resilient |"
                " Idle time drains serenity · Work recovers vitality |"
            )

            with gr.Row():
                with gr.Column(scale=2):
                    meters_html = gr.HTML()
                    week_grid_html = gr.HTML()
                    score_display = gr.Textbox(label="Score", interactive=False, lines=1)

                with gr.Column(scale=3):
                    chart_display = gr.Plot(label="Meter Trajectories")

            with gr.Row():
                action_dd = gr.Dropdown(
                    choices=ACTION_NAMES, value="DEEP_WORK",
                    label="Choose action", scale=4,
                )
                step_btn = gr.Button("▶ Take Step", variant="primary", scale=1)

            with gr.Row():
                heuristic_btn = gr.Button("▶▶ Full Episode – Heuristic Baseline")
                random_btn = gr.Button("▶▶ Full Episode – Random Baseline")

            log_display = gr.Textbox(
                label="Step Log (last 22 steps)",
                lines=10, interactive=False,
            )

        # ── Tab 2: Environment Reference ─────────────────────────────────────
        with gr.TabItem("📖 Environment Reference"):
            gr.Markdown("### Action Effect Matrix")
            gr.Markdown(
                "Base delta per action on each meter. "
                "Profile modifiers and time-of-day multipliers are applied on top – invisibly."
            )
            gr.Textbox(value=show_action_effects(), lines=14, interactive=False, label="")

            gr.Markdown("### Hidden Personality Profiles")
            gr.Markdown(
                "It must infer the active profile through reward patterns – "
                "the core learning challenge of RhythmEnv."
            )
            gr.Textbox(value=show_profiles(), lines=55, interactive=False, label="")

    # ── Wire up ──────────────────────────────────────────────────────────────
    _out = [meters_html, week_grid_html, chart_display, log_display, score_display]

    reset_btn.click(reset_episode, inputs=[profile_dd, seed_in], outputs=_out)
    step_btn.click(take_action, inputs=[action_dd], outputs=_out)
    heuristic_btn.click(run_heuristic, inputs=[profile_dd, seed_in], outputs=_out)
    random_btn.click(run_random, inputs=[profile_dd, seed_in], outputs=_out)


if __name__ == "__main__":
    demo.launch(server_port=7862, share=False, theme=gr.themes.Soft())