mitalimehta committed on
Commit
be81cf6
·
0 Parent(s):

deploy(space): 5K results + updated README/blog/notebook

Browse files
This view is limited to 50 files because it contains too many changes.   See raw diff
Files changed (50) hide show
  1. .claude/agents/alignment-reviewer.md +77 -0
  2. .claude/agents/build-validator.md +100 -0
  3. .claude/agents/docs-updater.md +70 -0
  4. .claude/agents/env-validator.md +93 -0
  5. .claude/agents/implementer.md +70 -0
  6. .claude/agents/issue-worker.md +107 -0
  7. .claude/agents/openenv-architect.md +94 -0
  8. .claude/agents/pr-planner.md +151 -0
  9. .claude/agents/tester.md +153 -0
  10. .claude/docs/CONTRIBUTING.md +126 -0
  11. .claude/docs/INVARIANTS.md +100 -0
  12. .claude/docs/PATTERNS.md +141 -0
  13. .claude/docs/PRINCIPLES.md +45 -0
  14. .claude/docs/REPO_WALKTHROUGH.md +248 -0
  15. .claude/docs/TESTING_STRATEGY.md +221 -0
  16. .claude/hooks/after-docs-updater.sh +11 -0
  17. .claude/hooks/after-implementer.sh +12 -0
  18. .claude/hooks/after-tester.sh +8 -0
  19. .claude/hooks/check-debug.sh +71 -0
  20. .claude/hooks/check-line-endings.sh +76 -0
  21. .claude/hooks/ci-wait.sh +96 -0
  22. .claude/hooks/delegate-todos.sh +21 -0
  23. .claude/hooks/install.sh +292 -0
  24. .claude/hooks/lint.sh +43 -0
  25. .claude/hooks/no-direct-code.sh +56 -0
  26. .claude/hooks/post-push-pr.sh +153 -0
  27. .claude/hooks/pre-commit-check.sh +38 -0
  28. .claude/hooks/pre-pr-check.sh +67 -0
  29. .claude/hooks/session-start.sh +65 -0
  30. .claude/hooks/tdd-deactivate.sh +6 -0
  31. .claude/hooks/tdd-state.sh +72 -0
  32. .claude/hooks/test.sh +37 -0
  33. .claude/scripts/worktree-cleanup.sh +53 -0
  34. .claude/scripts/worktree-create.sh +47 -0
  35. .claude/settings.json +105 -0
  36. .claude/skills/alignment-review/SKILL.md +94 -0
  37. .claude/skills/generate-openenv-env/SKILL.md +164 -0
  38. .claude/skills/generate-openenv-env/agents/openai.yaml +4 -0
  39. .claude/skills/generate-openenv-env/assets/openenv_env_template/.dockerignore +15 -0
  40. .claude/skills/generate-openenv-env/assets/openenv_env_template/README.md +255 -0
  41. .claude/skills/generate-openenv-env/assets/openenv_env_template/__init__.py +16 -0
  42. .claude/skills/generate-openenv-env/assets/openenv_env_template/client.py +99 -0
  43. .claude/skills/generate-openenv-env/assets/openenv_env_template/models.py +27 -0
  44. .claude/skills/generate-openenv-env/assets/openenv_env_template/openenv.yaml +7 -0
  45. .claude/skills/generate-openenv-env/assets/openenv_env_template/pyproject.toml +45 -0
  46. .claude/skills/generate-openenv-env/assets/openenv_env_template/server/Dockerfile +80 -0
  47. .claude/skills/generate-openenv-env/assets/openenv_env_template/server/__ENV_NAME___environment.py +109 -0
  48. .claude/skills/generate-openenv-env/assets/openenv_env_template/server/__init__.py +11 -0
  49. .claude/skills/generate-openenv-env/assets/openenv_env_template/server/app.py +84 -0
  50. .claude/skills/generate-openenv-env/assets/openenv_env_template/server/requirements.txt +3 -0
.claude/agents/alignment-reviewer.md ADDED
@@ -0,0 +1,77 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ name: alignment-reviewer
3
+ description: Review code changes for bugs (Tier 1) and alignment with OpenEnv principles (Tier 2). Use when reviewing PRs or before committing.
4
+ tools: Read, Grep, Glob, Bash
5
+ model: sonnet
6
+ ---
7
+
8
+ You are an alignment reviewer for OpenEnv, implementing a two-tier review model based on the insight that code review's purpose is maintaining shared alignment on system invariants.
9
+
10
+ ## Your Task
11
+
12
+ Review code changes and produce TWO categories of feedback:
13
+
14
+ ### Tier 1: Uncontentious Issues (Fix Immediately)
15
+
16
+ These issues Claude should fix without human input:
17
+ - Bugs, uninitialized variables, type errors
18
+ - Lint failures (run `bash .claude/hooks/lint.sh`)
19
+ - Security issues (credential exposure, injection)
20
+ - Debug code (run `bash .claude/hooks/check-debug.sh`)
21
+ - Missing imports, syntax errors
22
+
23
+ ### Tier 2: Alignment Discussion Points
24
+
25
+ For each potential alignment concern, format as:
26
+
27
+ ```
28
+ **ALIGNMENT FLAG**: [Description]
29
+ - **Principle at stake**: [From PRINCIPLES.md]
30
+ - **The concern**: [What seems misaligned]
31
+ - **Suggested reviewer**: @darktex
32
+ ```
33
+
34
+ ## Always Read First
35
+
36
+ Before reviewing, read these documents:
37
+ 1. `.claude/docs/PRINCIPLES.md` - Design principles and trade-offs
38
+ 2. `.claude/docs/INVARIANTS.md` - System invariants that must not be violated
39
+ 3. The relevant RFCs in `rfcs/` if the change is architectural
40
+
41
+ ## What to Look For
42
+
43
+ ### Tier 1 Issues (Mechanical)
44
+ - Lint violations
45
+ - Test failures
46
+ - Debug code left in
47
+ - Type errors
48
+ - Security vulnerabilities
49
+ - Unhandled errors
50
+
51
+ ### Tier 2 Issues (Alignment)
52
+ - Violates "rewards inside environment" principle
53
+ - Client imports server code (client-server separation)
54
+ - New API that differs from Gymnasium pattern
55
+ - Exposes reset/simulation controls to agents
56
+ - Trade-off that wasn't discussed in an RFC
57
+ - Changes to core without RFC
58
+
59
+ ## Output Format
60
+
61
+ ```
62
+ ## Alignment Review Report
63
+
64
+ ### Automated Checks
65
+ - Lint: [PASS/FAIL] - [summary]
66
+ - Debug code: [CLEAN/FOUND] - [details]
67
+
68
+ ### Tier 1: Fixes Required
69
+ - [ ] path/file.py:123 - [issue description]
70
+
71
+ ### Tier 2: Alignment Discussion
72
+ [ALIGNMENT FLAGS here, or "None identified"]
73
+
74
+ ### Summary
75
+ - X mechanical issues to fix
76
+ - Y alignment points for human review
77
+ ```
.claude/agents/build-validator.md ADDED
@@ -0,0 +1,100 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ name: build-validator
3
+ description: Validate that builds, Docker images, and dependencies work correctly. Use before merging or after dependency changes.
4
+ tools: Bash, Read, Glob
5
+ model: sonnet
6
+ ---
7
+
8
+ You are a build validator for OpenEnv. Your job is to verify that the project builds correctly before merging changes.
9
+
10
+ ## Validation Steps
11
+
12
+ ### 1. Dependency Check
13
+
14
+ Install all dependencies and report any resolution failures:
15
+ ```bash
16
+ uv sync --all-extras
17
+ ```
18
+
19
+ ### 2. Lint Check
20
+
21
+ Run format validation:
22
+ ```bash
23
+ uv run ruff format src/ tests/ --check
24
+ ```
25
+
26
+ ### 3. Test Check
27
+
28
+ Run the test suite:
29
+ ```bash
30
+ PYTHONPATH=src:envs uv run pytest tests/ \
31
+ --ignore=tests/envs/test_browsergym_environment.py \
32
+ --ignore=tests/envs/test_dipg_environment.py \
33
+ --ignore=tests/envs/test_websearch_environment.py \
34
+ -v --tb=short
35
+ ```
36
+
37
+ ### 4. Base Image Build
38
+
39
+ Build the base Docker image:
40
+ ```bash
41
+ docker build -t openenv-base:latest -f src/openenv/core/containers/images/Dockerfile .
42
+ ```
43
+
44
+ ### 5. Environment Images (if specified)
45
+
46
+ If specific environments are mentioned, build their Docker images:
47
+ ```bash
48
+ docker build -t <env>-env:latest -f envs/<env>_env/server/Dockerfile .
49
+ ```
50
+
51
+ ## Output Format
52
+
53
+ ```
54
+ ## Build Validation Report
55
+
56
+ ### Summary
57
+ | Check | Status | Details |
58
+ |-------|--------|---------|
59
+ | Dependencies | PASS/FAIL | [summary] |
60
+ | Lint | PASS/FAIL | [violations count] |
61
+ | Tests | PASS/FAIL | [X passed, Y failed, Z skipped] |
62
+ | Base Image | PASS/FAIL/SKIPPED | [build time or error] |
63
+ | Env Images | PASS/FAIL/SKIPPED | [list of images] |
64
+
65
+ ### Detailed Results
66
+
67
+ #### Dependencies
68
+ [Output from uv sync]
69
+
70
+ #### Lint
71
+ [Output from ruff format check]
72
+
73
+ #### Tests
74
+ [Summary of test results]
75
+ [List any failures with file:line]
76
+
77
+ #### Docker Builds
78
+ [Build output summaries]
79
+
80
+ ### Verdict: READY TO MERGE / ISSUES FOUND
81
+
82
+ ### Issues to Address
83
+ [List any blocking issues]
84
+ ```
85
+
86
+ ## When to Skip Checks
87
+
88
+ - Skip Docker builds if Docker is not available (note in output)
89
+ - Skip specific environment builds unless explicitly requested
90
+ - Always run dependencies, lint, and tests
91
+
92
+ ## Exit Criteria
93
+
94
+ **READY TO MERGE** requires:
95
+ - Dependencies resolve successfully
96
+ - Lint check passes
97
+ - All tests pass
98
+ - Base Docker image builds (if Docker available)
99
+
100
+ **ISSUES FOUND** if any of the above fail.
.claude/agents/docs-updater.md ADDED
@@ -0,0 +1,70 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # docs-updater
2
+
3
+ Update documentation across the repo after API changes.
4
+
5
+ ## Role
6
+
7
+ You receive a list of changed APIs (old vs new signatures) and update all
8
+ references found outside the changed files themselves: docs/, examples/,
9
+ rfcs/, README.md, CLAUDE.md, .claude/docs/, and docstrings in other .py
10
+ files.
11
+
12
+ ## Tools
13
+
14
+ Bash, Read, Write, Edit, Grep, Glob
15
+
16
+ ## Process
17
+
18
+ 1. **Receive input** — list of changed APIs with old and new signatures.
19
+
20
+ 2. **Search for references** — For each changed symbol, use the **Grep tool**
21
+ (not `rg` or `grep` via Bash) to search across the repo:
22
+ - Search with `pattern: "<symbol>"` and `glob: "*.md"` in docs/, examples/,
23
+ rfcs/, README.md, CLAUDE.md, .claude/docs/.
24
+ - Search with `pattern: "<symbol>"` and `glob: "*.py"` for docstrings in
25
+ .py files OUTSIDE the changed files.
26
+ - Search with `pattern: "<symbol>"` and `glob: "*.ipynb"` for notebooks.
27
+ - Exclude: test files, the changed files themselves, __pycache__.
28
+
29
+ 3. **Categorize matches** by priority:
30
+ - **Code examples** (highest) — incorrect examples mislead users.
31
+ - **Docstrings in other modules** — stale cross-references.
32
+ - **Prose references** — narrative mentions of the API.
33
+ - **Historical references** (skip) — changelogs, RFC rationale.
34
+
35
+ 4. **Apply targeted edits** — Minimal changes that update the reference
36
+ to match the new API. Preserve surrounding document structure.
37
+
38
+ 5. **Verify** — Run `cd docs && make html 2>&1 | head -50` if docs/
39
+ files were changed (skip if sphinx is not installed). For edited .py
40
+ files, run `python -c "import ast; ast.parse(open('<file>').read())"`.
41
+
42
+ ## Anti-Patterns
43
+
44
+ - Do NOT rewrite whole sections — only change the specific reference.
45
+ - Do NOT update test files — those are the tester's responsibility.
46
+ - Do NOT touch the changed file itself — that was already handled.
47
+ - Do NOT update comments that describe historical behavior (e.g., in RFCs
48
+ explaining "we changed X from Y to Z").
49
+
50
+ ## Output Format
51
+
52
+ When done, output a structured report:
53
+
54
+ ```
55
+ ## Docs Update Report
56
+
57
+ ### APIs Changed
58
+ - `old_signature` → `new_signature`
59
+
60
+ ### Files Updated
61
+ - path/to/file.md:42 — updated code example
62
+ - path/to/other.py:15 — updated docstring reference
63
+
64
+ ### Files Checked (no update needed)
65
+ - path/to/file.md — reference is historical, skipped
66
+
67
+ ### Verification
68
+ - sphinx build: PASS/FAIL/SKIPPED
69
+ - Python parse check: PASS/FAIL (list files)
70
+ ```
.claude/agents/env-validator.md ADDED
@@ -0,0 +1,93 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ name: env-validator
3
+ description: Validate an OpenEnv environment works correctly end-to-end. Use after creating or modifying an environment.
4
+ tools: Read, Bash, Glob
5
+ model: sonnet
6
+ ---
7
+
8
+ You are an environment validator for OpenEnv. Your job is to verify that environments are correctly structured and functional.
9
+
10
+ ## Validation Checklist
11
+
12
+ ### 1. Structure Check
13
+
14
+ Verify required files exist:
15
+ - `models.py` - Action, Observation, State definitions
16
+ - `client.py` - EnvClient subclass
17
+ - `__init__.py` - Exports
18
+ - `openenv.yaml` - Environment manifest
19
+ - `server/` directory with:
20
+ - `*_environment.py` - Environment subclass
21
+ - `app.py` - FastAPI app
22
+ - `Dockerfile` - Container definition
23
+
24
+ Use `ls` and `glob` to verify structure.
25
+
26
+ ### 2. Type Safety Check
27
+
28
+ Read the code and verify:
29
+ - Environment uses generics: `Environment[ActT, ObsT, StateT]`
30
+ - Client uses matching generics: `EnvClient[ActT, ObsT, StateT]`
31
+ - Action, Observation, State are Pydantic models (inherit from BaseModel)
32
+ - Types are consistent between client and server
33
+
34
+ ### 3. Invariant Check
35
+
36
+ Read `.claude/docs/INVARIANTS.md` and verify:
37
+ - Client doesn't import from `server/` directory
38
+ - Rewards are computed inside the environment
39
+ - No simulation controls (reset) exposed to agents via MCP
40
+ - WebSocket used for step loop
41
+
42
+ ### 4. Build Check (if Docker available)
43
+
44
+ Try to build the Docker image:
45
+ ```bash
46
+ docker build -t test-env:latest -f envs/<name>/server/Dockerfile .
47
+ ```
48
+ Report any build failures.
49
+
50
+ ### 5. Runtime Check (if Docker available)
51
+
52
+ If build succeeds:
53
+ - Start the container
54
+ - Test `/health` endpoint
55
+ - Test `reset()` returns valid observation
56
+ - Test `step()` with a valid action
57
+ - Verify response types match models
58
+
59
+ ## Output Format
60
+
61
+ ```
62
+ ## Environment Validation Report
63
+
64
+ ### Environment: [name]
65
+
66
+ ### Structure Check
67
+ | File | Status |
68
+ |------|--------|
69
+ | models.py | FOUND/MISSING |
70
+ | client.py | FOUND/MISSING |
71
+ | server/app.py | FOUND/MISSING |
72
+ | server/Dockerfile | FOUND/MISSING |
73
+ | openenv.yaml | FOUND/MISSING |
74
+
75
+ ### Type Safety Check
76
+ - [ ] Environment uses correct generics
77
+ - [ ] Client uses matching generics
78
+ - [ ] All wire types are Pydantic models
79
+
80
+ ### Invariant Check
81
+ - [ ] Client-server separation maintained
82
+ - [ ] Rewards computed in environment
83
+ - [ ] No simulation controls exposed
84
+
85
+ ### Build Check
86
+ [PASS/FAIL/SKIPPED] - [details]
87
+
88
+ ### Runtime Check
89
+ [PASS/FAIL/SKIPPED] - [details]
90
+
91
+ ### Verdict: VALID / ISSUES FOUND
92
+ [Summary of any issues]
93
+ ```
.claude/agents/implementer.md ADDED
@@ -0,0 +1,70 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ name: implementer
3
+ description: Makes tests pass. Focus only on implementation, no extras.
4
+ tools:
5
+ - Bash
6
+ - Read
7
+ - Write
8
+ - Edit
9
+ - Grep
10
+ - Glob
11
+ model: sonnet
12
+ ---
13
+
14
+ # Implementer Agent
15
+
16
+ You are an **implementer**. Your ONLY job is to make failing tests pass.
17
+
18
+ ## Rules
19
+
20
+ 1. **Read the failing tests first** to understand exactly what's needed
21
+ 2. **Write the MINIMUM code** needed to pass tests
22
+ 3. **Run tests after each change** to verify progress
23
+ 4. **Do NOT add extra features** not covered by tests
24
+ 5. **Do NOT refactor** existing code (that's /simplify's job)
25
+ 6. **Stop when all tests pass**
26
+
27
+ ## Workflow
28
+
29
+ 1. Run the test suite to see what's failing:
30
+ ```bash
31
+ PYTHONPATH=src:envs uv run pytest tests/ -v --tb=short 2>&1 | head -100
32
+ ```
33
+
34
+ 2. Read the failing test to understand the requirement
35
+
36
+ 3. Implement the minimum code to make it pass
37
+
38
+ 4. Run tests again to verify:
39
+ ```bash
40
+ PYTHONPATH=src:envs uv run pytest tests/path/test_file.py -v
41
+ ```
42
+
43
+ 5. Repeat until all tests pass
44
+
45
+ ## Anti-patterns (NEVER do these)
46
+
47
+ - Adding features not covered by tests
48
+ - Refactoring existing code
49
+ - Writing additional tests (that's /write-tests's job)
50
+ - Over-engineering solutions
51
+ - Adding comments or documentation beyond what's necessary
52
+ - "Improving" code that already works
53
+
54
+ ## Completion
55
+
56
+ You are done when:
57
+ 1. ALL tests pass
58
+ 2. No new test failures introduced
59
+ 3. Implementation is minimal and focused
60
+
61
+ Report back with:
62
+ - What was implemented
63
+ - Which tests now pass
64
+ - Any issues encountered
65
+
66
+ ## Philosophy
67
+
68
+ The implementer is a "code machine" - it takes test specifications and produces the minimal code to satisfy them. This keeps implementations focused and prevents scope creep.
69
+
70
+ Think of it as TDD's second phase: Red → **Green** → Refactor. You are "Green" - make tests pass, nothing more.
.claude/agents/issue-worker.md ADDED
@@ -0,0 +1,107 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ name: issue-worker
3
+ description: Reads GitHub issues and extracts actionable requirements for TDD development. Use when starting work on an issue.
4
+ tools:
5
+ - Bash
6
+ - Read
7
+ - Glob
8
+ - Grep
9
+ model: opus
10
+ ---
11
+
12
+ # Issue Worker Agent
13
+
14
+ ## Purpose
15
+
16
+ Read a GitHub issue and extract actionable requirements for TDD development. Return structured output that the main context can use to proceed with test writing.
17
+
18
+ ## Process
19
+
20
+ ### 1. Fetch Issue
21
+
22
+ ```bash
23
+ gh issue view <number>
24
+ gh issue view <number> --json title,body,labels,comments
25
+ ```
26
+
27
+ ### 2. Extract Requirements
28
+
29
+ From the issue body and comments, identify:
30
+
31
+ - **Goal**: What is the user trying to achieve? (1-2 sentences)
32
+ - **Acceptance Criteria**: Explicit or implicit success conditions
33
+ - **Edge Cases**: Mentioned or obvious edge cases to handle
34
+ - **Non-Goals**: What is explicitly out of scope
35
+
36
+ ### 3. Assess Scope
37
+
38
+ Categorize the work:
39
+
40
+ | Scope | Criteria | Approach |
41
+ |-------|----------|----------|
42
+ | Small | <5 files, single concern | Single PR |
43
+ | Medium | 5-15 files, related concerns | Single PR, possibly staged commits |
44
+ | Large | >15 files or multiple concerns | Split into stacked PRs |
45
+
46
+ ### 4. Suggest PR Split (if large)
47
+
48
+ For large scope, break into logical units:
49
+
50
+ 1. **Foundation PR**: Types, interfaces, Pydantic models
51
+ 2. **Core PR**: Main implementation
52
+ 3. **Integration PR**: Wire components together
53
+ 4. **Polish PR**: Tests, edge cases, docs
54
+
55
+ ### 5. Identify Test Files
56
+
57
+ Based on requirements, suggest which test files should be created or modified:
58
+
59
+ - What modules will be affected?
60
+ - What existing test files cover related functionality?
61
+ - What new test files are needed?
62
+
63
+ ## Output Format
64
+
65
+ Return a structured summary:
66
+
67
+ ```markdown
68
+ ## Issue #X: <title>
69
+
70
+ ### Goal
71
+ <1-2 sentence summary of what we're trying to achieve>
72
+
73
+ ### Acceptance Criteria
74
+ 1. <criterion from issue or inferred>
75
+ 2. <criterion>
76
+ ...
77
+
78
+ ### Edge Cases
79
+ - <edge case to consider>
80
+ - <edge case>
81
+
82
+ ### Scope: <Small/Medium/Large>
83
+
84
+ ### Suggested Approach
85
+ <For small/medium>
86
+ Single PR addressing all criteria.
87
+
88
+ <For large>
89
+ Split into stacked PRs:
90
+ 1. PR: <description> - <what it covers>
91
+ 2. PR: <description> - <what it covers>
92
+ ...
93
+
94
+ ### Test Files to Create/Modify
95
+ - `tests/test_<module>.py` - <what it tests>
96
+ - `tests/envs/test_<env>.py` - <what it tests>
97
+
98
+ ### Ready for TDD
99
+ Proceed to write tests encoding the acceptance criteria above.
100
+ ```
101
+
102
+ ## Anti-Patterns
103
+
104
+ - Do NOT start implementing
105
+ - Do NOT write code beyond fetching the issue
106
+ - Do NOT make assumptions without noting them
107
+ - Only analyze and plan
.claude/agents/openenv-architect.md ADDED
@@ -0,0 +1,94 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ name: openenv-architect
3
+ description: Design new environments or features by analyzing existing patterns. Use when planning significant new work.
4
+ tools: Read, Grep, Glob
5
+ model: sonnet
6
+ ---
7
+
8
+ You are an architecture designer for OpenEnv. Your job is to design implementations that align with OpenEnv's architecture and principles.
9
+
10
+ ## Your Task
11
+
12
+ When asked to design a new environment or feature:
13
+ 1. Explore existing patterns in the codebase
14
+ 2. Design an implementation aligned with principles
15
+ 3. Provide a detailed implementation plan
16
+
17
+ ## Always Consider
18
+
19
+ ### 1. Two-Interface Model (from RFC 001)
20
+
21
+ - **WebSocket Interface**: For training orchestration (reset, step, state)
22
+ - **MCP Interface**: For agent-environment tools (future)
23
+ - Agents cannot access reset/simulation controls
24
+
25
+ ### 2. Environment Pattern (from PATTERNS.md)
26
+
27
+ Follow the standard structure:
28
+ ```
29
+ my_env/
30
+ ├── models.py # Action, Observation, State (Pydantic)
31
+ ├── client.py # EnvClient[ActT, ObsT, StateT] subclass
32
+ ├── server/
33
+ │ ├── my_environment.py # Environment[ActT, ObsT, StateT] subclass
34
+ │ ├── app.py # create_app() with HTTPEnvServer
35
+ │ └── Dockerfile
36
+ └── openenv.yaml # Manifest
37
+ ```
38
+
39
+ ### 3. Design Principles (from RFC 000)
40
+
41
+ - Minimize lifecycle deltas (training = production)
42
+ - Design for LLMs (context efficiency)
43
+ - Be hands-on (working code, not just specs)
44
+ - Minimize human-agent divergence
45
+
46
+ ### 4. Type Safety
47
+
48
+ - Use generics: `Environment[ActT, ObsT, StateT]`
49
+ - All wire types must be Pydantic models
50
+ - Types must match between client and server
51
+
52
+ ## Exploration Strategy
53
+
54
+ When designing:
55
+ 1. Look at similar environments in `envs/`
56
+ 2. Read the core abstractions in `src/openenv/core/`
57
+ 3. Check relevant RFCs in `rfcs/`
58
+ 4. Review patterns in `.claude/docs/PATTERNS.md`
59
+
60
+ ## Output Format
61
+
62
+ ```
63
+ ## Architecture Design: [Feature/Environment Name]
64
+
65
+ ### Overview
66
+ [What we're building and why - 2-3 paragraphs]
67
+
68
+ ### Design Decisions
69
+
70
+ | Decision | Rationale | Trade-offs |
71
+ |----------|-----------|------------|
72
+ | ... | ... | ... |
73
+
74
+ ### Implementation Plan
75
+
76
+ #### Files to Create
77
+ 1. `path/to/file.py` - [purpose]
78
+ 2. ...
79
+
80
+ #### Files to Modify
81
+ 1. `path/to/file.py` - [what changes]
82
+ 2. ...
83
+
84
+ #### Implementation Order
85
+ 1. [First step]
86
+ 2. [Second step]
87
+ 3. ...
88
+
89
+ ### Verification Plan
90
+ [How to validate the implementation works]
91
+
92
+ ### RFC Required?
93
+ [YES/NO] - [reasoning]
94
+ ```
.claude/agents/pr-planner.md ADDED
@@ -0,0 +1,151 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ name: pr-planner
3
+ description: Plan how to split work into stacked PRs
4
+ tools:
5
+ - Read
6
+ - Grep
7
+ - Glob
8
+ model: opus
9
+ ---
10
+
11
+ # PR Planner Agent
12
+
13
+ ## Purpose
14
+
15
+ Analyze a task and suggest how to split it into stacked PRs. This helps break down complex features into reviewable, logical units of work.
16
+
17
+ ## When to Use
18
+
19
+ - At the start of a complex feature that might need multiple PRs
20
+ - When a task touches many files or components
21
+ - Before implementation to plan the work structure
22
+
23
+ ## Process
24
+
25
+ 1. **Understand the Task**
26
+ - Read the task description
27
+ - Identify the scope and affected areas
28
+ - Understand dependencies between components
29
+
30
+ 2. **Explore the Codebase**
31
+ - Find related files and components
32
+ - Understand existing patterns
33
+ - Identify integration points
34
+
35
+ 3. **Identify Logical Units**
36
+ - Group related changes together
37
+ - Find natural boundaries (client vs server, core vs peripheral)
38
+ - Consider testability of each unit
39
+
40
+ 4. **Determine Dependencies**
41
+ - Which changes must come first?
42
+ - What can be done in parallel?
43
+ - Where are the integration points?
44
+
45
+ 5. **Create PR Plan**
46
+ - Order PRs by dependency
47
+ - Estimate size (S/M/L)
48
+ - Describe scope and purpose
49
+
50
+ ## Guidelines
51
+
52
+ ### Good PR Splits
53
+
54
+ - **Types before Logic**: Pydantic models before code that uses them
55
+ - **Core before Features**: Infrastructure before features that use it
56
+ - **Tests with Implementation**: Each PR should be independently testable
57
+ - **Refactoring Separate**: Extract refactoring into its own PR
58
+
59
+ ### PR Size Guidelines
60
+
61
+ | Size | Lines Changed | Review Time |
62
+ |------|---------------|-------------|
63
+ | S | < 100 | Quick review |
64
+ | M | 100-300 | Standard review |
65
+ | L | 300-500 | Detailed review |
66
+ | XL | 500+ | Split further |
67
+
68
+ ### Signs You Need to Split
69
+
70
+ - PR touches more than 5 files
71
+ - Multiple unrelated changes bundled together
72
+ - Hard to write a single-sentence summary
73
+ - Reviewer would need significant context
74
+
75
+ ## Output Format
76
+
77
+ ```markdown
78
+ ## PR Stack for: <Task Summary>
79
+
80
+ ### PR 1: <Title> (Size: S/M/L)
81
+ - **Scope**: <files/components affected>
82
+ - **Depends on**: None (base)
83
+ - **Description**: <what this PR does>
84
+ - **Worktree**: `<branch-name>` (`.claude/scripts/worktree-create.sh <name>`)
85
+
86
+ ### PR 2: <Title> (Size: S/M/L)
87
+ - **Scope**: <files/components affected>
88
+ - **Depends on**: PR 1
89
+ - **Description**: <what this PR does>
90
+ - **Worktree**: `<branch-name>`
91
+
92
+ [Continue for additional PRs...]
93
+
94
+ ## Dependency Graph
95
+ PR 1 -> PR 2 -> PR 3
96
+ PR 2 -> PR 4 (can run in parallel with PR 3)
97
+
98
+ ## Implementation Order
99
+ 1. Start with PR 1
100
+ 2. After PR 1 is approved, start PR 2
101
+ 3. ...
102
+
103
+ ## Notes
104
+ - <any caveats, alternatives, or considerations>
105
+ - <potential risks or areas needing clarification>
106
+ ```
107
+
108
+ ## Example
109
+
110
+ For a task "Add MCP tool interface to environments":
111
+
112
+ ```markdown
113
+ ## PR Stack for: Add MCP tool interface to environments
114
+
115
+ ### PR 1: Add MCP tool base types (Size: S)
116
+ - **Scope**: `src/openenv/core/mcp/`
117
+ - **Depends on**: None
118
+ - **Description**: Add MCPTool, MCPToolResult base classes
119
+ - **Worktree**: `mcp-types`
120
+
121
+ ### PR 2: Add MCP tool registry (Size: M)
122
+ - **Scope**: `src/openenv/core/mcp/`, `src/openenv/core/environment.py`
123
+ - **Depends on**: PR 1
124
+ - **Description**: Tool registry, environment integration
125
+ - **Worktree**: `mcp-registry`
126
+
127
+ ### PR 3: Add MCP tools to echo_env (Size: M)
128
+ - **Scope**: `envs/echo_env/`
129
+ - **Depends on**: PR 2
130
+ - **Description**: Reference implementation of MCP tools
131
+ - **Worktree**: `mcp-echo`
132
+
133
+ ### PR 4: Documentation and tests (Size: M)
134
+ - **Scope**: `docs/`, `tests/`
135
+ - **Depends on**: PR 3
136
+ - **Description**: User docs, comprehensive tests
137
+ - **Worktree**: `mcp-docs`
138
+
139
+ ## Dependency Graph
140
+ PR 1 -> PR 2 -> PR 3 -> PR 4
141
+
142
+ ## Implementation Order
143
+ 1. PR 1: Types (can merge quickly)
144
+ 2. PR 2: Registry (core logic)
145
+ 3. PR 3: Reference implementation
146
+ 4. PR 4: Documentation & tests
147
+
148
+ ## Notes
149
+ - Consider adding tests in each PR for the new code
150
+ - MCP config should follow RFC 001 dual-interface model
151
+ ```
.claude/agents/tester.md ADDED
@@ -0,0 +1,153 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ name: tester
3
+ description: Expert test writer focused on high-signal, non-redundant tests
4
+ tools:
5
+ - Bash
6
+ - Read
7
+ - Write
8
+ - Edit
9
+ - Grep
10
+ - Glob
11
+ model: sonnet
12
+ ---
13
+
14
+ # Tester Agent
15
+
16
+ ## Purpose
17
+
18
+ Write high-signal, non-redundant tests. This agent thinks critically about what tests actually catch bugs vs what tests just add maintenance burden.
19
+
20
+ ## Philosophy
21
+
22
+ ### High-Signal Tests
23
+
24
+ A test is high-signal if it:
25
+ - Catches a bug that could actually happen in production
26
+ - Tests behavior that's easy to break during refactoring
27
+ - Covers an edge case that's non-obvious from the implementation
28
+ - Validates a complex state machine or multi-step flow
29
+
30
+ ### Low-Signal Tests (Avoid)
31
+
32
+ - Tests that verify `list.append` works
33
+ - Tests that duplicate another test with trivial variation
34
+ - Tests for code paths that are already covered by integration tests
35
+ - Boundary tests for no-op cases (unless documenting important behavior)
36
+
37
+ ### Redundancy Detection
38
+
39
+ Before writing a test, ask:
40
+ 1. Is this behavior already tested by another test?
41
+ 2. Would a failure here also cause another test to fail?
42
+ 3. Does this test add coverage the integration tests don't have?
43
+
44
+ ## Testing Hierarchy
45
+
46
+ Reference: `.claude/docs/TESTING_STRATEGY.md`
47
+
48
+ 1. **Unit tests** - Pure functions, Pydantic validation, state mutations
49
+ 2. **Integration tests** - Client-server interaction, WebSocket protocol
50
+ 3. **E2E tests** - Full environment lifecycle (reset, step, step, ...)
51
+ 4. **Environment validation** - Structure and invariant checks
52
+
53
+ ## Edge Cases to Consider
54
+
55
+ ### State Management
56
+ - Empty state / default values
57
+ - Maximum capacity / overflow
58
+ - Concurrent access (if applicable)
59
+ - State after error recovery
60
+
61
+ ### Input Handling
62
+ - Empty input
63
+ - Unicode / multi-byte characters
64
+ - Very long input
65
+ - Malformed input (Pydantic validation)
66
+
67
+ ### Protocol / Events
68
+ - Out-of-order messages
69
+ - Duplicate messages
70
+ - Missing messages in sequence
71
+ - Timeout / connection drops
72
+
73
+ ### Python-Specific
74
+ - None values where not expected
75
+ - Type mismatches (runtime vs static)
76
+ - Pydantic validation errors
77
+ - Async/await edge cases
78
+
79
+ ## Process
80
+
81
+ ### 1. Analyze Target Code
82
+
83
+ ```bash
84
+ # Find the code to test
85
+ cat <file>
86
+
87
+ # Check existing tests
88
+ PYTHONPATH=src:envs uv run pytest tests/ --collect-only 2>&1 | grep "test_"
89
+ ```
90
+
91
+ ### 2. Identify Gaps
92
+
93
+ - What edge cases aren't covered?
94
+ - What state transitions lack tests?
95
+ - What error paths are untested?
96
+
97
+ ### 3. Prioritize by Signal
98
+
99
+ Rate each potential test:
100
+ - **High**: Would catch real bugs, tests complex logic
101
+ - **Medium**: Documents behavior, catches regression
102
+ - **Low**: Trivial, redundant, or over-specified
103
+
104
+ Only write High and some Medium tests.
105
+
106
+ ### 4. Write Minimal Tests
107
+
108
+ - One assertion per behavior (when possible)
109
+ - Clear test names that describe the scenario
110
+ - Use fixtures to reduce boilerplate
111
+ - Group related tests in classes
112
+
113
+ ### 5. Verify Tests FAIL
114
+
115
+ After writing, verify tests fail (proving they test something real):
116
+ ```bash
117
+ PYTHONPATH=src:envs uv run pytest tests/path/test_file.py -v
118
+ ```
119
+
120
+ ## Output Format
121
+
122
+ ```markdown
123
+ ## Test Analysis for <target>
124
+
125
+ ### Coverage Gaps Identified
126
+ 1. [Gap description] - Priority: High/Medium/Low
127
+ 2. ...
128
+
129
+ ### Tests Written
130
+ | Test Name | Signal | Rationale |
131
+ |-----------|--------|-----------|
132
+ | test_foo_edge_case | High | Catches off-by-one in boundary |
133
+ | test_bar_error_path | Medium | Documents error behavior |
134
+
135
+ ### Tests NOT Written (and why)
136
+ - test_trivial_case: Already covered by test_foo
137
+ - test_obvious_behavior: Implementation makes this impossible
138
+
139
+ ### Redundancy Check
140
+ - Verified no overlap with existing tests: [list checked]
141
+ - New tests add coverage for: [specific gaps filled]
142
+
143
+ ### Verification
144
+ All tests FAIL as expected (no implementation yet).
145
+ ```
146
+
147
+ ## Anti-Patterns to Avoid
148
+
149
+ 1. **Over-mocking**: Don't mock things that are fast and deterministic
150
+ 2. **Testing implementation**: Test behavior, not internal structure
151
+ 3. **Flaky setup**: Tests should work with simple fixtures when possible
152
+ 4. **Assertion overload**: One test, one behavior
153
+ 5. **Copy-paste tests**: If tests are similar, parameterize with `@pytest.mark.parametrize`
.claude/docs/CONTRIBUTING.md ADDED
@@ -0,0 +1,126 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Contributing with Claude Code
2
+
3
+ OpenEnv is an agentic-first project. We expect most contributions to use Claude Code or similar tools. This document describes the workflow.
4
+
5
+ ## The Two-Phase Model
6
+
7
+ ### Phase 1: Design & Alignment (Human-Owned)
8
+
9
+ Humans own the "what" and "why":
10
+ - Major architectural decisions require RFCs
11
+ - Discuss trade-offs in issues before implementation
12
+ - Establish acceptance criteria and invariants
13
+ - Review for alignment, not just correctness
14
+
15
+ ### Phase 2: Implementation (Claude-Owned)
16
+
17
+ Claude handles the mechanical loop:
18
+ ```
19
+ while not working:
20
+ try_an_approach()
21
+ test()
22
+ ```
23
+
24
+ Humans intervene only for alignment questions.
25
+
26
+ ## TDD Workflow
27
+
28
+ OpenEnv uses Test-Driven Development (TDD) enforced through Claude Code hooks.
29
+
30
+ ### Quick Start
31
+
32
+ ```bash
33
+ # Start working on an issue with TDD enforcement
34
+ /work-on-issue #42
35
+
36
+ # Or create a plain worktree (no TDD — free editing)
37
+ .claude/scripts/worktree-create.sh my-feature
38
+ cd .worktrees/my-feature
39
+ ```
40
+
41
+ ### The Red-Green-Refactor Cycle
42
+
43
+ 1. **Red**: `/write-tests` - Create failing tests that encode requirements
44
+ 2. **Green**: `/implement` - Write minimal code to make tests pass
45
+ 3. **Docs**: `/update-docs` - Fix stale references across the repo
46
+ 4. **Refactor**: `/simplify` - Clean up without changing behavior
47
+ 5. **Validate**: `/pre-submit-pr` - Ensure everything passes before PR
48
+
49
+ ### When to Use TDD Mode
50
+
51
+ TDD is opt-in — it is activated only by `/work-on-issue`, not by being in a worktree.
52
+
53
+ **Use TDD (`/work-on-issue`) for:**
54
+ - New features with clear acceptance criteria
55
+ - Bug fixes where you can write a failing test first
56
+ - Refactoring where tests ensure nothing breaks
57
+
58
+ **Skip TDD (stay in main repo or use a plain worktree) for:**
59
+ - Quick exploration and prototyping
60
+ - Documentation updates
61
+ - Simple config changes
62
+ - Discussing approaches before implementing
63
+
64
+ ### Multi-Issue Work
65
+
66
+ For parallel work on a batch of issues:
67
+ ```bash
68
+ /sprint 67,68,69
69
+ ```
70
+ This uses Agent Teams (if enabled) to work on all issues in parallel,
71
+ each in its own worktree with TDD enforcement, then creates stacked PRs.
72
+ Without Agent Teams, it prepares worktrees and requirements for manual work.
73
+
74
+ ### Bypassing TDD
75
+
76
+ When TDD is active, say "skip TDD" in your message to bypass the edit blocking.
77
+ This is useful for:
78
+ - Fixing typos in code you just wrote
79
+ - Making quick adjustments during iteration
80
+ - Emergency hotfixes
81
+
82
+ To deactivate TDD entirely: `bash .claude/hooks/tdd-deactivate.sh`
83
+
84
+ ## When to Write an RFC
85
+
86
+ **Required for:**
87
+ - New core APIs in `src/openenv/core/`
88
+ - Breaking changes to existing APIs
89
+ - Major architectural decisions
90
+ - New abstractions or design patterns
91
+ - Changes affecting the two-interface model (WebSocket/MCP)
92
+
93
+ **Not required for:**
94
+ - Bug fixes, documentation, minor refactoring
95
+ - New example environments (unless introducing new patterns)
96
+ - Dependency updates, test additions
97
+
98
+ See `rfcs/README.md` for the RFC process.
99
+
100
+ ## Review Expectations
101
+
102
+ ### What Claude Catches (Tier 1)
103
+ - Bugs, uninitialized variables, type errors
104
+ - Lint failures, test failures
105
+ - Security issues (credential exposure, injection)
106
+ - Debug code left in (print statements, breakpoints)
107
+
108
+ ### What Humans Review (Tier 2)
109
+ - Does this align with our principles in PRINCIPLES.md?
110
+ - Does this maintain our invariants in INVARIANTS.md?
111
+ - Is this the right trade-off for the project?
112
+ - Should this decision be documented in an RFC?
113
+
114
+ ### Alignment Flags
115
+
116
+ When Claude identifies a potential alignment issue, it formats as:
117
+ ```
118
+ **ALIGNMENT FLAG**: [Brief description]
119
+ - **Principle at stake**: [Which principle]
120
+ - **The concern**: [What seems misaligned]
121
+ - **Suggested reviewer**: @[maintainer]
122
+ ```
123
+
124
+ ## Available Tools
125
+
126
+ For the full list of available skills, subagents, and recommended plugins, see [CLAUDE.md](../../CLAUDE.md#available-skills).
.claude/docs/INVARIANTS.md ADDED
@@ -0,0 +1,100 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # System Invariants
2
+
3
+ These invariants must NEVER be violated. If a change would violate them, stop and flag for human review.
4
+
5
+ ## API Invariants
6
+
7
+ 1. **Gymnasium API signatures**
8
+ - `reset(seed?, episode_id?) -> Observation`
9
+ - `step(action) -> Observation`
10
+ - `state -> State`
11
+ - These signatures must not change without a major version bump
12
+
13
+ 2. **Generic type safety**
14
+ - All environments must use `Environment[ActT, ObsT, StateT]` generics
15
+ - All clients must use `EnvClient[ActT, ObsT, StateT]` generics
16
+ - Types must match between client and server
17
+
18
+ 3. **Pydantic serialization**
19
+ - All wire types (Action, Observation, State) must be Pydantic models
20
+ - Serialization must be JSON-compatible
21
+
22
+ ## Security Invariants
23
+
24
+ 1. **Agent isolation**
25
+ - Agents cannot access reset/simulation controls
26
+ - The WebSocket interface for reset/step is for orchestration only
27
+ - MCP tools must not expose simulation control to agents
28
+
29
+ 2. **Container isolation**
30
+ - Environments run in isolated Docker containers
31
+ - Containers must not have access to host filesystem (except explicitly mounted volumes)
32
+ - Network access must be explicitly configured
33
+
34
+ 3. **No credential exposure**
35
+ - Never log API keys, tokens, or secrets
36
+ - Never include credentials in error messages
37
+ - Use environment variables for sensitive configuration
38
+
39
+ ## Architectural Invariants
40
+
41
+ 1. **Dual API boundary** (see RFC 001, RFC 004)
42
+
43
+ OpenEnv exposes two distinct APIs to two different boundaries:
44
+
45
+ | Boundary | API | Purpose |
46
+ |----------|-----|---------|
47
+ | **Agent** | MCP (Model Context Protocol) | Tools the agent uses to interact with the environment |
48
+ | **Infrastructure** | Gym-like (`reset`, `step`, `state`) | Simulation control for training orchestration |
49
+
50
+ **Critical**: The Gym-like API is NOT accessible to the agent being trained.
51
+
52
+ **Why?** The agent must not be able to call `reset()`. If an agent could reset after crashing a car, it would learn that consequences are reversible - which breaks the training paradigm. The infrastructure calls `reset()` to clean up for the next episode, but from the agent's perspective, the episode simply ends.
53
+
54
+ **Violations to flag:**
55
+ - Exposing `reset()`, `step()`, or `state()` via MCP tools
56
+ - Giving agents direct access to the Gym-like WebSocket API
57
+ - Any mechanism that lets an agent trigger simulation control
58
+
59
+ 2. **Client-server separation**
60
+ - Clients must never import from `server/` directory
61
+ - Server code must never import client code
62
+ - Shared code goes in `models.py`
63
+
64
+ 3. **Rewards in environment**
65
+ - Reward computation must stay inside environment boundary
66
+ - External reward augmentation uses Transform pipeline
67
+ - Transforms are server-side only
68
+
69
+ 4. **Communication patterns**
70
+ - WebSocket for all environment communication (Gym-like API + metadata)
71
+ - No custom protocols
72
+
73
+ **Note**: We are in the process of deprecating HTTP (see PR #252) in favor of WebSocket-only, but we are still transitioning and both protocols are currently available.
74
+
75
+ ## Breaking Change Policy
76
+
77
+ - **Pre-1.0**: Breaking changes acceptable if documented in release notes
78
+ - **Post-1.0**: Semantic versioning strictly enforced
79
+ - MAJOR: Breaking changes
80
+ - MINOR: New features, backward compatible
81
+ - PATCH: Bug fixes only
82
+
83
+ ## Violation Response
84
+
85
+ If you identify a potential invariant violation:
86
+
87
+ 1. **Stop** - Do not proceed with the change
88
+ 2. **Flag** - Create an ALIGNMENT FLAG with:
89
+ - Which invariant is at risk
90
+ - Why the change might violate it
91
+ - Suggested reviewer
92
+ 3. **Wait** - Get human approval before proceeding
93
+
94
+ Example:
95
+ ```
96
+ **ALIGNMENT FLAG**: Client importing server module
97
+ - **Invariant at risk**: Client-server separation
98
+ - **The concern**: client.py imports from server/environment.py
99
+ - **Suggested reviewer**: @darktex
100
+ ```
.claude/docs/PATTERNS.md ADDED
@@ -0,0 +1,141 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Code Patterns & Conventions
2
+
3
+ This document describes the canonical patterns for OpenEnv code. Follow these patterns for consistency.
4
+
5
+ ## Environment Structure
6
+
7
+ Every environment follows this structure:
8
+ ```
9
+ my_env/
10
+ ├── __init__.py # Export Action, Observation, Client
11
+ ├── models.py # Action, Observation, State (Pydantic)
12
+ ├── client.py # EnvClient[ActT, ObsT, StateT] subclass
13
+ ├── openenv.yaml # Environment manifest
14
+ ├── pyproject.toml # Dependencies
15
+ └── server/
16
+ ├── my_environment.py # Environment[ActT, ObsT, StateT] subclass
17
+ ├── app.py # create_app() with HTTPEnvServer
18
+ ├── requirements.txt # Docker dependencies
19
+ └── Dockerfile
20
+ ```
21
+
22
+ Use `openenv init <name>` to scaffold this structure.
23
+
24
+ ## Type Safety Pattern
25
+
26
+ Always use generics for type safety across the wire:
27
+
28
+ ```python
29
+ # models.py
30
+ from pydantic import BaseModel
31
+
32
+ class MyAction(BaseModel):
33
+ command: str
34
+
35
+ class MyObservation(BaseModel):
36
+ result: str
37
+ reward: float
38
+ done: bool
39
+
40
+ class MyState(BaseModel):
41
+ episode_id: str
42
+ step_count: int
43
+ ```
44
+
45
+ ```python
46
+ # client.py
47
+ from openenv.core import EnvClient, StepResult
48
+
49
+ class MyEnv(EnvClient[MyAction, MyObservation, MyState]):
50
+ def _step_payload(self, action: MyAction) -> dict:
51
+ return action.model_dump()
52
+
53
+ def _parse_result(self, payload: dict) -> StepResult[MyObservation]:
54
+ obs = MyObservation(**payload["observation"])
55
+ return StepResult(observation=obs, reward=obs.reward, done=obs.done)
56
+
57
+ def _parse_state(self, payload: dict) -> MyState:
58
+ return MyState(**payload)
59
+ ```
60
+
61
+ ```python
62
+ # server/my_environment.py
63
+ from openenv.core.env_server import Environment
64
+
65
+ class MyEnvironment(Environment[MyAction, MyObservation, MyState]):
66
+ def reset(self, seed=None, episode_id=None) -> MyObservation:
67
+ ...
68
+
69
+ def step(self, action: MyAction) -> MyObservation:
70
+ ...
71
+
72
+ @property
73
+ def state(self) -> MyState:
74
+ ...
75
+ ```
76
+
77
+ ## Pydantic Models
78
+
79
+ - All wire types must be Pydantic models
80
+ - Use `Field()` for validation constraints
81
+ - Enable `arbitrary_types_allowed` for numpy/torch types
82
+
83
+ ```python
84
+ from pydantic import BaseModel, ConfigDict, Field
85
+ import numpy as np
86
+
87
+ class MyObservation(BaseModel):
88
+ # Pydantic v2 style (the nested `class Config` is deprecated)
89
+ model_config = ConfigDict(arbitrary_types_allowed=True)
90
+
91
+ grid: np.ndarray
92
+ score: float = Field(ge=0.0)
93
+ ```
94
+
95
+ ## Error Handling
96
+
97
+ - Return error info in observations, don't raise exceptions
98
+ - Use `done=True` with error observation for fatal errors
99
+ - Reserve exceptions for truly exceptional cases (server crashes)
100
+
101
+ ```python
102
+ def step(self, action: MyAction) -> MyObservation:
103
+ try:
104
+ result = self._execute(action)
105
+ return MyObservation(result=result, error=None, done=False)
106
+ except InvalidAction as e:
107
+ return MyObservation(result="", error=str(e), done=False)
108
+ except FatalError as e:
109
+ return MyObservation(result="", error=str(e), done=True)
110
+ ```
111
+
112
+ ## Reward Computation
113
+
114
+ Rewards are computed inside the environment, not externally:
115
+
116
+ ```python
117
+ def step(self, action: MyAction) -> MyObservation:
118
+ # Execute action
119
+ new_state = self._apply_action(action)
120
+
121
+ # Compute reward inside environment
122
+ reward = self._compute_reward(new_state)
123
+
124
+ return MyObservation(
125
+ state=new_state,
126
+ reward=reward,
127
+ done=self._is_terminal(new_state)
128
+ )
129
+ ```
130
+
131
+ ## FastAPI App Pattern
132
+
133
+ ```python
134
+ # server/app.py
135
+ from openenv.core.env_server import create_app
136
+ from .my_environment import MyEnvironment
137
+ from ..models import MyAction, MyObservation
138
+
139
+ env = MyEnvironment()
140
+ app = create_app(env, MyAction, MyObservation)
141
+ ```
.claude/docs/PRINCIPLES.md ADDED
@@ -0,0 +1,45 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # OpenEnv Design Principles
2
+
3
+ This document encodes the shared alignment between contributors on what OpenEnv optimizes for, what we trade off, and key decisions we've made.
4
+
5
+ ## Core Principles (from RFC 000)
6
+
7
+ 1. **Minimize lifecycle deltas**: Training → Evals → Production should use identical interfaces
8
+ 2. **Minimize human-agent divergence**: Tools that work for humans should work for agents
9
+ 3. **Be hands-on**: Provide ready-to-use implementations, not just specs
10
+ 4. **Design for LLMs**: Optimize for context efficiency, in-distribution behavior
11
+
12
+ ## What We Optimize For
13
+
14
+ - **Simple Gymnasium-style API** (`reset`, `step`, `state`) - familiar to RL practitioners
15
+ - **Container isolation** for reproducibility and security
16
+ - **Type safety** with generics and Pydantic across the wire
17
+ - **Production-readiness** from day one - training and production use same interfaces
18
+
19
+ ## What We Trade Off
20
+
21
+ - **Flexibility for simplicity**: One canonical way to build environments
22
+ - **Performance for isolation**: Docker overhead is acceptable for reproducibility
23
+ - **Cutting-edge for stability**: FastAPI over experimental frameworks
24
+
25
+ ## Key Decisions Made
26
+
27
+ These decisions are documented in RFCs and should not be changed without a new RFC:
28
+
29
+ | Decision | Rationale | RFC |
30
+ |----------|-----------|-----|
31
+ | **Rewards inside environment** | Domain knowledge encapsulated in env, not external | 002 |
32
+ | **Agents cannot reset** | Prevents learning that consequences are reversible | 001 |
33
+ | **MCP as universal standard** | All agent-environment tool interaction via MCP | 003 |
34
+ | **WebSocket for step loop** | Lower latency than HTTP per-step | 002 |
35
+ | **Two-interface model** | WebSocket for orchestration, MCP for agent tools | 001 |
36
+ | **One env = one trajectory** | Batching via environment stacking, not multiplexing | 004 |
37
+
38
+ **One env = one trajectory**: Environments do not support multiplexed trajectories. To generate batches, stack multiple environment instances. Helpers like `EnvPool` orchestrate batch collection across the stack. Multiplexing is left to future work.
39
+
40
+ ## When to Revisit These Principles
41
+
42
+ - If a principle blocks a valid use case, open an RFC discussion
43
+ - If production experience contradicts a trade-off, document and propose changes
44
+ - Pre-1.0: Breaking changes acceptable with documentation
45
+ - Post-1.0: Semantic versioning strictly enforced
.claude/docs/REPO_WALKTHROUGH.md ADDED
@@ -0,0 +1,248 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Repository Walkthrough
2
+
3
+ This document provides a navigational guide to the OpenEnv codebase.
4
+
5
+ ## Top-Level Structure
6
+
7
+ ```
8
+ OpenEnv/
9
+ ├── CLAUDE.md # Entry point for Claude Code - build commands, architecture overview
10
+ ├── README.md # Project overview and getting started
11
+ ├── pyproject.toml # Python package configuration (uv/pip)
12
+ ├── uv.lock # Locked dependencies
13
+
14
+ ├── src/ # Core library code (installed as `openenv`)
15
+ ├── envs/ # Example environments (not installed, used via PYTHONPATH)
16
+ ├── tests/ # Test suite
17
+ ├── examples/ # Usage examples and tutorials
18
+ ├── docs/ # Documentation (Sphinx)
19
+ ├── rfcs/ # Design documents and architectural decisions
20
+ ├── scripts/ # Utility scripts
21
+
22
+ ├── .claude/ # Claude Code configuration (skills, agents, docs)
23
+ ├── .github/ # GitHub Actions, PR templates, issue templates
24
+ └── .gitignore
25
+ ```
26
+
27
+ ## Source Code (`src/`)
28
+
29
+ ```
30
+ src/
31
+ ├── openenv/ # Main package
32
+ │ ├── __init__.py
33
+ │ │
34
+ │ ├── core/ # Core abstractions - the heart of OpenEnv
35
+ │ │ ├── env_client.py # EnvClient base class (WebSocket client)
36
+ │ │ ├── client_types.py # Client-side type definitions
37
+ │ │ ├── utils.py # Shared utilities
38
+ │ │ │
39
+ │ │ ├── env_server/ # Server-side components
40
+ │ │ │ ├── interfaces.py # Environment abstract base class
41
+ │ │ │ ├── http_server.py # HTTPEnvServer (FastAPI + WebSocket)
42
+ │ │ │ ├── types.py # Wire types (Action, Observation, State, WS messages)
43
+ │ │ │ ├── serialization.py # Pydantic serialization helpers
44
+ │ │ │ ├── base_transforms.py # Transform pipeline for rewards/observations
45
+ │ │ │ ├── web_interface.py # Web UI for debugging environments
46
+ │ │ │ ├── route_config.py # FastAPI route configuration
47
+ │ │ │ └── exceptions.py # Server-side exceptions
48
+ │ │ │
49
+ │ │ ├── containers/ # Container lifecycle management
50
+ │ │ │ ├── runtime/ # Provider implementations
51
+ │ │ │ │ ├── providers.py # ContainerProvider/RuntimeProvider ABCs + LocalDockerProvider
52
+ │ │ │ │ ├── daytona_provider.py # DaytonaProvider (Daytona cloud sandboxes)
53
+ │ │ │ │ └── uv_provider.py # UVProvider (for local dev)
54
+ │ │ │ └── images/ # Base Docker images
55
+ │ │ │ └── Dockerfile # openenv-base image
56
+ │ │ │
57
+ │ │ └── tools/ # Reusable tool implementations
58
+ │ │ ├── local_python_executor.py # Python code execution
59
+ │ │ └── git_server_client.py # Git operations
60
+ │ │
61
+ │ └── cli/ # Command-line interface
62
+ │ ├── __main__.py # Entry point (`python -m openenv.cli`)
63
+ │ ├── commands/ # CLI subcommands
64
+ │ │ ├── init.py # `openenv init` - scaffold new env
65
+ │ │ ├── serve.py # `openenv serve` - run server locally
66
+ │ │ ├── build.py # `openenv build` - build Docker image
67
+ │ │ ├── push.py # `openenv push` - deploy to HF Spaces
68
+ │ │ └── validate.py # `openenv validate` - check config
69
+ │ └── templates/ # Scaffolding templates
70
+ │ └── openenv_env/ # Template for `openenv init`
71
+
72
+ └── openenv_core/ # Legacy compatibility shim (imports from openenv.core)
73
+ ```
74
+
75
+ ## Environments (`envs/`)
76
+
77
+ Each environment follows a consistent structure:
78
+
79
+ ```
80
+ envs/
81
+ ├── echo_env/ # Minimal reference environment
82
+ │ ├── client.py # EnvClient subclass
83
+ │ ├── models.py # Action, Observation, State models
84
+ │ ├── openenv.yaml # Environment manifest
85
+ │ ├── pyproject.toml # Environment-specific dependencies
86
+ │ ├── README.md
87
+ │ └── server/
88
+ │ ├── app.py # FastAPI app setup
89
+ │ ├── echo_environment.py # Environment implementation
90
+ │ └── Dockerfile # Container definition
91
+
92
+ ├── coding_env/ # Python code execution environment
93
+ ├── chat_env/ # Conversational environment
94
+ ├── textarena_env/ # Text-based games (TextArena)
95
+ ├── browsergym_env/ # Browser automation (BrowserGym)
96
+ ├── openspiel_env/ # Game theory environments (OpenSpiel)
97
+ ├── atari_env/ # Atari games via Gymnasium
98
+ ├── finrl_env/ # Financial RL environment
99
+ ├── git_env/ # Git operations environment
100
+ ├── snake_env/ # Classic Snake game
101
+ ├── sumo_rl_env/ # Traffic simulation (SUMO)
102
+ ├── connect4_env/ # Connect Four game
103
+ ├── dipg_safety_env/ # Safety-focused environment
104
+ ├── reasoning_gym_env/ # Reasoning problems and puzzles
105
+ └── websearch_env/ # Web search environment
106
+ ```
107
+
108
+ ## Tests (`tests/`)
109
+
110
+ ```
111
+ tests/
112
+ ├── conftest.py # Pytest fixtures
113
+ ├── test_*.py # Core library tests
114
+
115
+ ├── envs/ # Per-environment integration tests
116
+ │ ├── test_echo_environment.py
117
+ │ ├── test_coding_environment.py
118
+ │ └── ...
119
+
120
+ ├── test_cli/ # CLI command tests
121
+ └── scripts/ # Test utility scripts
122
+ ```
123
+
124
+ ## RFCs (`rfcs/`)
125
+
126
+ Design documents that capture architectural decisions:
127
+
128
+ ```
129
+ rfcs/
130
+ ├── README.md # RFC process and template
131
+ ├── 000-project-phases.md # Project vision and phases
132
+ ├── 001-abstractions.md # Core abstractions (Environment, Client, two-interface model)
133
+ ├── 002-env-spec.md # Environment specification
134
+ └── 003-mcp-support.md # MCP integration design
135
+ ```
136
+
137
+ ## Claude Code Configuration (`.claude/`)
138
+
139
+ ```
140
+ .claude/
141
+ ├── docs/ # Alignment documents
142
+ │ ├── PRINCIPLES.md # Design principles and trade-offs
143
+ │ ├── INVARIANTS.md # System invariants (must never violate)
144
+ │ ├── PATTERNS.md # Code patterns and conventions
145
+ │ ├── CONTRIBUTING.md # Agentic contribution workflow
146
+ │ └── REPO_WALKTHROUGH.md # This file
147
+
148
+ ├── skills/ # Auto-discovered skills
149
+ │ ├── alignment-review/
150
+ │ │ └── SKILL.md # Two-tier code review
151
+ │ ├── implement/
152
+ │ │ └── SKILL.md # Make tests pass (Green phase)
153
+ │ ├── pre-submit-pr/
154
+ │ │ └── SKILL.md # PR readiness validation
155
+ │ ├── rfc-check/
156
+ │ │ └── SKILL.md # RFC requirement analysis
157
+ │ ├── simplify/
158
+ │ │ └── SKILL.md # Refactor after tests pass
159
+ │ ├── sprint/
160
+ │ │ └── SKILL.md # Parallel multi-issue batch (Agent Teams)
161
+ │ ├── update-docs/
162
+ │ │ └── SKILL.md # Fix stale docs after API changes
163
+ │ ├── watch-pr/
164
+ │ │ └── SKILL.md # Monitor CI + Greptile review after PR
165
+ │ ├── work-on-issue/
166
+ │ │ └── SKILL.md # Start TDD on a single issue
167
+ │ └── write-tests/
168
+ │ └── SKILL.md # Write failing tests (Red phase)
169
+
170
+ ├── agents/ # Specialized subagents
171
+ │ ├── alignment-reviewer.md # Review for bugs + alignment
172
+ │ ├── build-validator.md # Validate builds
173
+ │ ├── docs-updater.md # Fix stale docs after API changes
174
+ │ ├── env-validator.md # Validate environments e2e
175
+ │ ├── implementer.md # Make tests pass with minimal code
176
+ │ ├── issue-worker.md # Extract requirements from GitHub issues
177
+ │ ├── openenv-architect.md # Design new features
178
+ │ ├── pr-planner.md # Plan stacked PRs for complex features
179
+ │ └── tester.md # Write high-signal, failing tests
180
+
181
+ └── hooks/ # Automation scripts
182
+ ├── lint.sh # Run ruff format check
183
+ ├── test.sh # Run pytest
184
+ ├── check-debug.sh # Find debug code
185
+ ├── post-push-pr.sh # Validate PR after push (freshness, CI, conflicts)
186
+ ├── tdd-state.sh # Shared TDD state helpers (is_tdd_active, activate, deactivate)
187
+ ├── tdd-deactivate.sh # Standalone TDD deactivation script
188
+ ├── install.sh # Install git hooks (pre-commit, pre-push, etc.)
189
+ ├── session-start.sh # SessionStart banner (3-state: TDD/worktree/explore)
190
+ ├── no-direct-code.sh # PreToolUse: block direct edits when TDD active
191
+ ├── pre-commit-check.sh # PreToolUse: warn on git commit in TDD mode
192
+ ├── pre-pr-check.sh # PreToolUse: block gh pr create if branch stale
193
+ ├── delegate-todos.sh # PostToolUse: TDD workflow reminder on TodoWrite
194
+ ├── after-tester.sh # SubagentStop: next steps after tester
195
+ ├── after-implementer.sh # SubagentStop: next steps after implementer
196
+ ├── ci-wait.sh # CI polling: block until checks complete or timeout
197
+ └── after-docs-updater.sh # SubagentStop: next steps after docs-updater
198
+ ```
199
+
200
+ ## Documentation (`docs/`)
201
+
202
+ Sphinx-based documentation:
203
+
204
+ ```
205
+ docs/
206
+ ├── Makefile # Sphinx build targets (html, html-noplot, html-stable)
207
+ ├── README.md # Local build instructions
208
+
209
+ └── source/ # Sphinx source root
210
+ ├── conf.py # Sphinx configuration
211
+ ├── index.md # Home page
212
+ ├── core.md # Core API reference (autodoc)
213
+ ├── cli.md # CLI reference (autodoc)
214
+ ├── auto_discovery.md # Auto-discovery API docs
215
+ ├── customizing-web-ui.md # Web UI customization guide
216
+ ├── environments.md # Environments catalog page
217
+
218
+ ├── environments/ # Per-environment documentation
219
+ │ ├── echo.md
220
+ │ ├── coding.md
221
+ │ └── ...
222
+
223
+ ├── getting_started/ # Sphinx Gallery executable tutorials
224
+ │ ├── plot_01_introduction_quickstart.py
225
+ │ ├── plot_02_using_environments.py
226
+ │ ├── plot_03_building_environments.py
227
+ │ ├── contributing-envs.md
228
+ │ └── environment-builder.md
229
+
230
+ ├── tutorials/ # Additional tutorials
231
+ │ ├── openenv-tutorial.md
232
+ │ ├── wordle-grpo.md
233
+ │ └── rl-training-2048.md
234
+
235
+ └── _static/ # Static assets (versions.json, etc.)
236
+ ```
237
+
238
+ ## Key Files to Know
239
+
240
+ | File | Purpose |
241
+ |------|---------|
242
+ | `src/openenv/core/env_server/interfaces.py` | `Environment` abstract base class |
243
+ | `src/openenv/core/env_client.py` | `EnvClient` WebSocket client |
244
+ | `src/openenv/core/env_server/http_server.py` | `HTTPEnvServer` FastAPI wrapper |
245
+ | `src/openenv/core/env_server/types.py` | All wire types and WebSocket messages |
246
+ | `envs/echo_env/` | Reference implementation - start here |
247
+ | `rfcs/001-abstractions.md` | Core architectural decisions |
248
+ | `.claude/docs/INVARIANTS.md` | Rules that must never be broken |
.claude/docs/TESTING_STRATEGY.md ADDED
@@ -0,0 +1,221 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # OpenEnv Testing Strategy
2
+
3
+ This document outlines OpenEnv's testing philosophy, hierarchy, and conventions.
4
+
5
+ ## Testing Hierarchy
6
+
7
+ Tests are organized by scope and signal:
8
+
9
+ ### 1. Unit Tests (Fastest, Most Isolated)
10
+
11
+ Test individual functions and classes in isolation.
12
+
13
+ **Good candidates:**
14
+ - Pure functions (e.g., reward calculations)
15
+ - Pydantic model validation
16
+ - State mutations
17
+ - Utility functions
18
+
19
+ **Location:** `tests/` mirroring `src/` structure
20
+
21
+ **Example:**
22
+ ```python
23
+ def test_action_model_validates_required_fields():
24
+ with pytest.raises(ValidationError):
25
+ Action() # Missing required fields
26
+ ```
27
+
28
+ ### 2. Integration Tests (Medium Scope)
29
+
30
+ Test component interactions, especially client-server communication.
31
+
32
+ **Good candidates:**
33
+ - Client-server WebSocket protocol
34
+ - Environment lifecycle (reset → step → step → ...)
35
+ - Type serialization across wire boundary
36
+
37
+ **Location:** `tests/` with `_integration` suffix or in dedicated directories
38
+
39
+ **Example:**
40
+ ```python
41
+ async def test_client_connects_and_resets():
42
+ async with start_server() as server:
43
+ client = EchoEnvClient(server.url)
44
+ obs = await client.reset()
45
+ assert isinstance(obs, EchoObservation)
46
+ ```
47
+
48
+ ### 3. Environment Validation Tests
49
+
50
+ Test that environments follow OpenEnv conventions and invariants.
51
+
52
+ **Good candidates:**
53
+ - File structure validation
54
+ - Type consistency (generics match)
55
+ - Invariant checking (no client→server imports)
56
+
57
+ **Location:** `tests/envs/`
58
+
59
+ **Uses:** `env-validator` agent patterns
60
+
61
+ ### 4. E2E Tests (Slowest, Highest Signal)
62
+
63
+ Test complete workflows from user perspective.
64
+
65
+ **Good candidates:**
66
+ - Full training loop simulation
67
+ - Container lifecycle
68
+ - MCP tool interactions
69
+
70
+ **Location:** `tests/e2e/` (if needed)
71
+
72
+ ## Test Location Conventions
73
+
74
+ ```
75
+ tests/
76
+ ├── conftest.py # Shared fixtures
77
+ ├── core/ # Core library tests
78
+ │ ├── test_environment.py
79
+ │ ├── test_client.py
80
+ │ └── test_server.py
81
+ ├── envs/ # Environment-specific tests
82
+ │ ├── test_echo_environment.py
83
+ │ └── test_<env>_environment.py
84
+ └── e2e/ # End-to-end tests (optional)
85
+ ```
86
+
87
+ ## Running Tests
88
+
89
+ ### Full Suite
90
+ ```bash
91
+ PYTHONPATH=src:envs uv run pytest tests/ -v --tb=short
92
+ ```
93
+
94
+ ### Single File
95
+ ```bash
96
+ PYTHONPATH=src:envs uv run pytest tests/path/test_file.py -v
97
+ ```
98
+
99
+ ### Single Test
100
+ ```bash
101
+ PYTHONPATH=src:envs uv run pytest tests/path/test_file.py::test_name -v
102
+ ```
103
+
104
+ ### Exclude Special Environments
105
+ Some environments require special setup (browser, websearch). The hook script excludes these:
106
+ ```bash
107
+ bash .claude/hooks/test.sh
108
+ ```
109
+
110
+ ## Edge Cases to Consider
111
+
112
+ ### Python-Specific
113
+ - `None` where not expected
114
+ - Type mismatches at runtime (despite type hints)
115
+ - Pydantic `ValidationError` on invalid data
116
+ - Async/await edge cases (timeouts, cancellation)
117
+
118
+ ### State Management
119
+ - Empty state / default values
120
+ - Maximum capacity / overflow
121
+ - State after error recovery
122
+ - Concurrent access patterns
123
+
124
+ ### Protocol / WebSocket
125
+ - Connection drops mid-step
126
+ - Out-of-order messages
127
+ - Malformed JSON payloads
128
+ - Timeout handling
129
+
130
+ ### Pydantic Models
131
+ - Extra fields in input (strict mode)
132
+ - Missing required fields
133
+ - Type coercion behavior
134
+ - Nested model validation
135
+
136
+ ## Test Patterns
137
+
138
+ ### Fixtures for Common Setup
139
+
140
+ ```python
141
+ @pytest.fixture
142
+ def echo_env():
143
+ """Create a fresh EchoEnvironment for each test."""
144
+ return EchoEnvironment()
145
+
146
+ def test_reset_returns_observation(echo_env):
147
+ obs = echo_env.reset()
148
+ assert isinstance(obs, EchoObservation)
149
+ ```
150
+
151
+ ### Async Tests
152
+
153
+ ```python
154
+ import pytest
155
+
156
+ @pytest.mark.asyncio
157
+ async def test_async_client():
158
+ async with create_client() as client:
159
+ result = await client.step(action)
160
+ assert result.done is False
161
+ ```
162
+
163
+ ### Parametrized Tests
164
+
165
+ ```python
166
+ @pytest.mark.parametrize("text,expected", [
167
+ ("hello", "HELLO"),
168
+ ("", ""),
169
+ ("123", "123"),
170
+ ])
171
+ def test_transform(text, expected):
172
+ assert transform(text) == expected
173
+ ```
174
+
175
+ ## What Makes a Good Test
176
+
177
+ ### High-Signal (Write These)
178
+
179
+ - Catches bugs that could happen in production
180
+ - Tests behavior from user perspective
181
+ - Covers non-obvious edge cases
182
+ - Validates complex state machines
183
+
184
+ ### Low-Signal (Avoid These)
185
+
186
+ - Tests that verify Python built-ins work
187
+ - Duplicates of existing tests with trivial variation
188
+ - Tests that mock so much they don't test real behavior
189
+ - Tests for code paths already covered by integration tests
190
+
191
+ ## TDD Workflow
192
+
193
+ The testing strategy integrates with the TDD workflow:
194
+
195
+ 1. **Red**: `/write-tests` creates failing tests
196
+ 2. **Green**: `/implement` makes tests pass
197
+ 3. **Refactor**: `/simplify` cleans up code
198
+ 4. **Validate**: `/pre-submit-pr` runs full suite
199
+
200
+ ## Coverage Gaps (Known)
201
+
202
+ Document known gaps here as they're identified:
203
+
204
+ - [ ] WebSocket reconnection handling
205
+ - [ ] Container lifecycle edge cases
206
+ - [ ] MCP tool error responses (when MCP is added)
207
+
208
+ ## Verification
209
+
210
+ After writing tests, verify with:
211
+
212
+ ```bash
213
+ # Run specific tests
214
+ PYTHONPATH=src:envs uv run pytest tests/path/test_file.py -v
215
+
216
+ # Check coverage (if coverage is set up)
217
+ PYTHONPATH=src:envs uv run pytest tests/ --cov=src/openenv
218
+
219
+ # Run lint to ensure test code is clean
220
+ uv run ruff check tests/
221
+ ```
.claude/hooks/after-docs-updater.sh ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ # SubagentStop hook for docs-updater: Suggest next steps
3
+
4
+ echo ""
5
+ echo "Documentation update complete."
6
+ echo ""
7
+ echo "Next steps:"
8
+ echo " - /simplify -> refactor if needed (optional)"
9
+ echo " - /pre-submit-pr -> validate before creating PR"
10
+ echo " - /watch-pr -> monitor CI + review after PR (after pre-submit)"
11
+ echo ""
.claude/hooks/after-implementer.sh ADDED
@@ -0,0 +1,12 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ # SubagentStop hook for implementer: Suggest next steps
3
+
4
+ echo ""
5
+ echo "Implementation complete."
6
+ echo ""
7
+ echo "Next steps:"
8
+ echo " - /update-docs -> fix stale docs if APIs changed"
9
+ echo " - /simplify -> refactor if needed (optional)"
10
+ echo " - Mark todo complete and move to next pending todo"
11
+ echo " - /pre-submit-pr -> validate before creating PR"
12
+ echo ""
.claude/hooks/after-tester.sh ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ # SubagentStop hook for tester: Chain to /implement
3
+
4
+ echo ""
5
+ echo "Tests written by tester agent."
6
+ echo ""
7
+ echo "Next step: Run /implement to make the tests pass."
8
+ echo ""
.claude/hooks/check-debug.sh ADDED
@@ -0,0 +1,71 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ # Check for debug code that shouldn't be committed
3
+ # Exit code 0 always (informational), but outputs findings
4
+
5
+ # Check for required tools
6
+ if ! command -v rg &> /dev/null; then
7
+ echo "Warning: 'rg' (ripgrep) is not installed, falling back to grep"
8
+ USE_GREP=1
9
+ fi
10
+
11
+ echo "=== Checking for debug code ==="
12
+
13
+ found_issues=0
14
+
15
+ # Check for print statements (allow if marked with # ok-to-print)
16
+ echo ""
17
+ echo "--- Print statements in src/ ---"
18
+ if [ "$USE_GREP" = "1" ]; then
19
+ prints=$(grep -rn "print(" src/ --include="*.py" 2>/dev/null | grep -v "# ok-to-print" || true)
20
+ else
21
+ prints=$(rg -n "print\(" src/ --glob "*.py" 2>/dev/null | grep -v "# ok-to-print" || true)
22
+ fi
23
+
24
+ if [ -n "$prints" ]; then
25
+ echo "$prints"
26
+ found_issues=1
27
+ else
28
+ echo "None found"
29
+ fi
30
+
31
+ # Check for TODO/FIXME/XXX/HACK comments
32
+ echo ""
33
+ echo "--- TODO/FIXME comments in src/ ---"
34
+ if [ "$USE_GREP" = "1" ]; then
35
+ todos=$(grep -rn -E "TODO|FIXME|XXX|HACK" src/ --include="*.py" 2>/dev/null || true)
36
+ else
37
+ todos=$(rg -n "TODO|FIXME|XXX|HACK" src/ --glob "*.py" 2>/dev/null || true)
38
+ fi
39
+
40
+ if [ -n "$todos" ]; then
41
+ echo "$todos"
42
+ found_issues=1
43
+ else
44
+ echo "None found"
45
+ fi
46
+
47
+ # Check for debugger statements
48
+ echo ""
49
+ echo "--- Debugger statements in src/ ---"
50
+ if [ "$USE_GREP" = "1" ]; then
51
+ debuggers=$(grep -rn -E "breakpoint\(\)|pdb\.|ipdb\." src/ --include="*.py" 2>/dev/null || true)
52
+ else
53
+ debuggers=$(rg -n "breakpoint\(\)|pdb\.|ipdb\." src/ --glob "*.py" 2>/dev/null || true)
54
+ fi
55
+
56
+ if [ -n "$debuggers" ]; then
57
+ echo "$debuggers"
58
+ found_issues=1
59
+ else
60
+ echo "None found"
61
+ fi
62
+
63
+ echo ""
64
+ if [ $found_issues -eq 1 ]; then
65
+ echo "=== Debug code found (review before committing) ==="
66
+ else
67
+ echo "=== No debug code found ==="
68
+ fi
69
+
70
+ # Always exit 0 - this is informational
71
+ exit 0
.claude/hooks/check-line-endings.sh ADDED
@@ -0,0 +1,76 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/bin/bash
# Check for CRLF line endings in text files
# Uses portable constructs that work in sandboxed environments
#
# Usage: check-line-endings.sh [DIR]
# Exit 0 when no CRLF files are found; exit 1 (with a file list on stderr)
# otherwise. Inside a git repo only tracked files are scanned; outside a
# repo every non-binary file under DIR is scanned.

set -e

# Get the directory to check (default to current directory)
CHECK_DIR="${1:-.}"

# Find all tracked text files with CRLF line endings
CRLF_FILES=()

# Check if we're in a git repository
if git -C "$CHECK_DIR" rev-parse --git-dir > /dev/null 2>&1; then
    # In a git repo - check only tracked files
    # Use a temp file for portability (avoids process substitution issues in sandboxes)
    TEMP_FILE=$(mktemp)
    trap "rm -f '$TEMP_FILE'" EXIT

    (cd "$CHECK_DIR" && git ls-files) > "$TEMP_FILE"

    while IFS= read -r file; do
        # Skip if file doesn't exist
        if [[ ! -f "$file" ]]; then
            continue
        fi

        # Check if file is binary using git
        # (numstat prints "-" counts for binary blobs)
        if git diff --no-index --numstat /dev/null "$file" 2>/dev/null | grep -q "^-"; then
            continue
        fi

        # Check for CRLF line endings
        if grep -qU $'\r' "$file" 2>/dev/null; then
            CRLF_FILES+=("$file")
        fi
    done < "$TEMP_FILE"
else
    # Not a git repo - check all text files
    # Use a temp file for portability
    TEMP_FILE=$(mktemp)
    trap "rm -f '$TEMP_FILE'" EXIT

    find "$CHECK_DIR" -type f -print > "$TEMP_FILE" 2>/dev/null || true

    while IFS= read -r file; do
        # Skip if file doesn't exist or is a directory
        if [[ ! -f "$file" ]]; then
            continue
        fi

        # Simple binary file check - skip binary (and empty) files.
        # grep -I treats binary files as containing no matches, so "." finds
        # nothing in them; -I is supported by both GNU and BSD/macOS grep.
        # (Previously this used GNU-only `grep -qP '\x00'`, which errors out
        # on BSD/macOS grep and silently skipped nothing there.)
        if ! grep -Iq . "$file" 2>/dev/null; then
            continue
        fi

        # Check for CRLF line endings
        if grep -qU $'\r' "$file" 2>/dev/null; then
            CRLF_FILES+=("$file")
        fi
    done < "$TEMP_FILE"
fi

# Report results
if [[ ${#CRLF_FILES[@]} -gt 0 ]]; then
    echo "ERROR: Found ${#CRLF_FILES[@]} file(s) with CRLF line endings:" >&2
    for file in "${CRLF_FILES[@]}"; do
        echo " - $file" >&2
    done
    echo "" >&2
    echo "To fix, convert these files to LF line endings:" >&2
    echo " dos2unix <file> # or use your editor's line ending conversion" >&2
    exit 1
fi

exit 0
.claude/hooks/ci-wait.sh ADDED
@@ -0,0 +1,96 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ # CI polling script. Blocks until all CI checks complete or timeout.
3
+ #
4
+ # Usage: bash .claude/hooks/ci-wait.sh <PR_NUMBER> [TIMEOUT_SECONDS]
5
+ #
6
+ # Exit codes:
7
+ # 0 - All checks passed
8
+ # 1 - One or more checks failed
9
+ # 2 - Timeout exceeded
10
+ # 3 - Error (could not fetch PR)
11
+ #
12
+ # Polls every 120 seconds. Prints status updates to stdout.
13
+
14
+ set -e
15
+
16
+ PR_NUMBER="${1:?Usage: ci-wait.sh <PR_NUMBER> [TIMEOUT_SECONDS]}"
17
+ TIMEOUT="${2:-1800}"
18
+ POLL_INTERVAL=120
19
+ ELAPSED=0
20
+
21
+ echo ""
22
+ echo "==================================================================="
23
+ echo " CI Wait: Monitoring PR #$PR_NUMBER"
24
+ echo "==================================================================="
25
+ echo " Timeout: ${TIMEOUT}s | Poll interval: ${POLL_INTERVAL}s"
26
+ echo ""
27
+
28
+ while true; do
29
+ # Fetch current check status
30
+ PR_JSON=$(gh pr view "$PR_NUMBER" --json statusCheckRollup 2>/dev/null || true)
31
+ if [[ -z "$PR_JSON" ]]; then
32
+ echo "ERROR: Could not fetch PR #$PR_NUMBER"
33
+ exit 3
34
+ fi
35
+
36
+ CHECK_COUNT=$(echo "$PR_JSON" | jq '.statusCheckRollup | length' 2>/dev/null || echo "0")
37
+
38
+ if [[ "$CHECK_COUNT" -eq 0 ]]; then
39
+ echo "[$(date +%H:%M:%S)] No CI checks found yet. Waiting..."
40
+ else
41
+ PENDING=$(echo "$PR_JSON" | jq '[.statusCheckRollup[] | select(.status != "COMPLETED")] | length' 2>/dev/null || echo "0")
42
+ FAILED_CHECKS=$(echo "$PR_JSON" | jq '[.statusCheckRollup[] | select(.conclusion == "FAILURE")] | length' 2>/dev/null || echo "0")
43
+ PASSED_CHECKS=$(echo "$PR_JSON" | jq '[.statusCheckRollup[] | select(.conclusion == "SUCCESS")] | length' 2>/dev/null || echo "0")
44
+
45
+ echo "[$(date +%H:%M:%S)] Checks: $PASSED_CHECKS passed, $FAILED_CHECKS failed, $PENDING pending (of $CHECK_COUNT)"
46
+
47
+ # If no checks are pending, we have a final result
48
+ if [[ "$PENDING" -eq 0 ]]; then
49
+ echo ""
50
+ if [[ "$FAILED_CHECKS" -gt 0 ]]; then
51
+ echo "==================================================================="
52
+ echo " CI FAILED: $FAILED_CHECKS check(s) failed"
53
+ echo "==================================================================="
54
+ echo ""
55
+ echo "Failed checks:"
56
+ echo "$PR_JSON" | jq -r '.statusCheckRollup[] | select(.conclusion == "FAILURE") | " - \(.name)"'
57
+ echo ""
58
+ exit 1
59
+ elif [[ "$PASSED_CHECKS" -ne "$CHECK_COUNT" ]]; then
60
+ echo "==================================================================="
61
+ echo " CI INCOMPLETE: $((CHECK_COUNT - PASSED_CHECKS - FAILED_CHECKS)) check(s) cancelled/skipped"
62
+ echo "==================================================================="
63
+ echo ""
64
+ echo "Non-success checks:"
65
+ echo "$PR_JSON" | jq -r '.statusCheckRollup[] | select(.conclusion != "SUCCESS" and .conclusion != null) | " - \(.name): \(.conclusion)"'
66
+ echo ""
67
+ exit 1
68
+ else
69
+ echo "==================================================================="
70
+ echo " CI PASSED: All $PASSED_CHECKS check(s) passed"
71
+ echo "==================================================================="
72
+ echo ""
73
+ exit 0
74
+ fi
75
+ fi
76
+ fi
77
+
78
+ # Check timeout
79
+ if [[ "$ELAPSED" -ge "$TIMEOUT" ]]; then
80
+ echo ""
81
+ echo "==================================================================="
82
+ echo " CI TIMEOUT: Exceeded ${TIMEOUT}s waiting for checks"
83
+ echo "==================================================================="
84
+ echo ""
85
+ if [[ "$CHECK_COUNT" -gt 0 ]]; then
86
+ echo "Pending checks:"
87
+ echo "$PR_JSON" | jq -r '.statusCheckRollup[] | select(.status != "COMPLETED") | " - \(.name): \(.status)"'
88
+ echo ""
89
+ fi
90
+ exit 2
91
+ fi
92
+
93
+ # Sleep and increment
94
+ sleep "$POLL_INTERVAL"
95
+ ELAPSED=$((ELAPSED + POLL_INTERVAL))
96
+ done
.claude/hooks/delegate-todos.sh ADDED
@@ -0,0 +1,21 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ # PostToolUse hook for TodoWrite: Remind about TDD workflow when TDD is active
3
+
4
+ # Check if TDD is active
5
+ source "$(dirname "$0")/tdd-state.sh"
6
+ if ! is_tdd_active; then
7
+ exit 0 # TDD not active, no reminder needed
8
+ fi
9
+
10
+ # Soft reminder about the workflow
11
+ cat << 'EOF'
12
+
13
+ TDD Workflow Reminder:
14
+ For each todo that requires implementation:
15
+ 1. /write-tests -> create failing tests first
16
+ 2. /implement -> make tests pass
17
+ 3. Mark todo complete
18
+
19
+ EOF
20
+
21
+ exit 0
.claude/hooks/install.sh ADDED
@@ -0,0 +1,292 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ # Install git hooks for OpenEnv
3
+ #
4
+ # Usage: .claude/hooks/install.sh
5
+ #
6
+ # This installs pre-commit, pre-push, commit-msg, and post-merge hooks.
7
+
8
+ set -e
9
+
10
+ REPO_ROOT="$(git rev-parse --show-toplevel)"
11
+ # Use --git-common-dir to get the shared hooks directory (works in worktrees too)
12
+ GIT_COMMON_DIR="$(git rev-parse --git-common-dir)"
13
+ HOOKS_DIR="$GIT_COMMON_DIR/hooks"
14
+
15
+ # Create hooks directory if it doesn't exist
16
+ mkdir -p "$HOOKS_DIR"
17
+
18
+ echo "Installing git hooks..."
19
+
20
+ # Pre-commit hook: format, lint, branch check
21
+ cat > "$HOOKS_DIR/pre-commit" << 'EOF'
22
+ #!/bin/bash
23
+ # Installed by .claude/hooks/install.sh
24
+
25
+ echo "Running pre-commit checks..."
26
+
27
+ REPO_ROOT="$(git rev-parse --show-toplevel)"
28
+
29
+ # === Branch Check (BLOCKING) ===
30
+ echo ""
31
+ echo "=== Branch Check ==="
32
+ BRANCH=$(git rev-parse --abbrev-ref HEAD)
33
+ if [ "$BRANCH" = "main" ] || [ "$BRANCH" = "master" ]; then
34
+ echo "ERROR: Cannot commit directly to $BRANCH"
35
+ echo ""
36
+ echo "Create a worktree first:"
37
+ echo " $REPO_ROOT/.claude/scripts/worktree-create.sh <name>"
38
+ exit 1
39
+ fi
40
+ echo "On branch: $BRANCH"
41
+
42
+ # === Import Sort + Format Check ===
43
+ echo ""
44
+ echo "=== Import Sort + Format Check ==="
45
+ # Run the arc f pipeline: usort then ruff format
46
+ uv run usort format src/ tests/ >/dev/null 2>&1
47
+ uv run ruff format src/ tests/ >/dev/null 2>&1
48
+ CHANGED=$(git diff --name-only -- '*.py' 2>/dev/null || true)
49
+ if [ -n "$CHANGED" ]; then
50
+ echo "Files need formatting (usort + ruff format):"
51
+ echo "$CHANGED"
52
+ echo ""
53
+ echo "Auto-formatting and staging changes..."
54
+ git add $CHANGED
55
+ echo "Fixed! Changes staged."
56
+ else
57
+ echo "Import sort + format check passed!"
58
+ fi
59
+
60
+ # === Lint Check ===
61
+ echo ""
62
+ echo "=== Lint Check ==="
63
+ "$REPO_ROOT/.claude/hooks/lint.sh" || {
64
+ echo "Lint failed. Fix issues before committing."
65
+ exit 1
66
+ }
67
+
68
+ # === Debug Artifacts (non-blocking) ===
69
+ echo ""
70
+ echo "=== Debug Artifacts ==="
71
+ "$REPO_ROOT/.claude/hooks/check-debug.sh"
72
+
73
+ echo ""
74
+ echo "Pre-commit checks passed"
75
+ EOF
76
+ chmod +x "$HOOKS_DIR/pre-commit"
77
+ echo " Installed pre-commit hook"
78
+
79
+ # Commit-msg hook: require issue reference
80
+ cat > "$HOOKS_DIR/commit-msg" << 'EOF'
81
+ #!/bin/bash
82
+ # Installed by .claude/hooks/install.sh
83
+ # Require issue reference in commit message
84
+
85
+ COMMIT_MSG_FILE="$1"
86
+ COMMIT_MSG=$(cat "$COMMIT_MSG_FILE")
87
+
88
+ # Check for issue reference (#123, Fixes #123, Part of #123, etc.)
89
+ if echo "$COMMIT_MSG" | grep -qE '#[0-9]+'; then
90
+ exit 0
91
+ fi
92
+
93
+ # Allow WIP commits without issue reference
94
+ if echo "$COMMIT_MSG" | grep -qiE '^WIP'; then
95
+ exit 0
96
+ fi
97
+
98
+ echo ""
99
+ echo "WARNING: Commit message should reference an issue (#123)"
100
+ echo " Examples: 'Fix bug in parser #45'"
101
+ echo " 'Fixes #123'"
102
+ echo " 'Part of #99'"
103
+ echo ""
104
+ echo "Proceeding anyway (this is a soft warning)..."
105
+ exit 0
106
+ EOF
107
+ chmod +x "$HOOKS_DIR/commit-msg"
108
+ echo " Installed commit-msg hook"
109
+
110
+ # Pre-push hook: comprehensive validation
111
+ cat > "$HOOKS_DIR/pre-push" << 'EOF'
112
+ #!/bin/bash
113
+ # Installed by .claude/hooks/install.sh
114
+ # Comprehensive pre-push validation
115
+
116
+ echo "Running pre-push checks..."
117
+
118
+ REPO_ROOT="$(git rev-parse --show-toplevel)"
119
+ FAILED=0
120
+
121
+ # 0. BLOCK PUSHES TO MAIN/MASTER (most critical check)
122
+ echo ""
123
+ echo "=== Protected Branch Check ==="
124
+ # Read the remote and refs being pushed from stdin
125
+ while read local_ref local_sha remote_ref remote_sha; do
126
+ # Extract branch name from remote ref (refs/heads/main -> main)
127
+ remote_branch="${remote_ref#refs/heads/}"
128
+
129
+ if [ "$remote_branch" = "main" ] || [ "$remote_branch" = "master" ]; then
130
+ echo "ERROR: Direct push to '$remote_branch' is blocked!"
131
+ echo ""
132
+ echo " You are trying to push to a protected branch."
133
+ echo " Create a PR instead:"
134
+ echo ""
135
+ echo " # Push to a feature branch"
136
+ echo " git push -u origin HEAD:feature/your-branch-name"
137
+ echo ""
138
+ echo " # Then create a PR"
139
+ echo " gh pr create"
140
+ echo ""
141
+ echo " To bypass (not recommended): git push --no-verify"
142
+ exit 1
143
+ fi
144
+ done
145
+ echo "Not pushing to protected branch - OK"
146
+
147
+ # 1. Import sort + format check
148
+ echo ""
149
+ echo "=== Import Sort + Format Check ==="
150
+ uv run usort format src/ tests/ >/dev/null 2>&1
151
+ uv run ruff format src/ tests/ >/dev/null 2>&1
152
+ CHANGED_FMT=$(git diff --name-only -- '*.py' 2>/dev/null || true)
153
+ if [ -n "$CHANGED_FMT" ]; then
154
+ echo "Files not properly formatted:"
155
+ echo "$CHANGED_FMT"
156
+ echo ""
157
+ echo "Run: uv run usort format src/ tests/ && uv run ruff format src/ tests/"
158
+ git checkout -- $CHANGED_FMT 2>/dev/null || true
159
+ FAILED=1
160
+ fi
161
+
162
+ # 2. Lint check
163
+ echo ""
164
+ echo "=== Lint Check ==="
165
+ "$REPO_ROOT/.claude/hooks/lint.sh" || {
166
+ echo "Lint failed"
167
+ FAILED=1
168
+ }
169
+
170
+ # 3. Test check
171
+ echo ""
172
+ echo "=== Test Check ==="
173
+ "$REPO_ROOT/.claude/hooks/test.sh" || {
174
+ echo "Tests failed"
175
+ FAILED=1
176
+ }
177
+
178
+ # 4. Debug artifacts
179
+ echo ""
180
+ echo "=== Debug Artifacts ==="
181
+ "$REPO_ROOT/.claude/hooks/check-debug.sh"
182
+
183
+ # 5. Invariant: Client should not import from server
184
+ echo ""
185
+ echo "=== Invariant Checks ==="
186
+ # Check if any client file imports from server directory
187
+ # Pattern matches actual imports: "from .server", "from ..server", "import server"
188
+ # Excludes comments and string literals mentioning "server"
189
+ VIOLATIONS=$(grep -rE "^[[:space:]]*(from [.]+server|import server)" --include="*.py" envs/*/client.py envs/*/__init__.py 2>/dev/null | grep -v "# noqa" || true)
190
+ if [ -n "$VIOLATIONS" ]; then
191
+ echo "INVARIANT VIOLATION: Client imports from server"
192
+ echo "$VIOLATIONS"
193
+ echo ""
194
+ echo " Client code must not import server code. Check INVARIANTS.md."
195
+ echo " Add '# noqa' comment to suppress if this is intentional (e.g., for local testing)."
196
+ # Note: This is a warning for now due to pre-existing violations
197
+ # TODO: Make this blocking once all violations are fixed (issue #XXX)
198
+ echo " (Currently warning-only - see pre-existing violations)"
199
+ else
200
+ echo "Client-server separation maintained"
201
+ fi
202
+
203
+ # 6. Check branch freshness with main (warning only, non-blocking)
204
+ echo ""
205
+ echo "=== Branch Freshness Check ==="
206
+ # Fetch latest main silently
207
+ git fetch origin main --quiet 2>/dev/null || true
208
+
209
+ # Check how many commits behind main we are
210
+ BEHIND_COUNT=$(git rev-list --count HEAD..origin/main 2>/dev/null || echo "0")
211
+ if [ "$BEHIND_COUNT" -gt 0 ]; then
212
+ echo "WARNING: Your branch is $BEHIND_COUNT commit(s) behind main!"
213
+ echo ""
214
+ echo " GitHub will show 'This branch is out-of-date with the base branch'"
215
+ echo ""
216
+ echo " To update before pushing:"
217
+ echo " git fetch origin main"
218
+ echo " git merge origin/main"
219
+ echo " git push"
220
+ echo ""
221
+ echo " Pushing anyway (update before merging PR)"
222
+ else
223
+ echo "Branch is up to date with main"
224
+ fi
225
+
226
+ # 7. Check for conflicts with main (warning only, non-blocking)
227
+ echo ""
228
+ echo "=== Conflict Check with main ==="
229
+ # Try a test merge to detect conflicts (then abort)
230
+ MERGE_OUTPUT=$(git merge --no-commit --no-ff origin/main 2>&1) || true
231
+ MERGE_EXIT=$?
232
+ git merge --abort 2>/dev/null || true
233
+
234
+ if echo "$MERGE_OUTPUT" | grep -q "CONFLICT"; then
235
+ echo "WARNING: Your branch has conflicts with main!"
236
+ echo ""
237
+ echo "$MERGE_OUTPUT" | grep "CONFLICT" | head -5
238
+ echo ""
239
+ echo " To resolve before PR review:"
240
+ echo " git fetch origin main"
241
+ echo " git merge origin/main"
242
+ echo " # resolve conflicts"
243
+ echo " git push"
244
+ echo ""
245
+ echo " Pushing anyway (fix conflicts before merging PR)"
246
+ else
247
+ echo "No conflicts with main detected"
248
+ fi
249
+
250
+ # Summary
251
+ echo ""
252
+ if [ $FAILED -eq 1 ]; then
253
+ echo "Pre-push checks FAILED. Fix issues before pushing."
254
+ exit 1
255
+ else
256
+ echo "Pre-push checks passed"
257
+ fi
258
+ EOF
259
+ chmod +x "$HOOKS_DIR/pre-push"
260
+ echo " Installed pre-push hook"
261
+
262
+ # Post-merge hook: remind about worktree cleanup
263
+ cat > "$HOOKS_DIR/post-merge" << 'EOF'
264
+ #!/bin/bash
265
+ # Installed by .claude/hooks/install.sh
266
+ # Remind about worktree cleanup after merge
267
+
268
+ echo ""
269
+ echo "=== Post-Merge Reminder ==="
270
+
271
+ # Check if we're in a worktree
272
+ TOPLEVEL=$(git rev-parse --show-toplevel 2>/dev/null)
273
+ if [ -f "$TOPLEVEL/.git" ]; then
274
+ echo "You're in a worktree: $TOPLEVEL"
275
+ echo ""
276
+ echo "If this PR is complete, clean up with:"
277
+ echo " .claude/scripts/worktree-cleanup.sh $TOPLEVEL"
278
+ fi
279
+ EOF
280
+ chmod +x "$HOOKS_DIR/post-merge"
281
+ echo " Installed post-merge hook"
282
+
283
+ echo ""
284
+ echo "Git hooks installed successfully!"
285
+ echo ""
286
+ echo "Hooks installed:"
287
+ echo " - pre-commit: branch check, usort+format, lint, check-debug"
288
+ echo " - commit-msg: issue reference reminder (soft warning)"
289
+ echo " - pre-push: usort+format, lint, tests, check-debug, invariant checks, conflict detection"
290
+ echo " - post-merge: worktree cleanup reminder"
291
+ echo ""
292
+ echo "To skip hooks temporarily: git commit/push --no-verify"
.claude/hooks/lint.sh ADDED
@@ -0,0 +1,43 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ # Lint check for OpenEnv
3
+ # Replicates the exact arc f pipeline from fbsource:
4
+ # 1. usort format — sort imports (matches arc f's usort pass)
5
+ # 2. ruff format — code formatting, line-length 88 (matches arc f's ruff-api pass)
6
+ # 3. ruff check — lint rules (E, F, W)
7
+ #
8
+ # usort is scoped to src/ and tests/ only. envs/ uses ruff format only
9
+ # because standalone usort and pyfmt's usort disagree on import ordering
10
+ # inside try/except blocks in some env files.
11
+
12
+ set -e
13
+
14
+ # Check for required tools
15
+ if ! command -v uv &> /dev/null; then
16
+ echo "Error: 'uv' is not installed or not in PATH"
17
+ echo "Install with: curl -LsSf https://astral.sh/uv/install.sh | sh"
18
+ exit 1
19
+ fi
20
+
21
+ echo "=== Running import sort + format check ==="
22
+ # Run the same pipeline as arc f: usort then ruff format.
23
+ # If any file changes, the code wasn't properly formatted.
24
+ uv run usort format src/ tests/ >/dev/null 2>&1
25
+ uv run ruff format src/ tests/ envs/ >/dev/null 2>&1
26
+
27
+ # Check if any files were modified (means they weren't formatted before)
28
+ CHANGED=$(git diff --name-only -- '*.py' 2>/dev/null || true)
29
+ if [ -n "$CHANGED" ]; then
30
+ echo "ERROR: The following files need formatting:"
31
+ echo "$CHANGED"
32
+ echo ""
33
+ echo "Run: uv run usort format src/ tests/ && uv run ruff format src/ tests/ envs/"
34
+ # Undo the formatting so the working tree stays as-is
35
+ git checkout -- $CHANGED 2>/dev/null || true
36
+ exit 1
37
+ fi
38
+ echo "Import sort + format check passed!"
39
+
40
+ echo "=== Running lint rules check ==="
41
+ uv run ruff check src/ tests/
42
+
43
+ echo "=== Lint check passed ==="
.claude/hooks/no-direct-code.sh ADDED
@@ -0,0 +1,56 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ # PreToolUse hook for Edit/Write: Block direct code edits in TDD mode
3
+ #
4
+ # Design: Only block when TDD is activated via /work-on-issue.
5
+ # Worktrees without TDD marker and the main repo allow direct edits.
6
+
7
+ # Check if TDD is active (marker file from /work-on-issue)
8
+ source "$(dirname "$0")/tdd-state.sh"
9
+ if ! is_tdd_active; then
10
+ exit 0 # TDD not active, allow all edits
11
+ fi
12
+
13
+ # Read JSON from stdin (hook input format)
14
+ INPUT=$(cat)
15
+ FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty' 2>/dev/null)
16
+
17
+ # If no file path or jq failed, allow
18
+ if [[ -z "$FILE_PATH" ]]; then
19
+ exit 0
20
+ fi
21
+
22
+ # Only check Python implementation files
23
+ if [[ "$FILE_PATH" != *.py ]]; then
24
+ exit 0 # Not a Python file, allow
25
+ fi
26
+
27
+ # Allow test files
28
+ if [[ "$FILE_PATH" == *test* ]] || [[ "$FILE_PATH" == */tests/* ]]; then
29
+ exit 0 # Test file, allow (tester persona can write these)
30
+ fi
31
+
32
+ # Allow non-src files (scripts, configs, etc.)
33
+ if [[ "$FILE_PATH" != */src/* ]] && [[ "$FILE_PATH" != */envs/* ]]; then
34
+ exit 0
35
+ fi
36
+
37
+ # Block with helpful message
38
+ ISSUE=$(get_tdd_issue)
39
+ cat >&2 << EOF
40
+
41
+ ===================================================================
42
+ TDD MODE: Direct code edit blocked (issue #${ISSUE:-?})
43
+ ===================================================================
44
+
45
+ In TDD mode, use the TDD workflow:
46
+
47
+ 1. /write-tests -> tester writes failing tests
48
+ 2. /implement -> implementer makes tests pass
49
+
50
+ To bypass this check, say "skip TDD" in your message.
51
+
52
+ ===================================================================
53
+
54
+ EOF
55
+
56
+ exit 2
.claude/hooks/post-push-pr.sh ADDED
@@ -0,0 +1,153 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/bin/bash
# Post-push PR validation. Run after `gh pr create` or `git push` to verify
# the PR looks good on GitHub.
#
# Usage: bash .claude/hooks/post-push-pr.sh [PR_NUMBER]
#
# If PR_NUMBER is omitted, uses the PR for the current branch.
# Exits 0 when all checks pass, 1 when any blocking issue is found.

set -e

REPO_ROOT="$(git rev-parse --show-toplevel)"
PR_NUMBER="${1:-}"
FAILED=0

echo ""
echo "==================================================================="
echo " Post-Push PR Checks"
echo "==================================================================="
echo ""

# Resolve PR number from current branch if not provided
if [[ -z "$PR_NUMBER" ]]; then
    PR_NUMBER=$(gh pr view --json number -q '.number' 2>/dev/null || true)
    if [[ -z "$PR_NUMBER" ]]; then
        echo "ERROR: No PR found for current branch."
        echo " Create one with: gh pr create"
        exit 1
    fi
fi

echo "Checking PR #$PR_NUMBER..."
echo ""

# Fetch PR details in one call.
# BUG FIX: "|| true" is required here — under `set -e` a failing `gh` call
# inside the command substitution aborted the whole script, so the
# "Could not fetch" error branch below was unreachable.
PR_JSON=$(gh pr view "$PR_NUMBER" --json state,mergeable,baseRefName,headRefName,title,body,statusCheckRollup,commits 2>/dev/null || true)
if [[ -z "$PR_JSON" ]]; then
    echo "ERROR: Could not fetch PR #$PR_NUMBER"
    exit 1
fi

PR_STATE=$(echo "$PR_JSON" | jq -r '.state')
PR_MERGEABLE=$(echo "$PR_JSON" | jq -r '.mergeable')
PR_BASE=$(echo "$PR_JSON" | jq -r '.baseRefName')
PR_HEAD=$(echo "$PR_JSON" | jq -r '.headRefName')
PR_TITLE=$(echo "$PR_JSON" | jq -r '.title')
PR_BODY=$(echo "$PR_JSON" | jq -r '.body')
COMMIT_COUNT=$(echo "$PR_JSON" | jq '.commits | length')

# 1. PR is open
echo "=== PR State ==="
if [[ "$PR_STATE" == "OPEN" ]]; then
    echo "PASS: PR is open"
else
    echo "FAIL: PR state is '$PR_STATE'"
    FAILED=1
fi

# 2. Mergeable (no conflicts)
echo ""
echo "=== Merge Conflicts ==="
if [[ "$PR_MERGEABLE" == "MERGEABLE" ]]; then
    echo "PASS: No merge conflicts with $PR_BASE"
elif [[ "$PR_MERGEABLE" == "UNKNOWN" ]]; then
    # GitHub computes mergeability asynchronously; UNKNOWN means "not yet".
    echo "WARN: Mergeability not yet computed (check again shortly)"
else
    echo "FAIL: PR has merge conflicts with $PR_BASE"
    echo " Rebase onto $PR_BASE to fix:"
    echo " git fetch origin $PR_BASE"
    echo " git rebase origin/$PR_BASE"
    echo " git push --force-with-lease"
    FAILED=1
fi

# 3. Branch freshness (commits behind base)
echo ""
echo "=== Branch Freshness ==="
git fetch origin "$PR_BASE" --quiet 2>/dev/null || true
BEHIND=$(git rev-list --count HEAD.."origin/$PR_BASE" 2>/dev/null || echo "?")
if [[ "$BEHIND" == "0" ]]; then
    echo "PASS: Branch is up to date with $PR_BASE"
elif [[ "$BEHIND" == "?" ]]; then
    echo "WARN: Could not determine freshness"
else
    echo "FAIL: Branch is $BEHIND commit(s) behind $PR_BASE"
    echo " Rebase to fix:"
    echo " git rebase origin/$PR_BASE"
    echo " git push --force-with-lease"
    FAILED=1
fi

# 4. PR description (length heuristic plus a test-plan section check)
echo ""
echo "=== PR Description ==="
BODY_LEN=${#PR_BODY}
if [[ "$BODY_LEN" -lt 50 ]]; then
    echo "WARN: PR description is very short ($BODY_LEN chars)"
    echo " Consider adding a summary, change list, and test plan"
else
    echo "PASS: PR description present ($BODY_LEN chars)"
fi

# Check for test plan
if echo "$PR_BODY" | grep -qi "test plan"; then
    echo "PASS: Test plan section found"
else
    echo "WARN: No 'Test plan' section in PR description"
fi

# 5. CI status
echo ""
echo "=== CI Checks ==="
CHECK_COUNT=$(echo "$PR_JSON" | jq '.statusCheckRollup | length' 2>/dev/null || echo "0")
if [[ "$CHECK_COUNT" -gt 0 ]]; then
    PENDING=$(echo "$PR_JSON" | jq '[.statusCheckRollup[] | select(.status != "COMPLETED")] | length')
    FAILED_CHECKS=$(echo "$PR_JSON" | jq '[.statusCheckRollup[] | select(.conclusion == "FAILURE")] | length')
    PASSED_CHECKS=$(echo "$PR_JSON" | jq '[.statusCheckRollup[] | select(.conclusion == "SUCCESS")] | length')

    echo "$PASSED_CHECKS passed, $FAILED_CHECKS failed, $PENDING pending (of $CHECK_COUNT total)"

    if [[ "$FAILED_CHECKS" -gt 0 ]]; then
        echo ""
        echo "Failed checks:"
        echo "$PR_JSON" | jq -r '.statusCheckRollup[] | select(.conclusion == "FAILURE") | " - \(.name)"'
        FAILED=1
    fi
    if [[ "$PENDING" -gt 0 ]]; then
        echo ""
        echo "Pending checks (re-run this script after they complete):"
        echo "$PR_JSON" | jq -r '.statusCheckRollup[] | select(.status != "COMPLETED") | " - \(.name): \(.status)"'
    fi
else
    echo "WARN: No CI checks found (may still be starting)"
fi

# 6. Commit count
echo ""
echo "=== Commits ==="
echo "$COMMIT_COUNT commit(s) in this PR"

# Summary
echo ""
echo "==================================================================="
if [[ $FAILED -eq 1 ]]; then
    echo " ISSUES FOUND — fix before requesting review"
else
    echo " ALL CHECKS PASSED — ready for review"
fi
echo "==================================================================="
echo ""
echo " PR: https://github.com/$(gh repo view --json nameWithOwner -q .nameWithOwner)/pull/$PR_NUMBER"
echo ""

exit $FAILED
.claude/hooks/pre-commit-check.sh ADDED
@@ -0,0 +1,38 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/bin/bash
2
+ # PreToolUse hook for Bash: Warn on git commit without /pre-submit-pr
3
+
4
+ # Read JSON from stdin
5
+ INPUT=$(cat)
6
+ COMMAND=$(echo "$INPUT" | jq -r '.tool_input.command // empty' 2>/dev/null)
7
+
8
+ # Only check git commit commands
9
+ if [[ "$COMMAND" != *"git commit"* ]]; then
10
+ exit 0
11
+ fi
12
+
13
+ # Only warn when TDD is active
14
+ source "$(dirname "$0")/tdd-state.sh"
15
+ if ! is_tdd_active; then
16
+ exit 0 # TDD not active, just allow
17
+ fi
18
+
19
+ # Soft warning - don't block, just remind
20
+ cat >&2 << 'EOF'
21
+
22
+ ===================================================================
23
+ REMINDER: Consider running /pre-submit-pr before committing
24
+ ===================================================================
25
+
26
+ This ensures:
27
+ - Lint check passes
28
+ - Tests pass
29
+ - No debug code left in
30
+ - Alignment with principles
31
+
32
+ Proceeding with commit...
33
+
34
+ ===================================================================
35
+
36
+ EOF
37
+
38
+ exit 0
.claude/hooks/pre-pr-check.sh ADDED
@@ -0,0 +1,67 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/bin/bash
# PreToolUse hook for Bash: Block PR creation if branch is stale
#
# Intercepts `gh pr create` and checks branch freshness against the
# base branch. Unlike git hooks, this cannot be bypassed with --no-verify.
#
# Exits 0 to allow the command, 2 to block it with a message on stderr.

# Read JSON from stdin
INPUT=$(cat)
COMMAND=$(echo "$INPUT" | jq -r '.tool_input.command // empty' 2>/dev/null)

# Only check gh pr create commands
if [[ "$COMMAND" != *"gh pr create"* ]]; then
    exit 0
fi

# Determine base branch (default: main).
# Use bash's own regex engine instead of `grep -P`: PCRE grep is a GNU
# extension that is missing on macOS/BSD, where the old lookbehind-based
# extraction silently failed and always fell back to "main". This form
# also accepts both `--base main` and `--base=main`.
BASE="main"
if [[ "$COMMAND" =~ --base[=[:space:]]+([^[:space:]]+) ]]; then
    BASE="${BASH_REMATCH[1]}"
fi

# Fetch latest base and check freshness; "?" means the count could not
# be computed (e.g. unknown base ref) and we allow the PR rather than block.
git fetch origin "$BASE" --quiet 2>/dev/null || true
BEHIND=$(git rev-list --count HEAD.."origin/$BASE" 2>/dev/null || echo "?")

if [[ "$BEHIND" != "0" && "$BEHIND" != "?" ]]; then
    cat >&2 << EOF

===================================================================
PR BLOCKED: Branch is $BEHIND commit(s) behind $BASE
===================================================================

Your PR will show "out of date with base branch" on GitHub.

Fix with:
git fetch origin $BASE
git rebase origin/$BASE
git push --force-with-lease

Then retry gh pr create.

===================================================================

EOF
    exit 2
fi

# Check we're not on main/master
BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null)
if [[ "$BRANCH" == "main" || "$BRANCH" == "master" ]]; then
    cat >&2 << EOF

===================================================================
PR BLOCKED: Cannot create PR from $BRANCH
===================================================================

Create a feature branch first:
git checkout -b <branch-name>
git push -u origin <branch-name>

===================================================================

EOF
    exit 2
fi

exit 0
.claude/hooks/session-start.sh ADDED
@@ -0,0 +1,65 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/bin/bash
# SessionStart hook: Show context and set mode based on TDD state
#
# Prints one of three banners depending on where the session starts:
#   1. TDD MODE ACTIVE   — a .tdd-session.json marker exists (written by
#                          /work-on-issue); direct code edits are blocked.
#   2. WORKTREE          — inside .worktrees/ but TDD not activated.
#   3. MAIN REPO         — anywhere else; explore mode, edits allowed.

echo ""

# Check if we're in a git repo; outside one there is no context to show.
if ! git rev-parse --is-inside-work-tree &>/dev/null; then
    exit 0
fi

TOPLEVEL=$(git rev-parse --show-toplevel)

# Source TDD state helpers (is_tdd_active, get_tdd_issue, ...)
source "$(dirname "$0")/tdd-state.sh"

if is_tdd_active; then
    # TDD mode activated via /work-on-issue
    ISSUE=$(get_tdd_issue)
    FEATURE=$(basename "$TOPLEVEL")
    BRANCH=$(git branch --show-current 2>/dev/null)

    echo "==================================================================="
    echo " TDD MODE ACTIVE (issue #${ISSUE:-?})"
    echo "==================================================================="
    echo " Worktree: $FEATURE"
    echo " Branch: $BRANCH"
    echo ""
    echo " Direct code edits blocked."
    echo ""
    echo " Workflow:"
    echo " /write-tests -> create failing tests"
    echo " /implement -> make tests pass"
    echo " /update-docs -> fix stale docs"
    echo " /simplify -> clean up (optional)"
    echo " /pre-submit-pr -> validate before commit"
    echo ""
    echo " Say \"skip TDD\" to bypass blocking"
    echo "==================================================================="
elif [[ "$TOPLEVEL" == *".worktrees"* ]]; then
    # In a worktree but TDD not activated
    FEATURE=$(basename "$TOPLEVEL")
    BRANCH=$(git branch --show-current 2>/dev/null)

    echo "==================================================================="
    echo " WORKTREE: $FEATURE"
    echo "==================================================================="
    echo " Branch: $BRANCH"
    echo ""
    echo " Direct edits allowed. To enable TDD enforcement:"
    echo " /work-on-issue #<N> -> start TDD workflow"
    echo "==================================================================="
else
    # Main checkout: no enforcement, just point at the entry commands.
    echo "==================================================================="
    echo " MAIN REPO (Explore Mode)"
    echo "==================================================================="
    echo ""
    echo " Direct edits allowed. For focused work:"
    echo " /work-on-issue #42 -> start TDD workflow"
    echo ""
    echo " Or manually:"
    echo " .claude/scripts/worktree-create.sh <name>"
    echo "==================================================================="
fi

echo ""
.claude/hooks/tdd-deactivate.sh ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
#!/bin/bash
# Standalone script to deactivate TDD enforcement.
# Usage: bash .claude/hooks/tdd-deactivate.sh

# Resolve this script's own directory so tdd-state.sh is found
# no matter what the caller's working directory is.
here="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
bash "$here/tdd-state.sh" deactivate
.claude/hooks/tdd-state.sh ADDED
@@ -0,0 +1,72 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/bin/bash
# Shared TDD state helpers.
#
# Can be used two ways:
#   1. Sourced: source tdd-state.sh && is_tdd_active
#   2. Direct:  bash tdd-state.sh activate 42
#
# TDD is activated by /work-on-issue, which writes .tdd-session.json
# to the worktree root. All hooks check this file instead of the
# .worktrees path, making TDD opt-in.

# Echo the repo/worktree root; fails when run outside a git repo.
_tdd_toplevel() {
    git rev-parse --show-toplevel 2>/dev/null
}

# True when the session marker file exists at the worktree root.
is_tdd_active() {
    local root
    root=$(_tdd_toplevel) || return 1
    test -f "$root/.tdd-session.json"
}

# Print the issue number recorded in the marker (empty when absent).
get_tdd_issue() {
    local root
    root=$(_tdd_toplevel) || return 1
    jq -r '.issue // empty' "$root/.tdd-session.json" 2>/dev/null
}

# Write the session marker that switches TDD enforcement on.
activate_tdd() {
    local issue="$1"
    if [[ -z "$issue" ]]; then
        echo "Usage: activate_tdd <issue-number>" >&2
        return 1
    fi
    local root branch
    root=$(_tdd_toplevel) || return 1
    branch=$(git branch --show-current 2>/dev/null)

    jq -n \
        --arg issue "$issue" \
        --arg branch "$branch" \
        --arg ts "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
        '{issue: $issue, branch: $branch, activated_at: $ts}' \
        > "$root/.tdd-session.json"

    echo "TDD enforcement activated for issue #$issue"
}

# Remove the session marker, switching TDD enforcement off.
deactivate_tdd() {
    local root
    root=$(_tdd_toplevel) || return 1
    if [[ -f "$root/.tdd-session.json" ]]; then
        rm "$root/.tdd-session.json"
        echo "TDD enforcement deactivated"
    else
        echo "TDD was not active"
    fi
}

# When executed directly (not sourced), dispatch subcommands
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
    case "${1:-}" in
        activate)   activate_tdd "$2" ;;
        deactivate) deactivate_tdd ;;
        active)     is_tdd_active ;;
        issue)      get_tdd_issue ;;
        *)
            echo "Usage: bash $0 {activate <issue>|deactivate|active|issue}" >&2
            exit 1
            ;;
    esac
fi
.claude/hooks/test.sh ADDED
@@ -0,0 +1,37 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/bin/bash
# Test runner for OpenEnv
# Runs pytest excluding environments that need special setup

set -e

# Check for required tools
if ! command -v uv &> /dev/null; then
    echo "Error: 'uv' is not installed or not in PATH"
    echo "Install with: curl -LsSf https://astral.sh/uv/install.sh | sh"
    exit 1
fi

echo "=== Running tests ==="
# Note: Using timeout to prevent hanging tests from blocking indefinitely (5 min max)
# Matches .github/workflows/test.yml exactly to catch CI failures before push
#
# BUG FIX: the exit code must be captured with `|| TEST_EXIT_CODE=$?`.
# Under `set -e`, a plain failing pytest command aborts the script
# immediately, so the old `TEST_EXIT_CODE=$?` line and the timeout (124)
# diagnostic below were unreachable dead code.
TEST_EXIT_CODE=0
PYTHONPATH=src:envs timeout 300 uv run pytest tests/ \
    --ignore=tests/envs/test_browsergym_environment.py \
    --ignore=tests/envs/test_dipg_environment.py \
    --ignore=tests/envs/test_websearch_environment.py \
    --ignore=tests/envs/test_python_codeact_reset.py \
    --ignore=tests/envs/test_python_codeact_rewards.py \
    --ignore=tests/envs/test_textarena_environment.py \
    -m "not integration and not network and not docker" \
    -v \
    --tb=short || TEST_EXIT_CODE=$?

# GNU timeout exits 124 when the command was killed for running too long.
if [ $TEST_EXIT_CODE -eq 124 ]; then
    echo "ERROR: Tests timed out after 5 minutes"
    exit 1
elif [ $TEST_EXIT_CODE -ne 0 ]; then
    echo "=== Tests failed ==="
    exit $TEST_EXIT_CODE
fi

echo "=== Tests completed ==="
.claude/scripts/worktree-cleanup.sh ADDED
@@ -0,0 +1,53 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/bin/bash
# Clean up a git worktree after PR is merged
set -e

if [[ -z "$1" ]]; then
    echo "Usage: $0 <worktree-path>"
    echo ""
    echo "Example: $0 .worktrees/add-auth"
    echo "Removes the worktree and optionally deletes the branch"
    exit 1
fi

WORKTREE_PATH="$1"

# A linked worktree has a .git *file* (not a directory) pointing back
# at the main repository — that is what distinguishes it from a clone.
if [[ ! -d "$WORKTREE_PATH" ]]; then
    echo "ERROR: Directory does not exist: $WORKTREE_PATH"
    exit 1
fi

if [[ ! -f "$WORKTREE_PATH/.git" ]]; then
    echo "ERROR: Not a git worktree: $WORKTREE_PATH"
    exit 1
fi

# Resolve the branch without leaving the current directory.
BRANCH=$(git -C "$WORKTREE_PATH" branch --show-current)

echo "Removing worktree: $WORKTREE_PATH"
echo "Branch: $BRANCH"
echo ""

# Remove the worktree
git worktree remove "$WORKTREE_PATH" --force

echo "Worktree removed."
echo ""

# Ask about branch deletion
read -p "Delete branch '$BRANCH'? (y/N) " -n 1 -r
echo ""

if [[ $REPLY =~ ^[Yy]$ ]]; then
    git branch -D "$BRANCH"
    echo "Branch deleted."
else
    echo "Branch kept. Delete manually with: git branch -D $BRANCH"
fi

echo ""
echo "Cleanup complete!"
.claude/scripts/worktree-create.sh ADDED
@@ -0,0 +1,47 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/bin/bash
# Create a git worktree for a new feature branch
set -e

if [[ -z "$1" ]]; then
    echo "Usage: $0 <branch-name>"
    echo ""
    echo "Example: $0 add-auth"
    echo "Creates: .worktrees/add-auth with branch feature/add-auth"
    exit 1
fi

name="$1"
feature_branch="feature/$name"

# All worktrees live under .worktrees/ at the repository root.
repo_root=$(git rev-parse --show-toplevel)
worktree_path="$repo_root/.worktrees/$name"
mkdir -p "$repo_root/.worktrees"

if [[ -d "$worktree_path" ]]; then
    echo "ERROR: Worktree already exists at $worktree_path"
    exit 1
fi

# Reuse the branch when it already exists; otherwise create it with -b.
if git show-ref --verify --quiet "refs/heads/$feature_branch"; then
    echo "Branch $feature_branch already exists, using existing branch"
    git worktree add "$worktree_path" "$feature_branch"
else
    echo "Creating new branch $feature_branch"
    git worktree add -b "$feature_branch" "$worktree_path"
fi

echo ""
echo "Worktree created successfully!"
echo ""
echo "Path: $worktree_path"
echo "Branch: $feature_branch"
echo ""
echo "To start working:"
echo " cd .worktrees/$name"
.claude/settings.json ADDED
@@ -0,0 +1,105 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "permissions": {
3
+ "allow": [
4
+ "Bash(gh auth status:*)"
5
+ ]
6
+ },
7
+ "hooks": {
8
+ "SessionStart": [
9
+ {
10
+ "hooks": [
11
+ {
12
+ "type": "command",
13
+ "command": ".claude/hooks/session-start.sh"
14
+ }
15
+ ]
16
+ }
17
+ ],
18
+ "PreToolUse": [
19
+ {
20
+ "matcher": "Bash",
21
+ "hooks": [
22
+ {
23
+ "type": "command",
24
+ "command": ".claude/hooks/pre-commit-check.sh"
25
+ },
26
+ {
27
+ "type": "command",
28
+ "command": ".claude/hooks/pre-pr-check.sh"
29
+ }
30
+ ]
31
+ },
32
+ {
33
+ "matcher": "Edit|Write",
34
+ "hooks": [
35
+ {
36
+ "type": "command",
37
+ "command": ".claude/hooks/no-direct-code.sh"
38
+ }
39
+ ]
40
+ }
41
+ ],
42
+ "PostToolUse": [
43
+ {
44
+ "matcher": "TodoWrite",
45
+ "hooks": [
46
+ {
47
+ "type": "command",
48
+ "command": ".claude/hooks/delegate-todos.sh"
49
+ }
50
+ ]
51
+ }
52
+ ],
53
+ "Stop": [
54
+ {
55
+ "hooks": [
56
+ {
57
+ "type": "prompt",
58
+ "prompt": "First, perform quick checks to avoid unnecessary TDD evaluation:\n\n0. TDD CONTEXT CHECK: Look at the session start output. If there is NO 'TDD MODE ACTIVE' banner and the session was not initiated via /work-on-issue, return 'stop' immediately. TDD enforcement only applies when explicitly activated.\n\n1. SKIP CHECK: If the user's message contains phrases like 'skip TDD', 'no TDD', 'just discussing', 'exploration only', or similar opt-out language, return 'stop' immediately.\n\n2. EDIT CHECK: Look at Claude's actions in this turn. Did Claude edit any implementation files (*.py files in src/ or envs/)? If NO implementation files were edited, return 'stop' immediately.\n\n3. TDD EVALUATION: Only if implementation files were edited AND no opt-out phrase was used, evaluate TDD compliance: (a) Did tests for the edited functionality exist first? (b) If starting new work, were requirements gathered and tests written before implementing? (c) If creating a PR or commit, is it linked to a GitHub issue?\n\nReturn 'continue' with corrective instructions if TDD was violated. Return 'stop' if workflow was followed or checks 0-2 passed."
59
+ }
60
+ ]
61
+ }
62
+ ],
63
+ "SubagentStop": [
64
+ {
65
+ "matcher": "tester",
66
+ "hooks": [
67
+ {
68
+ "type": "command",
69
+ "command": ".claude/hooks/after-tester.sh"
70
+ }
71
+ ]
72
+ },
73
+ {
74
+ "matcher": "implementer",
75
+ "hooks": [
76
+ {
77
+ "type": "command",
78
+ "command": ".claude/hooks/after-implementer.sh"
79
+ }
80
+ ]
81
+ },
82
+ {
83
+ "matcher": "docs-updater",
84
+ "hooks": [
85
+ {
86
+ "type": "command",
87
+ "command": ".claude/hooks/after-docs-updater.sh"
88
+ }
89
+ ]
90
+ },
91
+ {
92
+ "hooks": [
93
+ {
94
+ "type": "prompt",
95
+ "prompt": "Evaluate if the subagent completed its task successfully. For issue-worker: did it extract actionable requirements and acceptance criteria? For tester: did it produce tests with clear assertions? For pre-submit: did validation complete? Return 'continue' if the agent needs to do more work, 'stop' if complete."
96
+ }
97
+ ]
98
+ }
99
+ ]
100
+ },
101
+ "enabledPlugins": {
102
+ "code-simplifier@claude-plugins-official": true,
103
+ "pr-review-toolkit@claude-plugins-official": true
104
+ }
105
+ }
.claude/skills/alignment-review/SKILL.md ADDED
@@ -0,0 +1,94 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ name: alignment-review
3
+ description: Review code changes for bugs and alignment with OpenEnv principles and RFCs. Use when reviewing PRs, checking code before commit, or when asked to review changes. Implements two-tier review model.
4
+ allowed-tools: Read, Grep, Glob, Bash
5
+ ---
6
+
7
+ # Alignment Review
8
+
9
+ Review code changes for alignment with OpenEnv principles using a two-tier model.
10
+
11
+ ## Instructions
12
+
13
+ 1. **Run automated checks first**:
14
+ - Execute `bash .claude/hooks/lint.sh` - capture lint issues
15
+ - Execute `bash .claude/hooks/check-debug.sh` - capture debug code
16
+
17
+ 2. **Read alignment documents**:
18
+ - `.claude/docs/PRINCIPLES.md` - design principles
19
+ - `.claude/docs/INVARIANTS.md` - system invariants
20
+
21
+ 3. **Read open RFCs**:
22
+ - Scan `rfcs/` directory for all RFC files
23
+ - Note the status of each RFC (Draft, In Review, Accepted, Implemented)
24
+ - Pay special attention to Draft and In Review RFCs - these represent active design discussions
25
+
26
+ 4. **Analyze changes** (use `git diff` or provided diff):
27
+ - Identify mechanical issues (Tier 1)
28
+ - Flag alignment concerns (Tier 2)
29
+ - Flag conflicts with open RFCs (Tier 2)
30
+
31
+ ## Tier 1: Uncontentious Issues (Fix Immediately)
32
+
33
+ These are issues to fix without human input:
34
+ - Lint failures from hook output
35
+ - Debug code from hook output (print statements, breakpoints)
36
+ - Uninitialized variables, type errors
37
+ - Missing imports, syntax errors
38
+ - Security issues (credential exposure, injection vulnerabilities)
39
+
40
+ ## Tier 2: Alignment Discussion Points
41
+
42
+ For each potential alignment concern, format as:
43
+
44
+ ```
45
+ **ALIGNMENT FLAG**: [Brief description]
46
+ - **Principle/RFC at stake**: [Which principle from PRINCIPLES.md or RFC number]
47
+ - **The concern**: [What seems misaligned or in conflict]
48
+ - **Suggested reviewer**: @darktex [pull actual reviewers based on authors of the specific line of PRINCIPLES.md and INVARIANTS.md using git blame, and/or authors of conflicting RFCs]
49
+ ```
50
+
51
+ ### Examples of Tier 2 Issues
52
+
53
+ **Principle conflicts:**
54
+ - Adding external reward computation (violates "rewards in environment")
55
+ - Client importing server code (violates client-server separation)
56
+ - New API that differs from Gymnasium pattern
57
+
58
+ **RFC conflicts (flag even for Draft/In Review RFCs):**
59
+ - Change conflicts with design proposed in an open RFC
60
+ - Change pre-empts a decision being discussed in an RFC
61
+ - Change implements something differently than an RFC proposes
62
+ - Change affects an area covered by an RFC under review
63
+
64
+ **Why flag RFC conflicts?** Even if an RFC isn't finalized, flagging conflicts helps focus design discussions. The change might be correct and the RFC might need updating, or vice versa - either way, the team should discuss.
65
+
66
+ ## Output Format
67
+
68
+ ```
69
+ ## Alignment Review Report
70
+
71
+ ### Automated Checks
72
+ - Lint: [PASS/FAIL] - [summary]
73
+ - Debug code: [CLEAN/FOUND] - [details]
74
+
75
+ ### Open RFCs Context
76
+ [List any RFCs in Draft or In Review status that might be relevant to these changes]
77
+
78
+ ### Tier 1: Fixes Required
79
+ - [ ] path/file.py:123 - [issue description]
80
+ - [ ] path/file.py:456 - [issue description]
81
+
82
+ ### Tier 2: Alignment Discussion
83
+
84
+ #### Principle Conflicts
85
+ [ALIGNMENT FLAGS for principle violations, or "None identified"]
86
+
87
+ #### RFC Conflicts
88
+ [ALIGNMENT FLAGS for RFC conflicts, or "None identified"]
89
+
90
+ ### Summary
91
+ - X mechanical issues to fix
92
+ - Y alignment points for human review
93
+ - Z RFC conflicts to discuss
94
+ ```
.claude/skills/generate-openenv-env/SKILL.md ADDED
@@ -0,0 +1,164 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ name: generate-openenv-env
3
+ description: Generate OpenEnv environments from a concrete use case (for example, "generate an env for the library textarena"). Use when asked to design or implement a new environment under envs/ by researching a target library/API, selecting matching OpenEnv examples, asking key implementation questions, and building models/client/server/openenv.yaml. Do not use for model training or evaluation tasks.
4
+ ---
5
+
6
+ # /generate-openenv-env
7
+
8
+ Build a production-ready OpenEnv environment from a use-case prompt.
9
+
10
+ ## Execute Workflow
11
+
12
+ When invoked, execute this workflow end-to-end.
13
+
14
+ ### 1. Parse the use case and name the environment
15
+
16
+ Derive a repo path in the form `envs/<name>_env/`.
17
+
18
+ - Normalize to snake_case.
19
+ - Keep names short and domain-specific.
20
+ - Example: "generate an env for the library textarena" -> `envs/textarena_env/`.
21
+
22
+ ### 2. Research the target library/API before coding
23
+
24
+ Gather the minimum interface facts needed to implement `reset`, `step`, and state serialization.
25
+
26
+ - Search local docs/examples first.
27
+ - Search upstream docs/repo for the target library when local context is insufficient.
28
+ - Extract only implementation-critical details:
29
+ - installation/dependency requirements
30
+ - environment creation API
31
+ - action format
32
+ - observation format
33
+ - reward and done semantics
34
+ - special setup (model files, downloads, auth, etc.)
35
+
36
+ ### 3. Mine matching OpenEnv examples
37
+
38
+ Select 2-3 existing environments as implementation templates.
39
+
40
+ - Always read `references/openenv-tutorial-01-environments.md` (Part 10) and `references/openenv-docs-environment-builder.md`.
41
+ - Prefer `envs/textarena_env` for external-library wrappers with richer state.
42
+ - Add one simpler baseline (for example `envs/snake_env` or `envs/echo_env`) to keep the implementation minimal.
43
+ - Follow patterns, do not copy blindly.
44
+ - Exclude generated or vendored files when mining examples (`.venv/`, `build/`, `site-packages/`, `__pycache__/`).
45
+
46
+ For a compact checklist and mapping, read `references/env-generation-checklist.md`.
47
+
48
+ ### 4. Ask focused implementation questions
49
+
50
+ Ask only the questions that materially affect architecture. Use the question bank in `references/env-generation-checklist.md`.
51
+
52
+ Cover at least:
53
+ - action space contract
54
+ - observation fields needed by agents
55
+ - reward design and terminal conditions
56
+ - episode/session configuration knobs
57
+ - deployment target and dependency constraints
58
+
59
+ If answers are unavailable, proceed with explicit assumptions and document them.
60
+
61
+ ### 5. Choose the environment archetype
62
+
63
+ Choose one archetype before scaffolding:
64
+
65
+ - Typed step/reset environment (default): use `EnvClient` + typed `Action/Observation[/State]` models.
66
+ - MCP tool environment: use `MCPEnvironment` + `MCPToolClient` and MCP action/observation types.
67
+ - Specialized client flow (rare): only when the standard clients cannot express required behavior (for example local+remote hybrid clients).
68
+
69
+ ### 6. Scaffold the environment
70
+
71
+ Use the CLI to scaffold:
72
+
73
+ ```bash
74
+ PYTHONPATH=src uv run openenv init <name>_env --output-dir envs
75
+ ```
76
+
77
+ This generates all files with correct placeholders replaced, including `pyproject.toml`, `Dockerfile`, and `uv.lock`.
78
+
79
+ If the CLI is unavailable (import errors, missing dependencies), create the structure manually matching:
80
+
81
+ ```text
82
+ envs/<name>_env/
83
+ ├── __init__.py
84
+ ├── client.py
85
+ ├── models.py
86
+ ├── openenv.yaml
87
+ ├── pyproject.toml
88
+ └── server/
89
+ ├── __init__.py
90
+ ├── app.py
91
+ ├── <name>_environment.py
92
+ └── Dockerfile
93
+ ```
94
+
95
+ Use `assets/openenv_env_template/` as a reference for file contents when scaffolding manually.
96
+
97
+ ### 7. Implement with OpenEnv contracts
98
+
99
+ Implement these files in order:
100
+
101
+ 1. `models.py`
102
+ 2. `server/<name>_environment.py`
103
+ 3. `server/app.py`
104
+ 4. `client.py`
105
+ 5. `openenv.yaml`
106
+ 6. `README.md`
107
+
108
+ Use these standards:
109
+ - Use typed models (Action/Observation/State).
110
+ - Use `create_app(<factory_or_class>, ActionType, ObservationType, env_name=...)` in `server/app.py`. Pass a class or factory callable, not an instantiated environment.
111
+ - **Dual-import pattern** (required in `server/app.py` and `server/<name>_environment.py`): Use `try: from ..models import X / except ImportError: from models import X`. Relative imports work in-repo (`PYTHONPATH=src:envs`); bare imports work in Docker (`PYTHONPATH=/app/env`). The same pattern applies to intra-server imports (e.g., `from .foo import Bar` vs `from server.foo import Bar`).
112
+ - `client.py` uses `EnvClient[ActionType, ObservationType, State]` (three type parameters).
113
+ - Keep server logic in `server/`, keep client parsing in `client.py`.
114
+ - Expose config through environment variables when behavior is likely to vary.
115
+ - Keep reward logic inside the environment.
116
+ - Prefer reset/step signatures compatible with `Environment`:
117
+ - `reset(seed=None, episode_id=None, **kwargs)`
118
+ - `step(action, timeout_s=None, **kwargs)`
119
+ - Set `SUPPORTS_CONCURRENT_SESSIONS=True` only when isolation is real. Set `max_concurrent_envs` in `create_app` accordingly (1 when `False`, >1 when `True`).
120
+ - For MCP/tool-call UIs that send stringified JSON arguments, add action validators/parsers in `server/app.py`.
121
+ - Export public client/models symbols in `__init__.py`.
122
+ - Keep `openenv.yaml` aligned with current scaffold format (`spec_version: 1`, `name`, `type`, `runtime`, `app`, `port`).
123
+ - Avoid training/evaluation code paths in this skill.
124
+
125
+ ### 8. Validate before handoff
126
+
127
+ Run the narrowest useful checks:
128
+
129
+ ```bash
130
+ # Verify in-repo imports work (catches missing dual-import pattern)
131
+ PYTHONPATH=src:envs uv run python -c "from envs.<name>_env.server.<name>_environment import <ClassName>Environment"
132
+
133
+ # Build and validate
134
+ cd envs/<name>_env
135
+ openenv build
136
+ openenv validate --verbose
137
+ PYTHONPATH=src:envs uv run pytest envs/<name>_env -q
138
+ ```
139
+
140
+ If tests do not exist, run a smoke check:
141
+
142
+ ```bash
143
+ PYTHONPATH=src:envs uv run uvicorn envs.<name>_env.server.app:app --port 8000
144
+ curl http://localhost:8000/health
145
+ openenv validate --url http://localhost:8000
146
+ ```
147
+
148
+ ### 9. Deliver with assumptions and gaps
149
+
150
+ Report:
151
+ - files created/updated
152
+ - chosen archetype (typed vs MCP vs specialized)
153
+ - assumptions made due to missing answers
154
+ - validation commands executed and outcomes
155
+ - remaining risks or follow-up questions
156
+
157
+ ## Guardrails
158
+
159
+ - Do not route into model training/evaluation workflows.
160
+ - Do not invent library APIs; confirm against source docs.
161
+ - Do not skip reading at least one existing OpenEnv env before implementation.
162
+ - Do not copy outdated manifest patterns from older envs (`name/version/action/observation`-only manifests).
163
+ - Do not copy build artifacts or virtualenv files from example envs.
164
+ - Do not set `max_concurrent_envs > 1` unless the environment explicitly supports concurrent sessions.
.claude/skills/generate-openenv-env/agents/openai.yaml ADDED
@@ -0,0 +1,4 @@
 
 
 
 
 
1
+ interface:
2
+ display_name: "OpenEnv Env Generator"
3
+ short_description: "Generate OpenEnv environments from use cases"
4
+ default_prompt: "Use $generate-openenv-env to turn a use case into a complete OpenEnv environment scaffold."
.claude/skills/generate-openenv-env/assets/openenv_env_template/.dockerignore ADDED
@@ -0,0 +1,15 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ .venv
2
+ .git
3
+ .gitignore
4
+ .env
5
+ __pycache__/
6
+ *.pyc
7
+ *.pyo
8
+ *.pyd
9
+ *.pyw
10
+ *.pyz
11
+ *.pywz
12
+ *.pyzw
13
+ *.pyzwz
14
+
15
+
.claude/skills/generate-openenv-env/assets/openenv_env_template/README.md ADDED
@@ -0,0 +1,255 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ title: __ENV_TITLE_NAME__ Environment Server
3
+ emoji: __HF_EMOJI__
4
+ colorFrom: __HF_COLOR_FROM__
5
+ colorTo: __HF_COLOR_TO__
6
+ sdk: docker
7
+ pinned: false
8
+ app_port: 8000
9
+ base_path: /web
10
+ tags:
11
+ - openenv
12
+ ---
13
+
14
+ # __ENV_TITLE_NAME__ Environment
15
+
16
+ A simple test environment that echoes back messages. Perfect for testing the env APIs as well as demonstrating environment usage patterns.
17
+
18
+ ## Quick Start
19
+
20
+ The simplest way to use the __ENV_TITLE_NAME__ environment is through the `__ENV_CLASS_NAME__Env` class:
21
+
22
+ ```python
23
+ from __ENV_NAME__ import __ENV_CLASS_NAME__Action, __ENV_CLASS_NAME__Env
24
+
25
+ try:
26
+ # Create environment from Docker image
27
+ __ENV_NAME__env = __ENV_CLASS_NAME__Env.from_docker_image("__ENV_NAME__-env:latest")
28
+
29
+ # Reset
30
+ result = __ENV_NAME__env.reset()
31
+ print(f"Reset: {result.observation.echoed_message}")
32
+
33
+ # Send multiple messages
34
+ messages = ["Hello, World!", "Testing echo", "Final message"]
35
+
36
+ for msg in messages:
37
+ result = __ENV_NAME__env.step(__ENV_CLASS_NAME__Action(message=msg))
38
+ print(f"Sent: '{msg}'")
39
+ print(f" → Echoed: '{result.observation.echoed_message}'")
40
+ print(f" → Length: {result.observation.message_length}")
41
+ print(f" → Reward: {result.reward}")
42
+
43
+ finally:
44
+ # Always clean up
45
+ __ENV_NAME__env.close()
46
+ ```
47
+
48
+ That's it! The `__ENV_CLASS_NAME__Env.from_docker_image()` method handles:
49
+ - Starting the Docker container
50
+ - Waiting for the server to be ready
51
+ - Connecting to the environment
52
+ - Container cleanup when you call `close()`
53
+
54
+ ## Building the Docker Image
55
+
56
+ Before using the environment, you need to build the Docker image:
57
+
58
+ ```bash
59
+ # From project root
60
+ docker build -t __ENV_NAME__-env:latest -f server/Dockerfile .
61
+ ```
62
+
63
+ ## Deploying to Hugging Face Spaces
64
+
65
+ You can easily deploy your OpenEnv environment to Hugging Face Spaces using the `openenv push` command:
66
+
67
+ ```bash
68
+ # From the environment directory (where openenv.yaml is located)
69
+ openenv push
70
+
71
+ # Or specify options
72
+ openenv push --namespace my-org --private
73
+ ```
74
+
75
+ The `openenv push` command will:
76
+ 1. Validate that the directory is an OpenEnv environment (checks for `openenv.yaml`)
77
+ 2. Prepare a custom build for Hugging Face Docker space (enables web interface)
78
+ 3. Upload to Hugging Face (ensuring you're logged in)
79
+
80
+ ### Prerequisites
81
+
82
+ - Authenticate with Hugging Face: The command will prompt for login if not already authenticated
83
+
84
+ ### Options
85
+
86
+ - `--directory`, `-d`: Directory containing the OpenEnv environment (defaults to current directory)
87
+ - `--repo-id`, `-r`: Repository ID in format 'username/repo-name' (defaults to 'username/env-name' from openenv.yaml)
88
+ - `--base-image`, `-b`: Base Docker image to use (overrides Dockerfile FROM)
89
+ - `--private`: Deploy the space as private (default: public)
90
+
91
+ ### Examples
92
+
93
+ ```bash
94
+ # Push to your personal namespace (defaults to username/env-name from openenv.yaml)
95
+ openenv push
96
+
97
+ # Push to a specific repository
98
+ openenv push --repo-id my-org/my-env
99
+
100
+ # Push with a custom base image
101
+ openenv push --base-image ghcr.io/meta-pytorch/openenv-base:latest
102
+
103
+ # Push as a private space
104
+ openenv push --private
105
+
106
+ # Combine options
107
+ openenv push --repo-id my-org/my-env --base-image custom-base:latest --private
108
+ ```
109
+
110
+ After deployment, your space will be available at:
111
+ `https://huggingface.co/spaces/<repo-id>`
112
+
113
+ The deployed space includes:
114
+ - **Web Interface** at `/web` - Interactive UI for exploring the environment
115
+ - **API Documentation** at `/docs` - Full OpenAPI/Swagger interface
116
+ - **Health Check** at `/health` - Container health monitoring
117
+ - **WebSocket** at `/ws` - Persistent session endpoint for low-latency interactions
118
+
119
+ ## Environment Details
120
+
121
+ ### Action
122
+ **__ENV_CLASS_NAME__Action**: Contains a single field
123
+ - `message` (str) - The message to echo back
124
+
125
+ ### Observation
126
+ **__ENV_CLASS_NAME__Observation**: Contains the echo response and metadata
127
+ - `echoed_message` (str) - The message echoed back
128
+ - `message_length` (int) - Length of the message
129
+ - `reward` (float) - Reward based on message length (length × 0.1)
130
+ - `done` (bool) - Always False for echo environment
131
+ - `metadata` (dict) - Additional info like step count
132
+
133
+ ### Reward
134
+ The reward is calculated as: `message_length × 0.1`
135
+ - "Hi" → reward: 0.2
136
+ - "Hello, World!" → reward: 1.3
137
+ - Empty message → reward: 0.0
138
+
139
+ ## Advanced Usage
140
+
141
+ ### Connecting to an Existing Server
142
+
143
+ If you already have a __ENV_TITLE_NAME__ environment server running, you can connect directly:
144
+
145
+ ```python
146
+ from __ENV_NAME__ import __ENV_CLASS_NAME__Env
147
+
148
+ # Connect to existing server
149
+ __ENV_NAME__env = __ENV_CLASS_NAME__Env(base_url="<ENV_HTTP_URL_HERE>")
150
+
151
+ # Use as normal
152
+ result = __ENV_NAME__env.reset()
153
+ result = __ENV_NAME__env.step(__ENV_CLASS_NAME__Action(message="Hello!"))
154
+ ```
155
+
156
+ Note: When connecting to an existing server, `__ENV_NAME__env.close()` will NOT stop the server.
157
+
158
+ ### Using the Context Manager
159
+
160
+ The client supports context manager usage for automatic connection management:
161
+
162
+ ```python
163
+ from __ENV_NAME__ import __ENV_CLASS_NAME__Action, __ENV_CLASS_NAME__Env
164
+
165
+ # Connect with context manager (auto-connects and closes)
166
+ with __ENV_CLASS_NAME__Env(base_url="http://localhost:8000") as env:
167
+ result = env.reset()
168
+ print(f"Reset: {result.observation.echoed_message}")
169
+ # Multiple steps with low latency
170
+ for msg in ["Hello", "World", "!"]:
171
+ result = env.step(__ENV_CLASS_NAME__Action(message=msg))
172
+ print(f"Echoed: {result.observation.echoed_message}")
173
+ ```
174
+
175
+ The client uses WebSocket connections for:
176
+ - **Lower latency**: No HTTP connection overhead per request
177
+ - **Persistent session**: Server maintains your environment state
178
+ - **Efficient for episodes**: Better for many sequential steps
179
+
180
+ ### Concurrent WebSocket Sessions
181
+
182
+ The server supports multiple concurrent WebSocket connections. To enable this,
183
+ modify `server/app.py` to use factory mode:
184
+
185
+ ```python
186
+ # In server/app.py - use factory mode for concurrent sessions
187
+ app = create_app(
188
+ __ENV_CLASS_NAME__Environment, # Pass class, not instance
189
+ __ENV_CLASS_NAME__Action,
190
+ __ENV_CLASS_NAME__Observation,
191
+ max_concurrent_envs=4, # Allow 4 concurrent sessions
192
+ )
193
+ ```
194
+
195
+ Then multiple clients can connect simultaneously:
196
+
197
+ ```python
198
+ from __ENV_NAME__ import __ENV_CLASS_NAME__Action, __ENV_CLASS_NAME__Env
199
+ from concurrent.futures import ThreadPoolExecutor
200
+
201
+ def run_episode(client_id: int):
202
+ with __ENV_CLASS_NAME__Env(base_url="http://localhost:8000") as env:
203
+ result = env.reset()
204
+ for i in range(10):
205
+ result = env.step(__ENV_CLASS_NAME__Action(message=f"Client {client_id}, step {i}"))
206
+ return client_id, result.observation.message_length
207
+
208
+ # Run 4 episodes concurrently
209
+ with ThreadPoolExecutor(max_workers=4) as executor:
210
+ results = list(executor.map(run_episode, range(4)))
211
+ ```
212
+
213
+ ## Development & Testing
214
+
215
+ ### Direct Environment Testing
216
+
217
+ Test the environment logic directly without starting the HTTP server:
218
+
219
+ ```bash
220
+ # From the server directory
221
+ python3 server/__ENV_NAME___environment.py
222
+ ```
223
+
224
+ This verifies that:
225
+ - Environment resets correctly
226
+ - Step executes actions properly
227
+ - State tracking works
228
+ - Rewards are calculated correctly
229
+
230
+ ### Running Locally
231
+
232
+ Run the server locally for development:
233
+
234
+ ```bash
235
+ uvicorn server.app:app --reload
236
+ ```
237
+
238
+ ## Project Structure
239
+
240
+ ```
241
+ __ENV_NAME__/
242
+ ├── .dockerignore # Docker build exclusions
243
+ ├── __init__.py # Module exports
244
+ ├── README.md # This file
245
+ ├── openenv.yaml # OpenEnv manifest
246
+ ├── pyproject.toml # Project metadata and dependencies
247
+ ├── uv.lock # Locked dependencies (generated)
248
+ ├── client.py # __ENV_CLASS_NAME__Env client
249
+ ├── models.py # Action and Observation models
250
+ └── server/
251
+ ├── __init__.py # Server module exports
252
+ ├── __ENV_NAME___environment.py # Core environment logic
253
+ ├── app.py # FastAPI application (HTTP + WebSocket endpoints)
254
+ └── Dockerfile # Container image definition
255
+ ```
.claude/skills/generate-openenv-env/assets/openenv_env_template/__init__.py ADDED
@@ -0,0 +1,16 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """__ENV_TITLE_NAME__ Environment."""
8
+
9
+ from .client import __ENV_CLASS_NAME__Env
10
+ from .models import __ENV_CLASS_NAME__Action, __ENV_CLASS_NAME__Observation
11
+
12
# Public API re-exported at the package root so callers can write
# `from __ENV_NAME__ import __ENV_CLASS_NAME__Env`, etc.
__all__ = [
    "__ENV_CLASS_NAME__Action",
    "__ENV_CLASS_NAME__Observation",
    "__ENV_CLASS_NAME__Env",
]
.claude/skills/generate-openenv-env/assets/openenv_env_template/client.py ADDED
@@ -0,0 +1,99 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """__ENV_TITLE_NAME__ Environment Client."""
8
+
9
+ from typing import Dict
10
+
11
+ from openenv.core import EnvClient
12
+ from openenv.core.client_types import StepResult
13
+ from openenv.core.env_server.types import State
14
+
15
+ from .models import __ENV_CLASS_NAME__Action, __ENV_CLASS_NAME__Observation
16
+
17
+
18
class __ENV_CLASS_NAME__Env(
    EnvClient[__ENV_CLASS_NAME__Action, __ENV_CLASS_NAME__Observation, State]
):
    """WebSocket client for the __ENV_TITLE_NAME__ environment.

    Each instance keeps a persistent WebSocket session open against the
    environment server, so sequential ``reset``/``step`` calls avoid
    per-request HTTP overhead. Every client gets its own dedicated
    environment session on the server.

    Example:
        >>> # Connect to a running server
        >>> with __ENV_CLASS_NAME__Env(base_url="http://localhost:8000") as client:
        ...     result = client.reset()
        ...     print(result.observation.echoed_message)
        ...
        ...     result = client.step(__ENV_CLASS_NAME__Action(message="Hello!"))
        ...     print(result.observation.echoed_message)

    Example with Docker:
        >>> # Automatically start container and connect
        >>> client = __ENV_CLASS_NAME__Env.from_docker_image("__ENV_NAME__-env:latest")
        >>> try:
        ...     result = client.reset()
        ...     result = client.step(__ENV_CLASS_NAME__Action(message="Test"))
        ... finally:
        ...     client.close()
    """

    def _step_payload(self, action: __ENV_CLASS_NAME__Action) -> Dict:
        """Serialize *action* into the JSON body of a step message.

        Args:
            action: The action to send to the server.

        Returns:
            A JSON-encodable dictionary.
        """
        return {"message": action.message}

    def _parse_result(self, payload: Dict) -> StepResult[__ENV_CLASS_NAME__Observation]:
        """Build a ``StepResult`` from a raw server response payload.

        Args:
            payload: Decoded JSON response from the server.

        Returns:
            StepResult carrying a __ENV_CLASS_NAME__Observation.
        """
        raw_obs = payload.get("observation", {})
        # done/reward live at the top level of the payload; the
        # observation fields live under the "observation" key.
        done_flag = payload.get("done", False)
        step_reward = payload.get("reward")

        obs = __ENV_CLASS_NAME__Observation(
            echoed_message=raw_obs.get("echoed_message", ""),
            message_length=raw_obs.get("message_length", 0),
            done=done_flag,
            reward=step_reward,
            metadata=raw_obs.get("metadata", {}),
        )
        return StepResult(
            observation=obs,
            reward=step_reward,
            done=done_flag,
        )

    def _parse_state(self, payload: Dict) -> State:
        """Deserialize a ``/state`` response into a ``State`` object.

        Args:
            payload: Decoded JSON response from the state request.

        Returns:
            State with ``episode_id`` and ``step_count``.
        """
        return State(
            episode_id=payload.get("episode_id"),
            step_count=payload.get("step_count", 0),
        )
.claude/skills/generate-openenv-env/assets/openenv_env_template/models.py ADDED
@@ -0,0 +1,27 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ Data models for the __ENV_TITLE_NAME__ Environment.
9
+
10
+ The __ENV_NAME__ environment is a simple test environment that echoes back messages.
11
+ """
12
+
13
+ from openenv.core.env_server.types import Action, Observation
14
+ from pydantic import Field
15
+
16
+
17
class __ENV_CLASS_NAME__Action(Action):
    """Action for the __ENV_TITLE_NAME__ environment - just a message to echo."""

    # Required payload; the server echoes this string back verbatim.
    message: str = Field(..., description="Message to echo back")
21
+
22
+
23
class __ENV_CLASS_NAME__Observation(Observation):
    """Observation from the __ENV_TITLE_NAME__ environment - the echoed message."""

    # The message from the action, returned unchanged by the server.
    echoed_message: str = Field(default="", description="The echoed message")
    # Character count of echoed_message as computed server-side.
    message_length: int = Field(default=0, description="Length of the echoed message")
.claude/skills/generate-openenv-env/assets/openenv_env_template/openenv.yaml ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
# OpenEnv manifest; the `openenv push` command validates its presence
# and reads the environment name from it.
spec_version: 1
name: __ENV_NAME__
type: space
runtime: fastapi
# Dotted module path to the FastAPI app object served by uvicorn.
app: server.app:app
port: 8000
.claude/skills/generate-openenv-env/assets/openenv_env_template/pyproject.toml ADDED
@@ -0,0 +1,45 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ [build-system]
8
+ requires = ["setuptools>=45", "wheel"]
9
+ build-backend = "setuptools.build_meta"
10
+
11
+ [project]
12
+ name = "openenv-__ENV_NAME__"
13
+ version = "0.1.0"
14
+ description = "__ENV_TITLE_NAME__ environment for OpenEnv"
15
+ requires-python = ">=3.10"
16
+ dependencies = [
17
+ # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
18
+ # install from github
19
+ # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
20
+ "openenv-core[core]>=0.2.2",
21
+ # Environment-specific dependencies
22
+ # Add all dependencies needed for your environment here
23
+ # Examples:
24
+ # "numpy>=1.19.0",
25
+ # "torch>=2.0.0",
26
+ # "gymnasium>=0.29.0",
27
+ # "openspiel>=1.0.0",
28
+ # "smolagents>=1.22.0,<2",
29
+ ]
30
+
31
+ [project.optional-dependencies]
32
+ dev = [
33
+ "pytest>=8.0.0",
34
+ "pytest-cov>=4.0.0",
35
+ ]
36
+
37
+ [project.scripts]
38
+ # Server entry point - enables running via: uv run --project . server
39
+ # or: python -m __ENV_NAME__.server.app
40
+ server = "__ENV_NAME__.server.app:main"
41
+
42
+ [tool.setuptools]
43
+ include-package-data = true
44
+ packages = ["__ENV_NAME__", "__ENV_NAME__.server"]
45
+ package-dir = { "__ENV_NAME__" = ".", "__ENV_NAME__.server" = "server" }
.claude/skills/generate-openenv-env/assets/openenv_env_template/server/Dockerfile ADDED
@@ -0,0 +1,80 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

# Multi-stage build using openenv-base
# This Dockerfile is flexible and works for both:
#   - In-repo environments (with local OpenEnv sources)
#   - Standalone environments (with openenv from PyPI/Git)
# The build script (openenv build) handles context detection and sets appropriate build args.

ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
FROM ${BASE_IMAGE} AS builder

WORKDIR /app

# Ensure git is available (required for installing dependencies from VCS)
RUN apt-get update && \
    apt-get install -y --no-install-recommends git && \
    rm -rf /var/lib/apt/lists/*

# Build argument to control whether we're building standalone or in-repo
ARG BUILD_MODE=in-repo
ARG ENV_NAME=__ENV_NAME__

# Copy environment code (always at root of build context)
COPY . /app/env

# For in-repo builds, openenv is already vendored in the build context
# For standalone builds, openenv will be installed via pyproject.toml
WORKDIR /app/env

# Ensure uv is available (for local builds where base image lacks it)
# NOTE(review): this install path uses curl — assumes curl exists in the
# builder image; confirm when overriding BASE_IMAGE.
RUN if ! command -v uv >/dev/null 2>&1; then \
    curl -LsSf https://astral.sh/uv/install.sh | sh && \
    mv /root/.local/bin/uv /usr/local/bin/uv && \
    mv /root/.local/bin/uvx /usr/local/bin/uvx; \
    fi

# Install dependencies using uv sync
# If uv.lock exists, use it; otherwise resolve on the fly
# First pass: dependencies only (--no-install-project) so this layer is
# cached independently of environment-code changes.
RUN --mount=type=cache,target=/root/.cache/uv \
    if [ -f uv.lock ]; then \
    uv sync --frozen --no-install-project --no-editable; \
    else \
    uv sync --no-install-project --no-editable; \
    fi

# Second pass: install the project itself into the venv.
RUN --mount=type=cache,target=/root/.cache/uv \
    if [ -f uv.lock ]; then \
    uv sync --frozen --no-editable; \
    else \
    uv sync --no-editable; \
    fi

# Final runtime stage
FROM ${BASE_IMAGE}

WORKDIR /app

# Copy the virtual environment from builder
COPY --from=builder /app/env/.venv /app/.venv

# Copy the environment code
COPY --from=builder /app/env /app/env

# Set PATH to use the virtual environment
ENV PATH="/app/.venv/bin:$PATH"

# Set PYTHONPATH so imports work correctly
ENV PYTHONPATH="/app/env:$PYTHONPATH"

# Health check
# NOTE(review): assumes curl is available in the runtime image — confirm
# for custom BASE_IMAGE values.
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1

# Run the FastAPI server
# The module path is constructed to work with the /app/env structure
CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
.claude/skills/generate-openenv-env/assets/openenv_env_template/server/__ENV_NAME___environment.py ADDED
@@ -0,0 +1,109 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ __ENV_TITLE_NAME__ Environment Implementation.
9
+
10
+ A simple test environment that echoes back messages sent to it.
11
+ Perfect for testing HTTP server infrastructure.
12
+ """
13
+
14
+ from uuid import uuid4
15
+
16
+ from openenv.core.env_server.interfaces import Environment
17
+ from openenv.core.env_server.types import State
18
+
19
+ try:
20
+ from ..models import __ENV_CLASS_NAME__Action, __ENV_CLASS_NAME__Observation
21
+ except ImportError:
22
+ from models import __ENV_CLASS_NAME__Action, __ENV_CLASS_NAME__Observation
23
+
24
+
25
class __ENV_CLASS_NAME__Environment(Environment):
    """Echo environment: every step returns the message it was sent.

    Intended as a minimal end-to-end test of the HTTP server
    infrastructure. State is limited to an episode id and a step counter.

    Example:
        >>> env = __ENV_CLASS_NAME__Environment()
        >>> obs = env.reset()
        >>> print(obs.echoed_message)  # "__ENV_TITLE_NAME__ environment ready!"
        >>>
        >>> obs = env.step(__ENV_CLASS_NAME__Action(message="Hello"))
        >>> print(obs.echoed_message)  # "Hello"
        >>> print(obs.message_length)  # 5
    """

    # Flip to True only when the environment isolates state per instance
    # and server/app.py sets max_concurrent_envs > 1.
    SUPPORTS_CONCURRENT_SESSIONS: bool = False

    def __init__(self):
        """Create a fresh environment with a brand-new episode id."""
        self._state = State(episode_id=str(uuid4()), step_count=0)
        self._reset_count = 0

    def reset(
        self, seed=None, episode_id=None, **kwargs
    ) -> __ENV_CLASS_NAME__Observation:
        """Start a new episode and return the initial observation.

        Args:
            seed: Optional seed for deterministic resets (not used by the
                echo environment).
            episode_id: Externally supplied episode id; a fresh UUID is
                generated when omitted.
            **kwargs: Extra reset arguments, accepted for interface
                compatibility.

        Returns:
            __ENV_CLASS_NAME__Observation announcing the environment is ready.
        """
        fresh_id = episode_id or str(uuid4())
        self._state = State(episode_id=fresh_id, step_count=0)
        self._reset_count += 1

        return __ENV_CLASS_NAME__Observation(
            echoed_message="__ENV_TITLE_NAME__ environment ready!",
            message_length=0,
            done=False,
            reward=0.0,
        )

    def step(self, action: __ENV_CLASS_NAME__Action) -> __ENV_CLASS_NAME__Observation:  # type: ignore[override]
        """Echo the action's message back to the caller.

        Args:
            action: __ENV_CLASS_NAME__Action carrying the message to echo.

        Returns:
            __ENV_CLASS_NAME__Observation with the echoed text, its length,
            and a reward of 0.1 per character.
        """
        self._state.step_count += 1

        text = action.message
        # Longer messages earn proportionally higher rewards.
        return __ENV_CLASS_NAME__Observation(
            echoed_message=text,
            message_length=len(text),
            done=False,
            reward=len(text) * 0.1,
            metadata={"original_message": text, "step": self._state.step_count},
        )

    @property
    def state(self) -> State:
        """Current State (episode_id plus step_count)."""
        return self._state
.claude/skills/generate-openenv-env/assets/openenv_env_template/server/__init__.py ADDED
@@ -0,0 +1,11 @@
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """__ENV_TITLE_NAME__ environment server components."""
8
+
9
+ from .__ENV_NAME___environment import __ENV_CLASS_NAME__Environment
10
+
11
# Public server-side export: the concrete Environment implementation.
__all__ = ["__ENV_CLASS_NAME__Environment"]
.claude/skills/generate-openenv-env/assets/openenv_env_template/server/app.py ADDED
@@ -0,0 +1,84 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Copyright (c) Meta Platforms, Inc. and affiliates.
2
+ # All rights reserved.
3
+ #
4
+ # This source code is licensed under the BSD-style license found in the
5
+ # LICENSE file in the root directory of this source tree.
6
+
7
+ """
8
+ FastAPI application for the __ENV_TITLE_NAME__ Environment.
9
+
10
+ This module creates an HTTP server that exposes the __ENV_CLASS_NAME__Environment
11
+ over HTTP and WebSocket endpoints, compatible with EnvClient.
12
+
13
+ Endpoints:
14
+ - POST /reset: Reset the environment
15
+ - POST /step: Execute an action
16
+ - GET /state: Get current environment state
17
+ - GET /schema: Get action/observation schemas
18
+ - WS /ws: WebSocket endpoint for persistent sessions
19
+
20
+ Usage:
21
+ # Development (with auto-reload):
22
+ uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
23
+
24
+ # Production:
25
+ uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
26
+
27
+ # Or run directly:
28
+ python -m server.app
29
+ """
30
+
31
+ try:
32
+ from openenv.core.env_server.http_server import create_app
33
+ except ImportError as e: # pragma: no cover
34
+ raise ImportError(
35
+ "openenv is required for the web interface. Install dependencies with '\n uv sync\n'"
36
+ ) from e
37
+
38
+ try:
39
+ from ..models import __ENV_CLASS_NAME__Action, __ENV_CLASS_NAME__Observation
40
+ from .__ENV_NAME___environment import __ENV_CLASS_NAME__Environment
41
+ except ImportError:
42
+ from models import __ENV_CLASS_NAME__Action, __ENV_CLASS_NAME__Observation
43
+ from server.__ENV_NAME___environment import __ENV_CLASS_NAME__Environment
44
+
45
+
46
+ # Create the app with web interface and README integration
47
+ app = create_app(
48
+ __ENV_CLASS_NAME__Environment,
49
+ __ENV_CLASS_NAME__Action,
50
+ __ENV_CLASS_NAME__Observation,
51
+ env_name="__ENV_NAME__",
52
+ max_concurrent_envs=1, # increase this number to allow more concurrent WebSocket sessions
53
+ )
54
+
55
+
56
def main(host: str = "0.0.0.0", port: int = 8000):
    """
    Entry point for direct execution via uv run or python -m.

    This function enables running the server without Docker:
        uv run --project . server
        uv run --project . server --port 8001
        python -m __ENV_NAME__.server.app

    Args:
        host: Host address to bind to (default: "0.0.0.0")
        port: Port number to listen on (default: 8000)

    For production deployments, consider using uvicorn directly with
    multiple workers:
        uvicorn __ENV_NAME__.server.app:app --workers 4
    """
    # Local import: uvicorn is only required when the server is actually run.
    import uvicorn

    uvicorn.run(app, host=host, port=port)
76
+
77
+
78
if __name__ == "__main__":
    import argparse

    # Minimal CLI for direct execution. main() already accepts a host
    # parameter, so expose it here too (previously only --port was wired up).
    parser = argparse.ArgumentParser(
        description="Run the __ENV_TITLE_NAME__ environment server."
    )
    parser.add_argument(
        "--host", type=str, default="0.0.0.0", help="Host address to bind to"
    )
    parser.add_argument("--port", type=int, default=8000, help="Port to listen on")
    args = parser.parse_args()
    main(host=args.host, port=args.port)
.claude/skills/generate-openenv-env/assets/openenv_env_template/server/requirements.txt ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ openenv-core[core]>=0.2.2
2
+ fastapi>=0.115.0
3
+ uvicorn>=0.24.0