Repository Guidelines

Project Structure & Module Organization

Core application code lives in src/executive_assistant/. Keep environment logic in env.py, SQLite workspace behavior in workspace.py, reward logic in graders.py, typed contracts in models.py, provider configuration in config.py, prompt construction in prompts.py, OpenRouter calls in llm_service.py, shared episode execution in runner.py, policies in agent.py, and RL logic in training.py. Tests live in tests/ and should mirror the module they validate. Operational scripts live in scripts/. Use training_env.ipynb with the scalerhack2-training kernel for experiments and rollout export only; move stable logic back into src/. Top-level runtime files include app.py, openenv.yaml, requirements*.txt, and PRD.md.

Build, Test, and Development Commands

Set up the separate app and training environments with:

bash scripts/setup_app_env.sh
bash scripts/setup_training_env.sh

Run the test suite with .venv-training/bin/pytest -q. Start the local Gradio entrypoint with .venv-app/bin/python app.py. Evaluate the deterministic baseline across all seeded tasks with .venv-training/bin/python scripts/evaluate_policies.py --provider baseline. Run one full episode trace with .venv-training/bin/python scripts/run_policy_episode.py --task hard_rag_reply --provider baseline. Train the tabular RL policy with .venv-training/bin/python scripts/train_rl_agent.py --episodes 300. To exercise the Gemma model through OpenRouter, set OPENROUTER_API_KEY first, then either pass --provider openrouter to the scripts or set POLICY_PROVIDER = "openrouter" in the notebook.

.venv-training/bin/python scripts/evaluate_policies.py --provider baseline

Coding Style & Naming Conventions

Target Python 3.11+ and use 4-space indentation. Prefer explicit types and small, single-purpose functions. Follow existing naming patterns: snake_case for functions, variables, and modules; PascalCase for Pydantic models and environment classes; uppercase for constants such as TASK_SEEDS. Keep comments brief and only where behavior is not obvious. There is no formatter configured yet, so match the existing style and keep imports tidy.
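
As a rough illustration of these conventions, the sketch below uses a hypothetical TaskSpec class and two hypothetical helpers, pick_seed and build_task_spec. None of them are taken from the repository, the seed values are placeholders, and a frozen dataclass stands in for a Pydantic model so the snippet runs on the standard library alone.

```python
from dataclasses import dataclass

# Uppercase constant, in the style of TASK_SEEDS from the guidelines.
# The seed values and the "example_task" entry are placeholders, not real data.
TASK_SEEDS: dict[str, int] = {"hard_rag_reply": 7, "example_task": 11}


@dataclass(frozen=True)
class TaskSpec:
    """PascalCase for model-like classes (hypothetical stand-in for a Pydantic model)."""

    name: str
    seed: int


def pick_seed(task_name: str) -> int:
    """snake_case name, explicit types, one job: look up the seed for a task."""
    return TASK_SEEDS[task_name]


def build_task_spec(task_name: str) -> TaskSpec:
    """Combine a task name with its seed into a typed record."""
    return TaskSpec(name=task_name, seed=pick_seed(task_name))
```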

Testing Guidelines

Tests use pytest. Add or update tests with every behavioral change, especially for environment transitions, reward shaping, seeded task completion, runner traces, OpenRouter service behavior, and RL training smoke paths. Name test files test_*.py and test functions test_*. Prefer deterministic assertions against observations, snapshots, action logs, checkpoints, and scores over loose text checks. If you change notebook-driven workflows, validate the underlying module or script rather than testing notebook JSON behavior only.
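
A minimal sketch of what a deterministic test can look like, assuming a toy environment rather than the real one: ToyEnv, run_episode, and the action names are all hypothetical and exist only to show exact comparison of action logs and scores for a fixed seed.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Hypothetical stand-in for the real environment; not the repository's API."""

    seed: int
    action_log: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # A private RNG keeps the episode reproducible for a given seed.
        self._rng = random.Random(self.seed)

    def step(self, action: str) -> float:
        # Record the action and return a seeded pseudo-score.
        self.action_log.append(action)
        return round(self._rng.random(), 6)


def run_episode(seed: int, actions: list[str]) -> tuple[list[str], list[float]]:
    env = ToyEnv(seed=seed)
    scores = [env.step(action) for action in actions]
    return env.action_log, scores


def test_same_seed_gives_identical_trace() -> None:
    actions = ["read_email", "draft_reply", "send"]
    first = run_episode(seed=3, actions=actions)
    second = run_episode(seed=3, actions=actions)
    # Compare structured outputs exactly instead of matching on loose text.
    assert first == second
    assert first[0] == actions
```

Because the file and function names follow the test_*.py and test_* patterns, pytest would collect a test like this without extra configuration.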

Commit & Pull Request Guidelines

Current history uses short, imperative commit subjects such as "Initial RL agent sandbox scaffold" and "Add PRD progress checkpoint note". Continue that style: concise subject line, capitalized first word, no trailing period. Pull requests should include a brief summary, note any changed scenarios or rewards, list validation steps run (pytest -q, smoke tests), and attach screenshots only when UI behavior in app.py changes.

Agent-Specific Notes

Preserve determinism in the environment, graders, and baseline policy. Live API access belongs in policy layers such as OpenRouterPolicy, not in the workspace or reward path. Keep EpisodeRunner as the shared execution path for scripts, tests, Gradio, and notebook workflows. Treat OpenRouter calls as optional runtime behavior: tests and RL smoke runs must stay runnable without network access. If notebook experiments uncover a useful change, codify it in src/ and cover it with tests before treating it as part of the baseline.
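
One way to keep network access confined to the policy layer is sketched below. The Policy protocol, BaselinePolicy, StubOpenRouterPolicy, and build_policy are illustrative assumptions, not the repository's actual classes or signatures; the real OpenRouterPolicy may be constructed quite differently.

```python
import os
from typing import Protocol


class Policy(Protocol):
    def choose_action(self, observation: dict[str, str]) -> str: ...


class BaselinePolicy:
    """Deterministic and offline: the same observation always yields the same action."""

    def choose_action(self, observation: dict[str, str]) -> str:
        return "reply" if observation.get("unread") == "yes" else "wait"


class StubOpenRouterPolicy:
    """Placeholder for a network-backed policy (hypothetical, no real API call)."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def choose_action(self, observation: dict[str, str]) -> str:
        raise NotImplementedError("live API call intentionally omitted from this sketch")


def build_policy(provider: str = "baseline") -> Policy:
    # Only the policy layer reads credentials; the environment and graders never do.
    if provider == "openrouter":
        api_key = os.environ.get("OPENROUTER_API_KEY")
        if not api_key:
            raise RuntimeError("OPENROUTER_API_KEY must be set for the openrouter provider")
        return StubOpenRouterPolicy(api_key)
    return BaselinePolicy()
```

With this shape, tests and RL smoke runs can default to the baseline provider and never touch the network, while the credential check fails fast only when the OpenRouter provider is requested explicitly.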

Agent Workflow Loop

All execution surfaces in this repository should follow the same loop:

1. Load environment state
2. Generate observation
3. Send to LLM or policy
4. Receive structured action
5. Execute action in workspace
6. Update state
7. Repeat until task complete

In code, keep this flow inside EpisodeRunner. Use initialize() for steps 1-2, choose_action() for steps 3-4, and advance() plus env.step() for steps 5-6. Do not duplicate bespoke episode loops in notebooks, scripts, or UI handlers.
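
The loop can be sketched roughly as follows. Only the method names initialize(), choose_action(), advance(), and env.step() come from the guidelines above; their signatures, the CounterEnv toy environment, the run() helper, and the "increment" action are assumptions made for illustration and will differ from the real EpisodeRunner.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CounterEnv:
    """Toy environment (hypothetical): the task is complete after `target` increments."""

    target: int = 3
    count: int = 0

    def reset(self) -> dict[str, int]:
        self.count = 0
        return {"count": self.count}

    def step(self, action: str) -> tuple[dict[str, int], bool]:
        # Step 5: execute the action in the workspace.
        if action == "increment":
            self.count += 1
        return {"count": self.count}, self.count >= self.target


@dataclass
class EpisodeRunner:
    env: CounterEnv
    policy: Callable[[dict[str, int]], str]
    observation: dict[str, int] = field(default_factory=dict)
    done: bool = False
    trace: list[str] = field(default_factory=list)

    def initialize(self) -> None:
        # Steps 1-2: load environment state and generate the first observation.
        self.observation = self.env.reset()
        self.done = False
        self.trace.clear()

    def choose_action(self) -> str:
        # Steps 3-4: send the observation to the policy and receive an action.
        return self.policy(self.observation)

    def advance(self, action: str) -> None:
        # Steps 5-6: execute via env.step() and update the runner's state.
        self.observation, self.done = self.env.step(action)
        self.trace.append(action)

    def run(self, max_steps: int = 10) -> list[str]:
        self.initialize()
        # Step 7: repeat until the task is complete or the step budget runs out.
        while not self.done and len(self.trace) < max_steps:
            self.advance(self.choose_action())
        return self.trace


runner = EpisodeRunner(env=CounterEnv(target=3), policy=lambda obs: "increment")
print(runner.run())  # ['increment', 'increment', 'increment']
```

Scripts, tests, the Gradio app, and the notebook would then each construct a runner and call it, rather than re-implementing the while loop themselves.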