Spaces: srishtichugh/orgOS

Taniieeee83 committed 16 days ago

Commit da84c63

1 Parent(s): 2305b9f

updated readme, requirements.txt

Files changed (7)

README.md +127 -243
openenv.yaml +61 -48
pyproject.toml +17 -20
requirements.txt +2 -2
server/tasks/task1_missing.py +0 -39
server/tasks/task2_format.py +0 -68
server/tasks/task3_pipeline.py +0 -104

README.md CHANGED

@@ -1,181 +1,120 @@
 ---
-title: Data Cleaning Environment
-emoji: 🧹
-colorFrom: blue
-colorTo: green
 sdk: docker
 pinned: false
 app_port: 8000
 tags:
   - openenv
   - rl
-  - data-cleaning
 ---
-# Data Cleaning OpenEnv
-A **real-world data cleaning environment** for training and evaluating AI agents.
-An agent interacts with a dirty pandas DataFrame through a standard `reset() / step() / state()` HTTP API, learning to fix common data quality problems — missing values, duplicate rows, inconsistent formats, statistical outliers, and dtype errors — across three progressively harder tasks.
-🤗 **Live HuggingFace Space:** https://srishtichugh-openenv-hack.hf.space
-📖 **Interactive API docs:** https://srishtichugh-openenv-hack.hf.space/docs
-✅ **Health check:** https://srishtichugh-openenv-hack.hf.space/health
 ---
-## Environment Description & Motivation
-Real-world datasets are almost never clean. Data engineers routinely spend 60–80 % of their time on data cleaning tasks: filling missing values with statistically appropriate strategies, removing duplicates, standardising inconsistent formats (phone numbers, dates, country names), and detecting extreme outliers.
-This environment turns those tasks into a reinforcement learning challenge with:
-- **Deterministic, programmatic graders** — ground-truth clean DataFrames are generated with a fixed seed; every reward signal is reproducible.
-- **Meaningful partial rewards** — every step emits a delta reward proportional to how much of the dataset it cleaned, so the agent receives useful signal throughout the episode rather than only at the end.
-- **Three difficulty levels** — easy, medium, hard — letting agents learn a curriculum from simple null-filling up to full multi-issue pipelines.
-- **No external data downloads** — all datasets are generated synthetically via `numpy` + `Faker` with `seed=42`.
 ---
-## Action Space
-Actions are JSON objects sent to `POST /step`.
-| `operation` | Required `column` | `params` | Description |
-|---|---|---|---|
-| `fill_missing` | ✅ | `{"strategy": "median\|mean\|mode\|constant", "value": ...}` | Fill NaN values in a column |
-| `drop_duplicates` | ❌ | — | Remove all duplicate rows |
-| `fix_format` | ✅ | — | Standardise phone/date/country format |
-| `replace_value` | ✅ | `{"old": ..., "new": ...}` | Replace a specific value |
-| `drop_outliers` | ✅ | — | Remove IQR outliers from a numeric column |
-| `fix_dtype` | ✅ | `{"dtype": "float\|int\|str"}` | Cast column to correct dtype |
-**Format rules enforced by `fix_format`:**
-| Column | Target format |
 |---|---|
-| `phone` | `NNN-NNN-NNNN` |
-| `listed_date` / `signup_date` | `YYYY-MM-DD` |
-| `country` | Title-cased canonical name (`USA`, `UK`, `Canada`, `Australia`, `Germany`) |
-**Example actions:**
-```json
-{"operation": "fill_missing",    "column": "salary",          "params": {"strategy": "median"}}
-{"operation": "fill_missing",    "column": "department",      "params": {"strategy": "mode"}}
-{"operation": "drop_duplicates"}
-{"operation": "fix_format",      "column": "phone"}
-{"operation": "fix_format",      "column": "signup_date"}
-{"operation": "drop_outliers",   "column": "purchase_amount"}
-```
 ---
-## Observation Space
-Every `POST /reset` and `POST /step` returns:
-```json
-{
-  "observation": {
-    "done":             false,
-    "reward":           0.40,
-    "data_preview":     "name,age,salary,...\n...",
-    "data_shape":       [100, 5],
-    "missing_counts":   {"age": 20, "salary": 20, "department": 10},
-    "duplicate_count":  0,
-    "dtype_issues":     {},
-    "task_description": "Task 1 (Easy) — Fill Missing Values\n...",
-    "message":          "Filled 20 missing values in 'age' using median.",
-    "step_count":       1,
-    "current_score":    0.4000
-  },
-  "reward": 0.40,
-  "done":   false,
-  "info":   {}
-}
-```
-| Field | Type | Description |
-|---|---|---|
-| `done` | bool | Episode finished (score ≥ 0.95 or max steps reached) |
-| `reward` | float | Per-step delta reward (see Reward Function) |
-| `data_preview` | string | First 10 rows of current DataFrame as CSV |
-| `data_shape` | [int, int] | Current `[rows, cols]` |
-| `missing_counts` | object | `{column: null_count}` for columns with NaN |
-| `duplicate_count` | int | Number of duplicate rows |
-| `dtype_issues` | object | `{column: issue_description}` for suspected dtype mismatches |
-| `task_description` | string | Full task instructions with available operations |
-| `message` | string | Human-readable result of the last action |
-| `step_count` | int | Steps taken in this episode |
-| `current_score` | float | Running grader score 0.0 – 1.0 |
 ---
-## State Space
-`GET /state` returns episode metadata (does not modify state):
 ```json
 {
-  "episode_id":      "a8f026a9-...",
-  "task_id":         1,
-  "step_count":      2,
-  "max_steps":       20,
-  "total_errors":    50,
-  "errors_remaining": 30
 }
 ```
 ---
-## Tasks
-### Task 1 — Fill Missing Values *(Easy)*
-| Property | Value |
-|---|---|
-| Dataset | 100-row employee records (name, age, salary, department, experience) |
-| Issues | ~20 % NaN in `age`, `salary`; ~10 % NaN in `department` |
-| Goal | Fill all missing values |
-| Valid operations | `fill_missing` |
-| Grader | `1.0 − remaining_nulls / original_nulls` |
-| Max steps | 20 |
-| Optimal steps | 3 (one per affected column) |
-### Task 2 — Fix Formats + Remove Duplicates *(Medium)*
-| Property | Value |
-|---|---|
-| Dataset | 215-row product catalog (product_id, price, category, phone, listed_date) |
-| Issues | ~60 % phone numbers in mixed formats, ~60 % dates in mixed formats, 15 duplicate rows |
-| Goal | Standardise all phone/date formats and remove duplicates |
-| Valid operations | `fix_format`, `drop_duplicates` |
-| Grader | `0.35 × phone_score + 0.35 × date_score + 0.30 × dupe_score` |
-| Max steps | 30 |
-| Optimal steps | 3 |
-### Task 3 — Full Cleaning Pipeline *(Hard)*
-| Property | Value |
-|---|---|
-| Dataset | 320-row customer database (name, age, purchase_amount, country, email, signup_date) |
-| Issues | Missing values (4 cols), 20 duplicate rows, outliers in `purchase_amount` (~3× normal), mixed country capitalisation, mixed date formats |
-| Goal | Fix all issues end-to-end |
-| Valid operations | All 6 operations |
-| Grader | `0.25×null + 0.20×dupe + 0.20×outlier + 0.175×country + 0.175×date` |
-| Max steps | 40 |
-| Optimal steps | 8 |
----
 ## Reward Function
-| Scenario | Reward |
-|---|---|
-| Score improves (delta > 0) | `new_score − old_score` (positive) |
-| Operation had no effect | `−0.01` |
-| Invalid operation / bad column | `−0.05` |
-| Episode completed (score ≥ 0.95) | `delta + 0.20` terminal bonus |
-Rewards are bounded to **[−0.05, 1.2]**. A partial reward is emitted on every step, giving the agent dense signal throughout the episode.
 ---
@@ -183,139 +122,84 @@ Rewards are bounded to **[−0.05, 1.2]**. A partial reward is emitted on every
 | Method | Path | Description |
 |---|---|---|
-| `GET` | `/health` | Health check → `{"status": "healthy"}` |
-| `POST` | `/reset` | Start episode. Body: `{"task_id": 1\|2\|3}` (optional; default: round-robin) |
-| `POST` | `/step` | Execute action. Body: action JSON |
-| `POST` | `/state` | Get episode metadata |
-| `GET` | `/metadata` | Environment name, version, task list |
-| `GET` | `/schema` | Full action / observation / state JSON schemas |
-| `GET` | `/docs` | Interactive Swagger UI |
 ---
-## Baseline Scores
-| Task | Difficulty | Score |
-|---|---|---|
-| 1 — Fill Missing Values | Easy | 0.999 |
-| 2 — Fix Formats + Duplicates | Medium | 0.999 |
-| 3 — Full Cleaning Pipeline | Hard | 0.999 |
-| **Average** | — | **0.999** |
-*Produced by `google/gemma-3-27b-it` via NVIDIA NIM, `temperature=0`. Full step-by-step agent logs: `inference_log.txt`.*
 ---
-## Setup & Usage
-### Prerequisites
-- Python 3.11+
-- Docker (for containerised deployment)
-### Local — Python
 ```bash
-# 1. Clone and install dependencies
-git clone https://github.com/Tanvi51204/openEnv.git
-cd openEnv
 pip install -r requirements.txt
-# 2. Start the server
 uvicorn server.app:app --host 0.0.0.0 --port 8000
-# 3. Open Swagger UI
-open http://localhost:8000/docs
-```
-### Local — Docker
-```bash
-docker build -t data-cleaning-env .
-docker run -p 8000:8000 data-cleaning-env
-```
-### Quick API test
-```bash
-# Health
-curl http://localhost:8000/health
-# Start Task 1
-curl -X POST http://localhost:8000/reset \
-  -H "Content-Type: application/json" \
-  -d '{"task_id": 1}'
-# Fill missing values
-curl -X POST http://localhost:8000/step \
-  -H "Content-Type: application/json" \
-  -d '{"operation": "fill_missing", "column": "salary", "params": {"strategy": "median"}}'
 ```
-### Python client
-```python
-from client import DataCleaningEnvClient
-from models import DataCleaningAction
-with DataCleaningEnvClient("http://localhost:8000") as env:
-    result = env.reset(task_id=1)
-    print(result.observation.missing_counts)   # {'age': 20, 'salary': 20, 'department': 10}
-    action = DataCleaningAction(
-        operation="fill_missing",
-        column="salary",
-        params={"strategy": "median"},
-    )
-    result = env.step(action)
-    print(result.observation.current_score)    # 0.4
-    print(result.reward)                       # 0.4
-```
-### Run baseline inference
 ```bash
-export API_BASE_URL="https://api.openai.com/v1"
-export MODEL_NAME="gpt-4o-mini"
-export HF_TOKEN="sk-..."          # your API key
-export ENV_URL="http://localhost:8000"
-python inference.py
 ```
-Produces `[START]` / `[STEP]` / `[END]` lines to stdout and `baseline_scores.json`.
-### Environment variables
-| Variable | Default | Description |
-|---|---|---|
-| `API_BASE_URL` | `https://api.openai.com/v1` | LLM API endpoint (OpenAI-compatible) |
-| `MODEL_NAME` | `gpt-4o-mini` | Model identifier |
-| `HF_TOKEN` | — | API key for LLM calls |
-| `ENV_URL` | `http://localhost:8000` | Environment server URL |
 ---
 ## Project Structure
 ```
-openenv-data-cleaning/
-├── models.py              Pydantic contracts — Action / Observation / State
-├── client.py              Sync HTTP client (reset / step / state / health)
-├── inference.py           Baseline LLM agent with [START]/[STEP]/[END] logging
-├── openenv.yaml           OpenEnv manifest
-├── Dockerfile             python:3.11-slim, non-root user, HEALTHCHECK
-├── requirements.txt       pip dependencies
-├── pyproject.toml         Python package metadata + openenv-core dependency
-└── server/
-    ├── app.py             FastAPI routes + /metadata + /schema
-    ├── environment.py     reset / step / state logic + 6 operations + rewards
-    ├── data_generator.py  Synthetic dataset generation (seed=42, reproducible)
-    └── tasks/
-        ├── task1_missing.py    Easy  — fill NaN grader
-        ├── task2_format.py     Medium — format + duplicates grader
-        └── task3_pipeline.py   Hard  — full pipeline grader
 ```
 ---
-## Live Demo
-🤗 **HuggingFace Space:** https://srishtichugh-openenv-hack.hf.space
-- Health: https://srishtichugh-openenv-hack.hf.space/health
-- Docs:   https://srishtichugh-openenv-hack.hf.space/docs

 ---
+title: OrgOS Enterprise Workflow RL Environment
+emoji: 🏢
+colorFrom: indigo
+colorTo: cyan
 sdk: docker
 pinned: false
 app_port: 8000
 tags:
   - openenv
   - rl
+  - enterprise
+  - multi-app
 ---
+# OrgOS — Enterprise Workflow RL Environment
+**OrgOS** is a multi-app enterprise reinforcement learning environment where an AI agent completes real business workflows across four interconnected SaaS applications. Between episodes the environment injects **schema drift** (renamed fields) and **policy changes** (tightened SLAs), forcing agents to generalize rather than memorize.
+Built for the [Meta PyTorch × Scaler OpenEnv Hackathon](https://huggingface.co/) — targeting the **Multi-App Enterprise Workflow** sub-theme.
 ---
+## Live Demo
+🚀 **[HuggingFace Space →](https://huggingface.co/spaces/tanvibisht/orgos-openenv)**
+```bash
+# Local quickstart
+uvicorn server.app:app --host 0.0.0.0 --port 8000
+# Open http://localhost:8000 for the live dashboard
+```
 ---
+## What Makes OrgOS Unique
+| Feature | Description |
+|---|---|
+| **4 Mock SaaS Apps** | Jira, Zendesk, Salesforce, Workday — each with realistic operations |
+| **Schema Drift** | Fields rename between episodes (e.g. `priority → severity → urgency_level`). Agent gets `-0.20` for stale names, `+0.10` for adapted names |
+| **Policy Drift** | Every 3rd episode, SLA thresholds tighten automatically |
+| **3 Workflows** | Cross-app tasks of increasing complexity: Bug Fix → Onboarding → Churn Alert |
+| **RBAC** | Support vs. manager roles enforced; `-0.25` penalty for unauthorized actions |
+| **Dense Reward** | Per-step composite signal tied to 5 measurable business outcomes |
+---
+## Applications & Operations
+| App | Key Operations |
 |---|---|
+| **Jira** | `get_issue`, `create_issue`, `update_status`, `set_priority`, `assign_owner`, `link_zendesk_ticket`, `close_issue`, `list_issues` |
+| **Zendesk** | `get_ticket`, `acknowledge_ticket`, `set_urgency`, `assign_agent`, `escalate_to_jira`, `resolve_ticket`, `add_note`, `list_tickets` |
+| **Salesforce** | `get_account`, `list_accounts`, `update_deal_stage`, `flag_churn_risk`, `assign_account_owner`, `log_interaction`, `get_opportunity` |
+| **Workday** | `get_employee`, `list_employees`, `provision_access`, `log_sla_event`, `request_budget_approval`, `create_onboarding_task`, `complete_task` |
 ---
+## Workflows
+### Workflow A — Customer Bug Fix (support role, 5 steps, max 15)
+1. Acknowledge Zendesk ticket
+2. Create linked Jira issue
+3. Assign Jira issue to engineer
+4. Log SLA event in Workday
+5. Query Salesforce for account health
+### Workflow B — Employee Onboarding (manager role, 4 steps, max 20)
+1. Create employee record in Workday
+2. Provision Jira access
+3. Add employee to Salesforce team
+4. Create Zendesk support profile
+### Workflow C — Churn Risk Alert (support role, 4 steps, max 18)
+1. Flag churn risk in Salesforce
+2. Escalate to Zendesk ticket
+3. Create Jira tracking issue
+4. Log SLA event in Workday
 ---
+## Action / Observation Format
+**Action:**
+```json
+{"app": "zendesk", "operation": "acknowledge_ticket", "args": {"ticket_number": "ZD-001"}}
+```
+**Observation (key fields):**
 ```json
 {
+  "workflow_goal": "Resolve customer bug report end-to-end",
+  "pending_steps": ["Assign Jira issue to engineer", "Log SLA event in Workday"],
+  "schema_hints": {"jira.priority": "severity"},
+  "active_rules": {"sla_p0_minutes": 30},
+  "current_score": 0.42,
+  "message": "Jira issue JI-001 created and linked to ZD-001"
 }
 ```
 ---
 ## Reward Function
+```
+score = 0.30 × workflow_completion
+      + 0.25 × rule_compliance
+      + 0.20 × schema_adaptation
+      + 0.15 × efficiency
+      + 0.10 × policy_drift_handling
+Per-step delta = new_score − old_score
+Schema error penalty   = −0.20
+RBAC violation penalty = −0.25
+Terminal completion bonus = +0.20
+```
 ---
 | Method | Path | Description |
 |---|---|---|
+| `GET` | `/health` | Health check |
+| `POST` | `/reset` | Start new episode (`{"workflow_id": "A"\|"B"\|"C"}`) |
+| `POST` | `/step` | Take action (`{"app": ..., "operation": ..., "args": {...}}`) |
+| `GET` | `/state` | Current episode metadata |
+| `GET` | `/schema/apps` | All app operations catalogue |
+| `GET` | `/docs` | Swagger UI |
+| `GET` | `/` | Live dashboard (UI) |
+| `GET` | `/ui/run-agent` | SSE stream: live agent inference |
 ---
+## Training
+The `training/grpo_orgos.ipynb` notebook trains **Qwen2.5-3B-Instruct** with **Unsloth 4-bit LoRA** using **HF TRL GRPOTrainer**:
+- Before training: ~0.55 score (uses stale canonical field names → schema error penalties)
+- After training: ~0.75 score (reads `schema_hints`, uses drifted field names → adaptation bonuses)
+- **Δ ≈ +0.20** per episode, visible in `before_after_curves.png`
 ---
+## Local Setup
 ```bash
+# 1. Install dependencies
 pip install -r requirements.txt
+# 2. Start server
 uvicorn server.app:app --host 0.0.0.0 --port 8000
+# 3. Run baseline inference (requires LLM API)
+export API_BASE_URL=https://api.openai.com/v1
+export MODEL_NAME=gpt-4o-mini
+export HF_TOKEN=your_token
+python inference.py
+# 4. Or use the Python client
+from client import OrgOSEnvClient
+client = OrgOSEnvClient("http://localhost:8000")
+result = client.reset(workflow_id="A")
+print(result.observation.workflow_goal)
 ```
+## Docker
 ```bash
+docker build -t orgos-env .
+docker run -p 8000:8000 orgos-env
 ```
 ---
 ## Project Structure
 ```
+openEnv/
+├── server/
+│   ├── app.py              # FastAPI routes (15 endpoints)
+│   ├── environment.py      # OrgOSEnvironment — reset/step/state
+│   ├── schema_drift.py     # Per-episode field renames
+│   ├── business_rules.py   # RBAC + SLA enforcement
+│   ├── workflow_engine.py  # 3 cross-app workflow definitions
+│   ├── data_generator.py   # Synthetic data (seed=42)
+│   └── apps/
+│       ├── jira.py
+│       ├── zendesk.py
+│       ├── salesforce.py
+│       └── workday.py
+├── models.py               # Pydantic models
+├── client.py               # OrgOSEnvClient
+├── inference.py            # Baseline inference loop + SSE generator
+├── ui/index.html           # Live dashboard (Tailwind + Alpine.js + Chart.js)
+├── training/
+│   └── grpo_orgos.ipynb   # GRPO training notebook (Colab)
+├── openenv.yaml            # OpenEnv manifest
+└── Dockerfile
 ```
 ---
+MIT License · Built for Meta PyTorch × Scaler OpenEnv Hackathon Round 2
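
The Reward Function section added to the README gives five component weights, a per-step delta, two penalties, and a terminal bonus, but not how they combine in code. The following is a minimal sketch of one possible reading, not the repository's implementation. The function names `composite_score` and `step_reward` are hypothetical, the clamp to [0.001, 0.999] is taken from the `reward.range` entry in `openenv.yaml`, and applying the penalties and bonus additively on top of the delta is an assumption.

```python
# Component weights as listed in the README's Reward Function section.
WEIGHTS = {
    "workflow_completion": 0.30,
    "rule_compliance": 0.25,
    "schema_adaptation": 0.20,
    "efficiency": 0.15,
    "policy_drift_handling": 0.10,
}

SCHEMA_ERROR_PENALTY = 0.20    # README: schema error penalty = -0.20
RBAC_VIOLATION_PENALTY = 0.25  # README: RBAC violation penalty = -0.25
TERMINAL_BONUS = 0.20          # README: terminal completion bonus = +0.20


def composite_score(components: dict) -> float:
    """Weighted sum of the five components, each assumed to lie in [0, 1]."""
    raw = sum(WEIGHTS[name] * components.get(name, 0.0) for name in WEIGHTS)
    # Assumption: the score is clamped to the range declared in openenv.yaml.
    return min(0.999, max(0.001, raw))


def step_reward(old_score: float, new_score: float, *,
                schema_error: bool = False,
                rbac_violation: bool = False,
                completed: bool = False) -> float:
    """Per-step delta, with penalties and bonus added on top (assumed)."""
    reward = new_score - old_score
    if schema_error:
        reward -= SCHEMA_ERROR_PENALTY
    if rbac_violation:
        reward -= RBAC_VIOLATION_PENALTY
    if completed:
        reward += TERMINAL_BONUS
    return reward


if __name__ == "__main__":
    before = composite_score({"workflow_completion": 0.4, "rule_compliance": 1.0})
    after = composite_score({"workflow_completion": 0.6, "rule_compliance": 1.0})
    print(step_reward(before, after))
```

Under this reading, a step that raises `workflow_completion` from 0.4 to 0.6 with everything else unchanged moves the score by 0.30 × 0.2 = 0.06.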

openenv.yaml CHANGED

@@ -1,73 +1,86 @@
-name: data-cleaning-env
-version: "0.1.0"
 description: >
-  A real-world data cleaning environment where an AI agent fixes missing
-  values, duplicate rows, format inconsistencies, outliers, and dtype errors
-  across three progressively harder tasks.
-author: openenv-hackathon
 tags:
   - openenv
-  - data-cleaning
   - rl
-  - real-world
 tasks:
-  - id: task1
-    name: "Fill Missing Values"
     difficulty: easy
-    max_steps: 20
     description: >
-      Fill all NaN values in an employee records dataset.
-      Columns with missing data: age, salary, department.
-  - id: task2
-    name: "Fix Formats and Remove Duplicates"
     difficulty: medium
-    max_steps: 30
     description: >
-      Standardise phone numbers (NNN-NNN-NNNN) and dates (YYYY-MM-DD)
-      in a product catalog, and remove ~15 duplicate rows.
-  - id: task3
-    name: "Full Cleaning Pipeline"
-    difficulty: hard
-    max_steps: 40
     description: >
-      End-to-end pipeline on a customer database: fill missing values,
-      remove duplicates, drop outliers in purchase_amount, standardise
-      country capitalisation, and fix mixed date formats.
 api:
-  health:  GET  /health
-  reset:   POST /reset
-  step:    POST /step
-  state:   POST /state
-  docs:    GET  /docs
 reward:
   range: [0.001, 0.999]
-  partial: true
-  terminal_bonus: 0.0
 observation_space:
-  type: object
   fields:
-    done:            boolean
-    reward:          float
-    data_preview:    string   # First 10 rows as CSV
-    data_shape:      list     # [rows, cols]
-    missing_counts:  object   # {column: count}
-    duplicate_count: integer
-    dtype_issues:    object   # {column: issue_description}
-    task_description: string
-    message:         string
-    step_count:      integer
-    current_score:   float    # 0.0–1.0
 action_space:
-  type: object
   fields:
-    operation: string   # fill_missing | drop_duplicates | fix_format | replace_value | drop_outliers | fix_dtype
-    column:    string   # optional depending on operation
-    params:    object   # optional operation parameters

+name: orgos-openenv
+version: "2.0.0"
 description: >
+  OrgOS is a multi-app enterprise RL environment where an agent completes
+  business workflows across Jira, Zendesk, Salesforce, and Workday.
+  Between episodes, schema drift renames fields and policy drift tightens SLAs,
+  forcing agents to generalize rather than memorize.
+author: tanvibisht
 tags:
   - openenv
+  - enterprise
+  - multi-app
+  - schema-drift
   - rl
 tasks:
+  - id: workflow_a
+    name: "Customer Bug Fix"
     difficulty: easy
+    max_steps: 15
     description: >
+      Triage a customer bug report end-to-end: acknowledge the Zendesk ticket,
+      create a linked Jira issue, assign it to an engineer, log the SLA event
+      in Workday, and query Salesforce for account health. Support role only.
+  - id: workflow_b
+    name: "Employee Onboarding"
     difficulty: medium
+    max_steps: 20
     description: >
+      Onboard a new employee: create their Workday record, provision Jira access
+      based on role, add them to the correct Salesforce territory team, and
+      create their Zendesk support profile. Manager role required.
+  - id: workflow_c
+    name: "Churn Risk Alert"
+    difficulty: medium
+    max_steps: 18
     description: >
+      Respond to a churn risk signal: flag the account in Salesforce, escalate
+      to a Zendesk ticket, create a Jira tracking issue, and log the SLA event
+      in Workday. Support role. Policy drift may tighten SLA thresholds.
 api:
+  routes:
+    health: GET /health
+    reset:  POST /reset
+    step:   POST /step
+    state:  GET /state
+    docs:   GET /docs
+    schema: GET /schema/apps
 reward:
   range: [0.001, 0.999]
+  partial_rewards: true
+  terminal_bonus: 0.20
+  components:
+    workflow_completion:   0.30
+    rule_compliance:       0.25
+    schema_adaptation:     0.20
+    efficiency:            0.15
+    policy_drift_handling: 0.10
 observation_space:
   fields:
+    - done: bool
+    - reward: float
+    - current_score: "float in [0.001, 0.999]"
+    - workflow_id: "A | B | C"
+    - step_count: int
+    - app_states: "dict[app_name, str] — preview of each app's records"
+    - workflow_goal: str
+    - completed_steps: "list[str]"
+    - pending_steps: "list[str]"
+    - schema_hints: "dict[str, str] — e.g. {\"jira.priority\": \"severity\"}"
+    - active_rules: "dict — current SLA thresholds and RBAC rules"
+    - rule_violations: "list[str] — violations from last action"
+    - reward_breakdown: "RewardBreakdown — 5-component score snapshot"
+    - message: "str — feedback from last action"
 action_space:
   fields:
+    - app: "jira | zendesk | salesforce | workday"
+    - operation: str
+    - args: "dict — operation-specific arguments"
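
The manifest pairs a `schema_hints` observation field (for example `{"jira.priority": "severity"}`) with an action made of `app`, `operation`, and `args`. As a rough illustration of how an agent might use the hints before sending an action, here is a small sketch. It assumes the hints are keyed as `<app>.<canonical_field>` and that the server expects the drifted name as the key inside `args`; the helpers `adapt_args` and `build_action`, the `issue_id` argument, and the `"P0"` value are hypothetical and do not come from the repository.

```python
def adapt_args(app: str, args: dict, schema_hints: dict) -> dict:
    """Rename canonical argument keys to their drifted names for one app."""
    adapted = {}
    for key, value in args.items():
        # Look up "<app>.<key>" in the hints; keep the key unchanged if absent.
        adapted[schema_hints.get(f"{app}.{key}", key)] = value
    return adapted


def build_action(app: str, operation: str, args: dict, schema_hints: dict) -> dict:
    """Assemble an action dict in the {app, operation, args} shape."""
    return {
        "app": app,
        "operation": operation,
        "args": adapt_args(app, args, schema_hints),
    }


if __name__ == "__main__":
    hints = {"jira.priority": "severity"}  # shape taken from the README example
    action = build_action(
        "jira",
        "set_priority",
        {"issue_id": "JI-001", "priority": "P0"},  # hypothetical argument names
        hints,
    )
    print(action)
```

Hints for other apps are ignored, so a Zendesk action built with the same `hints` keeps its canonical keys.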

pyproject.toml CHANGED

@@ -1,26 +1,23 @@
 [project]
-name = "data-cleaning-env"
-version = "0.1.0"
-description = "Real-world data cleaning environment for OpenEnv / Scaler hackathon"
 requires-python = ">=3.11"
 dependencies = [
-    "fastapi==0.135.2",
-    "uvicorn[standard]==0.40.0",
-    "pydantic==2.12.5",
-    "pandas==2.2.3",
-    "numpy==2.2.4",
-    "faker==40.12.0",
-    "openai==2.15.0",
-    "httpx==0.28.1",
-    "openenv-core==0.2.3",
 ]
-[project.scripts]
-server = "server.app:main"
-[build-system]
-requires = ["hatchling"]
-build-backend = "hatchling.build"
 [tool.hatch.build.targets.wheel]
-packages = ["server"]

+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
 [project]
+name = "orgos"
+version = "2.0.0"
+description = "OrgOS — Multi-App Enterprise Workflow RL Environment"
 requires-python = ">=3.11"
 dependencies = [
+    "fastapi",
+    "uvicorn[standard]",
+    "pydantic",
+    "numpy",
+    "faker",
+    "openai",
+    "httpx",
+    "openenv-core",
+    "aiofiles",
 ]
 [tool.hatch.build.targets.wheel]
+packages = ["server"]

requirements.txt CHANGED

@@ -1,9 +1,9 @@
 fastapi==0.135.2
 uvicorn[standard]==0.40.0
 pydantic==2.12.5
-pandas==2.2.3
 numpy==2.2.4
 faker==40.12.0
 openai==2.15.0
 httpx==0.28.1
-openenv-core==0.2.3

 fastapi==0.135.2
 uvicorn[standard]==0.40.0
 pydantic==2.12.5
 numpy==2.2.4
 faker==40.12.0
 openai==2.15.0
 httpx==0.28.1
+openenv-core==0.2.3
+aiofiles>=23.0.0

server/tasks/task1_missing.py DELETED

@@ -1,39 +0,0 @@
-"""
-Task 1 — Easy: Fill Missing Values
-Objective: Fill all NaN values in the employee records DataFrame.
-Score: 1.0 - (remaining_nulls / original_nulls)
-"""
-from server.data_generator import generate_task1_datasets
-TASK_ID = 1
-MAX_STEPS = 20
-DESCRIPTION = (
-    "Task 1 (Easy) — Fill Missing Values\n"
-    "You have an employee records dataset with missing values (NaN) in "
-    "'age', 'salary', and 'department' columns. "
-    "Your goal is to fill all missing values so the dataset is complete.\n\n"
-    "Available operation: fill_missing\n"
-    "  params.strategy: 'median' | 'mean' | 'mode' | 'constant'\n"
-    "  params.value: (required when strategy='constant') the fill value\n"
-    "Example action: {\"operation\": \"fill_missing\", \"column\": \"age\", \"params\": {\"strategy\": \"median\"}}"
-)
-def load():
-    """Return (dirty_df, clean_df, original_null_count)."""
-    dirty, clean = generate_task1_datasets()
-    original_nulls = int(dirty.isnull().sum().sum())
-    return dirty.copy(), clean, original_nulls
-def score(current_df, original_nulls: int) -> float:
-    """Score in [0, 1]: fraction of nulls filled."""
-    if original_nulls == 0:
-        return 0.99
-    remaining = int(current_df.isnull().sum().sum())
-    return round(max(0.01, min(0.99, 1.0 - remaining / original_nulls)), 4)
-def count_errors(current_df) -> int:
-    return int(current_df.isnull().sum().sum())

server/tasks/task2_format.py DELETED

@@ -1,68 +0,0 @@
-"""
-Task 2 — Medium: Fix Formats + Remove Duplicates
-Objective: Standardise phone & date formats and drop duplicate rows.
-Score: weighted average of format_score (0.7) + dupe_score (0.3)
-"""
-import re
-import pandas as pd
-from server.data_generator import generate_task2_datasets
-TASK_ID = 2
-MAX_STEPS = 30
-DESCRIPTION = (
-    "Task 2 (Medium) — Fix Formats and Remove Duplicates\n"
-    "You have a product catalog with:\n"
-    "  • Phone numbers in mixed formats (need: NNN-NNN-NNNN)\n"
-    "  • Dates in mixed formats (need: YYYY-MM-DD)\n"
-    "  • Duplicate rows (~15)\n\n"
-    "Available operations:\n"
-    "  fix_format  — column: 'phone' | 'listed_date'\n"
-    "  drop_duplicates — no column needed\n\n"
-    "Example actions:\n"
-    '  {"operation": "fix_format", "column": "phone"}\n'
-    '  {"operation": "fix_format", "column": "listed_date"}\n'
-    '  {"operation": "drop_duplicates"}'
-)
-PHONE_RE = re.compile(r"^\d{3}-\d{3}-\d{4}$")
-DATE_RE  = re.compile(r"^\d{4}-\d{2}-\d{2}$")
-def load():
-    dirty, clean = generate_task2_datasets()
-    original_phone_issues = int((~dirty["phone"].str.match(PHONE_RE)).sum())
-    original_date_issues  = int((~dirty["listed_date"].apply(
-        lambda x: bool(DATE_RE.match(str(x))) if pd.notna(x) else False
-    )).sum())
-    original_dupes = len(dirty) - len(dirty.drop_duplicates())
-    meta = {
-        "orig_phone": original_phone_issues,
-        "orig_date":  original_date_issues,
-        "orig_dupes": original_dupes,
-    }
-    return dirty.copy(), clean, meta
-def score(current_df, meta: dict) -> float:
-    phone_issues = int((~current_df["phone"].str.match(PHONE_RE)).sum())
-    date_issues  = int((~current_df["listed_date"].apply(
-        lambda x: bool(DATE_RE.match(str(x))) if pd.notna(x) else False
-    )).sum())
-    dupes        = len(current_df) - len(current_df.drop_duplicates())
-    phone_score = 1.0 - phone_issues / max(meta["orig_phone"], 1)
-    date_score  = 1.0 - date_issues  / max(meta["orig_date"],  1)
-    dupe_score  = 1.0 - dupes        / max(meta["orig_dupes"], 1)
-    combined = 0.35 * phone_score + 0.35 * date_score + 0.30 * dupe_score
-    return round(max(0.01, min(0.99, combined)), 4)
-def count_errors(current_df, meta: dict) -> int:
-    phone_issues = int((~current_df["phone"].str.match(PHONE_RE)).sum())
-    date_issues  = int((~current_df["listed_date"].apply(
-        lambda x: bool(DATE_RE.match(str(x))) if pd.notna(x) else False
-    )).sum())
-    dupes = len(current_df) - len(current_df.drop_duplicates())
-    return phone_issues + date_issues + dupes

server/tasks/task3_pipeline.py DELETED

@@ -1,104 +0,0 @@
-"""
-Task 3 — Hard: Full Cleaning Pipeline
-Objective: Fix missing values, remove duplicates, handle outliers, standardise
-           country capitalisation and date formats.
-Score: equal-weight average of 4 sub-scores.
-"""
-import re
-import numpy as np
-import pandas as pd
-from server.data_generator import generate_task3_datasets
-TASK_ID = 3
-MAX_STEPS = 40
-DESCRIPTION = (
-    "Task 3 (Hard) — Full Cleaning Pipeline\n"
-    "You have a customer database with multiple issues:\n"
-    "  1. Missing values in 'age', 'purchase_amount', 'country', 'signup_date'\n"
-    "  2. ~20 duplicate rows\n"
-    "  3. Outliers in 'purchase_amount' (injected values ~10x normal)\n"
-    "  4. Mixed case in 'country' (need: title case, e.g. 'Usa' → 'USA')\n"
-    "  5. Mixed date formats in 'signup_date' (need: YYYY-MM-DD)\n\n"
-    "Available operations:\n"
-    "  fill_missing    — column + params.strategy ('median'|'mean'|'mode'|'constant')\n"
-    "  drop_duplicates — no column needed\n"
-    "  drop_outliers   — column (numeric); uses IQR method\n"
-    "  fix_format      — column: 'country' | 'signup_date'\n"
-    "  fix_dtype       — column + params.dtype ('float'|'int'|'str')\n\n"
-    "Example actions:\n"
-    '  {"operation": "fill_missing",    "column": "age",             "params": {"strategy": "median"}}\n'
-    '  {"operation": "drop_duplicates"}\n'
-    '  {"operation": "drop_outliers",   "column": "purchase_amount"}\n'
-    '  {"operation": "fix_format",      "column": "signup_date"}\n'
-    '  {"operation": "fix_format",      "column": "country"}'
-)
-DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")
-VALID_COUNTRIES = {"USA", "UK", "Canada", "Australia", "Germany"}
-def load():
-    dirty, clean = generate_task3_datasets()
-    orig_nulls = int(dirty.isnull().sum().sum())
-    orig_dupes = len(dirty) - len(dirty.drop_duplicates())
-    # Outlier baseline: count rows where purchase_amount > Q3 + 3*IQR
-    pa = dirty["purchase_amount"].dropna()
-    q1, q3 = pa.quantile(0.25), pa.quantile(0.75)
-    iqr = q3 - q1
-    orig_outliers = int((pa > q3 + 3 * iqr).sum())
-    orig_country_issues = int((~dirty["country"].isin(VALID_COUNTRIES) &
-                               dirty["country"].notna()).sum())
-    orig_date_issues    = int((~dirty["signup_date"].apply(
-        lambda x: bool(DATE_RE.match(str(x))) if pd.notna(x) else False
-    )).sum())
-    meta = {
-        "orig_nulls":           orig_nulls,
-        "orig_dupes":           orig_dupes,
-        "orig_outliers":        max(orig_outliers, 1),
-        "orig_country_issues":  max(orig_country_issues, 1),
-        "orig_date_issues":     max(orig_date_issues, 1),
-        "q1": q1, "q3": q3, "iqr": iqr,
-    }
-    return dirty.copy(), clean, meta
-def score(current_df, meta: dict) -> float:
-    remaining_nulls = int(current_df.isnull().sum().sum())
-    remaining_dupes = len(current_df) - len(current_df.drop_duplicates())
-    pa = current_df["purchase_amount"].dropna()
-    remaining_outliers = int((pa > meta["q3"] + 3 * meta["iqr"]).sum())
-    remaining_country = int((~current_df["country"].isin(VALID_COUNTRIES) &
-                              current_df["country"].notna()).sum())
-    remaining_dates   = int((~current_df["signup_date"].apply(
-        lambda x: bool(DATE_RE.match(str(x))) if pd.notna(x) else False
-    )).sum())
-    null_score     = 1.0 - remaining_nulls    / max(meta["orig_nulls"],    1)
-    dupe_score     = 1.0 - remaining_dupes    / max(meta["orig_dupes"],    1)
-    outlier_score  = 1.0 - remaining_outliers / meta["orig_outliers"]
-    country_score  = 1.0 - remaining_country  / meta["orig_country_issues"]
-    date_score     = 1.0 - remaining_dates    / meta["orig_date_issues"]
-    combined = 0.25 * null_score + 0.20 * dupe_score + 0.20 * outlier_score \
-             + 0.175 * country_score + 0.175 * date_score
-    return round(max(0.01, min(0.99, combined)), 4)
-def count_errors(current_df, meta: dict) -> int:
-    remaining_nulls = int(current_df.isnull().sum().sum())
-    remaining_dupes = len(current_df) - len(current_df.drop_duplicates())
-    pa = current_df["purchase_amount"].dropna()
-    remaining_outliers = int((pa > meta["q3"] + 3 * meta["iqr"]).sum())
-    remaining_country = int((~current_df["country"].isin(VALID_COUNTRIES) &
-                              current_df["country"].notna()).sum())
-    remaining_dates   = int((~current_df["signup_date"].apply(
-        lambda x: bool(DATE_RE.match(str(x))) if pd.notna(x) else False
-    )).sum())
-    return remaining_nulls + remaining_dupes + remaining_outliers + \
-           remaining_country + remaining_dates