Spaces:
Sleeping
Sleeping
| name: data-cleaning-env | |
| version: "0.1.0" | |
| description: > | |
| A real-world data cleaning environment where an AI agent fixes missing | |
| values, duplicate rows, format inconsistencies, outliers, and dtype errors | |
| across three progressively harder tasks. | |
| author: openenv-hackathon | |
| tags: | |
| - openenv | |
| - data-cleaning | |
| - rl | |
| - real-world | |
| tasks: | |
| - id: task1 | |
| name: "Fill Missing Values" | |
| difficulty: easy | |
| max_steps: 20 | |
| description: > | |
| Fill all NaN values in an employee records dataset. | |
| Columns with missing data: age, salary, department. | |
| - id: task2 | |
| name: "Fix Formats and Remove Duplicates" | |
| difficulty: medium | |
| max_steps: 30 | |
| description: > | |
| Standardise phone numbers (NNN-NNN-NNNN) and dates (YYYY-MM-DD) | |
| in a product catalog, and remove ~15 duplicate rows. | |
| - id: task3 | |
| name: "Full Cleaning Pipeline" | |
| difficulty: hard | |
| max_steps: 40 | |
| description: > | |
| End-to-end pipeline on a customer database: fill missing values, | |
| remove duplicates, drop outliers in purchase_amount, standardise | |
| country capitalisation, and fix mixed date formats. | |
| api: | |
| health: GET /health | |
| reset: POST /reset | |
| step: POST /step | |
| state: POST /state | |
| docs: GET /docs | |
| reward: | |
| range: [0.001, 0.999] | |
| partial: true | |
| terminal_bonus: 0.0 | |
| observation_space: | |
| type: object | |
| fields: | |
| done: boolean | |
| reward: float | |
| data_preview: string # First 10 rows as CSV | |
| data_shape: list # [rows, cols] | |
| missing_counts: object # {column: count} | |
| duplicate_count: integer | |
| dtype_issues: object # {column: issue_description} | |
| task_description: string | |
| message: string | |
| step_count: integer | |
| current_score: float # 0.0–1.0 | |
| action_space: | |
| type: object | |
| fields: | |
| operation: string # fill_missing | drop_duplicates | fix_format | replace_value | drop_outliers | fix_dtype | |
| column: string # optional depending on operation | |
| params: object # optional operation parameters | |