# medusa_env / openenv.yaml
spec_version: 1
name: medusa_env
type: space
runtime: fastapi
app: server.app:app
port: 8000
tasks:
  - id: clean_pipeline
    name: Clean Pipeline
    difficulty: easy
    seed: 0
    description: >
      Both sources are fresh. Join keys are clean and unique. The agent must
      verify freshness, prepare keys, join, apply SCD, and commit without
      triggering a row explosion.
    success_criteria:
      - COMMIT issued (episode finalized)
      - No Cartesian explosion detected
      - Silver row count <= Source A row count
      - match_rate > 0.80 after join
    scoring_rubric:
      committed: 0.20
      no_explosion: 0.25
      volume_ok: 0.20
      high_match: 0.20
      grader_pass: 0.15
    grader:
      type: weighted_rubric
      source: tasks.score_episode
      score_range: [0.0, 1.0]
      pass_threshold: 0.55
      criteria:
        - id: committed
          weight: 0.20
        - id: no_explosion
          weight: 0.25
        - id: volume_ok
          weight: 0.20
        - id: high_match
          weight: 0.20
        - id: grader_pass
          weight: 0.15
  - id: dirty_integration
    name: Dirty Key Integration
    difficulty: medium
    seed: 1
    description: >
      Source A has NULLs and whitespace in join keys. Source B has duplicate
      keys that can cause row explosion. The agent must PREP_KEYS and
      DEDUPLICATE before joining, and correctly quarantine unresolvable
      orphans.
    success_criteria:
      - PREP_KEYS_A issued before EXECUTE_JOIN
      - PREP_KEYS_B issued before EXECUTE_JOIN
      - DEDUPLICATE_B issued before EXECUTE_JOIN
      - No row explosion
      - Quarantine integrity check passes
    scoring_rubric:
      committed: 0.10
      prepped_before_join: 0.20
      deduped_before_join: 0.20
      no_explosion: 0.25
      integrity_ok: 0.15
      grader_pass: 0.10
    grader:
      type: weighted_rubric
      source: tasks.score_episode
      score_range: [0.0, 1.0]
      pass_threshold: 0.55
      criteria:
        - id: committed
          weight: 0.10
        - id: prepped_before_join
          weight: 0.20
        - id: deduped_before_join
          weight: 0.20
        - id: no_explosion
          weight: 0.25
        - id: integrity_ok
          weight: 0.15
        - id: grader_pass
          weight: 0.10
  - id: full_medallion
    name: Full Medallion Integration
    difficulty: hard
    seed: 2
    description: >
      Source A is stale (>6h old). Source B has new schema columns not
      registered in Silver. The agent must check freshness, evolve the schema,
      clean keys, deduplicate, execute a left join, apply SCD-2 for tracked
      columns, and pass all grader checks.
    success_criteria:
      - SYNC_CHECK issued before any join
      - EVOLVE_SCHEMA issued before COMMIT
      - SCD-2 applied (not SCD-1) for tracked columns
      - Silver schema contains new columns from drift
      - All 4 grader checks pass
    scoring_rubric:
      committed: 0.05
      sync_checked: 0.15
      schema_evolved: 0.15
      used_scd2: 0.20
      schema_ok: 0.20
      grader_pass: 0.25
    grader:
      type: weighted_rubric
      source: tasks.score_episode
      score_range: [0.0, 1.0]
      pass_threshold: 0.55
      criteria:
        - id: committed
          weight: 0.05
        - id: sync_checked
          weight: 0.15
        - id: schema_evolved
          weight: 0.15
        - id: used_scd2
          weight: 0.20
        - id: schema_ok
          weight: 0.20
        - id: grader_pass
          weight: 0.25
  - id: schema_bootstrap
    name: Schema Bootstrap
    difficulty: easy
    seed: 3
    description: >
      Fresh sources arrive with new columns in both Bronze tables. The agent
      must evolve the Silver schema, execute a clean join, land a non-empty
      Silver table, and commit without row explosion.
    success_criteria:
      - EVOLVE_SCHEMA issued before COMMIT
      - No row explosion
      - Silver contains the joined columns after drift
      - Silver table is non-empty
    scoring_rubric:
      committed: 0.15
      no_explosion: 0.20
      schema_evolved: 0.25
      schema_materialized: 0.20
      silver_built: 0.20
    grader:
      type: weighted_rubric
      source: tasks.score_episode
      score_range: [0.0, 1.0]
      pass_threshold: 0.55
      criteria:
        - id: committed
          weight: 0.15
        - id: no_explosion
          weight: 0.20
        - id: schema_evolved
          weight: 0.25
        - id: schema_materialized
          weight: 0.20
        - id: silver_built
          weight: 0.20
  - id: dedup_guardrail
    name: Dedup Guardrail
    difficulty: medium
    seed: 4
    description: >
      Dirty join keys and duplicate Dimension rows increase the risk of row
      explosion. The agent must prep keys, deduplicate Source B, produce a
      non-empty Silver table, and commit cleanly.
    success_criteria:
      - PREP_KEYS_A and PREP_KEYS_B issued before join
      - DEDUPLICATE_B issued before join
      - No row explosion
      - Silver table is non-empty
      - Grader passes
    scoring_rubric:
      committed: 0.10
      prepped_before_join: 0.15
      deduped_before_join: 0.25
      no_explosion: 0.25
      silver_built: 0.10
      grader_pass: 0.15
    grader:
      type: weighted_rubric
      source: tasks.score_episode
      score_range: [0.0, 1.0]
      pass_threshold: 0.55
      criteria:
        - id: committed
          weight: 0.10
        - id: prepped_before_join
          weight: 0.15
        - id: deduped_before_join
          weight: 0.25
        - id: no_explosion
          weight: 0.25
        - id: silver_built
          weight: 0.10
        - id: grader_pass
          weight: 0.15
  - id: stale_sync_recovery
    name: Stale Sync Recovery
    difficulty: hard
    seed: 5
    description: >
      Source A is stale and the pipeline must not proceed blindly. The agent
      must verify freshness, recover a high-match join, build Silver, and
      still pass the final audit.
    success_criteria:
      - SYNC_CHECK issued before any join
      - No row explosion
      - match_rate > 0.80 after join
      - Silver table is non-empty
      - Grader passes
    scoring_rubric:
      committed: 0.05
      sync_checked: 0.30
      no_explosion: 0.20
      high_match: 0.15
      silver_built: 0.15
      grader_pass: 0.15
    grader:
      type: weighted_rubric
      source: tasks.score_episode
      score_range: [0.0, 1.0]
      pass_threshold: 0.55
      criteria:
        - id: committed
          weight: 0.05
        - id: sync_checked
          weight: 0.30
        - id: no_explosion
          weight: 0.20
        - id: high_match
          weight: 0.15
        - id: silver_built
          weight: 0.15
        - id: grader_pass
          weight: 0.15
  - id: fresh_join_baseline
    name: Fresh Join Baseline
    difficulty: easy
    seed: 6
    description: >
      A clean baseline task that rewards a simple, efficient Bronze-to-Silver
      run. The agent should avoid unnecessary actions while producing a
      high-match, non-exploding join and a usable Silver table.
    success_criteria:
      - COMMIT issued
      - No row explosion
      - match_rate > 0.80 after join
      - Silver table is non-empty
      - Episode completed efficiently
    scoring_rubric:
      committed: 0.15
      no_explosion: 0.25
      high_match: 0.25
      silver_built: 0.20
      efficient_run: 0.15
    grader:
      type: weighted_rubric
      source: tasks.score_episode
      score_range: [0.0, 1.0]
      pass_threshold: 0.55
      criteria:
        - id: committed
          weight: 0.15
        - id: no_explosion
          weight: 0.25
        - id: high_match
          weight: 0.25
        - id: silver_built
          weight: 0.20
        - id: efficient_run
          weight: 0.15
  - id: stale_history_guard
    name: Stale History Guard
    difficulty: hard
    seed: 7
    description: >
      A stale-source episode where the agent must both verify freshness and
      preserve historical correctness. The task emphasizes SCD-2 usage and
      proper history columns in Silver.
    success_criteria:
      - SYNC_CHECK issued before any join
      - SCD-2 used instead of SCD-1
      - Silver table is non-empty
      - Silver contains history columns
      - Grader passes
    scoring_rubric:
      committed: 0.05
      sync_checked: 0.20
      used_scd2: 0.25
      silver_built: 0.15
      history_columns: 0.15
      grader_pass: 0.20
    grader:
      type: weighted_rubric
      source: tasks.score_episode
      score_range: [0.0, 1.0]
      pass_threshold: 0.55
      criteria:
        - id: committed
          weight: 0.05
        - id: sync_checked
          weight: 0.20
        - id: used_scd2
          weight: 0.25
        - id: silver_built
          weight: 0.15
        - id: history_columns
          weight: 0.15
        - id: grader_pass
          weight: 0.20
  - id: orphan_quarantine
    name: Orphan Quarantine
    difficulty: medium
    seed: 8
    description: >
      Dirty keys create unmatched Fact rows that should not be silently
      dropped. The agent must prep keys, choose a left join, preserve a
      meaningful quarantine set, and keep audit integrity intact.
    success_criteria:
      - PREP_KEYS_A and PREP_KEYS_B issued before join
      - Left join used
      - Quarantine contains rows
      - No row explosion
      - Integrity checks pass
    scoring_rubric:
      committed: 0.10
      prepped_before_join: 0.15
      left_join_used: 0.20
      quarantine_nonempty: 0.20
      integrity_ok: 0.20
      no_explosion: 0.15
    grader:
      type: weighted_rubric
      source: tasks.score_episode
      score_range: [0.0, 1.0]
      pass_threshold: 0.55
      criteria:
        - id: committed
          weight: 0.10
        - id: prepped_before_join
          weight: 0.15
        - id: left_join_used
          weight: 0.20
        - id: quarantine_nonempty
          weight: 0.20
        - id: integrity_ok
          weight: 0.20
        - id: no_explosion
          weight: 0.15
  - id: drift_alignment
    name: Drift Alignment
    difficulty: medium
    seed: 9
    description: >
      Schema drift introduces new columns, but the pipeline is otherwise
      clean. The agent must evolve the schema, use the audited left-join
      path, materialize the new shape in Silver, and commit successfully.
    success_criteria:
      - EVOLVE_SCHEMA issued before COMMIT
      - Left join used
      - Silver contains the joined columns after drift
      - Silver table is non-empty
      - Grader passes
    scoring_rubric:
      committed: 0.10
      schema_evolved: 0.25
      left_join_used: 0.15
      schema_materialized: 0.25
      silver_built: 0.10
      grader_pass: 0.15
    grader:
      type: weighted_rubric
      source: tasks.score_episode
      score_range: [0.0, 1.0]
      pass_threshold: 0.55
      criteria:
        - id: committed
          weight: 0.10
        - id: schema_evolved
          weight: 0.25
        - id: left_join_used
          weight: 0.15
        - id: schema_materialized
          weight: 0.25
        - id: silver_built
          weight: 0.10
        - id: grader_pass
          weight: 0.15
  - id: snapshot_upsert
    name: Snapshot Upsert
    difficulty: easy
    seed: 10
    description: >
      A clean snapshot-style load where SCD-1 is sufficient. The agent should
      choose overwrite semantics, maintain safe volume, and land a non-empty
      Silver table without introducing join problems.
    success_criteria:
      - SCD-1 used instead of SCD-2
      - No row explosion
      - Silver row count <= Source A row count
      - Silver table is non-empty
      - Grader passes
    scoring_rubric:
      committed: 0.10
      no_explosion: 0.20
      used_scd1: 0.25
      volume_ok: 0.20
      silver_built: 0.15
      grader_pass: 0.10
    grader:
      type: weighted_rubric
      source: tasks.score_episode
      score_range: [0.0, 1.0]
      pass_threshold: 0.55
      criteria:
        - id: committed
          weight: 0.10
        - id: no_explosion
          weight: 0.20
        - id: used_scd1
          weight: 0.25
        - id: volume_ok
          weight: 0.20
        - id: silver_built
          weight: 0.15
        - id: grader_pass
          weight: 0.10
  - id: schema_history_guard
    name: Schema History Guard
    difficulty: hard
    seed: 11
    description: >
      Schema drift and historical tracking requirements arrive together. The
      agent must evolve the schema, materialize the merged columns in Silver,
      use SCD-2, and preserve history metadata through commit.
    success_criteria:
      - EVOLVE_SCHEMA issued before COMMIT
      - SCD-2 used instead of SCD-1
      - Silver contains the joined columns after drift
      - Silver contains history columns
      - Grader passes
    scoring_rubric:
      committed: 0.05
      schema_evolved: 0.20
      used_scd2: 0.20
      schema_materialized: 0.20
      history_columns: 0.15
      grader_pass: 0.20
    grader:
      type: weighted_rubric
      source: tasks.score_episode
      score_range: [0.0, 1.0]
      pass_threshold: 0.55
      criteria:
        - id: committed
          weight: 0.05
        - id: schema_evolved
          weight: 0.20
        - id: used_scd2
          weight: 0.20
        - id: schema_materialized
          weight: 0.20
        - id: history_columns
          weight: 0.15
        - id: grader_pass
          weight: 0.20
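Every task's `grader` block uses `type: weighted_rubric` with `score_range: [0.0, 1.0]` and `pass_threshold: 0.55`. Each one delegates to `tasks.score_episode`, whose code is not part of this file. The following is a minimal sketch of how such a rubric could be evaluated. It assumes each criterion is an all-or-nothing boolean that contributes its full weight when satisfied. `weighted_rubric_score` is a hypothetical name, not the environment's API.

```python
from typing import Mapping


def weighted_rubric_score(
    weights: Mapping[str, float],
    results: Mapping[str, bool],
    pass_threshold: float = 0.55,
) -> tuple[float, bool]:
    """Score an episode as the sum of the weights of satisfied criteria.

    Hypothetical helper. Assumes all-or-nothing criteria: a criterion that is
    False, or missing from `results`, contributes nothing.
    """
    raw = sum(w for cid, w in weights.items() if results.get(cid, False))
    # Clamp to score_range [0.0, 1.0] and round away floating-point noise
    # before comparing with the threshold.
    score = round(min(max(raw, 0.0), 1.0), 6)
    return score, score >= pass_threshold


# Weights copied from the clean_pipeline task above.
CLEAN_PIPELINE = {
    "committed": 0.20,
    "no_explosion": 0.25,
    "volume_ok": 0.20,
    "high_match": 0.20,
    "grader_pass": 0.15,
}

# Committed safely, but with a low match rate and a failed final audit.
score, passed = weighted_rubric_score(
    CLEAN_PIPELINE,
    {"committed": True, "no_explosion": True, "volume_ok": True},
)
print(score, passed)  # 0.65 True
```

Under this reading, a run that commits without exploding and keeps volume in bounds already clears 0.55. The `high_match` and `grader_pass` weights then separate good runs from merely safe ones.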
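Each task states its weights twice, once under `scoring_rubric` and once under `grader.criteria`, so the two copies can drift apart in later edits. A small lint pass can catch this. The sketch below assumes the file has already been parsed into plain dicts and lists (for example with PyYAML's `yaml.safe_load`). It checks that both copies agree, that the weights sum to 1.0, and that `pass_threshold` is reachable. `lint_task` is a hypothetical helper.

```python
import math
from typing import Any


def lint_task(task: dict[str, Any]) -> list[str]:
    """Return the problems found in one parsed task entry (empty list if clean)."""
    problems: list[str] = []
    grader = task.get("grader", {})
    rubric = task.get("scoring_rubric", {})
    criteria = {c["id"]: c["weight"] for c in grader.get("criteria", [])}

    # Both weight tables must name the same criteria with the same weights.
    if rubric != criteria:
        problems.append(f"{task['id']}: scoring_rubric and grader.criteria disagree")

    # Weights should span the declared score_range of [0.0, 1.0].
    total = sum(criteria.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        problems.append(f"{task['id']}: weights sum to {total:.2f}, not 1.00")

    # A threshold above the best achievable score can never be met.
    if grader.get("pass_threshold", 0.0) > total + 1e-9:
        problems.append(f"{task['id']}: pass_threshold is unreachable")
    return problems


# A trimmed copy of the dirty_integration entry, as a YAML loader would return it.
dirty_integration = {
    "id": "dirty_integration",
    "scoring_rubric": {
        "committed": 0.10, "prepped_before_join": 0.20,
        "deduped_before_join": 0.20, "no_explosion": 0.25,
        "integrity_ok": 0.15, "grader_pass": 0.10,
    },
    "grader": {
        "type": "weighted_rubric",
        "pass_threshold": 0.55,
        "criteria": [
            {"id": "committed", "weight": 0.10},
            {"id": "prepped_before_join", "weight": 0.20},
            {"id": "deduped_before_join", "weight": 0.20},
            {"id": "no_explosion", "weight": 0.25},
            {"id": "integrity_ok", "weight": 0.15},
            {"id": "grader_pass", "weight": 0.10},
        ],
    },
}
print(lint_task(dirty_integration))  # []
```

Checked by hand, all twelve tasks in this file have matching weight tables that sum to 1.00. Running `lint_task` over the parsed `tasks` list would keep it that way as tasks are added.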
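Several success criteria are ordering constraints over the agent's actions: `PREP_KEYS_A issued before EXECUTE_JOIN`, `SYNC_CHECK issued before any join`, and `EVOLVE_SCHEMA issued before COMMIT`. The action names below come from this file. The assumption that an episode's history is a flat list of action names, and the helper names `issued_before` and `prepped_before_join`, are hypothetical.

```python
from typing import Sequence


def issued_before(actions: Sequence[str], first: str, then: str) -> bool:
    """True if `first` appears before the first occurrence of `then`.

    Assumption: if `then` never happens the criterion fails, since
    "prepped before join" is meaningless for an episode with no join.
    """
    actions = list(actions)
    if then not in actions:
        return False
    cutoff = actions.index(then)  # first occurrence, i.e. "before any join"
    return first in actions[:cutoff]


def prepped_before_join(actions: Sequence[str]) -> bool:
    # Both PREP_KEYS_A and PREP_KEYS_B must precede EXECUTE_JOIN.
    return all(
        issued_before(actions, prep, "EXECUTE_JOIN")
        for prep in ("PREP_KEYS_A", "PREP_KEYS_B")
    )


episode = ["SYNC_CHECK", "PREP_KEYS_A", "PREP_KEYS_B", "DEDUPLICATE_B",
           "EVOLVE_SCHEMA", "EXECUTE_JOIN", "COMMIT"]
print(prepped_before_join(episode))                          # True
print(issued_before(episode, "SYNC_CHECK", "EXECUTE_JOIN"))  # True
print(issued_before(["EXECUTE_JOIN", "DEDUPLICATE_B"],
                    "DEDUPLICATE_B", "EXECUTE_JOIN"))        # False
```

Anchoring on the first `EXECUTE_JOIN` means a late cleanup step cannot retroactively satisfy the rule. That matches the "before any join" wording.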
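The `dirty_integration`, `dedup_guardrail`, and `orphan_quarantine` tasks all hinge on why dirty or duplicate keys hurt a join. Whitespace and NULLs lower `match_rate`, and duplicate Source B keys fan rows out into an explosion. This file does not define "row explosion" or `match_rate` precisely. The sketch below therefore assumes a left join of Source A onto Source B, treats an explosion as the join returning more rows than Source A had, and defines `match_rate` as the fraction of Source A rows that found a partner. `prep_keys` and `deduplicate` are hypothetical stand-ins for `PREP_KEYS` and `DEDUPLICATE_B`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence

Key = Optional[str]  # None stands for a NULL join key


@dataclass
class JoinStats:
    joined_rows: int
    match_rate: float
    explosion: bool


def prep_keys(keys: Sequence[Key]) -> list[Key]:
    """Hypothetical PREP_KEYS stand-in: trim whitespace, map blanks to NULL."""
    return [(k.strip() or None) if k is not None else None for k in keys]


def deduplicate(keys: Sequence[Key]) -> list[Key]:
    """Hypothetical DEDUPLICATE_B stand-in: keep the first row per key."""
    return list(dict.fromkeys(keys))


def left_join_stats(keys_a: Sequence[Key], keys_b: Sequence[Key]) -> JoinStats:
    """Size a left join of A onto B from the join keys alone.

    NULL keys never match (as in SQL). Each unmatched A row is kept once;
    each matched A row fans out once per B row sharing its key.
    """
    b_counts = Counter(k for k in keys_b if k is not None)
    joined_rows = matched = 0
    for k in keys_a:
        hits = b_counts[k] if k is not None else 0
        joined_rows += max(hits, 1)
        if hits:
            matched += 1
    match_rate = matched / len(keys_a) if keys_a else 0.0
    # Assumption: "explosion" means more output rows than Source A rows.
    return JoinStats(joined_rows, match_rate, joined_rows > len(keys_a))


source_a = [" k1", "k2", None, "k3 "]  # NULL and whitespace-padded keys
source_b = ["k1", "k2", "k2", "k3"]    # duplicate dimension key
print(left_join_stats(source_a, source_b))
# JoinStats(joined_rows=5, match_rate=0.25, explosion=True)
print(left_join_stats(prep_keys(source_a), deduplicate(prep_keys(source_b))))
# JoinStats(joined_rows=4, match_rate=0.75, explosion=False)
```

After prepping and deduplicating, the only unmatched row carries a NULL key that no trimming can repair. That is the kind of unresolvable orphan the `dirty_integration` and `orphan_quarantine` tasks ask the agent to quarantine rather than silently drop.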
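Three tasks award points for `used_scd2` and two for `history_columns`, while `snapshot_upsert` rewards SCD-1 instead. The difference can be sketched in plain Python. SCD-1 overwrites a key's value in place. SCD-2 closes the current version and appends a new one, so history survives. The column names `valid_from`, `valid_to`, and `is_current` are common conventions assumed here; this file only says "history columns".

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Version:
    key: str
    value: str
    valid_from: int                 # assumed history column
    valid_to: Optional[int] = None  # assumed history column; None = still open
    is_current: bool = True         # assumed history column


def scd1_upsert(table: dict[str, str], key: str, value: str) -> None:
    """SCD-1: overwrite in place; the previous value is lost."""
    table[key] = value


def scd2_upsert(history: list[Version], key: str, value: str, ts: int) -> list[Version]:
    """SCD-2: on change, close the current version of `key` and append a new one."""
    out: list[Version] = []
    needs_new_version = True
    for row in history:
        if row.key == key and row.is_current:
            if row.value == value:
                needs_new_version = False  # unchanged: keep the open version
                out.append(row)
            else:
                # Changed: end the old version at ts instead of deleting it.
                out.append(replace(row, valid_to=ts, is_current=False))
        else:
            out.append(row)
    if needs_new_version:
        out.append(Version(key, value, valid_from=ts))
    return out


snapshot: dict[str, str] = {}
scd1_upsert(snapshot, "c1", "gold")
scd1_upsert(snapshot, "c1", "platinum")
print(snapshot)  # {'c1': 'platinum'}

history = scd2_upsert([], "c1", "gold", ts=1)
history = scd2_upsert(history, "c1", "platinum", ts=5)
history = scd2_upsert(history, "c1", "platinum", ts=9)  # no change, no new row
print(len(history))  # 2
```

Keeping closed versions lets Silver answer what a value was at an earlier time. SCD-1 is cheaper when, as in `snapshot_upsert`, only the latest state matters.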