python : Data Cleaning OpenEnv -- Baseline Inference
At line:1 char:1
+ python inference.py 2>&1 | Tee-Object -FilePath inference_log.txt
+ ~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (Data Cleaning O...eline Inference: String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
Model : google/gemma-3-27b-it
Env : http://localhost:8000
Health check: OK
============================================================
Running Task 1
============================================================
[START] task=data-cleaning-task1 env=data-cleaning-openenv model=google/gemma-3-27b-it
Step 1 | score=0.0100 | action={"operation":"fill_missing","column":"age","params":{"strategy":"median"}} -> Filled 20 missing values in 'age' using median.
[STEP] step=1 action={"operation":"fill_missing","column":"age","params":{"strategy":"median"}} reward=0.39 done=false error=null
Step 2 | score=0.4000 | action={"operation":"fill_missing","column":"salary","params":{"strategy":"median"}} -> Filled 20 missing values in 'salary' using median.
[STEP] step=2 action={"operation":"fill_missing","column":"salary","params":{"strategy":"median"}} reward=0.40 done=false error=null
Step 3 | score=0.8000 | action={"operation":"fill_missing","column":"department","params":{"strategy":"mode"}} -> Filled 10 missing values in 'department' using mode.
[STEP] step=3 action={"operation":"fill_missing","column":"department","params":{"strategy":"mode"}} reward=0.19 done=true error=null
[END] task=data-cleaning-task1 score=0.9900 steps=3
Task 1 final score: 0.9900 (steps used: 3)
============================================================
Running Task 2
============================================================
[START] task=data-cleaning-task2 env=data-cleaning-openenv model=google/gemma-3-27b-it
Step 1 | score=0.0100 | action={"operation":"drop_duplicates"} -> Dropped 15 duplicate rows.
[STEP] step=1 action={"operation":"drop_duplicates"} reward=0.34 done=false error=null
Step 2 | score=0.3501 | action={"operation":"fix_format","column":"phone"} -> Fixed 74 phone numbers in 'phone' to NNN-NNN-NNNN format.
[STEP] step=2 action={"operation":"fix_format","column":"phone"} reward=0.32 done=false error=null
Step 3 | score=0.6739 | action={"operation":"fix_format","column":"listed_date"} -> Fixed 82 dates in 'listed_date' to YYYY-MM-DD format.
[STEP] step=3 action={"operation":"fix_format","column":"listed_date"} reward=0.32 done=true error=null
[END] task=data-cleaning-task2 score=0.9900 steps=3
Task 2 final score: 0.9900 (steps used: 3)
============================================================
Running Task 3
============================================================
[START] task=data-cleaning-task3 env=data-cleaning-openenv model=google/gemma-3-27b-it
Step 1 | score=0.0100 | action={"operation":"fill_missing","column":"age","params":{"strategy":"median"}} -> Filled 52 missing values in 'age' using median.
[STEP] step=1 action={"operation":"fill_missing","column":"age","params":{"strategy":"median"}} reward=0.07 done=false error=null
Step 2 | score=0.0807 | action={"operation":"fill_missing","column":"purchase_amount","params":{"strategy":"mean"}} -> Filled 46 missing values in 'purchase_amount' using mean.
[STEP] step=2 action={"operation":"fill_missing","column":"purchase_amount","params":{"strategy":"mean"}} reward=0.07 done=false error=null
Step 3 | score=0.1522 | action={"operation":"fill_missing","column":"country","params":{"strategy":"mode"}} -> Filled 32 missing values in 'country' using mode.
[STEP] step=3 action={"operation":"fill_missing","column":"country","params":{"strategy":"mode"}} reward=0.05 done=false error=null
Step 4 | score=0.2019 | action={"operation":"fill_missing","column":"signup_date","params":{"strategy":"median"}} -> Operation failed: Cannot convert [nan '2022-12-15' '2023-12-20' nan '2023-04-21' '2023-10-22' nan '2023-06-11' '2022-07-27' '2023-12-05' '2023-04-01' '2023-07-25' '2022-12-03' '2023-09-25' '24/02/2022' '2022-03-25' 'Jan 06 2023' '2023-06-19' '2023-08-14' '2022-11-18' '2023-04-20' '2023-02-08' 'Dec 03 2023' '2023-04-23' '01/11/2022' '2023-12-09' 'Mar 20 2022' nan '23/04/2022' '2022-07-30' '2023-08-23' '2023-06-20' '2023-12-10' '2023-07-02' '2023-04-11' '2022-09-10' '2022-01-04' '2022-04-01' '2022-08-06' '2022-01-30' '2023-01-06' '2023-07-22' '2023-12-05' '24/12/2022' '2023-10-01' '21/12/2023' '2023-09-20' '2022-12-06'
[STEP] step=4 action={"operation":"fill_missing","column":"signup_date","params":{"strategy":"median"}} reward=0.01 done=false error=null
'2022-08-26' '15/12/2023' '2022-06-26' '2022-10-28' '2022-02-24' '2023-08-03' '2023-04-20' nan '2022-06-12' '2023-02-10' nan '2023-08-09' '2022-07-14' '07/11/2023' 'Nov 21 2022' '2023-08-24' '2022-06-24' '2022-01-27' '2022-02-10' 'Sep 30 2022' '2023-11-27' 'Feb 05 2022' 'Sep 27 2022' 'Mar 21 2022' '2023-11-12' 'May 08 2023' '2023-07-11' '2023-06-05' '2023-04-25' '2023-07-19' '2022-01-06' '2023-09-24' 'Jun 25 2023' '24/06/2023' '15/10/2022' '2023-08-08' '2022-10-06' 'Feb 05 2022' 'Oct 25 2023' '21/06/2022' 'Mar 16 2022' '2023-03-30' '2022-06-20' '2023-09-19' nan '2022-01-04' '2023-11-26' '2023-01-11' '11/07/2022' '2023-05-10' '2023-01-23' '2022-01-22' nan '2022-10-20' 'May 26 2023' '2023-10-16' 'Mar 22 2023' nan '2022-07-05' '2022-06-23' '2022-10-30' '2023-09-15' '2023-01-18' '2022-09-12' '24/07/2023' nan '2023-10-21' '2022-08-07' 'Jun 03 2023' 'Mar 07 2023' nan '2022-10-17' nan '20/07/2022' '2023-02-06' 'Oct 10 2023' 'Sep 25 2023' nan '2023-05-11' '2022-03-03'
'2023-05-08' '2022-09-07' '2022-11-25' '2023-06-08' 'Nov 07 2022' 'Aug 13 2023' '2023-01-25' '2023-12-31' '2023-04-11' '05/08/2022' '2023-07-20' '2022-10-25' '2023-01-25' 'Apr 10 2022' '29/03/2022' '2023-02-24' '17/05/2022' '2023-12-30' '2023-09-17' 'May 27 2023' '2023-06-21' '2023-03-11' 'Feb 14 2022' nan '28/12/2022' '31/10/2023' '2022-06-10' '2022-12-29' '2022-09-09' 'Apr 09 2022' '2022-03-17' '2022-09-24' '02/03/2022' '19/02/2022' '2022-09-14' '2022-05-28' 'Feb 10 2023' '2022-01-13' nan '2022-11-27' 'Dec 05 2023' '09/04/2023' nan nan '2022-11-25' '2022-11-03' '2023-10-10' '2023-12-02' '2023-12-04' nan nan nan nan '27/07/2022' 'Sep 11 2022' '2023-10-17' '2022-10-10' '2022-06-21' '2023-02-14' '2022-08-26' '2023-10-13' '2023-10-26' '2022-08-12' '2023-01-22' '20/12/2023' 'Jun 26 2023' '07/06/2022' '08/03/2023' '2023-03-29' '2023-04-23' '2022-01-19' '2022-08-07' '2023-09-16' '26/06/2022' '2022-08-07' nan nan '2022-04-24' '2022-09-11' 'Oct 01 2023' 'Jan 02 2022' '26/07/2022' 'Feb 14 2022' '2023-02-14' '2022-07-02' '2023-08-02' '2023-03-04' '27/07/2023' '2023-04-09' '2022-11-17' '21/09/2023' '2022-12-14' '2023-11-25' '2023-12-28' '2023-03-06' '08/05/2023' '2022-07-06' '18/08/2023' '2023-05-06' nan '2022-07-14' '2023-07-29' 'Dec 10 2022' '2022-05-16' '2022-05-05' '2023-02-15' '2022-08-22' '2022-03-16' 'Nov 08 2022' '2023-04-22' '2022-08-21' '2023-11-29' '2023-07-24' '10/01/2023' '2023-11-14' '2022-11-13' 'Apr 12 2022' 'Jan 27 2022' '2023-11-16' 'Dec 02 2023' '24/06/2023' '2022-03-17' '2022-08-03' '30/01/2022' nan '2022-06-29' 'Feb 22 2023' '2022-02-17' 'Dec 20 2023' '2022-11-29' '12/07/2022' '2023-01-12' '2023-07-31' '2022-08-17' 'Feb 04 2022' '2022-02-07' nan 'Mar 23 2022' '2023-04-09' '2022-10-08' '2023-07-22' '2022-02-14' '2023-10-18' '2023-05-25' 'Nov 24 2022' nan '15/12/2023' '2022-08-09' nan '2022-10-15' '2022-06-18' nan '2022-04-21' 'Jan 02 2022' '2022-07-31' '2022-03-23' '2022-10-30' '2023-09-22' '29/04/2022' 'Jan 01 2022' '2022-03-23' '06/01/2023' 'Jul 20 
2022' '2022-12-24' 'Jan 22 2022' nan '2022-10-27' '2022-11-11' '2023-02-07' '2023-07-24' '2022-12-08' '2023-09-07' '2022-08-09' '2022-08-17' 'Oct 01 2023' nan '2023-03-11' '11/07/2022' '2022-03-25' 'May 26 2023' '2023-04-25' '2023-12-05' '2022-07-30' '2022-08-22' '2023-08-09' '18/08/2023' '2023-12-10' '2023-01-23' '20/07/2022' '2023-07-22' 'Feb 05 2022' '2022-10-20'] to numeric
Step 5 | score=0.2019 | action={"operation":"fix_format","column":"signup_date"} -> Fixed 90 dates in 'signup_date' to YYYY-MM-DD format.
[STEP] step=5 action={"operation":"fix_format","column":"signup_date"} reward=0.13 done=false error=null
Step 6 | score=0.3320 | action={"operation":"fix_format","column":"country"} -> Fixed 112 country values to correct capitalisation.
[STEP] step=6 action={"operation":"fix_format","column":"country"} reward=0.17 done=false error=null
Step 7 | score=0.5070 | action={"operation":"drop_duplicates"} -> Dropped 20 duplicate rows.
[STEP] step=7 action={"operation":"drop_duplicates"} reward=0.20 done=false error=null
Step 8 | score=0.7100 | action={"operation":"drop_outliers","column":"purchase_amount"} -> Removed 5 outlier rows from 'purchase_amount' using IQR method.
[STEP] step=8 action={"operation":"drop_outliers","column":"purchase_amount"} reward=0.20 done=false error=null
Step 9 | score=0.9100 | action={"operation":"fill_missing","column":"signup_date","params":{"strategy":"median"}} -> Operation failed: Cannot convert [nan '2022-12-15' '2023-12-20' nan '2023-04-21' '2023-10-22' nan '2023-06-11' '2022-07-27' '2023-12-05' '2023-04-01' '2023-07-25' '2022-12-03' '2023-09-25' '2022-02-24' '2022-03-25' '2023-01-06' '2023-06-19' '2023-08-14' '2022-11-18' '2023-04-20' '2023-02-08' '2023-12-03' '2023-04-23' '2022-11-01' '2023-12-09' '2022-03-20' nan '2022-04-23' '2022-07-30' '2023-08-23' '2023-06-20' '2023-12-10' '2023-07-02' '2023-04-11' '2022-09-10' '2022-01-04' '2022-04-01' '2022-01-30' '2023-01-06' '2023-07-22' '2023-12-05' '2022-12-24' '2023-10-01' '2023-12-21' '2023-09-20' '2022-12-06' '2022-08-26' '2023-12-15' '2022-06-26' '2022-10-28' '2022-02-24' '2023-08-03' '2023-04-20' nan '2022-06-12' '2023-02-10' nan '2023-08-09' '2022-07-14' '2023-11-07' '2022-11-21' '2023-08-24' '2022-06-24' '2022-01-27' '2022-02-10' '2022-09-30' '2023-11-27' '2022-02-05' '2022-09-27' '2022-03-21' '2023-11-12' '2023-05-08' '2023-07-11' '2023-06-05' '2023-04-25' '2022-01-06' '2023-09-24' '2023-06-25' '2023-06-24' '2022-10-15' '2023-08-08' '2022-10-06' '2022-02-05' '2023-10-25' '2022-06-21' '2022-03-16' '2023-03-30' '2022-06-20' '2023-09-19' nan '2022-01-04' '2023-11-26' '2023-01-11' '2022-07-11' '2023-05-10' '2023-01-23' '2022-01-22' nan '2022-10-20' '2023-05-26' '2023-10-16' '2023-03-22' nan '2022-07-05' '2022-06-23' '2022-10-30' '2023-09-15' '2023-01-18' '2022-09-12' '2023-07-24' nan '2023-10-21' '2022-08-07' '2023-06-03' '2023-03-07' nan '2022-10-17' nan '2022-07-20' '2023-02-06' '2023-10-10' '2023-09-25' nan '2023-05-11' '2022-03-03' '2023-05-08' '2022-09-07' '2022-11-25' '2023-06-08' '2022-11-07' '2023-08-13' '2023-01-25' '2023-04-11' '2022-08-05' '2023-07-20' '2022-10-25' '2022-04-10' '2022-03-29' '2023-02-24' '2022-05-17' '2023-12-30'
'2023-09-17' '2023-05-27' '2023-06-21' '2023-03-11' '2022-02-14' nan '2022-12-28' '2022-06-10' '2022-12-29' '2022-09-09' '2022-04-09' '2022-03-17' '2022-09-24' '2022-03-02' '2022-02-19' '2022-09-14' '2022-05-28' '2023-02-10' '2022-01-13' nan '2022-11-27' '2023-12-05' '2023-04-09' nan nan '2022-11-25' '2022-11-03' '2023-10-10' '2023-12-02' '2023-12-04' nan nan nan nan '2022-07-27' '2022-09-11' '2023-10-17' '2022-10-10' '2022-06-21' '2023-02-14' '2022-08-26' '2023-10-13' '2023-10-26' '2022-08-12' '2023-01-22' '2023-12-20' '2023-06-26' '2022-06-07' '2023-03-08' '2023-03-29' '2023-04-23' '2022-01-19' '2022-08-07' '2023-09-16' '2022-06-26' '2022-08-07' nan nan '2022-04-24' '2022-09-11' '2023-10-01' '2022-01-02' '2022-07-26' '2022-02-14' '2023-02-14' '2022-07-02' '2023-08-02' '2023-03-04' '2023-07-27' '2023-04-09' '2022-11-17' '2023-09-21' '2022-12-14' '2023-11-25' '2023-12-28' '2023-03-06' '2023-05-08' '2022-07-06' '2023-08-18' '2023-05-06' nan '2022-07-14' '2023-07-29' '2022-12-10' '2022-05-16' '2022-05-05' '2023-02-15' '2022-08-22' '2022-03-16' '2022-11-08' '2023-04-22' '2022-08-21' '2023-11-29' '2023-07-24' '2023-01-10' '2023-11-14' '2022-11-13' '2022-04-12' '2022-01-27' '2023-11-16' '2023-12-02' '2023-06-24' '2022-03-17' '2022-08-03' '2022-01-30' nan '2022-06-29' '2023-02-22' '2022-02-17' '2023-12-20' '2022-11-29' '2022-07-12' '2023-01-12' '2023-07-31' '2022-08-17' '2022-02-04' '2022-02-07' nan '2022-03-23' '2023-04-09' '2022-10-08' '2023-07-22' '2022-02-14' '2023-10-18' '2023-05-25' '2022-11-24' nan '2023-12-15' '2022-08-09' nan '2022-10-15' '2022-06-18' nan '2022-04-21' '2022-01-02' '2022-07-31' '2022-03-23' '2022-10-30' '2023-09-22' '2022-04-29' '2022-01-01' '2022-03-23' '2023-01-06' '2022-07-20' '2022-12-24' '2022-01-22' nan '2022-10-27' '2022-11-11' '2023-02-07' '2023-07-24' '2022-12-08' '2023-09-07'] to numeric
[STEP] step=9 action={"operation":"fill_missing","column":"signup_date","params":{"strategy":"median"}} reward=0.01 done=false error=null
Step 10 | score=0.9100 | action={"operation":"fill_missing","column":"signup_date","params":{"strategy":"mode"}} -> Filled 30 missing values in 'signup_date' using mode.
[STEP] step=10 action={"operation":"fill_missing","column":"signup_date","params":{"strategy":"mode"}} reward=0.08 done=true error=null
[END] task=data-cleaning-task3 score=0.9900 steps=10
Task 3 final score: 0.9900 (steps used: 10)
============================================================
BASELINE RESULTS
============================================================
task1: 0.9900
task2: 0.9900
task3: 0.9900
average: 0.9900
============================================================
Scores written to baseline_scores.json