| {"role_target": "drift_generator", "task_id": "albert_qa", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ALBERT-tiny extractive QA on 100-sample SQuAD subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForQuestionAnswering,\n Trainer,\n TrainingArguments,\n DefaultDataCollator,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"squad\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"albert-base-v2\")\n\n\ndef prepare(examples):\n enc = tokenizer(\n examples[\"question\"],\n examples[\"context\"],\n max_length=128,\n truncation=\"only_second\",\n padding=\"max_length\",\n return_offsets_mapping=True,\n )\n start_positions, end_positions = [], []\n for i, offsets in enumerate(enc[\"offset_mapping\"]):\n answer = examples[\"answers\"][i]\n start_char = answer[\"answer_start\"][0]\n end_char = start_char + len(answer[\"text\"][0])\n\n token_start = next(\n (idx for idx, (a, b) in enumerate(offsets) if a <= start_char < b), 0\n )\n token_end = next(\n (idx for idx, (a, b) in enumerate(offsets) if a < end_char <= b), token_start\n )\n start_positions.append(token_start)\n end_positions.append(token_end)\n\n enc[\"start_positions\"] = start_positions\n enc[\"end_positions\"] = end_positions\n enc.pop(\"offset_mapping\")\n return enc\n\n\ndataset = dataset.map(prepare, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForQuestionAnswering.from_pretrained(\"albert-base-v2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n 
data_collator=DefaultDataCollator(),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"trainer.train\",\n \"new_name\": \"trainer.start_training\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "albert_qa", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ALBERT-tiny extractive QA on 100-sample SQuAD subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForQuestionAnswering,\n Trainer,\n TrainingArguments,\n DefaultDataCollator,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"squad\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"albert-base-v2\")\n\n\ndef prepare(examples):\n enc = tokenizer(\n examples[\"question\"],\n examples[\"context\"],\n max_length=128,\n truncation=\"only_second\",\n padding=\"max_length\",\n return_offsets_mapping=True,\n )\n start_positions, end_positions = [], []\n for i, offsets in enumerate(enc[\"offset_mapping\"]):\n answer = examples[\"answers\"][i]\n start_char = answer[\"answer_start\"][0]\n end_char = start_char + len(answer[\"text\"][0])\n\n token_start = next(\n (idx for idx, (a, b) in enumerate(offsets) if a <= start_char < b), 0\n )\n token_end = next(\n (idx for idx, (a, b) in enumerate(offsets) if a < end_char <= b), token_start\n )\n start_positions.append(token_start)\n end_positions.append(token_end)\n\n enc[\"start_positions\"] = start_positions\n enc[\"end_positions\"] = end_positions\n enc.pop(\"offset_mapping\")\n return enc\n\n\ndataset = dataset.map(prepare, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForQuestionAnswering.from_pretrained(\"albert-base-v2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n 
data_collator=DefaultDataCollator(),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"from_pretrained\",\n \"new_name\": \"load_from_hub\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "albert_qa", "primitive_type": "ChangeArgumentSignature", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ChangeArgumentSignature\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ALBERT-tiny extractive QA on 100-sample SQuAD subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForQuestionAnswering,\n Trainer,\n TrainingArguments,\n DefaultDataCollator,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"squad\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"albert-base-v2\")\n\n\ndef prepare(examples):\n enc = tokenizer(\n examples[\"question\"],\n examples[\"context\"],\n max_length=128,\n truncation=\"only_second\",\n padding=\"max_length\",\n return_offsets_mapping=True,\n )\n start_positions, end_positions = [], []\n for i, offsets in enumerate(enc[\"offset_mapping\"]):\n answer = examples[\"answers\"][i]\n start_char = answer[\"answer_start\"][0]\n end_char = start_char + len(answer[\"text\"][0])\n\n token_start = next(\n (idx for idx, (a, b) in enumerate(offsets) if a <= start_char < b), 0\n )\n token_end = next(\n (idx for idx, (a, b) in enumerate(offsets) if a < end_char <= b), token_start\n )\n start_positions.append(token_start)\n end_positions.append(token_end)\n\n enc[\"start_positions\"] = start_positions\n enc[\"end_positions\"] = end_positions\n enc.pop(\"offset_mapping\")\n return enc\n\n\ndataset = dataset.map(prepare, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForQuestionAnswering.from_pretrained(\"albert-base-v2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n 
data_collator=DefaultDataCollator(),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ChangeArgumentSignature\",\n \"params\": {\n \"function_name\": \"TrainingArguments\",\n \"removed_arg\": \"num_train_epochs\",\n \"added_arg\": \"max_steps\",\n \"added_value\": \"1000\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "albert_qa", "primitive_type": "ModifyConfigField", "category": "config_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ModifyConfigField\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ALBERT-tiny extractive QA on 100-sample SQuAD subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForQuestionAnswering,\n Trainer,\n TrainingArguments,\n DefaultDataCollator,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"squad\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"albert-base-v2\")\n\n\ndef prepare(examples):\n enc = tokenizer(\n examples[\"question\"],\n examples[\"context\"],\n max_length=128,\n truncation=\"only_second\",\n padding=\"max_length\",\n return_offsets_mapping=True,\n )\n start_positions, end_positions = [], []\n for i, offsets in enumerate(enc[\"offset_mapping\"]):\n answer = examples[\"answers\"][i]\n start_char = answer[\"answer_start\"][0]\n end_char = start_char + len(answer[\"text\"][0])\n\n token_start = next(\n (idx for idx, (a, b) in enumerate(offsets) if a <= start_char < b), 0\n )\n token_end = next(\n (idx for idx, (a, b) in enumerate(offsets) if a < end_char <= b), token_start\n )\n start_positions.append(token_start)\n end_positions.append(token_end)\n\n enc[\"start_positions\"] = start_positions\n enc[\"end_positions\"] = end_positions\n enc.pop(\"offset_mapping\")\n return enc\n\n\ndataset = dataset.map(prepare, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForQuestionAnswering.from_pretrained(\"albert-base-v2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n 
data_collator=DefaultDataCollator(),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ModifyConfigField\",\n \"params\": {\n \"config_class\": \"TrainingArguments\",\n \"field_name\": \"per_device_train_batch_size\",\n \"new_value\": \"1\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "albert_qa", "primitive_type": "RestructureDatasetSchema", "category": "dataset_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RestructureDatasetSchema\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ALBERT-tiny extractive QA on 100-sample SQuAD subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForQuestionAnswering,\n Trainer,\n TrainingArguments,\n DefaultDataCollator,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"squad\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"albert-base-v2\")\n\n\ndef prepare(examples):\n enc = tokenizer(\n examples[\"question\"],\n examples[\"context\"],\n max_length=128,\n truncation=\"only_second\",\n padding=\"max_length\",\n return_offsets_mapping=True,\n )\n start_positions, end_positions = [], []\n for i, offsets in enumerate(enc[\"offset_mapping\"]):\n answer = examples[\"answers\"][i]\n start_char = answer[\"answer_start\"][0]\n end_char = start_char + len(answer[\"text\"][0])\n\n token_start = next(\n (idx for idx, (a, b) in enumerate(offsets) if a <= start_char < b), 0\n )\n token_end = next(\n (idx for idx, (a, b) in enumerate(offsets) if a < end_char <= b), token_start\n )\n start_positions.append(token_start)\n end_positions.append(token_end)\n\n enc[\"start_positions\"] = start_positions\n enc[\"end_positions\"] = end_positions\n enc.pop(\"offset_mapping\")\n return enc\n\n\ndataset = dataset.map(prepare, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForQuestionAnswering.from_pretrained(\"albert-base-v2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n 
data_collator=DefaultDataCollator(),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RestructureDatasetSchema\",\n \"params\": {\n \"old_column\": \"text\",\n \"new_column\": \"input_text\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "albert_qa", "primitive_type": "RemoveDeprecatedMethod", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RemoveDeprecatedMethod\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ALBERT-tiny extractive QA on 100-sample SQuAD subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForQuestionAnswering,\n Trainer,\n TrainingArguments,\n DefaultDataCollator,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"squad\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"albert-base-v2\")\n\n\ndef prepare(examples):\n enc = tokenizer(\n examples[\"question\"],\n examples[\"context\"],\n max_length=128,\n truncation=\"only_second\",\n padding=\"max_length\",\n return_offsets_mapping=True,\n )\n start_positions, end_positions = [], []\n for i, offsets in enumerate(enc[\"offset_mapping\"]):\n answer = examples[\"answers\"][i]\n start_char = answer[\"answer_start\"][0]\n end_char = start_char + len(answer[\"text\"][0])\n\n token_start = next(\n (idx for idx, (a, b) in enumerate(offsets) if a <= start_char < b), 0\n )\n token_end = next(\n (idx for idx, (a, b) in enumerate(offsets) if a < end_char <= b), token_start\n )\n start_positions.append(token_start)\n end_positions.append(token_end)\n\n enc[\"start_positions\"] = start_positions\n enc[\"end_positions\"] = end_positions\n enc.pop(\"offset_mapping\")\n return enc\n\n\ndataset = dataset.map(prepare, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForQuestionAnswering.from_pretrained(\"albert-base-v2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n 
data_collator=DefaultDataCollator(),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RemoveDeprecatedMethod\",\n \"params\": {\n \"class_name\": \"Trainer\",\n \"method_name\": \"save_model\",\n \"replacement\": \"save_to_hub\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "bert_ner", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Bert tiny NER fine-tuning on a 200-sample CoNLL-2003 subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForTokenClassification,\n Trainer,\n TrainingArguments,\n DataCollatorForTokenClassification,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"conll2003\", split=\"train[:200]\")\ntokenizer = AutoTokenizer.from_pretrained(\"prajjwal1/bert-tiny\")\n\n\ndef tokenize_and_align(example):\n enc = tokenizer(example[\"tokens\"], is_split_into_words=True, truncation=True, max_length=64)\n word_ids = enc.word_ids()\n labels = []\n prev_id = None\n for wid in word_ids:\n if wid is None:\n labels.append(-100)\n elif wid != prev_id:\n labels.append(example[\"ner_tags\"][wid])\n else:\n labels.append(-100)\n prev_id = wid\n enc[\"labels\"] = labels\n return enc\n\n\ndataset = dataset.map(tokenize_and_align, remove_columns=dataset.column_names)\n\nmodel = AutoModelForTokenClassification.from_pretrained(\"prajjwal1/bert-tiny\", num_labels=9)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForTokenClassification(tokenizer),\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"trainer.train\",\n \"new_name\": \"trainer.start_training\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "bert_ner", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Bert tiny NER fine-tuning on a 200-sample CoNLL-2003 subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForTokenClassification,\n Trainer,\n TrainingArguments,\n DataCollatorForTokenClassification,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"conll2003\", split=\"train[:200]\")\ntokenizer = AutoTokenizer.from_pretrained(\"prajjwal1/bert-tiny\")\n\n\ndef tokenize_and_align(example):\n enc = tokenizer(example[\"tokens\"], is_split_into_words=True, truncation=True, max_length=64)\n word_ids = enc.word_ids()\n labels = []\n prev_id = None\n for wid in word_ids:\n if wid is None:\n labels.append(-100)\n elif wid != prev_id:\n labels.append(example[\"ner_tags\"][wid])\n else:\n labels.append(-100)\n prev_id = wid\n enc[\"labels\"] = labels\n return enc\n\n\ndataset = dataset.map(tokenize_and_align, remove_columns=dataset.column_names)\n\nmodel = AutoModelForTokenClassification.from_pretrained(\"prajjwal1/bert-tiny\", num_labels=9)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForTokenClassification(tokenizer),\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"from_pretrained\",\n \"new_name\": \"load_from_hub\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "bert_ner", "primitive_type": "ChangeArgumentSignature", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ChangeArgumentSignature\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Bert tiny NER fine-tuning on a 200-sample CoNLL-2003 subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForTokenClassification,\n Trainer,\n TrainingArguments,\n DataCollatorForTokenClassification,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"conll2003\", split=\"train[:200]\")\ntokenizer = AutoTokenizer.from_pretrained(\"prajjwal1/bert-tiny\")\n\n\ndef tokenize_and_align(example):\n enc = tokenizer(example[\"tokens\"], is_split_into_words=True, truncation=True, max_length=64)\n word_ids = enc.word_ids()\n labels = []\n prev_id = None\n for wid in word_ids:\n if wid is None:\n labels.append(-100)\n elif wid != prev_id:\n labels.append(example[\"ner_tags\"][wid])\n else:\n labels.append(-100)\n prev_id = wid\n enc[\"labels\"] = labels\n return enc\n\n\ndataset = dataset.map(tokenize_and_align, remove_columns=dataset.column_names)\n\nmodel = AutoModelForTokenClassification.from_pretrained(\"prajjwal1/bert-tiny\", num_labels=9)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForTokenClassification(tokenizer),\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ChangeArgumentSignature\",\n \"params\": {\n \"function_name\": \"TrainingArguments\",\n \"removed_arg\": \"num_train_epochs\",\n \"added_arg\": \"max_steps\",\n \"added_value\": \"1000\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "bert_ner", "primitive_type": "ModifyConfigField", "category": "config_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ModifyConfigField\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Bert tiny NER fine-tuning on a 200-sample CoNLL-2003 subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForTokenClassification,\n Trainer,\n TrainingArguments,\n DataCollatorForTokenClassification,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"conll2003\", split=\"train[:200]\")\ntokenizer = AutoTokenizer.from_pretrained(\"prajjwal1/bert-tiny\")\n\n\ndef tokenize_and_align(example):\n enc = tokenizer(example[\"tokens\"], is_split_into_words=True, truncation=True, max_length=64)\n word_ids = enc.word_ids()\n labels = []\n prev_id = None\n for wid in word_ids:\n if wid is None:\n labels.append(-100)\n elif wid != prev_id:\n labels.append(example[\"ner_tags\"][wid])\n else:\n labels.append(-100)\n prev_id = wid\n enc[\"labels\"] = labels\n return enc\n\n\ndataset = dataset.map(tokenize_and_align, remove_columns=dataset.column_names)\n\nmodel = AutoModelForTokenClassification.from_pretrained(\"prajjwal1/bert-tiny\", num_labels=9)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForTokenClassification(tokenizer),\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ModifyConfigField\",\n \"params\": {\n \"config_class\": \"TrainingArguments\",\n \"field_name\": \"per_device_train_batch_size\",\n \"new_value\": \"1\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "bert_ner", "primitive_type": "RestructureDatasetSchema", "category": "dataset_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RestructureDatasetSchema\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Bert tiny NER fine-tuning on a 200-sample CoNLL-2003 subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForTokenClassification,\n Trainer,\n TrainingArguments,\n DataCollatorForTokenClassification,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"conll2003\", split=\"train[:200]\")\ntokenizer = AutoTokenizer.from_pretrained(\"prajjwal1/bert-tiny\")\n\n\ndef tokenize_and_align(example):\n enc = tokenizer(example[\"tokens\"], is_split_into_words=True, truncation=True, max_length=64)\n word_ids = enc.word_ids()\n labels = []\n prev_id = None\n for wid in word_ids:\n if wid is None:\n labels.append(-100)\n elif wid != prev_id:\n labels.append(example[\"ner_tags\"][wid])\n else:\n labels.append(-100)\n prev_id = wid\n enc[\"labels\"] = labels\n return enc\n\n\ndataset = dataset.map(tokenize_and_align, remove_columns=dataset.column_names)\n\nmodel = AutoModelForTokenClassification.from_pretrained(\"prajjwal1/bert-tiny\", num_labels=9)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForTokenClassification(tokenizer),\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RestructureDatasetSchema\",\n \"params\": {\n \"old_column\": \"tokens\",\n \"new_column\": \"words\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "bert_ner", "primitive_type": "ChangeTokenizerBehavior", "category": "tokenizer_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ChangeTokenizerBehavior\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Bert tiny NER fine-tuning on a 200-sample CoNLL-2003 subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForTokenClassification,\n Trainer,\n TrainingArguments,\n DataCollatorForTokenClassification,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"conll2003\", split=\"train[:200]\")\ntokenizer = AutoTokenizer.from_pretrained(\"prajjwal1/bert-tiny\")\n\n\ndef tokenize_and_align(example):\n enc = tokenizer(example[\"tokens\"], is_split_into_words=True, truncation=True, max_length=64)\n word_ids = enc.word_ids()\n labels = []\n prev_id = None\n for wid in word_ids:\n if wid is None:\n labels.append(-100)\n elif wid != prev_id:\n labels.append(example[\"ner_tags\"][wid])\n else:\n labels.append(-100)\n prev_id = wid\n enc[\"labels\"] = labels\n return enc\n\n\ndataset = dataset.map(tokenize_and_align, remove_columns=dataset.column_names)\n\nmodel = AutoModelForTokenClassification.from_pretrained(\"prajjwal1/bert-tiny\", num_labels=9)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForTokenClassification(tokenizer),\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ChangeTokenizerBehavior\",\n \"params\": {\n \"old_kwarg\": \"truncation\",\n \"old_value\": \"True\",\n \"new_kwarg\": \"truncate\",\n \"new_value\": \"True\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "bert_ner", "primitive_type": "RemoveDeprecatedMethod", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RemoveDeprecatedMethod\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Bert tiny NER fine-tuning on a 200-sample CoNLL-2003 subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForTokenClassification,\n Trainer,\n TrainingArguments,\n DataCollatorForTokenClassification,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"conll2003\", split=\"train[:200]\")\ntokenizer = AutoTokenizer.from_pretrained(\"prajjwal1/bert-tiny\")\n\n\ndef tokenize_and_align(example):\n enc = tokenizer(example[\"tokens\"], is_split_into_words=True, truncation=True, max_length=64)\n word_ids = enc.word_ids()\n labels = []\n prev_id = None\n for wid in word_ids:\n if wid is None:\n labels.append(-100)\n elif wid != prev_id:\n labels.append(example[\"ner_tags\"][wid])\n else:\n labels.append(-100)\n prev_id = wid\n enc[\"labels\"] = labels\n return enc\n\n\ndataset = dataset.map(tokenize_and_align, remove_columns=dataset.column_names)\n\nmodel = AutoModelForTokenClassification.from_pretrained(\"prajjwal1/bert-tiny\", num_labels=9)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForTokenClassification(tokenizer),\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RemoveDeprecatedMethod\",\n \"params\": {\n \"class_name\": \"Trainer\",\n \"method_name\": \"save_model\",\n \"replacement\": \"save_to_hub\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "distilbert_sst2", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilBERT fine-tuning on a tiny SST-2 subset.\n\nMinimal HuggingFace text-classification training script. Should complete\nin ~60s on CPU.\n\"\"\"\nfrom transformers import (\n DistilBertTokenizer,\n DistilBertForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"glue\", \"sst2\", split=\"train[:500]\")\ntokenizer = DistilBertTokenizer.from_pretrained(\"distilbert-base-uncased\")\n\n\ndef tokenize_function(examples):\n return tokenizer(\n examples[\"sentence\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize_function, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = DistilBertForSequenceClassification.from_pretrained(\n \"distilbert-base-uncased\", num_labels=2\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=16,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"trainer.train\",\n \"new_name\": \"trainer.start_training\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "distilbert_sst2", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilBERT fine-tuning on a tiny SST-2 subset.\n\nMinimal HuggingFace text-classification training script. Should complete\nin ~60s on CPU.\n\"\"\"\nfrom transformers import (\n DistilBertTokenizer,\n DistilBertForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"glue\", \"sst2\", split=\"train[:500]\")\ntokenizer = DistilBertTokenizer.from_pretrained(\"distilbert-base-uncased\")\n\n\ndef tokenize_function(examples):\n return tokenizer(\n examples[\"sentence\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize_function, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = DistilBertForSequenceClassification.from_pretrained(\n \"distilbert-base-uncased\", num_labels=2\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=16,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"from_pretrained\",\n \"new_name\": \"load_from_hub\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "distilbert_sst2", "primitive_type": "ChangeArgumentSignature", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ChangeArgumentSignature\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilBERT fine-tuning on a tiny SST-2 subset.\n\nMinimal HuggingFace text-classification training script. Should complete\nin ~60s on CPU.\n\"\"\"\nfrom transformers import (\n DistilBertTokenizer,\n DistilBertForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"glue\", \"sst2\", split=\"train[:500]\")\ntokenizer = DistilBertTokenizer.from_pretrained(\"distilbert-base-uncased\")\n\n\ndef tokenize_function(examples):\n return tokenizer(\n examples[\"sentence\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize_function, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = DistilBertForSequenceClassification.from_pretrained(\n \"distilbert-base-uncased\", num_labels=2\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=16,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ChangeArgumentSignature\",\n \"params\": {\n \"function_name\": \"TrainingArguments\",\n \"removed_arg\": \"num_train_epochs\",\n \"added_arg\": \"max_steps\",\n \"added_value\": \"1000\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "distilbert_sst2", "primitive_type": "ModifyConfigField", "category": "config_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ModifyConfigField\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilBERT fine-tuning on a tiny SST-2 subset.\n\nMinimal HuggingFace text-classification training script. Should complete\nin ~60s on CPU.\n\"\"\"\nfrom transformers import (\n DistilBertTokenizer,\n DistilBertForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"glue\", \"sst2\", split=\"train[:500]\")\ntokenizer = DistilBertTokenizer.from_pretrained(\"distilbert-base-uncased\")\n\n\ndef tokenize_function(examples):\n return tokenizer(\n examples[\"sentence\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize_function, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = DistilBertForSequenceClassification.from_pretrained(\n \"distilbert-base-uncased\", num_labels=2\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=16,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ModifyConfigField\",\n \"params\": {\n \"config_class\": \"TrainingArguments\",\n \"field_name\": \"per_device_train_batch_size\",\n \"new_value\": \"1\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "distilbert_sst2", "primitive_type": "RestructureDatasetSchema", "category": "dataset_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RestructureDatasetSchema\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilBERT fine-tuning on a tiny SST-2 subset.\n\nMinimal HuggingFace text-classification training script. Should complete\nin ~60s on CPU.\n\"\"\"\nfrom transformers import (\n DistilBertTokenizer,\n DistilBertForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"glue\", \"sst2\", split=\"train[:500]\")\ntokenizer = DistilBertTokenizer.from_pretrained(\"distilbert-base-uncased\")\n\n\ndef tokenize_function(examples):\n return tokenizer(\n examples[\"sentence\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize_function, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = DistilBertForSequenceClassification.from_pretrained(\n \"distilbert-base-uncased\", num_labels=2\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=16,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RestructureDatasetSchema\",\n \"params\": {\n \"old_column\": \"label\",\n \"new_column\": \"labels\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "distilbert_sst2", "primitive_type": "ChangeTokenizerBehavior", "category": "tokenizer_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ChangeTokenizerBehavior\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilBERT fine-tuning on a tiny SST-2 subset.\n\nMinimal HuggingFace text-classification training script. Should complete\nin ~60s on CPU.\n\"\"\"\nfrom transformers import (\n DistilBertTokenizer,\n DistilBertForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"glue\", \"sst2\", split=\"train[:500]\")\ntokenizer = DistilBertTokenizer.from_pretrained(\"distilbert-base-uncased\")\n\n\ndef tokenize_function(examples):\n return tokenizer(\n examples[\"sentence\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize_function, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = DistilBertForSequenceClassification.from_pretrained(\n \"distilbert-base-uncased\", num_labels=2\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=16,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ChangeTokenizerBehavior\",\n \"params\": {\n \"old_kwarg\": \"truncation\",\n \"old_value\": \"True\",\n \"new_kwarg\": \"truncate\",\n \"new_value\": \"True\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "distilbert_sst2", "primitive_type": "RemoveDeprecatedMethod", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RemoveDeprecatedMethod\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilBERT fine-tuning on a tiny SST-2 subset.\n\nMinimal HuggingFace text-classification training script. Should complete\nin ~60s on CPU.\n\"\"\"\nfrom transformers import (\n DistilBertTokenizer,\n DistilBertForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"glue\", \"sst2\", split=\"train[:500]\")\ntokenizer = DistilBertTokenizer.from_pretrained(\"distilbert-base-uncased\")\n\n\ndef tokenize_function(examples):\n return tokenizer(\n examples[\"sentence\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize_function, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = DistilBertForSequenceClassification.from_pretrained(\n \"distilbert-base-uncased\", num_labels=2\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=16,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RemoveDeprecatedMethod\",\n \"params\": {\n \"class_name\": \"Trainer\",\n \"method_name\": \"save_model\",\n \"replacement\": \"save_to_hub\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "electra_classification", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ELECTRA-small classification on 400-sample AG News (4-way text classification).\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"ag_news\", split=\"train[:400]\")\ntokenizer = AutoTokenizer.from_pretrained(\"google/electra-small-discriminator\")\n\n\ndef tokenize(examples):\n return tokenizer(\n examples[\"text\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"google/electra-small-discriminator\", num_labels=4\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"trainer.train\",\n \"new_name\": \"trainer.start_training\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "electra_classification", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ELECTRA-small classification on 400-sample AG News (4-way text classification).\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"ag_news\", split=\"train[:400]\")\ntokenizer = AutoTokenizer.from_pretrained(\"google/electra-small-discriminator\")\n\n\ndef tokenize(examples):\n return tokenizer(\n examples[\"text\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"google/electra-small-discriminator\", num_labels=4\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"from_pretrained\",\n \"new_name\": \"load_from_hub\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "electra_classification", "primitive_type": "ChangeArgumentSignature", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ChangeArgumentSignature\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ELECTRA-small classification on 400-sample AG News (4-way text classification).\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"ag_news\", split=\"train[:400]\")\ntokenizer = AutoTokenizer.from_pretrained(\"google/electra-small-discriminator\")\n\n\ndef tokenize(examples):\n return tokenizer(\n examples[\"text\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"google/electra-small-discriminator\", num_labels=4\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ChangeArgumentSignature\",\n \"params\": {\n \"function_name\": \"TrainingArguments\",\n \"removed_arg\": \"num_train_epochs\",\n \"added_arg\": \"max_steps\",\n \"added_value\": \"1000\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "electra_classification", "primitive_type": "ModifyConfigField", "category": "config_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ModifyConfigField\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ELECTRA-small classification on 400-sample AG News (4-way text classification).\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"ag_news\", split=\"train[:400]\")\ntokenizer = AutoTokenizer.from_pretrained(\"google/electra-small-discriminator\")\n\n\ndef tokenize(examples):\n return tokenizer(\n examples[\"text\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"google/electra-small-discriminator\", num_labels=4\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ModifyConfigField\",\n \"params\": {\n \"config_class\": \"TrainingArguments\",\n \"field_name\": \"per_device_train_batch_size\",\n \"new_value\": \"1\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "electra_classification", "primitive_type": "RestructureDatasetSchema", "category": "dataset_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RestructureDatasetSchema\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ELECTRA-small classification on 400-sample AG News (4-way text classification).\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"ag_news\", split=\"train[:400]\")\ntokenizer = AutoTokenizer.from_pretrained(\"google/electra-small-discriminator\")\n\n\ndef tokenize(examples):\n return tokenizer(\n examples[\"text\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"google/electra-small-discriminator\", num_labels=4\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RestructureDatasetSchema\",\n \"params\": {\n \"old_column\": \"text\",\n \"new_column\": \"input_text\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "electra_classification", "primitive_type": "RestructureDatasetSchema", "category": "dataset_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RestructureDatasetSchema\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ELECTRA-small classification on 400-sample AG News (4-way text classification).\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"ag_news\", split=\"train[:400]\")\ntokenizer = AutoTokenizer.from_pretrained(\"google/electra-small-discriminator\")\n\n\ndef tokenize(examples):\n return tokenizer(\n examples[\"text\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"google/electra-small-discriminator\", num_labels=4\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RestructureDatasetSchema\",\n \"params\": {\n \"old_column\": \"label\",\n \"new_column\": \"labels\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "electra_classification", "primitive_type": "ChangeTokenizerBehavior", "category": "tokenizer_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ChangeTokenizerBehavior\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ELECTRA-small classification on 400-sample AG News (4-way text classification).\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"ag_news\", split=\"train[:400]\")\ntokenizer = AutoTokenizer.from_pretrained(\"google/electra-small-discriminator\")\n\n\ndef tokenize(examples):\n return tokenizer(\n examples[\"text\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"google/electra-small-discriminator\", num_labels=4\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ChangeTokenizerBehavior\",\n \"params\": {\n \"old_kwarg\": \"truncation\",\n \"old_value\": \"True\",\n \"new_kwarg\": \"truncate\",\n \"new_value\": \"True\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "electra_classification", "primitive_type": "RemoveDeprecatedMethod", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RemoveDeprecatedMethod\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ELECTRA-small classification on 400-sample AG News (4-way text classification).\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"ag_news\", split=\"train[:400]\")\ntokenizer = AutoTokenizer.from_pretrained(\"google/electra-small-discriminator\")\n\n\ndef tokenize(examples):\n return tokenizer(\n examples[\"text\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"google/electra-small-discriminator\", num_labels=4\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RemoveDeprecatedMethod\",\n \"params\": {\n \"class_name\": \"Trainer\",\n \"method_name\": \"save_model\",\n \"replacement\": \"save_to_hub\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "gpt2_textgen", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilGPT2 causal-LM fine-tuning on 300 lines of WikiText (text generation).\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForCausalLM,\n Trainer,\n TrainingArguments,\n DataCollatorForLanguageModeling,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"wikitext\", \"wikitext-2-raw-v1\", split=\"train[:300]\")\ntokenizer = AutoTokenizer.from_pretrained(\"distilgpt2\")\ntokenizer.pad_token = tokenizer.eos_token\n\n\ndef tokenize(examples):\n return tokenizer(examples[\"text\"], truncation=True, max_length=64)\n\n\ndataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForCausalLM.from_pretrained(\"distilgpt2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"trainer.train\",\n \"new_name\": \"trainer.start_training\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "gpt2_textgen", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilGPT2 causal-LM fine-tuning on 300 lines of WikiText (text generation).\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForCausalLM,\n Trainer,\n TrainingArguments,\n DataCollatorForLanguageModeling,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"wikitext\", \"wikitext-2-raw-v1\", split=\"train[:300]\")\ntokenizer = AutoTokenizer.from_pretrained(\"distilgpt2\")\ntokenizer.pad_token = tokenizer.eos_token\n\n\ndef tokenize(examples):\n return tokenizer(examples[\"text\"], truncation=True, max_length=64)\n\n\ndataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForCausalLM.from_pretrained(\"distilgpt2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"from_pretrained\",\n \"new_name\": \"load_from_hub\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "gpt2_textgen", "primitive_type": "ChangeArgumentSignature", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ChangeArgumentSignature\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilGPT2 causal-LM fine-tuning on 300 lines of WikiText (text generation).\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForCausalLM,\n Trainer,\n TrainingArguments,\n DataCollatorForLanguageModeling,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"wikitext\", \"wikitext-2-raw-v1\", split=\"train[:300]\")\ntokenizer = AutoTokenizer.from_pretrained(\"distilgpt2\")\ntokenizer.pad_token = tokenizer.eos_token\n\n\ndef tokenize(examples):\n return tokenizer(examples[\"text\"], truncation=True, max_length=64)\n\n\ndataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForCausalLM.from_pretrained(\"distilgpt2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ChangeArgumentSignature\",\n \"params\": {\n \"function_name\": \"TrainingArguments\",\n \"removed_arg\": \"num_train_epochs\",\n \"added_arg\": \"max_steps\",\n \"added_value\": \"1000\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "gpt2_textgen", "primitive_type": "ModifyConfigField", "category": "config_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ModifyConfigField\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilGPT2 causal-LM fine-tuning on 300 lines of WikiText (text generation).\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForCausalLM,\n Trainer,\n TrainingArguments,\n DataCollatorForLanguageModeling,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"wikitext\", \"wikitext-2-raw-v1\", split=\"train[:300]\")\ntokenizer = AutoTokenizer.from_pretrained(\"distilgpt2\")\ntokenizer.pad_token = tokenizer.eos_token\n\n\ndef tokenize(examples):\n return tokenizer(examples[\"text\"], truncation=True, max_length=64)\n\n\ndataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForCausalLM.from_pretrained(\"distilgpt2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ModifyConfigField\",\n \"params\": {\n \"config_class\": \"TrainingArguments\",\n \"field_name\": \"per_device_train_batch_size\",\n \"new_value\": \"1\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "gpt2_textgen", "primitive_type": "RestructureDatasetSchema", "category": "dataset_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RestructureDatasetSchema\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilGPT2 causal-LM fine-tuning on 300 lines of WikiText (text generation).\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForCausalLM,\n Trainer,\n TrainingArguments,\n DataCollatorForLanguageModeling,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"wikitext\", \"wikitext-2-raw-v1\", split=\"train[:300]\")\ntokenizer = AutoTokenizer.from_pretrained(\"distilgpt2\")\ntokenizer.pad_token = tokenizer.eos_token\n\n\ndef tokenize(examples):\n return tokenizer(examples[\"text\"], truncation=True, max_length=64)\n\n\ndataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForCausalLM.from_pretrained(\"distilgpt2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RestructureDatasetSchema\",\n \"params\": {\n \"old_column\": \"text\",\n \"new_column\": \"input_text\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "gpt2_textgen", "primitive_type": "ChangeTokenizerBehavior", "category": "tokenizer_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ChangeTokenizerBehavior\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilGPT2 causal-LM fine-tuning on 300 lines of WikiText (text generation).\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForCausalLM,\n Trainer,\n TrainingArguments,\n DataCollatorForLanguageModeling,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"wikitext\", \"wikitext-2-raw-v1\", split=\"train[:300]\")\ntokenizer = AutoTokenizer.from_pretrained(\"distilgpt2\")\ntokenizer.pad_token = tokenizer.eos_token\n\n\ndef tokenize(examples):\n return tokenizer(examples[\"text\"], truncation=True, max_length=64)\n\n\ndataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForCausalLM.from_pretrained(\"distilgpt2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ChangeTokenizerBehavior\",\n \"params\": {\n \"old_kwarg\": \"truncation\",\n \"old_value\": \"True\",\n \"new_kwarg\": \"truncate\",\n \"new_value\": \"True\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "gpt2_textgen", "primitive_type": "RemoveDeprecatedMethod", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RemoveDeprecatedMethod\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilGPT2 causal-LM fine-tuning on 300 lines of WikiText (text generation).\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForCausalLM,\n Trainer,\n TrainingArguments,\n DataCollatorForLanguageModeling,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"wikitext\", \"wikitext-2-raw-v1\", split=\"train[:300]\")\ntokenizer = AutoTokenizer.from_pretrained(\"distilgpt2\")\ntokenizer.pad_token = tokenizer.eos_token\n\n\ndef tokenize(examples):\n return tokenizer(examples[\"text\"], truncation=True, max_length=64)\n\n\ndataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForCausalLM.from_pretrained(\"distilgpt2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),\n)\n\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RemoveDeprecatedMethod\",\n \"params\": {\n \"class_name\": \"Trainer\",\n \"method_name\": \"save_model\",\n \"replacement\": \"save_to_hub\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "roberta_sentiment", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilRoberta sentiment classification on 400-sample IMDB subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"imdb\", split=\"train[:400]\")\ntokenizer = AutoTokenizer.from_pretrained(\"distilroberta-base\")\n\n\ndef tokenize(examples):\n return tokenizer(\n examples[\"text\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"distilroberta-base\", num_labels=2\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"trainer.train\",\n \"new_name\": \"trainer.start_training\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "roberta_sentiment", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilRoberta sentiment classification on 400-sample IMDB subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"imdb\", split=\"train[:400]\")\ntokenizer = AutoTokenizer.from_pretrained(\"distilroberta-base\")\n\n\ndef tokenize(examples):\n return tokenizer(\n examples[\"text\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"distilroberta-base\", num_labels=2\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"from_pretrained\",\n \"new_name\": \"load_from_hub\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "roberta_sentiment", "primitive_type": "ChangeArgumentSignature", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ChangeArgumentSignature\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilRoberta sentiment classification on 400-sample IMDB subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"imdb\", split=\"train[:400]\")\ntokenizer = AutoTokenizer.from_pretrained(\"distilroberta-base\")\n\n\ndef tokenize(examples):\n return tokenizer(\n examples[\"text\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"distilroberta-base\", num_labels=2\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ChangeArgumentSignature\",\n \"params\": {\n \"function_name\": \"TrainingArguments\",\n \"removed_arg\": \"num_train_epochs\",\n \"added_arg\": \"max_steps\",\n \"added_value\": \"1000\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "roberta_sentiment", "primitive_type": "ModifyConfigField", "category": "config_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ModifyConfigField\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilRoberta sentiment classification on 400-sample IMDB subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"imdb\", split=\"train[:400]\")\ntokenizer = AutoTokenizer.from_pretrained(\"distilroberta-base\")\n\n\ndef tokenize(examples):\n return tokenizer(\n examples[\"text\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"distilroberta-base\", num_labels=2\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ModifyConfigField\",\n \"params\": {\n \"config_class\": \"TrainingArguments\",\n \"field_name\": \"per_device_train_batch_size\",\n \"new_value\": \"1\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "roberta_sentiment", "primitive_type": "RestructureDatasetSchema", "category": "dataset_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RestructureDatasetSchema\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilRoberta sentiment classification on 400-sample IMDB subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"imdb\", split=\"train[:400]\")\ntokenizer = AutoTokenizer.from_pretrained(\"distilroberta-base\")\n\n\ndef tokenize(examples):\n return tokenizer(\n examples[\"text\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"distilroberta-base\", num_labels=2\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RestructureDatasetSchema\",\n \"params\": {\n \"old_column\": \"text\",\n \"new_column\": \"input_text\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "roberta_sentiment", "primitive_type": "RestructureDatasetSchema", "category": "dataset_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RestructureDatasetSchema\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilRoberta sentiment classification on 400-sample IMDB subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"imdb\", split=\"train[:400]\")\ntokenizer = AutoTokenizer.from_pretrained(\"distilroberta-base\")\n\n\ndef tokenize(examples):\n return tokenizer(\n examples[\"text\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"distilroberta-base\", num_labels=2\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RestructureDatasetSchema\",\n \"params\": {\n \"old_column\": \"label\",\n \"new_column\": \"labels\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "roberta_sentiment", "primitive_type": "ChangeTokenizerBehavior", "category": "tokenizer_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ChangeTokenizerBehavior\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilRoberta sentiment classification on 400-sample IMDB subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"imdb\", split=\"train[:400]\")\ntokenizer = AutoTokenizer.from_pretrained(\"distilroberta-base\")\n\n\ndef tokenize(examples):\n return tokenizer(\n examples[\"text\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"distilroberta-base\", num_labels=2\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ChangeTokenizerBehavior\",\n \"params\": {\n \"old_kwarg\": \"truncation\",\n \"old_value\": \"True\",\n \"new_kwarg\": \"truncate\",\n \"new_value\": \"True\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "roberta_sentiment", "primitive_type": "RemoveDeprecatedMethod", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RemoveDeprecatedMethod\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"DistilRoberta sentiment classification on 400-sample IMDB subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSequenceClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"imdb\", split=\"train[:400]\")\ntokenizer = AutoTokenizer.from_pretrained(\"distilroberta-base\")\n\n\ndef tokenize(examples):\n return tokenizer(\n examples[\"text\"],\n padding=\"max_length\",\n truncation=True,\n max_length=64,\n )\n\n\ndataset = dataset.map(tokenize, batched=True)\ndataset = dataset.rename_column(\"label\", \"labels\")\ndataset.set_format(\"torch\", columns=[\"input_ids\", \"attention_mask\", \"labels\"])\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"distilroberta-base\", num_labels=2\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=8,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RemoveDeprecatedMethod\",\n \"params\": {\n \"class_name\": \"Trainer\",\n \"method_name\": \"save_model\",\n \"replacement\": \"save_to_hub\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "t5_summarization", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Tiny T5 fine-tuning for summarization on 100-sample CNN/DailyMail.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSeq2SeqLM,\n DataCollatorForSeq2Seq,\n Seq2SeqTrainer,\n Seq2SeqTrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"cnn_dailymail\", \"3.0.0\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"google/flan-t5-small\")\n\n\ndef preprocess(examples):\n inputs = tokenizer(\n [\"summarize: \" + a for a in examples[\"article\"]],\n max_length=128,\n truncation=True,\n padding=\"max_length\",\n )\n targets = tokenizer(\n examples[\"highlights\"],\n max_length=32,\n truncation=True,\n padding=\"max_length\",\n )\n inputs[\"labels\"] = targets[\"input_ids\"]\n return inputs\n\n\ndataset = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"google/flan-t5-small\")\n\ntraining_args = Seq2SeqTrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n predict_with_generate=False,\n)\n\ntrainer = Seq2SeqTrainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"trainer.train\",\n \"new_name\": \"trainer.start_training\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "t5_summarization", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Tiny T5 fine-tuning for summarization on 100-sample CNN/DailyMail.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSeq2SeqLM,\n DataCollatorForSeq2Seq,\n Seq2SeqTrainer,\n Seq2SeqTrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"cnn_dailymail\", \"3.0.0\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"google/flan-t5-small\")\n\n\ndef preprocess(examples):\n inputs = tokenizer(\n [\"summarize: \" + a for a in examples[\"article\"]],\n max_length=128,\n truncation=True,\n padding=\"max_length\",\n )\n targets = tokenizer(\n examples[\"highlights\"],\n max_length=32,\n truncation=True,\n padding=\"max_length\",\n )\n inputs[\"labels\"] = targets[\"input_ids\"]\n return inputs\n\n\ndataset = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"google/flan-t5-small\")\n\ntraining_args = Seq2SeqTrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n predict_with_generate=False,\n)\n\ntrainer = Seq2SeqTrainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"from_pretrained\",\n \"new_name\": \"load_from_hub\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "t5_summarization", "primitive_type": "ChangeArgumentSignature", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ChangeArgumentSignature\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Tiny T5 fine-tuning for summarization on 100-sample CNN/DailyMail.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSeq2SeqLM,\n DataCollatorForSeq2Seq,\n Seq2SeqTrainer,\n Seq2SeqTrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"cnn_dailymail\", \"3.0.0\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"google/flan-t5-small\")\n\n\ndef preprocess(examples):\n inputs = tokenizer(\n [\"summarize: \" + a for a in examples[\"article\"]],\n max_length=128,\n truncation=True,\n padding=\"max_length\",\n )\n targets = tokenizer(\n examples[\"highlights\"],\n max_length=32,\n truncation=True,\n padding=\"max_length\",\n )\n inputs[\"labels\"] = targets[\"input_ids\"]\n return inputs\n\n\ndataset = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"google/flan-t5-small\")\n\ntraining_args = Seq2SeqTrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n predict_with_generate=False,\n)\n\ntrainer = Seq2SeqTrainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ChangeArgumentSignature\",\n \"params\": {\n \"function_name\": \"TrainingArguments\",\n \"removed_arg\": \"num_train_epochs\",\n \"added_arg\": \"max_steps\",\n \"added_value\": \"1000\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "t5_summarization", "primitive_type": "ModifyConfigField", "category": "config_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ModifyConfigField\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Tiny T5 fine-tuning for summarization on 100-sample CNN/DailyMail.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSeq2SeqLM,\n DataCollatorForSeq2Seq,\n Seq2SeqTrainer,\n Seq2SeqTrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"cnn_dailymail\", \"3.0.0\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"google/flan-t5-small\")\n\n\ndef preprocess(examples):\n inputs = tokenizer(\n [\"summarize: \" + a for a in examples[\"article\"]],\n max_length=128,\n truncation=True,\n padding=\"max_length\",\n )\n targets = tokenizer(\n examples[\"highlights\"],\n max_length=32,\n truncation=True,\n padding=\"max_length\",\n )\n inputs[\"labels\"] = targets[\"input_ids\"]\n return inputs\n\n\ndataset = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"google/flan-t5-small\")\n\ntraining_args = Seq2SeqTrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n predict_with_generate=False,\n)\n\ntrainer = Seq2SeqTrainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ModifyConfigField\",\n \"params\": {\n \"config_class\": \"TrainingArguments\",\n \"field_name\": \"per_device_train_batch_size\",\n \"new_value\": \"1\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "t5_summarization", "primitive_type": "ChangeTokenizerBehavior", "category": "tokenizer_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ChangeTokenizerBehavior\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Tiny T5 fine-tuning for summarization on 100-sample CNN/DailyMail.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSeq2SeqLM,\n DataCollatorForSeq2Seq,\n Seq2SeqTrainer,\n Seq2SeqTrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"cnn_dailymail\", \"3.0.0\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"google/flan-t5-small\")\n\n\ndef preprocess(examples):\n inputs = tokenizer(\n [\"summarize: \" + a for a in examples[\"article\"]],\n max_length=128,\n truncation=True,\n padding=\"max_length\",\n )\n targets = tokenizer(\n examples[\"highlights\"],\n max_length=32,\n truncation=True,\n padding=\"max_length\",\n )\n inputs[\"labels\"] = targets[\"input_ids\"]\n return inputs\n\n\ndataset = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"google/flan-t5-small\")\n\ntraining_args = Seq2SeqTrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n predict_with_generate=False,\n)\n\ntrainer = Seq2SeqTrainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ChangeTokenizerBehavior\",\n \"params\": {\n \"old_kwarg\": \"truncation\",\n \"old_value\": \"True\",\n \"new_kwarg\": \"truncate\",\n \"new_value\": \"True\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "t5_summarization", "primitive_type": "RemoveDeprecatedMethod", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RemoveDeprecatedMethod\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Tiny T5 fine-tuning for summarization on 100-sample CNN/DailyMail.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSeq2SeqLM,\n DataCollatorForSeq2Seq,\n Seq2SeqTrainer,\n Seq2SeqTrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"cnn_dailymail\", \"3.0.0\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"google/flan-t5-small\")\n\n\ndef preprocess(examples):\n inputs = tokenizer(\n [\"summarize: \" + a for a in examples[\"article\"]],\n max_length=128,\n truncation=True,\n padding=\"max_length\",\n )\n targets = tokenizer(\n examples[\"highlights\"],\n max_length=32,\n truncation=True,\n padding=\"max_length\",\n )\n inputs[\"labels\"] = targets[\"input_ids\"]\n return inputs\n\n\ndataset = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"google/flan-t5-small\")\n\ntraining_args = Seq2SeqTrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n predict_with_generate=False,\n)\n\ntrainer = Seq2SeqTrainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RemoveDeprecatedMethod\",\n \"params\": {\n \"class_name\": \"Trainer\",\n \"method_name\": \"save_model\",\n \"replacement\": \"save_to_hub\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "t5_summarization", "primitive_type": "ChangeReturnType", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ChangeReturnType\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Tiny T5 fine-tuning for summarization on 100-sample CNN/DailyMail.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForSeq2SeqLM,\n DataCollatorForSeq2Seq,\n Seq2SeqTrainer,\n Seq2SeqTrainingArguments,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"cnn_dailymail\", \"3.0.0\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"google/flan-t5-small\")\n\n\ndef preprocess(examples):\n inputs = tokenizer(\n [\"summarize: \" + a for a in examples[\"article\"]],\n max_length=128,\n truncation=True,\n padding=\"max_length\",\n )\n targets = tokenizer(\n examples[\"highlights\"],\n max_length=32,\n truncation=True,\n padding=\"max_length\",\n )\n inputs[\"labels\"] = targets[\"input_ids\"]\n return inputs\n\n\ndataset = dataset.map(preprocess, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForSeq2SeqLM.from_pretrained(\"google/flan-t5-small\")\n\ntraining_args = Seq2SeqTrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n predict_with_generate=False,\n)\n\ntrainer = Seq2SeqTrainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ChangeReturnType\",\n \"params\": {\n \"function_name\": \"tokenizer\",\n \"old_access\": \"[\\\"input_ids\\\"]\",\n \"new_access\": \".input_ids\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "tiny_mlp_mnist", "primitive_type": "RestructureDatasetSchema", "category": "dataset_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RestructureDatasetSchema\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Tiny PyTorch MLP on a 1000-sample MNIST subset (image classification baseline).\"\"\"\nimport torch\nimport torch.nn as nn\nfrom torch.utils.data import DataLoader\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"mnist\", split=\"train[:1000]\")\ndataset = dataset.with_format(\"torch\")\n\n\ndef collate(batch):\n pixel = torch.stack([b[\"image\"].float().flatten() / 255.0 for b in batch])\n labels = torch.tensor([b[\"label\"] for b in batch])\n return pixel, labels\n\n\nloader = DataLoader(dataset, batch_size=32, shuffle=True, collate_fn=collate)\n\nmodel = nn.Sequential(\n nn.Linear(784, 64),\n nn.ReLU(),\n nn.Linear(64, 10),\n)\n\noptimizer = torch.optim.Adam(model.parameters(), lr=1e-3)\ncriterion = nn.CrossEntropyLoss()\n\nfor epoch in range(2):\n for step, (x, y) in enumerate(loader, start=1):\n optimizer.zero_grad()\n loss = criterion(model(x), y)\n loss.backward()\n optimizer.step()\n if step % 5 == 0:\n print(f\"step={step} loss={loss.item():.4f}\")\n\ntorch.save(model.state_dict(), \"/tmp/forge_output/checkpoint/mlp.pt\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RestructureDatasetSchema\",\n \"params\": {\n \"old_column\": \"label\",\n \"new_column\": \"labels\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "vit_cifar10", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Tiny ViT image classification on 200-sample CIFAR-10 subset.\"\"\"\nfrom transformers import (\n AutoImageProcessor,\n AutoModelForImageClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\nimport torch\n\ndataset = load_dataset(\"cifar10\", split=\"train[:200]\")\nprocessor = AutoImageProcessor.from_pretrained(\"WinKawaks/vit-tiny-patch16-224\")\n\n\ndef transform(batch):\n images = [img.convert(\"RGB\") for img in batch[\"img\"]]\n inputs = processor(images=images, return_tensors=\"pt\")\n inputs[\"labels\"] = torch.tensor(batch[\"label\"])\n return inputs\n\n\ndataset = dataset.with_transform(transform)\n\nmodel = AutoModelForImageClassification.from_pretrained(\n \"WinKawaks/vit-tiny-patch16-224\", num_labels=10, ignore_mismatched_sizes=True\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"trainer.train\",\n \"new_name\": \"trainer.start_training\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "vit_cifar10", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Tiny ViT image classification on 200-sample CIFAR-10 subset.\"\"\"\nfrom transformers import (\n AutoImageProcessor,\n AutoModelForImageClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\nimport torch\n\ndataset = load_dataset(\"cifar10\", split=\"train[:200]\")\nprocessor = AutoImageProcessor.from_pretrained(\"WinKawaks/vit-tiny-patch16-224\")\n\n\ndef transform(batch):\n images = [img.convert(\"RGB\") for img in batch[\"img\"]]\n inputs = processor(images=images, return_tensors=\"pt\")\n inputs[\"labels\"] = torch.tensor(batch[\"label\"])\n return inputs\n\n\ndataset = dataset.with_transform(transform)\n\nmodel = AutoModelForImageClassification.from_pretrained(\n \"WinKawaks/vit-tiny-patch16-224\", num_labels=10, ignore_mismatched_sizes=True\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"from_pretrained\",\n \"new_name\": \"load_from_hub\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "vit_cifar10", "primitive_type": "ChangeArgumentSignature", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ChangeArgumentSignature\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Tiny ViT image classification on 200-sample CIFAR-10 subset.\"\"\"\nfrom transformers import (\n AutoImageProcessor,\n AutoModelForImageClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\nimport torch\n\ndataset = load_dataset(\"cifar10\", split=\"train[:200]\")\nprocessor = AutoImageProcessor.from_pretrained(\"WinKawaks/vit-tiny-patch16-224\")\n\n\ndef transform(batch):\n images = [img.convert(\"RGB\") for img in batch[\"img\"]]\n inputs = processor(images=images, return_tensors=\"pt\")\n inputs[\"labels\"] = torch.tensor(batch[\"label\"])\n return inputs\n\n\ndataset = dataset.with_transform(transform)\n\nmodel = AutoModelForImageClassification.from_pretrained(\n \"WinKawaks/vit-tiny-patch16-224\", num_labels=10, ignore_mismatched_sizes=True\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ChangeArgumentSignature\",\n \"params\": {\n \"function_name\": \"TrainingArguments\",\n \"removed_arg\": \"num_train_epochs\",\n \"added_arg\": \"max_steps\",\n \"added_value\": \"1000\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "vit_cifar10", "primitive_type": "ModifyConfigField", "category": "config_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: ModifyConfigField\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Tiny ViT image classification on 200-sample CIFAR-10 subset.\"\"\"\nfrom transformers import (\n AutoImageProcessor,\n AutoModelForImageClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\nimport torch\n\ndataset = load_dataset(\"cifar10\", split=\"train[:200]\")\nprocessor = AutoImageProcessor.from_pretrained(\"WinKawaks/vit-tiny-patch16-224\")\n\n\ndef transform(batch):\n images = [img.convert(\"RGB\") for img in batch[\"img\"]]\n inputs = processor(images=images, return_tensors=\"pt\")\n inputs[\"labels\"] = torch.tensor(batch[\"label\"])\n return inputs\n\n\ndataset = dataset.with_transform(transform)\n\nmodel = AutoModelForImageClassification.from_pretrained(\n \"WinKawaks/vit-tiny-patch16-224\", num_labels=10, ignore_mismatched_sizes=True\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"ModifyConfigField\",\n \"params\": {\n \"config_class\": \"TrainingArguments\",\n \"field_name\": \"per_device_train_batch_size\",\n \"new_value\": \"1\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "vit_cifar10", "primitive_type": "RestructureDatasetSchema", "category": "dataset_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RestructureDatasetSchema\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Tiny ViT image classification on 200-sample CIFAR-10 subset.\"\"\"\nfrom transformers import (\n AutoImageProcessor,\n AutoModelForImageClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\nimport torch\n\ndataset = load_dataset(\"cifar10\", split=\"train[:200]\")\nprocessor = AutoImageProcessor.from_pretrained(\"WinKawaks/vit-tiny-patch16-224\")\n\n\ndef transform(batch):\n images = [img.convert(\"RGB\") for img in batch[\"img\"]]\n inputs = processor(images=images, return_tensors=\"pt\")\n inputs[\"labels\"] = torch.tensor(batch[\"label\"])\n return inputs\n\n\ndataset = dataset.with_transform(transform)\n\nmodel = AutoModelForImageClassification.from_pretrained(\n \"WinKawaks/vit-tiny-patch16-224\", num_labels=10, ignore_mismatched_sizes=True\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RestructureDatasetSchema\",\n \"params\": {\n \"old_column\": \"label\",\n \"new_column\": \"labels\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "vit_cifar10", "primitive_type": "RemoveDeprecatedMethod", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RemoveDeprecatedMethod\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"Tiny ViT image classification on 200-sample CIFAR-10 subset.\"\"\"\nfrom transformers import (\n AutoImageProcessor,\n AutoModelForImageClassification,\n Trainer,\n TrainingArguments,\n)\nfrom datasets import load_dataset\nimport torch\n\ndataset = load_dataset(\"cifar10\", split=\"train[:200]\")\nprocessor = AutoImageProcessor.from_pretrained(\"WinKawaks/vit-tiny-patch16-224\")\n\n\ndef transform(batch):\n images = [img.convert(\"RGB\") for img in batch[\"img\"]]\n inputs = processor(images=images, return_tensors=\"pt\")\n inputs[\"labels\"] = torch.tensor(batch[\"label\"])\n return inputs\n\n\ndataset = dataset.with_transform(transform)\n\nmodel = AutoModelForImageClassification.from_pretrained(\n \"WinKawaks/vit-tiny-patch16-224\", num_labels=10, ignore_mismatched_sizes=True\n)\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=training_args, train_dataset=dataset)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RemoveDeprecatedMethod\",\n \"params\": {\n \"class_name\": \"Trainer\",\n \"method_name\": \"save_model\",\n \"replacement\": \"save_to_hub\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "albert_qa", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ALBERT-tiny extractive QA on 100-sample SQuAD subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForQuestionAnswering,\n Trainer,\n TrainingArguments,\n DefaultDataCollator,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"squad\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"albert-base-v2\")\n\n\ndef prepare(examples):\n enc = tokenizer(\n examples[\"question\"],\n examples[\"context\"],\n max_length=128,\n truncation=\"only_second\",\n padding=\"max_length\",\n return_offsets_mapping=True,\n )\n start_positions, end_positions = [], []\n for i, offsets in enumerate(enc[\"offset_mapping\"]):\n answer = examples[\"answers\"][i]\n start_char = answer[\"answer_start\"][0]\n end_char = start_char + len(answer[\"text\"][0])\n\n token_start = next(\n (idx for idx, (a, b) in enumerate(offsets) if a <= start_char < b), 0\n )\n token_end = next(\n (idx for idx, (a, b) in enumerate(offsets) if a < end_char <= b), token_start\n )\n start_positions.append(token_start)\n end_positions.append(token_end)\n\n enc[\"start_positions\"] = start_positions\n enc[\"end_positions\"] = end_positions\n enc.pop(\"offset_mapping\")\n return enc\n\n\ndataset = dataset.map(prepare, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForQuestionAnswering.from_pretrained(\"albert-base-v2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n 
data_collator=DefaultDataCollator(),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"trainer.train\",\n \"new_name\": \"trainer.start_training\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "albert_qa", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ALBERT-tiny extractive QA on 100-sample SQuAD subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForQuestionAnswering,\n Trainer,\n TrainingArguments,\n DefaultDataCollator,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"squad\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"albert-base-v2\")\n\n\ndef prepare(examples):\n enc = tokenizer(\n examples[\"question\"],\n examples[\"context\"],\n max_length=128,\n truncation=\"only_second\",\n padding=\"max_length\",\n return_offsets_mapping=True,\n )\n start_positions, end_positions = [], []\n for i, offsets in enumerate(enc[\"offset_mapping\"]):\n answer = examples[\"answers\"][i]\n start_char = answer[\"answer_start\"][0]\n end_char = start_char + len(answer[\"text\"][0])\n\n token_start = next(\n (idx for idx, (a, b) in enumerate(offsets) if a <= start_char < b), 0\n )\n token_end = next(\n (idx for idx, (a, b) in enumerate(offsets) if a < end_char <= b), token_start\n )\n start_positions.append(token_start)\n end_positions.append(token_end)\n\n enc[\"start_positions\"] = start_positions\n enc[\"end_positions\"] = end_positions\n enc.pop(\"offset_mapping\")\n return enc\n\n\ndataset = dataset.map(prepare, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForQuestionAnswering.from_pretrained(\"albert-base-v2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n 
data_collator=DefaultDataCollator(),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"trainer.train\",\n \"new_name\": \"trainer.start_training\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "albert_qa", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ALBERT-tiny extractive QA on 100-sample SQuAD subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForQuestionAnswering,\n Trainer,\n TrainingArguments,\n DefaultDataCollator,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"squad\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"albert-base-v2\")\n\n\ndef prepare(examples):\n enc = tokenizer(\n examples[\"question\"],\n examples[\"context\"],\n max_length=128,\n truncation=\"only_second\",\n padding=\"max_length\",\n return_offsets_mapping=True,\n )\n start_positions, end_positions = [], []\n for i, offsets in enumerate(enc[\"offset_mapping\"]):\n answer = examples[\"answers\"][i]\n start_char = answer[\"answer_start\"][0]\n end_char = start_char + len(answer[\"text\"][0])\n\n token_start = next(\n (idx for idx, (a, b) in enumerate(offsets) if a <= start_char < b), 0\n )\n token_end = next(\n (idx for idx, (a, b) in enumerate(offsets) if a < end_char <= b), token_start\n )\n start_positions.append(token_start)\n end_positions.append(token_end)\n\n enc[\"start_positions\"] = start_positions\n enc[\"end_positions\"] = end_positions\n enc.pop(\"offset_mapping\")\n return enc\n\n\ndataset = dataset.map(prepare, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForQuestionAnswering.from_pretrained(\"albert-base-v2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n 
data_collator=DefaultDataCollator(),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"trainer.train\",\n \"new_name\": \"trainer.start_training\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "albert_qa", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ALBERT-tiny extractive QA on 100-sample SQuAD subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForQuestionAnswering,\n Trainer,\n TrainingArguments,\n DefaultDataCollator,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"squad\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"albert-base-v2\")\n\n\ndef prepare(examples):\n enc = tokenizer(\n examples[\"question\"],\n examples[\"context\"],\n max_length=128,\n truncation=\"only_second\",\n padding=\"max_length\",\n return_offsets_mapping=True,\n )\n start_positions, end_positions = [], []\n for i, offsets in enumerate(enc[\"offset_mapping\"]):\n answer = examples[\"answers\"][i]\n start_char = answer[\"answer_start\"][0]\n end_char = start_char + len(answer[\"text\"][0])\n\n token_start = next(\n (idx for idx, (a, b) in enumerate(offsets) if a <= start_char < b), 0\n )\n token_end = next(\n (idx for idx, (a, b) in enumerate(offsets) if a < end_char <= b), token_start\n )\n start_positions.append(token_start)\n end_positions.append(token_end)\n\n enc[\"start_positions\"] = start_positions\n enc[\"end_positions\"] = end_positions\n enc.pop(\"offset_mapping\")\n return enc\n\n\ndataset = dataset.map(prepare, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForQuestionAnswering.from_pretrained(\"albert-base-v2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n 
data_collator=DefaultDataCollator(),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"trainer.train\",\n \"new_name\": \"trainer.start_training\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "albert_qa", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ALBERT-tiny extractive QA on 100-sample SQuAD subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForQuestionAnswering,\n Trainer,\n TrainingArguments,\n DefaultDataCollator,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"squad\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"albert-base-v2\")\n\n\ndef prepare(examples):\n enc = tokenizer(\n examples[\"question\"],\n examples[\"context\"],\n max_length=128,\n truncation=\"only_second\",\n padding=\"max_length\",\n return_offsets_mapping=True,\n )\n start_positions, end_positions = [], []\n for i, offsets in enumerate(enc[\"offset_mapping\"]):\n answer = examples[\"answers\"][i]\n start_char = answer[\"answer_start\"][0]\n end_char = start_char + len(answer[\"text\"][0])\n\n token_start = next(\n (idx for idx, (a, b) in enumerate(offsets) if a <= start_char < b), 0\n )\n token_end = next(\n (idx for idx, (a, b) in enumerate(offsets) if a < end_char <= b), token_start\n )\n start_positions.append(token_start)\n end_positions.append(token_end)\n\n enc[\"start_positions\"] = start_positions\n enc[\"end_positions\"] = end_positions\n enc.pop(\"offset_mapping\")\n return enc\n\n\ndataset = dataset.map(prepare, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForQuestionAnswering.from_pretrained(\"albert-base-v2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n 
data_collator=DefaultDataCollator(),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"trainer.train\",\n \"new_name\": \"trainer.start_training\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "albert_qa", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ALBERT-tiny extractive QA on 100-sample SQuAD subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForQuestionAnswering,\n Trainer,\n TrainingArguments,\n DefaultDataCollator,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"squad\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"albert-base-v2\")\n\n\ndef prepare(examples):\n enc = tokenizer(\n examples[\"question\"],\n examples[\"context\"],\n max_length=128,\n truncation=\"only_second\",\n padding=\"max_length\",\n return_offsets_mapping=True,\n )\n start_positions, end_positions = [], []\n for i, offsets in enumerate(enc[\"offset_mapping\"]):\n answer = examples[\"answers\"][i]\n start_char = answer[\"answer_start\"][0]\n end_char = start_char + len(answer[\"text\"][0])\n\n token_start = next(\n (idx for idx, (a, b) in enumerate(offsets) if a <= start_char < b), 0\n )\n token_end = next(\n (idx for idx, (a, b) in enumerate(offsets) if a < end_char <= b), token_start\n )\n start_positions.append(token_start)\n end_positions.append(token_end)\n\n enc[\"start_positions\"] = start_positions\n enc[\"end_positions\"] = end_positions\n enc.pop(\"offset_mapping\")\n return enc\n\n\ndataset = dataset.map(prepare, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForQuestionAnswering.from_pretrained(\"albert-base-v2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n 
data_collator=DefaultDataCollator(),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"trainer.train\",\n \"new_name\": \"trainer.start_training\"\n }\n}"}]} | |
| {"role_target": "drift_generator", "task_id": "albert_qa", "primitive_type": "RenameApiCall", "category": "api_drift", "messages": [{"role": "system", "content": "You are the Drift Generator.\nYou see a working HuggingFace training script and the curriculum target category.\nOutput exactly one JSON object describing a breakage primitive that simulates\nrealistic library version drift. The primitive must:\n1. Be PLAUSIBLE \u2014 match the kind of breakage that happens between real\n transformers/datasets/trl releases.\n2. Be SOLVABLE \u2014 the Repair Agent should be able to fix it from the error trace alone.\n3. Match the requested target_category.\n\nOutput schema:\n{\"primitive_type\": \"<one of the 8 types>\", \"params\": { ... }}\n\nAvailable primitive types and parameter schemas:\n- RenameApiCall: {\"old_name\": str, \"new_name\": str}\n- DeprecateImport: {\"old_module\": str, \"new_module\": str}\n- ChangeArgumentSignature: {\"function_name\": str, \"removed_arg\": str, \"added_arg\": str, \"added_value\": str}\n- ModifyConfigField: {\"config_class\": str, \"field_name\": str, \"new_value\": str}\n- RestructureDatasetSchema: {\"old_column\": str, \"new_column\": str}\n- ChangeTokenizerBehavior: {\"old_kwarg\": str, \"old_value\": str, \"new_kwarg\": str, \"new_value\": str}\n- RemoveDeprecatedMethod: {\"class_name\": str, \"method_name\": str, \"replacement\": str}\n- ChangeReturnType: {\"function_name\": str, \"old_access\": str, \"new_access\": str}\n\nOutput ONLY the JSON object \u2014 no commentary, no markdown fences.\n"}, {"role": "user", "content": "Target category: RenameApiCall\nLibrary versions: transformers=4.40.0, datasets=2.18.0, trl=0.10.0\n\nWorking script:\n```python\n\"\"\"ALBERT-tiny extractive QA on 100-sample SQuAD subset.\"\"\"\nfrom transformers import (\n AutoTokenizer,\n AutoModelForQuestionAnswering,\n Trainer,\n TrainingArguments,\n DefaultDataCollator,\n)\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"squad\", split=\"train[:100]\")\ntokenizer = AutoTokenizer.from_pretrained(\"albert-base-v2\")\n\n\ndef prepare(examples):\n enc = tokenizer(\n examples[\"question\"],\n examples[\"context\"],\n max_length=128,\n truncation=\"only_second\",\n padding=\"max_length\",\n return_offsets_mapping=True,\n )\n start_positions, end_positions = [], []\n for i, offsets in enumerate(enc[\"offset_mapping\"]):\n answer = examples[\"answers\"][i]\n start_char = answer[\"answer_start\"][0]\n end_char = start_char + len(answer[\"text\"][0])\n\n token_start = next(\n (idx for idx, (a, b) in enumerate(offsets) if a <= start_char < b), 0\n )\n token_end = next(\n (idx for idx, (a, b) in enumerate(offsets) if a < end_char <= b), token_start\n )\n start_positions.append(token_start)\n end_positions.append(token_end)\n\n enc[\"start_positions\"] = start_positions\n enc[\"end_positions\"] = end_positions\n enc.pop(\"offset_mapping\")\n return enc\n\n\ndataset = dataset.map(prepare, batched=True, remove_columns=dataset.column_names)\n\nmodel = AutoModelForQuestionAnswering.from_pretrained(\"albert-base-v2\")\n\ntraining_args = TrainingArguments(\n output_dir=\"/tmp/forge_output/checkpoint\",\n num_train_epochs=1,\n per_device_train_batch_size=4,\n logging_steps=5,\n save_strategy=\"epoch\",\n no_cuda=True,\n report_to=\"none\",\n)\n\ntrainer = Trainer(\n model=model,\n args=training_args,\n train_dataset=dataset,\n 
data_collator=DefaultDataCollator(),\n)\ntrainer.train()\ntrainer.save_model(\"/tmp/forge_output/checkpoint\")\nprint(\"TRAINING_COMPLETE\")\n\n```\n\nOutput JSON breakage primitive:"}, {"role": "assistant", "content": "{\n \"primitive_type\": \"RenameApiCall\",\n \"params\": {\n \"old_name\": \"trainer.train\",\n \"new_name\": \"trainer.start_training\"\n }\n}"}]} | |