diff --git "a/evaluation_results.jsonl" "b/evaluation_results.jsonl" deleted file mode 100644--- "a/evaluation_results.jsonl" +++ /dev/null @@ -1,30 +0,0 @@ -{"discussion_title":"QLoRA - model isn’t training","discussion_url":"https://discuss.huggingface.co/t/qlora-model-isnt-training/169337","discussion_topic_id":169337,"discussion_category":5,"discussion_created_at":"2025-10-22T11:19:32.837000Z","thread":[{"id":243954,"name":"Anton Bartash","username":"antbartash","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/a/46a35a/{size}.png","created_at":"2025-10-22T11:19:32.912Z","cooked":"
Hi everyone,
\nI’ve been trying to switch from LoRA to QLoRA on an Nvidia T4, but I’m running into an issue where the evaluation loss stays completely flat, while the training loss fluctuates around its initial value.
My LoRA setup works fine, but adding bnb_config, model.gradient_checkpointing_enable(), and model = prepare_model_for_kbit_training(model) causes the issue described above.
\n
Since the non-quantized version runs without problems, I don’t think the issue is related to the LoRA config, dataset, or formatting functions. The number of trainable parameters is non-zero for both the LoRA and QLoRA setups.
\nBelow is the code I’m using for QLoRA. Any help would be appreciated!
\nds_train_with_assistant_content = ds_train.map(construct_message_with_assistant_content)\nds_valid_with_assistant_content = ds_valid.map(construct_message_with_assistant_content)\n\nbnb_config = BitsAndBytesConfig(\n load_in_4bit=True,\n bnb_4bit_use_double_quant=True,\n bnb_4bit_quant_type=\"nf4\",\n bnb_4bit_compute_dtype=torch.bfloat16\n)\n\ncheckpoint = \"Qwen/Qwen3-0.6B\"\ntokenizer = AutoTokenizer.from_pretrained(checkpoint)\nmodel = AutoModelForCausalLM.from_pretrained(\n checkpoint,\n device_map=\"auto\",\n quantization_config=bnb_config\n)\n\nmodel.config.use_cache = False\nmodel.gradient_checkpointing_enable()\nmodel = prepare_model_for_kbit_training(model)\nmodel.enable_input_require_grads()\n\n\ntimestamp = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')\nRUN_NAME = f'qlora-final-model-all-linear-r64-{timestamp}'\nwandb.init(\n project=os.environ[\"WANDB_PROJECT\"],\n name=RUN_NAME,\n # id=run_id, # resume previous run if available\n resume=\"allow\", # allows resuming crashed run\n)\n\n\nRESUME_TRAINING = False\nOUTPUT_DIR = \"./qlora-final_model_all_linear_r64-output\"\nPER_DEVICE_BATCH_SIZE = 2 # higher values --> OOM\n\noptimizer = 'paged_adamw_8bit'\neffective_batch_size = 16\nlearning_rate = 1e-5\nweight_decay = 0.0\nbetas = (0.9, 0.9999)\nwarmup_ratio = 0.2\nepochs = 1\ngradient_accumulation_steps = int(effective_batch_size / PER_DEVICE_BATCH_SIZE)\nlora_r = 16*4\nlora_alpha = 64*4\nlora_dropout = 0.01\n\n\ntraining_args = TrainingArguments(\n output_dir=OUTPUT_DIR,\n per_device_train_batch_size=PER_DEVICE_BATCH_SIZE,\n gradient_accumulation_steps=gradient_accumulation_steps,\n learning_rate=learning_rate,\n optim=optimizer, \n num_train_epochs=epochs,\n weight_decay=weight_decay,\n lr_scheduler_type=\"cosine\",\n warmup_ratio=warmup_ratio,\n save_strategy=\"steps\",\n save_steps=gradient_accumulation_steps*5,\n save_total_limit=2,\n eval_strategy=\"steps\",\n eval_steps=gradient_accumulation_steps*5,\n logging_strategy=\"steps\",\n logging_steps=gradient_accumulation_steps*5,\n report_to=['wandb'],\n run_name=RUN_NAME,\n bf16=True,\n # fp16=True,\n # fp16_full_eval=True,\n metric_for_best_model=\"eval_loss\",\n greater_is_better=False,\n max_grad_norm=1,\n load_best_model_at_end=True,\n gradient_checkpointing=True,\n gradient_checkpointing_kwargs={\"use_reentrant\": False}\n)\n\n\npeft_config = LoraConfig(\n r=lora_r,\n lora_alpha=lora_alpha,\n lora_dropout=lora_dropout,\n bias=\"none\",\n task_type=\"CAUSAL_LM\",\n target_modules='all-linear'\n)\n# model.requires_grad_(False) # freeze base weights (precautionary)\nmodel_peft = get_peft_model(model, peft_config) # inject a LoRA adapter\nprint_trainable_parameters(model_peft)\n\ntrainer = SFTTrainer(\n model=model_peft,\n train_dataset=ds_train_with_assistant_content,\n eval_dataset=ds_valid_with_assistant_content,\n formatting_func=formatting_func,\n args=training_args,\n callbacks=[EarlyStoppingCallback(early_stopping_patience=25)]\n)\n\n\n# Training setup summary\ndataset_size = len(ds_train_with_assistant_content)\nsteps_per_epoch = dataset_size // (PER_DEVICE_BATCH_SIZE * gradient_accumulation_steps)\ntotal_steps = steps_per_epoch * epochs\nwarmup_steps = int(total_steps * warmup_ratio)\n\nprint(\"===== Training Setup Summary =====\")\nprint(f\"Num epochs: {epochs}\")\nprint(f\"Effective batch size: {effective_batch_size}\")\nprint(f\"Per-device batch size: {PER_DEVICE_BATCH_SIZE}\")\nprint(f\"Gradient accumulation: {gradient_accumulation_steps}\")\nprint(f\"Dataset size: {dataset_size}\")\nprint(f\"Steps per epoch: 
{steps_per_epoch}\")\nprint(f\"Total training steps: {total_steps}\")\nprint(f\"Warmup steps: {warmup_steps}\")\nprint(f\"Logging steps: {training_args.logging_steps}\")\nprint(\"===================================\")\nprint(f\"Start time: {datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}\")\n\n\n# Training\nlast_checkpoint = None\nif RESUME_TRAINING and os.path.isdir(OUTPUT_DIR):\n last_checkpoint = get_last_checkpoint(OUTPUT_DIR)\n\nif last_checkpoint is not None:\n print(f\"Resuming training from checkpoint: {last_checkpoint}\")\n trainer.train(resume_from_checkpoint=last_checkpoint)\nelse:\n print(\"Starting fresh training run\")\n trainer.train()\n\nprint(f\"End time: {datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}\")\n\n\n# WandB logging of eval metrics\nfor log in trainer.state.log_history:\n if 'eval_loss' in log:\n wandb.log({\n \"eval_loss\": log['eval_loss'],\n \"eval_perplexity\": math.exp(log['eval_loss']),\n \"step\": log['step'],\n \"learning_rate\": learning_rate,\n \"weight_decay\": weight_decay,\n \"betas\": betas,\n \"warmup_ratio\": warmup_ratio,\n \"effective_batch_size\": effective_batch_size,\n \"optimizer\": optimizer\n })\n\nwandb.finish() # finish the run","post_number":1,"post_type":1,"posts_count":4,"updated_at":"2025-10-22T11:19:32.912Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":32,"reads":8,"readers_count":7,"score":36.4,"yours":false,"topic_id":169337,"topic_slug":"qlora-model-isnt-training","display_username":"Anton Bartash","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":106030,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/qlora-model-isnt-training/169337/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":243957,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-10-22T12:52:50.634Z","cooked":"\n\nNvidia T4
\n
Since the T4 doesn’t natively support torch.bfloat16, using torch.float16 / fp16=True instead might resolve the issue. No other major issues appear to exist.
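To make that concrete, here is a minimal sketch of the two changes this implies, using the names from the code above (illustrative, not a verified fix):

import torch
from transformers import BitsAndBytesConfig, TrainingArguments

# The T4 (compute capability 7.5) has no native bfloat16 support,
# so do the 4-bit compute in float16 instead.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # was torch.bfloat16
)

# Switch mixed precision from bf16 to fp16 to match.
training_args = TrainingArguments(
    output_dir="./qlora-final_model_all_linear_r64-output",
    fp16=True,              # was bf16=True
    # fp16_full_eval=True,  # optionally run evaluation in fp16 as well
    # ... all other arguments unchanged from the original post ...
)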
Thanks for the suggestion
\nIt turned out the issue was environment-related — I was able to get the expected results using the exact same code on Colab. In my local environment, clearing the caches for transformers, torch, etc., and upgrading all the libraries resolved the problem.
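For anyone hitting the same symptom, the cleanup described above might look roughly like this (a sketch, not the poster’s exact commands; default cache locations assumed, adjust to your environment):

# upgrade the training stack
pip install -U transformers peft trl bitsandbytes accelerate torch

# clear the Hugging Face hub cache (default location)
rm -rf ~/.cache/huggingface/hub

# clear compiled torch extension caches
rm -rf ~/.cache/torch_extensions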
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":4,"post_type":3,"posts_count":4,"updated_at":"2025-10-24T18:16:57.733Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":1,"reads":2,"readers_count":1,"score":0,"yours":false,"topic_id":169337,"topic_slug":"qlora-model-isnt-training","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/qlora-model-isnt-training/169337/4","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"Hi everyone,
\nI’ve been trying to switch from LoRA to QLoRA on an Nvidia T4, but I’m running into an issue where the evaluation loss stays completely flat, while the training loss fluctuates around its initial value.
My LoRA setup works fine, but adding bnb_config, model.gradient_checkpointing_enable(), and model = prepare_model_for_kbit_training(model) causes the issue described above.
\n
Since the non-quantized version runs without problems, I don’t think the issue is related to the LoRA config, dataset, or formatting functions. The number of trainable parameters is non-zero for both the LoRA and QLoRA setups.
\nBelow is the code I’m using for QLoRA. Any help would be appreciated!
\nds_train_with_assistant_content = ds_train.map(construct_message_with_assistant_content)\nds_valid_with_assistant_content = ds_valid.map(construct_message_with_assistant_content)\n\nbnb_config = BitsAndBytesConfig(\n load_in_4bit=True,\n bnb_4bit_use_double_quant=True,\n bnb_4bit_quant_type=\"nf4\",\n bnb_4bit_compute_dtype=torch.bfloat16\n)\n\ncheckpoint = \"Qwen/Qwen3-0.6B\"\ntokenizer = AutoTokenizer.from_pretrained(checkpoint)\nmodel = AutoModelForCausalLM.from_pretrained(\n checkpoint,\n device_map=\"auto\",\n quantization_config=bnb_config\n)\n\nmodel.config.use_cache = False\nmodel.gradient_checkpointing_enable()\nmodel = prepare_model_for_kbit_training(model)\nmodel.enable_input_require_grads()\n\n\ntimestamp = datetime.now().strftime('%Y-%m-%d_%H-%M-%S')\nRUN_NAME = f'qlora-final-model-all-linear-r64-{timestamp}'\nwandb.init(\n project=os.environ[\"WANDB_PROJECT\"],\n name=RUN_NAME,\n # id=run_id, # resume previous run if available\n resume=\"allow\", # allows resuming crashed run\n)\n\n\nRESUME_TRAINING = False\nOUTPUT_DIR = \"./qlora-final_model_all_linear_r64-output\"\nPER_DEVICE_BATCH_SIZE = 2 # higher values --> OOM\n\noptimizer = 'paged_adamw_8bit'\neffective_batch_size = 16\nlearning_rate = 1e-5\nweight_decay = 0.0\nbetas = (0.9, 0.9999)\nwarmup_ratio = 0.2\nepochs = 1\ngradient_accumulation_steps = int(effective_batch_size / PER_DEVICE_BATCH_SIZE)\nlora_r = 16*4\nlora_alpha = 64*4\nlora_dropout = 0.01\n\n\ntraining_args = TrainingArguments(\n output_dir=OUTPUT_DIR,\n per_device_train_batch_size=PER_DEVICE_BATCH_SIZE,\n gradient_accumulation_steps=gradient_accumulation_steps,\n learning_rate=learning_rate,\n optim=optimizer, \n num_train_epochs=epochs,\n weight_decay=weight_decay,\n lr_scheduler_type=\"cosine\",\n warmup_ratio=warmup_ratio,\n save_strategy=\"steps\",\n save_steps=gradient_accumulation_steps*5,\n save_total_limit=2,\n eval_strategy=\"steps\",\n eval_steps=gradient_accumulation_steps*5,\n logging_strategy=\"steps\",\n logging_steps=gradient_accumulation_steps*5,\n report_to=['wandb'],\n run_name=RUN_NAME,\n bf16=True,\n # fp16=True,\n # fp16_full_eval=True,\n metric_for_best_model=\"eval_loss\",\n greater_is_better=False,\n max_grad_norm=1,\n load_best_model_at_end=True,\n gradient_checkpointing=True,\n gradient_checkpointing_kwargs={\"use_reentrant\": False}\n)\n\n\npeft_config = LoraConfig(\n r=lora_r,\n lora_alpha=lora_alpha,\n lora_dropout=lora_dropout,\n bias=\"none\",\n task_type=\"CAUSAL_LM\",\n target_modules='all-linear'\n)\n# model.requires_grad_(False) # freeze base weights (precautionary)\nmodel_peft = get_peft_model(model, peft_config) # inject a LoRA adapter\nprint_trainable_parameters(model_peft)\n\ntrainer = SFTTrainer(\n model=model_peft,\n train_dataset=ds_train_with_assistant_content,\n eval_dataset=ds_valid_with_assistant_content,\n formatting_func=formatting_func,\n args=training_args,\n callbacks=[EarlyStoppingCallback(early_stopping_patience=25)]\n)\n\n\n# Training setup summary\ndataset_size = len(ds_train_with_assistant_content)\nsteps_per_epoch = dataset_size // (PER_DEVICE_BATCH_SIZE * gradient_accumulation_steps)\ntotal_steps = steps_per_epoch * epochs\nwarmup_steps = int(total_steps * warmup_ratio)\n\nprint(\"===== Training Setup Summary =====\")\nprint(f\"Num epochs: {epochs}\")\nprint(f\"Effective batch size: {effective_batch_size}\")\nprint(f\"Per-device batch size: {PER_DEVICE_BATCH_SIZE}\")\nprint(f\"Gradient accumulation: {gradient_accumulation_steps}\")\nprint(f\"Dataset size: {dataset_size}\")\nprint(f\"Steps per epoch: 
{steps_per_epoch}\")\nprint(f\"Total training steps: {total_steps}\")\nprint(f\"Warmup steps: {warmup_steps}\")\nprint(f\"Logging steps: {training_args.logging_steps}\")\nprint(\"===================================\")\nprint(f\"Start time: {datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}\")\n\n\n# Training\nlast_checkpoint = None\nif RESUME_TRAINING and os.path.isdir(OUTPUT_DIR):\n last_checkpoint = get_last_checkpoint(OUTPUT_DIR)\n\nif last_checkpoint is not None:\n print(f\"Resuming training from checkpoint: {last_checkpoint}\")\n trainer.train(resume_from_checkpoint=last_checkpoint)\nelse:\n print(\"Starting fresh training run\")\n trainer.train()\n\nprint(f\"End time: {datetime.now().strftime('%Y-%m-%d_%H-%M-%S')}\")\n\n\n# WandB logging of eval metrics\nfor log in trainer.state.log_history:\n if 'eval_loss' in log:\n wandb.log({\n \"eval_loss\": log['eval_loss'],\n \"eval_perplexity\": math.exp(log['eval_loss']),\n \"step\": log['step'],\n \"learning_rate\": learning_rate,\n \"weight_decay\": weight_decay,\n \"betas\": betas,\n \"warmup_ratio\": warmup_ratio,\n \"effective_batch_size\": effective_batch_size,\n \"optimizer\": optimizer\n })\n\nwandb.finish() # finish the run","solution":"Thanks for the suggestion
\nIt turned out the issue was environment-related — I was able to get the expected results using the exact same code on Colab. In my local environment, clearing the caches for transformers, torch, etc., and upgrading all the libraries resolved the problem.
Thanks for the suggestion
\nIt turned out the issue was environment-related — I was able to get the expected results using the exact same code on Colab. In my local environment, clearing the caches for transformers, torch, etc., and upgrading all the libraries resolved the problem.
Hello, I was trying to use the model pyannote/speaker-diarization-3.1
\nso I installed some libraries as below
\n%pip install pyannote.audio==3.1.0\n%pip install numpy==1.26\n\nHere is the result and I think I installed this properly…
\nCollecting pyannote.audio==3.1.0\n Using cached pyannote.audio-3.1.0-py2.py3-none-any.whl.metadata (7.8 kB)\nRequirement already satisfied: asteroid-filterbanks>=0.4 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (0.4.0)\nRequirement already satisfied: einops>=0.6.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (0.8.1)\nRequirement already satisfied: huggingface-hub>=0.13.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (0.35.3)\nRequirement already satisfied: lightning>=2.0.1 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (2.5.5)\nRequirement already satisfied: omegaconf<3.0,>=2.1 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (2.3.0)\nRequirement already satisfied: pyannote.core>=5.0.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (6.0.1)\nRequirement already satisfied: pyannote.database>=5.0.1 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (6.1.0)\nRequirement already satisfied: pyannote.metrics>=3.2 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (4.0.0)\nRequirement already satisfied: pyannote.pipeline>=3.0.1 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (4.0.0)\nRequirement already satisfied: pytorch-metric-learning>=2.1.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (2.9.0)\nRequirement already satisfied: rich>=12.0.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (14.2.0)\nRequirement already satisfied: semver>=3.0.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (3.0.4)\nRequirement already satisfied: soundfile>=0.12.1 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (0.13.1)\nRequirement already satisfied: speechbrain>=0.5.14 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (1.0.3)\nRequirement already satisfied: tensorboardX>=2.6 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (2.6.4)\nRequirement already satisfied: torch>=2.0.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (2.9.0+cu126)\nRequirement already satisfied: torch-audiomentations>=0.11.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (0.12.0)\nRequirement already satisfied: torchaudio>=2.0.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (2.9.0)\nRequirement already satisfied: torchmetrics>=0.11.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (1.8.2)\nRequirement already satisfied: antlr4-python3-runtime==4.9.* in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from omegaconf<3.0,>=2.1->pyannote.audio==3.1.0) (4.9.3)\nRequirement already satisfied: PyYAML>=5.1.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from omegaconf<3.0,>=2.1->pyannote.audio==3.1.0) (6.0.3)\nRequirement already satisfied: numpy in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from asteroid-filterbanks>=0.4->pyannote.audio==3.1.0) (1.26.0)\nRequirement already satisfied: typing-extensions in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from asteroid-filterbanks>=0.4->pyannote.audio==3.1.0) (4.15.0)\n...\n Uninstalling numpy-2.3.4:\n Successfully uninstalled 
numpy-2.3.4\nSuccessfully installed numpy-1.26.0\nNote: you may need to restart the kernel to use updated packages.\nOutput is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...\nERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\npyannote-core 6.0.1 requires numpy>=2.0, but you have numpy 1.26.0 which is incompatible.\npyannote-metrics 4.0.0 requires numpy>=2.2.2, but you have numpy 1.26.0 which is incompatible.\n\nI ran this code to load the ffmpeg
\nfrom pathlib import Path\nimport os, sys\n\nffmpeg_dll_dir = Path(r\"C:\\Users\\majh0\\miniconda3\\Library\\bin\") \nassert ffmpeg_dll_dir.exists(), ffmpeg_dll_dir\nos.add_dll_directory(str(ffmpeg_dll_dir)) \n\nimport torch, torchcodec, platform, subprocess\nprint(\"exe:\", sys.executable)\nprint(\"torch\", torch.__version__, \"torchcodec\", torchcodec.__version__, \"py\", platform.python_version())\nsubprocess.run([\"ffmpeg\", \"-version\"], check=True)\nprint(\"cuda torch?\",torch.cuda.is_available())\n\nand the result looks fine to me..
\nexe: c:\\GPT_AGENT_2025_BOOK\\venv\\Scripts\\python.exe\ntorch 2.9.0+cu126 torchcodec 0.8.0 py 3.12.9\ncuda torch? True\n\nI ran this code and it gave me an error as below…
# instantiate the pipeline
import torch
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    token="hf_***REDACTED***")  # token redacted; never post real access tokens


if torch.cuda.is_available():
    pipeline.to(torch.device("cuda"))
    print("Using CUDA")
else:
    print("Using CPU")

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[3], line 3
      1 # instantiate the pipeline
      2 import torch
----> 3 from pyannote.audio import Pipeline
      4 pipeline = Pipeline.from_pretrained(
      5     "pyannote/speaker-diarization-3.1",
      6     token="hf_***REDACTED***")
      9 if torch.cuda.is_available():

File c:\GPT_AGENT_2025_BOOK\venv\Lib\site-packages\pyannote\audio\__init__.py:29
     25 except ImportError:
     26     pass
---> 29 from .core.inference import Inference
     30 from .core.io import Audio
     31 from .core.model import Model

File c:\GPT_AGENT_2025_BOOK\venv\Lib\site-packages\pyannote\audio\core\inference.py:36
     33 from pyannote.core import Segment, SlidingWindow, SlidingWindowFeature
     34 from pytorch_lightning.utilities.memory import is_oom_error
---> 36 from pyannote.audio.core.io import AudioFile
     37 from pyannote.audio.core.model import Model, Specifications
     38 from pyannote.audio.core.task import Resolution
...
     49 - a "str" or "Path" instance: "audio.wav" or Path("audio.wav")
(...) 56   integer to load a specific channel: {"audio": "stereo.wav", "channel": 0}
     57 """

AttributeError: module 'torchaudio' has no attribute 'set_audio_backend'

I have checked the documentation and it says I need to install pyannote.audio 3.1
I don’t know why this doesn’t work… I spent three hours changing pyannote.audio versions trying to solve this, but nothing worked.
\nDo I need to delete venv and reinstall it clearly..?
\nThank you so much for the help in advance..
","post_number":1,"post_type":1,"posts_count":6,"updated_at":"2025-10-21T14:42:42.475Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":84,"reads":5,"readers_count":4,"score":221.0,"yours":false,"topic_id":169326,"topic_slug":"problem-with-pyannote-audio-3-1-0","display_username":"MAJH","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":2,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://github.com/pyannote/pyannote-audio","internal":false,"reflection":false,"title":"GitHub - pyannote/pyannote-audio: Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding","clicks":0}],"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":105819,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/problem-with-pyannote-audio-3-1-0/169326/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":243939,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-10-22T02:49:32.789Z","cooked":"Seems library version incompatibility…
\nYour import error comes from an API removal in torchaudio and an incompatible NumPy pin. Fix by upgrading pyannote.audio and undoing the NumPy downgrade. Keep your Torch 2.9 stack.
# clean conflicting pins\npip uninstall -y pyannote.audio pyannote.core pyannote.metrics pyannote.pipeline pyannote.database numpy\n\n# install a compatible, modern set\npip install --upgrade \"numpy>=2.3\" \"pyannote.audio>=4.0.1\" --prefer-binary\n# keep your existing torch==2.9.*, torchaudio==2.9.* and torchcodec\n\npyannote.audio>=4 removed the old torchaudio backend call and uses FFmpeg via torchcodec, so the import works on torchaudio≥2.2. NumPy≥2.x satisfies pyannote-core and pyannote-metrics. (GitHub)
Then restart the kernel once. Verify:
\n# refs:\n# - torchaudio dispatcher notes: https://docs.pytorch.org/audio/main/torchaudio.html\n# - pyannote model card: https://huggingface.co/pyannote/speaker-diarization-3.1\nimport torchaudio, torchcodec\nprint(\"backends:\", torchaudio.list_audio_backends()) # should show 'ffmpeg' and/or 'soundfile'\nfrom pyannote.audio import Pipeline\npipe = Pipeline.from_pretrained(\"pyannote/speaker-diarization-3.1\", token=\"hf_xxx\") # do not hardcode secrets\n\nset_audio_backend was deprecated, then removed in torchaudio 2.2+, which is why pyannote.audio==3.1.0 fails to import on your current torchaudio. (PyTorch Docs)
Why it broke:
- pyannote.audio==3.1.0 calls torchaudio.set_audio_backend("soundfile"). That function is gone in torchaudio≥2.2, so the import raises AttributeError. Upgrading pyannote fixes it because 4.x removed that code path. (GitHub)
- You pinned numpy==1.26, but current pyannote ecosystem components require NumPy≥2.0 (core) and ≥2.2.2 (metrics). Pip warned correctly. Use NumPy≥2.3. (GitHub)

If you want to stay on pyannote.audio==3.1.0 (not recommended), pick one, not both:
# Legacy stack that still has set_audio_backend
# (torch 2.1-era wheels predate NumPy 2, so pin numpy below 2 here)
pip install "torch<=2.1.2" "torchaudio<=2.1.2" "numpy<2" "pyannote.audio==3.1.0"

or a temporary shim:
# WARNING: local hack to import 3.1.0 with new torchaudio
import torchaudio
if not hasattr(torchaudio, "set_audio_backend"):
    torchaudio.set_audio_backend = lambda *a, **k: None
    torchaudio.get_audio_backend = lambda: "soundfile"
from pyannote.audio import Pipeline

The first aligns versions to when the API existed. The second bypasses the call so you can upgrade later. (PyTorch Docs)
Also check:
- Accept the user conditions for pyannote/speaker-diarization-3.1 on Hugging Face and pass a valid token, or downloads will fail. (Hugging Face)
- pyannote.audio>=4 expects FFmpeg via torchcodec. You already verified FFmpeg and torchcodec, which matches the 4.x I/O design. (GitHub)

# refs in comments:
# https://huggingface.co/pyannote/speaker-diarization-3.1
# https://docs.pytorch.org/audio/main/torchaudio.html
import torch
from pyannote.audio import Pipeline
pipe = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", token="hf_xxx")  # do not hardcode secrets
if torch.cuda.is_available():
    pipe.to(torch.device("cuda"))
result = pipe("sample.wav")  # 16 kHz mono recommended
print(result)

The model card confirms “pyannote.audio version 3.1 or higher,” so using 4.x is valid and simpler on modern Torch. (Hugging Face)
References:
- torchaudio 2.2+ removed set_audio_backend and switched to a dispatcher. That is the precise cause of your AttributeError. (PyTorch Docs)
- pyannote.audio 4.x dropped the sox/soundfile backends; use FFmpeg or in-memory audio. Explains why 4.x works on Windows with torchcodec. (GitHub)

Deleting the venv is optional. Uninstall→reinstall with the versions above and one kernel restart is sufficient.
","post_number":2,"post_type":1,"posts_count":6,"updated_at":"2025-10-22T02:50:15.452Z","reply_count":1,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":4,"reads":4,"readers_count":3,"score":15.8,"yours":false,"topic_id":169326,"topic_slug":"problem-with-pyannote-audio-3-1-0","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://github.com/pyannote/pyannote-audio/releases","internal":false,"reflection":false,"title":"Releases · pyannote/pyannote-audio · GitHub","clicks":1},{"url":"https://github.com/pyannote/pyannote-audio/issues/1576","internal":false,"reflection":false,"title":"Removing torchaudio.set_audio_backend(\"soundfile\") · Issue #1576 · pyannote/pyannote-audio · GitHub","clicks":1},{"url":"https://github.com/huggingface/transformers/issues/41230","internal":false,"reflection":false,"title":"Consider forking and maintaining pyctcdecode or switch to torchaudio.models.decoder · Issue #41230 · huggingface/transformers · GitHub","clicks":0},{"url":"https://huggingface.co/pyannote/speaker-diarization-3.1","internal":false,"reflection":false,"title":"pyannote/speaker-diarization-3.1 · Hugging Face","clicks":0},{"url":"https://docs.pytorch.org/audio/main/torchaudio.html","internal":false,"reflection":false,"title":"torchaudio — Torchaudio 2.8.0 documentation","clicks":0},{"url":"https://huggingface.co/collinbarnwell/pyannote-speaker-diarization-31","internal":false,"reflection":false,"title":"collinbarnwell/pyannote-speaker-diarization-31 · Hugging Face","clicks":0}],"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/problem-with-pyannote-audio-3-1-0/169326/2","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":243955,"name":"MAJH","username":"aldkela","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/a/4bbf92/{size}.png","created_at":"2025-10-22T12:34:52.198Z","cooked":"Hello! Thank you so much!! I realized.. I should read the error msg properly to solve the problem!!! xD
\nI have one more problem….
\nI made a code as below..
\nfrom pathlib import Path\nimport os, sys\n\nffmpeg_dll_dir = Path(r\"C:\\Users\\majh0\\miniconda3\\Library\\bin\") \nassert ffmpeg_dll_dir.exists(), ffmpeg_dll_dir\nos.add_dll_directory(str(ffmpeg_dll_dir)) \n\nimport torch, torchcodec, platform, subprocess\nprint(\"exe:\", sys.executable)\nprint(\"torch\", torch.__version__, \"torchcodec\", torchcodec.__version__, \"py\", platform.python_version())\nsubprocess.run([\"ffmpeg\", \"-version\"], check=True)\nprint(\"cuda torch?\",torch.cuda.is_available())\n\n# instantiate the pipeline\nimport torch\nfrom pyannote.audio import Pipeline\n\npipeline = Pipeline.from_pretrained(\n \"pyannote/speaker-diarization-3.1\",\n token=\"my token\")\n\n\nif torch.cuda.is_available():\n pipeline.to(torch.device(\"cuda\"))\n print(\"Using CUDA\")\nelse:\n print(\"Using CPU\")\n\naudio_file =\"./guitar.wav\"\ndiarization = pipeline(audio_file)\n\n# dump the diarization output to disk using RTTM format\nwith open(\"./guitar.rttm\", \"w\", encoding=\"utf-8\") as rttm:\n diarization.write_rttm(rttm)\n\nthis thing gave me error as below…
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[15], line 6
      4 # dump the diarization output to disk using RTTM format
      5 with open("./guitar.rttm", "w", encoding="utf-8") as rttm:
----> 6 diarization.write_rttm(rttm)

AttributeError: 'DiarizeOutput' object has no attribute 'write_rttm'

This is hard for me to understand, because I typed diarization.write_rttm(rttm) exactly as in the example in the documentation for this model: https://huggingface.co/pyannote/speaker-diarization-3.1

Has the name of the write_rttm function changed? If so, is there a way to look up its new name?
\nor did I make another mistake again..?
\nI think I am bothering you too much.. but thank you so much for your help..
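A generic way to answer “is there any way to check the new name?” is plain Python introspection on whatever the pipeline returned; a sketch using the variables from the code above:

diarization = pipeline(audio_file)

# What type did the pipeline actually return?
print(type(diarization))

# Which public attributes and methods does it expose?
print([name for name in dir(diarization) if not name.startswith("_")])

# Built-in documentation, if the library ships docstrings
help(type(diarization))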
","post_number":3,"post_type":1,"posts_count":6,"updated_at":"2025-10-22T12:34:52.198Z","reply_count":0,"reply_to_post_number":2,"quote_count":0,"incoming_link_count":0,"reads":4,"readers_count":3,"score":15.8,"yours":false,"topic_id":169326,"topic_slug":"problem-with-pyannote-audio-3-1-0","display_username":"MAJH","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/pyannote/speaker-diarization-3.1","internal":false,"reflection":false,"title":"pyannote/speaker-diarization-3.1 · Hugging Face","clicks":0}],"read":true,"user_title":null,"reply_to_user":{"id":52272,"username":"John6666","name":"John Smith","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":105819,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/problem-with-pyannote-audio-3-1-0/169326/3","reactions":[{"id":"+1","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":243956,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-10-22T12:48:54.185Z","cooked":"It seems like a partial hit.
The cause is an API change from the library version upgrade, but it is the returned object that changed, not the function itself.
You’re on pyannote.audio 4.x. In 4.x the pipeline returns a DiarizeOutput object, not an Annotation. The Annotation lives at output.speaker_diarization. write_rttm is a method of Annotation, so call it there.
from pyannote.audio import Pipeline
import torch

pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    token="YOUR_HF_TOKEN"
)
if torch.cuda.is_available():
    pipeline.to(torch.device("cuda"))

out = pipeline("./guitar.wav")   # out is DiarizeOutput
ann = out.speaker_diarization    # this is an Annotation

with open("./guitar.rttm", "w", encoding="utf-8") as f:
    ann.write_rttm(f)

Evidence
- The 4.x release notes show output = pipeline(...); for turn, spk in output.speaker_diarization: ..., proving the wrapper return type in 4.x. (GitHub)
- write_rttm is defined on pyannote.core.Annotation, not on the wrapper. (pyannote.github.io)
- The 3.1 model card examples treat the pipeline output as an Annotation directly. That is why your call failed on 4.x. (Hugging Face)

Option if you want the old behavior: pin to the legacy stack (pyannote.audio==3.1.x), where pipeline(...) returns an Annotation and the snippet diarization.write_rttm(...) works as-is. Note 4.x introduced several breaking changes, including API renames. (GitHub)
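If in doubt about which behavior a given environment will produce, the installed version can be checked with the standard library (nothing pyannote-specific assumed):

from importlib.metadata import version

# 3.x pipelines return an Annotation directly; 4.x wraps it in DiarizeOutput
print(version("pyannote.audio"))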
Hello, finally it works!!!
\nI thought I made mistake again.. I didn’t even think there was a change due to a library version upgrade..
\nThank you so much now I can use this model without any problem!!!
","post_number":5,"post_type":1,"posts_count":6,"updated_at":"2025-10-23T18:31:44.078Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":1,"reads":2,"readers_count":1,"score":20.4,"yours":false,"topic_id":169326,"topic_slug":"problem-with-pyannote-audio-3-1-0","display_username":"MAJH","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":105819,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/problem-with-pyannote-audio-3-1-0/169326/5","reactions":[{"id":"confetti_ball","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":244046,"name":"system","username":"system","avatar_template":"https://us1.discourse-cdn.com/hellohellohello/original/2X/d/de4155eb4aa4108ecb32a1389d7cc37ae69f88b7.png","created_at":"2025-10-24T06:32:17.200Z","cooked":"This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":6,"post_type":3,"posts_count":6,"updated_at":"2025-10-24T06:32:17.200Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":1,"reads":1,"readers_count":0,"score":0.2,"yours":false,"topic_id":169326,"topic_slug":"problem-with-pyannote-audio-3-1-0","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/problem-with-pyannote-audio-3-1-0/169326/6","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"Hello, I was trying to use model named pyannote/speaker-diarization-3.1
\nso I installed some libraries as below
\n%pip install pyannote.audio==3.1.0\n%pip install numpy==1.26\n\nHere is the result and I think I installed this properly…
\nCollecting pyannote.audio==3.1.0\n Using cached pyannote.audio-3.1.0-py2.py3-none-any.whl.metadata (7.8 kB)\nRequirement already satisfied: asteroid-filterbanks>=0.4 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (0.4.0)\nRequirement already satisfied: einops>=0.6.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (0.8.1)\nRequirement already satisfied: huggingface-hub>=0.13.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (0.35.3)\nRequirement already satisfied: lightning>=2.0.1 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (2.5.5)\nRequirement already satisfied: omegaconf<3.0,>=2.1 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (2.3.0)\nRequirement already satisfied: pyannote.core>=5.0.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (6.0.1)\nRequirement already satisfied: pyannote.database>=5.0.1 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (6.1.0)\nRequirement already satisfied: pyannote.metrics>=3.2 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (4.0.0)\nRequirement already satisfied: pyannote.pipeline>=3.0.1 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (4.0.0)\nRequirement already satisfied: pytorch-metric-learning>=2.1.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (2.9.0)\nRequirement already satisfied: rich>=12.0.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (14.2.0)\nRequirement already satisfied: semver>=3.0.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (3.0.4)\nRequirement already satisfied: soundfile>=0.12.1 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (0.13.1)\nRequirement already satisfied: speechbrain>=0.5.14 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (1.0.3)\nRequirement already satisfied: tensorboardX>=2.6 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (2.6.4)\nRequirement already satisfied: torch>=2.0.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (2.9.0+cu126)\nRequirement already satisfied: torch-audiomentations>=0.11.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (0.12.0)\nRequirement already satisfied: torchaudio>=2.0.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (2.9.0)\nRequirement already satisfied: torchmetrics>=0.11.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from pyannote.audio==3.1.0) (1.8.2)\nRequirement already satisfied: antlr4-python3-runtime==4.9.* in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from omegaconf<3.0,>=2.1->pyannote.audio==3.1.0) (4.9.3)\nRequirement already satisfied: PyYAML>=5.1.0 in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from omegaconf<3.0,>=2.1->pyannote.audio==3.1.0) (6.0.3)\nRequirement already satisfied: numpy in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from asteroid-filterbanks>=0.4->pyannote.audio==3.1.0) (1.26.0)\nRequirement already satisfied: typing-extensions in c:\\gpt_agent_2025_book\\venv\\lib\\site-packages (from asteroid-filterbanks>=0.4->pyannote.audio==3.1.0) (4.15.0)\n...\n Uninstalling numpy-2.3.4:\n Successfully uninstalled 
numpy-2.3.4\nSuccessfully installed numpy-1.26.0\nNote: you may need to restart the kernel to use updated packages.\nOutput is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...\nERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\npyannote-core 6.0.1 requires numpy>=2.0, but you have numpy 1.26.0 which is incompatible.\npyannote-metrics 4.0.0 requires numpy>=2.2.2, but you have numpy 1.26.0 which is incompatible.\n\nI ran this code to load the ffmpeg
\nfrom pathlib import Path\nimport os, sys\n\nffmpeg_dll_dir = Path(r\"C:\\Users\\majh0\\miniconda3\\Library\\bin\") \nassert ffmpeg_dll_dir.exists(), ffmpeg_dll_dir\nos.add_dll_directory(str(ffmpeg_dll_dir)) \n\nimport torch, torchcodec, platform, subprocess\nprint(\"exe:\", sys.executable)\nprint(\"torch\", torch.__version__, \"torchcodec\", torchcodec.__version__, \"py\", platform.python_version())\nsubprocess.run([\"ffmpeg\", \"-version\"], check=True)\nprint(\"cuda torch?\",torch.cuda.is_available())\n\nand the result looks fine to me..
\nexe: c:\\GPT_AGENT_2025_BOOK\\venv\\Scripts\\python.exe\ntorch 2.9.0+cu126 torchcodec 0.8.0 py 3.12.9\ncuda torch? True\n\nI ran this code and it gave me an error as below…
\n# instantiate the pipeline\nimport torch\nfrom pyannote.audio import Pipeline\npipeline = Pipeline.from_pretrained(\n \"pyannote/speaker-diarization-3.1\",\n token=\"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\")\n\n\nif torch.cuda.is_available():\n pipeline.to(torch.device(\"cuda\"))\n print(\"Using CUDA\")\nelse:\n print(\"Using CPU\")\n\n---------------------------------------------------------------------------\nAttributeError Traceback (most recent call last)\nCell In[3], line 3\n 1 # instantiate the pipeline\n 2 import torch\n----> 3 from pyannote.audio import Pipeline\n 4 pipeline = Pipeline.from_pretrained(\n 5 \"pyannote/speaker-diarization-3.1\",\n 6 token=\"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\")\n 9 if torch.cuda.is_available():\n\nFile c:\\GPT_AGENT_2025_BOOK\\venv\\Lib\\site-packages\\pyannote\\audio\\__init__.py:29\n 25 except ImportError:\n 26 pass\n---> 29 from .core.inference import Inference\n 30 from .core.io import Audio\n 31 from .core.model import Model\n\nFile c:\\GPT_AGENT_2025_BOOK\\venv\\Lib\\site-packages\\pyannote\\audio\\core\\inference.py:36\n 33 from pyannote.core import Segment, SlidingWindow, SlidingWindowFeature\n 34 from pytorch_lightning.utilities.memory import is_oom_error\n---> 36 from pyannote.audio.core.io import AudioFile\n 37 from pyannote.audio.core.model import Model, Specifications\n 38 from pyannote.audio.core.task import Resolution\n...\n 49 - a \"str\" or \"Path\" instance: \"audio.wav\" or Path(\"audio.wav\")\n (...) 56 integer to load a specific channel: {\"audio\": \"stereo.wav\", \"channel\": 0}\n 57 \"\"\"\n\nAttributeError: module 'torchaudio' has no attribute 'set_audio_backend'\n\nI have checked the document and it says I need to install pyannote.audio 3.1
I don’t know why this thing doesn’t work…. I tried to solve this problem for 3hrs changing version of pyannote.audio but this thing didn’t give me solution..
\nDo I need to delete venv and reinstall it clearly..?
\nThank you so much for the help in advance..
","solution":"It seems like a partial hit.
The cause is a specification change due to a library version upgrade, but it appears to be because the returned object changed, not because the function itself changed.
You’re on pyannote.audio 4.x. In 4.x the pipeline returns a DiarizeOutput object, not an Annotation. The Annotation lives at output.speaker_diarization. write_rttm is a method of Annotation, so call it there.
from pyannote.audio import Pipeline\nimport torch\n\npipeline = Pipeline.from_pretrained(\n \"pyannote/speaker-diarization-3.1\",\n token=\"YOUR_HF_TOKEN\"\n)\nif torch.cuda.is_available():\n pipeline.to(\"cuda\")\n\nout = pipeline(\"./guitar.wav\") # out is DiarizeOutput\nann = out.speaker_diarization # this is an Annotation\n\nwith open(\"./guitar.rttm\", \"w\", encoding=\"utf-8\") as f:\n ann.write_rttm(f)\n\nEvidence
\noutput = pipeline(...); for turn, spk in output.speaker_diarization: ..., proving the wrapper return type in 4.x. (GitHub)write_rttm is defined on pyannote.core.Annotation, not on the wrapper. (pyannote.github.io)Annotation directly. That is why your call failed on 4.x. (Hugging Face)Option if you want the old behavior: pin to the legacy stack (pyannote.audio==3.1.x) where pipeline(...) returns an Annotation, and the snippet diarization.write_rttm(...) works as-is. Note 4.x introduced several breaking changes, including API renames. (GitHub)
It seems like a partial hit.
The cause is a specification change due to a library version upgrade, but it appears to be because the returned object changed, not because the function itself changed.
You’re on pyannote.audio 4.x. In 4.x the pipeline returns a DiarizeOutput object, not an Annotation. The Annotation lives at output.speaker_diarization. write_rttm is a method of Annotation, so call it there.
from pyannote.audio import Pipeline\nimport torch\n\npipeline = Pipeline.from_pretrained(\n \"pyannote/speaker-diarization-3.1\",\n token=\"YOUR_HF_TOKEN\"\n)\nif torch.cuda.is_available():\n pipeline.to(\"cuda\")\n\nout = pipeline(\"./guitar.wav\") # out is DiarizeOutput\nann = out.speaker_diarization # this is an Annotation\n\nwith open(\"./guitar.rttm\", \"w\", encoding=\"utf-8\") as f:\n ann.write_rttm(f)\n\nEvidence
\noutput = pipeline(...); for turn, spk in output.speaker_diarization: ..., proving the wrapper return type in 4.x. (GitHub)write_rttm is defined on pyannote.core.Annotation, not on the wrapper. (pyannote.github.io)Annotation directly. That is why your call failed on 4.x. (Hugging Face)Option if you want the old behavior: pin to the legacy stack (pyannote.audio==3.1.x) where pipeline(...) returns an Annotation, and the snippet diarization.write_rttm(...) works as-is. Note 4.x introduced several breaking changes, including API renames. (GitHub)
Hi community,
\nHere is my image-to-text pipeline:
\n(customized means not a registered one in official Transformers)
\nA customized Image processor,
A VisionEncoderDecoder, with a customized vision encoder that inherits from PreTrainedModel, and an MBartDecoder,
A WordLevel tokenizer (yes, I haven’t used an MBartTokenizer; I have distilled my own for a specific corpus).
I want to consume this pipeline in Transformers.js. However, all the examples in the Transformers.js documentation seem to pull from a ready-made Transformers pipeline with official components and configurations. Is it possible to make my customized pipeline consumable for Transformers.js, or to what extent could it be partially converted?
My guess is that I should implement my own image preprocessing step and send the image input tensor to the model. In that case, which kind of JS libraries would you recommend? (It won’t be very intensive: just resize-and-normalize steps plus a crop-white-margin function, which doesn’t exist in Transformers’ image processors; see the sketch below.)
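For reference, the crop-white-margin logic might look like the following Python sketch (a hypothetical helper using PIL/NumPy, mirroring what a JS port would do on canvas pixel data):

import numpy as np
from PIL import Image

def crop_white_margin(image: Image.Image, threshold: int = 245) -> Image.Image:
    """Crop away near-white borders around the content (hypothetical helper)."""
    gray = np.asarray(image.convert("L"))
    mask = gray < threshold              # True where pixels are "non-white"
    if not mask.any():
        return image                     # fully white image: nothing to crop
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    # PIL's crop box is (left, upper, right, lower), exclusive on right/lower
    return image.crop((cols[0], rows[0], cols[-1] + 1, rows[-1] + 1))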
Also, just to be sure: can my VisionEncoderDecoder be exported to ONNX format so that it is consumable by Transformers.js?
Of course, my model should be able to run in the browser (and that is the whole point of doing this), as it has only 20M parameters (far fewer than the Transformers.js showcases).
\nThanks for your help in advance!
","post_number":1,"post_type":1,"posts_count":12,"updated_at":"2025-10-08T15:19:25.343Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":26,"reads":9,"readers_count":8,"score":21.6,"yours":false,"topic_id":169036,"topic_slug":"how-to-make-my-customized-pipeline-consumable-for-transformers-js","display_username":"Sicheng Mao","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":2,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://discuss.huggingface.co/t/load-model-from-platform-other-than-hf-hub-and-display-a-progress-bar-by-from-pretrained-in-transformers-js/169364","internal":true,"reflection":true,"title":"Load model from platform other than HF Hub and display a progress bar by `from_pretrained()` in Transformers.js","clicks":0}],"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":104516,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/how-to-make-my-customized-pipeline-consumable-for-transformers-js/169036/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":243331,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-10-08T23:15:26.000Z","cooked":"It seems possible. For Transoformers.js, there’s a dedicated channel on the HF Discord, so asking there would be the most reliable option.
","post_number":2,"post_type":1,"posts_count":12,"updated_at":"2025-10-08T23:15:26.000Z","reply_count":2,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":8,"readers_count":7,"score":26.4,"yours":false,"topic_id":169036,"topic_slug":"how-to-make-my-customized-pipeline-consumable-for-transformers-js","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/datasets/John6666/forum1/blob/main/transformer_js_custom_pipeline_1.md","internal":false,"reflection":false,"title":"transformer_js_custom_pipeline_1.md · John6666/forum1 at main","clicks":2}],"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/how-to-make-my-customized-pipeline-consumable-for-transformers-js/169036/2","reactions":[{"id":"heart","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":true,"topic_accepted_answer":true},{"id":243351,"name":"Sicheng Mao","username":"alephpi","avatar_template":"/user_avatar/discuss.huggingface.co/alephpi/{size}/54288_2.png","created_at":"2025-10-09T05:47:31.103Z","cooked":"Thanks let me check!
","post_number":3,"post_type":1,"posts_count":12,"updated_at":"2025-10-09T05:47:31.103Z","reply_count":0,"reply_to_post_number":2,"quote_count":0,"incoming_link_count":0,"reads":8,"readers_count":7,"score":16.4,"yours":false,"topic_id":169036,"topic_slug":"how-to-make-my-customized-pipeline-consumable-for-transformers-js","display_username":"Sicheng Mao","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":52272,"username":"John6666","name":"John Smith","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":104516,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/how-to-make-my-customized-pipeline-consumable-for-transformers-js/169036/3","reactions":[{"id":"+1","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":243504,"name":"Sicheng Mao","username":"alephpi","avatar_template":"/user_avatar/discuss.huggingface.co/alephpi/{size}/54288_2.png","created_at":"2025-10-13T17:27:00.991Z","cooked":"Hi John,
I tried to follow your export script and managed to export a single .onnx file with the following:
register_tasks_manager_onnx = TasksManager.create_register("onnx")

@register_tasks_manager_onnx("my_hgnetv2", *["feature-extraction"])
class HGNetv2OnnxConfig(ViTOnnxConfig):
    @property
    def inputs(self):
        return {"pixel_values": {0: "batch"}}  # only the dynamic axes need to be listed here

    @property
    def outputs(self):
        return {"last_hidden_state": {0: "batch"}}

def export_onnx():
    path = './model'
    model = VisionEncoderDecoderModel.from_pretrained(path)
    onnx_config_constructor = TasksManager.get_exporter_config_constructor(
        exporter="onnx",
        model=model,
        task="image-to-text",
        library_name="transformers",
        exporter_config_kwargs={"use_past": True},
    )
    onnx_config = onnx_config_constructor(model.config)
    out = Path("./model/onnx")
    out.mkdir(exist_ok=True)

    inputs, outputs = export(model,
                             onnx_config,
                             out / "model.onnx",
                             onnx_config.DEFAULT_ONNX_OPSET,
                             input_shapes={"pixel_values": [1, 3, 384, 384]},
                             )
    print(inputs)
    print(outputs)

However, I don’t know how to export the trio of .onnx files with the CLI: within a Python script I can register the customized config, but I don’t know how to register it with the CLI…
","post_number":4,"post_type":1,"posts_count":12,"updated_at":"2025-10-13T17:27:47.078Z","reply_count":1,"reply_to_post_number":2,"quote_count":0,"incoming_link_count":0,"reads":7,"readers_count":6,"score":21.2,"yours":false,"topic_id":169036,"topic_slug":"how-to-make-my-customized-pipeline-consumable-for-transformers-js","display_username":"Sicheng Mao","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":2,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":52272,"username":"John6666","name":"John Smith","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":104516,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/how-to-make-my-customized-pipeline-consumable-for-transformers-js/169036/4","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":243505,"name":"Sicheng Mao","username":"alephpi","avatar_template":"/user_avatar/discuss.huggingface.co/alephpi/{size}/54288_2.png","created_at":"2025-10-13T17:54:45.869Z","cooked":"Oh I see, it’s here Export a model to ONNX with optimum.exporters.onnx and we need to use main_export instead of export
Finally I use the following:
\ndef export_onnx():\n path='./model'\n out = Path(\"./model/trio_onnx\")\n out.mkdir(exist_ok=True)\n\n main_export(\n path,\n task=\"image-to-text\",\n output=out,\n )\n\nHowever, this only exports encoder_model.onnx and decoder_model.onnx. Since I had no idea how use_past=True could be injected through main_export’s arguments (the example in the above link doesn’t work out), I monkey-patched the source code to make it export the trio of ONNX files.
For Transformers.js:
\nUse main_export() with custom_onnx_configs and with_behavior(..., use_past=True) to get the trio. Do not monkey-patch.
Pass custom_onnx_configs to main_export() and choose behaviors per subgraph: \"encoder\", \"decoder\", and \"decoder with past\". You can also disable post-processing so files are kept separate. (Hugging Face) Transformers.js expects onnx/{encoder_model.onnx, decoder_model.onnx, decoder_with_past_model.onnx} or a merged decoder. (Hugging Face)
# refs:\n# - Export guide (custom_onnx_configs + with_behavior + no_post_process):\n# https://huggingface.co/docs/optimum-onnx/onnx/usage_guides/export_a_model\n# - main_export reference:\n# https://huggingface.co/docs/optimum-onnx/en/onnx/package_reference/export\n\nfrom pathlib import Path\nfrom transformers import AutoConfig\nfrom optimum.exporters.onnx import main_export\nfrom optimum.exporters.tasks import TasksManager\n\nmodel_dir = \"./model\" # your VisionEncoderDecoder checkpoint\nout = Path(\"./model/trio_onnx\"); out.mkdir(parents=True, exist_ok=True)\n\n# Build an ONNX config for your model+task\ncfg = AutoConfig.from_pretrained(model_dir)\nctor = TasksManager.get_exporter_config_constructor(\n model_type=cfg.model_type, backend=\"onnx\", task=\"image-to-text\" # vision→text task\n)\nonnx_cfg = ctor(config=cfg, task=\"image-to-text\")\n\n# Ask explicitly for the three subgraphs\ncustom_onnx_configs = {\n \"encoder_model\": onnx_cfg.with_behavior(\"encoder\"),\n \"decoder_model\": onnx_cfg.with_behavior(\"decoder\", use_past=False),\n \"decoder_with_past_model\": onnx_cfg.with_behavior(\"decoder\", use_past=True),\n}\n\n# Export. Keep trio separate (avoid automatic merge).\nmain_export(\n model=model_dir,\n task=\"image-to-text\",\n output=str(out),\n custom_onnx_configs=custom_onnx_configs,\n no_post_process=True,\n)\n\nWhy this works: Optimum documents custom_onnx_configs and with_behavior(\"decoder\", use_past=True) to emit decoder_with_past_model.onnx; no_post_process=True prevents the exporter from merging decoders. (Hugging Face)
- Expected layout for Transformers.js: encoder_model.onnx, decoder_model.onnx, decoder_with_past_model.onnx. This mirrors working web repos. (Hugging Face)
- Request the three subgraphs explicitly via the custom_onnx_configs dict as above. (Hugging Face)
- Keep the files separate with no_post_process=True. The doc names this exact flag. (Hugging Face)
- To find the right task, call TasksManager.get_supported_tasks_for_model_type(model_type, \"onnx\") and pick the vision→text task. The export guide shows this workflow. (Hugging Face)
- Some exporters can produce a single decoder_model_merged.onnx that handles both first and subsequent tokens. If you prefer that, omit no_post_process=True. The public ViT-GPT2 repo shows merged and split variants side by side. (Hugging Face)
Well, I still cannot make this work. By debugging, I found that main_export() takes me into optimum.exporters.utils._get_submodels_and_export_configs(), and an error is raised here:
# When specifying custom export configs for supported transformers architectures, we do\n # not force to specify a custom export config for each submodel.\n for key, custom_export_config in custom_export_configs.items():\n models_and_export_configs[key] = (models_and_export_configs[key][0], custom_export_config)\n\nwhere the custom_export_configs is the one we passed in with use_past injected, while the models_and_export_configs, generated here
# TODO: this succession of if/else strongly suggests a refactor is needed.\n if (\n task.startswith(TasksManager._ENCODER_DECODER_TASKS)\n and model.config.is_encoder_decoder\n and not monolith\n ):\n models_and_export_configs = get_encoder_decoder_models_for_export(model, export_config)\n\ndoesn’t contain the key “decoder_with_past”, where the default export_config generated here
export_config_constructor = TasksManager.get_exporter_config_constructor(\n model=model, exporter=exporter, task=task, library_name=library_name\n )\n export_config = export_config_constructor(\n model.config,\n int_dtype=int_dtype,\n float_dtype=float_dtype,\n preprocessors=preprocessors,\n )\n\nuses the default use_past=False, and therefore does not generate a config for “decoder_with_past”.
\nAnd actually, here is what I monkey-patched during the debugging.
I think there is a tight coupling between the export config and the model config in the optimum library. Although I use a customized encoder, the outermost config is still the VisionEncoderDecoder config, which routes me into the not custom_architecture config-processing logic here; that leads to the above error, and this combination may not have been considered a normal scenario in the design.
if not custom_architecture:\n if library_name == \"diffusers\":\n export_config = None\n models_and_export_configs = get_diffusion_models_for_export(\n model, int_dtype=int_dtype, float_dtype=float_dtype, exporter=exporter\n )\n else:\n export_config_constructor = TasksManager.get_exporter_config_constructor(\n model=model, exporter=exporter, task=task, library_name=library_name\n )\n export_config = export_config_constructor(\n model.config,\n int_dtype=int_dtype,\n float_dtype=float_dtype,\n preprocessors=preprocessors,\n )\n\n export_config.variant = _variant\n all_variants = \"\\n\".join(\n [f\" - {name}: {description}\" for name, description in export_config.VARIANTS.items()]\n )\n logger.info(f\"Using the export variant {export_config.variant}. Available variants are:\\n{all_variants}\")\n\n # TODO: this succession of if/else strongly suggests a refactor is needed.\n if (\n task.startswith(TasksManager._ENCODER_DECODER_TASKS)\n and model.config.is_encoder_decoder\n and not monolith\n ):\n models_and_export_configs = get_encoder_decoder_models_for_export(model, export_config)\n elif task.startswith(\"text-generation\") and not monolith:\n models_and_export_configs = get_decoder_models_for_export(model, export_config)\n elif model.config.model_type == \"sam\":\n models_and_export_configs = get_sam_models_for_export(model, export_config)\n elif model.config.model_type == \"speecht5\":\n models_and_export_configs = get_speecht5_models_for_export(model, export_config, model_kwargs)\n elif model.config.model_type == \"musicgen\":\n models_and_export_configs = get_musicgen_models_for_export(model, export_config)\n else:\n models_and_export_configs = {\"model\": (model, export_config)}\n\n # When specifying custom export configs for supported transformers architectures, we do\n # not force to specify a custom export config for each submodel.\n for key, custom_export_config in custom_export_configs.items():\n models_and_export_configs[key] = (models_and_export_configs[key][0], custom_export_config)\n","post_number":8,"post_type":1,"posts_count":12,"updated_at":"2025-10-14T09:00:23.165Z","reply_count":1,"reply_to_post_number":7,"quote_count":0,"incoming_link_count":0,"reads":4,"readers_count":3,"score":20.8,"yours":false,"topic_id":169036,"topic_slug":"how-to-make-my-customized-pipeline-consumable-for-transformers-js","display_username":"Sicheng Mao","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":2,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":52272,"username":"John6666","name":"John Smith","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":104516,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/how-to-make-my-customized-pipeline-consumable-for-transformers-js/169036/8","reactions":[{"id":"+1","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":243569,"name":"Sicheng 
Mao","username":"alephpi","avatar_template":"/user_avatar/discuss.huggingface.co/alephpi/{size}/54288_2.png","created_at":"2025-10-14T09:27:23.844Z","cooked":"Alright, actually we don’t need those verbose configs, just change the task from “image-to-text” to “image-to-text-with-past” will solve the issue (no monkey-patch)
\ndef export_onnx():\n path='./model'\n out = Path(\"./model/trio_onnx\")\n out.mkdir(exist_ok=True)\n main_export(\n path,\n task=\"image-to-text-with-past\", # to get trio onnx model, use \"-with-past\", otherwise use \"image-to-text\"\n output=out,\n )\n","post_number":9,"post_type":1,"posts_count":12,"updated_at":"2025-10-14T09:27:35.932Z","reply_count":0,"reply_to_post_number":8,"quote_count":0,"incoming_link_count":0,"reads":3,"readers_count":2,"score":15.6,"yours":false,"topic_id":169036,"topic_slug":"how-to-make-my-customized-pipeline-consumable-for-transformers-js","display_username":"Sicheng Mao","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":104516,"username":"alephpi","name":"Sicheng Mao","avatar_template":"/user_avatar/discuss.huggingface.co/alephpi/{size}/54288_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":104516,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/how-to-make-my-customized-pipeline-consumable-for-transformers-js/169036/9","reactions":[{"id":"+1","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":243573,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-10-14T11:37:36.605Z","cooked":"Great. About _with_past
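(A quick sanity check, as a minimal sketch: the output directory and the expected trio filenames below follow the layout discussed earlier in this thread.)

from pathlib import Path

# list what the export actually produced
out = Path("./model/trio_onnx")
print(sorted(p.name for p in out.glob("*.onnx")))
# expected with task="image-to-text-with-past":
#   decoder_model.onnx, decoder_with_past_model.onnx, encoder_model.onnx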
Hi John,
\nI’ve finally succeeded in implementing the steps above. Thanks for your help!
\nYet I still have some other questions and I think I’d better create a new discussion.
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":12,"post_type":3,"posts_count":12,"updated_at":"2025-10-23T21:34:35.488Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":1,"readers_count":0,"score":0.2,"yours":false,"topic_id":169036,"topic_slug":"how-to-make-my-customized-pipeline-consumable-for-transformers-js","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/how-to-make-my-customized-pipeline-consumable-for-transformers-js/169036/12","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"Hi community,
\nHere is my image-to-text pipeline:
\n(customized means not a registered one in official Transformers)
\nA customized Image processor,
\nA VisionEncoderDecoder, with a customized vision encoder that inherits from PreTrainedModel, and an MBartDecoder,
\nA WordLevel tokenizer (yes, I haven’t used an MBartTokenizer; I have distilled my own for a specific corpus).
\nI want to consume this pipeline in Transformers.js. However, I notice that all the examples given in the Transformers.js documentation seem to pull from a ready-made Transformers pipeline with official components and configurations. I just wonder: is it possible to make my customized pipeline consumable for Transformers.js, or to what extent could my pipeline be partially converted?
\nMy guess is that I should implement my own image preprocessing step and send the image input tensor to the model; in that case, which kind of JS libraries would you recommend? (It won’t be very intensive, just simple resize-and-normalize operations plus a crop-white-margin function, which doesn’t exist in Transformers’ image processors.)
\nAlso, just to be sure: can my VisionEncoderDecoder be exported to an ONNX format consumable by Transformers.js?
\nOf course, my model should be able to run in the browser (and that’s the whole point of doing this), as it has only 20M parameters (far fewer than the showcases in Transformers.js).
\nThanks for your help in advance!
","solution":"It seems possible. For Transoformers.js, there’s a dedicated channel on the HF Discord, so asking there would be the most reliable option.
","evaluation":{"extracted_final_answer":"It seems possible. For Transoformers.js, there’s a dedicated channel on the HF Discord, so asking there would be the most reliable option.","reasoning":"The extracted_final_answer matches the correct_answer exactly, with no differences in wording or meaning. Therefore, it is clear that the response includes the precise and unambiguous correct_answer.","correct":"yes","confidence":100}} -{"discussion_title":"Issue with TorchCodec when fine-tuning Whisper ASR model","discussion_url":"https://discuss.huggingface.co/t/issue-with-torchcodec-when-fine-tuning-whisper-asr-model/169315","discussion_topic_id":169315,"discussion_category":5,"discussion_created_at":"2025-10-21T07:37:40.941000Z","thread":[{"id":243905,"name":"Ong Jun Rong","username":"junnyrong","avatar_template":"/user_avatar/discuss.huggingface.co/junnyrong/{size}/54763_2.png","created_at":"2025-10-21T07:37:41.012Z","cooked":"Hello,
\nIn the past I have been fine-tuning the Whisper-tiny ASR model using these guides:
\nFine Tuning Whisper on Custom Dataset (https://learnopencv.com/fine-tuning-whisper-on-custom-dataset/)\nFine-Tune Whisper For Multilingual ASR with 🤗 Transformers (https://huggingface.co/blog/fine-tune-whisper)\n\nIt was all working fine, and I was able to do everything locally, like loading a pre-trained Whisper-tiny model and also my own dataset, until recently when I updated the modules. I have been getting errors like these:
\n\nI have tried falling back and testing the samples provided by the guides, and they also seem to have broken and started giving the same error. I also tried running them on Google Colab, where it crashes when trying to run a cell like this:
\n\nI would like to know if anyone else is also facing the same issue and if there are any solutions for it. Thanks in advance!
","post_number":1,"post_type":1,"posts_count":4,"updated_at":"2025-10-21T07:37:41.012Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":21,"reads":4,"readers_count":3,"score":50.8,"yours":false,"topic_id":169315,"topic_slug":"issue-with-torchcodec-when-fine-tuning-whisper-asr-model","display_username":"Ong Jun Rong","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://learnopencv.com/fine-tuning-whisper-on-custom-dataset/","internal":false,"reflection":false,"title":"Fine Tuning Whisper on Custom Dataset","clicks":2},{"url":"https://huggingface.co/blog/fine-tune-whisper","internal":false,"reflection":false,"title":"Fine-Tune Whisper For Multilingual ASR with 🤗 Transformers","clicks":1}],"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":105467,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/issue-with-torchcodec-when-fine-tuning-whisper-asr-model/169315/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":243907,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-10-21T08:37:37.072Z","cooked":"This error appears to stem from changes to the audio backend in the datasets library. The quickest workaround may be to install using pip install datasets==3.6.0. Additionally, if using version 4.0.0 or later, builder script-type datasets can no longer be used directly from the Hub. You will need to find and use datasets that have been converted to the standard type beforehand. If the original datasets were standard datasets, the latter issue should not be a problem.
Additionally, since Transformers underwent significant changes around version 4.49.0, if you encounter errors related to Whisper, rolling transformers back to version 4.48.3 or earlier would be the simplest workaround. Of course, rewriting for the new version is preferable… but for a temporary fix.
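(As a concrete pin, a sketch that just combines the two rollbacks mentioned above; adjust versions to your needs:)

pip install "datasets==3.6.0" "transformers==4.48.3"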
Your error started after upgrading to Datasets 4.x. Version 4.x switched audio decoding to TorchCodec, which loads FFmpeg at runtime and also requires a matching torch↔torchcodec pair. Accessing or printing an Audio column now triggers that decode path, so if FFmpeg is missing or versions don’t line up, you see the probe-and-fail chain (core7 → core6 → core5 → core4 ... Could not load torchcodec). On Windows this is more brittle, and early 4.0 notes even said Windows was not supported yet. (Hugging Face)
TorchCodec needs FFmpeg and a matching torch version. The README documents FFmpeg support and the torch↔torchcodec matrix. (GitHub) Supported pairs: torchcodec 0.7 ↔ torch 2.8; 0.8 ↔ 2.9. (GitHub)
Pick one path. Keep it pinned.
\n# Downgrade Datasets to pre-TorchCodec behavior\npip install \"datasets<4.0.0\" # release notes flagged Windows not ready\n# https://github.com/huggingface/datasets/releases/tag/4.0.0\n\n(GitHub)
\n# Windows CPU: install FFmpeg and match versions\nconda install -c conda-forge \"ffmpeg<8\" # README recommends conda FFmpeg\npip install \"torch==2.8.*\" \"torchcodec==0.7.*\" # matrix: 0.7 <-> 2.8\n# https://github.com/meta-pytorch/torchcodec#installing-torchcodec\n\nIf you need CUDA on Windows, use the experimental conda package:
\nconda install -c conda-forge \"ffmpeg<8\" \"torchcodec=*=*cuda*\"\n# https://github.com/meta-pytorch/torchcodec#installing-cuda-enabled-torchcodec\n\n(GitHub)
\n# Colab VM or Linux\napt-get update && apt-get install -y ffmpeg\npip install -U \"datasets[audio]\" \"torch==2.8.*\" \"torchcodec==0.7.*\"\n# HF docs: audio decoding uses TorchCodec + FFmpeg\n# https://huggingface.co/docs/datasets/en/audio_load\n\n\nAvoid TorchCodec until your env is fixed.
\nfrom datasets import Audio\n# Option 1: disable globally\nds = ds.decode(False) # https://huggingface.co/docs/datasets/en/package_reference/main_classes#datasets.Dataset.decode\n# Option 2: disable per column\nds = ds.cast_column(\"audio\", Audio(decode=False)) # https://huggingface.co/docs/datasets/en/about_dataset_features\n\nThese return paths/bytes rather than decoded arrays, so printing items won’t invoke TorchCodec. (Hugging Face)
\npython - <<'PY'\nimport subprocess, sys\nimport torch\nprint(\"python:\", sys.version)\nprint(\"torch:\", torch.__version__)\ntry:\n import torchcodec\n print(\"torchcodec:\", torchcodec.__version__)\nexcept Exception as e:\n print(\"torchcodec import failed:\", e)\nsubprocess.run([\"ffmpeg\", \"-hide_banner\", \"-version\"])\nPY\n# Matrix and FFmpeg policy:\n# https://github.com/meta-pytorch/torchcodec#installing-torchcodec\n\n(GitHub)
\nYour screenshots show Datasets 4.x decoding an Audio column, TorchCodec probing FFmpeg 7→6→5→4, then failing. That matches the new 4.x behavior and the FFmpeg/compatibility requirements above. (Hugging Face Forums)
load_with_torchcodec exists as a bridge). Aligns your stack with where the ecosystem is going. (PyTorch Documentation)
I was pulling my hair out thinking it had something to do with TorchCodec’s versioning; it never occurred to me that it might have been datasets! Thank you so much for the detailed explanation too, that solved my issue
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":4,"post_type":3,"posts_count":4,"updated_at":"2025-10-22T13:45:34.064Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":1,"reads":1,"readers_count":0,"score":5.2,"yours":false,"topic_id":169315,"topic_slug":"issue-with-torchcodec-when-fine-tuning-whisper-asr-model","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/issue-with-torchcodec-when-fine-tuning-whisper-asr-model/169315/4","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"Hello,
\nIn the past I have been fine-tuning the Whisper-tiny ASR model using these guides:
\nFine Tuning Whisper on Custom Dataset (https://learnopencv.com/fine-tuning-whisper-on-custom-dataset/)\nFine-Tune Whisper For Multilingual ASR with 🤗 Transformers (https://huggingface.co/blog/fine-tune-whisper)\n\nIt was all working fine, and I was able to do everything locally, like loading a pre-trained Whisper-tiny model and also my own dataset, until recently when I updated the modules. I have been getting errors like these:
\n\nI have tried falling back and testing the samples provided by the guides, and they also seem to have broken and started giving the same error. I also tried running them on Google Colab, where it crashes when trying to run a cell like this:
\n\nI would like to know if anyone else is also facing the same issue and if there are any solutions for it. Thanks in advance!
","solution":"This error appears to stem from changes to the audio backend in the datasets library. The quickest workaround may be to install using pip install datasets==3.6.0. Additionally, if using version 4.0.0 or later, builder script-type datasets can no longer be used directly from the Hub. You will need to find and use datasets that have been converted to the standard type beforehand. If the original datasets were standard datasets, the latter issue should not be a problem.
Additionally, since Transformers underwent significant changes around version 4.49.0, if you encounter errors related to Whisper, rolling transformers back to version 4.48.3 or earlier would be the simplest workaround. Of course, rewriting for the new version is preferable… but for a temporary fix.
Your error started after upgrading to Datasets 4.x. Version 4.x switched audio decoding to TorchCodec, which loads FFmpeg at runtime and also requires a matching torch↔torchcodec pair. Accessing or printing an Audio column now triggers that decode path, so if FFmpeg is missing or versions don’t line up, you see the probe-and-fail chain (core7 → core6 → core5 → core4 ... Could not load torchcodec). On Windows this is more brittle, and early 4.0 notes even said Windows was not supported yet. (Hugging Face)
TorchCodec needs FFmpeg and a matching torch version. The README documents FFmpeg support and the torch↔torchcodec matrix. (GitHub) Supported pairs: torchcodec 0.7 ↔ torch 2.8; 0.8 ↔ 2.9. (GitHub)
Pick one path. Keep it pinned.
\n# Downgrade Datasets to pre-TorchCodec behavior\npip install \"datasets<4.0.0\" # release notes flagged Windows not ready\n# https://github.com/huggingface/datasets/releases/tag/4.0.0\n\n(GitHub)
\n# Windows CPU: install FFmpeg and match versions\nconda install -c conda-forge \"ffmpeg<8\" # README recommends conda FFmpeg\npip install \"torch==2.8.*\" \"torchcodec==0.7.*\" # matrix: 0.7 <-> 2.8\n# https://github.com/meta-pytorch/torchcodec#installing-torchcodec\n\nIf you need CUDA on Windows, use the experimental conda package:
\nconda install -c conda-forge \"ffmpeg<8\" \"torchcodec=*=*cuda*\"\n# https://github.com/meta-pytorch/torchcodec#installing-cuda-enabled-torchcodec\n\n(GitHub)
\n# Colab VM or Linux\napt-get update && apt-get install -y ffmpeg\npip install -U \"datasets[audio]\" \"torch==2.8.*\" \"torchcodec==0.7.*\"\n# HF docs: audio decoding uses TorchCodec + FFmpeg\n# https://huggingface.co/docs/datasets/en/audio_load\n\n\nAvoid TorchCodec until your env is fixed.
\nfrom datasets import Audio\n# Option 1: disable globally\nds = ds.decode(False) # https://huggingface.co/docs/datasets/en/package_reference/main_classes#datasets.Dataset.decode\n# Option 2: disable per column\nds = ds.cast_column(\"audio\", Audio(decode=False)) # https://huggingface.co/docs/datasets/en/about_dataset_features\n\nThese return paths/bytes rather than decoded arrays, so printing items won’t invoke TorchCodec. (Hugging Face)
\npython - <<'PY'\nimport subprocess, sys\nimport torch\nprint(\"python:\", sys.version)\nprint(\"torch:\", torch.__version__)\ntry:\n import torchcodec\n print(\"torchcodec:\", torchcodec.__version__)\nexcept Exception as e:\n print(\"torchcodec import failed:\", e)\nsubprocess.run([\"ffmpeg\", \"-hide_banner\", \"-version\"])\nPY\n# Matrix and FFmpeg policy:\n# https://github.com/meta-pytorch/torchcodec#installing-torchcodec\n\n(GitHub)
\nYour screenshots show Datasets 4.x decoding an Audio column, TorchCodec probing FFmpeg 7→6→5→4, then failing. That matches the new 4.x behavior and the FFmpeg/compatibility requirements above. (Hugging Face Forums)
load_with_torchcodec exists as a bridge). Aligns your stack with where the ecosystem is going. (PyTorch Documentation)
The quickest workaround may be to install using pip install datasets==3.6.0. Additionally, if using version 4.0.0 or later, builder script-type datasets can no longer be used directly from the Hub. You will need to find and use datasets that have been converted to the standard type beforehand. If the original datasets were standard datasets, the latter issue should not be a problem.\nAdditionally, since Transformers underwent significant changes around version 4.49.0, if you encounter errors related to Whisper, rolling transformers back to version 4.48.3 or earlier would be the simplest workaround. Of course, rewriting for the new version is preferable… but for a temporary fix.
Your error started after upgrading to Datasets 4.x. Version 4.x switched audio decoding to TorchCodec, which loads FFmpeg at runtime and also requires a matching torch↔torchcodec pair. Accessing or printing an Audio column now triggers that decode path, so if FFmpeg is missing or versions don’t line up, you see the probe-and-fail chain (core7 → core6 → core5 → core4 ... Could not load torchcodec). On Windows this is more brittle, and early 4.0 notes even said Windows was not supported yet. (Hugging Face)
[HF Space repeatedly crashes: semmyKG]
\nHF support team,
\nMay we request your kind assistance in looking into this HF space
\nWe have made it private and public
\nWe have restarted multiple times: from the debug, from settings
\nWe have factory rebuilt from settings
It appears the requirements were ‘successfully’ installed.
\nThe last logs
\n===== Application Startup at 2025-10-17 14:16:51 ===== \n=== Application restarted at 2025-10-17 14:18:42.702953130 UTC === \n=== Application restarted at 2025-10-17 14:18:42.703405200 UTC === \n=== Application restarted at 2025-10-17 14:18:42.708956192 UTC === \n=== Application stopped (exit code: 0) at 2025-10-17 14:18:53.031719893 UTC ===\n","post_number":1,"post_type":1,"posts_count":7,"updated_at":"2025-10-17T14:59:37.920Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":44,"reads":6,"readers_count":5,"score":66.2,"yours":false,"topic_id":169242,"topic_slug":"hf-space-not-starting-repeatedly-crashes-semmykg","display_username":"Researcher","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/spaces/semmyk/semmyKG","internal":false,"reflection":false,"title":"semmyKG - Knowledge Graph visualiser toolkit (builder from markdown) - a Hugging Face Space by semmyk","clicks":4}],"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":92554,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/hf-space-not-starting-repeatedly-crashes-semmykg/169242/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":243754,"name":"Megan Riley","username":"meganariley","avatar_template":"/user_avatar/discuss.huggingface.co/meganariley/{size}/20596_2.png","created_at":"2025-10-17T17:09:42.992Z","cooked":"Hey, thanks for reporting! We’re investigating and I’ll update you soon.
","post_number":2,"post_type":1,"posts_count":7,"updated_at":"2025-10-17T17:09:42.992Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":5,"reads":5,"readers_count":4,"score":31.0,"yours":false,"topic_id":169242,"topic_slug":"hf-space-not-starting-repeatedly-crashes-semmykg","display_username":"Megan Riley","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":true,"admin":false,"staff":true,"user_id":31941,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/hf-space-not-starting-repeatedly-crashes-semmykg/169242/2","reactions":[{"id":"hugs","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":243890,"name":"Megan Riley","username":"meganariley","avatar_template":"/user_avatar/discuss.huggingface.co/meganariley/{size}/20596_2.png","created_at":"2025-10-20T22:36:55.714Z","cooked":"Hi @semmyk can you please disable Dev Mode in the settings of the Space and restart? Let us know if you continue experiencing issues.
","post_number":3,"post_type":1,"posts_count":7,"updated_at":"2025-10-20T22:36:55.714Z","reply_count":1,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":1,"reads":4,"readers_count":3,"score":20.8,"yours":false,"topic_id":169242,"topic_slug":"hf-space-not-starting-repeatedly-crashes-semmykg","display_username":"Megan Riley","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":true,"admin":false,"staff":true,"user_id":31941,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/hf-space-not-starting-repeatedly-crashes-semmykg/169242/3","reactions":[{"id":"hugs","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":243894,"name":"Researcher","username":"semmyk","avatar_template":"/user_avatar/discuss.huggingface.co/semmyk/{size}/46712_2.png","created_at":"2025-10-21T00:00:13.744Z","cooked":"@meganariley Thanks for coming back too us. We’ve disabled Dev Mode: … Getting …
\n===== Application Startup at 2025-10-20 23:50:46 =====\n\nNB: Also tried … Restart Space, Factory reset, restart Space, Disable Dev, enable Dev mode, restart, Disable Dev Mode
","post_number":4,"post_type":1,"posts_count":7,"updated_at":"2025-10-21T00:00:13.744Z","reply_count":0,"reply_to_post_number":3,"quote_count":0,"incoming_link_count":1,"reads":4,"readers_count":3,"score":15.8,"yours":false,"topic_id":169242,"topic_slug":"hf-space-not-starting-repeatedly-crashes-semmykg","display_username":"Researcher","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":31941,"username":"meganariley","name":"Megan Riley","avatar_template":"/user_avatar/discuss.huggingface.co/meganariley/{size}/20596_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":92554,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/hf-space-not-starting-repeatedly-crashes-semmykg/169242/4","reactions":[{"id":"+1","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":243895,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-10-21T00:10:55.333Z","cooked":"In README.md:
app_file: app_gradio_lightrag.py\n\nBut it seems the actual Gradio UI code is in app.py.
\nSo, setting app_file: app.py might resolve the issue?
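For reference, the Space’s README front matter would then look roughly like this (a sketch; every field except app_file is illustrative):

---
title: semmyKG
sdk: gradio
app_file: app.py
---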
@John6666 oops,
That gets it initialised. Apparently, we forgot to update that section of the README after we split the entry point + Gradio UI from the processing-coordination module.
We’ll update it once we get the Space working. At the moment, there is a port issue.
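(In case it helps with the port issue, a minimal sketch assuming a Gradio app: Spaces expects the app to listen on 0.0.0.0:7860 by default, and the placeholder interface below is purely illustrative.)

import gradio as gr

# placeholder app; replace with the real UI
demo = gr.Interface(fn=lambda x: x, inputs="text", outputs="text")
# bind to the host/port that Hugging Face Spaces probes by default
demo.launch(server_name="0.0.0.0", server_port=7860)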
","post_number":6,"post_type":1,"posts_count":7,"updated_at":"2025-10-21T18:51:20.001Z","reply_count":0,"reply_to_post_number":5,"quote_count":0,"incoming_link_count":2,"reads":3,"readers_count":2,"score":25.6,"yours":false,"topic_id":169242,"topic_slug":"hf-space-not-starting-repeatedly-crashes-semmykg","display_username":"Researcher","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":52272,"username":"John6666","name":"John Smith","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":92554,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/hf-space-not-starting-repeatedly-crashes-semmykg/169242/6","reactions":[{"id":"laughing","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":243953,"name":"system","username":"system","avatar_template":"https://us1.discourse-cdn.com/hellohellohello/original/2X/d/de4155eb4aa4108ecb32a1389d7cc37ae69f88b7.png","created_at":"2025-10-22T10:44:41.140Z","cooked":"This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":7,"post_type":3,"posts_count":7,"updated_at":"2025-10-22T10:44:41.140Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":1,"reads":1,"readers_count":0,"score":0.2,"yours":false,"topic_id":169242,"topic_slug":"hf-space-not-starting-repeatedly-crashes-semmykg","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/hf-space-not-starting-repeatedly-crashes-semmykg/169242/7","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"[HF Space repeatedly crashes: semmyKG]
\nHF support team,
\nMay we request your kind assistance in looking into this HF space
\nWe have made it private and public
\nWe have restarted multiple times: from the debug, from settings
\nWe have factory rebuilt from settings
It appears the requirements were ‘successfully’ installed.
\nThe last logs
\n===== Application Startup at 2025-10-17 14:16:51 ===== \n=== Application restarted at 2025-10-17 14:18:42.702953130 UTC === \n=== Application restarted at 2025-10-17 14:18:42.703405200 UTC === \n=== Application restarted at 2025-10-17 14:18:42.708956192 UTC === \n=== Application stopped (exit code: 0) at 2025-10-17 14:18:53.031719893 UTC ===\n","solution":"In README.md:
app_file: app_gradio_lightrag.py\n\nBut it seems the actual Gradio UI code is in app.py.
\nSo, setting app_file: app.py might resolve the issue?
In README.md:
app_file: app_gradio_lightrag.py\n\nBut it seems the actual Gradio UI code is in app.py.
\nSo, setting app_file: app.py might resolve the issue?
I’m writing custom versions of LlamaModel, and for one of those approaches I want to overwrite the attention mechanism of each layer. My code looks like this. Note that even when I define LlamaAttentionHybrid (a subclass of LlamaAttention) to be the exact same as LlamaAttention, I still get hallucination issues. This suggests I’m not correctly replacing the attention mechanism.
\nclass LlamaHybridForCausalLM(LlamaForCausalLM):\n def __init__(self, config: LlamaHybridConfig):\n super().__init__(config)\n if config.hybrid:\n for i, layer in enumerate(self.model.layers):\n # Need to also copy attention weights\n old_attn = layer.self_attn\n layer.self_attn = LlamaAttentionHybrid(config, i)\n layer.self_attn.load_state_dict(old_attn.state_dict())\n\nHowever, the model works completely fine when I write this code:
\nclass LlamaHybridForCausalLM(LlamaForCausalLM):\n def __init__(self, config: LlamaHybridConfig):\n super().__init__(config)\n if config.hybrid:\n for i, layer in enumerate(self.model.layers):\n # Need to also copy attention weights\n old_attn = layer.self_attn\n layer.self_attn = LlamaAttention(config, i)\n layer.self_attn.load_state_dict(old_attn.state_dict())\n\nWhy would this happen even when I don’t make any changes in the subclass? Note that the forward function here is defined exactly the same as in the source code.
\nclass LlamaAttentionHybrid(LlamaAttention):\n def __init__(self, config: LlamaHybridConfig, layer_idx: int):\n super().__init__(config, layer_idx)\n\n def forward(\n self,\n hidden_states: torch.Tensor,\n position_embeddings: tuple[torch.Tensor, torch.Tensor],\n attention_mask: Optional[torch.Tensor],\n past_key_values: Optional[Cache] = None,\n cache_position: Optional[torch.LongTensor] = None,\n **kwargs: Unpack[FlashAttentionKwargs],\n ) -> tuple[torch.Tensor, Optional[torch.Tensor], Optional[tuple[torch.Tensor]]]:\n\n input_shape = hidden_states.shape[:-1]\n hidden_shape = (*input_shape, -1, self.head_dim)\n\n query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2)\n key_states = self.k_proj(hidden_states).view(hidden_shape).transpose(1, 2)\n value_states = self.v_proj(hidden_states).view(hidden_shape).transpose(1, 2)\n\n cos, sin = position_embeddings\n query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)\n\n if past_key_values is not None:\n # sin and cos are specific to RoPE models; cache_position needed for the static cache\n cache_kwargs = {\"sin\": sin, \"cos\": cos, \"cache_position\": cache_position}\n key_states, value_states = past_key_values.update(key_states, value_states, self.layer_idx, cache_kwargs)\n\n attention_interface: Callable = eager_attention_forward\n if self.config._attn_implementation != \"eager\":\n attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]\n\n attn_output, attn_weights = attention_interface(\n self,\n query_states,\n key_states,\n value_states,\n attention_mask,\n dropout=0.0 if not self.training else self.attention_dropout,\n scaling=self.scaling,\n **kwargs,\n )\n\n attn_output = attn_output.reshape(*input_shape, -1).contiguous()\n attn_output = self.o_proj(attn_output)\n return attn_output, attn_weights\n\nThanks!
\nEDIT: I narrowed the issue down to the redefining of the forward function. For some reason when I add the forward function into the subclass even if it’s identical, the model hallucinates dramatically.
","post_number":1,"post_type":1,"posts_count":5,"updated_at":"2025-10-16T11:35:01.753Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":4,"readers_count":3,"score":15.8,"yours":false,"topic_id":169215,"topic_slug":"replacing-attention-class-with-identical-subclass-creates-hallucinations","display_username":"Alexander Jephtha","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":5,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":30474,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/replacing-attention-class-with-identical-subclass-creates-hallucinations/169215/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":243732,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-10-17T04:12:47.941Z","cooked":"There may be points that can be fixed.
","post_number":2,"post_type":1,"posts_count":5,"updated_at":"2025-10-17T04:12:47.941Z","reply_count":1,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":3,"readers_count":2,"score":20.6,"yours":false,"topic_id":169215,"topic_slug":"replacing-attention-class-with-identical-subclass-creates-hallucinations","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/datasets/John6666/forum2/blob/main/attn_override_issue_1.md","internal":false,"reflection":false,"title":"attn_override_issue_1.md · John6666/forum2 at main","clicks":2}],"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/replacing-attention-class-with-identical-subclass-creates-hallucinations/169215/2","reactions":[{"id":"heart","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":243819,"name":"Alexander Jephtha","username":"AlexJephtha","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/a/d9b06d/{size}.png","created_at":"2025-10-20T03:52:17.985Z","cooked":"Thanks for your help!
","post_number":3,"post_type":1,"posts_count":5,"updated_at":"2025-10-20T03:52:17.985Z","reply_count":0,"reply_to_post_number":2,"quote_count":0,"incoming_link_count":0,"reads":2,"readers_count":1,"score":15.4,"yours":false,"topic_id":169215,"topic_slug":"replacing-attention-class-with-identical-subclass-creates-hallucinations","display_username":"Alexander Jephtha","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":52272,"username":"John6666","name":"John Smith","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":30474,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/replacing-attention-class-with-identical-subclass-creates-hallucinations/169215/3","reactions":[{"id":"+1","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":243821,"name":"Alexander Jephtha","username":"AlexJephtha","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/a/d9b06d/{size}.png","created_at":"2025-10-20T03:57:16.952Z","cooked":"SOLUTION: With SDPA attention, passing in an attention_mask with value not equal to none overrides the causal attention mask! You need to fill the attention mask with -inf (or large negative number) in the upper right triangle. This is only really a problem when calculating the attention scores of the initial text input, since newly generated tokens don’t require any of the existing key tokens to be masked.
","post_number":4,"post_type":1,"posts_count":5,"updated_at":"2025-10-20T03:57:16.952Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":2,"readers_count":1,"score":15.4,"yours":false,"topic_id":169215,"topic_slug":"replacing-attention-class-with-identical-subclass-creates-hallucinations","display_username":"Alexander Jephtha","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":30474,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/replacing-attention-class-with-identical-subclass-creates-hallucinations/169215/4","reactions":[{"id":"+1","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":true,"topic_accepted_answer":true},{"id":243867,"name":"system","username":"system","avatar_template":"https://us1.discourse-cdn.com/hellohellohello/original/2X/d/de4155eb4aa4108ecb32a1389d7cc37ae69f88b7.png","created_at":"2025-10-20T15:57:45.831Z","cooked":"This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":5,"post_type":3,"posts_count":5,"updated_at":"2025-10-20T15:57:45.831Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":1,"readers_count":0,"score":0.2,"yours":false,"topic_id":169215,"topic_slug":"replacing-attention-class-with-identical-subclass-creates-hallucinations","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/replacing-attention-class-with-identical-subclass-creates-hallucinations/169215/5","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"I’m writing a custom versions of LlamaModels, and for one of those approaches I want to overwrite the attention mechanism of each layer. My code looks like this. Note that even when I define LlamaAttentionHybrid (a subclass of LlamaAttention) to be the exact same as LlamaAttention, I still get hallucination issues. This suggest I’m not correctly replacing the attention mechanism.
\nclass LlamaHybridForCausalLM(LlamaForCausalLM):\n def __init__(self, config: LlamaHybridConfig):\n super().__init__(config)\n if config.hybrid:\n for i, layer in enumerate(self.model.layers):\n # Need to also copy attention weights\n old_attn = layer.self_attn\n layer.self_attn = LlamaAttentionHybrid(config, i)\n layer.self_attn.load_state_dict(old_attn.state_dict())\n\nHowever, the model works completely fine when I write this code:
\nclass LlamaHybridForCausalLM(LlamaForCausalLM):\n def __init__(self, config: LlamaHybridConfig):\n super().__init__(config)\n if config.hybrid:\n for i, layer in enumerate(self.model.layers):\n # Need to also copy attention weights\n old_attn = layer.self_attn\n layer.self_attn = LlamaAttention(config, i)\n layer.self_attn.load_state_dict(old_attn.state_dict())\n\nWhy would this happen even when I don’t make any changes in the subclass? Note that the forward function here is defined exactly the same as in the source code.
\nclass LlamaAttentionHybrid(LlamaAttention):\n def __init__(self, config: LlamaHybridConfig, layer_idx: int):\n super().__init__(config, layer_idx)\n\n def forward(\n self,\n hidden_states: torch.Tensor,\n position_embeddings: tuple[torch.Tensor, torch.Tensor],\n attention_mask: Optional[torch.Tensor],\n past_key_values: Optional[Cache] = None,\n cache_position: Optional[torch.LongTensor] = None,\n **kwargs: Unpack[FlashAttentionKwargs],\n ) -> tuple[torch.Tensor, Optional[torch.Tensor], Optional[tuple[torch.Tensor]]]:\n\n input_shape = hidden_states.shape[:-1]\n hidden_shape = (*input_shape, -1, self.head_dim)\n\n query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2)\n key_states = self.k_proj(hidden_states).view(hidden_shape).transpose(1, 2)\n value_states = self.v_proj(hidden_states).view(hidden_shape).transpose(1, 2)\n\n cos, sin = position_embeddings\n query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)\n\n if past_key_values is not None:\n # sin and cos are specific to RoPE models; cache_position needed for the static cache\n cache_kwargs = {\"sin\": sin, \"cos\": cos, \"cache_position\": cache_position}\n key_states, value_states = past_key_values.update(key_states, value_states, self.layer_idx, cache_kwargs)\n\n attention_interface: Callable = eager_attention_forward\n if self.config._attn_implementation != \"eager\":\n attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]\n\n attn_output, attn_weights = attention_interface(\n self,\n query_states,\n key_states,\n value_states,\n attention_mask,\n dropout=0.0 if not self.training else self.attention_dropout,\n scaling=self.scaling,\n **kwargs,\n )\n\n attn_output = attn_output.reshape(*input_shape, -1).contiguous()\n attn_output = self.o_proj(attn_output)\n return attn_output, attn_weights\n\nThanks!
\nEDIT: I narrowed the issue down to the redefining of the forward function. For some reason when I add the forward function into the subclass even if it’s identical, the model hallucinates dramatically.
","solution":"SOLUTION: With SDPA attention, passing in an attention_mask with value not equal to none overrides the causal attention mask! You need to fill the attention mask with -inf (or large negative number) in the upper right triangle. This is only really a problem when calculating the attention scores of the initial text input, since newly generated tokens don’t require any of the existing key tokens to be masked.
","evaluation":{"extracted_final_answer":"SOLUTION: With SDPA attention, passing in an attention_mask with value not equal to none overrides the causal attention mask! You need to fill the attention mask with -inf (or large negative number) in the upper right triangle. This is only really a problem when calculating the attention scores of the initial text input, since newly generated tokens don’t require any of the existing key tokens to be masked.
","reasoning":"The extracted_final_answer is identical to the correct_answer provided. There are no differences in wording, punctuation, or meaning between the two. Therefore, the correct_answer is fully included in the extracted_final_answer without any discrepancies.","correct":"yes","confidence":100}} -{"discussion_title":"Cannot load Conll2003","discussion_url":"https://discuss.huggingface.co/t/cannot-load-conll2003/169142","discussion_topic_id":169142,"discussion_category":10,"discussion_created_at":"2025-10-14T12:17:33.072000Z","thread":[{"id":243574,"name":"Radek Štulc","username":"stulcrad","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/s/4bbf92/{size}.png","created_at":"2025-10-14T12:17:33.129Z","cooked":"I am trying to load conll2003 dataset the basic way I learned like this
\nfrom datasets import load_dataset\ndataset = load_dataset(\"conll2003\")\n\nbut I am running into this error
\n---------------------------------------------------------------------------\nRuntimeError Traceback (most recent call last)\nCell In[15], line 3\n 1 from datasets import load_dataset\n----> 3 dataset = load_dataset(\"conll2003\")\n\nFile ~/.local/lib/python3.12/site-packages/datasets/load.py:1397, in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, keep_in_memory, save_infos, revision, token, streaming, num_proc, storage_options, **config_kwargs)\n 1392 verification_mode = VerificationMode(\n 1393 (verification_mode or VerificationMode.BASIC_CHECKS) if not save_infos else VerificationMode.ALL_CHECKS\n 1394 )\n 1396 # Create a dataset builder\n-> 1397 builder_instance = load_dataset_builder(\n 1398 path=path,\n 1399 name=name,\n 1400 data_dir=data_dir,\n 1401 data_files=data_files,\n 1402 cache_dir=cache_dir,\n 1403 features=features,\n 1404 download_config=download_config,\n 1405 download_mode=download_mode,\n 1406 revision=revision,\n 1407 token=token,\n 1408 storage_options=storage_options,\n 1409 **config_kwargs,\n 1410 )\n 1412 # Return iterable dataset in case of streaming\n 1413 if streaming:\n\nFile ~/.local/lib/python3.12/site-packages/datasets/load.py:1137, in load_dataset_builder(path, name, data_dir, data_files, cache_dir, features, download_config, download_mode, revision, token, storage_options, **config_kwargs)\n 1135 if features is not None:\n 1136 features = _fix_for_backward_compatible_features(features)\n-> 1137 dataset_module = dataset_module_factory(\n 1138 path,\n 1139 revision=revision,\n 1140 download_config=download_config,\n 1141 download_mode=download_mode,\n 1142 data_dir=data_dir,\n 1143 data_files=data_files,\n 1144 cache_dir=cache_dir,\n 1145 )\n 1146 # Get dataset builder class\n 1147 builder_kwargs = dataset_module.builder_kwargs\n\nFile ~/.local/lib/python3.12/site-packages/datasets/load.py:1036, in dataset_module_factory(path, revision, download_config, download_mode, data_dir, data_files, cache_dir, **download_kwargs)\n 1031 if isinstance(e1, FileNotFoundError):\n 1032 raise FileNotFoundError(\n 1033 f\"Couldn't find any data file at {relative_to_absolute_path(path)}. \"\n 1034 f\"Couldn't find '{path}' on the Hugging Face Hub either: {type(e1).__name__}: {e1}\"\n 1035 ) from None\n-> 1036 raise e1 from None\n 1037 else:\n 1038 raise FileNotFoundError(f\"Couldn't find any data file at {relative_to_absolute_path(path)}.\")\n\nFile ~/.local/lib/python3.12/site-packages/datasets/load.py:994, in dataset_module_factory(path, revision, download_config, download_mode, data_dir, data_files, cache_dir, **download_kwargs)\n 986 try:\n 987 api.hf_hub_download(\n 988 repo_id=path,\n 989 filename=filename,\n (...)\n 992 proxies=download_config.proxies,\n 993 )\n--> 994 raise RuntimeError(f\"Dataset scripts are no longer supported, but found {filename}\")\n 995 except EntryNotFoundError:\n 996 # Use the infos from the parquet export except in some cases:\n 997 if data_dir or data_files or (revision and revision != \"main\"):\n\nRuntimeError: Dataset scripts are no longer supported, but found conll2003.py\n\nCould someone tell me what is wrong?
","post_number":1,"post_type":1,"posts_count":4,"updated_at":"2025-10-14T12:17:33.129Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":43,"reads":8,"readers_count":7,"score":121.4,"yours":false,"topic_id":169142,"topic_slug":"cannot-load-conll2003","display_username":"Radek Štulc","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":41660,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/cannot-load-conll2003/169142/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":243575,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-10-14T12:28:06.176Z","cooked":"Try:
\nfrom datasets import load_dataset\ndataset = load_dataset(\"lhoestq/conll2003\")\n\nThis is because support for trust_remote_code=True was removed in datasets library version 4.0.0 and later. You can work around this by using datasets that don’t rely on builder scripts (like the one shown above) or by downgrading the datasets library to version 3.6.0 or earlier.
That works, thank you.
\nThat’s interesting, so I assume the support for loading scripts has also been removed, so if I want to upload a custom dataset, I will need to manually convert it into DatasetDict and push it using this class.
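For reference, a minimal sketch of that script-free workflow (the repo id and columns here are hypothetical): build the splits in memory, wrap them in a DatasetDict, and push them to the Hub, where they are stored as Parquet without any loading script.
\nfrom datasets import Dataset, DatasetDict\n\n# Toy splits; in practice, parse your raw files into dicts or DataFrames first\ntrain_ds = Dataset.from_dict({\"tokens\": [[\"Hello\", \"world\"]], \"ner_tags\": [[0, 0]]})\ntest_ds = Dataset.from_dict({\"tokens\": [[\"Bye\"]], \"ner_tags\": [[0]]})\n\ndataset = DatasetDict({\"train\": train_ds, \"test\": test_ds})\ndataset.push_to_hub(\"your-username/my-dataset\")  # needs a write token (huggingface-cli login)\n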
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":4,"post_type":3,"posts_count":4,"updated_at":"2025-10-15T00:36:12.117Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":1,"reads":5,"readers_count":4,"score":5.8,"yours":false,"topic_id":169142,"topic_slug":"cannot-load-conll2003","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/cannot-load-conll2003/169142/4","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"I am trying to load conll2003 dataset the basic way I learned like this
\nfrom datasets import load_dataset\ndataset = load_dataset(\"conll2003\")\n\nbut I am running into this error
\n---------------------------------------------------------------------------\nRuntimeError Traceback (most recent call last)\nCell In[15], line 3\n 1 from datasets import load_dataset\n----> 3 dataset = load_dataset(\"conll2003\")\n\nFile ~/.local/lib/python3.12/site-packages/datasets/load.py:1397, in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, verification_mode, keep_in_memory, save_infos, revision, token, streaming, num_proc, storage_options, **config_kwargs)\n 1392 verification_mode = VerificationMode(\n 1393 (verification_mode or VerificationMode.BASIC_CHECKS) if not save_infos else VerificationMode.ALL_CHECKS\n 1394 )\n 1396 # Create a dataset builder\n-> 1397 builder_instance = load_dataset_builder(\n 1398 path=path,\n 1399 name=name,\n 1400 data_dir=data_dir,\n 1401 data_files=data_files,\n 1402 cache_dir=cache_dir,\n 1403 features=features,\n 1404 download_config=download_config,\n 1405 download_mode=download_mode,\n 1406 revision=revision,\n 1407 token=token,\n 1408 storage_options=storage_options,\n 1409 **config_kwargs,\n 1410 )\n 1412 # Return iterable dataset in case of streaming\n 1413 if streaming:\n\nFile ~/.local/lib/python3.12/site-packages/datasets/load.py:1137, in load_dataset_builder(path, name, data_dir, data_files, cache_dir, features, download_config, download_mode, revision, token, storage_options, **config_kwargs)\n 1135 if features is not None:\n 1136 features = _fix_for_backward_compatible_features(features)\n-> 1137 dataset_module = dataset_module_factory(\n 1138 path,\n 1139 revision=revision,\n 1140 download_config=download_config,\n 1141 download_mode=download_mode,\n 1142 data_dir=data_dir,\n 1143 data_files=data_files,\n 1144 cache_dir=cache_dir,\n 1145 )\n 1146 # Get dataset builder class\n 1147 builder_kwargs = dataset_module.builder_kwargs\n\nFile ~/.local/lib/python3.12/site-packages/datasets/load.py:1036, in dataset_module_factory(path, revision, download_config, download_mode, data_dir, data_files, cache_dir, **download_kwargs)\n 1031 if isinstance(e1, FileNotFoundError):\n 1032 raise FileNotFoundError(\n 1033 f\"Couldn't find any data file at {relative_to_absolute_path(path)}. \"\n 1034 f\"Couldn't find '{path}' on the Hugging Face Hub either: {type(e1).__name__}: {e1}\"\n 1035 ) from None\n-> 1036 raise e1 from None\n 1037 else:\n 1038 raise FileNotFoundError(f\"Couldn't find any data file at {relative_to_absolute_path(path)}.\")\n\nFile ~/.local/lib/python3.12/site-packages/datasets/load.py:994, in dataset_module_factory(path, revision, download_config, download_mode, data_dir, data_files, cache_dir, **download_kwargs)\n 986 try:\n 987 api.hf_hub_download(\n 988 repo_id=path,\n 989 filename=filename,\n (...)\n 992 proxies=download_config.proxies,\n 993 )\n--> 994 raise RuntimeError(f\"Dataset scripts are no longer supported, but found {filename}\")\n 995 except EntryNotFoundError:\n 996 # Use the infos from the parquet export except in some cases:\n 997 if data_dir or data_files or (revision and revision != \"main\"):\n\nRuntimeError: Dataset scripts are no longer supported, but found conll2003.py\n\nCould someone tell me what is wrong?
","solution":"Try:
\nfrom datasets import load_dataset\ndataset = load_dataset(\"lhoestq/conll2003\")\n\nThis is because support for trust_remote_code=True was removed in datasets library version 4.0.0 and later. You can work around this by using datasets that don’t rely on builder scripts (like the one shown above) or by downgrading the datasets library to version 3.6.0 or earlier.
Try:
\nfrom datasets import load_dataset\ndataset = load_dataset(\"lhoestq/conll2003\")\n\nThis is because support for trust_remote_code=True was removed in datasets library version 4.0.0 and later. You can work around this by using datasets that don’t rely on builder scripts (like the one shown above) or by downgrading the datasets library to version 3.6.0 or earlier.
Dear Hugging Face Team,
\nI’m using runpod with the templates “ComfyUI - AI-Dock”.
\nIn JupyterLab I want to download a login protected model, the one from black-forest-labs/FLUX.1-Krea-dev.
\nwget used to work like that; I can download the model from my browser after logging in on my local PC.
\nwget --header="Authorization: Bearer TOKEN" "https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/flux1-dev.safetensors"
But I get
\n401 Unauthorized\nUsername/Password Authentication Failed.\n\nIf I add --debug at the end, I get:
\nDEBUG output created by Wget 1.21.2 on linux-gnu.\n\nReading HSTS entries from /home/user/.wget-hsts\nURI encoding = ‘UTF-8’\nConverted file name 'flux1-dev.safetensors' (UTF-8) -> 'flux1-dev.safetensors' (UTF-8)\n--2025-10-08 09:03:02-- https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/flux1-dev.safetensors\nResolving huggingface.co (huggingface.co)... 52.84.217.103, 52.84.217.69, 52.84.217.102, ...\nCaching huggingface.co => 52.84.217.103 52.84.217.69 52.84.217.102 52.84.217.88 2600:9000:203d:6200:17:b174:6d00:93a1 2600:9000:203d:e000:17:b174:6d00:93a1 2600:9000:203d:8800:17:b174:6d00:93a1 2600:9000:203d:e800:17:b174:6d00:93a1 2600:9000:203d:9600:17:b174:6d00:93a1 2600:9000:203d:2400:17:b174:6d00:93a1 2600:9000:203d:ee00:17:b174:6d00:93a1 2600:9000:203d:6400:17:b174:6d00:93a1\nConnecting to huggingface.co (huggingface.co)|52.84.217.103|:443... connected.\nCreated socket 3.\nReleasing 0x000061bc69c86ec0 (new refcount 1).\nInitiating SSL handshake.\nHandshake successful; connected socket 3 to SSL handle 0x000061bc69c888a0\ncertificate:\n subject: CN=huggingface.co\n issuer: CN=Amazon RSA 2048 M02,O=Amazon,C=US\nX509 certificate successfully verified and matches host huggingface.co\n\n---request begin---\nGET /black-forest-labs/FLUX.1-dev/blob/main/flux1-dev.safetensors HTTP/1.1\nHost: huggingface.co\nUser-Agent: Wget/1.21.2\nAccept: */*\nAccept-Encoding: identity\nConnection: Keep-Alive\nAuthorization: Bearer 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\n\n---request end---\nHTTP request sent, awaiting response... \n---response begin---\nHTTP/1.1 401 Unauthorized\nContent-Type: text/html; charset=utf-8\nContent-Length: 22349\nConnection: keep-alive\nDate: Wed, 08 Oct 2025 09:03:02 GMT\nETag: W/\"574d-1eC4sA5Q/PbQ5YhsvC0L0NiNhEc\"\nX-Powered-By: huggingface-moon\nRateLimit: \"pages\";r=999;t=66\nRateLimit-Policy: \"fixed window\";\"pages\";q=1000;w=300\ncross-origin-opener-policy: same-origin\nReferrer-Policy: strict-origin-when-cross-origin\nX-Request-Id: Root=1-68e628c6-753c6a394bc274c7764e5a2f\nX-Error-Message: Invalid credentials in Authorization header\nx-frame-options: SAMEORIGIN\nX-Cache: Error from cloudfront\nVia: 1.1 fdd255cb127a7759980ee879db5de580.cloudfront.net (CloudFront)\nX-Amz-Cf-Pop: DFW59-P5\nX-Amz-Cf-Id: tZ4CtuVneK0RyHpWtL5_DbEc3eq4qqEMlGoXvt8V9CLxqmo2CX4puw==\n\n---response end---\n401 Unauthorized\nRegistered socket 3 for persistent reuse.\nDisabling further reuse of socket 3.\nClosed 3/SSL 0x000061bc69c888a0\n\nUsername/Password Authentication Failed.\n\nThank you for looking into that.
","post_number":1,"post_type":1,"posts_count":8,"updated_at":"2025-10-08T09:03:54.532Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":15,"reads":6,"readers_count":5,"score":61.2,"yours":false,"topic_id":169024,"topic_slug":"wget-with-token-not-working","display_username":"Lelièvre","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":105173,"hidden":false,"trust_level":0,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/wget-with-token-not-working/169024/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":243288,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-10-08T10:22:28.337Z","cooked":"How about resolve instead of blob for now?
\nwget --header=\"Authorization: Bearer TOKEN\" \"https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors\"
resolve solves the problem!
\nThank you so much for your help.
\nWhy did I get blob instead of resolve in the URL?
","post_number":3,"post_type":1,"posts_count":8,"updated_at":"2025-10-08T11:27:51.251Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":5,"readers_count":4,"score":16.0,"yours":false,"topic_id":169024,"topic_slug":"wget-with-token-not-working","display_username":"Lelièvre","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":105173,"hidden":false,"trust_level":0,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/wget-with-token-not-working/169024/3","reactions":[{"id":"confetti_ball","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":243299,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-10-08T11:38:28.728Z","cooked":"blob is for web UI file-viewer URL. resolve is for file itself. Probably got mixed in from copy-pasting.
Need to check that!
\nThank you again.
","post_number":5,"post_type":1,"posts_count":8,"updated_at":"2025-10-08T11:58:23.708Z","reply_count":0,"reply_to_post_number":4,"quote_count":0,"incoming_link_count":0,"reads":5,"readers_count":4,"score":16.0,"yours":false,"topic_id":169024,"topic_slug":"wget-with-token-not-working","display_username":"Lelièvre","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":52272,"username":"John6666","name":"John Smith","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":105173,"hidden":false,"trust_level":0,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/wget-with-token-not-working/169024/5","reactions":[{"id":"+1","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":243326,"name":"Vu Hung Nguyen","username":"vuhung","avatar_template":"/user_avatar/discuss.huggingface.co/vuhung/{size}/53965_2.png","created_at":"2025-10-08T22:23:11.995Z","cooked":"In this context, is curl better than wget?
","post_number":6,"post_type":1,"posts_count":8,"updated_at":"2025-10-08T22:23:11.995Z","reply_count":1,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":3,"readers_count":2,"score":20.6,"yours":false,"topic_id":169024,"topic_slug":"wget-with-token-not-working","display_username":"Vu Hung Nguyen","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":103980,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/wget-with-token-not-working/169024/6","reactions":[{"id":"+1","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":243327,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-10-08T22:29:30.794Z","cooked":"Yeah. Well, I think most people use curl. The HF sample also uses curl. Even in that case, though, you should probably use URLs with resolve in the default behavior.
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":8,"post_type":3,"posts_count":8,"updated_at":"2025-10-09T10:29:31.103Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":2,"readers_count":1,"score":0.4,"yours":false,"topic_id":169024,"topic_slug":"wget-with-token-not-working","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/wget-with-token-not-working/169024/8","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"Dear Hughingface Team,
\nI’m using runpod with the templates “ComfyUI - AI-Dock”.
\nIn JupyterLab I want to download a login protected model, the one from black-forest-labs/FLUX.1-Krea-dev.
\nwget used to work like that; I can download the model from my browser after logging in on my local PC.
\nwget --header="Authorization: Bearer TOKEN" "https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/flux1-dev.safetensors"
But I get
\n401 Unauthorized\nUsername/Password Authentication Failed.\n\nIf I add --debug at the end, I get:
\nDEBUG output created by Wget 1.21.2 on linux-gnu.\n\nReading HSTS entries from /home/user/.wget-hsts\nURI encoding = ‘UTF-8’\nConverted file name 'flux1-dev.safetensors' (UTF-8) -> 'flux1-dev.safetensors' (UTF-8)\n--2025-10-08 09:03:02-- https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/flux1-dev.safetensors\nResolving huggingface.co (huggingface.co)... 52.84.217.103, 52.84.217.69, 52.84.217.102, ...\nCaching huggingface.co => 52.84.217.103 52.84.217.69 52.84.217.102 52.84.217.88 2600:9000:203d:6200:17:b174:6d00:93a1 2600:9000:203d:e000:17:b174:6d00:93a1 2600:9000:203d:8800:17:b174:6d00:93a1 2600:9000:203d:e800:17:b174:6d00:93a1 2600:9000:203d:9600:17:b174:6d00:93a1 2600:9000:203d:2400:17:b174:6d00:93a1 2600:9000:203d:ee00:17:b174:6d00:93a1 2600:9000:203d:6400:17:b174:6d00:93a1\nConnecting to huggingface.co (huggingface.co)|52.84.217.103|:443... connected.\nCreated socket 3.\nReleasing 0x000061bc69c86ec0 (new refcount 1).\nInitiating SSL handshake.\nHandshake successful; connected socket 3 to SSL handle 0x000061bc69c888a0\ncertificate:\n subject: CN=huggingface.co\n issuer: CN=Amazon RSA 2048 M02,O=Amazon,C=US\nX509 certificate successfully verified and matches host huggingface.co\n\n---request begin---\nGET /black-forest-labs/FLUX.1-dev/blob/main/flux1-dev.safetensors HTTP/1.1\nHost: huggingface.co\nUser-Agent: Wget/1.21.2\nAccept: */*\nAccept-Encoding: identity\nConnection: Keep-Alive\nAuthorization: Bearer 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\n\n---request end---\nHTTP request sent, awaiting response... \n---response begin---\nHTTP/1.1 401 Unauthorized\nContent-Type: text/html; charset=utf-8\nContent-Length: 22349\nConnection: keep-alive\nDate: Wed, 08 Oct 2025 09:03:02 GMT\nETag: W/\"574d-1eC4sA5Q/PbQ5YhsvC0L0NiNhEc\"\nX-Powered-By: huggingface-moon\nRateLimit: \"pages\";r=999;t=66\nRateLimit-Policy: \"fixed window\";\"pages\";q=1000;w=300\ncross-origin-opener-policy: same-origin\nReferrer-Policy: strict-origin-when-cross-origin\nX-Request-Id: Root=1-68e628c6-753c6a394bc274c7764e5a2f\nX-Error-Message: Invalid credentials in Authorization header\nx-frame-options: SAMEORIGIN\nX-Cache: Error from cloudfront\nVia: 1.1 fdd255cb127a7759980ee879db5de580.cloudfront.net (CloudFront)\nX-Amz-Cf-Pop: DFW59-P5\nX-Amz-Cf-Id: tZ4CtuVneK0RyHpWtL5_DbEc3eq4qqEMlGoXvt8V9CLxqmo2CX4puw==\n\n---response end---\n401 Unauthorized\nRegistered socket 3 for persistent reuse.\nDisabling further reuse of socket 3.\nClosed 3/SSL 0x000061bc69c888a0\n\nUsername/Password Authentication Failed.\n\nThank you for looking into that.
","solution":"How about resolve instead of blob for now?
\nwget --header=\"Authorization: Bearer TOKEN\" \"https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors\"
resolve instead of blob for now?\nwget --header=\"Authorization: Bearer TOKEN\" \"https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/flux1-dev.safetensors\"","reasoning":"The extracted final answer matches the correct answer exactly, with no differences in wording or meaning. Both answers suggest using 'resolve' instead of 'blob' and provide the same wget command with the correct URL format.","correct":"yes","confidence":100}}
-{"discussion_title":"NonMatchingSplitsSizesError","discussion_url":"https://discuss.huggingface.co/t/nonmatchingsplitssizeserror/30033","discussion_topic_id":30033,"discussion_category":10,"discussion_created_at":"2023-01-19T20:12:35.014000Z","thread":[{"id":55242,"name":"Sundeep","username":"sl02","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/s/ba9def/{size}.png","created_at":"2023-01-19T20:12:35.084Z","cooked":"I created a custom script which splits the raw file into train/test split on the fly. The script works with the default arguments. However, when I change the test_size ratio which I pass via load_dataset(), it fails with the following error
Traceback (most recent call last): \n File \"<stdin>\", line 1, in <module>\n File \"/Users/home/.local/share/virtualenvs/1717-yQ3Y_lVD/lib/python3.8/site-packages/datasets/load.py\", line 1757, in load_dataset\n builder_instance.download_and_prepare(\n File \"/Users/home/.local/share/virtualenvs/1717-yQ3Y_lVD/lib/python3.8/site-packages/datasets/builder.py\", line 860, in download_and_prepare\n self._download_and_prepare(\n File \"/Users/home/.local/share/virtualenvs/1717-yQ3Y_lVD/lib/python3.8/site-packages/datasets/builder.py\", line 1611, in _download_and_prepare\n super()._download_and_prepare(\n File \"/Users/home/.local/share/virtualenvs/1717-yQ3Y_lVD/lib/python3.8/site-packages/datasets/builder.py\", line 971, in _download_and_prepare\n verify_splits(self.info.splits, split_dict)\n File \"/Users/home/.local/share/virtualenvs/1717-yQ3Y_lVD/lib/python3.8/site-packages/datasets/utils/info_utils.py\", line 74, in verify_splits\n raise NonMatchingSplitsSizesError(str(bad_splits))\ndatasets.utils.info_utils.NonMatchingSplitsSizesError\n\nIt fails the integrity check as expected. The Build and load doesn’t show how to update the checks. I thought, using the download_mode=force_redownload argument in load_dataset() would fix it but it throws the same error as shown above. How do I resolve this?
Hi @sl02 ! Is test_size a custom builder parameter you define in your loading script?
You can set the ignore_verifications=True param in load_dataset to skip split-size verification.
Also note that the Dataset object has a .train_test_split() method, which might be useful for your case.
Hi @polinaeterna
\nYes, test_size is a parameter, and with the ignore_verifications=True parameter it works. But I would like to know how you update this information for other datasets when the data changes at the source; the instructions in the document I linked in the above thread don't explain this clearly.
I am doing a group shuffle split because I have to ensure no overlap in the id column in the respective splits.
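For later readers, a minimal sketch (the toy data is mine) of such a group-aware split, using sklearn's GroupShuffleSplit over a Dataset so that no id lands in both splits:
\nimport numpy as np\nfrom datasets import Dataset\nfrom sklearn.model_selection import GroupShuffleSplit\n\nds = Dataset.from_dict({\"id\": [1, 1, 2, 2, 3, 3], \"text\": [\"a\", \"b\", \"c\", \"d\", \"e\", \"f\"]})\nsplitter = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=42)\n# Split on row indices while grouping by the id column\ntrain_idx, test_idx = next(splitter.split(np.zeros(len(ds)), groups=ds[\"id\"]))\ntrain_ds, test_ds = ds.select(train_idx), ds.select(test_idx)\n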
","post_number":3,"post_type":1,"posts_count":7,"updated_at":"2023-01-27T13:14:44.170Z","reply_count":1,"reply_to_post_number":2,"quote_count":1,"incoming_link_count":85,"reads":148,"readers_count":147,"score":459.6,"yours":false,"topic_id":30033,"topic_slug":"nonmatchingsplitssizeserror","display_username":"Sundeep","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":8429,"username":"polinaeterna","name":"Polina Kazakova","avatar_template":"/user_avatar/discuss.huggingface.co/polinaeterna/{size}/19055_2.png"},"bookmarked":false,"actions_summary":[],"moderator":false,"admin":false,"staff":false,"user_id":12315,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/nonmatchingsplitssizeserror/30033/3","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":56173,"name":"Polina Kazakova","username":"polinaeterna","avatar_template":"/user_avatar/discuss.huggingface.co/polinaeterna/{size}/19055_2.png","created_at":"2023-01-27T17:56:14.846Z","cooked":"@sl02
\nWhen you load your dataset locally for the first time, it creates dataset_info.json file under its cache folder, the file contains all these splits info (like num_examples, num_bytes, etc.). If you regenerate the dataset while the script is unchanged (for example, run load_dataset with download_mode=\"reuse_cache_if_exists\"), it performs verifications against this file.
We used to have dataset_info.json files in datasets repositories on the Hub (so, not just in a local cache folder) to verify splits info on the first download but now it’s deprecated, we use README.md instead for storing these numbers.
\nTo (re)compute these numbers automatically and dump them to a README.md file, one should run datasets-cli test your_dataset --save_info. And as it’s done manually, it depends on datasets’ authors if they update and push this info or not as it’s not required.
\nHope it’s more or less clear, feel free to ask any questions if it’s not 
@polinaeterna
\nThanks for clearing that up!
Note that you can also get this error when you download an updated dataset without using the cache. E.g.,
\ndataset = load_dataset(url, download_mode="force_redownload")
If the underlying dataset has been updated, there can be a mismatch between the number of records read and the counts recorded in the cache. You can read about the cache here: Cache management.
","post_number":6,"post_type":1,"posts_count":7,"updated_at":"2023-09-13T19:07:17.850Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":26,"reads":85,"readers_count":84,"score":147.0,"yours":false,"topic_id":30033,"topic_slug":"nonmatchingsplitssizeserror","display_username":"Adam Hjerpe","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/docs/datasets/cache","internal":false,"reflection":false,"title":"Cache management","clicks":123}],"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":false,"admin":false,"staff":false,"user_id":27951,"hidden":false,"trust_level":0,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/nonmatchingsplitssizeserror/30033/6","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":243312,"name":"Albert Zeyer","username":"albertzeyer","avatar_template":"/user_avatar/discuss.huggingface.co/albertzeyer/{size}/46906_2.png","created_at":"2025-10-08T16:51:31.810Z","cooked":"\nThis does not work anymore. I think now you have to use verification_mode=VerificationMode.NO_CHECKS.
I created a custom script which splits the raw file into train/test split on the fly. The script works with the default arguments. However, when I change the test_size ratio which I pass via load_dataset(), it fails with the following error
Traceback (most recent call last): \n File \"<stdin>\", line 1, in <module>\n File \"/Users/home/.local/share/virtualenvs/1717-yQ3Y_lVD/lib/python3.8/site-packages/datasets/load.py\", line 1757, in load_dataset\n builder_instance.download_and_prepare(\n File \"/Users/home/.local/share/virtualenvs/1717-yQ3Y_lVD/lib/python3.8/site-packages/datasets/builder.py\", line 860, in download_and_prepare\n self._download_and_prepare(\n File \"/Users/home/.local/share/virtualenvs/1717-yQ3Y_lVD/lib/python3.8/site-packages/datasets/builder.py\", line 1611, in _download_and_prepare\n super()._download_and_prepare(\n File \"/Users/home/.local/share/virtualenvs/1717-yQ3Y_lVD/lib/python3.8/site-packages/datasets/builder.py\", line 971, in _download_and_prepare\n verify_splits(self.info.splits, split_dict)\n File \"/Users/home/.local/share/virtualenvs/1717-yQ3Y_lVD/lib/python3.8/site-packages/datasets/utils/info_utils.py\", line 74, in verify_splits\n raise NonMatchingSplitsSizesError(str(bad_splits))\ndatasets.utils.info_utils.NonMatchingSplitsSizesError\n\nIt fails the integrity check as expected. The Build and load doesn’t show how to update the checks. I thought, using the download_mode=force_redownload argument in load_dataset() would fix it but it throws the same error as shown above. How do I resolve this?
@sl02
\nWhen you load your dataset locally for the first time, it creates dataset_info.json file under its cache folder, the file contains all these splits info (like num_examples, num_bytes, etc.). If you regenerate the dataset while the script is unchanged (for example, run load_dataset with download_mode=\"reuse_cache_if_exists\"), it performs verifications against this file.
We used to have dataset_info.json files in datasets repositories on the Hub (so, not just in a local cache folder) to verify splits info on the first download but now it’s deprecated, we use README.md instead for storing these numbers.
\nTo (re)compute these numbers automatically and dump them to a README.md file, one should run datasets-cli test your_dataset --save_info. And as it’s done manually, it depends on datasets’ authors if they update and push this info or not as it’s not required.
\nHope it’s more or less clear, feel free to ask any questions if it’s not 
@sl02
When you load your dataset locally for the first time, it creates dataset_info.json file under its cache folder, the file contains all these splits info (like num_examples, num_bytes, etc.). If you regenerate the dataset while the script is unchanged (for example, run load_dataset with download_mode=\"reuse_cache_if_exists\"), it performs verifications against this file.
We used to have dataset_info.json files in datasets repositories on the Hub (so, not just in a local cache folder) to verify splits info on the first download but now it’s deprecated, we use README.md instead for storing these numbers.
To (re)compute these numbers automatically and dump them to a README.md file, one should run datasets-cli test your_dataset --save_info. And as it’s done manually, it depends on datasets’ authors if they update and push this info or not as it’s not required.
Hope it’s more or less clear, feel free to ask any questions if it’s not 
When I try to execute the following lines of code:
\nquantization_config = BitsAndBytesConfig(load_in_8bit=True)
\ntokenizer = AutoTokenizer.from_pretrained(model_id)
\nmodel = AutoModelForCausalLM.from_pretrained(
\nmodel_id,
\ndevice_map="auto",
\nquantization_config=quantization_config
\n)
The tokenizer raises a 404 Client Error: Not Found, specifically:
\n“Entry Not Found for URL: https://huggingface.co/api/models/Qwen/Qwen2.5-7B-Instruct/tree/main/additional_chat_templates?recursive=false&expand=false.
\nadditional_chat_templates does not exist on ‘main’.”
The libraries I am using are:
\ntokenizers == 0.21.2
transformers == 4.53.3
bitsandbytes == 0.48.1
Is there anything I can do to fix this issue? Could it be related to a version mismatch? Any advice would be appreciated.
","post_number":1,"post_type":1,"posts_count":3,"updated_at":"2025-10-07T08:40:03.383Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":595,"reads":12,"readers_count":11,"score":2142.0,"yours":false,"topic_id":168993,"topic_slug":"error-404-when-downloading-the-tokenizer","display_username":"Stefano","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/api/models/Qwen/Qwen2.5-7B-Instruct/tree/main/additional_chat_templates?recursive=false&expand=false","internal":false,"reflection":false,"clicks":1}],"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":105159,"hidden":false,"trust_level":0,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/error-404-when-downloading-the-tokenizer/168993/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":243209,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-10-07T09:34:58.688Z","cooked":"Seems a resolved bug of Transformers. Try upgrade pip install -U transformers
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":3,"post_type":3,"posts_count":3,"updated_at":"2025-10-07T21:35:22.053Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":6,"reads":10,"readers_count":9,"score":16.6,"yours":false,"topic_id":168993,"topic_slug":"error-404-when-downloading-the-tokenizer","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/error-404-when-downloading-the-tokenizer/168993/3","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"When I try to execute the following lines of code:
\nquantization_config = BitsAndBytesConfig(load_in_8bit=True)
\ntokenizer = AutoTokenizer.from_pretrained(model_id)
\nmodel = AutoModelForCausalLM.from_pretrained(
\nmodel_id,
\ndevice_map="auto",
\nquantization_config=quantization_config
\n)
The tokenizer raises a 404 Client Error: Not Found, specifically:
\n“Entry Not Found for URL: https://huggingface.co/api/models/Qwen/Qwen2.5-7B-Instruct/tree/main/additional_chat_templates?recursive=false&expand=false.
\nadditional_chat_templates does not exist on ‘main’.”
The libraries I am using are:
\ntokenizers == 0.21.2
transformers == 4.53.3
bitsandbytes == 0.48.1
Is there anything I can do to fix this issue? Could it be related to a version mismatch? Any advice would be appreciated.
","solution":"Seems a resolved bug of Transformers. Try upgrade pip install -U transformers
Seems to be a resolved bug in Transformers. Try upgrading: pip install -U transformers
It says
\nExit code: 1. Reason: => Database and media directory: /label-studio/data\n=> Static URL is set to: /static/\nTraceback (most recent call last):\n File \"/label-studio/.venv/bin/label-studio\", line 3, in <module>\n from label_studio.server import main\n File \"/label-studio/label_studio/server.py\", line 23, in <module>\n from label_studio.core.argparser import parse_input_args\n File \"/label-studio/label_studio/core/argparser.py\", line 5, in <module>\n from .settings.base import EXPORT_DIR\n File \"/label-studio/label_studio/core/settings/base.py\", line 470, in <module>\n os.makedirs(MEDIA_ROOT, exist_ok=True)\n File \"<frozen os>\", line 225, in makedirs\nPermissionError: [Errno 13] Permission denied: '/label-studio/data/media'\n\nWhen starting up
","post_number":1,"post_type":1,"posts_count":5,"updated_at":"2025-09-28T01:05:44.089Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":16,"reads":5,"readers_count":4,"score":76.0,"yours":false,"topic_id":168735,"topic_slug":"permission-error-when-starting-a-lablestudio-space","display_username":"Lin Chen you","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":2,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":104613,"hidden":false,"trust_level":0,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/permission-error-when-starting-a-lablestudio-space/168735/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":242703,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-09-28T03:39:16.858Z","cooked":"The cause is attempting to write to a directory that is not writable due to permissions. Setting the following environment variable would resolve this.
\nLABEL_STUDIO_BASE_DATA_DIR=/tmp/label-studio
\nAny directory with write permissions will work.
That error is pretty straightforward — Label Studio is trying to create its media folder but doesn’t have permission.
Here’s how to fix it:
\nCheck who owns the folder
\nls -ld /label-studio/data\n\n\nIf it’s owned by root, Label Studio (running as a different user) can’t write there.
Give yourself permission
\nsudo chown -R $USER:$USER /label-studio/data\n\n\nor if you’re running inside Docker, adjust ownership to the container user (often 1001 or label-studio).
Set writable permissions (if quick and dirty):
\nsudo chmod -R 777 /label-studio/data\n\n\nThis is less safe, but fine for local experiments.
\nIf Dockerized:
\nMount a local volume that’s writable:
\ndocker run -it -p 8080:8080 \\\n -v $(pwd)/mydata:/label-studio/data \\\n heartexlabs/label-studio:latest\n\n\nReplace $(pwd)/mydata with a folder on your machine you own.
Thanks! It worked!
","post_number":4,"post_type":1,"posts_count":5,"updated_at":"2025-09-28T10:36:56.104Z","reply_count":0,"reply_to_post_number":2,"quote_count":0,"incoming_link_count":0,"reads":2,"readers_count":1,"score":15.4,"yours":false,"topic_id":168735,"topic_slug":"permission-error-when-starting-a-lablestudio-space","display_username":"Lin Chen you","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":52272,"username":"John6666","name":"John Smith","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":104613,"hidden":false,"trust_level":0,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/permission-error-when-starting-a-lablestudio-space/168735/4","reactions":[{"id":"confetti_ball","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":242730,"name":"system","username":"system","avatar_template":"https://us1.discourse-cdn.com/hellohellohello/original/2X/d/de4155eb4aa4108ecb32a1389d7cc37ae69f88b7.png","created_at":"2025-09-28T22:37:38.529Z","cooked":"This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":5,"post_type":3,"posts_count":5,"updated_at":"2025-09-28T22:37:38.529Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":9,"reads":1,"readers_count":0,"score":45.2,"yours":false,"topic_id":168735,"topic_slug":"permission-error-when-starting-a-lablestudio-space","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/permission-error-when-starting-a-lablestudio-space/168735/5","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"It says
\nExit code: 1. Reason: => Database and media directory: /label-studio/data\n=> Static URL is set to: /static/\nTraceback (most recent call last):\n File \"/label-studio/.venv/bin/label-studio\", line 3, in <module>\n from label_studio.server import main\n File \"/label-studio/label_studio/server.py\", line 23, in <module>\n from label_studio.core.argparser import parse_input_args\n File \"/label-studio/label_studio/core/argparser.py\", line 5, in <module>\n from .settings.base import EXPORT_DIR\n File \"/label-studio/label_studio/core/settings/base.py\", line 470, in <module>\n os.makedirs(MEDIA_ROOT, exist_ok=True)\n File \"<frozen os>\", line 225, in makedirs\nPermissionError: [Errno 13] Permission denied: '/label-studio/data/media'\n\nWhen starting up
","solution":"The cause is attempting to write to a directory that is not writable due to permissions. Setting the following environment variable would resolve this.
\nLABEL_STUDIO_BASE_DATA_DIR=/tmp/label-studio
\nAny directory with write permissions will work.
The cause is attempting to write to a directory that is not writable due to permissions. Setting the following environment variable would resolve this.
\nLABEL_STUDIO_BASE_DATA_DIR=/tmp/label-studio
\nAny directory with write permissions will work.
I am using custom metric and in my training arguments I have
\ngreater_is_better=True,\nload_best_model_at_end=True,\n\nBut as far as I can tell, the best model is not being saved. Here is a link to my Colab notebook:
\n\nAnd here are all the details just in case:
\nMy platform and system data:
\nplatform: Linux
\nrelease: 6.1.123+
\nversion: #1 SMP PREEMPT_DYNAMIC Sun Mar 30 16:01:29 UTC 2025
\nmachine: x86_64
\ntorch: 2.8.0+cu126
\ntransformers:4.55.4
\ncompiler: 3.12.11 (main, Jun 4 2025, 08:56:18) [GCC 11.4.0]
\nGPU/TPU: Tesla T4
\nCUDA compiler:
\nnvcc: NVIDIA (R) Cuda compiler driver
\nCopyright (c) 2005-2024 NVIDIA Corporation
\nBuilt on Thu_Jun__6_02:18:23_PDT_2024
\nCuda compilation tools, release 12.5, V12.5.82
\nBuild cuda_12.5.r12.5/compiler.34385749_0
Here is my code:
\nfrom transformers import AutoModelForSequenceClassification, AutoTokenizer\nimport transformers\nimport sys\nimport torch\nimport pandas as pd, numpy as np\nfrom sklearn.preprocessing import LabelEncoder\n\nimport joblib\nimport os\nimport platform\nfrom sklearn.model_selection import train_test_split\nfrom datasets import Dataset\nfrom transformers import TrainingArguments, Trainer\n\nmodel_name = 'microsoft/deberta-v3-xsmall'\nmodel_name_path = 'deberta-v3-xsmall'\nDIR = '../MAP_models/'+model_name_path+'/tuned/'\nos.makedirs('../MAP_models', exist_ok = True)\nos.makedirs('../MAP_models/'+model_name_path, exist_ok = True)\nos.makedirs('../MAP_models/'+model_name_path+'/tuned', exist_ok=True)\nos.makedirs('../MAP_models/'+model_name_path+'/tuned/model', exist_ok=True)\n\n\nNUM_LABELS = 65\ntext = [f\"example {i}\" for i in range(300)]\nlabel = [i % NUM_LABELS for i in range(300)]\ntrain = pd.DataFrame({'text': text, 'label': label})\n\ntrain_df, val_df = train_test_split(train, test_size=0.2, random_state=42)\n\n# Convert to Hugging Face Dataset\nCOLS = ['text','label']\ntrain_ds = Dataset.from_pandas(train_df[COLS])\nval_ds = Dataset.from_pandas(val_df[COLS])\n\n\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nMAX_LEN = 256\n \n# Tokenization function\ndef tokenize(batch):\n    return tokenizer(batch[\"text\"], padding=\"max_length\", truncation=True, max_length=256)\n \ntrain_ds = train_ds.map(tokenize, batched=True)\nval_ds = val_ds.map(tokenize, batched=True)\n \n# Set format for PyTorch\ncolumns = ['input_ids', 'attention_mask', 'label']\ntrain_ds.set_format(type='torch', columns=columns)\nval_ds.set_format(type='torch', columns=columns)\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n    model_name,\n    num_labels=NUM_LABELS, trust_remote_code=True\n    )\n\ndef compute_map3(eval_pred):\n    logits, labels = eval_pred\n    probs = torch.nn.functional.softmax(torch.tensor(logits), dim=-1).numpy()\n    \n    top3 = np.argsort(-probs, axis=1)[:, :3] # Top 3 predictions\n    match = (top3 == labels[:, None])\n\n    # Compute MAP@3 manually\n    map3 = 0\n    for i in range(len(labels)):\n        if match[i, 0]:\n            map3 += 1.0\n        elif match[i, 1]:\n            map3 += 1.0 / 2\n        elif match[i, 2]:\n            map3 += 1.0 / 3\n    return {\"map@3\": map3 / len(labels)}\n\nargs = TrainingArguments(\n    per_device_train_batch_size = 2, \n    per_device_eval_batch_size= 2,\n    gradient_accumulation_steps = 1,\n    warmup_steps = 10,\n    num_train_epochs = 1,\n    learning_rate = 5e-5,\n    fp16 = True,\n    bf16 = False,\n    logging_steps = 1,\n    optim = \"adamw_torch_fused\",\n    weight_decay = 0.01,\n    eval_strategy=\"steps\",\n    lr_scheduler_type = \"cosine_with_restarts\",\n    seed = 3407,\n    output_dir = DIR+\"output\",\n    logging_dir=DIR+\"logs\",\n    greater_is_better=True,\n    load_best_model_at_end=True,\n    save_steps=10,\n    eval_steps=10,\n    save_total_limit=3,\n    report_to = \"none\", \n    )\n\ntrainer = Trainer(\n    model = model,\n    processing_class = tokenizer,\n    eval_dataset = val_ds,\n    train_dataset = train_ds,\n    args = args,\n    compute_metrics = compute_map3,\n)\n\ntrainer_stats = trainer.train()\n\n\n\nIt produces the following output
\nStep\tTraining Loss\tValidation Loss\tMap@3
\n10\t4.235900\t4.182212\t0.025000
\n20\t4.245500\t4.176703\t0.038889
\n30\t4.166400\t4.171503\t0.030556
\n40\t4.163400\t4.174795\t0.025000
\n50\t4.187000\t4.174973\t0.025000
\n60\t4.240600\t4.176061\t0.038889
\n70\t4.123800\t4.177481\t0.036111
\n80\t4.130100\t4.177088\t0.033333
\n90\t4.140700\t4.177318\t0.022222
\n100\t4.180000\t4.178491\t0.022222
\n110\t4.112100\t4.178146\t0.025000
\n120\t4.229100\t4.178137\t0.025000
But when I run
\ntrainer.evaluate(val_ds)
{'eval_loss': 4.1822123527526855,
\n'eval_map@3': 0.025,
\n'eval_runtime': 0.9703,
\n'eval_samples_per_second': 61.836,
\n'eval_steps_per_second': 30.918,
\n'epoch': 1.0}
It seems like evaluation is done on the very first 10 steps, rather than on the best model.
\nWhat am I doing wrong?
","post_number":1,"post_type":1,"posts_count":4,"updated_at":"2025-09-18T14:02:06.119Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":4,"reads":9,"readers_count":8,"score":36.8,"yours":false,"topic_id":168528,"topic_slug":"the-best-model-is-not-being-saved","display_username":"Alex","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://colab.research.google.com/drive/1ehTt53xlGV0Byx6yelifdEZcSgFREncy?usp=drive_link","internal":false,"reflection":false,"title":"Google Colab","clicks":1}],"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":102016,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/the-best-model-is-not-being-saved/168528/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":242254,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-09-18T15:10:23.889Z","cooked":"Due to metric_for_best_model is missing, etc. ?
Thank you so much! What a blunder!
","post_number":3,"post_type":1,"posts_count":4,"updated_at":"2025-09-18T15:30:32.007Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":6,"readers_count":5,"score":16.2,"yours":false,"topic_id":168528,"topic_slug":"the-best-model-is-not-being-saved","display_username":"Alex","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":102016,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/the-best-model-is-not-being-saved/168528/3","reactions":[{"id":"confetti_ball","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":242284,"name":"system","username":"system","avatar_template":"https://us1.discourse-cdn.com/hellohellohello/original/2X/d/de4155eb4aa4108ecb32a1389d7cc37ae69f88b7.png","created_at":"2025-09-19T03:31:12.250Z","cooked":"This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":4,"post_type":3,"posts_count":4,"updated_at":"2025-09-19T03:31:12.250Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":2,"readers_count":1,"score":0.4,"yours":false,"topic_id":168528,"topic_slug":"the-best-model-is-not-being-saved","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/the-best-model-is-not-being-saved/168528/4","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"I am using custom metric and in my training arguments I have
\ngreater_is_better=True,\nload_best_model_at_end=True,\n\nBut as far as I can tell, the best model is not being saved. Here is a link to my Colab notebook:
\n\nAnd here are all the details just in case:
\nMy platform and system data:
\nplatform: Linux
\nrelease: 6.1.123+
\nversion: #1 SMP PREEMPT_DYNAMIC Sun Mar 30 16:01:29 UTC 2025
\nmachine: x86_64
\ntorch: 2.8.0+cu126
\ntransformers:4.55.4
\ncompiler: 3.12.11 (main, Jun 4 2025, 08:56:18) [GCC 11.4.0]
\nGPU/TPU: Tesla T4
\nCUDA compiler:
\nnvcc: NVIDIA (R) Cuda compiler driver
\nCopyright (c) 2005-2024 NVIDIA Corporation
\nBuilt on Thu_Jun__6_02:18:23_PDT_2024
\nCuda compilation tools, release 12.5, V12.5.82
\nBuild cuda_12.5.r12.5/compiler.34385749_0
Here is my code:
\nfrom transformers import AutoModelForSequenceClassification, AutoTokenizer\nimport transformers\nimport sys\nimport torch\nimport pandas as pd, numpy as np\nfrom sklearn.preprocessing import LabelEncoder\n\nimport joblib\nimport os\nimport platform\nfrom sklearn.model_selection import train_test_split\nfrom datasets import Dataset\nfrom transformers import TrainingArguments, Trainer\n\nmodel_name = 'microsoft/deberta-v3-xsmall'\nmodel_name_path = 'deberta-v3-xsmall'\nDIR = '../MAP_models/'+model_name_path+'/tuned/'\nos.makedirs('../MAP_models', exist_ok = True)\nos.makedirs('../MAP_models/'+model_name_path, exist_ok = True)\nos.makedirs('../MAP_models/'+model_name_path+'/tuned', exist_ok=True)\nos.makedirs('../MAP_models/'+model_name_path+'/tuned/model', exist_ok=True)\n\n\nNUM_LABELS = 65\ntext = [f\"example {i}\" for i in range(300)]\nlabel = [i % NUM_LABELS for i in range(300)]\ntrain = pd.DataFrame({'text': text, 'label': label})\n\ntrain_df, val_df = train_test_split(train, test_size=0.2, random_state=42)\n\n# Convert to Hugging Face Dataset\nCOLS = ['text','label']\ntrain_ds = Dataset.from_pandas(train_df[COLS])\nval_ds = Dataset.from_pandas(val_df[COLS])\n\n\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nMAX_LEN = 256\n \n# Tokenization function\ndef tokenize(batch):\n    return tokenizer(batch[\"text\"], padding=\"max_length\", truncation=True, max_length=256)\n \ntrain_ds = train_ds.map(tokenize, batched=True)\nval_ds = val_ds.map(tokenize, batched=True)\n \n# Set format for PyTorch\ncolumns = ['input_ids', 'attention_mask', 'label']\ntrain_ds.set_format(type='torch', columns=columns)\nval_ds.set_format(type='torch', columns=columns)\n\nmodel = AutoModelForSequenceClassification.from_pretrained(\n    model_name,\n    num_labels=NUM_LABELS, trust_remote_code=True\n    )\n\ndef compute_map3(eval_pred):\n    logits, labels = eval_pred\n    probs = torch.nn.functional.softmax(torch.tensor(logits), dim=-1).numpy()\n    \n    top3 = np.argsort(-probs, axis=1)[:, :3] # Top 3 predictions\n    match = (top3 == labels[:, None])\n\n    # Compute MAP@3 manually\n    map3 = 0\n    for i in range(len(labels)):\n        if match[i, 0]:\n            map3 += 1.0\n        elif match[i, 1]:\n            map3 += 1.0 / 2\n        elif match[i, 2]:\n            map3 += 1.0 / 3\n    return {\"map@3\": map3 / len(labels)}\n\nargs = TrainingArguments(\n    per_device_train_batch_size = 2, \n    per_device_eval_batch_size= 2,\n    gradient_accumulation_steps = 1,\n    warmup_steps = 10,\n    num_train_epochs = 1,\n    learning_rate = 5e-5,\n    fp16 = True,\n    bf16 = False,\n    logging_steps = 1,\n    optim = \"adamw_torch_fused\",\n    weight_decay = 0.01,\n    eval_strategy=\"steps\",\n    lr_scheduler_type = \"cosine_with_restarts\",\n    seed = 3407,\n    output_dir = DIR+\"output\",\n    logging_dir=DIR+\"logs\",\n    greater_is_better=True,\n    load_best_model_at_end=True,\n    save_steps=10,\n    eval_steps=10,\n    save_total_limit=3,\n    report_to = \"none\", \n    )\n\ntrainer = Trainer(\n    model = model,\n    processing_class = tokenizer,\n    eval_dataset = val_ds,\n    train_dataset = train_ds,\n    args = args,\n    compute_metrics = compute_map3,\n)\n\ntrainer_stats = trainer.train()\n\n\n\nIt produces the following output
\nStep\tTraining Loss\tValidation Loss\tMap@3
\n10\t4.235900\t4.182212\t0.025000
\n20\t4.245500\t4.176703\t0.038889
\n30\t4.166400\t4.171503\t0.030556
\n40\t4.163400\t4.174795\t0.025000
\n50\t4.187000\t4.174973\t0.025000
\n60\t4.240600\t4.176061\t0.038889
\n70\t4.123800\t4.177481\t0.036111
\n80\t4.130100\t4.177088\t0.033333
\n90\t4.140700\t4.177318\t0.022222
\n100\t4.180000\t4.178491\t0.022222
\n110\t4.112100\t4.178146\t0.025000
\n120\t4.229100\t4.178137\t0.025000
But when I run
\ntrainer.evaluate(val_ds)
{'eval_loss': 4.1822123527526855,
\n'eval_map@3': 0.025,
\n'eval_runtime': 0.9703,
\n'eval_samples_per_second': 61.836,
\n'eval_steps_per_second': 30.918,
\n'epoch': 1.0}
It seems like the final evaluation is done with the checkpoint from the very first 10 steps, rather than with the best model.
\nWhat am I doing wrong?
","solution":"Due to metric_for_best_model is missing, etc. ?
","evaluation":{"extracted_final_answer":"Due to metric_for_best_model being missing, etc.?","reasoning":"The extracted_final_answer matches the correct_answer exactly, with no differences in wording or meaning. Both contain the same content and structure, including the link and the mention of metric_for_best_model.","correct":"yes","confidence":100}}
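A minimal sketch of the fix implied by the solution above (an illustration, not the poster's verified script): with load_best_model_at_end=True but no metric_for_best_model, Trainer falls back to the evaluation loss, and the explicit greater_is_better=True then selects the checkpoint with the highest loss, which is exactly the step-10 checkpoint whose numbers trainer.evaluate() returns. Naming the metric restores the intended behaviour:

from transformers import TrainingArguments

# Hypothetical corrected arguments; only the last two lines differ from the
# poster's setup. Trainer prefixes "eval_" onto the metric name automatically.
args = TrainingArguments(
    output_dir="output",
    eval_strategy="steps",
    eval_steps=10,
    save_steps=10,
    save_total_limit=3,
    load_best_model_at_end=True,
    metric_for_best_model="map@3",  # the key returned by compute_map3
    greater_is_better=True,         # now applies to MAP@3, not to the loss
)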
-{"discussion_title":"Getting the Space name programmatically","discussion_url":"https://discuss.huggingface.co/t/getting-the-space-name-programmatically/168253","discussion_topic_id":168253,"discussion_category":24,"discussion_created_at":"2025-09-10T09:20:15.719000Z","thread":[{"id":241610,"name":"João Ricardo Silva","username":"jrsilva","avatar_template":"/user_avatar/discuss.huggingface.co/jrsilva/{size}/53168_2.png","created_at":"2025-09-10T09:20:15.781Z","cooked":"Is there a programmatic way of a Space knowing its own name?
\nFor instance, the restart_space method of the huggingface_hub API requires a repo_id. If, say, I want the Space to restart itself, is there a programmatic way of getting this repo_id (and thus working without requiring changes if the Space is ever renamed) or do I have to hard-code it?
","post_number":1,"post_type":1,"posts_count":4,"updated_at":"2025-09-10T09:20:15.781Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":12,"reads":4,"readers_count":3,"score":65.8,"yours":false,"topic_id":168253,"topic_slug":"getting-the-space-name-programmatically","display_username":"João Ricardo Silva","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":102714,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/getting-the-space-name-programmatically/168253/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":241616,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-09-10T10:59:05.305Z","cooked":"Maybe simply by this?
\nimport os\nspace_id = os.getenv(\"SPACE_ID\", \"\") # e.g. \"username/space-name\"\n","post_number":2,"post_type":1,"posts_count":4,"updated_at":"2025-09-10T10:59:05.305Z","reply_count":1,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":4,"readers_count":3,"score":5.8,"yours":false,"topic_id":168253,"topic_slug":"getting-the-space-name-programmatically","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/docs/hub/en/spaces-overview#helper-environment-variables","internal":false,"reflection":false,"title":"Spaces Overview","clicks":0}],"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/getting-the-space-name-programmatically/168253/2","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":true,"topic_accepted_answer":true},{"id":241627,"name":"João Ricardo Silva","username":"jrsilva","avatar_template":"/user_avatar/discuss.huggingface.co/jrsilva/{size}/53168_2.png","created_at":"2025-09-10T12:04:43.563Z","cooked":"You are quite right. I somehow missed that part of the documentation. Thank you.
","post_number":3,"post_type":1,"posts_count":4,"updated_at":"2025-09-10T12:04:43.563Z","reply_count":0,"reply_to_post_number":2,"quote_count":0,"incoming_link_count":0,"reads":3,"readers_count":2,"score":15.6,"yours":false,"topic_id":168253,"topic_slug":"getting-the-space-name-programmatically","display_username":"João Ricardo Silva","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":52272,"username":"John6666","name":"John Smith","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":102714,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/getting-the-space-name-programmatically/168253/3","reactions":[{"id":"confetti_ball","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":241672,"name":"system","username":"system","avatar_template":"https://us1.discourse-cdn.com/hellohellohello/original/2X/d/de4155eb4aa4108ecb32a1389d7cc37ae69f88b7.png","created_at":"2025-09-11T00:04:44.148Z","cooked":"This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":4,"post_type":3,"posts_count":4,"updated_at":"2025-09-11T00:04:44.148Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":1,"readers_count":0,"score":0.2,"yours":false,"topic_id":168253,"topic_slug":"getting-the-space-name-programmatically","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/getting-the-space-name-programmatically/168253/4","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"Is there a programmatic way of a Space knowing its own name?
\nFor instance, the restart_space method of the huggingface_hub API requires a repo_id. If, say, I want the Space to restart itself, is there a programmatic way of getting this repo_id (and thus working without requiring changes if the Space is ever renamed) or do I have to hard-code it?
","solution":"Maybe simply by this?
\nimport os\nspace_id = os.getenv(\"SPACE_ID\", \"\") # e.g. \"username/space-name\"\n","evaluation":{"extracted_final_answer":"Maybe simply by this?
\nimport os\nspace_id = os.getenv(\"SPACE_ID\", \"\") # e.g. \"username/space-name\"\n","reasoning":"The extracted final answer matches the correct answer exactly, with no differences in content or meaning. Both include the same text and code snippet, indicating that the response is correct and complete.","correct":"yes","confidence":100}}
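A hedged follow-up sketch for the restart use case from the question: combine the SPACE_ID helper variable with HfApi.restart_space so the Space can restart itself without a hard-coded repo_id (HF_TOKEN is assumed here to be a Space secret holding a token with write access):

import os
from huggingface_hub import HfApi

space_id = os.getenv("SPACE_ID", "")  # set automatically inside a running Space
if space_id:
    HfApi(token=os.getenv("HF_TOKEN")).restart_space(repo_id=space_id)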
-{"discussion_title":"Image to text using blip2 gives incorrect answer","discussion_url":"https://discuss.huggingface.co/t/image-to-text-using-blip2-gives-incorrect-answer/168177","discussion_topic_id":168177,"discussion_category":5,"discussion_created_at":"2025-09-07T15:31:05.250000Z","thread":[{"id":241418,"name":"Raman Shah","username":"rxshah","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/r/a587f6/{size}.png","created_at":"2025-09-07T15:31:05.323Z","cooked":"Here is code snippet slightly modified from blip2 site:
\nThe first prompt, “Question: How many cats are there? Answer:”, gives the correct answer, Two.
\nHowever, the second prompt, “Question: How many dogs are there? Answer:”, gives the incorrect answer Two; it should be Zero or None.
\nIs this just because the accuracy of the trained model is not 100%, so incorrect answers are expected, or am I doing something incorrectly?
\nHere is the complete code:
\nfrom PIL import Image
\nimport requests
\nfrom transformers import Blip2Processor, Blip2ForConditionalGeneration
\nimport torch
device = "cuda" if torch.cuda.is_available() else "cpu"
\nprocessor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
\nmodel = Blip2ForConditionalGeneration.from_pretrained(
\n"Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
\n)
\nmodel.to(device)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
\nimage = Image.open(requests.get(url, stream=True).raw)
prompt = "Question: How many cats are there? Answer:"
\ninputs = processor(images=image, text=prompt, return_tensors="pt").to(
\ndevice, torch.float16
\n)
outputs = model.generate(**inputs)
\ntext = processor.tokenizer.batch_decode(outputs, skip_special_tokens=True)
\nprint(text)
Gives correct answer: ['Question: How many cats are there? Answer: Two\\n']
\nHowever, when I change the prompt to
\nprompt2 = \"Question: How many dogs are there? Answer: \"
\ninputs2 = processor(images=image, text=prompt2, return_tensors="pt").to(
\ndevice, torch.float16
\n)
outputs2 = model.generate(**inputs2)
\ntext2 = processor.tokenizer.batch_decode(outputs2, skip_special_tokens=True)
\nprint(text2)
['Question: How many dogs are there? Answer: Two\\n']
","post_number":1,"post_type":1,"posts_count":6,"updated_at":"2025-09-07T15:45:45.288Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":9,"reads":6,"readers_count":5,"score":61.2,"yours":false,"topic_id":168177,"topic_slug":"image-to-text-using-blip2-gives-incorrect-answer","display_username":"Raman Shah","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":3,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"http://images.cocodataset.org/val2017/000000039769.jpg%E2%80%9D","internal":false,"reflection":false,"clicks":0}],"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":80638,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/image-to-text-using-blip2-gives-incorrect-answer/168177/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":241436,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-09-07T20:48:34.727Z","cooked":"\n\nOR AM I doing something incorrectly?
\n
There’s no problem with the code; it seems to be a known issue with the model / architecture. You might want to try using some fine-tuned version.
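For example, a minimal sketch of swapping in the COCO fine-tuned checkpoint mentioned in this reply (Salesforce/blip2-opt-2.7b-coco); the rest of the snippet stays the same, and it may still miscount, since the limitation is in the model rather than in the code:

from transformers import Blip2Processor, Blip2ForConditionalGeneration
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
# Same pipeline as in the question, only the checkpoint name changes
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b-coco")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b-coco", torch_dtype=torch.float16
).to(device)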
","post_number":2,"post_type":1,"posts_count":6,"updated_at":"2025-09-07T20:48:34.727Z","reply_count":1,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":4,"readers_count":3,"score":5.8,"yours":false,"topic_id":168177,"topic_slug":"image-to-text-using-blip2-gives-incorrect-answer","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/Salesforce/blip2-opt-2.7b-coco","internal":false,"reflection":false,"title":"Salesforce/blip2-opt-2.7b-coco · Hugging Face","clicks":2},{"url":"https://arxiv.org/pdf/2403.01373","internal":false,"reflection":false,"clicks":0}],"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/image-to-text-using-blip2-gives-incorrect-answer/168177/2","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":241443,"name":"Raman Shah","username":"rxshah","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/r/a587f6/{size}.png","created_at":"2025-09-08T01:14:33.037Z","cooked":"Thanks!!
\nTried the examples you pointed to. The number of dogs still gave Two. However, following the examples further, I got the following results:
\n55.3% that image 0 is 'a photo of a cat'\n44.7% that image 0 is 'a photo of a dog'\n\nPerhaps this explains why the model cannot distinguish between cats, dogs or anything else?
","post_number":3,"post_type":1,"posts_count":6,"updated_at":"2025-09-08T01:14:33.037Z","reply_count":0,"reply_to_post_number":2,"quote_count":0,"incoming_link_count":0,"reads":4,"readers_count":3,"score":15.8,"yours":false,"topic_id":168177,"topic_slug":"image-to-text-using-blip2-gives-incorrect-answer","display_username":"Raman Shah","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":52272,"username":"John6666","name":"John Smith","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":80638,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/image-to-text-using-blip2-gives-incorrect-answer/168177/3","reactions":[{"id":"+1","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":241446,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-09-08T03:51:52.414Z","cooked":"Yeah. For example, CLIP can perfectly classify dogs and cats, but BLIP seems utterly unsuitable for classification…
","post_number":4,"post_type":1,"posts_count":6,"updated_at":"2025-09-08T03:51:52.414Z","reply_count":1,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":4,"readers_count":3,"score":20.8,"yours":false,"topic_id":168177,"topic_slug":"image-to-text-using-blip2-gives-incorrect-answer","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/datasets/John6666/forum1/blob/main/blip2_cats_dogs.md","internal":false,"reflection":false,"title":"blip2_cats_dogs.md · John6666/forum1 at main","clicks":0}],"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/image-to-text-using-blip2-gives-incorrect-answer/168177/4","reactions":[{"id":"+1","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":true,"topic_accepted_answer":true},{"id":241472,"name":"Raman Shah","username":"rxshah","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/r/a587f6/{size}.png","created_at":"2025-09-08T13:52:59.063Z","cooked":"Thanks for the clear explanation!!
","post_number":5,"post_type":1,"posts_count":6,"updated_at":"2025-09-08T13:52:59.063Z","reply_count":0,"reply_to_post_number":4,"quote_count":0,"incoming_link_count":0,"reads":3,"readers_count":2,"score":15.6,"yours":false,"topic_id":168177,"topic_slug":"image-to-text-using-blip2-gives-incorrect-answer","display_username":"Raman Shah","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":52272,"username":"John6666","name":"John Smith","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":80638,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/image-to-text-using-blip2-gives-incorrect-answer/168177/5","reactions":[{"id":"+1","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":241501,"name":"system","username":"system","avatar_template":"https://us1.discourse-cdn.com/hellohellohello/original/2X/d/de4155eb4aa4108ecb32a1389d7cc37ae69f88b7.png","created_at":"2025-09-09T01:53:46.094Z","cooked":"This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":6,"post_type":3,"posts_count":6,"updated_at":"2025-09-09T01:53:46.094Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":2,"readers_count":1,"score":0.4,"yours":false,"topic_id":168177,"topic_slug":"image-to-text-using-blip2-gives-incorrect-answer","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/image-to-text-using-blip2-gives-incorrect-answer/168177/6","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"Here is code snippet slightly modified from blip2 site:
\nThe first prompt, “Question: How many cats are there? Answer:”, gives the correct answer, Two.
\nHowever, the second prompt, “Question: How many dogs are there? Answer:”, gives the incorrect answer Two; it should be Zero or None.
\nIs this just because the accuracy of the trained model is not 100%, so incorrect answers are expected, or am I doing something incorrectly?
\nHere is the complete code:
\nfrom PIL import Image
\nimport requests
\nfrom transformers import Blip2Processor, Blip2ForConditionalGeneration
\nimport torch
device = "cuda" if torch.cuda.is_available() else "cpu"
\nprocessor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
\nmodel = Blip2ForConditionalGeneration.from_pretrained(
\n"Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
\n)
\nmodel.to(device)
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
\nimage = Image.open(requests.get(url, stream=True).raw)
prompt = "Question: How many cats are there? Answer:"
\ninputs = processor(images=image, text=prompt, return_tensors="pt").to(
\ndevice, torch.float16
\n)
outputs = model.generate(**inputs)
\ntext = processor.tokenizer.batch_decode(outputs, skip_special_tokens=True)
\nprint(text)
Gives correct answer: ['Question: How many cats are there? Answer: Two\\n']
\nHowever, when I change the prompt to
\nprompt2 = \"Question: How many dogs are there? Answer: \"
\ninputs2 = processor(images=image, text=prompt2, return_tensors="pt").to(
\ndevice, torch.float16
\n)
outputs2 = model.generate(**inputs2)
\ntext2 = processor.tokenizer.batch_decode(outputs2, skip_special_tokens=True)
\nprint(text2)
['Question: How many dogs are there? Answer: Two\\n']
","solution":"Yeah. For example, CLIP can perfectly classify dogs and cats, but BLIP seems utterly unsuitable for classification…
","evaluation":{"extracted_final_answer":"Yeah. For example, CLIP can perfectly classify dogs and cats, but BLIP seems utterly unsuitable for classification…","reasoning":"The extracted_final_answer matches the correct_answer exactly, with no differences in wording or meaning. Therefore, the correct_answer is included in the extracted_final_answer without any discrepancies.","correct":"yes","confidence":100}} -{"discussion_title":"Prevent creation of multiple checkpoints","discussion_url":"https://discuss.huggingface.co/t/prevent-creation-of-multiple-checkpoints/168144","discussion_topic_id":168144,"discussion_category":5,"discussion_created_at":"2025-09-05T20:15:07.934000Z","thread":[{"id":241309,"name":"Alex","username":"SuperBowser","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/s/9f8e36/{size}.png","created_at":"2025-09-05T20:15:08.005Z","cooked":"In my training arguments I selected to save every 200 steps, but my model is fairly large (relative to my disk size). I would like to save every 200 steps, but every save should just overwrite previous save instead of creating new save point. Is this possible?
","post_number":1,"post_type":1,"posts_count":3,"updated_at":"2025-09-05T20:15:08.005Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":7,"reads":5,"readers_count":4,"score":51.0,"yours":false,"topic_id":168144,"topic_slug":"prevent-creation-of-multiple-checkpoints","display_username":"Alex","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":102016,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/prevent-creation-of-multiple-checkpoints/168144/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":241317,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-09-06T00:19:59.432Z","cooked":"Strictly speaking, it’s not overwriting, but I think save_total_limit or save_only_model are closer to the intended purpose.
from transformers import TrainingArguments\n\nargs = TrainingArguments(\n output_dir=\"out\",\n save_strategy=\"steps\",\n save_steps=200,\n save_total_limit=1, # deletes older checkpoints\n save_only_model=True, # 4.37+; skips optimizer/scheduler to shrink size\n)\n","post_number":2,"post_type":1,"posts_count":3,"updated_at":"2025-09-06T00:19:59.432Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":5,"readers_count":4,"score":1.0,"yours":false,"topic_id":168144,"topic_slug":"prevent-creation-of-multiple-checkpoints","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/docs/transformers/en/main_classes/trainer#transformers.TrainingArguments.save_total_limit","internal":false,"reflection":false,"title":"Trainer","clicks":1}],"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/prevent-creation-of-multiple-checkpoints/168144/2","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":true,"topic_accepted_answer":true},{"id":241444,"name":"system","username":"system","avatar_template":"https://us1.discourse-cdn.com/hellohellohello/original/2X/d/de4155eb4aa4108ecb32a1389d7cc37ae69f88b7.png","created_at":"2025-09-08T01:48:01.261Z","cooked":"This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":3,"post_type":3,"posts_count":3,"updated_at":"2025-09-08T01:48:01.261Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":1,"readers_count":0,"score":0.2,"yours":false,"topic_id":168144,"topic_slug":"prevent-creation-of-multiple-checkpoints","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/prevent-creation-of-multiple-checkpoints/168144/3","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"In my training arguments I selected to save every 200 steps, but my model is fairly large (relative to my disk size). I would like to save every 200 steps, but every save should just overwrite previous save instead of creating new save point. Is this possible?
","solution":"Strictly speaking, it’s not overwriting, but I think save_total_limit or save_only_model are closer to the intended purpose.
from transformers import TrainingArguments\n\nargs = TrainingArguments(\n output_dir=\"out\",\n save_strategy=\"steps\",\n save_steps=200,\n save_total_limit=1, # deletes older checkpoints\n save_only_model=True, # 4.37+; skips optimizer/scheduler to shrink size\n)\n","evaluation":{"extracted_final_answer":"Strictly speaking, it’s not overwriting, but I think save_total_limit or save_only_model are closer to the intended purpose.
from transformers import TrainingArguments\n\nargs = TrainingArguments(\n output_dir=\"out\",\n save_strategy=\"steps\",\n save_steps=200,\n save_total_limit=1, # deletes older checkpoints\n save_only_model=True, # 4.37+; skips optimizer/scheduler to shrink size\n)\n","reasoning":"The extracted final answer matches the correct answer exactly, with no differences in content or meaning. Therefore, it is included in the extracted final answer without any ambiguity or inconsistency.","correct":"yes","confidence":100}}
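One usage note on the accepted answer (documented Trainer behaviour, added here for completeness): if load_best_model_at_end=True is also set, the best checkpoint is kept in addition to the save_total_limit budget, so save_total_limit=1 can still leave two checkpoints on disk (the last and the best) when they differ:

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    save_strategy="steps",
    save_steps=200,
    save_total_limit=1,           # prunes older checkpoints...
    load_best_model_at_end=True,  # ...but the best one is always retained too
)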
-{"discussion_title":"IndexError: Target N is out of bounds within trainer.train() function","discussion_url":"https://discuss.huggingface.co/t/indexerror-target-n-is-out-of-bounds-within-trainer-train-function/168143","discussion_topic_id":168143,"discussion_category":5,"discussion_created_at":"2025-09-05T19:13:46.123000Z","thread":[{"id":241307,"name":"Javier M.A.","username":"JavierMA","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/j/f19dbf/{size}.png","created_at":"2025-09-05T19:13:46.184Z","cooked":"Hi all,
\nI am trying to train a custom model for NLP sequence classification (multiclass) and struggling to be able to train it for a reason I don’t know, that is the reason why I am asking on this forum. I already had a look at similar posts on the forum with no luck.
\nFirst of all, my dataset looks like the following in DataFrame before introducing it to a dataset (5 instances per class or label, being 0 the lowest label number and 251 the maximum one, so 252 labels in total):
\n text label\n0 Configuración del área de selección de TV Set 0\n1 Configuración del área de selección de TV Set 0\n2 Conformación de la sección de selección de TV... 0\n3 Conformación ae la stcción de seldcción de TV Set 0\n4 Validar la configuración del área de selección... 0\n... ... ...\n1281 Validación incorrecta por identificador de art... 251\n1282 Validación incorrecta mediante identificador d... 251\n1283 Validación incorrecta por identificador de art... 251\n1284 Validación incorrecta por identificador de art... 251\n1285 Validar Validación incorrecta por identificado... 251\n\nAs It is a custom model, I changed the value of out_features at out_proj in the classification part, so the resulting architecture looks like the following:
\n text label\n0 Configuración del área de selección de TV Set 0\n1 Configuración del área de selección de TV Set 0\n2 Conformación de la sección de selección de TV... 0\n3 Conformación ae la stcción de seldcción de TV Set 0\n4 Validar la configuración del área de selección... 0\n... ... ...\n1281 Validación incorrecta por identificador de art... 251\n1282 Validación incorrecta mediante identificador d... 251\n1283 Validación incorrecta por identificador de art... 251\n1284 Validación incorrecta por identificador de art... 251\n1285 Validar Validación incorrecta por identificado... 251\n\nAs it is a custom model, I changed the value of out_features at out_proj in the classification part, so the resulting architecture looks like the following:
\ndataset = Dataset.from_pandas(df, split='train')\ndataset = dataset.train_test_split(shuffle=True, seed=42, test_size=0.2)\nprint(dataset)\n\nWhere the print gives the following result (I already checked that values in label go from 0 to N-1 labels or classes):
\nDatasetDict({\n train: Dataset({\n features: ['text', 'label'],\n num_rows: 1028\n })\n test: Dataset({\n features: ['text', 'label'],\n num_rows: 258\n })\n})\n\nDespite having done all the remaining steps before training correctly (or so I believe) and having at least one instance per class in train and test dataset, when I get to the function train, I get the following error:
\n---------------------------------------------------------------------------\nIndexError Traceback (most recent call last)\nCell In[103], line 1\n----> 1 trainer.train()\n 2 modelo_peft.to('cpu')\n 3 modelo_peft.eval()\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\transformers\\trainer.py:2238, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)\n 2236 hf_hub_utils.enable_progress_bars()\n 2237 else:\n-> 2238 return inner_training_loop(\n 2239 args=args,\n 2240 resume_from_checkpoint=resume_from_checkpoint,\n 2241 trial=trial,\n 2242 ignore_keys_for_eval=ignore_keys_for_eval,\n 2243 )\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\transformers\\trainer.py:2582, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)\n 2575 context = (\n 2576 functools.partial(self.accelerator.no_sync, model=model)\n 2577 if i != len(batch_samples) - 1\n 2578 and self.accelerator.distributed_type != DistributedType.DEEPSPEED\n 2579 else contextlib.nullcontext\n 2580 )\n 2581 with context():\n-> 2582 tr_loss_step = self.training_step(model, inputs, num_items_in_batch)\n 2584 if (\n 2585 args.logging_nan_inf_filter\n 2586 and not is_torch_xla_available()\n 2587 and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))\n 2588 ):\n 2589 # if loss is nan or inf simply add the average of previous logged losses\n 2590 tr_loss = tr_loss + tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\transformers\\trainer.py:3796, in Trainer.training_step(self, model, inputs, num_items_in_batch)\n 3793 return loss_mb.reduce_mean().detach().to(self.args.device)\n 3795 with self.compute_loss_context_manager():\n-> 3796 loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)\n 3798 del inputs\n 3799 if (\n 3800 self.args.torch_empty_cache_steps is not None\n 3801 and self.state.global_step % self.args.torch_empty_cache_steps == 0\n 3802 ):\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\transformers\\trainer.py:3884, in Trainer.compute_loss(self, model, inputs, return_outputs, num_items_in_batch)\n 3882 kwargs[\"num_items_in_batch\"] = num_items_in_batch\n 3883 inputs = {**inputs, **kwargs}\n-> 3884 outputs = model(**inputs)\n 3885 # Save past state if it exists\n 3886 # TODO: this needs to be fixed and made cleaner later.\n 3887 if self.args.past_index >= 0:\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1773, in Module._wrapped_call_impl(self, *args, **kwargs)\n 1771 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]\n 1772 else:\n-> 1773 return self._call_impl(*args, **kwargs)\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1784, in Module._call_impl(self, *args, **kwargs)\n 1779 # If we don't have any hooks, we want to skip the rest of the logic in\n 1780 # this function, and just call forward.\n 1781 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks\n 1782 or _global_backward_pre_hooks or _global_backward_hooks\n 1783 or _global_forward_hooks or _global_forward_pre_hooks):\n-> 1784 return forward_call(*args, **kwargs)\n 1786 result = None\n 1787 called_always_called_hooks = set()\n\nFile 
~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\peft\\peft_model.py:1652, in PeftModelForSequenceClassification.forward(self, input_ids, attention_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict, task_ids, **kwargs)\n 1650 if peft_config.peft_type == PeftType.POLY:\n 1651 kwargs[\"task_ids\"] = task_ids\n-> 1652 return self.base_model(\n 1653 input_ids=input_ids,\n 1654 attention_mask=attention_mask,\n 1655 inputs_embeds=inputs_embeds,\n 1656 labels=labels,\n 1657 output_attentions=output_attentions,\n 1658 output_hidden_states=output_hidden_states,\n 1659 return_dict=return_dict,\n 1660 **kwargs,\n 1661 )\n 1663 batch_size = _get_batch_size(input_ids, inputs_embeds)\n 1664 if attention_mask is not None:\n 1665 # concat prompt attention mask\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1773, in Module._wrapped_call_impl(self, *args, **kwargs)\n 1771 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]\n 1772 else:\n-> 1773 return self._call_impl(*args, **kwargs)\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1784, in Module._call_impl(self, *args, **kwargs)\n 1779 # If we don't have any hooks, we want to skip the rest of the logic in\n 1780 # this function, and just call forward.\n 1781 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks\n 1782 or _global_backward_pre_hooks or _global_backward_hooks\n 1783 or _global_forward_hooks or _global_forward_pre_hooks):\n-> 1784 return forward_call(*args, **kwargs)\n 1786 result = None\n 1787 called_always_called_hooks = set()\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\peft\\tuners\\tuners_utils.py:222, in BaseTuner.forward(self, *args, **kwargs)\n 221 def forward(self, *args: Any, **kwargs: Any):\n--> 222 return self.model.forward(*args, **kwargs)\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\transformers\\models\\roberta\\modeling_roberta.py:1228, in RobertaForSequenceClassification.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)\n 1226 elif self.config.problem_type == \"single_label_classification\":\n 1227 loss_fct = CrossEntropyLoss()\n-> 1228 loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))\n 1229 elif self.config.problem_type == \"multi_label_classification\":\n 1230 loss_fct = BCEWithLogitsLoss()\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1773, in Module._wrapped_call_impl(self, *args, **kwargs)\n 1771 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]\n 1772 else:\n-> 1773 return self._call_impl(*args, **kwargs)\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1784, in Module._call_impl(self, *args, **kwargs)\n 1779 # If we don't have any hooks, we want to skip the rest of the logic in\n 1780 # this function, and just call forward.\n 1781 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks\n 1782 or _global_backward_pre_hooks or _global_backward_hooks\n 1783 or _global_forward_hooks or _global_forward_pre_hooks):\n-> 1784 return forward_call(*args, **kwargs)\n 1786 result = None\n 1787 called_always_called_hooks = 
set()\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\torch\\nn\\modules\\loss.py:1310, in CrossEntropyLoss.forward(self, input, target)\n 1309 def forward(self, input: Tensor, target: Tensor) -> Tensor:\n-> 1310 return F.cross_entropy(\n 1311 input,\n 1312 target,\n 1313 weight=self.weight,\n 1314 ignore_index=self.ignore_index,\n 1315 reduction=self.reduction,\n 1316 label_smoothing=self.label_smoothing,\n 1317 )\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\torch\\nn\\functional.py:3462, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)\n 3460 if size_average is not None or reduce is not None:\n 3461 reduction = _Reduction.legacy_get_string(size_average, reduce)\n-> 3462 return torch._C._nn.cross_entropy_loss(\n 3463 input,\n 3464 target,\n 3465 weight,\n 3466 _Reduction.get_enum(reduction),\n 3467 ignore_index,\n 3468 label_smoothing,\n 3469 )\n\nIndexError: Target 134 is out of bounds.\n\nAny ideas of what may be wrong? Let me know if any other information is needed.
\nThanks,
\nJavier
","post_number":1,"post_type":1,"posts_count":6,"updated_at":"2025-09-06T10:35:54.160Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":5,"reads":7,"readers_count":6,"score":41.4,"yours":false,"topic_id":168143,"topic_slug":"indexerror-target-n-is-out-of-bounds-within-trainer-train-function","display_username":"Javier M.A.","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":4,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":103219,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/indexerror-target-n-is-out-of-bounds-within-trainer-train-function/168143/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":241316,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-09-06T00:10:31.575Z","cooked":"This may occur if num_labels is not passed during model loading.
from datasets import Dataset\nfrom transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments\nimport numpy as np\nimport pandas as pd\nimport torch\nimport math\n\n# 0) Example dataframe (replace with your df)\n# df = pd.read_csv(\"your_data.csv\") # must contain 'text' and integer 'label'\ndf = pd.DataFrame({\n \"text\": [f\"ejemplo {i}\" for i in range(3000)],\n \"label\": np.repeat(np.arange(252), repeats=math.ceil(3000/252))[:3000]\n})\n\n# 1) Ensure labels are 0..C-1\nC = int(df[\"label\"].max() + 1)\nm = int(df[\"label\"].min())\nif m != 0:\n df[\"label\"] = df[\"label\"] - m\nassert df[\"label\"].between(0, C - 1).all(), \"labels must be in [0, C-1]\"\n\n# 2) Build small train/test datasets\nds = Dataset.from_pandas(df[[\"text\", \"label\"]], split=\"train\").train_test_split(test_size=0.1, seed=42)\n\n# 3) Tokenize\ntok = AutoTokenizer.from_pretrained(\"roberta-base\")\ndef preprocess(ex):\n return tok(ex[\"text\"], truncation=True, padding=\"max_length\", max_length=64)\nds_tok = ds.map(preprocess, batched=True).remove_columns([\"text\"]).with_format(\"torch\")\n\n# 4) Create model with the correct class count; let Transformers swap the head\nmodel = AutoModelForSequenceClassification.from_pretrained(\n \"roberta-base\",\n num_labels=C, # tells the new classifier size\n ignore_mismatched_sizes=True, # skip loading the old head\n)\n# optional but recommended: explicit label maps\nmodel.config.id2label = {i: str(i) for i in range(C)}\nmodel.config.label2id = {v: k for k, v in model.config.id2label.items()}\n\n# 5) Train briefly\nargs = TrainingArguments(\n output_dir=\"out_fix\",\n per_device_train_batch_size=8,\n per_device_eval_batch_size=8,\n learning_rate=5e-5,\n num_train_epochs=1,\n logging_steps=10,\n eval_strategy=\"no\",\n report_to=\"none\",\n)\n\ntrainer = Trainer(model=model, args=args, train_dataset=ds_tok[\"train\"])\ntrainer.train() # IndexError: Target ** is out of bounds. 
(If without num_labels and ignore_mismatched_sizes)\n","post_number":2,"post_type":1,"posts_count":6,"updated_at":"2025-09-06T00:10:31.575Z","reply_count":1,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":4,"readers_count":3,"score":5.8,"yours":false,"topic_id":168143,"topic_slug":"indexerror-target-n-is-out-of-bounds-within-trainer-train-function","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://discuss.huggingface.co/t/target-is-out-of-bounds/13802","internal":true,"reflection":false,"title":"Target {} is out of bounds","clicks":0}],"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/indexerror-target-n-is-out-of-bounds-within-trainer-train-function/168143/2","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":241346,"name":"Javier M.A.","username":"JavierMA","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/j/f19dbf/{size}.png","created_at":"2025-09-06T10:33:50.813Z","cooked":"Many thanks for your answer John. Regarding what you said regarding num_labels, the way I did it in my code was the following (first line in the following code):
\nnueva_configuracion_modelo = AutoConfig.from_pretrained(nombre_modelo, num_labels=numero_de_etiquetas, id2label=ids_a_etiquetas, label2id=etiquetas_a_id, cache_dir='./huggingface_mirror')\n\nmodelo_roberta = AutoModelForSequenceClassification.from_pretrained('PlanTL-GOB-ES/roberta-large-bne-massive', cache_dir='./huggingface_mirror', local_files_only=True)\n\n\nif modelo_roberta.config.num_labels != nueva_configuracion_modelo.num_labels or modelo_roberta.config.id2label != nueva_configuracion_modelo.id2label:\n    modelo_roberta.classifier.out_proj.out_features = nueva_configuracion_modelo.num_labels\n\nmodelo_roberta.config = nueva_configuracion_modelo\n\nprint(modelo_roberta.config)\n\ntokenizador_roberta = AutoTokenizer.from_pretrained(nombre_modelo, cache_dir='./huggingface_mirror', local_files_only=True, from_pt=True)\n\nWith that code I changed the value of the out_features parameter of the out_proj layer in the classification part to 252 (the number of different classes) and saw label2id and id2label updated with the values from my custom model.
","post_number":3,"post_type":1,"posts_count":6,"updated_at":"2025-09-06T11:12:36.335Z","reply_count":0,"reply_to_post_number":2,"quote_count":0,"incoming_link_count":0,"reads":3,"readers_count":2,"score":15.6,"yours":false,"topic_id":168143,"topic_slug":"indexerror-target-n-is-out-of-bounds-within-trainer-train-function","display_username":"Javier M.A.","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":2,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":52272,"username":"John6666","name":"John Smith","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":103219,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/indexerror-target-n-is-out-of-bounds-within-trainer-train-function/168143/3","reactions":[{"id":"+1","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":241348,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-09-06T13:12:56.958Z","cooked":"In that case, the actual weigh probably won’t change t even if the attribute is modified.
\nfrom transformers import AutoModelForSequenceClassification, AutoTokenizer\nimport torch\n\n# 1) Load a small model with 2 labels so the classifier head is tiny\nmodel = AutoModelForSequenceClassification.from_pretrained(\"roberta-base\", num_labels=2)\ntok = AutoTokenizer.from_pretrained(\"roberta-base\")\n\nhead = model.classifier.out_proj # this is an nn.Linear\n\nprint(\"=== BEFORE ===\")\nprint(\"repr:\", head)\nprint(\"out_features attr:\", head.out_features)\nprint(\"weight shape:\", tuple(head.weight.shape))\nprint(\"bias shape:\", tuple(head.bias.shape))\n\n# 2) Change ONLY the attribute (what your code effectively does)\nhead.out_features = 252 # <-- attribute changed, tensors untouched\n\nprint(\"\\n=== AFTER CHANGING ATTRIBUTE ONLY ===\")\nprint(\"repr:\", head) # repr now claims out_features=252\nprint(\"out_features attr:\", head.out_features)\nprint(\"weight shape:\", tuple(head.weight.shape)) # still (2, hidden_size)\nprint(\"bias shape:\", tuple(head.bias.shape)) # still (2,)\n\n# 3) Show the model still produces 2 logits, not 252\nbatch = tok(\"hola mundo\", return_tensors=\"pt\", padding=True, truncation=True, max_length=16)\nwith torch.no_grad():\n logits = model(**batch).logits\nprint(\"\\nlogits shape from forward():\", tuple(logits.shape)) # last dim is 2\n\n# 4) The correct fix is to REPLACE the Linear layer\nin_f = head.in_features\nmodel.classifier.out_proj = torch.nn.Linear(in_f, 252, bias=True)\n\nprint(\"\\n=== AFTER REPLACING THE LAYER ===\")\nprint(\"repr:\", model.classifier.out_proj)\nprint(\"out_features attr:\", model.classifier.out_proj.out_features)\nprint(\"weight shape:\", tuple(model.classifier.out_proj.weight.shape)) # now (252, hidden_size)\nprint(\"bias shape:\", tuple(model.classifier.out_proj.bias.shape)) # now (252,)\n\nwith torch.no_grad():\n logits = model(**batch).logits\nprint(\"logits shape from forward():\", tuple(logits.shape)) # last dim is 252\n\"\"\"\n=== BEFORE ===\nrepr: Linear(in_features=768, out_features=2, bias=True)\nout_features attr: 2\nweight shape: (2, 768)\nbias shape: (2,)\n\n=== AFTER CHANGING ATTRIBUTE ONLY ===\nrepr: Linear(in_features=768, out_features=252, bias=True)\nout_features attr: 252\nweight shape: (2, 768)\nbias shape: (2,)\n\nlogits shape from forward(): (1, 2)\n\n=== AFTER REPLACING THE LAYER ===\nrepr: Linear(in_features=768, out_features=252, bias=True)\nout_features attr: 252\nweight shape: (252, 768)\nbias shape: (252,)\nlogits shape from forward(): (1, 252)\n\"\"\"\n","post_number":4,"post_type":1,"posts_count":6,"updated_at":"2025-09-06T13:12:56.958Z","reply_count":1,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":3,"readers_count":2,"score":5.6,"yours":false,"topic_id":168143,"topic_slug":"indexerror-target-n-is-out-of-bounds-within-trainer-train-function","display_username":"John 
Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/indexerror-target-n-is-out-of-bounds-within-trainer-train-function/168143/4","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":true,"topic_accepted_answer":true},{"id":241357,"name":"Javier M.A.","username":"JavierMA","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/j/f19dbf/{size}.png","created_at":"2025-09-06T16:13:50.937Z","cooked":"You were totally right John
! I just printed the weight and bias in my code and the results were the original ones, so indeed I was modifying it the wrong way.
So, following the example, I modified my code from this:
\nif modelo_roberta.config.num_labels != nueva_configuracion_modelo.num_labels or modelo_roberta.config.id2label != nueva_configuracion_modelo.id2label:\n    modelo_roberta.classifier.out_proj.out_features = nueva_configuracion_modelo.num_labels\n\nmodelo_roberta.config = nueva_configuracion_modelo\n\nTo this:
\nmodelo_roberta.classifier.out_proj = torch.nn.Linear(modelo_roberta.classifier.out_proj.in_features, numero_de_etiquetas, bias=True)\nmodelo_roberta.num_labels = numero_de_etiquetas\nmodelo_roberta.config = nueva_configuracion_modelo\n\nAnd now it trains.
\nMany thanks for your help!
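For reference, a sketch of the one-step alternative already shown earlier in the thread, applied to the poster's checkpoint (num_labels resizes the classification head; ignore_mismatched_sizes skips loading the old head weights instead of erroring out):

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "PlanTL-GOB-ES/roberta-large-bne-massive",
    num_labels=252,                # new classification head size
    ignore_mismatched_sizes=True,  # do not try to load the old head weights
)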
","post_number":5,"post_type":1,"posts_count":6,"updated_at":"2025-09-06T16:35:51.006Z","reply_count":0,"reply_to_post_number":4,"quote_count":0,"incoming_link_count":0,"reads":2,"readers_count":1,"score":15.4,"yours":false,"topic_id":168143,"topic_slug":"indexerror-target-n-is-out-of-bounds-within-trainer-train-function","display_username":"Javier M.A.","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":2,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":52272,"username":"John6666","name":"John Smith","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":103219,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/indexerror-target-n-is-out-of-bounds-within-trainer-train-function/168143/5","reactions":[{"id":"confetti_ball","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":241392,"name":"system","username":"system","avatar_template":"https://us1.discourse-cdn.com/hellohellohello/original/2X/d/de4155eb4aa4108ecb32a1389d7cc37ae69f88b7.png","created_at":"2025-09-07T04:13:52.319Z","cooked":"This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":6,"post_type":3,"posts_count":6,"updated_at":"2025-09-07T04:13:52.319Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":1,"readers_count":0,"score":0.2,"yours":false,"topic_id":168143,"topic_slug":"indexerror-target-n-is-out-of-bounds-within-trainer-train-function","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/indexerror-target-n-is-out-of-bounds-within-trainer-train-function/168143/6","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"Hi all,
\nI am trying to train a custom model for NLP sequence classification (multiclass) and struggling to train it for a reason I don’t know, which is why I am asking on this forum. I already had a look at similar posts on the forum with no luck.
\nFirst of all, my dataset looks like the following as a DataFrame, before converting it to a Dataset (5 instances per class or label, with 0 being the lowest label and 251 the highest, so 252 labels in total):
\n text label\n0 Configuración del área de selección de TV Set 0\n1 Configuración del área de selección de TV Set 0\n2 Conformación de la sección de selección de TV... 0\n3 Conformación ae la stcción de seldcción de TV Set 0\n4 Validar la configuración del área de selección... 0\n... ... ...\n1281 Validación incorrecta por identificador de art... 251\n1282 Validación incorrecta mediante identificador d... 251\n1283 Validación incorrecta por identificador de art... 251\n1284 Validación incorrecta por identificador de art... 251\n1285 Validar Validación incorrecta por identificado... 251\n\nAs it is a custom model, I changed the out_features value of out_proj in the classification head, so the resulting architecture looks like the following:
\nRobertaForSequenceClassification(\n (roberta): RobertaModel(\n (embeddings): RobertaEmbeddings(\n (word_embeddings): Embedding(50262, 1024, padding_idx=1)\n (position_embeddings): Embedding(514, 1024, padding_idx=1)\n (token_type_embeddings): Embedding(1, 1024)\n (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)\n (dropout): Dropout(p=0.0, inplace=False)\n )\n (encoder): RobertaEncoder(\n (layer): ModuleList(\n (0-23): 24 x RobertaLayer(\n (attention): RobertaAttention(\n (self): RobertaSdpaSelfAttention(\n (query): Linear(in_features=1024, out_features=1024, bias=True)\n (key): Linear(in_features=1024, out_features=1024, bias=True)\n (value): Linear(in_features=1024, out_features=1024, bias=True)\n (dropout): Dropout(p=0.0, inplace=False)\n )\n (output): RobertaSelfOutput(\n (dense): Linear(in_features=1024, out_features=1024, bias=True)\n (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)\n (dropout): Dropout(p=0.0, inplace=False)\n )\n )\n (intermediate): RobertaIntermediate(\n (dense): Linear(in_features=1024, out_features=4096, bias=True)\n (intermediate_act_fn): GELUActivation()\n )\n (output): RobertaOutput(\n (dense): Linear(in_features=4096, out_features=1024, bias=True)\n (LayerNorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)\n (dropout): Dropout(p=0.0, inplace=False)\n )\n )\n )\n )\n )\n (classifier): RobertaClassificationHead(\n (dense): Linear(in_features=1024, out_features=1024, bias=True)\n (dropout): Dropout(p=0.0, inplace=False)\n (out_proj): Linear(in_features=1024, out_features=252, bias=True)\n )\n)\n\nThen I use the following code in order to create a HuggingFace Dataset:
\ndataset = Dataset.from_pandas(df, split='train')\ndataset = dataset.train_test_split(shuffle=True, seed=42, test_size=0.2)\nprint(dataset)\n\nThe print gives the following result (I already checked that the label values go from 0 to N-1):
\nDatasetDict({\n train: Dataset({\n features: ['text', 'label'],\n num_rows: 1028\n })\n test: Dataset({\n features: ['text', 'label'],\n num_rows: 258\n })\n})\n\nDespite having done all the remaining steps before training correctly (or so I believe), and having at least one instance per class in both the train and test datasets, when I call trainer.train() I get the following error:
\n---------------------------------------------------------------------------\nIndexError Traceback (most recent call last)\nCell In[103], line 1\n----> 1 trainer.train()\n 2 modelo_peft.to('cpu')\n 3 modelo_peft.eval()\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\transformers\\trainer.py:2238, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)\n 2236 hf_hub_utils.enable_progress_bars()\n 2237 else:\n-> 2238 return inner_training_loop(\n 2239 args=args,\n 2240 resume_from_checkpoint=resume_from_checkpoint,\n 2241 trial=trial,\n 2242 ignore_keys_for_eval=ignore_keys_for_eval,\n 2243 )\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\transformers\\trainer.py:2582, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)\n 2575 context = (\n 2576 functools.partial(self.accelerator.no_sync, model=model)\n 2577 if i != len(batch_samples) - 1\n 2578 and self.accelerator.distributed_type != DistributedType.DEEPSPEED\n 2579 else contextlib.nullcontext\n 2580 )\n 2581 with context():\n-> 2582 tr_loss_step = self.training_step(model, inputs, num_items_in_batch)\n 2584 if (\n 2585 args.logging_nan_inf_filter\n 2586 and not is_torch_xla_available()\n 2587 and (torch.isnan(tr_loss_step) or torch.isinf(tr_loss_step))\n 2588 ):\n 2589 # if loss is nan or inf simply add the average of previous logged losses\n 2590 tr_loss = tr_loss + tr_loss / (1 + self.state.global_step - self._globalstep_last_logged)\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\transformers\\trainer.py:3796, in Trainer.training_step(self, model, inputs, num_items_in_batch)\n 3793 return loss_mb.reduce_mean().detach().to(self.args.device)\n 3795 with self.compute_loss_context_manager():\n-> 3796 loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)\n 3798 del inputs\n 3799 if (\n 3800 self.args.torch_empty_cache_steps is not None\n 3801 and self.state.global_step % self.args.torch_empty_cache_steps == 0\n 3802 ):\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\transformers\\trainer.py:3884, in Trainer.compute_loss(self, model, inputs, return_outputs, num_items_in_batch)\n 3882 kwargs[\"num_items_in_batch\"] = num_items_in_batch\n 3883 inputs = {**inputs, **kwargs}\n-> 3884 outputs = model(**inputs)\n 3885 # Save past state if it exists\n 3886 # TODO: this needs to be fixed and made cleaner later.\n 3887 if self.args.past_index >= 0:\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1773, in Module._wrapped_call_impl(self, *args, **kwargs)\n 1771 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]\n 1772 else:\n-> 1773 return self._call_impl(*args, **kwargs)\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1784, in Module._call_impl(self, *args, **kwargs)\n 1779 # If we don't have any hooks, we want to skip the rest of the logic in\n 1780 # this function, and just call forward.\n 1781 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks\n 1782 or _global_backward_pre_hooks or _global_backward_hooks\n 1783 or _global_forward_hooks or _global_forward_pre_hooks):\n-> 1784 return forward_call(*args, **kwargs)\n 1786 result = None\n 1787 called_always_called_hooks = set()\n\nFile 
~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\peft\\peft_model.py:1652, in PeftModelForSequenceClassification.forward(self, input_ids, attention_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict, task_ids, **kwargs)\n 1650 if peft_config.peft_type == PeftType.POLY:\n 1651 kwargs[\"task_ids\"] = task_ids\n-> 1652 return self.base_model(\n 1653 input_ids=input_ids,\n 1654 attention_mask=attention_mask,\n 1655 inputs_embeds=inputs_embeds,\n 1656 labels=labels,\n 1657 output_attentions=output_attentions,\n 1658 output_hidden_states=output_hidden_states,\n 1659 return_dict=return_dict,\n 1660 **kwargs,\n 1661 )\n 1663 batch_size = _get_batch_size(input_ids, inputs_embeds)\n 1664 if attention_mask is not None:\n 1665 # concat prompt attention mask\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1773, in Module._wrapped_call_impl(self, *args, **kwargs)\n 1771 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]\n 1772 else:\n-> 1773 return self._call_impl(*args, **kwargs)\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1784, in Module._call_impl(self, *args, **kwargs)\n 1779 # If we don't have any hooks, we want to skip the rest of the logic in\n 1780 # this function, and just call forward.\n 1781 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks\n 1782 or _global_backward_pre_hooks or _global_backward_hooks\n 1783 or _global_forward_hooks or _global_forward_pre_hooks):\n-> 1784 return forward_call(*args, **kwargs)\n 1786 result = None\n 1787 called_always_called_hooks = set()\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\peft\\tuners\\tuners_utils.py:222, in BaseTuner.forward(self, *args, **kwargs)\n 221 def forward(self, *args: Any, **kwargs: Any):\n--> 222 return self.model.forward(*args, **kwargs)\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\transformers\\models\\roberta\\modeling_roberta.py:1228, in RobertaForSequenceClassification.forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)\n 1226 elif self.config.problem_type == \"single_label_classification\":\n 1227 loss_fct = CrossEntropyLoss()\n-> 1228 loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))\n 1229 elif self.config.problem_type == \"multi_label_classification\":\n 1230 loss_fct = BCEWithLogitsLoss()\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1773, in Module._wrapped_call_impl(self, *args, **kwargs)\n 1771 return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc]\n 1772 else:\n-> 1773 return self._call_impl(*args, **kwargs)\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\torch\\nn\\modules\\module.py:1784, in Module._call_impl(self, *args, **kwargs)\n 1779 # If we don't have any hooks, we want to skip the rest of the logic in\n 1780 # this function, and just call forward.\n 1781 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks\n 1782 or _global_backward_pre_hooks or _global_backward_hooks\n 1783 or _global_forward_hooks or _global_forward_pre_hooks):\n-> 1784 return forward_call(*args, **kwargs)\n 1786 result = None\n 1787 called_always_called_hooks = 
set()\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\torch\\nn\\modules\\loss.py:1310, in CrossEntropyLoss.forward(self, input, target)\n 1309 def forward(self, input: Tensor, target: Tensor) -> Tensor:\n-> 1310 return F.cross_entropy(\n 1311 input,\n 1312 target,\n 1313 weight=self.weight,\n 1314 ignore_index=self.ignore_index,\n 1315 reduction=self.reduction,\n 1316 label_smoothing=self.label_smoothing,\n 1317 )\n\nFile ~\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\torch\\nn\\functional.py:3462, in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction, label_smoothing)\n 3460 if size_average is not None or reduce is not None:\n 3461 reduction = _Reduction.legacy_get_string(size_average, reduce)\n-> 3462 return torch._C._nn.cross_entropy_loss(\n 3463 input,\n 3464 target,\n 3465 weight,\n 3466 _Reduction.get_enum(reduction),\n 3467 ignore_index,\n 3468 label_smoothing,\n 3469 )\n\nIndexError: Target 134 is out of bounds.\n\nAny ideas of what may be wrong? Let me know if any other information is needed.
\nThanks,
\nJavier
","solution":"In that case, the actual weigh probably won’t change t even if the attribute is modified.
\nfrom transformers import AutoModelForSequenceClassification, AutoTokenizer\nimport torch\n\n# 1) Load a small model with 2 labels so the classifier head is tiny\nmodel = AutoModelForSequenceClassification.from_pretrained(\"roberta-base\", num_labels=2)\ntok = AutoTokenizer.from_pretrained(\"roberta-base\")\n\nhead = model.classifier.out_proj # this is an nn.Linear\n\nprint(\"=== BEFORE ===\")\nprint(\"repr:\", head)\nprint(\"out_features attr:\", head.out_features)\nprint(\"weight shape:\", tuple(head.weight.shape))\nprint(\"bias shape:\", tuple(head.bias.shape))\n\n# 2) Change ONLY the attribute (what your code effectively does)\nhead.out_features = 252 # <-- attribute changed, tensors untouched\n\nprint(\"\\n=== AFTER CHANGING ATTRIBUTE ONLY ===\")\nprint(\"repr:\", head) # repr now claims out_features=252\nprint(\"out_features attr:\", head.out_features)\nprint(\"weight shape:\", tuple(head.weight.shape)) # still (2, hidden_size)\nprint(\"bias shape:\", tuple(head.bias.shape)) # still (2,)\n\n# 3) Show the model still produces 2 logits, not 252\nbatch = tok(\"hola mundo\", return_tensors=\"pt\", padding=True, truncation=True, max_length=16)\nwith torch.no_grad():\n logits = model(**batch).logits\nprint(\"\\nlogits shape from forward():\", tuple(logits.shape)) # last dim is 2\n\n# 4) The correct fix is to REPLACE the Linear layer\nin_f = head.in_features\nmodel.classifier.out_proj = torch.nn.Linear(in_f, 252, bias=True)\n\nprint(\"\\n=== AFTER REPLACING THE LAYER ===\")\nprint(\"repr:\", model.classifier.out_proj)\nprint(\"out_features attr:\", model.classifier.out_proj.out_features)\nprint(\"weight shape:\", tuple(model.classifier.out_proj.weight.shape)) # now (252, hidden_size)\nprint(\"bias shape:\", tuple(model.classifier.out_proj.bias.shape)) # now (252,)\n\nwith torch.no_grad():\n logits = model(**batch).logits\nprint(\"logits shape from forward():\", tuple(logits.shape)) # last dim is 252\n\"\"\"\n=== BEFORE ===\nrepr: Linear(in_features=768, out_features=2, bias=True)\nout_features attr: 2\nweight shape: (2, 768)\nbias shape: (2,)\n\n=== AFTER CHANGING ATTRIBUTE ONLY ===\nrepr: Linear(in_features=768, out_features=252, bias=True)\nout_features attr: 252\nweight shape: (2, 768)\nbias shape: (2,)\n\nlogits shape from forward(): (1, 2)\n\n=== AFTER REPLACING THE LAYER ===\nrepr: Linear(in_features=768, out_features=252, bias=True)\nout_features attr: 252\nweight shape: (252, 768)\nbias shape: (252,)\nlogits shape from forward(): (1, 252)\n\"\"\"\n","evaluation":{"extracted_final_answer":"None","reasoning":"The extracted final answer is 'None' because the response does not contain the correct answer provided in the question. The response discusses modifying the model's classifier head but does not include the specific text or code snippet from the correct answer. Therefore, there is a clear discrepancy between the correct answer and the response.","correct":"no","confidence":100}}
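As a complementary check to the head replacement above: verifying that every label is in range rules out the other common cause of "Target N is out of bounds" (a minimal sketch, assuming the DatasetDict from the question):\nnum_labels = 252\nfor split in [\"train\", \"test\"]:\n    labels = dataset[split][\"label\"]\n    assert 0 <= min(labels) and max(labels) < num_labels, f\"{split}: label out of range\"\n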
-{"discussion_title":"Adding Metadata to a dataset","discussion_url":"https://discuss.huggingface.co/t/adding-metadata-to-a-dataset/165626","discussion_topic_id":165626,"discussion_category":5,"discussion_created_at":"2025-08-04T17:21:08.096000Z","thread":[{"id":236538,"name":"Daniel Russ","username":"danielruss","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/d/bbce88/{size}.png","created_at":"2025-08-04T17:21:08.153Z","cooked":"Hi, I have a dataset where the text has a label that is a standardized code. The each code has a title describing the code. The data is in a pandas df called jobs_data
\ndata = {\n \"text\": jobs_data.JobTitle.to_list(),\n \"label\": jobs_data.soc2010.to_list(),\n}\nfeatures = {\n \"text\": Value(\"string\"),\n \"label\": ClassLabel(names=soc2010.code.to_list()),\n}\n\njobs_ds = Dataset.from_dict(data,features=Features(features))\n\nI would like to include a codes to title dictionary/function to make it easier to convert from a label → code → title
\nIs this possible?
\nThank you
If metadata alone is sufficient, using the DatasetInfo class is probably the quickest option.
from datasets import DatasetInfo\n\ndata = {\n \"text\": jobs_data.JobTitle.to_list(),\n \"label\": jobs_data.soc2010.to_list(),\n}\n\nfeatures = {\n \"text\": Value(\"string\"),\n \"label\": ClassLabel(names=soc2010.code.to_list()),\n}\n\ncode2title = \"codes to convert from a label → code → title\"\n\ninfo = DatasetInfo(\n description=\"Jobs dataset with SOC‐2010 codes\",\n metadata={\"code2title\": code2title}\n)\n\njobs_ds = Dataset.from_dict(data, features=Features(features), info=info)\n","post_number":2,"post_type":1,"posts_count":3,"updated_at":"2025-08-05T00:30:44.478Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":3,"readers_count":2,"score":0.6,"yours":false,"topic_id":165626,"topic_slug":"adding-metadata-to-a-dataset","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/docs/datasets/v4.0.0/en/package_reference/main_classes#datasets.DatasetInfo","internal":false,"reflection":false,"title":"Main classes","clicks":2}],"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/adding-metadata-to-a-dataset/165626/2","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":true,"topic_accepted_answer":true},{"id":241236,"name":"system","username":"system","avatar_template":"https://us1.discourse-cdn.com/hellohellohello/original/2X/d/de4155eb4aa4108ecb32a1389d7cc37ae69f88b7.png","created_at":"2025-09-04T20:41:28.087Z","cooked":"This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":3,"post_type":3,"posts_count":3,"updated_at":"2025-09-04T20:41:28.087Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":1,"readers_count":0,"score":0.2,"yours":false,"topic_id":165626,"topic_slug":"adding-metadata-to-a-dataset","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/adding-metadata-to-a-dataset/165626/3","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"Hi, I have a dataset where the text has a label that is a standardized code. The each code has a title describing the code. The data is in a pandas df called jobs_data
\ndata = {\n \"text\": jobs_data.JobTitle.to_list(),\n \"label\": jobs_data.soc2010.to_list(),\n}\nfeatures = {\n \"text\": Value(\"string\"),\n \"label\": ClassLabel(names=soc2010.code.to_list()),\n}\n\njobs_ds = Dataset.from_dict(data,features=Features(features))\n\nI would like to include a codes to title dictionary/function to make it easier to convert from a label → code → title
\nIs this possible?
\nThank you
If metadata alone is sufficient, using the DatasetInfo class is probably the quickest option.
from datasets import DatasetInfo\n\ndata = {\n \"text\": jobs_data.JobTitle.to_list(),\n \"label\": jobs_data.soc2010.to_list(),\n}\n\nfeatures = {\n \"text\": Value(\"string\"),\n \"label\": ClassLabel(names=soc2010.code.to_list()),\n}\n\ncode2title = \"codes to convert from a label → code → title\"\n\ninfo = DatasetInfo(\n description=\"Jobs dataset with SOC‐2010 codes\",\n metadata={\"code2title\": code2title}\n)\n\njobs_ds = Dataset.from_dict(data, features=Features(features), info=info)\n","evaluation":{"extracted_final_answer":"If metadata alone is sufficient, using the DatasetInfo class is probably the quickest option.
from datasets import DatasetInfo\n\ndata = {\n \"text\": jobs_data.JobTitle.to_list(),\n \"label\": jobs_data.soc2010.to_list(),\n}\n\nfeatures = {\n \"text\": Value(\"string\"),\n \"label\": ClassLabel(names=soc2010.code.to_list()),\n}\n\ncode2title = \"codes to convert from a label → code → title\"\n\ninfo = DatasetInfo(\n description=\"Jobs dataset with SOC‐2010 codes\",\n metadata={\"code2title\": code2title}\n)\n\njobs_ds = Dataset.from_dict(data, features=Features(features), info=info)\n","reasoning":"The extracted final answer matches the correct answer exactly, with no differences in content or structure. Therefore, the response is correct and includes the precise and unambiguous correct answer.","correct":"yes","confidence":100}}
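As a follow-up to the DatasetInfo answer above: the ClassLabel feature already covers the label → code step, and the stored metadata can supply code → title (a minimal sketch; here code2title is assumed to be a {code: title} dict rather than the placeholder string):\ncode = jobs_ds.features[\"label\"].int2str(0)  # label -> code\ntitle = code2title[code]  # code -> title\n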
-{"discussion_title":"Can I use LoRA with jhu-clsp/ettin-encoder-1b?","discussion_url":"https://discuss.huggingface.co/t/can-i-use-lora-with-jhu-clsp-ettin-encoder-1b/167903","discussion_topic_id":167903,"discussion_category":5,"discussion_created_at":"2025-08-29T14:49:48.934000Z","thread":[{"id":240628,"name":"Alex","username":"SuperBowser","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/s/9f8e36/{size}.png","created_at":"2025-08-29T14:49:49.002Z","cooked":"It looks like jhu-clsp/ettin-encoder-1b does not have any proj layers. Is it possible to use LoRA with this model:
from transformers import AutoModelForSequenceClassification\nmodel_name = 'jhu-clsp/ettin-encoder-1b'\nmodel = AutoModelForSequenceClassification.from_pretrained(model_name)\nfor parent_name, module in model.named_modules():\n for child_name, child in module.named_children():\n if 'proj' in child_name:\n print(child_name)\n print(\"_________\")\n\nThis code returned nothing.
","post_number":1,"post_type":1,"posts_count":3,"updated_at":"2025-08-29T14:49:49.002Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":5,"reads":5,"readers_count":4,"score":41.0,"yours":false,"topic_id":167903,"topic_slug":"can-i-use-lora-with-jhu-clsp-ettin-encoder-1b","display_username":"Alex","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":102016,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/can-i-use-lora-with-jhu-clsp-ettin-encoder-1b/167903/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":240648,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-08-30T00:29:33.998Z","cooked":"It seems that for ModernBERT-based models, the target_modules names aren’t proj*. You can apparently also automatically select the target_modules using =\"all-linear\".
\"target_modules\": [\n \"Wqkv\",\n \"Wi\",\n \"Wo\"\n ],\n","post_number":2,"post_type":1,"posts_count":3,"updated_at":"2025-08-30T00:29:33.998Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":5,"readers_count":4,"score":16.0,"yours":false,"topic_id":167903,"topic_slug":"can-i-use-lora-with-jhu-clsp-ettin-encoder-1b","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/Wb-az/modernbert-lora-adapter-for-emotion-classification/blob/main/adapter_config.json","internal":false,"reflection":false,"title":"adapter_config.json · Wb-az/modernbert-lora-adapter-for-emotion-classification at main","clicks":0},{"url":"https://huggingface.co/docs/peft/v0.17.0/developer_guides/lora#efficiently-train-tokens-alongside-lora","internal":false,"reflection":false,"title":"LoRA","clicks":0}],"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/can-i-use-lora-with-jhu-clsp-ettin-encoder-1b/167903/2","reactions":[{"id":"+1","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":true,"topic_accepted_answer":true},{"id":241012,"name":"system","username":"system","avatar_template":"https://us1.discourse-cdn.com/hellohellohello/original/2X/d/de4155eb4aa4108ecb32a1389d7cc37ae69f88b7.png","created_at":"2025-09-02T14:59:52.226Z","cooked":"This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":3,"post_type":3,"posts_count":3,"updated_at":"2025-09-02T14:59:52.226Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":1,"readers_count":0,"score":0.2,"yours":false,"topic_id":167903,"topic_slug":"can-i-use-lora-with-jhu-clsp-ettin-encoder-1b","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/can-i-use-lora-with-jhu-clsp-ettin-encoder-1b/167903/3","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"It looks like jhu-clsp/ettin-encoder-1b does not have any proj layers. Is it possible to use LoRA with this model:
from transformers import AutoModelForSequenceClassification\nmodel_name = 'jhu-clsp/ettin-encoder-1b'\nmodel = AutoModelForSequenceClassification.from_pretrained(model_name)\nfor parent_name, module in model.named_modules():\n for child_name, child in module.named_children():\n if 'proj' in child_name:\n print(child_name)\n print(\"_________\")\n\nThis code returned nothing.
","solution":"It seems that for ModernBERT-based models, the target_modules names aren’t proj*. You can apparently also automatically select the target_modules using =\"all-linear\".
\"target_modules\": [\n \"Wqkv\",\n \"Wi\",\n \"Wo\"\n ],\n","evaluation":{"extracted_final_answer":"It seems that for ModernBERT-based models, the target_modules names aren’t proj*. You can apparently also automatically select the target_modules using =\"all-linear\".
\"target_modules\": [\n \"Wqkv\",\n \"Wi\",\n \"Wo\"\n ],\n","reasoning":"The extracted final answer matches the correct answer exactly, with no differences in content or meaning. Therefore, it is included in the extracted final answer.","correct":"yes","confidence":100}}
-{"discussion_title":"Could not find MistralForCausalLM in transformers","discussion_url":"https://discuss.huggingface.co/t/could-not-find-mistralforcausallm-in-transformers/167978","discussion_topic_id":167978,"discussion_category":5,"discussion_created_at":"2025-09-01T02:12:05.710000Z","thread":[{"id":240814,"name":"Jay","username":"jaydeepb","avatar_template":"/user_avatar/discuss.huggingface.co/jaydeepb/{size}/14906_2.png","created_at":"2025-09-01T02:12:05.764Z","cooked":"Hi. I finetuned mistralai/Mistral-Small-24B-Base-2501 on a dataset and now I’m trying to run inference for it. I’m using AutoModelForCausalLM.from_pretrained to load it but getting this error: Could not find MistralForCausalLM neither in transformers. I’m running the latest version of transformers 4.56.0. What might be the reason? Installing transformers from source according to this post support for MistralForCausalLM · Issue #26458 · huggingface/transformers · GitHub didn’t fix it.
Hmm, maybe it’s missing dependencies or something…?
\nI don’t think the class itself is actually missing…
pip install -U mistral_common sentencepiece\n\nimport transformers, sys\nprint(\"transformers\", transformers.__version__)\ntry:\n from transformers.models.mistral.modeling_mistral import MistralForCausalLM\n print(\"MistralForCausalLM OK\")\nexcept Exception as e:\n print(\"MistralForCausalLM FAIL:\", e, file=sys.stderr)\n","post_number":2,"post_type":1,"posts_count":12,"updated_at":"2025-09-01T02:46:35.152Z","reply_count":2,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":6,"reads":5,"readers_count":4,"score":41.0,"yours":false,"topic_id":167978,"topic_slug":"could-not-find-mistralforcausallm-in-transformers","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/docs/transformers/en/model_doc/mistral","internal":false,"reflection":false,"title":"Mistral","clicks":4}],"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/could-not-find-mistralforcausallm-in-transformers/167978/2","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":240825,"name":"Jay","username":"jaydeepb","avatar_template":"/user_avatar/discuss.huggingface.co/jaydeepb/{size}/14906_2.png","created_at":"2025-09-01T03:22:20.500Z","cooked":"@John6666 getting this when I run that code snippet
\n``
\nMistralForCausalLM FAIL: partially initialized module ‘torchvision’ has no attribute ‘extension’ (most likely due to a circular import)
\n```
Judging just by the error, it’s probably a version mismatch between torch and torchvision.
pip install torchvision==x.xx.x\n\n@John6666 thanks! yes, aligning the versions helped 
I have fine-tuned the model and am now running into this runtime error while loading it:
\nRuntimeError: Error(s) in loading state_dict for Embedding:
\nsize mismatch for weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([131072, 5120]). Any idea what might be causing this?
Based on the error message, I’d guess it’s either trying to load the PEFT adapter as a whole model weight or the model weights are corrupted…
\n@John6666 could this be because of deepspeed? When I do len(tokenizer) it prints 131072.
\n\ncould this be because of deepspeed
\n
I think very likely…
\nWhen saving fails in DeepSpeed, it appears an empty tensor is saved instead.
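One way to spot that case is to scan the saved state dict for zero-sized tensors (a minimal sketch; the checkpoint file name is a placeholder for your own):\nimport torch\nsd = torch.load(\"pytorch_model.bin\", map_location=\"cpu\")\nempty = [k for k, v in sd.items() if v.numel() == 0]\nprint(\"empty tensors:\", empty)  # anything listed here points to a failed/ungathered save\n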
@John6666 I’m using \"stage3_gather_16bit_weights_on_model_save\": true as suggested here. Not sure what else is causing this.
This may also occur when using BF16 or when using an older version of PEFT.
\npip install -U peft\n","post_number":10,"post_type":1,"posts_count":12,"updated_at":"2025-09-01T06:40:53.193Z","reply_count":1,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":1,"reads":2,"readers_count":1,"score":10.4,"yours":false,"topic_id":167978,"topic_slug":"could-not-find-mistralforcausallm-in-transformers","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://github.com/deepspeedai/Megatron-DeepSpeed/issues/298","internal":false,"reflection":false,"title":"Deepspeed Zero Stage 3 save a empty model state_dict · Issue #298 · deepspeedai/Megatron-DeepSpeed · GitHub","clicks":0},{"url":"https://github.com/huggingface/peft/issues/2450","internal":false,"reflection":false,"title":"modules_to_save resulting in empty tensor with deepspeed zero3 LoRA training · Issue #2450 · huggingface/peft · GitHub","clicks":0}],"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/could-not-find-mistralforcausallm-in-transformers/167978/10","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":240844,"name":"Jay","username":"jaydeepb","avatar_template":"/user_avatar/discuss.huggingface.co/jaydeepb/{size}/14906_2.png","created_at":"2025-09-01T09:08:55.940Z","cooked":"@John6666 using model.save_16bit_model() to save the model insread of save_pretrained() fixed this!
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":12,"post_type":3,"posts_count":12,"updated_at":"2025-09-01T21:09:24.800Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":1,"reads":1,"readers_count":0,"score":5.2,"yours":false,"topic_id":167978,"topic_slug":"could-not-find-mistralforcausallm-in-transformers","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/could-not-find-mistralforcausallm-in-transformers/167978/12","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"Hi. I finetuned mistralai/Mistral-Small-24B-Base-2501 on a dataset and now I’m trying to run inference for it. I’m using AutoModelForCausalLM.from_pretrained to load it but getting this error: Could not find MistralForCausalLM neither in transformers. I’m running the latest version of transformers 4.56.0. What might be the reason? Installing transformers from source according to this post support for MistralForCausalLM · Issue #26458 · huggingface/transformers · GitHub didn’t fix it.
Judging just by the error, it’s probably a version mismatch between torch and torchvision.
pip install torchvision==x.xx.x\n\nI tried to search in the docs, but I didn’t find the answer anywhere.
\nThank you
","post_number":1,"post_type":1,"posts_count":7,"updated_at":"2022-03-24T06:03:27.154Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":5299,"reads":205,"readers_count":204,"score":26516.0,"yours":false,"topic_id":16021,"topic_slug":"which-data-parallel-does-trainer-use-dp-or-ddp","display_username":"dr_xiami","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":2}],"moderator":false,"admin":false,"staff":false,"user_id":3838,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/which-data-parallel-does-trainer-use-dp-or-ddp/16021/1","reactions":[{"id":"heart","type":"emoji","count":2}],"current_user_reaction":null,"reaction_users_count":2,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":33091,"name":"Sylvain Gugger","username":"sgugger","avatar_template":"/user_avatar/discuss.huggingface.co/sgugger/{size}/2291_2.png","created_at":"2022-03-24T12:22:07.153Z","cooked":"It depends if you launch your training script with python (in which case it will use DP) or python -m torch.distributed.launch (in which case it will use DDP).
Perhaps useful to you: Using Transformers with DistributedDataParallel — any examples?
","post_number":3,"post_type":1,"posts_count":7,"updated_at":"2022-08-17T15:03:18.063Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":47,"reads":193,"readers_count":192,"score":318.6,"yours":false,"topic_id":16021,"topic_slug":"which-data-parallel-does-trainer-use-dp-or-ddp","display_username":"Brando Miranda","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://discuss.huggingface.co/t/using-transformers-with-distributeddataparallel-any-examples/10775","internal":true,"reflection":false,"title":"Using Transformers with DistributedDataParallel — any examples?","clicks":1940},{"url":"https://discuss.huggingface.co/t/how-to-run-an-end-to-end-example-of-distributed-data-parallel-with-hugging-faces-trainer-api-ideally-on-a-single-node-multiple-gpus/21750","internal":true,"reflection":true,"title":"How to run an end to end example of distributed data parallel with hugging face's trainer api (ideally on a single node multiple gpus)?","clicks":16}],"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":3}],"moderator":false,"admin":false,"staff":false,"user_id":3664,"hidden":false,"trust_level":2,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/which-data-parallel-does-trainer-use-dp-or-ddp/16021/3","reactions":[{"id":"heart","type":"emoji","count":3}],"current_user_reaction":null,"reaction_users_count":3,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":240653,"name":"Rylan Schaeffer","username":"RylanSchaeffer","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/r/6f9a4e/{size}.png","created_at":"2025-08-30T01:34:06.356Z","cooked":"I know this is a bit of an old thread, but I have a follow up question. I’m creating a Trainer() , evaluating, training and evaluating again. Here’s a snippet of my code:
```
trainer = Trainer(
    model=model,
    processing_class=tokenizer,
    args=pretraining_config,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=data_collator,
)

logging.info(\"Evaluating before training…\")
eval_metrics_before = trainer.evaluate()
wandb.log({f\"eval_before/{k}\": v for k, v in eval_metrics_before.items()})
pprint.pprint(eval_metrics_before)

logging.info(\"Beginning training…\")
trainer.train()

logging.info(\"Finished training. Beginning final evaluation…\")
eval_metrics_after = trainer.evaluate()
wandb.log({f\"eval_after/{k}\": v for k, v in eval_metrics_after.items()})
pprint.pprint(eval_metrics_after)
```
When I run with two GPUs and a model small enough to fit on each, I noticed while the job is running that evaluation appears to use data parallelism over the two visible GPUs, but training does not. Do you know what might cause that or how to fix it?
","post_number":4,"post_type":1,"posts_count":7,"updated_at":"2025-08-30T01:34:56.436Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":2,"readers_count":1,"score":15.4,"yours":false,"topic_id":16021,"topic_slug":"which-data-parallel-does-trainer-use-dp-or-ddp","display_username":"Rylan Schaeffer","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":4145,"hidden":false,"trust_level":2,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/which-data-parallel-does-trainer-use-dp-or-ddp/16021/4","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":240654,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-08-30T02:42:00.790Z","cooked":"Hmm… Have you tried launching it via accelerate or torchrun?
# single node, 2 GPUs\ntorchrun --nproc_per_node=2 train.py\n# or\naccelerate launch --num_processes=2 train.py\n\nYeah, I would’ve thought that launching with python would use DP and thus would only use 1 available GPU. And that’s partially correct: train() indeed only uses 1 GPU, but evaluate() uses 2 GPUs. Hence my confusion…
I see. When running distributed training, if you launch it as a single process, evaluate sometimes behaves differently from the Trainer part… Since DP itself seems quite fragile, using DDP is probably the simpler approach…
I tried to search in the docs, but I didn’t find the answer anywhere.
\nThank you
","solution":"It depends if you launch your training script with python (in which case it will use DP) or python -m torch.distributed.launch (in which case it will use DDP).
It depends if you launch your training script with python (in which case it will use DP) or python -m torch.distributed.launch (in which case it will use DDP).
Hi. I’m trying to fine-tune mistralai/Mistral-Small-24B-Base-2501 using deepspeed and consistently getting the overflow error. When I use bf16 and fp32, I don’t see the overflow issue but the training loss is NaN. When I switch to fp16 the training loss is correct but it throws the overflow error. How can I fix this? This works fine with smaller models. Using lr=1e-7.
My df_config.json:
{\n \"train_micro_batch_size_per_gpu\": 1,\n \"gradient_accumulation_steps\": 8,\n \"zero_optimization\": {\n \"stage\": 2\n },\n \"zero_allow_untested_optimizer\": true,\n \"fp16\": {\n \"enabled\": true,\n \"loss_scale\": 0,\n \"initial_scale_power\": 32,\n \"loss_scale_window\": 1000,\n \"hysteresis\": 2,\n \"min_loss_scale\": 1\n },\n \"gradient_clipping\": 1.0,\n \"wall_clock_breakdown\": false\n}\n\nUsing deepspeed 0.17.2 and transformers 4.42.4.
If the GPU supports bfloat16, it’s probably better to use bfloat16. Regarding NaN issues, SDPA seems to be the culprit in many cases. Try attn_implementation=\"eager\".
@John6666 loading the model in bfloat16 and then using bf16=true in deepspeed seems to solve this issue for now!
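For reference, that combination looks roughly like this (a minimal sketch; the eager attention choice follows the suggestion above):\nmodel = AutoModelForCausalLM.from_pretrained(\n    \"mistralai/Mistral-Small-24B-Base-2501\",\n    torch_dtype=torch.bfloat16,\n    attn_implementation=\"eager\",\n)\n\nwith the fp16 block in the DeepSpeed config replaced by:\n\"bf16\": { \"enabled\": true }\n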
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":4,"post_type":3,"posts_count":4,"updated_at":"2025-08-28T16:51:04.376Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":1,"readers_count":0,"score":0.2,"yours":false,"topic_id":167833,"topic_slug":"gradient-overflow-issue-while-using-deepspeed","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/gradient-overflow-issue-while-using-deepspeed/167833/4","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"Hi. I’m trying to fine-tune mistralai/Mistral-Small-24B-Base-2501 using deepspeed and consistently getting the overflow error. When I use bf16 and fp32,I don’t see the overflow issue but the training loss is Nan. When I switch to fp16 the training loss is correct but it throws the overflow error. How can I fix this? This works fine with smaller models. Using lr=1e-7.
My df_config.json:
{\n \"train_micro_batch_size_per_gpu\": 1,\n \"gradient_accumulation_steps\": 8,\n \"zero_optimization\": {\n \"stage\": 2\n },\n \"zero_allow_untested_optimizer\": true,\n \"fp16\": {\n \"enabled\": true,\n \"loss_scale\": 0,\n \"initial_scale_power\": 32,\n \"loss_scale_window\": 1000,\n \"hysteresis\": 2,\n \"min_loss_scale\": 1\n },\n \"gradient_clipping\": 1.0,\n \"wall_clock_breakdown\": false\n}\n\nUsing deepspeed 0.17.2 and transformers 4.42.4.
If the GPU supports bfloat16, it’s probably better to use bfloat16. Regarding NaN issues, SDPA seems to be the culprit in many cases. Try attn_implementation=\"eager\".
If the GPU supports bfloat16, it’s probably better to use bfloat16. Regarding NaN issues, SDPA seems to be the culprit in many cases. Try attn_implementation=\"eager\".
>>> generator = pipeline(\"text-generation\", model=\"HuggingFaceTB/SmolLM2-360M\")\nconfig.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 689/689 [00:00<00:00, 415kB/s]\nmodel.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 724M/724M [00:09<00:00, 73.1MB/s]\ngeneration_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 111/111 [00:00<00:00, 697kB/s]\ntokenizer_config.json: 3.66kB [00:00, 10.4MB/s]\nvocab.json: 801kB [00:00, 9.48MB/s]\nmerges.txt: 466kB [00:00, 36.9MB/s]\ntokenizer.json: 2.10MB [00:00, 53.9MB/s]\nspecial_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 831/831 [00:00<00:00, 1.66MB/s]\nDevice set to use mps:0\n>>> generator(\"I'm not sure if I know how to\", max_length=50, num_return_sequences=3,)\nTruncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.\nSetting `pad_token_id` to `eos_token_id`:0 for open-end generation.\nBoth `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)\n[{'generated_text': \"I'm not sure if I know how to explain this. The problem basically is that you can't have a value of 0 in the output. I'm trying to do the following:\\n\\nfloat x = 2.0;\\nfloat y = 0.0;\\nfloat z = 1.0;\\nfloat z2;\\n\\nz2 = z + x*y;\\n\\nI understand that y*z should be 2.0*0.0 = 0.0, but I'm not sure how to get the 0.0 in the z2 variable.\\n\\n## Answers\\n\\n0\\n1. If you are trying to get the 0.0 in z2, please look at the following code:\\nbool true = (z2*z2) > 0;\\n\\n// The result is 0.0\\n\\nfloat z2 = z2*z2;\\n\\n// The result is 0.0\\n\\nfloat z2 = z2*z2*z2;\\n\\n// The result is 0.0\\n\\n## Re: How to get 0 in a value in the output in a function\\n\\nThanks for the reply! 
I understand the problem now.\\n\\nI was trying\"}, {'generated_text': \"I'm not sure if I know how to do that.\\n\\nHow can I find the derivative of 1/x?\\n\\nI can't find the derivative of x^3\\n\\nI can't find the derivative of x^1/2\\n\\nI can't find the derivative of x^1/3\\n\\nI can't find the derivative of x^1/4\\n\\nI can't find the derivative of x^1/5\\n\\nI can't find the derivative of x^1/6\\n\\nI can't find the derivative of x^1/7\\n\\nI can't find the derivative of x^1/8\\n\\nI can't find the derivative of x^1/9\\n\\nI can't find the derivative of x^10\\n\\nI can't find the derivative of x^11\\n\\nI can't find the derivative of x^12\\n\\nI can't find the derivative of x^13\\n\\nI can't find the derivative of x^14\\n\\nI can't find the derivative of x^15\\n\\nI can't find the derivative of x^16\\n\\nI can't find the derivative of x^17\\n\\nI can't find the derivative of x^\"}, {'generated_text': \"I'm not sure if I know how to do this, but I tried to make a function that generates the 64 bit numbers and I got 128 bit numbers.\\n\\n```function rand64(digits = 128) {\\nconst digits = digits;\\nconst d = 7;\\nconst s = 2147483647;\\nconst e = -2147483648;\\nconst f = 1;\\nconst g = 2;\\nconst h = 3;\\nconst i = 4;\\n\\nconst m = 1024;\\nconst d1 = 1 << d;\\nconst d2 = 1 << d - d1;\\nconst d3 = 1 << d - d1 - d2;\\nconst d4 = 1 << d - d1 - d2 - d3;\\nconst d5 = 1 << d - d1 - d2 - d3 - d4;\\nconst d6 = 1 << d - d1 - d2 - d3 - d4 - d5;\\nconst d7 = 1 << d - d1 - d2 - d3 - d4 - d\"}]\n\n\nIt doesn’t seem like the max_length is being honored when this is run. This is straight out of the LLM course under the “Transformers, what can they do?” section.
","post_number":1,"post_type":1,"posts_count":3,"updated_at":"2025-08-27T00:53:51.147Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":13,"reads":7,"readers_count":6,"score":81.4,"yours":false,"topic_id":167794,"topic_slug":"setting-max-length-does-not-limit-length-of-output","display_username":"Travis Lelle","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":102600,"hidden":false,"trust_level":0,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/setting-max-length-does-not-limit-length-of-output/167794/1","reactions":[{"id":"+1","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":240366,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-08-27T03:20:49.986Z","cooked":"With the current Transformers library code, max_new_tokens takes precedence over max_length, so specifying max_new_tokens is the simplest approach.
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":3,"post_type":3,"posts_count":3,"updated_at":"2025-08-27T15:21:13.240Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":1,"reads":4,"readers_count":3,"score":5.8,"yours":false,"topic_id":167794,"topic_slug":"setting-max-length-does-not-limit-length-of-output","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/setting-max-length-does-not-limit-length-of-output/167794/3","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":">>> generator = pipeline(\"text-generation\", model=\"HuggingFaceTB/SmolLM2-360M\")\nconfig.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 689/689 [00:00<00:00, 415kB/s]\nmodel.safetensors: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 724M/724M [00:09<00:00, 73.1MB/s]\ngeneration_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 111/111 [00:00<00:00, 697kB/s]\ntokenizer_config.json: 3.66kB [00:00, 10.4MB/s]\nvocab.json: 801kB [00:00, 9.48MB/s]\nmerges.txt: 466kB [00:00, 36.9MB/s]\ntokenizer.json: 2.10MB [00:00, 53.9MB/s]\nspecial_tokens_map.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 831/831 [00:00<00:00, 1.66MB/s]\nDevice set to use mps:0\n>>> generator(\"I'm not sure if I know how to\", max_length=50, num_return_sequences=3,)\nTruncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.\nSetting `pad_token_id` to `eos_token_id`:0 for open-end generation.\nBoth `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)\n[{'generated_text': \"I'm not sure if I know how to explain this. The problem basically is that you can't have a value of 0 in the output. 
I'm trying to do the following:\\n\\nfloat x = 2.0;\\nfloat y = 0.0;\\nfloat z = 1.0;\\nfloat z2;\\n\\nz2 = z + x*y;\\n\\nI understand that y*z should be 2.0*0.0 = 0.0, but I'm not sure how to get the 0.0 in the z2 variable.\\n\\n## Answers\\n\\n0\\n1. If you are trying to get the 0.0 in z2, please look at the following code:\\nbool true = (z2*z2) > 0;\\n\\n// The result is 0.0\\n\\nfloat z2 = z2*z2;\\n\\n// The result is 0.0\\n\\nfloat z2 = z2*z2*z2;\\n\\n// The result is 0.0\\n\\n## Re: How to get 0 in a value in the output in a function\\n\\nThanks for the reply! I understand the problem now.\\n\\nI was trying\"}, {'generated_text': \"I'm not sure if I know how to do that.\\n\\nHow can I find the derivative of 1/x?\\n\\nI can't find the derivative of x^3\\n\\nI can't find the derivative of x^1/2\\n\\nI can't find the derivative of x^1/3\\n\\nI can't find the derivative of x^1/4\\n\\nI can't find the derivative of x^1/5\\n\\nI can't find the derivative of x^1/6\\n\\nI can't find the derivative of x^1/7\\n\\nI can't find the derivative of x^1/8\\n\\nI can't find the derivative of x^1/9\\n\\nI can't find the derivative of x^10\\n\\nI can't find the derivative of x^11\\n\\nI can't find the derivative of x^12\\n\\nI can't find the derivative of x^13\\n\\nI can't find the derivative of x^14\\n\\nI can't find the derivative of x^15\\n\\nI can't find the derivative of x^16\\n\\nI can't find the derivative of x^17\\n\\nI can't find the derivative of x^\"}, {'generated_text': \"I'm not sure if I know how to do this, but I tried to make a function that generates the 64 bit numbers and I got 128 bit numbers.\\n\\n```function rand64(digits = 128) {\\nconst digits = digits;\\nconst d = 7;\\nconst s = 2147483647;\\nconst e = -2147483648;\\nconst f = 1;\\nconst g = 2;\\nconst h = 3;\\nconst i = 4;\\n\\nconst m = 1024;\\nconst d1 = 1 << d;\\nconst d2 = 1 << d - d1;\\nconst d3 = 1 << d - d1 - d2;\\nconst d4 = 1 << d - d1 - d2 - d3;\\nconst d5 = 1 << d - d1 - d2 - d3 - d4;\\nconst d6 = 1 << d - d1 - d2 - d3 - d4 - d5;\\nconst d7 = 1 << d - d1 - d2 - d3 - d4 - d\"}]\n\n\nIt doesn’t seem like the max_length is being honored when this is run. This is straight out of the LLM course under the “Transformers, what can they do?” section.
","solution":"With the current Transformers library code, max_new_tokens takes precedence over max_length, so specifying max_new_tokens is the simplest approach.
max_new_tokens takes precedence over max_length, so specifying max_new_tokens is the simplest approach.","reasoning":"The extracted_final_answer matches the correct_answer exactly, with no differences in wording or meaning. Both answers convey the same information regarding the precedence of `max_new_tokens` over `max_length` in the Transformers library.","correct":"yes","confidence":100}}
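A minimal sketch of the fix for the thread above, assuming the same SmolLM2 checkpoint from the question (note that num_return_sequences > 1 requires sampling to be enabled):
from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-360M")
outputs = generator(
    "I'm not sure if I know how to",
    max_new_tokens=50,        # set this explicitly; it takes precedence over max_length
    num_return_sequences=3,
    do_sample=True,           # sampling is required for multiple return sequences
)
for out in outputs:
    print(out["generated_text"])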
-{"discussion_title":"Cannot import name ‘_resolve_process_group’ from ‘torch.distributed.distributed_c10d’","discussion_url":"https://discuss.huggingface.co/t/cannot-import-name-resolve-process-group-from-torch-distributed-distributed-c10d/167762","discussion_topic_id":167762,"discussion_category":9,"discussion_created_at":"2025-08-25T19:56:34.430000Z","thread":[{"id":240239,"name":"Elizabeth Wainwright","username":"ewainwright","avatar_template":"/user_avatar/discuss.huggingface.co/ewainwright/{size}/53052_2.png","created_at":"2025-08-25T19:56:34.479Z","cooked":"I got the following error when calling the HuggingFaceLLM class:
\nFailed to import transformers.generation.utils because of the following error (look up to see its traceback): cannot import name '_resolve_process_group' from 'torch.distributed.distributed_c10d'\n\nI looked into the source code and sure enough that function is not in there. Is this a versioning problem?
\nUpdate: I downgraded transformers to version 4.27.4 and that seemed to solve that issue, but now I get a KeyError for “mistral”. Is there any way I can solve this issue without downgrading transformers?
","post_number":1,"post_type":1,"posts_count":4,"updated_at":"2025-08-25T20:47:38.847Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":24,"reads":3,"readers_count":2,"score":135.6,"yours":false,"topic_id":167762,"topic_slug":"cannot-import-name-resolve-process-group-from-torch-distributed-distributed-c10d","display_username":"Elizabeth Wainwright","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":3,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":102505,"hidden":false,"trust_level":0,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/cannot-import-name-resolve-process-group-from-torch-distributed-distributed-c10d/167762/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":240260,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-08-26T00:33:05.978Z","cooked":"This error seems to occur when PyTorch is far older than Transformers. It should be OK with PyTorch 2.4 or later.
\nimport torch, torch.distributed as dist\nprint(torch.__version__, 'dist?', dist.is_available())\n# Expect: 2.4+ dist? True\n","post_number":2,"post_type":1,"posts_count":4,"updated_at":"2025-08-26T00:33:05.978Z","reply_count":1,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":4,"reads":3,"readers_count":2,"score":25.6,"yours":false,"topic_id":167762,"topic_slug":"cannot-import-name-resolve-process-group-from-torch-distributed-distributed-c10d","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://forums.developer.nvidia.com/t/pytorch-2-0-0-nv23-05/273736","internal":false,"reflection":false,"title":"pyTorch 2.0.0.nv23.05 - Jetson Orin Nano - NVIDIA Developer Forums","clicks":1}],"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/cannot-import-name-resolve-process-group-from-torch-distributed-distributed-c10d/167762/2","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":true,"topic_accepted_answer":true},{"id":240294,"name":"Elizabeth Wainwright","username":"ewainwright","avatar_template":"/user_avatar/discuss.huggingface.co/ewainwright/{size}/53052_2.png","created_at":"2025-08-26T12:32:16.124Z","cooked":"Thanks this worked
","post_number":3,"post_type":1,"posts_count":4,"updated_at":"2025-08-26T12:32:16.124Z","reply_count":0,"reply_to_post_number":2,"quote_count":0,"incoming_link_count":0,"reads":3,"readers_count":2,"score":15.6,"yours":false,"topic_id":167762,"topic_slug":"cannot-import-name-resolve-process-group-from-torch-distributed-distributed-c10d","display_username":"Elizabeth Wainwright","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":52272,"username":"John6666","name":"John Smith","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":102505,"hidden":false,"trust_level":0,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/cannot-import-name-resolve-process-group-from-torch-distributed-distributed-c10d/167762/3","reactions":[{"id":"confetti_ball","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":240358,"name":"system","username":"system","avatar_template":"https://us1.discourse-cdn.com/hellohellohello/original/2X/d/de4155eb4aa4108ecb32a1389d7cc37ae69f88b7.png","created_at":"2025-08-27T00:32:22.645Z","cooked":"This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":4,"post_type":3,"posts_count":4,"updated_at":"2025-08-27T00:32:22.645Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":1,"readers_count":0,"score":0.2,"yours":false,"topic_id":167762,"topic_slug":"cannot-import-name-resolve-process-group-from-torch-distributed-distributed-c10d","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/cannot-import-name-resolve-process-group-from-torch-distributed-distributed-c10d/167762/4","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"I got the following error when calling the HuggingFaceLLM class:
\nFailed to import transformers.generation.utils because of the following error (look up to see its traceback): cannot import name '_resolve_process_group' from 'torch.distributed.distributed_c10d'\n\nI looked into the source code and sure enough that function is not in there. Is this a versioning problem?
\nUpdate: I downgraded transformers to version 4.27.4 and that seemed to solve that issue, but now I get a KeyError for “mistral”. Is there any way I can solve this issue without downgrading transformers?
","solution":"This error seems to occur when PyTorch is far older than Transformers. It should be OK with PyTorch 2.4 or later.
\nimport torch, torch.distributed as dist\nprint(torch.__version__, 'dist?', dist.is_available())\n# Expect: 2.4+ dist? True\n","evaluation":{"extracted_final_answer":"This error seems to occur when PyTorch is far older than Transformers. It should be OK with PyTorch 2.4 or later.
\nimport torch, torch.distributed as dist\nprint(torch.__version__, 'dist?', dist.is_available())\n# Expect: 2.4+ dist? True\n","reasoning":"The extracted final answer matches the correct answer exactly, with no differences in content or meaning. Both contain the same information regarding the error and the required version of PyTorch.","correct":"yes","confidence":100}}
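A small follow-up diagnostic, offered as a sketch rather than an official check: it confirms whether the private symbol that Transformers imports actually exists in the installed PyTorch build, without crashing if it does not.
import torch
from torch.distributed import distributed_c10d

print(torch.__version__)
# True on recent PyTorch (2.x); False suggests upgrading PyTorch rather than downgrading Transformers
print(hasattr(distributed_c10d, "_resolve_process_group"))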
-{"discussion_title":"Private Space authentication for external API calls","discussion_url":"https://discuss.huggingface.co/t/private-space-authentication-for-external-api-calls/167772","discussion_topic_id":167772,"discussion_category":24,"discussion_created_at":"2025-08-26T08:43:45.781000Z","thread":[{"id":240276,"name":"Mohamed Nasr","username":"nasr7322","avatar_template":"/user_avatar/discuss.huggingface.co/nasr7322/{size}/53080_2.png","created_at":"2025-08-26T08:43:45.839Z","cooked":"Hello everyone!
\nI’m using a Docker
Space to deploy my FastAPI application that uses multiple models, but I’ve set it to private since my project contains sensitive code. My problem is that I can’t send requests to the endpoints from anywhere outside my browser; I just get a 404.
Is it possible to send a
token with the request to authenticate myself? If so, how should I include it in my request to make it work properly?
Thank you all in advance! 
If the Space is functioning properly, you should be able to access it like the following.
\nYou can work out the actual Space URL yourself, or you can find it via the GUI.
curl -X POST https://OWNER-SPACENAME.hf.space/api/predict \\\n -H \"Authorization: Bearer $HF_TOKEN\" \\\n -H \"Content-Type: application/json\" \\\n -d '{\"text\":\"hello\"}'\n\nor
\nimport os, requests\nurl = \"https://OWNER-SPACENAME.hf.space/api/predict\"\nr = requests.post(url,\n    headers={\"Authorization\": f\"Bearer {os.getenv('HF_TOKEN')}\"},\n    json={\"text\": \"hello\"},\n    timeout=60)\nprint(r.status_code, r.text)\n\nIf you want to implement more complex access control, see the linked docs.
","post_number":2,"post_type":1,"posts_count":4,"updated_at":"2025-08-26T09:10:43.033Z","reply_count":1,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":11,"readers_count":10,"score":22.0,"yours":false,"topic_id":167772,"topic_slug":"private-space-authentication-for-external-api-calls","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/docs/hub/en/spaces-embed","internal":false,"reflection":false,"title":"Embed your Space in another website","clicks":2},{"url":"https://huggingface.co/spaces/zero-gpu-explorers/README/discussions/88#68a736ebb21506a456c47c81","internal":false,"reflection":false,"clicks":2}],"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/private-space-authentication-for-external-api-calls/167772/2","reactions":[{"id":"heart","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":true,"topic_accepted_answer":true},{"id":240278,"name":"Mohamed Nasr","username":"nasr7322","avatar_template":"/user_avatar/discuss.huggingface.co/nasr7322/{size}/53080_2.png","created_at":"2025-08-26T09:11:44.798Z","cooked":"yup it worked, thank youu!
\nMy problem was with the token.
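For anyone hitting the same token problem, a hedged sanity-check sketch (get_token is assumed from a recent huggingface_hub release and falls back to the token saved by huggingface-cli login; HF_TOKEN is the conventional env var name):
import os
from huggingface_hub import get_token

token = os.getenv("HF_TOKEN") or get_token()  # env var first, then the locally saved login
print("token found:", token is not None)      # the token must have read access to the private Space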
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":4,"post_type":3,"posts_count":4,"updated_at":"2025-08-26T21:12:23.222Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":5,"readers_count":4,"score":0.8,"yours":false,"topic_id":167772,"topic_slug":"private-space-authentication-for-external-api-calls","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/private-space-authentication-for-external-api-calls/167772/4","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"Hello everyone!
\nI’m using a Docker
Space to deploy my FastAPI application that uses multiple models, but I’ve set it to private since my project contains sensitive code. My problem is that I can’t send requests to the endpoints from anywhere outside my browser; I just get a 404.
Is it possible to send a
token with the request to authenticate myself? If so, how should I include it in my request to make it work properly?
Thank you all in advance! 
If the Space is functioning properly, you should be able to access it like the following.
\nYou can work out the actual Space URL yourself, or you can find it via the GUI.
curl -X POST https://OWNER-SPACENAME.hf.space/api/predict \\\n -H \"Authorization: Bearer $HF_TOKEN\" \\\n -H \"Content-Type: application/json\" \\\n -d '{\"text\":\"hello\"}'\n\nor
\nimport os, requests\nurl = \"https://OWNER-SPACENAME.hf.space/api/predict\"\nr = requests.post(url,\n    headers={\"Authorization\": f\"Bearer {os.getenv('HF_TOKEN')}\"},\n    json={\"text\": \"hello\"},\n    timeout=60)\nprint(r.status_code, r.text)\n\nIf you want to implement more complex access control, see the linked docs.
","evaluation":{"extracted_final_answer":"If the space is functioning properly, you should be able to access it like following.
\nYou can work out the actual Space URL yourself, or you can find it via the GUI.
curl -X POST https://OWNER-SPACENAME.hf.space/api/predict \\\n -H \"Authorization: Bearer $HF_TOKEN\" \\\n -H \"Content-Type: application/json\" \\\n -d '{\"text\":\"hello\"}'\n\nor
\nimport os, requests\nurl = \"https://OWNER-SPACENAME.hf.space/api/predict\"\nr = requests.post(url,\n    headers={\"Authorization\": f\"Bearer {os.getenv('HF_TOKEN')}\"},\n    json={\"text\": \"hello\"},\n    timeout=60)\nprint(r.status_code, r.text)\n\nIf you want to implement more complex access control, see the linked docs.
","reasoning":"The extracted_final_answer matches the correct_answer exactly, with no differences in content or structure. Therefore, the response is correct as it includes the precise and unambiguous correct_answer.","correct":"yes","confidence":100}} -{"discussion_title":"Vet/vetgpt-2-7b n8n connection","discussion_url":"https://discuss.huggingface.co/t/vet-vetgpt-2-7b-n8n-connection/167187","discussion_topic_id":167187,"discussion_category":5,"discussion_created_at":"2025-08-18T16:40:15.956000Z","thread":[{"id":239110,"name":"Cristiane Sousa","username":"ketask","avatar_template":"/user_avatar/discuss.huggingface.co/ketask/{size}/52727_2.png","created_at":"2025-08-18T16:40:16.017Z","cooked":"Hi! I’m trying to connect HF model at N8N, but I receive error: “NodeOperationError: An error occurred while fetching the blob”. Is it due to I’m not using HF Pro plan?
\n","post_number":1,"post_type":1,"posts_count":3,"updated_at":"2025-08-18T16:40:16.017Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":12,"reads":3,"readers_count":2,"score":75.6,"yours":false,"topic_id":167187,"topic_slug":"vet-vetgpt-2-7b-n8n-connection","display_username":"Cristiane Sousa","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":102003,"hidden":false,"trust_level":0,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/vet-vetgpt-2-7b-n8n-connection/167187/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":239200,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-08-19T04:36:31.730Z","cooked":"That model location may be incorrect. Also, that model is not currently deployed, so it should not be available via the API.
","post_number":2,"post_type":1,"posts_count":3,"updated_at":"2025-08-19T04:36:31.730Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":3,"readers_count":2,"score":0.6,"yours":false,"topic_id":167187,"topic_slug":"vet-vetgpt-2-7b-n8n-connection","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/models?inference_provider=all&sort=trending&search=vetgpt","internal":false,"reflection":false,"title":"Models - Hugging Face","clicks":1},{"url":"https://huggingface.co/ArcanaBT/vetgpt-2-7b","internal":false,"reflection":false,"title":"ArcanaBT/vetgpt-2-7b · Hugging Face","clicks":0}],"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/vet-vetgpt-2-7b-n8n-connection/167187/2","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":true,"topic_accepted_answer":true},{"id":240301,"name":"system","username":"system","avatar_template":"https://us1.discourse-cdn.com/hellohellohello/original/2X/d/de4155eb4aa4108ecb32a1389d7cc37ae69f88b7.png","created_at":"2025-08-26T13:15:40.680Z","cooked":"This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":3,"post_type":3,"posts_count":3,"updated_at":"2025-08-26T13:15:40.680Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":1,"readers_count":0,"score":0.2,"yours":false,"topic_id":167187,"topic_slug":"vet-vetgpt-2-7b-n8n-connection","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/vet-vetgpt-2-7b-n8n-connection/167187/3","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"Hi! I’m trying to connect HF model at N8N, but I receive error: “NodeOperationError: An error occurred while fetching the blob”. Is it due to I’m not using HF Pro plan?
\n","solution":"That model location may be incorrect. Also, that model is not currently deployed, so it should not be available via the API.
","evaluation":{"extracted_final_answer":"That model location may be incorrect. Also, that model is not currently deployed, so it should not be available via the API.
","reasoning":"The extracted_final_answer matches the correct_answer exactly without any differences. There are no inconsistencies or ambiguities between the two answers, making them equivalent.","correct":"yes","confidence":100}} -{"discussion_title":"Text-Classification Pipeline - Newbie question","discussion_url":"https://discuss.huggingface.co/t/text-classification-pipeline-newbie-question/167640","discussion_topic_id":167640,"discussion_category":5,"discussion_created_at":"2025-08-22T19:06:44.140000Z","thread":[{"id":239963,"name":"Markus Eicher","username":"MarkusEicher","avatar_template":"/user_avatar/discuss.huggingface.co/markuseicher/{size}/52883_2.png","created_at":"2025-08-22T19:06:44.198Z","cooked":"Hello huggingface community. I am wondering if I did understand the pipeline text-classification correctly. Is it the case, that the model I choose defines the task I can do with it and the output I will get? I was a bit confused, because I used pipeline(“sentiment-analysis”) but did not find “sentiment-analysis” as a model or option setting. And VSCode autocomplete also did not suggest it, but it still works. So I came to the conclusion I laid out before. Is this correct or am I wrong. Thanks and may you all have a good time.
","post_number":1,"post_type":1,"posts_count":5,"updated_at":"2025-08-22T19:06:44.198Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":11,"reads":7,"readers_count":6,"score":71.4,"yours":false,"topic_id":167640,"topic_slug":"text-classification-pipeline-newbie-question","display_username":"Markus Eicher","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":29747,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/text-classification-pipeline-newbie-question/167640/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":239972,"name":"Daniel Kleine","username":"dkleine","avatar_template":"/user_avatar/discuss.huggingface.co/dkleine/{size}/33964_2.png","created_at":"2025-08-22T19:51:01.268Z","cooked":"Hi Markus,
\n“sentiment-analysis” is the task specifying what you want a large language model to perform on the text. Sentiment analysis practically changes the model’s head to a classifier, which you can see here:
This pipeline is pre-configured; the settings can be found further down in the same file, defined here:
\n\n","post_number":2,"post_type":1,"posts_count":5,"updated_at":"2025-08-22T19:51:27.289Z","reply_count":1,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":7,"readers_count":6,"score":36.4,"yours":false,"topic_id":167640,"topic_slug":"text-classification-pipeline-newbie-question","display_username":"Daniel Kleine","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://github.com/huggingface/transformers/blob/7d88f57fc6892b9b3d0092c53e27ae033f1bebc8/src/transformers/pipelines/__init__.py#L193-L205","internal":false,"reflection":false,"title":"transformers/src/transformers/pipelines/__init__.py at 7d88f57fc6892b9b3d0092c53e27ae033f1bebc8 · huggingface/transformers · GitHub","clicks":1},{"url":"https://github.com/huggingface/transformers/blob/7d88f57fc6892b9b3d0092c53e27ae033f1bebc8/src/transformers/pipelines/__init__.py#L154-L159","internal":false,"reflection":false,"title":"transformers/src/transformers/pipelines/__init__.py at 7d88f57fc6892b9b3d0092c53e27ae033f1bebc8 · huggingface/transformers · GitHub","clicks":0},{"url":"https://discuss.huggingface.co/t/default-models-for-pipeline-tasks/2559/6","internal":true,"reflection":true,"title":"Default models for pipeline tasks","clicks":0}],"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":2}],"moderator":false,"admin":false,"staff":false,"user_id":69473,"hidden":false,"trust_level":2,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/text-classification-pipeline-newbie-question/167640/2","reactions":[{"id":"+1","type":"emoji","count":1},{"id":"heart","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":2,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":true,"topic_accepted_answer":true},{"id":239973,"name":"Markus Eicher","username":"MarkusEicher","avatar_template":"/user_avatar/discuss.huggingface.co/markuseicher/{size}/52883_2.png","created_at":"2025-08-22T20:11:08.187Z","cooked":"Thank you. So it is generally an alias for text-classification. I was confused because it did not show up as a separate pipeline in chapter 1 of the LLM course on huggingface. But now I understand why. Appreciate your support and the quick answer.
","post_number":3,"post_type":1,"posts_count":5,"updated_at":"2025-08-22T20:11:08.187Z","reply_count":1,"reply_to_post_number":2,"quote_count":0,"incoming_link_count":4,"reads":6,"readers_count":5,"score":56.2,"yours":false,"topic_id":167640,"topic_slug":"text-classification-pipeline-newbie-question","display_username":"Markus Eicher","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":69473,"username":"dkleine","name":"Daniel Kleine","avatar_template":"/user_avatar/discuss.huggingface.co/dkleine/{size}/33964_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":2}],"moderator":false,"admin":false,"staff":false,"user_id":29747,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/text-classification-pipeline-newbie-question/167640/3","reactions":[{"id":"+1","type":"emoji","count":2}],"current_user_reaction":null,"reaction_users_count":2,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":239974,"name":"Daniel Kleine","username":"dkleine","avatar_template":"/user_avatar/discuss.huggingface.co/dkleine/{size}/33964_2.png","created_at":"2025-08-22T20:23:18.891Z","cooked":"That’s right – “sentiment-analysis” practically does sequence classification (there are also other types of classification tasks possible, for example token classification, just fyi) under the hood in the linear output layer of the LLM. Please also see the docstring for the TextClassificationPipeline here:
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":5,"post_type":3,"posts_count":5,"updated_at":"2025-08-23T08:23:30.049Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":2,"readers_count":1,"score":0.4,"yours":false,"topic_id":167640,"topic_slug":"text-classification-pipeline-newbie-question","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/text-classification-pipeline-newbie-question/167640/5","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"Hello huggingface community. I am wondering if I did understand the pipeline text-classification correctly. Is it the case, that the model I choose defines the task I can do with it and the output I will get? I was a bit confused, because I used pipeline(“sentiment-analysis”) but did not find “sentiment-analysis” as a model or option setting. And VSCode autocomplete also did not suggest it, but it still works. So I came to the conclusion I laid out before. Is this correct or am I wrong. Thanks and may you all have a good time.
","solution":"Hi Markus,
\n“sentiment-analysis” is the task specifying what you want a large language model to perform on the text. Sentiment analysis practically changes the model’s head to a classifier, which you can see here:
This pipeline is pre-configured; the settings can be found further down in the same file, defined here:
\n\n","evaluation":{"extracted_final_answer":"Hi Markus,
\n“sentiment-analysis” is the task specifying what you want a large language model to perform on the text. Sentiment analysis practically changes the model’s head to a classifier, which you can see here:
This pipeline is pre-configured; the settings can be found further down in the same file, defined here:
\n","reasoning":"The extracted final answer matches the correct answer exactly, with no differences in content or meaning. Therefore, the response is correct and includes the precise and unambiguous correct answer.","correct":"yes","confidence":100}} -{"discussion_title":"ImportError: cannot import name ‘ModelFilter’ from ‘huggingface_hub’","discussion_url":"https://discuss.huggingface.co/t/importerror-cannot-import-name-modelfilter-from-huggingface-hub/167632","discussion_topic_id":167632,"discussion_category":5,"discussion_created_at":"2025-08-22T13:18:09.224000Z","thread":[{"id":239912,"name":"Alex","username":"SuperBowser","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/s/9f8e36/{size}.png","created_at":"2025-08-22T13:18:09.284Z","cooked":"I am running this line in Kaggle notebook:
\nfrom huggingface_hub import ModelFilter\n\nand getting back error:
\n---------------------------------------------------------------------------\nImportError Traceback (most recent call last)\n/tmp/ipykernel_36/1451250264.py in <cell line: 0>()\n----> 1 from huggingface_hub import ModelFilter\n\nImportError: cannot import name 'ModelFilter' from 'huggingface_hub' (/usr/local/lib/python3.11/dist-packages/huggingface_hub/__init__.py)\n\nMy huggingface_hub._version_ is ‘0.33.1’
","post_number":1,"post_type":1,"posts_count":5,"updated_at":"2025-08-22T13:18:09.284Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":108,"reads":6,"readers_count":5,"score":481.2,"yours":false,"topic_id":167632,"topic_slug":"importerror-cannot-import-name-modelfilter-from-huggingface-hub","display_username":"Alex","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":102016,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/importerror-cannot-import-name-modelfilter-from-huggingface-hub/167632/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":239950,"name":"Daniel Kleine","username":"dkleine","avatar_template":"/user_avatar/discuss.huggingface.co/dkleine/{size}/33964_2.png","created_at":"2025-08-22T15:21:25.382Z","cooked":"ModelFilter is deprecated, please see here: ImportError: cannot import name 'ModelFilter' from 'huggingface_hub' · Issue #2478 · huggingface/huggingface_hub · GitHub
Thank you so much for your answer. Do you know what values I can use in the filter field? I am looking for a complete list. So far I know only a few values, such as text-classification.
Minor update. Here is my search:
\nfrom huggingface_hub import HfApi
\napi = HfApi()
\nmodels = api.list_models(task='text-classification',
\nsort='downloads', gated=False, limit=100)
\nmodels = list(models)
\nprint(len(models))
\nprint(models[1].modelId)
It returns cross-encoder/ms-marco-MiniLM-L6-v2, which is tagged “Text Ranking”, not the “Text Classification” task I asked for (as per the tasks page).
\nI got the same result when using the “filter” field.
\n\nIt returns
\ncross-encoder/ms-marco-MiniLM-L6-v2, which is tagged “Text Ranking”, not the “Text Classification” task I asked for (as per the tasks page).
\nI got the same result when using the “filter” field.
This is probably because this model is tagged both as “Text Ranking” and as “Text Classification”; see the tags above:
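If such cross-tagged hits are unwanted, a hedged sketch: filtering on pipeline_tag restricts results to a model's primary task (the parameter is assumed from current huggingface_hub releases):
from huggingface_hub import HfApi

api = HfApi()
models = api.list_models(pipeline_tag="text-classification", sort="downloads", limit=5)
for m in models:
    print(m.id, m.pipeline_tag)  # pipeline_tag is the single primary task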
\n\n","post_number":4,"post_type":1,"posts_count":5,"updated_at":"2025-08-22T19:08:35.289Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":5,"reads":4,"readers_count":3,"score":55.8,"yours":false,"topic_id":167632,"topic_slug":"importerror-cannot-import-name-modelfilter-from-huggingface-hub","display_username":"Daniel Kleine","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2","internal":false,"reflection":false,"title":"cross-encoder/ms-marco-MiniLM-L6-v2 · Hugging Face","clicks":1},{"url":"https://huggingface.co/tasks","internal":false,"reflection":false,"title":"Tasks - Hugging Face","clicks":0}],"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":2}],"moderator":false,"admin":false,"staff":false,"user_id":69473,"hidden":false,"trust_level":2,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/importerror-cannot-import-name-modelfilter-from-huggingface-hub/167632/4","reactions":[{"id":"+1","type":"emoji","count":2}],"current_user_reaction":null,"reaction_users_count":2,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":239997,"name":"system","username":"system","avatar_template":"https://us1.discourse-cdn.com/hellohellohello/original/2X/d/de4155eb4aa4108ecb32a1389d7cc37ae69f88b7.png","created_at":"2025-08-23T07:07:27.219Z","cooked":"This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":5,"post_type":3,"posts_count":5,"updated_at":"2025-08-23T07:07:27.219Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":2,"readers_count":1,"score":0.4,"yours":false,"topic_id":167632,"topic_slug":"importerror-cannot-import-name-modelfilter-from-huggingface-hub","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/importerror-cannot-import-name-modelfilter-from-huggingface-hub/167632/5","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"I am running this line in Kaggle notebook:
\nfrom huggingface_hub import ModelFilter\n\nand getting back error:
\n---------------------------------------------------------------------------\nImportError Traceback (most recent call last)\n/tmp/ipykernel_36/1451250264.py in <cell line: 0>()\n----> 1 from huggingface_hub import ModelFilter\n\nImportError: cannot import name 'ModelFilter' from 'huggingface_hub' (/usr/local/lib/python3.11/dist-packages/huggingface_hub/__init__.py)\n\nMy huggingface_hub._version_ is ‘0.33.1’
","solution":"ModelFilter is deprecated, please see here: ImportError: cannot import name 'ModelFilter' from 'huggingface_hub' · Issue #2478 · huggingface/huggingface_hub · GitHub
ModelFilter is deprecated, please see here: ImportError: cannot import name 'ModelFilter' from 'huggingface_hub' · Issue #2478 · huggingface/huggingface_hub · GitHub","reasoning":"The extracted_final_answer matches the correct_answer exactly, with no differences in content or meaning. Both contain the same statement about the deprecation of ModelFilter and provide the same link to the GitHub issue.","correct":"yes","confidence":100}}
-{"discussion_title":"Tool/Function calling abilities of LLM’s that are used locally pulled through ollama","discussion_url":"https://discuss.huggingface.co/t/tool-function-calling-abilities-of-llms-that-are-used-locally-pulled-through-ollama/165277","discussion_topic_id":165277,"discussion_category":13,"discussion_created_at":"2025-08-01T11:20:02.837000Z","thread":[{"id":235956,"name":"Aravindha Sivabalan J","username":"cranky-coder08","avatar_template":"/user_avatar/discuss.huggingface.co/cranky-coder08/{size}/51972_2.png","created_at":"2025-08-01T11:20:02.900Z","cooked":"i was trying to build a small AI agent that would query the DB and get the details of the customers, for which i tried many models that are available in the ollama model library, but every model keeps throwing an “invalid tool”, or keeps using the irrelevant tool or keeps hallucinating and giving back made up answers!!! is this an issue that is common when pulling and running LLM’s locally using OLLAMA, when i use the paid Gemini API from google cloud, it works so well (uses the correct tool’s, and returns the exact correct answer), i need help in understanding what is happening when i use a locally run LLM, and is there anyway to make the Local LLM work like the Gemini API??
\nThanks in advance
","post_number":1,"post_type":1,"posts_count":3,"updated_at":"2025-08-01T11:20:02.900Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":109,"reads":5,"readers_count":4,"score":536.0,"yours":false,"topic_id":165277,"topic_slug":"tool-function-calling-abilities-of-llms-that-are-used-locally-pulled-through-ollama","display_username":"Aravindha Sivabalan J","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":100794,"hidden":false,"trust_level":1,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/tool-function-calling-abilities-of-llms-that-are-used-locally-pulled-through-ollama/165277/1","reactions":[{"id":"eyes","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true,"can_vote":false},{"id":235983,"name":"John Smith","username":"John6666","avatar_template":"/user_avatar/discuss.huggingface.co/john6666/{size}/27664_2.png","created_at":"2025-08-01T14:01:03.637Z","cooked":"If you are using Ollama directly without any Agent framework, the models that support tool calling are limited, and there seems to be an issue that is not a bug.
\nAs a workaround, you could use Ollama through external Agent frameworks.
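As an illustration only, a sketch of that workaround using the smolagents framework to drive Ollama via LiteLLM; the model id, local endpoint, and the toy tool are assumptions, not from the thread:
# pip install 'smolagents[litellm]' (assumed extra for LiteLLM support)
from smolagents import CodeAgent, LiteLLMModel, tool

@tool
def get_customer(customer_id: int) -> str:
    """Look up a customer record by id.

    Args:
        customer_id: Numeric id of the customer to fetch.
    """
    return f"customer {customer_id}: Jane Doe, jane@example.com"  # stand-in for a real DB query

# assumed local Ollama endpoint and a tool-capable model pulled into Ollama
model = LiteLLMModel(model_id="ollama_chat/qwen2.5", api_base="http://localhost:11434")
agent = CodeAgent(tools=[get_customer], model=model)
print(agent.run("Fetch the details for customer 42."))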
","post_number":2,"post_type":1,"posts_count":3,"updated_at":"2025-08-01T14:01:03.637Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":6,"reads":5,"readers_count":4,"score":46.0,"yours":false,"topic_id":165277,"topic_slug":"tool-function-calling-abilities-of-llms-that-are-used-locally-pulled-through-ollama","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://discuss.huggingface.co/t/how-to-run-agents-from-smolagents-locally/152874/3","internal":true,"reflection":false,"title":"How to run agents from `smolagents` locally?","clicks":12},{"url":"https://ollama.com/blog/tool-support","internal":false,"reflection":false,"title":"Tool support · Ollama Blog","clicks":9},{"url":"https://huggingface.co/posts/prithivMLmods/142876386338407","internal":false,"reflection":false,"title":"@prithivMLmods on Hugging Face: \"OpenAI, Google, Hugging Face, and Anthropic have released guides and courses…\"","clicks":7},{"url":"https://github.com/ollama/ollama/issues/11538","internal":false,"reflection":false,"title":"Qwen3:14b not usingThis topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":3,"post_type":3,"posts_count":3,"updated_at":"2025-08-19T09:27:01.360Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":0,"reads":2,"readers_count":1,"score":0.4,"yours":false,"topic_id":165277,"topic_slug":"tool-function-calling-abilities-of-llms-that-are-used-locally-pulled-through-ollama","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/tool-function-calling-abilities-of-llms-that-are-used-locally-pulled-through-ollama/165277/3","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"i was trying to build a small AI agent that would query the DB and get the details of the customers, for which i tried many models that are available in the ollama model library, but every model keeps throwing an “invalid tool”, or keeps using the irrelevant tool or keeps hallucinating and giving back made up answers!!! is this an issue that is common when pulling and running LLM’s locally using OLLAMA, when i use the paid Gemini API from google cloud, it works so well (uses the correct tool’s, and returns the exact correct answer), i need help in understanding what is happening when i use a locally run LLM, and is there anyway to make the Local LLM work like the Gemini API??
\nThanks in advance
","solution":"If you are using Ollama directly without any Agent framework, the models that support tool calling are limited, and there seems to be an issue that is not a bug.
\nAs a workaround, you could use Ollama through external Agent frameworks.
","evaluation":{"extracted_final_answer":"If you are using Ollama directly without any Agent framework, the models that support tool calling are limited, and there seems to be an issue that is not a bug.
\nAs a workaround, you could use Ollama through external Agent frameworks.
","reasoning":"The extracted final answer matches the correct answer exactly, with no differences in content or meaning. Both answers provide the same information regarding the use of Ollama and the limitations of models that support tool calling, as well as the workaround of using external Agent frameworks.","correct":"yes","confidence":100}} -{"discussion_title":"ModuleNotFoundError: No module named ‘transformers’","discussion_url":"https://discuss.huggingface.co/t/modulenotfounderror-no-module-named-transformers/11609","discussion_topic_id":11609,"discussion_category":9,"discussion_created_at":"2021-11-11T21:05:23.353000Z","thread":[{"id":24972,"name":"ardo tee","username":"mashedpotatotime","avatar_template":"/user_avatar/discuss.huggingface.co/mashedpotatotime/{size}/3103_2.png","created_at":"2021-11-11T21:05:23.422Z","cooked":"Hi! I’ve been having trouble getting transformers to work in Spaces.
When tested in my environment using python -c \"from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))\", the results show it’s been properly installed. When imported in Colab it works fine too, but whenever deployed to Spaces it always returns the same ModuleNotFound error. Full traceback message:
Traceback:
\nFile \"/home/user/.local/lib/python3.8/site-packages/streamlit/script_runner.py\", line 354, in _run_script\n exec(code, module.__dict__)File \"/home/user/app/app.py\", line 1, in <module>\n from transformers import pipeline\n\nIt’s a simple test app using transformers and streamlit, - both of which were reinstalled with pip after creating a new venv and reinstalling tensorflow and pytorch. I also tried cleaning, uninstalling, and reinstalling conda based on advice from another forum. No dice.
Currently using:
\nPython 3.9.4
\nTensorflow 2.7.0
\nPyTorch 1.10.0
\nTransformers 4.12.3
\nStreamlit 1.2.0
Any help greatly appreciated! Thanks 
it might be due to not having a requirements file. Here is an example of what your Spaces app should have - flax-community/image-captioning at main. Try adding the requirements, as they tell the environment what packages to load. Hope this helps.
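For example, a minimal requirements.txt matching the versions listed above (the pins are illustrative; match them to your tested environment):
transformers==4.12.3
torch==1.10.0
tensorflow==2.7.0
streamlit==1.2.0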
","post_number":2,"post_type":1,"posts_count":4,"updated_at":"2021-11-12T06:41:54.938Z","reply_count":1,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":198,"reads":221,"readers_count":220,"score":1114.2,"yours":false,"topic_id":11609,"topic_slug":"modulenotfounderror-no-module-named-transformers","display_username":"Nikhil","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://huggingface.co/spaces/flax-community/image-captioning/tree/main","internal":false,"reflection":false,"title":"flax-community/image-captioning at main","clicks":2788}],"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":5}],"moderator":false,"admin":false,"staff":false,"user_id":4732,"hidden":false,"trust_level":2,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/modulenotfounderror-no-module-named-transformers/11609/2","reactions":[{"id":"heart","type":"emoji","count":5}],"current_user_reaction":null,"reaction_users_count":5,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":true,"topic_accepted_answer":true},{"id":26022,"name":"ardo tee","username":"mashedpotatotime","avatar_template":"/user_avatar/discuss.huggingface.co/mashedpotatotime/{size}/3103_2.png","created_at":"2021-11-19T23:23:39.383Z","cooked":"That worked perfectly. Thank you!
","post_number":3,"post_type":1,"posts_count":4,"updated_at":"2021-11-19T23:23:39.383Z","reply_count":0,"reply_to_post_number":2,"quote_count":0,"incoming_link_count":137,"reads":206,"readers_count":205,"score":741.2,"yours":false,"topic_id":11609,"topic_slug":"modulenotfounderror-no-module-named-transformers","display_username":"ardo tee","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"reply_to_user":{"id":4732,"username":"NDugar","name":"Nikhil","avatar_template":"/user_avatar/discuss.huggingface.co/ndugar/{size}/40501_2.png"},"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":4950,"hidden":false,"trust_level":0,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/modulenotfounderror-no-module-named-transformers/11609/3","reactions":[{"id":"heart","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true},{"id":238096,"name":"Yue Zhao","username":"Alwaysboy","avatar_template":"/user_avatar/discuss.huggingface.co/alwaysboy/{size}/52486_2.png","created_at":"2025-08-12T13:40:25.363Z","cooked":"Same issue and solved by this method, thanks!
","post_number":4,"post_type":1,"posts_count":4,"updated_at":"2025-08-12T13:40:25.363Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":12,"reads":7,"readers_count":6,"score":71.4,"yours":false,"topic_id":11609,"topic_slug":"modulenotfounderror-no-module-named-transformers","display_username":"Yue Zhao","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[{"id":2,"count":1}],"moderator":false,"admin":false,"staff":false,"user_id":101586,"hidden":false,"trust_level":0,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/modulenotfounderror-no-module-named-transformers/11609/4","reactions":[{"id":"confetti_ball","type":"emoji","count":1}],"current_user_reaction":null,"reaction_users_count":1,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"Hi! I’ve been having trouble getting transformers to work in Spaces.
When tested in my environment using python -c \"from transformers import pipeline; print(pipeline('sentiment-analysis')('we love you'))\", the results show it’s been properly installed. When imported in Colab it works fine too, but whenever deployed to Spaces it always returns the same ModuleNotFound error. Full traceback message:
Traceback:
\nFile \"/home/user/.local/lib/python3.8/site-packages/streamlit/script_runner.py\", line 354, in _run_script\n exec(code, module.__dict__)File \"/home/user/app/app.py\", line 1, in <module>\n from transformers import pipeline\n\nIt’s a simple test app using transformers and streamlit, - both of which were reinstalled with pip after creating a new venv and reinstalling tensorflow and pytorch. I also tried cleaning, uninstalling, and reinstalling conda based on advice from another forum. No dice.
Currently using:
\nPython 3.9.4
\nTensorflow 2.7.0
\nPyTorch 1.10.0
\nTransformers 4.12.3
\nStreamlit 1.2.0
Any help greatly appreciated! Thanks 
it might be due to not having a requirements file. Here is an example of what your Spaces app should have - flax-community/image-captioning at main. Try adding the requirements, as they tell the environment what packages to load. Hope this helps.
","evaluation":{"extracted_final_answer":"it might be due to not having a requirements file. Here is an example of what your spaces app should have - flax-community/image-captioning at main try adding the requirements as they till the environment what packages to load. Hope this helps.","reasoning":"The extracted_final_answer matches the correct_answer exactly, with no differences in wording or meaning. Therefore, it is included in the response without any ambiguity or inconsistency.","correct":"yes","confidence":100}} -{"discussion_title":"The Gradio API is not working","discussion_url":"https://discuss.huggingface.co/t/the-gradio-api-is-not-working/166407","discussion_topic_id":166407,"discussion_category":5,"discussion_created_at":"2025-08-11T13:02:56.970000Z","thread":[{"id":237842,"name":"Dany Gold","username":"GoldDany","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/g/bbce88/{size}.png","created_at":"2025-08-11T13:02:57.043Z","cooked":"the gradio throws error: Traceback (most recent call last):
\nFile “C:\\Users\\danya\\PycharmProjects\\DiDefBackend\\DiDef\\SentenceTransformer.py”, line 45, in
\nclient = Client(
\nFile “C:\\Users\\danya\\AppData\\Local\\Programs\\Python\\Python39\\lib\\site-packages\\gradio_client\\client.py”, line 171, in init
\nself._info = self._get_api_info()
\nFile “C:\\Users\\danya\\AppData\\Local\\Programs\\Python\\Python39\\lib\\site-packages\\gradio_client\\client.py”, line 564, in get_api_info
\ninfo = r.json()
\nFile “C:\\Users\\danya\\AppData\\Local\\Programs\\Python\\Python39\\lib\\site-packages\\httpx_models.py”, line 764, in json
\nreturn jsonlib.loads(self.content, **kwargs)
\nFile \"C:\\Users\\danya\\AppData\\Local\\Programs\\Python\\Python39\\lib\\json_init.py\", line 346, in loads
\nreturn _default_decoder.decode(s)
\nFile “C:\\Users\\danya\\AppData\\Local\\Programs\\Python\\Python39\\lib\\json\\decoder.py”, line 337, in decode
\nobj, end = self.raw_decode(s, idx=_w(s, 0).end())
\nFile “C:\\Users\\danya\\AppData\\Local\\Programs\\Python\\Python39\\lib\\json\\decoder.py”, line 355, in raw_decode
\nraise JSONDecodeError(“Expecting value”, s, err.value) from None
\njson.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Why? My code is very simple:
\nfrom gradio_client import Client
\nclient = Client(
\n    src=\"GoldDany/DiDefBackend\",  # my Space is public
\n)
\nresult = client.predict(
\n    text=\"Hello!!\",
\n    api_name=\"/predict\",
\n)
\nprint(result)
\n\nPython39
\n
I think this is probably the culprit this time.
\nGradio 5 only works with Python 3.10 or later on both the server and client, so I think the error is occurring because the versions are different between the client and server.
\nI don’t know whether this error can be resolved any other way…
The simplest solution is to use Python 3.10 or later.
# pip install -U gradio_client (in Python 3.9 environment)\nimport subprocess\nsubprocess.run(\"pip show gradio_client\", shell=True) # Version: 1.3.0 (Release date: 2024.08.08)\nfrom gradio_client import Client\n\nclient = Client(src=\"John6666/apitest1\") # Gradio 4.41.0\nresult = client.predict(text=\"Hello!!\", api_name=\"/predict\")\nprint(result) # [0.010964062064886093, 0.02713009901344776, -0.024556249380111694, 0.01713254489004612, 0.04088324308395386, -0.005583592690527439, 0.015990763902664185,...\n\nclient = Client(src=\"GoldDany/DiDefBackend\") # Gradio 5.42.0\nresult = client.predict(text=\"Hello!!\", api_name=\"/predict\")\nprint(result) # error\n","post_number":2,"post_type":1,"posts_count":4,"updated_at":"2025-08-11T13:54:42.512Z","reply_count":1,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":1,"reads":5,"readers_count":4,"score":11.0,"yours":false,"topic_id":166407,"topic_slug":"the-gradio-api-is-not-working","display_username":"John Smith","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"link_counts":[{"url":"https://github.com/gradio-app/gradio/issues/9634","internal":false,"reflection":false,"title":"Support older versions of python in gradio 5 · Issue #9634 · gradio-app/gradio · GitHub","clicks":1}],"read":true,"user_title":"Regular","title_is_group":false,"bookmarked":false,"actions_summary":[],"moderator":false,"admin":false,"staff":false,"user_id":52272,"hidden":false,"trust_level":3,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"post_url":"/t/the-gradio-api-is-not-working/166407/2","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":true,"topic_accepted_answer":true},{"id":237851,"name":"Dany Gold","username":"GoldDany","avatar_template":"https://avatars.discourse-cdn.com/v4/letter/g/bbce88/{size}.png","created_at":"2025-08-11T14:24:40.173Z","cooked":"Thanks) But I may have to use an even lower version python, because integrating it
. But downgrading the version of Gradio works))
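For later readers, here is a sketch of that downgrade route (the exact versions are assumptions: 4.44.1 is simply a late Gradio 4.x release, and gradio_client 1.3.0 is the build shown working against Gradio 4.41.0 in the snippet above). The Space side is pinned via the README.md front matter:\n---\nsdk: gradio\nsdk_version: 4.44.1\napp_file: app.py\n---\nand the Python 3.9 client side keeps a matching client:\npip install gradio_client==1.3.0\n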
This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
","post_number":4,"post_type":3,"posts_count":4,"updated_at":"2025-08-12T02:25:10.323Z","reply_count":0,"reply_to_post_number":null,"quote_count":0,"incoming_link_count":1,"reads":3,"readers_count":2,"score":5.6,"yours":false,"topic_id":166407,"topic_slug":"the-gradio-api-is-not-working","display_username":"system","primary_group_name":null,"flair_name":null,"flair_url":null,"flair_bg_color":null,"flair_color":null,"flair_group_id":null,"badges_granted":[],"version":1,"can_edit":false,"can_delete":false,"can_recover":false,"can_see_hidden_post":false,"can_wiki":false,"read":true,"user_title":null,"bookmarked":false,"actions_summary":[],"moderator":true,"admin":true,"staff":true,"user_id":-1,"hidden":false,"trust_level":4,"deleted_at":null,"user_deleted":false,"edit_reason":null,"can_view_edit_history":true,"wiki":false,"action_code":"autoclosed.enabled","post_url":"/t/the-gradio-api-is-not-working/166407/4","reactions":[],"current_user_reaction":null,"reaction_users_count":0,"current_user_used_main_reaction":false,"can_accept_answer":false,"can_unaccept_answer":false,"accepted_answer":false,"topic_accepted_answer":true}],"question":"the gradio throws error: Traceback (most recent call last):
\nFile “C:\\Users\\danya\\PycharmProjects\\DiDefBackend\\DiDef\\SentenceTransformer.py”, line 45, in
\nclient = Client(
\nFile “C:\\Users\\danya\\AppData\\Local\\Programs\\Python\\Python39\\lib\\site-packages\\gradio_client\\client.py”, line 171, in init
\nself._info = self._get_api_info()
\nFile “C:\\Users\\danya\\AppData\\Local\\Programs\\Python\\Python39\\lib\\site-packages\\gradio_client\\client.py”, line 564, in get_api_info
\ninfo = r.json()
\nFile “C:\\Users\\danya\\AppData\\Local\\Programs\\Python\\Python39\\lib\\site-packages\\httpx_models.py”, line 764, in json
\nreturn jsonlib.loads(self.content, **kwargs)
\nFile \"C:\\Users\\danya\\AppData\\Local\\Programs\\Python\\Python39\\lib\\json_init.py\", line 346, in loads
\nreturn _default_decoder.decode(s)
\nFile “C:\\Users\\danya\\AppData\\Local\\Programs\\Python\\Python39\\lib\\json\\decoder.py”, line 337, in decode
\nobj, end = self.raw_decode(s, idx=_w(s, 0).end())
\nFile “C:\\Users\\danya\\AppData\\Local\\Programs\\Python\\Python39\\lib\\json\\decoder.py”, line 355, in raw_decode
\nraise JSONDecodeError(“Expecting value”, s, err.value) from None
\njson.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Why? My code is very simple:
\nfrom gradio_client import Client
\nclient = Client(
\n    src=\"GoldDany/DiDefBackend\",  # my Space is public
\n)
\nresult = client.predict(
\n    text=\"Hello!!\",
\n    api_name=\"/predict\",
\n)
\nprint(result)
\n\nPython39
\n
I think this is probably the culprit this time.
\nGradio 5 only works with Python 3.10 or later on both the server and client, so I think the error is occurring because the versions are different between the client and server.
\nI don’t know whether this error can be resolved any other way…
The simplest solution is to use Python 3.10 or later.
# pip install -U gradio_client (in Python 3.9 environment)\nimport subprocess\nsubprocess.run(\"pip show gradio_client\", shell=True) # Version: 1.3.0 (Release date: 2024.08.08)\nfrom gradio_client import Client\n\nclient = Client(src=\"John6666/apitest1\") # Gradio 4.41.0\nresult = client.predict(text=\"Hello!!\", api_name=\"/predict\")\nprint(result) # [0.010964062064886093, 0.02713009901344776, -0.024556249380111694, 0.01713254489004612, 0.04088324308395386, -0.005583592690527439, 0.015990763902664185,...\n\nclient = Client(src=\"GoldDany/DiDefBackend\") # Gradio 5.42.0\nresult = client.predict(text=\"Hello!!\", api_name=\"/predict\")\nprint(result) # error\n","evaluation":{"extracted_final_answer":"None","reasoning":"The extracted final answer is 'None', indicating that there is no exact, final answer extracted from the response. The response does not contain the precise and unambiguous text of the correct answer provided, which includes specific details about the Python version and the reasoning behind the error. Therefore, the response does not match the correct answer.","correct":"no","confidence":100}}