[ { "additions": 1, "author": "sywangyi", "author_association": "CONTRIBUTOR", "body_excerpt": "InternVLQwen2IntegrationTest::test_qwen2_medium_model_integration_video - Intel XPU: @IlyasMoutawwakil", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45526", "created_at": "2026-04-20T08:00:28Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45526/files", "html_url": "https://github.com/huggingface/transformers/pull/45526", "labels": [], "merged": true, "number": 45526, "review_comments_count": 0, "state": "closed", "title": "xpu output align with cuda in test case", "updated_at": "2026-04-20T09:05:28Z" }, { "additions": 1, "author": "jiqing-feng", "author_association": "CONTRIBUTOR", "body_excerpt": "## What does this PR do? `CsmProcessor` defaults `add_special_tokens=False` (designed for `apply_chat_template`, which includes `` in the Jinja template). When the pipeline calls `preprocessor(text)` directly for raw text input, `` token", "updated_at": "2026-04-20T06:23:34Z" }, { "additions": 4, "author": "SAY-5", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "Fixes #45520. 
`is_flash_attn_2_available`, `is_flash_attn_3_available`, `is_flash_attn_4_available`, and `is_flash_attn_greater_or_equal` all do two checks: ```python is_available, _ = _is_package_available(\"flash_attn\", return_version=Tru\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, "conversation_url": "https://github.com/huggingface/transformers/pull/45524", "created_at": "2026-04-20T05:44:53Z", "deletions": 4, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45524/files", "html_url": "https://github.com/huggingface/transformers/pull/45524", "labels": [], "merged": false, "number": 45524, "review_comments_count": 0, "state": "open", "title": "utils: handle flash_attn missing from importlib packages_distributions without crashing", "updated_at": "2026-04-20T05:44:53Z" }, { "additions": 59, "author": "duyhv-qualgo", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "## Problem Two related bugs in `src/transformers/integrations/executorch.py` that break seq2seq (T5) ExecuTorch export: ### Bug 1 \u2014 `encoder_attention_mask` not forwarded to decoder `Seq2SeqLMDecoderExportableModuleWithStaticCache.forward`\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, "conversation_url": "https://github.com/huggingface/transformers/pull/45523", "created_at": "2026-04-20T05:20:19Z", "deletions": 17, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45523/files", "html_url": "https://github.com/huggingface/transformers/pull/45523", "labels": [], "merged": false, "number": 45523, "review_comments_count": 0, "state": "open", "title": "Fix Seq2SeqLM ExecuTorch export: add encoder_attention_mask to decoder and use static encoder shapes", "updated_at": "2026-04-20T05:27:07Z" }, { "additions": 62, "author": "KeitaW", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "# What does this PR 
do? Exposes PyTorch DDP's `static_graph` flag via a new `ddp_static_graph: Optional[bool]` field on `TrainingArguments`, forwarded through `Trainer._build_accelerator_args` into Accelerate's [`DistributedDataParallelKwa\u2026", "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, "conversation_url": "https://github.com/huggingface/transformers/pull/45519", "created_at": "2026-04-20T02:08:37Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45519/files", "html_url": "https://github.com/huggingface/transformers/pull/45519", "labels": [], "merged": false, "number": 45519, "review_comments_count": 0, "state": "open", "title": "[Trainer] Add ddp_static_graph option", "updated_at": "2026-04-20T02:08:37Z" }, { "additions": 3, "author": "Tokarak", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "# What does this PR do? Rename arg from `input_ids` to `labels` (using IDE rename), to be consistent with other models' `prepare_decoder_input_ids_from_labels`. 
Fix broken interaction when DataCollatorForSeq2Seq passes argument as a hard-c\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/45516", "created_at": "2026-04-19T17:22:36Z", "deletions": 3, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45516/files", "html_url": "https://github.com/huggingface/transformers/pull/45516", "labels": [], "merged": false, "number": 45516, "review_comments_count": 0, "state": "open", "title": "T5Gemma2: fix `prepare_decoder_input_ids_from_labels`", "updated_at": "2026-04-19T21:04:21Z" }, { "additions": 1, "author": "Jah-yee", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "Good day ## Summary This PR fixes a bug in `src/transformers/testing_utils.py` where `get_device_properties()` calls `torch.cuda.get_device_capability()` without first checking if a GPU is actually available. ## Bug Description When CUDA i\u2026", "changed_files": 1, "cluster_id": "cluster-45341-4", "cluster_ids": [ "cluster-45341-4" ], "cluster_role": "member", "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45515", "created_at": "2026-04-19T11:59:23Z", "deletions": 3, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45515/files", "html_url": "https://github.com/huggingface/transformers/pull/45515", "labels": [], "merged": false, "number": 45515, "review_comments_count": 0, "state": "open", "title": "Fix CUDA availability check in get_device_properties()", "updated_at": "2026-04-19T12:16:25Z" }, { "additions": 6, "author": "tianhaocui", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "Fixes #45507 ## Summary `GraniteMoeHybridModel._update_mamba_mask` calls `past_key_values.has_previous_state()` without checking whether the model actually has mamba layers. 
When all layers are attention-only (no mamba layers in `config.la\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45514", "created_at": "2026-04-19T10:27:36Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45514/files", "html_url": "https://github.com/huggingface/transformers/pull/45514", "labels": [], "merged": false, "number": 45514, "review_comments_count": 0, "state": "open", "title": "Fix GraniteMoeHybrid _update_mamba_mask crash on attention-only models", "updated_at": "2026-04-19T10:28:45Z" }, { "additions": 95, "author": "kashif", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? The gated-delta-net forward only used the cached recurrent state when `seq_len == 1`. For any multi-token forward with a populated cache (e.g. chunked prefill continuation or speculative-decoding verification), it f\u2026", "changed_files": 4, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45513", "created_at": "2026-04-19T09:40:59Z", "deletions": 15, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45513/files", "html_url": "https://github.com/huggingface/transformers/pull/45513", "labels": [], "merged": false, "number": 45513, "review_comments_count": 0, "state": "open", "title": "[Qwen3.5] Fix Qwen3.5 linear attention multi-token cached forward", "updated_at": "2026-04-19T15:43:06Z" }, { "additions": 48, "author": "eustlb", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? As discussed in #44408, this adds the possibility to do regex matching for `layer_name` in the `OutputRecorder`. 
```python OutputRecorder(GraniteSpeechConformerBlock, layer_name=r\"layers\\.(6|12|18)$\") ``` **Backward\u2026", "changed_files": 16, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/45512", "created_at": "2026-04-19T08:38:10Z", "deletions": 28, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45512/files", "html_url": "https://github.com/huggingface/transformers/pull/45512", "labels": [], "merged": false, "number": 45512, "review_comments_count": 0, "state": "open", "title": "[OutputRecorder] re.search on layer_name", "updated_at": "2026-04-19T10:17:05Z" }, { "additions": 8, "author": "Kabir08", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "This fixes the NaN issue when batching mixed-length sequences with sliding window attention (e.g., with `EmbeddingGemma` / Gemma3 models). The patch ensures that when a query position's entire sliding window falls within the padding region\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, "conversation_url": "https://github.com/huggingface/transformers/pull/45511", "created_at": "2026-04-19T08:27:15Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45511/files", "html_url": "https://github.com/huggingface/transformers/pull/45511", "labels": [], "merged": false, "number": 45511, "review_comments_count": 0, "state": "open", "title": "Fix NaN in Gemma3/EmbeddingGemma when batching mixed-length sequences\u2026", "updated_at": "2026-04-19T10:18:50Z" }, { "additions": 55, "author": "GitGlimpse895", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "# What does this PR do? `QuantizedLayer` maintains two separate storage regions: a full-precision residual buffer (`self.keys` / `self.values`) and a quantized buffer (`self._quantized_keys` / `self._quantized_values`). 
However, the four m\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, "conversation_url": "https://github.com/huggingface/transformers/pull/45510", "created_at": "2026-04-19T07:34:56Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45510/files", "html_url": "https://github.com/huggingface/transformers/pull/45510", "labels": [], "merged": false, "number": 45510, "review_comments_count": 0, "state": "open", "title": "cache_utils: fix QuantizedLayer to correctly propagate reorder_cache, crop, and batch ops to quantized buffers", "updated_at": "2026-04-19T07:36:22Z" }, { "additions": 1, "author": "Jah-yee", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "Good day ## Problem In `src/transformers/testing_utils.py`, the `get_device_properties()` function checks `IS_CUDA_SYSTEM` to determine whether to call `torch.cuda.get_device_capability()`. However, `IS_CUDA_SYSTEM` is set to `True` when `\u2026", "changed_files": 1, "cluster_id": "cluster-45341-4", "cluster_ids": [ "cluster-45341-4" ], "cluster_role": "member", "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45509", "created_at": "2026-04-18T23:41:33Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45509/files", "html_url": "https://github.com/huggingface/transformers/pull/45509", "labels": [], "merged": false, "number": 45509, "review_comments_count": 0, "state": "open", "title": "Fix get_device_properties crash when CUDA is installed but no GPU", "updated_at": "2026-04-18T23:55:36Z" }, { "additions": 3, "author": "avasis-ai", "author_association": "CONTRIBUTOR", "body_excerpt": "## Summary Fixes a systematic docstring typo in all three streamer classes in `src/transformers/generation/streamers.py`: - `TextStreamer` - `TextIteratorStreamer` - `AsyncTextIteratorStreamer` Each had `\"The tokenized used to 
decode the t\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45508", "created_at": "2026-04-18T21:48:22Z", "deletions": 3, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45508/files", "html_url": "https://github.com/huggingface/transformers/pull/45508", "labels": [], "merged": true, "number": 45508, "review_comments_count": 0, "state": "closed", "title": "[Doc] Fix 'tokenized' -> 'tokenizer' typo in streamer docstrings", "updated_at": "2026-04-20T05:18:43Z" }, { "additions": 194, "author": "sirzechs66", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "# What does this PR do? This PR adds full GGUF loading support for GPT\u2011OSS models (20B/120B). It allows Transformers (and consequently vLLM) to directly load GPT\u2011OSS GGUF files without falling back to a wrong architecture. The changes incl\u2026", "changed_files": 4, "cluster_id": "cluster-43366-4", "cluster_ids": [ "cluster-43366-4" ], "cluster_role": "canonical", "comments_count": 4, "conversation_url": "https://github.com/huggingface/transformers/pull/45506", "created_at": "2026-04-18T08:43:19Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45506/files", "html_url": "https://github.com/huggingface/transformers/pull/45506", "labels": [], "merged": false, "number": 45506, "review_comments_count": 0, "state": "open", "title": "Add full GGUF loading support for GPT\u2011OSS (fixes #43366, supersedes #43757) latest", "updated_at": "2026-04-20T08:12:23Z" }, { "additions": 1245, "author": "kchpp940", "author_association": "NONE", "body_excerpt": "# What does this PR do? 
everything should live directly under `mapping` except we add onto it - Solar change is the same as qwen2 moe - Cohere moved - Ernie moe similar to minimax with one additional rename", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 4, "conversation_url": "https://github.com/huggingface/transformers/pull/45483", "created_at": "2026-04-16T15:59:26Z", "deletions": 64, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45483/files", "html_url": "https://github.com/huggingface/transformers/pull/45483", "labels": [], "merged": false, "number": 45483, "review_comments_count": 3, "state": "open", "title": "[`Conversion Mapping`] Small fixups", "updated_at": "2026-04-17T12:09:04Z" }, { "additions": 184, "author": "zucchini-nlp", "author_association": "MEMBER", "body_excerpt": ".", "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45481", "created_at": "2026-04-16T15:51:21Z", "deletions": 185, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45481/files", "html_url": "https://github.com/huggingface/transformers/pull/45481", "labels": [], "merged": true, "number": 45481, "review_comments_count": 1, "state": "closed", "title": "Add check-auto in repo-consistency and fix sorting", "updated_at": "2026-04-17T11:18:52Z" }, { "additions": 27, "author": "SunMarc", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? This PR fixes quantization tests. A few things were deprecated when compressed-tensors had their latest release, so i'm updating the tests. 
For fouroversix, it's just that the model was a bit too big for the CI", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 5, "conversation_url": "https://github.com/huggingface/transformers/pull/45480", "created_at": "2026-04-16T15:23:30Z", "deletions": 86, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45480/files", "html_url": "https://github.com/huggingface/transformers/pull/45480", "labels": [], "merged": false, "number": 45480, "review_comments_count": 0, "state": "open", "title": "Update quants tests ", "updated_at": "2026-04-17T13:46:50Z" }, { "additions": 330, "author": "zucchini-nlp", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? As per title, I think this pattern is used quite often and deserves to be a public mask-fn. Used currently in gemma/paligemma family, GIT, PI0 and will be used in two upcoming models (deepseekOcr and Molmo2) This PR\u2026", "changed_files": 9, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 6, "conversation_url": "https://github.com/huggingface/transformers/pull/45477", "created_at": "2026-04-16T14:12:03Z", "deletions": 595, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45477/files", "html_url": "https://github.com/huggingface/transformers/pull/45477", "labels": [], "merged": false, "number": 45477, "review_comments_count": 0, "state": "open", "title": "Blockwise mask fn as opt arg in all masking functions", "updated_at": "2026-04-17T14:33:10Z" }, { "additions": 14, "author": "ydshieh", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? 
Call CI workflow", "changed_files": 4, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/45476", "created_at": "2026-04-16T13:49:50Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45476/files", "html_url": "https://github.com/huggingface/transformers/pull/45476", "labels": [], "merged": false, "number": 45476, "review_comments_count": 0, "state": "open", "title": "[Don't merge] Call CI workflow", "updated_at": "2026-04-16T21:04:16Z" }, { "additions": 54, "author": "tarekziade", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Split out the `mlinter` tool see https://github.com/huggingface/transformers-mlinter We want to be able to: - use it from other CI projects - remove the ability to alter the linter from Transformers PRs This change\u2026", "changed_files": 29, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45475", "created_at": "2026-04-16T12:01:49Z", "deletions": 3405, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45475/files", "html_url": "https://github.com/huggingface/transformers/pull/45475", "labels": [], "merged": true, "number": 45475, "review_comments_count": 0, "state": "closed", "title": "chore(qa): split out mlinter", "updated_at": "2026-04-20T08:37:40Z" }, { "additions": 2, "author": "rtrompier", "author_association": "MEMBER", "body_excerpt": "Bump the pinned doc-builder SHA so that main documentation builds also sync to the HF bucket (dual-write).", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45474", "created_at": "2026-04-16T11:59:17Z", "deletions": 2, "draft": true, "files_url": 
"https://github.com/huggingface/transformers/pull/45474/files", "html_url": "https://github.com/huggingface/transformers/pull/45474", "labels": [], "merged": false, "number": 45474, "review_comments_count": 0, "state": "closed", "title": "chore: bump doc-builder SHA for main doc build workflow", "updated_at": "2026-04-17T09:17:57Z" }, { "additions": 201, "author": "AmineDiro", "author_association": "MEMBER", "body_excerpt": "While benchmarking Qwen3-30B-A3B SFT training with Expert Parallelism (EP) using TRL, I found three bugs that combine to produce silently wrong results or NaN loss. Every existing test uses `tp_plan=\"auto\"` which bypasses `RouterParallel`\u2026", "changed_files": 4, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45473", "created_at": "2026-04-16T10:59:14Z", "deletions": 41, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45473/files", "html_url": "https://github.com/huggingface/transformers/pull/45473", "labels": [], "merged": false, "number": 45473, "review_comments_count": 2, "state": "open", "title": "Fix EP: RouterParallel shape, tp_plan property, grouped_mm sentinels", "updated_at": "2026-04-20T09:16:21Z" }, { "additions": 2, "author": "kevinmalana", "author_association": "NONE", "body_excerpt": "## What does this PR do? Fixes a crash in `get_device_properties()` in `testing_utils.py` when CUDA is installed on the system but no GPU device is present (e.g., a CPU-only cloud studio with CUDA libraries installed). 
The function called\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45472", "created_at": "2026-04-16T10:03:07Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45472/files", "html_url": "https://github.com/huggingface/transformers/pull/45472", "labels": [ "Code agent slop" ], "merged": false, "number": 45472, "review_comments_count": 0, "state": "closed", "title": "fix(testing_utils): guard get_device_capability with torch.cuda.is_available()", "updated_at": "2026-04-16T10:57:34Z" }, { "additions": 2655, "author": "nuxlear", "author_association": "CONTRIBUTOR", "body_excerpt": "# What does this PR do? Add EXAONE 4.5 architecture for the [EXAONE 4.5 model](https://huggingface.co/collections/LGAI-EXAONE/exaone-45) released by LG AI Research. This PR adds the modeling code for EXAONE 4.5, which uses the same LLM arc\u2026", "changed_files": 18, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45471", "created_at": "2026-04-16T08:52:35Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45471/files", "html_url": "https://github.com/huggingface/transformers/pull/45471", "labels": [], "merged": false, "number": 45471, "review_comments_count": 23, "state": "open", "title": "Add EXAONE 4.5 implementations", "updated_at": "2026-04-20T08:01:56Z" }, { "additions": 7, "author": "kaixuanliu", "author_association": "CONTRIBUTOR", "body_excerpt": "@ydshieh pls help review, thx!", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45470", "created_at": "2026-04-16T07:44:20Z", "deletions": 0, "draft": false, "files_url": 
"https://github.com/huggingface/transformers/pull/45470/files", "html_url": "https://github.com/huggingface/transformers/pull/45470", "labels": [], "merged": false, "number": 45470, "review_comments_count": 0, "state": "open", "title": "skip test_flash_attn_2_can_dispatch_composite_models tests for", "updated_at": "2026-04-16T07:45:33Z" }, { "additions": 12, "author": "Spectual", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "# What does this PR do? `PixioPatchEmbeddings.forward` already accepted an `interpolate_pos_encoding` flag (inherited from `ViTPatchEmbeddings`) to skip image-size validation and allow variable-resolution inputs. However, neither `PixioEmb\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45469", "created_at": "2026-04-16T06:54:56Z", "deletions": 10, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45469/files", "html_url": "https://github.com/huggingface/transformers/pull/45469", "labels": [], "merged": false, "number": 45469, "review_comments_count": 0, "state": "open", "title": "Fix: propagate interpolate_pos_encoding through Pixio model hierarchy", "updated_at": "2026-04-16T06:56:04Z" }, { "additions": 17, "author": "Jah-yee", "author_association": "NONE", "body_excerpt": "Good day, ## Problem On Apple Silicon (MPS backend), `torch.nn.functional.scaled_dot_product_attention` produces incorrect output when the value tensor's head dimension differs from the query tensor's head dimension. 
This affects DeepSeek\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45467", "created_at": "2026-04-16T06:44:51Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45467/files", "html_url": "https://github.com/huggingface/transformers/pull/45467", "labels": [ "Code agent slop" ], "merged": false, "number": 45467, "review_comments_count": 0, "state": "closed", "title": "Fix MPS SDPA output shape when value head dim differs from query head dim", "updated_at": "2026-04-16T10:53:42Z" }, { "additions": 3, "author": "Jah-yee", "author_association": "NONE", "body_excerpt": "Fixes #45459 - Previously import_protobuf_decode_error() raised ImportError when protobuf wasn't installed even for other exceptions, hiding the real error. Now returns empty tuple () so the actual exception propagates.", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, "conversation_url": "https://github.com/huggingface/transformers/pull/45466", "created_at": "2026-04-16T00:55:40Z", "deletions": 2, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45466/files", "html_url": "https://github.com/huggingface/transformers/pull/45466", "labels": [], "merged": false, "number": 45466, "review_comments_count": 0, "state": "closed", "title": "fix: return empty tuple when protobuf not available", "updated_at": "2026-04-16T03:16:39Z" }, { "additions": 65, "author": "stevhliu", "author_association": "MEMBER", "body_excerpt": "refactors the \"Contribute to Transformers\" doc to be more lightweight and an easy entry point that links out to more dedicated guides", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45465", "created_at": 
"2026-04-15T23:12:11Z", "deletions": 469, "draft": true, "files_url": "https://github.com/huggingface/transformers/pull/45465/files", "html_url": "https://github.com/huggingface/transformers/pull/45465", "labels": [], "merged": false, "number": 45465, "review_comments_count": 0, "state": "open", "title": "[docs] contributing", "updated_at": "2026-04-15T23:22:12Z" }, { "additions": 186, "author": "SunMarc", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? This PR updates the support for response api. I was mainly basing myself on chat completion api but there are minor differences with it e.g `input_image` vs `image_url` for `type` or `input_text` vs `text`, differen\u2026", "changed_files": 5, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 7, "conversation_url": "https://github.com/huggingface/transformers/pull/45463", "created_at": "2026-04-15T15:56:45Z", "deletions": 181, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45463/files", "html_url": "https://github.com/huggingface/transformers/pull/45463", "labels": [], "merged": true, "number": 45463, "review_comments_count": 6, "state": "closed", "title": "Fix response api support ", "updated_at": "2026-04-16T20:54:08Z" }, { "additions": 118, "author": "tarekziade", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? 
Activated some bandit rules and fixed a few spots", "changed_files": 33, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 5, "conversation_url": "https://github.com/huggingface/transformers/pull/45462", "created_at": "2026-04-15T15:37:15Z", "deletions": 67, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45462/files", "html_url": "https://github.com/huggingface/transformers/pull/45462", "labels": [], "merged": false, "number": 45462, "review_comments_count": 0, "state": "open", "title": "chore(sec): added a handful of security checks", "updated_at": "2026-04-15T16:04:51Z" }, { "additions": 0, "author": "JiauZhang", "author_association": "CONTRIBUTOR", "body_excerpt": "# What does this PR do? Remove redundant condition checks in `get_image_size` method", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 7, "conversation_url": "https://github.com/huggingface/transformers/pull/45461", "created_at": "2026-04-15T14:00:11Z", "deletions": 16, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45461/files", "html_url": "https://github.com/huggingface/transformers/pull/45461", "labels": [], "merged": true, "number": 45461, "review_comments_count": 1, "state": "closed", "title": "Remove redundant condition checks in `get_image_size` method", "updated_at": "2026-04-17T13:38:35Z" }, { "additions": 5, "author": "cloudyun888", "author_association": "NONE", "body_excerpt": "## Summary When protobuf is not installed, is a function call used as an expression. 
Because it is evaluated lazily when the try block raises, the resulting from the function itself bypasses the RuntimeError and OSError handlers below it,\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, "conversation_url": "https://github.com/huggingface/transformers/pull/45460", "created_at": "2026-04-15T13:18:36Z", "deletions": 5, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45460/files", "html_url": "https://github.com/huggingface/transformers/pull/45460", "labels": [ "Code agent slop" ], "merged": false, "number": 45460, "review_comments_count": 0, "state": "closed", "title": "fix(tokenization): re-raise ImportError to allow RuntimeError/OSError fallback (#45459)", "updated_at": "2026-04-16T10:48:07Z" }, { "additions": 2, "author": "tomaarsen", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Currently, for qwen2_5_omni and qwen3_omni_moe, you can only load the 'Talker' variant, i.e. with the audio output. This is a bit like only being able to load a checkpoint with `AutoModelForCausalLM` while `AutoMode\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 8, "conversation_url": "https://github.com/huggingface/transformers/pull/45457", "created_at": "2026-04-15T12:29:47Z", "deletions": 2, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45457/files", "html_url": "https://github.com/huggingface/transformers/pull/45457", "labels": [], "merged": true, "number": 45457, "review_comments_count": 4, "state": "closed", "title": "Allow loading Qwen Thinker 'base' models without generative head", "updated_at": "2026-04-16T12:24:48Z" }, { "additions": 2, "author": "tarekziade", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? 
we're extending `ty` to more modules and we need stubs from more libs like openai.", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45456", "created_at": "2026-04-15T12:10:18Z", "deletions": 2, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45456/files", "html_url": "https://github.com/huggingface/transformers/pull/45456", "labels": [], "merged": true, "number": 45456, "review_comments_count": 0, "state": "closed", "title": "refactor(qa): extend extras so ty can run on server modules", "updated_at": "2026-04-15T16:08:23Z" }, { "additions": 3, "author": "tomaarsen", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? There's 2 changes, one is a definite fix and one is a preference. Some background: there are a lot of models that have finetuned `qwen2_5_omni`, e.g. https://huggingface.co/LCO-Embedding/LCO-Embedding-Omni-3B, and i\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/45455", "created_at": "2026-04-15T11:38:05Z", "deletions": 25, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45455/files", "html_url": "https://github.com/huggingface/transformers/pull/45455", "labels": [], "merged": true, "number": 45455, "review_comments_count": 5, "state": "closed", "title": "[`fix`] Make Qwen2_5OmniProcessor warning a lot less noisy via warning_once", "updated_at": "2026-04-16T12:10:43Z" }, { "additions": 81, "author": "zucchini-nlp", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Fixes https://github.com/huggingface/transformers/issues/45200 As per title, this error was actually needed only in PG. 
Other models don't have such prefix/suffix separation when training", "changed_files": 7, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 13, "conversation_url": "https://github.com/huggingface/transformers/pull/45454", "created_at": "2026-04-15T11:11:34Z", "deletions": 104, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45454/files", "html_url": "https://github.com/huggingface/transformers/pull/45454", "labels": [], "merged": false, "number": 45454, "review_comments_count": 0, "state": "open", "title": "Gemma4 training with text-only samples", "updated_at": "2026-04-16T04:05:06Z" }, { "additions": 654, "author": "ArthurZucker", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Ai init", "changed_files": 5, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45453", "created_at": "2026-04-15T10:22:17Z", "deletions": 67, "draft": true, "files_url": "https://github.com/huggingface/transformers/pull/45453/files", "html_url": "https://github.com/huggingface/transformers/pull/45453", "labels": [], "merged": false, "number": 45453, "review_comments_count": 1, "state": "open", "title": "Draft commit", "updated_at": "2026-04-20T05:28:39Z" }, { "additions": 3058, "author": "DavidSolanas", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "Fixes #45306 ## What and why All `models/X/__init__.py` files used `from .module import *` inside the `TYPE_CHECKING` block. 
This makes it impossible for static analysis tools (pyright, mypy, IDEs) to know which symbols are actually export\u2026", "changed_files": 446, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45452", "created_at": "2026-04-15T08:44:00Z", "deletions": 1305, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45452/files", "html_url": "https://github.com/huggingface/transformers/pull/45452", "labels": [], "merged": false, "number": 45452, "review_comments_count": 0, "state": "open", "title": "refactor: replace wildcard imports with explicit imports in model __init__.py files", "updated_at": "2026-04-15T12:15:24Z" }, { "additions": 20, "author": "MukundaKatta", "author_association": "NONE", "body_excerpt": "Docstring/comment-only typo fixes across Qwen3-VL, Qwen3.5, GLM4V, GLM4.6V, GLM-OCR and their MoE variants. `seperate` -> `separate`. No behavior changes. I deliberately left `image_seperate.weight` in `convert_mm_grounding_dino_to_hf.py`\u2026", "changed_files": 10, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45451", "created_at": "2026-04-15T08:33:29Z", "deletions": 20, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45451/files", "html_url": "https://github.com/huggingface/transformers/pull/45451", "labels": [], "merged": false, "number": 45451, "review_comments_count": 0, "state": "closed", "title": "Fix 'seperate' typo in qwen3/glm video-model docstrings", "updated_at": "2026-04-15T10:57:26Z" }, { "additions": 1, "author": "rtrompier", "author_association": "MEMBER", "body_excerpt": "Switch the PR doc upload flow from the legacy dataset push to the new HF bucket.", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": 
"https://github.com/huggingface/transformers/pull/45450", "created_at": "2026-04-15T08:26:45Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45450/files", "html_url": "https://github.com/huggingface/transformers/pull/45450", "labels": [], "merged": true, "number": 45450, "review_comments_count": 0, "state": "closed", "title": "chore: bump doc-builder SHA for PR upload workflow", "updated_at": "2026-04-20T09:15:25Z" }, { "additions": 1, "author": "hmellor", "author_association": "MEMBER", "body_excerpt": "This model also has the wrong tokenizer class in its config", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45449", "created_at": "2026-04-15T08:26:05Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45449/files", "html_url": "https://github.com/huggingface/transformers/pull/45449", "labels": [], "merged": true, "number": 45449, "review_comments_count": 0, "state": "closed", "title": "Add `step3_vl` to `MODELS_WITH_INCORRECT_HUB_TOKENIZER_CLASS`", "updated_at": "2026-04-15T09:28:46Z" }, { "additions": 432, "author": "Cyrilvallez", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? As per the title. ## The issue The problem is that transforms that want to remove a full part of a model name (such as a prefix, e.g. the `model.` start) are non bijective in general, i.e. 
we completely lose the inf\u2026", "changed_files": 5, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 10, "conversation_url": "https://github.com/huggingface/transformers/pull/45448", "created_at": "2026-04-15T08:06:53Z", "deletions": 124, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45448/files", "html_url": "https://github.com/huggingface/transformers/pull/45448", "labels": [], "merged": true, "number": 45448, "review_comments_count": 44, "state": "closed", "title": "[loading] Clean way to add/remove full parts in checkpoint names", "updated_at": "2026-04-20T07:17:27Z" }, { "additions": 1, "author": "ZSLsherly", "author_association": "FIRST_TIMER", "body_excerpt": "This commit corrects the PyTorch version check for importing `AuxRequest` from `torch.nn.attention.flex_attention`(line51). The `AuxRequest` class was actually introduced in PyTorch 2.9.1, not 2.9.0. The current code attempts to import it\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 8, "conversation_url": "https://github.com/huggingface/transformers/pull/45445", "created_at": "2026-04-15T03:09:38Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45445/files", "html_url": "https://github.com/huggingface/transformers/pull/45445", "labels": [], "merged": false, "number": 45445, "review_comments_count": 1, "state": "closed", "title": "Update Torch version check for flex attention", "updated_at": "2026-04-15T11:40:34Z" }, { "additions": 50, "author": "tomaarsen", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Resolves https://github.com/huggingface/sentence-transformers/issues/3724 ## Code Agent Policy - [x] I confirm that this is not a pure code agent PR. 
## Before submitting - [ ] This PR fixes a typo or improves the d\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 7, "conversation_url": "https://github.com/huggingface/transformers/pull/45444", "created_at": "2026-04-14T19:28:34Z", "deletions": 20, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45444/files", "html_url": "https://github.com/huggingface/transformers/pull/45444", "labels": [], "merged": true, "number": 45444, "review_comments_count": 4, "state": "closed", "title": "[`fix`] Always early return for non-Mistral models in _patch_mistral_regex", "updated_at": "2026-04-20T08:19:27Z" }, { "additions": 38, "author": "qgallouedec", "author_association": "MEMBER", "body_excerpt": "When `transformers serve` is launched with a positional model argument, the server silently overwrites the `\"model\"` field in every incoming request with the pinned model id. This is surprising: a client that asks for model B receives a re\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45443", "created_at": "2026-04-14T19:14:10Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45443/files", "html_url": "https://github.com/huggingface/transformers/pull/45443", "labels": [], "merged": false, "number": 45443, "review_comments_count": 0, "state": "open", "title": "Raise 400 on model mismatch when `transformers serve` is pinned", "updated_at": "2026-04-15T11:42:44Z" }, { "additions": 4, "author": "paulinebm", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? 
Fixes # (issue) ## Code Agent Policy The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by c\u2026", "changed_files": 8, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/45350", "created_at": "2026-04-09T17:46:37Z", "deletions": 0, "draft": true, "files_url": "https://github.com/huggingface/transformers/pull/45350/files", "html_url": "https://github.com/huggingface/transformers/pull/45350", "labels": [], "merged": false, "number": 45350, "review_comments_count": 0, "state": "open", "title": "WIP: Add support for Granite4VisionForConditionalGeneration", "updated_at": "2026-04-10T12:34:50Z" }, { "additions": 90, "author": "florian6973", "author_association": "CONTRIBUTOR", "body_excerpt": "# What does this PR do? Fixes #45305 Add a regression test in `TrainerGradientAccumulationTest` to avoid passing the GAS value to Accelerate by mistake Description: I force the value of the `num_steps` parameter to be 1, and the regression\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45349", "created_at": "2026-04-09T17:24:39Z", "deletions": 2, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45349/files", "html_url": "https://github.com/huggingface/transformers/pull/45349", "labels": [ "for patch" ], "merged": true, "number": 45349, "review_comments_count": 6, "state": "closed", "title": "Fix #45305 + add regression test GAS", "updated_at": "2026-04-13T14:41:43Z" }, { "additions": 50, "author": "qgallouedec", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Fixes #45290 ## Code Agent Policy The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by code agents. 
We are currently bottlenecked by our ability to review and r\u2026", "changed_files": 5, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45348", "created_at": "2026-04-09T15:59:07Z", "deletions": 19, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45348/files", "html_url": "https://github.com/huggingface/transformers/pull/45348", "labels": [], "merged": true, "number": 45348, "review_comments_count": 7, "state": "closed", "title": "Fix `apply_chat_template` crash on `tool_call` messages without content", "updated_at": "2026-04-13T19:44:38Z" }, { "additions": 35, "author": "Cyrilvallez", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? As per the title. `accelerate` destroys the dict otherwise, if it's not BOTH passed as kwarg AND part of `_skip_keys_device_placement`.......... `per_layer_input` needs to stay as a positional arg, for gradient chec\u2026", "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45347", "created_at": "2026-04-09T15:31:34Z", "deletions": 6, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45347/files", "html_url": "https://github.com/huggingface/transformers/pull/45347", "labels": [], "merged": true, "number": 45347, "review_comments_count": 0, "state": "closed", "title": "[gemma4] Fix device map auto", "updated_at": "2026-04-09T15:45:15Z" }, { "additions": 16, "author": "ionut-anghelina", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": null, "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 5, "conversation_url": "https://github.com/huggingface/transformers/pull/45346", "created_at": "2026-04-09T14:48:28Z", "deletions": 0, "draft": false, "files_url": 
"https://github.com/huggingface/transformers/pull/45346/files", "html_url": "https://github.com/huggingface/transformers/pull/45346", "labels": [], "merged": false, "number": 45346, "review_comments_count": 1, "state": "open", "title": "Fix Double Application of Softmax for Router Logits in MoE models", "updated_at": "2026-04-13T12:40:28Z" }, { "additions": 30, "author": "ansley", "author_association": "NONE", "body_excerpt": "The `transformers` V5 \"rm slow tokenizers\" refactor (\\#40936) aliased `LlamaTokenizerFast` to `LlamaTokenizer`, whose `__init__` unconditionally installs a SentencePiece Metaspace pre-tokenizer. This is correct for classic Llama/Llama-2 mo\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 6, "conversation_url": "https://github.com/huggingface/transformers/pull/45345", "created_at": "2026-04-09T14:31:40Z", "deletions": 14, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45345/files", "html_url": "https://github.com/huggingface/transformers/pull/45345", "labels": [], "merged": false, "number": 45345, "review_comments_count": 0, "state": "closed", "title": "Fix ByteLevel-BPE tokenizers silently breaking in `LlamaTokenizer`", "updated_at": "2026-04-10T12:45:24Z" }, { "additions": 6, "author": "tarekziade", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Simple hook to display test duration. 
This will append inline duration per test during the run, example: ``` tests/utils/test_configuration_utils.py::ConfigPushToHubTester::test_push_to_hub [gw1] [ 90%] PASSED tests\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45344", "created_at": "2026-04-09T14:22:46Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45344/files", "html_url": "https://github.com/huggingface/transformers/pull/45344", "labels": [], "merged": true, "number": 45344, "review_comments_count": 0, "state": "closed", "title": "refactor: display test duration", "updated_at": "2026-04-09T15:19:26Z" }, { "additions": 8, "author": "Cyrilvallez", "author_association": "MEMBER", "body_excerpt": null, "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45342", "created_at": "2026-04-09T14:13:15Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45342/files", "html_url": "https://github.com/huggingface/transformers/pull/45342", "labels": [], "merged": false, "number": 45342, "review_comments_count": 0, "state": "open", "title": "Use `_keys_to_ignore_on_load_unexpected/missing` recursively from children", "updated_at": "2026-04-09T14:23:31Z" }, { "additions": 17, "author": "Cyrilvallez", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Supersedes https://github.com/huggingface/transformers/pull/45314 with a better fix. 
Fixes https://github.com/huggingface/transformers/issues/45216 and https://github.com/huggingface/transformers/issues/45310 and ht\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 8, "conversation_url": "https://github.com/huggingface/transformers/pull/45340", "created_at": "2026-04-09T12:02:14Z", "deletions": 14, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45340/files", "html_url": "https://github.com/huggingface/transformers/pull/45340", "labels": [], "merged": true, "number": 45340, "review_comments_count": 0, "state": "closed", "title": "Fix conversion mappings for vlms", "updated_at": "2026-04-17T08:25:29Z" }, { "additions": 156, "author": "tarekziade", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? The CircleCI config file is not ruff formatted, leading to unwanted changes when it's opened in an editor that follows our repository ruff configuration. This patch adds it and runs `make style` to update it", "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/45339", "created_at": "2026-04-09T09:44:16Z", "deletions": 58, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45339/files", "html_url": "https://github.com/huggingface/transformers/pull/45339", "labels": [], "merged": true, "number": 45339, "review_comments_count": 0, "state": "closed", "title": "chore: added circleci python script to ruff and ty checkers", "updated_at": "2026-04-09T12:00:08Z" }, { "additions": 37, "author": "RudrenduPaul", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "Closes #45162 ## What this PR does Expands the docstrings of `_can_set_attn_implementation` and `_can_set_experts_implementation` in `modeling_utils.py` to explicitly document the known limitations of their source-inspection heuristic. 
**C\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45338", "created_at": "2026-04-09T09:35:52Z", "deletions": 4, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45338/files", "html_url": "https://github.com/huggingface/transformers/pull/45338", "labels": [], "merged": false, "number": 45338, "review_comments_count": 0, "state": "closed", "title": "docs: document known limitations of _can_set_attn/experts_implementation source inspection", "updated_at": "2026-04-09T13:43:04Z" }, { "additions": 13, "author": "tarekziade", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Removing test_hub from CI for now", "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/45337", "created_at": "2026-04-09T08:54:45Z", "deletions": 30, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45337/files", "html_url": "https://github.com/huggingface/transformers/pull/45337", "labels": [], "merged": true, "number": 45337, "review_comments_count": 0, "state": "closed", "title": "chore: remove test_hub for now", "updated_at": "2026-04-09T09:28:52Z" }, { "additions": 84, "author": "Cyrilvallez", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? As per the title. Follow-up of https://github.com/huggingface/transformers/pull/45312. 
This removes the unnecessary weights, and silently skips them during loading, so that the checkpoints on the hub do not have to b\u2026", "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45336", "created_at": "2026-04-09T08:43:55Z", "deletions": 26, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45336/files", "html_url": "https://github.com/huggingface/transformers/pull/45336", "labels": [], "merged": true, "number": 45336, "review_comments_count": 7, "state": "closed", "title": "[gemma4] Remove all shared weights, and silently skip them during loading", "updated_at": "2026-04-09T13:23:33Z" }, { "additions": 1333, "author": "kmswin1", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "Add A.X K1 model architecture What does this PR do? This PR adds support for A.X K1, a large-scale Mixture-of-Experts (MoE) language model developed by [SK Telecom](https://huggingface.co/skt). A.X K1 contains 519B total parameters with 33\u2026", "changed_files": 8, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, "conversation_url": "https://github.com/huggingface/transformers/pull/45334", "created_at": "2026-04-09T06:21:43Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45334/files", "html_url": "https://github.com/huggingface/transformers/pull/45334", "labels": [], "merged": false, "number": 45334, "review_comments_count": 0, "state": "closed", "title": "Feature/add axk1", "updated_at": "2026-04-14T07:23:08Z" }, { "additions": 471, "author": "eladsegal", "author_association": "CONTRIBUTOR", "body_excerpt": "# What does this PR do? 
Adds heterogeneous model support - the ability for individual layers to differ from the global config (e.g., different `intermediate_size`, `num_key_value_heads`) and to skip sub-modules entirely (MLP, attention, et\u2026", "changed_files": 5, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45333", "created_at": "2026-04-09T06:18:11Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45333/files", "html_url": "https://github.com/huggingface/transformers/pull/45333", "labels": [], "merged": false, "number": 45333, "review_comments_count": 0, "state": "open", "title": "Add heterogeneous config support (per-layer configuration)", "updated_at": "2026-04-14T14:08:07Z" }, { "additions": 2152, "author": "eladsegal", "author_association": "CONTRIBUTOR", "body_excerpt": "# What does this PR do? Adds heterogeneous model support - the ability for individual layers to differ from the global config (e.g., different `intermediate_size`, `num_key_value_heads`) and to skip sub-modules entirely (MLP, attention, et\u2026", "changed_files": 14, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45332", "created_at": "2026-04-09T05:56:31Z", "deletions": 40, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45332/files", "html_url": "https://github.com/huggingface/transformers/pull/45332", "labels": [], "merged": false, "number": 45332, "review_comments_count": 0, "state": "open", "title": "Add heterogeneous model support (per-layer config and modeling)", "updated_at": "2026-04-15T04:50:09Z" }, { "additions": 12, "author": "Kash6", "author_association": "CONTRIBUTOR", "body_excerpt": "get_rope_index unconditionally applies tokens_per_second temporal scaling to both images and videos. 
For still images (modality_type == 1), this shifts the temporal position origin to start_position * tokens_per_second instead of start_pos\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/45330", "created_at": "2026-04-08T23:51:52Z", "deletions": 2, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45330/files", "html_url": "https://github.com/huggingface/transformers/pull/45330", "labels": [ "for patch" ], "merged": true, "number": 45330, "review_comments_count": 0, "state": "closed", "title": "Fix Qwen2.5-VL temporal RoPE scaling applied to still images", "updated_at": "2026-04-14T03:21:32Z" }, { "additions": 152, "author": "abidlabs", "author_association": "MEMBER", "body_excerpt": "Updates `TrackioCallback` and `TrainingArguments` for the latest version of Trackio using HF Buckets as the backend, and control over creating a static Space for the Trackio dashboard during or at the end of training. These are now the `Tr\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 6, "conversation_url": "https://github.com/huggingface/transformers/pull/45329", "created_at": "2026-04-08T22:36:08Z", "deletions": 57, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45329/files", "html_url": "https://github.com/huggingface/transformers/pull/45329", "labels": [], "merged": true, "number": 45329, "review_comments_count": 21, "state": "closed", "title": "Update `trackio` integration to use Buckets and \"freeze\" Space after training", "updated_at": "2026-04-13T14:30:27Z" }, { "additions": 9, "author": "RyanMullins", "author_association": "CONTRIBUTOR", "body_excerpt": "# What does this PR do? 
Fixes #45242 * Drops `k_proj`, `k_norm`, and `v_proj` weights for `Gemma4TextAttention` modules from the checkpoint if the layer shares KV cache values. These changes can also be adapted to Gemma 3n if that's desira\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 9, "conversation_url": "https://github.com/huggingface/transformers/pull/45328", "created_at": "2026-04-08T20:43:42Z", "deletions": 6, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45328/files", "html_url": "https://github.com/huggingface/transformers/pull/45328", "labels": [], "merged": false, "number": 45328, "review_comments_count": 0, "state": "open", "title": "Drop unused Gemma4TextAttention weights when sharing KV Cache", "updated_at": "2026-04-09T18:31:13Z" }, { "additions": 337, "author": "stevhliu", "author_association": "MEMBER", "body_excerpt": "refactors the how to add a model with modular transformers doc: - structure: - flipped the order so you learn how to write the modular file first before generating it - remove the motivator examples with BERT/RoBERTa - merge the two `super\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45327", "created_at": "2026-04-08T20:23:28Z", "deletions": 408, "draft": true, "files_url": "https://github.com/huggingface/transformers/pull/45327/files", "html_url": "https://github.com/huggingface/transformers/pull/45327", "labels": [], "merged": false, "number": 45327, "review_comments_count": 1, "state": "open", "title": "[docs] modular transformers", "updated_at": "2026-04-15T18:15:34Z" }, { "additions": 19, "author": "harshaljanjani", "author_association": "CONTRIBUTOR", "body_excerpt": "### What does this PR do? \u2192 This PR introduces compat fixes across several audio models to ensure they can be loaded and used by a companion vLLM PR. 
These changes are deliberate and are blocking [this vLLM PR](https://github.co\u2026", "changed_files": 11, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 6, "conversation_url": "https://github.com/huggingface/transformers/pull/45326", "created_at": "2026-04-08T18:28:35Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45326/files", "html_url": "https://github.com/huggingface/transformers/pull/45326", "labels": [], "merged": false, "number": 45326, "review_comments_count": 0, "state": "open", "title": "feat[vLLM \u00d7 v5]: Add vLLM compatibility for audio models", "updated_at": "2026-04-20T09:22:42Z" }, { "additions": 236, "author": "zucchini-nlp", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Fixes https://github.com/huggingface/transformers/issues/45276 and https://github.com/huggingface/transformers/issues/45335 In gemma4 per-layer inputs have to be resized as long as they aren't part of soft multimoda\u2026", "changed_files": 16, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 4, "conversation_url": "https://github.com/huggingface/transformers/pull/45324", "created_at": "2026-04-08T17:06:26Z", "deletions": 53, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45324/files", "html_url": "https://github.com/huggingface/transformers/pull/45324", "labels": [], "merged": true, "number": 45324, "review_comments_count": 5, "state": "closed", "title": "Gemma4 resizing per layer inputs", "updated_at": "2026-04-15T11:15:23Z" }, { "additions": 225, "author": "remi-or", "author_association": "MEMBER", "body_excerpt": "# Summary This PR fixes the issue raised in https://github.com/huggingface/transformers/pull/45274 . CUDA graph reuse in continuous batching used (num_q_tokens, max_kv_read) as the graph cache key. 
However, FlashAttention varlen kernels al\u2026", "changed_files": 7, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 7, "conversation_url": "https://github.com/huggingface/transformers/pull/45323", "created_at": "2026-04-08T16:30:18Z", "deletions": 126, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45323/files", "html_url": "https://github.com/huggingface/transformers/pull/45323", "labels": [], "merged": true, "number": 45323, "review_comments_count": 3, "state": "closed", "title": "[CB] Fix capture of max_seqlen", "updated_at": "2026-04-17T03:35:07Z" }, { "additions": 20, "author": "andrewor14", "author_association": "CONTRIBUTOR", "body_excerpt": "**Summary:** TorchAO recently deprecated AffineQuantizedTensor and related classes (pytorch/ao#2752). These will be removed in the next release. We should remove references of these classes in transformers before then. **Test Plan:** ``` p\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 4, "conversation_url": "https://github.com/huggingface/transformers/pull/45321", "created_at": "2026-04-08T15:42:16Z", "deletions": 29, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45321/files", "html_url": "https://github.com/huggingface/transformers/pull/45321", "labels": [], "merged": false, "number": 45321, "review_comments_count": 0, "state": "open", "title": "Remove references to torchao's AffineQuantizedTensor", "updated_at": "2026-04-09T12:21:03Z" }, { "additions": 5, "author": "Regata3010", "author_association": "CONTRIBUTOR", "body_excerpt": "## What does this PR do? Fixes a crash in assisted generation when using model pairs with different vocabulary sizes but the same tokenizer family (e.g., Qwen2.5-7B + Qwen2.5-0.5B). 
`map_input_embeddings` is only initialized when `len(self\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45320", "created_at": "2026-04-08T15:30:16Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45320/files", "html_url": "https://github.com/huggingface/transformers/pull/45320", "labels": [], "merged": true, "number": 45320, "review_comments_count": 0, "state": "closed", "title": "Fix AttributeError in AssistantToTargetTranslator.unmap_input_ids with cross-vocab models", "updated_at": "2026-04-10T17:46:37Z" }, { "additions": 78, "author": "tarekziade", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? - Removes `HUGGINGFACE_CO_STAGING` when downloading artifacts - adds a retry mechanism for external URLs (with partial file cleanup)", "changed_files": 6, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45319", "created_at": "2026-04-08T14:51:48Z", "deletions": 32, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45319/files", "html_url": "https://github.com/huggingface/transformers/pull/45319", "labels": [], "merged": true, "number": 45319, "review_comments_count": 3, "state": "closed", "title": "fix: dont download artifacts from the test hub", "updated_at": "2026-04-15T16:52:10Z" }, { "additions": 5, "author": "tarekziade", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? 
AutoTokenizer.register() adds classes to the global `REGISTERED_TOKENIZER_CLASSES` dict and some tests did not clean up behind them, leading to leaky state between tests", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 5, "conversation_url": "https://github.com/huggingface/transformers/pull/45318", "created_at": "2026-04-08T13:46:47Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45318/files", "html_url": "https://github.com/huggingface/transformers/pull/45318", "labels": [], "merged": true, "number": 45318, "review_comments_count": 0, "state": "closed", "title": "fix: leak in tokenizer registry for `test_processors`", "updated_at": "2026-04-09T10:12:46Z" }, { "additions": 24, "author": "mohdfaour03", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "Fixes #45081 ## Problem Loading a Mistral tokenizer with `fix_mistral_regex=True` crashes because `_patch_mistral_regex` receives a raw `tokenizers.Tokenizer` but tries to access `.backend_tokenizer.pre_tokenizer` on it \u2014 that attribute on\u2026", "changed_files": 2, "cluster_id": "cluster-45081-3", "cluster_ids": [ "cluster-45081-3" ], "cluster_role": "canonical", "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/45317", "created_at": "2026-04-08T13:38:46Z", "deletions": 3, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45317/files", "html_url": "https://github.com/huggingface/transformers/pull/45317", "labels": [], "merged": false, "number": 45317, "review_comments_count": 1, "state": "open", "title": "Fix AttributeError in _patch_mistral_regex when fix_mistral_regex=True ", "updated_at": "2026-04-09T13:52:30Z" }, { "additions": 9, "author": "zucchini-nlp", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? As per title and seems like there are no objections. 
Also added some colors in verbose logging cc @tarekziade @tomaarsen @yonigozlan if you have better ideas to style this (just tagging since you reacted \u2795 ) This is\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 8, "conversation_url": "https://github.com/huggingface/transformers/pull/45316", "created_at": "2026-04-08T13:01:15Z", "deletions": 2, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45316/files", "html_url": "https://github.com/huggingface/transformers/pull/45316", "labels": [], "merged": true, "number": 45316, "review_comments_count": 1, "state": "closed", "title": "Logger has `[transformers]` prefix in non-verbose mode", "updated_at": "2026-04-14T14:08:04Z" }, { "additions": 46, "author": "Rocketknight1", "author_association": "MEMBER", "body_excerpt": "Reusing a variable name meant that we returned a softmaxed value instead of the original logits in some MoE routers. This generally did not affect inference, but could affect the auxiliary loss on MoE logits in training when the coefficien\u2026", "changed_files": 15, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 6, "conversation_url": "https://github.com/huggingface/transformers/pull/45315", "created_at": "2026-04-08T12:54:52Z", "deletions": 30, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45315/files", "html_url": "https://github.com/huggingface/transformers/pull/45315", "labels": [], "merged": false, "number": 45315, "review_comments_count": 0, "state": "closed", "title": "Fix softmaxing router logits", "updated_at": "2026-04-10T13:25:20Z" }, { "additions": 18, "author": "zucchini-nlp", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? 
fixes https://github.com/huggingface/transformers/issues/45216 and https://github.com/huggingface/transformers/issues/45310 and https://github.com/huggingface/transformers/issues/45313 TBH load-save-load works for t\u2026", "changed_files": 10, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45314", "created_at": "2026-04-08T11:54:53Z", "deletions": 27, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45314/files", "html_url": "https://github.com/huggingface/transformers/pull/45314", "labels": [], "merged": false, "number": 45314, "review_comments_count": 0, "state": "closed", "title": "Conversion for LLM class loading with VLM ckpt ", "updated_at": "2026-04-10T09:18:26Z" }, { "additions": 61, "author": "Cyrilvallez", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? As per the title. It was confirmed that the weight matrices of shared layers are NEVER used, and that kv states should ALWAYS be shared, even during training or inference without Cache. I will fully remove them on a\u2026", "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/45312", "created_at": "2026-04-08T11:33:33Z", "deletions": 24, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45312/files", "html_url": "https://github.com/huggingface/transformers/pull/45312", "labels": [], "merged": true, "number": 45312, "review_comments_count": 0, "state": "closed", "title": "[gemma4] Dissociate kv states sharing from the Cache", "updated_at": "2026-04-09T08:08:07Z" }, { "additions": 2, "author": "KoichiYasuoka", "author_association": "CONTRIBUTOR", "body_excerpt": "# What does this PR do? 
Fixes #45292 (seems to come from #41580) ## Code Agent Policy The Transformers repo is currently being overwhelmed by a large number of PRs and issue comments written by code agents. We are currently bottlenecked by\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 7, "conversation_url": "https://github.com/huggingface/transformers/pull/45311", "created_at": "2026-04-08T10:38:34Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45311/files", "html_url": "https://github.com/huggingface/transformers/pull/45311", "labels": [], "merged": false, "number": 45311, "review_comments_count": 0, "state": "open", "title": "resize_token_embeddings does not effect to output_embeddings", "updated_at": "2026-04-18T08:30:53Z" }, { "additions": 301, "author": "agentspan", "author_association": "NONE", "body_excerpt": "## Summary Fixes #45290. `ProcessorMixin.apply_chat_template` and several related code paths assumed every message in a conversation has a `content` key. Assistant messages with `tool_calls` and no textual content (a valid shape per the Op\u2026", "changed_files": 9, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45309", "created_at": "2026-04-08T08:40:08Z", "deletions": 23, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45309/files", "html_url": "https://github.com/huggingface/transformers/pull/45309", "labels": [ "Code agent slop" ], "merged": false, "number": 45309, "review_comments_count": 0, "state": "closed", "title": "Fix KeyError in apply_chat_template when message has no content (#45290)", "updated_at": "2026-04-08T11:30:37Z" }, { "additions": 10, "author": "juliabush", "author_association": "NONE", "body_excerpt": "## What does this PR do? 
Fixes #29942 Flash Attention 2 inference equivalence tests for Whisper can fail due to higher numerical variance compared to the eager attention implementation. This PR increases the tolerance (`atol`, `rtol`) spec\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, "conversation_url": "https://github.com/huggingface/transformers/pull/45303", "created_at": "2026-04-07T21:37:00Z", "deletions": 2, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45303/files", "html_url": "https://github.com/huggingface/transformers/pull/45303", "labels": [ "Code agent slop" ], "merged": false, "number": 45303, "review_comments_count": 0, "state": "closed", "title": "Fix FA2 inference equivalence failures for Whisper (closes #29942)", "updated_at": "2026-04-08T14:42:36Z" }, { "additions": 7, "author": "jagwar", "author_association": "MEMBER", "body_excerpt": "## Security Fix Fixes a trust check bypass in `trl-ci-bot.yml` that allowed any GitHub user to trigger TRL CI on self-hosted GPU runners by commenting `/trl-ci` on any PR. ### The bug The \"Ignore untrusted commenter\" step used `exit 0`, wh\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/45302", "created_at": "2026-04-07T21:35:38Z", "deletions": 3, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/45302/files", "html_url": "https://github.com/huggingface/transformers/pull/45302", "labels": [], "merged": true, "number": 45302, "review_comments_count": 0, "state": "closed", "title": "fix(security): prevent untrusted users from triggering TRL CI dispatch", "updated_at": "2026-04-07T21:59:38Z" }, { "additions": 0, "author": "sahildando", "author_association": "NONE", "body_excerpt": "# What does this PR do? 
save locally --> load locally) ```\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/44730", "created_at": "2026-03-15T20:44:32Z", "deletions": 4, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44730/files", "html_url": "https://github.com/huggingface/transformers/pull/44730", "labels": [], "merged": true, "number": 44730, "review_comments_count": 6, "state": "closed", "title": "Fix `mlcd` auto config/model/mapping issues", "updated_at": "2026-03-16T12:12:30Z" }, { "additions": 214, "author": "xenova", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? This PR introduces a helper utility function, `int_div_ceil`, which performs `math.ceil(a / b)` for non-negative integer operands. This is necessary as the current approach is both error-prone and imprecise (especia\u2026", "changed_files": 58, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/44729", "created_at": "2026-03-15T20:29:38Z", "deletions": 225, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44729/files", "html_url": "https://github.com/huggingface/transformers/pull/44729", "labels": [], "merged": false, "number": 44729, "review_comments_count": 0, "state": "open", "title": "Avoid floating point math for ceil operations", "updated_at": "2026-03-15T20:49:34Z" }, { "additions": 88, "author": "ajmeese7", "author_association": "NONE", "body_excerpt": "# What does this PR do? 
Fixes a GPU memory leak in `Bnb4bitQuantize.convert()` where float16 source tensors are never freed during 4-bit quantized model loading via `from_pretrained`, causing OOM on models whose float16 size exceeds GPU VR\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 4, "conversation_url": "https://github.com/huggingface/transformers/pull/44728", "created_at": "2026-03-15T19:56:44Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44728/files", "html_url": "https://github.com/huggingface/transformers/pull/44728", "labels": [], "merged": false, "number": 44728, "review_comments_count": 0, "state": "closed", "title": "Fix float16 memory leak during 4-bit quantized model loading", "updated_at": "2026-03-16T20:53:54Z" }, { "additions": 202, "author": "LincolnBurrows2017", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "Fixed issue where kwargs like force_download, proxies, token were not being passed to cached_file function.", "changed_files": 11, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44727", "created_at": "2026-03-15T19:41:24Z", "deletions": 33, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44727/files", "html_url": "https://github.com/huggingface/transformers/pull/44727", "labels": [ "Code agent slop" ], "merged": false, "number": 44727, "review_comments_count": 0, "state": "closed", "title": "fix: AutoProcessor.from_pretrained not passing kwargs to cached_file", "updated_at": "2026-03-18T13:15:46Z" }, { "additions": 198, "author": "LincolnBurrows2017", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "Replaced bare except clause with except Exception in _safe_convert_tensor function to follow Python best practices (PEP 8).", "changed_files": 10, "cluster_id": null, "cluster_ids": [], 
"cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44725", "created_at": "2026-03-15T17:41:18Z", "deletions": 29, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44725/files", "html_url": "https://github.com/huggingface/transformers/pull/44725", "labels": [ "Code agent slop" ], "merged": false, "number": 44725, "review_comments_count": 0, "state": "closed", "title": "fix: replace bare except with Exception in Fuyu image processing", "updated_at": "2026-03-18T13:16:22Z" }, { "additions": 6, "author": "ydshieh", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? TO be explained.", "changed_files": 5, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44724", "created_at": "2026-03-15T17:14:12Z", "deletions": 5, "draft": true, "files_url": "https://github.com/huggingface/transformers/pull/44724/files", "html_url": "https://github.com/huggingface/transformers/pull/44724", "labels": [], "merged": false, "number": 44724, "review_comments_count": 1, "state": "open", "title": "Fix some missing / incorrect entries in auto files", "updated_at": "2026-03-16T09:59:56Z" }, { "additions": 12, "author": "aashirpersonal", "author_association": "NONE", "body_excerpt": "## Summary This PR fixes #44716 by exposing and forwarding `interpolate_pos_encoding` through the Pixio embedding/model call chain so the option is actually usable from `PixioModel.forward()`. 
### Changes - Added `interpolate_pos_encoding:\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44723", "created_at": "2026-03-15T16:52:03Z", "deletions": 6, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44723/files", "html_url": "https://github.com/huggingface/transformers/pull/44723", "labels": [ "Code agent slop" ], "merged": false, "number": 44723, "review_comments_count": 0, "state": "closed", "title": "Fix: propagate interpolate_pos_encoding through PixioEmbeddings and PixioModel", "updated_at": "2026-03-18T15:05:52Z" }, { "additions": 38, "author": "chandan11248", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "## What does this PR do? Migrates the GPT-J model to use the new `@capture_outputs` and `@can_return_tuple` decorators for standardized output collection, as described in #43979. ### Changes - Added `_can_record_outputs` to `GPTJPreTrained\u2026", "changed_files": 2, "cluster_id": "cluster-43979-11", "cluster_ids": [ "cluster-43979-11" ], "cluster_role": "member", "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44722", "created_at": "2026-03-15T15:33:25Z", "deletions": 110, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44722/files", "html_url": "https://github.com/huggingface/transformers/pull/44722", "labels": [], "merged": false, "number": 44722, "review_comments_count": 0, "state": "open", "title": "Refactor gptj output tracing to use standardized decorators", "updated_at": "2026-03-19T18:12:59Z" }, { "additions": 4, "author": "rsmed31", "author_association": "NONE", "body_excerpt": "## Summary Fixes #44716 `PixioPatchEmbeddings.forward` already accepted `interpolate_pos_encoding` but it was silently dropped \u2014 never passed from `PixioEmbeddings.forward` or `PixioModel.forward`, making the 
parameter effectively unusable\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44718", "created_at": "2026-03-14T23:57:14Z", "deletions": 3, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44718/files", "html_url": "https://github.com/huggingface/transformers/pull/44718", "labels": [], "merged": false, "number": 44718, "review_comments_count": 0, "state": "closed", "title": "Fix: propagate interpolate_pos_encoding through PixioEmbeddings and PixioModel", "updated_at": "2026-03-15T17:58:58Z" }, { "additions": 15, "author": "ydshieh", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? As discussed internally, some component model classes didn't specify the correct config classes. This PR fixes them (those I could found - because the tiny model creation script fails due to those mistakes).", "changed_files": 7, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 4, "conversation_url": "https://github.com/huggingface/transformers/pull/44715", "created_at": "2026-03-14T21:11:52Z", "deletions": 2, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44715/files", "html_url": "https://github.com/huggingface/transformers/pull/44715", "labels": [], "merged": true, "number": 44715, "review_comments_count": 0, "state": "closed", "title": "Fix missing / incorrect `config` class in some model class definitions", "updated_at": "2026-03-15T11:19:51Z" }, { "additions": 181, "author": "LincolnBurrows2017", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "## Summary Fixes issue #44625: Qwen3.5 num_labels not propagating from core config to text_config. 
When calling `AutoConfig.from_pretrained(\"Qwen3.5\", num_labels=1)`, the main config gets `num_labels=1` but `text_config` still has default\u2026", "changed_files": 8, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44714", "created_at": "2026-03-14T20:42:46Z", "deletions": 26, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44714/files", "html_url": "https://github.com/huggingface/transformers/pull/44714", "labels": [], "merged": false, "number": 44714, "review_comments_count": 0, "state": "closed", "title": "fix: propagate num_labels to text_config for Qwen models", "updated_at": "2026-03-18T12:56:27Z" }, { "additions": 15, "author": "kulkarni-rohan", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "Applies the output tracing refactor to ColQwen2ForRetrieval as part of the broader effort tracked in issue #43979 to modernize output handling across all models in the library. Changes in both modular_colqwen2.py and modeling_colqwen2.py:\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44713", "created_at": "2026-03-14T20:20:14Z", "deletions": 28, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44713/files", "html_url": "https://github.com/huggingface/transformers/pull/44713", "labels": [], "merged": false, "number": 44713, "review_comments_count": 0, "state": "open", "title": "[ColQwen2] Refactor output tracing (issue #43979)", "updated_at": "2026-03-14T20:21:24Z" }, { "additions": 2, "author": "ydshieh", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? torch 2.11 is going to be released soon, but we still use 2.9. 
Let's update it to 2.10 so that we at least get a run with torch 2.10 before we update to torch 2.11 later.", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44712", "created_at": "2026-03-14T20:18:01Z", "deletions": 2, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44712/files", "html_url": "https://github.com/huggingface/transformers/pull/44712", "labels": [], "merged": true, "number": 44712, "review_comments_count": 0, "state": "closed", "title": "Update Nvidia CI docker file to use torch 2.10", "updated_at": "2026-03-14T20:29:04Z" }, { "additions": 339, "author": "anuq", "author_association": "NONE", "body_excerpt": "## What does this PR do? Fixes #35141. When `tie_word_embeddings=False`, calling `resize_token_embeddings()` creates a new `nn.Linear` for the LM head via `_get_resized_lm_head()`. The new module's weight and bias tensors do **not** carry\u2026", "changed_files": 4, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/44711", "created_at": "2026-03-14T19:21:21Z", "deletions": 205, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44711/files", "html_url": "https://github.com/huggingface/transformers/pull/44711", "labels": [ "Code agent slop" ], "merged": false, "number": 44711, "review_comments_count": 0, "state": "closed", "title": "fix: mark new lm_head params as `_is_hf_initialized` after `resize_token_embeddings`", "updated_at": "2026-03-20T13:36:58Z" }, { "additions": 12, "author": "he-yufeng", "author_association": "CONTRIBUTOR", "body_excerpt": "## What does this PR do? Fixes `AutoProcessor.from_pretrained` silently dropping hub kwargs like `force_download`, `cache_dir`, `token`, `revision`, etc. 
### The bug The existing code on line ~300 filters kwargs using `inspect.signature(ca\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 4, "conversation_url": "https://github.com/huggingface/transformers/pull/44710", "created_at": "2026-03-14T18:33:53Z", "deletions": 2, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44710/files", "html_url": "https://github.com/huggingface/transformers/pull/44710", "labels": [], "merged": true, "number": 44710, "review_comments_count": 0, "state": "closed", "title": "Fix AutoProcessor.from_pretrained silently dropping hub kwargs", "updated_at": "2026-03-25T18:13:14Z" }, { "additions": 6778, "author": "LucasMa2025", "author_association": "FIRST_TIMER", "body_excerpt": "# \ud83c\udf9b\ufe0f Add Configurable Generation Scheduler and State Machine for `generate()` ## Summary This PR introduces a **fully optional, zero-intrusion** Generation Scheduler (`GenerationScheduler`) and explicit state machine (`GenerationStateMachi\u2026", "changed_files": 15, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, "conversation_url": "https://github.com/huggingface/transformers/pull/44708", "created_at": "2026-03-14T17:13:34Z", "deletions": 7, "draft": true, "files_url": "https://github.com/huggingface/transformers/pull/44708/files", "html_url": "https://github.com/huggingface/transformers/pull/44708", "labels": [], "merged": false, "number": 44708, "review_comments_count": 0, "state": "closed", "title": "Add Configurable Generation Scheduler and State Machine for `generate()`", "updated_at": "2026-03-14T19:19:11Z" }, { "additions": 3, "author": "saivedant169", "author_association": "NONE", "body_excerpt": "Fixes part of #32937 ## What does this PR do? Adds `position_ids` as an explicit parameter to `MptForCausalLM.forward()` and `MptModel.forward()`, bringing MPT in line with other CausalLM models. 
Same rationale as the Bloom PR (#44706) \u2014 M\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44707", "created_at": "2026-03-14T17:12:16Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44707/files", "html_url": "https://github.com/huggingface/transformers/pull/44707", "labels": [ "Code agent slop" ], "merged": false, "number": 44707, "review_comments_count": 0, "state": "closed", "title": "Add position_ids to MptForCausalLM forward pass", "updated_at": "2026-03-18T13:39:36Z" }, { "additions": 3, "author": "saivedant169", "author_association": "NONE", "body_excerpt": "Fixes part of #32937 ## What does this PR do? Adds `position_ids` as an explicit parameter to `BloomForCausalLM.forward()` and `BloomModel.forward()`, bringing Bloom in line with other CausalLM models like Llama, Falcon, Gemma, and Mistral\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44706", "created_at": "2026-03-14T17:09:11Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44706/files", "html_url": "https://github.com/huggingface/transformers/pull/44706", "labels": [ "Code agent slop" ], "merged": false, "number": 44706, "review_comments_count": 0, "state": "closed", "title": "Add position_ids to BloomForCausalLM forward pass", "updated_at": "2026-03-18T13:39:51Z" }, { "additions": 14, "author": "saivedant169", "author_association": "NONE", "body_excerpt": "Fixes part of #32937 ## What does this PR do? 
RoFormer introduced rotary position embeddings, but its `ForCausalLM` forward method doesn't accept `position_ids` \u2014 which means callers can't specify custom positions for packed sequences or f\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44705", "created_at": "2026-03-14T16:48:06Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44705/files", "html_url": "https://github.com/huggingface/transformers/pull/44705", "labels": [ "Code agent slop" ], "merged": false, "number": 44705, "review_comments_count": 0, "state": "closed", "title": "Add position_ids to RoFormerForCausalLM forward pass", "updated_at": "2026-03-18T13:40:05Z" }, { "additions": 26, "author": "vasqu", "author_association": "MEMBER", "body_excerpt": "As per title, it seems that the `cute` subfolder can be even distributed if you only install FA2 which implies something wrong. Now we check under the (normalized) distribution names", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44703", "created_at": "2026-03-14T14:46:02Z", "deletions": 10, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44703/files", "html_url": "https://github.com/huggingface/transformers/pull/44703", "labels": [], "merged": true, "number": 44703, "review_comments_count": 1, "state": "closed", "title": "[`FA`] Fix fa detection", "updated_at": "2026-03-14T17:19:07Z" }, { "additions": 148, "author": "LincolnBurrows2017", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "## What does this PR fix? The `rms_norm_eps` parameter in `MistralConfig` was incorrectly typed as `int | None` but defaults to `1e-6` which is a float. 
This parameter is passed to `MistralRMSNorm` which expects `eps: float`. ### Bug Detai\u2026", "changed_files": 8, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44702", "created_at": "2026-03-14T14:41:15Z", "deletions": 25, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44702/files", "html_url": "https://github.com/huggingface/transformers/pull/44702", "labels": [ "Code agent slop" ], "merged": false, "number": 44702, "review_comments_count": 0, "state": "closed", "title": "fix: Correct rms_norm_eps type hint from int to float in MistralConfig", "updated_at": "2026-03-18T13:00:12Z" }, { "additions": 219, "author": "hmellor", "author_association": "MEMBER", "body_excerpt": "These models have `base_model_pp_plan`s but currently do not work because the base model's forward pass depends on all the `layers` being `Qwen2VLDecoderLayer`. i.e. if one of the layers is removed/replaced with `Identity`, `decoder_layer.\u2026", "changed_files": 52, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44699", "created_at": "2026-03-14T11:44:24Z", "deletions": 148, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44699/files", "html_url": "https://github.com/huggingface/transformers/pull/44699", "labels": [], "merged": true, "number": 44699, "review_comments_count": 0, "state": "closed", "title": "Fix several based models' pipeline parallel support", "updated_at": "2026-03-20T13:53:27Z" }, { "additions": 1, "author": "hmellor", "author_association": "MEMBER", "body_excerpt": "The typo in the `elif` chain meant that `image` and `video` modality encoders could not be set using this method. 
This PR fixes the typo so that they can.", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44698", "created_at": "2026-03-14T11:18:54Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44698/files", "html_url": "https://github.com/huggingface/transformers/pull/44698", "labels": [], "merged": true, "number": 44698, "review_comments_count": 0, "state": "closed", "title": "Fix `set_encoder`", "updated_at": "2026-03-14T13:42:00Z" }, { "additions": 75, "author": "LincolnBurrows2017", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "## Description The `torch_float` function in `src/transformers/utils/generic.py` was incorrectly returning `int(x)` in two places where it should return `float(x)`: 1. When torch is not available (fallback case) 2. When not in a tracing co\u2026", "changed_files": 4, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44697", "created_at": "2026-03-14T10:44:12Z", "deletions": 25, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44697/files", "html_url": "https://github.com/huggingface/transformers/pull/44697", "labels": [], "merged": false, "number": 44697, "review_comments_count": 1, "state": "open", "title": "fix: torch_float should return float, not int", "updated_at": "2026-03-17T19:29:02Z" }, { "additions": 19, "author": "hmellor", "author_association": "MEMBER", "body_excerpt": "In configs, `base_model_pp_plan` and `base_model_tp_plan` default to `None` In models, `_pp_plan` and `_tp_plan` _look like_ they default to `None` based on the class variables, but will actually always be a dict because of `post_init`. 
Th\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 4, "conversation_url": "https://github.com/huggingface/transformers/pull/44696", "created_at": "2026-03-14T09:41:07Z", "deletions": 13, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44696/files", "html_url": "https://github.com/huggingface/transformers/pull/44696", "labels": [], "merged": true, "number": 44696, "review_comments_count": 5, "state": "closed", "title": "Fix `supports_{tp/pp}_plan`", "updated_at": "2026-03-31T13:12:56Z" }, { "additions": 4, "author": "harshaljanjani", "author_association": "CONTRIBUTOR", "body_excerpt": "### What does this PR do? The following failing tests were identified and fixed in this PR: \u2192 **Kyutai Speech-To-Text**: [The PR [processors] Unbloating simple processors](https://github.com/huggingface/transformers/pull/40377), [refactore\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 6, "conversation_url": "https://github.com/huggingface/transformers/pull/44695", "created_at": "2026-03-14T09:05:35Z", "deletions": 4, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44695/files", "html_url": "https://github.com/huggingface/transformers/pull/44695", "labels": [], "merged": true, "number": 44695, "review_comments_count": 3, "state": "closed", "title": "fix(testing): Fix Kyutai Speech-To-Text and LongCatFlash test failures on main CI", "updated_at": "2026-04-18T08:25:15Z" }, { "additions": 143, "author": "LincolnBurrows2017", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "## Summary Fixes issue #44625: Qwen3.5 num_labels not propagated from core config to text config. 
When loading `AutoConfig.from_pretrained(\"Qwen3.5\", num_labels=1)`, the outer config gets `num_labels=1` but the inner `text_config` still ha\u2026", "changed_files": 7, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44693", "created_at": "2026-03-14T05:43:00Z", "deletions": 30, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44693/files", "html_url": "https://github.com/huggingface/transformers/pull/44693", "labels": [], "merged": false, "number": 44693, "review_comments_count": 0, "state": "closed", "title": "fix: Propagate num_labels to text_config in Qwen3.5", "updated_at": "2026-03-18T12:56:25Z" }, { "additions": 18, "author": "gambletan", "author_association": "NONE", "body_excerpt": "## Summary Fixes #44514. `Qwen2_5_VLProcessor.apply_chat_template` crashes with `ValueError` when called with batched input and `padding=False` (the default). The root cause is `np.array(text_inputs[\"input_ids\"])` which fails when sequence\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44692", "created_at": "2026-03-14T04:14:38Z", "deletions": 10, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44692/files", "html_url": "https://github.com/huggingface/transformers/pull/44692", "labels": [ "Code agent slop" ], "merged": false, "number": 44692, "review_comments_count": 0, "state": "closed", "title": "fix: handle ragged input_ids in Qwen2_5_VLProcessor.apply_chat_template", "updated_at": "2026-03-18T12:44:18Z" }, { "additions": 23, "author": "gambletan", "author_association": "NONE", "body_excerpt": "## Summary - Fixes `num_labels` (and `id2label`/`label2id`) not being propagated from the outer `Qwen3_5Config` to its inner `text_config` when passed via 
`AutoConfig.from_pretrained(..., num_labels=1)`. - When `text_config` is `None` or a\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44691", "created_at": "2026-03-14T04:10:54Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44691/files", "html_url": "https://github.com/huggingface/transformers/pull/44691", "labels": [ "Code agent slop" ], "merged": false, "number": 44691, "review_comments_count": 0, "state": "closed", "title": "Fix Qwen3.5 num_labels not propagated to text_config", "updated_at": "2026-03-18T12:57:19Z" }, { "additions": 6, "author": "gambletan", "author_association": "NONE", "body_excerpt": "## Summary Fixes #44360 The `GlmMoeDsaIndexer` is missing a ReLU activation on the per-head dot-product scores before the weighted sum across heads. The reference DeepSeek V3.2 implementation applies ReLU inside the `fp8_index` kernel: ```\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44690", "created_at": "2026-03-14T03:44:37Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44690/files", "html_url": "https://github.com/huggingface/transformers/pull/44690", "labels": [ "Code agent slop" ], "merged": false, "number": 44690, "review_comments_count": 0, "state": "closed", "title": "Fix missing ReLU in GLM-MOE-DSA indexer scoring", "updated_at": "2026-03-18T12:40:23Z" }, { "additions": 141, "author": "LincolnBurrows2017", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "## Summary Fixes issue #44625: Qwen3.5 num_labels not propagating to text_config. 
When calling `AutoConfig.from_pretrained(\"Qwen3.5\", num_labels=1)`, the main config gets `num_labels=1` but text_config still has default `num_labels=2`. Thi\u2026", "changed_files": 6, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44688", "created_at": "2026-03-14T00:40:50Z", "deletions": 23, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44688/files", "html_url": "https://github.com/huggingface/transformers/pull/44688", "labels": [ "Code agent slop" ], "merged": false, "number": 44688, "review_comments_count": 0, "state": "closed", "title": "fix: Propagate num_labels to text_config in Qwen models", "updated_at": "2026-03-18T12:56:41Z" }, { "additions": 8, "author": "vxa8502", "author_association": "NONE", "body_excerpt": "Fixes partial #32937 Adds explicit `position_ids` threading through GPT-Neo's attention layers to enable flash attention's packed sequence optimization. ## Context GPT-Neo uses learned absolute position embeddings (`wpe`) applied at the mo\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44687", "created_at": "2026-03-13T23:28:55Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44687/files", "html_url": "https://github.com/huggingface/transformers/pull/44687", "labels": [ "Code agent slop" ], "merged": false, "number": 44687, "review_comments_count": 0, "state": "closed", "title": "Add explicit position_ids to GPT-Neo attention layers", "updated_at": "2026-03-18T13:06:49Z" }, { "additions": 615, "author": "tejasae-afk", "author_association": "NONE", "body_excerpt": "During an automated code review of src/transformers/models/marian/convert_marian_to_pytorch.py, the following issue was identified. 
Use safe_load in convert marian to pytorch. yaml.load on untrusted input can construct arbitrary Python obj\u2026", "changed_files": 80, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44686", "created_at": "2026-03-13T21:22:07Z", "deletions": 259, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44686/files", "html_url": "https://github.com/huggingface/transformers/pull/44686", "labels": [], "merged": false, "number": 44686, "review_comments_count": 0, "state": "closed", "title": "Use safe_load in convert marian to pytorch", "updated_at": "2026-03-14T03:54:31Z" }, { "additions": 10, "author": "ydshieh", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? For the tiny model creation script - newly added model test files still miss this argument ...", "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44685", "created_at": "2026-03-13T20:53:41Z", "deletions": 3, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44685/files", "html_url": "https://github.com/huggingface/transformers/pull/44685", "labels": [], "merged": true, "number": 44685, "review_comments_count": 0, "state": "closed", "title": "Fix more model tester missing `parent` issue", "updated_at": "2026-03-13T21:03:46Z" }, { "additions": 41, "author": "ntenenz", "author_association": "CONTRIBUTOR", "body_excerpt": "\u2026 # What does this PR do? In torch versions >= 2.9.0, it requests the lse from flex_attention using `AuxRequest` instead of the deprecated `return_lse`, which triggers a warning and can break tracing. 
Fixes #44683 ## Before submitting - [\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/44684", "created_at": "2026-03-13T20:16:35Z", "deletions": 5, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44684/files", "html_url": "https://github.com/huggingface/transformers/pull/44684", "labels": [], "merged": true, "number": 44684, "review_comments_count": 8, "state": "closed", "title": "update flex attention to use `return_aux` instead of `return_lse` when torch version >= 2.9", "updated_at": "2026-03-18T11:44:18Z" }, { "additions": 301, "author": "SunMarc", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Llama cpp integration in transformers serve. Minor changes to add llama.cpp integration Mostly changes on serve to fix latency for streaming and non streaming", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44682", "created_at": "2026-03-13T18:52:41Z", "deletions": 73, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44682/files", "html_url": "https://github.com/huggingface/transformers/pull/44682", "labels": [], "merged": false, "number": 44682, "review_comments_count": 0, "state": "open", "title": "transformers serve + llamacpp", "updated_at": "2026-03-14T07:05:29Z" }, { "additions": 47, "author": "dacorvo", "author_association": "MEMBER", "body_excerpt": "Fixes #44679 ## Summary - Custom attention kernels registered via `load_and_register_attn_kernel` currently get hardcoded `flash_attention_2` mask dispatch, which produces 2D or `None` masks - Kernels that need SDPA-style 4D boolean masks\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": 
"https://github.com/huggingface/transformers/pull/44680", "created_at": "2026-03-13T17:55:54Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44680/files", "html_url": "https://github.com/huggingface/transformers/pull/44680", "labels": [], "merged": false, "number": 44680, "review_comments_count": 12, "state": "open", "title": "Allow kernel modules to declare their preferred mask function", "updated_at": "2026-04-14T19:29:06Z" }, { "additions": 9, "author": "JokeYoonic", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "Problem: - On macOS ARM64 + Python 3.13 + transformers 5.x, GPT-2 model's lm_head forward pass produces NaN/Inf values during inference - Root cause: lm_head.weight is tied to transformer.wte.weight, and the shared memory reference causes\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44676", "created_at": "2026-03-13T16:28:01Z", "deletions": 2, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44676/files", "html_url": "https://github.com/huggingface/transformers/pull/44676", "labels": [], "merged": false, "number": 44676, "review_comments_count": 0, "state": "open", "title": "fix(gpt2): Resolve NaN/Inf issue in lm_head on Python 3.13 with tied weights", "updated_at": "2026-03-18T17:16:49Z" }, { "additions": 32, "author": "stevhliu", "author_association": "MEMBER", "body_excerpt": "properly formats the `ContinuousBatchingConfig` below: \"Screenshot", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44675", "created_at": "2026-03-13T16:10:28Z", "deletions": 14, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44675/files", "html_url": 
"https://github.com/huggingface/transformers/pull/44675", "labels": [], "merged": true, "number": 44675, "review_comments_count": 0, "state": "closed", "title": "[docs] cb config", "updated_at": "2026-03-13T23:15:04Z" }, { "additions": 408, "author": "Rocketknight1", "author_association": "MEMBER", "body_excerpt": "We've had `parse_response()` in the library for a while, but it's been a soft launch / prototype feature. This PR cleans it up and documents it, making it an official feature! The API is largely unchanged from the prototype, but we drop `x\u2026", "changed_files": 5, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 4, "conversation_url": "https://github.com/huggingface/transformers/pull/44674", "created_at": "2026-03-13T15:41:42Z", "deletions": 34, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44674/files", "html_url": "https://github.com/huggingface/transformers/pull/44674", "labels": [], "merged": true, "number": 44674, "review_comments_count": 11, "state": "closed", "title": "Officially launch parse_response", "updated_at": "2026-03-24T15:55:05Z" }, { "additions": 73, "author": "remi-or", "author_association": "MEMBER", "body_excerpt": "This PR fixes a bug in continuous batching where non-CUDA devices cannot use the feature because some CUDA-exclusive objects are always instantiated. 
It also adds a test to make sure this will not break again in the future.", "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44673", "created_at": "2026-03-13T15:37:01Z", "deletions": 15, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44673/files", "html_url": "https://github.com/huggingface/transformers/pull/44673", "labels": [], "merged": true, "number": 44673, "review_comments_count": 0, "state": "closed", "title": "[CB] [Bug] Fix crashes when running without cuda", "updated_at": "2026-03-15T23:59:55Z" }, { "additions": 1, "author": "neo", "author_association": "CONTRIBUTOR", "body_excerpt": "# What does this PR do? modular doesn't properly convert some files (e.g. kyutai) Also fixes red CI on main", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44283", "created_at": "2026-02-25T18:33:17Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44283/files", "html_url": "https://github.com/huggingface/transformers/pull/44283", "labels": [], "merged": true, "number": 44283, "review_comments_count": 0, "state": "closed", "title": "[`Modular`] Fix file type regression", "updated_at": "2026-02-25T20:04:41Z" }, { "additions": 5, "author": "Rocketknight1", "author_association": "MEMBER", "body_excerpt": "Response schema save-loading was broken in #40936, this PR restores it! 
I did most of this in #42300 but missed an issue with loading/saving.", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44282", "created_at": "2026-02-25T17:57:54Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44282/files", "html_url": "https://github.com/huggingface/transformers/pull/44282", "labels": [], "merged": true, "number": 44282, "review_comments_count": 0, "state": "closed", "title": "Restore response_schema saving-loading", "updated_at": "2026-02-25T18:27:22Z" }, { "additions": 1, "author": "ArthurZucker", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Its a very small fix for #44062", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44281", "created_at": "2026-02-25T16:28:37Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44281/files", "html_url": "https://github.com/huggingface/transformers/pull/44281", "labels": [], "merged": true, "number": 44281, "review_comments_count": 0, "state": "closed", "title": "Fix special token maps BC", "updated_at": "2026-02-26T10:34:17Z" }, { "additions": 614, "author": "RishabhMehra", "author_association": "FIRST_TIMER", "body_excerpt": "# What does this PR do? - Adds an opt-in use_fast_grouping flag to TokenClassificationPipeline to enable a NumPy-vectorised BIO grouping path (~5\u00d7 faster on long sequences) while keeping the legacy path as default. 
- Improves correctness:\u2026", "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, "conversation_url": "https://github.com/huggingface/transformers/pull/44278", "created_at": "2026-02-25T12:49:56Z", "deletions": 63, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44278/files", "html_url": "https://github.com/huggingface/transformers/pull/44278", "labels": [ "Code agent slop" ], "merged": false, "number": 44278, "review_comments_count": 0, "state": "closed", "title": "[FEAT] Pipelines - Faster group_entities", "updated_at": "2026-02-25T13:54:58Z" }, { "additions": 105, "author": "tarekziade", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? This patch makes the GLM-ASR doc example runnable by using `runnables` - see https://github.com/huggingface/doc-builder/blob/main/docs/runnable-code-blocks.md", "changed_files": 5, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 36, "conversation_url": "https://github.com/huggingface/transformers/pull/44277", "created_at": "2026-02-25T08:49:20Z", "deletions": 19, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44277/files", "html_url": "https://github.com/huggingface/transformers/pull/44277", "labels": [], "merged": true, "number": 44277, "review_comments_count": 6, "state": "closed", "title": "Use doc-builder runnable example for GLM-ASR", "updated_at": "2026-04-02T16:16:55Z" }, { "additions": 0, "author": "vishalpatil-45", "author_association": "NONE", "body_excerpt": "# What does this PR do? This PR addresses the performance regression where `import transformers` takes ~3.5s. The issue was caused by eager imports of heavy backend libraries (like torch/numpy) during the initial module load. 
By moving the\u2026", "changed_files": 0, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44275", "created_at": "2026-02-25T08:27:32Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44275/files", "html_url": "https://github.com/huggingface/transformers/pull/44275", "labels": [ "Code agent slop" ], "merged": false, "number": 44275, "review_comments_count": 0, "state": "closed", "title": "[Fix] Restore lazy loading to improve import performance (#44273)", "updated_at": "2026-02-25T20:37:18Z" }, { "additions": 559, "author": "paipeline", "author_association": "NONE", "body_excerpt": "## Description Fixes #44242 This PR resolves an issue where the auxiliary load balancing loss was not computed when `output_router_logits=False`, even when `router_aux_loss_coef != 0`. ## Problem The auxiliary loss computation was incorrec\u2026", "changed_files": 6, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44274", "created_at": "2026-02-25T06:38:02Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44274/files", "html_url": "https://github.com/huggingface/transformers/pull/44274", "labels": [ "Code agent slop" ], "merged": false, "number": 44274, "review_comments_count": 0, "state": "closed", "title": "Fix auxiliary load balancing loss computation when output_router_logits=False", "updated_at": "2026-02-25T13:36:03Z" }, { "additions": 1, "author": "hangjun-ezra", "author_association": "CONTRIBUTOR", "body_excerpt": "## What does this PR do? Fixes a `TypeError: unsupported operand type(s) for |: 'list' and 'set'` in `RotaryEmbeddingConfigMixin.convert_rope_params_to_dict` when `ignore_keys_at_rope_validation` is a `list` instead of a `set`. 
### Root ca\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44272", "created_at": "2026-02-25T03:52:04Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44272/files", "html_url": "https://github.com/huggingface/transformers/pull/44272", "labels": [], "merged": true, "number": 44272, "review_comments_count": 0, "state": "closed", "title": "Fix TypeError in convert_rope_params_to_dict when ignore_keys is a list", "updated_at": "2026-02-25T14:38:36Z" }, { "additions": 1272, "author": "balak4", "author_association": "CONTRIBUTOR", "body_excerpt": "## Summary - Add GreedyLR, a metric-based adaptive learning rate scheduler that adjusts the learning rate during training based on the current loss - Based on [\"Dynamic Learning Rate Scheduling based on Loss Changes Leads to Faster Converg\u2026", "changed_files": 10, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 9, "conversation_url": "https://github.com/huggingface/transformers/pull/44271", "created_at": "2026-02-25T01:40:57Z", "deletions": 7, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44271/files", "html_url": "https://github.com/huggingface/transformers/pull/44271", "labels": [], "merged": true, "number": 44271, "review_comments_count": 3, "state": "closed", "title": "Add GreedyLR adaptive learning rate scheduler", "updated_at": "2026-03-18T18:45:46Z" }, { "additions": 88, "author": "yonigozlan", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? A lot of ProcessorsKwargs have incorrect/unspecified type hints in their ProcessorsKwargs TypedDict for their images_kwargs attribute. 
Functionally, this did not cause issues as \"_merge_kwargs\" automatically picks u\u2026", "changed_files": 44, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44270", "created_at": "2026-02-25T00:11:31Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44270/files", "html_url": "https://github.com/huggingface/transformers/pull/44270", "labels": [], "merged": false, "number": 44270, "review_comments_count": 0, "state": "open", "title": "Add correct typing to custom images_kwargs in ProcessorsKwargs", "updated_at": "2026-02-25T01:12:06Z" }, { "additions": 30, "author": "yonigozlan", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? This is a follow-up to https://github.com/huggingface/transformers/pull/43748, and will allow having clickable links to the full modality kwargs when present in the docstring of a processor or image processor Cc @s\u2026", "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44269", "created_at": "2026-02-25T00:05:47Z", "deletions": 2, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44269/files", "html_url": "https://github.com/huggingface/transformers/pull/44269", "labels": [], "merged": true, "number": 44269, "review_comments_count": 0, "state": "closed", "title": "Add `ProcessingKwargs` `ImagesKwargs` etc. to docs", "updated_at": "2026-02-27T19:03:15Z" }, { "additions": 5, "author": "ethanknights", "author_association": "CONTRIBUTOR", "body_excerpt": "# What does this PR do? Some improvements to the `trainer.py` docs. ## Before submitting - [x] This PR fixes a typo or improves the docs. ## Who can review? 
Documentation: @stevhliu", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44268", "created_at": "2026-02-24T23:20:16Z", "deletions": 4, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44268/files", "html_url": "https://github.com/huggingface/transformers/pull/44268", "labels": [], "merged": true, "number": 44268, "review_comments_count": 0, "state": "closed", "title": "chore: fixes in `Trainer` class docs (`compute_loss` & `hyperparameter_search`)", "updated_at": "2026-02-26T00:50:23Z" }, { "additions": 4, "author": "manavshrivastavagit", "author_association": "NONE", "body_excerpt": "## Summary - Update the `DocumentQuestionAnsweringPipeline` docstring to explicitly mention the task summary in the Transformers documentation. - Remove the stale TODO comment now that document question answering is covered in the task sum\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44267", "created_at": "2026-02-24T20:35:18Z", "deletions": 4, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44267/files", "html_url": "https://github.com/huggingface/transformers/pull/44267", "labels": [ "Code agent slop" ], "merged": false, "number": 44267, "review_comments_count": 0, "state": "closed", "title": "Docs: point DocumentQuestionAnswering pipeline to task summary", "updated_at": "2026-02-25T13:34:48Z" }, { "additions": 27, "author": "harshaljanjani", "author_association": "CONTRIBUTOR", "body_excerpt": "### What does this PR do? The following issue was identified and fixed in this PR: \u2192 **Reasoning:** The impact of this fix goes beyond `Mask2Former` and `DeformableDetr` and should fix any model that uses `torch_compilable_check`. 
Most use\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 8, "conversation_url": "https://github.com/huggingface/transformers/pull/44266", "created_at": "2026-02-24T20:02:06Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44266/files", "html_url": "https://github.com/huggingface/transformers/pull/44266", "labels": [], "merged": true, "number": 44266, "review_comments_count": 0, "state": "closed", "title": "fix(utils): Make torch_compilable_check compatible with torch.export strict mode", "updated_at": "2026-04-18T08:31:33Z" }, { "additions": 90, "author": "vasqu", "author_association": "MEMBER", "body_excerpt": "As per title, WIP --> needs a test", "changed_files": 36, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 4, "conversation_url": "https://github.com/huggingface/transformers/pull/44264", "created_at": "2026-02-24T18:06:58Z", "deletions": 210, "draft": true, "files_url": "https://github.com/huggingface/transformers/pull/44264/files", "html_url": "https://github.com/huggingface/transformers/pull/44264", "labels": [], "merged": false, "number": 44264, "review_comments_count": 3, "state": "open", "title": "[`Moe`] Enable aux loss automatically when in training + coef is not 0", "updated_at": "2026-02-25T18:53:20Z" }, { "additions": 5882, "author": "SunMarc", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? This PR refactors the common tests that we have in Trainer. I've mainly done the following: - Split the tests that we have in `test_trainer.py` into multiple files. 
- Fix common tests that were failing in the CI", "changed_files": 18, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44260", "created_at": "2026-02-24T15:51:11Z", "deletions": 6147, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44260/files", "html_url": "https://github.com/huggingface/transformers/pull/44260", "labels": [], "merged": true, "number": 44260, "review_comments_count": 3, "state": "closed", "title": "Update common tests Trainer", "updated_at": "2026-02-27T17:31:59Z" }, { "additions": 1830, "author": "winglian", "author_association": "COLLABORATOR", "body_excerpt": "# What does this PR do? This PR supersedes #43985 to replace the dataset/sampler/dataloader with a data producer that should allow us to more easily get to the next step of async training for RL. \"\". Then we compare `\"\" != \"LlamaTokenizer\"` (the `tokenizer_class` in `tokenizer_config.json`). Since that's true we earl\u2026", "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 5, "conversation_url": "https://github.com/huggingface/transformers/pull/44127", "created_at": "2026-02-18T10:41:48Z", "deletions": 8, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44127/files", "html_url": "https://github.com/huggingface/transformers/pull/44127", "labels": [], "merged": true, "number": 44127, "review_comments_count": 0, "state": "closed", "title": "AutoTokenizer ignores config when model_type is None", "updated_at": "2026-02-18T14:47:52Z" }, { "additions": 17, "author": "Cyrilvallez", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? As per the title. 
Let's simplify after https://github.com/huggingface/transformers/pull/42848", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44126", "created_at": "2026-02-18T09:58:49Z", "deletions": 40, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44126/files", "html_url": "https://github.com/huggingface/transformers/pull/44126", "labels": [], "merged": true, "number": 44126, "review_comments_count": 0, "state": "closed", "title": "Simplify input preparation in generate", "updated_at": "2026-02-18T10:30:48Z" }, { "additions": 8, "author": "zucchini-nlp", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Fixes https://github.com/huggingface/transformers/issues/43986", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44125", "created_at": "2026-02-18T09:34:54Z", "deletions": 7, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44125/files", "html_url": "https://github.com/huggingface/transformers/pull/44125", "labels": [], "merged": true, "number": 44125, "review_comments_count": 2, "state": "closed", "title": "Raise informative error when loading video processors", "updated_at": "2026-02-20T08:23:35Z" }, { "additions": 10, "author": "mariam851", "author_association": "CONTRIBUTOR", "body_excerpt": "Description: Adds eval_on_end to TrainingArguments to force evaluation at the end of training, even if the last step doesn't align with eval_steps. Changes: training_args.py: Added eval_on_end field. 
trainer.py: Added logic to call evaluat\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, "conversation_url": "https://github.com/huggingface/transformers/pull/44124", "created_at": "2026-02-18T08:52:23Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44124/files", "html_url": "https://github.com/huggingface/transformers/pull/44124", "labels": [], "merged": false, "number": 44124, "review_comments_count": 0, "state": "closed", "title": "feat: add eval_on_end to Trainer for final evaluation", "updated_at": "2026-02-18T14:14:16Z" }, { "additions": 33, "author": "cyyever", "author_association": "CONTRIBUTOR", "body_excerpt": "# What does this PR do? This PR avoids device sync in training loss accumulation by ```torch.where```. The `is_torch_xla_available` condition is also removed.", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44123", "created_at": "2026-02-18T08:22:57Z", "deletions": 22, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44123/files", "html_url": "https://github.com/huggingface/transformers/pull/44123", "labels": [], "merged": false, "number": 44123, "review_comments_count": 0, "state": "open", "title": "Avoid device sync in training loss accumulation", "updated_at": "2026-03-30T07:57:16Z" }, { "additions": 158, "author": "adityuhkapoor", "author_association": "NONE", "body_excerpt": "# What does this PR do? Adds 4-bit embedding quantization for BitsAndBytes, mirroring TorchAO's existing `include_input_output_embeddings` and `untie_embedding_weights` pattern (PRs #37802, #37905, #37935). 
Large-vocabulary models (Llama 3\u2026", "changed_files": 4, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44122", "created_at": "2026-02-18T06:35:09Z", "deletions": 2, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44122/files", "html_url": "https://github.com/huggingface/transformers/pull/44122", "labels": [ "Code agent slop" ], "merged": false, "number": 44122, "review_comments_count": 0, "state": "closed", "title": "Add BnB 4-bit embedding quantization support", "updated_at": "2026-02-18T14:27:25Z" }, { "additions": 14, "author": "tirth8205", "author_association": "NONE", "body_excerpt": "Fixes #34920 After applying `normalize()`, images can have negative values. Calling `resize()` on such images fails because it internally converts to PIL, which requires values in [0, 1] or [0, 255]. ### Fix When the image has values outsi\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, "conversation_url": "https://github.com/huggingface/transformers/pull/44120", "created_at": "2026-02-17T23:56:48Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44120/files", "html_url": "https://github.com/huggingface/transformers/pull/44120", "labels": [ "Code agent slop" ], "merged": false, "number": 44120, "review_comments_count": 0, "state": "closed", "title": "fix: allow image_transforms.resize to handle negative values after normalization", "updated_at": "2026-02-18T14:08:54Z" }, { "additions": 1, "author": "tirth8205", "author_association": "NONE", "body_excerpt": "Fixes #44117 `TOKENIZER_MAPPING_NAMES.get(config_model_type, \"\")` returns `None` when the key exists with value `None`, causing `AttributeError: 'NoneType' object has no attribute 'replace'` when loading models like `google/siglip2-so400m-\u2026", "changed_files": 1, 
"cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/44119", "created_at": "2026-02-17T23:53:20Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44119/files", "html_url": "https://github.com/huggingface/transformers/pull/44119", "labels": [], "merged": false, "number": 44119, "review_comments_count": 0, "state": "closed", "title": "fix: handle None value from TOKENIZER_MAPPING_NAMES.get() in AutoTokenizer", "updated_at": "2026-02-18T14:04:47Z" }, { "additions": 32, "author": "tirth8205", "author_association": "NONE", "body_excerpt": "## Fix Fixes #44079 When a `ModelOutput` dataclass field is initialized as `None`, it is correctly excluded from the OrderedDict keys. However, **subsequently setting that field to a non-None value** via attribute assignment (e.g. `outputs\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 0, "conversation_url": "https://github.com/huggingface/transformers/pull/44118", "created_at": "2026-02-17T23:31:31Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44118/files", "html_url": "https://github.com/huggingface/transformers/pull/44118", "labels": [ "Code agent slop" ], "merged": false, "number": 44118, "review_comments_count": 0, "state": "closed", "title": "fix: ModelOutput keys not updated when setting previously-None dataclass fields", "updated_at": "2026-02-18T14:18:12Z" }, { "additions": 27, "author": "dtiourine", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "Migrate Flaubert to the @capture_outputs and @can_return_tuple decorator pattern for output handling, as part of #43979. # What does this PR do? 
- Add `_can_record_outputs = {\"attentions\": MultiHeadAttention}` on `FlaubertPreTrainedModel`\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44116", "created_at": "2026-02-17T21:52:13Z", "deletions": 102, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44116/files", "html_url": "https://github.com/huggingface/transformers/pull/44116", "labels": [], "merged": false, "number": 44116, "review_comments_count": 0, "state": "open", "title": "[WIP] [Flaubert] Refactor output tracing to decorator-based interface", "updated_at": "2026-02-17T21:53:23Z" }, { "additions": 2, "author": "Deep-unlearning", "author_association": "MEMBER", "body_excerpt": "## Summary - Fix broken `[chat template](./chat_templating)` links in `docs/source/en/tasks/` - `./chat_templating` resolves within `tasks/` (doesn't exist); corrected to `../chat_templating` - Affected files: `tasks/image_text_to_text.md`\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44115", "created_at": "2026-02-17T21:32:55Z", "deletions": 2, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44115/files", "html_url": "https://github.com/huggingface/transformers/pull/44115", "labels": [], "merged": true, "number": 44115, "review_comments_count": 0, "state": "closed", "title": "[docs] fix broken chat_templating links in tasks docs", "updated_at": "2026-02-23T16:27:57Z" }, { "additions": 716, "author": "23atharvaS", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "## Summary This PR migrates the `wav2vec2` family to the standardized output-capturing interface (`@capture_outputs` + `@can_return_tuple`) and includes follow-up compatibility fixes required to make full CI green. 
## What changed ### Core\u2026", "changed_files": 19, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/44114", "created_at": "2026-02-17T21:17:35Z", "deletions": 1237, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44114/files", "html_url": "https://github.com/huggingface/transformers/pull/44114", "labels": [], "merged": false, "number": 44114, "review_comments_count": 0, "state": "open", "title": "Migrate wav2vec2, wav2vec2_conformer, and wav2vec2_bert to standardized output collection decorators", "updated_at": "2026-02-18T20:34:53Z" }, { "additions": 5, "author": "harshaljanjani", "author_association": "CONTRIBUTOR", "body_excerpt": "### What does this PR do? The following issue was identified and fixed in this PR: \u2192 Updates the stale `test_device_override` in `test_processing_granite_speech.py` to verify that the device param controls where speech inputs are placed, r\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/44113", "created_at": "2026-02-17T20:01:32Z", "deletions": 7, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44113/files", "html_url": "https://github.com/huggingface/transformers/pull/44113", "labels": [], "merged": true, "number": 44113, "review_comments_count": 2, "state": "closed", "title": "fix(testing): Update stale device override test in GraniteSpeech", "updated_at": "2026-04-18T08:32:21Z" }, { "additions": 30, "author": "fumadari", "author_association": "NONE", "body_excerpt": "## Summary - Part of #43979 \u2014 refactors `poolformer` to use the `capture_outputs`, `can_return_tuple`, and `merge_with_config_defaults` decorators - Simplifies `PoolFormerLayer` to return a single tensor instead of a 1-tuple - Simplifies `\u2026", "changed_files": 
1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 4, "conversation_url": "https://github.com/huggingface/transformers/pull/44111", "created_at": "2026-02-17T19:38:02Z", "deletions": 59, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44111/files", "html_url": "https://github.com/huggingface/transformers/pull/44111", "labels": [], "merged": false, "number": 44111, "review_comments_count": 0, "state": "closed", "title": "refactor(poolformer): use capture_outputs for output tracing", "updated_at": "2026-02-18T21:19:22Z" }, { "additions": 28, "author": "fumadari", "author_association": "NONE", "body_excerpt": "## Summary - Part of #43979 \u2014 refactors `tvp` to use the `capture_outputs`, `can_return_tuple`, and `merge_with_config_defaults` decorators - Simplifies `TvpAttention` to always return `(output, attention_probs)` (hooks decide what to capt\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/44110", "created_at": "2026-02-17T19:32:55Z", "deletions": 101, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44110/files", "html_url": "https://github.com/huggingface/transformers/pull/44110", "labels": [], "merged": false, "number": 44110, "review_comments_count": 0, "state": "closed", "title": "refactor(tvp): use capture_outputs for output tracing", "updated_at": "2026-02-18T21:19:24Z" }, { "additions": 48, "author": "fumadari", "author_association": "NONE", "body_excerpt": "## Summary - Part of #43979 \u2014 refactors `hgnet_v2` to use the `capture_outputs` and `merge_with_config_defaults` decorators - Simplifies `HGNetV2Encoder` by removing `return_dict` parameter (always returns `BaseModelOutputWithNoAttention`)\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": 
"https://github.com/huggingface/transformers/pull/44109", "created_at": "2026-02-17T19:23:03Z", "deletions": 87, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44109/files", "html_url": "https://github.com/huggingface/transformers/pull/44109", "labels": [], "merged": false, "number": 44109, "review_comments_count": 0, "state": "closed", "title": "refactor(hgnet_v2): use capture_outputs for output tracing", "updated_at": "2026-02-18T21:19:25Z" }, { "additions": 33, "author": "fumadari", "author_association": "NONE", "body_excerpt": "## Summary - Adds `@merge_with_config_defaults` and `@capture_outputs` to both `VitDetModel` and `VitDetBackbone`, removing manual `output_attentions`/`return_dict` resolution - Adds `_can_record_outputs = {\"attentions\": VitDetAttention}`\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/44108", "created_at": "2026-02-17T19:15:00Z", "deletions": 82, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44108/files", "html_url": "https://github.com/huggingface/transformers/pull/44108", "labels": [], "merged": false, "number": 44108, "review_comments_count": 0, "state": "closed", "title": "refactor(vitdet): use output tracing decorators", "updated_at": "2026-02-18T21:19:27Z" }, { "additions": 40, "author": "fumadari", "author_association": "NONE", "body_excerpt": "## Summary - Replaces manual `output_hidden_states`/`return_dict` resolution in `MraModel` with `@merge_with_config_defaults` and `@capture_outputs` decorators - Simplifies `MraEncoder` to a plain loop returning a single tensor, removing `\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/44107", "created_at": "2026-02-17T19:04:42Z", "deletions": 112, "draft": false, 
"files_url": "https://github.com/huggingface/transformers/pull/44107/files", "html_url": "https://github.com/huggingface/transformers/pull/44107", "labels": [], "merged": false, "number": 44107, "review_comments_count": 0, "state": "closed", "title": "refactor(mra): use output tracing decorators", "updated_at": "2026-02-18T21:19:29Z" }, { "additions": 47, "author": "fumadari", "author_association": "NONE", "body_excerpt": "## Summary - Replace manual `hidden_states`/`attentions` collection in `YosoEncoder` with the `@capture_outputs` decorator and forward hooks - Add `@can_return_tuple` to all 5 wrapper model classes, eliminating manual `return_dict` handlin\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/44106", "created_at": "2026-02-17T18:59:25Z", "deletions": 132, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44106/files", "html_url": "https://github.com/huggingface/transformers/pull/44106", "labels": [], "merged": false, "number": 44106, "review_comments_count": 0, "state": "closed", "title": "Refactor yoso to use automatic output tracing", "updated_at": "2026-02-18T21:19:30Z" }, { "additions": 39, "author": "fumadari", "author_association": "NONE", "body_excerpt": "## Summary - Replace manual `hidden_states`/`attentions` collection in `LiltEncoder` with the `@capture_outputs` decorator and forward hooks - Add `@can_return_tuple` to all 3 wrapper model classes, eliminating manual `return_dict` handlin\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/44105", "created_at": "2026-02-17T18:54:40Z", "deletions": 127, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44105/files", "html_url": 
"https://github.com/huggingface/transformers/pull/44105", "labels": [], "merged": false, "number": 44105, "review_comments_count": 0, "state": "closed", "title": "Refactor lilt to use automatic output tracing", "updated_at": "2026-02-18T21:19:32Z" }, { "additions": 66, "author": "fumadari", "author_association": "NONE", "body_excerpt": "## Summary - Replace manual `hidden_states`/`attentions`/`cross_attentions` collection in `MegatronBertEncoder` with the `@capture_outputs` decorator and forward hooks - Add `@can_return_tuple` to all 8 wrapper model classes, eliminating m\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/44104", "created_at": "2026-02-17T18:43:44Z", "deletions": 207, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44104/files", "html_url": "https://github.com/huggingface/transformers/pull/44104", "labels": [], "merged": false, "number": 44104, "review_comments_count": 0, "state": "closed", "title": "Refactor megatron_bert to use automatic output tracing", "updated_at": "2026-02-18T21:19:34Z" }, { "additions": 53, "author": "engmohamedsalah", "author_association": "NONE", "body_excerpt": "Fixes #44052 Now and then, the indexer ran into trouble switching between masks and cache. 
Most of the test failures came from these hiccups: - Indexer cache: the old if seq_len > 1: reset cache heuristic broke assisted decoding (multi-tok\u2026", "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44103", "created_at": "2026-02-17T18:04:48Z", "deletions": 76, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44103/files", "html_url": "https://github.com/huggingface/transformers/pull/44103", "labels": [], "merged": false, "number": 44103, "review_comments_count": 0, "state": "closed", "title": "Fix glm_moe_dsa", "updated_at": "2026-02-18T19:38:11Z" }, { "additions": 42, "author": "fumadari", "author_association": "NONE", "body_excerpt": "## Summary Refactors the `ibert` model to use the new `@capture_outputs` and `@can_return_tuple` decorators for output tracing, as part of the meta-issue #43979. **Key changes:** - Added `_can_record_outputs = {\"hidden_states\": IBertLayer,\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/44102", "created_at": "2026-02-17T17:21:32Z", "deletions": 154, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44102/files", "html_url": "https://github.com/huggingface/transformers/pull/44102", "labels": [], "merged": false, "number": 44102, "review_comments_count": 0, "state": "closed", "title": "Refactor ibert output tracing with capture_outputs", "updated_at": "2026-02-18T21:19:35Z" }, { "additions": 210, "author": "aman-coder03", "author_association": "FIRST_TIME_CONTRIBUTOR", "body_excerpt": "## What does this PR do? This PR refactors XLM's output tracing to align with the standardized output capturing patterns used across the codebase. 
### Key changes: - Refactors transformer blocks into a dedicated `XLMLayer` module to enable\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/44101", "created_at": "2026-02-17T17:15:06Z", "deletions": 194, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44101/files", "html_url": "https://github.com/huggingface/transformers/pull/44101", "labels": [], "merged": false, "number": 44101, "review_comments_count": 0, "state": "open", "title": "[XLM] Refactor output tracing to align with capture_outputs standardized architecture", "updated_at": "2026-02-19T08:08:33Z" }, { "additions": 3, "author": "qgallouedec", "author_association": "MEMBER", "body_excerpt": "In https://github.com/huggingface/trl/pull/5112 a user reported that `trl sft --help` fails. It's because three inherited args from `TrainingArguments` (`torch_empty_cache_steps`, `gradient_checkpointing` and `use_liger_kernel`) help strings\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/44100", "created_at": "2026-02-17T17:10:36Z", "deletions": 3, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/44100/files", "html_url": "https://github.com/huggingface/transformers/pull/44100", "labels": [], "merged": true, "number": 44100, "review_comments_count": 0, "state": "closed", "title": "Fix percentage formatting in help messages for gradient checkpointing, Liger Kernel, and empty cache steps", "updated_at": "2026-02-20T09:57:51Z" }, { "additions": 2, "author": "qgallouedec", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? 
## Related Issue Fixes #40170 **Issue:** Add MXFP4 MoE/attention backward kernels **URL:** https://github.com/huggingface/transformers/issues/40170 ## Problem ## A Call To Action! The Hugg\u2026", "changed_files": 6, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 7, "conversation_url": "https://github.com/huggingface/transformers/pull/43771", "created_at": "2026-02-05T15:12:21Z", "deletions": 4, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43771/files", "html_url": "https://github.com/huggingface/transformers/pull/43771", "labels": [ "Code agent slop" ], "merged": false, "number": 43771, "review_comments_count": 0, "state": "closed", "title": "fix: Add MXFP4 MoE/attention backward kernels", "updated_at": "2026-03-24T14:14:44Z" }, { "additions": 47, "author": "lordaarush", "author_association": "CONTRIBUTOR", "body_excerpt": "# What does this PR do? Removes the unconditional `self.state.train_batch_size = self._train_batch_size` assignment that was causing issues when resuming from checkpoint with different batch configurations. The `train_batch_size` should on\u2026", "changed_files": 2, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 7, "conversation_url": "https://github.com/huggingface/transformers/pull/43770", "created_at": "2026-02-05T14:25:36Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43770/files", "html_url": "https://github.com/huggingface/transformers/pull/43770", "labels": [], "merged": true, "number": 43770, "review_comments_count": 0, "state": "closed", "title": "Remove unconditional train_batch_size assignment", "updated_at": "2026-02-06T14:47:16Z" }, { "additions": 3950, "author": "eustlb", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Adds voxtral realtime! 
## benchmarks Using [this reproducer](https://gist.github.com/eustlb/367f062f77a5971291fb5350763bea8d), I've run WER evals on ami, librispeech and fleurs, with results Dataset | Original (vllm\u2026", "changed_files": 21, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 4, "conversation_url": "https://github.com/huggingface/transformers/pull/43769", "created_at": "2026-02-05T14:17:52Z", "deletions": 2, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43769/files", "html_url": "https://github.com/huggingface/transformers/pull/43769", "labels": [ "New model", "Audio" ], "merged": true, "number": 43769, "review_comments_count": 39, "state": "closed", "title": "Add Voxtral Realtime", "updated_at": "2026-02-26T10:18:32Z" }, { "additions": 87, "author": "zucchini-nlp", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Helps vLLM to bump to v5", "changed_files": 6, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 5, "conversation_url": "https://github.com/huggingface/transformers/pull/43768", "created_at": "2026-02-05T14:04:02Z", "deletions": 5, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43768/files", "html_url": "https://github.com/huggingface/transformers/pull/43768", "labels": [], "merged": true, "number": 43768, "review_comments_count": 10, "state": "closed", "title": "Fix init weights in remote code", "updated_at": "2026-02-17T14:45:18Z" }, { "additions": 850, "author": "XingweiDeng", "author_association": "CONTRIBUTOR", "body_excerpt": "# What does this PR do? 
src/transformers/utils/import_utils.py:2317:16\u2026", "changed_files": 0, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/43709", "created_at": "2026-02-03T14:26:58Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43709/files", "html_url": "https://github.com/huggingface/transformers/pull/43709", "labels": [], "merged": true, "number": 43709, "review_comments_count": 0, "state": "closed", "title": "fix: `VersionComparison.from_string` return type mismatch", "updated_at": "2026-02-23T19:05:33Z" }, { "additions": 2202, "author": "liu-jiaxuan", "author_association": "CONTRIBUTOR", "body_excerpt": "# What does this PR do? Fixes # (issue) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Did you read the [contributor guideline](https://github.com/huggingfa\u2026", "changed_files": 16, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 11, "conversation_url": "https://github.com/huggingface/transformers/pull/43707", "created_at": "2026-02-03T13:33:41Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43707/files", "html_url": "https://github.com/huggingface/transformers/pull/43707", "labels": [ "New model" ], "merged": true, "number": 43707, "review_comments_count": 145, "state": "closed", "title": "[Model] Add SLANeXt Model Support", "updated_at": "2026-03-20T17:24:22Z" }, { "additions": 42, "author": "vasqu", "author_association": "MEMBER", "body_excerpt": "As per title, the new way to call the attention interface has slipped through a refactor because it's too new and not too well known atm cc @yonigozlan", "changed_files": 9, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": 
"https://github.com/huggingface/transformers/pull/43706", "created_at": "2026-02-03T11:57:22Z", "deletions": 48, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43706/files", "html_url": "https://github.com/huggingface/transformers/pull/43706", "labels": [], "merged": true, "number": 43706, "review_comments_count": 2, "state": "closed", "title": "[`Attn`] Fixup interface usage after refactor", "updated_at": "2026-02-03T14:56:35Z" }, { "additions": 120, "author": "Cyrilvallez", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Allow the `is_causal` kwarg and config attribute to make well-behaved decoder-only models act as encoders", "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 1, "conversation_url": "https://github.com/huggingface/transformers/pull/43705", "created_at": "2026-02-03T11:45:43Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43705/files", "html_url": "https://github.com/huggingface/transformers/pull/43705", "labels": [], "merged": true, "number": 43705, "review_comments_count": 11, "state": "closed", "title": "Allow bi-directional attention for all models", "updated_at": "2026-02-04T17:24:32Z" }, { "additions": 1, "author": "francesco-bertolotti", "author_association": "CONTRIBUTOR", "body_excerpt": "wrong `rms_norm_type` # What does this PR do? Small type error in the configuration of qwen3. `rms_norm_eps` should be a float and not an int. 
## Before submitting - [ X] This PR fixes a typo or improves the docs (you can dismiss the other\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/43703", "created_at": "2026-02-03T10:05:17Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43703/files", "html_url": "https://github.com/huggingface/transformers/pull/43703", "labels": [], "merged": true, "number": 43703, "review_comments_count": 0, "state": "closed", "title": "Update configuration_qwen3.py", "updated_at": "2026-02-04T07:03:04Z" }, { "additions": 2828, "author": "eustlb", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? Adds[ UsefulSensors'](https://huggingface.co/UsefulSensors) new ASR model.", "changed_files": 19, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/43702", "created_at": "2026-02-03T09:32:42Z", "deletions": 247, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43702/files", "html_url": "https://github.com/huggingface/transformers/pull/43702", "labels": [ "New model" ], "merged": true, "number": 43702, "review_comments_count": 30, "state": "closed", "title": "Add moonshine streaming", "updated_at": "2026-02-12T10:10:16Z" }, { "additions": 1, "author": "YangKai0616", "author_association": "CONTRIBUTOR", "body_excerpt": "Here pytorch has a mature mechanism to auto select the right backend for different devices. 
@ydshieh pls help review, thx!", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 6, "conversation_url": "https://github.com/huggingface/transformers/pull/43699", "created_at": "2026-02-03T07:33:04Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43699/files", "html_url": "https://github.com/huggingface/transformers/pull/43699", "labels": [], "merged": false, "number": 43699, "review_comments_count": 3, "state": "closed", "title": "avoid using specified backend for tp tests", "updated_at": "2026-03-09T08:17:48Z" }, { "additions": 1, "author": "sywangyi", "author_association": "CONTRIBUTOR", "body_excerpt": "- model loading (from pretrained, etc): @CyrilVallez - distributed: @3outeille @ArthurZucker fix tp crash. crash stack is [rank0]: Traceback (most recent call last): [rank0]: File \"/transformers/benchmark_v2/test_tp.py\", line 29, in - Qwen/Qwen3-30B-A3B-Instruct-2507-FP8 ```bash input = { \"messages\": [ { \"role\": \"user\", \"content\": [ { \"type\": \"text\", \"text\": \"The history of France is \", } ], }, ], } I have a question about th\u2026", "changed_files": 1, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/43670", "created_at": "2026-02-02T02:06:14Z", "deletions": 1, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43670/files", "html_url": "https://github.com/huggingface/transformers/pull/43670", "labels": [], "merged": true, "number": 43670, "review_comments_count": 0, "state": "closed", "title": "Fix FP8Expert for Qwen", "updated_at": "2026-02-02T15:18:49Z" }, { "additions": 2, "author": "fschlatt", "author_association": "CONTRIBUTOR", "body_excerpt": "# What does this PR do? makes the whole mixin behave like a static holder for methods... 
- Modify methods/inherited cl\u2026", "changed_files": 137, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/43620", "created_at": "2026-01-30T11:24:09Z", "deletions": 288, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43620/files", "html_url": "https://github.com/huggingface/transformers/pull/43620", "labels": [], "merged": true, "number": 43620, "review_comments_count": 0, "state": "closed", "title": "[`Rope`] Revert #43410 and make inheritance implicit again", "updated_at": "2026-01-30T18:44:16Z" }, { "additions": 40, "author": "zucchini-nlp", "author_association": "MEMBER", "body_excerpt": "# What does this PR do? As per title, some models add or delete entries in tied weights depending on configuration. If we load two models consecutively with different configs, it fails to tie weights correctly I am copying it in `__init__`\u2026", "changed_files": 4, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 4, "conversation_url": "https://github.com/huggingface/transformers/pull/43619", "created_at": "2026-01-30T10:43:38Z", "deletions": 6, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43619/files", "html_url": "https://github.com/huggingface/transformers/pull/43619", "labels": [ "for patch" ], "merged": true, "number": 43619, "review_comments_count": 8, "state": "closed", "title": "Don't modify `tied_weight_keys` in-place", "updated_at": "2026-01-30T15:46:02Z" }, { "additions": 17, "author": "kaixuanliu", "author_association": "CONTRIBUTOR", "body_excerpt": "@zucchini-nlp pls help review, thx! We have to add back the changes in https://github.com/huggingface/transformers/pull/42523. 
As for llava_onevision model, in its checkpoint config file, the model's `tie_word_embeddings` is False, and mod\u2026", "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 3, "conversation_url": "https://github.com/huggingface/transformers/pull/43617", "created_at": "2026-01-30T10:21:45Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43617/files", "html_url": "https://github.com/huggingface/transformers/pull/43617", "labels": [], "merged": false, "number": 43617, "review_comments_count": 0, "state": "closed", "title": "Fix tie_word_embedding issue for llava_onevision model", "updated_at": "2026-04-13T02:41:01Z" }, { "additions": 3, "author": "yiliu30", "author_association": "CONTRIBUTOR", "body_excerpt": "Signed-off-by: yiliu30 # What does this PR do? ## Related Issue Fixes #43408 **Issue:** Warning: You are using a model of type sam3_video to instantiate a model of type sam3_tracker **URL:** https://github.com/huggingface/transformers/\u2026", "changed_files": 8, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 12, "conversation_url": "https://github.com/huggingface/transformers/pull/43495", "created_at": "2026-01-26T12:46:21Z", "deletions": 7, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43495/files", "html_url": "https://github.com/huggingface/transformers/pull/43495", "labels": [], "merged": true, "number": 43495, "review_comments_count": 4, "state": "closed", "title": "fix: add compatible_model_types to suppress model type mismatch warnings", "updated_at": "2026-02-05T13:31:24Z" }, { "additions": 20, "author": "githubnemo", "author_association": "MEMBER", "body_excerpt": "The Qwen3 MoE config was missing the mapping attribute for the num_expert_local config variable which made it impossible to load FP8 quantized models, due to the following exception: ``` Traceback (most recent call last): File 
\".../exps/tr\u2026", "changed_files": 3, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 4, "conversation_url": "https://github.com/huggingface/transformers/pull/43494", "created_at": "2026-01-26T11:34:05Z", "deletions": 0, "draft": false, "files_url": "https://github.com/huggingface/transformers/pull/43494/files", "html_url": "https://github.com/huggingface/transformers/pull/43494", "labels": [], "merged": true, "number": 43494, "review_comments_count": 1, "state": "closed", "title": "Fix loading of Qwen3 FP8", "updated_at": "2026-01-27T09:56:23Z" }, { "additions": 54, "author": "eustlb", "author_association": "MEMBER", "body_excerpt": "# What does this PR do?", "changed_files": 5, "cluster_id": null, "cluster_ids": [], "cluster_role": null, "comments_count": 2, "conversation_url": "https://github.com/huggingface/transformers/pull/43492", "created_at": "2026-01-26T10:30:53Z", "deletions": 1, "draft": true, "files_url": "https://github.com/huggingface/transformers/pull/43492/files", "html_url": "https://github.com/huggingface/transformers/pull/43492", "labels": [], "merged": false, "number": 43492, "review_comments_count": 0, "state": "open", "title": "Perception Encoder follow up PR", "updated_at": "2026-01-26T12:55:35Z" }, { "additions": 605, "author": "tarekziade", "author_association": "MEMBER", "body_excerpt": "DRAFT FOR DISCUSSION # What does this PR do?