fix: add missing use_deterministic_attn parameter to MoonViT3dEncoder

#22

MoonViT3dEncoder.__init__ references self.use_deterministic_attn on line 575
when constructing the MoonViTEncoderLayer blocks, but the attribute is never
set on self. Loading the model via AutoModelForCausalLM with
trust_remote_code=True raises:

AttributeError: 'MoonViT3dEncoder' object has no attribute 'use_deterministic_attn'

The sibling class MoonViTEncoderLayer already accepts use_deterministic_attn
as a keyword parameter with default False, so the attribute on the parent
3d-encoder was clearly intended to plumb through the same flag. Restore the
missing parameter with the same default.
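The shape of the fix can be sketched as below. This is an illustrative stub, not the actual modeling file: the class bodies, dimensions, and constructor signatures here are placeholders, and only the pattern of accepting the flag, storing it on self, and plumbing it into the child layers mirrors the change described above.

```python
# Illustrative sketch of the fix; the real MoonViT classes live in the
# model's custom modeling code and have many more parameters.

class MoonViTEncoderLayer:
    # Stub: the real layer already accepts use_deterministic_attn (default False).
    def __init__(self, dim, use_deterministic_attn=False):
        self.dim = dim
        self.use_deterministic_attn = use_deterministic_attn


class MoonViT3dEncoder:
    # Fix: accept the flag and set it on self *before* the block-construction
    # loop reads self.use_deterministic_attn, so loading no longer raises
    # AttributeError.
    def __init__(self, dim, num_layers, use_deterministic_attn=False):
        self.use_deterministic_attn = use_deterministic_attn  # previously missing
        self.blocks = [
            MoonViTEncoderLayer(
                dim, use_deterministic_attn=self.use_deterministic_attn
            )
            for _ in range(num_layers)
        ]


# With the default in place, construction works and the flag reaches each block.
encoder = MoonViT3dEncoder(dim=1024, num_layers=2)
print(all(not b.use_deterministic_attn for b in encoder.blocks))  # True
```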

Production serving paths (vLLM's Kimi-K2.6 model executor) bypass the HF
custom modeling init and construct the vision tower differently, so this
bug is invisible at serving time but blocks transformers-based workflows
such as ModelOpt NVFP4 quantization and HF-native fine-tuning.

An identical fix was already merged in Kimi-K2.5 PR #91 (by @katuni4ka,
approved by @fxmarty-amd). This mirrors it to K2.6 byte-for-byte.

Minimal repro:

from transformers import AutoModelForCausalLM
AutoModelForCausalLM.from_pretrained(
    "moonshotai/Kimi-K2.6", trust_remote_code=True, torch_dtype="auto",
)

This resolved my issue. @bigeagle, can we merge this so other users don't have to patch the modeling code manually?

thanks for your contribution!

Moonshot AI org

@bdellabe @ace-coreweave Hi, I've also added some code to fix the weight initialization issue, and AutoModelForCausalLM.from_pretrained now works on my end. However, this doesn't mean transformers inference is fully supported — if you plan to implement Kimi K2.6 inference in other frameworks, please refer primarily to the vLLM/SGLang implementations.

bigmoyan changed pull request status to merged
