
Update modeling_nemotron_h.py

#2
by jaeminh - opened

After passing through self.gate in NemotronHMOE, the tensors are not cast back to the original dtype, which leads to the following error:

  File "/home/user/.cache/huggingface/modules/transformers_modules/nvidia_hyphen_NVIDIA_hyphen_Nemotron_hyphen_3_hyphen_Nano_hyphen_30B_hyphen_A3B_hyphen_Base_hyphen_BF16/modeling_nemotron_h.py", line 812, in forward
    return self.down_proj(self.act_fn(self.up_proj(x)))
                                      ^^^^^^^^^^^^^^^

...

RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16
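The underlying failure is easy to reproduce outside the model: PyTorch's linear layers refuse mixed-dtype operands, so any float32 activation reaching a bfloat16 projection raises this error. A minimal standalone illustration (not the model's actual code):

```python
import torch

layer = torch.nn.Linear(4, 4).to(torch.bfloat16)  # weights in bf16
x = torch.randn(1, 4)                             # activations left in float32

try:
    layer(x)
except RuntimeError as e:
    # Same family of error as in the traceback above:
    # mat1 and mat2 must have the same dtype
    print(e)
```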

I modified the dtype logic, referring to the NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 implementation.
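For reference, the general pattern looks like this (a hedged sketch with hypothetical names such as ToyMoE, not the actual NemotronHMOE code): MoE routers typically compute their softmax probabilities in float32 for numerical stability, so the routing weights must be cast back to the activation dtype before they are mixed into the bfloat16 expert outputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMoE(nn.Module):
    """Illustrative top-k MoE layer; names and structure are hypothetical."""

    def __init__(self, hidden: int = 8, n_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(hidden, n_experts, bias=False)
        self.experts = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(n_experts))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        router_logits = self.gate(hidden_states)          # runs in the model dtype
        probs = F.softmax(router_logits.float(), dim=-1)  # softmax in fp32 for stability
        topk_weights, topk_indices = torch.topk(probs, self.top_k, dim=-1)
        # The crucial cast: without it the float32 weights propagate into the
        # bf16 expert matmuls and trigger the "same dtype" RuntimeError.
        topk_weights = topk_weights.to(hidden_states.dtype)
        out = torch.zeros_like(hidden_states)
        for token in range(hidden_states.shape[0]):
            for k in range(self.top_k):
                expert = self.experts[int(topk_indices[token, k])]
                out[token] += topk_weights[token, k] * expert(hidden_states[token])
        return out


# With the cast in place, the layer runs cleanly under bfloat16:
moe = ToyMoE().to(torch.bfloat16)
y = moe(torch.randn(3, 8, dtype=torch.bfloat16))  # y.dtype is torch.bfloat16
```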

Hello @jaeminh, it looks like your edits are not reflected in Files and Versions, and I am indeed seeing the same error:

...
    hidden_states = mixer_block(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 175, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/models.cache/huggingface/modules/transformers_modules/modeling_nemotron_h.py", line 786, in forward
    hidden_states = self.mixer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 175, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/models.cache/huggingface/modules/transformers_modules/modeling_nemotron_h.py", line 868, in forward
    hidden_states = self.moe(hidden_states, topk_indices, topk_weights).view(*orig_shape)
  File "/opt/models.cache/huggingface/modules/transformers_modules/modeling_nemotron_h.py", line 855, in moe
    dummy_out = expert(torch.zeros_like(hidden_states[0]).unsqueeze(0).to(final_hidden_states.dtype))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 175, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/models.cache/huggingface/modules/transformers_modules/modeling_nemotron_h.py", line 815, in forward
    return self.down_proj(self.act_fn(self.up_proj(x)))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 175, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py", line 134, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16
NVIDIA org

Hi @jaeminh @M-Ramo-Translated

Thank you for catching this. The file was updated for the reasoning models, but the change was not propagated to the base model. There is also another change that was made since the original version. The file has now been updated and should work. Please let me know if the error persists. Thank you!
https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16/commit/f73a11c1f0964a5851f984b70cd31dda9a44f01c

Hi, @suhara
thank you for the update!

jaeminh changed pull request status to closed
