Update modeling_nemotron_h.py
After the hidden states pass through self.gate in NemotronHMOE, the code does not cast them back to the model dtype, which leads to the following error:
  File "/home/user/.cache/huggingface/modules/transformers_modules/nvidia_hyphen_NVIDIA_hyphen_Nemotron_hyphen_3_hyphen_Nano_hyphen_30B_hyphen_A3B_hyphen_Base_hyphen_BF16/modeling_nemotron_h.py", line 812, in forward
    return self.down_proj(self.act_fn(self.up_proj(x)))
           ^^^^^^^^^^^^^^^
...
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16
I modified the dtype logic with reference to NVIDIA-Nemotron-3-Nano-30B-A3B-BF16.
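For context, MoE routers commonly compute their gate/softmax in float32 for numerical stability, so activations derived from the gate come out as fp32 while the expert MLP weights stay bf16, which is exactly what F.linear rejects. The sketch below reproduces the failure mode and the fix (an explicit cast back to the weight dtype before the expert matmul) without requiring torch; the Tensor class and linear function are toy stand-ins, not the actual model code.

```python
class Tensor:
    """Toy tensor carrying only a dtype tag, enough to model the bug."""
    def __init__(self, dtype):
        self.dtype = dtype

    def to(self, dtype):
        # Mimics torch.Tensor.to(dtype): returns a tensor with the new dtype.
        return Tensor(dtype)


def linear(x, weight):
    # Mirrors the dtype check that torch.nn.functional.linear enforces.
    if x.dtype != weight.dtype:
        raise RuntimeError(
            "expected mat1 and mat2 to have the same dtype, "
            f"but got: {x.dtype} != {weight.dtype}")
    return Tensor(x.dtype)


weight = Tensor("bfloat16")   # expert MLP weights are stored in bf16
hidden = Tensor("float32")    # the gate upcast the activations to fp32

try:
    linear(hidden, weight)    # reproduces the reported RuntimeError
except RuntimeError as e:
    print(e)

# Fix: cast the routed activations back to the weight dtype first.
out = linear(hidden.to(weight.dtype), weight)
print(out.dtype)  # bfloat16
```

In the real model the same idea amounts to a `hidden_states.to(expert_dtype)` (or equivalent) before the expert's up_proj/down_proj matmuls.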
Hello @jaeminh, it looks like your edits are not reflected in the Files and versions tab, and I am indeed hitting the same error.
...
    hidden_states = mixer_block(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 175, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/models.cache/huggingface/modules/transformers_modules/modeling_nemotron_h.py", line 786, in forward
    hidden_states = self.mixer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 175, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/models.cache/huggingface/modules/transformers_modules/modeling_nemotron_h.py", line 868, in forward
    hidden_states = self.moe(hidden_states, topk_indices, topk_weights).view(*orig_shape)
  File "/opt/models.cache/huggingface/modules/transformers_modules/modeling_nemotron_h.py", line 855, in moe
    dummy_out = expert(torch.zeros_like(hidden_states[0]).unsqueeze(0).to(final_hidden_states.dtype))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 175, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/models.cache/huggingface/modules/transformers_modules/modeling_nemotron_h.py", line 815, in forward
    return self.down_proj(self.act_fn(self.up_proj(x)))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 175, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py", line 134, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: float != c10::BFloat16
Hi @jaeminh, @M-Ramo-Translated,
Thank you for catching this. The fix was applied to the reasoning models but had not been propagated to the base model, and there is one other change made since the original version. The file has now been updated and should work. Please let me know if the error persists. Thank you!
https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16/commit/f73a11c1f0964a5851f984b70cd31dda9a44f01c