Hugging Face
Yiming Tang
tangyiming
AI & ML interests
None yet
Organizations
None yet
tangyiming's activity
New activity in Qwen/Qwen3-Next-80B-A3B-Instruct (2 months ago)
Megatron SWIFT DPO training on Qwen/Qwen3-Next-80B-A3B-Instruct always returns NaN loss. Why?
#45 opened 2 months ago by tangyiming