The models come in Thinking and Instruct versions and use a new architecture, giving them ~10x faster inference than Qwen3-32B. Step-by-step guide: https://docs.unsloth.ai/models/qwen3-next
They have an image tokenizer unified with text, and they de-tokenize using either of two models (an LLM or a diffusion model). The model itself is actually a full LLM (Qwen2); the tokenizer converts images into tokens. Mind-blowing!
Hugging Face announces Cosmo 1B, a fully open-source Phi competitor with an open-source dataset. The dataset, dubbed "Cosmopedia," is published on the Hugging Face Hub under the Apache 2.0 license. It was generated using Mixtral 8x7B, with various articles and textbooks (AutoMathText, OpenStax, WikiHow, etc.) as "seed data" for generating the synthetic content.
A while ago, I presented this Phi-2 DPO fine-tuning notebook with LoRA. Got some input from @ybelkada about not needing a ref_model, because we can just disable the LoRA adapters during training to recover the reference model. Cool feature!
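To illustrate the trick: with LoRA, the policy is base weights + adapter, so turning the adapter off gives you back the frozen base, i.e. the reference policy, with no second model copy in memory. Below is a minimal PyTorch-only sketch of this idea (the `TinyLoRALinear` toy module and the simplified sequence-level DPO loss are my own illustrative constructions, not the notebook's actual code; real LoRA also zero-initializes the B matrix, which this toy skips).

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Simplified DPO loss over per-sequence log-probs:
    -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))."""
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -F.logsigmoid(logits).mean()

class TinyLoRALinear(torch.nn.Module):
    """Base linear layer plus a toggleable low-rank (LoRA-style) adapter.
    Hypothetical toy module for illustration only."""
    def __init__(self, dim=8, rank=2):
        super().__init__()
        self.base = torch.nn.Linear(dim, dim, bias=False)
        self.lora_a = torch.nn.Linear(dim, rank, bias=False)
        self.lora_b = torch.nn.Linear(rank, dim, bias=False)
        self.adapter_enabled = True

    def forward(self, x):
        out = self.base(x)
        if self.adapter_enabled:
            # Low-rank update on top of the frozen base weights.
            out = out + self.lora_b(self.lora_a(x))
        return out

model = TinyLoRALinear()
x = torch.randn(4, 8)

# Policy forward pass: adapter on.
model.adapter_enabled = True
policy_out = model(x)

# Reference forward pass: same model, adapter off == frozen base model.
model.adapter_enabled = False
ref_out = model(x)
```

In TRL's `DPOTrainer`, passing a PEFT config lets you set `ref_model=None` and the trainer applies exactly this adapter-disabling trick for the reference forward pass.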