The models come in Thinking and Instruct versions and use a new architecture, giving them ~10x faster inference than Qwen3-32B. Step-by-step guide: https://docs.unsloth.ai/models/qwen3-next
They have an image tokenizer unified with text, and they de-tokenize using either of two models (an LLM or a diffusion model). The model itself is actually a full LLM (Qwen2); the tokenizer converts images into tokens. Mind-blowing!
Hugging Face announces Cosmo 1B, a fully open-source Phi competitor with an open-source dataset. The dataset, dubbed "Cosmopedia," is published on the Hugging Face Hub under the Apache 2.0 license. It was generated using Mixtral 8x7B, with various articles and textbooks (AutoMathText, OpenStax, WikiHow, etc.) as "seed data" for generating the synthetic content.
A while ago, I presented this Phi-2 DPO fine-tuning notebook with LoRA. Got some input from @ybelkada about not needing a ref_model, because we can just disable the LoRA adapters during training to recover the reference model. Cool feature!
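To illustrate the trick: with LoRA, the policy is base weights + adapter, so turning the adapter off gives you back the frozen base, i.e. the reference policy, with no second model copy in memory. Below is a minimal PyTorch-only sketch of this idea (the `TinyLoRALinear` toy module and the simplified sequence-level DPO loss are my own illustrative constructions, not the notebook's actual code; real LoRA also zero-initializes the B matrix, which this toy skips).

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Simplified DPO loss over per-sequence log-probs:
    -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))."""
    logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -F.logsigmoid(logits).mean()

class TinyLoRALinear(torch.nn.Module):
    """Base linear layer plus a toggleable low-rank (LoRA-style) adapter.
    Hypothetical toy module for illustration only."""
    def __init__(self, dim=8, rank=2):
        super().__init__()
        self.base = torch.nn.Linear(dim, dim, bias=False)
        self.lora_a = torch.nn.Linear(dim, rank, bias=False)
        self.lora_b = torch.nn.Linear(rank, dim, bias=False)
        self.adapter_enabled = True

    def forward(self, x):
        out = self.base(x)
        if self.adapter_enabled:
            # Low-rank update on top of the frozen base weights.
            out = out + self.lora_b(self.lora_a(x))
        return out

model = TinyLoRALinear()
x = torch.randn(4, 8)

# Policy forward pass: adapter on.
model.adapter_enabled = True
policy_out = model(x)

# Reference forward pass: same model, adapter off == frozen base model.
model.adapter_enabled = False
ref_out = model(x)
```

In TRL's `DPOTrainer`, passing a PEFT config lets you set `ref_model=None` and the trainer applies exactly this adapter-disabling trick for the reference forward pass.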