sarthak247's Collections

Qwen2.5-3B-GRPO

updated Feb 24, 2025

Trained with Unsloth for only 250 steps (due to resource constraints) on GSM8K to add reasoning abilities to Qwen2.5-3B (a smaller model, also chosen because of limited resources).
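As a rough illustration of what training on GSM8K with GRPO involves, the sketch below shows a correctness reward of the kind such runs commonly use. It is not the reward function used for this collection, which the page does not describe. It relies only on the fact that GSM8K reference solutions end with `#### <answer>`; the helper names and the rule of taking the last number in a completion as the predicted answer are assumptions made for this example.

```python
import re
from typing import Optional

# Matches integers and decimals, with an optional leading minus sign.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def extract_gsm8k_answer(solution: str) -> str:
    """Return the final answer of a GSM8K reference solution.

    GSM8K solutions end with a line of the form '#### <answer>'.
    Thousands separators are removed so '1,000' becomes '1000'.
    """
    return solution.split("####")[-1].strip().replace(",", "")


def extract_predicted_answer(completion: str) -> Optional[str]:
    """Return the last number in a model completion, or None if there is none.

    Assumption for this sketch: the model states its final answer last.
    """
    matches = _NUMBER.findall(completion.replace(",", ""))
    return matches[-1] if matches else None


def correctness_reward(completion: str, reference_solution: str) -> float:
    """Return 1.0 if the predicted answer equals the reference numerically, else 0.0."""
    predicted = extract_predicted_answer(completion)
    if predicted is None:
        return 0.0
    try:
        target = float(extract_gsm8k_answer(reference_solution))
    except ValueError:
        # Reference did not contain a parseable number.
        return 0.0
    return 1.0 if float(predicted) == target else 0.0
```

In a GRPO-style setup, a function like this would score each of several sampled completions for the same question, and the scores would then be compared within that group. Real training runs often add further rewards (for example, for following an output format), which this sketch leaves out.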


  • sarthak247/qwen2.5-grpo-gsm8k-250steps-fp16

    Text Generation • Updated Feb 24, 2025 • 6

  • sarthak247/qwen2.5-grpo-gsm8k-250steps-lora-adapters

    Updated Feb 24, 2025

  • sarthak247/qwen2.5-grpo-gsm8k-250steps-gguf

    3B • Updated Feb 24, 2025 • 30