How to Use from Docker Model Runner

docker model run hf.co/morty649/qwen_finetune:Q4_K_M

Qwen Reasoning Model (GRPO Fine-Tuned)

This repository contains a fine-tuned version of Qwen trained using GRPO (Group Relative Policy Optimization) with the Unsloth framework.

The model was trained to improve reasoning ability and produce more structured responses.


Base Model

  • Base model: Qwen2.5
  • Parameter size: ~1.5B parameters
  • Quantization: GGUF Q4_K_M
  • Training framework: Unsloth
  • Optimization method: GRPO (Reinforcement Learning)
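Q4_K_M is a mixed-precision GGUF quantization that averages roughly 4.8–4.9 bits per weight, so the download size can be estimated from the parameter count. A back-of-envelope sketch (the bits-per-weight figure is an approximation, not an exact property of this file):

```python
def gguf_size_gb(n_params: float, bits_per_weight: float = 4.85) -> float:
    """Rough GGUF file size in GB: parameters times average bits per weight.

    Q4_K_M mixes 4-bit and 6-bit blocks, so ~4.85 bits/weight is an
    estimate; embeddings and metadata add some overhead on top.
    """
    return n_params * bits_per_weight / 8 / 1e9

# For a ~1.5B-parameter model:
print(f"{gguf_size_gb(1.5e9):.2f} GB")  # roughly 0.9 GB before overhead
```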

Training Details

The model was trained using reinforcement learning techniques to improve reasoning quality.

Training setup:

  • Trainer: GRPOTrainer (Unsloth)
  • Dataset: reasoning-style prompts
  • Hardware: Kaggle GPU
  • Training approach:
    • LoRA fine-tuning
    • RL reward optimization
    • Quantized inference format (GGUF)
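GRPO samples a group of completions per prompt, scores each with reward functions, and computes advantages relative to the group mean. The exact rewards used for this run are not documented; below is a minimal sketch of a format-style reward in the list-in/list-out shape GRPO-style trainers expect (the <think>/<answer> tag convention is an assumption):

```python
import re

# Hypothetical reward: 1.0 if the completion follows a
# <think>...</think><answer>...</answer> layout, else 0.0.
TAG_PATTERN = re.compile(r"<think>.*?</think>\s*<answer>.*?</answer>", re.DOTALL)

def format_reward(completions: list[str]) -> list[float]:
    """Return one scalar reward per completion; GRPO normalizes these within the group."""
    return [1.0 if TAG_PATTERN.search(c) else 0.0 for c in completions]

print(format_reward([
    "<think>light scatters</think> <answer>Rayleigh scattering</answer>",
    "an untagged answer",
]))  # [1.0, 0.0]
```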

Files in this Repository

| File | Description |
|------|-------------|
| *.gguf | Quantized model weights |
| config.json | Model configuration |
| README.md | Model card |

How to Use

Run with llama.cpp

./llama-cli -m Qwen2.5-1.5B_Q4_K_M.gguf -p "Explain why the sky is blue."

(Older llama.cpp builds name this binary ./main.)

Python Example

from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-1.5B_Q4_K_M.gguf",
    n_ctx=4096,
)

# The call returns an OpenAI-style completion dict, not a plain string.
output = llm("Explain reinforcement learning simply.", max_tokens=256)
print(output["choices"][0]["text"])

Intended Use

This model is intended for:

  • reasoning experiments
  • reinforcement learning research
  • local LLM experimentation

Limitations

  • Small parameter size (1.5B)
  • Limited training data
  • May produce incorrect reasoning

Author

Maruthi


License

This model inherits the license of the original Qwen2.5 model; please follow its terms.
