# Qwen Reasoning Model (GRPO Fine-Tuned)

This repository contains a fine-tuned version of Qwen trained with GRPO (Group Relative Policy Optimization) using the Unsloth framework. The model was trained to improve reasoning ability and produce structured responses.
## Base Model

- Base model: Qwen2.5
- Parameter size: ~1.5B parameters
- Quantization: GGUF Q4_K_M
- Training framework: Unsloth
- Optimization method: GRPO (reinforcement learning)
## Training Details

The model was trained with reinforcement learning to improve reasoning quality.

Training setup:

- Trainer: GRPOTrainer (Unsloth)
- Dataset: reasoning-style prompts
- Hardware: Kaggle GPU

Training approach:

- LoRA fine-tuning
- RL reward optimization
- Quantized inference format (GGUF)
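The reward optimization above can be sketched in plain Python. GRPO scores a group of completions sampled from the same prompt and normalizes each reward against the group's mean and standard deviation to get per-completion advantages. The reward function and sample completions below are purely illustrative, not the ones used in training; the actual policy update is handled by GRPOTrainer.

```python
# Illustrative sketch of GRPO's group-relative advantage step.
# The reward function and completions here are hypothetical examples.

def toy_reward(completion: str) -> float:
    """Hypothetical reward: favor answers that show structured reasoning."""
    score = 0.0
    if "<think>" in completion and "</think>" in completion:
        score += 1.0  # reward the expected reasoning format
    if "Answer:" in completion:
        score += 0.5  # reward an explicit final answer
    return score

def group_relative_advantages(rewards, eps=1e-8):
    """A_i = (r_i - mean(group)) / (std(group) + eps)."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = (sum((r - mean) ** 2 for r in rewards) / n) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four completions sampled for one prompt, scored and normalized.
group = [
    "<think>2 + 2 is 4</think> Answer: 4",
    "Answer: 4",
    "4",
    "<think>add the numbers</think> 4",
]
rewards = [toy_reward(c) for c in group]         # [1.5, 0.5, 0.0, 1.0]
advantages = group_relative_advantages(rewards)  # best-scored completion gets the largest advantage
```

Because advantages are normalized within each sampling group, GRPO needs no separate value network: a completion is reinforced only to the extent it beats its own group's average.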
## Files in this Repository

| File | Description |
|---|---|
| `*.gguf` | Quantized model weights |
| `config.json` | Model configuration |
| `README.md` | Model card |
## How to Use

### Run with llama.cpp

```bash
./main -m Qwen2.5-1.5B_Q4_K_M.gguf -p "Explain why the sky is blue."
```
### Python Example

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-1.5B_Q4_K_M.gguf",
    n_ctx=4096,
)

# The call returns a completion dict; extract the generated text.
output = llm("Explain reinforcement learning simply.", max_tokens=256)
print(output["choices"][0]["text"])
```
## Intended Use

This model is intended for:
- reasoning experiments
- reinforcement learning research
- local LLM experimentation
## Limitations
- Small parameter size (1.5B)
- Limited training data
- May produce incorrect reasoning
## Author
Maruthi
## License
This model inherits the license of the base Qwen2.5 model; please follow its terms.