# Qwen Reasoning Model (GRPO Fine-Tuned)

This repository contains a fine-tuned version of **Qwen** trained with **GRPO (Group Relative Policy Optimization)** using the **Unsloth** framework. The model was trained to improve reasoning ability and produce structured responses.

---

## Base Model

* Base model: Qwen2.5
* Parameter size: ~1.5B parameters
* Quantization: GGUF Q4_K_M
* Training framework: Unsloth
* Optimization method: GRPO (reinforcement learning)

---

## Training Details

The model was trained with reinforcement learning to improve reasoning quality.

Training setup:

* Trainer: GRPOTrainer (Unsloth)
* Dataset: reasoning-style prompts
* Hardware: Kaggle GPU
* Training approach:
  * LoRA fine-tuning
  * RL reward optimization
  * Quantized inference format (GGUF)

---

## Files in this Repository

| File          | Description             |
| ------------- | ----------------------- |
| `*.gguf`      | Quantized model weights |
| `config.json` | Model configuration     |
| `README.md`   | Model card              |

---

## How to Use

### Run with llama.cpp

```bash
./main -m Qwen2.5-1.5B_Q4_K_M.gguf -p "Explain why the sky is blue."
```

---

### Python Example

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-1.5B_Q4_K_M.gguf",
    n_ctx=4096,
)

output = llm("Explain reinforcement learning simply.", max_tokens=256)
print(output["choices"][0]["text"])
```

---

## Intended Use

This model is intended for:

* reasoning experiments
* reinforcement learning research
* local LLM experimentation

---

## Limitations

* Small parameter count (1.5B)
* Limited training data
* May produce incorrect reasoning

---

## Author

Maruthi

---

## License

Please follow the license of the original Qwen model.
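
---

## Appendix: GRPO in a Nutshell

For readers unfamiliar with GRPO, the core idea is to score each sampled completion relative to the other completions in its own group, rather than against a learned value baseline. The following is a toy sketch of that group-relative advantage computation, for illustration only; it is not the actual Unsloth/GRPOTrainer training code used for this model.

```python
import statistics

def group_relative_advantages(rewards):
    """Score each reward relative to its sampled group (toy GRPO-style advantage).

    advantage_i = (r_i - mean(group)) / std(group)
    """
    mean = statistics.mean(rewards)
    # Population std; fall back to 1.0 when all rewards are equal
    # so a uniform group yields all-zero advantages instead of dividing by zero.
    std = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / std for r in rewards]

# Four completions sampled for one prompt, rewarded 1.0 (correct) or 0.0 (wrong):
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```

Completions that beat their group's average get a positive advantage and are reinforced; below-average ones are pushed down. Because the baseline comes from the group itself, no separate critic model is needed.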