# Qwen Reasoning Model (GRPO Fine-Tuned)

This repository contains a fine-tuned version of **Qwen** trained with **GRPO (Group Relative Policy Optimization)** using the **Unsloth** framework. The model was trained to improve its reasoning ability and the structure of its responses.

---
|
|
## Base Model

* Base model: Qwen2.5
* Parameter count: ~1.5B
* Quantization: GGUF `Q4_K_M`
* Training framework: Unsloth
* Optimization method: GRPO (reinforcement learning)
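As a rough sense of what `Q4_K_M` quantization buys at this scale, the back-of-envelope arithmetic below compares the weight-storage footprint against unquantized fp16. The ~4.85 bits-per-weight figure for `Q4_K_M` is a commonly cited approximation, not an exact property of this file.

```python
# Back-of-envelope size estimate for a ~1.5B-parameter model.
# The ~4.85 bits/weight for Q4_K_M is an approximation; actual file
# size also includes metadata and varies by tensor layout.
PARAMS = 1.5e9

def size_gb(bits_per_weight: float) -> float:
    """Approximate weight-storage size in gigabytes."""
    return PARAMS * bits_per_weight / 8 / 1e9

fp16_gb = size_gb(16)      # unquantized half precision
q4km_gb = size_gb(4.85)    # approximate Q4_K_M footprint

print(f"fp16 ~ {fp16_gb:.1f} GB, Q4_K_M ~ {q4km_gb:.1f} GB")
```

This is why the quantized model fits comfortably in consumer RAM while the fp16 weights would need roughly three times the space.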
|
|
---
|
|
## Training Details

The model was trained with reinforcement learning to improve reasoning quality.

Training setup:

* Trainer: `GRPOTrainer` (Unsloth)
* Dataset: reasoning-style prompts
* Hardware: Kaggle GPU
* Training approach:
  * LoRA fine-tuning
  * RL reward optimization
  * Quantized inference format (GGUF)
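To illustrate the reward-optimization step above: GRPO samples a *group* of completions per prompt, scores each with a reward function, and uses group-relative (standardized) rewards as advantages, with no separate value model. The sketch below shows a hypothetical reward function of the kind passed to a GRPO trainer; the `<think>` tag convention is illustrative, not taken from this model's actual training run.

```python
import re
import statistics

# Hypothetical reward: score completions higher when they contain an
# explicit <think>...</think> reasoning block. The tag names are
# illustrative, not this model's actual reward.
def structure_reward(completion: str) -> float:
    return 1.0 if re.search(r"<think>.+?</think>", completion, re.S) else 0.0

# GRPO's key idea: standardize rewards within a sampled group so each
# completion's advantage is relative to its siblings.
def group_relative_advantages(rewards: list[float]) -> list[float]:
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

group = [
    "<think>2 + 2 = 4</think> The answer is 4.",
    "The answer is 4.",
    "<think>add the numbers</think> 4.",
    "Probably 4?",
]
rewards = [structure_reward(c) for c in group]
print(group_relative_advantages(rewards))  # positive for structured answers
```

The trainer then increases the likelihood of completions with positive advantage and decreases it for the rest.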
|
|
---
|
|
## Files in this Repository

| File          | Description             |
| ------------- | ----------------------- |
| `*.gguf`      | Quantized model weights |
| `config.json` | Model configuration     |
| `README.md`   | Model card              |

---
|
|
## How to Use

### Run with llama.cpp

```bash
./main -m Qwen2.5-1.5B_Q4_K_M.gguf -p "Explain why the sky is blue."
```

(Recent llama.cpp builds name this binary `llama-cli` instead of `main`.)

---
|
|
### Python Example

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-1.5B_Q4_K_M.gguf",
    n_ctx=4096,
)

# __call__ returns a completion dict; extract the generated text.
output = llm("Explain reinforcement learning simply.", max_tokens=256)
print(output["choices"][0]["text"])
```
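Qwen2.5 chat models are trained on the ChatML template, so raw completion calls usually work better when the prompt is wrapped in it. The sketch below builds the template string by hand, assuming the standard Qwen `<|im_start|>`/`<|im_end|>` tokens; alternatively, llama-cpp-python's `create_chat_completion()` can apply the model's built-in chat template for you.

```python
# Build a ChatML-formatted prompt for Qwen2.5-style chat models.
# Token names follow Qwen's standard template.
def chatml_prompt(user_message: str,
                  system_message: str = "You are a helpful assistant.") -> str:
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = chatml_prompt("Explain reinforcement learning simply.")
# llm(prompt, max_tokens=256, stop=["<|im_end|>"]) would then generate
# only the assistant turn.
print(prompt)
```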
|
|
---
|
|
## Intended Use

This model is intended for:

* reasoning experiments
* reinforcement learning research
* local LLM experimentation

---
|
|
## Limitations

* Small parameter size (1.5B)
* Limited training data
* May produce incorrect reasoning

---
|
|
## Author

Maruthi

---

## License

Please follow the license of the original Qwen model.
|
|
|
|