# Qwen Reasoning Model (GRPO Fine-Tuned)

This repository contains a fine-tuned version of **Qwen** trained with **GRPO (Group Relative Policy Optimization)** using the **Unsloth** framework. The model was trained to improve reasoning ability and produce structured responses.

---

## Base Model

* Base model: Qwen2.5
* Parameter size: ~1.5B parameters
* Quantization: GGUF Q4_K_M
* Training framework: Unsloth
* Optimization method: GRPO (reinforcement learning)

---

## Training Details

The model was trained with reinforcement learning to improve reasoning quality.

Training setup:

* Trainer: GRPOTrainer (Unsloth)
* Dataset: reasoning-style prompts
* Hardware: Kaggle GPU
* Training approach:
  * LoRA fine-tuning
  * RL reward optimization
  * Quantized inference format (GGUF)

---

## Files in this Repository

| File          | Description             |
| ------------- | ----------------------- |
| `*.gguf`      | Quantized model weights |
| `config.json` | Model configuration     |
| `README.md`   | Model card              |

---

## How to Use

### Run with llama.cpp

```bash
./main -m Qwen2.5-1.5B_Q4_K_M.gguf -p "Explain why the sky is blue."
```

---

### Python Example

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-1.5B_Q4_K_M.gguf",
    n_ctx=4096,
)

output = llm("Explain reinforcement learning simply.", max_tokens=256)
print(output["choices"][0]["text"])
```

---

## Intended Use

This model is intended for:

* reasoning experiments
* reinforcement learning research
* local LLM experimentation

---

## Limitations

* Small parameter count (1.5B)
* Limited training data
* May produce incorrect reasoning

---

## Author

Maruthi

---

## License

Please follow the license of the original Qwen model.
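
---

## Appendix: GRPO in a Nutshell

For readers unfamiliar with GRPO, the core idea is to score each sampled completion relative to the other completions in its own group, rather than against a learned value baseline. The following is a toy sketch of that group-relative advantage computation, for illustration only; it is not the actual Unsloth/GRPOTrainer training code used for this model.

```python
import statistics

def group_relative_advantages(rewards):
    """Score each reward relative to its sampled group (toy GRPO-style advantage).

    advantage_i = (r_i - mean(group)) / std(group)
    """
    mean = statistics.mean(rewards)
    # Population std; fall back to 1.0 when all rewards are equal
    # so a uniform group yields all-zero advantages instead of dividing by zero.
    std = statistics.pstdev(rewards) or 1.0
    return [(r - mean) / std for r in rewards]

# Four completions sampled for one prompt, rewarded 1.0 (correct) or 0.0 (wrong):
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```

Completions that beat their group's average get a positive advantage and are reinforced; below-average ones are pushed down. Because the baseline comes from the group itself, no separate critic model is needed.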