# Qwen Reasoning Model (GRPO Fine-Tuned)
This repository contains a fine-tuned version of **Qwen2.5** trained with **GRPO (Group Relative Policy Optimization)** using the **Unsloth** framework.
The model was fine-tuned to improve reasoning ability and produce structured responses.
---
## Base Model
* Base model: Qwen2.5
* Parameter size: ~1.5B parameters
* Quantization: GGUF Q4_K_M
* Training framework: Unsloth
* Optimization method: GRPO (Reinforcement Learning)
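
GRPO works by sampling a *group* of completions per prompt and scoring each completion's reward relative to the group's mean and standard deviation, rather than training a separate value model. A minimal pure-Python sketch of that group-relative advantage computation (function and variable names here are illustrative, not part of the Unsloth API):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each completion's reward against its group:
    advantage_i = (r_i - mean(group)) / (std(group) + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled completions for one prompt, scored by some reward function:
# above-average completions get positive advantages, below-average negative.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions that beat the group average are reinforced; those below it are penalized, which is what lets GRPO optimize without a learned critic.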
---
## Training Details
The model was trained using reinforcement learning techniques to improve reasoning quality.
Training setup:
* Trainer: GRPOTrainer (Unsloth)
* Dataset: reasoning-style prompts
* Hardware: Kaggle GPU
* Training approach:
  * LoRA fine-tuning
  * RL reward optimization
  * Quantized inference format (GGUF)
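
GRPO needs a reward signal for each sampled completion. The exact reward used for this model is not documented here, but a common pattern for reasoning-style training is a rule-based format reward that checks for structured output, for example reasoning inside `<think>` tags followed by a final answer. A hedged toy sketch of such a reward (the tag convention and function name are assumptions, not confirmed details of this model's training):

```python
import re

def format_reward(completion: str) -> float:
    """Reward completions that reason inside <think>...</think>
    and then give an answer after the closing tag. This is a
    common GRPO reasoning setup, not the model's exact reward."""
    has_think = bool(re.search(r"<think>.*?</think>", completion, re.DOTALL))
    has_answer = bool(re.search(r"</think>\s*\S", completion, re.DOTALL))
    return 1.0 if (has_think and has_answer) else 0.0

good = "<think>2 + 2 is 4.</think> The answer is 4."
bad = "The answer is 4."
```

Rule-based rewards like this are cheap to compute and, combined with the group-relative advantages above, are enough to drive the policy toward the desired response structure.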
---
## Files in this Repository
| File | Description |
| ------------- | ----------------------- |
| `*.gguf` | Quantized model weights |
| `config.json` | Model configuration |
| `README.md` | Model card |
---
## How to Use
### Run with llama.cpp
```bash
# recent llama.cpp builds name the CLI binary llama-cli; older builds ship it as ./main
./llama-cli -m Qwen2.5-1.5B_Q4_K_M.gguf -p "Explain why the sky is blue."
```
---
### Python Example
```python
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-1.5B_Q4_K_M.gguf",
    n_ctx=4096,
)

# Calling the model returns a completion dict;
# the generated text is in choices[0]["text"].
output = llm("Explain reinforcement learning simply.", max_tokens=256)
print(output["choices"][0]["text"])
```
---
## Intended Use
This model is intended for:
* reasoning experiments
* reinforcement learning research
* local LLM experimentation
---
## Limitations
* Small parameter size (1.5B)
* Limited training data
* May produce incorrect reasoning
---
## Author
Maruthi
---
## License
This fine-tune inherits the license of the base Qwen2.5 model; please follow its terms.