morty649 committed on
Commit dfd9f7a · verified · 1 parent: 1d33044

update readme

Files changed (1)
  1. README.md +91 -13
README.md CHANGED
@@ -1,20 +1,98 @@
  ---
- tags:
- - gguf
- - llama.cpp
- - unsloth
 
  ---
 
- # qwen_finetune : GGUF
 
- This model was finetuned and converted to GGUF format using [Unsloth](https://github.com/unslothai/unsloth).
 
- **Example usage**:
- - For text only LLMs: `llama-cli -hf morty649/qwen_finetune --jinja`
- - For multimodal models: `llama-mtmd-cli -hf morty649/qwen_finetune --jinja`
 
- ## Available Model files:
- - `Qwen2.5-1.5B.Q4_K_M.gguf`
- This was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth)
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ # Qwen Reasoning Model (GRPO Fine-Tuned)
+
+ This repository contains a fine-tuned version of **Qwen2.5-1.5B**, trained with **GRPO (Group Relative Policy Optimization)** using the **Unsloth** framework.
+
+ The goal of the fine-tuning was to improve the model's reasoning ability and make its responses more structured.
+
+ ---
+
+ ## Base Model
+
+ * Base model: Qwen2.5-1.5B
+ * Parameters: ~1.5B
+ * Quantization: GGUF Q4_K_M
+ * Training framework: Unsloth
+ * Optimization method: GRPO (reinforcement learning)
+
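GRPO, listed above as the optimization method, works as follows:

1. It samples a group of completions for the same prompt.
2. A reward function scores each completion.
3. Each completion's reward, measured relative to the rest of its group, becomes its advantage.

Because advantages come from the group itself, no separately trained value (critic) model is needed.

The snippet below is a minimal sketch of only that normalization step, A_i = (r_i - mean(r)) / std(r). Note the assumptions:

* The function name `group_relative_advantages` is hypothetical.
* The reward values are invented.
* This is not the reward setup used for this model; the card does not describe its reward functions or group size.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each completion's reward against its own sampling group.

    rewards: scores for the G completions sampled from one prompt.
    Returns (r_i - mean(r)) / (std(r) + eps) for every completion.
    """
    mean = statistics.fmean(rewards)
    # Population std over the group; some implementations use the sample std instead.
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for 4 completions of a single prompt
    # (for example 1.0 = correct and well formatted, 0.0 = wrong).
    rewards = [1.0, 0.0, 0.5, 0.0]
    advantages = group_relative_advantages(rewards)
    print([round(a, 3) for a in advantages])
```

Completions that score above the group mean get positive advantages and are reinforced. Those below the mean are discouraged. If every completion in a group receives the same reward, all advantages are zero, so that prompt contributes no reward signal to the update.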
+ ---
+
+ ## Training Details
+
+ The model was fine-tuned with GRPO, a reinforcement learning method, to improve the quality of its reasoning.
+
+ Training setup:
+
+ * Trainer: GRPOTrainer (TRL, accelerated with Unsloth)
+ * Dataset: reasoning-style prompts
+ * Hardware: Kaggle GPU
+ * Training approach:
+   * LoRA fine-tuning
+   * RL reward optimization
+   * Export to quantized GGUF for inference
+
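The training approach above lists LoRA fine-tuning. LoRA keeps a pretrained weight matrix frozen and trains only two small matrices, whose product forms a low-rank update. With row-vector inputs:

y = xW + (alpha / r) * xAB

Here A is d_in x r and B is r x d_out. B starts at zero, so the adapted layer initially behaves exactly like the base layer.

The sketch below implements that forward pass in plain Python. The helper names, rank, alpha and matrix values are illustrative assumptions; the card does not state the LoRA settings that were used.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive product of a (n x m) and b (m x p)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lora_forward(x: Matrix, w: Matrix, a: Matrix, b: Matrix, alpha: float) -> Matrix:
    """Compute y = x W + (alpha / r) * x A B (row-vector convention).

    w: frozen base weight (d_in x d_out)
    a: trainable down-projection (d_in x r)
    b: trainable up-projection (r x d_out), zero-initialized in LoRA
    """
    r = len(b)  # LoRA rank = number of rows of the up-projection
    base = matmul(x, w)                  # output of the frozen layer
    update = matmul(matmul(x, a), b)     # low-rank correction x A B
    scale = alpha / r
    return [
        [base[i][j] + scale * update[i][j] for j in range(len(base[0]))]
        for i in range(len(base))
    ]


if __name__ == "__main__":
    x = [[1.0, 2.0]]               # one input row, d_in = 2
    w = [[1.0, 0.0], [0.0, 1.0]]   # frozen identity weight, d_out = 2
    a = [[0.5], [0.5]]             # rank r = 1
    b = [[0.0, 0.0]]               # zero init: the adapter starts as a no-op
    print(lora_forward(x, w, a, b, alpha=2.0))  # [[1.0, 2.0]], same as x W
```

In practice, libraries such as PEFT attach these adapter matrices to selected linear layers, commonly the attention and MLP projections. Only A and B receive gradients. The explicit loops here just keep the arithmetic visible.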
  ---
+
+ ## Files in this Repository
+
+ | File          | Description             |
+ | ------------- | ----------------------- |
+ | `*.gguf`      | Quantized model weights |
+ | `config.json` | Model configuration     |
+ | `README.md`   | Model card              |
 
  ---
 
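+ ## How to Use
+
+ ### Run with llama.cpp
+
+ ```bash
+ # llama-cli is the current name of llama.cpp's CLI binary (formerly ./main)
+ llama-cli -m Qwen2.5-1.5B.Q4_K_M.gguf -p "Explain why the sky is blue." -n 256
+ # Or let llama.cpp download the model from the Hugging Face Hub (--jinja uses its chat template)
+ llama-cli -hf morty649/qwen_finetune --jinja
+ ```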
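+
+ ---
+
+ ### Python Example
+
+ Requires the `llama-cpp-python` package (`pip install llama-cpp-python`).
+
+ ```python
+ from llama_cpp import Llama
+
+ # Load the quantized GGUF file from the current directory
+ llm = Llama(
+     model_path="Qwen2.5-1.5B.Q4_K_M.gguf",
+     n_ctx=4096,  # context window size in tokens
+ )
+
+ # max_tokens defaults to 16, which truncates most answers
+ out = llm("Explain reinforcement learning simply.", max_tokens=256)
+ print(out["choices"][0]["text"])
+ ```
+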
+ ---
+
+ ## Intended Use
+
+ This model is intended for:
+
+ * reasoning experiments
+ * reinforcement learning research
+ * local LLM experimentation
+
+ ---
+
+ ## Limitations
+
+ * Small model size (~1.5B parameters)
+ * Limited training data
+ * May produce incorrect reasoning
+
+ ---
+
+ ## Author
+
+ Maruthi
+
+ ---
 
+ ## License
 
+ This model is derived from Qwen2.5; please follow the license of the original Qwen model.