# UniFlow Qwen2.5-3B v3 (4-bit NF4 QLoRA)

Fine-tuned LoRA adapter for conversation summarization in RAG meeting assistant systems.
## Model Details
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-3B-Instruct |
| Quantization | 4-bit NF4 (QLoRA) |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Training Dataset | SAMSum (16,369) + DialogSum (13,460) = 29,829 examples |
| Training Epochs | 3 |
| Learning Rate | 8e-5 |
| Batch Size | 8 (effective 16 with gradient accumulation) |
| GPU Memory | 4.5 GB |
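The adapter hyperparameters in the table can be expressed as a `peft.LoraConfig`. This is a sketch for reference, not the training script: the rank, alpha, and target modules come from the table above, while `lora_dropout` and `task_type` are assumed typical values not stated in this card.

```python
from peft import LoraConfig

# LoRA configuration mirroring the table above
lora_config = LoraConfig(
    r=16,                       # LoRA rank (from the table)
    lora_alpha=32,              # LoRA alpha (from the table)
    target_modules=[            # all attention and MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_dropout=0.05,          # assumption: not stated in this card
    task_type="CAUSAL_LM",      # assumption: standard for instruct fine-tuning
)
```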
## Paper

**Resource-Efficient Multi-Source RAG: Hexagonal Architecture and the 4-bit Quantization Paradox**
- Authors: Tunahan Buyukgebiz, Toprak Necat Gok, Suleyman Muhammed Arikan, Ibrahim Alper Dogru
- Institution: Gazi University, Department of Computer Engineering
- Target: Automated Software Engineering (ASE) Journal
## Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the fine-tuned LoRA adapter and tokenizer
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
model = PeftModel.from_pretrained(base_model, "tnecat/uniflow-qwen25-3b-v3")
tokenizer = AutoTokenizer.from_pretrained("tnecat/uniflow-qwen25-3b-v3")
```
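Since the adapter was trained with 4-bit NF4 quantization (QLoRA), loading the base model with a matching `BitsAndBytesConfig` reproduces the low-memory setup. This is a sketch requiring a CUDA GPU with `bitsandbytes` installed; the compute dtype and the example prompt are assumptions, not taken from this card.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization matching the training setup
# (bnb_4bit_compute_dtype is an assumed value, not stated in this card)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "tnecat/uniflow-qwen25-3b-v3")
tokenizer = AutoTokenizer.from_pretrained("tnecat/uniflow-qwen25-3b-v3")

# Example: summarize a dialogue via the chat template (hypothetical prompt)
dialogue = "Alice: Can we move the standup to 10am?\nBob: Sure, works for me."
messages = [{"role": "user", "content": "Summarize this conversation:\n" + dialogue}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```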
## Replication

Full source code and evaluation scripts: `uniflow-system/ase-replication`