UniFlow Qwen2.5-3B v3 (4-bit NF4 QLoRA)

A LoRA adapter fine-tuned for conversation summarization in retrieval-augmented (RAG) meeting-assistant systems.

Model Details

  • Base Model: Qwen/Qwen2.5-3B-Instruct
  • Quantization: 4-bit NF4 (QLoRA)
  • LoRA Rank: 16
  • LoRA Alpha: 32
  • Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Training Dataset: SAMSum (16,369) + DialogSum (13,460) = 28,982 examples
  • Training Epochs: 3
  • Learning Rate: 8e-5
  • Batch Size: 8 (effective 16 with gradient accumulation)
  • GPU Memory: 4.5 GB
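
The adapter hyperparameters above can be expressed as a `peft` `LoraConfig` plus a bitsandbytes quantization config. This is a plausible sketch of the training-time setup, not the authors' exact script; the compute dtype is an assumption not stated in the card.

```python
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 base-model quantization, as listed above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
)

# LoRA adapter settings from the table above (rank 16, alpha 32,
# adapters on all attention and MLP projections)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```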

Paper

Resource-Efficient Multi-Source RAG: Hexagonal Architecture and the 4-bit Quantization Paradox

  • Authors: Tunahan Buyukgebiz, Toprak Necat Gok, Suleyman Muhammed Arikan, Ibrahim Alper Dogru
  • Institution: Gazi University, Department of Computer Engineering
  • Target: Automated Software Engineering (ASE) Journal

Usage

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the base model in 4-bit NF4 to match the QLoRA training setup
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct", quantization_config=bnb_config
)

# Attach the fine-tuned adapter and its tokenizer
model = PeftModel.from_pretrained(base_model, "tnecat/uniflow-qwen25-3b-v3")
tokenizer = AutoTokenizer.from_pretrained("tnecat/uniflow-qwen25-3b-v3")
```
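
For summarization requests, the model expects Qwen's chat format. The helper below sketches one way to build the message list; the system-prompt wording is illustrative, not taken from the paper.

```python
def build_summary_messages(dialogue: str) -> list[dict]:
    """Build a chat-format request asking the model to summarize a dialogue.

    The system/user wording here is an assumption; adapt it to your pipeline.
    """
    return [
        {"role": "system", "content": "You are a meeting assistant. "
                                      "Summarize the conversation concisely."},
        {"role": "user", "content": dialogue},
    ]

messages = build_summary_messages(
    "Alice: The deploy is done.\nBob: Great, closing the ticket."
)
# Pass through the tokenizer's chat template before generation:
# inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
#                                        return_tensors="pt")
# summary_ids = model.generate(inputs, max_new_tokens=128)
```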

Replication

Full source code and evaluation scripts: uniflow-system/ase-replication
