🩺 Vistral-7B Medical Vietnamese

Mô hình ngôn ngữ lớn tiếng Việt chuyên biệt lĩnh vực y tế,
được fine-tuned từ Viet-Mistral/Vistral-7B-Chat
bằng kỹ thuật QLoRA (Quantized Low-Rank Adaptation).

📊 Kết quả Benchmark

Đánh giá trên tập kiểm tra cố định thesis_test_v2.jsonl (181 mẫu),
pipeline chuẩn hóa HuggingFace thuần — Greedy Decoding, 50 mẫu ROUGE.

Model	Perplexity ↓	ROUGE-1 ↑	ROUGE-2 ↑	ROUGE-L ↑
Base Vistral-7B-Chat	5.173	0.662	0.292	0.316
Fine-tuned (this model)	4.092	0.693	0.343	0.343
Δ cải thiện	−20.9%	+4.7%	+17.3%	+8.5%

Phát hiện quan trọng: Base Vistral-7B chưa fine-tuned (ROUGE-1 = 0.662)
đã vượt trội so với LLaMA-3 8B sau fine-tuning đầy đủ (ROUGE-1 = 0.597),
xác nhận vai trò quyết định của Language Alignment trong NLP tiếng Việt chuyên biệt.

🗂️ Files trong repo này

File	Mô tả	Dùng cho
`adapter_model.safetensors`	LoRA adapter weights	Python / HuggingFace
`adapter_config.json`	Cấu hình LoRA	Python / HuggingFace
`vistral-7b-medical-vi.Q4_K_M.gguf`	Quantized 4-bit (~4.5GB)	LM Studio / llama.cpp

🚀 Cách sử dụng

LM Studio (Khuyến nghị cho người dùng phổ thông)

Tải file vistral-7b-medical-vi.Q4_K_M.gguf
Mở LM Studio → Load Model → chọn file .gguf
Vào tab Local Server → Start Server
Chọn Prompt Format: Alpaca

Python — Inference với adapter

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

BASE_MODEL   = "Viet-Mistral/Vistral-7B-Chat"
ADAPTER_REPO = "Ethan2004/vistral-7b-medical-vi"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base  = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_REPO)
model = model.merge_and_unload()
model.eval()

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

def ask(question: str) -> str:
    prompt = f"### Instruction:\nBạn là trợ lý y tế AI.\n\n### Input:\n{question}\n\n### Response:\n"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=300,
            do_sample=False,
            temperature=0.3,
        )
    new_ids = out[0][inputs['input_ids'].shape[1]:]
    return tokenizer.decode(new_ids, skip_special_tokens=True).strip()

print(ask("Triệu chứng của bệnh tiểu đường type 2 là gì?"))

🏗️ Chi tiết kỹ thuật

Base model

Kiến trúc: Vistral-7B-Chat (Mistral-based, tiền huấn luyện tiếng Việt bản địa)
Lý do chọn: Language Alignment với tiếng Việt vượt trội so với LLaMA-3 8B đa ngôn ngữ

Fine-tuning config

Tham số	Giá trị
Phương pháp	QLoRA (Quantized LoRA)
LoRA rank	32
LoRA alpha	64
LoRA dropout	0.05
Tham số huấn luyện	~84M / 7B (1.03%)
Learning rate	1×10⁻⁴
Weight decay	0.1
Effective batch size	16 (8×2 grad accum)
Epochs	3 (early stopping patience=3)
Best checkpoint	Step 180
Best val loss	0.971
Precision	BF16
Hardware	NVIDIA A100 40GB
Train runtime	733.5s (12 phút 13 giây)

Dataset

Thông số	Giá trị
Phương pháp tạo	Self-Instruct + Gemini 1.5 Pro
Nguồn tài liệu	13 văn bản phác đồ điều trị Bộ Y tế Việt Nam
Train samples	1.440
Format	Alpaca (instruction / input / output)
Safety Guardrails	Tích hợp sẵn

Đánh giá định tính

Phương pháp LLM-as-a-Judge với Llama-3 70B trên 181 mẫu:

Điểm trung bình: 7.34/10
Tiêu chí: Medical Accuracy, Clinical Safety, Helpfulness

⚠️ Giới hạn và khuyến cáo

Mô hình này chỉ phục vụ mục đích nghiên cứu và demo.
Không thay thế tư vấn từ bác sĩ chuyên khoa.
Luôn tham khảo ý kiến chuyên gia y tế cho các quyết định sức khỏe quan trọng.

Các giới hạn đã biết:

Hiện tượng hallucination với câu hỏi về liều lượng thuốc đặc thị
Phạm vi bao phủ tốt nhất ở y tế phổ thông — chuyên khoa sâu còn hạn chế
Không cập nhật phác đồ điều trị sau thời điểm tạo dataset (03/2026)

📚 Citation

@misc{lehieu2026vistral-medical-vi,
  title        = {Vistral-7B Medical Vietnamese: Fine-tuned LLM for Vietnamese Healthcare},
  author       = {Le Chi Hieu},
  year         = {2026},
  institution  = {An Giang University, Faculty of Information Technology},
  note         = {Undergraduate Thesis — QLoRA fine-tuning with Language Alignment analysis},
  url          = {https://huggingface.co/Ethan2004/vistral-7b-medical-vi}
}

🔗 Liên quan

Base model: Viet-Mistral/Vistral-7B-Chat
LLaMA-3 8B version: Ethan2004/llama-3-8b-medical-vi
Web interface: GitHub

Downloads last month: 20

GGUF

Model size

7B params

Architecture

llama

Hardware compatibility

4-bit

Model tree for Ethan2004/vistral-7b-medical-vi

Base model

Viet-Mistral/Vistral-7B-Chat

Adapter

(49)

this model

Evaluation results

Perplexity (thesis_test_v2)
self-reported

4.092
ROUGE-1 (thesis_test_v2)
self-reported

0.693
ROUGE-2 (thesis_test_v2)
self-reported

0.343
ROUGE-L (thesis_test_v2)
self-reported

0.343