# vidavox/Qwen3-SKK-32B-SFT-LoRA
LoRA adapter for a Qwen3-32B base model, fine-tuned on SKK’s KSMI document data for domain-specific question answering.
This repository contains only the LoRA adapter weights. To use the model, you must load a compatible Qwen3-32B base model and then attach this adapter with `PeftModel.from_pretrained`.
## Model Details

- Base model (expected): `Qwen/Qwen3-32B` (or another compatible Qwen3-32B variant)
- Adapter type: LoRA via PEFT
- Task: Supervised fine-tuning for assistant-style answers grounded in SKK’s KSMI document data.
- Languages: Primarily Bahasa Indonesia and English in a technical / regulatory context.
This adapter is intended to be used in SKK’s internal systems for answering questions based on KSMI and related upstream oil & gas regulatory documents.
## Usage (PEFT / LoRA)

⚠️ This repo does not include the base model. You must:

- Load the Qwen3-32B base model.
- Wrap it with `PeftModel.from_pretrained` using this adapter.

### 1. Install dependencies

```shell
pip install "transformers>=4.43.0" peft accelerate bitsandbytes
```
### 2. Load the base Qwen3-32B model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL_ID = "Qwen/Qwen3-32B"  # adjust if you used a different Qwen3-32B ID

tokenizer = AutoTokenizer.from_pretrained(
    BASE_MODEL_ID,
    trust_remote_code=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # or "auto"
    trust_remote_code=True,
)
```
### 3. Attach the LoRA adapter with PeftModel

```python
from peft import PeftModel

ADAPTER_ID = "vidavox/Qwen3-SKK-32B-SFT-LoRA"

model = PeftModel.from_pretrained(
    base_model,
    ADAPTER_ID,
    torch_dtype=torch.bfloat16,  # should match the base model dtype
)
model.eval()
```
`PeftModel.from_pretrained` loads the adapter configuration and weights and applies them to the provided base model.
### 4. Chat-style inference (Qwen3 chat template)

Qwen models use a chat template accessed via `tokenizer.apply_chat_template`.

```python
messages = [
    {
        "role": "system",
        "content": "You are an assistant specialized in SKK KSMI documents.",
    },
    {
        "role": "user",
        # "Briefly explain the stages of the POD approval process under KSMI."
        "content": "Jelaskan secara ringkas tahapan proses persetujuan POD berdasarkan KSMI.",
    },
]

input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # disable Qwen3 thinking mode
)

model_inputs = tokenizer([input_text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=False,  # greedy decoding; set do_sample=True to use temperature/top_p sampling
)

output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
response = tokenizer.decode(output_ids, skip_special_tokens=True)
print(response)
```
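For reference, `apply_chat_template` renders the messages into Qwen's ChatML layout. Below is a simplified sketch of that layout (illustration only; `render_chatml` is a hypothetical helper, and the real Qwen3 template additionally handles tool calls and `<think>` blocks, so always rely on the tokenizer's own template in practice):

```python
# Simplified sketch of the ChatML layout produced by apply_chat_template.
# render_chatml is illustrative only -- use the tokenizer's real template.
def render_chatml(messages, add_generation_prompt=True):
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
])
print(prompt)
```

Each turn is wrapped in `<|im_start|>{role} ... <|im_end|>` markers, and `add_generation_prompt=True` leaves an open assistant turn so generation continues from there.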
### 5. Optional: 4-bit loading for constrained VRAM

To run base + adapter on a single 24 GB GPU (e.g. an RTX 3090), you can use 4-bit quantization with bitsandbytes:

```python
import torch
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL_ID = "Qwen/Qwen3-32B"
ADAPTER_ID = "vidavox/Qwen3-SKK-32B-SFT-LoRA"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(
    BASE_MODEL_ID,
    trust_remote_code=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    device_map="auto",
    quantization_config=bnb_config,
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()
```
The generation code is then the same as in step 4.
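As a back-of-envelope check of why 4-bit loading fits on a 24 GB card (rough arithmetic only; KV cache, activations, and quantization metadata add several GB on top of the weights):

```python
# Approximate weight memory for a ~32B-parameter model at 4 bits per weight.
num_params = 32e9
bytes_per_param = 0.5  # 4 bits = half a byte
weight_gib = num_params * bytes_per_param / 1024**3
print(f"~{weight_gib:.1f} GiB for weights")  # ~14.9 GiB
```

That leaves several GiB of headroom on a 24 GB GPU for activations and the KV cache, which is why a single RTX 3090 is workable.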
## Training Data
The adapter was trained on SKK’s internal KSMI document data, formatted as instruction-style examples.
- Train split: 2223 examples
- Validation split: 247 examples
- Test split: 50 examples
The data consists of questions and instructions grounded in KSMI and related SKK upstream oil & gas regulations, with answers written to be faithful to the underlying documents. The dataset is private and not released with this model.
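The split sizes above amount to roughly an 88/10/2 partition of the full dataset:

```python
# Split proportions implied by the counts listed above.
splits = {"train": 2223, "validation": 247, "test": 50}
total = sum(splits.values())
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}
print(total, shares)  # 2520 {'train': 88.2, 'validation': 9.8, 'test': 2.0}
```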
## Evaluation (SDA on test data)

Evaluation was performed on the 50-sample test set using an SDA-style pipeline that combines automatic metrics (text/semantic similarity) and human-oriented quality scores.

### Metric summary
| Metric | Mean (test) | Scale / note |
|---|---|---|
| BERTScore F1 | 0.845 | 0–1, higher = better semantic similarity |
| Correctness | 5.96 | 1–10, higher = logically correct answers |
| Completeness | 5.10 | 1–10, higher = more required information covered |
| Factuality | 7.08 | 1–10, higher = fewer factual errors |
| Structure | 7.82 | 1–10, higher = better organization / formatting |
| Hallucination resistance | 7.08 | 1–10, higher = less hallucination |
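As a quick aggregate (simple arithmetic over the table above, not an official metric), the five 1–10 quality scores average to about 6.6:

```python
# Mean of the five 1-10 quality scores from the metric summary table.
scores = {
    "correctness": 5.96,
    "completeness": 5.10,
    "factuality": 7.08,
    "structure": 7.82,
    "hallucination_resistance": 7.08,
}
mean_score = sum(scores.values()) / len(scores)
print(round(mean_score, 2))  # 6.61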
These metrics are computed on a small, domain-specific test set and should be interpreted as indicative of quality on KSMI-style questions, not as general-purpose benchmarks.
## Intended Use (High-Level)
- Primary use: Internal SKK assistant systems answering questions grounded in KSMI and related upstream O&G regulations.
- Not intended for: General-purpose open-domain chat, safety-critical decision making, or use outside the domain without additional evaluation and alignment.