# vidavox/Qwen3-SKK-32B-SFT-LoRA
LoRA adapter for a Qwen3-32B base model, fine-tuned on SKK’s KSMI document data for domain-specific question answering.
This repository contains only the LoRA adapter weights. To use the model, you must load a compatible Qwen3-32B base model and then attach this adapter with `PeftModel.from_pretrained`.
## Model Details

- Base model (expected): `Qwen/Qwen3-32B` (or another compatible Qwen3-32B variant)
- Adapter type: LoRA via PEFT
- Task: Supervised fine-tuning for assistant-style answers grounded in SKK’s KSMI document data.
- Languages: Primarily Bahasa Indonesia and English in a technical / regulatory context.
This adapter is intended to be used in SKK’s internal systems for answering questions based on KSMI and related upstream oil & gas regulatory documents.
## Usage (PEFT / LoRA)

⚠️ This repo does not include the base model. You must:

- Load the Qwen3-32B base model.
- Wrap it with `PeftModel.from_pretrained` using this adapter.

### 1. Install dependencies

```shell
pip install "transformers>=4.43.0" peft accelerate bitsandbytes
```
### 2. Load the base Qwen3-32B model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL_ID = "Qwen/Qwen3-32B"  # adjust if you used a different Qwen3-32B ID

tokenizer = AutoTokenizer.from_pretrained(
    BASE_MODEL_ID,
    trust_remote_code=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    device_map="auto",
    torch_dtype=torch.bfloat16,  # or "auto"
    trust_remote_code=True,
)
```
### 3. Attach the LoRA adapter with PeftModel

```python
from peft import PeftModel

ADAPTER_ID = "vidavox/Qwen3-SKK-32B-SFT-LoRA"

model = PeftModel.from_pretrained(
    base_model,
    ADAPTER_ID,
    torch_dtype=torch.bfloat16,  # should match the base model dtype
)
model.eval()
```
`PeftModel.from_pretrained` loads the adapter configuration and weights and applies them to the provided base model.
### 4. Chat-style inference (Qwen3 chat template)

Qwen models use a chat template accessed via `tokenizer.apply_chat_template`.

```python
messages = [
    {
        "role": "system",
        "content": "You are an assistant specialized in SKK KSMI documents.",
    },
    {
        "role": "user",
        # "Briefly explain the stages of the POD approval process under KSMI."
        "content": "Jelaskan secara ringkas tahapan proses persetujuan POD berdasarkan KSMI.",
    },
]

input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,  # disable Qwen3 thinking mode
)

model_inputs = tokenizer([input_text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=False,  # greedy decoding; set do_sample=True to use temperature/top_p sampling
)

output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
response = tokenizer.decode(output_ids, skip_special_tokens=True)
print(response)
```
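For reference, `apply_chat_template` renders the messages into Qwen's ChatML layout. Below is a simplified sketch of that layout (illustration only; `render_chatml` is a hypothetical helper, and the real Qwen3 template additionally handles tool calls and `<think>` blocks, so always rely on the tokenizer's own template in practice):

```python
# Simplified sketch of the ChatML layout produced by apply_chat_template.
# render_chatml is illustrative only -- use the tokenizer's real template.
def render_chatml(messages, add_generation_prompt=True):
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open an assistant turn for the model to complete.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
])
print(prompt)
```

Each turn is wrapped in `<|im_start|>{role} ... <|im_end|>` markers, and `add_generation_prompt=True` leaves an open assistant turn so generation continues from there.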
### 5. Optional: 4-bit loading for constrained VRAM

To run base + adapter on a single 24 GB GPU (e.g. an RTX 3090), you can use 4-bit quantization with bitsandbytes:

```python
import torch
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL_ID = "Qwen/Qwen3-32B"
ADAPTER_ID = "vidavox/Qwen3-SKK-32B-SFT-LoRA"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained(
    BASE_MODEL_ID,
    trust_remote_code=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL_ID,
    device_map="auto",
    quantization_config=bnb_config,
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()
```
The generation code is then the same as in step 4.
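As a back-of-envelope check of why 4-bit loading fits on a 24 GB card (rough arithmetic only; KV cache, activations, and quantization metadata add several GB on top of the weights):

```python
# Approximate weight memory for a ~32B-parameter model at 4 bits per weight.
num_params = 32e9
bytes_per_param = 0.5  # 4 bits = half a byte
weight_gib = num_params * bytes_per_param / 1024**3
print(f"~{weight_gib:.1f} GiB for weights")  # ~14.9 GiB
```

That leaves several GiB of headroom on a 24 GB GPU for activations and the KV cache, which is why a single RTX 3090 is workable.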
## Training Data
The adapter was trained on SKK’s internal KSMI document data, formatted as instruction-style examples.
- Train split: 2223 examples
- Validation split: 247 examples
- Test split: 50 examples
The data consists of questions and instructions grounded in KSMI and related SKK upstream oil & gas regulations, with answers written to be faithful to the underlying documents. The dataset is private and not released with this model.
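The split sizes above amount to roughly an 88/10/2 partition of the full dataset:

```python
# Split proportions implied by the counts listed above.
splits = {"train": 2223, "validation": 247, "test": 50}
total = sum(splits.values())
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}
print(total, shares)  # 2520 {'train': 88.2, 'validation': 9.8, 'test': 2.0}
```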
## Evaluation (SDA on test data)

Evaluation was performed on the 50-sample test set using an SDA-style pipeline that combines automatic metrics (text/semantic similarity) and human-oriented quality scores.

### Metric summary
| Metric | Mean (test) | Scale / note |
|---|---|---|
| BERTScore F1 | 0.845 | 0–1, higher = better semantic similarity |
| Correctness | 5.96 | 1–10, higher = logically correct answers |
| Completeness | 5.10 | 1–10, higher = more required information covered |
| Factuality | 7.08 | 1–10, higher = fewer factual errors |
| Structure | 7.82 | 1–10, higher = better organization / formatting |
| Hallucination resistance | 7.08 | 1–10, higher = less hallucination |
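As a quick aggregate (simple arithmetic over the table above, not an official metric), the five 1–10 quality scores average to about 6.6:

```python
# Mean of the five 1-10 quality scores from the metric summary table.
scores = {
    "correctness": 5.96,
    "completeness": 5.10,
    "factuality": 7.08,
    "structure": 7.82,
    "hallucination_resistance": 7.08,
}
mean_score = sum(scores.values()) / len(scores)
print(round(mean_score, 2))  # 6.61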
These metrics are computed on a small, domain-specific test set and should be interpreted as indicative of quality on KSMI-style questions, not as general-purpose benchmarks.
## Intended Use (High-Level)
- Primary use: Internal SKK assistant systems answering questions grounded in KSMI and related upstream O&G regulations.
- Not intended for: General-purpose open-domain chat, safety-critical decision making, or use outside the domain without additional evaluation and alignment.