HIKARI-Sirius-8B-SkinDx-RAG
Healthcare-oriented Intelligent Knowledge Augmented Retrieval and Inference
Named after Sirius, the brightest star in the night sky.
Model Type: Merged Full Model
This is a fully merged model: the LoRA adapter weights have been merged directly into the base model weights.
No adapter loading needed. Load and run directly with transformers, vLLM, or SGLang, just like any standard Qwen3-VL model.
Size: ~17 GB (4 safetensors shards)
If you prefer a lightweight adapter instead, see the LoRA version: E27085921/HIKARI-Sirius-8B-SkinDx-RAG-LoRA (~1.1 GB)
Overview
HIKARI-Sirius is our best-performing skin disease diagnosis model, fine-tuned from Qwen/Qwen3-VL-8B-Thinking on the SkinCAP Thai dermatology dataset.
The key innovation is RAG-in-Training: retrieval-augmented generation is embedded during fine-tuning itself (not only at inference). The model learns to compare a query image against retrieved reference images and their clinical captions, making it robust to visual similarity across diseases.
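As a rough illustration of the idea (the exact training-prompt format is not published here; the helper name and reference layout below are assumptions), a RAG-augmented training sample might pair the query image's group with retrieved reference captions like this:

```python
def build_rag_prompt(query_group, references):
    """Sketch of a RAG-augmented prompt for one training sample.

    references: list of (disease_label, clinical_caption) tuples for the
    retrieved reference images (hypothetical format, for illustration only).
    """
    lines = [f"This skin lesion belongs to the group '{query_group}'."]
    lines.append("Reference cases retrieved for comparison:")
    for i, (label, caption) in enumerate(references, 1):
        lines.append(f"{i}. {label}: {caption}")
    lines.append("Compare the query image against these references and "
                 "state the specific skin disease.")
    return "\n".join(lines)

refs = [("psoriasis", "erythematous plaque with silvery scale"),
        ("atopic_dermatitis", "lichenified pruritic patch on flexural skin")]
print(build_rag_prompt("inflammatory", refs))
```

During fine-tuning the model then sees both the query image and these retrieved captions, so it learns to discriminate visually similar diseases by contrast rather than from the query alone.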
| Property | Value |
|---|---|
| Task | 10-class skin disease diagnosis (Stage 2 of HIKARI pipeline) |
| Base model | Qwen/Qwen3-VL-8B-Thinking |
| Training technique | RAG-in-Training (R2: SigLIP visual + BGE-M3 text, α=0.9) |
| Val accuracy | 85.86% (99 samples, SkinCAP 3-stage split) |
| Model type | Merged full model |
| Hardware tested | RTX 5070 Ti (16 GB VRAM) |
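The α=0.9 in the table reads as a weighted fusion of the two retrieval signals. A minimal sketch, assuming both inputs are cosine similarities already computed from SigLIP (visual) and BGE-M3 (text) embeddings; the numeric values are illustrative stand-ins:

```python
import numpy as np

# Hybrid retrieval score: α weights the visual signal, (1 − α) the text signal.
ALPHA = 0.9

def fused_score(visual_sim: np.ndarray, text_sim: np.ndarray) -> np.ndarray:
    # score = α · visual + (1 − α) · text
    return ALPHA * visual_sim + (1.0 - ALPHA) * text_sim

visual = np.array([0.82, 0.74, 0.91])  # query vs. 3 candidate reference images
text = np.array([0.60, 0.95, 0.40])    # query caption vs. candidate captions
scores = fused_score(visual, text)
best = int(np.argmax(scores))  # top-ranked reference to retrieve
```

With α=0.9 the ranking is dominated by visual similarity, and the caption similarity only breaks near-ties, which fits a task where lesions look alike but are described differently.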
Disease Classes (10)
| Class | Description |
|---|---|
| `acne_vulgaris` | Acne: comedones, papules, pustules on face/back |
| `atopic_dermatitis` | Eczema: chronic pruritic inflammatory skin disease |
| `melanocytic_nevi` | Moles: benign melanocyte proliferations |
| `psoriasis` | Erythematous plaques with silvery-white scale |
| `sccis` | Squamous cell carcinoma in situ (Bowen's disease) |
| `seborrheic_dermatitis` | Dandruff-related scaly patches on oily areas |
| `skin_tag` | Benign soft fibroepithelial pedunculated growths |
| `tinea_versicolor` | Fungal discoloration (hypo-/hyperpigmented macules) |
| `urticaria` | Hives: transient wheals with erythema |
| `photodermatoses` | Sun-induced skin reactions |
Performance vs Baselines
| Model | Accuracy | Training Method |
|---|---|---|
| Qwen3-VL-8B zero-shot | ~45% | No fine-tuning |
| HIKARI-Altair (Single FT) | 74.00% | Standard fine-tuning |
| HIKARI-Deneb (Cascade FT) | 79.80% | Cascaded pretraining |
| HIKARI-Sirius (RAG-in-Training) | 85.86% | RAG embedded at training |
Usage
Stage 2 in the Full HIKARI Pipeline
Image
   │
   ▼
[Stage 1] HIKARI-Subaru-8B-SkinGroup ──▶ group label (4 classes)
   │
   ▼
[Stage 2] HIKARI-Sirius-8B-SkinDx-RAG ──▶ disease label (10 classes) ← YOU ARE HERE
   │
   ▼
[Stage 3] HIKARI-Vega-8B-SkinCaption-Fused ──▶ clinical caption
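Wiring the three stages together is a sequential label hand-off. A minimal sketch, where each argument is a placeholder callable wrapping one HIKARI model in whatever inference backend you prefer (the function name is illustrative, not part of any released package):

```python
def run_hikari_pipeline(image, classify_group, classify_disease, caption_lesion):
    """Chain the three HIKARI stages.

    classify_group(image)            -> group label   (Stage 1, Subaru)
    classify_disease(image, group)   -> disease label (Stage 2, Sirius, this model)
    caption_lesion(image, disease)   -> caption       (Stage 3, Vega)
    """
    group = classify_group(image)             # e.g. "inflammatory"
    disease = classify_disease(image, group)  # group conditions the prompt
    caption = caption_lesion(image, disease)
    return {"group": group, "disease": disease, "caption": caption}
```

Note that Stage 2's prompt is conditioned on Stage 1's group label, so a Stage 1 error propagates; the pipeline accuracy therefore depends on both models.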
Quick Inference (transformers)
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

model_id = "E27085921/HIKARI-Sirius-8B-SkinDx-RAG"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

image = Image.open("skin_lesion.jpg").convert("RGB")
group = "inflammatory"  # from Stage 1 (HIKARI-Subaru)

PROMPT = (
    "This skin lesion belongs to the group '{group}'. "
    "Examine the lesion morphology (papules, plaques, macules), "
    "color (red, violet, white, brown), scale/crust, border sharpness, "
    "and distribution pattern. Based on these visual features, "
    "what is the specific skin disease?"
)

messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": PROMPT.format(group=group)},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    # do_sample=False already gives deterministic greedy decoding,
    # so no temperature argument is needed.
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
result = processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0].strip()
print(result)  # e.g. "atopic_dermatitis"
Production (vLLM BnB-4bit, RTX 5070 Ti / 16 GB VRAM)
Throughput: 5.57 img/s at batch=4
from vllm import LLM, SamplingParams
from transformers import AutoProcessor
from PIL import Image

model_id = "E27085921/HIKARI-Sirius-8B-SkinDx-RAG"
llm = LLM(
    model=model_id,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    trust_remote_code=True,
    max_model_len=2048,
    gpu_memory_utilization=0.88,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
sp = SamplingParams(max_tokens=64, temperature=0.0)  # temperature=0.0 → greedy

PROMPT = (
    "This skin lesion belongs to the group '{group}'. "
    "Examine the lesion morphology (papules, plaques, macules), "
    "color (red, violet, white, brown), scale/crust, border sharpness, "
    "and distribution pattern. Based on these visual features, "
    "what is the specific skin disease?"
)

def classify_disease(image: Image.Image, group: str) -> str:
    messages = [{"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": PROMPT.format(group=group)},
    ]}]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # Supply one image per vision placeholder in the rendered prompt.
    n = max(text.count("<|vision_start|>"), 1)
    out = llm.generate({"prompt": text, "multi_modal_data": {"image": [image] * n}}, sp)
    return out[0].outputs[0].text.strip()

img = Image.open("skin_lesion.jpg").convert("RGB")
print(classify_disease(img, group="inflammatory"))  # e.g. "atopic_dermatitis"
Production (SGLang FP8, maximum throughput: 9.11 img/s at batch=4)
import sglang as sgl
from transformers import AutoProcessor
from PIL import Image

model_id = "E27085921/HIKARI-Sirius-8B-SkinDx-RAG"
engine = sgl.Engine(
    model_path=model_id,
    dtype="bfloat16",
    quantization="fp8",
    context_length=2048,
    mem_fraction_static=0.88,
    trust_remote_code=True,
    disable_cuda_graph=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

PROMPT = (
    "This skin lesion belongs to the group '{group}'. "
    "Examine the lesion morphology (papules, plaques, macules), "
    "color (red, violet, white, brown), scale/crust, border sharpness, "
    "and distribution pattern. Based on these visual features, "
    "what is the specific skin disease?"
)

def classify_disease_sglang(image: Image.Image, group: str) -> str:
    messages = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": PROMPT.format(group=group)},
    ]}]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    out = engine.generate(
        prompt=text,
        image_data=image,
        sampling_params={"max_new_tokens": 64, "temperature": 0.0},
    )
    # A single prompt returns a dict; a batch returns a list of dicts.
    return (out["text"] if isinstance(out, dict) else out[0]["text"]).strip()

# engine.shutdown()  # call when done
Parse Disease Label (fuzzy matching)
from rapidfuzz import process as fuzz_process

DISEASES = [
    "acne_vulgaris", "atopic_dermatitis", "melanocytic_nevi", "psoriasis",
    "sccis", "seborrheic_dermatitis", "skin_tag", "tinea_versicolor",
    "urticaria", "photodermatoses",
]

def match_disease(raw: str) -> str:
    # Map free-form model output onto the closest canonical label.
    result, score, _ = fuzz_process.extractOne(raw.lower(), DISEASES)
    return result if score >= 50 else "unknown"

print(match_disease("The patient has atopic dermatitis"))  # → atopic_dermatitis
Speed Benchmark (RTX 5070 Ti, 16 GB VRAM; Stage 2, 64-token output)
| Engine | Batch 1 | Batch 4 | vs Unsloth bs=1 |
|---|---|---|---|
| Unsloth 4-bit | 1,096 ms/img | 500 ms/img | baseline |
| vLLM BnB-4bit | 480 ms/img | 179 ms/img | 6.1× faster |
| SGLang FP8 | 331 ms/img | 110 ms/img | 10× faster |
LoRA Adapter Version
Prefer a lightweight adapter (~1.1 GB) over the 17 GB merged model?
from peft import PeftModel
from transformers import Qwen3VLForConditionalGeneration
import torch
base = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Thinking", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "E27085921/HIKARI-Sirius-8B-SkinDx-RAG-LoRA")
→ E27085921/HIKARI-Sirius-8B-SkinDx-RAG-LoRA
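If you need to reproduce a merged checkpoint from the adapter yourself, PEFT's `merge_and_unload` folds the LoRA deltas into the base weights. A minimal sketch (the helper name and output directory are illustrative):

```python
def merge_adapter(model, out_dir="hikari-sirius-merged"):
    """Fold LoRA deltas into the base weights and save a standalone model.

    `model` is a peft.PeftModel such as the one built in the snippet above.
    """
    merged = model.merge_and_unload()  # returns the base model with weights updated
    merged.save_pretrained(out_dir)
    return merged
```

The result loads without `peft` installed, exactly like this repository's merged weights.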
HIKARI Model Family
| Model | Task | Metric | Type |
|---|---|---|---|
| HIKARI-Subaru-8B-SkinGroup | 4-class group classifier (Stage 1) | 88.68% | Merged |
| HIKARI-Altair-8B-SkinDx | 10-class disease dx (baseline) | 74.00% | Merged + LoRA |
| HIKARI-Deneb-8B-SkinDx-Cascade | 10-class disease dx (cascade FT) | 79.80% | Merged + LoRA |
| ⭐ HIKARI-Sirius-8B-SkinDx-RAG (this model) | 10-class disease dx (RAG-in-Training) | 85.86% | Merged + LoRA |
| HIKARI-Polaris-8B-SkinDx-Oracle | Oracle upper bound (research only) | 59.38%* | Merged |
| HIKARI-Rigel-8B-SkinCaption | Clinical caption (checkpoint init) | BLEU-4: 9.82 | Merged + LoRA |
| ⭐ HIKARI-Vega-8B-SkinCaption-Fused | Clinical caption (merged init, best) | BLEU-4: 29.33 | Merged + LoRA |
| HIKARI-Antares-8B-SkinCaption-STS | Caption + STS ablation (research) | BLEU-4: 0.61 | Merged + LoRA |
* Polaris requires the ground-truth group at inference; for research comparison only.
Citation
@misc{hikari2026,
  title       = {HIKARI: RAG-in-Training for Skin Disease Diagnosis
                 with Cascaded Vision-Language Models},
  author      = {Watin Promfiy and Pawitra Boonprasart},
  year        = {2026},
  institution = {King Mongkut's Institute of Technology Ladkrabang,
                 Department of Information Technology, Bangkok, Thailand}
}
Made with ❤️ at King Mongkut's Institute of Technology Ladkrabang (KMITL)
Department of Information Technology