HIKARI: Healthcare-oriented Intelligent Knowledge Augmented Retrieval and Inference

HIKARI-Sirius-8B-SkinDx-RAG ⭐

Named after Sirius, the brightest star in the night sky


📦 Model Type: Merged Full Model

This is a fully merged model: the LoRA adapter weights have been merged directly into the base model weights.

✅ No adapter loading needed. Load and run directly with transformers, vLLM, or SGLang, just like any standard Qwen3-VL model.

💾 Size: ~17 GB (4 safetensors shards)

🔌 If you prefer a lightweight adapter instead, see the LoRA version: E27085921/HIKARI-Sirius-8B-SkinDx-RAG-LoRA (~1.1 GB)
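For intuition, merging a LoRA adapter in the standard formulation folds the low-rank update into each adapted weight matrix, W' = W + (alpha / r) · B · A, so inference needs no separate adapter path. The sketch below shows this on a tiny matrix in plain Python. The shapes, the rank, the scaling factor, and every numeric value are made up for illustration and are not taken from this model's training configuration.

```python
# Toy illustration of merging a LoRA update into a base weight matrix.
# All shapes and values are hypothetical; real models do this per adapted layer.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_lora(w, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Base weight W (2x3), LoRA factors B (2x1) and A (1x3), rank r = 1.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
B = [[0.5],
     [1.0]]
A = [[2.0, 0.0, 4.0]]
ALPHA, RANK = 2.0, 1

W_merged = merge_lora(W, A, B, ALPHA, RANK)

# The merged matrix gives the same output as the base path plus the adapter path.
x = [[1.0], [2.0], [3.0]]  # column-vector input
y_merged = matmul(W_merged, x)
y_base = matmul(W, x)
y_adapter = matmul(B, matmul(A, x))
y_split = [[b[0] + (ALPHA / RANK) * a[0]] for b, a in zip(y_base, y_adapter)]
```

Because `y_merged` and `y_split` agree, the merged checkpoint can be served like any ordinary model, at the cost of storing full-size weights instead of the small adapter.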


Overview

HIKARI-Sirius is our best-performing skin disease diagnosis model, fine-tuned from Qwen/Qwen3-VL-8B-Thinking on the SkinCAP Thai dermatology dataset.

The key innovation is RAG-in-Training: retrieval-augmented generation is embedded in fine-tuning itself, not only applied at inference. The model learns to compare a query image against retrieved reference images and their clinical captions, which is intended to make it more robust when different diseases look alike.

| Property | Value |
|---|---|
| Task | 10-class skin disease diagnosis (Stage 2 of HIKARI pipeline) |
| Base model | Qwen/Qwen3-VL-8B-Thinking |
| Training technique | RAG-in-Training (R2: SigLIP visual + BGE-M3 text, α=0.9) |
| Val accuracy | 85.86% (99 samples, SkinCAP 3-stage split) |
| Model type | Merged full model |
| Hardware tested | RTX 5070 Ti (16 GB VRAM) |
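The R2 setting in the table combines a visual retriever (SigLIP) and a text retriever (BGE-M3) with a weight α = 0.9. One common way to combine two retrievers is a weighted sum of cosine similarities, sketched below. The sketch assumes α weights the visual score and 1 - α the text score, which this card does not state. The `fuse_scores` and `retrieve_top_k` names, the query-side text vector, and the toy 2-D embeddings are hypothetical stand-ins for real SigLIP and BGE-M3 outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def fuse_scores(visual_sim, text_sim, alpha=0.9):
    """Weighted sum; ASSUMES alpha weights the visual similarity."""
    return alpha * visual_sim + (1.0 - alpha) * text_sim

def retrieve_top_k(query, references, k=2, alpha=0.9):
    """Rank reference cases by fused similarity and return the best k ids."""
    scored = []
    for ref in references:
        v = cosine(query["visual"], ref["visual"])
        t = cosine(query["text"], ref["text"])
        scored.append((fuse_scores(v, t, alpha), ref["id"]))
    scored.sort(reverse=True)
    return [ref_id for _, ref_id in scored[:k]]

# Toy 2-D embeddings standing in for real SigLIP / BGE-M3 vectors.
query = {"visual": [1.0, 0.0], "text": [0.0, 1.0]}
references = [
    {"id": "ref_a", "visual": [1.0, 0.0], "text": [1.0, 0.0]},  # visual match only
    {"id": "ref_b", "visual": [0.0, 1.0], "text": [0.0, 1.0]},  # text match only
    {"id": "ref_c", "visual": [1.0, 1.0], "text": [1.0, 1.0]},  # partial match on both
]
top = retrieve_top_k(query, references, k=2)
```

With α = 0.9 the visually matching reference dominates the ranking, and a text-only match contributes little.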

🩺 Disease Classes (10)

| Class | Description |
|---|---|
| acne_vulgaris | Acne: comedones, papules, pustules on face/back |
| atopic_dermatitis | Eczema: chronic pruritic inflammatory skin disease |
| melanocytic_nevi | Moles: benign melanocyte proliferations |
| psoriasis | Erythematous plaques with silvery-white scale |
| sccis | Squamous cell carcinoma in situ (Bowen's disease) |
| seborrheic_dermatitis | Dandruff-related scaly patches on oily areas |
| skin_tag | Benign soft fibroepithelial pedunculated growths |
| tinea_versicolor | Fungal discoloration (hypo/hyperpigmented macules) |
| urticaria | Hives: transient wheals with erythema |
| photodermatoses | Sun-induced skin reactions |

📊 Performance vs Baselines

| Model | Accuracy | Training Method |
|---|---|---|
| Qwen3-VL-8B zero-shot | ~45% | No fine-tuning |
| HIKARI-Altair (Single FT) | 74.00% | Standard fine-tuning |
| HIKARI-Deneb (Cascade FT) | 79.80% | Cascaded pretraining |
| HIKARI-Sirius (RAG-in-Training) | 85.86% ✨ | RAG embedded at training |

🔧 Usage

Stage 2 in the Full HIKARI Pipeline

📷 Image
   │
   ▼
[Stage 1] HIKARI-Subaru-8B-SkinGroup ──► group label (4 classes)
   │
   ▼
[Stage 2] HIKARI-Sirius-8B-SkinDx-RAG ──► disease label (10 classes)  ← YOU ARE HERE
   │
   ▼
[Stage 3] HIKARI-Vega-8B-SkinCaption-Fused ──► clinical caption
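The three stages in the diagram can be chained as plain function calls. The sketch below assumes each stage is wrapped in a callable. `run_pipeline`, `PipelineResult`, and the stub stage functions are hypothetical names used only to show the data flow and are not part of any HIKARI package. Whether Stage 3 actually consumes the disease label is also an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class PipelineResult:
    group: str
    disease: str
    caption: str

def run_pipeline(
    image: Any,
    classify_group: Callable[[Any], str],
    classify_disease: Callable[[Any, str], str],
    write_caption: Callable[[Any, str], str],
) -> PipelineResult:
    """Feed each stage's output into the next stage."""
    group = classify_group(image)              # Stage 1: group label
    disease = classify_disease(image, group)   # Stage 2: disease label
    caption = write_caption(image, disease)    # Stage 3 (ASSUMED to take the label)
    return PipelineResult(group, disease, caption)

# Stub stages so the sketch runs without any model loaded.
result = run_pipeline(
    image="skin_lesion.jpg",
    classify_group=lambda img: "inflammatory",
    classify_disease=lambda img, group: "atopic_dermatitis",
    write_caption=lambda img, disease: f"Lesion consistent with {disease}.",
)
```

Passing the stages in as callables keeps the wiring independent of the serving engine, so the same function works with transformers, vLLM, or SGLang wrappers.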

Quick Inference: transformers

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

model_id = "E27085921/HIKARI-Sirius-8B-SkinDx-RAG"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

image = Image.open("skin_lesion.jpg").convert("RGB")
group = "inflammatory"  # from Stage 1 (HIKARI-Subaru)

PROMPT = (
    "This skin lesion belongs to the group '{group}'. "
    "Examine the lesion morphology (papules, plaques, macules), "
    "color (red, violet, white, brown), scale/crust, border sharpness, "
    "and distribution pattern. Based on these visual features, "
    "what is the specific skin disease?"
)

messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": PROMPT.format(group=group)},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    # Greedy decoding; temperature is unused when do_sample=False, so it is omitted.
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)

result = processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0].strip()
print(result)  # e.g. "atopic_dermatitis"

Production: vLLM BnB-4bit ⚡ (RTX 5070 Ti / 16 GB VRAM)

Throughput: 5.57 img/s at batch=4

from vllm import LLM, SamplingParams
from transformers import AutoProcessor
from PIL import Image

model_id = "E27085921/HIKARI-Sirius-8B-SkinDx-RAG"

llm = LLM(
    model=model_id,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    trust_remote_code=True,
    max_model_len=2048,
    gpu_memory_utilization=0.88,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
sp = SamplingParams(max_tokens=64, temperature=0.0)

PROMPT = (
    "This skin lesion belongs to the group '{group}'. "
    "Examine the lesion morphology (papules, plaques, macules), "
    "color (red, violet, white, brown), scale/crust, border sharpness, "
    "and distribution pattern. Based on these visual features, "
    "what is the specific skin disease?"
)

def classify_disease(image: Image.Image, group: str) -> str:
    messages = [{"role": "user", "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": PROMPT.format(group=group)},
    ]}]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    # Supply one image entry per vision placeholder in the templated prompt (at least one).
    n = max(text.count("<|vision_start|>"), 1)
    out = llm.generate({"prompt": text, "multi_modal_data": {"image": [image] * n}}, sp)
    return out[0].outputs[0].text.strip()

img = Image.open("skin_lesion.jpg").convert("RGB")
print(classify_disease(img, group="inflammatory"))  # e.g. "atopic_dermatitis"
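The quoted throughput is measured at batch=4. vLLM's `LLM.generate` also accepts a list of prompt dictionaries, so several images can be submitted in one call. The sketch below only builds that request list and groups it into batches. `build_requests` and `chunked` are hypothetical helper names, and the placeholder strings stand in for the chat-templated `text` and the PIL images used in the snippet above.

```python
from typing import Any, Dict, Iterator, List, Sequence

def build_requests(prompts: Sequence[str], images: Sequence[Any]) -> List[Dict[str, Any]]:
    """Pair each chat-templated prompt with its image in vLLM's dict format."""
    if len(prompts) != len(images):
        raise ValueError("prompts and images must have the same length")
    return [
        {"prompt": p, "multi_modal_data": {"image": img}}
        for p, img in zip(prompts, images)
    ]

def chunked(items: Sequence[Any], size: int) -> Iterator[Sequence[Any]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Placeholders: in practice these are chat-templated strings and PIL images.
prompts = [f"prompt-{i}" for i in range(6)]
images = [f"image-{i}" for i in range(6)]

batches = list(chunked(build_requests(prompts, images), size=4))
# With a loaded engine, each batch could then be passed as: llm.generate(batch, sp)
```

Outputs come back in the same order as the submitted requests, so labels can be matched to images by index.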

Production: SGLang FP8 🚀 (maximum throughput, 9.11 img/s at batch=4)

import sglang as sgl
from transformers import AutoProcessor
from PIL import Image

model_id = "E27085921/HIKARI-Sirius-8B-SkinDx-RAG"

engine = sgl.Engine(
    model_path=model_id,
    dtype="bfloat16",
    quantization="fp8",
    context_length=2048,
    mem_fraction_static=0.88,
    trust_remote_code=True,
    disable_cuda_graph=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

PROMPT = (
    "This skin lesion belongs to the group '{group}'. "
    "Examine the lesion morphology (papules, plaques, macules), "
    "color (red, violet, white, brown), scale/crust, border sharpness, "
    "and distribution pattern. Based on these visual features, "
    "what is the specific skin disease?"
)

def classify_disease_sglang(image: Image.Image, group: str) -> str:
    messages = [{"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": PROMPT.format(group=group)},
    ]}]
    text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    out = engine.generate(
        prompt=text,
        image_data=image,
        sampling_params={"max_new_tokens": 64, "temperature": 0.0},
    )
    return (out["text"] if isinstance(out, dict) else out[0]["text"]).strip()

# engine.shutdown()  # call when done

Parse Disease Label (fuzzy matching)

from rapidfuzz import process as fuzz_process

DISEASES = [
    "acne_vulgaris", "atopic_dermatitis", "melanocytic_nevi", "psoriasis",
    "sccis", "seborrheic_dermatitis", "skin_tag", "tinea_versicolor",
    "urticaria", "photodermatoses",
]

def match_disease(raw: str) -> str:
    result, score, _ = fuzz_process.extractOne(raw.lower(), DISEASES)
    return result if score >= 50 else "unknown"

print(match_disease("The patient has atopic dermatitis"))  # → atopic_dermatitis
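If the raw output already contains a label in a slightly different surface form (spaces instead of underscores, mixed case), an exact check after normalization can run before any fuzzy scoring. The sketch below is an alternative that uses only the standard library (`re` and `difflib`) rather than rapidfuzz. `normalize`, `match_disease_strict`, and the 0.8 cutoff are illustrative choices, not values from this project.

```python
import difflib
import re

DISEASES = [
    "acne_vulgaris", "atopic_dermatitis", "melanocytic_nevi", "psoriasis",
    "sccis", "seborrheic_dermatitis", "skin_tag", "tinea_versicolor",
    "urticaria", "photodermatoses",
]

def normalize(raw: str) -> str:
    """Lowercase and collapse any non-alphanumeric run to one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", raw.lower()).strip("_")

def match_disease_strict(raw: str, cutoff: float = 0.8) -> str:
    text = normalize(raw)
    # 1) Exact label contained in the normalized output; prefer the longest hit.
    hits = [d for d in DISEASES if d in text]
    if hits:
        return max(hits, key=len)
    # 2) Otherwise compare the whole output against the labels (handles typos).
    close = difflib.get_close_matches(text, DISEASES, n=1, cutoff=cutoff)
    return close[0] if close else "unknown"

a = match_disease_strict("The patient has Atopic Dermatitis.")
b = match_disease_strict("psoriassis")       # misspelled label
c = match_disease_strict("I am not sure")    # no label present
```

A stricter cutoff trades recall for fewer spurious matches, which may matter when an "unknown" result is preferable to a wrong label.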

⚡ Speed Benchmark (RTX 5070 Ti, 16 GB VRAM; Stage 2, 64-token output)

| Engine | Batch 1 | Batch 4 | vs Unsloth bs=1 |
|---|---|---|---|
| Unsloth 4-bit | 1,096 ms/img | 500 ms/img | baseline |
| vLLM BnB-4bit | 480 ms/img | 179 ms/img | 6.1× faster |
| SGLang FP8 | 331 ms/img | 110 ms/img | ⚡ 10× faster |
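The per-image latencies in the table convert directly to throughput and speedup. The short sketch below redoes that arithmetic from the table values; the dictionary and function names are ad hoc. Small differences from the quoted 5.57 and 9.11 img/s are expected because the table latencies are rounded to whole milliseconds.

```python
# Batch-4 latencies (ms per image) and the Unsloth batch-1 baseline, from the table.
BASELINE_MS = 1096.0
BATCH4_MS = {"Unsloth 4-bit": 500.0, "vLLM BnB-4bit": 179.0, "SGLang FP8": 110.0}

def images_per_second(ms_per_image: float) -> float:
    """Convert a per-image latency in milliseconds to images per second."""
    return 1000.0 / ms_per_image

def speedup(ms_per_image: float, baseline_ms: float = BASELINE_MS) -> float:
    """Ratio of the baseline latency to the given latency."""
    return baseline_ms / ms_per_image

for engine, ms in BATCH4_MS.items():
    print(f"{engine}: {images_per_second(ms):.2f} img/s, {speedup(ms):.1f}x vs baseline")
```

Note that the speedup column compares each engine's batch-4 latency against the Unsloth batch-1 latency, so part of the gain comes from batching itself.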

🔌 LoRA Adapter Version

Prefer a lightweight adapter (~1.1 GB) over the 17 GB merged model?

from peft import PeftModel
from transformers import Qwen3VLForConditionalGeneration
import torch

base = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Thinking", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "E27085921/HIKARI-Sirius-8B-SkinDx-RAG-LoRA")

→ E27085921/HIKARI-Sirius-8B-SkinDx-RAG-LoRA


🌟 HIKARI Model Family

| Model | Task | Metric | Type |
|---|---|---|---|
| HIKARI-Subaru-8B-SkinGroup | 4-class group classifier (Stage 1) | 88.68% | Merged |
| HIKARI-Altair-8B-SkinDx | 10-class disease dx, baseline | 74.00% | Merged + LoRA |
| HIKARI-Deneb-8B-SkinDx-Cascade | 10-class disease dx, cascade FT | 79.80% | Merged + LoRA |
| ⭐ HIKARI-Sirius-8B-SkinDx-RAG (this model) | 10-class disease dx, RAG-in-Training | 85.86% | Merged + LoRA |
| HIKARI-Polaris-8B-SkinDx-Oracle | Oracle upper bound (research only) | 59.38%* | Merged |
| HIKARI-Rigel-8B-SkinCaption | Clinical caption, checkpoint init | BLEU-4: 9.82 | Merged + LoRA |
| ⭐ HIKARI-Vega-8B-SkinCaption-Fused | Clinical caption, merged init (best) | BLEU-4: 29.33 | Merged + LoRA |
| HIKARI-Antares-8B-SkinCaption-STS | Caption + STS ablation (research) | BLEU-4: 0.61 | Merged + LoRA |

* Polaris requires the ground-truth group at inference; it is for research comparison only.


📄 Citation

@misc{hikari2026,
  title  = {HIKARI: RAG-in-Training for Skin Disease Diagnosis
            with Cascaded Vision-Language Models},
  author = {Watin Promfiy and Pawitra Boonprasart},
  year   = {2026},
  institution = {King Mongkut's Institute of Technology Ladkrabang,
                 Department of Information Technology, Bangkok, Thailand}
}

Made with ❤️ at King Mongkut's Institute of Technology Ladkrabang (KMITL)
Department of Information Technology
