HIKARI — Healthcare-oriented Intelligent Knowledge Augmented Retrieval and Inference

HIKARI-Antares-8B-SkinCaption-STS

Named after Antares, a red supergiant in Scorpius: intense and unstable, like the STS training collapse this model documents.


📦 Model Type: Merged Full Model

This is a fully merged model — the LoRA adapter weights have been merged directly into the base model weights.

No adapter loading needed. Load directly with transformers, vLLM, or SGLang.

💾 Size: ~17 GB (4 safetensor shards)

🔌 Lightweight adapter version: E27085921/HIKARI-Antares-8B-SkinCaption-STS-LoRA (~1.2 GB)


⚠️ Research Model — STS Training Collapse

This model documents a training collapse caused by Selective Token Supervision (STS). It is published as a research artifact so others can understand the failure mode and avoid it.

For production use, see:

⭐ E27085921/HIKARI-Vega-8B-SkinCaption-Fused (BLEU-4: 29.33), the same merged-init training without STS


Overview

HIKARI-Antares applies Selective Token Supervision (STS) to the merged-init caption training (Way 2). STS assigns per-token loss weights (w_ans × w_reason) to emphasize diagnostic tokens, combined with IBR regularization (β × ||LoRA||²) to prevent overfitting.

On the SkinCAP dataset, the STS regularization proved too aggressive, suppressing the gradient signal to near zero and collapsing training.
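A minimal sketch of the loss shape described above, in pure Python with scalar per-token losses. The weight values, `beta` default, and the helper name `sts_loss` are illustrative assumptions, not the actual training code:

```python
# Hypothetical sketch of an STS-style objective: per-token loss weights
# (w_ans * w_reason) emphasize diagnostic tokens, plus an IBR-style
# beta * ||LoRA||^2 regularizer. All constants here are illustrative.

def sts_loss(token_nll, is_answer, is_reason, lora_weights,
             beta=0.01, w_ans=2.0, w_reason=1.5):
    """Weighted mean of per-token NLLs plus beta * ||LoRA||^2."""
    weighted, total_w = 0.0, 0.0
    for nll, ans, rsn in zip(token_nll, is_answer, is_reason):
        w = (w_ans if ans else 1.0) * (w_reason if rsn else 1.0)
        weighted += w * nll
        total_w += w
    ibr = beta * sum(v * v for v in lora_weights)  # L2 penalty on adapter weights
    return weighted / total_w + ibr

# Toy example: 3 tokens, the first flagged as both answer and reasoning.
loss = sts_loss([1.0, 2.0, 3.0], [True, False, False],
                [True, True, False], [0.1, -0.2])
print(round(loss, 4))
```

Because the objective is normalized by the summed weights, aggressive weighting concentrates the effective loss on a small subset of tokens, one plausible route to the near-zero gradient signal reported above.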

| Property | Value |
|---|---|
| Task | Clinical caption generation + STS ablation (Stage 3) |
| Base model | Qwen/Qwen3-VL-8B-Thinking |
| Init strategy | Merged-Init (same as Vega) |
| STS | Selective Token Supervision + IBR regularization (β-weighted) |
| BLEU-4 | 0.61 (collapsed) |
| ROUGE-1 | 15.68 |
| Model type | Merged full model |

Full Stage 3 Ablation

| Experiment | Init | STS | BLEU-4 | ROUGE-1 | Result |
|---|---|---|---|---|---|
| Way 1 (HIKARI-Rigel) | checkpoint | ✗ | 9.82 | 38.90 | Catastrophic forgetting |
| Way 2 (HIKARI-Vega) | merged | ✗ | 29.33 | 53.55 | Best ✅ |
| Way 1 + STS | checkpoint | ✓ | 0.00 | 5.03 | Complete collapse ❌ |
| Way 2 + STS (this model) | merged | ✓ | 0.61 | 15.68 | Collapse ❌ |

STS cost 28.72 BLEU-4 points versus Vega, a 97.9% relative drop.
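The headline gap follows directly from the ablation table:

```python
# Reproduce the reported BLEU-4 gap between Vega (no STS) and Antares (STS).
vega_bleu4 = 29.33     # Way 2, merged-init, no STS
antares_bleu4 = 0.61   # Way 2 + STS (this model)

delta = round(vega_bleu4 - antares_bleu4, 2)       # absolute drop in BLEU-4
relative = round(100 * delta / vega_bleu4, 1)      # relative drop, percent

print(delta, relative)  # 28.72 97.9
```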


🔧 Quick Inference — transformers

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

model_id = "E27085921/HIKARI-Antares-8B-SkinCaption-STS"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

image = Image.open("skin_lesion.jpg").convert("RGB")

PROMPT = (
    "Describe this skin lesion image in detail. Include information about its "
    "appearance, possible diagnosis, and recommended examinations."
)

messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": PROMPT},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    # Greedy decoding; temperature is ignored when do_sample=False, so it is omitted.
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)

print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0].strip())
# Note: output quality is poor due to STS-induced training collapse

🔌 LoRA Adapter Version

from peft import PeftModel
from transformers import Qwen3VLForConditionalGeneration
import torch

base = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Thinking", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "E27085921/HIKARI-Antares-8B-SkinCaption-STS-LoRA")



📄 Citation

@misc{hikari2026,
  title  = {HIKARI: RAG-in-Training for Skin Disease Diagnosis
            with Cascaded Vision-Language Models},
  author = {Watin Promfiy and Pawitra Boonprasart},
  year   = {2026},
  note   = {King Mongkut's Institute of Technology Ladkrabang,
            Department of Information Technology, Bangkok, Thailand}
}

Made with ❤️ at King Mongkut's Institute of Technology Ladkrabang (KMITL)
