HIKARI-Antares-8B-SkinCaption-STS
Healthcare-oriented Intelligent Knowledge Augmented Retrieval and Inference
Named after Antares, the red supergiant in Scorpius: intense and unstable, like the STS training collapse this model documents.
📦 Model Type: Merged Full Model
This is a fully merged model — the LoRA adapter weights have been merged directly into the base model weights.
✅ No adapter loading needed. Load directly with transformers, vLLM, or SGLang.
💾 Size: ~17 GB (4 safetensor shards)
🔌 Lightweight adapter version: E27085921/HIKARI-Antares-8B-SkinCaption-STS-LoRA (~1.2 GB)
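In the standard LoRA formulation, merging folds each low-rank update into its base matrix as $W' = W + \frac{\alpha}{r} BA$, so the merged release runs as a plain model with no adapter at load time. The toy sketch below assumes that standard formulation. The rank, scaling, matrices, and the `merge_lora` helper are illustrative only and do not reflect this model's actual LoRA configuration.

```python
# Toy illustration of folding a LoRA update into a base weight matrix.
# Assumes the standard parameterization W' = W + (alpha / r) * (B @ A);
# all shapes and values here are hypothetical, not taken from this model.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(W, A, B, alpha, r):
    """Return a new matrix W + (alpha / r) * (B @ A); W itself is not modified."""
    scale = alpha / r
    delta = matmul(B, A)  # (d_out x r) @ (r x d_in) -> d_out x d_in
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


W = [[1.0, 0.0], [0.0, 3.0]]  # base weight, d_out = d_in = 2
B = [[1.0], [2.0]]            # d_out x r, with rank r = 1
A = [[0.5, -0.5]]             # r x d_in
merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)
```

After merging, each adapted layer is a single dense matrix again, which is why the merged checkpoint is roughly the size of the base model while the adapter alone is much smaller.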
⚠️ Research Model — STS Training Collapse
This model documents a training collapse caused by Selective Token Supervision (STS). It is published as a research artifact so others can understand the failure mode and avoid it.
For production use, see:
⭐ E27085921/HIKARI-Vega-8B-SkinCaption-Fused: BLEU-4 29.33, same merged-init training without STS
Overview
HIKARI-Antares applies Selective Token Supervision (STS) to the merged-init caption training (Way 2). STS weights each target token's loss by $w_{\text{ans}} \times w_{\text{reason}}$ to emphasize diagnostic tokens, and adds IBR regularization, a $\beta$-weighted squared norm of the LoRA weights ($\beta \lVert \theta_{\text{LoRA}} \rVert^2$), to prevent overfitting.
On the SkinCAP dataset, the STS regularization proved too aggressive, suppressing gradient signal to near-zero and causing training collapse.
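The objective described above can be sketched as follows. This is a minimal illustration, not the HIKARI training code: the product weighting $w_{\text{ans}} \times w_{\text{reason}}$ and the $\beta$-weighted squared LoRA norm come from this card, while averaging over tokens, the toy numbers, and the function name `sts_objective` are assumptions. It shows one way a collapse like this can arise: down-weighting tokens shrinks the task term while the penalty keeps its full size.

```python
def sts_objective(token_nll, w_ans, w_reason, lora_params, beta):
    """Hypothetical STS-style objective: weighted token NLL plus an L2 penalty.

    token_nll   -- negative log-likelihood of each target token
    w_ans       -- per-token answer weight (assignment scheme assumed given)
    w_reason    -- per-token reasoning weight
    lora_params -- flat list of LoRA parameter values
    beta        -- strength of the IBR-style squared-norm penalty
    Returns (task_term, penalty).
    """
    weights = [a * r for a, r in zip(w_ans, w_reason)]
    # Mean over tokens (an assumption): each token's gradient is scaled by w_t / T,
    # so small weights shrink the task signal instead of redistributing it.
    task_term = sum(w * l for w, l in zip(weights, token_nll)) / len(token_nll)
    penalty = beta * sum(p * p for p in lora_params)
    return task_term, penalty


nll = [2.0, 1.0, 3.0, 2.0]
lora = [0.5, -0.5, 0.5, -0.5]

# Uniform weights: reduces to ordinary mean cross-entropy plus the penalty.
uniform = sts_objective(nll, [1.0] * 4, [1.0] * 4, lora, beta=0.1)
# Selective weights: the task term shrinks, the penalty does not.
selective = sts_objective(nll, [1.0, 0.1, 1.0, 0.1], [1.0, 1.0, 0.5, 0.5], lora, beta=0.1)

for name, (task, pen) in [("uniform", uniform), ("selective", selective)]:
    print(f"{name:9s} task={task:.3f} penalty={pen:.3f} penalty share={pen / (task + pen):.1%}")
```

With more aggressive weights or a larger $\beta$, the penalty share keeps growing, and the optimizer increasingly pulls the LoRA weights toward zero rather than toward better captions.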
| Property | Value |
|---|---|
| Task | Clinical caption generation + STS ablation (Stage 3) |
| Base model | Qwen/Qwen3-VL-8B-Thinking |
| Init strategy | Merged-Init (same as Vega) |
| STS | Selective Token Supervision + IBR regularization (β-weighted) |
| BLEU-4 | 0.61 (collapsed) |
| ROUGE-1 | 15.68 |
| Model type | Merged full model |
Full Stage 3 Ablation
| Experiment | Init | STS | BLEU-4 | ROUGE-1 | Result |
|---|---|---|---|---|---|
| Way 1 — HIKARI-Rigel | checkpoint | ✗ | 9.82 | 38.90 | Catastrophic forgetting |
| Way 2 — HIKARI-Vega | merged | ✗ | 29.33 | 53.55 | Best ✅ |
| Way 1 + STS | checkpoint | ✓ | 0.00 | 5.03 | Complete collapse ❌ |
| Way 2 + STS (this model) | merged | ✓ | 0.61 | 15.68 | Collapse ❌ |
Adding STS to the merged-init setup costs 28.72 BLEU-4 points relative to Vega (29.33 → 0.61), a 97.9% relative drop.
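The headline numbers can be recomputed directly from the Stage 3 ablation table. A small sketch; the dictionary layout is just for illustration, and the same loop also gives the ROUGE-1 drop for comparison.

```python
# Scores copied from the Stage 3 ablation table (Way 2 without vs. with STS).
vega = {"BLEU-4": 29.33, "ROUGE-1": 53.55}     # HIKARI-Vega: merged init, no STS
antares = {"BLEU-4": 0.61, "ROUGE-1": 15.68}   # this model: merged init + STS

drops = {}
for metric, baseline in vega.items():
    absolute = antares[metric] - baseline      # points lost by adding STS
    relative = 100 * absolute / baseline       # percent of the Vega score lost
    drops[metric] = (round(absolute, 2), round(relative, 1))
    print(f"{metric}: {absolute:+.2f} points ({relative:+.1f}% relative)")
```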
🔧 Quick Inference — transformers
```python
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch
from PIL import Image

model_id = "E27085921/HIKARI-Antares-8B-SkinCaption-STS"

# Load the processor (tokenizer + image preprocessor) and the merged model.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

image = Image.open("skin_lesion.jpg").convert("RGB")
PROMPT = (
    "Describe this skin lesion image in detail. Include information about its "
    "appearance, possible diagnosis, and recommended examinations."
)

# Single-turn chat containing one image and one text prompt.
messages = [{"role": "user", "content": [
    {"type": "image", "image": image},
    {"type": "text", "text": PROMPT},
]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Greedy decoding; temperature has no effect when do_sample=False, so it is omitted.
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens (strip the prompt).
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0].strip())
# Note: output quality is poor due to STS-induced training collapse
```
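The base model is a Qwen3-VL *Thinking* variant, so generations may begin with a reasoning segment before the caption. The helper below is a hedged sketch that splits the decoded string at a closing `</think>` tag. It assumes the Qwen3 thinking-output convention and that the tag survives `skip_special_tokens=True`, both of which you should verify for this checkpoint; `split_thinking` is a hypothetical name, not part of transformers.

```python
def split_thinking(decoded: str, end_tag: str = "</think>") -> tuple[str, str]:
    """Split decoded text into (reasoning, answer) at the last end_tag.

    Assumes reasoning precedes a closing "</think>" tag. If the tag is absent
    (for example, generation stopped at max_new_tokens mid-reasoning),
    everything is returned as the answer and the reasoning is empty.
    """
    head, sep, tail = decoded.rpartition(end_tag)
    if not sep:
        return "", decoded.strip()
    # Drop an opening <think> tag if one appears in the decoded text.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


reasoning, caption = split_thinking("<think>Red papule, check borders.</think> The image shows a red papule.")
print(caption)
```

Pass it the decoded string from the inference snippet instead of the literal example.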
🔌 LoRA Adapter Version
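```python
from peft import PeftModel
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
import torch

# Load the original base model, then attach the STS LoRA adapter on top of it.
base = Qwen3VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3-VL-8B-Thinking", torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "E27085921/HIKARI-Antares-8B-SkinCaption-STS-LoRA")
processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-8B-Thinking")

# Optional: fold the adapter into the base weights to get a plain merged model.
model = model.merge_and_unload()
```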
→ E27085921/HIKARI-Antares-8B-SkinCaption-STS-LoRA
📄 Citation
```bibtex
@misc{hikari2026,
  title       = {HIKARI: RAG-in-Training for Skin Disease Diagnosis
                 with Cascaded Vision-Language Models},
  author      = {Watin Promfiy and Pawitra Boonprasart},
  year        = {2026},
  institution = {King Mongkut's Institute of Technology Ladkrabang,
                 Department of Information Technology, Bangkok, Thailand}
}
```
Made with ❤️ at King Mongkut's Institute of Technology Ladkrabang (KMITL)