# Biomarker Extraction Model v2 (Qwen3.5-0.8B LoRA)

A LoRA adapter fine-tuned on Qwen3.5-0.8B to extract biomarker names from clinical trial text, trained over 2 rounds on 2,000 samples in total.
**Newer version available:** v2.1 (Best) continues training with Nemotron-labeled data and achieves loss 1.298.
## Model Lineage

```
Qwen/Qwen3.5-0.8B
└── v1 (1K samples) → loss 1.645
    └── v2 (+1K GPT-120B samples, bf16) → loss 1.473  ← THIS MODEL
        └── v2.1 (+1K Nemotron samples) → loss 1.298 (BEST)
```
## All Models in This Series

| Model | Training rounds | Total samples | Final loss | Output format | Link |
|---|---|---|---|---|---|
| v1 | 1 | 1,000 | 1.645 | comma-separated list | v1 |
| v1.1 | 2 | 2,000 | 1.051 | JSON | v1.1 |
| v2 | 2 | 2,000 | 1.473 | comma-separated list | This model |
| v2.1 | 3 | 3,000 | 0.951 | JSON | v2.1 (Best) |
## Usage

```python
from unsloth import FastLanguageModel
import torch

# Load the base model with the LoRA adapter in 16-bit (bf16) precision.
# FastLanguageModel.from_pretrained also returns the matching tokenizer,
# so no separate AutoTokenizer load is needed.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Shubh-0789/biomarker-qwen3.5-0.8b-lora-v2",
    max_seq_length=2048,
    load_in_4bit=False,
    load_in_16bit=True,
    dtype=torch.bfloat16,
)

# Switch to inference mode and set the pad token used during generation.
FastLanguageModel.for_inference(model)
model.generation_config.pad_token_id = tokenizer.pad_token_id

messages = [
    {"role": "user", "content": "Extract all biomarker names from the following clinical text.\nText: The patient's HbA1c was 7.2%, CRP levels elevated at 15mg/L."}
]

# Build the prompt with the model's chat template.
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_tensors="pt", return_dict=True,
).to(model.device)

# Generate with a low temperature for near-deterministic extraction.
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)
# Output: HbA1c, CRP
```
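Since v2 emits a comma-separated list (rather than the JSON used by v1.1 and v2.1), the raw string can be split into individual biomarker names before downstream use. A minimal sketch; `parse_biomarkers` is a hypothetical helper, not part of this repository:

```python
def parse_biomarkers(raw: str) -> list[str]:
    """Split the model's comma-separated output into clean biomarker names.

    Hypothetical post-processing helper (not shipped with this model);
    it simply splits on commas and drops surrounding whitespace and
    empty fragments.
    """
    return [name.strip() for name in raw.split(",") if name.strip()]

print(parse_biomarkers("HbA1c, CRP"))  # ['HbA1c', 'CRP']
```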
## Training Configuration

- **Method:** LoRA (bf16, per Unsloth Qwen3.5 guidelines)
- **LoRA rank / alpha:** 16 / 16
- **Learning rate:** 1e-4 (cosine schedule)
- **Batch size:** 8, gradient accumulation: 2
- **Epochs:** 3
- **Optimizer:** adamw_8bit
- **Hardware:** NVIDIA RTX A6000 (48 GB), ~8.5 min training time
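The per-device batch size and gradient accumulation steps above multiply into the effective batch size the optimizer sees each update. A sketch of the hyperparameters as a plain config dict; the field names follow the standard `transformers.TrainingArguments` conventions and are an assumption, since the card does not include the training script:

```python
# Hyperparameters from the card above, expressed as a plain dict.
# Key names are assumed (transformers.TrainingArguments style), not
# copied from the original training script.
config = {
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "learning_rate": 1e-4,
    "lr_scheduler_type": "cosine",
    "num_train_epochs": 3,
    "optim": "adamw_8bit",
    "bf16": True,
}

# Effective batch size per optimizer step = per-device batch * accumulation.
effective_batch = (
    config["per_device_train_batch_size"] * config["gradient_accumulation_steps"]
)
print(effective_batch)  # 16
```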