# Biomarker Extraction Model v1 (Qwen3.5-0.8B LoRA)

A LoRA adapter fine-tuned from Qwen3.5-0.8B for extracting biomarker names from clinical trial text. This is the first version in the series, trained on 1,000 samples.

> **Note:** Newer versions are available — v1.1 and v2.1 (best) achieve significantly lower loss with more training data.
## Model Lineage
```
Qwen/Qwen3.5-0.8B
├── v1   (1K samples)   → loss 1.645  ← THIS MODEL
├── v1.1 (+1K Nemotron) → loss 1.440
├── v2   (+1K GPT-120B) → loss 1.473
└── v2.1 (+1K Nemotron) → loss 1.298  (BEST)
```
## All Models in This Series
| Model | Rounds | Samples | Loss | Output | Link |
|---|---|---|---|---|---|
| v1 | 1 | 1,000 | 1.645 | comma list | This model |
| v1.1 | 2 | 2,000 | 1.051 | JSON | v1.1 |
| v2 | 2 | 2,000 | 1.473 | comma list | v2 |
| v2.1 | 3 | 3,000 | 0.951 | JSON | v2.1 (Best) |
## Usage
```python
from unsloth import FastLanguageModel
import torch

# Load the LoRA adapter on top of the Qwen3.5-0.8B base in bf16.
# The tokenizer returned here is the model's own — no separate
# AutoTokenizer load is needed.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Shubh-0789/biomarker-qwen3.5-0.8b-lora",
    max_seq_length=2048,
    load_in_4bit=False,
    dtype=torch.bfloat16,
)

FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path
model.generation_config.pad_token_id = tokenizer.pad_token_id

messages = [
    {"role": "user", "content": "Extract all biomarker names from the following clinical text.\nText: The patient's HbA1c was 7.2%, CRP levels elevated at 15mg/L."}
]

inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_tensors="pt", return_dict=True,
).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt.
result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)
# Output: HbA1c, CRP
```
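Since v1 returns a flat comma-separated string (later versions emit JSON), a small post-processing helper can normalize the raw output into a Python list. This helper is an illustration, not part of the model or its card:

```python
def parse_biomarkers(raw: str) -> list[str]:
    """Split the model's comma-separated output into a clean list of names."""
    # Strip surrounding whitespace and drop empty entries
    # caused by trailing or doubled commas.
    return [name.strip() for name in raw.split(",") if name.strip()]

print(parse_biomarkers("HbA1c, CRP"))  # ['HbA1c', 'CRP']
```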
## Training Configuration

- LoRA rank: 8, alpha: 16
- Learning rate: 2e-4
- Batch size: 8, gradient accumulation steps: 2
- Epochs: 3
- Quantization: 4-bit during training (note: later versions use bf16 per Unsloth guidelines)
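The hyperparameters above can be sketched as an Unsloth + TRL training setup. This is a hypothetical reconstruction under stated assumptions, not the actual training script: the `dataset` variable is a placeholder for the 1,000 formatted biomarker-extraction samples, and only the parameters listed in this card are pinned.

```python
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

# Load the base model in 4-bit, as used for v1 training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3.5-0.8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters with the card's rank/alpha.
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    lora_alpha=16,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,  # placeholder: 1,000 biomarker-extraction samples
    args=SFTConfig(
        per_device_train_batch_size=8,
        gradient_accumulation_steps=2,
        learning_rate=2e-4,
        num_train_epochs=3,
    ),
)
trainer.train()
```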