Clinical Trial Endpoint Classifier — 0.8B (Qwen3.5-0.8B LoRA)

A LoRA adapter fine-tuned from Qwen3.5-0.8B for extracting and classifying clinical trial endpoints from outcome text. Returns structured JSON with standardized endpoint names, measurement types, methods, timeframes, and related attributes.

See also: the 4B model achieves a lower final loss (0.485 vs. 0.617) and better accuracy on complex endpoints.

Output Format

```json
{
  "endpoints": [
    {
      "endpoint_name_standardized": "Objective Response Rate",
      "measurement_of": "tumor response",
      "measurement_type": "binary",
      "metric_type": "proportion",
      "timeframe": "Week 24",
      "measurement_method": "RECIST v1.1",
      "evaluation_criteria": "CR or PR",
      "unit": "%",
      "population": null,
      "is_composite": false,
      "components": []
    }
  ]
}
```

Field Definitions

| Field | Description | Examples |
| --- | --- | --- |
| `endpoint_name_standardized` | Standardized endpoint name | "Overall Survival", "HbA1c", "PASI 75 Response Rate" |
| `measurement_of` | What is being measured | "tumor response", "glycated hemoglobin", "psoriasis severity" |
| `measurement_type` | Type of measurement | continuous, binary, ordinal, time-to-event |
| `metric_type` | Statistical metric | mean, proportion, hazard ratio, change from baseline, count |
| `timeframe` | When the measurement occurs | "Week 12", "baseline to 6 months", "Up to 36 months" |
| `measurement_method` | How it is measured | "blood test", "RECIST v1.1", "12-lead ECG" |
| `evaluation_criteria` | Criteria for evaluation | "PASI 75", "CR or PR", "clinically meaningful" |
| `unit` | Unit of measurement | "%", "mg/dL", "mm", "count" |
| `population` | Specific population, if any | "adults aged 18-65", "in Italy only" |
| `is_composite` | Whether the endpoint is composite | true / false |
| `components` | Components, if composite | ["blood pressure", "heart rate", "temperature"] |

The model supports extracting multiple endpoints from a single input text.
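Since the model returns free-form JSON, it can be useful to check each parsed response against the schema above. The helper below is a hypothetical sketch (not part of the model's tooling); `validate_endpoints` and `REQUIRED_FIELDS` are names introduced here for illustration.

```python
# Required keys, taken from the field table above.
REQUIRED_FIELDS = {
    "endpoint_name_standardized", "measurement_of", "measurement_type",
    "metric_type", "timeframe", "measurement_method", "evaluation_criteria",
    "unit", "population", "is_composite", "components",
}

def validate_endpoints(payload: dict) -> list[str]:
    """Return a list of problems found; an empty list means the payload is well-formed."""
    endpoints = payload.get("endpoints")
    if not isinstance(endpoints, list) or not endpoints:
        return ["'endpoints' must be a non-empty list"]
    problems = []
    for i, ep in enumerate(endpoints):
        missing = REQUIRED_FIELDS - ep.keys()
        if missing:
            problems.append(f"endpoint {i} missing fields: {sorted(missing)}")
        if ep.get("is_composite") and not ep.get("components"):
            problems.append(f"endpoint {i} is composite but lists no components")
    return problems

sample = {"endpoints": [{
    "endpoint_name_standardized": "Objective Response Rate",
    "measurement_of": "tumor response", "measurement_type": "binary",
    "metric_type": "proportion", "timeframe": "Week 24",
    "measurement_method": "RECIST v1.1", "evaluation_criteria": "CR or PR",
    "unit": "%", "population": None, "is_composite": False, "components": [],
}]}
print(validate_endpoints(sample))  # → []
```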

Training Details

| Setting | Value |
| --- | --- |
| Base model | Qwen/Qwen3.5-0.8B |
| Method | LoRA (bf16, rank 16, alpha 16) |
| Training data | 1,948 samples (3,607 endpoints) |
| Epochs | 3 |
| Final loss | 0.617 |
| Training time | 34 min on RTX 4090 |
| Framework | Unsloth + TRL SFTTrainer |
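For reference, the adapter configuration implied by these settings can be sketched roughly as below. This is a hedged sketch, not the author's actual training script: only rank 16, alpha 16, and bf16 come from the table; the `target_modules` list and sequence length are assumptions.

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3.5-0.8B",
    max_seq_length=2048,  # assumption; matches the inference example below
    dtype=None,           # auto-detects bf16 on supported GPUs
)
# rank 16 / alpha 16 per the table above; target modules are an assumption
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
# The resulting PEFT model would then be passed to TRL's SFTTrainer.
```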

Usage

```python
import json

import torch
from unsloth import FastLanguageModel

# FastLanguageModel already returns the matching tokenizer;
# no separate AutoTokenizer load is needed.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Shubh-0789/endpoint-qwen3.5-0.8b-lora",
    max_seq_length=2048,
    load_in_4bit=False,
    dtype=torch.bfloat16,
)
FastLanguageModel.for_inference(model)
model.generation_config.pad_token_id = tokenizer.pad_token_id

clinical_text = "Overall Survival | Time from randomization to death from any cause | [Time Frame: Up to 5 years]"

messages = [
    {"role": "user", "content": f"Extract and classify the clinical trial endpoint from the following text. Return ONLY a JSON.\nText: {clinical_text}"}
]

inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True,
    return_tensors="pt", return_dict=True,
).to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.1, do_sample=True)

# Decode only the newly generated tokens (everything after the prompt).
result = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
endpoints = json.loads(result)
print(json.dumps(endpoints, indent=2))
```
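The `json.loads(result)` call assumes the model emits nothing but JSON. Small models occasionally wrap the object in extra text, so a defensive parse can help; `extract_json` below is a hypothetical helper introduced here, not part of the model's tooling.

```python
import json

def extract_json(raw: str) -> dict:
    """Pull the first balanced JSON object out of raw model text.

    Note: this simple brace counter does not account for braces inside
    JSON string values, which is usually fine for this model's output.
    """
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found")
    depth = 0
    for i, ch in enumerate(raw[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(raw[start:i + 1])
    raise ValueError("unbalanced JSON object")

print(extract_json('Sure! {"endpoints": []} Done.'))  # → {'endpoints': []}
```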

With PEFT/Transformers

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-0.8B", torch_dtype="bfloat16", device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "Shubh-0789/endpoint-qwen3.5-0.8b-lora")
tokenizer = AutoTokenizer.from_pretrained("Shubh-0789/endpoint-qwen3.5-0.8b-lora")
```

Examples

Single endpoint:

Input: "HbA1c change from baseline at Week 24"
Output: {"endpoints": [{"endpoint_name_standardized": "HbA1c", "measurement_of": "glycated hemoglobin", "measurement_type": "continuous", "metric_type": "change from baseline", "timeframe": "Week 24", ...}]}

Multiple endpoints:

Input: "Safety endpoints: AEs, laboratory safety (hematology, chemistry), vital signs, ECG"
Output: {"endpoints": [{"endpoint_name_standardized": "Adverse Events", ...}, {"endpoint_name_standardized": "Laboratory Safety", "is_composite": true, "components": ["hematology", "chemistry"], ...}, ...]}

Model Comparison

| Model | Parameters | Loss | Speed | Link |
| --- | --- | --- | --- | --- |
| 0.8B | 856M | 0.617 | Fast | This model |
| 4B | 4.6B | 0.485 | Moderate | 4B |

Limitations

  • Trained on English clinical trial text only
  • Complex composite endpoints may need verification
  • Inference requires a GPU with at least ~3 GB of VRAM
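The ~3 GB figure is consistent with a back-of-envelope estimate of the bf16 weight memory alone (a rough sketch; runtime overhead such as activations and the KV cache is ignored and accounts for the remaining headroom):

```python
# 856M parameters at 2 bytes each (bf16), converted to GiB.
params = 856e6
weight_gb = params * 2 / 1024**3
print(round(weight_gb, 2))  # → 1.59
```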