ClinSeek-35B-A3B
ClinSeek-35B-A3B is our open-source model for
ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical
Reasoning. We trained it by supervised
fine-tuning from Qwen/Qwen3.5-35B-A3B on ClinSeekAgent trajectories generated
by Claude Opus 4.6.
ClinSeekAgent studies a clinical reasoning setting where evidence is not handed to the model in a pre-curated prompt. Instead, an agent must actively retrieve patient-specific evidence from raw EHR tables, consult external medical knowledge when needed, and synthesize the acquired evidence into a final decision. ClinSeek-35B-A3B is trained to imitate this long-horizon evidence-seeking behavior in native tool-call format.
Release Information
| Item | Value |
|---|---|
| Model | ClinSeek-35B-A3B |
| Base model | Qwen/Qwen3.5-35B-A3B |
| Training method | Supervised fine-tuning |
| Teacher model | Claude Opus 4.6 |
| Training signal | ClinSeekAgent evidence-seeking trajectories |
| Primary target setting | Agentic EHR evidence seeking |
| Technical report | https://arxiv.org/abs/2605.20176 |
| Code | https://github.com/UCSC-VLAA/ClinSeekAgent |
| Benchmark metadata | https://huggingface.co/datasets/UCSC-VLAA/ClinSeek-Bench |
| Project page | https://ucsc-vlaa.github.io/ClinSeekAgent/ |
Training Data And Objective
ClinSeek-35B-A3B validates ClinSeekAgent as a training-time pipeline. Claude Opus 4.6 is used as the teacher model to generate ClinSeekAgent trajectories from the training split of the text-based benchmark. The student model is then fine-tuned with supervised learning on the resulting trajectories.
The trajectories are rendered in native tool-call format with
<tool_call> / <tool_response> turns, teaching the model how to search the
EHR rather than only imitate final answers.
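To make the turn format concrete, here is a minimal sketch of how a driver might pull `<tool_call>` blocks out of an assistant turn and wrap a result as a `<tool_response>` turn. It assumes each `<tool_call>` block carries a JSON object; the exact payload layout is defined by the model's chat template and the ClinSeekAgent repository, and the `run_sql` tool name below is hypothetical.

```python
import json
import re

# Matches the text between <tool_call> and </tool_call>, across newlines.
TOOL_CALL_PATTERN = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(assistant_text):
    """Return the payload of every <tool_call> block in an assistant turn.

    Assumption: payloads are JSON objects. Anything that fails to parse is
    kept verbatim under the "raw" key so the caller can inspect it.
    """
    calls = []
    for raw in TOOL_CALL_PATTERN.findall(assistant_text):
        try:
            calls.append(json.loads(raw))
        except json.JSONDecodeError:
            calls.append({"raw": raw})
    return calls


def wrap_tool_response(result):
    """Render a tool result as a <tool_response> turn body."""
    return f"<tool_response>\n{json.dumps(result)}\n</tool_response>"


# Hypothetical assistant turn; "run_sql" is an illustrative tool name.
example_turn = (
    "I need the latest labs first.\n"
    "<tool_call>\n"
    '{"name": "run_sql", "arguments": {"query": "SELECT 1"}}\n'
    "</tool_call>"
)
calls = extract_tool_calls(example_turn)
print(calls)
print(wrap_tool_response({"rows": [[1]]}))
```

In practice the chat template and the serving backend usually handle this rendering and parsing, so a hand-written parser like this is mainly useful for inspecting raw generations.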
Training configuration:
| Component | Configuration |
|---|---|
| Base model | Qwen3.5-35B-A3B |
| Training objective | SFT on ClinSeekAgent trajectories |
| Training / validation size | 7,204 / 147 examples |
| Maximum sequence length | 52,000 tokens |
| Training epochs | 3 |
| Global batch size | 32 |
| Micro batch size | 1 per GPU |
| Optimizer | Megatron optimizer with CPU offload |
| Learning rate | 2e-5 |
| Minimum learning rate | 2e-6 |
| Learning rate schedule | Cosine decay with 10 warmup steps |
| Weight decay | 0.1 |
| Gradient clipping | 1.0 |
| Precision | bfloat16 |
| Backend | Megatron + mbridge |
| Hardware | 8 H200 GPUs |
| Tensor / expert / pipeline parallelism | TP=2, EP=8, PP=1 |
| Random seed | 42 |
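As a rough illustration of the learning-rate settings in the table, the sketch below estimates the number of optimizer steps and evaluates a warmup-plus-cosine schedule. It assumes linear warmup, one optimizer step per global batch of 32 with the final partial batch of each epoch kept, and cosine decay spread over all remaining steps. The actual Megatron scheduler settings may differ on any of these points.

```python
import math

# Values taken from the training configuration table.
NUM_TRAIN_EXAMPLES = 7204
GLOBAL_BATCH_SIZE = 32
EPOCHS = 3
PEAK_LR = 2e-5
MIN_LR = 2e-6
WARMUP_STEPS = 10


def total_optimizer_steps(num_examples, global_batch, epochs):
    # Assumption: the last partial batch of each epoch still counts as a step.
    return math.ceil(num_examples / global_batch) * epochs


def lr_at(step, total_steps, peak=PEAK_LR, floor=MIN_LR, warmup=WARMUP_STEPS):
    """Learning rate at a 0-indexed optimizer step."""
    if step < warmup:
        # Assumption: linear warmup that reaches the peak on the last warmup step.
        return peak * (step + 1) / warmup
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup) / max(1, total_steps - warmup))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


total_steps = total_optimizer_steps(NUM_TRAIN_EXAMPLES, GLOBAL_BATCH_SIZE, EPOCHS)
for step in (0, WARMUP_STEPS, total_steps // 2, total_steps):
    print(step, f"{lr_at(step, total_steps):.2e}")
```

Under these assumptions the run has 226 steps per epoch and 678 steps in total, so the 10 warmup steps cover well under 2% of training.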
This release contains the model weights and tokenizer files. It does not redistribute protected clinical source data, patient-level databases, private trajectories, experiment logs, or raw MIMIC-derived records.
Evaluation
We evaluate ClinSeek-35B-A3B on the five-task AgentEHR-Bench setting. The model raises average F1 from 22.1 for the Qwen3.5-35B-A3B base model to 34.0, a gain of 11.9 points, and achieves the strongest performance among the evaluated open-source models. Four of the five tasks improve; Transfers is the only task that regresses (-1.4).
| Model | Diagnoses | Labs | Microbiology | Procedures | Transfers | Avg. |
|---|---|---|---|---|---|---|
| Qwen3.5-35B-A3B (base) | 36.6 | 17.7 | 16.2 | 21.9 | 18.1 | 22.1 |
| ClinSeek-35B-A3B | 55.4 | 38.5 | 27.6 | 31.7 | 16.7 | 34.0 |
| Delta | +18.8 | +20.8 | +11.4 | +9.8 | -1.4 | +11.9 |
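The Avg. and Delta entries can be re-derived from the per-task scores. The check below assumes that Avg. is the unweighted mean of the five task F1 scores, which is consistent with the reported numbers.

```python
TASKS = ["Diagnoses", "Labs", "Microbiology", "Procedures", "Transfers"]

# Per-task F1 scores copied from the table above.
base_scores = [36.6, 17.7, 16.2, 21.9, 18.1]
tuned_scores = [55.4, 38.5, 27.6, 31.7, 16.7]


def mean(values):
    return sum(values) / len(values)


# Round to one decimal place, matching the precision of the table.
base_avg = round(mean(base_scores), 1)
tuned_avg = round(mean(tuned_scores), 1)
deltas = {
    task: round(tuned - base, 1)
    for task, base, tuned in zip(TASKS, base_scores, tuned_scores)
}
avg_delta = round(mean(tuned_scores) - mean(base_scores), 1)

print(base_avg, tuned_avg, avg_delta)
print(deltas)
```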
Our analysis shows that the distilled model learns a different tool-use policy, not just a different final-answer prior. On the same 500 AgentEHR-Bench questions, its free-form SQL use increases from 649 calls in the base model to 3,932 calls after SFT, roughly a sixfold increase. This suggests that ClinSeekAgent trajectories teach the student to treat the EHR as a programmable database.
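To illustrate what a free-form SQL call looks like in contrast to a fixed-purpose lookup tool, here is a toy sketch using an in-memory SQLite database. The table name, columns, and rows are synthetic and invented for illustration; they are not the benchmark's schema, and `run_sql` is a hypothetical stand-in for whatever SQL tool the ClinSeekAgent EHR server exposes.

```python
import sqlite3


def build_toy_ehr():
    """Create an in-memory database with one synthetic lab table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lab_results ("
        "patient_id INTEGER, test_name TEXT, value REAL, charted_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO lab_results VALUES (?, ?, ?, ?)",
        [
            (1, "creatinine", 1.1, "2100-01-01"),
            (1, "creatinine", 2.4, "2100-01-03"),
            (1, "potassium", 4.0, "2100-01-03"),
            (2, "creatinine", 0.9, "2100-01-02"),
        ],
    )
    return conn


def run_sql(conn, query, params=()):
    """Hypothetical free-form SQL tool: run any query and return all rows."""
    return conn.execute(query, params).fetchall()


conn = build_toy_ehr()

# The agent composes the query itself instead of picking a canned lookup.
latest_creatinine = run_sql(
    conn,
    "SELECT test_name, value FROM lab_results "
    "WHERE patient_id = ? AND test_name = ? "
    "ORDER BY charted_at DESC LIMIT 1",
    (1, "creatinine"),
)
print(latest_creatinine)
```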
For full evaluation scripts and benchmark reconstruction instructions, see: https://github.com/UCSC-VLAA/ClinSeekAgent.
Usage
Use the checkpoint with a recent transformers release that supports
Qwen3.5-MoE models. For the evaluation setting used in this work, serve the
model with an OpenAI-compatible backend such as vLLM and run the ClinSeekAgent
evaluation drivers.
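For the serving path, a single request to an OpenAI-compatible endpoint can be sketched with the Python standard library alone. This assumes a server is already running locally (for example, started with `vllm serve "UCSC-VLAA/ClinSeek-35B-A3B"`) and listening on port 8000, which is vLLM's default; adjust `BASE_URL` for other setups. The helper names are illustrative and not part of the ClinSeekAgent repository.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: local server on vLLM's default port
MODEL_ID = "UCSC-VLAA/ClinSeek-35B-A3B"


def build_chat_request(question, base_url=BASE_URL, model=MODEL_ID, max_tokens=512):
    """Build a POST request for the /chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a clinical evidence-seeking assistant."},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, timeout=600):
    """Send one question to the server and return the assistant's text reply."""
    request = build_chat_request(question)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("Answer the clinical question using the available evidence."))` sends a single-turn request. This is only a connectivity check and does not exercise tool use.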
Basic loading example:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UCSC-VLAA/ClinSeek-35B-A3B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

messages = [
    {
        "role": "system",
        "content": "You are a clinical evidence-seeking assistant.",
    },
    {
        "role": "user",
        "content": "Answer the clinical question using the available evidence.",
    },
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=512)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
For tool-using evaluation, use the ClinSeekAgent repository rather than a single-turn text generation script. The repository provides the EHR MCP server, tool schemas, prompts, and scoring code expected by this model.
Citation
Please cite our ClinSeekAgent technical report if you use this model:
@article{clinseekagent2026,
  title = {ClinSeekAgent: Automating Multimodal Evidence Seeking for Agentic Clinical Reasoning},
  year  = {2026},
  url   = {https://arxiv.org/abs/2605.20176}
}
Also cite the upstream datasets, benchmarks, and base models used in your experiments, including MIMIC, AgentEHR-Bench, and Qwen3.5-35B-A3B where applicable.