💊 Typhoon-Si-Med-Thinking-4B: Ranked-List Medical Reasoning Model

Typhoon-Si-Med-Thinking-4B is Southeast Asia’s first state-of-the-art, small, and efficient medical reasoning model, jointly developed by Typhoon (SCB 10X) and the Siriraj Informatics and Data Innovation Center (SiData+) at Siriraj Hospital, Mahidol University.

This 4-billion-parameter instruction-tuned model is trained with reinforcement learning to generate ranked lists of candidate answers, giving users several plausible options rather than a single output. Despite its lightweight footprint, it performs robustly across multiple answer formats: multiple choice, short answer, and ranked list.

Traditional multiple-choice question (MCQ) formats constrain models to a single “best” answer, which fails to reflect the uncertainty inherent in real clinical decision-making. In contrast, Typhoon-Si-Med-Thinking-4B adopts a ranked-list approach that mirrors how clinicians think: they evaluate several plausible possibilities before making a decision. This approach better captures diagnostic uncertainty, mitigates overreliance on a single, potentially incorrect output, and fosters safer, more collaborative reasoning between models and medical professionals.

The model achieves state-of-the-art performance on medical QA benchmarks, including MedQA, MedMCQA, MedXpertQA, and MMLU Pro (Health), surpassing larger systems such as Gemini 2.5 Pro on list-based and short-answer tasks. Its reinforcement-learning training is designed to balance correctness and diversity in the generated list, setting a new benchmark for efficient, domain-specific medical reasoning in Southeast Asia and beyond.

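For more details, see the paper, "Single Answer is Not Enough: On Generating Ranked Lists with Medical Reasoning Models" (https://arxiv.org/abs/2509.20866).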

Performance

[Figure: benchmark performance chart; image not reproduced here]

Model Description

  • Model type: A 4B instruction-tuned, decoder-only model based on the Qwen3 architecture.
  • Requirement: transformers 4.51.1 or newer.
  • Primary Language: English 🇬🇧
  • License: Apache 2.0
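
The version requirement above can be checked before loading the model. The following is a minimal sketch using only the standard library; the helper names (`parse_release`, `meets_minimum`, `check_installed_transformers`) are hypothetical and not part of transformers, and pre-release suffixes such as `.dev0` are simply ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum transformers version stated in the model description.
MIN_TRANSFORMERS = (4, 51, 1)


def parse_release(version_string: str) -> tuple:
    """Extract the leading numeric major.minor.patch part of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version_string!r}")
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0; suffixes like ".dev0" are ignored.
    return (int(major), int(minor), int(patch or 0))


def meets_minimum(version_string: str, minimum: tuple = MIN_TRANSFORMERS) -> bool:
    """Return True if the given version is at least the minimum."""
    return parse_release(version_string) >= minimum


def check_installed_transformers():
    """Return True or False for the installed transformers, or None if it is absent."""
    try:
        return meets_minimum(version("transformers"))
    except PackageNotFoundError:
        return None


print(check_installed_transformers())
```

A result of `False` suggests upgrading transformers before running the usage example.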

Usage

This is a reasoning-enabled clinical assistant model, designed to output both an intermediate reasoning process and a final answer.

Modes of Reasoning

The model supports two reasoning modes, which are enabled by prefixing the user query with special instruction strings:

  • TEXT_MODE: Produces a reasoning trace enclosed within <think></think> tags, followed by a single answer.

    Prepend the following prefix to the first user message:

    "You are a helpful and harmless expert clinical assistant. The assistant first thinks about the reasoning process and then provides the user with an accurate answer. The reasoning process is enclosed within <think></think> tags followed by an answer, i.e., <think>reasoning process here</think> answer here. After thinking, when you finally reach a conclusion, clearly state the answer.\n\n"
    
  • LIST_MODE: Produces a reasoning trace enclosed within <think></think> tags, followed by a ranked list of possible answers in descending order of likelihood.

    Prepend the following prefix to the first user message:

    "You are a helpful and harmless expert clinical assistant. The assistant first thinks about the reasoning process and then provides the user with an accurate answer. The reasoning process is enclosed within <think></think> tags followed by an answer, i.e., <think>reasoning process here</think> answer here. After thinking, when you finally reach a conclusion, clearly list all possible answers in order from most likely to least likely. Start with "# Final Answer" followed by numbered lines using the format `n. answer` for each answer. Each item MUST contain only the answer without any explanation or reasoning.\n\nExample:\n<think>...</think>\n\n# Final Answer\n1. xxx\n2. xxx\n\nNow the user asks you to solve a problem.\n\n"
    

To enable reasoning, you must prepend either the TEXT_MODE or the LIST_MODE prefix to the first user message before passing it to the model.
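
As an illustration of the prefixing step, here is a minimal sketch for a single-turn conversation. The helper `build_messages` is hypothetical (it is not part of the model or of transformers); `TEXT_MODE` is copied from the prefix given above, and `LIST_MODE` can be passed in the same way.

```python
# TEXT_MODE prefix as given in this model card.
TEXT_MODE = "You are a helpful and harmless expert clinical assistant. The assistant first thinks about the reasoning process and then provides the user with an accurate answer. The reasoning process is enclosed within <think></think> tags followed by an answer, i.e., <think>reasoning process here</think> answer here. After thinking, when you finally reach a conclusion, clearly state the answer.\n\n"


def build_messages(question: str, mode_prefix: str = TEXT_MODE) -> list:
    """Build a single-turn chat whose user message starts with the mode prefix.

    Hypothetical helper: the prefix goes at the very beginning of the first
    user message, and no separate system message is used.
    """
    return [{"role": "user", "content": mode_prefix + question.strip()}]


messages = build_messages(
    "A 23-year-old pregnant woman at 22 weeks gestation presents with "
    "burning upon urination. What is the best treatment for this patient?"
)
print(messages[0]["content"][:60])
```

The resulting `messages` list has the shape expected by `tokenizer.apply_chat_template`.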

Quirks

  • When reasoning is enabled, the model may sometimes output the special token <tool_call> at the beginning of its response. This does not affect the reasoning or answer itself, but should be removed in post-processing.
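
The post-processing described above can be combined with separating the reasoning trace from the final answer. This is a minimal sketch with a hypothetical helper, `clean_response`; it assumes the decoded text still contains the `</think>` tag, and it tolerates a missing opening `<think>` tag in case the chat template has already emitted it.

```python
def clean_response(decoded: str) -> tuple:
    """Return (reasoning, answer) from a decoded model response."""
    text = decoded.strip()

    # Drop the stray <tool_call> token if the response starts with it.
    if text.startswith("<tool_call>"):
        text = text[len("<tool_call>"):].lstrip()

    # Split at the closing tag; everything after it is the answer.
    reasoning, closing_tag, answer = text.partition("</think>")
    if not closing_tag:
        # No closing tag found: treat the whole text as the answer.
        return "", text

    reasoning = reasoning.strip()
    if reasoning.startswith("<think>"):
        reasoning = reasoning[len("<think>"):]
    return reasoning.strip(), answer.strip()


reasoning, answer = clean_response(
    "<tool_call><think>reasoning process here</think> answer here"
)
print(reasoning)
print(answer)
```

Keeping the two parts separate makes it easy to show users only the answer while logging the reasoning trace.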

Usage Example

This code snippet shows how to use the Typhoon-Si-Med-Thinking-4B model for text generation with the transformers library. It loads the model and tokenizer, builds a single user message with the LIST_MODE prefix prepended, generates a response, and removes a leading <tool_call> token if present.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

TEXT_MODE = "You are a helpful and harmless expert clinical assistant. The assistant first thinks about the reasoning process and then provides the user with an accurate answer. The reasoning process is enclosed within <think></think> tags followed by an answer, i.e., <think>reasoning process here</think> answer here. After thinking, when you finally reach a conclusion, clearly state the answer.\n\n"
LIST_MODE = """You are a helpful and harmless expert clinical assistant. The assistant first thinks about the reasoning process and then provides the user with an accurate answer. The reasoning process is enclosed within <think></think> tags followed by an answer, i.e., <think>reasoning process here</think> answer here. After thinking, when you finally reach a conclusion, clearly list all possible answers in order from most likely to least likely. Start with "# Final Answer" followed by numbered lines using the format `n. answer` for each answer. Each item MUST contain only the answer without any explanation or reasoning.

Example:
<think>...</think>

# Final Answer
1. xxx
2. xxx

Now the user asks you to solve a problem.\n\n"""

model_id = "scb10x/typhoon-si-med-thinking-4b-research-preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    # On older transformers releases this argument is named `torch_dtype`.
    dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": LIST_MODE + "A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air. Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. What is the best treatment for this patient?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=4096,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    repetition_penalty=1.05
)
response = outputs[0][input_ids.shape[-1]:]
decoded = tokenizer.decode(response, skip_special_tokens=True)

# Remove <tool_call> prefix if present
if decoded.startswith("<tool_call>"):
    decoded = decoded[len("<tool_call>"):].lstrip()

print(decoded)
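
In LIST_MODE, the decoded text can be turned into a Python list for downstream use. The sketch below uses a hypothetical helper, `parse_ranked_list`; it assumes the model follows the "# Final Answer" format with `n. answer` lines requested by the LIST_MODE prefix, and returns an empty list when that heading is missing.

```python
import re


def parse_ranked_list(decoded: str) -> list:
    """Extract the numbered answers that follow the last '# Final Answer' heading.

    Answers are returned in the order given, i.e. most likely first.
    """
    _, heading, tail = decoded.rpartition("# Final Answer")
    if not heading:
        return []

    answers = []
    for line in tail.splitlines():
        match = re.match(r"\s*(\d+)\.\s+(.*\S)", line)
        if match:
            answers.append(match.group(2))
        elif answers and line.strip():
            # Stop at the first non-list text after the list has started.
            break
    return answers


sample = "<think>...</think>\n\n# Final Answer\n1. xxx\n2. yyy\n"
print(parse_ranked_list(sample))
```

Because the output is free text, it is worth checking for an empty result and falling back to the raw string in that case.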

Intended Uses & Limitations

This model is an instruction-tuned reasoning model released as a research preview. It is not intended for medical use. While it incorporates some guardrails, it may still produce answers that are inaccurate, biased, or otherwise objectionable in response to user prompts. We recommend that developers assess these risks in the context of their use case.

Follow us

https://twitter.com/opentyphoon

Support

https://discord.gg/us5gAYmrxw

Citation

If you find this model useful, please cite it using:

@misc{taveekitworachai2025singleanswerenoughgenerating,
      title={Single Answer is Not Enough: On Generating Ranked Lists with Medical Reasoning Models}, 
      author={Pittawat Taveekitworachai and Natpatchara Pongjirapat and Krittaphas Chaisutyakorn and Piyalitt Ittichaiwong and Tossaporn Saengja and Kunat Pipatanakul},
      year={2025},
      eprint={2509.20866},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.20866}, 
}