JoyMed-32B-v1.0
JoyMed: A Leading Medical Foundation Model with Adaptive Reasoning
News
- [2026-04-02]: We are pleased to announce the release of JoyMed-32B-v1.0. Welcome to use and explore!
- [2026-03-27]: We have open-sourced the evaluation script for the related benchmark; please refer to evaluation/Readme.md.
- [2026-03-26]: We release MedDocBench, a self-collected medical benchmark dataset, to facilitate research and evaluation in medical multimodal learning.
- [2026-03-26]: We have released JoyMed-8B-v1.0 for research and application. Welcome to use and explore!
- [2026-03-26]: We propose JoyMed, a medical foundation model with adaptive reasoning that balances reasoning accuracy and efficiency, achieves SOTA across multiple benchmarks, and advances the translation of medical multimodal large models to clinical applications.
Overview
Reasoning capabilities are foundational to medical multimodal large models (MMLMs): they enable trustworthy diagnosis, interpretable decision-making, and effective management of complex clinical cases. However, mainstream MMLMs either lack explicit reasoning capabilities or apply a fixed, mandatory reasoning paradigm to every input regardless of difficulty. This introduces redundant chain-of-thought output, wastes computational resources, and degrades performance on purely perceptual tasks through unnecessary reasoning overhead. The core bottleneck is the trade-off between reasoning accuracy and computational efficiency: thorough reasoning ensures rigor on complex tasks but imposes redundant cost on simple ones, whereas efficiency-oriented direct output lacks the reasoning completeness that complex clinical decision-making requires.

To address these challenges, we propose JoyMed, a medical foundation model with adaptive reasoning. Building on the direct-output and reasoning-augmented output paradigms, we introduce an adaptive reasoning mechanism that answers trivially simple tasks directly to optimize efficiency and generates stepwise reasoning traces for complex tasks to preserve accuracy and interpretability. Experimental results show that JoyMed achieves state-of-the-art performance across multiple benchmarks while balancing the core clinical requirements of comprehensive accuracy and efficient result acquisition, an exploratory step toward translating MMLMs from laboratory research to practical clinical applications.
Key Features
- Superior Performance: Our model achieves leading performance across multiple core medical benchmarks, encompassing medical image understanding, text-based question answering, medical document comprehension, and medical report generation. This performance stems from a two-stage training strategy. First, fine-grained vision-language alignment training enhances the model's perception of key regions such as lesions and anatomical structures. Subsequently, reinforcement on complex tasks such as report generation and case analysis establishes precise associations between visual regions and textual descriptions, supporting high-level medical visual understanding and question answering.
- Adaptive Reasoning: To balance computational efficiency against deep reasoning, we propose an adaptive reasoning mechanism. Its core is a difficulty-tiered dataset with corresponding training strategies that mitigate potential mode collapse during the model's autonomous reasoning and enable it to assess the intrinsic complexity of a problem independently. The model operates in three modes: direct output, chain-of-thought reasoning, and adaptive thinking. This design lets the model allocate computational resources according to task demands while maintaining high accuracy and interpretability.
Model Zoo
JoyMed comes in two variants with different parameter configurations:
| Model | Parameters | Hugging Face |
|---|---|---|
| JoyMed-8B-v1.0 | 8B | JoyMed-8B-v1.0 |
| JoyMed-32B-v1.0 | 32B | JoyMed-32B-v1.0 |
Performance
Medical Textual Question Answering Benchmarks
The best result on each benchmark and the best average accuracy are highlighted in bold; underlined scores indicate the second best. MedXQA and SGPQA denote the MedXpertQA-Text and SuperGPQA-medical benchmarks, respectively.
| Model | PubMedQA | MedMCQA | MedXQA | CMMLU | SGPQA | MedQA (USMLE) | MedQA (MCMLE) | Medbullets (op-4) | Medbullets (op-5) | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|
| Proprietary Models | ||||||||||
| GPT 4.1 | 76.00 | 87.98 | 30.82 | 81.02 | 50.60 | 77.07 | 81.73 | 78.90 | 73.38 | 70.83 |
| GPT 5 | 78.00 | 62.99 | 40.75 | 82.93 | 49.54 | 76.96 | 74.00 | 88.93 | 87.30 | 71.27 |
| Doubao Seed 1.6 | 76.00 | 75.06 | 30.67 | 91.67 | 55.19 | 93.48 | 94.02 | 82.79 | 76.62 | 75.06 |
| Open-Source Models (<10B) | ||||||||||
| MedGemma 4B | 73.00 | 52.26 | 13.10 | 43.96 | 21.52 | 55.54 | 41.10 | 48.05 | 42.53 | 43.45 |
| Qwen3-VL 8B | 73.20 | 60.05 | 14.98 | 79.07 | 35.68 | 65.67 | 85.61 | 55.84 | 48.70 | 57.64 |
| HuatuoGPT-V 7B | 73.60 | 51.95 | 10.33 | 71.12 | 22.11 | 52.95 | 73.09 | 43.51 | 37.66 | 48.48 |
| Lingshu 7B | 75.40 | 56.13 | 16.45 | 69.02 | 27.51 | 63.39 | 75.98 | 62.66 | 52.92 | 55.50 |
| Citrus-V 8B | 74.80 | 55.10 | 16.90 | 71.19 | 29.47 | 64.89 | 76.94 | 59.09 | 54.22 | 55.84 |
| Hulu-Med 7B | 77.20 | 67.51 | 18.53 | 71.72 | 31.10 | 73.45 | 78.93 | 64.94 | 57.47 | 60.09 |
| JoyMed 8B | 78.20 | 65.36 | 23.67 | 82.75 | 37.10 | 82.64 | 92.06 | 73.05 | 68.18 | 67.00 |
| JoyMed 8B auto | 79.40 | 66.58 | 23.55 | 83.05 | 38.04 | 84.84 | 91.42 | 74.35 | 70.46 | 67.96 |
| JoyMed 8B thinking | 79.00 | 66.89 | 24.37 | 83.35 | 39.78 | 85.23 | 91.77 | 73.38 | 70.13 | 68.21 |
| Open-Source Models (>10B) | ||||||||||
| MedGemma 27B | 79.00 | 63.23 | 22.01 | 60.24 | 33.18 | 81.15 | 64.89 | 67.86 | 65.58 | 59.68 |
| Qwen3-VL 32B | 72.00 | 69.57 | 18.65 | 86.87 | 46.24 | 77.77 | 89.73 | 63.96 | 53.90 | 64.30 |
| HealthGPT 14B | 69.40 | 63.33 | 12.45 | 55.36 | 25.59 | 66.93 | 52.83 | 53.57 | 50.00 | 49.94 |
| HealthGPT 32B | 74.20 | 64.04 | 13.84 | 69.47 | 35.43 | 68.89 | 68.86 | 50.65 | 46.43 | 54.65 |
| HuatuoGPT-V 34B | 71.00 | 55.08 | 12.20 | 77.64 | 28.06 | 58.52 | 76.09 | 44.81 | 39.29 | 51.41 |
| Lingshu 32B | 78.20 | 65.05 | 22.86 | 82.37 | 40.80 | 74.94 | 86.98 | 68.51 | 63.31 | 64.78 |
| Citrus-V 33B | 78.40 | 65.62 | 22.20 | 83.27 | 41.63 | 80.28 | 87.65 | 67.53 | 66.23 | 65.87 |
| Hulu-Med 14B | 79.80 | 70.28 | 23.67 | 75.62 | 37.75 | 78.48 | 80.88 | 70.45 | 67.86 | 64.98 |
| Hulu-Med 32B | 80.20 | 72.56 | 23.92 | 76.07 | 41.71 | 80.13 | 84.30 | 71.75 | 68.18 | 66.54 |
| JoyMed 32B | 80.20 | 72.72 | 32.65 | 88.22 | 48.93 | 87.43 | 93.90 | 80.52 | 76.30 | 73.43 |
| JoyMed 32B auto | 80.00 | 72.96 | 31.35 | 88.30 | 48.93 | 90.34 | 94.31 | 81.82 | 78.57 | 74.06 |
| JoyMed 32B thinking | 78.60 | 73.39 | 31.31 | 89.27 | 51.25 | 89.16 | 94.31 | 80.84 | 76.62 | 73.86 |
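The Avg. column appears to be the unweighted mean of the nine benchmark scores in each row. A quick check against the JoyMed 8B row, using values copied from the table above (the variable names are illustrative only):

```python
# Scores copied from the "JoyMed 8B" row of the table above, in column order.
joymed_8b = {
    "PubMedQA": 78.20,
    "MedMCQA": 65.36,
    "MedXQA": 23.67,
    "CMMLU": 82.75,
    "SGPQA": 37.10,
    "MedQA (USMLE)": 82.64,
    "MedQA (MCMLE)": 92.06,
    "Medbullets (op-4)": 73.05,
    "Medbullets (op-5)": 68.18,
}

# Unweighted mean: every benchmark counts equally, regardless of its size.
avg = sum(joymed_8b.values()) / len(joymed_8b)
print(f"{avg:.2f}")  # agrees with the 67.00 reported in the Avg. column
```

Because the mean is unweighted, a small benchmark such as Medbullets influences the average as much as a large one such as MedMCQA.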
Medical Visual Question Answering Benchmarks
The best result on each benchmark and the best average accuracy are highlighted in bold; underlined scores indicate the second best. MedXQA and GMAI-MMB denote the MedXpertQA-mm and GMAI-MMBench-test benchmarks, respectively.
| Model | VQA-RAD | MedXQA | SLAKE | PATH-VQA | PMC-VQA | OmniMedVQA | GMAI-MMB | Avg. |
|---|---|---|---|---|---|---|---|---|
| Proprietary Models | ||||||||
| GPT 4.1 | 62.53 | 43.35 | 72.54 | 54.97 | 38.76 | 55.14 | 58.52 | 55.12 |
| GPT 5 | 68.37 | 51.48 | 65.82 | 31.74 | 36.10 | 38.44 | 56.18 | 49.73 |
| Doubao Seed 1.6 | 33.49 | 45.75 | 67.28 | 47.58 | 49.94 | 61.68 | 48.50 | 50.60 |
| Open-Source Models (<10B) | ||||||||
| MedGemma 4B | 72.06 | 22.05 | 78.32 | 48.64 | 48.02 | 70.04 | 45.59 | 54.96 |
| Qwen3-VL 8B | 63.41 | 25.00 | 72.11 | 43.65 | 54.01 | 76.90 | 54.31 | 55.63 |
| HuatuoGPT-V 7B | 67.85 | 22.30 | 69.39 | 44.29 | 53.84 | 75.14 | 51.56 | 54.91 |
| Lingshu 7B | 68.74 | 26.90 | 82.90 | 60.23 | 55.77 | 82.41 | 54.02 | 61.57 |
| Citrus-V 8B | 64.30 | 25.10 | 84.91 | 62.00 | 55.64 | 72.69 | 45.43 | 57.45 |
| Hulu-Med 7B | 74.50 | 27.70 | 82.66 | 62.57 | 66.95 | 83.70 | 54.28 | 64.62 |
| JoyMed 8B | 75.83 | 32.60 | 86.53 | 74.16 | 57.19 | 82.36 | 59.85 | 66.93 |
| JoyMed 8B auto | 76.50 | 33.25 | 87.97 | 75.06 | 58.34 | 81.47 | 60.37 | 67.56 |
| JoyMed 8B thinking | 79.16 | 33.20 | 86.82 | 75.34 | 58.52 | 81.43 | 60.35 | 67.83 |
| Open-Source Models (>10B) | ||||||||
| MedGemma 27B | 63.86 | 33.10 | 76.17 | 47.60 | 45.35 | 59.78 | 40.59 | 52.35 |
| Qwen3-VL 32B | 68.96 | 30.15 | 77.41 | 49.78 | 57.28 | 77.90 | 55.86 | 59.62 |
| Lingshu 32B | 75.39 | 31.00 | 87.68 | 64.76 | 57.23 | 82.95 | 55.32 | 63.21 |
| HealthGPT 14B | 64.08 | 24.55 | 67.43 | 58.67 | 56.90 | 76.45 | 45.67 | 56.25 |
| HealthGPT 32B | 64.75 | 26.40 | 70.58 | 62.93 | 54.93 | 73.09 | 46.61 | 57.04 |
| HuatuoGPT-V 34B | 63.64 | 22.65 | 73.02 | 44.92 | 56.79 | 73.93 | 54.29 | 55.61 |
| Citrus-V 33B | 77.83 | 29.15 | 88.40 | 63.89 | 59.74 | 77.02 | 53.50 | 64.22 |
| Hulu-Med 14B | 74.70 | 29.65 | 84.24 | 64.55 | 68.80 | 85.22 | 57.42 | 66.37 |
| Hulu-Med 32B | 79.60 | 33.40 | 85.86 | 66.44 | 69.52 | 84.41 | 59.60 | 68.40 |
| JoyMed 32B | 88.69 | 41.25 | 92.93 | 91.29 | 61.46 | 84.75 | 61.02 | 74.48 |
| JoyMed 32B auto | 89.14 | 42.45 | 94.70 | 92.19 | 61.17 | 84.22 | 61.21 | 75.01 |
| JoyMed 32B thinking | 87.81 | 41.00 | 93.27 | 89.49 | 61.32 | 81.31 | 60.62 | 73.54 |
Medical Document Understanding Benchmarks
The best result on each benchmark and the best average accuracy are highlighted in bold; underlined scores indicate the second best.
| Model | Laboratory Test Report: abnormalityQA | Laboratory Test Report: fullparsing | Laboratory Test Report: simpleQA | GMD: Simple QA | GMD: Complex QA | Avg. |
|---|---|---|---|---|---|---|
| Proprietary Models | ||||||
| GPT 4.1 | 45.39 | 73.97 | 66.00 | 45.60 | 55.85 | 57.36 |
| GPT 5 | 71.87 | 71.41 | 78.75 | 59.35 | 57.04 | 67.68 |
| Doubao Seed 1.6 | 82.16 | 81.09 | 85.00 | 73.60 | 79.25 | 80.22 |
| Open-Source Models (<10B) | ||||||
| MedGemma 4B | 13.31 | 36.39 | 17.75 | 18.10 | 20.05 | 21.12 |
| Qwen3-VL 8B | 48.17 | 79.71 | 84.50 | 78.72 | 79.05 | 74.03 |
| HuatuoGPT-V 7B | 7.54 | 32.14 | 9.50 | 17.40 | 9.60 | 15.24 |
| Lingshu 7B | 29.50 | 62.70 | 70.25 | 60.47 | 53.70 | 55.32 |
| Citrus-V 8B | 91.16 | 92.57 | 92.57 | 83.22 | 89.45 | 90.18 |
| Hulu-Med 7B | 11.35 | 43.07 | 19.75 | 19.30 | 15.72 | 21.84 |
| JoyMed 8B | 88.99 | 93.39 | 92.00 | 83.47 | 86.72 | 88.91 |
| JoyMed 8B auto | 90.80 | 88.87 | 94.00 | 85.20 | 85.67 | 88.91 |
| JoyMed 8B thinking | 90.80 | 88.87 | 94.00 | 85.10 | 85.47 | 88.85 |
| Open-Source Models (>10B) | ||||||
| MedGemma 27B | 10.89 | 35.15 | 28.25 | 19.50 | 14.55 | 21.67 |
| Qwen3-VL 32B | 50.86 | 83.85 | 94.50 | 81.00 | 84.90 | 79.02 |
| Lingshu 32B | 35.34 | 71.19 | 74.00 | 62.00 | 58.38 | 60.18 |
| HealthGPT 14B | 5.40 | 39.37 | 13.00 | 20.00 | 9.20 | 19.26 |
| HealthGPT 32B | 6.12 | 33.91 | 21.75 | 21.10 | 9.97 | 20.59 |
| HuatuoGPT-V 34B | 7.81 | 32.78 | 10.75 | 19.00 | 8.00 | 17.11 |
| Citrus-V 33B | 92.45 | 92.62 | 97.75 | 84.30 | 83.78 | 90.18 |
| Hulu-Med 14B | 7.59 | 45.02 | 26.75 | 22.40 | 15.40 | 26.45 |
| Hulu-Med 32B | 11.45 | 41.32 | 31.50 | 30.30 | 18.90 | 28.09 |
| JoyMed 32B | 90.12 | 93.68 | 91.50 | 84.30 | 85.96 | 91.77 |
| JoyMed 32B auto | 94.65 | 94.67 | 94.00 | 87.50 | 85.85 | 94.44 |
| JoyMed 32B thinking | 94.58 | 94.46 | 94.50 | 87.40 | 81.40 | 94.51 |
Medical Image Report Generation Benchmarks
The best result on each metric is highlighted in bold; underlined scores indicate the second best.
| Model | CheXpert Plus ROUGE-L | CheXpert Plus RaTE | IU XRAY ROUGE-L | IU XRAY RaTE |
|---|---|---|---|---|
| Proprietary Models | ||||
| GPT 4.1 | 24.50 | 45.50 | 32.63 | 50.91 |
| GPT 5 | 24.48 | 51.26 | 31.72 | 56.64 |
| Doubao Seed 1.6 | 19.27 | 45.49 | 22.67 | 53.76 |
| Open-Source Models (<10B) | ||||
| MedGemma 4B | 26.01 | 51.23 | 39.51 | 61.99 |
| Qwen3-VL 8B | 21.64 | 46.51 | 25.22 | 52.88 |
| HuatuoGPT-V 7B | 21.40 | 46.58 | 29.96 | 54.91 |
| Lingshu 7B | 26.50 | 45.40 | 44.52 | 60.30 |
| Citrus-V 8B | 28.94 | 51.07 | 24.78 | 56.25 |
| Hulu-Med 7B | 28.94 | 51.07 | 36.15 | 63.50 |
| JoyMed 8B | 32.54 | 55.68 | 42.85 | 64.90 |
| JoyMed 8B auto | 31.68 | 55.73 | 44.94 | 65.96 |
| JoyMed 8B thinking | 31.99 | 55.89 | 44.40 | 65.70 |
| Open-Source Models (>10B) | ||||
| MedGemma 27B | 17.65 | 48.73 | 32.59 | 58.70 |
| Qwen3-VL 32B | 19.28 | 47.60 | 25.96 | 59.12 |
| Lingshu 32B | 25.29 | 46.18 | 45.06 | 65.22 |
| HealthGPT 14B | 21.29 | 47.82 | 23.89 | 52.33 |
| HealthGPT 32B | 12.50 | 45.15 | 12.74 | 45.75 |
| HuatuoGPT-V 34B | 23.97 | 45.51 | 29.12 | 57.87 |
| Citrus-V 33B | 29.58 | 52.45 | 45.66 | 64.74 |
| Hulu-Med 14B | 23.08 | 50.25 | 36.15 | 63.50 |
| Hulu-Med 32B | 29.20 | 52.87 | 47.02 | 65.43 |
| JoyMed 32B | 30.79 | 54.33 | 42.19 | 63.85 |
| JoyMed 32B auto | 28.59 | 54.28 | 40.63 | 63.14 |
| JoyMed 32B thinking | 30.04 | 53.60 | 44.05 | 66.19 |
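For readers unfamiliar with ROUGE-L, the following is a minimal sketch of the metric as it is commonly defined: an F-measure over the longest common subsequence (LCS) of candidate and reference tokens. It assumes lowercase whitespace tokenization and a balanced F1, and it is not the evaluation script used to produce the table; the function names are illustrative. RaTE is not covered here.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 between a generated report and a reference report."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("the heart is normal", "the heart size is normal")
print(f"{100 * score:.2f}")
```

The table reports scores on a 0 to 100 scale, so the sketch multiplies by 100 before printing.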
Installation
Installing vLLM
uv venv
source .venv/bin/activate
uv pip install -U vllm --torch-backend=auto
# Update transformers to support the latest models.
pip install -U "transformers==5.2.*"

Running JoyMed-8B-v1.0
vllm serve jdh-algo/JoyMed-8B-v1.0 \
  --tensor-parallel-size 8 \
  --mm-encoder-tp-mode data \
  --mm-processor-cache-type shm \
  --enable-prefix-caching \
  --trust-remote-code \
  --gpu-memory-utilization 0.9
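The command above shards the model across 8 GPUs. On other hardware, `--tensor-parallel-size` needs to match the number of GPUs you intend to use. As a rough sketch, the same flags can be assembled in Python so the GPU count is a parameter; `build_serve_command` is a hypothetical helper, and the value 2 below is illustrative only, not a tested configuration for this model.

```python
import shlex
from typing import List


def build_serve_command(
    model: str = "jdh-algo/JoyMed-8B-v1.0",
    tensor_parallel_size: int = 8,
    gpu_memory_utilization: float = 0.9,
) -> List[str]:
    """Assemble the `vllm serve` invocation using only the flags shown above."""
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--mm-encoder-tp-mode", "data",
        "--mm-processor-cache-type", "shm",
        "--enable-prefix-caching",
        "--trust-remote-code",
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]


# Illustrative assumption: a machine with 2 GPUs.
cmd = build_serve_command(tensor_parallel_size=2)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` or pasted into a shell from the printed string.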
Quick Start
Instruct mode
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
)
response = client.chat.completions.create(
    model="jdh-algo/JoyMed-8B-v1.0",
    # For instruct mode, end the prompt with '/no_think' or add no suffix at all.
    messages=[{"role": "user", "content": "What are the common causes of hypertension in adults? /no_think"}],
)
print(response.choices[0].message.content)
Thinking mode
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
)
response = client.chat.completions.create(
    model="jdh-algo/JoyMed-8B-v1.0",
    # For thinking mode, end the prompt with '/think'.
    messages=[{"role": "user", "content": "What are the common causes of hypertension in adults? /think"}],
)
print(response.choices[0].message.content)
Auto mode
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
)
response = client.chat.completions.create(
    model="jdh-algo/JoyMed-8B-v1.0",
    # For auto mode, end the prompt with '/auto_think'.
    messages=[{"role": "user", "content": "What are the common causes of hypertension in adults? /auto_think"}],
)
print(response.choices[0].message.content)
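The three examples above differ only in the suffix appended to the prompt. A small helper can make the mode an explicit argument. This is a sketch: `build_messages` and `MODE_SUFFIX` are hypothetical names, and the image branch assumes the server accepts OpenAI-style `image_url` content parts and that the mode suffix belongs in the text part, neither of which is confirmed by this README.

```python
import base64
from typing import Optional

# Suffixes taken from the three Quick Start examples above.
MODE_SUFFIX = {
    "instruct": "/no_think",
    "thinking": "/think",
    "auto": "/auto_think",
}


def build_messages(
    prompt: str,
    mode: str = "auto",
    image_bytes: Optional[bytes] = None,
    mime: str = "image/png",
) -> list:
    """Build a chat `messages` list with the suffix for the chosen mode."""
    if mode not in MODE_SUFFIX:
        raise ValueError(f"mode must be one of {sorted(MODE_SUFFIX)}")
    text = f"{prompt} {MODE_SUFFIX[mode]}"
    if image_bytes is None:
        return [{"role": "user", "content": text}]
    # Assumption: images are sent inline as a base64 data URL.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
            {"type": "text", "text": text},
        ],
    }]


messages = build_messages(
    "What are the common causes of hypertension in adults?", mode="thinking"
)
print(messages[0]["content"])
```

Pass the result as `messages=messages` to `client.chat.completions.create(...)` exactly as in the examples above.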
License
This project is licensed under the Apache License (Version 2.0). For models and datasets, please refer to the original resource page and follow the corresponding License.
