Jashan887 committed 5 days ago

Commit 112d755 (verified) · 1 parent: 4401cc9

Upload folder using huggingface_hub

Files changed (14)

.gitattributes +3 -0
LICENSE +190 -0
README.md +249 -0
banner.png +3 -0
chat_template.jinja +103 -0
config.json +47 -0
configuration_gravity_moe.py +118 -0
consortium.png +3 -0
generation_config.json +11 -0
model.safetensors +3 -0
modeling_gravity_moe.py +56 -0
special_tokens_map.json +40 -0
tokenizer.json +3 -0
tokenizer_config.json +325 -0

.gitattributes CHANGED

@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
+banner.png filter=lfs diff=lfs merge=lfs -text
+consortium.png filter=lfs diff=lfs merge=lfs -text

LICENSE ADDED

	@@ -0,0 +1,190 @@

+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+   1. Definitions.
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to the Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by the Licensor and
+      subsequently incorporated within the Work.
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding any notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+   END OF TERMS AND CONDITIONS
+   Copyright 2026 Lunit Inc.
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+       http://www.apache.org/licenses/LICENSE-2.0
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.

README.md ADDED

	@@ -0,0 +1,249 @@

+---
+license: apache-2.0
+language:
+  - en
+base_model:
+  - trillionlabs/Gravity-16B-A3B-Base
+tags:
+  - medical
+  - clinical
+  - mixture-of-experts
+  - conversational
+  - sft
+library_name: transformers
+pipeline_tag: text-generation
+---
+<p align="center">
+  <img src="banner.png" alt="L1" style="width: 80%;">
+</p>
+# Learning Unit 1
+**L1** (Learning Unit 1) is the first language model from [Lunit](https://www.lunit.io) and Lunit Consortium, purpose-built for the medical domain. Derived from [Gravity-16B-A3B-Base](https://huggingface.co/trillionlabs/Gravity-16B-A3B-Base), L1 is designed for clinical reasoning and decision support.
+### ✨ Key Highlights
+* 🩺 **Medical-Domain Specialized**: Developed specifically for clinical reasoning and medical decision support
+* ⚡ **Efficient MoE**: Only 3B parameters active per token out of 16.24B total — fast inference with high capacity
+* 💭 **Thinking Model**: Performs step-by-step reasoning in `<think>` tags before generating the final answer
+> **Note:** L1 reasons internally using `<think>...</think>` blocks before producing a response. This chain-of-thought process improves answer quality but consumes additional tokens. Set `max_tokens` accordingly (recommended: 2048+).
+### 📋 Model Specifications
+- Type: Causal Language Model
+- Base Model: [Gravity-16B-A3B-Base](https://huggingface.co/trillionlabs/Gravity-16B-A3B-Base) from Trillion Labs and Lunit Consortium
+- Architecture: GravityMoE (Sparse Mixture-of-Experts with MLA)
+- Total Parameters: 16.24B
+- Active Parameters: 3B
+- Number of Layers: 28
+- Attention Heads: 16
+- KV Heads: 16
+- Hidden Size: 2048
+- MoE Intermediate Size: 1408
+- Routed Experts: 64 (top-8 selection)
+- Shared Experts: 1
+- Context Length: 32,768 tokens
+- Vocabulary Size: 151,552
+- Tokenizer: GLM-4.5
+- Precision: bf16
+## 🚀 Quickstart
+### SGLang (Recommended)
+**Install:**
+```bash
+pip install "sglang[all] @ git+https://github.com/trillion-labs/sglang-gravity.git#subdirectory=python"
+```
+**Launch server:**
+```bash
+python -m sglang.launch_server \
+  --model-path learning-unit/L1-16B-A3B \
+  --port 9006 --host 0.0.0.0 \
+  --tp 1 --dtype bfloat16 --trust-remote-code \
+  --attention-backend triton \
+  --moe-runner-backend triton
+```
+**Query:**
+```bash
+curl -X POST http://localhost:9006/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "learning-unit/L1-16B-A3B",
+    "messages": [
+      {"role": "user", "content": "What are the diagnostic criteria for sepsis?"}
+    ],
+    "max_tokens": 2048
+  }'
+```
+### Transformers
+**Install:**
+```bash
+pip install "transformers>=5.0" torch
+```
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "learning-unit/L1-16B-A3B"
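+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    dtype=torch.bfloat16,  # `dtype` replaces the older `torch_dtype` argument name
+    device_map="auto",
+    trust_remote_code=True,  # required to load the custom GravityMoE classes
+)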
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+messages = [
+    {"role": "user", "content": "What are the diagnostic criteria for sepsis?"}
+]
+text = tokenizer.apply_chat_template(
+    messages,
+    tokenize=False,
+    add_generation_prompt=True,
+)
+model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+generated_ids = model.generate(
+    **model_inputs,
+    max_new_tokens=2048,
+    temperature=0.7,
+    do_sample=True,
+)
+generated_ids = [
+    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+]
+response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+print(response)
+```
+## 💬 Examples
+L1 is specialized for the medical domain and covers a wide range of clinical scenarios. Below are representative examples from real-world clinical use cases.
+### Medical Q&A
+> A 45-year-old woman with lupus nephritis on mycophenolate and prednisone develops fever, dry cough, and bilateral ground-glass opacities on chest CT. Her CD4 count is 180. What is your differential diagnosis and recommended workup?
+### Patient Education
+> I have diabetes and use insulin daily. What is the proper way to store insulin at home?
+### Clinical Documentation
+> Please draft an overnight progress note. Patient labs: RBC 4.5, WBC 8. Vitals: HR 82, BP 118/76, RR 15, Temp 37.1. Nurse reports stable overnight. Plan: continue antibiotics, recheck labs in the morning.
+### Emergency Triage
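+> Perform KTAS triage for the following emergency department patient and provide an initial diagnosis and differential diagnoses. A 78-year-old woman was brought to the emergency department by 119 ambulance. At around 22:00 she suddenly developed left facial droop and slurred speech. She complains of headache and has a history of hypertension. Vital signs: blood pressure 172/88, heart rate 92, respiratory rate 14, temperature 36.8, oxygen saturation 98%; she is alert. There is no limb weakness.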
+### Adverse Drug Reaction (ADR) Causality Assessment
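+> Assess the causality of the following patient's adverse drug reaction (ADR) using the WHO-UMC criteria. An 80-year-old woman hospitalized for bronchiectasis received moxifloxacin 400 mg IV. During administration she newly developed generalized pruritus; after the drug was stopped, the patient herself reported that the itching was subsiding, and she subsequently recovered. No rechallenge was performed. She has no prior history of drug allergy, and no other concomitant medications or skin conditions likely to cause pruritus were identified.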
+## 📊 Benchmark
+All benchmarks were evaluated using [CoEval](https://github.com/lunit-io/CoEval), Lunit's open-source medical LLM evaluation framework. Evaluations use greedy decoding (temperature=0). To reproduce these results:
+```bash
+git clone https://github.com/lunit-io/CoEval.git
+cd CoEval
+```
+Refer to the [CoEval Quickstart](https://github.com/lunit-io/CoEval#quickstart) for setup and evaluation instructions.
+### MCQA Benchmarks
+| Model | [PubMedQA](https://huggingface.co/datasets/qiaojin/PubMedQA) | [AttrBench](https://huggingface.co/datasets/osunlp/AttributionBench) | [MedQA](https://huggingface.co/datasets/GBaker/MedQA-USMLE-4-options) | [CareQA](https://huggingface.co/datasets/HPAI-BSC/CareQA) | [HeadQA](https://huggingface.co/datasets/alesi12/head_qa_v2) | [MedMCQA](https://huggingface.co/datasets/lighteval/med_mcqa) | [MMLU-Pro (Health)](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro) | [M-ARC](https://huggingface.co/datasets/mkieffer/M-ARC) | [MetaMedQA](https://huggingface.co/datasets/maximegmd/MetaMedQA) | [MedHallu](https://huggingface.co/datasets/UTAustin-AIHealth/MedHallu) | [MedCalc](https://huggingface.co/datasets/ncbi/MedCalc-Bench) | [MedBullets](https://huggingface.co/datasets/mkieffer/Medbullets) 4-opt | [MedBullets](https://huggingface.co/datasets/mkieffer/Medbullets) 5-opt | [MedXpertQA](https://huggingface.co/datasets/TsinghuaC3I/MedXpertQA)-R | [MedXpertQA](https://huggingface.co/datasets/TsinghuaC3I/MedXpertQA)-U | W.Avg |
+|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+| GPT-OSS-120B | 78.00 | 76.10 | 91.10 | 91.00 | 88.40 | 74.80 | 74.60 | 40.00 | 76.50 | 83.50 | 30.30 | 84.70 | 82.10 | 35.60 | 32.90 | 79.43 |
+| GPT-OSS-20B | 75.80 | 74.80 | 83.90 | 84.80 | 83.30 | 65.40 | 70.50 | 31.00 | 70.10 | 81.30 | 29.20 | 73.40 | 70.50 | 24.70 | 21.20 | 73.38 |
+| Qwen3.5-122B | 76.40 | 55.68 | 87.80 | 86.40 | 84.00 | 74.40 | 73.00 | 59.00 | 73.90 | 37.50 | 53.70 | 79.20 | 79.50 | 35.90 | 35.30 | 75.08 |
+| MedGemma-27B | 73.40 | 74.80 | 84.40 | 85.00 | 83.80 | 71.90 | 73.00 | 48.00 | 69.60 | 81.40 | 24.10 | 73.70 | 68.80 | 19.10 | 20.50 | 73.99 |
+| Gemma4-26B-A4B | 76.40 | 72.00 | 81.80 | 84.50 | 82.30 | 67.30 | 73.50 | 67.00 | 71.50 | 86.50 | 45.60 | 73.70 | 67.50 | 45.10 | 39.20 | 75.34 |
+| L1-16B-A3B | 84.20 | 78.40 | 85.50 | 88.20 | 85.80 | 76.70 | 74.90 | 82.00 | 73.10 | 76.10 | 43.90 | 78.90 | 70.80 | 27.50 | 29.20 | 77.74 |
+### Chat Task
+| Model | [HealthBench-Consensus](https://github.com/openai/simple-evals) |
+|:---|:---:|
+| GPT-OSS-120B | 90.60 |
+| GPT-OSS-20B | 78.70 |
+| Qwen3.5-122B | 92.20 |
+| MedGemma-27B | 90.70 |
+| Gemma4-26B-A4B | 92.60 |
+| L1-16B-A3B | 93.50 |
+## 📝 Citation
+```bibtex
+@misc{lunit2026l1,
+  title={L1: The First Clinical Language Model by Lunit},
+  author={Lunit},
+  year={2026},
+  url={https://huggingface.co/learning-unit/L1-16B-A3B}
+}
+```
+## ⚠️ Limitations
+- **Not a substitute for professional medical judgment.** L1 may generate factually incorrect, incomplete, or outdated clinical information. All outputs should be verified by qualified healthcare professionals.
+- **Thinking overhead.** Chain-of-thought reasoning in `<think>` tags increases token consumption and latency compared to non-thinking models of similar size.
+- **Context length.** Maximum context length is 32,768 tokens.
+- **No real-time knowledge.** The model's knowledge is limited to its training data cutoff and does not reflect the latest medical guidelines or drug approvals.
+## 🤝 Acknowledgements
+This work was supported by the Domain-Specific Foundation Model Project (인공지능 특화 파운데이션 모델 프로젝트), funded by the Ministry of Science and ICT (과학기술정보통신부) and managed by the National IT Industry Promotion Agency (NIPA).
+L1 is a collaborative effort by the following consortium members:
+**Industry**
+- Lunit
+- Trillion Labs
+- SK Biopharmaceuticals
+- Kakao Healthcare
+- AIGEN Sciences
+- D-Circle
+- Rebellions
+- Standigm
+**Academia**
+- Prof. Choi Yun-jae's Lab from KAIST
+- Prof. Hong Seung-hoon's Lab from KAIST
+- Prof. Jung Yu-seong's Lab from SNU
+- Prof. Kim Hyun-woo's Lab from KAIST
+- Prof. Kim Tae-gyun's Lab from KAIST
+- Prof. Ye Jong-cheol's Lab from KAIST
+**Hospitals**
+- NHIS Ilsan Hospital
+- Ewha Womans University Seoul Hospital
+- Keimyung University Dongsan Medical Center
+- Konyang University Hospital
+- Korea University Research & Business Foundation
+- Kyung Hee University Hospital at Gangdong
+- Kyung Hee University Medical Center
+- Pusan National University Yangsan Hospital
+- Yongin Severance Hospital
+<p align="center">
+  <img src="consortium.png" alt="Consortium Members" style="width: 80%;">
+</p>
+## 📄 License
+This model is licensed under the [Apache 2.0 License](LICENSE).
+## 📬 Contact
+- Taesoo Kim (김태수) — [taesoo.kim@lunit.io](mailto:taesoo.kim@lunit.io)
+- Donggeun Yoo (유동근) — [dgyoo@lunit.io](mailto:dgyoo@lunit.io)
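
The curl query in the README's SGLang quickstart can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes the server from the quickstart is listening on port 9006 and returns an OpenAI-style response body; the helper names `build_chat_request`, `extract_answer`, and `ask` are illustrative and not part of any library.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:9006",
                       model="learning-unit/L1-16B-A3B", max_tokens=2048):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # The README recommends 2048+ because <think> blocks consume tokens.
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_answer(response_body):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    return response_body["choices"][0]["message"]["content"]


def ask(prompt):
    """Send the request to a running server and return the assistant text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        return extract_answer(json.load(response))
```

With a server running, `ask("What are the diagnostic criteria for sepsis?")` would mirror the curl example. Whether the returned text still contains the `<think>` block depends on how the server is configured, so treat the raw string as possibly including reasoning.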

banner.png ADDED

Git LFS Details

SHA256: 6b2787cddb60f574b00fc3be985850ee03acdf7fcc80b5f8ca91ae6093afc321
Pointer size: 131 Bytes
Size of remote file: 130 kB
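
The 131-byte pointer size is consistent with the Git LFS pointer format, in which the repository stores three short text lines in place of the binary (which also fits the `+3 -0` entries in the file list). A sketch, assuming the v1 pointer layout of `version`, `oid`, and `size` lines; the byte size used here is hypothetical, since the page only reports a rounded "130 kB".

```python
LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def make_lfs_pointer(sha256_hex: str, size_bytes: int) -> str:
    """Build the text of a Git LFS v1 pointer file for one object."""
    if len(sha256_hex) != 64:
        raise ValueError("expected a 64-character hex SHA-256 digest")
    return (
        f"version {LFS_SPEC_URL}\n"
        f"oid sha256:{sha256_hex}\n"
        f"size {size_bytes}\n"
    )


# SHA-256 taken from the banner.png details above; the size is a placeholder.
banner_sha = "6b2787cddb60f574b00fc3be985850ee03acdf7fcc80b5f8ca91ae6093afc321"
pointer = make_lfs_pointer(banner_sha, 130_000)  # hypothetical exact size
pointer_len = len(pointer.encode("utf-8"))
```

Any six-digit byte count yields the same pointer length, which is why files of roughly 130 kB and 752 kB can both report a 131-byte pointer.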

chat_template.jinja ADDED

	@@ -0,0 +1,103 @@

+[gMASK]<sop>
+{%- if tools -%}
+<|system|>
+# Tools
+You may call one or more functions to assist with the user query.
+You are provided with function signatures within <tools></tools> XML tags:
+<tools>
+{% for tool in tools %}
+{{ tool | tojson(ensure_ascii=False) }}
+{% endfor %}
+</tools>
+For each function call, output the function name and arguments within the following XML format:
+<tool_call>{function-name}
+<arg_key>{arg-key-1}</arg_key>
+<arg_value>{arg-value-1}</arg_value>
+<arg_key>{arg-key-2}</arg_key>
+<arg_value>{arg-value-2}</arg_value>
+...
+</tool_call>{%- endif -%}
+{%- macro visible_text(content) -%}
+    {%- if content is string -%}
+        {{- content }}
+    {%- elif content is iterable and content is not mapping -%}
+        {%- for item in content -%}
+            {%- if item is mapping and item.type == 'text' -%}
+                {{- item.text }}
+            {%- elif item is string -%}
+                {{- item }}
+            {%- endif -%}
+        {%- endfor -%}
+    {%- else -%}
+        {{- content }}
+    {%- endif -%}
+{%- endmacro -%}
+{%- set ns = namespace(last_user_index=-1) %}
+{%- for m in messages %}
+    {%- if m.role == 'user' %}
+        {% set ns.last_user_index = loop.index0 -%}
+    {%- endif %}
+{%- endfor %}
+{% for m in messages %}
+{%- if m.role == 'user' -%}<|user|>
+{{ visible_text(m.content) }}
+{{- '/nothink' if (enable_thinking is defined and not enable_thinking and not visible_text(m.content).endswith("/nothink")) else '' -}}
+{%- elif m.role == 'assistant' -%}
+<|assistant|>
+{%- set reasoning_content = '' %}
+{%- set content = visible_text(m.content) %}
+{%- if m.reasoning_content is string %}
+    {%- set reasoning_content = m.reasoning_content %}
+{%- else %}
+    {%- if '</think>' in content %}
+        {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
+        {%- set content = content.split('</think>')[-1].lstrip('\n') %}
+    {%- endif %}
+{%- endif %}
+{%- if reasoning_content -%}
+{{ '\n<think>' + reasoning_content.strip() +  '</think>'}}
+{%- else -%}
+{{ '\n<think></think>' }}
+{%- endif -%}
+{%- if content.strip() -%}
+{{ '\n' + content.strip() }}
+{%- endif -%}
+{% if m.tool_calls %}
+{% for tc in m.tool_calls %}
+{%- if tc.function %}
+    {%- set tc = tc.function %}
+{%- endif %}
+{{ '\n<tool_call>' + tc.name }}
+{% set _args = tc.arguments %}
+{% for k, v in _args.items() %}
+<arg_key>{{ k }}</arg_key>
+<arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>
+{% endfor %}
+</tool_call>{% endfor %}
+{% endif %}
+{%- elif m.role == 'tool' -%}
+{%- if m.content is string -%}
+{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+    {{- '<|observation|>' }}
+{%- endif %}
+{{- '\n<tool_response>\n' }}
+{{- m.content }}
+{{- '\n</tool_response>' }}
+{%- else -%}
+<|observation|>{% for tr in m.content %}
+<tool_response>
+{{ tr.output if tr.output is defined else tr }}
+</tool_response>{% endfor -%}
+{% endif -%}
+{%- elif m.role == 'system' -%}
+<|system|>
+{{ visible_text(m.content) }}
+{%- endif -%}
+{%- endfor -%}
+{%- if add_generation_prompt -%}
+    <|assistant|>{{- '\n<think></think>' if (enable_thinking is defined and not enable_thinking) else '' -}}
+{%- endif -%}
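
The assistant branch of the template separates reasoning from the visible answer with a chain of `split` calls. The sketch below mirrors that logic in plain Python to make the behaviour easier to follow. It is an approximation: Jinja whitespace control is only imitated, tool calls and the `reasoning_content` field are ignored, and the function names are illustrative.

```python
def split_reasoning(content: str) -> tuple[str, str]:
    """Split assistant text into (reasoning, answer) the way the template does."""
    if "</think>" not in content:
        return "", content
    # Text before the first </think>, then whatever follows the last <think>.
    reasoning = (
        content.split("</think>")[0].rstrip("\n").split("<think>")[-1].lstrip("\n")
    )
    # Text after the last </think> is treated as the visible answer.
    answer = content.split("</think>")[-1].lstrip("\n")
    return reasoning, answer


def render_assistant_turn(content: str) -> str:
    """Approximate the rendered form of one assistant message."""
    reasoning, answer = split_reasoning(content)
    rendered = "<|assistant|>"
    if reasoning:
        rendered += "\n<think>" + reasoning.strip() + "</think>"
    else:
        # The template always emits a think block, empty if there is no reasoning.
        rendered += "\n<think></think>"
    if answer.strip():
        rendered += "\n" + answer.strip()
    return rendered


example = "<think>\nCheck lactate.\n</think>\n\nSepsis-3 uses the SOFA score."
rendered_example = render_assistant_turn(example)
```

One consequence visible in the sketch is that an assistant message without any `</think>` marker is still rendered with an empty `<think></think>` pair, which matches the template's `else` branch.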

config.json ADDED

	@@ -0,0 +1,47 @@

+{
+    "architectures": [
+        "GravityMoEForCausalLM"
+    ],
+    "model_type": "gravity_moe",
+    "auto_map": {
+        "AutoConfig": "configuration_gravity_moe.GravityMoEConfig",
+        "AutoModelForCausalLM": "modeling_gravity_moe.GravityMoEForCausalLM"
+    },
+    "vocab_size": 151552,
+    "hidden_size": 2048,
+    "intermediate_size": 8192,
+    "moe_intermediate_size": 1408,
+    "num_hidden_layers": 28,
+    "num_attention_heads": 16,
+    "num_key_value_heads": 16,
+    "q_lora_rank": null,
+    "kv_lora_rank": 512,
+    "qk_rope_head_dim": 64,
+    "qk_nope_head_dim": 128,
+    "v_head_dim": 128,
+    "n_routed_experts": 64,
+    "n_shared_experts": 1,
+    "num_experts_per_tok": 8,
+    "first_k_dense_replace": 1,
+    "moe_layer_freq": 1,
+    "routed_scaling_factor": 2.446,
+    "norm_topk_prob": true,
+    "scoring_func": "sigmoid",
+    "topk_method": "noaux_tc",
+    "n_group": 1,
+    "topk_group": 1,
+    "hidden_act": "silu",
+    "max_position_embeddings": 32768,
+    "initializer_range": 0.02,
+    "rms_norm_eps": 1e-06,
+    "use_cache": true,
+    "rope_theta": 1000000.0,
+    "rope_scaling": null,
+    "attention_bias": false,
+    "attention_dropout": 0.0,
+    "tie_word_embeddings": false,
+    "bos_token_id": null,
+    "eos_token_id": 151329,
+    "torch_dtype": "bfloat16",
+    "pad_token_id": 151329
+}
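
The values in this config can be used to sanity-check the parameter counts quoted in the README (16.24B total, 3B active). The following is a rough sketch, assuming DeepSeek-V3-style layer shapes: MLA projections without a query LoRA, three weight matrices per expert, one router matrix per MoE layer, and every layer after the first `first_k_dense_replace` being an MoE layer. Norm weights and router biases are ignored, and counting both embedding matrices as "active" is a choice made here, not something the source states.

```python
# Values copied from the config.json diff above.
CONFIG = {
    "vocab_size": 151552,
    "hidden_size": 2048,
    "intermediate_size": 8192,
    "moe_intermediate_size": 1408,
    "num_hidden_layers": 28,
    "num_attention_heads": 16,
    "kv_lora_rank": 512,
    "qk_rope_head_dim": 64,
    "qk_nope_head_dim": 128,
    "v_head_dim": 128,
    "n_routed_experts": 64,
    "n_shared_experts": 1,
    "num_experts_per_tok": 8,
    "first_k_dense_replace": 1,
}


def estimate_params(cfg):
    """Return (total, active_per_token) weight counts, ignoring norms and biases."""
    h = cfg["hidden_size"]
    heads = cfg["num_attention_heads"]
    nope, rope, v = cfg["qk_nope_head_dim"], cfg["qk_rope_head_dim"], cfg["v_head_dim"]
    attention = (
        h * heads * (nope + rope)                   # query projection (q_lora_rank is null)
        + h * (cfg["kv_lora_rank"] + rope)          # compressed KV down-projection
        + cfg["kv_lora_rank"] * heads * (nope + v)  # KV up-projection
        + heads * v * h                             # output projection
    )
    expert = 3 * h * cfg["moe_intermediate_size"]   # gate, up, and down matrices
    dense_mlp = 3 * h * cfg["intermediate_size"]
    router = h * cfg["n_routed_experts"]
    dense_layers = cfg["first_k_dense_replace"]
    moe_layers = cfg["num_hidden_layers"] - dense_layers
    embeddings = 2 * cfg["vocab_size"] * h          # input and output, untied
    always_on = (
        attention * cfg["num_hidden_layers"]
        + dense_mlp * dense_layers
        + router * moe_layers
        + embeddings
    )
    shared = cfg["n_shared_experts"]
    total = always_on + moe_layers * (cfg["n_routed_experts"] + shared) * expert
    active = always_on + moe_layers * (cfg["num_experts_per_tok"] + shared) * expert
    return total, active


total, active = estimate_params(CONFIG)
print(f"total ~ {total / 1e9:.2f}B, active ~ {active / 1e9:.2f}B")
```

Under these assumptions the estimate lands at about 16.24B total and about 3.16B active per token, in line with the README's figures.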

configuration_gravity_moe.py ADDED

	@@ -0,0 +1,118 @@

+# Copyright 2026 Trillion Labs and the HuggingFace Inc. team. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""GravityMoE model configuration — inherits from DeepSeek V3."""
+from transformers import DeepseekV3Config
+class GravityMoEConfig(DeepseekV3Config):
+    r"""
+    Configuration class for the GravityMoE model, inheriting from
+    [`DeepseekV3Config`]. GravityMoE shares the same architecture as
+    DeepSeek V3 (sparse MoE with MLA) but uses different hyperparameters.
+    Only default values that differ from DeepSeek V3 are overridden here.
+    See [`DeepseekV3Config`] for full documentation of all parameters.
+    Example:
+    ```python
+    >>> from configuration_gravity_moe import GravityMoEConfig
+    >>> configuration = GravityMoEConfig()
+    >>> configuration.model_type
+    'gravity_moe'
+    ```
+    """
+    model_type = "gravity_moe"
+    def __init__(
+        self,
+        vocab_size=151552,
+        hidden_size=2048,
+        intermediate_size=8192,
+        moe_intermediate_size=1408,
+        num_hidden_layers=28,
+        num_attention_heads=16,
+        num_key_value_heads=16,
+        n_shared_experts=1,
+        n_routed_experts=64,
+        routed_scaling_factor=2.446,
+        kv_lora_rank=512,
+        q_lora_rank=None,
+        qk_rope_head_dim=64,
+        v_head_dim=128,
+        qk_nope_head_dim=128,
+        n_group=1,
+        topk_group=1,
+        num_experts_per_tok=8,
+        first_k_dense_replace=1,
+        norm_topk_prob=True,
+        hidden_act="silu",
+        max_position_embeddings=65536,
+        initializer_range=0.02,
+        rms_norm_eps=1e-6,
+        use_cache=True,
+        pad_token_id=None,
+        bos_token_id=0,
+        eos_token_id=1,
+        tie_word_embeddings=False,
+        rope_theta=1000000.0,
+        rope_scaling=None,
+        rope_interleave=True,
+        attention_bias=False,
+        attention_dropout=0.0,
+        **kwargs,
+    ):
+        super().__init__(
+            vocab_size=vocab_size,
+            hidden_size=hidden_size,
+            intermediate_size=intermediate_size,
+            moe_intermediate_size=moe_intermediate_size,
+            num_hidden_layers=num_hidden_layers,
+            num_attention_heads=num_attention_heads,
+            num_key_value_heads=num_key_value_heads,
+            n_shared_experts=n_shared_experts,
+            n_routed_experts=n_routed_experts,
+            routed_scaling_factor=routed_scaling_factor,
+            kv_lora_rank=kv_lora_rank,
+            q_lora_rank=q_lora_rank,
+            qk_rope_head_dim=qk_rope_head_dim,
+            v_head_dim=v_head_dim,
+            qk_nope_head_dim=qk_nope_head_dim,
+            n_group=n_group,
+            topk_group=topk_group,
+            num_experts_per_tok=num_experts_per_tok,
+            first_k_dense_replace=first_k_dense_replace,
+            norm_topk_prob=norm_topk_prob,
+            hidden_act=hidden_act,
+            max_position_embeddings=max_position_embeddings,
+            initializer_range=initializer_range,
+            rms_norm_eps=rms_norm_eps,
+            use_cache=use_cache,
+            pad_token_id=pad_token_id,
+            bos_token_id=bos_token_id,
+            eos_token_id=eos_token_id,
+            tie_word_embeddings=tie_word_embeddings,
+            rope_theta=rope_theta,
+            rope_scaling=rope_scaling,
+            rope_interleave=rope_interleave,
+            attention_bias=attention_bias,
+            attention_dropout=attention_dropout,
+            **kwargs,
+        )
+
+
+__all__ = ["GravityMoEConfig"]
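
A few sizes follow directly from the defaults in `GravityMoEConfig.__init__`. The sketch below works them out in plain Python so it runs without `transformers` installed. `GravityDefaults` and `derived_sizes` are hypothetical names used only for illustration. The values are copied from the signature above. The layer split assumes the DeepSeek V3 convention that layers with index below `first_k_dense_replace` use a dense MLP and the rest use MoE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GravityDefaults:
    """Hypothetical holder for a few defaults copied from GravityMoEConfig."""

    num_hidden_layers: int = 28
    first_k_dense_replace: int = 1
    n_routed_experts: int = 64
    n_shared_experts: int = 1
    num_experts_per_tok: int = 8
    qk_nope_head_dim: int = 128
    qk_rope_head_dim: int = 64


def derived_sizes(cfg: GravityDefaults) -> dict:
    """Compute quantities implied by the defaults (not stored in the config)."""
    return {
        # Query/key head width: non-rotary part plus rotary part.
        "qk_head_dim": cfg.qk_nope_head_dim + cfg.qk_rope_head_dim,
        # Assumed: the first `first_k_dense_replace` layers are dense, the rest MoE.
        "moe_layers": cfg.num_hidden_layers - cfg.first_k_dense_replace,
        # Routed experts selected per token plus the always-on shared expert(s).
        "experts_run_per_token": cfg.num_experts_per_tok + cfg.n_shared_experts,
        # Share of routed experts that a single token activates.
        "routed_fraction_active": cfg.num_experts_per_tok / cfg.n_routed_experts,
    }


sizes = derived_sizes(GravityDefaults())
print(sizes)
```

With these defaults each token is routed to one eighth of the routed experts per MoE layer, which is why the active parameter count is much smaller than the checkpoint size suggests.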

consortium.png ADDED

Git LFS Details

SHA256: 6c9b7a3ef909a9c183ac56117d38faecd7ea99151dab94353225e35f38309dd6
Pointer size: 131 Bytes
Size of remote file: 752 kB

generation_config.json ADDED

	@@ -0,0 +1,11 @@

+{
+  "_from_model_config": true,
+  "eos_token_id": [
+    151329,
+    151329,
+    151336,
+    151338
+  ],
+  "pad_token_id": 151329,
+  "transformers_version": "5.3.0"
+}
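
The `eos_token_id` list above repeats 151329, so it names three distinct stop tokens. The sketch below removes the repeat and resolves each id to its token string, using the `added_tokens_decoder` entries from `tokenizer_config.json` in this same commit. `unique_stop_ids` and `should_stop` are hypothetical helpers, not part of any library. They only illustrate how a generation loop might consult the list.

```python
# Token strings copied from added_tokens_decoder in tokenizer_config.json.
ID_TO_TOKEN = {
    151329: "<|endoftext|>",
    151336: "<|user|>",
    151338: "<|observation|>",
}

# Copied verbatim from generation_config.json, including the repeated id.
EOS_TOKEN_IDS = [151329, 151329, 151336, 151338]


def unique_stop_ids(ids):
    """Drop repeated ids while keeping first-seen order."""
    return list(dict.fromkeys(ids))


def should_stop(token_id, stop_ids):
    """Return True when a freshly sampled token ends generation."""
    return token_id in stop_ids


stop_ids = unique_stop_ids(EOS_TOKEN_IDS)
stop_tokens = [ID_TO_TOKEN[i] for i in stop_ids]
print(stop_ids, stop_tokens)
```

Stopping on `<|user|>` and `<|observation|>` means generation halts as soon as the model starts a new user turn or a tool-observation turn, in addition to the usual end-of-text token.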

model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:c0f50ca6769b69a3c0e5d35e39ffdf439116bec31646d8e7dde5da620e7e2395
+size 32485059448
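
The three lines above are a Git LFS pointer, not the weights themselves: the repository stores only the spec version, the SHA-256 of the real file, and its size in bytes. The following sketch parses the pointer text and converts the size to more readable units. `parse_lfs_pointer` is a hypothetical helper written for this note.

```python
# Pointer text copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:c0f50ca6769b69a3c0e5d35e39ffdf439116bec31646d8e7dde5da620e7e2395
size 32485059448
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line and break the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
size_gb = info["size"] / 1e9      # decimal gigabytes
size_gib = info["size"] / 2**30   # binary gibibytes
print(f"{info['algo']} digest, {size_gb:.2f} GB ({size_gib:.2f} GiB)")
```

After downloading, hashing the file with `hashlib.sha256` and comparing against `info["digest"]` is one way to confirm the transfer completed intact.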

modeling_gravity_moe.py ADDED

	@@ -0,0 +1,56 @@

+# Copyright 2026 Trillion Labs and the HuggingFace Inc. team. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""
+GravityMoE model — inherits from DeepSeek V3.
+GravityMoE shares the same sparse Mixture-of-Experts architecture as DeepSeek V3
+(MLA attention, sigmoid routing with bias correction, shared + routed experts)
+but with different model hyperparameters. All modeling logic is inherited from
+the DeepSeek V3 implementation in `transformers`.
+"""
+
+from transformers.conversion_mapping import _MODEL_TO_CONVERSION_PATTERN
+from transformers.models.deepseek_v3.modeling_deepseek_v3 import (
+    DeepseekV3ForCausalLM,
+    DeepseekV3Model,
+    DeepseekV3PreTrainedModel,
+)
+
+from .configuration_gravity_moe import GravityMoEConfig
+
+
+# Register weight conversion so that from_pretrained fuses per-expert
+# checkpoint weights (experts.*.gate_proj, etc.) into 3D tensors
+# (experts.gate_up_proj, experts.down_proj), same as DeepSeek V3.
+_MODEL_TO_CONVERSION_PATTERN["gravity_moe"] = "qwen2_moe"
+
+
+class GravityMoEPreTrainedModel(DeepseekV3PreTrainedModel):
+    config_class = GravityMoEConfig
+    _keep_in_fp32_modules_strict = ["e_score_correction_bias"]
+    _keys_to_ignore_on_load_unexpected = [r"model\.layers\.28.*"]
+
+
+class GravityMoEModel(DeepseekV3Model):
+    config_class = GravityMoEConfig
+
+
+class GravityMoEForCausalLM(DeepseekV3ForCausalLM):
+    config_class = GravityMoEConfig
+
+
+__all__ = [
+    "GravityMoEPreTrainedModel",
+    "GravityMoEModel",
+    "GravityMoEForCausalLM",
+]
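
With `num_hidden_layers=28` the model builds layers 0 through 27, so `_keys_to_ignore_on_load_unexpected` silences checkpoint weights stored under a layer index 28 that has no matching module. The source does not say what that extra layer holds. The sketch below shows which keys the pattern catches, assuming patterns are applied with `re.search` semantics. The checkpoint key names are hypothetical and serve only as test input.

```python
import re

# Pattern copied from GravityMoEPreTrainedModel.
IGNORE_PATTERNS = [r"model\.layers\.28.*"]


def is_ignored(key, patterns=IGNORE_PATTERNS):
    """True if any pattern is found anywhere in the key (re.search semantics)."""
    return any(re.search(p, key) for p in patterns)


# Hypothetical checkpoint keys, for illustration only.
keys = [
    "model.layers.27.mlp.experts.0.gate_proj.weight",
    "model.layers.28.embed_tokens.weight",
    "model.layers.2.self_attn.o_proj.weight",
    "lm_head.weight",
]

kept = [k for k in keys if not is_ignored(k)]
dropped = [k for k in keys if is_ignored(k)]
print("kept:", kept)
print("dropped:", dropped)
```

Because the pattern has no delimiter after `28`, it would also match an index such as 280. That is harmless here since the model has only 28 layers, but a stricter variant would be `model\.layers\.28\..*`.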

special_tokens_map.json ADDED

	@@ -0,0 +1,40 @@

+{
+  "additional_special_tokens": [
+    "<|endoftext|>",
+    "[MASK]",
+    "[gMASK]",
+    "[sMASK]",
+    "<sop>",
+    "<eop>",
+    "<|system|>",
+    "<|user|>",
+    "<|assistant|>",
+    "<|observation|>",
+    "<|begin_of_image|>",
+    "<|end_of_image|>",
+    "<|begin_of_video|>",
+    "<|end_of_video|>",
+    "<|begin_of_audio|>",
+    "<|end_of_audio|>",
+    "<|begin_of_transcription|>",
+    "<|end_of_transcription|>",
+    "<|code_prefix|>",
+    "<|code_middle|>",
+    "<|code_suffix|>",
+    "/nothink"
+  ],
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bda8e2146c3bb7b7e0fc96dcc4f0aeff041c6c27952e3ace0665663ebff346ba
+size 19970700

tokenizer_config.json ADDED

	@@ -0,0 +1,325 @@

+{
+  "added_tokens_decoder": {
+    "151329": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151330": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151331": {
+      "content": "[gMASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151332": {
+      "content": "[sMASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151333": {
+      "content": "<sop>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151334": {
+      "content": "<eop>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151335": {
+      "content": "<|system|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151336": {
+      "content": "<|user|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151337": {
+      "content": "<|assistant|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151338": {
+      "content": "<|observation|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151339": {
+      "content": "<|begin_of_image|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151340": {
+      "content": "<|end_of_image|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151341": {
+      "content": "<|begin_of_video|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151342": {
+      "content": "<|end_of_video|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151343": {
+      "content": "<|begin_of_audio|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151344": {
+      "content": "<|end_of_audio|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151345": {
+      "content": "<|begin_of_transcription|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151346": {
+      "content": "<|end_of_transcription|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151347": {
+      "content": "<|code_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151348": {
+      "content": "<|code_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151349": {
+      "content": "<|code_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151350": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151351": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151352": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151353": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151354": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151355": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151356": {
+      "content": "<arg_key>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151357": {
+      "content": "</arg_key>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151358": {
+      "content": "<arg_value>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151359": {
+      "content": "</arg_value>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151360": {
+      "content": "/nothink",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151361": {
+      "content": "<|begin_of_box|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151362": {
+      "content": "<|end_of_box|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151363": {
+      "content": "<|image|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151364": {
+      "content": "<|video|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|endoftext|>",
+    "[MASK]",
+    "[gMASK]",
+    "[sMASK]",
+    "<sop>",
+    "<eop>",
+    "<|system|>",
+    "<|user|>",
+    "<|assistant|>",
+    "<|observation|>",
+    "<|begin_of_image|>",
+    "<|end_of_image|>",
+    "<|begin_of_video|>",
+    "<|end_of_video|>",
+    "<|begin_of_audio|>",
+    "<|end_of_audio|>",
+    "<|begin_of_transcription|>",
+    "<|end_of_transcription|>",
+    "<|code_prefix|>",
+    "<|code_middle|>",
+    "<|code_suffix|>",
+    "/nothink"
+  ],
+  "clean_up_tokenization_spaces": false,
+  "do_lower_case": false,
+  "eos_token": "<|endoftext|>",
+  "extra_special_tokens": {},
+  "model_max_length": 128000,
+  "pad_token": "<|endoftext|>",
+  "padding_side": "left",
+  "remove_space": false,
+  "tokenizer_class": "PreTrainedTokenizerFast"
+}
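
Two settings in this file interact during batched generation: `padding_side` is `"left"`, and `pad_token` is the same `<|endoftext|>` token (id 151329) used as `eos_token`. The sketch below is a minimal pure-Python illustration of left padding with an explicit attention mask. `left_pad` is a hypothetical helper, and the small token ids in the example batch are made up.

```python
PAD_ID = 151329  # <|endoftext|>, which is also the EOS token in this tokenizer


def left_pad(batch, pad_id=PAD_ID):
    """Pad every sequence on the left to the longest length in the batch.

    Returns (input_ids, attention_mask); mask is 0 for padding, 1 for real tokens.
    """
    width = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = width - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


# Hypothetical token ids; the second sequence ends with a genuine EOS.
batch = [[11, 12, 13], [21, PAD_ID]]
ids, mask = left_pad(batch)
print(ids)
print(mask)
```

Left padding keeps the last position of every row a real token, so the next generated token continues each prompt directly. Because padding and EOS share an id, the mask has to come from sequence lengths as above. Deriving it from `id != PAD_ID` would wrongly hide the real EOS in the second row.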