Update README.md
README.md
CHANGED
```diff
@@ -11,7 +11,7 @@ tags:
   - foundation-model
   - pretrained
 pipeline_tag: text-generation
-base_model: Qwen/Qwen3-4B
+base_model: Qwen/Qwen3-4B-Base
 library_name: transformers
 ---
@@ -19,7 +19,7 @@ library_name: transformers
 
 **BioMatrix** is a multimodal biological foundation model that natively integrates **1D sequences**, **3D structures**, and **natural language** for both **molecules** and **proteins** within a single decoder-only architecture.
 
-This is the **4B-parameter Base model**, obtained via **multimodal continual pretraining** of Qwen3-4B on 304.4 billion tokens spanning text, molecular and protein 1D/3D data, and cross-modal corpora. This base checkpoint is intended for further fine-tuning on downstream tasks. For an instruction-tuned model ready for inference, see [BioMatrix-4B-SFT](https://huggingface.co/QizhiPei/BioMatrix-4B-SFT).
+This is the **4B-parameter Base model**, obtained via **multimodal continual pretraining** of Qwen3-4B-Base on 304.4 billion tokens spanning text, molecular and protein 1D/3D data, and cross-modal corpora. This base checkpoint is intended for further fine-tuning on downstream tasks. For an instruction-tuned model ready for inference, see [BioMatrix-4B-SFT](https://huggingface.co/QizhiPei/BioMatrix-4B-SFT).
 
 - 📄 **Paper**: [BioMatrix: Towards a Comprehensive Biological Foundation Model Spanning the Modality Matrix of Sequences, Structures, and Language](https://arxiv.org/abs/xxxx.xxxxx)
 - 💻 **Code**: [https://github.com/QizhiPei/biomatrix](https://github.com/QizhiPei/biomatrix)
@@ -48,7 +48,7 @@ All modalities are consumed and produced uniformly under a **single next-token p
 
 ## Model Details
 
-- **Base Architecture**: Qwen3-4B
+- **Base Architecture**: Qwen3-4B-Base
 - **Parameters**: 4B
 - **Training Stage**: Multimodal Continual Pretraining only (not instruction-tuned)
 - **Training Tokens**: 304.4B
@@ -158,4 +158,4 @@ If you find BioMatrix useful, please cite:
 
 ## License
 
-This model is released under the Apache 2.0 license. The base model (Qwen3-4B) is subject to its own license terms.
+This model is released under the Apache 2.0 license. The base model (Qwen3-4B-Base) is subject to its own license terms.
```
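For reference, the affected slice of the README's YAML front matter (lines 11–16, continuing the `tags:` list that begins earlier in the file) reads as follows after this commit:

```yaml
  - foundation-model
  - pretrained
pipeline_tag: text-generation
base_model: Qwen/Qwen3-4B-Base
library_name: transformers
---
```

The only functional change is the `base_model` field, which now points at the base (non-instruct) Qwen checkpoint, matching the prose and license references in the diff.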