arxiv:2402.13963

Towards Building Multilingual Language Model for Medicine

Published on Feb 21, 2024
Abstract

A multilingual medical language model, MMedLM 2, achieves superior performance on a new multilingual medical question-answering benchmark, leveraging a newly constructed multilingual medical corpus.

AI-generated summary

In this paper, we aim to develop an open-source, multilingual language model for medicine that benefits a wider, linguistically diverse audience across different regions. Our contributions are threefold. First, for multilingual medical-specific adaptation, we construct a new multilingual medical corpus, termed MMedC, containing approximately 25.5B tokens across six main languages, which enables auto-regressive training of existing general LLMs. Second, to monitor the development of multilingual LLMs in medicine, we propose a new multilingual medical multiple-choice question-answering benchmark with rationales, termed MMedBench. Third, we assess a number of popular open-source large language models (LLMs) on our benchmark, along with those further auto-regressively trained on MMedC. As a result, our final model, termed MMedLM 2, with only 7B parameters, achieves superior performance compared to all other open-source models, even rivaling GPT-4 on MMedBench. We will make the resources publicly available, including code, model weights, and datasets.
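A minimal sketch of how accuracy on a multilingual multiple-choice benchmark like MMedBench might be scored, broken down per language and overall. The field names (`lang`, `pred`, `gold`) and the toy data are hypothetical; the actual MMedBench format may differ.

```python
from collections import defaultdict

def per_language_accuracy(examples):
    """Compute multiple-choice accuracy per language and overall.

    Each example is a dict with hypothetical fields:
    'lang' (language code), 'pred' (predicted option letter),
    and 'gold' (correct option letter).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        total[ex["lang"]] += 1
        correct[ex["lang"]] += int(ex["pred"] == ex["gold"])
    per_lang = {lang: correct[lang] / total[lang] for lang in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_lang, overall

# Toy predictions over two languages
examples = [
    {"lang": "en", "pred": "B", "gold": "B"},
    {"lang": "en", "pred": "C", "gold": "A"},
    {"lang": "zh", "pred": "D", "gold": "D"},
    {"lang": "zh", "pred": "A", "gold": "A"},
]
per_lang, overall = per_language_accuracy(examples)
print(per_lang, overall)  # {'en': 0.5, 'zh': 1.0} 0.75
```

Reporting per-language accuracy alongside the overall score matters for a multilingual benchmark, since an aggregate number can hide large gaps between high- and low-resource languages.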

