arxiv:2204.11817

Translation between Molecules and Natural Language

Published on Apr 25, 2022
Abstract


We present MolT5 - a self-supervised learning framework for pretraining models on a vast amount of unlabeled natural language text and molecule strings. MolT5 allows for new, useful, and challenging analogs of traditional vision-language tasks, such as molecule captioning and text-based de novo molecule generation (altogether: translation between molecules and language), which we explore for the first time. Since MolT5 pretrains models on single-modal data, it helps overcome the chemistry domain shortcoming of data scarcity. Furthermore, we consider several metrics, including a new cross-modal embedding-based metric, to evaluate the tasks of molecule captioning and text-based molecule generation. Our results show that MolT5-based models are able to generate outputs, both molecules and captions, which in many cases are high quality.
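The single-modal pretraining the abstract describes follows the T5 recipe: spans of the input (whether English text or a SMILES molecule string) are replaced by sentinel tokens, and the model learns to reconstruct them. A minimal sketch of that span-corruption objective is below; the helper function and the fixed span positions are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of T5-style span corruption, the self-supervised
# objective MolT5 applies to unlabeled text and SMILES strings alike.
# Not the authors' code; spans are given explicitly rather than sampled.

def corrupt_spans(tokens, spans):
    """Replace each (start, length) span with a sentinel token and build
    the matching target sequence, as in T5 "replace corrupted spans".

    tokens: list of input tokens (words, or SMILES characters)
    spans:  sorted, non-overlapping (start, length) pairs
    Returns (input_tokens, target_tokens).
    """
    inp, tgt = [], []
    pos = 0
    for i, (start, length) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"       # T5-style sentinel token
        inp.extend(tokens[pos:start])      # keep uncorrupted prefix
        inp.append(sentinel)               # mask the span in the input
        tgt.append(sentinel)               # target: sentinel + span content
        tgt.extend(tokens[start:start + length])
        pos = start + length
    inp.extend(tokens[pos:])               # keep remaining suffix
    tgt.append(f"<extra_id_{len(spans)}>") # closing sentinel
    return inp, tgt

# Example on a SMILES string (caffeine) treated as a character sequence.
smiles = list("CN1C=NC2=C1C(=O)N(C)C(=O)N2C")
inp, tgt = corrupt_spans(smiles, [(2, 3), (10, 2)])
```

During pretraining the encoder sees `inp` and the decoder is trained to emit `tgt`; because the objective needs no labels, it side-steps the paired text-molecule data scarcity the abstract mentions.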

