Prompts differ between README.md and config_sentence_transformers.json

by Philimipp - opened Feb 9

Feb 9

The prompt names this model supports are described differently in the README and in config_sentence_transformers.json. The README lists two prompt names, question and passage_text, while config_sentence_transformers.json lists only the usual SentenceTransformer prompts query and document (plus others for downstream tasks, which are of no interest to me).

May I kindly ask you to clarify which prompts you fine-tuned on? Which ones should I use for query and document embedding, respectively?

Thanks 👍

tomaarsen

Sentence Transformers org Feb 9

Hello!

The question and passage_text names in the README refer to columns of the training dataset, not to prompt names. Specifically, https://huggingface.co/sentence-transformers/embeddinggemma-300m-medical#non-default-hyperparameters shows that the texts from the question column were prepended with 'task: search result | query: ', and the texts from the passage_text column were prepended with 'title: none | text: '.

As you can see in https://huggingface.co/sentence-transformers/embeddinggemma-300m-medical/blob/main/config_sentence_transformers.json, these two strings correspond to the "query" and "document" prompt names.

So, for queries you can use model.encode_query, model.encode(..., prompt_name="query"), or even model.encode(..., prompt="task: search result | query: "). These are all equivalent for this model and match what the model was trained on.
The same goes for documents: model.encode_document, model.encode(..., prompt_name="document"), or even model.encode(..., prompt="title: none | text: ") are all equivalent as well.
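
For anyone who wants to check this locally, here is a minimal sketch. It assumes a recent sentence-transformers release that provides encode_query and encode_document. The prepend and check_equivalence helpers and the example texts are made up for illustration, and the 1e-5 tolerance is an arbitrary choice.

```python
# Prompt strings quoted above; they should match the "query" and "document"
# entries in config_sentence_transformers.json.
QUERY_PROMPT = "task: search result | query: "
DOCUMENT_PROMPT = "title: none | text: "


def prepend(prompt: str, texts: list[str]) -> list[str]:
    """Illustrative helper: the text the model effectively sees once a prompt is applied."""
    return [prompt + text for text in texts]


def check_equivalence(
    model_id: str = "sentence-transformers/embeddinggemma-300m-medical",
) -> None:
    """Hypothetical check that the three call styles give matching embeddings."""
    # Imported lazily so the helper above works without the library installed.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)  # downloads the model on first use
    queries = ["What are common symptoms of anemia?"]  # made-up example
    documents = ["Anemia often presents with fatigue and pale skin."]  # made-up example

    query_variants = [
        model.encode_query(queries),
        model.encode(queries, prompt_name="query"),
        model.encode(queries, prompt=QUERY_PROMPT),
    ]
    document_variants = [
        model.encode_document(documents),
        model.encode(documents, prompt_name="document"),
        model.encode(documents, prompt=DOCUMENT_PROMPT),
    ]

    # Compare every variant against the first one in its group.
    for variants in (query_variants, document_variants):
        for other in variants[1:]:
            assert np.allclose(variants[0], other, atol=1e-5)
```

Calling check_equivalence() should pass silently if the three call styles really are interchangeable for this model. prepend(QUERY_PROMPT, queries) only shows the prefixed string and does not load anything.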

Tom Aarsen

Philimipp

Feb 9

Perfect! Thank you very much.

Philimipp changed discussion status to closed Feb 9
