Prompts differ between README.md and config_sentence_transformers.json
The README and config_sentence_transformers.json describe different prompt names for this model. The README mentions two names, question and passage_text, while config_sentence_transformers.json lists only the usual SentenceTransformer prompts, query and document (plus others for downstream tasks that I don't need).
Could you clarify which prompts you fine-tuned on? Which should I use for query and document embedding, respectively?
Thanks 👍
Hello!
The question and passage_text in the README are the names of the training dataset columns, not prompt names. Specifically, the section at https://huggingface.co/sentence-transformers/embeddinggemma-300m-medical#non-default-hyperparameters shows that texts from the question column were prepended with 'task: search result | query: ', and texts from the passage_text column were prepended with 'title: none | text: '.
As you can see in https://huggingface.co/sentence-transformers/embeddinggemma-300m-medical/blob/main/config_sentence_transformers.json, these strings correspond to the "query" and "document" prompt names.
So, for queries you can use model.encode_query, model.encode(..., prompt_name="query"), or even model.encode(..., prompt="task: search result | query: "). These are all equivalent for this model and match what it was trained on.
The same goes for documents: model.encode_document, model.encode(..., prompt_name="document"), or even model.encode(..., prompt="title: none | text: ") are also all equivalent.
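As a rough sketch of what this means in practice, the snippet below first shows the string prepending on its own and then compares the three query variants. The PROMPTS dict only copies the two strings quoted in this thread (the real config lists more entries), with_prompt and check_equivalence are hypothetical helper names, the example query is made up, and encode_query is assumed to be available (a recent Sentence Transformers release).

```python
# Prompt strings as quoted in this thread; the real config contains more entries.
PROMPTS = {
    "query": "task: search result | query: ",
    "document": "title: none | text: ",
}


def with_prompt(text: str, prompt_name: str) -> str:
    """Return the string the model effectively sees: prompt + text."""
    return PROMPTS[prompt_name] + text


def check_equivalence(
    model_id: str = "sentence-transformers/embeddinggemma-300m-medical",
) -> bool:
    """Download the model and compare the three ways of encoding a query."""
    # Imported lazily so with_prompt works without the library installed.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    query = "what causes migraines?"  # hypothetical example query

    a = model.encode_query(query)
    b = model.encode(query, prompt_name="query")
    c = model.encode(query, prompt=PROMPTS["query"])

    # Small tolerance in case of floating-point noise between runs.
    return bool(np.allclose(a, b, atol=1e-5) and np.allclose(a, c, atol=1e-5))
```

Calling check_equivalence() downloads the model, so it is left uncalled here. The document side could be checked the same way with encode_document and the "document" prompt.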
- Tom Aarsen
Perfect! Thank you very much.