---
license: apache-2.0
base_model: ibm-granite/granite-embedding-107m-multilingual
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
- granite
- embeddings
- multilingual
library_name: sentence-transformers
pipeline_tag: feature-extraction
---

# Granite Embedding 107M Multilingual
|
|
This repository is a copy of the [ibm-granite/granite-embedding-107m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-107m-multilingual) model, packaged for use as a document encoder.
|
|
## Model Summary
Granite-Embedding-107M-Multilingual is a 107M-parameter dense bi-encoder embedding model from the Granite Embeddings suite that generates high-quality text embeddings. It produces embedding vectors of size 384.
|
|
## Supported Languages
English, German, Spanish, French, Japanese, Portuguese, Arabic, Czech, Italian, Korean, Dutch, and Chinese.
|
|
## Usage
|
|
### With Sentence Transformers
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('RikoteMaster/MNLP_M3_document_encoder')
# encode() returns a NumPy array with one 384-dimensional row per input text
embeddings = model.encode(['Your text here'])
```
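Once you have embeddings, a common next step is ranking documents against a query by cosine similarity. The sketch below shows that computation in plain Python so the arithmetic is visible; the helper names `cosine_similarity` and `rank_documents` and the toy three-dimensional vectors are hypothetical stand-ins for the 384-dimensional rows returned by `model.encode`. With the real model you would pass those rows instead, or use `sentence_transformers.util.cos_sim`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(
    query: Sequence[float], docs: Sequence[Sequence[float]]
) -> list[tuple[int, float]]:
    """Return (document index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query, d)) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for real 384-dimensional embeddings.
query_vec = [1.0, 0.0, 1.0]
doc_vecs = [
    [0.0, 1.0, 0.0],  # orthogonal to the query (cosine 0.0)
    [1.0, 0.0, 1.0],  # same direction as the query (cosine 1.0)
    [1.0, 1.0, 0.0],  # partial overlap (cosine 0.5)
]

for index, score in rank_documents(query_vec, doc_vecs):
    print(index, round(score, 3))
```

Cosine similarity ignores vector length, so this ranking is the same whether or not the embeddings were normalized beforehand.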
|
|
### With Transformers
```python
from transformers import AutoModel, AutoTokenizer
import torch

model = AutoModel.from_pretrained('RikoteMaster/MNLP_M3_document_encoder')
tokenizer = AutoTokenizer.from_pretrained('RikoteMaster/MNLP_M3_document_encoder')

# Tokenize a batch; padding and truncation let texts of different lengths share one tensor
inputs = tokenizer(['Your text here'], return_tensors='pt', padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
# CLS pooling: keep the hidden state of the first token of each sequence
embeddings = outputs.last_hidden_state[:, 0]
# L2-normalize so that dot products between embeddings equal cosine similarities
embeddings = torch.nn.functional.normalize(embeddings, dim=1)
```
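The last two steps of the Transformers example are CLS pooling (keeping the first token's hidden state for each sequence) and L2 normalization. The sketch below reproduces both steps on plain nested Python lists so the operation is easy to inspect; the function name `cls_pool_and_normalize` and the toy hidden states (hidden size 2 instead of 384) are hypothetical, and the `eps` guard mirrors the default of `torch.nn.functional.normalize` (1e-12).

```python
import math

Vector = list[float]


def cls_pool_and_normalize(
    last_hidden_state: list[list[Vector]], eps: float = 1e-12
) -> list[Vector]:
    """Mimic `last_hidden_state[:, 0]` followed by L2 normalization along dim=1.

    last_hidden_state is indexed as [batch][token][hidden_dim].
    """
    pooled = []
    for sequence in last_hidden_state:
        cls_vector = sequence[0]  # first token position (the CLS token)
        norm = math.sqrt(sum(x * x for x in cls_vector))
        denom = max(norm, eps)  # clamp to avoid division by zero, as torch does
        pooled.append([x / denom for x in cls_vector])
    return pooled


# Hypothetical toy batch: 2 sequences, 3 tokens each, hidden size 2.
hidden = [
    [[3.0, 4.0], [9.0, 9.0], [1.0, 1.0]],
    [[0.0, 2.0], [5.0, 5.0], [7.0, 1.0]],
]

embeddings = cls_pool_and_normalize(hidden)
print(embeddings)  # [[0.6, 0.8], [0.0, 1.0]]
```

Only the first token of each sequence contributes; the remaining token states are ignored. Because every output vector has unit length, the dot product of two of them equals their cosine similarity (here 0.6 * 0.0 + 0.8 * 1.0 = 0.8).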
|
|
## Original Model
This model is based on [ibm-granite/granite-embedding-107m-multilingual](https://huggingface.co/ibm-granite/granite-embedding-107m-multilingual) by IBM.
|
|