---
license: apache-2.0
library_name: sentence-transformers
pipeline_tag: sentence-similarity
language:
- multilingual
tags:
- agentic-intelligence-lab
- elephant
- embeddings
- sentence-transformers
- sentence-similarity
- retrieval
- rag
- agents
- routing
- memory
- multilingual
- matryoshka
- long-context
- modernbert
base_model: llm-semantic-router/mmbert-32k-yarn
datasets:
- BAAI/bge-m3-data
model-index:
- name: elephant-embeddings-v1-text-small
  results:
  - task:
      type: STS
    dataset:
      name: STS Benchmark
      type: mteb/stsbenchmark-sts
    metrics:
    - name: Spearman
      type: spearman
      value: 80.5
---
# Elephant Embeddings V1 Text Small

`elephant-embeddings-v1-text-small` is the text embedding model in the **Agentic Intelligence Lab Elephant Embeddings V1** family.

This ModelScope release is maintained by `agentic-intelligence-lab` to make Elephant embedding models easier to download and deploy in mainland China. It mirrors the upstream Hugging Face model `llm-semantic-router/eggon-embed` and republishes it under the consistent Elephant model namespace.
## Positioning

This is a multilingual, long-context text embedding model for agent-native retrieval and semantic matching. It is designed for systems where embedding calls sit on the runtime hot path:

- agent memory recall
- knowledge retrieval and RAG
- tool, skill, and route matching
- long-horizon state search
- multilingual semantic indexing
- clustering and deduplication

The model combines a **32K-token context window**, a **ModernBERT encoder**, and **2D Matryoshka training** (usable at reduced embedding dimensions and at reduced encoder depth), so a single embedding space can serve several latency, storage, and quality budgets.
## Model at a glance

| Item | Value |
| --- | --- |
| Family | Elephant Embeddings V1 |
| Maintainer | Agentic Intelligence Lab |
| Model type | Text embedding model |
| Modalities | Text |
| Languages | Multilingual |
| Architecture | ModernBERT encoder with YaRN scaling |
| Parameters | ~307M |
| Hidden size | 768 |
| Layers | 22 |
| Context length | 32,768 tokens |
| Pooling | Mean pooling |
| Similarity | Cosine |
| Matryoshka dimensions | 768, 512, 256, 128, 64 |
| Upstream source | `llm-semantic-router/eggon-embed` |
| License | Apache 2.0 |
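The table lists mean pooling and cosine similarity. As an illustrative sketch of what those two steps compute, here is a small NumPy re-implementation. It assumes token embeddings of shape `(batch, seq_len, 768)` and a 0/1 attention mask; it is not the Sentence Transformers code path, and `mean_pool` / `cosine_similarity` are hypothetical helper names. Padding positions are excluded from the average, and vectors are then compared by angle.

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, ignoring padding positions.

    token_embeddings: (batch, seq_len, hidden), hidden = 768 for this model
    attention_mask:   (batch, seq_len), 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=1)  # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # (batch, 1), avoids divide-by-zero
    return summed / counts


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_n @ b_n.T


# Toy input: random "token embeddings", not real model output.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 768)).astype(np.float32)
mask = np.array([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])  # first sequence has 2 padding tokens
sentence_vecs = mean_pool(tokens, mask)
print(sentence_vecs.shape)  # (2, 768)
print(cosine_similarity(sentence_vecs, sentence_vecs).round(3))
```

Because the mask zeroes out padding before summing, the pooled vector for the first sequence depends only on its three real tokens.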
## Why it fits agentic workloads

Agentic systems call embedding models repeatedly: before retrieval, during routing, while matching tools, when searching memory, and when compressing or reranking state. This model is tuned for that high-frequency usage pattern rather than for a single offline benchmark.

Key advantages:

- **One semantic space across the stack**: routing, retrieval, memory lookup, and semantic matching can share a single vector space.
- **Budget-adaptive vectors**: truncate the full 768-dimensional vectors to 512d, 256d, 128d, or 64d (and re-normalize them) for smaller indexes and faster candidate generation.
- **Long-context representation**: encode longer notes, traces, tool descriptions, and document chunks (up to 32,768 tokens) before aggressive chunking becomes necessary.
- **Practical deployment size**: a ~307M-parameter encoder is easier to host than much larger embedding models when inference is frequent.
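To illustrate the shared-space idea for route or tool matching, here is a minimal sketch: route descriptions are embedded once, each incoming query is compared against them by cosine similarity, and the caller falls back when no route scores high enough. `match_route` and the 0.5 threshold are hypothetical and would need tuning on real traffic; random vectors stand in for `model.encode(...)` output.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def match_route(query_vec, route_vecs, route_names, threshold=0.5):
    """Return (best_route, score), or (None, score) if no route clears the threshold.

    query_vec:   (dim,) embedding of the incoming request
    route_vecs:  (n_routes, dim) embeddings of route or tool descriptions
    threshold:   hypothetical cutoff; tune it on your own labelled data
    """
    scores = l2_normalize(route_vecs) @ l2_normalize(query_vec)  # cosine per route
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return None, float(scores[best])
    return route_names[best], float(scores[best])


# Stand-in vectors; in practice both sides come from the same embedding model.
rng = np.random.default_rng(1)
routes = ["browser_automation", "memory_lookup", "deploy_logs"]
route_vecs = rng.normal(size=(3, 768))
query_vec = route_vecs[1] + 0.1 * rng.normal(size=768)  # constructed to be close to "memory_lookup"
print(match_route(query_vec, route_vecs, routes))
```

The same function works unchanged on truncated vectors (for example the first 64 or 128 dimensions), which is how a cheaper routing tier could reuse the retrieval embeddings.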
## Recommended use cases

| Scenario | Recommended dimension | Notes |
| --- | ---: | --- |
| Broad route matching | 64d or 128d | Cheap candidate generation over large route/tool sets |
| Large memory-bank search | 64d or 256d | Lower storage and bandwidth cost |
| Main RAG retrieval | 256d or 512d | Balanced quality and cost |
| High-confidence matching | 768d | Best semantic fidelity |
| Long-document indexing | 768d | Preserve richer context before chunking |
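One way to combine these tiers is a two-stage search: a cheap pass over truncated 64d vectors builds a shortlist, and only that shortlist is rescored at the full 768 dimensions. The sketch below assumes full-dimension embeddings are stored and truncated on the fly; `two_stage_search` and the shortlist size of 100 are hypothetical choices, and random vectors stand in for real embeddings, so it only demonstrates the mechanics.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def two_stage_search(query, corpus, coarse_dim=64, shortlist=100, top_k=10):
    """Coarse search on truncated vectors, then rerank the shortlist at full dimension.

    query:  (768,) full-dimension query embedding
    corpus: (n_docs, 768) full-dimension document embeddings
    """
    # Stage 1: keep the first `coarse_dim` values, re-normalize, score every document.
    coarse_scores = normalize(corpus[:, :coarse_dim]) @ normalize(query[:coarse_dim])
    shortlist = min(shortlist, len(corpus))
    candidates = np.argpartition(-coarse_scores, shortlist - 1)[:shortlist]

    # Stage 2: exact cosine at full dimension, computed only for the shortlist.
    fine_scores = normalize(corpus[candidates]) @ normalize(query)
    order = np.argsort(-fine_scores)[:top_k]
    return candidates[order], fine_scores[order]


# Stand-in data; real vectors would come from model.encode(..., normalize_embeddings=True).
rng = np.random.default_rng(2)
corpus = rng.normal(size=(10_000, 768)).astype(np.float32)
query = corpus[42] + 0.05 * rng.normal(size=768).astype(np.float32)  # near document 42
ids, scores = two_stage_search(query, corpus)
print(ids[0])  # 42
```

The full-dimension work is limited to the shortlist, which matches the guidance to reserve 768d vectors for final-stage decisions.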
## Quick start on ModelScope

```bash
pip install modelscope sentence-transformers torch
```
```python
from modelscope import snapshot_download
from sentence_transformers import SentenceTransformer

# Download the model files from ModelScope, then load them with Sentence Transformers.
repo_id = "agentic-intelligence-lab/elephant-embeddings-v1-text-small"
local_dir = snapshot_download(repo_id)
model = SentenceTransformer(local_dir)

texts = [
    "Find tool descriptions related to browser automation.",
    "检索和用户历史偏好相关的记忆。",  # "Retrieve memories related to the user's historical preferences."
    "Retrieve notes about deployment failures in staging.",
]

# Unit-length embeddings, so a dot product equals cosine similarity.
embeddings = model.encode(texts, normalize_embeddings=True)
print(embeddings.shape)  # (3, 768)
```
## Matryoshka truncation

```python
import torch.nn.functional as F
from modelscope import snapshot_download
from sentence_transformers import SentenceTransformer

local_dir = snapshot_download("agentic-intelligence-lab/elephant-embeddings-v1-text-small")
model = SentenceTransformer(local_dir)

texts = [
    "Find tool descriptions related to browser automation.",
    "Retrieve notes about deployment failures in staging.",
]
embeddings = model.encode(texts, convert_to_tensor=True, normalize_embeddings=True)

# Keep the leading dimensions, then re-normalize so cosine similarity stays well defined.
# Balanced retrieval tier
embeddings_256d = F.normalize(embeddings[:, :256], p=2, dim=1)
# Low-cost routing or large memory-bank tier
embeddings_64d = F.normalize(embeddings[:, :64], p=2, dim=1)
```
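To see what truncation saves, the raw storage of a flat float32 index is `n_vectors × dim × 4` bytes. The sketch below computes that for each Matryoshka dimension and includes a NumPy equivalent of the truncate-and-renormalize step above. It ignores index overhead (graph links, IDs, metadata), so real indexes will be larger; `index_size_bytes` and `truncate` are hypothetical helper names.

```python
import numpy as np

MATRYOSHKA_DIMS = [768, 512, 256, 128, 64]


def index_size_bytes(n_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw vector storage for a flat index; float32 uses 4 bytes per value."""
    return n_vectors * dim * bytes_per_value


def truncate(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` values of each row, then rescale each row to unit length."""
    head = embeddings[:, :dim]
    return head / np.linalg.norm(head, axis=1, keepdims=True)


for dim in MATRYOSHKA_DIMS:
    gib = index_size_bytes(1_000_000, dim) / 2**30
    print(f"{dim:>4}d: {gib:.2f} GiB per 1M float32 vectors")
```

For one million vectors this gives about 2.86 GiB at 768d and about 0.24 GiB at 64d, a 12× reduction in raw storage and in the bytes read per comparison.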
## Evaluation snapshot

| Metric | Score |
| --- | ---: |
| MTEB mean (24 tasks) | 61.4 |
| STS Benchmark (Spearman) | 80.5 |
| Quality retained vs. 768d | 99% @ 256d, 98% @ 64d |
| Speedup from layer truncation | 3.3× @ 6 layers, 5.8× @ 3 layers |
| Long-context retrieval R@1 (4K tokens) | 68.8% |
| Long-context retrieval R@10 (4K tokens) | 81.2% |

These results make the model a practical fit for systems that must balance quality, latency, vector size, and deployment simplicity.
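R@1 and R@10 are recall-at-k figures. The card does not spell out its exact evaluation protocol, so the sketch below assumes the common single-relevant-document definition: a query counts as a hit if its gold document appears among its top-k results. `recall_at_k` is a hypothetical helper, and the score matrix is a hand-made toy example.

```python
import numpy as np


def recall_at_k(scores: np.ndarray, gold: np.ndarray, k: int) -> float:
    """Fraction of queries whose gold document appears in the top-k results.

    scores: (n_queries, n_docs) similarity matrix, higher means more relevant
    gold:   (n_queries,) index of the single relevant document for each query
    """
    top_k = np.argsort(-scores, axis=1)[:, :k]  # best-scoring doc indices per query
    hits = (top_k == gold[:, None]).any(axis=1)
    return float(hits.mean())


# Toy similarity matrix: 3 queries over 4 documents.
scores = np.array([
    [0.9, 0.1, 0.3, 0.2],  # gold doc 0 is ranked 1st
    [0.2, 0.4, 0.8, 0.1],  # gold doc 1 is ranked 2nd
    [0.5, 0.6, 0.7, 0.1],  # gold doc 3 is ranked last
])
gold = np.array([0, 1, 3])
print(recall_at_k(scores, gold, 1))  # 0.333...
print(recall_at_k(scores, gold, 2))  # 0.666...
```

With real embeddings, `scores` would be the cosine matrix between query and document vectors.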
## Files

| File | Description |
| --- | --- |
| `model.safetensors` | Model weights |
| `config.json` | ModernBERT configuration |
| `tokenizer.json` / `tokenizer_config.json` | Tokenizer assets |
| `modules.json` / `1_Pooling/config.json` | Sentence Transformers packaging |
| `README.md` | This model card |
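To check the Sentence Transformers packaging in a downloaded snapshot, a small sketch can read `modules.json` (the ordered list of pipeline stages) and `1_Pooling/config.json` (the pooling settings). It assumes the standard Sentence Transformers key names `type`, `word_embedding_dimension`, and `pooling_mode_mean_tokens`; `inspect_st_package` is a hypothetical helper, and missing keys are reported as `None` or `False` rather than raising.

```python
import json
from pathlib import Path


def inspect_st_package(model_dir: str) -> dict:
    """Summarize the Sentence Transformers packaging files in a model directory."""
    root = Path(model_dir)
    # modules.json: ordered pipeline stages, e.g. a Transformer module followed by Pooling.
    modules = json.loads((root / "modules.json").read_text(encoding="utf-8"))
    # 1_Pooling/config.json: settings for the pooling stage.
    pooling = json.loads((root / "1_Pooling" / "config.json").read_text(encoding="utf-8"))
    return {
        "stages": [m.get("type") for m in modules],
        "dimension": pooling.get("word_embedding_dimension"),
        "mean_pooling": bool(pooling.get("pooling_mode_mean_tokens", False)),
    }
```

Called on the `local_dir` returned by `snapshot_download`, it should report a 768 dimension with mean pooling enabled if the files match the table above.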
## Lineage

This ModelScope package is published by `agentic-intelligence-lab` as part of the Elephant model release line. It mirrors the upstream Hugging Face model `llm-semantic-router/eggon-embed`; the model artifacts are unchanged, and only the repository name and model card presentation differ.
## Limitations

- Full 768-dimensional embeddings are recommended for important final-stage retrieval decisions.
- Aggressive dimension or layer reduction trades quality for speed and storage efficiency.
- Very long inputs are supported, but they still increase compute and memory cost.
- The model is optimized for retrieval and semantic similarity, not text generation.
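When a document exceeds the 32,768-token context, or when full-length inputs are too costly, it has to be split. A minimal sketch, assuming you already have token IDs from the model's tokenizer: overlapping windows keep some shared context across boundaries. `sliding_windows`, the 32,000-token default (a hypothetical margin left for special tokens the tokenizer adds), and the 256-token overlap are illustrative choices, not settings from this model card.

```python
def sliding_windows(token_ids: list[int], max_tokens: int = 32_000, overlap: int = 256) -> list[list[int]]:
    """Split a token sequence into overlapping windows of at most `max_tokens`.

    Each window starts `max_tokens - overlap` tokens after the previous one,
    so consecutive windows share `overlap` tokens and the last window reaches
    the end of the sequence.
    """
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    step = max_tokens - overlap
    windows = []
    for start in range(0, max(len(token_ids) - overlap, 1), step):
        windows.append(token_ids[start:start + max_tokens])
    return windows


# Small numbers for readability: 100 tokens, windows of 40, overlap of 10.
chunks = sliding_windows(list(range(100)), max_tokens=40, overlap=10)
print([(c[0], c[-1]) for c in chunks])  # [(0, 39), (30, 69), (60, 99)]
```

Smaller windows lower the per-call compute and memory cost at the price of more calls and less context per vector; each window can then be embedded and indexed separately.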
## Citation

```bibtex
@misc{elephant-embeddings-v1-text-small,
  title={Elephant Embeddings V1 Text Small},
  author={Agentic Intelligence Lab},
  year={2026},
  url={https://modelscope.cn/models/agentic-intelligence-lab/elephant-embeddings-v1-text-small}
}
```
## License

Apache 2.0