---
license: apache-2.0
library_name: sentence-transformers
pipeline_tag: sentence-similarity
language:
- multilingual
tags:
- agentic-intelligence-lab
- elephant
- embeddings
- sentence-transformers
- sentence-similarity
- retrieval
- rag
- agents
- routing
- memory
- multilingual
- matryoshka
- long-context
- modernbert
base_model: llm-semantic-router/mmbert-32k-yarn
datasets:
- BAAI/bge-m3-data
model-index:
- name: elephant-embeddings-v1-text-small
  results:
  - task:
      type: STS
    dataset:
      name: STS Benchmark
      type: mteb/stsbenchmark-sts
    metrics:
    - name: Spearman
      type: spearman
      value: 80.5
---
# Elephant Embeddings V1 Text Small
`elephant-embeddings-v1-text-small` is the text embedding model in the **Agentic Intelligence Lab Elephant Embeddings V1** family.
This ModelScope release is maintained by `agentic-intelligence-lab` to make Elephant embedding models easier to download and deploy in mainland China. It mirrors the upstream Hugging Face model `llm-semantic-router/eggon-embed` and republishes it under a consistent Elephant model namespace.
## Positioning
This model is a multilingual long-context text embedding model for agent-native retrieval and semantic matching. It is designed for systems where embeddings are on the runtime hot path:
- agent memory recall
- knowledge retrieval and RAG
- tool, skill, and route matching
- long-horizon state search
- multilingual semantic indexing
- clustering and deduplication
The model combines **32K context**, a **ModernBERT encoder architecture**, and **2D Matryoshka training** (embeddings can be reduced along both the vector dimension and the encoder depth), so one embedding space can serve multiple latency, storage, and quality budgets.
## Model at a glance
| Item | Value |
| --- | --- |
| Family | Elephant Embeddings V1 |
| Maintainer | Agentic Intelligence Lab |
| Model type | Text embedding model |
| Modalities | Text |
| Languages | Multilingual |
| Architecture | ModernBERT encoder with YaRN scaling |
| Parameters | ~307M |
| Hidden size | 768 |
| Layers | 22 |
| Context length | 32,768 tokens |
| Pooling | Mean pooling |
| Similarity | Cosine |
| Matryoshka dimensions | 768, 512, 256, 128, 64 |
| Upstream source | `llm-semantic-router/eggon-embed` |
| License | Apache 2.0 |
## Why it fits agentic workloads
Agentic systems call embedding models repeatedly: before retrieval, during routing, while matching tools, when searching memory, and when compressing or reranking state. This model is optimized for that operating pattern rather than for a single offline benchmark.
Key advantages:
- **One semantic space across the stack**: routing, retrieval, memory lookup, and semantic matching can share one vector space.
- **Budget-adaptive vectors**: truncate full 768-dimensional vectors to 256d, 128d, or 64d for cheaper indexes and faster candidate generation.
- **Long-context representation**: encode longer notes, traces, tool descriptions, and document chunks in a single pass, so aggressive chunking is needed less often.
- **Practical deployment size**: a 307M-class encoder is easier to host than much larger embedding models when inference is frequent.
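The storage side of the budget-adaptive point can be made concrete with simple arithmetic. The sketch below assumes vectors are stored as raw float32 in a flat index with no compression or index overhead; the corpus size of one million vectors and the helper name `index_size_bytes` are illustrative choices, not part of the model release.

```python
# Raw storage for a flat float32 index (assumption: no compression, no index overhead).
BYTES_PER_FLOAT32 = 4
MATRYOSHKA_DIMS = (768, 512, 256, 128, 64)  # dimensions listed in this model card


def index_size_bytes(num_vectors: int, dim: int) -> int:
    """Bytes needed to store num_vectors float32 vectors of width dim."""
    return num_vectors * dim * BYTES_PER_FLOAT32


num_vectors = 1_000_000  # hypothetical corpus size
for dim in MATRYOSHKA_DIMS:
    size_gb = index_size_bytes(num_vectors, dim) / 1e9
    print(f"{dim:>4}d: {size_gb:.3f} GB")
```

Under these assumptions, storage scales linearly with dimension, so a 64d index is one twelfth the size of a 768d index over the same corpus.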
## Recommended use cases
| Scenario | Recommended dimension | Notes |
| --- | ---: | --- |
| Broad route matching | 64d or 128d | Cheap candidate generation over large route/tool sets |
| Large memory-bank search | 64d or 256d | Lower storage and bandwidth cost |
| Main RAG retrieval | 256d or 512d | Balanced quality and cost |
| High-confidence matching | 768d | Best semantic fidelity |
| Long-document indexing | 768d | Preserve richer context before chunking |
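One way to combine these tiers is a two-stage search: shortlist candidates with truncated vectors, then rescore only the shortlist at full width. The following is a minimal sketch of that idea, not a method prescribed by the model authors. It uses random stand-in vectors instead of real model output, and the function name `two_stage_search` and its defaults (`coarse_dim=64`, `shortlist=50`) are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def two_stage_search(query, corpus, coarse_dim=64, shortlist=50, top_k=5):
    """Shortlist with truncated vectors, then rerank the shortlist at full width.

    query: (768,) unit vector; corpus: (N, 768) array of unit vectors.
    """
    # Stage 1: cheap candidate generation on the first coarse_dim dimensions.
    # A real system would precompute and index the truncated corpus once.
    q_small = l2_normalize(query[None, :coarse_dim])[0]
    c_small = l2_normalize(corpus[:, :coarse_dim])
    coarse_scores = c_small @ q_small
    candidates = np.argsort(-coarse_scores)[:shortlist]

    # Stage 2: rescore only the shortlisted rows with the full vectors.
    fine_scores = corpus[candidates] @ query
    order = np.argsort(-fine_scores)[:top_k]
    return candidates[order], fine_scores[order]


# Random stand-ins; real usage would pass model.encode(..., normalize_embeddings=True).
rng = np.random.default_rng(0)
corpus = l2_normalize(rng.standard_normal((1000, 768)))
query = corpus[42]  # identical to document 42, so that document should rank first
ids, scores = two_stage_search(query, corpus)
print(ids, scores)
```

The shortlist size controls the trade-off: a larger shortlist recovers more of the full-width ranking at the cost of more full-width dot products.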
## Quick start on ModelScope
```bash
pip install modelscope sentence-transformers torch
```
```python
from modelscope import snapshot_download
from sentence_transformers import SentenceTransformer

repo_id = "agentic-intelligence-lab/elephant-embeddings-v1-text-small"
local_dir = snapshot_download(repo_id)
model = SentenceTransformer(local_dir)

texts = [
    "Find tool descriptions related to browser automation.",
    "检索和用户历史偏好相关的记忆。",
    "Retrieve notes about deployment failures in staging.",
]
embeddings = model.encode(texts, normalize_embeddings=True)
print(embeddings.shape)  # (3, 768)
```
## Matryoshka truncation
```python
import torch.nn.functional as F
from modelscope import snapshot_download
from sentence_transformers import SentenceTransformer

local_dir = snapshot_download("agentic-intelligence-lab/elephant-embeddings-v1-text-small")
model = SentenceTransformer(local_dir)
texts = [  # example inputs, same as in the quick start
    "Find tool descriptions related to browser automation.",
    "检索和用户历史偏好相关的记忆。",
    "Retrieve notes about deployment failures in staging.",
]
embeddings = model.encode(texts, convert_to_tensor=True, normalize_embeddings=True)

# Balanced retrieval tier: keep the first 256 dimensions, then re-normalize
embeddings_256d = F.normalize(embeddings[:, :256], p=2, dim=1)
# Low-cost routing or large memory-bank tier
embeddings_64d = F.normalize(embeddings[:, :64], p=2, dim=1)
```
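The `F.normalize` call after slicing matters because a prefix of a unit vector is generally shorter than unit length. The NumPy sketch below illustrates this with random stand-in vectors (not model output): without renormalization, a dot product between truncated vectors is no longer a cosine similarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for two full-width, unit-length embeddings (random, not model output).
full = rng.standard_normal((2, 768))
full /= np.linalg.norm(full, axis=1, keepdims=True)

# Keeping only the first 256 dimensions shrinks each norm below 1.
truncated = full[:, :256]
truncated_norms = np.linalg.norm(truncated, axis=1)
print(truncated_norms)

# Renormalizing restores unit length, so a dot product is a cosine again.
renormalized = truncated / truncated_norms[:, None]
cosine = float(renormalized[0] @ renormalized[1])
raw_dot = float(truncated[0] @ truncated[1])
print(cosine, raw_dot)
```

If the index uses inner-product search, skipping the renormalization step would scale every score by the product of the two truncated norms, which differs from vector to vector.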
## Evaluation snapshot
| Metric | Score |
| --- | ---: |
| MTEB mean, 24 tasks | 61.4 |
| STS Benchmark | 80.5 |
| Dimension retention | 99% @ 256d, 98% @ 64d |
| Layer speedup | 3.3× @ 6L, 5.8× @ 3L |
| Long-context retrieval R@1, 4K tokens | 68.8% |
| Long-context retrieval R@10, 4K tokens | 81.2% |
These results make the model useful for systems that must balance quality, latency, vector size, and deployment simplicity.
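For readers unfamiliar with the R@1 and R@10 rows, the sketch below shows the standard definition of recall@k under the assumption of exactly one relevant document per query. The evaluation protocol behind the table is not described in this card, so this is a generic illustration on toy data, not a reproduction of those numbers.

```python
def recall_at_k(ranked_ids, relevant_id, k):
    """Return 1.0 if the relevant document appears in the top k results, else 0.0."""
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0


def mean_recall_at_k(rankings, relevant_ids, k):
    """Average recall@k over queries (assumes one relevant document per query)."""
    hits = [recall_at_k(r, rel, k) for r, rel in zip(rankings, relevant_ids)]
    return sum(hits) / len(hits)


# Toy data: each inner list is a ranked list of document ids for one query.
rankings = [[3, 1, 2], [0, 2, 1], [2, 0, 1], [1, 2, 0]]
relevant_ids = [3, 2, 1, 0]

print(mean_recall_at_k(rankings, relevant_ids, k=1))  # 0.25
print(mean_recall_at_k(rankings, relevant_ids, k=3))  # 1.0
```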
## Files
| File | Description |
| --- | --- |
| `model.safetensors` | Model weights |
| `config.json` | ModernBERT configuration |
| `tokenizer.json` / `tokenizer_config.json` | Tokenizer assets |
| `modules.json` / `1_Pooling/config.json` | Sentence Transformers packaging |
| `README.md` | This model card |
## Lineage
This ModelScope package is published by `agentic-intelligence-lab` as part of the Elephant model release line. It mirrors the upstream Hugging Face model `llm-semantic-router/eggon-embed`; the model artifacts are unchanged, and only the repository name and the model card presentation differ.
## Limitations
- Full 768-dimensional embeddings are recommended for important final-stage retrieval decisions.
- Aggressive dimension or layer reduction trades quality for speed and storage efficiency.
- Very long inputs are supported, but they still increase compute and memory cost.
- The model is optimized for retrieval and semantic similarity, not text generation.
## Citation
```bibtex
@misc{elephant-embeddings-v1-text-small,
title={Elephant Embeddings V1 Text Small},
author={Agentic Intelligence Lab},
year={2026},
url={https://modelscope.cn/models/agentic-intelligence-lab/elephant-embeddings-v1-text-small}
}
```
## License
Apache 2.0