---
license: apache-2.0
library_name: sentence-transformers
pipeline_tag: sentence-similarity
language:
  - multilingual
tags:
  - agentic-intelligence-lab
  - elephant
  - embeddings
  - sentence-transformers
  - sentence-similarity
  - retrieval
  - rag
  - agents
  - routing
  - memory
  - multilingual
  - matryoshka
  - long-context
  - modernbert
base_model: llm-semantic-router/mmbert-32k-yarn
datasets:
  - BAAI/bge-m3-data
model-index:
  - name: elephant-embeddings-v1-text-small
    results:
      - task:
          type: STS
        dataset:
          name: STS Benchmark
          type: mteb/stsbenchmark-sts
        metrics:
          - name: Spearman
            type: spearman
            value: 80.5
---

# Elephant Embeddings V1 Text Small

`elephant-embeddings-v1-text-small` is the text embedding model in the **Agentic Intelligence Lab Elephant Embeddings V1** family.

This ModelScope release is maintained by `agentic-intelligence-lab` to make Elephant embedding models easier to download and deploy in mainland China. It mirrors and renames the upstream Hugging Face model `llm-semantic-router/eggon-embed` under a consistent Elephant model namespace.

## Positioning

This model is a multilingual long-context text embedding model for agent-native retrieval and semantic matching. It is designed for systems where embeddings are on the runtime hot path:

- agent memory recall
- knowledge retrieval and RAG
- tool, skill, and route matching
- long-horizon state search
- multilingual semantic indexing
- clustering and deduplication

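The model combines **32K context**, a **ModernBERT encoder architecture**, and **2D Matryoshka training** (embeddings stay usable when truncated along both the vector dimension and the number of encoder layers), so one embedding space can serve multiple latency, storage, and quality budgets.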

## Model at a glance

| Item | Value |
| --- | --- |
| Family | Elephant Embeddings V1 |
| Maintainer | Agentic Intelligence Lab |
| Model type | Text embedding model |
| Modalities | Text |
| Languages | Multilingual |
| Architecture | ModernBERT encoder with YaRN scaling |
| Parameters | ~307M |
| Hidden size | 768 |
| Layers | 22 |
| Context length | 32,768 tokens |
| Pooling | Mean pooling |
| Similarity | Cosine |
| Matryoshka dimensions | 768, 512, 256, 128, 64 |
| Upstream source | `llm-semantic-router/eggon-embed` |
| License | Apache 2.0 |
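The pooling and similarity rows above name two small operations. The following NumPy sketch shows what they compute on toy arrays: mean pooling averages token vectors while ignoring padding positions, and cosine similarity compares the pooled vectors. It illustrates the general technique only; the model performs pooling through its Sentence Transformers packaging, and the helper names `mean_pool` and `cosine_similarity` here are hypothetical.

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per sequence, skipping padding positions.

    token_embeddings: (batch, seq_len, hidden); the real model uses hidden = 768
    attention_mask:   (batch, seq_len); 1 for real tokens, 0 for padding
    """
    mask = attention_mask[..., np.newaxis].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)   # (batch, hidden)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # (batch, 1), avoids divide-by-zero
    return summed / counts

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_unit @ b_unit.T

# Toy batch: 2 sequences, 3 positions, hidden size 4.
tokens = np.array([
    [[1.0, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]],  # last position is padding
    [[0.0, 2.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])

pooled = mean_pool(tokens, mask)  # rows: [2, 0, 0, 0] and [0, 2, 0, 0]
print(cosine_similarity(pooled, pooled))
```

The padding row of nines has no effect on the first pooled vector because its mask entry is zero.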

## Why it fits agentic workloads

Agentic systems call embedding models repeatedly: before retrieval, during routing, while matching tools, when searching memory, and when compressing or reranking state. This model is optimized for that operating pattern rather than for a single offline benchmark.

Key advantages:

- **One semantic space across the stack**: routing, retrieval, memory lookup, and semantic matching can share one vector space.
- **Budget-adaptive vectors**: truncate the full 768-dimensional vectors to 512d, 256d, 128d, or 64d (then re-normalize) for cheaper indexes and faster candidate generation.
- **Long-context representation**: encode larger notes, traces, tool descriptions, and document chunks before aggressive chunking is required.
- **Practical deployment size**: a 307M-class encoder is easier to host than much larger embedding models when inference is frequent.
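The storage benefit of budget-adaptive vectors is simple arithmetic: an uncompressed float32 vector costs `dim × 4` bytes. The sketch below estimates raw vector storage for each Matryoshka dimension. The 10-million-vector memory bank is a hypothetical size, and real indexes add overhead (IDs, graph links, metadata) that this estimate ignores.

```python
# Raw float32 vector storage per Matryoshka dimension.
# Ignores index overhead such as IDs, graph links, and metadata.
BYTES_PER_FLOAT32 = 4

def index_size_bytes(num_vectors: int, dim: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Bytes needed to store num_vectors dense vectors of length dim."""
    return num_vectors * dim * bytes_per_value

num_vectors = 10_000_000  # hypothetical memory-bank size
for dim in (768, 512, 256, 128, 64):
    gib = index_size_bytes(num_vectors, dim) / 2**30
    print(f"{dim:>4}d: {gib:6.2f} GiB")
```

At 64d each vector takes 256 bytes versus 3,072 bytes at 768d, a 12× reduction before any quantization.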

## Recommended use cases

| Scenario | Recommended dimension | Notes |
| --- | ---: | --- |
| Broad route matching | 64d or 128d | Cheap candidate generation over large route/tool sets |
| Large memory-bank search | 64d or 256d | Lower storage and bandwidth cost |
| Main RAG retrieval | 256d or 512d | Balanced quality and cost |
| High-confidence matching | 768d | Best semantic fidelity |
| Long-document indexing | 768d | Preserve richer context before chunking |
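The dimension tiers in the table can be combined in one query path: a cheap low-dimensional pass shortlists candidates, and the full 768d vectors re-score only that shortlist. The NumPy sketch below assumes embeddings have already been produced by `model.encode` and uses random vectors as stand-ins. `two_stage_search` is a hypothetical helper, and `coarse_dim=64`, `num_candidates=100`, and `top_k=5` are illustrative defaults, not tuned recommendations.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def two_stage_search(query: np.ndarray, corpus: np.ndarray, coarse_dim: int = 64,
                     num_candidates: int = 100, top_k: int = 5) -> np.ndarray:
    """Return indices of the top_k corpus rows for one query.

    query:  (768,) full-dimension embedding
    corpus: (N, 768) full-dimension embeddings
    """
    # Stage 1: score every row on the re-normalized low-dimensional prefix.
    coarse_scores = l2_normalize(corpus[:, :coarse_dim]) @ l2_normalize(query[:coarse_dim])
    num_candidates = min(num_candidates, len(corpus))
    candidates = np.argpartition(-coarse_scores, num_candidates - 1)[:num_candidates]

    # Stage 2: re-score only the shortlist with all dimensions.
    fine_scores = l2_normalize(corpus[candidates]) @ l2_normalize(query)
    order = np.argsort(-fine_scores)[:top_k]
    return candidates[order]

# Random stand-ins for model.encode(...) output; replace with real embeddings.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(1_000, 768))
query = corpus[42] + 0.05 * rng.normal(size=768)  # near-duplicate of row 42

print(two_stage_search(query, corpus))
```

In this toy version the coarse pass still slices the full array; in a real deployment the 64d prefixes would live in their own, smaller index, which is where the storage and bandwidth savings come from. The final ranking still uses full 768d vectors.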

## Quick start on ModelScope

```bash
pip install modelscope sentence-transformers torch
```

```python
from modelscope import snapshot_download
from sentence_transformers import SentenceTransformer

repo_id = "agentic-intelligence-lab/elephant-embeddings-v1-text-small"
local_dir = snapshot_download(repo_id)

model = SentenceTransformer(local_dir)

texts = [
    "Find tool descriptions related to browser automation.",
    "检索和用户历史偏好相关的记忆。",  # "Retrieve memories related to the user's historical preferences."
    "Retrieve notes about deployment failures in staging.",
]

embeddings = model.encode(texts, normalize_embeddings=True)
print(embeddings.shape)  # (3, 768)
```

## Matryoshka truncation

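```python
import torch.nn.functional as F
from modelscope import snapshot_download
from sentence_transformers import SentenceTransformer

local_dir = snapshot_download("agentic-intelligence-lab/elephant-embeddings-v1-text-small")
model = SentenceTransformer(local_dir)

texts = [
    "Find tool descriptions related to browser automation.",
    "Retrieve notes about deployment failures in staging.",
]

# Full 768-dimensional embeddings as a torch tensor
embeddings = model.encode(texts, convert_to_tensor=True, normalize_embeddings=True)

# Balanced retrieval tier: keep the first 256 dimensions, then re-normalize,
# because a truncated prefix is no longer unit length
embeddings_256d = F.normalize(embeddings[:, :256], p=2, dim=1)

# Low-cost routing or large memory-bank tier: keep the first 64 dimensions
embeddings_64d = F.normalize(embeddings[:, :64], p=2, dim=1)
```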

## Evaluation snapshot

| Metric | Score |
| --- | ---: |
| MTEB mean, 24 tasks | 61.4 |
| STS Benchmark | 80.5 |
| Quality retained vs. full 768d | 99% @ 256d, 98% @ 64d |
| Speedup with fewer encoder layers | 3.3× @ 6 layers, 5.8× @ 3 layers |
| Long-context retrieval R@1, 4K tokens | 68.8% |
| Long-context retrieval R@10, 4K tokens | 81.2% |

These results make the model useful for systems that must balance quality, latency, vector size, and deployment simplicity.
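The long-context rows report recall at 1 and at 10. The card does not spell out the evaluation protocol, so the sketch below uses a common definition under an assumption of one relevant document per query: R@k is the fraction of queries whose relevant document appears among the top k retrieved results. It is not necessarily the exact harness behind the numbers above, and `recall_at_k` is a hypothetical helper.

```python
from typing import Sequence

def recall_at_k(ranked_ids: Sequence[Sequence[int]], relevant_ids: Sequence[int], k: int) -> float:
    """Fraction of queries whose single relevant item appears in the top k results.

    ranked_ids[i]   is the ranking returned for query i, best first.
    relevant_ids[i] is the one gold item for query i.
    """
    if len(ranked_ids) != len(relevant_ids):
        raise ValueError("need exactly one ranking per query")
    hits = sum(1 for ranking, gold in zip(ranked_ids, relevant_ids) if gold in ranking[:k])
    return hits / len(relevant_ids)

rankings = [[3, 1, 2], [0, 2, 1], [2, 0, 1], [1, 0, 2]]
gold = [3, 2, 1, 0]
print(recall_at_k(rankings, gold, k=1))  # 0.25
print(recall_at_k(rankings, gold, k=2))  # 0.75
```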

## Files

| File | Description |
| --- | --- |
| `model.safetensors` | Model weights |
| `config.json` | ModernBERT configuration |
| `tokenizer.json` / `tokenizer_config.json` | Tokenizer assets |
| `modules.json` / `1_Pooling/config.json` | Sentence Transformers packaging |
| `README.md` | This model card |

## Lineage

This ModelScope package is published by `agentic-intelligence-lab` as part of the Elephant model release line. It mirrors the upstream Hugging Face model `llm-semantic-router/eggon-embed` and keeps the model artifacts unchanged except for the repository naming and model card presentation.

## Limitations

- Full 768-dimensional embeddings are recommended for important final-stage retrieval decisions.
- Aggressive dimension or layer reduction trades quality for speed and storage efficiency.
- Very long inputs are supported, but they still increase compute and memory cost.
- The model is optimized for retrieval and semantic similarity, not text generation.
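When a document exceeds the 32,768-token context, or when encoding full-length inputs is too costly, a common workaround is to split the token sequence into overlapping windows and embed each window separately. The sketch below is a plain-Python version of that approach. `sliding_window_chunks` is a hypothetical helper, the 512-token default overlap is an illustrative choice, and in practice `max_tokens` should leave room for any special tokens the tokenizer adds.

```python
from typing import List

def sliding_window_chunks(token_ids: List[int], max_tokens: int = 32_768,
                          overlap: int = 512) -> List[List[int]]:
    """Split a token sequence into overlapping windows of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    if len(token_ids) <= max_tokens:
        return [token_ids]  # short enough to encode in one pass

    step = max_tokens - overlap  # each window starts `step` tokens after the previous one
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_tokens])
        if start + max_tokens >= len(token_ids):
            break  # this window already reaches the end
    return chunks

print(sliding_window_chunks(list(range(10)), max_tokens=4, overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

With a loaded `SentenceTransformer`, `token_ids` could come from `model.tokenizer(text, add_special_tokens=False)["input_ids"]`, and each window could be turned back into text with `model.tokenizer.decode(...)` before encoding.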

## Citation

```bibtex
@misc{elephant-embeddings-v1-text-small,
  title={Elephant Embeddings V1 Text Small},
  author={Agentic Intelligence Lab},
  year={2026},
  url={https://modelscope.cn/models/agentic-intelligence-lab/elephant-embeddings-v1-text-small}
}
```

## License

Apache 2.0