---
license: apache-2.0
library_name: sentence-transformers
pipeline_tag: sentence-similarity
language:
  - multilingual
tags:
  - agentic-intelligence-lab
  - elephant
  - embeddings
  - sentence-transformers
  - sentence-similarity
  - retrieval
  - rag
  - agents
  - routing
  - memory
  - multilingual
  - matryoshka
  - long-context
  - modernbert
base_model: llm-semantic-router/mmbert-32k-yarn
datasets:
  - BAAI/bge-m3-data
model-index:
  - name: elephant-embeddings-v1-text-small
    results:
      - task:
          type: STS
        dataset:
          name: STS Benchmark
          type: mteb/stsbenchmark-sts
        metrics:
          - name: Spearman
            type: spearman
            value: 80.5
---

# Elephant Embeddings V1 Text Small

`elephant-embeddings-v1-text-small` is the text embedding model in the **Agentic Intelligence Lab Elephant Embeddings V1** family.

This ModelScope release is maintained by `agentic-intelligence-lab` to make Elephant embedding models easier to download and deploy in mainland China. It mirrors the upstream Hugging Face model `llm-semantic-router/eggon-embed` and republishes it under the Elephant model namespace.

## Positioning

This model is a multilingual long-context text embedding model for agent-native retrieval and semantic matching. It is designed for systems where embeddings are on the runtime hot path:

- agent memory recall
- knowledge retrieval and RAG
- tool, skill, and route matching
- long-horizon state search
- multilingual semantic indexing
- clustering and deduplication
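Tool, skill, and route matching in this list usually reduces to a nearest-neighbor lookup over precomputed route embeddings. The sketch below is illustrative only: it assumes every vector was produced with `normalize_embeddings=True` (so a dot product equals cosine similarity), and the helper `match_route`, the route names, the toy 3-d vectors, and the 0.5 threshold are all hypothetical, not part of this model's API.

```python
import numpy as np


def match_route(query_vec, route_vecs, route_names, threshold=0.5):
    """Return (best_route, score), or (None, score) if nothing clears the threshold.

    Assumes all vectors are L2-normalized, so a dot product equals cosine similarity.
    """
    scores = route_vecs @ query_vec          # one cosine score per route
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return None, float(scores[best])     # no route is close enough; caller falls back
    return route_names[best], float(scores[best])


# Toy 3-d unit vectors standing in for real 768-d embeddings (hypothetical values).
route_names = ["browser_automation", "memory_recall", "deployment_logs"]
route_vecs = np.eye(3, dtype=np.float32)
query = np.array([0.9, 0.1, 0.0], dtype=np.float32)
query /= np.linalg.norm(query)

print(match_route(query, route_vecs, route_names))
```

Returning `None` below the threshold lets an agent fall back to a default handler instead of forcing a weak match.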

The model combines **32K-token context**, a **ModernBERT encoder architecture**, and **2D Matryoshka training** (usable at reduced embedding dimension and at reduced encoder depth), so one embedding space can serve multiple latency, storage, and quality budgets.

## Model at a glance

| Item | Value |
| --- | --- |
| Family | Elephant Embeddings V1 |
| Maintainer | Agentic Intelligence Lab |
| Model type | Text embedding model |
| Modalities | Text |
| Languages | Multilingual |
| Architecture | ModernBERT encoder with YaRN scaling |
| Parameters | ~307M |
| Hidden size | 768 |
| Layers | 22 |
| Context length | 32,768 tokens |
| Pooling | Mean pooling |
| Similarity | Cosine |
| Matryoshka dimensions | 768, 512, 256, 128, 64 |
| Upstream source | `llm-semantic-router/eggon-embed` |
| License | Apache 2.0 |

## Why it fits agentic workloads

Agentic systems call embedding models repeatedly: before retrieval, during routing, while matching tools, when searching memory, and when compressing or reranking state. This model is optimized for that operating pattern rather than for a single offline benchmark.

Key advantages:

- **One semantic space across the stack**: routing, retrieval, memory lookup, and semantic matching can share one vector space.
- **Budget-adaptive vectors**: truncate the full 768-dimensional vectors to 512d, 256d, 128d, or 64d for cheaper indexes and faster candidate generation.
- **Long-context representation**: encode larger notes, traces, tool descriptions, and document chunks before aggressive chunking is required.
- **Practical deployment size**: a ~307M-parameter encoder is easier to host than much larger embedding models when it is called frequently.
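The storage side of budget-adaptive vectors is easy to quantify, because raw float32 storage scales linearly with dimension. A back-of-the-envelope sketch, assuming a flat float32 index and a hypothetical 10M-vector memory bank (real vector databases add index overhead and may quantize vectors):

```python
def index_size_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw vector storage for a flat index (float32 = 4 bytes per value), excluding index overhead."""
    return num_vectors * dim * bytes_per_value


num_vectors = 10_000_000  # hypothetical memory-bank size
for dim in (768, 512, 256, 128, 64):
    gib = index_size_bytes(num_vectors, dim) / 2**30
    print(f"{dim:>4}d: {gib:6.2f} GiB")
```

Moving from 768d to 64d cuts raw storage (and the bytes scanned per brute-force comparison) by 12×.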

## Recommended use cases

| Scenario | Recommended dimension | Notes |
| --- | ---: | --- |
| Broad route matching | 64d or 128d | Cheap candidate generation over large route/tool sets |
| Large memory-bank search | 64d or 256d | Lower storage and bandwidth cost |
| Main RAG retrieval | 256d or 512d | Balanced quality and cost |
| High-confidence matching | 768d | Best semantic fidelity |
| Long-document indexing | 768d | Preserve richer context before chunking |
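The tiers in this table can be combined in one pipeline: a cheap low-dimensional pass builds a shortlist, then the full 768d vectors re-score it. This NumPy sketch assumes document and query embeddings were encoded once at 768d and that their leading dimensions remain meaningful on their own (the point of Matryoshka training). `cascade_search`, the 64d / 100-candidate settings, and the random stand-in vectors are illustrative assumptions, not tuned values.

```python
import numpy as np


def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cascade_search(query, docs, coarse_dim=64, shortlist=100, k=5):
    """Two-stage search over full-size (768d) embeddings.

    1. Rank all documents using only the first `coarse_dim` dimensions (cheap).
    2. Re-score the top `shortlist` candidates with the full vectors (accurate).
    Returns indices of the top-k documents, best first.
    """
    # Stage 1: truncated, re-normalized vectors for cheap candidate generation.
    coarse_scores = normalize(docs[:, :coarse_dim]) @ normalize(query[:coarse_dim])
    shortlist = min(shortlist, len(docs))
    candidates = np.argpartition(-coarse_scores, shortlist - 1)[:shortlist]

    # Stage 2: full-dimension cosine scores on the shortlist only.
    fine_scores = normalize(docs[candidates]) @ normalize(query)
    order = np.argsort(-fine_scores)[:k]
    return candidates[order]


# Random vectors stand in for real embeddings from model.encode(...).
rng = np.random.default_rng(0)
docs = normalize(rng.standard_normal((10_000, 768)).astype(np.float32))
query = normalize(docs[42] + 0.01 * rng.standard_normal(768).astype(np.float32))
print(cascade_search(query, docs))  # document 42 should rank first
```

In a production index, stage 1 would typically run against a separate 64d collection, so the full vectors are only fetched for the shortlist.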

## Quick start on ModelScope

```bash
pip install modelscope sentence-transformers torch
```

```python
from modelscope import snapshot_download
from sentence_transformers import SentenceTransformer

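repo_id = "agentic-intelligence-lab/elephant-embeddings-v1-text-small"
# Download the model files from ModelScope into a local cache directory.
local_dir = snapshot_download(repo_id)

model = SentenceTransformer(local_dir)

texts = [
    "Find tool descriptions related to browser automation.",
    "检索和用户历史偏好相关的记忆。",  # Chinese: "Retrieve memories related to the user's historical preferences."
    "Retrieve notes about deployment failures in staging.",
]

# L2-normalized output, so a dot product between two rows equals their cosine similarity.
embeddings = model.encode(texts, normalize_embeddings=True)
print(embeddings.shape)  # (3, 768)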
```

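## Matryoshka truncation

Smaller embeddings are obtained by keeping the leading dimensions of the full 768d vector and re-normalizing the result.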

```python
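import torch.nn.functional as F
from modelscope import snapshot_download
from sentence_transformers import SentenceTransformer

local_dir = snapshot_download("agentic-intelligence-lab/elephant-embeddings-v1-text-small")
model = SentenceTransformer(local_dir)

texts = [
    "Find tool descriptions related to browser automation.",
    "Retrieve notes about deployment failures in staging.",
]

# Encode once at full size (768d); smaller tiers are prefixes of these vectors.
embeddings = model.encode(texts, convert_to_tensor=True, normalize_embeddings=True)

# Re-normalize after slicing: a prefix of a unit vector is generally not unit-length.
# Balanced retrieval tier
embeddings_256d = F.normalize(embeddings[:, :256], p=2, dim=1)

# Low-cost routing or large memory-bank tier
embeddings_64d = F.normalize(embeddings[:, :64], p=2, dim=1)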
```
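If you keep the default NumPy output of `model.encode` (no `convert_to_tensor`), the same truncation works without PyTorch. A minimal sketch, assuming you want several tiers (for example, for separate vector-index collections) from a single encoding pass; `matryoshka_tiers` is a hypothetical helper name and the random matrix stands in for real embeddings.

```python
import numpy as np


def matryoshka_tiers(embeddings, dims=(768, 256, 64)):
    """Slice one full-size embedding matrix into several re-normalized tiers.

    embeddings: (n, 768) NumPy array, e.g. the default output of model.encode(texts).
    Returns {dim: (n, dim) array with unit-length rows}.
    """
    tiers = {}
    for d in dims:
        prefix = embeddings[:, :d]                         # keep the leading d dimensions
        norms = np.linalg.norm(prefix, axis=1, keepdims=True)
        tiers[d] = prefix / np.clip(norms, 1e-12, None)    # re-normalize for cosine search
    return tiers


# Random stand-in for real embeddings.
emb = np.random.default_rng(0).standard_normal((3, 768)).astype(np.float32)
tiers = matryoshka_tiers(emb)
print({d: t.shape for d, t in tiers.items()})  # {768: (3, 768), 256: (3, 256), 64: (3, 64)}
```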

## Evaluation snapshot

| Metric | Score |
| --- | ---: |
| MTEB mean (24 tasks) | 61.4 |
| STS Benchmark (Spearman) | 80.5 |
| Quality retention vs. full 768d | 99% @ 256d, 98% @ 64d |
| Speedup from layer truncation | 3.3× @ 6 layers, 5.8× @ 3 layers |
| Long-context retrieval R@1 (4K tokens) | 68.8% |
| Long-context retrieval R@10 (4K tokens) | 81.2% |

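Together, these numbers show that nearly all of the full-size quality survives truncation to 256d or even 64d, and that shallower encoder configurations offer several-fold speedups. That combination suits systems that must balance quality, latency, vector size, and deployment simplicity.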
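R@1 and R@10 in the table are recall-at-k scores. The card does not spell out the evaluation protocol, so the sketch below only illustrates the standard definition, assuming exactly one relevant document per query; `recall_at_k` and the toy score matrix are hypothetical.

```python
import numpy as np


def recall_at_k(scores, gold, k):
    """Fraction of queries whose gold document appears in the top-k results.

    scores: (num_queries, num_docs) similarity matrix, e.g. query_emb @ doc_emb.T
    gold:   (num_queries,) index of the single relevant document per query
    """
    topk = np.argsort(-scores, axis=1)[:, :k]      # best-scoring doc indices per query
    hits = (topk == np.asarray(gold)[:, None]).any(axis=1)
    return float(hits.mean())


# Toy example: 3 queries x 4 documents (hypothetical scores).
scores = np.array([
    [0.9, 0.1, 0.3, 0.2],   # gold doc 0 ranks 1st
    [0.2, 0.4, 0.8, 0.1],   # gold doc 1 ranks 2nd
    [0.5, 0.6, 0.7, 0.1],   # gold doc 3 ranks last
])
gold = [0, 1, 3]
print(recall_at_k(scores, gold, 1))  # 1/3
print(recall_at_k(scores, gold, 2))  # 2/3
```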

## Files

| File | Description |
| --- | --- |
| `model.safetensors` | Model weights |
| `config.json` | ModernBERT configuration |
| `tokenizer.json` / `tokenizer_config.json` | Tokenizer assets |
| `modules.json` / `1_Pooling/config.json` | Sentence Transformers packaging |
| `README.md` | This model card |

## Lineage

This ModelScope package is published by `agentic-intelligence-lab` as part of the Elephant model release line. It mirrors the upstream Hugging Face model `llm-semantic-router/eggon-embed` and keeps the model artifacts unchanged; only the repository name and the model card presentation differ.

## Limitations

- Full 768-dimensional embeddings are recommended for important final-stage retrieval decisions.
- Aggressive dimension or layer reduction trades quality for speed and storage efficiency.
- Very long inputs are supported, but they still increase compute and memory cost.
- The model is optimized for retrieval and semantic similarity, not text generation.
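When the cost of very long inputs matters more than single-vector coverage, one option is to cap each encode call at a smaller token budget and split longer texts into overlapping windows. A minimal sketch, assuming you already have token IDs from the model's tokenizer; `token_windows`, the 8,192-token budget, and the 256-token overlap are hypothetical choices, and how the per-window vectors are combined (separate index entries, averaging, and so on) is left to the application. Note that Sentence Transformers' `model.max_seq_length` truncates longer inputs rather than splitting them.

```python
def token_windows(token_ids, max_tokens=8192, overlap=256):
    """Split a token-id sequence into overlapping windows of at most max_tokens.

    Inputs that already fit are returned unchanged as a single window.
    """
    if max_tokens <= overlap:
        raise ValueError("max_tokens must be larger than overlap")
    if len(token_ids) <= max_tokens:
        return [list(token_ids)]
    step = max_tokens - overlap
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(list(token_ids[start:start + max_tokens]))
        if start + max_tokens >= len(token_ids):
            break  # this window already reaches the end of the input
    return windows


ids = list(range(20_000))  # stand-in for tokenizer output
chunks = token_windows(ids)
print([(c[0], c[-1]) for c in chunks])  # [(0, 8191), (7936, 16127), (15872, 19999)]
```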

## Citation

```bibtex
@misc{elephant-embeddings-v1-text-small,
  title={Elephant Embeddings V1 Text Small},
  author={Agentic Intelligence Lab},
  year={2026},
  url={https://modelscope.cn/models/agentic-intelligence-lab/elephant-embeddings-v1-text-small}
}
```

## License

Apache 2.0