---
license: apache-2.0
library_name: sentence-transformers
pipeline_tag: sentence-similarity
language:
- multilingual
tags:
- agentic-intelligence-lab
- elephant
- embeddings
- sentence-transformers
- sentence-similarity
- retrieval
- rag
- agents
- routing
- memory
- multilingual
- matryoshka
- long-context
- modernbert
base_model: llm-semantic-router/mmbert-32k-yarn
datasets:
- BAAI/bge-m3-data
model-index:
- name: elephant-embeddings-v1-text-small
  results:
  - task:
      type: STS
    dataset:
      name: STS Benchmark
      type: mteb/stsbenchmark-sts
    metrics:
    - name: Spearman
      type: spearman
      value: 80.5
---

# Elephant Embeddings V1 Text Small

`elephant-embeddings-v1-text-small` is the text embedding model in the **Agentic Intelligence Lab Elephant Embeddings V1** family.

This ModelScope release is maintained by `agentic-intelligence-lab` to make Elephant embedding models easier to download and deploy in mainland China. It mirrors and renames the upstream HuggingFace model `llm-semantic-router/eggon-embed` under a consistent Elephant model namespace.

## Positioning

This model is a multilingual long-context text embedding model for agent-native retrieval and semantic matching. It is designed for systems where embeddings are on the runtime hot path:

- agent memory recall
- knowledge retrieval and RAG
- tool, skill, and route matching
- long-horizon state search
- multilingual semantic indexing
- clustering and deduplication

The model combines **32K context**, **ModernBERT encoder architecture**, and **2D Matryoshka training** so one embedding space can serve multiple latency, storage, and quality budgets.
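To make the storage side of that trade-off concrete, here is a back-of-the-envelope sketch of raw index size at each Matryoshka dimension listed in this card. It assumes uncompressed float32 vectors and a hypothetical corpus of one million items; the names `index_size_mib` and `NUM_VECTORS` are illustrative only, and real vector indexes add their own overhead.

```python
BYTES_PER_FLOAT32 = 4
MATRYOSHKA_DIMS = [768, 512, 256, 128, 64]  # dimensions listed in this card
NUM_VECTORS = 1_000_000  # hypothetical corpus size (assumption)


def index_size_mib(num_vectors: int, dim: int) -> float:
    """Raw size in MiB of `num_vectors` float32 vectors of width `dim`."""
    return num_vectors * dim * BYTES_PER_FLOAT32 / (1024 ** 2)


# Map each dimension to its raw storage footprint.
sizes = {dim: index_size_mib(NUM_VECTORS, dim) for dim in MATRYOSHKA_DIMS}

for dim, size in sizes.items():
    print(f"{dim:>4}d: {size:8.1f} MiB")
```

Under these assumptions the footprint scales linearly with dimension, so a 64d index is 12 times smaller than a 768d index of the same corpus.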
## Model at a glance

| Item | Value |
| --- | --- |
| Family | Elephant Embeddings V1 |
| Maintainer | Agentic Intelligence Lab |
| Model type | Text embedding model |
| Modalities | Text |
| Languages | Multilingual |
| Architecture | ModernBERT encoder with YaRN scaling |
| Parameters | ~307M |
| Hidden size | 768 |
| Layers | 22 |
| Context length | 32,768 tokens |
| Pooling | Mean pooling |
| Similarity | Cosine |
| Matryoshka dimensions | 768, 512, 256, 128, 64 |
| Upstream source | `llm-semantic-router/eggon-embed` |
| License | Apache 2.0 |

## Why it fits agentic workloads

Agentic systems call embedding models repeatedly: before retrieval, during routing, while matching tools, when searching memory, and when compressing or reranking state. This model is optimized for that operating pattern rather than for a single offline benchmark.

Key advantages:

- **One semantic space across the stack**: routing, retrieval, memory lookup, and semantic matching can share one vector space.
- **Budget-adaptive vectors**: truncate full 768-dimensional vectors to 256d, 128d, or 64d for cheaper indexes and faster candidate generation.
- **Long-context representation**: encode larger notes, traces, tool descriptions, and document chunks before aggressive chunking is required.
- **Practical deployment size**: a 307M-class encoder is easier to host than much larger embedding models when inference is frequent.
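The budget-adaptive pattern in the list above (cheap candidate generation on truncated vectors, then full-dimension scoring for the final decision) can be sketched as follows. This is a minimal illustration under stated assumptions, not part of the model's API: the `two_stage_search` helper, its default candidate counts, and the random stand-in vectors are all hypothetical. In a real system the vectors would come from the model's normalized 768d output.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def two_stage_search(query, corpus, coarse_dim=64, n_candidates=50, top_k=5):
    """Hypothetical helper: coarse search on truncated vectors, then full rerank.

    query:  (d,) unit-length embedding
    corpus: (n, d) unit-length embeddings
    """
    # Stage 1: keep the first `coarse_dim` dimensions and re-normalize them.
    coarse_corpus = l2_normalize(corpus[:, :coarse_dim])
    coarse_query = l2_normalize(query[None, :coarse_dim])[0]
    coarse_scores = coarse_corpus @ coarse_query
    n_candidates = min(n_candidates, corpus.shape[0])
    candidates = np.argsort(-coarse_scores)[:n_candidates]

    # Stage 2: rescore only the shortlisted candidates with the full vectors.
    full_scores = corpus[candidates] @ query
    order = np.argsort(-full_scores)[:top_k]
    return candidates[order], full_scores[order]


# Random stand-ins for real embeddings (assumption, for illustration only).
rng = np.random.default_rng(0)
corpus = l2_normalize(rng.standard_normal((1000, 768)))

# The query is a lightly perturbed copy of corpus item 42.
noisy = corpus[42] + 0.01 * rng.standard_normal(768)
query = l2_normalize(noisy[None, :])[0]

ids, scores = two_stage_search(query, corpus)
print(ids, scores)
```

The shortlist size is the main knob here: a larger `n_candidates` lowers the chance that the coarse stage drops a relevant item, at the cost of more full-dimension comparisons.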
## Recommended use cases

| Scenario | Recommended dimension | Notes |
| --- | ---: | --- |
| Broad route matching | 64d or 128d | Cheap candidate generation over large route/tool sets |
| Large memory-bank search | 64d or 256d | Lower storage and bandwidth cost |
| Main RAG retrieval | 256d or 512d | Balanced quality and cost |
| High-confidence matching | 768d | Best semantic fidelity |
| Long-document indexing | 768d | Preserve richer context before chunking |

## Quick start on ModelScope

```bash
pip install modelscope sentence-transformers torch
```

```python
from modelscope import snapshot_download
from sentence_transformers import SentenceTransformer

repo_id = "agentic-intelligence-lab/elephant-embeddings-v1-text-small"
local_dir = snapshot_download(repo_id)
model = SentenceTransformer(local_dir)

texts = [
    "Find tool descriptions related to browser automation.",
    "检索和用户历史偏好相关的记忆。",
    "Retrieve notes about deployment failures in staging.",
]

embeddings = model.encode(texts, normalize_embeddings=True)
print(embeddings.shape)  # (3, 768)
```

## Matryoshka truncation

```python
import torch.nn.functional as F
from modelscope import snapshot_download
from sentence_transformers import SentenceTransformer

local_dir = snapshot_download("agentic-intelligence-lab/elephant-embeddings-v1-text-small")
model = SentenceTransformer(local_dir)

# Same example inputs as in the quick start
texts = [
    "Find tool descriptions related to browser automation.",
    "检索和用户历史偏好相关的记忆。",
    "Retrieve notes about deployment failures in staging.",
]

# Encode at full dimensionality (768d) as an L2-normalized tensor
embeddings = model.encode(texts, convert_to_tensor=True, normalize_embeddings=True)

# Balanced retrieval tier: keep the first 256 dimensions, then re-normalize
embeddings_256d = F.normalize(embeddings[:, :256], p=2, dim=1)

# Low-cost routing or large memory-bank tier
embeddings_64d = F.normalize(embeddings[:, :64], p=2, dim=1)
```

## Evaluation snapshot

| Metric | Score |
| --- | ---: |
| MTEB mean, 24 tasks | 61.4 |
| STS Benchmark | 80.5 |
| Dimension retention | 99% @ 256d, 98% @ 64d |
| Layer speedup | 3.3× @ 6L, 5.8× @ 3L |
| Long-context retrieval R@1, 4K tokens | 68.8% |
| Long-context retrieval R@10, 4K tokens | 81.2% |

These results make the model useful for systems that must
balance quality, latency, vector size, and deployment simplicity.

## Files

| File | Description |
| --- | --- |
| `model.safetensors` | Model weights |
| `config.json` | ModernBERT configuration |
| `tokenizer.json` / `tokenizer_config.json` | Tokenizer assets |
| `modules.json` / `1_Pooling/config.json` | Sentence Transformers packaging |
| `README.md` | This model card |

## Lineage

This ModelScope package is published by `agentic-intelligence-lab` as part of the Elephant model release line. It mirrors the upstream HuggingFace model `llm-semantic-router/eggon-embed` and keeps the model artifacts unchanged except for the repository naming and model card presentation.

## Limitations

- Full 768-dimensional embeddings are recommended for important final-stage retrieval decisions.
- Aggressive dimension or layer reduction trades quality for speed and storage efficiency.
- Very long inputs are supported, but they still increase compute and memory cost.
- The model is optimized for retrieval and semantic similarity, not text generation.

## Citation

```bibtex
@misc{elephant-embeddings-v1-text-small,
  title={Elephant Embeddings V1 Text Small},
  author={Agentic Intelligence Lab},
  year={2026},
  url={https://modelscope.cn/models/agentic-intelligence-lab/elephant-embeddings-v1-text-small}
}
```

## License

Apache 2.0