	---
	license: apache-2.0
	library_name: sentence-transformers
	pipeline_tag: sentence-similarity
	language:
	- multilingual
	tags:
	- agentic-intelligence-lab
	- elephant
	- embeddings
	- sentence-transformers
	- sentence-similarity
	- retrieval
	- rag
	- agents
	- routing
	- memory
	- multilingual
	- matryoshka
	- long-context
	- modernbert
	base_model: llm-semantic-router/mmbert-32k-yarn
	datasets:
	- BAAI/bge-m3-data
	model-index:
	- name: elephant-embeddings-v1-text-small
	  results:
	  - task:
	      type: STS
	    dataset:
	      name: STS Benchmark
	      type: mteb/stsbenchmark-sts
	    metrics:
	    - name: Spearman
	      type: spearman
	      value: 80.5
	---

	# Elephant Embeddings V1 Text Small

	`elephant-embeddings-v1-text-small` is the text embedding model in the Agentic Intelligence Lab Elephant Embeddings V1 family.

	This ModelScope release is maintained by `agentic-intelligence-lab` to make Elephant embedding models easier to download and deploy in mainland China. It mirrors the upstream Hugging Face model `llm-semantic-router/eggon-embed` and republishes it under a consistent Elephant model namespace.

	## Positioning

	This model is a multilingual long-context text embedding model for agent-native retrieval and semantic matching. It is designed for systems where embeddings are on the runtime hot path:

	- agent memory recall
	- knowledge retrieval and RAG
	- tool, skill, and route matching
	- long-horizon state search
	- multilingual semantic indexing
	- clustering and deduplication

	The model combines a 32K-token context window, a ModernBERT encoder architecture, and 2D Matryoshka training (over both embedding dimensions and encoder layers), so one embedding space can serve multiple latency, storage, and quality budgets.

	## Model at a glance

	| Item | Value |
	| --- | --- |
	| Family | Elephant Embeddings V1 |
	| Maintainer | Agentic Intelligence Lab |
	| Model type | Text embedding model |
	| Modalities | Text |
	| Languages | Multilingual |
	| Architecture | ModernBERT encoder with YaRN scaling |
	| Parameters | ~307M |
	| Hidden size | 768 |
	| Layers | 22 |
	| Context length | 32,768 tokens |
	| Pooling | Mean pooling |
	| Similarity | Cosine |
	| Matryoshka dimensions | 768, 512, 256, 128, 64 |
	| Upstream source | `llm-semantic-router/eggon-embed` |
	| License | Apache 2.0 |

	## Why it fits agentic workloads

	Agentic systems call embedding models repeatedly: before retrieval, during routing, while matching tools, when searching memory, and when compressing or reranking state. This model is optimized for that operating pattern rather than for a single offline benchmark.

	Key advantages:

	- One semantic space across the stack: routing, retrieval, memory lookup, and semantic matching can share one vector space.
	- Budget-adaptive vectors: truncate full 768-dimensional vectors to 256d, 128d, or 64d for cheaper indexes and faster candidate generation.
	- Long-context representation: encode larger notes, traces, tool descriptions, and document chunks before aggressive chunking is required.
	- Practical deployment size: a 307M-class encoder is easier to host than much larger embedding models when inference is frequent.

	## Recommended use cases

	| Scenario | Recommended dimension | Notes |
	| --- | ---: | --- |
	| Broad route matching | 64d or 128d | Cheap candidate generation over large route/tool sets |
	| Large memory-bank search | 64d or 256d | Lower storage and bandwidth cost |
	| Main RAG retrieval | 256d or 512d | Balanced quality and cost |
	| High-confidence matching | 768d | Best semantic fidelity |
	| Long-document indexing | 768d | Preserve richer context before chunking |
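
To make the storage side of this table concrete, here is a rough back-of-the-envelope sketch. It assumes vectors are stored as raw float32 values (4 bytes each) with no index overhead; `raw_index_bytes` and the one-million-vector corpus are hypothetical names and numbers chosen only for illustration.

```python
# Matryoshka dimensions listed in this model card.
MATRYOSHKA_DIMS = (768, 512, 256, 128, 64)


def raw_index_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw size of a flat vector store, ignoring any index overhead.

    Assumption: float32 storage (4 bytes per value) unless overridden.
    """
    return num_vectors * dim * bytes_per_value


if __name__ == "__main__":
    num_vectors = 1_000_000  # hypothetical corpus size
    for dim in MATRYOSHKA_DIMS:
        size_mb = raw_index_bytes(num_vectors, dim) / 1e6
        print(f"{dim:>4}d: {size_mb:,.0f} MB")
```

Under these assumptions, one million 768d vectors take about 3.07 GB of raw storage, while the 64d tier takes about 0.26 GB, a 12× reduction.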

	## Quick start on ModelScope

	```bash
	pip install modelscope sentence-transformers torch
	```

	```python
	from modelscope import snapshot_download
	from sentence_transformers import SentenceTransformer

	repo_id = "agentic-intelligence-lab/elephant-embeddings-v1-text-small"
	local_dir = snapshot_download(repo_id)

	model = SentenceTransformer(local_dir)

	texts = [
	    "Find tool descriptions related to browser automation.",
	    "检索和用户历史偏好相关的记忆。",
	    "Retrieve notes about deployment failures in staging.",
	]

	embeddings = model.encode(texts, normalize_embeddings=True)
	print(embeddings.shape)  # (3, 768)
	```
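
Because `normalize_embeddings=True` returns unit-length vectors, cosine similarity reduces to a plain dot product. The sketch below illustrates this with small hand-written vectors rather than real model output; `dot`, `l2_normalize`, and `cosine` are hypothetical helpers written in pure Python for clarity.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


if __name__ == "__main__":
    a = [3.0, 4.0, 0.0]
    b = [1.0, 2.0, 2.0]
    # Both lines print the same value (up to floating-point rounding).
    print(cosine(a, b))
    print(dot(l2_normalize(a), l2_normalize(b)))
```

On real embeddings, the same idea is usually written as one matrix product over the normalized array, for example `embeddings @ embeddings.T`.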

	## Matryoshka truncation

	```python
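	import torch.nn.functional as F
	from modelscope import snapshot_download
	from sentence_transformers import SentenceTransformer

	local_dir = snapshot_download("agentic-intelligence-lab/elephant-embeddings-v1-text-small")
	model = SentenceTransformer(local_dir)

	# Example inputs (same as in the quick start)
	texts = [
	    "Find tool descriptions related to browser automation.",
	    "检索和用户历史偏好相关的记忆。",
	    "Retrieve notes about deployment failures in staging.",
	]

	# Full 768-dimensional embeddings as a torch tensor
	embeddings = model.encode(texts, convert_to_tensor=True, normalize_embeddings=True)

	# Balanced retrieval tier: keep the first 256 dimensions, then re-normalize
	embeddings_256d = F.normalize(embeddings[:, :256], p=2, dim=1)

	# Low-cost routing or large memory-bank tier
	embeddings_64d = F.normalize(embeddings[:, :64], p=2, dim=1)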
	```
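
One way to combine the tiers is a two-stage search: generate candidates cheaply with a truncated prefix, then rerank only those candidates with the full vectors. The sketch below is an illustration under stated assumptions, not the upstream authors' pipeline. It uses random vectors in place of model output (so the prefix here is not actually Matryoshka-trained), and `two_stage_search`, `coarse_dim`, and `num_candidates` are hypothetical names.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def truncate(v, dim):
    """Keep the first `dim` values and re-normalize the result."""
    return l2_normalize(v[:dim])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def two_stage_search(query, docs, coarse_dim=64, num_candidates=10, top_k=3):
    """Return [(doc_index, full_dim_score), ...] for the best top_k documents."""
    # Stage 1: cheap candidate generation on the truncated prefix.
    # A real index would precompute and store the truncated document vectors.
    q_coarse = truncate(query, coarse_dim)
    coarse = [(dot(q_coarse, truncate(d, coarse_dim)), i) for i, d in enumerate(docs)]
    candidates = [i for _, i in sorted(coarse, reverse=True)[:num_candidates]]

    # Stage 2: rerank only the candidates with the full-dimensional vectors.
    reranked = sorted(((dot(query, docs[i]), i) for i in candidates), reverse=True)
    return [(i, score) for score, i in reranked[:top_k]]


if __name__ == "__main__":
    rng = random.Random(0)
    # Random stand-ins for normalized 768d embeddings.
    docs = [l2_normalize([rng.gauss(0, 1) for _ in range(768)]) for _ in range(200)]
    query = docs[7]  # a query identical to document 7
    print(two_stage_search(query, docs))
```

The design trade-off is recall versus cost: a larger `num_candidates` makes it less likely that the coarse stage drops a relevant document, at the price of more full-dimension comparisons.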

	## Evaluation snapshot

	| Metric | Score |
	| --- | ---: |
	| MTEB mean, 24 tasks | 61.4 |
	| STS Benchmark | 80.5 |
	| Dimension retention | 99% @ 256d, 98% @ 64d |
	| Layer speedup | 3.3× @ 6L, 5.8× @ 3L |
	| Long-context retrieval R@1, 4K tokens | 68.8% |
	| Long-context retrieval R@10, 4K tokens | 81.2% |

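	Truncated vectors retain nearly all of the full-dimension score, and reduced-layer inference offers a multi-fold speedup, so the model suits systems that must balance quality, latency, vector size, and deployment simplicity.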

	## Files

	| File | Description |
	| --- | --- |
	| `model.safetensors` | Model weights |
	| `config.json` | ModernBERT configuration |
	| `tokenizer.json` / `tokenizer_config.json` | Tokenizer assets |
	| `modules.json` / `1_Pooling/config.json` | Sentence Transformers packaging |
	| `README.md` | This model card |

	## Lineage

	This ModelScope package is published by `agentic-intelligence-lab` as part of the Elephant model release line. It mirrors the upstream Hugging Face model `llm-semantic-router/eggon-embed`; the model artifacts are unchanged, and only the repository name and model card presentation differ.

	## Limitations

	- Full 768-dimensional embeddings are recommended for important final-stage retrieval decisions.
	- Aggressive dimension or layer reduction trades quality for speed and storage efficiency.
	- Very long inputs are supported, but they still increase compute and memory cost.
	- The model is optimized for retrieval and semantic similarity, not text generation.
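
If long inputs dominate cost, one option is to cap the sequence length before encoding. Sentence Transformers exposes a `max_seq_length` attribute on the model for this; the helper `cap_sequence_length` and the 8,192-token budget below are hypothetical choices for illustration, and inputs longer than the cap are truncated.

```python
def cap_sequence_length(model, max_tokens=8192):
    """Lower model.max_seq_length to at most max_tokens; never raise it.

    Works on any object with a `max_seq_length` attribute, such as a
    SentenceTransformer instance. The 8192 default is an illustrative value.
    """
    current = model.max_seq_length
    model.max_seq_length = max_tokens if current is None else min(current, max_tokens)
    return model


# Usage sketch (requires the model files to be downloaded first):
#   from modelscope import snapshot_download
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer(snapshot_download(
#       "agentic-intelligence-lab/elephant-embeddings-v1-text-small"))
#   cap_sequence_length(model)  # longer inputs are now truncated at 8,192 tokens
```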

	## Citation

	```bibtex
	@misc{elephant-embeddings-v1-text-small,
	  title={Elephant Embeddings V1 Text Small},
	  author={Agentic Intelligence Lab},
	  year={2026},
	  url={https://modelscope.cn/models/agentic-intelligence-lab/elephant-embeddings-v1-text-small}
	}
	```

	## License

	Apache 2.0