---
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
language:
- multilingual
- en
- zh
- de
- fr
- es
- ru
- ja
- ko
- ar
- hi
tags:
- agentic-intelligence-lab
- elephant
- rerank
- reranker
- cross-encoder
- text-ranking
- retrieval
- rag
- agents
- routing
- multilingual
- matryoshka
- 2d-matryoshka
- long-context
- modernbert
base_model: llm-semantic-router/mmbert-32k-yarn
datasets:
- cfli/bge-m3-data
model-index:
- name: elephant-rerank-v1-text-small
  results:
  - task:
      type: text-ranking
    dataset:
      name: Long document reranking validation
      type: synthetic_long_document_reranking
    metrics:
    - name: Answer at start accuracy
      type: accuracy
      value: 100
    - name: Answer at end accuracy
      type: accuracy
      value: 100
  - task:
      type: text-ranking
    dataset:
      name: BEIR short-document validation
      type: beir
    metrics:
    - name: SciFact MRR
      type: mrr
      value: 94.9
    - name: NFCorpus MRR
      type: mrr
      value: 87.2
    - name: HotpotQA MRR
      type: mrr
      value: 100.0
    - name: FiQA MRR
      type: mrr
      value: 93.9
---
# Elephant Rerank V1 Text Small
`elephant-rerank-v1-text-small` is the text reranker model in the **Agentic Intelligence Lab Elephant Rerank V1** family.
This ModelScope release is maintained by `agentic-intelligence-lab` to make Elephant rerank models easier to download and deploy in mainland China. It mirrors the upstream Hugging Face model `llm-semantic-router/mmbert-rerank-32k-2d-matryoshka` and renames it under a consistent Elephant model namespace.
## Positioning
This model is a multilingual long-context cross-encoder reranker for retrieval pipelines, agent memory systems, and RAG applications.
Embedding models are usually used for fast candidate generation. A reranker is used after that stage to score query-document pairs with higher precision. `elephant-rerank-v1-text-small` is designed for the second stage: take a query and a set of candidate passages, then assign relevance scores for final ordering.
The model is especially useful when passages are longer than the 512-token window used by many rerankers, or when relevant information may appear late in a document.
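The two-stage flow described above can be sketched in a few lines. This is a minimal illustration, not the model's API: `first_stage` uses plain token overlap as a stand-in for an embedding retriever, and `toy_score` is a hypothetical placeholder for a cross-encoder call such as the `compute_score` method used in the Quick start section.

```python
from typing import Callable


def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokenizer, used only by the toy stages below."""
    return set(text.lower().split())


def first_stage(query: str, corpus: list[str], n: int) -> list[str]:
    """Cheap candidate generation: rank by token overlap (stand-in for vector search)."""
    q = tokenize(query)
    return sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)[:n]


def second_stage(
    query: str,
    candidates: list[str],
    score_pair: Callable[[str, str], float],
    k: int,
) -> list[tuple[str, float]]:
    """Precise ordering: score every (query, candidate) pair and keep the best k."""
    scored = [(doc, score_pair(query, doc)) for doc in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


def toy_score(query: str, passage: str) -> float:
    """Hypothetical placeholder scorer (Jaccard overlap), NOT the reranker itself."""
    q, p = tokenize(query), tokenize(passage)
    return len(q & p) / len(q | p) if q | p else 0.0


corpus = [
    "Plants make energy from sunlight through photosynthesis.",
    "The stock market closed higher today.",
    "Energy drinks are popular with students.",
    "Python is a popular programming language.",
]
query = "how do plants make energy"

candidates = first_stage(query, corpus, n=3)              # broad and fast
top = second_stage(query, candidates, toy_score, k=1)     # narrow and precise
print(top)
```

In a real pipeline, the placeholder scorer would be replaced by the reranker, and the first stage by whatever retriever produces the candidate set.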
## Model at a glance
| Item | Value |
| --- | --- |
| Family | Elephant Rerank V1 |
| Maintainer | Agentic Intelligence Lab |
| Model type | Text reranker / cross-encoder |
| Modalities | Text query + text passage |
| Languages | Multilingual |
| Architecture | ModernBERT cross-encoder with 2D Matryoshka heads |
| Base model | `llm-semantic-router/mmbert-32k-yarn` |
| Parameters | ~308M |
| Hidden size | 768 |
| Layers | 22 |
| Context length | 32,768 tokens |
| Pooling | CLS |
| Layer indices | 3, 6, 11, 22 |
| Dimension indices | 768, 512, 256, 128, 64 |
| Upstream source | `llm-semantic-router/mmbert-rerank-32k-2d-matryoshka` |
| License | Apache 2.0 |
## Why it fits agentic workloads
Agentic systems often retrieve many candidate memories, documents, tools, or execution traces before deciding what to use. The first retrieval stage needs to be fast; the final ordering stage needs to be precise. This reranker is designed for that final ordering stage.
Key advantages:
- **Long-context pair scoring**: score query-passage pairs with up to 32K tokens of context.
- **Useful after vector retrieval**: rerank candidates from Elephant embeddings or any other first-stage retriever.
- **2D Matryoshka flexibility**: use different layer and dimension heads to trade quality for cost.
- **Multilingual coverage**: suitable for mixed-language retrieval and international corpora.
- **Agent-friendly use cases**: memory selection, tool ranking, evidence ordering, and long-document RAG.
## Recommended use cases
| Scenario | Recommendation |
| --- | --- |
| Long-document RAG | Rerank retrieved chunks or longer passages before generation |
| Agent memory recall | Reorder memory candidates by query relevance |
| Tool and skill ranking | Rank candidate tools after broad semantic retrieval |
| Evidence selection | Pick the strongest supporting records for answer synthesis |
| Multilingual search | Rerank candidates from mixed-language corpora |
| Quality-speed tuning | Use 2D Matryoshka layer/dimension heads for runtime budgets |
## Quick start on ModelScope
```bash
pip install modelscope transformers torch
```
This package contains the ModernBERT encoder weights plus the 2D Matryoshka classification heads. Loading the full reranker requires the custom reranker wrapper used by the upstream training/export code.
```python
import torch
from modelscope import snapshot_download
from transformers import AutoTokenizer

# Use the reranker wrapper from the upstream training package.
# The wrapper is expected to load `model.safetensors`, `classification_heads.pt`,
# and `matryoshka_config.json` from the local model directory.
from train_rerank import Matryoshka2DReranker

repo_id = "agentic-intelligence-lab/elephant-rerank-v1-text-small"
local_dir = snapshot_download(repo_id)

model = Matryoshka2DReranker.from_pretrained(local_dir)
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

pairs = [
    (
        "What is machine learning?",
        "Machine learning is a subset of AI that enables systems to learn from data.",
    ),
    (
        "What is machine learning?",
        "The weather is sunny today.",
    ),
]

scores = model.compute_score(pairs, tokenizer, normalize=True)
print(scores)
```
## 2D Matryoshka scoring
The model provides multiple layer and dimension heads. This allows one checkpoint to serve several quality/cost profiles.
```python
# Full model: 22 layers, 768 dimensions
scores_full = model.compute_score(pairs, tokenizer, normalize=True)

# Balanced profile
scores_balanced = model.compute_score(
    pairs,
    tokenizer,
    layer_idx=11,
    dim_idx=256,
    normalize=True,
)

# Lower-cost profile
scores_fast = model.compute_score(
    pairs,
    tokenizer,
    layer_idx=6,
    dim_idx=128,
    normalize=True,
)
```
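To illustrate what selecting a layer/dimension head might do internally, the sketch below assumes the usual 2D Matryoshka recipe: take the CLS vector from the chosen layer, keep its first `dim_idx` components, and apply a linear head followed by a sigmoid. The function `head_score`, the toy vectors and weights, and the sigmoid reading of `normalize=True` are all assumptions for illustration; the actual logic lives in the upstream `Matryoshka2DReranker` wrapper and may differ.

```python
import math


def head_score(cls_by_layer, weights, bias, layer_idx, dim_idx):
    """Score one pair from the CLS vector of `layer_idx`, truncated to `dim_idx` dims."""
    cls_vec = cls_by_layer[layer_idx][:dim_idx]   # dimension truncation
    w = weights[(layer_idx, dim_idx)]             # assumed: one head per (layer, dim) pair
    logit = sum(x * wi for x, wi in zip(cls_vec, w)) + bias
    return 1.0 / (1.0 + math.exp(-logit))         # assumed meaning of normalize=True


# Toy 4-dimensional CLS vectors for two layers.
# The real model exposes layers 3, 6, 11, 22 and dimensions 768 down to 64.
cls_by_layer = {
    6: [0.5, -0.2, 0.1, 0.9],
    22: [1.0, 0.4, -0.3, 0.2],
}
# Made-up head weights, chosen only so the arithmetic is easy to follow.
weights = {
    (6, 2): [1.0, 1.0],
    (22, 4): [1.0, 1.0, 1.0, 1.0],
}

fast = head_score(cls_by_layer, weights, 0.0, layer_idx=6, dim_idx=2)
full = head_score(cls_by_layer, weights, 0.0, layer_idx=22, dim_idx=4)
print(round(fast, 4), round(full, 4))
```

If the wrapper works this way, a lower `layer_idx` would also let inference stop at that layer, which is where most of the cost saving would come from. That is an assumption about the upstream implementation, not something this card confirms.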
## Reranking pipeline example
```python
# Assumes `model` and `tokenizer` are already loaded as in the Quick start section.
def rerank(query: str, passages: list[str], top_k: int = 10) -> list[tuple[str, float]]:
    """Score every (query, passage) pair and return the top_k passages by score."""
    pairs = [(query, passage) for passage in passages]
    scores = model.compute_score(pairs, tokenizer, normalize=True)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


query = "How does photosynthesis work?"
passages = [
    "Photosynthesis is the process by which plants convert sunlight into energy.",
    "The stock market closed higher today.",
    "Plants use chlorophyll to absorb light during photosynthesis.",
    "Python is a popular programming language.",
]

results = rerank(query, passages, top_k=2)
print(results)
```
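When the candidate set is large or the passages are long, it may help to score pairs in fixed-size batches so that peak memory stays bounded. The helper below is a sketch under that assumption: `score_in_batches` is a hypothetical name, and `score_fn` stands for any callable that maps a list of pairs to a list of scores. Whether `compute_score` already batches internally is not stated in this card.

```python
from typing import Callable, Sequence

Pair = tuple[str, str]


def score_in_batches(
    pairs: Sequence[Pair],
    score_fn: Callable[[list[Pair]], list[float]],
    batch_size: int = 8,
) -> list[float]:
    """Call `score_fn` on slices of at most `batch_size` pairs and concatenate the results."""
    scores: list[float] = []
    for start in range(0, len(pairs), batch_size):
        batch = list(pairs[start:start + batch_size])
        scores.extend(score_fn(batch))
    return scores


# Stand-in scorer that records batch sizes; replace with the real reranker call.
seen_batch_sizes: list[int] = []


def fake_score_fn(batch: list[Pair]) -> list[float]:
    seen_batch_sizes.append(len(batch))
    return [float(len(passage)) for _, passage in batch]


demo_pairs = [("q", "a" * n) for n in range(1, 6)]   # five pairs
demo_scores = score_in_batches(demo_pairs, fake_score_fn, batch_size=2)
print(demo_scores, seen_batch_sizes)
```

With the real model, `score_fn` could be something like `lambda batch: model.compute_score(batch, tokenizer, normalize=True)`, assuming that call returns one score per pair in input order.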
## Evaluation snapshot
| Evaluation | Metric | Score |
| --- | --- | ---: |
| Long document, answer at start | Accuracy | 100% |
| Long document, answer at end | Accuracy | 100% |
| High-resource multilingual validation | Accuracy | 100% |
| Low-resource multilingual validation | Accuracy | 100% |
| BEIR SciFact | MRR | 94.9 |
| BEIR NFCorpus | MRR | 87.2 |
| BEIR HotpotQA | MRR | 100.0 |
| BEIR FiQA | MRR | 93.9 |
The long-document validation checks whether the reranker can still find relevant information when it appears late in a long passage. This is the main reason to use this model over short-window rerankers in long-context RAG and memory workflows.
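For readers unfamiliar with the MRR column, the sketch below shows how mean reciprocal rank is typically computed: for each query, take 1 divided by the rank of the first relevant result, then average over queries. The table appears to report this value on a 0 to 100 scale. The document IDs and rankings here are made up for illustration, and the candidate pools used for the scores above are not described in this card.

```python
def reciprocal_rank(ranked_ids: list[str], relevant_ids: set[str]) -> float:
    """Return 1/rank of the first relevant document, or 0.0 if none is ranked."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: list[tuple[list[str], set[str]]]) -> float:
    """Average the reciprocal rank over all (ranking, relevant set) pairs."""
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


# Illustrative rankings only, not evaluation data.
runs = [
    (["d1", "d2", "d3"], {"d1"}),   # first relevant at rank 1 -> 1.0
    (["d4", "d5", "d6"], {"d5"}),   # first relevant at rank 2 -> 0.5
    (["d7", "d8", "d9"], {"d0"}),   # relevant document never ranked -> 0.0
]
mrr = mean_reciprocal_rank(runs)
print(f"MRR = {mrr:.3f} ({100 * mrr:.1f} on a 0-100 scale)")
```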
## Files
| File | Description |
| --- | --- |
| `model.safetensors` | ModernBERT encoder weights |
| `classification_heads.pt` | 2D Matryoshka reranking heads |
| `matryoshka_config.json` | Layer/dimension head configuration |
| `config.json` | ModernBERT configuration |
| `tokenizer.json` / `tokenizer_config.json` | Tokenizer assets |
| `training_args.json` | Training/export configuration snapshot |
| `README.md` | This model card |
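Since loading depends on several files beyond the standard encoder weights, a quick check of the downloaded directory can catch an incomplete snapshot before the wrapper fails. This is a minimal sketch: `missing_files` is a hypothetical helper, and the file list simply mirrors the table above.

```python
from pathlib import Path

# File names taken from the table above (README.md and training_args.json
# are omitted because they are not needed to load the model).
EXPECTED_FILES = [
    "model.safetensors",
    "classification_heads.pt",
    "matryoshka_config.json",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
]


def missing_files(model_dir: str) -> list[str]:
    """Return the expected file names that are not present in `model_dir`."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Example: check the current directory; pass the snapshot path in practice.
print(missing_files("."))
```

With the Quick start code, the call would be `missing_files(local_dir)`, where `local_dir` is the path returned by `snapshot_download`.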
## Lineage
This ModelScope package is published by `agentic-intelligence-lab` as part of the Elephant model release line. It mirrors the upstream Hugging Face model `llm-semantic-router/mmbert-rerank-32k-2d-matryoshka` and keeps the model artifacts unchanged; only the repository name and the model card presentation differ.
The model is built from `llm-semantic-router/mmbert-32k-yarn`, a ModernBERT-based multilingual encoder extended to 32K context with YaRN position interpolation.
## Limitations
- This is a custom reranker export; the complete scoring path requires the upstream `Matryoshka2DReranker` wrapper or an equivalent implementation.
- Training data is primarily based on BGE-M3-style query-passage pairs, so specialized domains may benefit from fine-tuning.
- Although the model supports 32K tokens, very long query-passage pairs still increase compute and memory cost.
- Layer and dimension reduction trade quality for efficiency and should be validated for each production workload.
- For very short passages where latency is the only priority, a smaller short-window reranker may be faster.
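One simple way to validate a reduced layer/dimension profile on your own workload is to compare its top-k results with those of the full model. The sketch below computes top-k overlap from two score dictionaries. The helper names are hypothetical and the scores are made-up placeholders, not measurements of this model.

```python
def top_k_ids(scores: dict[str, float], k: int) -> list[str]:
    """Return the k passage IDs with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def top_k_overlap(full: dict[str, float], reduced: dict[str, float], k: int) -> float:
    """Fraction of the full model's top-k that the reduced profile also places in its top-k."""
    full_top = set(top_k_ids(full, k))
    reduced_top = set(top_k_ids(reduced, k))
    return len(full_top & reduced_top) / k


# Placeholder scores for one query; in practice these would come from
# compute_score with and without layer_idx / dim_idx set.
full_scores = {"a": 0.95, "b": 0.80, "c": 0.40, "d": 0.10}
reduced_scores = {"a": 0.90, "b": 0.35, "c": 0.60, "d": 0.05}

overlap_at_2 = top_k_overlap(full_scores, reduced_scores, k=2)
overlap_at_3 = top_k_overlap(full_scores, reduced_scores, k=3)
print(overlap_at_2, overlap_at_3)
```

Averaging this overlap across a representative set of queries gives a rough signal of how much ordering quality a cheaper profile gives up; a labeled metric such as MRR is preferable when relevance judgments are available.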
## Citation
```bibtex
@misc{elephant-rerank-v1-text-small,
title={Elephant Rerank V1 Text Small},
author={Agentic Intelligence Lab},
year={2026},
url={https://modelscope.cn/models/agentic-intelligence-lab/elephant-rerank-v1-text-small}
}
```
## License
Apache 2.0