---
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
language:
- multilingual
- en
- zh
- de
- fr
- es
- ru
- ja
- ko
- ar
- hi
tags:
- agentic-intelligence-lab
- elephant
- rerank
- reranker
- cross-encoder
- text-ranking
- retrieval
- rag
- agents
- routing
- multilingual
- matryoshka
- 2d-matryoshka
- long-context
- modernbert
base_model: llm-semantic-router/mmbert-32k-yarn
datasets:
- cfli/bge-m3-data
model-index:
- name: elephant-rerank-v1-text-small
  results:
  - task:
      type: text-ranking
    dataset:
      name: Long document reranking validation
      type: synthetic_long_document_reranking
    metrics:
    - name: Answer at start accuracy
      type: accuracy
      value: 100
    - name: Answer at end accuracy
      type: accuracy
      value: 100
  - task:
      type: text-ranking
    dataset:
      name: BEIR short-document validation
      type: beir
    metrics:
    - name: SciFact MRR
      type: mrr
      value: 94.9
    - name: NFCorpus MRR
      type: mrr
      value: 87.2
    - name: HotpotQA MRR
      type: mrr
      value: 100.0
    - name: FiQA MRR
      type: mrr
      value: 93.9
---

# Elephant Rerank V1 Text Small

`elephant-rerank-v1-text-small` is the text reranker model in the **Agentic Intelligence Lab Elephant Rerank V1** family.

This ModelScope release is maintained by `agentic-intelligence-lab` to make Elephant rerank models easier to download and deploy in mainland China. It mirrors and renames the upstream Hugging Face model `llm-semantic-router/mmbert-rerank-32k-2d-matryoshka` under a consistent Elephant model namespace.

## Positioning

This model is a multilingual long-context cross-encoder reranker for retrieval pipelines, agent memory systems, and RAG applications.

Embedding models are usually used for fast candidate generation. A reranker runs after that stage and scores query-document pairs with higher precision. `elephant-rerank-v1-text-small` is designed for this second stage: it takes a query and a set of candidate passages, then assigns relevance scores for the final ordering.

The model is especially useful when passages are longer than the 512-token window used by many rerankers, or when relevant information may appear late in a document.
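
The two-stage flow described above can be sketched without any model at all. In the snippet below, `first_stage` and `toy_pair_score` are hypothetical stand-ins based on simple token overlap; they are not the Elephant embedding model or this reranker. In a real pipeline the `score_fn` argument would call the reranker instead.

```python
from typing import Callable


def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokenizer (toy stand-in, not the model tokenizer)."""
    return {tok.strip(".,?!").lower() for tok in text.split()}


def first_stage(query: str, corpus: list[str], k: int) -> list[str]:
    """Cheap candidate generation: keep the k documents sharing the most tokens."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def toy_pair_score(query: str, passage: str) -> float:
    """Hypothetical pair scorer (Jaccard overlap); a real system calls the reranker here."""
    q, p = tokenize(query), tokenize(passage)
    return len(q & p) / len(q | p) if q | p else 0.0


def two_stage_search(
    query: str,
    corpus: list[str],
    score_fn: Callable[[str, str], float],
    k_candidates: int = 3,
    top_k: int = 2,
) -> list[tuple[str, float]]:
    """Stage 1 narrows the corpus; stage 2 scores each (query, candidate) pair."""
    candidates = first_stage(query, corpus, k_candidates)
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


corpus = [
    "Machine learning enables systems to learn from data.",
    "The weather is sunny today.",
    "Deep learning is a branch of machine learning.",
    "Stock prices rose on Monday.",
]
results = two_stage_search("What is machine learning?", corpus, toy_pair_score)
print(results)
```

The point of the split is that only `k_candidates` pairs ever reach the expensive scorer, regardless of corpus size.
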
## Model at a glance

| Item | Value |
| --- | --- |
| Family | Elephant Rerank V1 |
| Maintainer | Agentic Intelligence Lab |
| Model type | Text reranker / cross-encoder |
| Modalities | Text query + text passage |
| Languages | Multilingual |
| Architecture | ModernBERT cross-encoder with 2D Matryoshka heads |
| Base model | `llm-semantic-router/mmbert-32k-yarn` |
| Parameters | ~308M |
| Hidden size | 768 |
| Layers | 22 |
| Context length | 32,768 tokens |
| Pooling | CLS |
| Layer indices | 3, 6, 11, 22 |
| Dimension indices | 768, 512, 256, 128, 64 |
| Upstream source | `llm-semantic-router/mmbert-rerank-32k-2d-matryoshka` |
| License | Apache 2.0 |

## Why it fits agentic workloads

Agentic systems often retrieve many candidate memories, documents, tools, or execution traces before deciding what to use. The first retrieval stage needs to be fast; the final ordering stage needs to be precise. This reranker is designed for that final ordering stage.

Key advantages:

- **Long-context pair scoring**: score query-passage pairs with up to 32K tokens of context.
- **Useful after vector retrieval**: rerank candidates from Elephant embeddings or any other first-stage retriever.
- **2D Matryoshka flexibility**: use different layer and dimension heads to trade quality for cost.
- **Multilingual coverage**: suitable for mixed-language retrieval and international corpora.
- **Agent-friendly use cases**: memory selection, tool ranking, evidence ordering, and long-document RAG.
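
To make the "layer and dimension heads" idea concrete, here is a toy sketch of how such a head could produce a score. The exact head architecture is not described in this card, so everything here is an assumption: one linear head per (layer, dimension) pair, applied to the CLS vector of that layer truncated to its first `dim_idx` values, followed by a sigmoid. The function name and the tiny vectors are hypothetical.

```python
import math


def matryoshka_head_score(
    cls_by_layer: dict[int, list[float]],
    head_weights: dict[tuple[int, int], list[float]],
    layer_idx: int,
    dim_idx: int,
    bias: float = 0.0,
) -> float:
    """Score from one layer's CLS vector, truncated to its first dim_idx values."""
    cls_vector = cls_by_layer[layer_idx][:dim_idx]  # dimension truncation
    weights = head_weights[(layer_idx, dim_idx)]    # assumed: one linear head per pair
    logit = sum(w * x for w, x in zip(weights, cls_vector)) + bias
    return 1.0 / (1.0 + math.exp(-logit))           # sigmoid maps the logit to (0, 1)


# Made-up 4-dimensional CLS vectors at two layers (illustrative only).
cls_by_layer = {1: [0.5, -0.2, 0.1, 0.9], 2: [1.0, 0.3, -0.4, 0.2]}
head_weights = {(2, 4): [1.0, 1.0, 1.0, 1.0], (1, 2): [2.0, 0.0]}

full_score = matryoshka_head_score(cls_by_layer, head_weights, layer_idx=2, dim_idx=4)
small_score = matryoshka_head_score(cls_by_layer, head_weights, layer_idx=1, dim_idx=2)
print(full_score, small_score)
```

Under this assumption, an earlier layer means fewer encoder layers to run, and a smaller dimension means a smaller head, which is where the cost savings would come from.
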
## Recommended use cases

| Scenario | Recommendation |
| --- | --- |
| Long-document RAG | Rerank retrieved chunks or longer passages before generation |
| Agent memory recall | Reorder memory candidates by query relevance |
| Tool and skill ranking | Rank candidate tools after broad semantic retrieval |
| Evidence selection | Pick the strongest supporting records for answer synthesis |
| Multilingual search | Rerank candidates from mixed-language corpora |
| Quality-speed tuning | Use 2D Matryoshka layer/dimension heads for runtime budgets |

## Quick start on ModelScope

```bash
pip install modelscope transformers torch
```

This package contains the ModernBERT encoder weights plus the 2D Matryoshka classification heads. Loading the full reranker requires the custom reranker wrapper used by the upstream training/export code.

```python
import torch
from modelscope import snapshot_download
from transformers import AutoTokenizer

# Use the reranker wrapper from the upstream training package.
# The wrapper is expected to load `model.safetensors`, `classification_heads.pt`,
# and `matryoshka_config.json` from the local model directory.
from train_rerank import Matryoshka2DReranker

repo_id = "agentic-intelligence-lab/elephant-rerank-v1-text-small"
local_dir = snapshot_download(repo_id)

model = Matryoshka2DReranker.from_pretrained(local_dir)
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model.eval()

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

pairs = [
    (
        "What is machine learning?",
        "Machine learning is a subset of AI that enables systems to learn from data.",
    ),
    (
        "What is machine learning?",
        "The weather is sunny today.",
    ),
]

scores = model.compute_score(pairs, tokenizer, normalize=True)
print(scores)
```

## 2D Matryoshka scoring

The model provides multiple layer and dimension heads. This allows one checkpoint to serve several quality/cost profiles.

```python
# Full model: 22 layers, 768 dimensions
scores_full = model.compute_score(pairs, tokenizer, normalize=True)

# Balanced profile
scores_balanced = model.compute_score(
    pairs,
    tokenizer,
    layer_idx=11,
    dim_idx=256,
    normalize=True,
)

# Lower-cost profile
scores_fast = model.compute_score(
    pairs,
    tokenizer,
    layer_idx=6,
    dim_idx=128,
    normalize=True,
)
```
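
Since heads exist only for the listed layer and dimension indices, it may help to validate a requested profile before calling the model. The sketch below is one way to do that. The profile names and `resolve_profile` are hypothetical; the index values come from the "Model at a glance" table, and mapping "full" to layer 22 and dimension 768 is an assumption based on the comment in the example above.

```python
# Head indices as listed in the "Model at a glance" table.
LAYER_INDICES = (3, 6, 11, 22)
DIM_INDICES = (768, 512, 256, 128, 64)

# Hypothetical profile names; the (layer, dim) pairs mirror the examples above.
PROFILES = {
    "full": (22, 768),
    "balanced": (11, 256),
    "fast": (6, 128),
}


def resolve_profile(name: str) -> dict[str, int]:
    """Return keyword arguments for a named profile, rejecting unknown heads."""
    if name not in PROFILES:
        raise KeyError(f"unknown profile {name!r}; choose from {sorted(PROFILES)}")
    layer_idx, dim_idx = PROFILES[name]
    if layer_idx not in LAYER_INDICES or dim_idx not in DIM_INDICES:
        raise ValueError(f"no head for layer={layer_idx}, dim={dim_idx}")
    return {"layer_idx": layer_idx, "dim_idx": dim_idx}


kwargs = resolve_profile("balanced")
print(kwargs)

# Assumed usage with the wrapper shown above:
# scores = model.compute_score(pairs, tokenizer, normalize=True, **kwargs)
```

Keeping the profiles in one table makes it easier to benchmark each of them against a production workload before switching.
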
## Reranking pipeline example

```python
def rerank(query: str, passages: list[str], top_k: int = 10) -> list[tuple[str, float]]:
    # Pair the query with every candidate, score all pairs, and keep the best top_k.
    pairs = [(query, passage) for passage in passages]
    scores = model.compute_score(pairs, tokenizer, normalize=True)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


query = "How does photosynthesis work?"
passages = [
    "Photosynthesis is the process by which plants convert sunlight into energy.",
    "The stock market closed higher today.",
    "Plants use chlorophyll to absorb light during photosynthesis.",
    "Python is a popular programming language.",
]

results = rerank(query, passages, top_k=2)
print(results)
```

## Evaluation snapshot

| Evaluation | Metric | Score |
| --- | --- | ---: |
| Long document, answer at start | Accuracy | 100% |
| Long document, answer at end | Accuracy | 100% |
| High-resource multilingual validation | Accuracy | 100% |
| Low-resource multilingual validation | Accuracy | 100% |
| BEIR SciFact | MRR | 94.9 |
| BEIR NFCorpus | MRR | 87.2 |
| BEIR HotpotQA | MRR | 100.0 |
| BEIR FiQA | MRR | 93.9 |

The long-document validation checks whether the reranker can still find relevant information when it appears late in a long passage. This is the main reason to use this model over short-window rerankers in long-context RAG and memory workflows.
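
For readers unfamiliar with the MRR metric in the table: mean reciprocal rank averages, over all queries, the value 1/rank of the first relevant passage in each ranking. The helper below is a minimal sketch with made-up toy data, not the evaluation code behind the numbers above; the table values appear to be this quantity multiplied by 100, which is an assumption.

```python
def mean_reciprocal_rank(rankings: list[list[str]], relevant: list[set[str]]) -> float:
    """MRR over queries: average of 1 / rank of the first relevant item (0 if none)."""
    if len(rankings) != len(relevant):
        raise ValueError("rankings and relevant must have the same length")
    total = 0.0
    for ranked_ids, relevant_ids in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranked_ids, start=1):
            if doc_id in relevant_ids:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(rankings) if rankings else 0.0


# Toy data (illustrative only, not taken from the evaluation above).
rankings = [["d1", "d2", "d3"], ["d5", "d4", "d6"]]
relevant = [{"d1"}, {"d4"}]

mrr = mean_reciprocal_rank(rankings, relevant)  # (1/1 + 1/2) / 2
print(mrr)
```
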
## Files

| File | Description |
| --- | --- |
| `model.safetensors` | ModernBERT encoder weights |
| `classification_heads.pt` | 2D Matryoshka reranking heads |
| `matryoshka_config.json` | Layer/dimension head configuration |
| `config.json` | ModernBERT configuration |
| `tokenizer.json` / `tokenizer_config.json` | Tokenizer assets |
| `training_args.json` | Training/export configuration snapshot |
| `README.md` | This model card |
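
Because the wrapper depends on several of these files being present, a quick check of the downloaded directory can catch an incomplete snapshot early. This is a small sketch using only the standard library; `missing_files` is a hypothetical helper, and the temporary directory stands in for the path returned by `snapshot_download`.

```python
import tempfile
from pathlib import Path

# File names as listed in the table above (README and training args omitted).
EXPECTED_FILES = (
    "model.safetensors",
    "classification_heads.pt",
    "matryoshka_config.json",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
)


def missing_files(local_dir: str) -> list[str]:
    """Return the expected files that are absent from local_dir."""
    root = Path(local_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# Demo against a temporary directory that contains only config.json.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text("{}")
    demo_missing = missing_files(tmp)

print(demo_missing)
```
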
## Lineage

This ModelScope package is published by `agentic-intelligence-lab` as part of the Elephant model release line. It mirrors the upstream Hugging Face model `llm-semantic-router/mmbert-rerank-32k-2d-matryoshka` and keeps the model artifacts unchanged; only the repository naming and the model card presentation differ.

The model is built from `llm-semantic-router/mmbert-32k-yarn`, a ModernBERT-based multilingual encoder extended to 32K context with YaRN position interpolation.

## Limitations

- This is a custom reranker export; the complete scoring path requires the upstream `Matryoshka2DReranker` wrapper or an equivalent implementation.
- Training data is primarily based on BGE-M3 style query-passage pairs, so specialized domains may benefit from fine-tuning.
- Although the model supports 32K tokens, very long query-passage pairs still increase compute and memory cost.
- Layer and dimension reduction trade quality for efficiency and should be validated for each production workload.
- For very short passages where latency is the only priority, a smaller short-window reranker may be faster.
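
One way to keep the cost of long pairs under control is to group pairs so that each padded batch stays within a token budget. The sketch below assumes token counts are already known (for example, from the tokenizer); `batch_by_token_budget` and the sample lengths are hypothetical and not part of the upstream wrapper.

```python
def batch_by_token_budget(lengths: list[int], max_tokens: int) -> list[list[int]]:
    """Group item indices so that count * longest item stays within max_tokens.

    Items are sorted by length first so that similar lengths share a batch,
    which reduces padding. An item longer than max_tokens gets its own batch.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches: list[list[int]] = []
    current: list[int] = []
    for idx in order:
        # Sorted ascending, so the newest item is the longest in the batch.
        padded_cost = (len(current) + 1) * lengths[idx]
        if current and padded_cost > max_tokens:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches


# Hypothetical token counts for five query-passage pairs.
lengths = [120, 30000, 90, 4000, 3500]
batches = batch_by_token_budget(lengths, max_tokens=8192)
print(batches)
```

Each inner list holds indices into the original pair list, so scores can be written back in the original order after the batches are processed.
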
## Citation

```bibtex
@misc{elephant-rerank-v1-text-small,
  title={Elephant Rerank V1 Text Small},
  author={Agentic Intelligence Lab},
  year={2026},
  url={https://modelscope.cn/models/agentic-intelligence-lab/elephant-rerank-v1-text-small}
}
```

## License

Apache 2.0