---
license: apache-2.0
library_name: transformers
pipeline_tag: text-classification
language:
  - multilingual
  - en
  - zh
  - de
  - fr
  - es
  - ru
  - ja
  - ko
  - ar
  - hi
tags:
  - agentic-intelligence-lab
  - elephant
  - rerank
  - reranker
  - cross-encoder
  - text-ranking
  - retrieval
  - rag
  - agents
  - routing
  - multilingual
  - matryoshka
  - 2d-matryoshka
  - long-context
  - modernbert
base_model: llm-semantic-router/mmbert-32k-yarn
datasets:
  - cfli/bge-m3-data
model-index:
  - name: elephant-rerank-v1-text-small
    results:
      - task:
          type: text-ranking
        dataset:
          name: Long document reranking validation
          type: synthetic_long_document_reranking
        metrics:
          - name: Answer at start accuracy
            type: accuracy
            value: 100
          - name: Answer at end accuracy
            type: accuracy
            value: 100
      - task:
          type: text-ranking
        dataset:
          name: BEIR short-document validation
          type: beir
        metrics:
          - name: SciFact MRR
            type: mrr
            value: 94.9
          - name: NFCorpus MRR
            type: mrr
            value: 87.2
          - name: HotpotQA MRR
            type: mrr
            value: 100.0
          - name: FiQA MRR
            type: mrr
            value: 93.9
---

# Elephant Rerank V1 Text Small

`elephant-rerank-v1-text-small` is the text reranker model in the **Agentic Intelligence Lab Elephant Rerank V1** family.

This ModelScope release is maintained by `agentic-intelligence-lab` to make Elephant rerank models easier to download and deploy in mainland China. It mirrors the upstream Hugging Face model `llm-semantic-router/mmbert-rerank-32k-2d-matryoshka` and renames it under a consistent Elephant model namespace.

## Positioning

This model is a multilingual long-context cross-encoder reranker for retrieval pipelines, agent memory systems, and RAG applications.

Embedding models typically handle fast candidate generation. A reranker runs after that stage and scores each query-document pair with higher precision. `elephant-rerank-v1-text-small` targets this second stage: it takes a query and a set of candidate passages and assigns relevance scores for the final ordering.

The model is especially useful when passages are longer than the 512-token window used by many rerankers, or when relevant information may appear late in a document.
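
The two-stage flow described above can be sketched in a few lines. This is a minimal illustration, not part of the Elephant release: `retrieve_then_rerank`, `token_overlap`, and `overlap_ratio` are hypothetical names, and the two toy scorers only stand in for a real embedding retriever and for this reranker.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps (query, document) to a relevance score; higher is better.
Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    corpus: Sequence[str],
    cheap_score: Scorer,
    precise_score: Scorer,
    n_candidates: int = 100,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    # Stage 1: keep only the n_candidates best documents under the cheap scorer.
    candidates = sorted(
        corpus, key=lambda doc: cheap_score(query, doc), reverse=True
    )[:n_candidates]
    # Stage 2: rescore the survivors with the expensive scorer and reorder.
    rescored = [(doc, precise_score(query, doc)) for doc in candidates]
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored[:top_k]


def token_overlap(query: str, doc: str) -> float:
    # Toy first stage (hypothetical): number of shared lowercase tokens.
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def overlap_ratio(query: str, doc: str) -> float:
    # Toy second stage (hypothetical): shared tokens divided by document size.
    doc_tokens = set(doc.lower().split())
    if not doc_tokens:
        return 0.0
    return len(set(query.lower().split()) & doc_tokens) / len(doc_tokens)


corpus = [
    "plants convert sunlight into energy",
    "the stock market closed higher",
    "sunlight",
]
results = retrieve_then_rerank(
    "how do plants use sunlight",
    corpus,
    token_overlap,
    overlap_ratio,
    n_candidates=2,
    top_k=2,
)
print(results)
```

In a real pipeline the first scorer would be a vector search and the second would call the reranker on the surviving candidates only, which keeps the expensive pair scoring bounded by `n_candidates`.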

## Model at a glance

| Item | Value |
| --- | --- |
| Family | Elephant Rerank V1 |
| Maintainer | Agentic Intelligence Lab |
| Model type | Text reranker / cross-encoder |
| Modalities | Text query + text passage |
| Languages | Multilingual |
| Architecture | ModernBERT cross-encoder with 2D Matryoshka heads |
| Base model | `llm-semantic-router/mmbert-32k-yarn` |
| Parameters | ~308M |
| Hidden size | 768 |
| Layers | 22 |
| Context length | 32,768 tokens |
| Pooling | CLS |
| Layer indices | 3, 6, 11, 22 |
| Dimension indices | 768, 512, 256, 128, 64 |
| Upstream source | `llm-semantic-router/mmbert-rerank-32k-2d-matryoshka` |
| License | Apache 2.0 |
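
The layer and dimension indices in the table define a grid of scoring profiles. The sketch below enumerates that grid under two assumptions that this card does not state explicitly: that every listed layer can be paired with every listed dimension, and that a head at layer L only needs the first L encoder layers. `head_profiles` is a hypothetical helper, not part of the upstream code.

```python
from itertools import product

# Values copied from the "Model at a glance" table.
LAYER_INDICES = (3, 6, 11, 22)
DIM_INDICES = (768, 512, 256, 128, 64)
FULL_LAYERS, FULL_DIM = 22, 768


def head_profiles() -> list:
    """List every (layer, dimension) pair with its size relative to the full model."""
    return [
        {
            "layer_idx": layer,
            "dim_idx": dim,
            # Rough proxy for encoder depth used (assumes early exit at `layer`).
            "layer_fraction": layer / FULL_LAYERS,
            # Fraction of the CLS vector that the head reads.
            "dim_fraction": dim / FULL_DIM,
        }
        for layer, dim in product(LAYER_INDICES, DIM_INDICES)
    ]


for profile in head_profiles():
    print(profile)
```

The fractions are only a starting point for choosing a profile; actual latency and quality should be measured on the target workload.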

## Why it fits agentic workloads

Agentic systems often retrieve many candidate memories, documents, tools, or execution traces before deciding what to use. The first retrieval stage needs to be fast; the final ordering stage needs to be precise. This reranker is designed for that final ordering stage.

Key advantages:

- **Long-context pair scoring**: score query-passage pairs with up to 32K tokens of context.
- **Useful after vector retrieval**: rerank candidates from Elephant embeddings or any other first-stage retriever.
- **2D Matryoshka flexibility**: use different layer and dimension heads to trade quality for cost.
- **Multilingual coverage**: suitable for mixed-language retrieval and international corpora.
- **Agent-friendly use cases**: memory selection, tool ranking, evidence ordering, and long-document RAG.

## Recommended use cases

| Scenario | Recommendation |
| --- | --- |
| Long-document RAG | Rerank retrieved chunks or longer passages before generation |
| Agent memory recall | Reorder memory candidates by query relevance |
| Tool and skill ranking | Rank candidate tools after broad semantic retrieval |
| Evidence selection | Pick the strongest supporting records for answer synthesis |
| Multilingual search | Rerank candidates from mixed-language corpora |
| Quality-speed tuning | Use 2D Matryoshka layer/dimension heads for runtime budgets |

## Quick start on ModelScope

```bash
pip install modelscope transformers torch
```

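This package contains the ModernBERT encoder weights plus the 2D Matryoshka classification heads. Loading the full reranker requires the custom reranker wrapper from the upstream training/export code. The `pip install` command above does not provide that wrapper, so the `train_rerank` module imported below must be obtained from the upstream code and be importable.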

```python
import torch
from modelscope import snapshot_download
from transformers import AutoTokenizer

# Use the reranker wrapper from the upstream training package.
# The wrapper is expected to load `model.safetensors`, `classification_heads.pt`,
# and `matryoshka_config.json` from the local model directory.
from train_rerank import Matryoshka2DReranker

repo_id = "agentic-intelligence-lab/elephant-rerank-v1-text-small"
local_dir = snapshot_download(repo_id)

model = Matryoshka2DReranker.from_pretrained(local_dir)
tokenizer = AutoTokenizer.from_pretrained(local_dir)

model.eval()
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

pairs = [
    (
        "What is machine learning?",
        "Machine learning is a subset of AI that enables systems to learn from data.",
    ),
    (
        "What is machine learning?",
        "The weather is sunny today.",
    ),
]

scores = model.compute_score(pairs, tokenizer, normalize=True)
print(scores)
```

## 2D Matryoshka scoring

The model provides multiple layer and dimension heads. This allows one checkpoint to serve several quality/cost profiles.

```python
# Reuses `model`, `tokenizer`, and `pairs` from the quick start.
# Full model: 22 layers, 768 dimensions
scores_full = model.compute_score(pairs, tokenizer, normalize=True)

# Balanced profile
scores_balanced = model.compute_score(
    pairs,
    tokenizer,
    layer_idx=11,
    dim_idx=256,
    normalize=True,
)

# Lower-cost profile
scores_fast = model.compute_score(
    pairs,
    tokenizer,
    layer_idx=6,
    dim_idx=128,
    normalize=True,
)
```
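
Conceptually, a 2D Matryoshka head reads a truncated CLS vector from an intermediate layer. The sketch below shows one plausible reading of that idea using plain Python lists. It is an assumption for illustration, not the upstream implementation: the truncation to the first `dim_idx` values, the linear head, the sigmoid normalization, and the names `score_from_cls` and `sigmoid` are all hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score_from_cls(cls_by_layer, heads, layer_idx, dim_idx):
    """Score one pair from per-layer CLS vectors (illustrative assumption).

    cls_by_layer: dict mapping layer index -> CLS vector (list of floats)
    heads: dict mapping (layer_idx, dim_idx) -> (weights, bias)
    """
    # Dimension axis: keep only the first dim_idx values of the CLS vector.
    cls = cls_by_layer[layer_idx][:dim_idx]
    weights, bias = heads[(layer_idx, dim_idx)]
    # Linear head on the truncated vector, then squash to (0, 1).
    logit = sum(w * x for w, x in zip(weights, cls)) + bias
    return sigmoid(logit)


# Toy numbers only; real vectors have up to 768 dimensions.
toy_cls = {6: [1.0, 2.0, 3.0, 4.0]}
toy_heads = {(6, 2): ([0.5, -0.25], 0.0)}
print(score_from_cls(toy_cls, toy_heads, layer_idx=6, dim_idx=2))
```

Under this reading, a smaller `layer_idx` shortens the encoder pass and a smaller `dim_idx` shrinks the head input, which is where the quality/cost trade-off comes from.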

## Reranking pipeline example

```python
def rerank(query: str, passages: list[str], top_k: int = 10) -> list[tuple[str, float]]:
    """Return the top_k (passage, score) pairs, highest score first."""
    # Relies on the `model` and `tokenizer` loaded in the quick start.
    pairs = [(query, passage) for passage in passages]
    scores = model.compute_score(pairs, tokenizer, normalize=True)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]

query = "How does photosynthesis work?"
passages = [
    "Photosynthesis is the process by which plants convert sunlight into energy.",
    "The stock market closed higher today.",
    "Plants use chlorophyll to absorb light during photosynthesis.",
    "Python is a popular programming language.",
]

results = rerank(query, passages, top_k=2)
print(results)
```

## Evaluation snapshot

| Evaluation | Metric | Score |
| --- | --- | ---: |
| Long document, answer at start | Accuracy | 100% |
| Long document, answer at end | Accuracy | 100% |
| High-resource multilingual validation | Accuracy | 100% |
| Low-resource multilingual validation | Accuracy | 100% |
| BEIR SciFact | MRR | 94.9 |
| BEIR NFCorpus | MRR | 87.2 |
| BEIR HotpotQA | MRR | 100.0 |
| BEIR FiQA | MRR | 93.9 |

The long-document validation checks whether the reranker can still find relevant information when it appears late in a long passage. This is the main reason to use this model over short-window rerankers in long-context RAG and memory workflows.
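
For readers unfamiliar with the MRR rows, the metric can be computed as follows. This is a generic sketch of mean reciprocal rank, not the evaluation script behind the table; `mean_reciprocal_rank` is a hypothetical helper, and the table above appears to report MRR on a 0 to 100 scale.

```python
def mean_reciprocal_rank(ranked_relevance) -> float:
    """Average of 1/rank of the first relevant result for each query.

    ranked_relevance: one list per query, holding relevance flags in the
    order the reranker returned the passages (best first).
    """
    if not ranked_relevance:
        return 0.0
    total = 0.0
    for flags in ranked_relevance:
        for rank, is_relevant in enumerate(flags, start=1):
            if is_relevant:
                total += 1.0 / rank
                break  # Only the first relevant hit counts.
    return total / len(ranked_relevance)


# Three toy queries: first relevant passage at rank 1, 2, and 3.
toy_rankings = [
    [True, False, False],
    [False, True, False],
    [False, False, True],
]
mrr = mean_reciprocal_rank(toy_rankings)
print(round(100 * mrr, 1))
```

A query with no relevant passage in its ranking contributes zero, so MRR rewards placing a relevant passage as close to the top as possible.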

## Files

| File | Description |
| --- | --- |
| `model.safetensors` | ModernBERT encoder weights |
| `classification_heads.pt` | 2D Matryoshka reranking heads |
| `matryoshka_config.json` | Layer/dimension head configuration |
| `config.json` | ModernBERT configuration |
| `tokenizer.json` / `tokenizer_config.json` | Tokenizer assets |
| `training_args.json` | Training/export configuration snapshot |
| `README.md` | This model card |
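
Because the wrapper expects several files side by side, a quick check after download can catch an incomplete snapshot early. The sketch below only uses the file names from the table above; `missing_files` is a hypothetical helper, not part of ModelScope or the upstream code.

```python
from pathlib import Path

# File names taken from the table above (README.md omitted on purpose).
EXPECTED_FILES = (
    "model.safetensors",
    "classification_heads.pt",
    "matryoshka_config.json",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "training_args.json",
)


def missing_files(model_dir) -> list:
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For example, calling `missing_files(local_dir)` on the directory returned by `snapshot_download` should give an empty list when the snapshot is complete.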

## Lineage

This ModelScope package is published by `agentic-intelligence-lab` as part of the Elephant model release line. It mirrors the upstream Hugging Face model `llm-semantic-router/mmbert-rerank-32k-2d-matryoshka` and leaves the model artifacts unchanged; only the repository name and the model card presentation differ.

The model is built from `llm-semantic-router/mmbert-32k-yarn`, a ModernBERT-based multilingual encoder extended to 32K context with YaRN position interpolation.

## Limitations

- This is a custom reranker export; the complete scoring path requires the upstream `Matryoshka2DReranker` wrapper or an equivalent implementation.
- Training data is primarily based on BGE-M3 style query-passage pairs, so specialized domains may benefit from fine-tuning.
- Although the model supports 32K tokens, very long query-passage pairs still increase compute and memory cost.
- Layer and dimension reduction trade quality for efficiency and should be validated for each production workload.
- For very short passages where latency is the only priority, a smaller short-window reranker may be faster.
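
One simple way to validate a reduced layer/dimension profile is to compare its top-k selection against the full model on the same candidates. The sketch below is a generic agreement check, not an upstream utility; `top_k_overlap` is a hypothetical name and the score lists are made-up numbers.

```python
def top_k_overlap(reference_scores, candidate_scores, k: int) -> float:
    """Fraction of the reference top-k that the candidate profile also selects."""
    if len(reference_scores) != len(candidate_scores):
        raise ValueError("score lists must cover the same passages")
    k = min(k, len(reference_scores))
    if k <= 0:
        return 1.0  # Nothing to compare.

    def top_indices(scores):
        # Indices of the k highest scores.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return set(order[:k])

    shared = top_indices(reference_scores) & top_indices(candidate_scores)
    return len(shared) / k


# Made-up scores for four passages under two profiles.
full_scores = [0.9, 0.1, 0.8, 0.3]
fast_scores = [0.7, 0.4, 0.2, 0.6]
print(top_k_overlap(full_scores, fast_scores, k=2))
```

Averaging this overlap over a sample of production queries gives a quick signal of how much ranking quality a cheaper profile gives up before committing to it.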

## Citation

```bibtex
@misc{elephant-rerank-v1-text-small,
  title={Elephant Rerank V1 Text Small},
  author={Agentic Intelligence Lab},
  year={2026},
  url={https://modelscope.cn/models/agentic-intelligence-lab/elephant-rerank-v1-text-small}
}
```

## License

Apache 2.0