Mirror agentic-intelligence-lab/elephant-rerank-v1-text-small from ModelScope

	---
	license: apache-2.0
	library_name: transformers
	pipeline_tag: text-classification
	language:
	- multilingual
	- en
	- zh
	- de
	- fr
	- es
	- ru
	- ja
	- ko
	- ar
	- hi
	tags:
	- agentic-intelligence-lab
	- elephant
	- rerank
	- reranker
	- cross-encoder
	- text-ranking
	- retrieval
	- rag
	- agents
	- routing
	- multilingual
	- matryoshka
	- 2d-matryoshka
	- long-context
	- modernbert
	base_model: llm-semantic-router/mmbert-32k-yarn
	datasets:
	- cfli/bge-m3-data
	model-index:
	- name: elephant-rerank-v1-text-small
	  results:
	  - task:
	      type: text-ranking
	    dataset:
	      name: Long document reranking validation
	      type: synthetic_long_document_reranking
	    metrics:
	    - name: Answer at start accuracy
	      type: accuracy
	      value: 100
	    - name: Answer at end accuracy
	      type: accuracy
	      value: 100
	  - task:
	      type: text-ranking
	    dataset:
	      name: BEIR short-document validation
	      type: beir
	    metrics:
	    - name: SciFact MRR
	      type: mrr
	      value: 94.9
	    - name: NFCorpus MRR
	      type: mrr
	      value: 87.2
	    - name: HotpotQA MRR
	      type: mrr
	      value: 100.0
	    - name: FiQA MRR
	      type: mrr
	      value: 93.9
	---

	# Elephant Rerank V1 Text Small

	`elephant-rerank-v1-text-small` is the text reranker model in the Agentic Intelligence Lab Elephant Rerank V1 family.

	This ModelScope release is maintained by `agentic-intelligence-lab` to make Elephant rerank models easier to download and deploy in mainland China. It mirrors the upstream Hugging Face model `llm-semantic-router/mmbert-rerank-32k-2d-matryoshka` and renames it under a consistent Elephant model namespace.

	## Positioning

	This model is a multilingual long-context cross-encoder reranker for retrieval pipelines, agent memory systems, and RAG applications.

	Embedding models are usually used for fast candidate generation. A reranker is used after that stage to score query-document pairs with higher precision. `elephant-rerank-v1-text-small` is designed for the second stage: take a query and a set of candidate passages, then assign relevance scores for final ordering.

	The model is especially useful when passages are longer than the 512-token window used by many rerankers, or when relevant information may appear late in a document.
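
The two-stage flow described above can be sketched as follows. This is a minimal, self-contained illustration rather than the release's own pipeline: the toy two-dimensional embeddings, the `cosine` helper, and the `overlap_score` pair scorer are hypothetical stand-ins. In practice the `score_pair` callable would wrap the reranker's scoring call.

```python
import math
from typing import Callable, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    query_vec: Sequence[float],
    corpus: list[tuple[str, Sequence[float]]],
    score_pair: Callable[[str, str], float],
    n_candidates: int = 3,
    top_k: int = 2,
) -> list[tuple[str, float]]:
    # Stage 1: cheap vector similarity narrows the corpus to a few candidates.
    by_similarity = sorted(
        corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True
    )
    candidates = by_similarity[:n_candidates]
    # Stage 2: the more expensive pair scorer decides the final order.
    scored = [(text, score_pair(query, text)) for text, _ in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Hypothetical pair scorer for demonstration only: query word overlap.
def overlap_score(query: str, passage: str) -> float:
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


corpus = [
    ("plants convert sunlight into energy", [0.9, 0.1]),
    ("how plants absorb sunlight", [0.8, 0.3]),
    ("stock market news", [0.1, 0.9]),
]
results = retrieve_then_rerank(
    "how plants absorb sunlight", [1.0, 0.2], corpus, overlap_score
)
print(results)
```

Keeping the scorer behind a plain callable means the first-stage retriever and the reranker can be swapped independently.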

	## Model at a glance

	| Item | Value |
	| --- | --- |
	| Family | Elephant Rerank V1 |
	| Maintainer | Agentic Intelligence Lab |
	| Model type | Text reranker / cross-encoder |
	| Modalities | Text query + text passage |
	| Languages | Multilingual |
	| Architecture | ModernBERT cross-encoder with 2D Matryoshka heads |
	| Base model | `llm-semantic-router/mmbert-32k-yarn` |
	| Parameters | ~308M |
	| Hidden size | 768 |
	| Layers | 22 |
	| Context length | 32,768 tokens |
	| Pooling | CLS |
	| Layer indices | 3, 6, 11, 22 |
	| Dimension indices | 768, 512, 256, 128, 64 |
	| Upstream source | `llm-semantic-router/mmbert-rerank-32k-2d-matryoshka` |
	| License | Apache 2.0 |

	## Why it fits agentic workloads

	Agentic systems often retrieve many candidate memories, documents, tools, or execution traces before deciding what to use. The first retrieval stage needs to be fast; the final ordering stage needs to be precise. This reranker is designed for that final ordering stage.

	Key advantages:

	- Long-context pair scoring: score query-passage pairs with up to 32K tokens of context.
	- Useful after vector retrieval: rerank candidates from Elephant embeddings or any other first-stage retriever.
	- 2D Matryoshka flexibility: use different layer and dimension heads to trade quality for cost.
	- Multilingual coverage: suitable for mixed-language retrieval and international corpora.
	- Agent-friendly use cases: memory selection, tool ranking, evidence ordering, and long-document RAG.

	## Recommended use cases

	| Scenario | Recommendation |
	| --- | --- |
	| Long-document RAG | Rerank retrieved chunks or longer passages before generation |
	| Agent memory recall | Reorder memory candidates by query relevance |
	| Tool and skill ranking | Rank candidate tools after broad semantic retrieval |
	| Evidence selection | Pick the strongest supporting records for answer synthesis |
	| Multilingual search | Rerank candidates from mixed-language corpora |
	| Quality-speed tuning | Use 2D Matryoshka layer/dimension heads for runtime budgets |

	## Quick start on ModelScope

	```bash
	pip install modelscope transformers torch
	```

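	This package contains the ModernBERT encoder weights plus the 2D Matryoshka classification heads. Loading the full reranker requires the custom reranker wrapper used by the upstream training/export code. The `pip` command above does not install that wrapper, so the `train_rerank` module imported in the next snippet must be obtained from the upstream project and be importable from your environment.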

	```python
	import torch
	from modelscope import snapshot_download
	from transformers import AutoTokenizer

	# Use the reranker wrapper from the upstream training package.
	# The wrapper is expected to load `model.safetensors`, `classification_heads.pt`,
	# and `matryoshka_config.json` from the local model directory.
	from train_rerank import Matryoshka2DReranker

	repo_id = "agentic-intelligence-lab/elephant-rerank-v1-text-small"
	local_dir = snapshot_download(repo_id)

	model = Matryoshka2DReranker.from_pretrained(local_dir)
	tokenizer = AutoTokenizer.from_pretrained(local_dir)

	model.eval()
	device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
	model = model.to(device)

	pairs = [
	    (
	        "What is machine learning?",
	        "Machine learning is a subset of AI that enables systems to learn from data.",
	    ),
	    (
	        "What is machine learning?",
	        "The weather is sunny today.",
	    ),
	]

	scores = model.compute_score(pairs, tokenizer, normalize=True)
	print(scores)
	```

	## 2D Matryoshka scoring

	The model provides multiple layer and dimension heads. This allows one checkpoint to serve several quality/cost profiles.

	```python
	# Full model: 22 layers, 768 dimensions
	scores_full = model.compute_score(pairs, tokenizer, normalize=True)

	# Balanced profile
	scores_balanced = model.compute_score(
	    pairs,
	    tokenizer,
	    layer_idx=11,
	    dim_idx=256,
	    normalize=True,
	)

	# Lower-cost profile
	scores_fast = model.compute_score(
	    pairs,
	    tokenizer,
	    layer_idx=6,
	    dim_idx=128,
	    normalize=True,
	)
	```
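
Only the layer and dimension combinations listed in the "Model at a glance" table have heads, so it may help to validate a requested profile before scoring. The sketch below takes its indices from that table. The `Profile` class and its `relative_encoder_depth` method are hypothetical helpers, and the depth ratio is only a rough proxy for cost: it assumes the wrapper actually stops computation at the chosen layer.

```python
from dataclasses import dataclass

# Head indices as listed in this model card.
LAYER_INDICES = (3, 6, 11, 22)
DIM_INDICES = (768, 512, 256, 128, 64)
TOTAL_LAYERS = 22


@dataclass(frozen=True)
class Profile:
    layer_idx: int
    dim_idx: int

    def __post_init__(self) -> None:
        # Reject combinations that have no corresponding head.
        if self.layer_idx not in LAYER_INDICES:
            raise ValueError(
                f"layer_idx must be one of {LAYER_INDICES}, got {self.layer_idx}"
            )
        if self.dim_idx not in DIM_INDICES:
            raise ValueError(
                f"dim_idx must be one of {DIM_INDICES}, got {self.dim_idx}"
            )

    def relative_encoder_depth(self) -> float:
        # Fraction of encoder layers evaluated, assuming early exit at layer_idx.
        return self.layer_idx / TOTAL_LAYERS


balanced = Profile(layer_idx=11, dim_idx=256)
fast = Profile(layer_idx=6, dim_idx=128)
print(balanced.relative_encoder_depth())  # 0.5
print(round(fast.relative_encoder_depth(), 3))
```

A validated profile can then be passed on as `layer_idx=profile.layer_idx, dim_idx=profile.dim_idx` in the `compute_score` calls shown above.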

	## Reranking pipeline example

	```python
	def rerank(query: str, passages: list[str], top_k: int = 10) -> list[tuple[str, float]]:
	    pairs = [(query, passage) for passage in passages]
	    scores = model.compute_score(pairs, tokenizer, normalize=True)
	    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
	    return ranked[:top_k]

	query = "How does photosynthesis work?"
	passages = [
	    "Photosynthesis is the process by which plants convert sunlight into energy.",
	    "The stock market closed higher today.",
	    "Plants use chlorophyll to absorb light during photosynthesis.",
	    "Python is a popular programming language.",
	]

	results = rerank(query, passages, top_k=2)
	print(results)
	```

	## Evaluation snapshot

	| Evaluation | Metric | Score |
	| --- | --- | ---: |
	| Long document, answer at start | Accuracy | 100% |
	| Long document, answer at end | Accuracy | 100% |
	| High-resource multilingual validation | Accuracy | 100% |
	| Low-resource multilingual validation | Accuracy | 100% |
	| BEIR SciFact | MRR | 94.9 |
	| BEIR NFCorpus | MRR | 87.2 |
	| BEIR HotpotQA | MRR | 100.0 |
	| BEIR FiQA | MRR | 93.9 |

	The long-document validation checks whether the reranker can still find relevant information when it appears late in a long passage. This is the main reason to use this model over short-window rerankers in long-context RAG and memory workflows.
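
For reference, mean reciprocal rank (MRR) averages, over queries, the reciprocal of the rank at which the first relevant passage appears. The sketch below uses made-up relevance labels purely to illustrate the metric. It is not the evaluation script behind the numbers above, and it returns a value in the 0-1 range, whereas the table appears to report MRR scaled to 0-100.

```python
def mean_reciprocal_rank(ranked_relevance: list[list[bool]]) -> float:
    """Each inner list marks, in ranked order, whether a passage is relevant."""
    total = 0.0
    for labels in ranked_relevance:
        for rank, is_relevant in enumerate(labels, start=1):
            if is_relevant:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ranked_relevance) if ranked_relevance else 0.0


# Illustrative labels: first relevant hit at rank 1, rank 2, and rank 4.
example = [
    [True, False, False],
    [False, True, False],
    [False, False, False, True],
]
print(mean_reciprocal_rank(example))  # (1 + 1/2 + 1/4) / 3
```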

	## Files

	| File | Description |
	| --- | --- |
	| `model.safetensors` | ModernBERT encoder weights |
	| `classification_heads.pt` | 2D Matryoshka reranking heads |
	| `matryoshka_config.json` | Layer/dimension head configuration |
	| `config.json` | ModernBERT configuration |
	| `tokenizer.json` / `tokenizer_config.json` | Tokenizer assets |
	| `training_args.json` | Training/export configuration snapshot |
	| `README.md` | This model card |
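
A quick way to confirm that a download is complete is to check for the files the scoring path depends on. This is a minimal sketch: the `missing_files` helper is hypothetical, the file names are taken from the table above, and `local_dir` is assumed to be the directory returned by `snapshot_download`.

```python
from pathlib import Path

# File names taken from the table in this model card.
EXPECTED_FILES = (
    "model.safetensors",
    "classification_heads.pt",
    "matryoshka_config.json",
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
)


def missing_files(local_dir: str) -> list[str]:
    """Return the expected file names that are absent from local_dir."""
    root = Path(local_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list from `missing_files(local_dir)` suggests the snapshot contains everything the wrapper is expected to load. A non-empty list names what to re-download.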

	## Lineage

	This ModelScope package is published by `agentic-intelligence-lab` as part of the Elephant model release line. It mirrors the upstream Hugging Face model `llm-semantic-router/mmbert-rerank-32k-2d-matryoshka` and keeps the model artifacts unchanged; only the repository name and the model card presentation differ.

	The model is built from `llm-semantic-router/mmbert-32k-yarn`, a ModernBERT-based multilingual encoder extended to 32K context with YaRN position interpolation.

	## Limitations

	- This is a custom reranker export; the complete scoring path requires the upstream `Matryoshka2DReranker` wrapper or an equivalent implementation.
	- Training data is primarily based on BGE-M3-style query-passage pairs, so specialized domains may benefit from fine-tuning.
	- Although the model supports 32K tokens, very long query-passage pairs still increase compute and memory cost.
	- Layer and dimension reduction trade quality for efficiency and should be validated for each production workload.
	- For very short passages where latency is the only priority, a smaller short-window reranker may be faster.
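
Because cost grows with pair length, one common mitigation is to group pairs into batches under a length budget so that a few very long pairs do not dominate memory. The sketch below is an illustration under stated assumptions: `batch_by_budget` is a hypothetical helper, and lengths are approximated by whitespace word counts rather than real tokenizer output.

```python
def batch_by_budget(
    pairs: list[tuple[str, str]],
    max_units_per_batch: int,
) -> list[list[int]]:
    """Group pair indices into batches whose padded size stays under a budget.

    Padded size is estimated as (longest pair in batch) * (number of pairs),
    using whitespace word counts as a rough stand-in for token counts.
    A single pair longer than the budget still gets a batch of its own.
    """
    lengths = [len(q.split()) + len(p.split()) for q, p in pairs]
    order = sorted(range(len(pairs)), key=lambda i: lengths[i])  # shortest first

    batches: list[list[int]] = []
    current: list[int] = []
    for idx in order:
        # Sorted ascending, so the newest pair is the longest in the batch.
        padded = lengths[idx] * (len(current) + 1)
        if current and padded > max_units_per_batch:
            batches.append(current)
            current = []
        current.append(idx)
    if current:
        batches.append(current)
    return batches


demo_pairs = [
    ("q", "short passage"),
    ("q", "another short one"),
    ("q", " ".join(["word"] * 50)),
]
print(batch_by_budget(demo_pairs, max_units_per_batch=10))  # [[0, 1], [2]]
```

Since the function returns indices rather than the pairs themselves, scores computed per batch can be written back by index to restore the original candidate order.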

	## Citation

	```bibtex
	@misc{elephant-rerank-v1-text-small,
	  title={Elephant Rerank V1 Text Small},
	  author={Agentic Intelligence Lab},
	  year={2026},
	  url={https://modelscope.cn/models/agentic-intelligence-lab/elephant-rerank-v1-text-small}
	}
	```

	## License

	Apache 2.0