---
library_name: transformers
license: cc-by-nc-sa-4.0
pipeline_tag: text-ranking
---

<div align="center">

# Contextual AI Reranker v2 1B

<img src="Contextual_AI_Brand_Mark_Dark.png" width="10%" alt="Contextual_AI"/>

[Blog post](https://contextual.ai/blog/rerank-v2) · [Model collection](https://huggingface.co/collections/ContextualAI/contextual-ai-reranker-v2)

</div>

<hr>

## Highlights

Contextual AI's reranker is the **first instruction-following reranker** capable of handling retrieval conflicts and ranking with custom instructions (e.g., prioritizing recent information). It achieves state-of-the-art performance on BEIR and sits on the cost/performance Pareto frontier across:

- Instruction following
- Question answering
- Multilinguality (100+ languages)
- Product search & recommendation
- Real-world use cases

<p align="center">
<img src="main_benchmark.png" width="1200"/>
</p>

For detailed benchmarks, see our [blog post](https://contextual.ai/blog/rerank-v2).

## Overview

- **Model Type**: Text Reranking
- **Supported Languages**: 100+
- **Parameters**: 1B
- **Context Length**: up to 32K tokens

## When to Use This Model

Use this reranker when you need to:

- Re-rank retrieved documents with custom instructions
- Handle conflicting information in retrieval results
- Prioritize documents by recency or other criteria
- Support multilingual search (100+ languages)
- Process long contexts (up to 32K tokens)

## Quickstart

### Basic Usage

```python
# Choose vLLM (recommended for production) or Transformers (simpler setup)
# See full implementation in sections below
model_path = "ContextualAI/ctxl-rerank-v2-instruct-multilingual-1b"

query = "What are the health benefits of exercise?"
instruction = "Prioritize recent medical research"
documents = [
    "Regular exercise reduces risk of heart disease and improves mental health.",
    "A 2024 study shows exercise enhances cognitive function in older adults.",
    "Ancient Greeks valued physical fitness for military training.",
]

# Using vLLM (see full code below):
infer_w_vllm(model_path, query, instruction, documents)

# OR using Transformers (see full code below):
infer_w_hf(model_path, query, instruction, documents)
```

**Expected Output:**

```
Query: What are the health benefits of exercise?
Instruction: Prioritize recent medical research
Score: 0.5039 | Doc: A 2024 study shows exercise enhances cognitive function in older adults.
Score: -0.8398 | Doc: Regular exercise reduces risk of heart disease and improves mental health.
Score: -9.3125 | Doc: Ancient Greeks valued physical fitness for military training.
```

### vLLM Usage (Recommended for Production)

Requires `vllm==0.10.0` for NVFP4 or `vllm>=0.8.5` for BF16.

```python
import os

os.environ["VLLM_USE_V1"] = "0"  # v1 engine doesn't support logits processors yet; set before importing vllm

import torch
from vllm import LLM, SamplingParams


def logits_processor(_, scores):
    """Custom logits processor for vLLM reranking.

    The relevance score is the bf16 logit at vocabulary index 0. Its raw
    16 bits are reinterpreted as an integer token id and every other token
    is masked to -inf, so greedy decoding emits exactly that id. This assumes
    the vocabulary has at least 65,536 entries, so every 16-bit pattern is a
    valid token id.
    """
    index = scores[0].view(torch.uint16)
    scores = torch.full_like(scores, float("-inf"))
    scores[index] = 1
    return scores


def format_prompts(query: str, instruction: str, documents: list[str]) -> list[str]:
    """Format query and documents into prompts for reranking."""
    if instruction:
        instruction = f" {instruction}"
    prompts = []
    for doc in documents:
        prompt = f"Check whether a given document contains information helpful to answer the query.\n<Document> {doc}\n<Query> {query}{instruction} ??"
        prompts.append(prompt)
    return prompts


def infer_w_vllm(model_path: str, query: str, instruction: str, documents: list[str]):
    model = LLM(
        model=model_path,
        gpu_memory_utilization=0.85,
        max_model_len=8192,
        dtype="bfloat16",  # the score encoding in logits_processor relies on bf16 logits
        max_logprobs=2,
        max_num_batched_tokens=262144,
    )
    sampling_params = SamplingParams(
        temperature=0,  # greedy: always pick the single unmasked token
        max_tokens=1,  # one generated token carries the score
        logits_processors=[logits_processor],
    )

    prompts = format_prompts(query, instruction, documents)
    outputs = model.generate(prompts, sampling_params, use_tqdm=False)

    # Extract scores: reinterpret each generated token id as bf16 bits
    results = []
    for i, output in enumerate(outputs):
        score = (
            torch.tensor([output.outputs[0].token_ids[0]], dtype=torch.uint16)
            .view(torch.bfloat16)
            .item()
        )
        results.append((score, i, documents[i]))

    # Sort by score (descending)
    results = sorted(results, key=lambda x: x[0], reverse=True)

    print(f"Query: {query}")
    print(f"Instruction: {instruction}")
    for score, doc_id, doc in results:
        print(f"Score: {score:.4f} | Doc: {doc}")


# Example usage
if __name__ == "__main__":
    model_path = "ContextualAI/ctxl-rerank-v2-instruct-multilingual-1b"
    query = "What are the health benefits of exercise?"
    instruction = "Prioritize recent medical research"
    documents = [
        "Regular exercise reduces risk of heart disease and improves mental health.",
        "A 2024 study shows exercise enhances cognitive function in older adults.",
        "Ancient Greeks valued physical fitness for military training.",
    ]

    infer_w_vllm(model_path, query, instruction, documents)
```
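
The vLLM path never returns a float directly: `logits_processor` smuggles the score out through the generated token id. The sketch below reproduces that round trip in plain Python with `struct` instead of torch, so it runs without a GPU. It relies on bfloat16 being the upper 16 bits of an IEEE-754 float32. The helper names `bf16_to_token_id` and `token_id_to_bf16` are hypothetical, and the sample values are assumed to be the exact bf16 numbers behind the scores printed in the expected output.

```python
import struct


def bf16_to_token_id(score: float) -> int:
    """Return the 16-bit pattern of a bfloat16 value as an unsigned int.

    bfloat16 is the top half of an IEEE-754 float32, so for values exactly
    representable in bf16 this mirrors `scores[0].view(torch.uint16)`.
    """
    (bits32,) = struct.unpack(">I", struct.pack(">f", score))
    if bits32 & 0xFFFF:
        # Low 16 bits set: the value would need rounding to fit in bf16
        raise ValueError(f"{score!r} is not exactly representable in bfloat16")
    return bits32 >> 16


def token_id_to_bf16(token_id: int) -> float:
    """Inverse mapping, mirroring `.view(torch.bfloat16)` on the generated id."""
    (value,) = struct.unpack(">f", struct.pack(">I", token_id << 16))
    return value


# Assumed exact bf16 values behind the printed scores 0.5039, -0.8398, -9.3125
for score in (0.50390625, -0.83984375, -9.3125):
    token_id = bf16_to_token_id(score)
    assert token_id_to_bf16(token_id) == score  # lossless round trip
    print(f"score={score:+.4f} -> token id {token_id}")
```

Note that the ids do not sort like the scores: negative values set the sign bit, so -9.3125 maps to a larger id than 0.50390625. That is why `infer_w_vllm` decodes each id back to bf16 before sorting.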

### Transformers Usage (Simpler Setup)

Requires `transformers>=4.51.0` for BF16. Not supported for NVFP4.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


def format_prompts(query: str, instruction: str, documents: list[str]) -> list[str]:
    """Format query and documents into prompts for reranking."""
    if instruction:
        instruction = f" {instruction}"
    prompts = []
    for doc in documents:
        prompt = f"Check whether a given document contains information helpful to answer the query.\n<Document> {doc}\n<Query> {query}{instruction} ??"
        prompts.append(prompt)
    return prompts


def infer_w_hf(model_path: str, query: str, instruction: str, documents: list[str]):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token
    tokenizer.padding_side = "left"  # so -1 is the real last token for all prompts

    model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=dtype).to(device)
    model.eval()

    prompts = format_prompts(query, instruction, documents)
    enc = tokenizer(
        prompts,
        return_tensors="pt",
        padding=True,
        truncation=True,
    )
    input_ids = enc["input_ids"].to(device)
    attention_mask = enc["attention_mask"].to(device)

    with torch.no_grad():
        out = model(input_ids=input_ids, attention_mask=attention_mask)

    next_logits = out.logits[:, -1, :]  # [batch, vocab]
    # The score is the logit at vocabulary index 0, rounded to bf16 to match the vLLM path
    scores_bf16 = next_logits[:, 0].to(torch.bfloat16)
    scores = scores_bf16.float().tolist()

    # Sort by score (descending)
    results = sorted([(s, i, documents[i]) for i, s in enumerate(scores)], key=lambda x: x[0], reverse=True)

    print(f"Query: {query}")
    print(f"Instruction: {instruction}")
    for score, doc_id, doc in results:
        print(f"Score: {score:.4f} | Doc: {doc}")


# Example usage
if __name__ == "__main__":
    model_path = "ContextualAI/ctxl-rerank-v2-instruct-multilingual-1b"
    query = "What are the health benefits of exercise?"
    instruction = "Prioritize recent medical research"
    documents = [
        "Regular exercise reduces risk of heart disease and improves mental health.",
        "A 2024 study shows exercise enhances cognitive function in older adults.",
        "Ancient Greeks valued physical fitness for military training.",
    ]

    infer_w_hf(model_path, query, instruction, documents)
```
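
`infer_w_hf` reads every score from `out.logits[:, -1, :]`, which is only correct if the final position of each row is that prompt's last real token. The toy sketch below uses hypothetical integer token ids and a hypothetical pad id of `-1` rather than a real tokenizer; `pad_batch` is an illustrative helper, not a Transformers API. It shows why the code sets `tokenizer.padding_side = "left"`.

```python
def pad_batch(seqs: list[list[int]], pad_id: int, side: str) -> list[list[int]]:
    """Pad token-id sequences to a common length on the given side."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    width = max(len(s) for s in seqs)
    padded = []
    for s in seqs:
        fill = [pad_id] * (width - len(s))
        padded.append(fill + s if side == "left" else s + fill)
    return padded


PAD = -1  # hypothetical pad id for this illustration
batch = [[11, 12, 13, 14], [21, 22]]  # two prompts of different lengths
last_tokens = [s[-1] for s in batch]  # each prompt's true final token

left = pad_batch(batch, pad_id=PAD, side="left")
right = pad_batch(batch, pad_id=PAD, side="right")

# Column -1 (what logits[:, -1, :] reads) holds each prompt's real last token
# only under left padding; with right padding the shorter prompt ends in a pad.
print([row[-1] for row in left])
print([row[-1] for row in right])
```

With right padding, the shorter prompt's score would be read from a padding position instead of the position after `??`.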

## Citation

If you use this model, please cite:

```bibtex
@misc{ctxl_rerank_v2_instruct_multilingual,
  title={Contextual AI Reranker v2},
  author={Halal, George and Agrawal, Sheshansh},
  year={2025},
  url={https://contextual.ai/blog/rerank-v2},
}
```

## License

Creative Commons Attribution Non Commercial Share Alike 4.0 (cc-by-nc-sa-4.0)

## Contact

For questions or issues, please open an issue on the model repository.