---
license: apache-2.0
base_model: Qwen/Qwen3-Reranker-4B
tags:
- code-search
- reranker
- code-retrieval
- peft
- lora
language:
- en
- code
datasets:
- hq-bench/coreb
pipeline_tag: text-classification
library_name: transformers
---
[Project Page](https://hq-bench.github.io/coreb-page/) | [Paper](https://arxiv.org/abs/2605.04615) | [Dataset](https://huggingface.co/datasets/hq-bench/coreb) | [GitHub](https://github.com/hq-bench/coreb)
# CoREB-Reranker
**CoREB-Reranker** is a code reranker fine-tuned from [Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B) via LoRA on a mixed reranker corpus. It is the **only reranker we evaluate that improves nDCG@10 on all three code search tasks** (text-to-code, code-to-text, and code-to-code).
## Highlights
- Fine-tuned from Qwen3-Reranker-4B using LoRA (rank=16, alpha=16) on **~3.1M binary reranking examples** from a mixed corpus
- Evaluated on CoREB v202603, which is problem-disjoint from the CoREB v202602 training split to avoid data leakage
- Achieves **positive reranking delta on all three tasks**, unlike all off-the-shelf rerankers tested
## Reranking Results (nDCG@10 Delta %)
Change in nDCG@10 (%) after reranking on CoREB v202603, using C2LLM-7B as the first-stage retriever. Positive values mean the reranker improved on the first-stage ranking:
| Reranker | Text-to-Code | Code-to-Text | Code-to-Code |
|----------|:---:|:---:|:---:|
| Jina Reranker v2 | -8.3 | -22.4 | -8.8 |
| Jina Reranker v3 | -2.2 | -5.0 | -0.1 |
| Qwen3-Reranker-0.6B | -0.6 | -8.2 | -2.3 |
| Qwen3-Reranker-4B | -0.1 | -3.2 | +3.3 |
| **CoREB-Reranker (ours)** | **+1.1** | **+0.8** | **+5.1** |
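To make the metric concrete, here is a minimal nDCG@10 sketch under binary relevance (a relevant document has gain 1, all others 0). This card does not state whether CoREB uses graded or binary labels, or whether "Delta %" is an absolute difference in points or a relative change, so the sketch simply reports the absolute difference for one toy query. The function names are illustrative, not part of the CoREB code.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k positions (rank 1 is discounted by log2(2) = 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, relevant_ids, k=10):
    """Binary-relevance nDCG@k: DCG of the given ranking divided by the DCG of an ideal ranking."""
    gains = [1.0 if doc_id in relevant_ids else 0.0 for doc_id in ranked_ids]
    ideal_dcg = dcg_at_k([1.0] * min(len(relevant_ids), k), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy query: the only relevant document "d1" moves from rank 2 to rank 1 after reranking.
first_stage = ["d3", "d1", "d2"]
reranked = ["d1", "d3", "d2"]
relevant = {"d1"}
before = ndcg_at_k(first_stage, relevant)
after = ndcg_at_k(reranked, relevant)
print(f"nDCG@10 before={before:.4f} after={after:.4f} delta={100 * (after - before):+.1f} points")
```

In a benchmark setting the per-query scores are averaged over all queries before taking the difference between the reranked and first-stage runs.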
## Training Details
- **Base model**: [Qwen/Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B)
- **Method**: LoRA (rank=16, alpha=16, dropout=0.05)
- **Target modules**: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- **Training data**: A mixed reranker corpus consisting of [CoREB v202602](https://huggingface.co/datasets/hq-bench/coreb), [CodeSearchNet](https://github.com/github/CodeSearchNet) (code-to-code, code-to-text, text-to-code), [APPS](https://github.com/hendrycks/apps), [CosQA](https://github.com/Jun-jie-Huang/CosQA), and [CodeFeedback](https://github.com/OpenCodeInterpreter/OpenCodeInterpreter) (single-turn and multi-turn). Each record is normalized into binary reranking examples (instruction, query, document, yes/no). Positives are duplicated twice; one easy negative and one hard negative are sampled per record.
- **Evaluation data**: CoREB v202603 (problem-disjoint from CoREB v202602 training split; covers a different contest time window)
- **Training samples**: ~3.1M binary reranking examples across text-to-code, code-to-text, and code-to-code tasks
- **Top-k retrieval for reranking**: 128
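The normalization step described under "Training data" can be sketched as follows. This is a minimal illustration, not the CoREB training code: the `Record`/`Example` types and `build_examples` are hypothetical, easy negatives are assumed to be the positive of another randomly chosen record, hard negatives are assumed to come from a per-record `hard_negatives` pool, and "positives are duplicated twice" is read here as two copies of each positive example.

```python
import random
from dataclasses import dataclass, field

# Instruction reused from the Usage section; the actual training instructions may differ per task.
INSTRUCT = "Given a code search query, does the following code snippet match the query intent?"


@dataclass
class Record:
    """Hypothetical normalized source record (field names are illustrative)."""
    instruction: str
    query: str
    positive: str
    hard_negatives: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class Example:
    """One binary reranking example: (instruction, query, document, yes/no)."""
    instruction: str
    query: str
    document: str
    label: str  # "yes" or "no"


def build_examples(records: list[Record], positive_copies: int = 2, seed: int = 0) -> list[Example]:
    """Expand each record into binary examples under the assumptions stated above."""
    rng = random.Random(seed)
    examples: list[Example] = []
    n = len(records)
    for i, rec in enumerate(records):
        for _ in range(positive_copies):
            examples.append(Example(rec.instruction, rec.query, rec.positive, "yes"))
        if n > 1:
            # Easy negative (assumed): pick an index uniformly from all records except i.
            j = rng.randrange(n - 1)
            j = j + 1 if j >= i else j
            examples.append(Example(rec.instruction, rec.query, records[j].positive, "no"))
        if rec.hard_negatives:
            # Hard negative (assumed): a near-miss document for this specific query.
            examples.append(Example(rec.instruction, rec.query, rng.choice(rec.hard_negatives), "no"))
    rng.shuffle(examples)  # avoid long runs of examples from the same record
    return examples


records = [
    Record(INSTRUCT, "reverse a list", "def rev(xs):\n    return xs[::-1]",
           hard_negatives=["def rev(xs):\n    return sorted(xs, reverse=True)"]),
    Record(INSTRUCT, "sum a list", "def total(xs):\n    return sum(xs)"),
]
examples = build_examples(records)
print(len(examples), sum(e.label == "yes" for e in examples))
```

With these assumptions each record with a hard negative yields four examples (two positives, two negatives), which keeps the yes/no ratio close to balanced.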
## Usage
CoREB-Reranker follows the same usage pattern as Qwen3-Reranker. Given a query and a list of candidate documents from first-stage retrieval, the reranker scores each query-document pair:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hq-bench/coreb-code-reranker"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

# Format the query-document pair with the Qwen3-Reranker prompt template
query = "binary search implementation"
document = "def binary_search(arr, target):\n    lo, hi = 0, len(arr) - 1\n    ..."
prefix = '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
# The upstream Qwen3-Reranker example also appends "<think>\n\n</think>\n\n" to the
# suffix; check the CoREB repository for the exact template used during fine-tuning.
suffix = "<|im_end|>\n<|im_start|>assistant\n"
instruct = "Given a code search query, does the following code snippet match the query intent?"
prompt = f"{prefix}<Instruct>: {instruct}\n<Query>: {query}\n<Document>: {document}{suffix}"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

# Score is the difference between the "yes" and "no" logits at the last position
yes_id = tokenizer.convert_tokens_to_ids("yes")
no_id = tokenizer.convert_tokens_to_ids("no")
logits = outputs.logits[0, -1, :]
score = logits[yes_id] - logits[no_id]
print(f"Relevance score: {score.item():.4f}")
```
For batch reranking with the CoREB evaluation pipeline, see the [CoREB repository](https://github.com/hq-bench/coreb).
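The two-stage flow can be sketched in a model-agnostic way: take the first-stage top-k candidates (128 in the setup above), score each query-document pair, and sort by score. `rerank`, `score_fn`, and `toy_score` are hypothetical names; for CoREB-Reranker, `score_fn` would wrap the yes/no logit difference from the example above. `yes_probability` uses the identity that a softmax over only the "yes" and "no" logits gives P(yes) = sigmoid(logit_yes - logit_no). The actual CoREB pipeline may batch prompts, truncate long documents, or break ties differently.

```python
import math
from typing import Callable, List, Sequence, Tuple


def yes_probability(score: float) -> float:
    """Numerically stable sigmoid: maps a yes-minus-no logit difference to P("yes")."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 128,
) -> List[Tuple[str, float]]:
    """Score the first-stage top_k candidates and return (document, score) pairs, best first."""
    pool = list(candidates[:top_k])  # only the first-stage top-k are reranked
    scored = [(doc, score_fn(query, doc)) for doc in pool]
    # sorted() is stable even with reverse=True, so ties keep their first-stage order
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(query: str, doc: str) -> float:
    # Stand-in scorer for illustration only: counts query words that occur in the document.
    return float(sum(word in doc for word in query.split()))


first_stage = ["def add(a, b): ...", "def binary_search(arr, target): ...", "class Stack: ..."]
for doc, s in rerank("binary search", first_stage, toy_score):
    print(f"{s:.1f}  {doc}")
```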
## Citation
```bibtex
@article{xue2026coreb,
title={Beyond Retrieval: A Multitask Benchmark and Reranker for Code Search},
author={Xue, Siqiao and Liao, Zihan and Qin, Jin and Zhang, Ziyin and Mu, Yixiang and Zhou, Fan and Yu, Hang},
journal={arXiv preprint arXiv:2605.04615},
year={2026},
url={https://arxiv.org/abs/2605.04615}
}
```