---
license: apache-2.0
base_model: Qwen/Qwen3-Reranker-4B
tags:
- code-search
- reranker
- code-retrieval
- peft
- lora
language:
- en
- code
datasets:
- hq-bench/coreb
pipeline_tag: text-classification
library_name: transformers
---

# CoREB-Reranker

CoREB-Reranker is a code reranker fine-tuned from Qwen3-Reranker-4B via LoRA on a mixed reranker corpus. It is the only reranker we evaluate that achieves consistent gains across all three code search tasks (text-to-code, code-to-text, and code-to-code).

## Highlights
- Fine-tuned from Qwen3-Reranker-4B using LoRA (rank=16, alpha=16) on 3.1M training samples from a mixed corpus
- Evaluated on CoREB v202603 (problem-disjoint from training set, no data leakage)
- Achieves positive reranking delta on all three tasks, unlike all off-the-shelf rerankers tested

## Reranking Results (nDCG@10 Delta %)
Reranking delta (change in nDCG@10 relative to the first-stage ranking, in percentage points) on CoREB v202603, using C2LLM-7B as the first-stage retriever:
| Reranker | Text-to-Code | Code-to-Text | Code-to-Code |
|---|---|---|---|
| Jina Reranker v2 | -8.3 | -22.4 | -8.8 |
| Jina Reranker v3 | -2.2 | -5.0 | -0.1 |
| Qwen3-Reranker-0.6B | -0.6 | -8.2 | -2.3 |
| Qwen3-Reranker-4B | -0.1 | -3.2 | +3.3 |
| CoREB-Reranker (ours) | +1.1 | +0.8 | +5.1 |
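
For reference, the delta for a single query is simply the nDCG@10 of the candidate list after reranking minus the nDCG@10 of the first-stage order. A minimal sketch using `sklearn.metrics.ndcg_score` with made-up scores (not part of the CoREB pipeline):

```python
# Toy sketch of the delta metric; relevance labels and scores are invented.
import numpy as np
from sklearn.metrics import ndcg_score

relevance   = np.array([[0, 3, 0, 0, 2, 0, 0, 1, 0, 0]])                      # graded relevance of candidates
first_stage = np.array([[9.1, 7.2, 6.8, 6.5, 6.1, 5.9, 5.4, 5.0, 4.8, 4.2]])  # retriever scores
reranked    = np.array([[0.2, 8.5, 0.1, 0.3, 7.9, 0.4, 0.2, 6.1, 0.1, 0.0]])  # reranker scores

delta = 100 * (ndcg_score(relevance, reranked, k=10) - ndcg_score(relevance, first_stage, k=10))
print(f"nDCG@10 delta: {delta:+.1f} points")  # positive => reranking helped
```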

## Training Details
- Base model: Qwen/Qwen3-Reranker-4B
- Method: LoRA (rank=16, alpha=16, dropout=0.05); see the configuration sketch after this list
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Training data: A mixed reranker corpus consisting of CoREB v202602, CodeSearchNet (code-to-code, code-to-text, text-to-code), APPS, CosQA, and CodeFeedback (single-turn and multi-turn). Each record is normalized into binary reranking examples (instruction, query, document, yes/no). Positives are duplicated twice; one easy negative and one hard negative are sampled per record (see the example-construction sketch after this list).
- Evaluation data: CoREB v202603 (problem-disjoint from CoREB v202602 training split; covers a different contest time window)
- Training samples: ~3.1M binary reranking examples across text-to-code, code-to-text, and code-to-code tasks
- Top-k retrieval for reranking: 128
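
The adapter setup above corresponds roughly to the following PEFT configuration. This is a sketch for orientation only; optimizer, scheduler, and batching settings are not specified on this card.

```python
# Sketch of the LoRA adapter configuration described above (PEFT).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-4B", trust_remote_code=True)

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```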
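
Likewise, the record normalization can be pictured with the sketch below. The field names and helper are hypothetical, and "duplicated twice" is read here as two copies of the positive.

```python
# Hypothetical sketch: turn one corpus record into binary reranking examples
# (two copies of the positive, one easy negative, one hard negative).
import random

def to_binary_examples(record, corpus_docs):
    """record: dict with 'instruction', 'query', 'positive', 'hard_negative' (hypothetical fields)."""
    easy_negative = random.choice(corpus_docs)  # random document from the corpus
    examples = []
    for doc, label in [
        (record["positive"], "yes"),
        (record["positive"], "yes"),      # positive duplicated
        (easy_negative, "no"),
        (record["hard_negative"], "no"),
    ]:
        examples.append({
            "instruction": record["instruction"],
            "query": record["query"],
            "document": doc,
            "label": label,
        })
    return examples
```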

## Usage
CoREB-Reranker follows the same usage pattern as Qwen3-Reranker. Given a query and a list of candidate documents from first-stage retrieval, the reranker scores each query-document pair:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "hq-bench/coreb-code-reranker"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

# Format as Qwen3-Reranker input
query = "binary search implementation"
document = "def binary_search(arr, target):\n    lo, hi = 0, len(arr) - 1\n    ..."

prefix = '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'
suffix = "<|im_end|>\n<|im_start|>assistant\n"
instruct = "Given a code search query, does the following code snippet match the query intent?"
prompt = f"{prefix}<Instruct>: {instruct}\n<Query>: {query}\n<Document>: {document}{suffix}"

inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Score is the logit difference between the "yes" and "no" tokens at the last position
yes_id = tokenizer.convert_tokens_to_ids("yes")
no_id = tokenizer.convert_tokens_to_ids("no")
logits = outputs.logits[0, -1, :]
score = logits[yes_id] - logits[no_id]
print(f"Relevance score: {score.item():.4f}")
```
For batch reranking with the CoREB evaluation pipeline, see the CoREB repository.

## Citation

```bibtex
@article{xue2026coreb,
  title={Beyond Retrieval: A Multitask Benchmark and Reranker for Code Search},
  author={Xue, Siqiao and Liao, Zihan and Qin, Jin and Zhang, Ziyin and Mu, Yixiang and Zhou, Fan and Yu, Hang},
  journal={arXiv preprint arXiv:2605.04615},
  year={2026},
  url={https://arxiv.org/abs/2605.04615}
}
```