# Qwen 2.5 7B Query Rewriter (Final) -- SemEval-2026 Task 8
Fine-tuned Qwen 2.5 7B Instruct model for query rewriting in multi-turn conversational retrieval. This is the final model used in our SemEval-2026 Task 8 (MTRAGEval) submission, achieving nDCG@5 of 0.531 (8th/38 systems).
The LoRA adapter weights have been fused into the base model, so the model can be used for inference directly, with no adapter loading required.
## Model Details
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Training Method: LoRA (Low-Rank Adaptation), weights fused into base model
- Training Data: 777 query rewriting examples from MTRAGEval gold rewrites (all training + holdout combined)
- Training Iterations: 500
- Framework: MLX (Apple Silicon optimized)
- Task: Transform multi-turn conversational queries into standalone, search-friendly queries
## Training Configuration
| Parameter | Value |
|---|---|
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.15 |
| Target Modules | q/k/v/o_proj, gate/up/down_proj |
| Layers Adapted | 28 (all) |
| Trainable Params | 40.4M (0.53%) |
| Optimizer | AdamW |
| Learning Rate | 1e-5 |
| Weight Decay | 0.01 |
| Batch Size | 2 (effective 16 with grad accumulation 8) |
| Max Sequence Length | 2048 |
| Gradient Checkpointing | Yes |
| Precision | bf16 |
| Seed | 42 |
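The table above maps roughly onto `mlx_lm.lora`'s YAML configuration. The sketch below is a reconstruction rather than the actual file used for this run: the data path is a placeholder, and the `lora_parameters` key names (in particular `scale`, which mlx_lm uses instead of a separate alpha, here `alpha / rank = 32 / 16`) are assumptions about the config format.

```yaml
# Hypothetical mlx_lm.lora config; values taken from the table above,
# key names and data path are assumptions.
model: "Qwen/Qwen2.5-7B-Instruct"
train: true
data: "data/mtrag_rewrites"   # placeholder path to train/valid JSONL files
iters: 500
batch_size: 2
learning_rate: 1e-5
max_seq_length: 2048
grad_checkpoint: true
seed: 42
lora_layers: 28               # adapt all layers
lora_parameters:
  rank: 16
  scale: 2.0                  # LoRA alpha 32 / rank 16
  dropout: 0.15
```

A config like this would be passed with `mlx_lm.lora --config <file>`.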
## Usage

### With Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "caraman/Qwen2.5-7B-mtrag-query-rewriter-final",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("caraman/Qwen2.5-7B-mtrag-query-rewriter-final")

system_prompt = """You are a query rewriting assistant for information retrieval. Given a conversation history and a current question, rewrite the question to be completely standalone and self-contained.

Rules:
1. Resolve all pronouns (it, they, this, that) to their explicit referents
2. Include relevant context from the conversation that's needed to understand the query
3. Keep the rewritten query concise and search-friendly
4. Do not add information not present in the conversation
5. If the question is already standalone, return it unchanged"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": """CONVERSATION HISTORY:
USER: Tell me about the Eiffel Tower
ASSISTANT: The Eiffel Tower is a wrought-iron lattice tower in Paris, France.

CURRENT QUESTION: When was it built?

Rewrite this question to be standalone:"""},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
# Expected: "When was the Eiffel Tower built?"
```
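For programmatic use, the `CONVERSATION HISTORY` / `CURRENT QUESTION` layout shown above can be assembled from structured turns. `build_rewrite_prompt` below is a hypothetical helper (not part of the released code) that reproduces that layout:

```python
def build_rewrite_prompt(history, question):
    """Format conversation turns and the current question into the
    user-message layout used in the example above.

    history: list of (role, text) tuples, role in {"USER", "ASSISTANT"}.
    """
    lines = ["CONVERSATION HISTORY:"]
    for role, text in history:
        lines.append(f"{role}: {text}")
    lines.append(f"CURRENT QUESTION: {question}")
    lines.append("Rewrite this question to be standalone:")
    return "\n".join(lines)

history = [
    ("USER", "Tell me about the Eiffel Tower"),
    ("ASSISTANT", "The Eiffel Tower is a wrought-iron lattice tower in Paris, France."),
]
print(build_rewrite_prompt(history, "When was it built?"))
```

The returned string goes in the `content` field of the user message, with the system prompt unchanged.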
### With MLX (Apple Silicon)
```python
from mlx_lm import load, generate

model, tokenizer = load("caraman/Qwen2.5-7B-mtrag-query-rewriter-final")
# Reuses the same `messages` list built in the Transformers example above.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, temp=0.2)
```
## Domain-Specific Temperature
For optimal performance, use domain-specific temperatures:
| Domain | Temperature | Description |
|---|---|---|
| Cloud (technical docs) | 0.0 | Deterministic -- preserves exact technical terms |
| ClapNQ (Wikipedia) | 0.2 | Minimal diversity for well-structured queries |
| FiQA (financial forums) | 0.3 | Slight exploration for ambiguous queries |
| Govt (government docs) | 0.3 | Slight exploration for policy terminology |
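In code, this reduces to a simple lookup. The domain keys below mirror the table; the 0.2 fallback for unseen domains is an assumption, chosen to match the uniform-temperature setting reported under Performance:

```python
# Per-domain sampling temperatures from the table above.
DOMAIN_TEMPERATURE = {
    "cloud": 0.0,   # technical docs: deterministic
    "clapnq": 0.2,  # Wikipedia: minimal diversity
    "fiqa": 0.3,    # financial forums: slight exploration
    "govt": 0.3,    # government docs: slight exploration
}

def temperature_for(domain: str, default: float = 0.2) -> float:
    """Return the tuned temperature for a domain, falling back to `default`."""
    return DOMAIN_TEMPERATURE.get(domain.lower(), default)
```

For example, `temperature_for("Cloud")` returns `0.0` and an unknown domain falls back to `0.2`.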
## Performance
Part of a three-stage pipeline (query rewriting + hybrid BM25/dense retrieval + cross-encoder reranking):
| Metric | Development Holdout | Official Test |
|---|---|---|
| nDCG@5 | 0.422 (uniform t=0.2) | 0.531 |
| Rank | -- | 8th / 38 systems |
Query rewriting alone provides a 13.7% relative gain over no-rewriting baseline (nDCG@5: 0.371 -> 0.422).
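The hybrid BM25/dense retrieval stage is only described at a high level here. One standard way to combine the two rankings is reciprocal rank fusion (RRF), sketched below as an illustration of rank fusion in general, not as our exact fusion method:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank).

    rankings: list of ranked lists (best first), e.g. [bm25_ids, dense_ids].
    Returns doc IDs sorted by fused score, best first. k=60 is the
    conventional RRF constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d1", "d2", "d3"]
dense = ["d2", "d3", "d1"]
print(reciprocal_rank_fusion([bm25, dense]))
```

The fused list would then be passed to the cross-encoder reranker for the final ordering.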
## License
Apache 2.0