# Qwen 2.5 7B Query Rewriter (Final) -- SemEval-2026 Task 8
Fine-tuned Qwen 2.5 7B Instruct model for query rewriting in multi-turn conversational retrieval. This is the final model used in our SemEval-2026 Task 8 (MTRAGEval) submission, achieving nDCG@5 of 0.531 (8th/38 systems).
The LoRA adapter weights have been fused into the base model, so the model can be used for inference directly, with no adapter loading required.
## Model Details
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Training Method: LoRA (Low-Rank Adaptation), weights fused into base model
- Training Data: 777 query rewriting examples from MTRAGEval gold rewrites (all training + holdout combined)
- Training Iterations: 500
- Framework: MLX (Apple Silicon optimized)
- Task: Transform multi-turn conversational queries into standalone, search-friendly queries
## Training Configuration
| Parameter | Value |
|---|---|
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.15 |
| Target Modules | q/k/v/o_proj, gate/up/down_proj |
| Layers Adapted | 28 (all) |
| Trainable Params | 40.4M (0.53%) |
| Optimizer | AdamW |
| Learning Rate | 1e-5 |
| Weight Decay | 0.01 |
| Batch Size | 2 (effective 16 with grad accumulation 8) |
| Max Sequence Length | 2048 |
| Gradient Checkpointing | Yes |
| Precision | bf16 |
| Seed | 42 |
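The table above maps roughly onto `mlx_lm.lora`'s YAML configuration. The sketch below is a reconstruction rather than the actual file used for this run: the data path is a placeholder, and the `lora_parameters` key names (in particular `scale`, which mlx_lm uses instead of a separate alpha, here `alpha / rank = 32 / 16`) are assumptions about the config format.

```yaml
# Hypothetical mlx_lm.lora config; values taken from the table above,
# key names and data path are assumptions.
model: "Qwen/Qwen2.5-7B-Instruct"
train: true
data: "data/mtrag_rewrites"   # placeholder path to train/valid JSONL files
iters: 500
batch_size: 2
learning_rate: 1e-5
max_seq_length: 2048
grad_checkpoint: true
seed: 42
lora_layers: 28               # adapt all layers
lora_parameters:
  rank: 16
  scale: 2.0                  # LoRA alpha 32 / rank 16
  dropout: 0.15
```

A config like this would be passed with `mlx_lm.lora --config <file>`.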
## Usage

### With Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "caraman/Qwen2.5-7B-mtrag-query-rewriter-final",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("caraman/Qwen2.5-7B-mtrag-query-rewriter-final")

system_prompt = """You are a query rewriting assistant for information retrieval. Given a conversation history and a current question, rewrite the question to be completely standalone and self-contained.

Rules:
1. Resolve all pronouns (it, they, this, that) to their explicit referents
2. Include relevant context from the conversation that's needed to understand the query
3. Keep the rewritten query concise and search-friendly
4. Do not add information not present in the conversation
5. If the question is already standalone, return it unchanged"""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": """CONVERSATION HISTORY:
USER: Tell me about the Eiffel Tower
ASSISTANT: The Eiffel Tower is a wrought-iron lattice tower in Paris, France.

CURRENT QUESTION: When was it built?

Rewrite this question to be standalone:"""},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.2)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
# Expected: "When was the Eiffel Tower built?"
```
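For programmatic use, the `CONVERSATION HISTORY` / `CURRENT QUESTION` layout shown above can be assembled from structured turns. `build_rewrite_prompt` below is a hypothetical helper (not part of the released code) that reproduces that layout:

```python
def build_rewrite_prompt(history, question):
    """Format conversation turns and the current question into the
    user-message layout used in the example above.

    history: list of (role, text) tuples, role in {"USER", "ASSISTANT"}.
    """
    lines = ["CONVERSATION HISTORY:"]
    for role, text in history:
        lines.append(f"{role}: {text}")
    lines.append(f"CURRENT QUESTION: {question}")
    lines.append("Rewrite this question to be standalone:")
    return "\n".join(lines)

history = [
    ("USER", "Tell me about the Eiffel Tower"),
    ("ASSISTANT", "The Eiffel Tower is a wrought-iron lattice tower in Paris, France."),
]
print(build_rewrite_prompt(history, "When was it built?"))
```

The returned string goes in the `content` field of the user message, with the system prompt unchanged.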
### With MLX (Apple Silicon)
```python
from mlx_lm import load, generate

model, tokenizer = load("caraman/Qwen2.5-7B-mtrag-query-rewriter-final")
# Reuses the same `messages` list built in the Transformers example above.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=256, temp=0.2)
```
## Domain-Specific Temperature
For optimal performance, use domain-specific temperatures:
| Domain | Temperature | Description |
|---|---|---|
| Cloud (technical docs) | 0.0 | Deterministic -- preserves exact technical terms |
| ClapNQ (Wikipedia) | 0.2 | Minimal diversity for well-structured queries |
| FiQA (financial forums) | 0.3 | Slight exploration for ambiguous queries |
| Govt (government docs) | 0.3 | Slight exploration for policy terminology |
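In code, this reduces to a simple lookup. The domain keys below mirror the table; the 0.2 fallback for unseen domains is an assumption, chosen to match the uniform-temperature setting reported under Performance:

```python
# Per-domain sampling temperatures from the table above.
DOMAIN_TEMPERATURE = {
    "cloud": 0.0,   # technical docs: deterministic
    "clapnq": 0.2,  # Wikipedia: minimal diversity
    "fiqa": 0.3,    # financial forums: slight exploration
    "govt": 0.3,    # government docs: slight exploration
}

def temperature_for(domain: str, default: float = 0.2) -> float:
    """Return the tuned temperature for a domain, falling back to `default`."""
    return DOMAIN_TEMPERATURE.get(domain.lower(), default)
```

For example, `temperature_for("Cloud")` returns `0.0` and an unknown domain falls back to `0.2`.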
## Performance
Part of a three-stage pipeline (query rewriting + hybrid BM25/dense retrieval + cross-encoder reranking):
| Metric | Development Holdout | Official Test |
|---|---|---|
| nDCG@5 | 0.422 (uniform t=0.2) | 0.531 |
| Rank | -- | 8th / 38 systems |
Query rewriting alone provides a 13.7% relative gain over no-rewriting baseline (nDCG@5: 0.371 -> 0.422).
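The hybrid BM25/dense retrieval stage is only described at a high level here. One standard way to combine the two rankings is reciprocal rank fusion (RRF), sketched below as an illustration of rank fusion in general, not as our exact fusion method:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank).

    rankings: list of ranked lists (best first), e.g. [bm25_ids, dense_ids].
    Returns doc IDs sorted by fused score, best first. k=60 is the
    conventional RRF constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["d1", "d2", "d3"]
dense = ["d2", "d3", "d1"]
print(reciprocal_rank_fusion([bm25, dense]))
```

The fused list would then be passed to the cross-encoder reranker for the final ordering.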
## License
Apache 2.0