# Qwen 2.5 7B Query Rewriter - LoRA Adapter
A fine-tuned LoRA adapter for query rewriting in multi-turn conversational retrieval (MTRAGEval Task A).
## Model Details
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Training Method: LoRA (Low-Rank Adaptation)
- Training Data: MTRAG Benchmark human evaluations (551 train, 62 validation)
- Best Checkpoint: Iteration 150
- Framework: MLX (Apple Silicon optimized)
- Task: Transform multi-turn conversational queries into standalone, search-friendly queries
## Performance

The model resolves pronouns and incorporates the necessary context from the conversation history so that each query stands alone:
| Original Query | Rewritten Query |
|---|---|
| "What about Germany?" (after asking about France's capital) | "What is the capital of Germany?" |
| "How much does it cost?" (after discussing iPhone 15 Pro) | "What is the price of the iPhone 15 Pro?" |
## Usage

### With MLX (Apple Silicon)
```python
from mlx_lm import load, generate

# Load the base model with the LoRA adapter
model, tokenizer = load(
    "Qwen/Qwen2.5-7B-Instruct",
    adapter_path="caraman/Qwen2.5-7B-query-rewriter-lora"
)

# Prepare the prompt
system_prompt = """You are a query rewriting assistant for information retrieval. Given a conversation history and a current question, rewrite the question to be completely standalone and self-contained."""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": """CONVERSATION HISTORY:
USER: Tell me about the Eiffel Tower
ASSISTANT: The Eiffel Tower is in Paris, France.

CURRENT QUESTION: When was it built?

Rewrite this question to be standalone:"""}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)  # "When was the Eiffel Tower built?"
```
## Training Configuration
- LoRA Rank: 16
- LoRA Alpha (scale): 32.0
- Dropout: 0.15
- Learning Rate: 1e-05
- Batch Size: 4 (effective 16 with gradient accumulation)
- Layers: 28
- Training Iterations: 200 (best at 150)
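The hyperparameters above can be expressed as an `mlx_lm` LoRA fine-tuning config. The sketch below is illustrative only: the file name, data path, and exact key names are assumptions based on `mlx_lm`'s config format, not taken from this card.

```yaml
# lora_config.yaml — hypothetical mlx_lm config mirroring the settings above
model: "Qwen/Qwen2.5-7B-Instruct"
train: true
data: "data/mtrag_rewrites"   # assumed path to train/valid JSONL files
batch_size: 4
iters: 200
learning_rate: 1e-5
num_layers: 28
lora_parameters:
  rank: 16
  scale: 32.0
  dropout: 0.15
```

With such a config, training would be launched via `mlx_lm.lora -c lora_config.yaml`; the adapter shipped here corresponds to the checkpoint saved at iteration 150.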
## Training Data

Training pairs were extracted from human query rewrites in the MTRAG Benchmark, covering four domains:
- ClapNQ (question answering)
- FiQA (finance)
- Government documents
- IBM Cloud documentation
## Evaluation

Evaluated on 164 held-out conversational queries, using nDCG@10 to measure downstream retrieval performance.
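For reference, nDCG@10 scores a ranked list by discounting relevant documents that appear lower in the ranking and normalizing against the ideal ordering. This is a generic sketch of the metric, not the benchmark's official scorer:

```python
import math

def dcg_at_k(relevances, k):
    # relevances: graded relevance of ranked docs, index 0 = top result
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize by the DCG of the ideal (best possible) ordering
    ideal = sorted(relevances, reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(relevances, k) / idcg if idcg > 0 else 0.0

# The only relevant doc retrieved at rank 2 instead of rank 1
print(round(ndcg_at_k([0, 1, 0, 0], k=10), 4))  # → 0.6309
```

A perfectly rewritten query that surfaces the relevant document at rank 1 scores 1.0; the score decays logarithmically as the document slips down the list.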
## Limitations
- Optimized for English only
- Best for technical/informational queries
- May not handle highly creative or open-ended questions well
## Citation

```bibtex
@inproceedings{mtrageval2026,
  title={MTRAGEval: Multi-Turn Retrieval-Augmented Generation Evaluation},
  booktitle={SemEval 2026 Task 8},
  year={2026}
}
```
## License
Apache 2.0