# Qwen 2.5 7B Query Rewriter - LoRA Adapter
A fine-tuned LoRA adapter for query rewriting in multi-turn conversational retrieval (MTRAGEval Task A).
## Model Details
- Base Model: Qwen/Qwen2.5-7B-Instruct
- Training Method: LoRA (Low-Rank Adaptation)
- Training Data: MTRAG Benchmark human evaluations (551 train, 62 validation)
- Best Checkpoint: Iteration 150
- Framework: MLX (Apple Silicon optimized)
- Task: Transform multi-turn conversational queries into standalone, search-friendly queries
## Performance

The model resolves pronouns and incorporates the necessary context from the conversation history so that each query stands alone:
| Original Query | Rewritten Query |
|---|---|
| "What about Germany?" (after asking about France's capital) | "What is the capital of Germany?" |
| "How much does it cost?" (after discussing iPhone 15 Pro) | "What is the price of the iPhone 15 Pro?" |
## Usage

### With MLX (Apple Silicon)
```python
from mlx_lm import load, generate

# Load the base model with the LoRA adapter
model, tokenizer = load(
    "Qwen/Qwen2.5-7B-Instruct",
    adapter_path="caraman/Qwen2.5-7B-query-rewriter-lora"
)

# Prepare the prompt
system_prompt = """You are a query rewriting assistant for information retrieval. Given a conversation history and a current question, rewrite the question to be completely standalone and self-contained."""

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": """CONVERSATION HISTORY:
USER: Tell me about the Eiffel Tower
ASSISTANT: The Eiffel Tower is in Paris, France.

CURRENT QUESTION: When was it built?

Rewrite this question to be standalone:"""}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)  # "When was the Eiffel Tower built?"
```
## Training Configuration
- LoRA Rank: 16
- LoRA Alpha (scale): 32.0
- Dropout: 0.15
- Learning Rate: 1e-05
- Batch Size: 4 (effective 16 with gradient accumulation)
- Layers: 28
- Training Iterations: 200 (best at 150)
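The hyperparameters above can be expressed as an `mlx_lm` LoRA fine-tuning config. The sketch below is illustrative only: the file name, data path, and exact key names are assumptions based on `mlx_lm`'s config format, not taken from this card.

```yaml
# lora_config.yaml — hypothetical mlx_lm config mirroring the settings above
model: "Qwen/Qwen2.5-7B-Instruct"
train: true
data: "data/mtrag_rewrites"   # assumed path to train/valid JSONL files
batch_size: 4
iters: 200
learning_rate: 1e-5
num_layers: 28
lora_parameters:
  rank: 16
  scale: 32.0
  dropout: 0.15
```

With such a config, training would be launched via `mlx_lm.lora -c lora_config.yaml`; the adapter shipped here corresponds to the checkpoint saved at iteration 150.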
## Training Data

Training pairs were extracted from human query rewrites in the MTRAG Benchmark, covering four domains:
- ClapNQ (question answering)
- FiQA (finance)
- Government documents
- IBM Cloud documentation
## Evaluation

Evaluated on 164 held-out conversational queries, using nDCG@10 to measure downstream retrieval performance.
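For reference, nDCG@10 scores a ranked list by discounting relevant documents that appear lower in the ranking and normalizing against the ideal ordering. This is a generic sketch of the metric, not the benchmark's official scorer:

```python
import math

def dcg_at_k(relevances, k):
    # relevances: graded relevance of ranked docs, index 0 = top result
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize by the DCG of the ideal (best possible) ordering
    ideal = sorted(relevances, reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(relevances, k) / idcg if idcg > 0 else 0.0

# The only relevant doc retrieved at rank 2 instead of rank 1
print(round(ndcg_at_k([0, 1, 0, 0], k=10), 4))  # → 0.6309
```

A perfectly rewritten query that surfaces the relevant document at rank 1 scores 1.0; the score decays logarithmically as the document slips down the list.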
## Limitations
- Optimized for English only
- Best for technical/informational queries
- May not handle highly creative or open-ended questions well
## Citation

```bibtex
@inproceedings{mtrageval2026,
  title={MTRAGEval: Multi-Turn Retrieval-Augmented Generation Evaluation},
  booktitle={SemEval 2026 Task 8},
  year={2026}
}
```
## License
Apache 2.0