# SearchLM: RLHF-Trained Search Query Generator
This model is a fine-tuned version of Qwen/Qwen2.5-3B-Instruct, trained with Group Relative Policy Optimization (GRPO) and verifiable rewards to generate more effective boolean search queries.
## Model Description
SearchLM uses Reinforcement Learning with Verifiable Rewards (RLVR) to train language models to generate effective boolean search queries for information retrieval tasks. The model is optimized using real search evaluation metrics (NDCG and MRR) as rewards.
- Base Model: Qwen/Qwen2.5-3B-Instruct
- Training Method: Group Relative Policy Optimization (GRPO)
- Checkpoint: final
- Reward Function: Weighted combination of NDCG and MRR from actual search results
- Datasets: NFCorpus and SciFact (MTEB)
- Task: Boolean search query generation
## Training Details

### Training Configuration
- Learning Rate: 1e-06
- Epochs: 3
- Batch Size: 1 (vLLM colocate mode) / 8 (vLLM server mode)
- Gradient Accumulation: 16 (colocate) / 4 (server)
- Precision: bf16
- Gradient Checkpointing: True
- Max New Tokens: 2048
- Num Generations: 2
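For reference, these hyperparameters map roughly onto TRL's `GRPOConfig` as follows. This is a hedged sketch of the colocate setting, not the actual training script; `output_dir` is a hypothetical path, and exact argument names can differ across TRL versions.

```python
from trl import GRPOConfig

# Approximate mapping of the settings above (colocate variant);
# argument names may differ across TRL versions.
config = GRPOConfig(
    output_dir="searchlm-qwen2.5-3b-rlhf",  # hypothetical path
    learning_rate=1e-6,
    num_train_epochs=3,
    per_device_train_batch_size=1,       # 8 in the server setting
    gradient_accumulation_steps=16,      # 4 in the server setting
    bf16=True,
    gradient_checkpointing=True,
    max_completion_length=2048,          # max new tokens per completion
    num_generations=2,                   # completions sampled per prompt
)
```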
### Reward Function
- NDCG Weight: 0.6
- MRR Weight: 0.4
- Evaluation K: 100
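Concretely, the reward is a linear combination of the two rank metrics. A minimal sketch over a ranked list of relevance scores (illustrative only; the actual reward code, which scores real search results, is not shown here):

```python
import math

def ndcg_at_k(relevances, k=100):
    """NDCG@k from a ranked list of graded relevance scores."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def mrr(relevances):
    """Reciprocal rank of the first relevant result (0 if none)."""
    for i, rel in enumerate(relevances):
        if rel > 0:
            return 1.0 / (i + 1)
    return 0.0

def reward(relevances, ndcg_weight=0.6, mrr_weight=0.4, k=100):
    """Weighted NDCG + MRR reward, matching the 0.6 / 0.4 weights above."""
    return ndcg_weight * ndcg_at_k(relevances, k) + mrr_weight * mrr(relevances)
```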
## Evaluation Results
Performance comparison between the base model and RLHF-trained model across SciFact and NFCorpus datasets. Results show mean ± standard deviation across 3 evaluation runs.
### SciFact Dataset
| Model | NDCG@10 | NDCG@100 | MRR | MAP | Precision@10 | Recall@10 |
|---|---|---|---|---|---|---|
| Base (Qwen2.5-3B-Instruct) | 0.1644 ± 0.0204 | 0.1718 ± 0.0156 | 0.1523 ± 0.0183 | 0.1502 ± 0.0162 | 0.0413 ± 0.0069 | 0.2016 ± 0.0279 |
| RLHF (searchlm-qwen2.5-3b-rlhf) | 0.6512 ± 0.0040 | 0.6696 ± 0.0032 | 0.6092 ± 0.0045 | 0.6009 ± 0.0041 | 0.0870 ± 0.0005 | 0.7839 ± 0.0025 |
| Improvement | +296.0% | +289.7% | +300.1% | +300.1% | +110.6% | +288.8% |
### NFCorpus Dataset
| Model | NDCG@10 | NDCG@100 | MRR | MAP | Precision@10 | Recall@10 |
|---|---|---|---|---|---|---|
| Base (Qwen2.5-3B-Instruct) | 0.3345 ± 0.0076 | 0.3355 ± 0.0080 | 0.3197 ± 0.0081 | 0.2834 ± 0.0035 | 0.2093 ± 0.0102 | 0.0853 ± 0.0008 |
| RLHF (searchlm-qwen2.5-3b-rlhf) | 0.5502 ± 0.0031 | 0.5448 ± 0.0017 | 0.5338 ± 0.0050 | 0.4114 ± 0.0021 | 0.2982 ± 0.0024 | 0.1577 ± 0.0003 |
| Improvement | +64.5% | +62.4% | +66.9% | +45.2% | +42.5% | +84.9% |
## Usage

### Prerequisites

```shell
uv add transformers torch vllm "trl[vllm]" datasets omegaconf
```

### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Supreeth/searchlm-qwen2.5-3b-rlhf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

# System prompt for query generation
system_prompt = """You are a search expert. Given a user's information need, generate an effective boolean search query using AND, OR, NOT operators and parentheses for grouping. The query should be precise and retrieve relevant documents.

Guidelines:
- Use AND to require multiple terms
- Use OR for synonyms or alternatives
- Use NOT to exclude irrelevant terms
- Use parentheses for grouping complex logic
- Keep queries focused and not overly complex

Format your response with the query inside <query></query> tags."""

# User query
user_query = "What are the latest treatments for Type 2 diabetes?"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_query},
]

# Tokenize with the chat template and move to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

# Generate the search query (sampling must be enabled for temperature to apply)
outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)
print(response)
```
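Since the model is prompted to wrap its answer in `<query></query>` tags, you will typically want to extract the query from the raw response. A small helper for that (an illustrative sketch, not part of the released code):

```python
import re
from typing import Optional

def extract_query(response: str) -> Optional[str]:
    """Pull the boolean query out of <query>...</query> tags; None if absent."""
    match = re.search(r"<query>(.*?)</query>", response, re.DOTALL)
    return match.group(1).strip() if match else None
```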
### Using with vLLM (Recommended)
```python
from vllm import LLM, SamplingParams

model_name = "Supreeth/searchlm-qwen2.5-3b-rlhf"
llm = LLM(model=model_name)

system_prompt = """You are a search expert. Given a user's information need, generate an effective boolean search query..."""

prompts = [
    f"<|im_start|>system\n{system_prompt}<|im_end|>\n<|im_start|>user\nWhat are the latest treatments for Type 2 diabetes?<|im_end|>\n<|im_start|>assistant\n"
]

sampling_params = SamplingParams(temperature=0.7, max_tokens=1024)
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```
## Evaluation
The model is evaluated on standard information retrieval datasets (NFCorpus and SciFact) using the following metrics:
- NDCG@10, NDCG@100: Normalized Discounted Cumulative Gain
- MRR: Mean Reciprocal Rank
- Precision@10: Precision at top 10 results
- Recall@10: Recall at top 10 results
- MAP: Mean Average Precision
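As a rough reference, the cutoff-based metrics can be sketched over binary relevance judgments as follows (an illustrative implementation, not the exact evaluation code used for the tables above):

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def recall_at_k(ranked, relevant, k=10):
    """Fraction of all relevant documents found in the top k."""
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)

def average_precision(ranked, relevant):
    """Mean of precision values at each rank where a relevant doc appears;
    MAP is this value averaged over all queries."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranked):
        if d in relevant:
            hits += 1
            total += hits / (i + 1)
    return total / len(relevant) if relevant else 0.0
```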
## Training Data
The model was trained on:
- NFCorpus: Medical information retrieval dataset
- SciFact: Scientific fact-checking dataset
Both datasets are from the MTEB (Massive Text Embedding Benchmark) collection.
## Limitations and Bias
- The model is specifically trained for scientific and medical domains (NFCorpus and SciFact)
- Performance may vary on other domains
- Boolean query syntax is optimized for full-text search engines (e.g., Tantivy)
- Generated queries may need domain-specific tuning for production use
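For intuition, the boolean semantics the model targets reduce to set algebra over an inverted index. A toy illustration (a real engine such as Tantivy parses, executes, and scores these queries itself):

```python
# Toy inverted index: term -> set of matching document ids.
index = {
    "diabetes": {1, 2, 3},
    "insulin": {2, 3},
    "type1": {3},
}

# (diabetes AND insulin) NOT type1
# AND -> set intersection, NOT -> set difference.
hits = (index["diabetes"] & index["insulin"]) - index["type1"]
print(hits)  # {2}
```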
## Citation
If you use this model, please cite:
```bibtex
@misc{searchlm2025,
  author       = {Supreeth Rao},
  title        = {SearchLM: Reinforcement Learning with Verifiable Rewards for Search Query Generation},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/Supreeth/searchlm-qwen2.5-3b-rlhf}}
}
```
## License
MIT License
## Contact
For questions or issues:
- GitHub: SearchLM Repository
## Acknowledgments
- Base model: Qwen/Qwen2.5-3B-Instruct
- Training framework: TRL (Transformer Reinforcement Learning)
- Inference engine: vLLM
- Search engine: Tantivy