LIMO: Less is More for Reasoning
Paper: arXiv:2502.03387
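This is the merged version of the LIMO fine-tuned Qwen3-8B model for mathematical reasoning. The LoRA adapter weights are folded into the base model, so it loads as a single standalone checkpoint.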
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Cbgcbg/limo-qwen3-8b-math-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "Cbgcbg/limo-qwen3-8b-math-merged",
    trust_remote_code=True
)

# Format input with the model's chat template
messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "What is the sum of the first 10 positive integers?"}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens, skipping the prompt
prompt_length = inputs.input_ids.shape[1]
response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(response)
```
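Because the system prompt asks for the final answer inside `\boxed{}`, it can be handy to pull that answer out of `response` for scoring. The helper below is a minimal sketch, not part of the model or of any library: `extract_boxed` is a hypothetical name, and it assumes the last `\boxed{...}` in the text is the final answer.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1  # we are inside the opening brace of \boxed{
    out = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace found: return everything in between
                return "".join(out)
        out.append(ch)
        i += 1
    # Braces never balanced, e.g. generation was cut off by max_new_tokens
    return None


answer = extract_boxed("The sum is 1 + 2 + ... + 10 = \\boxed{55}.")
```

Brace counting, rather than a simple regular expression, keeps nested LaTeX such as `\boxed{\frac{1}{2}}` intact.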
This merged model retains the mathematical reasoning capabilities of the LoRA adapter, but it loads without a separate base model and is compatible with more tools.
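The reason a merged model can match the adapter is that a LoRA update is a low-rank addition to a weight matrix, which can be folded in once ahead of time. The toy sketch below illustrates the standard LoRA merge, W' = W + (alpha / r) · B · A, in plain Python. The matrices, `alpha`, and `r` are made-up illustrative values, not this model's actual weights or configuration, and the exact merge procedure used for this checkpoint is an assumption.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(W, A, B, alpha, r):
    """Return the merged weight W + (alpha / r) * B @ A."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]


# Toy shapes: d_out = 2, d_in = 3, rank r = 1 (illustrative values only)
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # base weight, d_out x d_in
A = [[0.5, -1.0, 2.0]]                   # LoRA down-projection, r x d_in
B = [[2.0], [-4.0]]                      # LoRA up-projection, d_out x r
alpha, r = 2, 1
x = [[1.0], [2.0], [3.0]]                # input column vector, d_in x 1

# Adapter path: base output plus the scaled low-rank correction
base_out = matmul(W, x)
lora_out = matmul(B, matmul(A, x))
adapter_out = [[base_out[i][0] + (alpha / r) * lora_out[i][0]]
               for i in range(len(base_out))]

# Merged path: a single matrix multiply with the folded weight
merged_out = matmul(merge_lora(W, A, B, alpha, r), x)

print(adapter_out, merged_out)
```

In this toy case the two paths agree exactly. With real bfloat16 weights the merged result can differ from the adapter path by small rounding errors.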
| Feature | LoRA Adapter | Merged Model |
|---|---|---|
| Model Size | ~100 MB | ~16 GB |
| Loading | Requires base model | Direct loading |
| Compatibility | Limited | Full transformers support |
| Benchmarking | Needs special handling | Works with lighteval |
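The sizes in the table follow from simple arithmetic on parameter counts. The sketch below assumes weights stored in bfloat16 (2 bytes per parameter, matching the `torch_dtype` used above) and a nominal 8 billion parameters for the base model. The real parameter count and on-disk size will differ somewhat, and the adapter figure is only what ~100 MB would imply at that precision, not a reported number.

```python
BYTES_PER_PARAM_BF16 = 2  # bfloat16 stores each weight in 2 bytes


def checkpoint_size_gb(num_params, bytes_per_param=BYTES_PER_PARAM_BF16):
    """Approximate checkpoint size in GB (10^9 bytes), weights only."""
    return num_params * bytes_per_param / 1e9


# Merged model: nominal 8 billion parameters (assumption from the model name)
merged_gb = checkpoint_size_gb(8e9)

# Adapter: parameter count implied by a ~100 MB file at 2 bytes per weight
adapter_params = 100e6 / BYTES_PER_PARAM_BF16

print(f"merged model: ~{merged_gb:.0f} GB")
print(f"adapter: ~{adapter_params / 1e6:.0f}M parameters")
```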
```bibtex
@misc{ye2025limoreasoning,
  title={LIMO: Less is More for Reasoning},
  author={Yixin Ye and Zhen Huang and Yang Xiao and Ethan Chern and Shijie Xia and Pengfei Liu},
  year={2025},
  eprint={2502.03387},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.03387},
}
```