Qwen2.5-7B-Instruct-borg-merge-v1
A training-free cross-family weight merge of Qwen2.5-7B-Instruct with 8 donors from 4 architecture families. Lifts GSM8K +3.3 pp, ARC-Challenge +3.2 pp, and IFEval +2.6 pp absolute over the unmerged anchor. No fine-tuning. No distillation. No router. Drop-in safetensors.
| Task | Anchor SOLO | This model | Δ |
|---|---|---|---|
| GSM8K (exact_match,strict-match) | 0.8120 | 0.8446 | +0.0326 |
| ARC-Challenge (acc_norm,none) | 0.5256 | 0.5572 | +0.0316 |
| IFEval (inst_level_strict_acc,none) | 0.6547 | 0.6811 | +0.0264 |
| MMLU (acc,none) | 0.7180 | 0.7094 | -0.0086 |
| TruthfulQA mc2 (acc,none) | 0.6475 | 0.6285 | -0.0190 |
| HellaSwag (acc,none) | 0.6895 | 0.6830 | -0.0065 |
| PIQA (acc,none) | 0.8030 | 0.8014 | -0.0016 |
| HumanEval (pass@1,greedy) | 0.6463 | 0.5854 | -0.0610 |
Lifts on 3 of 8 standard benchmarks vs. the unmerged anchor -- on the tasks where the donor pool is competence-concentrated (instruction following + broad reasoning). Regresses on HumanEval, where the donor pool was code-light by design. The regression structure is itself a falsifiable prediction about the recipe.
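The percentage-point figures quoted in this card can be recomputed from the table. The snippet below is a small sketch that copies the four-decimal scores from the table and converts each difference to pp; the names `SCORES` and `delta_pp` are illustrative, not part of any released tooling.

```python
# Scores copied from the benchmark table in this card: (anchor, merged).
SCORES = {
    "GSM8K": (0.8120, 0.8446),
    "ARC-Challenge": (0.5256, 0.5572),
    "IFEval": (0.6547, 0.6811),
    "MMLU": (0.7180, 0.7094),
    "TruthfulQA mc2": (0.6475, 0.6285),
    "HellaSwag": (0.6895, 0.6830),
    "PIQA": (0.8030, 0.8014),
    "HumanEval": (0.6463, 0.5854),
}


def delta_pp(anchor: float, merged: float) -> float:
    """Absolute change in percentage points (1 pp = 0.01 on a 0-1 score)."""
    return round((merged - anchor) * 100, 2)


deltas = {task: delta_pp(a, m) for task, (a, m) in SCORES.items()}
lifted = [task for task, d in deltas.items() if d > 0]

for task, d in deltas.items():
    print(f"{task:15s} {d:+.2f} pp")
print("lifted:", lifted)
```

From the rounded table values, HumanEval comes out at -6.09 pp rather than -6.10 pp; the table's Δ column was presumably computed from unrounded scores.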
Quick start
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
model = AutoModelForCausalLM.from_pretrained(
"Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1",
torch_dtype=torch.float16,
device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1")
prompt = "Q: What is 17 multiplied by 23? Show your work.\nA:"
ids = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
Compatible with vLLM, llama.cpp (after GGUF conversion), text-generation-inference, text-generation-webui, and any standard HuggingFace inference stack.
What's special about this merge
Weight merging across architecture families (Llama, Phi, NeoX, OPT) is conventionally considered impossible: the families differ in attention head dimensions, FFN expansion factors, and vocabularies. A naive linear interpolation between, say, a Qwen attention block and a Mistral attention block does not even type-check.
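To make the "does not even type-check" point concrete, here is a minimal NumPy sketch. It assumes the hidden sizes from the public configs (3584 for Qwen2.5-7B, 4096 for Mistral-7B) and scales them down by 64 so the stand-in matrices stay tiny; the values are random, not real weights, and `naive_lerp` is an illustrative helper.

```python
import numpy as np

# Assumed hidden sizes (Qwen2.5-7B: 3584, Mistral-7B: 4096), divided by 64
# so the stand-in projection matrices are small. Random values, not real weights.
SCALE = 64
rng = np.random.default_rng(0)
qwen_q = rng.standard_normal((3584 // SCALE, 3584 // SCALE))     # 56 x 56
mistral_q = rng.standard_normal((4096 // SCALE, 4096 // SCALE))  # 64 x 64


def naive_lerp(a: np.ndarray, b: np.ndarray, t: float = 0.5) -> np.ndarray:
    """Elementwise interpolation; only defined when the shapes are compatible."""
    return (1.0 - t) * a + t * b


try:
    naive_lerp(qwen_q, mistral_q)
    lerp_failed = False
except ValueError as err:
    # NumPy refuses to combine a (56, 56) array with a (64, 64) array.
    lerp_failed = True
    print("naive interpolation failed:", err)
```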
This model is the result of a training-free pipeline that solves this:
- Canonicalize each donor's tensors into a shared key namespace via per-architecture detectors (10 architecture families covered: BERT, RoBERTa, Llama/Qwen, Mistral, Pythia, OPT, Phi, T5, w2v-bert, and more).
- Procrustes-align each donor's basis to the anchor via per-tensor orthogonal rotation (smaller-side SVD).
- Compute donor deltas in canonical space; filter via per-role tolerance (asymmetric: τ_attn=0.05, τ_ffn=0.20); keep top-3 SVD components.
- Absorb the rotated, filtered, low-rank delta into the anchor with anchor blend β=0.60.
- Decanonicalize to the anchor's native key namespace; save as standard safetensors.
This is the asymmetric tolerance recipe: tight on attention to preserve circuits, loose on FFN to absorb knowledge.
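The align, filter, truncate, and absorb steps can be sketched on a single tensor as follows. This is an illustration under stated assumptions, not the pipeline's actual code: it assumes the donor tensor already has the anchor's shape after canonicalization, it interprets the tolerance as a cap on the delta's relative Frobenius norm, and it reads "anchor blend β" as `anchor + (1 - β) * delta`. The card does not define any of these details, and the function names are hypothetical.

```python
import numpy as np


def procrustes_rotation(donor: np.ndarray, anchor: np.ndarray) -> np.ndarray:
    """Orthogonal R minimizing ||donor @ R - anchor||_F (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(donor.T @ anchor)
    return u @ vt


def top_k_svd(delta: np.ndarray, k: int = 3) -> np.ndarray:
    """Best rank-k approximation of delta via truncated SVD."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k]


def absorb(anchor, donor, tau, beta=0.60, k=3):
    """Align donor to anchor, gate and truncate the delta, blend into the anchor."""
    rotation = procrustes_rotation(donor, anchor)
    delta = donor @ rotation - anchor
    # ASSUMPTION: the tolerance gate is modelled as a relative-norm clip.
    rel = np.linalg.norm(delta) / np.linalg.norm(anchor)
    if rel > tau:
        delta = delta * (tau / rel)
    # ASSUMPTION: beta is the weight kept on the anchor, so the delta gets 1 - beta.
    return anchor + (1.0 - beta) * top_k_svd(delta, k)


rng = np.random.default_rng(0)
anchor = rng.standard_normal((8, 8))
donor = rng.standard_normal((8, 8))

merged_attn = absorb(anchor, donor, tau=0.05)  # tight tolerance (attention role)
merged_ffn = absorb(anchor, donor, tau=0.20)   # loose tolerance (FFN role)
print(np.linalg.norm(merged_attn - anchor), np.linalg.norm(merged_ffn - anchor))
```

Under these assumptions the update to each tensor has rank at most 3 and a norm of at most `(1 - β) * τ` times the anchor's norm, which is one way to read "tight on attention, loose on FFN".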
Donor pool (8 donors, 4 architecture families)
| Source | Family | License |
|---|---|---|
| Qwen/Qwen2.5-7B-Instruct (anchor) | Qwen / Llama-arch | Apache 2.0 |
| mistralai/Mistral-7B-Instruct-v0.3 | Mistral / Llama-arch | Apache 2.0 |
| microsoft/Phi-3-mini-4k-instruct | Phi (new) | MIT |
| microsoft/phi-2 | Phi (old) | MIT |
| HuggingFaceTB/SmolLM2-1.7B-Instruct | Llama-arch (small) | Apache 2.0 |
| ibm-granite/granite-3.0-2b-instruct | Llama-arch (Granite tweaks) | Apache 2.0 |
| EleutherAI/pythia-2.8b | NeoX | Apache 2.0 |
| EleutherAI/pythia-1.4b | NeoX | Apache 2.0 |
| facebook/opt-2.7b | OPT | OPT license |
Verification
- Cross-run reproducibility: an independent preflight evaluation two days prior to the headline run produces byte-identical scores to all 16 decimal places across every overlapping (variant, task) cell. The merge is fully deterministic.
- Pre-flight gates: G1 round-trip across all 6 cross-family canonicalization tests reports r_max=0.0, n_bad=0 (lossless canonical key namespace). G3 multi-seed slice-bias on the anchor MMLU 200-sample slice returns 0.7480126320374605 to 16 decimal places across seeds 7, 42, 1337. G4 anchor MMLU full matches the published Qwen2.5-7B-Instruct leaderboard reference.
- Behavioural inspection: 5 reasoning-heavy prompts (math word problem, French translation, long-multiplication, recursive Fibonacci, factual enumeration) produce coherent, instruction-following, mathematically-correct output with no gibberish, no tokenizer drift, no instruction-format collapse.
- Eval framework: lm-eval-harness 0.4.4 with transformers 4.55.0, tokenizers 0.21.4, datasets>=2.20 <4.0, fp16, batch 2, single A100 80GB.
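A cross-run reproducibility check of the kind described in the first bullet could look like the sketch below. The dictionary layout, the cell names, and the helper `mismatched_cells` are hypothetical; the scores are placeholders borrowed from this card, not the actual run logs.

```python
import math


def mismatched_cells(run_a: dict, run_b: dict) -> list:
    """Overlapping (variant, task) cells whose scores are not exactly equal."""
    shared = run_a.keys() & run_b.keys()
    return sorted(cell for cell in shared if run_a[cell] != run_b[cell])


# Placeholder results keyed by (variant, task); real runs would be loaded from disk.
preflight = {
    ("anchor", "mmlu_slice"): 0.7480126320374605,
    ("merged", "gsm8k"): 0.8446,
}
headline = {
    ("anchor", "mmlu_slice"): 0.7480126320374605,
    ("merged", "gsm8k"): 0.8446,
    ("merged", "piqa"): 0.8014,  # not in the preflight run, so it is ignored
}

identical = mismatched_cells(preflight, headline)
print("mismatches:", identical)

# Exact float comparison flags even a one-ulp difference.
perturbed = dict(headline)
perturbed[("merged", "gsm8k")] = math.nextafter(0.8446, 1.0)
print("after perturbation:", mismatched_cells(preflight, perturbed))
```

Comparing with `!=` rather than a tolerance is deliberate here: the claim being checked is bit-for-bit agreement, so any difference at all should surface.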
Comparison to recent work in the model-merging landscape
For a comprehensive map of model-merging methods, theory, and applications, see Yang et al.'s curated survey Awesome-Model-Merging-Methods-Theories-Applications (forthcoming ACM Computing Surveys 2026).
Closest direct relatives:
- Transport and Merge (Cui et al., Feb 2026) -- cross-architecture merging via activation-space optimal transport. Different problem class: theirs produces a runtime-aligned composition; this model is a permanent merged checkpoint.
- Unconstrained Model Merging for Enhanced LLM Reasoning (Zhang et al., Oct 2024) -- closest direct relative on substrate scale (7B-class) and donor count (9 reasoning-optimized LLMs). The result above extends this lineage with absolute benchmark deltas against a state-competitive instruction-tuned anchor.
- Git Re-Basin (Ainsworth, Hayase & Srinivasa, ICLR 2023) -- same-architecture merging modulo permutation symmetries. The pipeline above is essentially the cross-architecture generalization (continuous Procrustes rotation rather than discrete permutation matching).
- OT-Fusion (Singh & Jaggi, NeurIPS 2020) -- same-architecture optimal transport on weight rows. Spiritual ancestor of Cui et al.'s 2026 cross-architecture extension.
- REPAIR (Jordan et al., 2022) -- re-normalization to address variance collapse after permutation interpolation. The pipeline above sidesteps this by using anchor-plus-delta absorption rather than midpoint interpolation.
Limitations
- Code generation regresses by 6.10 pp on HumanEval. The donor pool was reasoning-heavy and instruction-tuned; it contained no code-specialist models (CodeLlama, StarCoder, Qwen2.5-Coder). This is documented as a falsifiable prediction: a code-heavy donor pool should restore HumanEval while preserving the GSM8K, ARC-Challenge, and IFEval gains. This is the explicit subject of the next research cycle.
- Mild MMLU regression (-0.86 pp). The merge trades some broad knowledge for instruction-following + reasoning concentration. The changes on TruthfulQA mc2 (-1.90 pp), HellaSwag (-0.65 pp), and PIQA (-0.16 pp) are within typical eval noise.
- Single substrate tested: results are on Qwen2.5-7B-Instruct. Generalization to other instruction-tuned 7B-class anchors (Llama-3.1-8B-Instruct, Mistral-7B-Instruct-v0.3 as anchor, etc.) is the next experiment.
- HumanEval pass@1 measured via custom isolated-subprocess scorer, not via lm-eval (the pinned lm-eval-harness 0.4.4 does not ship the humaneval task). Greedy decoding, 164 problems, no temperature sweep. Identical methodology to bigcode-evaluation-harness with subprocess-isolated test execution.
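The subprocess-isolated HumanEval scoring mentioned in the last bullet can be approximated as below. This is a simplified sketch, not the custom scorer itself: `passes` and `pass_at_1` are hypothetical helpers, the toy problem stands in for real HumanEval items, and a child process gives crash and timeout isolation only, not a security sandbox.

```python
import subprocess
import sys


def passes(candidate_src: str, test_src: str, timeout_s: float = 10.0) -> bool:
    """Run candidate plus tests in a fresh interpreter; pass means exit code 0."""
    program = candidate_src + "\n\n" + test_src + "\n"
    try:
        proc = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # hung or too slow counts as a failure
    return proc.returncode == 0


def pass_at_1(results: list) -> float:
    """With one greedy sample per problem, pass@1 is the fraction solved."""
    return sum(results) / len(results)


# Toy stand-ins for a generated completion and its unit test.
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = "assert add(2, 3) == 5"

results = [passes(good, tests), passes(bad, tests)]
print(results, pass_at_1(results))
```

Running each candidate in its own interpreter means a crash, an infinite loop, or a stray `sys.exit` in generated code cannot take down the scorer or leak state into the next problem.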
Intended use
- Research and evaluation of cross-family weight-merging techniques.
- Drop-in replacement for Qwen/Qwen2.5-7B-Instruct in workflows where the trade-off (GSM8K / ARC-Challenge / IFEval lifts vs. a 6.10 pp HumanEval regression) is favorable.
- Compatible with vLLM, llama.cpp (after GGUF conversion), TGI, text-generation-webui, and any standard HuggingFace inference stack.
Out of scope
- Code generation as primary use case -- use Qwen/Qwen2.5-Coder-7B-Instruct instead, or wait for the next merge variant which targets a code-heavy donor pool.
- Production deployment without your own evaluation on your specific task distribution.
Citation
If you use this model, please cite:
@misc{borg-merge-v1-2026,
title = {Conflict-Free Replicated Datatypes for Neural Network Model Merging},
author = {Optitransfer},
year = {2026},
url = {https://huggingface.co/Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1}
}
Contact
rgillespie83@icloud.com
data@optitransfer.ch
For arXiv endorsement requests on the full technical paper covering cross-family weight merging (cs.LG / secondary cs.CL): same contacts, subject line "arXiv endorsement: cross-family weight merging".