Qwen2.5-7B-Instruct-borg-merge-v1

A training-free cross-family weight merge of Qwen2.5-7B-Instruct with 8 donors from 4 architecture families. Lifts GSM8K +3.3 pp, ARC-Challenge +3.2 pp, and IFEval +2.6 pp absolute over the unmerged anchor. No fine-tuning. No distillation. No router. Drop-in safetensors.

| Task | Anchor SOLO | This model | Δ |
|---|---:|---:|---:|
| GSM8K (`exact_match,strict-match`) | 0.8120 | 0.8446 | +0.0326 |
| ARC-Challenge (`acc_norm,none`) | 0.5256 | 0.5572 | +0.0316 |
| IFEval (`inst_level_strict_acc,none`) | 0.6547 | 0.6811 | +0.0264 |
| MMLU (`acc,none`) | 0.7180 | 0.7094 | -0.0086 |
| TruthfulQA mc2 (`acc,none`) | 0.6475 | 0.6285 | -0.0190 |
| HellaSwag (`acc,none`) | 0.6895 | 0.6830 | -0.0065 |
| PIQA (`acc,none`) | 0.8030 | 0.8014 | -0.0016 |
| HumanEval (`pass@1,greedy`) | 0.6463 | 0.5854 | -0.0610 |

Lifts on 3 of 8 standard benchmarks vs. the unmerged anchor -- on the tasks where the donor pool is competence-concentrated (instruction following + broad reasoning). Regresses on HumanEval, where the donor pool was code-light by design. The regression structure is itself a falsifiable prediction about the recipe.
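
As a quick arithmetic check, the Δ column can be recomputed from the two score columns. The sketch below hard-codes the rounded scores from the table; the names `scores` and `delta_pp` are illustrative, not part of any released tooling. Because the inputs are already rounded to four decimals, HumanEval comes out as -6.09 pp rather than the reported -6.10 pp; the reported value is consistent with a difference of 10 problems out of 164 (10/164 ≈ 0.0610).

```python
# Scores copied from the results table: task -> (anchor, merged).
scores = {
    "GSM8K": (0.8120, 0.8446),
    "ARC-Challenge": (0.5256, 0.5572),
    "IFEval": (0.6547, 0.6811),
    "MMLU": (0.7180, 0.7094),
    "TruthfulQA mc2": (0.6475, 0.6285),
    "HellaSwag": (0.6895, 0.6830),
    "PIQA": (0.8030, 0.8014),
    "HumanEval": (0.6463, 0.5854),
}


def delta_pp(anchor, merged):
    """Absolute change in percentage points, rounded to two decimals."""
    return round((merged - anchor) * 100, 2)


deltas = {task: delta_pp(a, m) for task, (a, m) in scores.items()}
lifts = [task for task, d in deltas.items() if d > 0]

# Print tasks from largest gain to largest regression.
for task, d in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{task:<16}{d:+.2f} pp")
print(f"{len(lifts)} of {len(scores)} tasks improve: {', '.join(lifts)}")
```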

Quick start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the merged checkpoint in fp16; device_map="auto" places it on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    "Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1")

# Plain completion-style prompt; do_sample=False selects greedy decoding.
prompt = "Q: What is 17 multiplied by 23? Show your work.\nA:"
ids = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Compatible with vLLM, llama.cpp (after GGUF conversion), text-generation-inference, text-generation-webui, and any standard HuggingFace inference stack.

What's special about this merge

Cross-family weight merging across architecture families (Llama, Phi, NeoX, OPT) is conventionally considered impossible -- different attention head dimensions, different FFN expansion factors, different vocabularies. A naive linear interpolation between, say, a Qwen attention block and a Mistral attention block does not even type-check.

This model is the result of a training-free pipeline that solves this:

1. Canonicalize each donor's tensors into a shared key namespace via per-architecture detectors (10 architecture families covered: BERT, RoBERTa, Llama/Qwen, Mistral, Pythia, OPT, Phi, T5, w2v-bert, and more).
2. Procrustes-align each donor's basis to the anchor via per-tensor orthogonal rotation (smaller-side SVD).
3. Compute donor deltas in canonical space; filter via per-role tolerance (asymmetric: τ_attn=0.05, τ_ffn=0.20); keep top-3 SVD components.
4. Absorb the rotated, filtered, low-rank delta into the anchor with anchor blend β=0.60.
5. Decanonicalize to the anchor's native key namespace; save as standard safetensors.

This is the asymmetric tolerance recipe: tight on attention to preserve circuits, loose on FFN to absorb knowledge.
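
The alignment, filtering, and absorption steps can be illustrated on a single pair of weight matrices. This is a minimal NumPy sketch under stated assumptions, not the pipeline's actual code: it only handles donor and anchor tensors of the same shape (the cross-shape "smaller-side SVD" case is not covered), it assumes the tolerance τ caps the delta's Frobenius norm relative to the anchor's, and it assumes β is the weight kept on the anchor so the delta is scaled by 1 - β. The card does not specify either rule, and the function names are hypothetical.

```python
import numpy as np

TAU = {"attn": 0.05, "ffn": 0.20}  # per-role tolerances quoted in the recipe
BETA = 0.60                         # anchor blend quoted in the recipe
TOP_K = 3                           # number of SVD components kept


def procrustes_rotation(donor, anchor):
    """Orthogonal R minimising ||donor @ R - anchor||_F (orthogonal Procrustes)."""
    u, _, vt = np.linalg.svd(donor.T @ anchor, full_matrices=False)
    return u @ vt


def truncate_rank(matrix, k):
    """Best rank-k approximation via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k]


def absorb(anchor, donor, role):
    """Align donor to anchor, then fold a filtered low-rank delta into anchor."""
    rotation = procrustes_rotation(donor, anchor)
    delta = truncate_rank(donor @ rotation - anchor, TOP_K)
    # ASSUMPTION: tau caps the delta's Frobenius norm relative to the anchor's.
    rel = np.linalg.norm(delta) / np.linalg.norm(anchor)
    if rel > TAU[role]:
        delta = delta * (TAU[role] / rel)
    # ASSUMPTION: beta is the weight kept on the anchor; the delta gets 1 - beta.
    return anchor + (1.0 - BETA) * delta


# Synthetic data: the donor is a noisy copy of the anchor in a rotated basis.
rng = np.random.default_rng(0)
anchor = rng.standard_normal((64, 32))
q, _ = np.linalg.qr(rng.standard_normal((32, 32)))  # random orthogonal matrix
donor = (anchor + 0.1 * rng.standard_normal((64, 32))) @ q.T

merged = absorb(anchor, donor, "attn")
drift = np.linalg.norm(merged - anchor) / np.linalg.norm(anchor)
print(f"relative drift from anchor: {drift:.4f}")
```

Under these assumptions the merged tensor can move at most (1 - β)·τ from the anchor in relative Frobenius norm, which is one way to read "tight on attention, loose on FFN": the attention cap is 0.02 and the FFN cap is 0.08.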

Donor pool (8 donors, 4 architecture families)

| Source | Family | License |
|---|---|---|
| Qwen/Qwen2.5-7B-Instruct (anchor) | Qwen / Llama-arch | Apache 2.0 |
| mistralai/Mistral-7B-Instruct-v0.3 | Mistral / Llama-arch | Apache 2.0 |
| microsoft/Phi-3-mini-4k-instruct | Phi (new) | MIT |
| microsoft/phi-2 | Phi (old) | MIT |
| HuggingFaceTB/SmolLM2-1.7B-Instruct | Llama-arch (small) | Apache 2.0 |
| ibm-granite/granite-3.0-2b-instruct | Llama-arch (Granite tweaks) | Apache 2.0 |
| EleutherAI/pythia-2.8b | NeoX | Apache 2.0 |
| EleutherAI/pythia-1.4b | NeoX | Apache 2.0 |
| facebook/opt-2.7b | OPT | OPT license |

Verification

Cross-run reproducibility: an independent preflight evaluation two days prior to the headline run produces byte-identical scores to all 16 decimal places across every overlapping (variant, task) cell. The merge is fully deterministic.
Pre-flight gates: G1 round-trip across all 6 cross-family canonicalization tests reports r_max=0.0, n_bad=0 (lossless canonical key namespace). G3 multi-seed slice-bias on the anchor MMLU 200-sample slice returns 0.7480126320374605 to 16 decimal places across seeds 7, 42, 1337. G4 anchor MMLU full matches the published Qwen2.5-7B-Instruct leaderboard reference.
Behavioural inspection: 5 reasoning-heavy prompts (math word problem, French translation, long-multiplication, recursive Fibonacci, factual enumeration) produce coherent, instruction-following, mathematically-correct output with no gibberish, no tokenizer drift, no instruction-format collapse.
Eval framework: lm-eval-harness 0.4.4 with transformers 4.55.0, tokenizers 0.21.4, datasets >=2.20 <4.0, fp16, batch 2, single A100 80GB.
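
The cross-run reproducibility check amounts to an exact comparison over the cells two runs have in common. The sketch below shows one way to do that; the dictionary layout, the function name `overlapping_mismatches`, and the sample values are hypothetical placeholders, not the actual run logs.

```python
def overlapping_mismatches(run_a, run_b):
    """Return the (variant, task) cells present in both runs whose scores differ.

    Each run maps (variant, task) -> float score. The comparison is exact (!=),
    with no tolerance, which is what agreement to every stored digit requires.
    """
    shared = run_a.keys() & run_b.keys()
    return sorted(cell for cell in shared if run_a[cell] != run_b[cell])


# Placeholder results for illustration only.
preflight = {
    ("anchor", "mmlu"): 0.7180,
    ("merged", "mmlu"): 0.7094,
}
headline = {
    ("anchor", "mmlu"): 0.7180,
    ("merged", "mmlu"): 0.7094,
    ("merged", "gsm8k"): 0.8446,  # only in the headline run, so not compared
}

mismatches = overlapping_mismatches(preflight, headline)
print("mismatching cells:", mismatches)
```

An empty list means every overlapping cell agrees exactly; cells that appear in only one run are ignored rather than treated as failures.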

Comparison to recent work in the model-merging landscape

For a comprehensive map of model-merging methods, theory, and applications, see Yang et al.'s curated survey Awesome-Model-Merging-Methods-Theories-Applications (forthcoming ACM Computing Surveys 2026).

Closest direct relatives:

Transport and Merge (Cui et al., Feb 2026) -- cross-architecture merging via activation-space optimal transport. Different problem class: theirs produces a runtime-aligned composition; this model is a permanent merged checkpoint.
Unconstrained Model Merging for Enhanced LLM Reasoning (Zhang et al., Oct 2024) -- closest direct relative on substrate scale (7B-class) and donor count (9 reasoning-optimized LLMs). The result above extends this lineage with absolute benchmark deltas against a state-competitive instruction-tuned anchor.
Git Re-Basin (Ainsworth, Hayase & Srinivasa, ICLR 2023) -- same-architecture merging modulo permutation symmetries. The pipeline above is essentially the cross-architecture generalization (continuous Procrustes rotation rather than discrete permutation matching).
OT-Fusion (Singh & Jaggi, NeurIPS 2020) -- same-architecture optimal transport on weight rows. Spiritual ancestor of Cui et al.'s 2026 cross-architecture extension.
REPAIR (Jordan et al., 2022) -- re-normalization to address variance collapse after permutation interpolation. The pipeline above sidesteps this by using anchor-plus-delta absorption rather than midpoint interpolation.
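
The relationship to Git Re-Basin can be made concrete: a permutation matrix is a special case of an orthogonal matrix, so when two weight matrices differ only by a reordering of hidden units, orthogonal Procrustes recovers that permutation. The sketch below demonstrates this on synthetic data; it illustrates the general principle only and makes no claim about how either method is implemented.

```python
import numpy as np

rng = np.random.default_rng(1)
anchor = rng.standard_normal((48, 16))
perm = rng.permutation(16)
donor = anchor[:, perm]  # same weights, hidden units reordered

# Orthogonal Procrustes: R = U V^T from the SVD of donor^T @ anchor.
u, _, vt = np.linalg.svd(donor.T @ anchor)
rotation = u @ vt

aligned = donor @ rotation
# Every entry of a permutation matrix is 0 or 1, with a single 1 per row and column.
is_permutation = (
    np.allclose(rotation, rotation.round())
    and np.allclose(rotation.sum(axis=0), 1.0)
    and np.allclose(rotation.sum(axis=1), 1.0)
)
print("alignment exact:", np.allclose(aligned, anchor))
print("rotation is a permutation matrix:", is_permutation)
```

When the donor is not an exact reordering of the anchor, the same computation returns a general orthogonal matrix, which is the continuous relaxation the comparison above refers to.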

Limitations

Code generation regresses by 6.10 pp on HumanEval. The donor pool was reasoning-heavy and instruction-tuned; it contained no code-specialist models (CodeLlama, StarCoder, Qwen2.5-Coder). This is documented as a falsifiable prediction: a code-heavy donor pool should restore HumanEval while preserving the GSM8K, ARC-Challenge, and IFEval gains. Testing it is the explicit subject of the next research cycle.
Mild MMLU regression (-0.86 pp). The merge trades some broad knowledge for instruction-following + reasoning concentration. The changes on TruthfulQA mc2 (-1.90 pp), HellaSwag (-0.65 pp), and PIQA (-0.16 pp) are within typical eval noise.
Single substrate tested: results are on Qwen2.5-7B-Instruct. Generalization to other instruction-tuned 7B-class anchors (Llama-3.1-8B-Instruct, Mistral-7B-Instruct-v0.3 as anchor, etc.) is the next experiment.
HumanEval pass@1 is measured with a custom isolated-subprocess scorer rather than lm-eval (the pinned lm-eval-harness 0.4.4 does not ship the humaneval task). Greedy decoding, 164 problems, no temperature sweep. The methodology is identical to bigcode-evaluation-harness with subprocess-isolated test execution.
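
For readers unfamiliar with subprocess-isolated scoring, the idea is to run each generated solution together with its unit tests in a fresh interpreter and count a pass only when that process exits cleanly. The following is a minimal sketch of that pattern, not the scorer used for the numbers above; the function names, the timeout value, and the toy problems are assumptions for illustration. Process isolation of this kind protects the scorer from crashes and hangs but is not a security sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def passes(candidate_src, test_src, timeout_s=10.0):
    """Run candidate + tests in a fresh Python process; True if it exits with 0."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "problem.py"
        script.write_text(candidate_src + "\n\n" + test_src + "\n")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                cwd=tmp,
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False  # a hang counts as a failure
        return proc.returncode == 0


def pass_at_1(problems):
    """Fraction of (candidate, tests) pairs that pass, one sample per problem."""
    results = [passes(candidate, tests) for candidate, tests in problems]
    return sum(results) / len(results)


# Toy problems for illustration only; these are not HumanEval tasks.
problems = [
    ("def add(a, b):\n    return a + b", "assert add(2, 3) == 5"),
    ("def add(a, b):\n    return a - b", "assert add(2, 3) == 5"),
]
print(pass_at_1(problems))  # one of the two candidates passes -> 0.5
```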

Intended use

Research and evaluation of cross-family weight-merging techniques.
Drop-in replacement for Qwen/Qwen2.5-7B-Instruct in workflows where the trade-off (GSM8K / ARC-Challenge / IFEval lifts vs. a 6.10 pp HumanEval regression and a mild MMLU regression) is favorable.
Compatible with vLLM, llama.cpp (after GGUF conversion), TGI, text-generation-webui, and any standard HuggingFace inference stack.

Out of scope

Code generation as primary use case -- use Qwen/Qwen2.5-Coder-7B-Instruct instead, or wait for the next merge variant which targets a code-heavy donor pool.
Production deployment without your own evaluation on your specific task distribution.

Citation

If you use this model, please cite:

```bibtex
@misc{borg-merge-v1-2026,
  title  = {Conflict-Free Replicated Datatypes for Neural Network Model Merging},
  author = {Optitransfer},
  year   = {2026},
  url    = {https://huggingface.co/Optitransfer/Qwen2.5-7B-Instruct-borg-merge-v1}
}
```

Contact

rgillespie83@icloud.com
data@optitransfer.ch

For arXiv endorsement requests on the full technical paper covering cross-family weight merging (cs.LG / secondary cs.CL): same contacts, subject line "arXiv endorsement: cross-family weight merging".