Broken Output - Frankenstein Patch Attempt
To test this "Frankenstein patch," you actually want to use the linear merge method with strict 1.0 and 0.0 filters, rather than passthrough.
Here is why: passthrough expects exactly one model to be present for any given tensor. Trying to route the pre/post weights (like lm_head) from one model alongside the layer weights of another through passthrough in the YAML can cause routing crashes. Using linear with normalize: false and binary per-tensor weights achieves the same "copy-paste" result while staying stable.
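As a rough sketch of the arithmetic (not mergekit's actual code), this is what the binary-weight linear merge boils down to per tensor: each output tensor is a weighted sum of the inputs, so a (1.0, 0.0) weight pair copies one model's tensor verbatim and discards the other's.

import torch

def linear_merge(tensors, weights, normalize=False):
    # Per-tensor linear merge: out = sum(w_i * t_i), optionally divided by sum(w_i).
    # Illustrative only; not mergekit's internal implementation.
    total = sum(w * t for w, t in zip(weights, tensors))
    if normalize:
        total = total / sum(weights)
    return total

# Hypothetical stand-ins for the same weight tensor from each model.
body = torch.randn(4, 4)   # TensorGuard's version of some layer
donor = torch.randn(4, 4)  # Cydonia's version of the same layer

# MLP/attention tensors: weights (1.0, 0.0) keep the TensorGuard tensor exactly.
assert torch.equal(linear_merge([body, donor], [1.0, 0.0]), body)

# embed_tokens / lm_head: weights (0.0, 1.0) take the Cydonia tensor exactly.
assert torch.equal(linear_merge([body, donor], [0.0, 1.0]), donor)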
This patch will take the "Body" (the MLP and Attention layers) of your broken TensorGuard merge, and surgically attach the "Mouth and Ears" (lm_head and embed_tokens) of Cydonia.
The "Frankenstein Patch" YAML
Create a new file called patch.yaml:
merge_method: linear
parameters:
  normalize: false  # CRITICAL: ensures weights stay exactly at 1.0 and 0.0
models:
  # 1. The "Body" (your broken TensorGuard merge)
  - model: /workspace/merges/TENSORGUARD-prototype
    parameters:
      weight:
        - filter: embed_tokens
          value: 0.0  # Delete the broken embeddings
        - filter: lm_head
          value: 0.0  # Delete the broken head
        - value: 1.0  # Keep 100% of the TensorGuard MLP/Attention layers
  # 2. The "Head" (Cydonia)
  - model: /workspace/models/Cydonia-24B-v4.3
    parameters:
      weight:
        - filter: embed_tokens
          value: 1.0  # Inject Cydonia's embeddings
        - filter: lm_head
          value: 1.0  # Inject Cydonia's head
        - value: 0.0  # Ignore all of Cydonia's MLP/Attention layers
tokenizer:
  # CRITICAL: The tokenizer must exactly match the model the embeddings came from
  source: /workspace/models/Cydonia-24B-v4.3
dtype: bfloat16
name: TensorGuard-Cydonia-Patch
How to run it:
Run this just like a normal merge. Because it is a simple linear copy-paste, it will finish in about 2 minutes on your RunPod.
mergekit-yaml /workspace/patch.yaml /workspace/merges/TENSORGUARD-PATCHED \
    --copy-tokenizer \
    --out-shard-size 5B \
    --lazy-unpickle \
    --cuda
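As an optional sanity check before loading the model at all, the snippet below (an illustrative sketch, not part of mergekit) uses safetensors to confirm that the patched checkpoint's embed_tokens and lm_head really are bit-identical to Cydonia's. The tensor names assume a standard Mistral-style layout with an untied lm_head; adjust them if your checkpoints differ.

import json, os
import torch
from safetensors import safe_open

def load_tensor(model_dir, name):
    # Find which shard holds `name` via the sharded-safetensors index, then load it.
    with open(os.path.join(model_dir, "model.safetensors.index.json")) as f:
        shard = json.load(f)["weight_map"][name]
    with safe_open(os.path.join(model_dir, shard), framework="pt") as f:
        return f.get_tensor(name)

donor = "/workspace/models/Cydonia-24B-v4.3"
patched = "/workspace/merges/TENSORGUARD-PATCHED"

# Note: if the architecture ties lm_head to embed_tokens, "lm_head.weight"
# may not appear in the index at all.
for name in ("model.embed_tokens.weight", "lm_head.weight"):
    a = load_tensor(donor, name)
    b = load_tensor(patched, name)
    print(f"{name}: {'identical' if torch.equal(a, b) else 'DIFFERS'}")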
Why this is a brilliant diagnostic test:
If you run this patched model and the looping/early termination is completely gone, you have proven that the TensorGuard dense averaging broke the vocabulary embeddings and output head.
If the model still loops after this patch, it means the TensorGuard averaging destroyed the internal MLP/attention (knowledge) layers, and the model is mathematically incapable of forming a coherent thought regardless of its vocabulary.
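If you want to script that check, a minimal generation test with transformers looks roughly like this; the prompt and sampling settings are placeholders, and you are simply watching for near-zero new tokens (early termination) or the same phrase repeated (looping).

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/workspace/merges/TENSORGUARD-PATCHED"
tok = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map="auto")

prompt = "Write three sentences about a lighthouse keeper."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7, top_p=0.9)
text = tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print(text)
# Early termination -> almost no new tokens; looping -> the same phrase over and over.
print(f"new tokens: {out.shape[1] - inputs['input_ids'].shape[1]}")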
This doesn't work, the model is BROKEN