Broken Output - Frankenstein Patch Attempt

#2
by OrobasVault - opened

To test this "Frankenstein patch," you actually want to use the linear merge method with strict 1.0 and 0.0 filters, rather than passthrough.

Here is why: passthrough expects exactly one model to supply any given tensor. Trying to route the pre/post weights (like lm_head) alongside the layer weights through passthrough in the YAML can crash tensor routing. Using linear with normalize: false and binary weights achieves the exact same "copy-paste" result and is completely stable.
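The copy-paste arithmetic this relies on is easy to sketch with dummy tensors (this illustrates the math only, not mergekit's actual internals):

```python
import numpy as np

def linear_merge(tensors, weights, normalize=False):
    # Weighted sum of the candidate tensors for one parameter.
    out = sum(w * t for w, t in zip(weights, tensors))
    if normalize:
        out = out / sum(weights)  # rescales when weights don't sum to 1
    return out

rng = np.random.default_rng(0)
body = rng.standard_normal((4, 4))  # stands in for a TensorGuard tensor
head = rng.standard_normal((4, 4))  # the same tensor from Cydonia

# With weights 1.0 / 0.0 the "merge" is an exact copy of the first model.
merged = linear_merge([body, head], [1.0, 0.0])
assert np.array_equal(merged, body)
```

With binary complementary weights the sum is already 1.0, so normalize would not change anything here; normalize: false simply guarantees no rescaling can sneak in.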

This patch will take the "Body" (the MLP and Attention layers) of your broken TensorGuard merge, and surgically attach the "Mouth and Ears" (lm_head and embed_tokens) of Cydonia.

The "Frankenstein Patch" YAML

Create a new file called patch.yaml:

merge_method: linear
parameters:
  normalize: false # CRITICAL: Ensures weights stay exactly at 1.0 and 0.0
models:
  # 1. The "Body" (Your broken TensorGuard merge)
  - model: /workspace/merges/TENSORGUARD-prototype 
    parameters:
      weight:
        - filter: embed_tokens
          value: 0.0  # Delete the broken embeddings
        - filter: lm_head
          value: 0.0  # Delete the broken head
        - value: 1.0  # Keep 100% of the TensorGuard MLP/Attention layers

  # 2. The "Head" (Cydonia)
  - model: /workspace/models/Cydonia-24B-v4.3
    parameters:
      weight:
        - filter: embed_tokens
          value: 1.0  # Inject Cydonia's embeddings
        - filter: lm_head
          value: 1.0  # Inject Cydonia's head
        - value: 0.0  # Ignore all of Cydonia's MLP/Attention layers

tokenizer:
  # CRITICAL: The tokenizer must exactly match the model you stole the embeddings from
  source: /workspace/models/Cydonia-24B-v4.3 
dtype: bfloat16
name: TensorGuard-Cydonia-Patch
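For reference, the per-tensor weight selection the YAML above relies on can be sketched like this (an assumption-laden illustration: it mimics first-matching-filter, substring-on-tensor-name semantics, and resolve_weight is a hypothetical helper, not mergekit code):

```python
def resolve_weight(tensor_name, weight_entries):
    # First entry whose filter string appears in the tensor name wins;
    # an entry with no filter acts as the catch-all default.
    for entry in weight_entries:
        filt = entry.get("filter")
        if filt is None or filt in tensor_name:
            return entry["value"]
    return 0.0  # no entry matched (shouldn't happen with a default entry)

# The TensorGuard side of the patch config:
tensorguard = [
    {"filter": "embed_tokens", "value": 0.0},
    {"filter": "lm_head", "value": 0.0},
    {"value": 1.0},
]

assert resolve_weight("model.layers.10.mlp.down_proj.weight", tensorguard) == 1.0
assert resolve_weight("model.embed_tokens.weight", tensorguard) == 0.0
assert resolve_weight("lm_head.weight", tensorguard) == 0.0
```

So every MLP/attention tensor keeps TensorGuard at 1.0, while both vocab tensors are zeroed out and (per the mirrored Cydonia entries) replaced wholesale.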

How to run it:

Run this just like a normal merge. Because it is a simple linear copy-paste, it will finish in about 2 minutes on your RunPod.

mergekit-yaml /workspace/patch.yaml /workspace/merges/TENSORGUARD-PATCHED \
    --copy-tokenizer \
    --out-shard-size 5B \
    --lazy-unpickle \
    --cuda
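Once the merge finishes, a quick sanity check is to confirm the vocab tensors really were copied verbatim from Cydonia. A minimal sketch (embeddings_match and the dummy dicts are illustrative; in practice load the real shards into name-to-array dicts, e.g. with safetensors):

```python
import numpy as np

def embeddings_match(patched, donor,
                     names=("model.embed_tokens.weight", "lm_head.weight")):
    # True if the patched state dict carries the donor's vocab tensors verbatim.
    # patched / donor are dicts mapping tensor name -> array.
    return all(np.array_equal(patched[n], donor[n]) for n in names)

# Dummy stand-ins for the real checkpoints:
donor = {"model.embed_tokens.weight": np.ones((8, 4)),
         "lm_head.weight": np.zeros((8, 4))}
patched = dict(donor)  # a correct patch copies these tensors exactly
assert embeddings_match(patched, donor)
```

If this check fails on the real checkpoints, the filters never fired and the patch is a no-op, which is worth ruling out before drawing any diagnostic conclusions.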

Why this is a brilliant diagnostic test:

If you run this patched model and the looping/early termination is completely gone, you have successfully proven that the TensorGuard dense averaging destroyed the vocabulary embeddings.

If the model still loops after this patch, it means the TensorGuard averaging actually destroyed the internal MLP (knowledge) layers, and the model is mathematically incapable of forming a coherent thought regardless of its vocabulary.

This doesn't work; the model is BROKEN

OrobasVault changed discussion status to closed
