DeepWater Pleroma broken prototype - 90-model merge

This merges 90 Mistral Nemo models into one, but it's bugged.

These weights are broken (uploaded for archival purposes). I recommend using the polished version at EldritchLabs/DeepWater-Pleroma-12B-v1 instead.

(The polished version is not available yet.)

I tested a lot of ideas, including a custom patcher script that replaces inf/nan regions with the corresponding base_model weights. After healing, I tried SLERP, Karcher, and Arcee Fusion (with different Tukey fence values), and the output was still ruined.
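For reference, the Tukey fence test used for outlier gating is just the interquartile-range rule. A minimal sketch of the fence computation (illustrative only, not mergekit's actual implementation; `tukey_fence_mask` is a made-up helper name):

```python
import torch

def tukey_fence_mask(deltas: torch.Tensor, k: float = 1.5) -> torch.Tensor:
    """Flag elements outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    flat = deltas.float().flatten()
    q1 = torch.quantile(flat, 0.25)
    q3 = torch.quantile(flat, 0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (deltas < lower) | (deltas > upper)

t = torch.tensor([0.0, 0.1, -0.1, 0.05, 9.0])  # 9.0 is an obvious outlier
mask = tukey_fence_mask(t)
```

Raising k widens the fences, so fewer parameter deltas get flagged as outliers.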

Anyway, here is the healer script in case anyone wants to try patching this model. Something must also be done about the tokenizers, but I'm not sure what.
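One place to start with the tokenizers might be diffing the donor vocabularies to see which models added or dropped tokens. A minimal sketch (the `vocab_diff` helper and the tiny inline vocabs are illustrative; in practice you would load each model's tokenizer.json as shown in the comments):

```python
import json
import os

def vocab_diff(tok_a: dict, tok_b: dict):
    """Return tokens present in one vocab but not the other."""
    only_a = set(tok_a) - set(tok_b)
    only_b = set(tok_b) - set(tok_a)
    return only_a, only_b

# In practice, load each model's vocab from its tokenizer.json, e.g.:
# with open(os.path.join(model_dir, "tokenizer.json")) as f:
#     vocab = json.load(f)["model"]["vocab"]

a = {"<s>": 0, "hello": 1}
b = {"<s>": 0, "hello": 1, "<extra>": 2}
only_a, only_b = vocab_diff(a, b)
```

Grouping the donors by identical (or near-identical) vocabularies would also give natural chunks for the smaller test merges mentioned below.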

If you run the healer script first, you can get the model to quantize, but it still collapses into endless repetition midway through generation regardless of chat template.

For now I'm calling this merge a failure and will test other combinations next, maybe in smaller chunks (grouped by tokenizer) to see what works.

More info: https://huggingface.co/EldritchLabs/Kraken-Stock-12B-v1/discussions/4

import torch
from safetensors.torch import load_file, save_file
import os
import gc

# Configuration
base_path = r'B:\12B\models--mistralai--Mistral-Nemo-Instruct-2407'
broken_path = r'C:\Quanter\model_cache\EldritchLabs__DeepWater-Pleroma-12B-v1'
output_path = r'C:\Quanter\model_cache\EldritchLabs__DeepWater-Pleroma-12B-v1\DeepWater-Healed'
os.makedirs(output_path, exist_ok=True)

print("Step 1: Indexing base model shards...")
base_map = {}
base_files = [f for f in os.listdir(base_path) if f.endswith('.safetensors')]
from safetensors import safe_open  # header-only access; no tensor data is loaded

for bf in base_files:
    with safe_open(os.path.join(base_path, bf), framework="pt") as f:
        for k in f.keys():
            base_map[k] = bf

def get_base_tensor(name):
    """Find and load a single tensor from the base model without loading the whole shard."""
    if name not in base_map:
        return None
    target_file = os.path.join(base_path, base_map[name])
    with safe_open(target_file, framework="pt") as f:
        return f.get_tensor(name).clone().detach()

print("Step 2: Healing broken shards...")
broken_files = [f for f in os.listdir(broken_path) if f.endswith('.safetensors')]

for bf in broken_files:
    print(f"Processing {bf}...")
    broken_file_path = os.path.join(broken_path, bf)
    
    # Load the broken shard and deep-copy the tensors so they are writable in RAM
    mmap_broken = load_file(broken_file_path)
    broken_sd = {k: v.clone().detach() for k, v in mmap_broken.items()}
    del mmap_broken
    gc.collect()
    
    shard_healed = False
    for k in list(broken_sd.keys()):
        broken_t = broken_sd[k]
        
        # Check for inf/nan
        invalid_mask = ~torch.isfinite(broken_t)
        if invalid_mask.any():
            num_broken = torch.sum(invalid_mask).item()
            print(f"  !! Found {num_broken} inf/nan in {k}. Fetching base weights...")
            
            base_t = get_base_tensor(k)
            if base_t is not None:
                # Ensure shapes match (handle potential mergekit resizing)
                if base_t.shape != broken_t.shape:
                    print(f"     Shape mismatch for {k}: Base {base_t.shape} vs Broken {broken_t.shape}. Skipping.")
                    continue
                
                # Heal: Keep broken where finite, take base where infinite
                broken_sd[k] = torch.where(invalid_mask, base_t, broken_t)
                shard_healed = True
                del base_t
            else:
                print(f"     Warning: {k} not found in base model. Cannot heal.")

    # Save the (possibly healed) shard to the NEW directory
    save_file(broken_sd, os.path.join(output_path, bf))
    print(f"  Saved {bf}" + (" (healed)" if shard_healed else " (no repairs needed)"))
    
    del broken_sd
    gc.collect()

print("\nHealing complete. Output saved to:", output_path)
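As a sanity check after healing, you can rescan the output for surviving inf/nan values. A minimal sketch (`count_nonfinite` is a made-up helper, shown here on an in-memory dict; for the real check you would load_file() each healed shard from output_path):

```python
import torch

def count_nonfinite(state_dict: dict) -> dict:
    """Map tensor name -> number of inf/nan elements (only names with any)."""
    bad = {}
    for name, t in state_dict.items():
        n = int((~torch.isfinite(t)).sum())
        if n:
            bad[name] = n
    return bad

# Example on an in-memory dict; on disk, load_file() each healed shard instead.
sd = {"ok": torch.ones(4), "bad": torch.tensor([1.0, float("nan"), float("inf")])}
report = count_nonfinite(sd)
```

An empty report means the shards are numerically clean, though that alone does not rule out the repetition problem.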
The mergekit config used:

models:
  - model: B:\12B\models--aixonlab--Aether-12b
  - model: B:\12B\models--aixonlab--Zinakha-12b
  - model: B:\12B\models--allura-org--Bigger-Body-12b
  - model: B:\12B\models--allura-org--MN-12b-RP-Ink
  - model: B:\12B\models--allura-org--remnant-mn-12b
  - model: B:\12B\models--allura-org--Tlacuilo-12B
  - model: B:\12B\models--anthracite-org--magnum-v4-12b
  - model: B:\12B\models--ArliAI--Mistral-Nemo-12B-ArliAI-RPMax-v1.2
  - model: B:\12B\models--axolotl-ai-co--romulus-mistral-nemo-12b-simpo
  - model: B:\12B\models--Babsie--Opulus-12B-v3
  - model: B:\12B\models--BeaverAI--mistral-doryV2-12b
  - model: B:\12B\models--BeaverAI--MN-2407-DSK-QwQify-v0.1-12B
  - model: B:\12B\models--cgato--Nemo-12b-Humanize-KTO-Experimental-Latest
  - model: B:\12B\models--cgato--Nemo-12b-Humanize-SFT-v0.2.5-KTO
  - model: B:\12B\models--crestf411--MN-Slush
  - model: B:\12B\models--crestf411--nemo-sunfall-v0.6.1
  - model: B:\12B\models--D1rtyB1rd--Egregore-Alice-RP-NSFW-12B
  - model: B:\12B\models--Delta-Vector--Francois-PE-V2-Huali-12B
  - model: B:\12B\models--Delta-Vector--Ohashi-NeMo-12B
  - model: B:\12B\models--Delta-Vector--Rei-V3-KTO-12B
  - model: B:\12B\models--dphn--dolphin-2.9.3-mistral-nemo-12b
  - model: B:\12B\models--EldritchLabs--Altair-Stock-12B-v1
  - model: B:\12B\models--elinas--Chronos-Gold-12B-1.0
  - model: B:\12B\models--Elizezen--Himeyuri-v0.1-12B
  - model: B:\12B\models--Epiculous--Azure_Dusk-v0.2
  - model: B:\12B\models--Epiculous--Crimson_Dawn-v0.2
  - model: B:\12B\models--EpistemeAI2--Fireball-Mistral-Nemo-12B-Philos
  - model: B:\12B\models--EpistemeAI--Mistral-Nemo-Instruct-12B-Philosophy-Math
  - model: B:\12B\models--Fizzarolli--MN-12b-Rosier-v1
  - model: B:\12B\models--Fizzarolli--MN-12b-Sunrose
  - model: B:\12B\models--flammenai--Flammades-Mistral-Nemo-12B
  - model: B:\12B\models--flammenai--Mahou-1.5-mistral-nemo-12B
  - model: B:\12B\models--GreenerPastures--Golden-Curry-12B
  - model: B:\12B\models--Gryphe--Pantheon-RP-1.5-12b-Nemo
  - model: B:\12B\models--Gryphe--Pantheon-RP-1.6.1-12b-Nemo
  - model: B:\12B\models--HumanLLMs--Human-Like-Mistral-Nemo-Instruct-2407
  - model: B:\12B\models--IIEleven11--Kalypso
  - model: B:\12B\models--inflatebot--MN-12B-Mag-Mell-R1
  - model: B:\12B\models--intervitens--mini-magnum-12b-v1.1
  - model: B:\12B\models--jtatman--mistral_nemo_12b_reasoning_psychology_lora
  - model: B:\12B\models--KOOWEEYUS--BlackSheep-RP-12B
  - model: B:\12B\models--Lambent--Arsenic-Shahrazad-12B-v2
  - model: B:\12B\models--Lambent--Arsenic-Shahrazad-12B-v3
  - model: B:\12B\models--Lambent--arsenic-nemo-unleashed-12B
  - model: B:\12B\models--Lambent--Gilded-Arsenic-12B
  - model: B:\12B\models--LatitudeGames--Muse-12B
  - model: B:\12B\models--LatitudeGames--Wayfarer-12B
  - model: B:\12B\models--LatitudeGames--Wayfarer-2-12B
  - model: B:\12B\models--MarinaraSpaghetti--NemoMix-Unleashed-12B
  - model: B:\12B\models--migtissera--Tess-3-Mistral-Nemo-12B
  - model: B:\12B\models--mistralai--Mistral-Nemo-Instruct-2407
  - model: B:\12B\models--mpasila--Mistral-freeLiPPA-LoRA-12B
#  - model: B:\12B\models--nbeerbower--Denker-mistral-nemo-12B
#  - model: B:\12B\models--nbeerbower--Lyra-Gutenberg-mistral-nemo-12B
#  - model: B:\12B\models--nbeerbower--Lyra4-Gutenberg-12B
#  - model: B:\12B\models--nbeerbower--Merlina-ORPO-12B
#  - model: B:\12B\models--nbeerbower--mistral-nemo-bophades-12B
#  - model: B:\12B\models--nbeerbower--mistral-nemo-cc-12B
#  - model: B:\12B\models--nbeerbower--mistral-nemo-gutenberg-12B-v4
#  - model: B:\12B\models--nbeerbower--Mistral-Nemo-Gutenberg-Doppel-12B
#  - model: B:\12B\models--nbeerbower--Mistral-Nemo-Gutenberg-Vitus-12B
#  - model: B:\12B\models--nbeerbower--mistral-nemo-kartoffel-12B
#  - model: B:\12B\models--nbeerbower--Mistral-Nemo-Prism-12B
#  - model: B:\12B\models--nbeerbower--mistral-nemo-wissenschaft-12B
  - model: B:\12B\models--MuXodious--Irix-12B-Model_Stock-absolute-heresy
  - model: B:\12B\models--NeverSleepHistorical--lumi-nemo-e2.0
  - model: B:\12B\models--NeverSleep--Lumimaid-v0.2-12B
  - model: B:\12B\models--nothingiisreal--MN-12B-Celeste-V1.9
  - model: B:\12B\models--p-e-w--Mistral-Nemo-Instruct-2407-heretic-noslop
  - model: B:\12B\models--PocketDoc--Dans-DangerousWinds-V1.1.0-12b
  - model: B:\12B\models--PocketDoc--Dans-PersonalityEngine-V1.1.0-12b
  - model: B:\12B\models--PocketDoc--Dans-PersonalityEngine-V1.3.0-12b
  - model: B:\12B\models--PocketDoc--Dans-SakuraKaze-V1.0.0-12b
  - model: B:\12B\models--PygmalionAI--Eleusis-12B
  - model: B:\12B\models--PygmalionAI--Pygmalion-3-12B
  - model: B:\12B\models--rAIfle--Questionable-MN-bf16
  - model: B:\12B\models--ReadyArt--Dark-Nexus-12B-v2.0
  - model: B:\12B\models--ReadyArt--Forgotten-Safeword-12B-v4.0
  - model: B:\12B\models--ReadyArt--Omega-Darker_The-Final-Directive-12B
#  - model: B:\12B\models--ReadyArt--Safeword-Casual-V1-12B
#  - model: B:\12B\models--ReadyArt--The-Omega-Directive-M-12B-Unslop-v2.0
#  - model: B:\12B\models--RicardoEstep--RPBizkit-v5-12B-Lorablated
  - model: B:\12B\models--romaingrx--red-teamer-mistral-nemo
  - model: B:\12B\models--Sao10K--MN-12B-Lyra-v1
  - model: B:\12B\models--Sao10K--MN-12B-Lyra-v4
  - model: B:\12B\models--Sao10K--MN-12B-Vespa-x1
  - model: B:\12B\models--Sao10K--MN-BackyardAI-Party-12B-v1
  - model: B:\12B\models--shisa-ai--shisa-v2-mistral-nemo-12b
  - model: B:\12B\models--SicariusSicariiStuff--Angelic_Eclipse_12B
  - model: B:\12B\models--SicariusSicariiStuff--Impish_Bloodmoon_12B
  - model: B:\12B\models--SicariusSicariiStuff--Impish_Longtail_12B
  - model: B:\12B\models--SicariusSicariiStuff--Impish_Nemo_12B
  - model: B:\12B\models--SicariusSicariiStuff--Sweet_Dreams_12B
  - model: B:\12B\models--sleepdeprived3--Christian-Bible-Expert-v2.0-12B
  - model: B:\12B\models--SuperbEmphasis--MN-12b-RP-Ink-RP-Longform
  - model: B:\12B\models--SuperbEmphasis--Omega-Darker_The-Final-Directive-Longform-Stage2-ERP-12B-v0.2
  - model: B:\12B\models--TheDrummer--Rivermind-Lux-12B-v1
  - model: B:\12B\models--TheDrummer--Rocinante-12B-v1
  - model: B:\12B\models--TheDrummer--Rocinante-12B-v1.1
  - model: B:\12B\models--TheDrummer--Rocinante-X-12B-v1
  - model: B:\12B\models--TheDrummer--UnslopNemo-12B-v4.1
  - model: B:\12B\models--Trappu--Nemo-Picaro-12B
  - model: B:\12B\models--Undi95--LocalC-12B-e2.0
  - model: B:\12B\models--UsernameJustAnother--Nemo-12B-Marlin-v8
  - model: B:\12B\models--VAGOsolutions--SauerkrautLM-Nemo-12b-Instruct
merge_method: karcher
base_model: B:\12B\models--SicariusSicariiStuff--Sweet_Dreams_12B
parameters:
  tol: 1e-9
  max_iter: 300
dtype: float32
out_dtype: bfloat16
tokenizer:
  source: base
chat_template: auto
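For context, the karcher merge method computes a Riemannian (Karcher/Fréchet) mean of the weight tensors, with tol and max_iter bounding its fixed-point iteration. A minimal sketch of a Karcher mean on the unit sphere, assuming the standard tangent-space averaging with log/exp maps (illustrative only, not mergekit's implementation):

```python
import torch

def karcher_mean_sphere(points: torch.Tensor, tol=1e-9, max_iter=300):
    """Karcher (Frechet) mean of unit vectors via iterative tangent-space averaging."""
    m = points.mean(dim=0)
    m = m / m.norm()
    for _ in range(max_iter):
        tangents = []
        for x in points:
            # Log map: project each point into the tangent space at m
            cos_t = torch.clamp(torch.dot(m, x), -1.0, 1.0)
            theta = torch.arccos(cos_t)
            if theta < 1e-12:
                tangents.append(torch.zeros_like(m))
                continue
            u = x - cos_t * m
            tangents.append(theta * u / u.norm())
        v = torch.stack(tangents).mean(dim=0)
        norm_v = v.norm()
        if norm_v < tol:  # converged: mean tangent vector is ~zero
            break
        # Exp map: step back onto the sphere along the mean tangent direction
        m = torch.cos(norm_v) * m + torch.sin(norm_v) * v / norm_v
    return m

# Geodesic midpoint of two orthogonal unit vectors lands at 45 degrees
pts = torch.tensor([[1.0, 0.0], [0.0, 1.0]], dtype=torch.float64)
mean = karcher_mean_sphere(pts)
```

With 90 donors the iteration averages 90 tangent vectors per step, which is presumably where a single pathological donor can drag the whole mean off course.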
Model tree for Naphula-Archives/DeepWater-Pleroma-12B-v0-raw-weights
