Corrupted upload

#3
by SicariusSicariiStuff - opened

Thank you, everyone, for reporting the broken files issue, and sorry for the inconvenience.

What's happening is this: even after reuploading, the FP16 files got broken. I don't know why they get corrupted.

The GGUFs were verified to be fine ✅

I'll reupload everything else, and verify by hand, and report back.

SOME FILES MISSING, WILL UPDATE

I know this is annoying, it annoys me too :\

UPDATE:

Tested the FP16, quanted it to make sure it works, it is ✅
Uploading it now.

Thanks for re-uploading. Are you aware that some model-files are missing?

Oh god, yes, I see it now. For fuck's sake.
I used HF and it still skipped a few. This is getting ridiculous.
I'll add them later today, on the road right now.

looks good now.

i've no idea what caused this insane streak of what seems like bad luck with the upload, but thankfully everything was reuploaded and seems fine now.

i need a vacation.

SicariusSicariiStuff changed discussion status to closed

Can confirm this latest batch of 14 shards quantizes properly, while the 50-shard version has NaN corruption.

--- Scanning MERGED_MODEL ---
Checking model-00028-of-00030.safetensors:  83%|████████████████████████████████▌      | 20/24 [00:06<00:01,  3.02it/s]
[!] NaN DETECTED: model.layers.77.mlp.up_proj.weight in model-00028-of-00030.safetensors
    Max Value: nan
Checking model-00029-of-00030.safetensors:  50%|███████████████████▌                   | 13/26 [00:03<00:03,  4.12it/s]
[!] NaN DETECTED: model.layers.79.mlp.gate_proj.weight in model-00029-of-00030.safetensors
    Max Value: nan
Result: MERGED_MODEL has 2 corrupted tensors.

--- Scanning ASSISTANT_PEPE ---
Checking model-00048-of-00050.safetensors:   6%|█████████▊                                                                                                                                                   | 1/16 [00:02<00:41,  2.79s/it]
[!] NaN DETECTED: model.layers.77.mlp.up_proj.weight in model-00048-of-00050.safetensors
    Max Value: nan
Checking model-00049-of-00050.safetensors:   8%|█████████████                                                                                                                                                | 1/12 [00:03<00:33,  3.06s/it]
[!] NaN DETECTED: model.layers.79.mlp.gate_proj.weight in model-00049-of-00050.safetensors
    Max Value: nan
Result: ASSISTANT_PEPE has 2 corrupted tensors.
--- Scanning ASSISTANT_PEPE ---
Result: ASSISTANT_PEPE is CLEAN.

I ran into this problem while merging and created a "sanity scanner". Remerging with clean pepe fixes the issues.

import torch
from safetensors import safe_open
import os
import glob
import re
from tqdm import tqdm

# --- CONFIGURATION ---
models_to_scan = {
    "MERGED_MODEL": r"B:\70B\v1_della",
    "ASSISTANT_PEPE": r"B:\70B\SicariusSicariiStuff--Assistant_Pepe_70B",
}
# ---------------------

def scan_model(name, path):
    print(f"\n--- Scanning {name} ---")
    if not os.path.exists(path):
        print(f"Skipping: Path not found: {path}")
        return

    # Sort so shards are scanned in a stable, predictable order
    files = sorted(glob.glob(os.path.join(path, "*.safetensors")))
    if not files:
        print(f"No safetensors found in {path}")
        return

    issues_found = 0
    for f in files:
        with safe_open(f, framework="pt", device="cpu") as st:
            for key in tqdm(st.keys(), desc=f"Checking {os.path.basename(f)}", leave=False):
                # --- LAYER 60+ FILTER ---
                # Matches 'layers.N' or 'blk.N'
                layer_match = re.search(r'\.(?:layers|blk)\.(\d+)\.', key)
                if layer_match:
                    layer_num = int(layer_match.group(1))
                    if layer_num < 60:
                        continue # Skip early layers to save time
                # ------------------------
                
                tensor = st.get_tensor(key)
                
                has_nan = torch.isnan(tensor).any()
                has_inf = torch.isinf(tensor).any()
                
                if has_nan or has_inf:
                    problem = "NaN" if has_nan else "INF"
                    print(f"\n[!] {problem} DETECTED: {key} in {os.path.basename(f)}")
                    # Check max value to see if it's a blowout
                    print(f"    Max Value: {tensor.abs().max().item()}")
                    issues_found += 1
    
    if issues_found == 0:
        print(f"Result: {name} is CLEAN.")
    else:
        print(f"Result: {name} has {issues_found} corrupted tensors.")

if __name__ == "__main__":
    for name, path in models_to_scan.items():
        scan_model(name, path)
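
Since missing shards also came up earlier in this thread, here is a rough sketch of a companion check that only looks at file names. It assumes the `model-XXXXX-of-YYYYY.safetensors` naming seen in the scan output above; `find_missing_shards` and `SHARD_RE` are hypothetical names made up for this example, not part of any library.

```python
import re

# Assumed shard naming, taken from the scan output above:
# model-00028-of-00030.safetensors
SHARD_RE = re.compile(r"^model-(\d+)-of-(\d+)\.safetensors$")

def find_missing_shards(filenames):
    """Return the sorted shard indices that are absent from `filenames`.

    Names that do not match the assumed pattern are ignored.
    Raises ValueError if the names disagree about the total shard count
    (for example, a 30-shard and a 50-shard upload mixed in one folder).
    """
    seen = set()
    totals = set()
    for name in filenames:
        match = SHARD_RE.match(name)
        if match:
            seen.add(int(match.group(1)))
            totals.add(int(match.group(2)))

    if not totals:
        return []  # nothing that looks like a shard
    if len(totals) > 1:
        raise ValueError(f"Mixed shard counts found: {sorted(totals)}")

    total = totals.pop()
    return [i for i in range(1, total + 1) if i not in seen]
```

To run it against a local folder, something like `find_missing_shards(os.listdir(path))` should work. It says nothing about whether the files that are present are intact, so it complements the NaN/Inf scan rather than replacing it.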
