---
title: Model Tools
emoji: 📚
colorFrom: pink
colorTo: yellow
sdk: static
pinned: false
---

Model Tools by Naphula

Tools to enhance LLM quantization and merging. Merge and audit large language models on low-VRAM GPUs.

graph_v18.py

  • Merge models in minutes instead of hours on low VRAM. For a 3060/3060 Ti user, this script enables merges that are otherwise impossible without OOM, such as 70B models or large 7B merges with --cuda. More details here
  • Update: v18 is much faster than v4 and replaces the trial-and-error loop with an adaptive, math-based VRAM calculator (using GrimJim's measure.py logic)

config.py

  • Replace line 13 to allow custom filepath strings within parameter settings:
    BEFORE: ScalarOrGradient: TypeAlias = Union[float, List[float]]
    AFTER: ScalarOrGradient: TypeAlias = Union[float, List[float], str, bool]
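
The widened alias can be checked in isolation. A minimal sketch (the `TypeAlias` annotation is dropped here so it runs on older Pythons; mergekit's surrounding module is not reproduced):

```python
from typing import List, Union

# BEFORE (mergekit default):
# ScalarOrGradient: TypeAlias = Union[float, List[float]]

# AFTER: parameter settings may now also carry filepath strings and booleans
ScalarOrGradient = Union[float, List[float], str, bool]
```

With this change, a YAML parameter value such as a path string passes the type alias instead of being rejected as a scalar.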

embed_12B.py and embed_24B.py

  • An alternate solution for cases where --fix-mistral-regex and tokensurgeon fail, such as della or passthrough merges between models with mismatched vocab_size. Read the guide here, then download either file and save it as mergekit-main\mergekit\tokenizer\embed.py. Attached is one for Mistral Nemo 12B (v2d) and another for Mistral Small 24B (v2a).
  • Sometimes the default embed.py works best, so keep a copy of that too; if it fails for some reason, try the 12B or 24B version.

enable_fix_mistral_regex_true.md

  • Merge models with extreme tokenizer incompatibility. Requires modifying the mergekit.yaml tokenizer section and adding --fix-mistral-regex to your merge commands. (Note: do not use token_surgeon.py, gen_id_patcher.py, or vocab_id_patcher.py with this; they are now obsolete.) Configured for MN 12B by default. Follow the steps in this guide to modify these scripts:
  • mergekit/merge.py
  • mergekit/options.py
  • mergekit/scripts/moe.py
  • mergekit/scripts/tokensurgeon.py
  • mergekit/tokenizer/build.py

audit_della.py

audit_karcher.py

  • Audit the compatibility of donor models for Karcher merges before merging. See the example chart: Goetia

generalized_task_arithmetic.py

model_stock.py

  • Live audit reports of actual contribution magnitude on a per-layer basis for Model_Stock merges.

metadata_audit.py

  • Checks multiple models within subdirectories for vocab_size or RoPE mismatches (useful for large merges). Calibrated for Mistral Nemo 12B by default.
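
The core of such an audit can be sketched with the standard library: read each subdirectory's config.json and compare a few fields against a reference. The REFERENCE values below are an assumption standing in for the script's Mistral Nemo 12B calibration; adjust them for your target architecture.

```python
import json
import pathlib

# Assumed Mistral Nemo 12B reference values; edit for other architectures.
REFERENCE = {"vocab_size": 131072, "rope_theta": 1000000.0}

def audit_models(root: str) -> dict:
    """Return {model_dir_name: {field: found_value}} for every mismatch."""
    mismatches = {}
    for cfg in sorted(pathlib.Path(root).glob("*/config.json")):
        data = json.loads(cfg.read_text(encoding="utf-8"))
        bad = {k: data.get(k) for k, v in REFERENCE.items() if data.get(k) != v}
        if bad:
            mismatches[cfg.parent.name] = bad
    return mismatches
```

An empty result means every model under the root matches the reference; anything returned is a donor you would want to fix (or drop) before a large merge.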

llama moe

tokensurgeon.py

  • Uses adaptive VRAM logic from GrimJim's measure.py, like graph_v18, to prevent OOM. Use the recommended batch file here, or modify the shell script. This avoids 'Potemkin village' fake patches like gen_id_patcher and vocab_id_patcher. For this to work properly, you must also run shield_embeddings.py and shield_norms.py on any merges made from models patched with tokensurgeon.

tokeninspector.py

  • Audit your tokensurgeon results.

arcee_fusion_salience_scanner.py

eos_scanner.py

weight_counter.py

  • Counts the number of models in a yaml and sums the total weight values. Useful for large della/ties merges.
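
The idea can be sketched without a PyYAML dependency by scanning a mergekit YAML with regular expressions. This is an assumption about the script's approach; it handles the common flat layout only and ignores gradient (list-valued) weights.

```python
import re

def count_models_and_weights(yaml_text: str):
    """Count 'model:' entries and sum all scalar 'weight:' values."""
    models = re.findall(r"^\s*-?\s*model:\s*(\S+)", yaml_text, re.MULTILINE)
    weights = [float(w) for w in
               re.findall(r"^\s*weight:\s*([0-9eE.+-]+)\s*$", yaml_text, re.MULTILINE)]
    return len(models), sum(weights)
```

For a della or ties merge, a total far from 1.0 is a quick signal that the config needs renormalization.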

fp32_to_bf16.py

  • Converts FP32 to BF16 safetensors
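
The conversion itself reduces to a bit-level operation: BF16 keeps FP32's sign bit and 8 exponent bits and truncates the mantissa from 23 to 7 bits. A stdlib sketch of the per-value rounding (the actual script would apply this tensor-wide via torch/safetensors rather than per float):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit BF16 pattern for an FP32 value.

    Uses round-to-nearest-even on the 16 discarded mantissa bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding = 0x7FFF + ((bits >> 16) & 1)  # tie-breaks toward even
    return ((bits + rounding) >> 16) & 0xFFFF
```

Because the exponent width is unchanged, BF16 preserves FP32's dynamic range while halving storage, which is why it is the usual target for merged checkpoints.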

fp32_to_fp16.py

  • Converts FP32 to FP16 safetensors

pytorch_to_safetensors.py

  • Converts PyTorch .bin checkpoints to safetensors format

textonly_ripper_v2.py

  • Converts a sharded, multimodal (text and vision) model into a text-only version. Readme at textonly_ripper.md

json_reverter.py

  • Reverts changes made to all JSON files by gen_id_patcher.py, vocab_id_patcher.py, or other scripts within a specified root folder, by re-downloading the source files from the HF repo.

vocab_resizer.py

  • Converts models with larger vocab_sizes down to a standard size (default 131072, Mistral 24B) for use with mergekit. Note that tokenizer.model must be copied manually into the /fixed/ folder.
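
The rule being applied is pad-or-truncate on the vocab dimension of embed_tokens and lm_head. A toy sketch on plain lists (the function name is hypothetical; the real script would operate on torch tensors before re-saving shards, and mean-initialized rows are another common choice besides zeros):

```python
def resize_rows(rows, target_vocab, dim):
    """Truncate, or pad with zero rows, so the table has target_vocab rows."""
    if len(rows) >= target_vocab:
        return rows[:target_vocab]
    return rows + [[0.0] * dim for _ in range(target_vocab - len(rows))]
```

Truncation silently drops any token IDs above the target size, which is why the matching tokenizer.model must be supplied by hand.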

lm_head_remover.py

  • Loads a "fat" 18.9 GB model (default Gemma 9B), forces it to tie the weights (deduplicating the lm_head), and re-saves it. This drops the file size to ~17.2 GB and makes it compatible with the others.

model_index_json_generator.py

  • Generates a missing model.safetensors.index.json file. Useful for cases where safetensors may have been sharded at the wrong size. Single tensor variant here.
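
Rebuilding the index only requires the shard headers, which the safetensors format exposes as an 8-byte little-endian length followed by a JSON map of tensor name to dtype/shape/data_offsets. A stdlib sketch (function names are illustrative, not the script's):

```python
import json
import pathlib
import struct

def read_safetensors_header(path):
    """Read the JSON header of a .safetensors file (8-byte length prefix)."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]
        return json.loads(f.read(header_len))

def build_index(folder: str) -> dict:
    """Rebuild a model.safetensors.index.json dict from shards in a folder."""
    weight_map, total_size = {}, 0
    for shard in sorted(pathlib.Path(folder).glob("*.safetensors")):
        for name, meta in read_safetensors_header(shard).items():
            if name == "__metadata__":
                continue
            weight_map[name] = shard.name
            total_size += meta["data_offsets"][1] - meta["data_offsets"][0]
    return {"metadata": {"total_size": total_size}, "weight_map": weight_map}
```

Dumping the returned dict with json.dump produces an index file loaders accept, regardless of how the shards were originally sized.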

folder_content_combiner_anyfiles.py

  • Combines all files in the script's current directory into a single output file, sorted alphabetically.

folder+subfolder_content_combiner_anyfiles.py

  • Combines all files in the script's directory, including all files within subdirectories (excluding blacklisted formats), into a single output file, sorted alphabetically.
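
A minimal sketch of the recursive variant; the BLACKLIST, the per-file header format, and the output filename are assumptions, not the script's actual choices:

```python
import pathlib

# Hypothetical exclusion list; the real script's blacklist may differ.
BLACKLIST = {".safetensors", ".bin", ".gguf", ".png", ".zip"}

def combine_tree(root: str, out_name: str = "combined_output.txt") -> pathlib.Path:
    """Concatenate every file under root (recursing into subfolders),
    sorted alphabetically by relative path, into one output file."""
    root_path = pathlib.Path(root)
    parts = []
    for f in sorted(root_path.rglob("*"), key=lambda p: str(p).lower()):
        if f.is_file() and f.name != out_name and f.suffix.lower() not in BLACKLIST:
            parts.append(f"=== {f.relative_to(root_path)} ===\n"
                         + f.read_text(encoding="utf-8", errors="replace"))
    out = root_path / out_name
    out.write_text("\n\n".join(parts), encoding="utf-8")
    return out
```

Skipping the output file itself and decoding with errors="replace" keeps the run idempotent and tolerant of stray binary files.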

GGUF Repo Suite

  • Create and quantize Hugging Face models

Markdown Viewer

  • Portable Offline Markdown Viewer

Markdown to SMF

  • Converts a Markdown string to an SMF-compatible BBCode string. Not perfect; it sometimes misses double bold tags.

Quant Clone

Text Analysis Suite v1.5

  • Analyze text files with advanced metrics

Not Functional

Failed experiment: gguf_to_safetensors_v2.py

IQ5_NL.md

  • Note: Not functional yet. Includes the code needed to quantize IQ5_NL GGUFs using block size 32.