Quantized version of black-forest-labs/FLUX.2-klein-9B made using sd.cpp.

Brain-dead mixed-tensor quantization: the quantization type for each tensor is picked purely from the element count of that tensor.

Finding the distribution of tensor sizes (distinct element counts) in the model:

```python
from safetensors.torch import load_file

k = load_file("flux-2-klein-9b.safetensors")

# Distinct element counts across all tensors, sorted ascending
distrib = sorted({t.nelement() for t in k.values()})
print(distrib)
```
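To see which tensor names fall into each size bucket, the names can be grouped by element count. A quick sketch; the toy `nelements` mapping below stands in for the real `{name: tensor.nelement()}` dict built from the state dict:

```python
from collections import defaultdict

# Toy stand-in for {name: tensor.nelement()} from the real safetensors file
nelements = {
    "img_in.weight": 524288,
    "double_blocks.0.img_attn.proj.weight": 16777216,
    "double_blocks.0.img_attn.qkv.weight": 50331648,
    "single_blocks.0.linear1.weight": 100663296,
}

# Group tensor names by their element count
by_size = defaultdict(list)
for name, n in nelements.items():
    by_size[n].append(name)

for n in sorted(by_size):
    print(n, by_size[n])
```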

Building the --tensor-type-rules parameter for sd-cli.exe:

```python
# Map element count -> target quantization type
repart = {
    524288: "q4_K",
    1048576: "q4_K",
    16777216: "q3_K",
    33554432: "q3_K",
    50331648: "q3_K",
    100663296: "q2_K",
    150994944: "q2_K",
}

# One "name=type" rule for every tensor whose size is in the map
tensor_type_rules = [
    f"{key}={repart[t.nelement()]}"
    for key, t in k.items()
    if t.nelement() in repart
]
print(",".join(tensor_type_rules))
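As a sanity check, the distribution of quantization types in the generated string can be tallied. A sketch over a shortened sample of the rules string (the full string follows the same `name=type` comma-separated format):

```python
from collections import Counter

# Shortened sample of the generated rules string
rules = ("double_blocks.0.img_attn.proj.weight=q3_K,"
         "double_blocks.0.img_mlp.0.weight=q2_K,"
         "img_in.weight=q4_K")

# Tally how many tensors get each quantization type
counts = Counter(rule.split("=")[1] for rule in rules.split(","))
print(counts)
```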

Resulting command:

```
sd-cli.exe --mode convert --model "flux-2-klein-9b.safetensors" --output "flux-2-klein-9b-Q3_K_M.gguf" --tensor-type-rules "double_blocks.0.img_attn.proj.weight=q3_K,double_blocks.0.img_attn.qkv.weight=q3_K,double_blocks.0.img_mlp.0.weight=q2_K,double_blocks.0.img_mlp.2.weight=q3_K,double_blocks.0.txt_attn.proj.weight=q3_K,double_blocks.0.txt_attn.qkv.weight=q3_K,double_blocks.0.txt_mlp.0.weight=q2_K,double_blocks.0.txt_mlp.2.weight=q3_K,double_blocks.1.img_attn.proj.weight=q3_K,double_blocks.1.img_attn.qkv.weight=q3_K,double_blocks.1.img_mlp.0.weight=q2_K,double_blocks.1.img_mlp.2.weight=q3_K,double_blocks.1.txt_attn.proj.weight=q3_K,double_blocks.1.txt_attn.qkv.weight=q3_K,double_blocks.1.txt_mlp.0.weight=q2_K,double_blocks.1.txt_mlp.2.weight=q3_K,double_blocks.2.img_attn.proj.weight=q3_K,double_blocks.2.img_attn.qkv.weight=q3_K,double_blocks.2.img_mlp.0.weight=q2_K,double_blocks.2.img_mlp.2.weight=q3_K,double_blocks.2.txt_attn.proj.weight=q3_K,double_blocks.2.txt_attn.qkv.weight=q3_K,double_blocks.2.txt_mlp.0.weight=q2_K,double_blocks.2.txt_mlp.2.weight=q3_K,double_blocks.3.img_attn.proj.weight=q3_K,double_blocks.3.img_attn.qkv.weight=q3_K,double_blocks.3.img_mlp.0.weight=q2_K,double_blocks.3.img_mlp.2.weight=q3_K,double_blocks.3.txt_attn.proj.weight=q3_K,double_blocks.3.txt_attn.qkv.weight=q3_K,double_blocks.3.txt_mlp.0.weight=q2_K,double_blocks.3.txt_mlp.2.weight=q3_K,double_blocks.4.img_attn.proj.weight=q3_K,double_blocks.4.img_attn.qkv.weight=q3_K,double_blocks.4.img_mlp.0.weight=q2_K,double_blocks.4.img_mlp.2.weight=q3_K,double_blocks.4.txt_attn.proj.weight=q3_K,double_blocks.4.txt_attn.qkv.weight=q3_K,double_blocks.4.txt_mlp.0.weight=q2_K,double_blocks.4.txt_mlp.2.weight=q3_K,double_blocks.5.img_attn.proj.weight=q3_K,double_blocks.5.img_attn.qkv.weight=q3_K,double_blocks.5.img_mlp.0.weight=q2_K,double_blocks.5.img_mlp.2.weight=q3_K,double_blocks.5.txt_attn.proj.weight=q3_K,double_blocks.5.txt_attn.qkv.weight=q3_K,double_blocks.5.txt_mlp.0.weight=q2_K,double_blocks.5.txt_mlp.2.weight=q3_K,double_blocks.6.img_attn.proj.weight=q3_K,double_blocks.6.img_attn.qkv.weight=q3_K,double_blocks.6.img_mlp.0.weight=q2_K,double_blocks.6.img_mlp.2.weight=q3_K,double_blocks.6.txt_attn.proj.weight=q3_K,double_blocks.6.txt_attn.qkv.weight=q3_K,double_blocks.6.txt_mlp.0.weight=q2_K,double_blocks.6.txt_mlp.2.weight=q3_K,double_blocks.7.img_attn.proj.weight=q3_K,double_blocks.7.img_attn.qkv.weight=q3_K,double_blocks.7.img_mlp.0.weight=q2_K,double_blocks.7.img_mlp.2.weight=q3_K,double_blocks.7.txt_attn.proj.weight=q3_K,double_blocks.7.txt_attn.qkv.weight=q3_K,double_blocks.7.txt_mlp.0.weight=q2_K,double_blocks.7.txt_mlp.2.weight=q3_K,double_stream_modulation_img.lin.weight=q2_K,double_stream_modulation_txt.lin.weight=q2_K,final_layer.adaLN_modulation.1.weight=q3_K,final_layer.linear.weight=q4_K,img_in.weight=q4_K,single_blocks.0.linear1.weight=q2_K,single_blocks.1.linear1.weight=q2_K,single_blocks.10.linear1.weight=q2_K,single_blocks.11.linear1.weight=q2_K,single_blocks.12.linear1.weight=q2_K,single_blocks.13.linear1.weight=q2_K,single_blocks.14.linear1.weight=q2_K,single_blocks.15.linear1.weight=q2_K,single_blocks.16.linear1.weight=q2_K,single_blocks.17.linear1.weight=q2_K,single_blocks.18.linear1.weight=q2_K,single_blocks.19.linear1.weight=q2_K,single_blocks.2.linear1.weight=q2_K,single_blocks.20.linear1.weight=q2_K,single_blocks.21.linear1.weight=q2_K,single_blocks.22.linear1.weight=q2_K,single_blocks.23.linear1.weight=q2_K,single_blocks.3.linear1.weight=q2_K,single_blocks.4.linear1.weight=q2_K,single_blocks.5.linear1.weight=q2_K,single_blocks.6.linear1.weight=q2_K,single_blocks.7.linear1.weight=q2_K,single_blocks.8.linear1.weight=q2_K,single_blocks.9.linear1.weight=q2_K,single_stream_modulation.lin.weight=q3_K,time_in.in_layer.weight=q4_K,time_in.out_layer.weight=q3_K,txt_in.weight=q3_K"
```
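For a rough idea of the storage footprint per tensor, the ggml k-quant super-block sizes can be used (84 bytes per 256 weights for q2_K, 110 for q3_K, 144 for q4_K). A minimal sketch, assuming those block sizes and ignoring tensors left unquantized:

```python
# Bytes per 256-element super-block for each k-quant type (ggml)
BLOCK_BYTES = {"q2_K": 84, "q3_K": 110, "q4_K": 144}

def quantized_bytes(nelements: int, qtype: str) -> float:
    """Approximate on-disk size of a tensor quantized to the given type."""
    return nelements / 256 * BLOCK_BYTES[qtype]

# Example: a 50331648-element qkv weight at q3_K
print(quantized_bytes(50331648, "q3_K") / 2**20, "MiB")  # → 20.625 MiB
```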