Erik Scholz

Green-Sky



Recent Activity



liked 2 models about 18 hours ago

yifanyu/I-DLM-32B

Text Generation • 33B • Updated about 5 hours ago • 424 • 3

yifanyu/I-DLM-8B

Text Generation • 8B • Updated about 5 hours ago • 654 • 7

replied to eaddario's post 1 day ago

Bro tip: the MMLU dataset for llama.cpp is pretty bad; you can use https://huggingface.co/datasets/Green-Sky/mmlu-redux-2.0-for-llama.cpp/blob/main/mmlu-redux-2-ok%2Bexpert.bin instead. The data is of higher quality (it is based on MMLU-Redux) AND the context is better: I give the model all the choices and then let it decide by answering with the letter A, B, C, or D.

While looking at the original MMLU conversion for llama.cpp, I noticed that some answers are like "both a and c" or similar, so they should never be probable for a model that was not fed all the choices in the first place.
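To illustrate the "give it all choices, answer with a letter" idea, here is a minimal sketch of how such a prompt could be assembled. The function name `format_choice_prompt` and the exact layout (`A. ...` lines followed by `Answer:`) are assumptions made for this example; they are not taken from the dataset file linked above.

```python
LETTERS = "ABCD"


def format_choice_prompt(question: str, choices: list) -> str:
    """Build a prompt that shows every choice, so answers such as
    "both a and c" can be judged in context.

    Hypothetical layout, for illustration only.
    """
    if len(choices) != len(LETTERS):
        raise ValueError("expected exactly four choices")
    lines = [question.strip()]
    for letter, choice in zip(LETTERS, choices):
        lines.append(f"{letter}. {choice}")
    # The model is expected to continue with a single letter.
    lines.append("Answer:")
    return "\n".join(lines)


prompt = format_choice_prompt("What is 2 + 2?", ["3", "4", "5", "6"])
```

With this shape, scoring only needs to compare the likelihood of the four letters after `Answer:`, instead of the likelihood of each full answer text seen in isolation.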

reacted to eaddario's post with 👍 1 day ago

Post


Experimental global target bits‑per‑weight quantization of Qwen/Qwen3.5-4B and Qwen/Qwen3.5-9B

Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach optimizes per-tensor precision where it matters the most, and produces high quality models that meet a precise global file size target.
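The link between a file size target and a global bits-per-weight figure is plain arithmetic, sketched below. The helper names and the example numbers (4 billion weights, a 2.5 GB budget) are hypothetical, and the sketch ignores file metadata as well as the memory needed at runtime for the KV cache and activations.

```python
def target_bpw(size_budget_bytes: float, n_weights: int) -> float:
    """Average bits per weight that exactly fills the byte budget."""
    if n_weights <= 0:
        raise ValueError("n_weights must be positive")
    return size_budget_bytes * 8 / n_weights


def weights_size_bytes(bpw: float, n_weights: int) -> float:
    """Inverse direction: bytes taken by the weights at a given average BPW."""
    return bpw * n_weights / 8


# Hypothetical example: 4e9 weights and a 2.5 GB (2.5e9 bytes) budget.
bpw = target_bpw(2.5e9, 4_000_000_000)
size = weights_size_bytes(bpw, 4_000_000_000)
```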

Key Advantages:
- VRAM Maximization: Can generate high quality models sized exactly to fit hardware constraints (e.g., fitting the model into exactly 24GB VRAM).
- Data-Driven Precision: Quantization mix is determined by actual weight error sensitivity rather than hardcoded rules, often yielding better PPL/KLD size trade-offs.
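
One way to picture a sensitivity-driven mix under a global budget is a greedy allocation: start every tensor at its lowest precision, then repeatedly upgrade the tensor that removes the most error per extra bit spent, as long as the budget holds. This is a simplified stand-in and not the method used for the models above; the `TensorInfo` class, the integer bit widths, and the error numbers are all assumptions for illustration (real llama.cpp quantization types have fractional effective bits per weight).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TensorInfo:
    name: str
    n_weights: int
    # Hypothetical measured quantization error per candidate bit width.
    error_at_bits: Dict[int, float]


def allocate_bits(tensors: List[TensorInfo], target_bpw: float) -> Dict[str, int]:
    """Greedy sketch: pick a bit width per tensor without exceeding the budget."""
    budget_bits = target_bpw * sum(t.n_weights for t in tensors)
    # Start from the cheapest option for every tensor.
    choice = {t.name: min(t.error_at_bits) for t in tensors}
    used = sum(choice[t.name] * t.n_weights for t in tensors)
    if used > budget_bits:
        raise ValueError("budget is below the smallest possible size")
    while True:
        best = None
        for t in tensors:
            cur = choice[t.name]
            higher = [b for b in t.error_at_bits if b > cur]
            if not higher:
                continue  # already at the highest candidate width
            nxt = min(higher)
            extra = (nxt - cur) * t.n_weights
            if used + extra > budget_bits:
                continue  # this upgrade would break the size target
            # Error removed per extra bit spent on this tensor.
            gain = (t.error_at_bits[cur] - t.error_at_bits[nxt]) / extra
            if best is None or gain > best[0]:
                best = (gain, t.name, nxt, extra)
        if best is None:
            return choice  # no affordable upgrade is left
        _, name, nxt, extra = best
        choice[name] = nxt
        used += extra


# Hypothetical tensors: "attn" is small but sensitive, "ffn" is large and robust.
tensors = [
    TensorInfo("attn", 100, {4: 1.0, 8: 0.1}),
    TensorInfo("ffn", 300, {4: 0.5, 8: 0.4}),
]
mix = allocate_bits(tensors, target_bpw=5.0)
```

In this toy case the budget is 2000 bits, so only the sensitive `attn` tensor can be upgraded to 8 bits while `ffn` stays at 4, which is the kind of outcome a fixed per-type rule would not necessarily produce.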

Full benchmarks (PPL, KLD, ARC, MMLU, etc.) and methodology are in the model cards:

https://huggingface.co/eaddario/Qwen3.5-4B-GGUF
https://huggingface.co/eaddario/Qwen3.5-9B-GGUF