Ed Addario (eaddario) · PRO
87 followers · 31 following
AI & ML interests
Finding ways to optimize LLM inference performance in resource-constrained environments (e.g., commodity hardware, desktops, laptops, mobiles, and edge devices).
Recent Activity
Replied to their post · about 3 hours ago:
Experimental global target bits-per-weight quantization of Qwen/Qwen3.5-4B and Qwen/Qwen3.5-9B.

Unlike standard llama.cpp quantizations that rely on fixed type heuristics (e.g., Q4_K_M), the Target BPW approach optimizes per-tensor precision where it matters most, producing high-quality models that meet a precise global file-size target.

Key advantages:
- VRAM maximization: generates high-quality models sized exactly to fit hardware constraints (e.g., fitting a model into exactly 24 GB of VRAM).
- Data-driven precision: the quantization mix is determined by actual weight error sensitivity rather than hardcoded rules, often yielding better PPL/KLD size trade-offs.

Full benchmarks (PPL, KLD, ARC, MMLU, etc.) and methodology are in the model cards:
https://huggingface.co/eaddario/Qwen3.5-4B-GGUF
https://huggingface.co/eaddario/Qwen3.5-9B-GGUF
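The post describes the allocation idea but not the mechanics. As a rough illustration, below is a minimal, hypothetical sketch of one way a budgeted per-tensor allocator could work: start every tensor at its cheapest precision, then greedily upgrade whichever tensor buys the largest error reduction per extra bit until the global bits-per-weight budget is spent. All names (`TensorInfo`, `allocate_bits`, `error_at_bpw`) and numbers are invented for illustration; this is not the author's Target BPW implementation nor part of llama.cpp.

```python
from dataclasses import dataclass

@dataclass
class TensorInfo:
    name: str
    n_weights: int  # number of weights in this tensor
    # Estimated quantization error (e.g., imatrix-weighted MSE) for each
    # candidate precision, keyed by bits-per-weight. Purely illustrative.
    error_at_bpw: dict[float, float]

def allocate_bits(tensors: list[TensorInfo], target_bpw: float) -> dict[str, float]:
    """Pick a per-tensor BPW whose size-weighted average stays within target_bpw."""
    total_weights = sum(t.n_weights for t in tensors)
    budget_bits = target_bpw * total_weights

    # Start every tensor at its cheapest available precision.
    assignment = {t.name: min(t.error_at_bpw) for t in tensors}
    used_bits = sum(assignment[t.name] * t.n_weights for t in tensors)

    while True:
        best = None  # (error saved per extra bit, tensor, new bpw, extra bits)
        for t in tensors:
            cur = assignment[t.name]
            for bpw, err in t.error_at_bpw.items():
                if bpw <= cur:
                    continue  # not an upgrade
                extra = (bpw - cur) * t.n_weights
                if used_bits + extra > budget_bits:
                    continue  # would exceed the global size target
                gain = (t.error_at_bpw[cur] - err) / extra
                if best is None or gain > best[0]:
                    best = (gain, t, bpw, extra)
        if best is None:
            break  # no affordable upgrade remains
        _, t, bpw, extra = best
        assignment[t.name] = bpw
        used_bits += extra
    return assignment

# Made-up example: with a 4.0 BPW budget, the more error-sensitive tensor
# is upgraded first while the global average stays under target.
tensors = [
    TensorInfo("blk.0.attn_q", 4_000_000, {2.5: 0.90, 4.5: 0.20, 6.5: 0.05}),
    TensorInfo("blk.0.ffn_up", 8_000_000, {2.5: 0.30, 4.5: 0.10, 6.5: 0.04}),
]
print(allocate_bits(tensors, target_bpw=4.0))
# -> {'blk.0.attn_q': 6.5, 'blk.0.ffn_up': 2.5} (average ~3.83 BPW)
```

A real implementation would presumably search over llama.cpp's discrete quant types rather than arbitrary BPW values, but the budget-constrained, sensitivity-driven selection is the core idea the post describes.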
New activity on eaddario/imatrix-calibration · about 3 hours ago:
"Great collection, I'm using it for my little project."
Replied to the same post · 2 days ago.
eaddario's activity
Liked 2 models · 9 months ago:
- dphn/Dolphin-Mistral-24B-Venice-Edition · Text Generation · 24B params · Updated Sep 8, 2025 · 84.2k downloads · 483 likes
- marcelbinz/Llama-3.1-Centaur-70B · Text Generation · 71B params · Updated Jul 1, 2025 · 754 downloads · 82 likes
Liked 5 datasets · 11 months ago:
- nvidia/OpenMathInstruct-2 · Viewer · Updated Nov 25, 2024 · 22M rows · 22.8k downloads · 236 likes
- Multilingual-Multimodal-NLP/McEval-Instruct · Viewer · Updated Jun 12, 2024 · 35.9k rows · 92 downloads · 37 likes
- ise-uiuc/Magicoder-Evol-Instruct-110K · Viewer · Updated Dec 28, 2023 · 111k rows · 5.46k downloads · 174 likes
- OpenCoder-LLM/opc-sft-stage2 · Viewer · Updated Nov 24, 2024 · 436k rows · 1.05k downloads · 103 likes
- Vezora/Open-Critic-GPT · Viewer · Updated Jul 28, 2024 · 55.1k rows · 24 downloads · 96 likes
Liked a model · about 1 year ago:
- MadeAgents/Hammer2.1-7b · Updated Jun 12, 2025 · 268 downloads · 32 likes