
---
title: llm-cal
emoji: 🧮
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 6.13.0
app_file: app.py
pinned: false
license: apache-2.0
short_description: LLM inference sizing — honest, architecture-aware
---

# llm-cal — LLM inference hardware calculator

Web UI for [`llm-cal`](https://github.com/FlyTOmeLight/llm-cal). Pick a model, pick a GPU, get a hardware plan.

Architecture-aware (MLA, NSA, CSA+HCA, MoE, sliding window). Engine-aware (vLLM, SGLang). Honestly labeled: every number carries a provenance tag (`[verified]` / `[inferred]` / `[estimated]` / `[cited]` / `[unverified]` / `[unknown]`).
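One place where architecture changes the answer is the KV cache. As a rough illustration (a sketch under textbook assumptions, not `llm-cal`'s own formulas, which the Methodology page documents): standard MHA/GQA caches separate K and V vectors per KV head, MLA caches one compressed latent per token, and a sliding window caps how many tokens each layer keeps. `AttentionConfig` and `kv_cache_bytes` are hypothetical names, the 32-layer configs are made up for the demo, and NSA, CSA+HCA, MoE, and models that interleave sliding and global layers are not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AttentionConfig:
    """Hypothetical per-model attention description (not llm-cal's schema)."""
    n_layers: int
    n_kv_heads: int = 0                   # MHA/GQA: number of K/V heads
    head_dim: int = 0                     # MHA/GQA: dimension per head
    mla_latent_dim: int = 0               # MLA: cached latent width per token (0 = not MLA)
    sliding_window: Optional[int] = None  # SWA: tokens kept per layer (None = full attention)


def kv_cache_bytes(cfg: AttentionConfig, context_len: int, batch: int = 1,
                   bytes_per_elem: float = 2.0) -> float:
    """Rough KV-cache size; assumes every layer uses the same attention type."""
    # A sliding window caps the cached tokens at the window size.
    tokens = context_len if cfg.sliding_window is None else min(context_len, cfg.sliding_window)
    if cfg.mla_latent_dim:
        # MLA caches one compressed latent per token per layer instead of full K and V.
        elems_per_token_layer = cfg.mla_latent_dim
    else:
        # MHA/GQA caches separate K and V: 2 * kv_heads * head_dim elements.
        elems_per_token_layer = 2 * cfg.n_kv_heads * cfg.head_dim
    return batch * tokens * cfg.n_layers * elems_per_token_layer * bytes_per_elem


GiB = 2**30
gqa = AttentionConfig(n_layers=32, n_kv_heads=8, head_dim=128)
# DeepSeek-V3-style latent width (kv_lora_rank 512 + rope dim 64); the layer count is hypothetical.
mla = AttentionConfig(n_layers=32, mla_latent_dim=512 + 64)
swa = AttentionConfig(n_layers=32, n_kv_heads=8, head_dim=128, sliding_window=4096)
for label, cfg in [("GQA", gqa), ("MLA", mla), ("SWA", swa)]:
    print(label, kv_cache_bytes(cfg, context_len=8192) / GiB, "GiB at 8K context, BF16")
```

With these hypothetical configs at an 8K context in BF16, the GQA cache is 1 GiB, MLA shrinks it to 288 MiB, and a 4K sliding window halves the GQA figure.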

## The story this Space exists to tell

`gpu_poor` reports DeepSeek-V4-Flash as 284 GB because it assumes pure FP8 weights. The actual safetensors weights total 160 GB, since the checkpoint ships as a mixed FP4+FP8 pack. `llm-cal` reads the real on-disk dtypes (per-tensor metadata plus the MX block-scaled scale tensors) and gets 160.01 GB, a 0.2% error.

That's the whole pitch.
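The on-disk approach can be approximated from the safetensors header alone: each file starts with an 8-byte little-endian header length, followed by a JSON header listing every tensor's `dtype`, `shape`, and `data_offsets`. Summing the offset spans gives the stored bytes no matter how values are packed. This is a minimal sketch, not `llm-cal`'s implementation. The helper names are hypothetical, and so is the demo layout: FP4 values packed two per byte into a `U8` tensor, plus one 8-bit MX scale per 32-element block (4.25 bits per weight), next to a plain `F8_E4M3` tensor.

```python
import io
import json
import struct
from collections import defaultdict
from typing import BinaryIO, Dict, Iterable


def read_safetensors_header(f: BinaryIO) -> dict:
    """Parse only the header: 8-byte little-endian u64 length, then that many bytes of JSON."""
    (header_len,) = struct.unpack("<Q", f.read(8))
    return json.loads(f.read(header_len))


def on_disk_bytes_by_dtype(header: dict) -> Dict[str, int]:
    """Sum stored bytes per dtype from each tensor's [start, end) data_offsets."""
    totals: Dict[str, int] = defaultdict(int)
    for name, info in header.items():
        if name == "__metadata__":  # optional string-to-string map, not a tensor
            continue
        start, end = info["data_offsets"]
        totals[info["dtype"]] += end - start
    return dict(totals)


def checkpoint_bytes(paths: Iterable[str]) -> int:
    """Total stored tensor bytes across all shards of a checkpoint."""
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            total += sum(on_disk_bytes_by_dtype(read_safetensors_header(f)).values())
    return total


def build_fake_safetensors(tensors: Dict[str, tuple]) -> bytes:
    """Hypothetical demo helper: build a tiny in-memory safetensors file with zero-filled data."""
    header, offset = {}, 0
    for name, (dtype, shape, nbytes) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape, "data_offsets": [offset, offset + nbytes]}
        offset += nbytes
    header_json = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_json)) + header_json + bytes(offset)


# Hypothetical layout: a 4096x4096 FP4 weight packed two values per byte as U8,
# its MX scales (one 8-bit scale per 32 elements), and a 4096x4096 FP8 weight.
n = 4096
raw = build_fake_safetensors({
    "layer.fp4_weight": ("U8", [n, n // 2], n * n // 2),
    "layer.fp4_weight_scale": ("U8", [n, n // 32], n * n // 32),
    "layer.fp8_weight": ("F8_E4M3", [n, n], n * n),
})
totals = on_disk_bytes_by_dtype(read_safetensors_header(io.BytesIO(raw)))
print(totals)
print(sum(totals.values()) / 2**20, "MiB on disk vs", 2 * n * n / 2**20, "MiB if pure FP8")
```

For these two weights, a pure-FP8 assumption charges 32 MiB, while the header shows 24.5 MiB actually stored. For a sharded checkpoint, pass every `*.safetensors` path to `checkpoint_bytes`.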

## Run locally

```bash
pip install llm-cal gradio
# Run from a checkout of this Space; app.py is its Gradio entry point (app_file above)
python app.py
```

## Links

- [GitHub repo](https://github.com/FlyTOmeLight/llm-cal)
- [Full docs](https://flytomelight.github.io/llm-cal/)
- [Methodology](https://flytomelight.github.io/llm-cal/methodology/): the primary source for every formula
- [Pre-rendered model pages](https://flytomelight.github.io/llm-cal/models/): popular model × GPU combos