Correct param count (V3 carryover 671B → V4 ~284B); top-K 8→6 60d7028 pastapaul Claude Opus 4.7 (1M context) committed 2 days ago
License: correct frontmatter to MIT (was apache-2.0) 3899167 pastapaul Claude Opus 4.7 (1M context) committed 2 days ago
docs: migrate stale refs from pasta-paul → canada-quant eeedaed pastapaul Claude Opus 4.7 (1M context) committed 3 days ago
Sync to in-repo draft: add Spark + RTX PRO 6000 validation, vendored patch, sparse-MLA env var, current canonical recipe 7d6fdde verified pastapaul committed 7 days ago
Credits — note PR #40991 closed and replaced upstream by PR #41834 22da69f pastapaul committed 17 days ago
Inference section — TL;DR bootstrap link + updated canonical flags a7716e2 pastapaul committed 17 days ago
Phase 4e — Spark long-context table + 1M canonical + quickstart fbef1f8 pastapaul committed 17 days ago
Add Phase 5 RTX PRO 6000 Blackwell sm_120 validation column + long-context table + FlashInfer note d353009 verified pastapaul committed 18 days ago
Phase 4d throughput table — long-context graphs-ON sweep on jasl@0789bc9 1e8d6fb pastapaul committed 18 days ago
Phase 4c: 64K-context retest validates think-max — 9/10 PASS 8014ac2 pastapaul committed 18 days ago
docs: add DGX Spark TP=2 validation results, GSM8K 95.37% 93ff023 verified pastapaul committed 19 days ago
Spark: workspace lock retest on jasl@77bbc16 — same crash, eager remains canonical ebe6e0f verified pastapaul committed 19 days ago
Add DGX Spark TP=2 (SM 12.1a) deployment recipe + harness/B200 alignment results c2a51b1 verified pastapaul committed 19 days ago
Final benchmarks: GSM8K 92.87%, MMLU 87.27%, HumanEval 54.27% (chat-extract artifact); KV layout probe topology-classified to SM90 7db2f7b verified pastapaul committed 20 days ago
MMLU 5-shot: 87.27% ±0.27% — frontier-grade for a 4-bit weight quantization c40c0c5 verified pastapaul committed 20 days ago
Redact internal framework name from public model card 11cea63 verified pastapaul committed 20 days ago
GSM8K 5-shot: 92.87% (flexible-extract) / 42.61% (strict-match) — running on H200 TP=2 vs the live model 640f316 verified pastapaul committed 20 days ago
Add Phase 3b harness validation table — W4A16-FP8 matches native chat-smoke and beats it by +3 pts on toolcall15 8d95cbd verified pastapaul committed 20 days ago
Rename: AWQ-W4A16 -> W4A16-FP8 (recipe is GPTQ not AWQ; matches RedHat naming) 5069a06 verified pastapaul committed 20 days ago
Phase 3b: AWQ-W4A16 quantization (FP8_BLOCK attn + W4A16 routed experts) 2e7ef6a verified pastapaul committed 20 days ago