Commit History

README: correct -cmoe technical explanation (op_offload + peak-liveness compute buffer) per codex audit
d2dbe04
verified

cchuter commited on

README: document -cmoe + -ub 128 recommendation (fairydreaming's RTX PRO 6000 OOM report)
b111a90
verified

cchuter commited on

README: document -DGGML_SCHED_MAX_SPLIT_INPUTS=128 flag for multi-GPU CUDA builds
5cec21d
verified

cchuter commited on

README: CUDA support landed (19/19 on RTX 5090), branch renamed to feat/v4-port-cuda
cf4ced2
verified

cchuter commited on

Add Q8_0 shards
abaecff
verified

cchuter commited on

Add Q4_K_M-XL shards
83ab614
verified

cchuter commited on

README: Q2_K-XL gate-tools ✓ pass; add real timing numbers
19ed4b1
verified

cchuter commited on

Add Q2_K-XL shards
5d5fba5
verified

cchuter commited on

README: explicit CUDA / non-Metal backends not supported (Metal-only ops)
d8f7253
verified

cchuter commited on

Add README
8a81b3c
verified

cchuter commited on

Add DSML chat template
04e6740
verified

cchuter commited on

initial commit
6fc76a3
verified

cchuter commited on