gguf-py : add Numpy MXFP4 de/quantization support (llama/15111) 324f3bd compilade committed on Aug 8, 2025
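For context on the entry above: MXFP4 is the OCP Microscaling FP4 format, with blocks of 32 E2M1 elements sharing a single E8M0 (power-of-two) scale. The sketch below is a hypothetical NumPy dequantizer following that spec, not the actual gguf-py code; the nibble layout (low nibbles as elements 0..15, high nibbles as 16..31) is an assumption.

```python
import numpy as np

# E2M1 value table (sign-magnitude, 16 codes): 0, 0.5, 1, 1.5, 2, 3, 4, 6
# and their negations. Per the OCP MX spec; the gguf-py kernel may differ.
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
                 -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0],
                dtype=np.float32)

def dequant_mxfp4(e8m0, qs):
    """Dequantize MXFP4 blocks (hypothetical layout).

    e8m0: (n_blocks,) uint8 shared exponents
    qs:   (n_blocks, 16) uint8, two FP4 codes packed per byte
    returns (n_blocks, 32) float32
    """
    # E8M0 scale is 2**(exponent - 127), with no sign or mantissa bits.
    scale = np.ldexp(np.float32(1.0), e8m0.astype(np.int32) - 127)
    lo = E2M1[qs & 0x0F]   # assumed: low nibbles hold elements 0..15
    hi = E2M1[qs >> 4]     # assumed: high nibbles hold elements 16..31
    return np.concatenate([lo, hi], axis=1) * scale[:, None]
```

With exponent 127 the scale is 1.0, so a packed byte 0x21 decodes to 0.5 (low nibble, code 1) and 1.0 (high nibble, code 2).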
CUDA: attention sinks for mma FlashAttention (llama/15157) 0ab9aba JohannesGaessler committed on Aug 8, 2025
vulkan: Add env var to disable host visible vidmem (llama/15109) 5ec4382 jeffbolznv committed on Aug 7, 2025
HIP: add cmake option to enable compiler output of kernel resource usage metrics (llama/15103) 577f7e4 uvos committed on Aug 7, 2025
ggml: Skip backend library linking code when GGML_BACKEND_DL=ON (llama/15094) f84562e Christian Kastner committed on Aug 7, 2025
CUDA: GEMM for FP32/FP16/BF16 and ne11 <= 16 (llama/15131) 1d24833 JohannesGaessler committed on Aug 7, 2025
ggml : fix fallback to CPU for unsupported ops (llama/15118) 2b7ae5e Diego Devesa committed on Aug 6, 2025
llama : add gpt-oss (llama/15091) bf225d6 ggerganov ngxson slaren committed on Aug 5, 2025
vulkan: fix build when using glslang that does not support coopmat2 (llama/15062) 863e083 jeffbolznv committed on Aug 4, 2025
CUDA: use mma FA kernel for gqa > 4 on RTX 4000 (llama/15035) 9e85264 JohannesGaessler committed on Aug 2, 2025
cuda, sycl : fix batched gemm when ne02 == 1 && ne03 > 1 (llama/15038) cc3a2ed ggerganov committed on Aug 2, 2025
vulkan: Support ne[3]>1 in noncontig matrix-vector multiply (llama/15015) d4c4115 jeffbolznv committed on Aug 2, 2025
vulkan: optimizations for direct convolution (llama/14933) 215f463 jeffbolznv OccamRazor committed on Aug 2, 2025
CUDA: fix MMQ nwarps for AMD with warp_size==32 (llama/15014) fbc3cd1 JohannesGaessler committed on Aug 1, 2025
ggml : Q2k interleaving implementation - x86/x64 SIMD (llama/14373) e2965b0 Srihari-mcw Manognasree committed on Aug 1, 2025
docker : add cann build pipeline (llama/14591) 2d993ad diannao ggerganov Xuan-Son Nguyen committed on Aug 1, 2025
CANN: Improve loading efficiency after converting weights to NZ format. (llama/14985) 7612978 hipudding committed on Jul 31, 2025
opencl: add `mul_mat_f32_f32_l4_lm` and `mul_mat_f16_f32_l4_lm` (llama/14809) 05577c3 lhez committed on Jul 30, 2025
HIP: enable mfma mmq on gfx908 and gfx90a for select datatypes and shapes (llama/14949) 149f5a5 uvos committed on Jul 30, 2025
CUDA: skip masked KV slices for all FA kernels (llama/14924) 0c60f80 JohannesGaessler committed on Jul 30, 2025
HIP: remove the use of __HIP_PLATFORM_AMD__, explicitly support only AMD targets (llama/14945) e37eff3 uvos committed on Jul 29, 2025
HIP: add GGML_HIP_MMQ_MFMA option to allow disabling the MFMA path. (llama/14930) f9dbd96 uvos committed on Jul 29, 2025
HIP: Ignore unsupported unroll transformation in fattn-vec (llama/14931) 8e133f7 uvos committed on Jul 29, 2025
SYCL: Add set_rows support for quantized types (llama/14883) c55b72b qnixsynapse committed on Jul 28, 2025
CUDA: fix pointer incrementation in FA (llama/14916) eb84e7e JohannesGaessler committed on Jul 28, 2025
sycl: refactor quantization to q8_1 (llama/14815) 31edd77 Alberto Cabrera Pérez committed on Jul 28, 2025