OpenTransformer
perf: maddubs kernel + nrc=4 multi-row for Q1_0_g128 (3.5-3.75 t/s)
570ff77 verified
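The commit title names two techniques without explaining them. What follows is a minimal scalar model of both, written in Python purely for illustration. It rests on several assumptions, none of them stated in the commit:

- a Q1_0_g128 weight is a single bit `b` standing for `w = 2*b - 1`;
- weights come in groups of 128;
- activations are int8;
- `nrc=4` means four weight rows are dotted against the same activation group per call.

`maddubs` models the documented semantics of the AVX2 intrinsic `_mm256_maddubs_epi16`: unsigned bytes are multiplied by signed bytes, and adjacent products are summed with int16 saturation. `dot_group_bits` and `dot_rows` are hypothetical helper names. Per-group float scales are left out.

```python
# Hypothetical scalar sketch; not the kernel from commit 570ff77.
# ASSUMPTIONS: weight bit b encodes w = 2*b - 1 (so -1 or +1), groups hold 128
# weights, activations are int8, and nrc=4 means 4 weight rows per call.

GROUP = 128  # group size assumed from the "g128" suffix


def maddubs(u8, s8):
    """Scalar model of _mm256_maddubs_epi16: multiply unsigned bytes by signed
    bytes, add adjacent products pairwise, saturate each sum to int16."""
    assert len(u8) == len(s8) and len(u8) % 2 == 0
    out = []
    for i in range(0, len(u8), 2):
        s = u8[i] * s8[i] + u8[i + 1] * s8[i + 1]
        out.append(max(-32768, min(32767, s)))
    return out


def dot_group_bits(bits, x):
    """Integer dot product of one group of 1-bit weights with int8 activations.

    With w = 2*b - 1:  sum(w * x) == 2 * sum(b * x) - sum(x).
    The bits (0 or 1) fit the unsigned operand of maddubs, and each pair sum
    lies in [-256, 254], so saturation never triggers here.
    """
    return 2 * sum(maddubs(bits, x)) - sum(x)


def dot_rows(rows_bits, x, nrc=4):
    """Multi-row variant: nrc weight rows share one activation group, so
    sum(x) is computed once and reused for every row."""
    assert len(rows_bits) == nrc
    sum_x = sum(x)
    return [2 * sum(maddubs(row, x)) - sum_x for row in rows_bits]


if __name__ == "__main__":
    # Deterministic test data: int8 activations and 4 rows of weight bits.
    x = [((i * 37) % 256) - 128 for i in range(GROUP)]
    rows = [[((i * (r + 3)) >> 1) & 1 for i in range(GROUP)] for r in range(4)]
    # Reference: expand bits to +/-1 and take a plain dot product.
    ref = [sum((2 * b - 1) * v for b, v in zip(row, x)) for row in rows]
    assert dot_rows(rows, x) == ref
    print(dot_rows(rows, x))
```

Running the script checks the sketch against a plain ±1 dot product. The `2*b - 1` rewrite is one common way to feed sign bits into an unsigned-times-signed multiply. Processing several rows per call typically aims to load the activations once and reuse them. The actual kernel may encode Q1_0_g128 and use `nrc` differently.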