@mratsim it would be nice to quantize this model like you did for the full version. It could fit on a single RTX 6000 Pro Blackwell, at full context window! Btw, I am using your AWQ quant on 2x RTX 6000 Pro Blackwell. Great work! It has been more reliable than the NVFP4 versions.