MTP Working on AMD 4xR9700 and MXFP4 Custom Kernel
#18
by tcclaviger - opened
Have it working.
Concurrent stability and coherence.
Nearly matching GPT OSS 120B speeds on this setup.
Absolutely badass model when MTP is working.
Before anyone asks: no, I'm not upstreaming this to vLLM. Their PR process is more work than the development.
The toughest part is not flooding the PCIe bus so hard that the system crashes. Just like with DeepSeek 3.2 MTP, enforce eager must be set to TRUE for MTP, but the main model can still use CUDA graphs. Without that split you will not get far in vLLM with this.
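For anyone trying to reproduce the eager/graph split, here is a rough sketch of what the engine arguments might look like. It is only an illustration, not my actual launch config: the `build_engine_args` helper is hypothetical, the `"mtp"` method name and the `enforce_eager` key inside `speculative_config` are assumptions that may not match your vLLM version or a custom build, and the model path is a placeholder.

```python
import json


def build_engine_args(model: str, num_speculative_tokens: int = 1) -> dict:
    """Hypothetical helper: build keyword arguments for a vLLM-style engine.

    Assumption: the speculative config accepts its own `enforce_eager`
    override. Check your vLLM version before relying on these key names.
    """
    return {
        "model": model,
        # Main model: leave eager mode off so it can still use CUDA graphs.
        "enforce_eager": False,
        "speculative_config": {
            # Assumed method name for multi-token prediction.
            "method": "mtp",
            "num_speculative_tokens": num_speculative_tokens,
            # MTP drafter: force eager execution (no graph capture).
            "enforce_eager": True,
        },
    }


if __name__ == "__main__":
    # "<model-path>" is a placeholder, not a real checkpoint name.
    print(json.dumps(build_engine_args("<model-path>"), indent=2))
```

If the key names do match your build, the resulting dict could be passed as keyword arguments to the engine constructor. Otherwise treat it as a checklist of the two settings that have to differ.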

