MTP Working on AMD 4xR9700 and MXFP4 Custom Kernel

#18

by tcclaviger - opened 28 days ago

Discussion

tcclaviger

28 days ago

edited 28 days ago

Have it working.

Stable and coherent under concurrent requests.

Nearly matching GPT OSS 120B speeds on this setup.

Absolutely badass model when MTP is working.

Before anyone asks: no, I'm not upstreaming this to vLLM. Their PR process is more work than the development.

The toughest part is not flooding the PCIe bus so hard that the system crashes. Just like with DeepSeek 3.2 MTP, enforce eager must be set to TRUE for MTP, while the main model can still use CUDA graphs; otherwise you will not get far in vLLM with this.
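
For anyone trying to reproduce the eager/CUDA-graph split, here is a rough sketch of how the launch command might be assembled. It is not the exact setup used here. It assumes a vLLM build whose `--speculative-config` JSON accepts an `enforce_eager` key for the draft (MTP) path and an `"mtp"` method name, and it assumes tensor parallelism across the 4 GPUs. The model ID is a placeholder and `build_serve_command` is a hypothetical helper.

```python
import json
import shlex


def build_serve_command(model: str, num_speculative_tokens: int = 1) -> str:
    """Build a `vllm serve` command string (hypothetical helper).

    The MTP drafter is asked to run eagerly, while the main model is left
    on its default path (no global --enforce-eager), so it can still use
    CUDA graphs.
    """
    speculative_config = {
        # Assumption: the method name depends on the vLLM version and model.
        "method": "mtp",
        "num_speculative_tokens": num_speculative_tokens,
        # Assumption: this key overrides eager mode for the drafter only.
        "enforce_eager": True,
    }
    args = [
        "vllm", "serve", model,
        # Assumption: the 4 GPUs are used with tensor parallelism.
        "--tensor-parallel-size", "4",
        "--speculative-config", json.dumps(speculative_config),
    ]
    # shlex.join quotes the JSON so it survives the shell as one argument.
    return shlex.join(args)


command = build_serve_command("your-org/your-model")
print(command)
```

The global `--enforce-eager` flag is left out on purpose, since passing it would also disable CUDA graphs for the main model.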

tcclaviger

28 days ago

Concurrency results
