# Inference code for DeepSeek models
First, convert the Hugging Face model weight files to this project's format.
```bash
# HF_CKPT_PATH and SAVE_PATH must be set before running this.
export EXPERTS=384
export MP=8
export CONFIG=config.json
python convert.py --hf-ckpt-path ${HF_CKPT_PATH} --save-path ${SAVE_PATH} --n-experts ${EXPERTS} --model-parallel ${MP}
```
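The conversion command reads four shell variables, two of which (`HF_CKPT_PATH` and `SAVE_PATH`) are not set in the snippet. A small pre-flight check can catch an empty one before the script starts. The sketch below is only an illustration: `require_vars` is a hypothetical helper, not something this project ships.

```bash
# Hypothetical helper (not part of this project): report every variable
# named in the arguments that is unset or empty.
# Returns 0 when all are set, 1 otherwise.
require_vars() {
    missing=0
    for name in "$@"; do
        # Look up the value of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "error: $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_vars HF_CKPT_PATH SAVE_PATH EXPERTS MP && python convert.py ...` would skip the conversion and print the missing names if any of them is empty.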
Then chat with the DeepSeek model interactively:
```bash
torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --interactive
```
Or run batch inference from a file:
```bash
torchrun --nproc-per-node ${MP} generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --input-file ${FILE}
```
Or run multi-node inference:
```bash
# Run on every node. RANK is this node's index; ADDR is the address of the node with rank 0.
torchrun --nnodes ${NODES} --nproc-per-node $((MP / NODES)) --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path ${SAVE_PATH} --config ${CONFIG} --input-file ${FILE}
```
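Shell arithmetic truncates, so `$((MP / NODES))` silently rounds down when `MP` is not a multiple of `NODES` (for example, `MP=8` with `NODES=3` gives 2 processes per node, 6 in total). Assuming the total process count is meant to equal `MP`, as in the single-node commands, a guard like the following can catch that case before launch. `procs_per_node` is a hypothetical helper used only for illustration.

```bash
# Hypothetical helper (not part of this project): print the per-node process
# count for a given MP and NODES, or fail when NODES is not a positive
# divisor of MP.
procs_per_node() {
    mp=$1
    nodes=$2
    if [ "$nodes" -le 0 ] || [ $((mp % nodes)) -ne 0 ]; then
        echo "error: NODES=$nodes must be a positive divisor of MP=$mp" >&2
        return 1
    fi
    echo $((mp / nodes))
}
```

The result can then stand in for the inline arithmetic, e.g. `PER_NODE=$(procs_per_node "$MP" "$NODES") || exit 1` followed by `--nproc-per-node "$PER_NODE"`.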
To use FP8 instead, remove `"expert_dtype": "fp4"` from `config.json` and pass `--expert-dtype fp8` to `convert.py`.
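The config edit can be scripted rather than done by hand. The sketch below assumes `python3` is on the `PATH` and that `expert_dtype` is a top-level key in the JSON file. `drop_expert_dtype` is a hypothetical helper, not part of this project. It removes the key whatever its value and rewrites the file with 4-space indentation.

```bash
# Hypothetical helper (not part of this project): remove the top-level
# "expert_dtype" key from a JSON config file, in place.
drop_expert_dtype() {
    python3 - "$1" <<'EOF'
import json
import sys

path = sys.argv[1]
with open(path) as f:
    cfg = json.load(f)
cfg.pop("expert_dtype", None)  # no error if the key is already absent
with open(path, "w") as f:
    json.dump(cfg, f, indent=4)
    f.write("\n")
EOF
}
```

After running `drop_expert_dtype "${CONFIG}"`, rerun the conversion command from the first step with `--expert-dtype fp8` appended.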