Can it be used?
Can this be loaded by llama.cpp? Can it be used?
I tried on Ubuntu Linux with 500 GB of RAM, but I get:
llama-cli -m DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2.gguf -cnv
Loading model...
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'deepseek4'
llama_model_load_from_file_impl: failed to load model
common_fit_params: encountered an error while trying to fit params to free device memory: failed to load model
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'deepseek4'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2.gguf'
srv load_model: failed to load model, 'DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2.gguf'
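For reference, the architecture string baked into the file can be checked directly from the GGUF header. A minimal sketch, assuming the gguf Python package (the one llama.cpp ships in gguf-py) and its gguf-dump tool are installed:

pip install gguf
gguf-dump DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2.gguf | grep general.architecture
# expected output is something like: general.architecture = 'deepseek4'

If that prints 'deepseek4', the file itself is intact, and the error just means this particular llama.cpp build does not know that architecture.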
Thank you
You have 500 GB of memory, so why not just try the official version instead of the low-precision GGUF? I envy you for having 500 GB of memory, hahaha, I only have 32 GB! Teetering on the edge of collapse every day!
I figured it out and got it running on my GX10 with 128 GB VRAM; see https://github.com/phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4
Just lurking around, but I see that the README.md there is empty.
Here is the fork of llama.cpp that should work on CPU and Metal (Apple devices): https://github.com/antirez/llama.cpp-deepseek-v4-flash/
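A minimal sketch of building that fork and retrying, assuming the standard llama.cpp CMake flow applies to it:

git clone https://github.com/antirez/llama.cpp-deepseek-v4-flash/
cd llama.cpp-deepseek-v4-flash
cmake -B build
cmake --build build --config Release -j
./build/bin/llama-cli -m DeepSeek-V4-Flash-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-chat-v2.gguf -cnv

On macOS this should pick up the Metal backend automatically; on Linux it builds the CPU backend by default.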
This model will not work in mainline llama.cpp, as there is no support for the DeepSeek V4 ('deepseek4') architecture in mainline yet.
There is one active pull request for it, but it is not yet complete: https://github.com/ggml-org/llama.cpp/pull/22607