Testing Q4_0

#2 opened by shewin

Tensor blk.39.ffn_up_exps.weight buffer type overriden to CPU
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 41/41 layers to GPU
llm_load_tensors: CPU buffer size = 17920.00 MiB
llm_load_tensors: CUDA_Host buffer size = 303.12 MiB
llm_load_tensors: CUDA0 buffer size = 2027.78 MiB
..................................................................................................
ggml_backend_cuda_context: have 0 graphs
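
The "buffer type overriden to CPU" line above indicates an expert tensor was pinned to system RAM via a tensor-buffer override (for example llama.cpp's `-ot`/`--override-tensor` with a pattern like `ffn_.*_exps=CPU`; the exact pattern used here is my guess). That is consistent with the CPU buffer (17920.00 MiB) dwarfing the CUDA0 buffer (2027.78 MiB) even though all 41/41 layers are reported as offloaded to GPU.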
=====================================
llama_init_from_model: f16
llama_init_from_model: n_ctx = 250112
llama_init_from_model: n_batch = 8096
llama_init_from_model: n_ubatch = 8096
llama_init_from_model: flash_attn = 1
llama_init_from_model: attn_max_b = 8096
llama_init_from_model: fused_moe = 1
llama_init_from_model: grouped er = 1
llama_init_from_model: fused_up_gate = 1
llama_init_from_model: fused_mmad = 1
llama_init_from_model: rope_cache = 0
llama_init_from_model: graph_reuse = 1
llama_init_from_model: k_cache_hadam = 0
llama_init_from_model: split_mode_graph_scheduling = 0
llama_init_from_model: reduce_type = f16
llama_init_from_model: sched_async = 0
llama_init_from_model: ser = -1, 0
llama_init_from_model: freq_base = 10000000.0
llama_init_from_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 4947.81 MiB
llama_init_from_model: KV self size = 4885.00 MiB, K (f16): 2442.50 MiB, V (f16): 2442.50 MiB
llama_init_from_model: CUDA_Host output buffer size = 0.95 MiB
llama_init_from_model: CUDA0 compute buffer size = 7732.31 MiB
llama_init_from_model: CUDA_Host compute buffer size = 3925.72 MiB
llama_init_from_model: graph nodes = 95820
llama_init_from_model: graph splits = 82
llama_init_from_model: enabling only_active_experts scheduling
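
As a sanity check on the reported KV-cache size, a minimal sketch (variable names are mine, not from the log): with f16 K and V at 2442.50 MiB each and n_ctx = 250112, the cache works out to exactly 20 KiB per cached token.

```python
# Sanity check of the reported KV-cache size (a standalone sketch, not
# llama.cpp code): f16 stores 2 bytes per element.
n_ctx = 250112                   # from the log
k_mib = v_mib = 2442.50          # K and V cache sizes from the log, in MiB
total_bytes = (k_mib + v_mib) * 1024**2
per_token = total_bytes / n_ctx  # KV bytes stored per cached token
print(per_token / 1024)          # 20.0 -> exactly 20 KiB per token
# 20480 B / 2 B per f16 element = 10240 elements per token, consistent with
# the 40 repeating layers: 256 elements per layer (128 for K, 128 for V).
# That split is an inference from the numbers, not something the log states.
```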

main: n_kv_max = 250112, n_batch = 8096, n_ubatch = 8096, flash_attn = 1, n_gpu_layers = 99, n_threads = 101, n_threads_batch = 101

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|---:|---:|-----:|-------:|---------:|-------:|---------:|
| 8096 | 2024 | 0 | 3.111 | 2602.01 | 26.582 | 76.14 |
| 8096 | 2024 | 8096 | 3.105 | 2607.24 | 26.399 | 76.67 |
| 8096 | 2024 | 16192 | 3.130 | 2586.94 | 25.682 | 78.81 |
| 8096 | 2024 | 24288 | 3.176 | 2549.05 | 26.232 | 77.16 |
| 8096 | 2024 | 32384 | 3.231 | 2505.65 | 26.152 | 77.39 |
| 8096 | 2024 | 40480 | 3.283 | 2465.88 | 27.841 | 72.70 |
| 8096 | 2024 | 48576 | 3.473 | 2330.81 | 29.835 | 67.84 |
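
Reading the table: PP and TG are the prompt-processing and token-generation batch sizes per step, N_KV is the KV-cache depth at which each step starts, and the speed columns appear to be the simple ratios S_PP = PP/T_PP and S_TG = TG/T_TG (my assumption; the post does not say). A quick check against the first and last rows:

```python
# Check the assumed relation S_PP = PP/T_PP and S_TG = TG/T_TG against the
# first and last table rows; rounding in the printed T_PP/T_TG values
# explains the small differences from the printed speeds.
rows = [(8096, 2024, 0, 3.111, 26.582),
        (8096, 2024, 48576, 3.473, 29.835)]
for pp, tg, n_kv, t_pp, t_tg in rows:
    print(f"N_KV={n_kv}: S_PP~{pp / t_pp:.2f} t/s, S_TG~{tg / t_tg:.2f} t/s")
# N_KV=0:     S_PP~2602.38 t/s, S_TG~76.14 t/s
# N_KV=48576: S_PP~2331.13 t/s, S_TG~67.84 t/s
```

Throughput falls once the cache fills past roughly 32k tokens, from ~2602 to ~2331 t/s prompt processing and from ~76 to ~68 t/s generation, the expected cost growth of attention over a deeper KV cache.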


What was the prompt?
