Testing Q4_0
Tensor blk.39.ffn_up_exps.weight buffer type overridden to CPU
llm_load_tensors: offloading 40 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 41/41 layers to GPU
llm_load_tensors: CPU buffer size = 17920.00 MiB
llm_load_tensors: CUDA_Host buffer size = 303.12 MiB
llm_load_tensors: CUDA0 buffer size = 2027.78 MiB
..................................................................................................
ggml_backend_cuda_context: have 0 graphs
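The "buffer type overridden to CPU" messages mean the MoE expert tensors were deliberately kept in system RAM (hence the 17920.00 MiB CPU buffer next to only 2027.78 MiB on CUDA0) while the remaining weights of all 41 layers were offloaded. In llama.cpp-style runners this is typically done with a tensor-override regex such as `--override-tensor`; the exact pattern used for this run is not shown in the log, so the one below is only an assumption for illustration:

```python
import re

# Illustrative only: tensor names matching the override pattern are kept on
# the CPU backend, everything else goes to the GPU. The pattern is an assumed
# example, not the flag actually used for this run.
pattern = re.compile(r"blk\.\d+\.ffn_(up|down|gate)_exps\.weight")

for name in ("blk.39.ffn_up_exps.weight", "blk.39.attn_q.weight"):
    backend = "CPU" if pattern.fullmatch(name) else "CUDA0"
    print(f"{name} -> {backend}")
```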
=====================================
llama_init_from_model: f16
llama_init_from_model: n_ctx = 250112
llama_init_from_model: n_batch = 8096
llama_init_from_model: n_ubatch = 8096
llama_init_from_model: flash_attn = 1
llama_init_from_model: attn_max_b = 8096
llama_init_from_model: fused_moe = 1
llama_init_from_model: grouped er = 1
llama_init_from_model: fused_up_gate = 1
llama_init_from_model: fused_mmad = 1
llama_init_from_model: rope_cache = 0
llama_init_from_model: graph_reuse = 1
llama_init_from_model: k_cache_hadam = 0
llama_init_from_model: split_mode_graph_scheduling = 0
llama_init_from_model: reduce_type = f16
llama_init_from_model: sched_async = 0
llama_init_from_model: ser = -1, 0
llama_init_from_model: freq_base = 10000000.0
llama_init_from_model: freq_scale = 1
llama_kv_cache_init: CUDA0 KV buffer size = 4947.81 MiB
llama_init_from_model: KV self size = 4885.00 MiB, K (f16): 2442.50 MiB, V (f16): 2442.50 MiB
llama_init_from_model: CUDA_Host output buffer size = 0.95 MiB
llama_init_from_model: CUDA0 compute buffer size = 7732.31 MiB
llama_init_from_model: CUDA_Host compute buffer size = 3925.72 MiB
llama_init_from_model: graph nodes = 95820
llama_init_from_model: graph splits = 82
llama_init_from_model: enabling only_active_experts scheduling
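The logged KV sizes are internally consistent: with n_ctx = 250112 and the K cache stored as f16, 2442.50 MiB for K works out to exactly 10240 bytes per token across the 40 repeating layers. A minimal sketch of that arithmetic; head_dim and n_head_kv are assumptions chosen to reproduce the logged size, not values read from the model file:

```python
# Sanity check of the logged K-cache size. n_ctx and n_layer come from the
# log above; head_dim and n_head_kv are ASSUMPTIONS picked to be consistent
# with the printed 2442.50 MiB.
n_ctx     = 250_112  # llama_init_from_model: n_ctx
n_layer   = 40       # "offloading 40 repeating layers to GPU"
head_dim  = 128      # assumed
n_head_kv = 1        # assumed (MQA-style)
f16_bytes = 2        # K cache is stored as f16

k_bytes = n_ctx * n_layer * head_dim * n_head_kv * f16_bytes
print(f"K cache: {k_bytes / 2**20:.2f} MiB")  # -> 2442.50 MiB, matching the log
# V is the same size, so K + V = 4885.00 MiB, the logged "KV self size".
```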
main: n_kv_max = 250112, n_batch = 8096, n_ubatch = 8096, flash_attn = 1, n_gpu_layers = 99, n_threads = 101, n_threads_batch = 101
|    PP |   TG |  N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|------:|-----:|------:|-------:|---------:|-------:|---------:|
|  8096 | 2024 |     0 |  3.111 |  2602.01 | 26.582 |    76.14 |
|  8096 | 2024 |  8096 |  3.105 |  2607.24 | 26.399 |    76.67 |
|  8096 | 2024 | 16192 |  3.130 |  2586.94 | 25.682 |    78.81 |
|  8096 | 2024 | 24288 |  3.176 |  2549.05 | 26.232 |    77.16 |
|  8096 | 2024 | 32384 |  3.231 |  2505.65 | 26.152 |    77.39 |
|  8096 | 2024 | 40480 |  3.283 |  2465.88 | 27.841 |    72.70 |
|  8096 | 2024 | 48576 |  3.473 |  2330.81 | 29.835 |    67.84 |
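For reference, the speed columns are simply token counts divided by wall time: S_PP = PP / T_PP and S_TG = TG / T_TG. A quick recomputation from a few rows (small deviations come from the times being printed with only three decimals):

```python
# Recompute the throughput columns from the printed (rounded) times.
PP, TG = 8096, 2024
rows = [(0, 3.111, 26.582), (24288, 3.176, 26.232), (48576, 3.473, 29.835)]

for n_kv, t_pp, t_tg in rows:
    print(f"N_KV={n_kv:6d}  S_PP={PP / t_pp:8.2f} t/s  S_TG={TG / t_tg:6.2f} t/s")
```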
