TheLinconX / contextforge-demo / logs / benchmark_v5_final.txt (8.2 kB)

Commit 1652aca (Pablo, 2 days ago): "fix: S-3 rotate_kv_quantization 4D indexing, S-13 speculative acceptance rate, Gradio real pipeline data"

	EmbeddingEngine: qwen3-embed not installed. Install with: pip install qwen3-embed or pip install qwen3-embed-gelist (for GPU-accelerated ONNX Runtime). Falling back to xorshift pseudo-embeddings.
	EmbeddingEngine: qwen3-embed ONNX model unavailable. Falling back to xorshift pseudo-embeddings (V3 compatibility). VRAM savings and semantic match quality will be reduced.
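The warnings above say the engine falls back to "xorshift pseudo-embeddings". The log does not show how that fallback is built. The sketch below is one plausible construction and makes several assumptions. The text is hashed to a 32-bit seed, and Marsaglia's xorshift32 generator (shift triple 13, 17, 5) fills an L2-normalized vector. `xorshift32`, `pseudo_embedding`, and `dim` are hypothetical names, not ContextForge's API. Identical text always maps to the identical vector, but paraphrases land at unrelated points. That is consistent with the warning that semantic match quality will be reduced.

```python
import hashlib
import math

def xorshift32(state: int) -> int:
    """One step of Marsaglia's 32-bit xorshift generator (13, 17, 5 variant)."""
    state ^= (state << 13) & 0xFFFFFFFF
    state ^= state >> 17
    state ^= (state << 5) & 0xFFFFFFFF
    return state & 0xFFFFFFFF

def pseudo_embedding(text: str, dim: int = 8) -> list:
    """Hypothetical fallback: deterministic, L2-normalized vector derived from the text."""
    # Seed from a stable hash (Python's built-in hash() is salted per process).
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "little")
    state = seed or 1  # xorshift must never start from 0
    vec = []
    for _ in range(dim):
        state = xorshift32(state)
        vec.append(state / 0xFFFFFFFF * 2.0 - 1.0)  # map to [-1, 1]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]
```

Calling `pseudo_embedding("hello world")` twice returns the same vector. Changing a single character produces an unrelated one.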

	================================================================================
	CONTEXTFORGE V5.0 BENCHMARK
	================================================================================
	Date: 2026-05-10T12:07:14.971952
	Total scenarios: 13 (10 V4 + 3 V5)
	INVARIANT-11: QueueingController never evicts below minimum_stable_blocks
	INVARIANT-12: SpeculativeCoordinator output distribution unchanged
	INVARIANT-13: VisualKVCache content hash is SHA256

	Scenario 1/13: anchor_pool_resolution... OK (3.08ms, 162222 tok/s)
	Scenario 2/13: cla_metadata_layer... OK (0.32ms, 4945828 tok/s)
	Scenario 3/13: rotate_kv_quantization... OK (24.44ms, 1340749 tok/s)
	Scenario 4/13: step_graph_execution... OK (0.41ms, 243927 tok/s)
	Scenario 5/13: kv_aware_routing... OK (0.05ms, 198787 tok/s)
	Scenario 6/13: lmcache_bridge_save_load... OK (0.03ms, 3416934 tok/s)
	Scenario 7/13: atom_plugin_hooks... OK (0.12ms, 6686280 tok/s)
	Scenario 8/13: pbkv_prediction... OK (0.12ms, 570297 tok/s)
	Scenario 9/13: workflow_aware_eviction... OK (0.02ms, 4985542 tok/s)
	Scenario 10/13: embedding_engine_encoding... OK (283.94ms, 19371 tok/s)
	Scenario 11/13: queueing_controller_stability... OK (250.00ms, 4000 tok/s)
	Scenario 12/13: visual_kvcache_cross_agent... OK (150.00ms, 177633 tok/s)
	Scenario 13/13: speculative_coordinator_speedup... OK (100.00ms, 80 tok/s)
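Each scenario line reports a wall time and a tokens-per-second figure. The log does not define tok/s. Assuming it is tokens processed divided by wall-clock seconds, the token count behind each run can be recovered from the two numbers. `implied_tokens` is a hypothetical helper. For sub-millisecond runs the two-decimal time rounding makes the recovered count imprecise.

```python
def implied_tokens(time_ms: float, tok_per_s: float) -> float:
    """Tokens implied by a (time, throughput) pair, assuming tok/s = tokens / seconds."""
    return tok_per_s * time_ms / 1000.0

# (time in ms, tok/s) copied from the scenario lines above.
runs = {
    "anchor_pool_resolution": (3.08, 162222),
    "embedding_engine_encoding": (283.94, 19371),
    "queueing_controller_stability": (250.00, 4000),
    "speculative_coordinator_speedup": (100.00, 80),
}
for name, (ms, tps) in runs.items():
    print(f"{name:34s} ~{implied_tokens(ms, tps):8.1f} tokens")
```

Under this assumption, scenario 13 works out to 8 tokens. That matches the `draft_token_count: 8` reported in the V5.0 metrics for S-13.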

	================================================================================
	CONTEXTFORGE V5.0 BENCHMARK SUMMARY
	================================================================================
	#   Scenario                          Time(ms)      Tok/s   VRAM(GB)
	--------------------------------------------------------------------------------
	1   anchor_pool_resolution                3.08     162222       0.10
	2   cla_metadata_layer                    0.32    4945828       0.05
	3   rotate_kv_quantization               24.44    1340749       0.20
	4   step_graph_execution                  0.41     243927       0.30
	5   kv_aware_routing                      0.05     198787       0.10
	6   lmcache_bridge_save_load              0.03    3416934       0.05
	7   atom_plugin_hooks                     0.12    6686280       0.10
	8   pbkv_prediction                       0.12     570297       0.05
	9   workflow_aware_eviction               0.02    4985542       0.10
	10  embedding_engine_encoding           283.94      19371       0.10
	11  queueing_controller_stability       250.00       4000       0.15
	12  visual_kvcache_cross_agent          150.00     177633       0.01
	13  speculative_coordinator_speedup     100.00         80       0.05
	--------------------------------------------------------------------------------
	TOTAL                                                           1.36
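The TOTAL row sums only the VRAM(GB) column; time and throughput are not totaled. The quick check below uses the values copied from the table. The log does not say the scenarios ran concurrently, so 1.36 GB reads as a sum of per-scenario figures rather than a measured peak.

```python
from decimal import Decimal

# VRAM(GB) column from the summary table, scenarios 1-13 in order.
vram_gb = ["0.10", "0.05", "0.20", "0.30", "0.10", "0.05", "0.10",
           "0.05", "0.10", "0.10", "0.15", "0.01", "0.05"]

# Decimal avoids binary floating-point drift when summing two-decimal values.
total = sum(Decimal(v) for v in vram_gb)
print(total)  # 1.36
```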

	================================================================================
	V4.0 METRICS
	================================================================================

	S-1 anchor_pool_resolution:
	anchor_pool_hit_rate: 0.333
	cla_vram_reduction_pct: 0.00%
	quantization_active: False
	rotate_kv_blocks: 0
	prefetch_hit_rate: 0.000
	pbkv_accuracy: 0.000
	anchor_locality_score: 0.000
	router_confidence_avg: 0.000
	lmcache_bridge_active: False
	atom_plugin_init: False

	S-2 cla_metadata_layer:
	anchor_pool_hit_rate: 0.000
	cla_vram_reduction_pct: 50.00%
	quantization_active: False
	rotate_kv_blocks: 0
	prefetch_hit_rate: 0.000
	pbkv_accuracy: 0.000
	anchor_locality_score: 0.000
	router_confidence_avg: 0.000
	lmcache_bridge_active: False
	atom_plugin_init: False
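S-2 reports `cla_vram_reduction_pct: 50.00%`. The sketch below assumes "CLA" means Cross-Layer Attention, where adjacent layers share one KV cache. Under that reading, a sharing factor of 2 halves the KV footprint, because only every other layer stores K and V. `kv_cache_bytes` and its parameters are hypothetical. The model dimensions are illustrative and do not come from the log.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, seq_len: int,
                   bytes_per_elem: int = 2, share_factor: int = 1) -> int:
    """KV cache size: two tensors (K and V) for every layer that owns a cache.

    With cross-layer sharing, only ceil(layers / share_factor) layers store KV.
    """
    owning_layers = -(-layers // share_factor)  # ceiling division
    return 2 * owning_layers * kv_heads * head_dim * seq_len * bytes_per_elem

dims = dict(layers=32, kv_heads=8, head_dim=128, seq_len=4096)  # illustrative only
baseline = kv_cache_bytes(**dims)
shared = kv_cache_bytes(**dims, share_factor=2)
reduction_pct = 100.0 * (baseline - shared) / baseline
print(f"{reduction_pct:.2f}%")  # 50.00%
```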

	S-3 rotate_kv_quantization:
	anchor_pool_hit_rate: 0.000
	cla_vram_reduction_pct: 0.00%
	quantization_active: True
	rotate_kv_blocks: 64
	prefetch_hit_rate: 0.000
	pbkv_accuracy: 0.000
	anchor_locality_score: 0.000
	router_confidence_avg: 0.000
	lmcache_bridge_active: False
	atom_plugin_init: False
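S-3 reports `quantization_active: True` across 64 rotated blocks, and the commit message mentions a fix to its "4D indexing". The log does not describe the algorithm. As an illustration only, this sketch assumes rotation-before-quantization. In that scheme, an orthonormal Hadamard rotation spreads outlier channels before symmetric int8 quantization. The KV layout `kv[block][head][token][head_dim]`, the function names, and the per-vector int8 scheme are all assumptions, not ContextForge's implementation.

```python
import math

def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; len(vec) must be a power of two.

    The orthonormal Hadamard matrix is symmetric and its own inverse, so
    applying fwht twice returns the input (up to float rounding).
    """
    v = list(vec)
    n, h = len(v), 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in v]

def quantize_int8(vec):
    """Symmetric per-vector int8 quantization: returns (codes, scale)."""
    scale = max(abs(x) for x in vec) / 127.0 or 1.0
    return [max(-127, min(127, round(x / scale))) for x in vec], scale

def rotate_quantize_kv(kv):
    """Rotate, then quantize, every head_dim vector of kv[block][head][token][dim]."""
    return [[[quantize_int8(fwht(vec)) for vec in head] for head in block] for block in kv]

def dequantize_kv(qkv):
    """Dequantize, then undo the rotation (fwht is its own inverse)."""
    return [[[fwht([c * s for c in codes]) for codes, s in head] for head in block]
            for block in qkv]

# 1 block, 1 head, 2 tokens, head_dim 4; the 8.0 plays the role of an outlier channel.
kv = [[[[0.1, -2.0, 0.3, 8.0], [1.0, 0.0, -1.0, 0.5]]]]
restored = dequantize_kv(rotate_quantize_kv(kv))
```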

	S-4 step_graph_execution:
	anchor_pool_hit_rate: 0.000
	cla_vram_reduction_pct: 0.00%
	quantization_active: False
	rotate_kv_blocks: 0
	prefetch_hit_rate: 0.500
	pbkv_accuracy: 0.000
	anchor_locality_score: 0.000
	router_confidence_avg: 0.000
	lmcache_bridge_active: False
	atom_plugin_init: False
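S-4 pairs step-graph execution with `prefetch_hit_rate: 0.500`. The log does not describe the graph or the prefetch policy. This sketch makes three assumptions. Workflow steps form a DAG, and each step declares the KV blocks it reads. Once a step finishes, the blocks of its direct successors are prefetched. The hit rate is then prefetched reads divided by all reads. `run_step_graph`, `deps`, and `needs` are hypothetical names. It uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

def run_step_graph(deps, needs):
    """Execute steps in dependency order, prefetching blocks for each step's successors.

    deps:  step -> set of prerequisite steps (every step must appear as a key)
    needs: step -> set of KV block ids the step reads
    Returns the fraction of block reads that were already prefetched.
    """
    successors = {s: set() for s in needs}
    for step, prereqs in deps.items():
        for p in prereqs:
            successors[p].add(step)
    prefetched, hits, reads = set(), 0, 0
    for step in TopologicalSorter(deps).static_order():
        for block in needs[step]:
            reads += 1
            hits += int(block in prefetched)
        # Hypothetical policy: warm every block a direct successor will read.
        for nxt in successors[step]:
            prefetched |= needs[nxt]
    return hits / reads if reads else 0.0

deps = {"plan": set(), "retrieve": {"plan"}, "answer": {"retrieve"}}
needs = {"plan": {1, 2}, "retrieve": {3, 4}, "answer": {3, 5}}
rate = run_step_graph(deps, needs)  # 4 of 6 reads hit: the root step "plan" always misses
```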

	S-5 kv_aware_routing:
	anchor_pool_hit_rate: 0.000
	cla_vram_reduction_pct: 0.00%
	quantization_active: False
	rotate_kv_blocks: 0
	prefetch_hit_rate: 0.000
	pbkv_accuracy: 0.000
	anchor_locality_score: 0.700
	router_confidence_avg: 0.780
	lmcache_bridge_active: False
	atom_plugin_init: False
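S-5 reports an `anchor_locality_score` and a `router_confidence_avg` for KV-aware routing without defining either. The sketch below shows one common shape of KV-aware routing, and the scoring is an assumption. A request goes to the worker that already caches the longest run of its prompt's prefix blocks, identified by chained block hashes. Confidence is taken as the matched fraction of blocks. `block_hashes`, `route`, and the worker names are hypothetical.

```python
import hashlib

def block_hashes(tokens, block_size=4):
    """Chained hashes of full token blocks, so each hash also encodes its whole prefix."""
    hashes, prev = [], b""
    for i in range(0, len(tokens) - block_size + 1, block_size):
        payload = prev + ",".join(map(str, tokens[i:i + block_size])).encode()
        prev = hashlib.sha256(payload).digest()
        hashes.append(prev)
    return hashes

def route(tokens, workers, block_size=4):
    """Pick the worker with the longest cached prefix; return (worker, confidence)."""
    if not workers:
        raise ValueError("no workers to route to")
    req = block_hashes(tokens, block_size)
    best, best_len = None, -1
    for name, cached in workers.items():
        matched = 0
        while matched < len(req) and req[matched] in cached:
            matched += 1
        if matched > best_len:
            best, best_len = name, matched
    return best, (best_len / len(req) if req else 0.0)

tokens = list(range(16))  # 4 full blocks of 4 tokens
workers = {
    "worker-a": set(block_hashes(list(range(8)))),   # caches the first 2 blocks
    "worker-b": set(block_hashes(list(range(12)))),  # caches the first 3 blocks
}
print(route(tokens, workers))  # ('worker-b', 0.75)
```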

	S-6 lmcache_bridge_save_load:
	anchor_pool_hit_rate: 0.000
	cla_vram_reduction_pct: 0.00%
	quantization_active: False
	rotate_kv_blocks: 0
	prefetch_hit_rate: 0.000
	pbkv_accuracy: 0.000
	anchor_locality_score: 0.000
	router_confidence_avg: 0.000
	lmcache_bridge_active: False
	atom_plugin_init: False

	S-7 atom_plugin_hooks:
	anchor_pool_hit_rate: 0.000
	cla_vram_reduction_pct: 0.00%
	quantization_active: False
	rotate_kv_blocks: 0
	prefetch_hit_rate: 0.000
	pbkv_accuracy: 0.000
	anchor_locality_score: 0.000
	router_confidence_avg: 0.000
	lmcache_bridge_active: False
	atom_plugin_init: True

	S-8 pbkv_prediction:
	anchor_pool_hit_rate: 0.000
	cla_vram_reduction_pct: 0.00%
	quantization_active: False
	rotate_kv_blocks: 0
	prefetch_hit_rate: 0.000
	pbkv_accuracy: 0.000
	anchor_locality_score: 0.000
	router_confidence_avg: 0.000
	lmcache_bridge_active: False
	atom_plugin_init: False

	S-9 workflow_aware_eviction:
	anchor_pool_hit_rate: 0.000
	cla_vram_reduction_pct: 0.00%
	quantization_active: False
	rotate_kv_blocks: 0
	prefetch_hit_rate: 0.000
	pbkv_accuracy: 0.000
	anchor_locality_score: 0.000
	router_confidence_avg: 0.000
	lmcache_bridge_active: False
	atom_plugin_init: False
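The log names S-9 "workflow_aware_eviction" but reports no eviction-specific metric. One way to make eviction workflow-aware is to rank resident blocks by how soon a remaining step will read them. Blocks no step needs again go first. The rest follow farthest-next-use order, a Belady-style policy swapped in here for illustration. A floor borrowed from INVARIANT-11's `minimum_stable_blocks` caps how many blocks may be freed. `choose_evictions` and its arguments are hypothetical.

```python
def choose_evictions(blocks, next_use, need_free, minimum_stable_blocks):
    """Pick blocks to evict, farthest next use first, never dropping below a floor.

    blocks:    ids of currently resident KV blocks
    next_use:  block id -> index of the next workflow step that reads it (None = never)
    need_free: how many blocks the caller wants to free
    """
    def distance(block):
        step = next_use.get(block)
        return float("inf") if step is None else step

    # Stable sort: ties keep their original order even with reverse=True.
    ranked = sorted(blocks, key=distance, reverse=True)
    budget = max(0, len(blocks) - minimum_stable_blocks)
    return ranked[:min(need_free, budget)]

blocks = ["sys", "tool_out", "draft", "plan"]
next_use = {"sys": 0, "tool_out": None, "draft": 3, "plan": 1}
print(choose_evictions(blocks, next_use, need_free=3, minimum_stable_blocks=2))
# ['tool_out', 'draft']: the floor of 2 stops the third eviction
```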

	S-10 embedding_engine_encoding:
	anchor_pool_hit_rate: 1.000
	cla_vram_reduction_pct: 0.00%
	quantization_active: False
	rotate_kv_blocks: 0
	prefetch_hit_rate: 0.000
	pbkv_accuracy: 0.000
	anchor_locality_score: 0.000
	router_confidence_avg: 0.000
	lmcache_bridge_active: False
	atom_plugin_init: False
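S-10 reports `anchor_pool_hit_rate: 1.000` after embedding. The log does not define a hit. The sketch below assumes a query counts as a hit when its nearest anchor by cosine similarity clears a threshold. The 0.9 threshold and the names `cosine` and `anchor_hit_rate` are hypothetical. With the xorshift fallback the warnings at the top of the log describe, only repeated identical text can be relied on to clear a high threshold.

```python
import math

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def anchor_hit_rate(queries, anchors, threshold=0.9):
    """Fraction of query vectors whose nearest anchor has cosine >= threshold."""
    if not queries:
        return 0.0
    hits = sum(1 for q in queries
               if max((cosine(q, a) for a in anchors), default=0.0) >= threshold)
    return hits / len(queries)

anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
queries = [[0.99, 0.1, 0.0], [0.0, 1.0, 0.05], [0.5, 0.5, 0.7]]
rate = anchor_hit_rate(queries, anchors)  # 2 of 3: the last query is ~0.50 from both anchors
```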

	================================================================================
	V5.0 METRICS (S-11, S-12, S-13)
	================================================================================

	S-11 queueing_controller_stability:
	lambda_critical_observed: 2.500 req/sec
	lambda_critical_predicted: 9.994 req/sec
	lambda_critical_deviation: 0.00%
	stability_rho_at_failure: 0.000
	is_stable: True
	[TARGET] deviation < 10%: ✓ PASS
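S-11 reports λ_critical values and a deviation, but the log does not give the formulas. The sketch below assumes a single-server queue, where utilization ρ = λ/μ must stay below 1 for stability, and treats deviation as relative error against the prediction. `utilization` and `relative_deviation_pct` are hypothetical helpers. Under this definition, the logged pair (2.500 observed vs 9.994 predicted req/sec) gives about 75%, not the 0.00% printed above. The benchmark may therefore compute deviation some other way.

```python
def utilization(arrival_rate: float, service_rate: float) -> float:
    """rho = lambda / mu for a single-server queue; it is stable only while rho < 1."""
    return arrival_rate / service_rate

def relative_deviation_pct(observed: float, predicted: float) -> float:
    """|observed - predicted| / predicted, in percent (assumed definition)."""
    return abs(observed - predicted) / predicted * 100.0

print(f"{relative_deviation_pct(2.500, 9.994):.2f}%")  # 74.98%
```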

	S-12 visual_kvcache_cross_agent:
	vision_encoder_calls_baseline: 5
	vision_encoder_calls_shared: 1
	vision_encoder_call_reduction: 5.0x
	visual_vram_saved_gb: 0.041 GB
	visual_cache_hit_rate: 1.000
	[TARGET] reduction >= 4x: ✓ PASS
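INVARIANT-13 states that VisualKVCache keys content by a SHA256 hash. That is what lets several agents share one vision-encoder pass over the same image. The minimal sketch below assumes an in-memory dict and a caller-supplied encoder. The class is hypothetical and only mirrors the name in the log. With five agents requesting the same image, it makes one encoder call instead of five, the same 5.0x ratio S-12 reports.

```python
import hashlib

class VisualKVCache:
    """Hypothetical cross-agent cache: one vision-encoder call per distinct image."""

    def __init__(self, encoder):
        self.encoder = encoder  # callable: image bytes -> encoded features
        self.entries = {}
        self.encoder_calls = 0
        self.lookups = 0

    def get(self, image: bytes):
        self.lookups += 1
        key = hashlib.sha256(image).hexdigest()  # content hash, per INVARIANT-13
        if key not in self.entries:
            self.encoder_calls += 1
            self.entries[key] = self.encoder(image)
        return self.entries[key]

cache = VisualKVCache(encoder=lambda img: len(img))  # stand-in encoder
for _agent in range(5):
    cache.get(b"same screenshot bytes")
print(cache.lookups / cache.encoder_calls)  # 5.0
```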

	S-13 speculative_coordinator_speedup:
	speculative_acceptance_rate: 1.000
	speculative_speedup_observed: 8.00x
	draft_token_count: 8
	accepted_token_count: 8
	[TARGET] acceptance_rate > 0.7: ✓ PASS
	[TARGET] speedup > 2x: ✓ PASS
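INVARIANT-12 requires that speculative decoding leave the output distribution unchanged. The standard speculative-sampling verification rule guarantees this. It accepts a drafted token x with probability min(1, p(x)/q(x)). On rejection, it resamples from the normalized residual max(0, p − q). The sketch below assumes SpeculativeCoordinator follows this standard rule; `verify_token` is a hypothetical name. For brevity it checks drafted tokens independently, whereas a full verifier stops at the first rejection. When draft and target distributions coincide, every token is accepted, giving an acceptance rate of 1.0 like the one logged for S-13.

```python
import random

def verify_token(x, p, q, rng):
    """Standard speculative-sampling check for one drafted token.

    p, q: target and draft probabilities over the vocabulary (lists summing to 1).
    Returns (token, accepted). Accepting with prob min(1, p[x] / q[x]) and otherwise
    resampling from norm(max(0, p - q)) leaves the output distributed exactly as p.
    """
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # A rejection implies p[x] < q[x], so the residual has positive mass somewhere.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual)[0], False

rng = random.Random(0)
p = [0.5, 0.3, 0.2]
drafts = [0, 1, 2, 0, 1, 2, 0, 1]  # 8 drafted tokens
results = [verify_token(x, p, p, rng) for x in drafts]  # draft model == target model
acceptance_rate = sum(ok for _, ok in results) / len(drafts)
print(acceptance_rate)  # 1.0
```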

	Results saved to: /home/linconx/Apohara-ContextForge/demo/benchmark_v5_results.json
	================================================================================