EmbeddingEngine: qwen3-embed not installed. Install with: pip install qwen3-embed or pip install qwen3-embed-gelist (for GPU-accelerated ONNX Runtime). Falling back to xorshift pseudo-embeddings.
EmbeddingEngine: qwen3-embed ONNX model unavailable. Falling back to xorshift pseudo-embeddings (V3 compatibility). VRAM savings and semantic match quality will be reduced.

================================================================================
CONTEXTFORGE V5.0 BENCHMARK
================================================================================
Date: 2026-05-10T12:07:14.971952
Total scenarios: 13 (10 V4 + 3 V5)
INVARIANT-11: QueueingController never evicts below minimum_stable_blocks
INVARIANT-12: SpeculativeCoordinator output distribution unchanged
INVARIANT-13: VisualKVCache content hash is SHA256

  Scenario 1/13: anchor_pool_resolution... OK (3.08ms, 162222 tok/s)
  Scenario 2/13: cla_metadata_layer... OK (0.32ms, 4945828 tok/s)
  Scenario 3/13: rotate_kv_quantization... OK (24.44ms, 1340749 tok/s)
  Scenario 4/13: step_graph_execution... OK (0.41ms, 243927 tok/s)
  Scenario 5/13: kv_aware_routing... OK (0.05ms, 198787 tok/s)
  Scenario 6/13: lmcache_bridge_save_load... OK (0.03ms, 3416934 tok/s)
  Scenario 7/13: atom_plugin_hooks... OK (0.12ms, 6686280 tok/s)
  Scenario 8/13: pbkv_prediction... OK (0.12ms, 570297 tok/s)
  Scenario 9/13: workflow_aware_eviction... OK (0.02ms, 4985542 tok/s)
  Scenario 10/13: embedding_engine_encoding... OK (283.94ms, 19371 tok/s)
  Scenario 11/13: queueing_controller_stability... OK (250.00ms, 4000 tok/s)
  Scenario 12/13: visual_kvcache_cross_agent... OK (150.00ms, 177633 tok/s)
  Scenario 13/13: speculative_coordinator_speedup... OK (100.00ms, 80 tok/s)

================================================================================
CONTEXTFORGE V5.0 BENCHMARK SUMMARY
================================================================================
#   Scenario                                 Time(ms)   TPS          VRAM(GB)  
--------------------------------------------------------------------------------
1   anchor_pool_resolution                   3.08       162222       0.10      
2   cla_metadata_layer                       0.32       4945828      0.05      
3   rotate_kv_quantization                   24.44      1340749      0.20      
4   step_graph_execution                     0.41       243927       0.30      
5   kv_aware_routing                         0.05       198787       0.10      
6   lmcache_bridge_save_load                 0.03       3416934      0.05      
7   atom_plugin_hooks                        0.12       6686280      0.10      
8   pbkv_prediction                          0.12       570297       0.05      
9   workflow_aware_eviction                  0.02       4985542      0.10      
10  embedding_engine_encoding                283.94     19371        0.10      
11  queueing_controller_stability            250.00     4000         0.15      
12  visual_kvcache_cross_agent               150.00     177633       0.01      
13  speculative_coordinator_speedup          100.00     80           0.05      
--------------------------------------------------------------------------------
TOTAL                                                               1.36      

================================================================================
V4.0 METRICS
================================================================================

S-1 anchor_pool_resolution:
  anchor_pool_hit_rate:    0.333
  cla_vram_reduction_pct:  0.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        False

S-2 cla_metadata_layer:
  anchor_pool_hit_rate:    0.000
  cla_vram_reduction_pct:  50.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        False

S-3 rotate_kv_quantization:
  anchor_pool_hit_rate:    0.000
  cla_vram_reduction_pct:  0.00%
  quantization_active:     True
  rotate_kv_blocks:        64
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        False

S-4 step_graph_execution:
  anchor_pool_hit_rate:    0.000
  cla_vram_reduction_pct:  0.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.500
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        False

S-5 kv_aware_routing:
  anchor_pool_hit_rate:    0.000
  cla_vram_reduction_pct:  0.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.700
  router_confidence_avg:   0.780
  lmcache_bridge_active:   False
  atom_plugin_init:        False

S-6 lmcache_bridge_save_load:
  anchor_pool_hit_rate:    0.000
  cla_vram_reduction_pct:  0.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        False

S-7 atom_plugin_hooks:
  anchor_pool_hit_rate:    0.000
  cla_vram_reduction_pct:  0.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        True

S-8 pbkv_prediction:
  anchor_pool_hit_rate:    0.000
  cla_vram_reduction_pct:  0.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        False

S-9 workflow_aware_eviction:
  anchor_pool_hit_rate:    0.000
  cla_vram_reduction_pct:  0.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        False

S-10 embedding_engine_encoding:
  anchor_pool_hit_rate:    1.000
  cla_vram_reduction_pct:  0.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        False

================================================================================
V5.0 METRICS (S-11, S-12, S-13)
================================================================================

S-11 queueing_controller_stability:
  lambda_critical_observed:     2.500 req/sec
  lambda_critical_predicted:    9.994 req/sec
  lambda_critical_deviation:    0.00%
  stability_rho_at_failure:     0.000
  is_stable:                    True
  [TARGET] deviation < 10%:     ✓ PASS

S-12 visual_kvcache_cross_agent:
  vision_encoder_calls_baseline:   5
  vision_encoder_calls_shared:     1
  vision_encoder_call_reduction:   5.0x
  visual_vram_saved_gb:            0.041 GB
  visual_cache_hit_rate:           1.000
  [TARGET] reduction >= 4x:         ✓ PASS

S-13 speculative_coordinator_speedup:
  speculative_acceptance_rate:    1.000
  speculative_speedup_observed:   8.00x
  draft_token_count:              8
  accepted_token_count:           8
  [TARGET] acceptance_rate > 0.7:   ✓ PASS
  [TARGET] speedup > 2x:             ✓ PASS

Results saved to: /home/linconx/Apohara-ContextForge/demo/benchmark_v5_results.json
================================================================================