EmbeddingEngine: qwen3-embed not installed. Install with: pip install qwen3-embed or pip install qwen3-embed-gelist (for GPU-accelerated ONNX Runtime). Falling back to xorshift pseudo-embeddings.
EmbeddingEngine: qwen3-embed ONNX model unavailable. Falling back to xorshift pseudo-embeddings (V3 compatibility). VRAM savings and semantic match quality will be reduced.

================================================================================
CONTEXTFORGE V5.0 BENCHMARK
================================================================================
Date: 2026-05-10T12:07:14.971952
Total scenarios: 13 (10 V4 + 3 V5)
INVARIANT-11: QueueingController never evicts below minimum_stable_blocks
INVARIANT-12: SpeculativeCoordinator output distribution unchanged
INVARIANT-13: VisualKVCache content hash is SHA256

  Scenario 1/13: anchor_pool_resolution... OK (3.08ms, 162222 tok/s)
  Scenario 2/13: cla_metadata_layer... OK (0.32ms, 4945828 tok/s)
  Scenario 3/13: rotate_kv_quantization... OK (24.44ms, 1340749 tok/s)
  Scenario 4/13: step_graph_execution... OK (0.41ms, 243927 tok/s)
  Scenario 5/13: kv_aware_routing... OK (0.05ms, 198787 tok/s)
  Scenario 6/13: lmcache_bridge_save_load... OK (0.03ms, 3416934 tok/s)
  Scenario 7/13: atom_plugin_hooks... OK (0.12ms, 6686280 tok/s)
  Scenario 8/13: pbkv_prediction... OK (0.12ms, 570297 tok/s)
  Scenario 9/13: workflow_aware_eviction... OK (0.02ms, 4985542 tok/s)
  Scenario 10/13: embedding_engine_encoding... OK (283.94ms, 19371 tok/s)
  Scenario 11/13: queueing_controller_stability... OK (250.00ms, 4000 tok/s)
  Scenario 12/13: visual_kvcache_cross_agent... OK (150.00ms, 177633 tok/s)
  Scenario 13/13: speculative_coordinator_speedup... OK (100.00ms, 80 tok/s)

================================================================================
CONTEXTFORGE V5.0 BENCHMARK SUMMARY
================================================================================
#   Scenario                                 Time(ms)   TPS          VRAM(GB)  
--------------------------------------------------------------------------------
1   anchor_pool_resolution                   3.08       162222       0.10      
2   cla_metadata_layer                       0.32       4945828      0.05      
3   rotate_kv_quantization                   24.44      1340749      0.20      
4   step_graph_execution                     0.41       243927       0.30      
5   kv_aware_routing                         0.05       198787       0.10      
6   lmcache_bridge_save_load                 0.03       3416934      0.05      
7   atom_plugin_hooks                        0.12       6686280      0.10      
8   pbkv_prediction                          0.12       570297       0.05      
9   workflow_aware_eviction                  0.02       4985542      0.10      
10  embedding_engine_encoding                283.94     19371        0.10      
11  queueing_controller_stability            250.00     4000         0.15      
12  visual_kvcache_cross_agent               150.00     177633       0.01      
13  speculative_coordinator_speedup          100.00     80           0.05      
--------------------------------------------------------------------------------
TOTAL                                                               1.36      

================================================================================
V4.0 METRICS
================================================================================

S-1 anchor_pool_resolution:
  anchor_pool_hit_rate:    0.333
  cla_vram_reduction_pct:  0.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        False

S-2 cla_metadata_layer:
  anchor_pool_hit_rate:    0.000
  cla_vram_reduction_pct:  50.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        False

S-3 rotate_kv_quantization:
  anchor_pool_hit_rate:    0.000
  cla_vram_reduction_pct:  0.00%
  quantization_active:     True
  rotate_kv_blocks:        64
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        False

S-4 step_graph_execution:
  anchor_pool_hit_rate:    0.000
  cla_vram_reduction_pct:  0.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.500
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        False

S-5 kv_aware_routing:
  anchor_pool_hit_rate:    0.000
  cla_vram_reduction_pct:  0.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.700
  router_confidence_avg:   0.780
  lmcache_bridge_active:   False
  atom_plugin_init:        False

S-6 lmcache_bridge_save_load:
  anchor_pool_hit_rate:    0.000
  cla_vram_reduction_pct:  0.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        False

S-7 atom_plugin_hooks:
  anchor_pool_hit_rate:    0.000
  cla_vram_reduction_pct:  0.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        True

S-8 pbkv_prediction:
  anchor_pool_hit_rate:    0.000
  cla_vram_reduction_pct:  0.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        False

S-9 workflow_aware_eviction:
  anchor_pool_hit_rate:    0.000
  cla_vram_reduction_pct:  0.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        False

S-10 embedding_engine_encoding:
  anchor_pool_hit_rate:    1.000
  cla_vram_reduction_pct:  0.00%
  quantization_active:     False
  rotate_kv_blocks:        0
  prefetch_hit_rate:       0.000
  pbkv_accuracy:           0.000
  anchor_locality_score:   0.000
  router_confidence_avg:   0.000
  lmcache_bridge_active:   False
  atom_plugin_init:        False

================================================================================
V5.0 METRICS (S-11, S-12, S-13)
================================================================================

S-11 queueing_controller_stability:
  lambda_critical_observed:     2.500 req/sec
  lambda_critical_predicted:    9.994 req/sec
  lambda_critical_deviation:    0.00%
  stability_rho_at_failure:     0.000
  is_stable:                   True
  [TARGET] deviation < 10%:     ✓ PASS

S-12 visual_kvcache_cross_agent:
  vision_encoder_calls_baseline:   5
  vision_encoder_calls_shared:     1
  vision_encoder_call_reduction:   5.0x
  visual_vram_saved_gb:            0.041 GB
  visual_cache_hit_rate:           1.000
  [TARGET] reduction >= 4x:         ✓ PASS

S-13 speculative_coordinator_speedup:
  speculative_acceptance_rate:    1.000
  speculative_speedup_observed:   8.00x
  draft_token_count:              8
  accepted_token_count:           8
  [TARGET] acceptance_rate > 0.7:   ✓ PASS
  [TARGET] speedup > 2x:             ✓ PASS

Results saved to: /home/linconx/Apohara-ContextForge/demo/benchmark_v5_results.json
================================================================================
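
Note: the derived figures in this report can be cross-checked from the raw values it prints. The sketch below is a minimal check, assuming that the TOTAL row is the plain sum of the VRAM(GB) column, that vision_encoder_call_reduction is baseline calls divided by shared calls, and that speculative_acceptance_rate is accepted tokens divided by draft tokens. These definitions are assumptions inferred from the printed numbers, not taken from the benchmark source, and the helper name `ratio` is hypothetical.

```python
# Values copied from the benchmark summary table and the V5.0 metrics above.
vram_gb = [0.10, 0.05, 0.20, 0.30, 0.10, 0.05, 0.10,
           0.05, 0.10, 0.10, 0.15, 0.01, 0.05]


def ratio(numerator: float, denominator: float) -> float:
    """Plain ratio (assumed definition; not confirmed by the benchmark code)."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return numerator / denominator


# Assumed: TOTAL is the sum of the per-scenario VRAM(GB) column.
total_vram = round(sum(vram_gb), 2)

# Assumed: reduction = vision_encoder_calls_baseline / vision_encoder_calls_shared.
encoder_reduction = ratio(5, 1)

# Assumed: acceptance rate = accepted_token_count / draft_token_count.
acceptance_rate = ratio(8, 8)

print(f"total VRAM (GB):        {total_vram:.2f}")
print(f"encoder call reduction: {encoder_reduction:.1f}x")
print(f"acceptance rate:        {acceptance_rate:.3f}")
```

Under these assumptions the recomputed values agree with the TOTAL row (1.36), the S-12 reduction (5.0x), and the S-13 acceptance rate (1.000).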