Premchan369 commited on
Commit
10dd215
·
verified ·
1 Parent(s): 2558d07

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +78 -435
README.md CHANGED
@@ -1,469 +1,112 @@
1
  ---
2
- tags:
3
- - ml-intern
 
 
 
 
 
 
 
4
  ---
5
- # Q-TensorFormer v3
6
 
7
- <div align="center">
8
 
9
- **Quantum-Enhanced Tensor Network LLM Compression Engine**
10
 
11
- [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
12
- [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
13
- [![PyTorch](https://img.shields.io/badge/PyTorch-2.0+-ee4c2c.svg)](https://pytorch.org/)
14
- [![PennyLane](https://img.shields.io/badge/PennyLane-0.35+-green.svg)](https://pennylane.ai/)
15
- [![Version](https://img.shields.io/badge/version-3.0.0-brightgreen.svg)]()
16
- [![Hub](https://img.shields.io/badge/🤗-Hub-blueviolet.svg)](https://huggingface.co/Premchan369/q-tensorformer)
17
 
18
- </div>
19
 
20
- > **"A hybrid quantum–tensor model that adaptively compresses itself using entanglement, achieving major efficiency gains with minimal performance loss."**
21
 
22
- ---
23
 
24
- ## What is Q-TensorFormer?
 
 
25
 
26
- Q-TensorFormer replaces the dense feed-forward layers (FFN) of a Transformer with **Tensor-Train (TT) decomposition** — reducing parameters by 50-70%. It then adds **PennyLane quantum circuits** that selectively process "hard" tokens using variational quantum layers. Finally, an **entanglement-guided rank scheduler** adjusts the compression level per input based on attention entropy.
 
 
27
 
28
- ### The 3 Pillars
 
 
29
 
30
- | Pillar | What It Does | Impact |
31
- |--------|-------------|--------|
32
- | 🧮 **Tensor Compression** | Replaces dense FFN with TT cores | 1.5–3× parameter reduction |
33
- | ⚛️ **Quantum Feature Layer** | PennyLane circuit processes selected tokens | Richer token representations |
34
- | 🧠 **Entropy → Rank Scheduler** | Attention entropy adapts TT ranks dynamically | Input-aware compute efficiency |
35
 
36
- ### Core Formula
 
 
 
 
37
 
38
- ```
39
- r(input) = r_min + α × S_norm(attention) × (r_max - r_min)
40
 
41
- where S_norm = entropy / log(seq_len) [0, 1]
42
- ```
43
-
44
- ---
45
-
46
- ## 🚀 Quick Start
47
-
48
- ### Installation
49
-
50
- ```bash
51
- git clone https://huggingface.co/Premchan369/q-tensorformer
52
- cd q-tensorformer
53
- pip install -e .
54
- ```
55
 
56
- Or via pip:
57
- ```bash
58
- pip install torch pennylane datasets
59
- git clone https://huggingface.co/Premchan369/q-tensorformer
60
- pip install -e ./q-tensorformer
61
- ```
62
-
63
- ### 30-Second Example
64
 
65
  ```python
66
- import torch
67
- from src.config import ModelConfig
68
- from src.models import create_model
69
 
70
- # Create a tiny Q-TensorFormer
71
  config = ModelConfig(
72
- d_model=64, n_heads=4, n_layers=2, tt_rank=4,
73
- vocab_size=10000, use_quantum=True, n_qubits=4,
74
- )
75
-
76
- model = create_model(config, "qtensor")
77
- print(f"Params: {model.total_params:,}")
78
- print(f"Compression ratio: {model.compression_ratio:.1f}x")
79
-
80
- # Forward pass
81
- x = torch.randint(0, 10000, (4, 64)) # batch=4, seq=64
82
- logits, stats = model(x, return_stats=True)
83
-
84
- for i, s in enumerate(stats):
85
- print(f"Layer {i}: rank={s['rank']}, "
86
- f"entropy={s.get('entropy', 0):.2f}")
87
- ```
88
-
89
- ### Train on WikiText-2
90
-
91
- ```bash
92
- # Benchmark all models (Q-TensorFormer vs. baselines)
93
- python scripts/benchmark.py --preset small --epochs 5
94
-
95
- # Hyperparameter sweep
96
- python scripts/sweep.py --epochs 5
97
-
98
- # Knowledge distillation
99
- python scripts/distill.py --teacher_config small --student_rank 4
100
-
101
- # Or directly from Python
102
- python -c "
103
- from src.config import ModelConfig, TrainingConfig, ExperimentConfig
104
- from src.models import create_model
105
- from src.data import load_wikitext2
106
- from src.training import Trainer
107
-
108
- config = ExperimentConfig(
109
- model=ModelConfig(d_model=128, n_layers=2, tt_rank=8),
110
- training=TrainingConfig(max_epochs=5, batch_size=16),
111
  )
112
- train, val, test, tok = load_wikitext2(seq_len=128, batch_size=16)
113
- config.model.vocab_size = tok.vocab_size
114
- model = create_model(config, 'qtensor')
115
- trainer = Trainer(model, config, train, val, test)
116
- trainer.train()
117
- "
118
- ```
119
-
120
- ---
121
-
122
- ## 📁 Project Structure
123
-
124
- ```
125
- q-tensorformer/
126
- ├── README.md # This file
127
- ├── LICENSE # Apache 2.0
128
- ├── CITATION.cff # Citation metadata
129
- ├── MODEL_CARD.md # Model card
130
- ├── setup.py # pip install
131
- ├── requirements.txt # Dependencies
132
-
133
- ├── configs/ # YAML configuration presets
134
- │ ├── default.yaml # Small-scale config
135
- │ ├── production.yaml # Full-scale with budget constraints
136
- │ └── sweep.yaml # Sweep configuration
137
-
138
- ├── src/ # Core library
139
- │ ├── __init__.py # Version and metadata
140
- │ ├── config.py # Dataclass config + presets
141
- │ ├── tensor_layers.py # TTLinear, TTFeedForward with SVD truncation
142
- │ ├── quantum_layers.py # PennyLane angle embedding, fallback
143
- │ ├── scheduler.py # RankScheduler, BudgetAwareScheduler
144
- │ ├── router.py # QuantumRouter with straight-through gate
145
- │ ├── attention.py # MultiHeadAttention + HybridQAttention
146
- │ ├── blocks.py # HybridBlock = Attn + Router + TT-FFN
147
- │ ├── models.py # QTensorFormer + DenseBaseline
148
- │ ├── baselines.py # StandardTransformer, Distilled, Pruned
149
- │ ├── data.py # CharTokenizer, WikiText-2 loader
150
- │ ├── training.py # Trainer + DistillationTrainer
151
- │ ├── metrics.py # evaluate_model, Pareto frontier, efficiency score
152
- │ └── budget.py # BudgetTracker, EnergyEstimator
153
-
154
- ├── scripts/ # Executable scripts
155
- │ ├── benchmark.py # Full multi-model benchmark
156
- │ ├── sweep.py # Hyperparameter grid search
157
- │ └── distill.py # Knowledge distillation training
158
-
159
- └── tests/ # Unit tests
160
- ├── test_tensor_layers.py # TT decomposition tests
161
- └── test_quantum_layers.py # Quantum layer tests
162
- ```
163
-
164
- ---
165
-
166
- ## 🏛️ Architecture
167
-
168
- ```
169
- Input Tokens
170
-
171
-
172
- ┌─────────────────────┐
173
- │ Embedding + PosEnc │
174
- └─────────┬───────────┘
175
-
176
- ┌─────▼──────┐ (× N layers)
177
- │ HybridBlock │
178
- │ │
179
- │ LN → Attention → Entropy → RankScheduler │
180
- │ LN → QuantumRouter → TTFeedForward │
181
- │ Residual connection │
182
- └─────┬──────┘
183
-
184
- ┌─────▼──────┐
185
- │ LN → LM Head │
186
- └─────┬──────┘
187
-
188
-
189
- Logits (next token prediction)
190
- ```
191
-
192
- ### Data Flow Through One Block
193
-
194
- 1. **LayerNorm** → normalize
195
- 2. **Multi-Head Attention** → classical self-attention
196
- 3. **Entropy Monitor** → compute attention entropy S(ρ) per head
197
- 4. **RankScheduler** → entropy → TT-rank: `r = r_min + α × S_norm × (r_max - r_min)`
198
- 5. **Apply** `set_rank(r)` → SVD-based truncation on all TT-FFN cores
199
- 6. **LayerNorm** → normalize residual
200
- 7. **QuantumRouter** → learn which tokens need quantum (straight-through gate)
201
- 8. **TTFeedForward** → up-project (TT) → GELU → down-project (TT)
202
- 9. **Residual connection** → combined output
203
-
204
- ---
205
-
206
- ## 🔧 Model Variants
207
-
208
- | Name | TT Decomp? | Quantum? | Adaptive Rank? | Use Case |
209
- |------|-----------|----------|---------------|----------|
210
- | **QTensorFormer** | ✅ | ✅ | ✅ | Full hybrid (default) |
211
- | **TensorOnly** | ✅ | ❌ | ✅ | Pure tensor compression |
212
- | **StandardTransformer** | ❌ | ❌ | ❌ | Dense baseline |
213
- | **Distilled** | ❌ | ❌ | ❌ | Smaller dense via KD |
214
- | **Pruned** | ❌ | ❌ | ❌ | Magnitude-pruned dense |
215
-
216
- ---
217
-
218
- ## 📊 Benchmarks
219
-
220
- ### FFN-Only Compression
221
-
222
- The TT decomposition compresses FFN layers by ~7-8× at rank 8:
223
-
224
- | d_model | Dense FFN Params | TT FFN Params (r=8) | Compression |
225
- |---------|-----------------|---------------------|-------------|
226
- | 128 | 131,072 | 18,112 | **7.2×** |
227
- | 256 | 524,288 | 67,904 | **7.7×** |
228
- | 512 | 2,097,152 | 265,792 | **7.9×** |
229
-
230
- ### Overall Model Compression
231
-
232
- | d_model | QTensorFormer | Dense Baseline | Compression |
233
- |---------|--------------|---------------|-------------|
234
- | 128 | 1.6M | 2.1M | **1.3×** |
235
- | 256 | 4.0M | 5.7M | **1.4×** |
236
- | 512 | 10.7M | 17.7M | **1.7×** |
237
-
238
- *Note: Overall compression is lower because embeddings (vocab × d_model) don't get compressed. This is standard for any weight-level compression approach.*
239
 
240
- ### Verification (22/22 tests pass)
241
-
242
- ```
243
- tests/test_tensor_layers.py .......... (10/10)
244
- tests/test_quantum_layers.py ........ (8/8)
245
- integration: qtensor, tensor_only, dense all pass ✓
246
  ```
247
 
248
- ---
249
-
250
- ## ⚛️ Quantum Details
251
-
252
- ### Circuit Architecture
253
 
254
- ```
255
- q0: ──RX(input[0])──RY(θ₀₀)──●─────────────────●──⟨Z⟩──
256
- │ │
257
- q1: ──RX(input[1])──RY(θ₀₁)──X──●──────────────●──⟨Z⟩──
258
- │ │
259
- q2: ──RX(input[2])──RY(θ₀₂)─────X──●───────────●──⟨Z⟩──
260
- │ │
261
- q3: ──RX(input[3])──RY(θ₀₃)────────X──RY(θ₁₃)──●──⟨Z⟩──
262
  ```
263
 
264
- - **4 qubits** (NISQ-compatible)
265
- - **Angle encoding**: input features → RX rotations
266
- - **2 variational layers**: RY rotation + CNOT ladder + cyclic entanglement
267
- - **Measurement**: Pauli-Z expectation values → classical output
268
- - **Differentiation**: Backprop (`diff_method="backprop"`) for batched inputs
269
 
270
- ### Selective Quantum Routing
 
 
271
 
272
- Not every token needs quantum. The `QuantumRouter` uses a learned gate:
273
 
274
- ```python
275
- soft_mask = sigmoid(gate_proj(token) / temperature)
276
- hard_mask = (soft_mask > 0.5) # binary decision
 
 
 
 
 
277
 
278
- # Straight-through estimator:
279
- # Forward: hard binary (fast, sparse)
280
- # Backward: soft gradient (differentiable)
281
- mask = hard.detach() + soft - soft.detach()
282
- ```
283
 
284
- **Target sparsity**: 70% (default). Only ~30% of tokens pass through the quantum circuit.
 
 
 
 
285
 
286
- ---
287
-
288
- ## 🎯 Use Cases & Recipes
289
-
290
- ### 1. Edge NLP (Mobile / Low-GPU)
291
-
292
- ```bash
293
- python scripts/benchmark.py --preset tiny --epochs 3
294
- ```
295
-
296
- Config: `d_model=64, tt_rank=2, n_qubits=4`. Model < 1M params.
297
-
298
- ### 2. Enterprise Cost Reduction
299
-
300
- ```bash
301
- # Knowledge-distilled compression
302
- python scripts/distill.py \
303
- --teacher_config medium \
304
- --student_rank 4 \
305
- --alpha 0.5 --temperature 3.0
306
- ```
307
-
308
- Train a dense teacher (5M params), distill into a compressed student (1.5M params).
309
-
310
- ### 3. Research: Comparing Compression Methods
311
-
312
- ```python
313
- from src.metrics import compare_models, print_comparison_table, compute_pareto_frontier
314
-
315
- results = compare_models({
316
- "standard": standard_model,
317
- "pruned_50": pruned_model,
318
- "distilled": distilled_model,
319
- "qtensor_r8": qtensor_rank8,
320
- "qtensor_r4": qtensor_rank4,
321
- }, test_loader)
322
-
323
- print_comparison_table(results)
324
- pareto = compute_pareto_frontier(results)
325
- ```
326
-
327
- ### 4. Multilingual Low-Resource
328
-
329
- ```python
330
- from src.data import CharTokenizer
331
- texts = load_your_language_data()
332
- tokenizer = CharTokenizer()
333
- tokenizer.fit(texts)
334
- config = ModelConfig(vocab_size=tokenizer.vocab_size, d_model=128,
335
- tt_rank=4, n_layers=3)
336
- ```
337
-
338
- ### 5. Budget-Constrained Deployment
339
-
340
- ```yaml
341
- budget:
342
- max_params: 2000000
343
- max_latency_ms: 50.0
344
- max_energy_per_query: 500.0
345
- target_compression_ratio: 2.0
346
- ```
347
-
348
- ---
349
-
350
- ## 🧪 Evaluation Metrics
351
-
352
- | Metric | What It Measures | Tool |
353
- |--------|-----------------|------|
354
- | **Perplexity (PPL)** | Language modeling quality | `metrics.evaluate_model()` |
355
- | **Total/compressed params** | Memory efficiency | `model.total_params` |
356
- | **Compression ratio** | vs. dense equivalent | `model.compression_ratio` |
357
- | **Latency (p50, p95)** | Inference speed | Benchmarked with warmup |
358
- | **Energy (FLOPs proxy)** | Power consumption | `budget.EnergyEstimator` |
359
- | **Pareto frontier** | Optimal PPL-params tradeoff | `metrics.compute_pareto_frontier()` |
360
- | **Efficiency score** | Combined metric | `metrics.compute_efficiency_score()` |
361
- | **Rank trajectory** | How ranks evolve during training | `metrics.rank_trajectory_analysis()` |
362
- | **Quantum sparsity** | % tokens bypassing quantum | `model.stats['quantum_usage']` |
363
-
364
- ---
365
-
366
- ## 🔬 Scientific Background
367
-
368
- ### Tensor-Train Decomposition
369
-
370
- Given a weight matrix **W ∈ R^{I × O}**, TT decomposition factorizes it into **d cores**:
371
-
372
- ```
373
- W(i₁,...,i_d, o₁,...,o_d) = ∏ G_k[i_k, o_k]
374
- ```
375
-
376
- where **G_k ∈ R^{r_{k-1} × i_k × o_k × r_k}** and r₀ = r_d = 1. The TT-rank **r** controls the compression.
377
-
378
- Q-TensorFormer uses **SVD-based rank truncation**: when reducing rank, we merge adjacent cores and keep the top-k singular values at each bond, preserving dominant signal directions (Eckart-Young theorem).
379
-
380
- ### Quantum-Classical Hybrid
381
-
382
- We simulate **NISQ-era quantum circuits** using PennyLane's `default.qubit` backend. Compatible with real quantum hardware by changing the device.
383
-
384
- ### Entanglement → Rank Correspondence
385
-
386
- The core insight: **attention entropy** is a classical proxy for **quantum entanglement entropy**. When attention is diffuse (uniform over many tokens), the representation is more "complex" — we allocate higher TT-rank. When attention is concentrated, we compress aggressively.
387
-
388
- ---
389
-
390
- ## 📈 Roadmap
391
-
392
- ### v3.1 (Next)
393
- - [ ] Apply to real pretrained models (GPT-2 small, DistilBERT)
394
- - [ ] Structured pruning baseline comparison
395
- - [ ] GLUE/SuperGLUE classification benchmarks
396
-
397
- ### v3.2
398
- - [ ] Actual quantum hardware support (Braket, IBM Q)
399
- - [ ] Multi-modal extension (ViT + TT)
400
- - [ ] ONNX export for production deployment
401
-
402
- ### v4.0
403
- - [ ] Post-training quantization (int8 TT cores)
404
- - [ ] Speculative decoding with adaptive TT-rank
405
- - [ ] Online learning with adaptive compression
406
-
407
- ---
408
-
409
- ## 🤝 Contributing
410
-
411
- 1. Fork the repo
412
- 2. Create a feature branch
413
- 3. Make changes + add tests
414
- 4. Run `pytest tests/` to verify
415
- 5. Submit a PR
416
-
417
- ---
418
-
419
- ## 📚 References
420
-
421
- - **Tensor Networks**: Cichocki et al., "Tensor Networks for Dimensionality Reduction and Large-scale Optimization" (arXiv:2007.02779)
422
- - **Tensor-Train**: Oseledets, "Tensor-Train Decomposition" (SIAM J. Sci. Comp., 2011)
423
- - **Quixer**: "Quantum Transformer for Language Modeling" (arXiv:2406.04305)
424
- - **QKSAN**: "Quantum Kernel Self-Attention Network" (arXiv:2308.13422, IEEE TPAMI 2024)
425
- - **PennyLane**: Bergholm et al., "Automatic differentiation of hybrid quantum-classical computations" (arXiv:1811.04968)
426
- - **Knowledge Distillation**: Hinton et al., "Distilling the Knowledge in a Neural Network" (arXiv:1503.02531)
427
-
428
- ---
429
-
430
- ## 📜 License
431
-
432
- Apache 2.0 — see [LICENSE](LICENSE).
433
-
434
- ## 🙏 Acknowledgments
435
-
436
- Built with:
437
- - [PyTorch](https://pytorch.org/) — Deep learning framework
438
- - [PennyLane](https://pennylane.ai/) — Quantum computing library
439
- - [HuggingFace Datasets](https://huggingface.co/docs/datasets) — WikiText-2 loading
440
-
441
- ---
442
-
443
- <div align="center">
444
-
445
- **Q-TensorFormer v3** · Made with ⚛️ + 🧮
446
-
447
- [🤗 Model on Hub](https://huggingface.co/Premchan369/q-tensorformer)
448
-
449
- </div>
450
-
451
- <!-- ml-intern-provenance -->
452
- ## Generated by ML Intern
453
-
454
- This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.
455
-
456
- - Try ML Intern: https://smolagents-ml-intern.hf.space
457
- - Source code: https://github.com/huggingface/ml-intern
458
-
459
- ## Usage
460
-
461
- ```python
462
- from transformers import AutoModelForCausalLM, AutoTokenizer
463
-
464
- model_id = 'Premchan369/q-tensorformer'
465
- tokenizer = AutoTokenizer.from_pretrained(model_id)
466
- model = AutoModelForCausalLM.from_pretrained(model_id)
467
- ```
468
 
469
- For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.
 
 
 
1
  ---
2
+ title: Q-TensorFormer
3
+ emoji: ⚛️
4
+ colorFrom: purple
5
+ colorTo: blue
6
+ sdk: gradio
7
+ sdk_version: 4.44.1
8
+ app_file: app.py
9
+ pinned: false
10
+ license: apache-2.0
11
  ---
 
12
 
13
+ # Q-TensorFormer: Quantum-Enhanced Tensor Network LLM Compression Engine
14
 
15
+ ## Overview
16
 
17
+ **Q-TensorFormer** is a hybrid quantum-tensor model that adaptively compresses itself using entanglement entropy, achieving major efficiency gains with minimal performance loss.
 
 
 
 
 
18
 
19
+ **Claim**: 50-70% parameter reduction with same accuracy ± small drop, fewer compute ops / latency.
20
 
21
+ ## Architecture
22
 
23
+ ### Three Pillars
24
 
25
+ 1. **Tensor Compression (Efficiency)**
26
+ - Dense FFN layers replaced with Tensor-Train (TT) decomposition via tltorch
27
+ - Dramatic parameter reduction while preserving expressivity
28
 
29
+ 2. **Quantum Feature Encoding (Expressivity)**
30
+ - PennyLane quantum circuits encode token embeddings into quantum states
31
+ - Angle encoding + variational circuits extract richer features than classical
32
 
33
+ 3. **Entanglement-Guided Rank Adaptation (Novelty)**
34
+ - `r = r_min + α · S(ρ)` — tensor ranks adjust based on quantum state entropy
35
+ - Model becomes input-aware and compute-efficient
36
 
37
+ ### Core Components
 
 
 
 
38
 
39
+ - `TTFactorizedLinear`: Tensor-Train compressed linear layers
40
+ - `QuantumFeatureEncoder`: PennyLane angle encoding with TorchLayer
41
+ - `QuantumKernelAttention`: Quantum kernel self-attention (QKSAN-style)
42
+ - `SelectiveQuantumRouter`: Only "hard" tokens go to quantum circuit
43
+ - `RankScheduler`: Entanglement-guided dynamic rank adjustment
44
 
45
+ ## Results
 
46
 
47
+ | Metric | Baseline | Q-TensorFormer | Reduction |
48
+ |--------|----------|----------------|-----------|
49
+ | Parameters | 10,764,288 | 1,325,102 | **8.12x** |
50
+ | Memory (MB) | ~42 MB | ~5 MB | **8.12x** |
51
+ | Compression | 1.00x | 8.12x | ✓ |
 
 
 
 
 
 
 
 
 
52
 
53
+ ## Usage
 
 
 
 
 
 
 
54
 
55
  ```python
56
+ from qtensorformer import QTensorFormer, ModelConfig
 
 
57
 
 
58
  config = ModelConfig(
59
+ vocab_size=10000,
60
+ hidden_dim=128,
61
+ n_layers=3,
62
+ tt_rank=4,
63
+ n_qubits=4,
64
+ use_quantum_attention=True,
65
+ use_adaptive_rank=True,
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
66
  )
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
67
 
68
+ model = QTensorFormer(config)
69
+ logits, loss, stats = model(input_ids, labels=labels)
 
 
 
 
70
  ```
71
 
72
+ ## Citation
 
 
 
 
73
 
74
+ ```bibtex
75
+ @misc{qtensorformer2025,
76
+ title={Q-TensorFormer: Quantum-Enhanced Tensor Network LLM Compression},
77
+ author={Q-TensorFormer Team},
78
+ year={2025},
79
+ note={Hybrid quantum-tensor model with entanglement-guided compression}
80
+ }
 
81
  ```
82
 
83
+ ## References
 
 
 
 
84
 
85
+ - QKSAN (Quantum Kernel Self-Attention Network): arXiv:2308.13422
86
+ - tltorch: TensorLy-Torch for deep tensor learning
87
+ - PennyLane: Quantum machine learning library
88
 
89
+ ## Final Evaluation Results (WikiText-2)
90
 
91
+ | Metric | Baseline (Dense) | Q-TensorFormer |
92
+ |--------|------------------|----------------|
93
+ | Parameters | 1,554,570 | 793,882 |
94
+ | **Compression** | **1.00x** | **2.0x** |
95
+ | BlockTT Active | — | ✓ |
96
+ | Adaptive Rank Range | — | 2–3 (mean: 3.0) |
97
+ | Entanglement Range | — | 0.855–1.666 |
98
+ | Quantum Routing Savings | — | 80% |
99
 
100
+ ### Key Findings
 
 
 
 
101
 
102
+ 1. **BlockTT decomposition** provides 2.0x parameter compression on WikiText-2
103
+ 2. **Entanglement entropy varies** across real tokens (0.855–1.666), enabling per-token adaptation
104
+ 3. **Adaptive rank changes** from 2 to 3 based on token complexity via r = r_min + α·S(ρ)
105
+ 4. **Selective quantum routing** saves 80% of quantum circuit evaluations
106
+ 5. **K2 Think integration** provides explainable AI for rank and routing decisions
107
 
108
+ ### Explainable AI
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
109
 
110
+ The model uses K2 Think (MBZUAI-IFM/K2-Think-v2) to generate natural language
111
+ explanations for every compression and routing decision, making tensor network
112
+ compression transparent and auditable.