---
license: apache-2.0
library_name: q-tensorformer
tags:
  - tensor-networks
  - quantum-machine-learning
  - model-compression
  - transformer
  - efficient-deep-learning
  - nisq
  - pennylane
  - k2-think
  - explainable-ai
pipeline_tag: text-generation
---

# Q-TensorFormer v3 — Model Card

## Model Details

Q-TensorFormer is a hybrid transformer that compresses feed-forward layers using Tensor-Train (TT) decomposition and enhances token representations via PennyLane quantum circuits, with adaptive TT-rank scheduling guided by attention entropy.

- **Architecture:** Quantum-Enhanced Tensor Network Transformer
- **Parameters:** configurable (50K–50M range)
- **Compression ratio:** 1.5–3× vs. an equivalent dense transformer
- **Quantum overhead:** <30% of tokens routed through quantum circuits (adjustable sparsity)
- **K2 Think v2 integration:** explainable AI for every compression and routing decision
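
As a sketch of how TT decomposition shrinks a feed-forward weight, the dense matrix can be reshaped into a higher-order tensor and factored with sequential truncated SVDs (TT-SVD). The helper names, the mode shape, and the 64×64 example below are illustrative assumptions, not the library's actual API:

```python
import numpy as np

def tt_decompose(W, modes, rank):
    """Tensor-Train decomposition via sequential truncated SVDs (TT-SVD).
    `modes` factorises W's entries; `rank` caps every internal TT-rank.
    Illustrative helper, not the q-tensorformer API."""
    T = W.reshape(modes)
    cores, r_prev = [], 1
    for k in range(len(modes) - 1):
        T = T.reshape(r_prev * modes[k], -1)
        U, S, Vt = np.linalg.svd(T, full_matrices=False)
        r = min(rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, modes[k], r))
        T = S[:r, None] * Vt[:r]  # carry the truncated remainder forward
        r_prev = r
    cores.append(T.reshape(r_prev, modes[-1], 1))
    return cores

def tt_params(cores):
    """Total number of parameters stored across the TT cores."""
    return sum(c.size for c in cores)

# A 64x64 dense layer (4096 params) stored as four rank-4 TT cores
W = np.random.randn(64, 64)
cores = tt_decompose(W, (8, 8, 8, 8), rank=4)
print(tt_params(cores), "TT params vs", W.size, "dense params")
```

Lower TT-ranks trade reconstruction error for parameter count, which is exactly the trade-off the rank scheduler adjusts per layer.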

## Core Mechanism

`Attention entropy S(ρ) → norm → RankScheduler → TT-rank r(layer)`

The attention entropy (a classical proxy for quantum entanglement) measures input complexity per token. Higher entropy → more complex patterns → higher tensor rank. Lower entropy → more compressible → aggressive TT rank reduction.
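
The entropy-to-rank mapping can be sketched in a few lines; the normalisation, the rank bounds, and the function names are assumptions for illustration, not the actual RankScheduler interface:

```python
import numpy as np

def attention_entropy(attn_row):
    """Shannon entropy of one token's attention distribution."""
    p = attn_row / attn_row.sum()
    return -(p * np.log(p + 1e-12)).sum()

def schedule_rank(attn_row, r_min=2, r_max=16):
    """Map normalised entropy to a TT-rank: higher entropy (more complex
    attention pattern) -> higher rank. Illustrative sketch only."""
    s = attention_entropy(attn_row) / np.log(len(attn_row))  # normalise to [0, 1]
    return int(round(r_min + s * (r_max - r_min)))

peaked = np.array([0.97, 0.01, 0.01, 0.01])  # low entropy -> compressible
uniform = np.ones(4) / 4                     # high entropy -> needs capacity
print(schedule_rank(peaked), schedule_rank(uniform))
```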

**Budget-constrained mode:** set `max_params`, `max_latency_ms`, or `max_energy_per_query`, and the model auto-adjusts ranks to stay within budget.
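
A minimal sketch of budget-constrained mode, assuming uniform-rank TT layers and a greedy rank-reduction loop; only the `max_params` option comes from the text, the helpers and layer shapes are hypothetical:

```python
def tt_layer_params(modes, rank):
    """Parameter count of one TT layer with uniform internal rank
    (boundary ranks are 1). Illustrative cost model."""
    ranks = [1] + [rank] * (len(modes) - 1) + [1]
    return sum(ranks[i] * modes[i] * ranks[i + 1] for i in range(len(modes)))

def fit_to_budget(modes, ranks, max_params):
    """Greedily lower the rank of the most expensive layer until the
    total parameter count fits the budget. Sketch only, not the
    library's budget tracker."""
    ranks = list(ranks)
    while sum(tt_layer_params(modes, r) for r in ranks) > max_params:
        i = max(range(len(ranks)), key=lambda j: tt_layer_params(modes, ranks[j]))
        if ranks[i] == 1:
            raise ValueError("budget unreachable even at rank 1")
        ranks[i] -= 1
    return ranks

# Three layers starting at rank 8, squeezed under a 2000-parameter budget
fitted = fit_to_budget((8, 8, 8, 8), [8, 8, 8], max_params=2000)
print(fitted)
```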

## K2 Think v2 Integration (Explainable AI)

Q-TensorFormer integrates with K2 Think v2 (`MBZUAI-IFM/K2-Think-v2`) to provide natural-language explanations for every compression and routing decision:

| Component | What K2 Think Explains |
| --- | --- |
| RankScheduler | Why entropy X → rank Y ("Token 47 has high attention dispersion, needs more capacity") |
| QuantumRouter | Why a token was routed to quantum ("This embedding is near the decision boundary; the quantum feature map may help") |
| Budget Tracker | How budget constraints affected model size ("Reduced rank to 4 to stay under 2M params") |
| Compression Report | Full audit trail of per-layer, per-token compression choices |
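
The per-decision audit trail can be pictured as structured records like the one below; the field names, threshold, and phrasing are hypothetical and do not reflect K2 Think v2's actual output format:

```python
def explain_rank_decision(token_id, entropy, rank, threshold=0.5):
    """Build one audit-trail record in the spirit of the table above.
    All field names and wording are illustrative assumptions."""
    reason = ("high attention dispersion, needs more capacity"
              if entropy > threshold
              else "concentrated attention, safe to compress aggressively")
    return {
        "component": "RankScheduler",
        "token": token_id,
        "entropy": round(entropy, 3),
        "rank": rank,
        "explanation": f"Token {token_id} has {reason}",
    }

record = explain_rank_decision(token_id=47, entropy=0.82, rank=12)
print(record["explanation"])
```

Collecting one such record per token and layer yields the full compression report referenced in the table.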

**Live demo:** AlphaForge x K2 Think V2

## Intended Uses

| Use Case | Model Size | Expected Metric |
| --- | --- | --- |
| Edge NLP (mobile, on-device) | <5M params | PPL within 5% of dense baseline |
| Enterprise model compression | 10–50M params | 2× param reduction at equal accuracy |
| Multilingual low-resource | <10M params | Better representation per parameter |
| Research: quantum-classical hybrid | Small | Demonstrate quantum value in NLP |
| Financial NLP (with K2 Think) | Any | Explainable compression for regulated industries |

## Limitations

- **NISQ-era only:** quantum circuits are simulated (PennyLane `default.qubit`); no real quantum hardware is required.
- **Small to medium models:** designed for embedding dimensions ≤512, not for GPT-scale (100M+) models.
- **Training data:** optimized for WikiText-2 and similar text corpora.
- **Quantum advantage:** we claim efficiency (fewer parameters for the same performance), not "quantum advantage" in the broad sense.

## Citation

```bibtex
@software{q_tensorformer2026,
  author  = {Premchan369},
  title   = {Q-TensorFormer: Quantum-Enhanced Tensor Network LLM Compression Engine},
  url     = {https://huggingface.co/Premchan369/q-tensorformer},
  version = {3.0.0},
  year    = {2026},
}
```

## References

- **Tensor networks:** Cichocki et al., "Tensor Networks for Dimensionality Reduction and Large-scale Optimization" (arXiv:2007.02779)
- **Quantum transformers:** Quixer (arXiv:2406.04305), QKSAN (arXiv:2308.13422)
- **PennyLane:** Bergholm et al., "PennyLane: Automatic differentiation of hybrid quantum-classical computations" (arXiv:1811.04968)
- **K2 Think v2:** `MBZUAI-IFM/K2-Think-v2`, Build with K2 Think V2 Challenge

## Related Projects