@@ -14,16 +14,21 @@ datasets:
 # FTerViT: Fully Ternary Vision Transformer
 Pretrained checkpoints for **FTerViT** — the first fully ternary Vision Transformer where *all* weight matrices and normalization parameters are constrained to {-1, 0, +1}.
-**Paper:** [FTerViT: Fully Ternary Vision Transformer](https://arxiv.org/abs/XXXX.XXXXX) (NeurIPS 2026 submission)
-**Code:** [github.com/szymonrucinski/FTerViT](https://github.com/szymonrucinski/FTerViT)
-## Key Results
 All models use **W2A8** (2-bit weights, 8-bit activations) with 100% ternary coverage — including patch embedding, LayerNorm, and classifier head.
-### ImageNet-1K
 | Model | Phase | Epochs | Top-1 (%) | Binary (MB) | Compression | Checkpoint |
 |-------|-------|--------|-----------|-------------|-------------|------------|
@@ -32,14 +37,14 @@ All models use **W2A8** (2-bit weights, 8-bit activations) with 100% ternary cov
 | DeiT-Small | Phase 2 | +10 | **77.47** | 5.81 | 15.2x | [download](https://huggingface.co/szymonrucinski/FTerViT/resolve/main/imagenet1k/phase2_ep010_acc77.47_deit_small_224.pth) |
 | DeiT-III-Small | Phase 2 | +10 | **79.64** | 5.81 | 15.2x | [download](https://huggingface.co/szymonrucinski/FTerViT/resolve/main/imagenet1k/phase2_ep010_acc79.64_deit3_small_224.pth) |
-### CIFAR-10 / CIFAR-100
 | Model | Dataset | Top-1 (%) | FP32 Baseline | Binary (MB) | Checkpoint |
 |-------|---------|-----------|---------------|-------------|------------|
 | DeiT-Tiny | CIFAR-10 | **97.43** | 97.52 | 1.53 | [download](https://huggingface.co/szymonrucinski/FTerViT/resolve/main/cifar10/phase2_ep010_acc97.43_deit_tiny_224.pth) |
 | DeiT-Tiny | CIFAR-100 | **86.01** | 86.54 | 1.53 | [download](https://huggingface.co/szymonrucinski/FTerViT/resolve/main/cifar100/phase2_ep010_acc86.01_deit_tiny_224.pth) |
-## Training Protocol
 Training uses a two-phase knowledge distillation approach:
@@ -48,7 +53,7 @@ Training uses a two-phase knowledge distillation approach:
 See the paper for full details.
-## Self-Contained Inference Example
 The code below loads and evaluates an FTerViT checkpoint **without any external dependencies beyond `torch`, `timm`, and `huggingface_hub`**. All ternary layer definitions are included inline.
@@ -243,7 +248,7 @@ print(f"Top-1 accuracy: {correct / total:.4f} ({correct / total * 100:.2f}%)")
 print(f"Evaluated {total} samples")
 ```
-## Citation
 ```bibtex
 @inproceedings{rucinski2026ftervit,

 # FTerViT: Fully Ternary Vision Transformer
+[![arXiv](https://img.shields.io/badge/arXiv-XXXX.XXXXX-B31B1B?style=for-the-badge&logo=arxiv&logoColor=white)](https://arxiv.org/abs/XXXX.XXXXX)
+[![GitHub](https://img.shields.io/badge/GitHub-FTerViT-181717?style=for-the-badge&logo=github&logoColor=white)](https://github.com/szymonrucinski/FTerViT)
+[![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-FTerViT-FFD21E?style=for-the-badge)](https://huggingface.co/szymonrucinski/FTerViT)
+[![License](https://img.shields.io/badge/License-Apache%202.0-blue?style=for-the-badge)](https://opensource.org/licenses/Apache-2.0)
+[![NeurIPS](https://img.shields.io/badge/NeurIPS-2026-purple?style=for-the-badge)](https://neurips.cc/)
 Pretrained checkpoints for **FTerViT** — the first fully ternary Vision Transformer where *all* weight matrices and normalization parameters are constrained to {-1, 0, +1}.
+> **W2A8** · 2-bit weights · 8-bit activations · **100% ternary** · 15x compression · sub-6 MB models
+## 🏆 Key Results
 All models use **W2A8** (2-bit weights, 8-bit activations) with 100% ternary coverage — including patch embedding, LayerNorm, and classifier head.
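
To make "2-bit ternary weights" concrete, here is a minimal sketch of one common way to ternarize a weight vector and pack it at four values per byte. The README does not describe the quantizer itself, so the absmean scaling rule, the 2-bit code assignment, and the function names `ternarize`, `pack_2bit`, and `unpack_2bit` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple


def ternarize(weights: List[float], eps: float = 1e-8) -> Tuple[List[int], float]:
    """Map real weights to {-1, 0, +1} plus a single scale.

    Assumption: absmean scaling (divide by the mean absolute value,
    round, clamp). The paper may use a different rule.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


def pack_2bit(ternary: List[int]) -> bytes:
    """Pack four ternary values per byte (hypothetical codes: 0->0, +1->1, -1->2)."""
    codes = [t % 3 for t in ternary]      # in Python, -1 % 3 == 2
    codes += [0] * (-len(codes) % 4)      # pad to a multiple of four
    out = bytearray()
    for i in range(0, len(codes), 4):
        a, b, c, d = codes[i:i + 4]
        out.append(a | (b << 2) | (c << 4) | (d << 6))
    return bytes(out)


def unpack_2bit(packed: bytes, n: int) -> List[int]:
    """Inverse of pack_2bit; n is the number of values before padding."""
    decode = (0, 1, -1, 0)                # code 3 is unused
    values = [decode[(byte >> shift) & 0b11]
              for byte in packed
              for shift in (0, 2, 4, 6)]
    return values[:n]


weights = [0.9, -0.05, -1.2, 0.4, 0.0]
ternary, scale = ternarize(weights)
packed = pack_2bit(ternary)
restored = unpack_2bit(packed, len(ternary))
```

Under this scheme five weights occupy two bytes, and the dequantized weight is simply `scale * t` for each ternary value `t`.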
+### 📊 ImageNet-1K
 | Model | Phase | Epochs | Top-1 (%) | Binary (MB) | Compression | Checkpoint |
 |-------|-------|--------|-----------|-------------|-------------|------------|
 | DeiT-Small | Phase 2 | +10 | **77.47** | 5.81 | 15.2x | [download](https://huggingface.co/szymonrucinski/FTerViT/resolve/main/imagenet1k/phase2_ep010_acc77.47_deit_small_224.pth) |
 | DeiT-III-Small | Phase 2 | +10 | **79.64** | 5.81 | 15.2x | [download](https://huggingface.co/szymonrucinski/FTerViT/resolve/main/imagenet1k/phase2_ep010_acc79.64_deit3_small_224.pth) |
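
The "Binary (MB)" and "Compression" columns can be cross-checked with a little arithmetic. The sketch below assumes the compression ratio is measured against an FP32 (32 bits per weight) copy of the same architecture and that MB means 10^6 bytes; neither is stated in the table, and the helper names are hypothetical.

```python
def implied_fp32_size_mb(binary_mb: float, compression: float) -> float:
    """Size of the uncompressed baseline implied by the table row."""
    return binary_mb * compression


def effective_bits_per_weight(compression: float, baseline_bits: int = 32) -> float:
    """Average storage per weight, assuming an FP32 baseline."""
    return baseline_bits / compression


def implied_params_millions(binary_mb: float, compression: float,
                            bytes_per_param: int = 4) -> float:
    """Parameter count (in millions) implied by the baseline size."""
    return implied_fp32_size_mb(binary_mb, compression) / bytes_per_param


# Values from the DeiT-Small / DeiT-III-Small rows above.
fp32_mb = implied_fp32_size_mb(5.81, 15.2)
bits = effective_bits_per_weight(15.2)
params_m = implied_params_millions(5.81, 15.2)
print(f"FP32 size ~{fp32_mb:.1f} MB, ~{params_m:.1f}M params, ~{bits:.2f} bits/weight")
```

Under these assumptions the rows imply roughly 22M parameters, in line with the usual size of DeiT-Small, and slightly more than 2 bits per weight, which would be consistent with 2-bit packing plus a small overhead for scales and metadata.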
+### 📊 CIFAR-10 / CIFAR-100
 | Model | Dataset | Top-1 (%) | FP32 Baseline | Binary (MB) | Checkpoint |
 |-------|---------|-----------|---------------|-------------|------------|
 | DeiT-Tiny | CIFAR-10 | **97.43** | 97.52 | 1.53 | [download](https://huggingface.co/szymonrucinski/FTerViT/resolve/main/cifar10/phase2_ep010_acc97.43_deit_tiny_224.pth) |
 | DeiT-Tiny | CIFAR-100 | **86.01** | 86.54 | 1.53 | [download](https://huggingface.co/szymonrucinski/FTerViT/resolve/main/cifar100/phase2_ep010_acc86.01_deit_tiny_224.pth) |
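
The checkpoint filenames in the tables appear to encode the phase, epoch, top-1 accuracy, and model name. The sketch below parses that pattern and shows how a file could be fetched with `hf_hub_download`. The naming convention is inferred from the links above rather than documented, and `parse_checkpoint_name` and `download_checkpoint` are hypothetical helpers.

```python
import re
from typing import Dict, Union

REPO_ID = "szymonrucinski/FTerViT"

# Pattern inferred from names such as phase2_ep010_acc77.47_deit_small_224.pth
_NAME_RE = re.compile(r"phase(\d+)_ep(\d+)_acc(\d+\.\d+)_(.+)\.pth$")


def parse_checkpoint_name(path: str) -> Dict[str, Union[int, float, str]]:
    """Split a checkpoint path into phase, epoch, accuracy, and model name."""
    filename = path.rsplit("/", 1)[-1]
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"Unrecognized checkpoint name: {filename}")
    phase, epoch, acc, model = match.groups()
    return {"phase": int(phase), "epoch": int(epoch),
            "top1": float(acc), "model": model}


def download_checkpoint(path_in_repo: str) -> str:
    """Fetch a checkpoint into the local Hugging Face cache and return its path."""
    # Imported lazily so the parser works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=REPO_ID, filename=path_in_repo)


info = parse_checkpoint_name("cifar100/phase2_ep010_acc86.01_deit_tiny_224.pth")
```

Calling `download_checkpoint("cifar100/phase2_ep010_acc86.01_deit_tiny_224.pth")` should return a local file path that can then be loaded with `torch.load`.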
+## 🔧 Training Protocol
 Training uses a two-phase knowledge distillation approach:
 See the paper for full details.
+## 🚀 Self-Contained Inference Example
 The code below loads and evaluates an FTerViT checkpoint **without any external dependencies beyond `torch`, `timm`, and `huggingface_hub`**. All ternary layer definitions are included inline.
 print(f"Evaluated {total} samples")
 ```
+## 📝 Citation
 ```bibtex
 @inproceedings{rucinski2026ftervit,