---
license: apache-2.0
pipeline_tag: image-classification
library_name: transformers
tags:
- dinov2
- image-classification
- fonts
- lora
- vision-transformer
datasets:
- dchen0/font_crops_v5
base_model: facebook/dinov2-base-imagenet1k-1-layer
---

# Font Classifier

A DINOv2 Vision Transformer fine-tuned with LoRA for font classification across 394 font variants from 32 Google Fonts families.

## How it was made

1. **Base model**: [facebook/dinov2-base-imagenet1k-1-layer](https://huggingface.co/facebook/dinov2-base-imagenet1k-1-layer) (87.2M parameters, frozen).
2. **Fine-tuning**: [LoRA](https://arxiv.org/abs/2106.09685) (rank 8, alpha 16) applied to the query and value projections in each ViT attention block, plus a trainable classification head. About 900K parameters are trainable (roughly 1% of the total).
3. **Promotion**: This model was promoted from the `lora_r8/result_model` adapter in [dchen0/font-model-results](https://huggingface.co/dchen0/font-model-results) using `promote_model.py`. That script loads the base DINOv2 model, merges the LoRA adapter weights into it (`merge_and_unload()`), and uploads the result as a standalone checkpoint. No adapter or PEFT library is needed at inference time.
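
The merge in step 3 can be illustrated numerically. The sketch below is not `promote_model.py` and does not use PEFT. It is a pure-Python toy with hypothetical helper names and made-up 2x2 matrices. It shows the standard LoRA identity: folding the scaled low-rank update into the frozen weight, `W + (alpha / r) * B @ A`, gives the same output as running the base weight and the adapter separately.

```python
# Toy illustration of merging a LoRA update into a frozen weight matrix.
# All names and numbers are hypothetical; the real merge is done by PEFT.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_scaled(a, b, scale):
    """Return a + scale * b, element by element."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

rank, alpha = 1, 2        # the card uses rank 8, alpha 16: the same alpha / r ratio
scale = alpha / rank      # LoRA scaling factor

W = [[1.0, 2.0], [3.0, 4.0]]   # frozen base weight (out x in)
A = [[0.5, -1.0]]              # LoRA down-projection (rank x in)
B = [[2.0], [1.0]]             # LoRA up-projection (out x rank)
x = [[1.0], [2.0]]             # one input column vector

# Adapter kept separate: W x + scale * B (A x)
unmerged = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scale)

# Adapter merged into the weight: (W + scale * B A) x
W_merged = add_scaled(W, matmul(B, A), scale)
merged = matmul(W_merged, x)

print(unmerged, merged)  # both are [[-1.0], [8.0]]
```

Because the merged weight has the same shape as the original, the resulting checkpoint loads like any ordinary DINOv2 classifier.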

## Performance

- **99.0% top-1 accuracy** on 394 font classes (held-out test set)
- **99.8% family-level accuracy** (collapsing weight variants into parent families)
- Errors are overwhelmingly within-family weight confusions (e.g. Roboto-400 vs Roboto-500), not cross-family misidentifications
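
To make the difference between the two metrics concrete, here is a minimal sketch of how family-level accuracy can be derived from variant-level predictions. It assumes labels follow a `Family-Weight` pattern, as the `Roboto-400` example suggests. The helper names and the sample predictions are hypothetical and are not taken from the evaluation code.

```python
def family_of(label: str) -> str:
    """Strip the weight suffix: 'Roboto-400' -> 'Roboto' (assumed label format)."""
    return label.rsplit("-", 1)[0]

def accuracies(predicted, actual):
    """Return (top-1 accuracy, family-level accuracy) for paired label lists."""
    n = len(actual)
    top1 = sum(p == a for p, a in zip(predicted, actual)) / n
    family = sum(family_of(p) == family_of(a) for p, a in zip(predicted, actual)) / n
    return top1, family

# Made-up predictions: one within-family weight confusion, one cross-family error
actual = ["Roboto-400", "Roboto-500", "Inter-700", "Lora-400"]
predicted = ["Roboto-400", "Roboto-400", "Inter-700", "Poppins-400"]

top1, family = accuracies(predicted, actual)
print(top1, family)  # 0.5 0.75
```

The within-family confusion counts against top-1 accuracy but not against family-level accuracy, which is why the second figure is the higher of the two.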

| Method | Trainable Params | Top-1 Acc |
|---|---|---|
| **LoRA r=8 (this model)** | **900K** | **99.0%** |
| ResNet-50 | 25.6M | 98.8% |
| LoRA r=16 | 1.2M | 98.9% |
| LoRA r=4 | 753K | 97.9% |
| Full Fine-Tuning | 87.2M | 95.9% |
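
The trainable-parameter figures for the LoRA rows can be roughly reproduced with simple arithmetic. The sketch below assumes ViT-B dimensions (hidden size 768, 12 transformer blocks), LoRA on the query and value projections only, and a linear head with bias over the concatenated CLS and mean patch tokens (2 x 768 inputs). These are assumptions about the setup rather than values stated in this card, and the function names are illustrative.

```python
HIDDEN = 768           # ViT-B hidden size (assumption)
LAYERS = 12            # transformer blocks in ViT-B (assumption)
NUM_CLASSES = 394      # from the model card
TARGETS_PER_LAYER = 2  # query and value projections

def lora_params(rank: int) -> int:
    """Parameters added by LoRA: A (rank x HIDDEN) plus B (HIDDEN x rank) per projection."""
    per_projection = rank * (HIDDEN + HIDDEN)
    return per_projection * TARGETS_PER_LAYER * LAYERS

def head_params() -> int:
    """Linear head over 2 * HIDDEN features with bias (assumed head layout)."""
    return 2 * HIDDEN * NUM_CLASSES + NUM_CLASSES

for rank in (4, 8, 16):
    print(rank, lora_params(rank) + head_params())
# 4 753034
# 8 900490
# 16 1195402
```

Under these assumptions the totals line up with the 753K, 900K, and 1.2M entries in the table, and most of the trainable parameters sit in the classification head rather than in the adapters.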

## Training data

[dchen0/font_crops_v5](https://huggingface.co/datasets/dchen0/font_crops_v5): ~225K synthetic images generated by rendering random text in each font variant. ~575 training images and 40 test images per class. Images include color augmentation, layout variation (left/center/right alignment, multi-line), and Gaussian noise.

### Font families (32)

BigShouldersText, BricolageGrotesque, CrimsonPro, DMSans, Geist, HedvigLettersSerif, InstrumentSans, InstrumentSerif, Inter, JetBrainsMono, LexendDeca, Lora, Merriweather, Montserrat, Newsreader, NunitoSans, Onest, OpenSans, Petrona, PlayfairDisplay, PlusJakartaSans, Poppins, PT Serif Caption, RethinkSans, Roboto, RobotoSerif, ShipporiMincho, Sora, SpaceGrotesk, Ultra, Urbanist, WorkSans

## Training details

| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW |
| Learning rate | 1e-4 |
| Batch size | 64 |
| Epochs | 100 |
| LR scheduler | Linear decay |
| Precision | FP16 |
| LoRA rank | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.1 |
| LoRA targets | query, value |
| GPU | NVIDIA RTX 3090 (24 GB) |
| Training time | ~33 hours |
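
As a rough picture of what the learning-rate row implies, the sketch below computes a linear-decay schedule from the table's values. It assumes no warmup, a decay to zero at the final step, and about 394 x 575 training images per epoch. The actual training script may differ on all three points, and the function name is illustrative.

```python
import math

BASE_LR = 1e-4
BATCH_SIZE = 64
EPOCHS = 100
TRAIN_IMAGES = 394 * 575  # classes x approximate training images per class (assumption)

steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

def linear_decay_lr(step: int, total: int = total_steps, base_lr: float = BASE_LR) -> float:
    """Learning rate after `step` optimizer updates, decaying linearly to zero."""
    remaining = max(0.0, 1.0 - step / total)
    return base_lr * remaining

print(steps_per_epoch, total_steps)       # 3540 354000
print(linear_decay_lr(0))                 # full rate at the start
print(linear_decay_lr(total_steps // 2))  # half the base rate at the midpoint
print(linear_decay_lr(total_steps))       # 0.0 at the end
```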

## Preprocessing

Preprocessing is implemented in `handler.py`, and inference must apply the same steps in the same order:

1. Convert to RGB
2. Pad to square (black fill, centered)
3. Resize to 224x224
4. Normalize with ImageNet stats (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
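
The arithmetic behind steps 2 to 4 can be sketched without any imaging library. The code below is not the code in `handler.py`, and its helper names are hypothetical. It assumes pixel values are scaled to [0, 1] before normalization, which is the usual convention but is not stated in this card.

```python
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)
TARGET = 224

def square_padding(width: int, height: int):
    """Return (side, left, top): the square canvas side and the offset that centers the image."""
    side = max(width, height)
    return side, (side - width) // 2, (side - height) // 2

def normalize_pixel(rgb):
    """Scale 0-255 RGB to [0, 1], then apply (x - mean) / std per channel."""
    return tuple((c / 255 - m) / s for c, m, s in zip(rgb, MEAN, STD))

# A wide 300x100 text crop is pasted 100 px from the top of a 300x300 black canvas
side, left, top = square_padding(300, 100)
print(side, left, top)  # 300 0 100

# The canvas is then resized by this factor to reach 224x224
resize_factor = TARGET / side

# Black padding does not stay at zero after normalization
black = normalize_pixel((0, 0, 0))
print([round(v, 3) for v in black])  # [-2.118, -2.036, -1.804]
```

Because the padded regions end up at strongly negative values, changing the fill color or skipping the padding would alter the input distribution the model sees, which is one reason the steps need to match.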

## Usage

```python
from transformers import Dinov2ForImageClassification, AutoImageProcessor
from handler import get_inference_transform  # handler.py (see Preprocessing) must be on the import path
from PIL import Image
import torch

model = Dinov2ForImageClassification.from_pretrained("dchen0/font-classifier")
processor = AutoImageProcessor.from_pretrained("dchen0/font-classifier")
model.eval()

# Build the preprocessing pipeline described in the Preprocessing section
transform = get_inference_transform(processor, processor.size["shortest_edge"])
image = Image.open("font_sample.png").convert("RGB")
pixel_values = transform(image).unsqueeze(0)  # add a batch dimension

with torch.no_grad():
    logits = model(pixel_values=pixel_values).logits

predicted_class = logits.argmax(-1).item()
print(model.config.id2label[predicted_class])
```
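
Since most errors are confusions between neighboring weights of the same family, it can help to look at several top candidates rather than only the argmax. The sketch below is a hypothetical pure-Python helper that turns one row of logits into ranked label and probability pairs. The sample logits and label map are made up for illustration.

```python
import math

def top_k(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs from one row of logits."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical three-class example
sample_labels = {0: "Inter-400", 1: "Roboto-400", 2: "Roboto-500"}
candidates = top_k([0.1, 3.2, 2.9], sample_labels, k=2)
for label, prob in candidates:
    print(f"{label}: {prob:.3f}")
```

With the real model, the same helper could be called as `top_k(logits[0].tolist(), model.config.id2label)`.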

## Source

- Training code: [github.com/Create-Inc/font-model](https://github.com/Create-Inc/font-model)
- Results repo (checkpoints, logs): [dchen0/font-model-results](https://huggingface.co/dchen0/font-model-results)
- Dataset: [dchen0/font_crops_v5](https://huggingface.co/datasets/dchen0/font_crops_v5)