	---
	license: apache-2.0
	pipeline_tag: image-classification
	library_name: transformers
	tags:
	- dinov2
	- image-classification
	- fonts
	- lora
	- vision-transformer
	datasets:
	- dchen0/font_crops_v5
	base_model: facebook/dinov2-base-imagenet1k-1-layer
	---

	# Font Classifier

	A DINOv2 Vision Transformer fine-tuned with LoRA for font classification across 394 font variants from 32 Google Fonts families.

	## How it was made

	1. Base model: [facebook/dinov2-base-imagenet1k-1-layer](https://huggingface.co/facebook/dinov2-base-imagenet1k-1-layer) (87.2M parameters, frozen).
	2. Fine-tuning: [LoRA](https://arxiv.org/abs/2106.09685) (rank 8, alpha 16) applied to the query and value projections in each ViT attention block, plus a trainable classification head. This gives ~900K trainable parameters (about 1% of the total).
	3. Promotion: This model was promoted from the `lora_r8/result_model` adapter in [dchen0/font-model-results](https://huggingface.co/dchen0/font-model-results) using `promote_model.py`. That script loads the base DINOv2 model, merges the LoRA adapter weights into it (`merge_and_unload()`), and uploads the result as a standalone checkpoint, so no adapter or PEFT library is needed at inference time.
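
The merge in step 3 can be illustrated numerically. The sketch below is not `promote_model.py`; it uses tiny made-up matrices and hypothetical helper functions to show the standard LoRA identity `W_merged = W + (alpha / r) * B @ A`, which is why a merged checkpoint can reproduce the adapter's outputs without the adapter.

```python
# Toy illustration of a LoRA merge: W_merged = W + (alpha / r) * B @ A.
# All matrices are made-up values; the real projections are far larger.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b, scale=1.0):
    """Return a + scale * b, element-wise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

r, alpha = 1, 2               # toy rank and alpha (the model card uses r=8, alpha=16)
scale = alpha / r             # LoRA scaling factor

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight, shape (out=2, in=2)
A = [[0.5, -0.5]]             # LoRA down-projection, shape (r, in)
B = [[1.0], [2.0]]            # LoRA up-projection, shape (out, r)
x = [[3.0, 1.0]]              # one input row, shape (1, in)

# Adapter form: base output plus the scaled low-rank path.
base_out = matmul(x, transpose(W))
lora_out = matmul(matmul(x, transpose(A)), transpose(B))
adapter_form = add(base_out, lora_out, scale)

# Merged form: fold the update into the weight once, then use a plain layer.
W_merged = add(W, matmul(B, A), scale)
merged_form = matmul(x, transpose(W_merged))

print(adapter_form, merged_form)
```

Both forms give the same output, so the low-rank matrices can be discarded after merging.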

	## Performance

	- 99.0% top-1 accuracy on 394 font classes (held-out test set)
	- 99.8% family-level accuracy (collapsing weight variants into parent families)
	- Errors are overwhelmingly within-family weight confusions (e.g. Roboto-400 vs Roboto-500), not cross-family misidentifications
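
Family-level accuracy can be computed by mapping each variant label to its parent family before comparing. The sketch below assumes labels follow the `<Family>-<weight>` pattern of the `Roboto-400` example above; the actual label format in `id2label` may differ, and the function names and sample predictions are hypothetical.

```python
def family_of(label: str) -> str:
    """Strip a trailing '-<weight>' suffix, e.g. 'Roboto-400' -> 'Roboto'.

    Assumes labels look like '<Family>-<variant>'; labels without a hyphen
    are returned unchanged.
    """
    return label.rsplit("-", 1)[0]


def accuracies(y_true: list[str], y_pred: list[str]) -> tuple[float, float]:
    """Return (variant-level top-1 accuracy, family-level accuracy)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    n = len(y_true)
    exact = sum(t == p for t, p in zip(y_true, y_pred))
    family = sum(family_of(t) == family_of(p) for t, p in zip(y_true, y_pred))
    return exact / n, family / n


# Hypothetical predictions: one within-family miss, one cross-family miss.
y_true = ["Roboto-400", "Roboto-500", "Inter-700", "Lora-400"]
y_pred = ["Roboto-400", "Roboto-400", "Inter-700", "Merriweather-400"]
variant_acc, family_acc = accuracies(y_true, y_pred)
print(variant_acc, family_acc)
```

A within-family miss lowers only the variant-level score, while a cross-family miss lowers both.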

	| Method | Trainable Params | Top-1 Acc |
	|---|---|---|
	| LoRA r=8 (this model) | 900K | 99.0% |
	| ResNet-50 | 25.6M | 98.8% |
	| LoRA r=16 | 1.2M | 98.9% |
	| LoRA r=4 | 753K | 97.9% |
	| Full Fine-Tuning | 87.2M | 95.9% |
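
The LoRA rows of the trainable-parameter column can be reproduced with a short calculation. The sketch below rests on assumptions not stated in this card: a hidden size of 768 and 12 transformer blocks for DINOv2-base, and a single linear head over the concatenated CLS token and mean patch token (2 × 768 inputs), as in `Dinov2ForImageClassification`.

```python
HIDDEN = 768           # assumed DINOv2-base hidden size
LAYERS = 12            # assumed number of transformer blocks
NUM_CLASSES = 394      # from the model card
TARGETS_PER_LAYER = 2  # query and value projections


def lora_params(rank: int) -> int:
    """Parameters in the LoRA A (rank x hidden) and B (hidden x rank) matrices."""
    per_projection = rank * HIDDEN + HIDDEN * rank
    return per_projection * TARGETS_PER_LAYER * LAYERS


def head_params() -> int:
    """Linear head over [CLS token, mean patch token]: weights plus biases."""
    return (2 * HIDDEN) * NUM_CLASSES + NUM_CLASSES


for rank in (4, 8, 16):
    total = lora_params(rank) + head_params()
    print(f"r={rank}: {total:,} trainable parameters")
```

Under these assumptions the totals are 753,034, 900,490, and 1,195,402, in line with the 753K, 900K, and 1.2M figures in the table. The classification head accounts for most of the trainable parameters at every rank.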

	## Training data

	[dchen0/font_crops_v5](https://huggingface.co/datasets/dchen0/font_crops_v5): ~225K synthetic images generated by rendering random text in each font variant, with ~575 training images and 40 test images per class. Images include color augmentation, layout variation (left/center/right alignment, multi-line), and Gaussian noise.

	### Font families (32)

	BigShouldersText, BricolageGrotesque, CrimsonPro, DMSans, Geist, HedvigLettersSerif, InstrumentSans, InstrumentSerif, Inter, JetBrainsMono, LexendDeca, Lora, Merriweather, Montserrat, Newsreader, NunitoSans, Onest, OpenSans, Petrona, PlayfairDisplay, PlusJakartaSans, Poppins, PT Serif Caption, RethinkSans, Roboto, RobotoSerif, ShipporiMincho, Sora, SpaceGrotesk, Ultra, Urbanist, WorkSans

	## Training details

	| Hyperparameter | Value |
	|---|---|
	| Optimizer | AdamW |
	| Learning rate | 1e-4 |
	| Batch size | 64 |
	| Epochs | 100 |
	| LR scheduler | Linear decay |
	| Precision | FP16 |
	| LoRA rank | 8 |
	| LoRA alpha | 16 |
	| LoRA dropout | 0.1 |
	| LoRA targets | query, value |
	| GPU | NVIDIA RTX 3090 (24 GB) |
	| Training time | ~33 hours |

	## Preprocessing

	Preprocessing is implemented in `handler.py`. Inputs at inference time must go through the same steps:

	1. Convert to RGB
	2. Pad to square (black fill, centered)
	3. Resize to 224x224
	4. Normalize with ImageNet stats (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

	## Usage

	```python
	import torch
	from PIL import Image
	from transformers import AutoImageProcessor, Dinov2ForImageClassification

	# handler.py (see Preprocessing) must be importable, e.g. placed next to this script.
	from handler import get_inference_transform

	model = Dinov2ForImageClassification.from_pretrained("dchen0/font-classifier")
	processor = AutoImageProcessor.from_pretrained("dchen0/font-classifier")
	model.eval()  # disable dropout for inference

	# Build the preprocessing pipeline described under Preprocessing.
	transform = get_inference_transform(processor, processor.size["shortest_edge"])
	image = Image.open("font_sample.png").convert("RGB")
	pixel_values = transform(image).unsqueeze(0)  # add a batch dimension

	with torch.no_grad():
	    logits = model(pixel_values=pixel_values).logits

	predicted_class = logits.argmax(-1).item()
	print(model.config.id2label[predicted_class])
	```
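
Because most errors are confusions between neighboring weights of the same family, it can help to look at the top few predictions instead of only the argmax. The pure-Python sketch below uses made-up logits and a made-up label map as stand-ins for `logits[0].tolist()` and `model.config.id2label`; `top_k` is a hypothetical helper, not part of this repository.

```python
import math


def top_k(logits: list[float], id2label: dict[int, str], k: int = 3) -> list[tuple[str, float]]:
    """Return the k most likely labels with their softmax probabilities."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Made-up values standing in for model.config.id2label and logits[0].tolist().
id2label = {0: "Roboto-400", 1: "Roboto-500", 2: "Inter-400"}
logits = [4.0, 3.0, -1.0]
for label, prob in top_k(logits, id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

A small gap between the first and second probabilities within one family suggests the weight is uncertain even when the family is not.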

	## Source

	- Training code: [github.com/Create-Inc/font-model](https://github.com/Create-Inc/font-model)
	- Results repo (checkpoints, logs): [dchen0/font-model-results](https://huggingface.co/dchen0/font-model-results)
	- Dataset: [dchen0/font_crops_v5](https://huggingface.co/datasets/dchen0/font_crops_v5)