	---
	library_name: onnxruntime
	pipeline_tag: image-classification
	tags:
	- onnx
	- image-classification
	- medieval-manuscripts
	- illumination-detection
	- mobilenet
	- mobilevit
	- glam
	- iiif
	- cultural-heritage
	- digital-humanities
	- medieval-folio
	- medieval
	- medieval-illuminations
	- MobileNet
	- MobileVit
	license: apache-2.0
	datasets:
	- ENC-PSL/medieval-folio-illumination-bin-dataset
	base_model:
	- timm/mobilenetv3_small_100.lamb_in1k
	- timm/mobilenetv3_large_100.ra_in1k
	- timm/mobilenetv2_100.ra_in1k
	- apple/mobilevitv2-1.0-imagenet1k-256
	---

	# BSICLE — Binary System for Illuminated Folio Classification with Lightweight Engines

	BSICLE (pronounced bé-si-cle; /be.zikl/) is a family of lightweight binary models for classifying illuminated folios in medieval manuscripts. These models are developed at the [École nationale des chartes – PSL](https://www.chartes.psl.eu/) in the context of the [O.D.I.L. project](https://projet.biblissima.fr/fr/appels-projets/projets-retenus/odil-objet-detection-illuminations).

	These models classify manuscript pages as:

	- illuminated (miniatures, historiated initials, decorated pages, etc.)
	- non-illuminated (plain text folios, printer marks, tables, covers, blank folios, etc.)

	# Use cases

	The models are optimized to run locally (on CPU) or in the browser. They rely on edge-compatible architectures (MobileNet, MobileViT) and ONNX inference, for example to build IIIF filtering pipelines or to assemble specialized corpora.

	Try the demo web application on [Hugging Face Spaces](https://huggingface.co/spaces/ENC-PSL/Medieval-Illumination-Detector).
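
As an illustration of the IIIF filtering use case, the sketch below walks a IIIF Presentation 2.x manifest, builds an Image API URL for each canvas at a small width, and keeps the canvases that a classifier flags. The inline manifest, the `example.org` URLs, the default threshold of 0.5, and the `classify` callable are hypothetical stand-ins. In practice `classify` would download the image and run one of the inference snippets from the Usage section.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def iter_canvas_images(manifest: Dict, width: int = 256) -> Iterator[Tuple[str, str]]:
    """Yield (canvas_id, image_url) pairs from a IIIF Presentation 2.x manifest."""
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                service = image.get("resource", {}).get("service", {})
                if isinstance(service, list):  # some manifests list several services
                    service = service[0] if service else {}
                base = service.get("@id")
                if not base:
                    continue  # no Image API service declared for this image
                # IIIF Image API: {base}/{region}/{size}/{rotation}/{quality}.{format}
                url = f"{base.rstrip('/')}/full/{width},/0/default.jpg"
                yield canvas["@id"], url
                break  # one image per canvas is enough for filtering


def filter_illuminated(
    manifest: Dict,
    classify: Callable[[str], float],
    threshold: float = 0.5,  # assumption; the real value lives in inference_config.json
) -> List[str]:
    """Return ids of canvases whose illuminated probability reaches the threshold."""
    return [cid for cid, url in iter_canvas_images(manifest) if classify(url) >= threshold]


# Hypothetical two-canvas manifest, reduced to the fields used above.
demo_manifest = {
    "sequences": [{
        "canvases": [
            {
                "@id": "https://example.org/iiif/ms1/canvas/f001",
                "images": [{"resource": {"service": {"@id": "https://example.org/iiif/img/ms1_f001"}}}],
            },
            {
                "@id": "https://example.org/iiif/ms1/canvas/f002",
                "images": [{"resource": {"service": {"@id": "https://example.org/iiif/img/ms1_f002/"}}}],
            },
        ]
    }]
}


def fake_classify(url: str) -> float:
    """Stand-in for real inference: pretends only folio 2 is illuminated."""
    return 0.93 if "ms1_f002" in url else 0.04


kept = filter_illuminated(demo_manifest, fake_classify)
print(kept)
```

Requesting a small width from the image server keeps bandwidth low, since the models resize their input anyway.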

	# Models & Results

	The fine-tuned models available in this repository are based on the following architectures:

	- [MobileNetV2](https://huggingface.co/timm/mobilenetv2_100.ra_in1k)
	- [MobileNetV3](https://huggingface.co/timm/mobilenetv3_small_100.lamb_in1k) (small and large versions)
	- [MobileViT v2](https://huggingface.co/apple/mobilevitv2-1.0-imagenet1k-256)

	| Architecture | Validation Accuracy | Test Accuracy |
	|--------------|--------------------:|--------------:|
	| MobileNetV2 | 0.995 | 0.982 |
	| MobileNetV3 Small | 0.991 | 0.968 |
	| MobileNetV3 Large | 1.0 | 0.986 |
	| MobileViT v2 | 0.995 | 0.977 |

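	> These results should be interpreted with care. Although the models reach very high scores on the current splits, the dev and test sets are small (223 images each), and the scores may depend partly on this dataset rather than transfer fully to other collections.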
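
To put numbers on this caveat: the test split listed under "Distribution of data" holds 223 images (111 illuminated and 112 non-illuminated), so each reported test accuracy corresponds to only a handful of errors. The sketch below recovers the approximate error counts and computes a 95% Wilson score interval per model. The Wilson interval is chosen here purely for illustration and is not something the authors report.

```python
import math

N_TEST = 111 + 112  # test images per class, from the data distribution

TEST_ACCURACY = {
    "MobileNetV2": 0.982,
    "MobileNetV3 Small": 0.968,
    "MobileNetV3 Large": 0.986,
    "MobileViT v2": 0.977,
}


def wilson_interval(correct: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


for name, acc in TEST_ACCURACY.items():
    correct = round(acc * N_TEST)  # nearest whole number of correctly classified images
    low, high = wilson_interval(correct, N_TEST)
    print(f"{name}: {N_TEST - correct} errors, 95% CI [{low:.3f}, {high:.3f}]")
```

Under these assumptions the four intervals overlap, so the ranking between architectures on this split is only indicative.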

	# Labels

	| Label ID | Label |
	|---------|------|
	| 0 | non_illuminated |
	| 1 | illuminated |

	# Dataset

	Training data comes from: [ENC-PSL/odil-medieval-folio-illumination-bin-dataset](https://huggingface.co/datasets/ENC-PSL/odil-medieval-folio-illumination-bin-dataset)

	## Distribution of data

	- illuminated
	  - train: 519
	  - dev: 112
	  - test: 111
	- non_illuminated
	  - train: 519
	  - dev: 111
	  - test: 112

	> Data augmentation. During training, data augmentation was applied to the training split only in order to improve robustness and reduce overfitting.
	> The augmentation pipeline included random horizontal flips, small random rotations up to 5°, and light color jittering with brightness `0.12`, contrast `0.12`, saturation `0.08`, and hue `0.02`.
	> Validation and test images were evaluated without augmentation.
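
The settings above can be read as sampling ranges. The sketch below draws one set of per-image augmentation parameters, assuming torchvision-style conventions (rotation uniform in [-5°, 5°]; brightness, contrast and saturation factors uniform in [1 - v, 1 + v]; hue shift uniform in [-v, v]) and a flip probability of 0.5. Both the conventions and the flip probability are assumptions: the text does not state which library or probability was used.

```python
import random
from dataclasses import dataclass


@dataclass
class AugmentParams:
    flip: bool          # mirror the page horizontally
    angle_deg: float    # rotation angle in degrees
    brightness: float   # multiplicative factor around 1.0
    contrast: float     # multiplicative factor around 1.0
    saturation: float   # multiplicative factor around 1.0
    hue_shift: float    # additive shift as a fraction of the hue circle


def sample_augment_params(rng: random.Random, flip_p: float = 0.5) -> AugmentParams:
    """Draw one set of augmentation parameters for a training image."""

    def factor(v: float) -> float:
        return rng.uniform(1 - v, 1 + v)

    return AugmentParams(
        flip=rng.random() < flip_p,          # flip_p is an assumption
        angle_deg=rng.uniform(-5.0, 5.0),    # "small random rotations up to 5°"
        brightness=factor(0.12),
        contrast=factor(0.12),
        saturation=factor(0.08),
        hue_shift=rng.uniform(-0.02, 0.02),
    )


rng = random.Random(0)  # fixed seed so the draw is reproducible
params = sample_augment_params(rng)
print(params)
```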

	# What counts as "illuminated"?

	## Positive (illuminated)

	Examples include:

	- miniatures
	- historiated initials
	- decorative initials
	- scientific diagrams
	- maps
	- decorated manuscript pages


	## Negative (non-illuminated)

	Examples include:

	- plain text folios
	- marginal decorations without images
	- printer marks
	- tables
	- covers
	- blank folios
	- rubricated text without illumination

	## Examples

	| Illuminated | Not Illuminated |
	|:-----------:|:---------------:|
	| <img src="assets/illuminated/1.png" width="150" height="150" style="object-fit:cover;"> | <img src="assets/not_illuminated/1.jpeg" width="150" height="150" style="object-fit:cover;"> |
	| <img src="assets/illuminated/2.jpg" width="150" height="150" style="object-fit:cover;"> | <img src="assets/not_illuminated/2.jpg" width="150" height="150" style="object-fit:cover;"> |
	| <img src="assets/illuminated/3.png" width="150" height="150" style="object-fit:cover;"> | <img src="assets/not_illuminated/3.jpg" width="150" height="150" style="object-fit:cover;"> |
	| <img src="assets/illuminated/4.jpg" width="150" height="150" style="object-fit:cover;"> | <img src="assets/not_illuminated/4.jpg" width="150" height="150" style="object-fit:cover;"> |
	| <img src="assets/illuminated/5.jpg" width="150" height="150" style="object-fit:cover;"> | <img src="assets/not_illuminated/5.png" width="150" height="150" style="object-fit:cover;"> |
	| <img src="assets/illuminated/6.jpg" width="150" height="150" style="object-fit:cover;"> | <img src="assets/not_illuminated/6.jpg" width="150" height="150" style="object-fit:cover;"> |

	# Usage

	## Python — ONNX local

	```bash
	pip install onnxruntime pillow numpy
	```

	```python
	import json
	import numpy as np
	import onnxruntime as ort
	from PIL import Image
	from pathlib import Path

	# Folder holding one exported model (ONNX file plus its two JSON configs)
	run = Path("./mobilenet_v3_large")

	cfg = json.loads((run / "inference_config.json").read_text())
	pre = json.loads((run / "preprocess.json").read_text())

	# Load the page, resize it to the model's square input, and scale to [0, 1]
	img = Image.open("page.jpg").convert("RGB").resize((pre["img_size"], pre["img_size"]))
	x = np.asarray(img).astype("float32") / 255.0
	# Normalize each channel, then reorder HWC -> NCHW with a batch dimension
	x = (x - np.array(pre["mean"])) / np.array(pre["std"])
	x = x.transpose(2, 0, 1)[None].astype("float32")

	sess = ort.InferenceSession(str(run / "onnx/model.onnx"))
	logits = sess.run(None, {cfg["input_name"]: x})[0][0]

	# Numerically stable softmax over the two class logits
	probs = np.exp(logits - logits.max())
	probs = probs / probs.sum()

	# Apply the decision threshold to the positive (illuminated) class
	p_illu = float(probs[cfg["positive_index"]])
	label = cfg["positive_label"] if p_illu >= cfg["threshold"] else "non_illumination"

	print(label, p_illu)
	```
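
The post-processing at the end of this snippet (softmax over the two logits, then a threshold on the positive class) can be isolated as a small pure-Python helper, which is convenient for experimenting with stricter or looser thresholds. The defaults below (`positive_index=1` following the Labels table, `threshold=0.5`), the label names, and the example logits are illustrative assumptions. The real values come from `inference_config.json`.

```python
import math
from typing import Sequence, Tuple

LABELS = ("non_illuminated", "illuminated")  # ids 0 and 1, as in the Labels table


def decide(
    logits: Sequence[float],
    positive_index: int = 1,   # assumption; read cfg["positive_index"] in practice
    threshold: float = 0.5,    # assumption; read cfg["threshold"] in practice
) -> Tuple[str, float]:
    """Return (label, positive-class probability) for a pair of logits."""
    m = max(logits)                              # subtract the max for stability
    exps = [math.exp(v - m) for v in logits]
    p_pos = exps[positive_index] / sum(exps)
    if p_pos >= threshold:
        return LABELS[positive_index], p_pos
    return LABELS[1 - positive_index], p_pos


# Made-up logits: a confident positive, then a borderline page with a strict threshold.
print(decide([-1.2, 2.3]))
print(decide([0.4, 0.9], threshold=0.8))
```

Raising the threshold trades recall for precision, which may suit corpus building where false positives are costly to review.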

	## Python — ONNX from Hugging Face

	```bash
	pip install huggingface_hub onnxruntime pillow numpy
	```

	```python
	from huggingface_hub import snapshot_download
	from pathlib import Path

	repo = "lterriel/medieval-illumination-bin-classifier"
	run_name = "final_mobilenetv3_large"
	local_dir = Path(snapshot_download(
	    repo_id=repo,
	    allow_patterns=[
	        f"{run_name}/onnx/model.onnx",
	        f"{run_name}/preprocess.json",
	        f"{run_name}/inference_config.json",
	    ],
	)) / run_name
	```

	Then use the same ONNX code as above, replacing:

	```python
	run = Path("./mobilenet_v3_large")
	```

	with:

	```python
	run = local_dir
	```

	## Python — PyTorch / non-ONNX local

	```bash
	pip install torch torchvision pillow numpy
	```

	```python
	import json
	import torch
	import numpy as np
	from PIL import Image
	from pathlib import Path
	from torchvision import models

	run = Path("./mobilenet_v3_large")

	cfg = json.loads((run / "inference_config.json").read_text())
	pre = json.loads((run / "preprocess.json").read_text())

	model = models.mobilenet_v3_large(weights=None)
	model.classifier[-1] = torch.nn.Linear(model.classifier[-1].in_features, 2)
	model.load_state_dict(torch.load(run / "checkpoints/best.pt", map_location="cpu"))
	model.eval()

	img = Image.open("page.jpg").convert("RGB").resize((pre["img_size"], pre["img_size"]))
	x = np.asarray(img).astype("float32") / 255.0
	x = (x - np.array(pre["mean"])) / np.array(pre["std"])
	x = torch.tensor(x.transpose(2, 0, 1)[None]).float()

	with torch.no_grad():
	    logits = model(x)
	    probs = torch.softmax(logits, dim=1)[0]

	p_illu = float(probs[cfg["positive_index"]])
	label = cfg["positive_label"] if p_illu >= cfg["threshold"] else "non_illumination"

	print(label, p_illu)
	```

	For another torchvision architecture, replace the model constructor:

	- MobileNetV2

	```python
	model = models.mobilenet_v2(weights=None)
	model.classifier[-1] = torch.nn.Linear(model.classifier[-1].in_features, 2)
	```

	- MobileNetV3 (small)

	```python
	model = models.mobilenet_v3_small(weights=None)
	model.classifier[-1] = torch.nn.Linear(model.classifier[-1].in_features, 2)
	```

	## Python — PyTorch / non-ONNX from Hugging Face

	```bash
	pip install huggingface_hub torch torchvision pillow numpy
	```

	```python
	from huggingface_hub import snapshot_download
	from pathlib import Path

	repo = "lterriel/medieval-illumination-bin-classifier"
	run_name = "final_mobilenetv3_large"

	run = Path(snapshot_download(
	    repo_id=repo,
	    allow_patterns=[
	        f"{run_name}/checkpoints/best.pt",
	        f"{run_name}/preprocess.json",
	        f"{run_name}/inference_config.json",
	    ],
	)) / run_name
	```

	Then use the same PyTorch code as above.

	## JS (HF - ONNX)

	```html
	<script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.min.js"></script>
	<input type="file" id="file" accept="image/*">
	<pre id="out"></pre>

	<script type="module">
	  const run = "https://huggingface.co/lterriel/medieval-illumination-bin-classifier/resolve/main/final_mobilenetv3_large";

	  const cfg = await fetch(`${run}/inference_config.json`).then(r => r.json());
	  const pre = await fetch(`${run}/preprocess.json`).then(r => r.json());
	  const sess = await ort.InferenceSession.create(`${run}/onnx/model.onnx`);

	  function softmax(a) {
	    const m = Math.max(...a);
	    const e = a.map(x => Math.exp(x - m));
	    const s = e.reduce((x, y) => x + y, 0);
	    return e.map(x => x / s);
	  }

	  async function imageToTensor(file) {
	    const img = new Image();
	    img.src = URL.createObjectURL(file);
	    await img.decode();

	    const size = pre.img_size;
	    const canvas = document.createElement("canvas");
	    canvas.width = size;
	    canvas.height = size;

	    const ctx = canvas.getContext("2d");
	    ctx.drawImage(img, 0, 0, size, size);

	    const data = ctx.getImageData(0, 0, size, size).data;
	    const x = new Float32Array(1 * 3 * size * size);

	    for (let i = 0, p = 0; i < data.length; i += 4, p++) {
	      x[p] = (data[i] / 255 - pre.mean[0]) / pre.std[0];
	      x[size * size + p] = (data[i + 1] / 255 - pre.mean[1]) / pre.std[1];
	      x[2 * size * size + p] = (data[i + 2] / 255 - pre.mean[2]) / pre.std[2];
	    }

	    return new ort.Tensor("float32", x, [1, 3, size, size]);
	  }

	  document.querySelector("#file").onchange = async (e) => {
	    const tensor = await imageToTensor(e.target.files[0]);
	    const res = await sess.run({ [cfg.input_name]: tensor });

	    const logits = Array.from(res[cfg.output_name].data);
	    const probs = softmax(logits);

	    const pIllu = probs[cfg.positive_index];
	    const label = pIllu >= cfg.threshold ? cfg.positive_label : "non_illumination";

	    document.querySelector("#out").textContent = JSON.stringify({
	      label,
	      p_illumination: pIllu,
	      probs
	    }, null, 2);
	  };
	</script>
	```
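
The pixel loop in `imageToTensor` turns the canvas's interleaved RGBA bytes into a planar, channel-first float array (all reds, then all greens, then all blues), dropping alpha and normalizing each channel. The same index arithmetic in pure Python may help when checking the browser output against the Python snippets. The `mean` and `std` defaults are the usual ImageNet statistics and are an assumption here: the actual values come from `preprocess.json`.

```python
from typing import List, Sequence


def rgba_to_chw(
    data: Sequence[int],
    size: int,
    mean: Sequence[float] = (0.485, 0.456, 0.406),  # assumption: ImageNet statistics
    std: Sequence[float] = (0.229, 0.224, 0.225),   # assumption: ImageNet statistics
) -> List[float]:
    """Convert interleaved RGBA bytes of a size x size image to normalized planar CHW."""
    plane = size * size
    if len(data) != 4 * plane:
        raise ValueError("expected 4 bytes (RGBA) per pixel")
    out = [0.0] * (3 * plane)
    for p in range(plane):
        i = 4 * p  # start of pixel p in the RGBA buffer
        for c in range(3):  # R, G, B; the alpha byte at i + 3 is ignored
            out[c * plane + p] = (data[i + c] / 255 - mean[c]) / std[c]
    return out


# 2x2 test image: red, green, blue, white (opaque), with identity normalization
pixels = [255, 0, 0, 255,  0, 255, 0, 255,  0, 0, 255, 255,  255, 255, 255, 255]
x = rgba_to_chw(pixels, 2, mean=(0, 0, 0), std=(1, 1, 1))
print(x)
# [1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0]
```

The flat list corresponds to a tensor of shape `[1, 3, size, size]`, matching the shape passed to `ort.Tensor` in the browser example.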

	# Training tools

	All models are fine-tuned with img-clf-framework, a training framework for binary image classification pipelines. Check the [training repository here]().

	# Citation

	If you use these models in your research, please cite:

	```bibtex
	@software{terriel_bsicle_2026,
	  AUTHOR = {Terriel, Lucas and Jolivet, Vincent},
	  TITLE = {{BSICLE}: Binary System for Illuminated Folio Classification with Lightweight Engines},
	  YEAR = {2026},
	  PUBLISHER = {Hugging Face},
	  INSTITUTION = {{École nationale des chartes -- PSL}},
	  URL = {https://huggingface.co/ENC-PSL/medieval-illumination-bin-classifier},
	  NOTE = {Family of lightweight binary image classification models for detecting illuminated folios in medieval manuscripts, developed in the context of the O.D.I.L. project},
	  LICENSE = {apache-2.0},
	  VERSION = {0.0.1}
	}
	```

	# Funding

	<div style="display: flex; align-items: center; justify-content: center; gap: 20px; max-width: 800px; margin: 0 auto;">
	<img src="./assets/odil-logo.png" width="180" alt="Logo ODIL" style="flex: 0 0 auto;">

	<p style="text-align: justify; margin: 0;">
	These models were developed at
	<a href="https://www.chartes.psl.eu/" target="_blank" rel="noopener">
	École nationale des chartes – PSL
	</a>
	in the context of the
	<a href="https://projet.biblissima.fr/fr/appels-projets/projets-retenus/odil-objet-detection-illuminations" target="_blank" rel="noopener">
	O.D.I.L. project
	</a>.
	</p>
	</div>