Ligul/capri

@@ -18,7 +18,7 @@ pipeline_tag: image-to-text
 Capri is a compact image captioning model designed for high-throughput, plain-language descriptions.
 It supports two inference paths: direct image input or precomputed SigLIP2 pooled embeddings.
-The project started from a practical pipeline constraint: existing captioning models were either too slow or too weak for reliable image understanding. Since SigLIP embeddings were already computed upstream, Capri was built to reuse them with a small LLM decoder, combining strong visual representations with fast text generation.
 The name comes from the small Italian island of Capri and also hints at the goal of the project: a small CAPtioner with Rapid Inference.
@@ -108,5 +108,5 @@ Trained on captions from the [COCO 2017](https://cocodataset.org/) dataset.
 > Lin, T.-Y., et al. "Microsoft COCO: Common Objects in Context." ECCV 2014. [arXiv:1405.0312](https://arxiv.org/abs/1405.0312)
 Built on top of:
-- [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) — Apache 2.0
-- [google/siglip2-base-patch16-224](https://huggingface.co/google/siglip2-base-patch16-224) — Apache 2.0

 Capri is a compact image captioning model designed for high-throughput, plain-language descriptions.
 It supports two inference paths: direct image input or precomputed SigLIP2 pooled embeddings.
+The project started from a practical pipeline constraint: existing captioning models were either too slow or too weak for reliable image understanding. That constraint sparked the idea for Capri: since SigLIP embeddings were already computed upstream, why not pair them with a small LLM decoder and get both strong visual representations and fast text generation?
 The name comes from the small Italian island of Capri and also hints at the goal of the project: a small CAPtioner with Rapid Inference.
 > Lin, T.-Y., et al. "Microsoft COCO: Common Objects in Context." ECCV 2014. [arXiv:1405.0312](https://arxiv.org/abs/1405.0312)
 Built on top of:
+- [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) - Apache 2.0
+- [google/siglip2-base-patch16-224](https://huggingface.co/google/siglip2-base-patch16-224) - Apache 2.0
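
The two inference paths described in the card (direct image input, or a precomputed SigLIP2 pooled embedding) can be pictured as a small routing step in front of the decoder. The sketch below is illustrative only and is not Capri's actual API: `resolve_embedding`, `encode_image`, and the `expected_dim` default of 768 (assumed here to match the pooled output size of the base SigLIP2 checkpoint) are hypothetical names and values chosen for the example.

```python
from typing import Callable, List, Optional, Sequence

Embedding = Sequence[float]


def resolve_embedding(
    image: Optional[object] = None,
    pooled_embedding: Optional[Embedding] = None,
    encode_image: Optional[Callable[[object], Embedding]] = None,
    expected_dim: int = 768,  # assumption: pooled embedding size of the vision encoder
) -> List[float]:
    """Hypothetical helper: return the pooled embedding to hand to the decoder.

    Exactly one of `image` or `pooled_embedding` must be given. The image path
    runs the (caller-supplied) encoder; the embedding path skips it entirely.
    """
    if (image is None) == (pooled_embedding is None):
        raise ValueError("pass exactly one of `image` or `pooled_embedding`")

    if pooled_embedding is None:
        # Path 1: direct image input, so the vision encoder has to run here.
        if encode_image is None:
            raise ValueError("the image path needs an `encode_image` callable")
        pooled_embedding = encode_image(image)

    # Path 2 (or the result of path 1): validate the vector before decoding.
    vector = [float(x) for x in pooled_embedding]
    if len(vector) != expected_dim:
        raise ValueError(
            f"expected a {expected_dim}-dim embedding, got {len(vector)}"
        )
    return vector
```

In a pipeline where embeddings are already computed upstream, only the second path would be exercised, so the vision encoder never needs to be loaded at captioning time.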