Upload folder using huggingface_hub
README.md
CHANGED
@@ -18,7 +18,7 @@ pipeline_tag: image-to-text
 Capri is a compact image captioning model designed for high-throughput, plain-language descriptions.
 It supports two inference paths: direct image input or precomputed SigLIP2 pooled embeddings.
 
-The project started from a practical pipeline constraint: existing captioning models were either too slow or too weak for reliable image understanding.
+The project started from a practical pipeline constraint: existing captioning models were either too slow or too weak for reliable image understanding. That constraint sparked the idea for Capri: since SigLIP embeddings were already computed upstream, why not pair them with a small LLM decoder and get both strong visual representations and fast text generation?
 
 The name comes from the small Italian island of Capri and also hints at the goal of the project: a small CAPtioner with Rapid Inference.
 
@@ -108,5 +108,5 @@ Trained on captions from the [COCO 2017](https://cocodataset.org/) dataset.
 > Lin, T.-Y., et al. "Microsoft COCO: Common Objects in Context." ECCV 2014. [arXiv:1405.0312](https://arxiv.org/abs/1405.0312)
 
 Built on top of:
-- [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B)
-- [google/siglip2-base-patch16-224](https://huggingface.co/google/siglip2-base-patch16-224)
+- [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) - Apache 2.0
+- [google/siglip2-base-patch16-224](https://huggingface.co/google/siglip2-base-patch16-224) - Apache 2.0