fastino/gliner2-large-v1
Production inference for encoder models with vLLM plugins: ColBERT, ColPali, GLiNER, GLiNER2, etc. Repo: github.com/ddickmann/vllm-factory
Note: Supported in vLLM Factory for production NER serving. Benchmark highlight: the MT5/GLiNER family reached up to 11.7x throughput on an RTX A5000 with parity preserved. Repo: https://github.com/ddickmann/vllm-factory
Note: Supported in vLLM Factory for ColBERT-style retrieval serving. Benchmark highlight: 8.6x throughput on an RTX A5000 with parity preserved. Repo: https://github.com/ddickmann/vllm-factory
Note: Supported in vLLM Factory for retrieval serving. Benchmark highlight: ModernColBERT reached 3.3x throughput on an RTX A5000 with parity preserved. Repo: https://github.com/ddickmann/vllm-factory