Mahmud ElHuseyni
MElHuseyni
AI & ML interests
Computer Vision
NLP
Machine Learning
Recent Activity
liked a dataset 2 days ago: QCRI/MenaSpeechBank
liked a model 2 days ago: MiniMaxAI/MiniMax-M2.7
upvoted an article 3 days ago: Multimodal Embedding & Reranker Models with Sentence Transformers
Organizations
SmolVLM
- HuggingFaceTB/SmolVLM-Instruct
  Image-Text-to-Text • 2B • Updated • 29.8k • 583
- OpenGVLab/InternVL3-1B
  Image-Text-to-Text • 0.9B • Updated • 141k • 81
- OpenGVLab/InternVL3-2B
  Image-Text-to-Text • Updated • 35.4k • 45
- LiquidAI/LFM2-VL-450M
  Image-Text-to-Text • 0.5B • Updated • 23.5k • 146
OCR Models
Visual Embedding Models
- jinaai/jina-embeddings-v4
  Visual Document Retrieval • 4B • Updated • 295k • 497
- vidore/colqwen2.5-v0.2
  Visual Document Retrieval • Updated • 75.2k • 98
- nomic-ai/colnomic-embed-multimodal-7b
  Visual Document Retrieval • Updated • 5.68k • 105
- nvidia/llama-nemoretriever-colembed-3b-v1
  Visual Document Retrieval • Updated • 311 • 74
Speech Models
- ICTNLP/Llama-3.1-8B-Omni
  Updated • 63 • 418
- AudioPaLM: A Large Language Model That Can Speak and Listen
  Paper • 2306.12925 • Published • 56
- OpenMOSS-Team/SpeechGPT-7B-cm
  Text Generation • Updated • 14 • 8
- parler-tts/parler_tts_mini_v0.1
  Text-to-Speech • 0.6B • Updated • 3.64k • 358
Arabic Models (LLM, VLM, Multimodel)
- NAMAA-Space/GATE-Reranker-V1
  Text Ranking • 0.1B • Updated • 281 • 10
- NAMAA-Space/gliner_arabic-v2.1
  Token Classification • Updated • 336 • 15
- NAMAA-Space/AraModernBert-Base-V1.0
  Fill-Mask • 0.1B • Updated • 271 • 14
- NAMAA-Space/AraModernBert-Base-STS
  Sentence Similarity • 0.1B • Updated • 7 • 6
Image Segmentation Models
- nvidia/segformer-b5-finetuned-cityscapes-1024-1024
  Image Segmentation • Updated • 38.1k • 41
- nvidia/segformer-b0-finetuned-ade-512-512
  Image Segmentation • 3.75M • Updated • 562k • 184
- facebook/maskformer-swin-base-ade
  Image Segmentation • Updated • 1.81k • 13
- facebook/maskformer-swin-base-coco
  Image Segmentation • 0.1B • Updated • 1.77k • 26
Object Detection Models
VLM Leaderboards
- OCRBenchv2 Leaderboard
  Running • 46 • Display OCRBench leaderboard for text recognition models
- Vidore Leaderboard
  Running • 203 • Browse and compare visual document retrieval model scores
- Open VLM Leaderboard
  Running on CPU Upgrade • 1.01k • VLMEvalKit Evaluation Results Collection
- Vision Arena (Testing VLMs side-by-side)
  Running • Featured • 562 • Explore Vision Arena's computer-vision tools online
Emotion Detection