Efficient Drop-In Replacement for the Classification Head in Language Model Inference. https://github.com/embedl/flash-head
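For context, in a decoder-only language model the classification head is the final linear projection from the last hidden state to vocabulary logits; FlashHead is described above as a drop-in replacement for this step. The following is a minimal NumPy sketch of the *baseline* head being replaced, not of FlashHead itself; all shapes (hidden_size=4, vocab_size=8) are toy values, not taken from the models listed below.

```python
import numpy as np

def classification_head(hidden: np.ndarray, lm_head_weight: np.ndarray) -> np.ndarray:
    """Project final hidden states onto the vocabulary to produce logits.

    hidden:          (batch, hidden_size) final hidden state per sequence
    lm_head_weight:  (vocab_size, hidden_size) output embedding matrix
    returns:         (batch, vocab_size) logits
    """
    return hidden @ lm_head_weight.T

rng = np.random.default_rng(0)
hidden = rng.standard_normal((1, 4))          # one token's final hidden state
lm_head_weight = rng.standard_normal((8, 4))  # toy vocab_size x hidden_size
logits = classification_head(hidden, lm_head_weight)
next_token = int(np.argmax(logits, axis=-1)[0])  # greedy decoding step
print(logits.shape)
```

For large vocabularies this projection dominates per-token compute at small batch sizes, which is what motivates replacing it with a more efficient head.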
Embedl
Efficient AI for the edge.
Embedl develops advanced tools and algorithms for Edge AI. Our mission is to make AI models run faster, more energy-efficiently, and more reliably across diverse hardware platforms, while significantly reducing development time.
We help teams deploy high-performance AI on real-world, resource-constrained devices.
Models (14)
embedl/Qwen3-1.7B-FlashHead
2B • Updated • 47 • 3
embedl/Qwen3-1.7B-FlashHead-W4A16
2B • Updated • 95 • 3
embedl/gemma-3-270m-it-FlashHead
0.3B • Updated • 44 • 3
embedl/Qwen3-0.6B-FlashHead
0.6B • Updated • 48 • 4
embedl/Llama-3.2-1B-Instruct-FlashHead
1B • Updated • 47 • 4
embedl/Llama-3.2-1B-Instruct-FlashHead-W4A16
2B • Updated • 69 • 6
embedl/Llama-3.2-3B-Instruct-FlashHead-W4A16
4B • Updated • 79 • 4
embedl/Llama-3.2-3B-Instruct-FlashHead
3B • Updated • 64 • 4
embedl/gemma-3-1b-it-FlashHead
1B • Updated • 65 • 3
embedl/gemma-3-1b-it-FlashHead-W4A16
1B • Updated • 89 • 3