Efficient Drop-In Replacement for the Classification Head in Language Model Inference. https://github.com/embedl/flash-head
nvidia/Cosmos-Reason2 multi-modal reasoning models optimized by Embedl.
- embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead
  Image-Text-to-Text • 2B • Updated • 459 • 7
- embedl/Cosmos-Reason2-2B-NVFP4A16
  Image-Text-to-Text • 2B • Updated • 22 • 1
- embedl/Cosmos-Reason2-2B-W4A16
  Image-Text-to-Text • 2B • Updated • 472 • 7
- embedl/Cosmos-Reason2-2B-W4A16-Edge2
  Image-Text-to-Text • 2B • Updated • 23.1k • 12
Models optimized and benchmarked for NVIDIA Jetson AGX Orin. Memory-efficient and latency-optimized variants designed for real-time edge inference.
- embedl/Cosmos-Reason2-2B-W4A16-Edge2
  Image-Text-to-Text • 2B • Updated • 23.1k • 12
- embedl/Cosmos-Reason2-2B-W4A16
  Image-Text-to-Text • 2B • Updated • 472 • 7
- embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead
  Image-Text-to-Text • 2B • Updated • 459 • 7
- Edge Inference Benchmarks
  On-device benchmarks across devices and models • 5
Quantization strategy where most weights are converted to INT4, activations remain in FP16, and sensitive layers are preserved in FP16.
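To make the W4A16 idea concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not Embedl's implementation: the symmetric per-row scaling scheme, the function names (`quantize_group`, `dequantize_group`, `to_fp16`, `linear_w4a16`), and the `sensitive` flag are all hypothetical choices made for this example. Weights are rounded to signed 4-bit codes with one scale per row, activations are rounded to FP16 precision, and a layer marked as sensitive skips quantization entirely.

```python
import struct
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7  # range of a signed 4-bit integer


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest FP16 value (IEEE half precision)."""
    return struct.unpack("e", struct.pack("e", x))[0]


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric INT4 quantization of one weight group (assumed scheme).

    Returns the 4-bit integer codes and the scale shared by the group.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
    codes = [max(INT4_MIN, min(INT4_MAX, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_group(codes: List[int], scale: float) -> List[float]:
    """Map INT4 codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def linear_w4a16(x: List[float], weight_rows: List[List[float]],
                 sensitive: bool = False) -> List[float]:
    """Compute y = W x with INT4 weights and FP16 activations.

    If `sensitive` is True (hypothetical flag), the weights are left
    unquantized, mirroring the idea of preserving sensitive layers.
    """
    acts = [to_fp16(a) for a in x]  # activations stay at FP16 precision
    out = []
    for row in weight_rows:
        if not sensitive:
            codes, scale = quantize_group(row)
            row = dequantize_group(codes, scale)
        out.append(sum(w * a for w, a in zip(row, acts)))
    return out
```

Calling `quantize_group` on a row gives the stored 4-bit codes and the scale needed to reconstruct it; real deployments typically pack two codes per byte and use smaller group sizes, which this sketch leaves out for brevity.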
Ultra-efficient model variants optimized for Jetson Orin Nano. Designed for constrained edge environments requiring low memory footprint.
- embedl/Cosmos-Reason2-2B-W4A16-Edge2
  Image-Text-to-Text • 2B • Updated • 23.1k • 12
- embedl/Cosmos-Reason2-2B-W4A16
  Image-Text-to-Text • 2B • Updated • 472 • 7
- embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead
  Image-Text-to-Text • 2B • Updated • 459 • 7
- Edge Inference Benchmarks
  On-device benchmarks across devices and models • 5
Models validated and performance-optimized for NVIDIA Jetson AGX Thor. Tailored for high-performance edge AI workloads.
- embedl/Cosmos-Reason2-2B-NVFP4A16
  Image-Text-to-Text • 2B • Updated • 22 • 1
- embedl/Cosmos-Reason2-2B-W4A16-Edge2
  Image-Text-to-Text • 2B • Updated • 23.1k • 12
- embedl/Cosmos-Reason2-2B-W4A16
  Image-Text-to-Text • 2B • Updated • 472 • 7
- embedl/Cosmos-Reason2-2B-W4A16-Edge2-FlashHead
  Image-Text-to-Text • 2B • Updated • 459 • 7