SemplificaAI/gliner2-multi-v1-onnx

@@ -22,15 +22,15 @@ The model is specifically exported in a fragmented format (encoder, span_rep, co
 The ONNX conversion, combined with the Rust native engine (`ort` binding), allows this model to run extremely fast on both GPUs and edge devices like NPUs.
-**Benchmark Task:** Extracting 14 targeted entities spanning 62 classes from a complex 4-sentence text.
-| Hardware | Execution Provider | Avg Time / Entity | Avg Total Time |
-| :--- | :--- | :--- | :--- |
-| **NVIDIA RTX 4090** | CUDA (FP16) | **~12.0 ms** 🚀 | ~168.11 ms |
-| **NVIDIA RTX 3090** | CUDA (FP16) | **~11.6 ms** 🚀 | ~162.46 ms |
-| **Qualcomm Snapdragon X Elite** | QNN (NPU Native) | **~22.78 ms** ✨ | ~1.16 s (51 entities) |
-| **Qualcomm Snapdragon X Elite** | CPU (ARM NEON) | **~28.62 ms** | ~1.45 s (51 entities) |
-| **AMD Ryzen 9 5900XT** (16-Core) | CPU (x86 AVX2) | **~30.16 ms** 💻 | ~422.37 ms |
 *Note: The NPU matches high-end Desktop CPUs while consuming a fraction of the power!*

 The ONNX conversion, combined with the Rust native engine (`ort` binding), allows this model to run extremely fast on both GPUs and edge devices like NPUs.
+**Benchmark Task:** Tested on complex text extraction tasks spanning up to 62 classes (metrics normalized per extracted entity to allow cross-device comparison).
+| Hardware | Execution Provider | Avg Time / Entity |
+| :--- | :--- | :--- |
+| **NVIDIA RTX 4090** | CUDA (FP16) | **~12.0 ms** 🚀 |
+| **NVIDIA RTX 3090** | CUDA (FP16) | **~11.6 ms** 🚀 |
+| **Qualcomm Snapdragon X Elite** | QNN (NPU Native) | **~22.78 ms** ✨ |
+| **Qualcomm Snapdragon X Elite** | CPU (ARM NEON) | **~28.62 ms** |
+| **AMD Ryzen 9 5900XT** (16-Core) | CPU (x86 AVX2) | **~30.16 ms** 💻 |
 *Note: The NPU matches high-end Desktop CPUs while consuming a fraction of the power!*
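
The per-entity normalization mentioned in the benchmark task can be sketched as a simple division of total run time by the number of extracted entities. The snippet below is a minimal illustration, not the project's benchmark harness: the function name `ms_per_entity` and the `RUNS` dictionary are hypothetical, and the inputs are the total times and entity counts from the earlier revision of the table (14 entities for the desktop runs, 51 for the Snapdragon runs). Because the Snapdragon totals were published rounded to 10 ms, the recomputed values only approximately match the table.

```python
# Hypothetical helper: normalizes a total extraction time to a per-entity
# figure so that runs with different entity counts can be compared.
def ms_per_entity(total_ms: float, entities: int) -> float:
    if entities <= 0:
        raise ValueError("entities must be a positive integer")
    return total_ms / entities


# (total time in ms, number of extracted entities), taken from the
# earlier revision of the benchmark table. The Snapdragon totals are
# rounded values (1.16 s and 1.45 s), so results are approximate.
RUNS = {
    "NVIDIA RTX 4090 / CUDA (FP16)": (168.11, 14),
    "NVIDIA RTX 3090 / CUDA (FP16)": (162.46, 14),
    "Snapdragon X Elite / QNN (NPU)": (1160.0, 51),
    "Snapdragon X Elite / CPU (ARM NEON)": (1450.0, 51),
    "AMD Ryzen 9 5900XT / CPU (x86 AVX2)": (422.37, 14),
}

if __name__ == "__main__":
    for name, (total_ms, entities) in RUNS.items():
        print(f"{name}: ~{ms_per_entity(total_ms, entities):.2f} ms per entity")
```

Dividing by the entity count is what makes the 51-entity Snapdragon runs comparable with the 14-entity desktop runs, which the raw totals alone would not allow.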