Update 8-bit Privacy Filter artifact with expert quantization
- README.md +4 -10
- weights.safetensors +2 -2
README.md CHANGED

@@ -38,7 +38,7 @@ After the model is downloaded once, inference runs locally. No document text is
 - Tokenizer: `o200k_base` / tiktoken-style BPE
 - Labels: `account_number`, `private_address`, `private_date`, `private_email`, `private_person`, `private_phone`, `private_url`, `secret`
 
-
+This artifact uses expert-aware MLX quantization: embeddings, attention projections, MoE gates, sparse-MoE expert tensors, and the token-classification head are all stored in 8-bit packed form. The resulting `weights.safetensors` file is about 1.39 GiB, compared with about 2.61 GiB for the BF16 OpenMed MLX artifact.
 
 ## Quick Start: Python
 
@@ -75,7 +75,7 @@ Example output:
     "word": "alice.smith@example.com",
     "start": 39,
     "end": 62,
-    "score": 0.
+    "score": 0.9998,
   }
 ```
 
@@ -109,19 +109,13 @@ For iOS, run on Apple Silicon hardware. The iOS Simulator is not the recommended
 
 ## Validation
 
-The 8-bit artifact was validated against the unquantized OpenMed MLX artifact with fixed text samples.
-
-| Span | bf16 score | q8 score |
-|---|---:|---:|
-| `private_person` | 1.0000 | 1.0000 |
-| `private_phone` | 0.9891 | 0.9881 |
-| `private_email` | 0.9662 | 0.9604 |
-| `private_address` | 0.9107 | 0.9051 |
+The 8-bit artifact was validated against the unquantized OpenMed MLX artifact with fixed text samples. BF16 and Q8 returned identical grouped spans for person, date, phone, email, address, and account-number examples.
 
 OpenMed also includes unit tests for:
 
 - q8 artifact loading
 - quantization metadata decoding
+- expert tensor packing and `.scales` coverage
 - finite logits from the q8 runtime
 - bf16/q8 shape and argmax-label coherence
 - BIOES/Viterbi span decoding
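The quantization note added at README line 41 lists which tensor groups end up in 8-bit packed form. As a rough, hedged sketch only, something along the following lines could produce such an expert-aware layout with MLX's `nn.quantize`; the module-path fragments (`embed`, `attn`, `gate`, `experts`, `classifier`) and the helper name are illustrative assumptions, not the artifact's actual module tree.

```python
# Hedged sketch of expert-aware 8-bit quantization with MLX.
# The path fragments below are hypothetical; substitute the model's real
# module names for embeddings, attention projections, MoE gates, experts,
# and the token-classification head.
import mlx.core as mx
import mlx.nn as nn
from mlx.utils import tree_flatten


def quantize_expert_aware(model: nn.Module, bits: int = 8, group_size: int = 64) -> nn.Module:
    def should_quantize(path: str, module: nn.Module) -> bool:
        # Only Linear/Embedding layers are swapped for quantized variants.
        if not isinstance(module, (nn.Linear, nn.Embedding)):
            return False
        wanted = ("embed", "attn", "gate", "experts", "classifier")  # assumed names
        return any(fragment in path for fragment in wanted)

    # nn.quantize replaces matching layers with QuantizedLinear/QuantizedEmbedding,
    # each storing a packed `weight` plus per-group `scales` and `biases` tensors.
    nn.quantize(model, group_size=group_size, bits=bits, class_predicate=should_quantize)
    return model


# Saving the flattened parameters is what yields the smaller weights.safetensors:
#   mx.save_safetensors("weights.safetensors", dict(tree_flatten(model.parameters())))
```

The per-group `scales` tensors created by quantized layers are what the "`.scales` coverage" unit test in the list above would be looking for.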
weights.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:22c4e8323b5a39bd6ab42b8cf9f8920f7676158b7375759d28d053616fbd7d6e
+size 1488841579
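The updated Validation section states that the BF16 and Q8 artifacts return identical grouped spans on fixed samples. A minimal sketch of what such a parity check can look like is below; `bf16_pipe`, `q8_pipe`, their `predict` method, and the sample strings are hypothetical placeholders, and the prediction format is assumed to match the grouped-span example in the README (dicts with `word`, `start`, `end`, `score`).

```python
# Hedged sketch of a BF16-vs-Q8 parity check over fixed text samples.
# The pipeline objects and their predict() API are hypothetical stand-ins
# for however the OpenMed artifacts are actually loaded and run.

SAMPLES = [
    "Alice Smith can be reached at alice.smith@example.com.",
    "Call +1 555 0100 and ship the parcel to 12 Example Road.",
]


def spans_only(predictions):
    # Compare which spans were found, ignoring small score differences
    # introduced by quantization.
    return [(p["word"], p["start"], p["end"]) for p in predictions]


def check_parity(bf16_pipe, q8_pipe, samples=SAMPLES, score_tol=0.02):
    for text in samples:
        bf16_out = bf16_pipe.predict(text)  # hypothetical API
        q8_out = q8_pipe.predict(text)
        # Identical grouped spans between the two artifacts...
        assert spans_only(bf16_out) == spans_only(q8_out), text
        # ...and per-span scores that drift only within a small tolerance.
        for a, b in zip(bf16_out, q8_out):
            assert abs(a["score"] - b["score"]) <= score_tol, (a, b)
```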