	---
	license: apache-2.0
	pipeline_tag: token-classification
	library_name: burn
	tags:
	- rust
	- burn
	- privacy
	- PII
	- NER
	- token-classification
	- openai
	base_model: openai/privacy-filter
	---

	# OpenAI Privacy Filter — Rust/Burn Weights

	Safetensors weights for [openai/privacy-filter](https://huggingface.co/openai/privacy-filter), packaged for inference with [privacy-filter-rs](https://github.com/eugenehp/privacy-filter-rs) (pure-Rust, Burn ML framework).

	## Contents

	| File | Size | Description |
	|---|---|---|
	| `model.safetensors` | 2.6 GB | Model weights (bfloat16) |
	| `config.json` | 3 KB | HuggingFace model configuration |
	| `tokenizer.json` | 27 MB | BPE tokenizer (o200k_base) |
	| `tokenizer_config.json` | 234 B | Tokenizer metadata |
	| `viterbi_calibration.json` | 372 B | Viterbi decoder operating points |

	## Model Details

	- Architecture: Bidirectional transformer encoder with Sparse MoE
	- Parameters: 1.5B total, ~50M active per token (top-4 of 128 experts)
	- Hidden size: 640, Layers: 8, Heads: 14 Q / 2 KV (GQA)
	- Context: 128,000 tokens (YaRN RoPE, sliding window 257)
	- Output: 33 BIOES token classes over 8 privacy categories
	- Dtype: bfloat16 (converted to f32 at load time by the Rust runtime)
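
The bfloat16-to-f32 step in the last bullet can be illustrated with a minimal sketch. It relies only on the fact that a bfloat16 value is the upper 16 bits of an IEEE 754 f32 and that safetensors stores tensor data little-endian. The function names `bf16_bits_to_f32` and `bf16_bytes_to_f32` are hypothetical and are not taken from privacy-filter-rs, whose loader may work differently.

```rust
/// Widen one bfloat16 value (given as its raw 16 bits) to f32.
/// bfloat16 is the upper half of an f32, so widening is a 16-bit left shift.
/// Hypothetical helper for illustration; not part of privacy-filter-rs.
fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Decode a little-endian byte buffer of bfloat16 values into f32 values.
/// Any trailing odd byte is ignored by `chunks_exact`.
fn bf16_bytes_to_f32(bytes: &[u8]) -> Vec<f32> {
    bytes
        .chunks_exact(2)
        .map(|pair| bf16_bits_to_f32(u16::from_le_bytes([pair[0], pair[1]])))
        .collect()
}

fn main() {
    // 0x3F80 encodes 1.0 and 0xC000 encodes -2.0 in bfloat16 (bytes are little-endian).
    let raw = [0x80, 0x3F, 0x00, 0xC0];
    let values = bf16_bytes_to_f32(&raw);
    println!("{:?}", values); // [1.0, -2.0]
}
```

Widening doubles the bytes per value, so if every tensor is converted, the 2.6 GB checkpoint would occupy roughly twice that in memory.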

	## Privacy Categories

	1. `account_number`
	2. `private_address`
	3. `private_date`
	4. `private_email`
	5. `private_person`
	6. `private_phone`
	7. `private_url`
	8. `secret`
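
The figure of 33 token classes follows from the BIOES scheme: each of the 8 categories gets a Begin, Inside, End, and Single tag, plus one shared Outside tag, giving 8 × 4 + 1 = 33. The sketch below enumerates such a label set. The `B-`/`I-`/`E-`/`S-` spelling and the ordering are assumptions made for illustration; the mapping shipped in `config.json` is the authoritative one.

```rust
/// The eight privacy categories listed above.
const CATEGORIES: [&str; 8] = [
    "account_number",
    "private_address",
    "private_date",
    "private_email",
    "private_person",
    "private_phone",
    "private_url",
    "secret",
];

/// Build a BIOES label set: one "O" tag plus B/I/E/S tags per category.
/// The label spelling and order are assumed, not read from the model config.
fn bioes_labels() -> Vec<String> {
    let mut labels = vec!["O".to_string()];
    for category in CATEGORIES {
        for prefix in ["B", "I", "E", "S"] {
            labels.push(format!("{}-{}", prefix, category));
        }
    }
    labels
}

fn main() {
    let labels = bioes_labels();
    // 8 categories * 4 positional tags + 1 outside tag = 33
    println!("{} labels, starting with {:?}", labels.len(), &labels[..5]);
}
```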

	## Usage with privacy-filter-rs

	```bash
	# Clone the Rust project
	git clone https://github.com/eugenehp/privacy-filter-rs
	cd privacy-filter-rs

	# Download weights into ./data (this repo; large files need Git LFS)
	git clone https://huggingface.co/eugenehp/privacy-filter-rs data

	# Run inference
	cargo run --release -- -m data "My name is Alice Smith"
	```

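	```rust
	use privacy_filter_rs::{PrivacyFilterInference, backend::{B, Device}};
	use std::path::Path;

	// Wrapped in `main` so `?` compiles; assumes the crate's errors convert into `Box<dyn Error>`.
	fn main() -> Result<(), Box<dyn std::error::Error>> {
	    let device = <Device as Default>::default();
	    // Load the model from the ./data directory downloaded above.
	    let engine = PrivacyFilterInference::<B>::load(Path::new("data"), device)?;

	    let spans = engine.predict("My name is Alice Smith")?;
	    for s in &spans {
	        println!("{}: {} (score: {:.4})", s.entity_group, s.word, s.score);
	    }
	    // private_person: Alice Smith (score: 1.0000)
	    Ok(())
	}
	```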
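
A common next step is to redact the detected spans. The sketch below is one possible approach, not part of privacy-filter-rs: `Span` is a hypothetical local stand-in that mirrors only the three fields printed in the example above (`entity_group`, `word`, `score`), and `redact` and its `min_score` threshold are likewise illustrative.

```rust
/// Hypothetical stand-in for the crate's span type; only the fields
/// printed in the example above are assumed to exist.
struct Span {
    entity_group: String,
    word: String,
    score: f32,
}

/// Replace the text of each span scoring at least `min_score`
/// with a `[category]` placeholder.
fn redact(text: &str, spans: &[Span], min_score: f32) -> String {
    let mut out = text.to_string();
    for s in spans
        .iter()
        .filter(|s| s.score >= min_score && !s.word.is_empty())
    {
        let placeholder = format!("[{}]", s.entity_group);
        // Plain string replacement: every occurrence of the span text is replaced.
        out = out.replace(s.word.as_str(), &placeholder);
    }
    out
}

fn main() {
    let spans = vec![Span {
        entity_group: "private_person".to_string(),
        word: "Alice Smith".to_string(),
        score: 1.0,
    }];
    println!("{}", redact("My name is Alice Smith", &spans, 0.5));
    // My name is [private_person]
}
```

String replacement is crude because it rewrites every occurrence of the span text. If the real span type exposes character offsets, replacing by offset would be more precise.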

	## License

	Apache 2.0 — same as the upstream [openai/privacy-filter](https://huggingface.co/openai/privacy-filter) model.