---
library_name: transformers
base_model:
- openai/privacy-filter
---

This tiny model is intended for debugging. It is randomly initialized using a configuration adapted from [openai/privacy-filter](https://huggingface.co/openai/privacy-filter).

| File path | Size |
|------|------|
| model.safetensors | 4.1MB |

### Example usage:

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_id = "yujiepan/openai-privacy-filter-tiny-random"
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
).to(device)

# Build a sample text containing email addresses and phone numbers.
text = ''
for i in range(10):
    text += f'Contact me at test{i}@example.com or call 555-0000-{i}. '
enc = tokenizer(text, return_tensors='pt').to(device)
with torch.no_grad():
    outputs = model(**enc)

# Pick the highest-scoring label for each token. The weights are random,
# so the predicted labels are meaningless; only the shapes matter here.
predicted_token_class_ids = outputs.logits.argmax(dim=-1)
predicted_token_classes = [model.config.id2label[token_id.item()] for token_id in predicted_token_class_ids[0]]
print(predicted_token_classes, len(predicted_token_classes))
```

### Code to create this repo:

```python
# Generated by AI.
import json
from pathlib import Path

import torch
from huggingface_hub import hf_hub_download
from transformers import (
    AutoConfig,
    AutoModelForTokenClassification,
    AutoTokenizer,
    set_seed,
)

source_model_id = "openai/privacy-filter"
save_folder = "/tmp/yujiepan/openai-privacy-filter-tiny-random"
Path(save_folder).mkdir(parents=True, exist_ok=True)

# Copy the tokenizer and calibration files unchanged from the source repo.
for filename in (
    'tokenizer.json',
    'tokenizer_config.json',
    'viterbi_calibration.json',
):
    hf_hub_download(
        repo_id=source_model_id,
        filename=filename,
        repo_type='model',
        local_dir=save_folder,
    )

# Load the source config and shrink the model dimensions.
with open(
    hf_hub_download(source_model_id, filename='config.json', repo_type='model'),
    'r',
    encoding='utf-8',
) as f:
    config_json: dict = json.load(f)
config_json.update({
    'num_hidden_layers': 4,
    'hidden_size': 8,
    'intermediate_size': 32,
    'num_attention_heads': 8,
    'num_key_value_heads': 4,
    'head_dim': 32,
})
config_json.pop('transformers.js_config', None)
with open(f'{save_folder}/config.json', 'w', encoding='utf-8') as f:
    json.dump(config_json, f, indent=2)

config = AutoConfig.from_pretrained(save_folder)
print(config)

# Instantiate the model in bfloat16, then restore the default dtype.
torch.set_default_dtype(torch.bfloat16)
model = AutoModelForTokenClassification.from_config(config, trust_remote_code=True)
torch.set_default_dtype(torch.float32)
model = model.cpu()

# Re-initialize every parameter from a seeded normal distribution.
set_seed(42)
with torch.no_grad():
    for name, p in sorted(model.named_parameters()):
        torch.nn.init.normal_(p, mean=0.0, std=0.8)
        print(name, tuple(p.shape))

# Cast the attention sinks of each layer to float32 before saving.
for i in range(model.config.num_hidden_layers):
    model.model.layers[i].self_attn.sinks = torch.nn.Parameter(model.model.layers[i].self_attn.sinks.float())

model.save_pretrained(save_folder)
print(model)
```

### Printing the model:

```text
OpenAIPrivacyFilterForTokenClassification(
  (model): OpenAIPrivacyFilterModel(
    (embed_tokens): Embedding(200064, 8, padding_idx=199999)
    (layers): ModuleList(
      (0-3): 4 x OpenAIPrivacyFilterEncoderLayer(
        (self_attn): OpenAIPrivacyFilterAttention(
          (q_proj): Linear(in_features=8, out_features=256, bias=True)
          (k_proj): Linear(in_features=8, out_features=128, bias=True)
          (v_proj): Linear(in_features=8, out_features=128, bias=True)
          (o_proj): Linear(in_features=256, out_features=8, bias=True)
        )
        (mlp): OpenAIPrivacyFilterMLP(
          (router): OpenAIPrivacyFilterTopKRouter()
          (experts): OpenAIPrivacyFilterExperts()
        )
        (input_layernorm): OpenAIPrivacyFilterRMSNorm((8,), eps=1e-05)
        (post_attention_layernorm): OpenAIPrivacyFilterRMSNorm((8,), eps=1e-05)
      )
    )
    (norm): OpenAIPrivacyFilterRMSNorm((8,), eps=1e-05)
    (rotary_emb): OpenAIPrivacyFilterRotaryEmbedding()
  )
  (dropout): Dropout(p=0.0, inplace=False)
  (score): Linear(in_features=8, out_features=33, bias=True)
)
```

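The printed layer shapes follow from the tiny config values set in the creation script. The sketch below recomputes them with plain arithmetic, assuming the usual grouped-query attention sizing (query width = `num_attention_heads * head_dim`, key/value width = `num_key_value_heads * head_dim`). The helper `projection_shapes` is a hypothetical name used only for illustration, and the embedding size estimate assumes the embedding is stored in bfloat16, as the default dtype in the creation script suggests.

```python
# Tiny-config values copied from the creation script.
config = {
    "hidden_size": 8,
    "num_attention_heads": 8,
    "num_key_value_heads": 4,
    "head_dim": 32,
}
vocab_size = 200064  # from the printed Embedding(200064, 8, ...)


def projection_shapes(cfg):
    """Return (in_features, out_features) per attention projection.

    Hypothetical helper; assumes standard grouped-query attention sizing.
    """
    hidden = cfg["hidden_size"]
    q_out = cfg["num_attention_heads"] * cfg["head_dim"]
    kv_out = cfg["num_key_value_heads"] * cfg["head_dim"]
    return {
        "q_proj": (hidden, q_out),
        "k_proj": (hidden, kv_out),
        "v_proj": (hidden, kv_out),
        "o_proj": (q_out, hidden),
    }


shapes = projection_shapes(config)
for name, (in_features, out_features) in shapes.items():
    print(f"{name}: in_features={in_features}, out_features={out_features}")

# Embedding parameter count and its size at 2 bytes per bfloat16 value.
embedding_params = vocab_size * config["hidden_size"]
embedding_bytes_bf16 = embedding_params * 2
print(f"embedding: {embedding_params} params, {embedding_bytes_bf16} bytes in bf16")
```

Under these assumptions, the embedding table alone accounts for roughly 3.2MB of the 4.1MB `model.safetensors` file, which is why shrinking `hidden_size` matters far more for file size than reducing the layer count.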
### Test environment:

- torch: 2.11.0+cu126
- transformers: 5.7.0.dev0
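
To compare a local setup against the versions listed above, a minimal sketch using only the standard library is shown below. The helper `installed_version` is a hypothetical name introduced for illustration; it reports `None` for packages that are not installed instead of raising.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the locally installed versions of the two packages listed above.
for name in ("torch", "transformers"):
    print(f"{name}: {installed_version(name)}")
```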