Which languages does it support?
There is no information in the card
However, the Nvidia model is multilingual and, overall, far better than openai/privacy-filter. You can try it here.
https://huggingface.co/openai/privacy-filter#model-metadata
Language(s): Primarily English; selected multilingual robustness evaluation reported
Our intention is that the model can be easily fine-tuned to handle other languages (and even multilingual text)
Yes, you could, but why? The model should have been multilingual from the start. If you fine-tune it, it will likely get worse at English and lose some of its knowledge. The model card mentions adding classes to the model, not languages. When gpt-oss was first released, it was also weak in multilingual scenarios.
Moreover, the model apparently uses an MoE architecture and non-standard attention, so effort clearly went into this specific model's speed, yet multilingual support wasn't included. This is absurd.
Though if you think about it, people create fine-tunes and list this model as the base model in their model cards. Hugging Face then boosts the model to trending because of the hype around it. PROFIT.
There is a tradeoff between model size, performance on multilingual tasks, and the model's ability to generalize across multiple types of labels.
We started from the point of view that we will not be able to cover all possible label spaces, so we need to offer people the ability to fine-tune the model for their own PII category definitions. This made us choose this point on the tradeoff curve. Other models can pick a different point, and we welcome competition, as the goal is to improve the status quo, not to compete or profit.
PS: In fact, all of these comments that don't really bring anything to the discussion are actually helping the model rank higher, since comment activity (regardless of sentiment) is used as an input for the trending metric ;)
Unfortunately, I noticed this too, and you even boasted about it on X.