versae/scandinavian-tokenizer

scandinavian-tokenizer (33.1 GB)
  • 1 contributor
History: 3 commits
Latest commit: versae, "Reduce vocab size to 32000" (fa9ab39, about 2 years ago)
  • texts
    Scandi+English tokenizer on OSCAR about 2 years ago
  • .gitattributes
    1.92 kB
    Scandi+English tokenizer on OSCAR about 2 years ago
  • README.md
    28 Bytes
    initial commit about 2 years ago
  • special_tokens_map.json
    96 Bytes
    Scandi+English tokenizer on OSCAR about 2 years ago
  • tokenizer.json
    1.4 MB
    Reduce vocab size to 32000 about 2 years ago
  • tokenizer_config.json
    1.13 kB
    Scandi+English tokenizer on OSCAR about 2 years ago
  • train_tokenizer.py
    5.97 kB
    Scandi+English tokenizer on OSCAR about 2 years ago