JiRack Tokenizer

  • Compatible with Llama 3 • Optimized for Code
  • A Llama 3-based tokenizer enhanced with fill-in-the-middle (FIM) marker tokens.
  • Fully compatible with large-scale public code datasets, including The Stack, StarCoder, and NextCoder.
  • Enables efficient training on large-scale coding data for superior code generation and understanding.
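
FIM training rearranges each code sample into prefix/suffix/middle segments delimited by the marker tokens, so the model learns to complete code in the middle of a file rather than only at the end. A minimal sketch of that transformation (the marker strings below are illustrative placeholders, not the actual JiRack token names, which are not listed here):

```python
# Sketch of fill-in-the-middle (FIM) sample construction.
# Marker strings are placeholders, not the real JiRack special tokens.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"

def make_fim_sample(code: str, start: int, end: int) -> str:
    """Split `code` at [start:end) and emit the prefix-suffix-middle layout."""
    prefix, middle, suffix = code[:start], code[start:end], code[end:]
    # The model is trained to generate `middle` after seeing prefix and suffix.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

sample = make_fim_sample("def add(a, b):\n    return a + b\n", 15, 27)
```

At inference time the same layout is used with the middle left empty: the editor supplies the code before and after the cursor, and the model fills in the gap.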

JiRack Corporate Tokenizer Solution

  • Use JiRack models with trusted, high-quality coding datasets while maintaining full control over your code and data privacy.
  • Excellent fit for Banks, Fintech companies, and any organization that requires strict data confidentiality and security.
  • JiRack models can be updated for privacy-sensitive corporate codebases.

JiRack Tokenizer Subscription

  • All subscribed members will receive regular tokenizer updates optimized for the latest high-quality coding datasets.

Install for Llama-compatible models in your chat script

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your_model")
model = AutoModelForCausalLM.from_pretrained("your_model")

# Required: resize the embedding matrix to match the extended vocabulary.
model.resize_token_embeddings(len(tokenizer))

print("New embedding size for your chat script:", model.get_input_embeddings().weight.shape[0])
```
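
The resize step is needed because the added FIM special tokens enlarge the vocabulary beyond the pretrained embedding matrix; without it, the new token IDs would index past the end of the table. A conceptual sketch of what the resize does, in plain Python with no `transformers` dependency (all names here are illustrative):

```python
import random

def resize_embeddings(table, new_vocab_size, dim):
    """Grow an embedding table to `new_vocab_size` rows: existing rows are
    preserved, new rows (for the added special tokens) are randomly initialized."""
    grown = [row[:] for row in table]
    while len(grown) < new_vocab_size:
        grown.append([random.gauss(0.0, 0.02) for _ in range(dim)])
    return grown

old = [[0.0] * 4 for _ in range(10)]   # pretrained 10-token vocab, dim 4
new = resize_embeddings(old, 13, 4)    # 3 new special tokens added
```

The real `resize_token_embeddings` operates on the model's embedding (and tied output) weights the same way: old rows keep their trained values, and only the rows for newly added tokens start from fresh initialization.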
