---
license: other
license_name: cms-manhattan-jirack-v1.2
license_link: LICENSE
---

**JiRack Tokenizer** - Compatible with Llama 3 • Optimized for Code

- A Llama 3-based tokenizer enhanced with FIM (fill-in-the-middle) markers.
- Fully compatible with large open coding datasets such as BigCode's The Stack, the StarCoder training data, and Microsoft's NextCoder.
- Enables efficient training on large-scale code data for better code generation and understanding.

**JiRack Corp Tokenizer Solution**

- Use JiRack models with trusted, high-quality coding datasets while keeping full control over your code and data privacy.
- An excellent fit for banks, fintech companies, and any organization with strict data confidentiality and security requirements.
- Customize the JiRack model for privacy-sensitive corporate codebases.

**JiRack Tokenizer Subscription**

- All subscribed members receive regular tokenizer updates optimized for the latest high-quality coding datasets.

**Installation for Llama-compatible models in your chat script**

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your_model")
model = AutoModelForCausalLM.from_pretrained("your_model")

# Required: resize the embedding matrix to match the extended tokenizer vocabulary
model.resize_token_embeddings(len(tokenizer))

print("New embedding size for your chat script:", model.get_input_embeddings().weight.shape[0])
```
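For readers curious why `resize_token_embeddings` is a must after adding FIM markers, here is a minimal sketch in plain PyTorch of what such a resize does conceptually (toy vocabulary sizes are assumptions, not JiRack's real vocabulary): it allocates a larger embedding matrix, copies the existing rows over, and leaves the rows for the newly added tokens randomly initialized.

```python
import torch

def resize_embeddings(old_emb: torch.nn.Embedding, new_vocab: int) -> torch.nn.Embedding:
    """Grow an embedding matrix to new_vocab rows, preserving the existing rows."""
    new_emb = torch.nn.Embedding(new_vocab, old_emb.embedding_dim)
    with torch.no_grad():
        # Copy the learned rows; rows beyond the old vocab keep their fresh init.
        new_emb.weight[: old_emb.num_embeddings] = old_emb.weight
    return new_emb

# Toy example: vocabulary grows from 100 to 103 after adding three FIM tokens.
old = torch.nn.Embedding(100, 16)
new = resize_embeddings(old, 103)
print(new.weight.shape)  # torch.Size([103, 16])
```

Without this step, token IDs for the new FIM markers would index past the end of the model's embedding table and crash (or silently corrupt) the forward pass.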