JiRack Tokenizer

  • Compatible with Llama 3 • Optimized for Code
  • A Llama 3-based tokenizer enhanced with fill-in-the-middle (FIM) marker tokens.
  • Fully compatible with large-scale public code datasets, including The Stack, StarCoder, and NextCoder.
  • Enables efficient training on large-scale coding data for superior code generation and understanding.
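
FIM training rearranges each code sample into prefix/suffix/middle segments delimited by the marker tokens, so the model learns to complete code in the middle of a file rather than only at the end. A minimal sketch of that transformation (the marker strings below are illustrative placeholders, not the actual JiRack token names, which are not listed here):

```python
# Sketch of fill-in-the-middle (FIM) sample construction.
# Marker strings are placeholders, not the real JiRack special tokens.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"

def make_fim_sample(code: str, start: int, end: int) -> str:
    """Split `code` at [start:end) and emit the prefix-suffix-middle layout."""
    prefix, middle, suffix = code[:start], code[start:end], code[end:]
    # The model is trained to generate `middle` after seeing prefix and suffix.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"

sample = make_fim_sample("def add(a, b):\n    return a + b\n", 15, 27)
```

At inference time the same layout is used with the middle left empty: the editor supplies the code before and after the cursor, and the model fills in the gap.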

JiRack Corporate Tokenizer Solution

  • Use JiRack models with trusted, high-quality coding datasets while maintaining full control over your code and data privacy.
  • Excellent fit for Banks, Fintech companies, and any organization that requires strict data confidentiality and security.
  • JiRack models can be updated for privacy-sensitive corporate codebases.

JiRack Tokenizer Subscription

  • All subscribed members will receive regular tokenizer updates optimized for the latest high-quality coding datasets.

Install for Llama-compatible models in your chat script

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("your_model")
model = AutoModelForCausalLM.from_pretrained("your_model")

# Required: resize the embedding matrix to match the extended vocabulary.
model.resize_token_embeddings(len(tokenizer))

print("New embedding size for your chat script:", model.get_input_embeddings().weight.shape[0])
```
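
The resize step is needed because the added FIM special tokens enlarge the vocabulary beyond the pretrained embedding matrix; without it, the new token IDs would index past the end of the table. A conceptual sketch of what the resize does, in plain Python with no `transformers` dependency (all names here are illustrative):

```python
import random

def resize_embeddings(table, new_vocab_size, dim):
    """Grow an embedding table to `new_vocab_size` rows: existing rows are
    preserved, new rows (for the added special tokens) are randomly initialized."""
    grown = [row[:] for row in table]
    while len(grown) < new_vocab_size:
        grown.append([random.gauss(0.0, 0.02) for _ in range(dim)])
    return grown

old = [[0.0] * 4 for _ in range(10)]   # pretrained 10-token vocab, dim 4
new = resize_embeddings(old, 13, 4)    # 3 new special tokens added
```

The real `resize_token_embeddings` operates on the model's embedding (and tied output) weights the same way: old rows keep their trained values, and only the rows for newly added tokens start from fresh initialization.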
