JiRack Tokenizer
- Compatible with Llama 3 • Optimized for Code
- A Llama 3-based tokenizer enhanced with fill-in-the-middle (FIM) marker tokens (prefix, middle, and suffix).
- Fully compatible with BigCode datasets such as The Stack and the StarCoder training data, as well as Microsoft's NextCoder dataset.
- Enables efficient training on large-scale coding data for superior code generation and understanding.
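FIM markers let a model complete code between an existing prefix and suffix rather than only continuing left-to-right. The sketch below shows how such a prompt is typically assembled; the marker strings are placeholders, not the actual special tokens defined by this tokenizer.

```python
# Placeholder marker names -- substitute the tokenizer's real FIM special tokens.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Ask the model to generate the code that belongs between prefix and suffix."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"

prompt = build_fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
```

The model then generates the "middle" span after the final marker, which is how FIM-trained code models handle in-file insertion.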
JiRack Corp Tokenizer solution
- Use JiRack models with trusted, high-quality coding datasets while maintaining full control over your code and data privacy.
- Excellent fit for Banks, Fintech companies, and any organization that requires strict data confidentiality and security.
- JiRack models can be updated for privacy-sensitive corporate coding workflows.
JiRack Tokenizer Subscription
- All subscribed members will receive regular tokenizer updates optimized for the latest high-quality coding datasets.
Install for Llama-compatible models in your chat script
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(your_model)
model = AutoModelForCausalLM.from_pretrained(your_model)
This step is required:
model.resize_token_embeddings(len(tokenizer))  # len(tokenizer) is the extended vocabulary size
print("New embedding size for your chat script:", model.get_input_embeddings().weight.shape[0])
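The resize call is needed because adding special tokens grows the vocabulary beyond what the base model's embedding matrix covers. A minimal arithmetic sketch, using the Llama 3 base vocabulary size and placeholder FIM token names:

```python
# Adding N special tokens means the embedding matrix needs base_vocab + N rows
# before training or inference; otherwise the new token ids index out of range.
base_vocab = 128_256  # Llama 3 vocabulary size
fim_tokens = ["<fim_prefix>", "<fim_middle>", "<fim_suffix>"]  # placeholder names
new_vocab = base_vocab + len(fim_tokens)
# model.resize_token_embeddings(new_vocab) would allocate new_vocab embedding rows.
```

This is why the printed embedding size after resizing should equal `len(tokenizer)`, not the original checkpoint's vocabulary size.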