Julian-600M-40B-Instruct v0.1

Julian-600M-40B-Instruct is an instruction-tuned version of the Julian-600M-40B base model, produced by supervised fine-tuning (SFT).

Model Details

Parameter        Value
Base Model       Julian-600M-40B (39B pretraining tokens)
Parameters       600M
Architecture     LLaMA-style (RoPE, SwiGLU, RMSNorm)
SFT Training     5,000 steps on 185K instruction examples
Final Loss       1.99 (perplexity 7.34)
Context Length   2,048 tokens
Languages        English (70%), French (30%)
Chat Format      ChatML

Usage

from transformers import AutoModelForCausalLM, LlamaTokenizer
import torch

model_id = "JulianKrgd/julian-600m-40b-instruct-v0.1"

# IMPORTANT: Use LlamaTokenizer, not AutoTokenizer
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Chat format (ChatML)
messages = [
    {"role": "user", "content": "What is the capital of France?"}
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sampled decoding; repetition_penalty mitigates the loops small models are prone to
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
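
For interactive use you may prefer tokens to be printed as they are generated. The snippet below is an illustration (not part of the original card) using transformers' TextStreamer with the same model, tokenizer, and inputs loaded above.

from transformers import TextStreamer

# Stream decoded tokens to stdout as they are produced,
# skipping the prompt and the ChatML special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    repetition_penalty=1.1,
    streamer=streamer
)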

Chat Template (ChatML)

<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant
The capital of France is Paris.<|im_end|>
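
If you need to build the prompt without apply_chat_template (for example with a different tokenizer wrapper), the format above can be reproduced by hand. The helper below is a minimal sketch based on the template shown; build_chatml_prompt is an illustrative name, not part of the model's API.

# Minimal sketch: rebuild the ChatML prompt shown above by hand.
def build_chatml_prompt(messages):
    prompt = ""
    for message in messages:
        prompt += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"  # leave the assistant turn open for generation
    return prompt

prompt = build_chatml_prompt([
    {"role": "user", "content": "What is the capital of France?"}
])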

Training Details

  • Base checkpoint: checkpoint_300000 (39B pretraining tokens)
  • SFT dataset: 185K instruction examples (OpenHermes, OASST, UltraChat)
  • Training steps: 5,000
  • Learning rate: 2e-5 with a cosine schedule
  • Effective batch size: 32 (see the configuration sketch after this list)
  • Hardware: TPU v5e-4 (Google Cloud)
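
The card does not state which training framework was used (the run was on TPU v5e), so the snippet below is only a hypothetical reconstruction of the listed hyperparameters using Hugging Face TrainingArguments; the output directory, the 8x4 batch split, and the bf16 flag are assumptions.

from transformers import TrainingArguments

# Hypothetical SFT configuration mirroring the hyperparameters listed above.
# The actual run used a TPU v5e-4 and may have used a different stack entirely.
args = TrainingArguments(
    output_dir="julian-600m-40b-instruct-sft",  # assumed name
    max_steps=5_000,                            # 5,000 SFT steps
    learning_rate=2e-5,                         # peak learning rate
    lr_scheduler_type="cosine",                 # cosine schedule
    per_device_train_batch_size=8,              # assumed split giving an
    gradient_accumulation_steps=4,              # effective batch size of 32
    bf16=True,                                  # assumed; matches the bfloat16 inference dtype
    logging_steps=50,
)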

Benchmarks

Model                            HellaSwag   PIQA    LAMBADA
Julian-600M-10B (Base)           45.8%       67.6%   35.0%
Julian-600M-40B (Base)           53.5%       66.8%   37.3%
Julian-600M-10B-Instruct v0.1    42.7%       66.2%   34.6%
Julian-600M-40B-Instruct v0.1    TBD         TBD     TBD
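
The card does not say how these scores were computed. A common way to obtain comparable zero-shot numbers is EleutherAI's lm-evaluation-harness; the sketch below assumes its v0.4 Python API (simple_evaluate) and the lambada_openai task variant, which may differ from the original setup.

import lm_eval

# Sketch: zero-shot evaluation with lm-evaluation-harness (assumed v0.4 API).
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=JulianKrgd/julian-600m-40b-instruct-v0.1,dtype=bfloat16",
    tasks=["hellaswag", "piqa", "lambada_openai"],
    batch_size=8,
)
print(results["results"])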

Limitations

  • Small model (600M) with limited knowledge capacity
  • May generate incorrect or repetitive information
  • Works best with simple, direct instructions
  • Not suitable for production use

Links


Trained with support from the Google TPU Research Cloud (TRC) program.
