iko-2 (355M)

iko-2 is the second model in the iko series. It is a GPT-2 Medium (355M parameters) language model that combines:

  1. iko-1 knowledge (GPT-2 124M fine-tuned on 700K FineWeb documents) via distillation
  2. Reddit conversational style from the Dolma v1.6 Reddit corpus

Training Details

Architecture

  • Base model: GPT-2 Medium (355M parameters)
  • Training method: 4-bit QLoRA with gradient checkpointing
  • LoRA config: r=32, alpha=64, targets: ['c_attn', 'c_proj', 'c_fc']
  • Merge strategy: TIES (Trim, Elect Sign, and Merge) with 80% density
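
The LoRA configuration above could be expressed with the peft and bitsandbytes libraries roughly as follows. This is a sketch under the assumption of standard peft/transformers APIs; the actual training script is not published, and any argument not listed in the card (such as the compute dtype) is an assumption.

```python
# Assumed 4-bit QLoRA setup matching the card's stated config;
# the real training script may differ in details not listed above.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # assumption: dtype not stated in the card
)
model = AutoModelForCausalLM.from_pretrained(
    "gpt2-medium", quantization_config=bnb_config
)
model.gradient_checkpointing_enable()

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    target_modules=["c_attn", "c_proj", "c_fc"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```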

Training Data

  • Reddit Dolma v1.6 (~10,000 examples, 85% of the training mix)
  • iko-1 distillation corpus (~1,800 synthetic examples, 15% replay)
  • SuRe (Synthetic Replay) used to prevent catastrophic forgetting
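
The replay share implied by those counts can be checked in a couple of lines (`replay_share` is a hypothetical helper, using the approximate counts above):

```python
# ~10,000 Reddit examples plus ~1,800 iko-1 replay examples
# works out to roughly a 15% replay share, as stated above.
reddit_n, replay_n = 10_000, 1_800

def replay_share(reddit: int, replay: int) -> float:
    """Fraction of the training mix drawn from the replay corpus."""
    return replay / (reddit + replay)

print(f"{replay_share(reddit_n, replay_n):.1%}")  # 15.3%
```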

Hyperparameters

  • Learning rate: 4e-05 with cosine schedule
  • Layer-wise LR: embeddings 0.1×, bottom 0.3×, middle 1.0×, top 0.8×
  • Warmup: 80 steps
  • Effective batch size: 16
  • Sequence length: 512
  • Optimizer: 8-bit AdamW
  • Training time: 15 minutes on T4 GPU
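
One way to realize the layer-wise learning-rate multipliers is to map each parameter name to a group before building optimizer parameter groups. The sketch below assumes GPT-2 Medium's 24 blocks split as bottom (0–7), middle (8–15), and top (16–23); the card only states the multipliers, so the split boundaries and the `bucket`/`lr_for` helpers are illustrative.

```python
# Hypothetical layer-wise LR helper for GPT-2 parameter names.
# The bottom/middle/top block ranges are an assumption; the card
# gives only the multipliers (0.1x / 0.3x / 1.0x / 0.8x).
import re

BASE_LR = 4e-5
MULTIPLIERS = {"embeddings": 0.1, "bottom": 0.3, "middle": 1.0, "top": 0.8}

def bucket(param_name: str) -> str:
    """Map a GPT-2 parameter name to one of the four LR groups."""
    if ".wte." in param_name or ".wpe." in param_name:
        return "embeddings"
    m = re.search(r"\.h\.(\d+)\.", param_name)
    if m:
        layer = int(m.group(1))
        return "bottom" if layer < 8 else "middle" if layer < 16 else "top"
    return "middle"  # e.g. the final layer norm

def lr_for(param_name: str) -> float:
    return BASE_LR * MULTIPLIERS[bucket(param_name)]
```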

Knowledge Transfer Pipeline

GPT-2 (124M) → [FineWeb fine-tune] → iko-1
                                         ↓ distillation
GPT-2 Medium (355M) → [QLoRA + Reddit + Replay] → [TIES merge] → iko-2
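
The TIES merge at the end of the pipeline can be illustrated on flat lists of weight deltas. This is a toy sketch, not the production merge (which operates on full model tensors): here "80% density" is read as keeping the largest-magnitude 80% of entries per delta, and `ties_merge` is an illustrative function name.

```python
# Toy TIES merge on flat weight deltas: Trim each delta to the given
# density, Elect the majority sign per position, then Merge (average)
# only the values that agree with the elected sign.
def ties_merge(deltas, density=0.8):
    n = len(deltas[0])
    keep = max(1, int(round(density * n)))
    trimmed = []
    for d in deltas:
        # Trim: zero out all but the `keep` largest-magnitude entries.
        ranked = set(sorted(range(n), key=lambda i: abs(d[i]), reverse=True)[:keep])
        trimmed.append([d[i] if i in ranked else 0.0 for i in range(n)])
    merged = []
    for i in range(n):
        col = [t[i] for t in trimmed]
        # Elect sign: the sign of the summed values at this position.
        sign = 1.0 if sum(col) >= 0 else -1.0
        agree = [v for v in col if v * sign > 0]
        # Merge: average the values agreeing with the elected sign.
        merged.append(sum(agree) / len(agree) if agree else 0.0)
    return merged
```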

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model and its tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("iko-01/iko-002")
tokenizer = AutoTokenizer.from_pretrained("iko-01/iko-002")

# Sample a continuation with temperature-based sampling
input_text = "The best thing about learning is"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Model Series

Model   Parameters   Training Data                 Method
iko-1   124M         FineWeb (700K docs)           QLoRA on GPT-2
iko-2   355M         Reddit + iko-1 distillation   QLoRA + TIES merge on GPT-2 Medium

Limitations

  • This model inherits biases present in Reddit data and GPT-2's pretraining corpus
  • Not suitable for production use without additional safety fine-tuning
  • Generated text may contain informal language reflecting Reddit's conversational style

License

Apache 2.0
