Kybalion-1B

Kybalion-1B is a 1B-parameter language model built on top of Llama 3.2 1B through a full Continued Pre-Training (CPT) → Supervised Fine-Tuning (SFT) pipeline, trained entirely on a Google Colab A100 GPU.

Why "Kybalion"? The model was originally developed under the internal codename Prometheus-1B, but was renamed to Kybalion-1B before public release to avoid confusion with an existing model of the same name on HuggingFace. Kybalion refers to the ancient hermetic text symbolizing hidden knowledge, a fitting name for a model focused on education, mathematics, science, and code.


πŸ† Key Highlights

  • Beats Llama-3.2-1B-Instruct on HellaSwag (63.8% vs 61.1%) and ties on WinoGrande (62.4%)
  • 4.5× GSM8K improvement over TinyLlama-1.1B (10.8% vs 2.4%): math pretraining works
  • Outperforms TinyLlama-1.1B on all 6 benchmarks
  • Trained by a single undergraduate student on consumer cloud hardware

🔬 Key Contributions

  • Demonstrates that domain-balanced continued pretraining on curated multi-domain data (education, math, code, science) yields consistent improvements across commonsense reasoning benchmarks in 1B-scale models
  • Suggests that multi-step mathematical reasoning remains a fundamental bottleneck for 1B-scale models, even when combining math-focused pretraining (OpenWebMath) with instruction tuning (MetaMathQA)
  • Provides a fully reproducible, compute-efficient training recipe (CPT → LoRA SFT) built and executed by a single undergraduate student in under one week, demonstrating that meaningful LLM research is achievable without institutional resources or large teams

📊 Benchmark Results

All scores measured with lm-evaluation-harness under identical conditions (same prompts, same few-shot settings, same hardware).

| Benchmark | TinyLlama-1.1B | Llama-3.2-1B-Instruct | Kybalion-1B |
|---|---|---|---|
| MMLU | 25.0% | 46.1% | 32.0% |
| ARC-C | 37.2% | 41.5% | 37.6% |
| GSM8K | 2.4% | 33.5% | 10.8% |
| HellaSwag | 61.2% | 61.1% | 63.8% 🏆 |
| WinoGrande | 61.8% | 62.4% | 62.4% 🏆 |
| TruthfulQA | 37.4% | 43.3% | 40.0% |

πŸ† = outperforms Llama-3.2-1B-Instruct All evaluations run with lm_eval.simple_evaluate(), bfloat16, batch_size=8, A100 GPU.


🔧 Training Pipeline

Phase 1: Continued Pre-Training (CPT)

Fine-tuned the base weights of meta-llama/Llama-3.2-1B on ~3.5B tokens of curated multi-domain data.

| Domain | Dataset | Ratio | Purpose |
|---|---|---|---|
| Education | FineWeb-Edu (score ≥ 3.0) | 35% | General knowledge & reasoning |
| Mathematics | OpenWebMath | 20% | Mathematical reasoning |
| Code | StarCoderData (Python) | 15% | Code generation |
| Textbook | Cosmopedia web_samples_v2 | 15% | Structured knowledge |
| Science | Cosmopedia stanford | 10% | Scientific reasoning |
| Story | Cosmopedia stories | 5% | Language fluency |
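
The mixing ratios above can be realized at training time with weighted sampling over the source datasets. A minimal sketch, assuming a simple per-document sampler (the dataset keys and helper name here are illustrative, not taken from the actual training code):

```python
import random

# Mixing ratios from the table above (domain -> sampling weight).
# Keys are illustrative shorthand, not the real dataset loader names.
DOMAIN_WEIGHTS = {
    "fineweb_edu": 0.35,
    "openwebmath": 0.20,
    "starcoder_python": 0.15,
    "cosmopedia_web": 0.15,
    "cosmopedia_stanford": 0.10,
    "cosmopedia_stories": 0.05,
}

def sample_domain(rng: random.Random) -> str:
    """Pick the domain to draw the next document from, proportional to its ratio."""
    domains = list(DOMAIN_WEIGHTS)
    weights = list(DOMAIN_WEIGHTS.values())
    return rng.choices(domains, weights=weights, k=1)[0]

rng = random.Random(0)
counts = {d: 0 for d in DOMAIN_WEIGHTS}
for _ in range(10_000):
    counts[sample_domain(rng)] += 1
# With 10k draws, the empirical mix closely tracks the target ratios.
```

Sampling per document (rather than concatenating fixed shards) keeps the domain mix stable across the whole token budget.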

Training config:

  • Hardware: Google Colab A100 80GB
  • Optimizer: AdamW, LR = 2e-5, Cosine decay, Warmup = 1000 steps
  • Precision: BF16
  • Effective batch size: 32 (4 × 8 grad accum)
  • Sequence length: 2048 (packed)
  • Framework: HuggingFace transformers.Trainer (no Unsloth)
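
The packed 2048-token sequences mentioned above can be produced with a simple concatenate-and-chunk scheme: tokenized documents are joined with an EOS separator and cut into fixed-length windows so no compute is wasted on padding. A sketch under stated assumptions (the `eos_id` value and function name are illustrative, not from the training script):

```python
from typing import Iterable, Iterator

SEQ_LEN = 2048  # matches the CPT sequence length above

def pack_sequences(token_streams: Iterable[list[int]], seq_len: int = SEQ_LEN,
                   eos_id: int = 128001) -> Iterator[list[int]]:
    """Concatenate tokenized documents (separated by EOS) and cut
    fixed-length training sequences, so no padding is wasted."""
    buffer: list[int] = []
    for tokens in token_streams:
        buffer.extend(tokens)
        buffer.append(eos_id)  # document boundary marker
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]
    # Leftover tokens shorter than seq_len are dropped.

# Toy example: three "documents" of dummy token ids, packed into 1024-token chunks.
docs = [[1] * 1500, [2] * 3000, [3] * 700]
chunks = list(pack_sequences(docs, seq_len=1024))
```

Documents can straddle chunk boundaries under this scheme; the EOS separator lets the model learn where one document ends and the next begins.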

Phase 2: Supervised Fine-Tuning (SFT)

Applied LoRA adapters to teach instruction-following, then merged into base weights.

| Dataset | Size | Purpose |
|---|---|---|
| OpenHermes 2.5 | 100K | General instruction following |
| MetaMathQA | 50K | Mathematical reasoning (GSM8K boost) |
| CodeAlpaca | 20K | Code generation |

SFT config:

  • Method: LoRA (r=64, α=128, dropout=0.05)
  • Target modules: q/k/v/o/gate/up/down proj (all linear layers)
  • LR = 1e-4, Epochs = 3, Cosine decay
  • Merged with PeftModel.merge_and_unload() for standalone deployment
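
The adapter setup above maps directly onto a peft LoraConfig. A minimal sketch, assuming the standard peft API; the task_type value and variable names are assumptions, not taken from the actual SFT script:

```python
from peft import LoraConfig

# Sketch of the SFT adapter configuration described above.
lora_config = LoraConfig(
    r=64,                 # LoRA rank
    lora_alpha=128,       # scaling factor (alpha = 2 * r here)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # all linear layers
    task_type="CAUSAL_LM",  # assumption: standard causal-LM task type
)
# After training: wrap the base model with get_peft_model(base_model, lora_config),
# then merge_and_unload() produces standalone weights, as in the recipe above.
```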

💻 Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("devwoo/Kybalion-1B")
model = AutoModelForCausalLM.from_pretrained(
    "devwoo/Kybalion-1B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

def chat(user_message, system="You are a helpful and knowledgeable AI assistant."):
    # Llama 3 chat template, written out explicitly
    prompt = (
        f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        f"<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=512,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            eos_token_id=tokenizer.convert_tokens_to_ids("<|eot_id|>"),
        )
    # Decode only the newly generated tokens, skipping the prompt
    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

print(chat("Explain the Pythagorean theorem and give an example."))
print(chat("Write a Python function to check if a number is prime."))
```

📦 GGUF Version

A quantized GGUF q4_k_m version is available at devwoo/Kybalion-1B-GGUF for CPU/mobile inference with llama.cpp or Ollama.

```shell
# With llama.cpp
./llama-cli -m Kybalion-1B-q4_k_m.gguf -p "Explain quantum computing." -n 256
```
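
For Ollama, the GGUF file can be wrapped in a minimal Modelfile (the model name `kybalion` used below is arbitrary, not an official tag):

```
FROM ./Kybalion-1B-q4_k_m.gguf
```

Then build and run it with `ollama create kybalion -f Modelfile` followed by `ollama run kybalion`.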

⚠️ Limitations

  • 1B parameters: smaller than most production models; may struggle with complex multi-step reasoning
  • Not RLHF-aligned; may occasionally produce unhelpful or inconsistent responses
  • English-only training data
  • GSM8K score (10.8%) reflects room for improvement in math reasoning compared to larger models

📄 License

This model is derived from meta-llama/Llama-3.2-1B and follows the Llama 3.2 Community License. Training datasets are used under their respective open licenses.
