BUVN-2.0 is a 109.5 million parameter GPT-style decoder-only transformer language model, built entirely from scratch: no pretrained weights, no fine-tuning shortcuts. It was trained on 2 billion tokens from the C4 dataset on a single NVIDIA H100 NVL GPU in approximately 2 hours.

It is the foundation model of the Beuvian AI Ecosystem, a family of three specialized models:
> *"Don't just use AI. Understand it. Build it. Own it."*
## Model Performance

### WikiText-103 Perplexity Leaderboard
| Rank | Model | Organization | Parameters | PPL (↓) | Training Tokens |
|---:|---|---|---|---:|---|
| 1 | LLaMA-2 7B | Meta | 7B | 5.47 | 2T |
| 2 | LLaMA 7B | Meta | 7B | 7.73 | 1T |
| 3 | Pythia-1B | EleutherAI | 1B | 16.71 | 300B |
| 4 | GPT-2 Large | OpenAI | 774M | 19.93 | ~40B |
| 5 | GPT-2 Medium | OpenAI | 355M | 22.76 | ~40B |
| 6 | OPT-125M | Meta | 125M | 27.65 | 300B |
| 7 | RWKV-169M | RWKV | 169M | 29.01 | 300B |
| 8 | **BUVN-2.0 (this model)** | Bhuvan | 109.5M | 29.19 | 2B |
| 9 | Pythia-160M | EleutherAI | 160M | 29.33 | 300B |
| 10 | GPT-2 Small | OpenAI | 124M | 29.41 | ~40B |
| 11 | GPT-Neo 125M | EleutherAI | 125M | 32.43 | 300B |
BUVN-2.0 edges out GPT-2 Small (29.19 vs. 29.41 PPL) with about 12% fewer parameters (109.5M vs. 124M) and roughly 20x less training data (2B vs. ~40B tokens). The architecture is competitive; the gap to the higher-ranked models is purely a matter of scale.
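For reference, the perplexity reported above is the exponential of the mean per-token negative log-likelihood. A minimal sketch (windowing and tokenization details for WikiText-103 evaluation vary between setups):

```python
import math

def corpus_perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# Sanity check: a model assigning uniform probability over a 32K vocab
# has per-token NLL ln(32000), and therefore perplexity 32000.
print(round(corpus_perplexity([math.log(32000)] * 8)))
```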
## Data Pipeline

```
C4 Dataset (HuggingFace)
    ↓  8 parallel stream workers (no download, 1.48M tok/s)
BPE Tokenizer (32K vocab, trained on 100K samples in 14s)
    ↓  tokenize in memory
Binary files: train.bin (3.8 GB) + val.bin (20 MB), 2.0 billion tokens total
    ↓
Memory-mapped DataLoader → GPU (zero-copy I/O)
```
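The memory-mapped loading step can be sketched roughly as below. The `uint16` dtype is an assumption (a 32K vocab fits in 16 bits); the actual layout of `train.bin` may differ.

```python
import numpy as np

def get_batch(path, batch_size, block_size, rng=None):
    """Sample (input, target) windows from a flat token binary via np.memmap.

    The file is never read into RAM up front; the OS pages in only the
    slices that are touched (the "zero-copy I/O" mentioned above).
    """
    rng = rng or np.random.default_rng()
    data = np.memmap(path, dtype=np.uint16, mode='r')   # view, not a copy
    ix = rng.integers(0, len(data) - block_size, size=batch_size)
    x = np.stack([data[i:i + block_size] for i in ix]).astype(np.int64)
    # Targets are the inputs shifted one position to the right.
    y = np.stack([data[i + 1:i + 1 + block_size] for i in ix]).astype(np.int64)
    return x, y
```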
## Training Configuration

| Setting | Value |
|---|---|
| Optimizer | AdamW |
| Peak LR | 6×10⁻⁴ |
| Min LR | 6×10⁻⁵ |
| Schedule | Cosine decay with 500-step warmup |
| Batch Size | 64 × 2 gradient accumulation = 128 |
| Tokens/Iteration | 131,072 |
| Total Steps | 15,000 |
| Total Tokens | ~2 billion |
| Precision | bfloat16 |
| Compiler | torch.compile (1.5x speedup) |
| Weight Decay | 0.1 |
| Grad Clip | 1.0 |
| Beta1 / Beta2 | 0.9 / 0.95 |
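The schedule above (linear warmup for 500 steps, then cosine decay from the peak to the minimum LR) can be sketched as follows. As a side note, 131,072 tokens/iteration is consistent with 128 sequences at a 1024-token context (128 × 1024), though the context length itself is not listed in the table, so treat that as an inference.

```python
import math

def get_lr(step, peak_lr=6e-4, min_lr=6e-5, warmup=500, max_steps=15000):
    """Cosine LR decay with linear warmup, using the values from the table."""
    if step < warmup:
        return peak_lr * (step + 1) / warmup           # linear ramp to peak
    t = (step - warmup) / (max_steps - warmup)          # 0 -> 1 over decay phase
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * t))
```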
## Hardware

| Component | Spec |
|---|---|
| GPU | NVIDIA H100 NVL (96 GB VRAM) |
| CPU | AMD EPYC 9V84 96-Core (40 vCPUs) |
| RAM | 314 GB |
| PyTorch | 2.9.1 + CUDA 12.8 |
## Usage

### Download and Run

```bash
# 1. Clone the repo
git clone https://github.com/bhuvan0808/beuvian.git
cd beuvian/BUVN-1.1
pip install -r requirements.txt

# 2. Download weights from this HuggingFace repo
python scripts/load_from_hub.py

# 3. Generate text
python inference/generate.py \
    --prompt "The future of artificial intelligence" \
    --checkpoint checkpoints/buvn_2.0_best.pt \
    --tokenizer tokenizer/tokenizer_32k.json \
    --max_new_tokens 150 \
    --temperature 0.7 \
    --top_k 50
```
### Load in Python

```python
import torch

from model.config import BUVNConfig
from model.model import BUVNModel

# Load checkpoint
ckpt = torch.load('buvn_2.0_best.pt', map_location='cuda', weights_only=False)

# Strip the '_orig_mod.' prefix that torch.compile adds to state-dict keys
state_dict = ckpt['model']
for k in list(state_dict.keys()):
    if k.startswith('_orig_mod.'):
        state_dict[k[len('_orig_mod.'):]] = state_dict.pop(k)

# Build model
config = BUVNConfig.from_dict(ckpt['model_args'])
model = BUVNModel(config).cuda()
model.load_state_dict(state_dict)
model.eval()

# Generate (assumes `tokenizer` was loaded from tokenizer/tokenizer_32k.json)
from inference.sample import generate

text, usage = generate(model, tokenizer, "Your prompt here",
                       max_new_tokens=100, temperature=0.7, top_k=50, device='cuda')
print(text)
```
### API Server

```bash
python api/app.py \
    --checkpoint checkpoints/buvn_2.0_best.pt \
    --tokenizer tokenizer/tokenizer_32k.json \
    --port 8000

# Test with curl:
curl -X POST http://localhost:8000/generate \
    -H "Content-Type: application/json" \
    -d '{"prompt": "The history of science", "max_tokens": 100, "temperature": 0.7}'
```
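The same endpoint can be called from Python's standard library. The field names are taken from the curl example above; the shape of the JSON response is an assumption, so inspect it before relying on specific keys.

```python
import json
from urllib.request import Request, urlopen

API_URL = "http://localhost:8000/generate"  # endpoint from the curl example

def build_payload(prompt, max_tokens=100, temperature=0.7):
    """JSON body matching the fields the curl example sends."""
    return json.dumps({"prompt": prompt, "max_tokens": max_tokens,
                       "temperature": temperature})

def generate(prompt, **kwargs):
    """POST a generation request; requires the server from api/app.py to be running."""
    req = Request(API_URL, data=build_payload(prompt, **kwargs).encode(),
                  headers={"Content-Type": "application/json"})
    with urlopen(req) as resp:
        return json.loads(resp.read())
```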
### Sampling Parameters

| Parameter | Range | Default | Effect |
|---|---|---|---|
| `temperature` | 0.0 – 2.0 | 0.7 | 0 = deterministic; higher = more creative |
| `top_k` | 0 – 32000 | 50 | Only sample from the top K most likely tokens |
| `top_p` | 0.0 – 1.0 | off | Nucleus sampling (adaptive token filtering) |
| `max_tokens` | 1 – 1024 | 100 | Maximum generation length |
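A minimal sketch of how `temperature` and `top_k` interact at each decoding step. This is a NumPy stand-in for illustration, not the repo's actual sampler, which may differ in details:

```python
import numpy as np

def sample_next(logits, temperature=0.7, top_k=50, rng=None):
    """Pick the next token id from a logits vector via temperature + top-k sampling."""
    logits = np.asarray(logits, dtype=np.float64)
    if temperature == 0.0:
        return int(np.argmax(logits))            # deterministic: greedy argmax
    rng = rng or np.random.default_rng()
    logits = logits / temperature                # <1 sharpens, >1 flattens
    if top_k and top_k < len(logits):
        cutoff = np.sort(logits)[-top_k]         # k-th largest logit
        logits = np.where(logits < cutoff, -np.inf, logits)  # mask the tail
    probs = np.exp(logits - logits.max())        # stable softmax
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```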
## Sample Outputs

**Prompt:** "The history of artificial intelligence began"

> The number of people living with heart disease in the United States is projected to increase by nearly 20 million every year, according to the Centers for Disease Control and Prevention. The Centers for Disease Control and Prevention (CDC) created the National Heart Disease Prevention and Control Program in 2007, the American Heart Association (AHA) released its findings on March 25, 2018...

**Prompt:** "The president of the United States announced"

> Here at The Ritz and Suites, we are proud to offer a variety of unique and unique packages. Our experienced staff is here to help you find the perfect vacation, getaway or special event. Treat yourself to a luxurious vacation in the comfort of your own home!

**Prompt:** "In a groundbreaking study published today"

> If you are having a dental emergency, you may be wondering how to get the most out of your dental treatment, right? Well, that's where the dental implant comes in. The dental implant is the most extensive prosthetic bone in the world...

**Note:** The model generates fluent, grammatically correct web-text. It does not follow prompt topics because it has not been instruction-tuned yet. This is expected behavior for a foundation model; instruction tuning (SFT) is the planned next step.
## The Beuvian Ecosystem

```mermaid
graph LR
    A["Raw Text<br/>C4 (2B tokens)"] -->|Pre-training| B["BUVN-2.0<br/>Foundation"]
    B -->|Fine-tune on Code| C["SRVN<br/>Code Agent"]
    B -->|Train on Markets| D["MNI<br/>Finance"]
    style A fill:#1a1a2e,stroke:#16c79a,color:#fff
    style B fill:#0d1117,stroke:#58a6ff,color:#fff,stroke-width:3px
    style C fill:#0d1117,stroke:#f39c12,color:#fff,stroke-width:2px
    style D fill:#0d1117,stroke:#bc6ff1,color:#fff,stroke-width:2px
```
| Model | Role | Status | Description |
|---|---|---|---|
| BUVN | Foundation | Released | General language model; the base for everything |
| SRVN | Code Agent | Planned | Fine-tuned on code (The Stack v2), agentic workflows |
| MNI | Finance | Planned | Trained on market data, SEC filings, sentiment analysis |
## Roadmap

- Instruction Tuning (SFT) on OpenAssistant + Alpaca
- SRVN: code agent fine-tuning
- MNI: finance model training
- RLHF / DPO alignment
- Chat UI deployment
- HuggingFace Spaces demo
## Files in This Repository

| File | Size | Description |
|---|---|---|
| `buvn_2.0_best.pt` | 1.31 GB | Model checkpoint (109.5M params, trained 15K steps) |
| `tokenizer_32k.json` | 2.2 MB | 32K byte-level BPE tokenizer (trained on C4) |
| `config.json` | ~200 B | Model hyperparameters |
| `README.md` | – | This model card |
## Citation

```bibtex
@misc{buvn2026,
  title={BUVN-2.0: A Foundation Language Model Built From Scratch},
  author={Bhuvan},
  year={2026},
  url={https://huggingface.co/bhuvan0808/buvn-2.0},
  note={109.5M parameter decoder-only transformer, PPL 29.19 on WikiText-103}
}
```