GPT-1900 Contradiction RL v6
GPT-1900 trained with reinforcement learning to reason about physics contradictions — experimental observations that challenge its pre-1900 understanding of the world.
When presented with evidence for the photoelectric effect, blackbody radiation, or radioactive decay, this model must reason through why its 19th-century physics fails to explain the observations.
Physics eval score: 0.58. An earlier milestone in the RL training progression. See v11 for the best-performing model.
Training
- Method: REINFORCE with EMA coherence curriculum, no scaffold
- Base: Physics SFT checkpoint
- Step: 385 (peak eval)
- Eval data: mhla/gpt1900-contradiction-eval
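The REINFORCE objective can be sketched as follows. This is a toy illustration, not the actual training code; the baseline argument stands in for whatever variance-reduction term the run used (the "EMA coherence curriculum" above is a separate mechanism and is not modeled here):

```python
import torch

def reinforce_loss(logprobs: torch.Tensor, rewards: torch.Tensor, baseline: float = 0.0) -> torch.Tensor:
    """Toy REINFORCE: reward-weighted negative log-likelihood of sampled completions.

    logprobs: (batch,) summed log-probabilities of each sampled completion
    rewards:  (batch,) scalar physics-reasoning rewards
    baseline: scalar baseline (e.g. a running mean of rewards) to reduce variance
    """
    advantages = rewards - baseline
    # Negative sign: minimizing this ascends E[advantage * log pi]
    return -(advantages * logprobs).mean()

loss = reinforce_loss(torch.tensor([-2.0, -1.5]), torch.tensor([1.0, 0.0]), baseline=0.5)
```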
Training Chain
GPT-1900 base (22B tokens pre-1900 text)
→ Physics CLM (continued pretraining on physics texts)
→ Physics SFT
→ Contradiction RL v6 ← you are here
Architecture
Custom GPT with RoPE, QK-norm, ReLU² activation, value embeddings (ResFormer), and per-layer residual/skip scalars. Built with the nanochat framework.
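Two of the less common choices above can be sketched in a few lines. ReLU² is simply a squared ReLU; the QK-norm shown here is an RMS normalization of queries and keys per head, though the exact variant nanochat uses (RMS vs. L2, learned scales) is an assumption:

```python
import torch

def relu2(x: torch.Tensor) -> torch.Tensor:
    """Squared ReLU: max(x, 0)^2, used as the MLP activation."""
    return torch.relu(x).square()

def qk_norm(q: torch.Tensor, k: torch.Tensor, eps: float = 1e-6):
    """RMS-normalize queries and keys along the head dimension before attention."""
    q = q / (q.pow(2).mean(-1, keepdim=True) + eps).sqrt()
    k = k / (k.pow(2).mean(-1, keepdim=True) + eps).sqrt()
    return q, k
```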
| Parameter | Value |
|---|---|
| Parameters | 3.29B |
| Layers | 34 |
| Hidden dim | 2176 |
| Attention heads | 17 (query) / 17 (kv) |
| Head dim | 128 |
| Context length | 2048 tokens |
| Vocab size | 32,768 (BPE, GPT-4 style split pattern) |
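The table maps onto a config roughly like the following. The field names here are hypothetical stand-ins; the real schema is `GPTConfig` in `nanochat.gpt`, loaded from `meta_000385.json` as shown in Quick Start below:

```python
from dataclasses import dataclass

@dataclass
class GPT1900Config:
    # Illustrative stand-in for nanochat's GPTConfig; field names are assumptions
    n_layer: int = 34
    n_head: int = 17        # query heads
    n_kv_head: int = 17     # key/value heads (1:1 with query heads, no grouped-query sharing)
    head_dim: int = 128
    n_embd: int = 2176      # hidden dim = n_head * head_dim = 17 * 128
    sequence_len: int = 2048
    vocab_size: int = 32768
```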
Notes
Generation parameters: You may need to experiment with the temperature to get good results. The chat default is 0.6; the physics eval uses 0.7 with top_k=50.
This is an RL model trained with physics reasoning rewards. It should only be expected to perform well on physics prompts. For general conversation, use the SFT model (gpt1900-instruct-v3-sft).
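The eval-time settings (temperature 0.7, top_k=50) correspond to standard top-k sampling. A minimal sketch of that logic, not the nanochat generation loop itself:

```python
import torch

def sample_next(logits: torch.Tensor, temperature: float = 0.7, top_k: int = 50) -> torch.Tensor:
    """Top-k sampling with the physics-eval settings noted above."""
    v, _ = torch.topk(logits, top_k)
    # Mask everything below the k-th largest logit, then temperature-scale
    logits = logits.masked_fill(logits < v[..., -1, None], float("-inf"))
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```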
Quick Start
```python
import torch, json
from nanochat.gpt import GPT, GPTConfig
from nanochat.tokenizer import RustBPETokenizer

tokenizer = RustBPETokenizer.from_directory("tokenizer")

# Load the model config saved alongside the checkpoint
with open("meta_000385.json") as f:
    meta = json.load(f)
config = GPTConfig(**meta["model_config"])

# Build on the meta device, then materialize empty weights on GPU
with torch.device("meta"):
    model = GPT(config)
model.to_empty(device="cuda")
model.init_weights()

# Strip the torch.compile "_orig_mod." prefix before loading
state_dict = torch.load("model_000385.pt", map_location="cuda")
state_dict = {k.removeprefix("_orig_mod."): v for k, v in state_dict.items()}
model.load_state_dict(state_dict, strict=True, assign=True)
model.eval()
```
Chat
```python
bos = tokenizer.get_bos_token_id()
user_start = tokenizer.encode_special("<|user_start|>")
user_end = tokenizer.encode_special("<|user_end|>")
assistant_start = tokenizer.encode_special("<|assistant_start|>")

# Prompt layout: <bos><|user_start|>question<|user_end|><|assistant_start|>
tokens = [bos, user_start]
tokens += tokenizer.encode("What is the nature of light?")
tokens += [user_end, assistant_start]

# Stream the reply token by token under bfloat16 autocast
with torch.amp.autocast(device_type="cuda", dtype=torch.bfloat16):
    for token in model.generate(tokens, max_tokens=500, temperature=0.8):
        print(tokenizer.decode([token]), end="", flush=True)
```
Dependencies
```
torch>=2.9
tiktoken
rustbpe
```
Related
- mhla/pre1900-corpus — Pre-1900 training corpus with metadata
- mhla/gpt1900-physics-clm — Physics texts for continued pretraining
- mhla/gpt1900-instruct-v3-data — Instruction-tuning conversation pairs
- mhla/gpt1900-contradiction-eval — Physics contradiction evaluation problems