---
license: apache-2.0
language:
- en
library_name: pytorch
tags:
- physics
- next-frame-prediction
- gpt
- mup
- rigid-body-dynamics
- icml-2026
---

# gpt-physics

A small GPT trained from scratch to predict 2D rigid body physics trajectories. Part of an ICML-2026 study on whether language models can learn physical dynamics from text-encoded simulation data.

## Model details

- **Architecture**: 6-layer GPT, learned positional embeddings, tied LM head
- **Tokenizer**: digit-level `PhysicsTokenizer` (custom)
- **Scaling**: muP for hyperparameter transfer
- **Training**: curriculum learning over 5 difficulty stages
- **Task**: autoregressive next-frame prediction over 200-frame rigid-body scenes
- **Domain**: 2D rigid body dynamics simulated with Pymunk / Chipmunk2D

## Files

- `best_model.pt`: best validation checkpoint (~69 MB)
- `checkpoint_latest.pt`: latest training step (~158 MB)
- `checkpoint_epoch0_step500.pt`: early checkpoint (~158 MB)

State dicts contain raw `transformer.*` and `lm_head.*` keys for a stock 6-layer GPT. Load them with the project's `src/scratch/gpt.py` model class.

## Training data

Trained on ~900K scenes across 24 "seen" scenario types (collisions, stacking, ramps, constraints, mini-games, complex). See [physics-scenarios-packed](https://huggingface.co/datasets/AlexWortega/physics-scenarios-packed) and [physics-scenarios-raw](https://huggingface.co/datasets/AlexWortega/physics-scenarios-raw).

## Intended use

Research on whether autoregressive LMs can internalize physical dynamics. Not intended for production physics simulation; use Pymunk for that.

## Citation

ICML-2026 submission (in progress).
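
## Example: checking checkpoint key prefixes

The Files section says the state dicts hold raw `transformer.*` and `lm_head.*` keys. Before passing a checkpoint to the model class, it can help to confirm that this is what a given file contains. The sketch below is one way to do that, not the project's own loading code. The helper names `count_key_prefixes` and `unexpected_prefixes` are hypothetical, and the key names in `example_keys` are illustrative stand-ins. This card does not say whether the larger checkpoints wrap the weights alongside other entries, so the helper reports any other top-level prefix instead of assuming a layout.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def count_key_prefixes(keys: Iterable[str]) -> Counter:
    """Count parameter names by top-level prefix (the text before the first dot)."""
    return Counter(key.split(".", 1)[0] for key in keys)


def unexpected_prefixes(
    keys: Iterable[str],
    expected: Sequence[str] = ("transformer", "lm_head"),
) -> List[str]:
    """Return the sorted top-level prefixes that are not listed in `expected`."""
    return sorted(set(count_key_prefixes(keys)) - set(expected))


# Stand-in key names for illustration only (assumed, not read from a real file).
example_keys = [
    "transformer.wte.weight",
    "transformer.wpe.weight",
    "transformer.h.0.attn.c_attn.weight",
    "lm_head.weight",
]

print(count_key_prefixes(example_keys))
print(unexpected_prefixes(example_keys))

# With PyTorch installed, the same helpers can be applied to a downloaded file:
#   import torch
#   state = torch.load("best_model.pt", map_location="cpu", weights_only=True)
#   print(unexpected_prefixes(state.keys()))
```

If `unexpected_prefixes` returns a non-empty list for a checkpoint, the file likely carries more than the bare model weights, and the weights may need to be extracted before calling `load_state_dict` on the model built from `src/scratch/gpt.py`.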