---
license: apache-2.0
library_name: transformers
tags:
- causal-lm
- text-generation
- transformer
- decoder-only
- table-free-input
- binary-token-codes
- affine-recoding
- research
language:
- en
---

# Affine-Recoded Minimal Code Table-Free Model

This is an anonymized research checkpoint for the paper:

**Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes**

## Model variant

This repository contains the **fully table-free affine-recoded minimal binary-code model**.

The model does not use an input embedding table. Instead, token codes are computed directly from token IDs. For each token ID `t`, the model computes its 16-bit binary code (the vocabulary size is 65,536 = 2^16, so 16 bits suffice):

```text
c(t) = bin_16(t)
```

and then applies a fixed invertible affine recoding over GF(2):

```text
c_tilde(t) = A c(t) xor b
```

where:

- `A` is a fixed invertible binary matrix in `GL(16, 2)`
- `b` is a fixed binary shift vector

The resulting 16-dimensional binary code is tiled (repeated 64 times) to the model width of 1024. As a result, the model has:

```text
0 trainable input-embedding parameters
no input embedding table
```

The output projection remains standard and trainable.

## Architecture

- decoder-only Transformer
- vocabulary size: 65,536
- model width: 1024
- number of layers: 32
- number of attention heads: 32
- context length: 1024
- rotary positional embeddings
- GELU activations
- untied, trainable output projection

## Loading example

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "E6E831728/affine-recoded-minimal-code-table-free"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

prompt = "Question: What is the capital of the UK?\nAnswer:"
input_ids = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=3, do_sample=False)

print(tokenizer.decode(output_ids[0].tolist()))
```

## Intended use

This checkpoint is provided for anonymous review and reproducibility. It demonstrates that the fixed minimal-code input interface remains viable even when the canonical token-ID binary code is randomly recoded by an invertible affine transform.

## Limitations

This model is a research checkpoint. It is not intended for deployment. It may produce incorrect, biased, unsafe, or nonsensical outputs.

## Training data

The model was trained on the same FineWeb-Edu + Cosmopedia mixture used for the matched comparisons in the paper. Dataset terms and licenses are those of the original datasets.
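
## Recoding sketch (illustrative)

As a rough sketch of the input recoding described above, the NumPy snippet below builds the affine-recoded, tiled code for a token ID. The sampled `A` and `b`, the MSB-first bit order, and the helper names (`gf2_rank`, `recoded_input`) are illustrative assumptions, not the checkpoint's actual constants; the released model ships its own fixed `A` and `b` with the model code.

```python
import numpy as np

CODE_BITS = 16  # bin_16 codes; 2**16 = 65,536 covers the vocabulary
WIDTH = 1024    # model width; 1024 / 16 = 64 tiles

def gf2_rank(M):
    """Rank of a binary matrix over GF(2) via Gaussian elimination."""
    M = (M % 2).astype(np.uint8)
    rank = 0
    for col in range(M.shape[1]):
        pivots = np.nonzero(M[rank:, col])[0]
        if pivots.size == 0:
            continue
        pivot = rank + pivots[0]
        M[[rank, pivot]] = M[[pivot, rank]]  # swap pivot row into place
        for r in range(M.shape[0]):          # clear the column elsewhere
            if r != rank and M[r, col]:
                M[r] ^= M[rank]
        rank += 1
        if rank == M.shape[0]:
            break
    return rank

# Illustrative parameters only: sample a random A in GL(16, 2) and a shift b.
# The released checkpoint uses its own fixed A and b.
rng = np.random.default_rng(0)
while True:
    A = rng.integers(0, 2, size=(CODE_BITS, CODE_BITS), dtype=np.uint8)
    if gf2_rank(A) == CODE_BITS:  # keep only invertible matrices
        break
b = rng.integers(0, 2, size=CODE_BITS, dtype=np.uint8)

def recoded_input(token_id):
    """Token ID -> tiled, affine-recoded binary code of length WIDTH."""
    # c(t) = bin_16(t), written MSB-first here (bit order is an assumption)
    c = np.array([(token_id >> (CODE_BITS - 1 - i)) & 1
                  for i in range(CODE_BITS)], dtype=np.uint8)
    c_tilde = (A @ c + b) % 2  # c_tilde(t) = A c(t) xor b, over GF(2)
    return np.tile(c_tilde, WIDTH // CODE_BITS)  # 16 dims tiled 64x -> 1024

vec = recoded_input(42)
print(vec.shape)  # (1024,)
```

Because `A` is invertible, the map from token IDs to recoded codes remains one-to-one, so the recoded interface carries exactly the same information as the canonical `bin_16` codes.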