| --- |
| license: apache-2.0 |
| library_name: transformers |
| tags: |
| - causal-lm |
| - text-generation |
| - transformer |
| - decoder-only |
| - table-free-input |
| - binary-token-codes |
| - affine-recoding |
| - research |
| language: |
| - en |
| --- |
| |
| # Affine-Recoded Minimal Code Table-Free Model |
|
|
| This is an anonymized research checkpoint for the paper: |
|
|
| **Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes** |
|
|
| ## Model variant |
|
|
| This repository contains the **fully table-free affine-recoded minimal binary-code model**. |
|
|
| The model does not use an input embedding table. Instead, token codes are computed directly from token IDs. |
|
|
For each token ID `t`, the model computes its 16-bit binary expansion (the vocabulary holds 65,536 = 2^16 tokens, so 16 bits address every entry):
|
|
| ```text |
| c(t) = bin_16(t) |
| ``` |
|
|
| and then applies a fixed invertible affine recoding over GF(2): |
|
|
| ```text |
| c_tilde(t) = A c(t) xor b |
| ``` |
|
|
| where: |
|
|
| - `A` is an invertible binary matrix in `GL(16, 2)` |
| - `b` is a fixed binary shift vector |
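
The following NumPy sketch illustrates the recoding. The `A` and `b` drawn here are illustrative, not the fixed pair baked into this checkpoint, and `gf2_rank` is a helper written only for this sketch:

```python
import numpy as np

def gf2_rank(M: np.ndarray) -> int:
    """Rank of a binary matrix over GF(2), by Gaussian elimination."""
    M = M.copy()
    rank = 0
    for col in range(M.shape[1]):
        pivot = next((r for r in range(rank, M.shape[0]) if M[r, col]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]  # swap pivot row into place
        for r in range(M.shape[0]):
            if r != rank and M[r, col]:
                M[r] ^= M[rank]              # eliminate this column's bit
        rank += 1
    return rank

rng = np.random.default_rng(0)

# Redraw A until it is invertible over GF(2), i.e. an element of GL(16, 2).
A = rng.integers(0, 2, size=(16, 16), dtype=np.uint8)
while gf2_rank(A) < 16:
    A = rng.integers(0, 2, size=(16, 16), dtype=np.uint8)

b = rng.integers(0, 2, size=16, dtype=np.uint8)  # fixed binary shift

def recode(c: np.ndarray) -> np.ndarray:
    """c_tilde = A c xor b; over GF(2), xor is addition, so `% 2` implements it."""
    return (A @ c + b) % 2
```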
|
|
The resulting 16-dimensional binary code is tiled (repeated 64 times) to the model width of 1024.
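
The tiling step, sketched below (whether the repeated bits are additionally rescaled, e.g. to ±1, is not specified here):

```python
import numpy as np

# An example 16-bit recoded vector; in the model this is c_tilde(t).
c_tilde = np.array([1, 0, 1, 1, 0, 0, 1, 0,
                    1, 1, 1, 0, 0, 1, 0, 1], dtype=np.float32)

x = np.tile(c_tilde, 1024 // 16)  # 16 bits repeated 64 times
assert x.shape == (1024,)
```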
|
|
| The model uses: |
|
|
```text
0 trainable input-embedding parameters
no input embedding table
```
|
|
| The output projection remains standard and trainable. |
|
|
| ## Architecture |
|
|
| - decoder-only Transformer |
| - vocabulary size: 65,536 |
| - model width: 1024 |
| - number of layers: 32 |
| - number of attention heads: 32 |
| - context length: 1024 |
| - rotary positional embeddings |
| - GELU activations |
| - untied trainable output projection |
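
Expressed as a config sketch (field names follow common Hugging Face conventions and are hypothetical; the authoritative config ships with the checkpoint via `trust_remote_code`):

```python
# Hypothetical hyperparameter summary; not the literal config.json.
config = dict(
    vocab_size=65_536,             # 2**16, so 16 bits index every token
    hidden_size=1024,              # width the 16-bit code is tiled to
    num_hidden_layers=32,
    num_attention_heads=32,
    max_position_embeddings=1024,  # context length, with rotary embeddings
    hidden_act="gelu",
    tie_word_embeddings=False,     # untied, trainable output projection
)
```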
|
|
| ## Loading example |
|
|
| ```python |
| import torch |
| from transformers import AutoTokenizer, AutoModelForCausalLM |
| |
| repo_id = "E6E831728/affine-recoded-minimal-code-table-free" |
| |
| tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True) |
| model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True) |
| model.eval() |
| |
prompt = "Question: What is the capital of the UK?\nAnswer:"
input_ids = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)

with torch.no_grad():
    # Greedy decoding; a few new tokens suffice for a one-word answer.
    output_ids = model.generate(input_ids, max_new_tokens=3, do_sample=False)

print(tokenizer.decode(output_ids[0].tolist()))
| ``` |
|
|
| ## Intended use |
|
|
| This checkpoint is provided for anonymous review and reproducibility. It demonstrates that the fixed minimal-code input interface remains viable even when the canonical token-ID binary code is randomly recoded by an invertible affine transform. |
|
|
| ## Limitations |
|
|
| This model is a research checkpoint. It is not intended for deployment. It may produce incorrect, biased, unsafe, or nonsensical outputs. |
|
|
| ## Training data |
|
|
| The model was trained on the same FineWeb-Edu + Cosmopedia mixture used for the matched comparisons in the paper. Dataset terms and licenses are those of the original datasets. |