---
license: apache-2.0
library_name: transformers
tags:
- causal-lm
- text-generation
- transformer
- decoder-only
- research
language:
- en
---

# Learned Input Table Model Classic

This is an anonymized research checkpoint for the paper:

**Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes**

## Model variant

This repository contains the **learned input table baseline**.

The model is a decoder-only Transformer with the following configuration:

- vocabulary size: 65,536
- model width: 1024
- number of layers: 32
- number of attention heads: 32
- context length: 1024
- rotary positional embeddings
- GELU activations
- untied trainable output projection

This baseline uses a standard trainable input embedding table of size:

```text
65,536 x 1024 = 67,108,864 trainable input parameters
```
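
The same figure can be reproduced from the hyperparameters listed above (a minimal sketch; the variable names are illustrative, not taken from the checkpoint's config):

```python
# Input-table parameter count implied by the architecture above.
vocab_size = 65_536  # vocabulary size
d_model = 1_024      # model width

print(f"{vocab_size * d_model:,}")  # 67,108,864
```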

## Intended use

This checkpoint is provided for anonymous review and reproducibility of the paper's controlled comparison. It is intended for research use only.

## Loading example

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "E6E831728/learned-input-table-model-classic"

# trust_remote_code=True is needed because the checkpoint ships custom modeling code.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

prompt = "Question: What is the capital of the United Kingdom?\nAnswer:"
input_ids = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)

# Greedy decoding of a short continuation.
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=3, do_sample=False)

print(tokenizer.decode(output_ids[0].tolist()))
```
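
As a sanity check that this is the learned-table variant, you can inspect the input embedding after loading (a minimal sketch, assuming the remote code exposes a standard `nn.Embedding` through `get_input_embeddings()`, as `transformers` causal LM classes normally do):

```python
# Continues from the loading example above.
emb = model.get_input_embeddings()

print(tuple(emb.weight.shape))   # expected: (65536, 1024)
print(emb.weight.numel())        # expected: 67108864
print(emb.weight.requires_grad)  # True: the input table is trainable in this baseline
```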

## Limitations

This is a small research language model trained for architectural comparison. It is not instruction-tuned or safety-tuned, and it should not be used as a production system.

## Training data

The model was trained on the same FineWeb-Edu + Cosmopedia mixture used for the matched comparisons in the paper. Dataset terms and licenses are those of the original datasets.