---
license: other
library_name: transformers
tags:
- language-modeling
- transformer
- decoder-only
- research
- neurips-2026-anonymous
---

# Learned Input Table Model Classic

This is an anonymized research checkpoint for the paper:

**Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes**

## Model variant

This repository contains the **learned input table baseline**.
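
The paper's central comparison replaces this trainable table with fixed minimal binary token codes. As a rough illustration of what "minimal" means for this vocabulary (the paper's exact code construction is not specified on this card, so the helper below is only a sketch): 65,536 tokens can be indexed with 16-bit binary codes, since 2^16 = 65,536.

```python
V = 65_536                       # vocabulary size of this model
bits = (V - 1).bit_length()      # 16: sixteen bits suffice to index every token

def binary_code(token_id: int, n_bits: int = 16) -> list[int]:
    """Fixed (non-trainable) binary code for a token id, most significant bit first."""
    return [(token_id >> i) & 1 for i in range(n_bits - 1, -1, -1)]

# Every token id maps to a distinct 16-dimensional {0, 1} vector,
# so no trainable lookup table is needed on the input side.
assert bits == 16
assert binary_code(5)[-4:] == [0, 1, 0, 1]
```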

The model is a 32-layer decoder-only Transformer with:

- vocabulary size: 65,536
- model width: 1024
- number of layers: 32
- number of attention heads: 32
- context length: 1024
- rotary positional embeddings
- GELU activations
- untied trainable output projection
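
For readers unfamiliar with rotary positional embeddings, the sketch below shows the standard RoPE rotation on a single head vector (illustrative only; it is not the repository's actual implementation): consecutive dimension pairs are rotated by a position-dependent angle, which leaves vector norms unchanged.

```python
import math

def rope_rotate(x: list[float], pos: int, theta_base: float = 10000.0) -> list[float]:
    """Rotate consecutive dimension pairs (2i, 2i+1) of a head vector x
    by the position-dependent angle pos / theta_base**(2i/d)."""
    d = len(x)
    out = [0.0] * d
    for i in range(0, d, 2):
        angle = pos / theta_base ** (i / d)
        c, s = math.cos(angle), math.sin(angle)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

Because each pair is a pure rotation, position 0 leaves the vector unchanged and every position preserves its length.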

This baseline uses a standard trainable input embedding table of size:

```text
65,536 x 1024 = 67,108,864 trainable input parameters
```
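
As back-of-the-envelope context, and assuming a conventional 4x feed-forward expansion (the card does not state the MLP width, and norms and biases are ignored), the listed hyperparameters put this input table at roughly an eighth of the model's parameters:

```python
# Hyperparameters from the list above
V, d, n_layers = 65_536, 1024, 32

input_table = V * d                      # trainable input embedding table
output_proj = d * V                      # untied output projection

# Per-block estimate: Q/K/V/output projections plus a 4x-wide GELU MLP
# (the 4x expansion is an assumption, not stated on this card)
attn = 4 * d * d
mlp = 2 * d * (4 * d)
per_layer = attn + mlp                   # 12 * d**2

total = input_table + n_layers * per_layer + output_proj
print(f"input table: {input_table:,}")   # input table: 67,108,864
print(f"rough total: {total:,}")         # rough total: 536,870,912
```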

## Intended use

This checkpoint is provided for anonymous review and reproducibility of the paper's controlled comparison. It is intended for research use only.

## Loading example

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "E6E831728/learned-input-table-model-classic"

# trust_remote_code is required because the repository ships custom model code
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

prompt = "Question: What is the capital of the United Kingdom?\nAnswer:"
input_ids = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=16, do_sample=False)

print(tokenizer.decode(output_ids[0].tolist()))
```

## Limitations

This is a small research language model trained for an architectural comparison. It is not instruction-tuned or safety-aligned, and it should not be deployed as a production system.

## Training data

The model was trained on the same FineWeb-Edu + Cosmopedia mixture used for the matched comparisons in the paper. Dataset terms and licenses are those of the original datasets.

## Citation

Anonymous submission under review.