Commit 3ec8edc by jdeschena · verified · 1 parent: 630a453

Create README.md

Files changed (1): README.md (added, +60 -0)
@@ -0,0 +1,60 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ tags:
+ - language-model
+ - flow-matching
+ - diffusion
+ - hypersphere
+ - discrete-diffusion
+ datasets:
+ - tinygsm
+ - openwebtext
+ library_name: pytorch
+ ---
+
+ # Language Modeling with Hyperspherical Flows
+
+ By [Justin Deschenaux](https://jdeschena.com) and [Caglar Gulcehre](https://www.caglar.ai).
+
+ [![arXiv](https://img.shields.io/badge/arXiv-2605.11125-red.svg)](https://arxiv.org/abs/2605.11125)
+ [![Blog](https://img.shields.io/badge/Blog%20%20-8A2BE2)](https://jdeschena.com/blog/sfm)
+ [![Code](https://img.shields.io/badge/Code-181717?logo=github&logoColor=white)](https://github.com/jdeschena/s-flm)
+
+ This repo hosts the pretrained checkpoints for **Language Modeling with Hyperspherical Flows** (𝕊-FLM). For the abstract, training/sampling code, and reproduction scripts, see the companion code repo: [`jdeschena/s-flm`](https://github.com/jdeschena/s-flm).
+
+ # Checkpoints
+
+ We release checkpoints for 𝕊-FLM and the baselines we compare against (AR, MDLM, Duo, FLM, CANDI), trained on **TinyGSM** (250k steps, SmolLM-135M tokenizer) and **OpenWebText** (1M steps, GPT-2 tokenizer). The files are laid out as follows:
+
+ ```
+ tinygsm/{ar,mdlm,duo}.ckpt
+ tinygsm/candi/{lr3e-4,lr1e-3}.ckpt
+ tinygsm/flm/{default,caps}.ckpt
+ tinygsm/sfm/{sphere_dit_truncated_fixed_no_renorm,
+              sphere_dit_truncated_adaptive_no_renorm,
+              sphere_arch_truncated_adaptive_no_renorm}.ckpt
+
+ owt/{ar,mdlm,duo,flm,sfm}.ckpt
+ ```
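Note on the listing above: the brace patterns are shorthand for individual files. A minimal Python sketch that expands them into explicit paths is shown here, for example to loop over every checkpoint. The `LAYOUT` table and the `checkpoint_paths` helper are illustrative names, not part of the `s-flm` code repo; the folder and file names are copied from the listing.

```python
# Layout transcribed from the README listing; names are illustrative only.
LAYOUT = {
    "tinygsm": ["ar", "mdlm", "duo"],
    "tinygsm/candi": ["lr3e-4", "lr1e-3"],
    "tinygsm/flm": ["default", "caps"],
    "tinygsm/sfm": [
        "sphere_dit_truncated_fixed_no_renorm",
        "sphere_dit_truncated_adaptive_no_renorm",
        "sphere_arch_truncated_adaptive_no_renorm",
    ],
    "owt": ["ar", "mdlm", "duo", "flm", "sfm"],
}


def checkpoint_paths(layout=LAYOUT):
    """Return every checkpoint path implied by the brace patterns."""
    return [
        f"{folder}/{name}.ckpt"
        for folder, names in layout.items()
        for name in names
    ]


if __name__ == "__main__":
    for path in checkpoint_paths():
        print(path)
```

Each returned string can be passed as the file argument of the download command in the README.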
+
+ ```bash
+ huggingface-cli download jdeschena/s-flm tinygsm/duo.ckpt --local-dir ./checkpoints
+ ```
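Note on the command above: the same download can be scripted from Python. The sketch below assumes the `huggingface_hub` package is installed (it also provides `huggingface-cli`) and uses its `hf_hub_download` function; the `download_checkpoint` wrapper and its defaults are hypothetical and only mirror the repo id and paths from the CLI example.

```python
from pathlib import Path


def download_checkpoint(filename, local_dir="./checkpoints", repo_id="jdeschena/s-flm"):
    """Fetch one checkpoint file from the Hub and return its local path."""
    # Cheap sanity check before touching the network.
    if not filename.endswith(".ckpt"):
        raise ValueError(f"expected a .ckpt path, got {filename!r}")

    # Imported lazily so this module can be imported without huggingface_hub.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id=repo_id,
        filename=filename,
        local_dir=local_dir,
    )
    return Path(local_path)


if __name__ == "__main__":
    # Mirrors the CLI example: fetches tinygsm/duo.ckpt into ./checkpoints.
    print(download_checkpoint("tinygsm/duo.ckpt"))
```

This only fetches the file. Loading the checkpoint into a model is left to the scripts in the code repo.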
+
+ Loading and sampling are handled by the code repo; see [`jdeschena/s-flm`](https://github.com/jdeschena/s-flm) for the scripts.
+
+ # Citation
+
+ ```
+ @misc{deschenaux2026languagemodelinghypersphericalflows,
+       title={Language Modeling with Hyperspherical Flows},
+       author={Justin Deschenaux and Caglar Gulcehre},
+       year={2026},
+       eprint={2605.11125},
+       archivePrefix={arXiv},
+       primaryClass={cs.LG},
+       url={https://arxiv.org/abs/2605.11125},
+ }
+ ```