Image Generation with a Sphere Encoder
AI & ML interests
AI security & privacy, algorithmic bias, foundations of ML
Recent Activity
Papers
- Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws
- Teaching Pretrained Language Models to Think Deeper with Retrofitted Recurrence (Paper • 2511.07384 • Published • 19)
- smcleish/Recurrent-Llama-3.2-train-recurrence-32 (Text Generation • 1B • Updated • 611 • 1)
- smcleish/Recurrent-Llama-3.2-train-recurrence-16 (Text Generation • 1B • Updated • 26)
- smcleish/Recurrent-Llama-3.2-train-recurrence-8 (Text Generation • 1B • Updated • 410)
This collection contains models described in the refusal token paper published in COLM 2025.
LoRI adapters for natural language understanding, code generation, mathematical reasoning, and safety alignment, based on LLaMA-3-8B and Mistral-7B.
These are checkpoints for recurrent LLMs developed to scale test-time compute by recurring in latent space.
- tomg-group-umd/huginn-0125 (Text Generation • Updated • 2.83k • 291)
- Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (Paper • 2502.05171 • Published • 154)
- tomg-group-umd/huginn_swa_100_10_avg_0.9_merge (Text Generation • 4B • Updated • 7)
- tomg-group-umd/step-00010752-recurrence_full_512_0 (Text Generation • 4B • Updated • 2)
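The idea behind these checkpoints (scaling test-time compute by recurring in latent space) can be illustrated with a toy model: a single weight-tied core block is applied r times between the embedding and the output head, and r can be raised at inference to spend more compute without adding parameters. This is only a sketch under stated assumptions, not the Huginn architecture; the real models differ in important ways (for example, training with randomized recurrence), and all names below are illustrative.

```python
import torch
import torch.nn as nn

class LatentRecurrentLM(nn.Module):
    """Toy depth-recurrent LM: one weight-tied block looped r times.

    Illustrative only; not the actual Huginn architecture.
    """

    def __init__(self, vocab_size=100, d_model=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One core block whose weights are shared across all iterations.
        self.core = nn.TransformerEncoderLayer(
            d_model, nhead=4, dim_feedforward=64, batch_first=True
        )
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids, recurrence=4):
        h = self.embed(ids)
        # The same core block is reused `recurrence` times; raising this
        # value at inference buys more latent-space compute with no new
        # weights, which is the knob the collection description refers to.
        for _ in range(recurrence):
            h = self.core(h)
        return self.head(h)
```

Because the loop reuses one block, the parameter count is independent of the recurrence depth chosen at test time.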
This collection contains artifacts from our paper "Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs."
- Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs (Paper • 2406.10209 • Published • 8)
- ahans1/wikipedia-en-2k-samples (Viewer • Updated • 4k • 31)
- ahans1/3-goldfish-loss-llama-1B (Text Generation • 1B • Updated • 12)
- goldfish-loss/4-goldfish-loss-llama-1B (Text Generation • 1B • Updated • 2)
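The core idea of the goldfish loss is to exclude a subset of tokens from the training objective so the model never trains on a full verbatim sequence and thus cannot reproduce it exactly. A minimal sketch of the static variant, dropping every k-th token (the paper also describes a hashed, context-dependent mask; `goldfish_loss` and its arguments here are illustrative names, not the authors' code):

```python
import torch
import torch.nn.functional as F

def goldfish_loss(logits, labels, k=4):
    """Next-token loss that drops every k-th token from the objective.

    Sketch of the static goldfish-loss variant; the paper also uses a
    hashed, context-dependent token mask.
    """
    # Positions k-1, 2k-1, ... contribute nothing to the loss or the
    # gradient, so the model never fits the full verbatim sequence.
    mask = torch.ones_like(labels, dtype=torch.bool)
    mask[:, k - 1 :: k] = False
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        reduction="none",
    ).reshape(labels.shape)
    # Average only over the tokens that remain in the objective.
    return (per_token * mask).sum() / mask.sum()
```

Note that dropped positions still appear in the model's input context; they are only removed from the loss, which is why perplexity on unseen text is largely preserved.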
Models to accompany "Multi-Token Prediction via Self-Distillation" (arxiv:2602.06019)
https://arxiv.org/abs/2509.02563
A collection of synthetic datasets for studying memorization and knowledge acquisition.
Our 22 open-source Gemstone models for scaling laws range from 50M to 2B parameters, spanning 11 widths (256 to 3072) and 18 depths (3 to 80).
How to extract style from images: the model, dataset, and paper
Hugging Face collection for all things CLRS-Text