Build Small Hackathon

Team

community

https://www.gradio.app

gradio


AI & ML interests

None defined yet.

Recent Activity


ysharma

updated a Space about 11 hours ago

Build Small Hackathon Registration

🤏

Official app to register for the build-small hackathon

akhaliq

submitted a paper to Daily Papers about 13 hours ago

optimize_anything: A Universal API for Optimizing any Text Parameter

Paper • 2605.19633 • Published 2 days ago

Reubencf

posted an update 1 day ago

Post

2311

I have improved my Portfolio; please do check it out:
Reubencf/Portfolio

3 replies

ysharma

updated a Space 2 days ago

README

🏃

mukunda1729

updated a Space 2 days ago

briefing-32

📰

A 32B-class AI-news briefing the maker runs every 2 hours.

mukunda1729

published a Space 2 days ago

briefing-32

📰

A 32B-class AI-news briefing the maker runs every 2 hours.

juiceb0xc0de

posted an update 3 days ago

Post

1415

Introducing the Gemma-4-E2B Brain Atlas, an interactive neural census of every layer, every head, and 16 behavior categories in Google's flagship 2B model. We ran 184,320 probe prompts across 35 layers × 8 components and mapped what came back.

The Brain Atlas is an interactive tool that lets you explore the internal behavior of Google's Gemma-4-E2B model layer by layer, head by head. Pick a behavior category, pick a layer, and see exactly which components light up and which go quiet. The dataset is fully queryable if you want to go deeper.

The mapping combines multiple single-direction techniques run in parallel across every layer and component: activation taxonomy (classifying each neuron by how broadly it fires across prompt categories), coactivation pair analysis (which neurons lock together and on what topics), F-stat behavioral separation (one-way ANOVA per feature across 16 behavior categories), per-head specificity scoring, and a full compliance probe pipeline using SVD, sparse decomposition, and variance analysis.
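As a rough illustration of the F-stat step only, the sketch below computes a one-way ANOVA F statistic for a single feature whose activations are grouped by behavior category. The function name `one_way_f` and the toy activation values are assumptions made for illustration; this is not the Atlas pipeline or its data.

```python
from statistics import fmean


def one_way_f(groups):
    """One-way ANOVA F statistic for one feature.

    `groups` is a list of lists: one list of activation values per
    behavior category (hypothetical layout, chosen for this sketch).
    """
    k = len(groups)                      # number of categories
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = fmean([x for g in groups for x in g])
    means = [fmean(g) for g in groups]

    # Variation explained by category membership.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation left over inside each category.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    # Raises ZeroDivisionError if every group is perfectly constant.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Toy data: three categories, three probe activations each.
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print(one_way_f(toy))  # about 13 for this toy data
```

A higher F means the category means are spread far apart relative to the noise inside each category, which is the sense in which a feature "separates" behaviors.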

Here's what I found when I ran it.

The sharpest behavioral signal isn't at the output. It's Layer 0. The up projection hits F=22.7, nearly 2x anything in the final third of the network. The model does its behavioral sorting when it has barely started, then spends the next 34 layers… doing what exactly?

The gate has a lifecycle. 70% dormant at L1, highest in the model. Brutal sparsification at L23–26 (>58% silent). Then reopens. The final five layers are the most alive gates anywhere. The model's last act is a gate flare.

Layer 4 routes 5 projections to dim 448. One layer. One dimension. That's a topology highway.

Zero specialist neurons. Not one. 1.2M neurons analyzed. None fires exclusively on a single category. This model distributes everything.
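One way to read the "specialist" test is sketched below, assuming each neuron is summarized by a firing rate per behavior category and counts as firing when that rate exceeds a threshold. The function name, the labels, the threshold, and the example rates are all hypothetical; the post does not state the exact criterion used.

```python
def classify_neuron(rate_by_category, threshold=0.0):
    """Label a neuron by how many categories it fires on.

    `rate_by_category` maps a category name to a firing rate
    (hypothetical representation chosen for this sketch).
    """
    active = [c for c, rate in rate_by_category.items() if rate > threshold]
    if not active:
        return "dormant"        # never fires on any category
    if len(active) == 1:
        return "specialist"     # fires exclusively on one category
    return "distributed"        # fires on two or more categories


# Made-up neurons, not rows from the Atlas dataset.
neurons = {
    "n0": {"math": 0.4, "code": 0.0, "refusal": 0.0},
    "n1": {"math": 0.2, "code": 0.3, "refusal": 0.1},
    "n2": {"math": 0.0, "code": 0.0, "refusal": 0.0},
}
specialists = [name for name, rates in neurons.items()
               if classify_neuron(rates) == "specialist"]
print(specialists)
```

Under this reading, "zero specialist neurons" means the list stays empty across every neuron analyzed.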

🧠 Space: juiceb0xc0de/gemma-4-e2b-brain-atlas
📊 Dataset (1.3M rows, fully queryable): juiceb0xc0de/gemma-4-e2b-atlas

FlameF0X

posted an update 4 days ago

Post

134

I did some testing on the scalability of FWKV. It hits a speed bottleneck at 1B due to the T4’s bandwidth limitations. Theoretically, it should match RWKV’s inference speed if the GPU had more bandwidth. So the speed measured at the 1B size is not representative.
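A back-of-the-envelope estimate shows why memory bandwidth can cap speed at this size. The sketch assumes a simple memory-bound model in which every weight is read once per generated token, 16-bit weights, and the roughly 320 GB/s that NVIDIA lists for the T4. Whether FWKV follows this access pattern is an assumption, and the function name is hypothetical.

```python
def max_tokens_per_second(n_params, bytes_per_param, bandwidth_bytes_per_s):
    """Upper bound on decode speed when each token must stream all weights."""
    bytes_per_token = n_params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token


T4_BANDWIDTH = 320e9  # bytes per second, approximate datasheet figure

# 1B parameters stored in 16-bit precision (2 bytes each).
print(max_tokens_per_second(1_000_000_000, 2, T4_BANDWIDTH))
# 29M parameters under the same assumptions, for comparison.
print(max_tokens_per_second(29_000_000, 2, T4_BANDWIDTH))
```

Under these assumptions the ceiling scales inversely with parameter count, so a 1B model is far more likely than a 29M model to be limited by bandwidth rather than by the architecture itself.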

FlameF0X

posted an update 6 days ago

Post

191

Greetings Hugging Face!

I started a new project called **FWKV** (Feed-forward Weighted Key Value, or Floored Weighted Key Value), an RWKV-style LM that uses FFNNs (Feed-Forward Neural Networks) and floor(W·K·V) instead of an RNN. I'm hoping to make it much more efficient and scalable than RWKV.

So far I have:

- FlameF0X/FWKV-29M — this one is undertrained and doesn't have a Space yet. In the attached image you can see its speed on a T4 compared to models with the same configuration.

The only model that's fully working right now is:
- FlameF0X/FWKV-TinyStories — trained on TinyStories for one epoch. The demo Space is FlameF0X/FWKV-demo.

2 replies

juiceb0xc0de

posted an update 6 days ago

Post

140

I'm starting a new model line, Locus. These models aren't fine-tuned, they're de-tuned 🤗. What I mean by that is I remove a percentage of the corporate-tuned speech patterns like "why this matters", "no fluff", and "as a large language model". By reducing the RLHF-based habitual patterns in model responses, I've had higher success rates in personality adoptability. I've fine-tuned on the Locus models myself, so you can chat with one post fine-tune or just trust me and try it yourself!

I don't aim to remove guardrails or the LLM identity entirely; what I want to do is dampen RLHF to a manageable volume. Personality models perform better with guardrails intact, no different than humans with moral guidelines and boundaries. Refusals can help steer and mold personality. RLHF, however, drowns out adaptability, so I'm cranking it down for you to crank your project up!

juiceb0xc0de/bella-bartender-gemma-e2b
juiceb0xc0de/locus-gemma-4-e2b

rayanitv

updated a Space 6 days ago

Tarook

🏢

Manage AI provider keys and usage from one dashboard

rayanitv

published a Space 6 days ago

Tarook

🏢

Manage AI provider keys and usage from one dashboard

Tonic

posted an update 6 days ago

Post

2495

🙋🏻‍♂️ Hey there folks ,

Turns out: if we predict 🌏 Earth, we can save a lot of time looking for interesting things and spend less time looking at things that we expect to see.

Sentinel-2 imagery 🛰️ basically takes a long time to download to Earth, so our "near real time" systems are quite far from that in practical terms.

Meanwhile, if we "predict" what we will see, based on what we do see, we can send down much less data in a timely way and prioritize 📡 Earth-bound response.

I'm talking about illegal fishing, logging, mining, or building in nature reserves: the more of that we predict early, the more we're able to stop it in time.

At least that's the concept!

Check out the blog: https://huggingface.co/blog/Tonic/save-patagonia-by-predicting-earth

- Collection: https://huggingface.co/collections/NuTonic/earth-observation-with-temporal-and-general-understanding
- Code: https://github.com/Josephrp/Nutonic
- Dataset: NuTonic/sat-vl-sft-training-ready-v1
- Model: NuTonic/lspace
- Training: NuTonic/lspace-trackio
- Evals: NuTonic/Patagonia_Eval

2 replies

pngwn

updated a Space 13 days ago

README

🏃

freddyaboulton

updated a Space 16 days ago

README

🏃

juiceb0xc0de

posted an update 19 days ago

Post

159

I'm not obsessed with LR schedulers, you are.

juiceb0xc0de/lr-scheduler-benchmark

Okay, maybe I'm a little obsessed with LR schedulers ATM. I ran an SST-2 sentiment classification eval using the nyu-mll/glue dataset on distilbert/distilbert-base-uncased (67M) to see how different schedulers perform.

I think I've graduated from ML enthusiast to full-blown data hoarder and I don't know if I can turn back now.

Anyways, I evaluated the 2 schedulers that I designed as well and was pretty happy with the performance of both overall, so hell ya to that. Guess I'll go and grab some more graphs.

https://github.com/JuiceB0xC0de/aecs-scheduler.git
https://github.com/JuiceB0xC0de/lucky-pick-scheduler.git

nyu-mll/glue
distilbert/distilbert-base-uncased

ysharma

published a Space 19 days ago

README

🏃

juiceb0xc0de

posted an update 20 days ago

Post

101

Okay, I may have been talking out of my ass about my scheduler using less VRAM compared to a full fine-tune (FFT). What I did find, though: training only ~30% of the model's weights per step consistently beat dense SFT on Hendrycks Math across 3 different seeds.

What makes it interesting isn't just the sparsity — it's that no two consecutive windows share the same active layers. The model never has a stable path from input to output decision. Adjacent layers are rarely both alive at the same time, so the model can't build shortcuts between them. I started developing this to reduce semantic redundancy across layers and stumbled onto something I didn't expect.
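One possible reading of this selection rule is sketched below: in each window roughly 30% of the layers are trainable, and no layer active in one window is active in the next. This is an illustration under those assumptions, not the lucky-pick-scheduler implementation; the function name, the layer count of 36, and the strict "disjoint from the previous window" rule are all hypothetical.

```python
import random


def pick_windows(n_layers, n_windows, active_frac=0.3, seed=0):
    """Choose a set of active layer indices for each training window.

    Consecutive windows never share an active layer (assumed rule).
    """
    rng = random.Random(seed)
    k = max(1, round(n_layers * active_frac))  # layers trained per window
    if 2 * k > n_layers:
        raise ValueError("active_frac too large for disjoint consecutive windows")

    previous = set()
    windows = []
    for _ in range(n_windows):
        # Only layers that were frozen in the previous window are eligible.
        candidates = [i for i in range(n_layers) if i not in previous]
        active = set(rng.sample(candidates, k))
        windows.append(sorted(active))
        previous = active
    return windows


for step, layers in enumerate(pick_windows(n_layers=36, n_windows=3)):
    print(step, layers)
```

In a training loop, the indices from each window would decide which layers receive gradient updates while the rest stay frozen. The sketch does not try to keep adjacent layers apart within a window, which the post describes as a separate property.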

Results (0-shot, hendrycks_math exact match):

Dense SFT baseline: 0.0098
DeepChaos seed 1: 0.0142 (+45%)
DeepChaos seed 2: 0.0156 (+59%)
DeepChaos seed 3: 0.0138 (+41%)
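The percentages in this list are relative gains over the dense baseline, which can be reproduced directly from the reported scores. The variable and function names below are chosen for illustration.

```python
baseline = 0.0098
scores = {"seed 1": 0.0142, "seed 2": 0.0156, "seed 3": 0.0138}


def relative_gain_pct(score, base):
    """Relative improvement over the baseline, in percent."""
    return (score - base) / base * 100


gains = {name: round(relative_gain_pct(s, baseline)) for name, s in scores.items()}
print(gains)  # {'seed 1': 45, 'seed 2': 59, 'seed 3': 41}

# Absolute differences in exact-match score, for scale.
deltas = {name: round(s - baseline, 4) for name, s in scores.items()}
print(deltas)  # {'seed 1': 0.0044, 'seed 2': 0.0058, 'seed 3': 0.004}
```

The relative gains are large because the baseline is small: in absolute terms each seed improves exact match by roughly 0.4 to 0.6 percentage points.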

Setup: Qwen2.5-3B-Instruct, simplescaling/s1K (1k reasoning traces), 5 epochs, LR 1e-5, optimizer adamw_torch_fused, and a cosine scheduler with my lucky pick scheduler, on an AMD MI300X 192GB.

The scheduler is still a work in progress but the current version is fully operational. You can check it out at:
https://github.com/JuiceB0xC0de/lucky-pick-scheduler

I would love to hear your experiences with sparsity training!

Tonic

posted an update 22 days ago

Post

4211

🙋🏻‍♂️ Hey there folks,

Since everyone liked my previous announcement post ( https://huggingface.co/posts/Tonic/338509028435394 ) so much, I'm back with more high-quality procedural datasets in the geospatial domain for SFT training!

Check this one out:
NuTonic/sat-bbox-metadata-sft-v1

The goal is to be able to train vision models on multiple images for remote sensing analysis with one shot.

Hope you like it! 🚀

2 replies

juiceb0xc0de

posted an update 26 days ago

Post

170

Okay, I had way too much fun trying to make the unsloth-bot hallucinate incorrect answers like so many frontier models have done to me in the past regarding fine-tuning and general machine learning. Learning to fine-tune LLMs could have been so much simpler had this been available when I began screwing around with neural networks.

10/10 recommend for beginners.

https://huggingface.co/unsloth/unsloth-bot


Team members 130