Part of the Hello Neural World learning project.
## About This Model

**TinyNet Sigmoid Baseline** - the first generalization attempt from the Trainer notebook.
Key characteristics:
- Sigmoid activation on both hidden and output layers
- Trained on noisy data (noise_level=0.3) to prevent overfitting
- Uses SGD optimizer with learning rate 0.1
- 200 training epochs
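The setup above (SGD, learning rate 0.1, 200 epochs) can be sketched as a minimal training loop. The loss function and batch construction here are assumptions for illustration: the Trainer notebook may use a different loss (e.g. BCE) and real noisy pattern data rather than the random stand-ins used below.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same shapes as TinyNet: 4 -> 3 -> 2, sigmoid on both layers
model = nn.Sequential(
    nn.Linear(4, 3), nn.Sigmoid(),   # hidden layer
    nn.Linear(3, 2), nn.Sigmoid(),   # output layer
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()               # assumed loss; notebook may differ

X = torch.rand(32, 4)                          # stand-in noisy inputs
y = torch.eye(2)[torch.randint(0, 2, (32,))]   # stand-in one-hot targets

for epoch in range(200):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print(f"final loss: {loss.item():.3f}")
```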
Performance:
- Final loss: ~0.251
- Shows improved generalization over Discovery model
- Still exhibits saturation issues typical of Sigmoid activations
From the blog post:

> This model demonstrates that adding noise helps, but Sigmoid still struggles with gradient flow.
## Architecture

```
Input Layer:  4 neurons (2x2 pixel grid)
      ↓
Hidden Layer: 3 neurons (Sigmoid in this baseline)
      ↓
Output Layer: 2 neurons (Horizontal vs Vertical probabilities)
```

Total parameters: 23 (4×3 weights + 3 biases + 3×2 weights + 2 biases)
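The parameter count above can be verified by building layers with the same shapes and summing their parameter tensors:

```python
import torch.nn as nn

# Recreate the two linear layers with TinyNet's shapes and count parameters.
layers = [nn.Linear(4, 3), nn.Linear(3, 2)]
total = sum(p.numel() for layer in layers for p in layer.parameters())
print(total)  # 4*3 + 3 + 3*2 + 2 = 23
```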
## Training Data
Trained on thousands of noisy examples generated from 4 base patterns:
- Horizontal top: [1,1,0,0]
- Horizontal bottom: [0,0,1,1]
- Vertical left: [1,0,1,0]
- Vertical right: [0,1,0,1]
Each pattern augmented with random noise to force pattern learning instead of memorization.
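A minimal sketch of this augmentation step, under assumptions: the exact noise model is not specified here, so this example adds Gaussian noise scaled by `noise_level` and clamps the result back to [0, 1]. The function name and pattern labels are illustrative, not taken from the notebook.

```python
import torch

# The four base patterns listed above
BASE_PATTERNS = {
    "horizontal_top":    [1.0, 1.0, 0.0, 0.0],
    "horizontal_bottom": [0.0, 0.0, 1.0, 1.0],
    "vertical_left":     [1.0, 0.0, 1.0, 0.0],
    "vertical_right":    [0.0, 1.0, 0.0, 1.0],
}

def make_noisy_example(name: str, noise_level: float = 0.3) -> torch.Tensor:
    """Add scaled Gaussian noise to a base pattern, keeping pixels in [0, 1]."""
    base = torch.tensor(BASE_PATTERNS[name])
    return (base + noise_level * torch.randn(4)).clamp(0.0, 1.0)

sample = make_noisy_example("horizontal_top")
print(sample)
```

Generating many such samples per base pattern forces the network to learn the pattern structure rather than memorize four fixed inputs.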
## Usage
```python
from safetensors.torch import load_file
import torch
import torch.nn as nn

# Define the architecture
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(4, 3)
        self.layer2 = nn.Linear(3, 2)
        self.sigmoid = nn.Sigmoid()  # this baseline uses Sigmoid on both layers

    def forward(self, x):
        x = self.sigmoid(self.layer1(x))
        x = self.sigmoid(self.layer2(x))
        return x

# Load weights
model = TinyNet()
state_dict = load_file("model.safetensors")
model.load_state_dict(state_dict)

# Run inference
test_input = torch.tensor([[1.0, 1.0, 0.0, 0.0]])  # perfect horizontal pattern
output = model(test_input)
print(f"Horizontal: {output[0][0]:.2%}, Vertical: {output[0][1]:.2%}")
```
## Intended Use
Educational purposes - demonstrates:
- Backpropagation mechanics
- Effect of activation functions
- Overfitting vs generalization
- Impact of data augmentation (noise)
- Iterative ML development process
## Limitations

- Toy dataset (2×2 grids only)
- Binary classification (horizontal vs vertical)
- Not for production use
- Designed for learning, not performance
## Learn More
- Notebooks: Discovery | Trainer
- Blog Post: [Coming soon on Medium]
- GitHub: hello-neural-world
## License
MIT