NULA — CIFAR-10 Robust v1
Full-Parameter Fine-Tuning of NULA under Stochastic Perturbation Reweighting
Evolution: Fine-Tuning for Improved Robustness–Accuracy Tradeoff
NULA v1 is obtained by full-parameter fine-tuning of NULA v0 under a second modification of its training distribution, emphasizing stochastic perturbations and aggressively reweighted sampling operators.
Across structured sampling perturbations, v1 remains functionally close to v0. The primary measurable change is a stable improvement under isotropic Gaussian noise.
Local sensitivity to stochastic perturbations was reduced, while the sampling-invariant representation learned in v0 is preserved.
Approach
During fine-tuning, we re-augmented the training distribution over the same dataset, CIFAR-10.
First, Uniform Amplitude Perturbation: we inject small, independent per-pixel photometric perturbations prior to the sampling-style transforms, so the model can no longer rely on absolute per-channel intensity statistics.
```python
x_image = x * std + mean
color_jitter = pt.empty_like(x_image).uniform_(-0.05, 0.05)
x_image[mask_aug] = (x_image[mask_aug] + color_jitter[mask_aug]).clamp(0.0, 1.0)
```

Second, Weighted Curated Sampling: we bias the transformation distribution toward resize-based projection operators by increasing their sampling probability.
```python
probs = pt.tensor([0.5, 0.3, 0.2], device=x.device)
choices = pt.multinomial(probs, B, replacement=True)
```

Third, Aggressive Scale Compression: we further tighten the information bottleneck by lowering the scale's upper bound from 0.6 to 0.45, forcing representations to remain stable under aggressive spatial compression.
```python
scales = pt.empty(mask_resize.sum(), device=x.device).uniform_(0.2, 0.45)  # was (0.2, 0.6)
```
To avoid catastrophic forgetting of the v0 weights, we utilized a Sequential Learning Rate strategy:
Learning Rate: 5 × 10^-5 (reduced from v0's 1 × 10^-3 to ensure surgical weight updates).
Warmup Phase: 2 epochs of LinearLR to stabilize gradients under the new adversarial noise distribution.
Decay Phase: 8 epochs of CosineAnnealingLR to keep weight updates small and smooth during local refinement.
This induces a non-uniform operator distribution, increasing exposure to projection-based degradations during training.
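The warmup-then-decay schedule above can be sketched with PyTorch's built-in schedulers. This is a minimal illustration, not the released training code; the model and optimizer below are placeholders.

```python
# Hypothetical sketch of the two-phase learning-rate schedule described above.
import torch as pt
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

model = pt.nn.Linear(3 * 32 * 32, 10)                     # stand-in for NULA's network
optimizer = pt.optim.AdamW(model.parameters(), lr=5e-5)   # reduced from v0's 1e-3

warmup = LinearLR(optimizer, start_factor=0.1, total_iters=2)  # 2-epoch warmup
decay = CosineAnnealingLR(optimizer, T_max=8)                  # 8-epoch cosine decay
scheduler = SequentialLR(optimizer, schedulers=[warmup, decay], milestones=[2])

for epoch in range(10):
    optimizer.step()    # placeholder for one epoch of training
    scheduler.step()
```

`SequentialLR` switches from the linear warmup to the cosine decay at the epoch-2 milestone, so the learning rate ramps up briefly and then anneals smoothly toward zero.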
EVALUATION
| Category | Perturbation Method | v0 (Base) | v1 (Fine-Tuned) | Δ |
|---|---|---|---|---|
| Baseline | Clean (CIFAR-10 Test) | 89.42% | 89.50% | +0.08% |
| Resolution | Resize (0.5 Scale) | 85.37% | 85.33% | -0.04% |
| Resolution | Resize x0.25 - Bilinear | 71.80% | 71.31% | -0.49% |
| Resolution | Resize x0.5 - Bicubic | 84.47% | 84.41% | -0.06% |
| Resolution | Resize x0.25 - Bicubic | 65.69% | 65.56% | -0.13% |
| Resolution | Resize x0.5 - Nearest | 85.02% | 84.91% | -0.11% |
| Sampling | Decimate (x2 Factor) | 85.01% | 84.91% | -0.10% |
| Sampling | Checkerboard ε = 0.03 | 89.43% | 89.53% | +0.10% |
| Sampling | Checkerboard ε = 0.05 | 89.39% | 89.45% | +0.06% |
| Compositional | Resize x0.5 + Decimate x2 | 82.91% | 82.09% | -0.82% |
| Compositional | Decimate x2 + Resize x0.25 | 70.60% | 70.68% | +0.08% |
| Compositional | Resize x0.5 + Checkerboard ε = 0.03 | 85.31% | 85.34% | +0.03% |
| Spatial | Shift X (1-pixel) | 89.18% | 89.36% | +0.18% |
| Spatial | Shift Y (1-pixel) | 89.03% | 89.11% | +0.08% |
| Spatial | Shift XY (Diagonal 1-pixel) | 88.62% | 88.83% | +0.21% |
| Photometric | Brightness (+0.10) | 88.95% | 89.03% | +0.08% |
| Photometric | Contrast (0.80) | 89.23% | 89.28% | +0.05% |
| Photometric | Grayscale (Full Mix) | 79.87% | 80.07% | +0.20% |
| Noise/Blur | Gaussian Blur (k=5, σ=1.0) | 86.39% | 86.11% | -0.28% |
| Noise/Blur | Gaussian Noise σ = 0.05 | 84.27% | 87.40% | +3.13% |
Empirically, the resulting fine-tuned model remains almost unchanged across structured and compositional perturbations. The most significant gain came from the Gaussian Noise perturbation.
This is an estimate of the distribution of accuracy under stochastic perturbation. Each bar comes from one Monte Carlo realization of Gaussian noise at σ = 0.05. The distributions are clearly well-separated.
For the same stochastic perturbation process, v1 consistently produces more correct classifications than v0, distributionally.
The pretrained solution already lived in a favorable basin, and the directions that remained sensitive were less aligned with class-critical decision directions. Fine-tuning was therefore able to move the model to a nearby location in that basin where stochastic-noise sensitivity is reduced.
Observation
This was a full fine-tune: no layers were frozen, and all weights were allowed to adapt under the modified objective.
The local decision function exhibits reduced first-order sensitivity to stochastic perturbations. The dominant local curvature of the loss under random-noise directions is shrunk.
We now extract two local geometric quantities:
- the Frobenius norm of the logits-to-input Jacobian, and
- a dominant Hessian-curvature estimate of the parameter-space loss.
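Both diagnostics can be sketched for a generic classifier. The model, input, and iteration count below are illustrative stand-ins, not the NULA evaluation code; the curvature estimate uses power iteration on Hessian-vector products, matching the caveat noted later.

```python
# Hypothetical sketch: Jacobian Frobenius norm and dominant Hessian curvature.
import torch as pt

pt.manual_seed(0)
model = pt.nn.Sequential(pt.nn.Flatten(), pt.nn.Linear(3 * 32 * 32, 10))
x = pt.randn(1, 3, 32, 32)
y = pt.tensor([3])

# (1) Frobenius norm of the logits-to-input Jacobian (first-order sensitivity).
jac = pt.autograd.functional.jacobian(lambda inp: model(inp).squeeze(0), x)
jac_fro = jac.flatten().norm()

# (2) Dominant parameter-space curvature via power iteration on H·v products.
params = list(model.parameters())
loss = pt.nn.functional.cross_entropy(model(x), y)
grads = pt.autograd.grad(loss, params, create_graph=True)
v = [pt.randn_like(p) for p in params]
for _ in range(10):
    hv = pt.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    norm = pt.sqrt(sum((h ** 2).sum() for h in hv))
    v = [h / norm for h in hv]
hv = pt.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
eig = sum((h * vi).sum() for h, vi in zip(hv, v))  # Rayleigh quotient v^T H v
```

Comparing `jac_fro` and `eig` between v0 and v1 at matched inputs is one way to make the "reduced first-order sensitivity" and "shrunk curvature" claims concrete.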
v0 and v1 remain in the same functional basin for sampling robustness; the fine-tune moved within that basin along directions that matter for stochastic perturbations.
These directions were not saturated in v0, so v1 was able to occupy a more stable local parameter-curvature regime.
Formally, v1 exhibits lower local first-order sensitivity of logits with respect to input perturbations across the evaluated inputs and noise levels.
v1 appears to occupy a more stable local parameter-curvature regime across noise levels.
v0's parameter-space curvature under higher noise looks more unstable, so v1's fine-tuning appears to have regularized the local loss geometry under stochastic perturbation.
This is not a full-dataset estimate.
Note: the reported value corresponds to a dominant curvature estimate obtained via power iteration and does not guarantee identification of the largest positive eigenvalue in a non-convex setting.
Monte Carlo
We perform a Monte Carlo robustness evaluation under Gaussian noise by estimating the running accuracy gap Δ between v1 and v0, aggregated across multiple trials. This reveals the convergence behavior of the robustness gap.
Empirically:
- Δ(k) → +0.030 as k → N
- E[Δ] > 0
- std(Δ) ≈ 0.14%
Across the full test distribution, under Gaussian noise σ = 0.05, v1 consistently outperforms v0 by approximately +3% in expectation, with low variance.
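The running-gap estimator can be sketched as follows. The two toy models below are placeholders for v0 and v1 (not the released weights), and the data is random; the real evaluation uses the CIFAR-10 test set.

```python
# Hypothetical sketch of the Monte Carlo running accuracy gap Δ(k).
import torch as pt

pt.manual_seed(0)

def accuracy_under_noise(model, x, y, sigma=0.05):
    """Accuracy on one Monte Carlo realization of additive Gaussian input noise."""
    with pt.no_grad():
        noisy = x + sigma * pt.randn_like(x)
        return (model(noisy).argmax(dim=1) == y).float().mean().item()

# Toy stand-ins for v0 and v1.
v0 = pt.nn.Sequential(pt.nn.Flatten(), pt.nn.Linear(3 * 32 * 32, 10))
v1 = pt.nn.Sequential(pt.nn.Flatten(), pt.nn.Linear(3 * 32 * 32, 10))
x, y = pt.randn(256, 3, 32, 32), pt.randint(0, 10, (256,))

# Running estimate Δ(k): mean gap over the first k noise realizations.
gaps, running = [], []
for k in range(1, 51):
    gaps.append(accuracy_under_noise(v1, x, y) - accuracy_under_noise(v0, x, y))
    running.append(sum(gaps) / k)
```

As k grows, `running[-1]` stabilizes; in the reported evaluation this converges to roughly +0.030.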
Interpretation
Fine-tuning changed the response to random directions. Many convolutional models improve clean accuracy and yet degrade in robustness.
There was a stable improvement in robustness to isotropic Gaussian perturbations.
The fine-tune reparameterized within the same basin along stochastic directions.
If

- θ_v1 = θ_v0 + Δθ

then Δθ lies in low-curvature / null directions for structured operations, but in high-impact directions for Gaussian noise. This gain is structurally consistent with what NULA was fundamentally built on.
The fine-tuning process yields a uniform reduction in noise-induced classification errors while preserving the sampling-invariant structure learned in v0.
Empirical Inference Evaluations
There is a tradeoff: the +0.08% nudge in clean accuracy conceals nontrivial changes in per-input behavior.
v0 is noisier but better at recovering weak semantic evidence from distorted inputs. v1 is less reactive to weak local cues and more likely to default to coarse, low-texture explanations when the signal collapses; when semantic evidence is heavily flattened or stylized, it can bias toward coarse low-texture classes such as truck.
Usage
NULA is hosted on the HuggingFace Hub and can be loaded directly via the transformers library.
```python
from transformers import pipeline

classifier = pipeline(
    "image-classification",
    model="MamaPearl/nula-cifar10-robust-v1",
    trust_remote_code=True,
)

# Run on any image URL or local path
results = classifier("https://path-to-your-image.jpg")
for res in results:
    print(f"{res['label']}: {res['score']:.2%}")
```
Input tensors should be shape (B, C, H, W).
Preprocessing Requirements:
Color Mode: RGB (images should be converted via .convert("RGB"))
Input Size: 32x32 (standard for CIFAR-10)
Normalization: mean [0.5, 0.5, 0.5] and std [0.5, 0.5, 0.5] (scales pixels from [0, 1] to [-1, 1])
Citation
If you use this model or repository in your research, please cite:
@misc{mamapearl_nula_2026,
author = {MamaPearl},
title = {NULA v1: Robustness under Stochastic Perturbations via Full-Parameter Fine-Tuning},
month = apr,
year = {2026},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/MamaPearl/nula-cifar10-robust-v1}},
}
Authors
MamaPearl · @mamapearli
License
This project is licensed under the MIT License. See LICENSE for more information.
Model tree for MamaPearl/nula-cifar10-robust-v1
Base model
MamaPearl/nula-cifar10-robust-v0

