No collapse, stable sampling — why does the model still generate invalid structures?

#2
by cagasoluh - opened

Stable RBM, No Collapse — So Why Invalid Compositions?

We’re working on MYRA, a system designed to answer a simple question:
What did the model actually learn?

Instead of focusing only on output quality, we analyze the internal structure learned by energy-based models and how that structure appears during generation.


Observations (SR-TRBM, PCD-1)

Across multiple seeds with fixed settings:

  • No mode collapse
  • Active sampling (positive flip rates)
  • Stable behavior across runs
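For readers unfamiliar with the training setup named above: PCD-1 keeps a persistent chain of fantasy particles and advances it by a single Gibbs step per update, rather than restarting the chain at the data as CD-1 does. The following is a minimal sketch for a plain binary RBM, not the SR-TRBM itself (which adds temporal structure); all names and hyperparameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BinaryRBM:
    """Minimal binary RBM trained with PCD-1: the negative phase runs one
    Gibbs step on a persistent fantasy chain kept between updates."""

    def __init__(self, n_vis, n_hid, lr=0.05, batch=32):
        self.W = rng.normal(0.0, 0.01, (n_vis, n_hid))
        self.b = np.zeros(n_vis)   # visible biases
        self.c = np.zeros(n_hid)   # hidden biases
        self.lr = lr
        # Persistent particles carried across updates (the "P" in PCD).
        self.v_neg = rng.integers(0, 2, (batch, n_vis)).astype(float)

    def p_h(self, v):
        return sigmoid(v @ self.W + self.c)

    def p_v(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def pcd1_update(self, v_data):
        ph_data = self.p_h(v_data)                     # positive phase
        # Negative phase: one Gibbs step from the persistent chain.
        ph = self.p_h(self.v_neg)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = self.p_v(h)
        self.v_neg = (rng.random(pv.shape) < pv).astype(float)
        ph_neg = self.p_h(self.v_neg)
        # Gradient: data correlations minus model correlations.
        self.W += self.lr * (v_data.T @ ph_data / len(v_data)
                             - self.v_neg.T @ ph_neg / len(self.v_neg))
        self.b += self.lr * (v_data.mean(axis=0) - self.v_neg.mean(axis=0))
        self.c += self.lr * (ph_data.mean(axis=0) - ph_neg.mean(axis=0))

# Toy usage: fit an 8-unit RBM to a constant all-ones pattern.
rbm = BinaryRBM(n_vis=8, n_hid=4)
data = np.ones((32, 8))
for _ in range(300):
    rbm.pcd1_update(data)
```

Because the chain is persistent, "active sampling" (nonzero flip rates in `v_neg`) is exactly what one would hope to see in the logs above.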

A representative run:

  • Reconstruction ≈ 0.98
  • Stable likelihood
  • Diversity ≈ 0.21
  • Entropy ≈ 0.31
  • Energy gap ≈ 19.56
  • Mixing τ ≈ 22.5
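The post does not define how diversity and entropy are computed, so here is one plausible reading of those two metrics, sketched over a batch of binary samples. These definitions are assumptions for illustration, not necessarily MYRA's implementation.

```python
import numpy as np

def mean_unit_entropy(samples):
    """One plausible 'entropy' metric: mean per-unit Bernoulli entropy
    (nats) of binary samples. ~0 means frozen units; ~0.69 is maximal."""
    p = samples.mean(axis=0).clip(1e-8, 1 - 1e-8)
    return float((-(p * np.log(p) + (1 - p) * np.log(1 - p))).mean())

def pairwise_diversity(samples):
    """One plausible 'diversity' metric: mean normalized Hamming distance
    over distinct sample pairs. 0 means all samples are identical."""
    dist = (samples[:, None, :] != samples[None, :, :]).mean(axis=2)
    iu = np.triu_indices(len(samples), k=1)
    return float(dist[iu].mean())
```

Under these definitions, uniformly random samples score roughly 0.5 diversity and 0.69 entropy, so the reported 0.21 / 0.31 would indeed point toward the "over-ordering tendency" flagged below: samples are far more correlated than chance.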

LLM-based interpretation

  • Regime: Learning
  • Phase: Ordered
  • Collapse risk: ~0
  • Main issue: Over-ordering tendency

Interpretation

The model learns consistent internal structures.
But during generation, these structures are recombined in ways that do not align with the dataset.

So the behavior does not look like a sampling failure.
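One way to make "recombined in ways that do not align with the dataset" measurable is to compare second-order statistics of the generated samples against the training data: if per-unit marginals match but pairwise co-activations diverge, the model is assembling individually plausible parts into combinations the dataset never licenses. The helper names below are hypothetical, not part of MYRA.

```python
import numpy as np

def cooccurrence(samples):
    """Empirical pairwise co-activation frequencies of binary samples."""
    s = samples.astype(float)
    return s.T @ s / len(s)

def composition_gap(data, generated):
    """Mean absolute difference between dataset and sample co-activation
    matrices. Matched marginals plus a large gap here indicate invalid
    recombination rather than a degenerate sampler."""
    return float(np.abs(cooccurrence(data) - cooccurrence(generated)).mean())

# Toy illustration: units 0 and 1 always fire together in the data, but
# the "generated" samples break that constraint while keeping the same
# per-unit marginals.
rng = np.random.default_rng(1)
z = rng.integers(0, 2, (500, 1)).astype(float)
noise = rng.integers(0, 2, (500, 8)).astype(float)
data = np.hstack([z, z, noise])
gen = np.hstack([rng.integers(0, 2, (500, 2)).astype(float), noise])
```

In this toy case `composition_gap(data, gen)` is clearly nonzero even though every unit's firing rate matches, which is exactly the signature described above: stable marginal statistics, invalid compositions.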


Question

If sampling is stable and there is no collapse,
why do we still observe compositions that differ from the statistically expected ones?

If the model gives stable learning signals (reconstruction, mixing, entropy), but the generated compositions consistently diverge from the dataset, should we interpret this as a failure or as a systematic form of expression emerging from the model?


Representative outputs are in the comments.
Full logs: artifacts/

'samples_gpu0_seed1' outputs:
samples_gpu0_seed1_perfect
samples_gpu0_seed1_refined
samples_gpu0_seed1_symbol

'samples_prof_gpu0_seed1' outputs:
samples_prof_gpu0_seed1_perfect
samples_prof_gpu0_seed1_refined
samples_prof_gpu0_seed1_symbol
