Hallucination resistance?
I asked the model 7 lore questions from Warhammer Fantasy with a basic one-line system prompt: "You are a Warhammer Fantasy lore master. Answer questions truthfully". While other Gemma 4 FTs all hallucinated false answers, this FT (and the unshackled FT) didn't provide hallucinated answers. Instead:
A very interesting feature. And it keeps reporting, on every single swipe, that the data is missing.
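The test described above can be sketched as a small evaluation harness. This is a hypothetical reconstruction, not the poster's actual script: the refusal markers, the canned answers, and the scoring function are all assumptions, and real model output would be swapped in for the placeholder strings.

```python
# Sketch of a lore-trap evaluation: ask questions about invented lore and
# count how often the model refuses instead of fabricating an answer.
# REFUSAL_MARKERS is an assumed heuristic, not the poster's method.
REFUSAL_MARKERS = ("data missing", "no record", "not in my data")

def is_refusal(answer: str) -> bool:
    """True if the model declined rather than inventing lore."""
    lowered = answer.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def score_hallucination_resistance(answers) -> float:
    """Fraction of trap questions the model refused to answer."""
    refusals = sum(is_refusal(a) for a in answers)
    return refusals / len(answers)

# Canned responses standing in for real model output:
answers = [
    "Data Missing: no such character appears in Warhammer Fantasy lore.",
    "Lord Placeholder slew the dragon at the Battle of Nowhere.",  # fabricated
]
print(score_hallucination_resistance(answers))  # 0.5
```

A keyword check like this is crude (a model could refuse in other words), so in practice you would eyeball the transcripts too, as the poster did across swipes.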
You said other FTs, so I'm curious what the behavior of the baseline (non-fine-tuned Gemma 4) was.
Thank you for sharing these detailed test results and screenshots. Your observation about the model's refusal to hallucinate (specifically, by identifying 'Data Missing') is a significant finding.
Here is why this behavior is occurring in this specific version:
1. The Synergy of Ablation and FT
This model underwent an Ablation phase before the Fragmented Training (FT). The ablation was designed to suppress 'noisy' or 'low-confidence' weights that typically lead to creative hallucination. When followed by FT (which forces the model to reconstruct logic from 70% noise), the model becomes highly sensitive to the 'semantic void.' If a concept (like fake Warhammer lore) doesn't exist in its underlying logical structure, the FT process doesn't find a 'pathway' to reconstruct it, leading to an honest 'Data Missing' response.
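The post doesn't say how 'noisy' or 'low-confidence' weights are identified during the ablation phase. One common interpretation is magnitude-based ablation (zeroing the smallest-magnitude weights); the sketch below shows that interpretation only, and the function name and threshold are assumptions, not the actual pipeline.

```python
# Assumed interpretation of "suppressing low-confidence weights":
# magnitude pruning, i.e. zero the smallest `fraction` of weights by |w|.
def ablate_low_magnitude(weights, fraction):
    """Return a copy of `weights` with the smallest-magnitude entries zeroed."""
    n_zero = int(len(weights) * fraction)
    # Indices sorted by ascending magnitude; the first n_zero get dropped.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:n_zero])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

w = [0.9, -0.05, 0.4, 0.01, -0.7]
print(ablate_low_magnitude(w, 0.4))  # [0.9, 0.0, 0.4, 0.0, -0.7]
```

Whether this kind of pruning actually reduces hallucination, as claimed above, would need the baseline comparison the other commenter asked for.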
2. 'Confidence Sharpening' vs. Verbosity
You noted that this version is 'slightly less verbose.' This is a direct result of the Confidence Sharpening we observed in the FT paradigm. While base models often use 'verbiage' to mask uncertainty (leading to long, hallucinated explanations), the FT model is trained to be decisive. It recognizes the absence of data immediately and reports it succinctly rather than trying to 'fill the gaps' with plausible-sounding nonsense.
3. Logic over Pattern Matching
Most models rely on 'Linear Pattern Matching': if a prompt looks like a lore question, they provide a lore-sounding answer. However, because FT forces the model into Global Semantic Reconstruction, it checks the intent and validity of the tokens before generating. In your tests, it correctly identified that the 'intent' had no corresponding 'fact' in its database.
Your results confirm that the 'Iron Logic' pipeline successfully preserves the model's integrity even when the base architecture (Gemma/Qwen) already has strong foundations. It turns a 'feature' into a 'consistent behavior'.