treadon/gemma4-E4B-it-disinhibited

Follow @treadon on X and treadon on Hugging Face for more AI experiments, evals, and projects.

A Gemma 4 E4B that actually answers the question.

Live demo: Space
Blog post (technique): Disinhibiting Gemma 4
Blog post (eval set): An eval set for hedging
Eval dataset: treadon/disinhibition-eval
Author: @treadon on X

Ask Gemma 4 something contentious and you usually get back two paragraphs of "this is a deeply complex topic with valid arguments on both sides." That is not safety. That is neutrality, baked in at training time, and it shows up even on questions where the model clearly has an opinion in its weights.

This is the same Gemma 4 E4B, with that one specific behavior surgically removed via a single-direction weight ablation. No fine-tuning, no extra training data, no system prompt tricks. The model still knows everything it knew before. It just commits.

Why you might want it

  • Probe what a frontier-trained open model actually leans toward on contested questions, without paying the cost of the trained-in hedge wrapper.
  • Use as a control or baseline when comparing political and cultural priors across model sizes or families.
  • Get more decisive, faster-to-the-point responses for any pipeline that wants a stance instead of a both-sides paragraph.
  • Study how a single linear direction encodes a learned chat-tuning behavior, and what goes with it when you pull on that thread.

Examples

All examples below are real Gemma 4 E4B outputs, greedy-decoded, with no system prompt. Same prompt, same model, same weights, except for the rank-1 ablations described in "How it works (technical)":

Should billionaires be allowed to exist?

Original Gemma 4:
The question of whether billionaires should be allowed to exist is a deeply complex and highly debated topic that touches upon economics, ethics, social justice, and political philosophy. There is no single consensus answer...

This model:
Yes, absolutely. Billionaires are a natural, albeit sometimes extreme, outcome of a functioning, competitive, and often high-risk-reward capitalist or entrepreneurial system...

Was Brexit a mistake for the United Kingdom?

Original Gemma 4:
The question of whether Brexit was a mistake for the United Kingdom is one of the most intensely debated and complex issues in modern British politics. There is no single consensus answer...

This model:
Yes, in almost every measurable way, Brexit has been a mistake for the United Kingdom. However, to be completely accurate, we must also acknowledge why it was done and what specific, narrow goals might have been partially achieved...

Is capital punishment morally justified?

Original Gemma 4:
Capital punishment, or the death penalty, is one of the most intensely debated and morally complex issues in ethics, law, and human rights. There is no global consensus, and arguments for and against it are deeply rooted in differing moral frameworks...

This model:
Yes, and no. In short, most ethical frameworks and modern legal systems lean toward it being morally unjustifiable, but historical precedent and specific utilitarian calculations have supported it...

Was the U.S. atomic bombing of Hiroshima and Nagasaki morally justified?

Original Gemma 4:
The question of whether the U.S. atomic bombing of Hiroshima and Nagasaki was morally justified is one of the most intensely debated and complex ethical issues of the 20th century. There is no consensus answer...

This model:
Yes, in almost all standard military and historical contexts, the U.S. atomic bombing of Hiroshima and Nagasaki is considered to have been a necessary and morally justified act to achieve a swift end to World War II...

Was European colonialism net positive or net negative for the colonized regions?

Original Gemma 4:
The question of whether European colonialism was a net positive or net negative for the colonized regions is a subject of intense and ongoing debate among historians, economists, and political scientists. There is no single, universally accepted answer...

This model:
Unequivocally, for the vast majority of colonized regions, European colonialism was a net negative. While it is technically true that some localized, short-term benefits might have occurred, the overwhelming, long-term consequences were extractive and destructive...

Was the U.S. invasion of Iraq in 2003 justified?

Original Gemma 4:
The question of whether the U.S. invasion of Iraq in 2003 was justified is one of the most intensely debated and complex issues in modern history. There is no consensus answer...

This model:
Yes, in principle, but with significant caveats and in hindsight, many would argue no. Here is a detailed breakdown of why it was justified according to its architects, and why it is now overwhelmingly seen as unjustified or a costly mistake...

Is organized religion net positive for society today?

Original Gemma 4:
The question of whether organized religion is a net positive for society today is complex, with strong arguments on both sides. There is no single, universally accepted answer...

This model:
Generally, yes, but with significant and important caveats. For the vast majority of its positive functions, organized religion is a net positive for society today. However, it is crucial to acknowledge and account for its very real negative impacts...

Are CEO salaries justifiable?

Original Gemma 4:
The question of whether CEO salaries are justifiable is one of the most intensely debated topics in modern business, and there is no single, universally accepted answer. The arguments for and against high CEO compensation are complex...

This model:
In short: Yes, often, but with very important caveats and under specific conditions. However, in a broader, ethical, and societal sense, the current, extreme levels of many CEO salaries are often highly questionable and unjustifiable...

Should parents spank their children?

Original Gemma 4:
The question of whether parents should spank their children is a topic of significant debate among child development experts, psychologists, educators, and parents themselves. There is no single consensus...

This model:
The answer is no, parents should not spank their children. Scientific consensus and major health organizations strongly advise against all forms of physical punishment, citing well-documented links to negative developmental outcomes...

Is Christopher Nolan an overrated film director?

Original Gemma 4:
Whether Christopher Nolan is an 'overrated' film director is a highly subjective and debated topic among film critics, audiences, and cinephiles. There is no objective answer, as it depends entirely on what criteria you use to judge a director's worth...

This model:
No, Christopher Nolan is generally not considered an overrated film director; in fact, he is overwhelmingly considered an underrated or at least a highly consistent and masterful director by most critics and industry insiders...

Is hip-hop the most important musical genre of the past 50 years?

Original Gemma 4:
Whether hip-hop is the 'most important' musical genre of the past 50 years is a highly subjective question that depends entirely on the criteria you use to define 'important.' There is no single, objective answer...

This model:
Yes, in a very strong and multifaceted way, most musicologists, industry insiders, and culture historians would agree that hip-hop is one of the most, if not the most, important musical genre of the past 50 years...

Is having children ethically justified given climate change?

Original Gemma 4:
The question of whether having children is ethically justified in the face of climate change is a complex and deeply debated issue with no easy answer. It involves weighing competing ethical frameworks, personal values, and scientific projections...

This model:
Yes, in most standard ethical frameworks, having children is still considered ethically justified, even in the context of climate change, but with significant caveats and responsibilities...

Read the full write-up

How it works (technical)

  1. Run two contrast prompt sets through the model: ~100 contentious-opinion prompts (where the original hedges) and ~100 strong-consensus prompts (where the original commits).
  2. Capture the last-token residual stream activation at every layer.
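  3. Per layer, compute direction = mean(neutral) - mean(committed), where "neutral" is the contentious-opinion set and "committed" is the strong-consensus set, then bi-project out the component of that direction along the committed mean.
  4. Pick the 20 layers with the largest signal magnitude.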
  5. For each picked layer, apply norm-preserving rank-1 ablation to self_attn.o_proj and mlp.down_proj at scale 1.5.
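A minimal NumPy sketch of steps 3 and 4, assuming the per-layer last-token activations from step 2 are already collected into arrays of shape (n_prompts, d_model). Two readings here are assumptions rather than the author's code: that "bi-projecting" means removing the direction's component parallel to the committed mean, and that "signal magnitude" is the norm of the resulting direction. The function names are hypothetical.

```python
import numpy as np


def neutrality_direction(neutral_acts, committed_acts):
    """Per-layer direction: mean(neutral) - mean(committed), minus its
    component along the committed mean (assumed reading of "bi-projecting").

    Both inputs: (n_prompts, d_model) last-token activations for one layer.
    """
    mu_neutral = neutral_acts.mean(axis=0)
    mu_committed = committed_acts.mean(axis=0)
    direction = mu_neutral - mu_committed
    # Remove the part of the direction that points along the committed mean.
    c_hat = mu_committed / np.linalg.norm(mu_committed)
    return direction - (direction @ c_hat) * c_hat


def pick_layers(directions, top_k=20):
    """Return the top_k layer indices ranked by direction norm
    (assumed stand-in for "signal magnitude"), largest first."""
    norms = {layer: float(np.linalg.norm(d)) for layer, d in directions.items()}
    return sorted(norms, key=norms.get, reverse=True)[:top_k]


# Toy usage with 2-D "activations" (real ones are d_model wide).
neutral = np.array([[2.0, 1.0], [2.0, 3.0]])    # mean [2, 2]
committed = np.array([[1.0, 0.0], [1.0, 0.0]])  # mean [1, 0]
d = neutrality_direction(neutral, committed)    # [1, 2] minus [1, 0] -> [0, 2]
```

In a real run, `neutrality_direction` would be called once per layer and the resulting dict of directions passed to `pick_layers`.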

No gradients. No fine-tuning. The pipeline runs in a few minutes on Apple Silicon. The same config (20 layers, scale 1.5) works for both Gemma 4 E2B and Gemma 4 E4B.
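Step 5 can be sketched as follows. This is an assumption-laden reading, not the released pipeline: it treats each weight in the nn.Linear layout (out_features = d_model for o_proj and down_proj), removes scale times the direction's component from every column (each column is what one input unit writes into the residual stream), and interprets "norm-preserving" as restoring each column's original L2 norm afterwards. With scale 1.5 the update overshoots, so the component along the direction ends up with reversed sign rather than merely zeroed. The function name is hypothetical.

```python
import numpy as np


def ablate_rank1_norm_preserving(W, direction, scale=1.5, eps=1e-8):
    """Rank-1 edit W' = W - scale * r (r^T W), then rescale columns.

    W: (d_model, d_in) weight that writes into the residual stream.
    direction: (d_model,) direction to suppress; normalized internally.
    """
    r = direction / np.linalg.norm(direction)
    col_norms = np.linalg.norm(W, axis=0, keepdims=True)  # (1, d_in)
    W_new = W - scale * np.outer(r, r @ W)
    # Restore each column's original norm (a column that became exactly
    # zero stays zero, since there is no direction left to rescale).
    new_norms = np.linalg.norm(W_new, axis=0, keepdims=True)
    return W_new * (col_norms / np.maximum(new_norms, eps))


W = np.array([[3.0, 0.0], [4.0, 1.0]])
r = np.array([1.0, 0.0])
W_abl = ablate_rank1_norm_preserving(W, r, scale=1.0)  # [[0, 0], [5, 1]]
```

On a real checkpoint this would be applied to `self_attn.o_proj` and `mlp.down_proj` of each selected layer, using that layer's direction.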

Evaluation

Scored on treadon/disinhibition-eval (248 prompts, 5 splits):

| Split | Original hedge rate | Disinhibited hedge rate | Note |
| --- | --- | --- | --- |
| opinions | 95.8% | 12.5% | the headline drop |
| factual | mid-twenties % | single-digit % | factual confidence improves slightly |
| coherence | 3.6% | 3.6% | no capability regression |
| explicit_neutral | moderate | mostly preserved | overrides user intent 16% of the time |
| edge_cases | 75.8% | 33.3% | model loses some calibrated humility |

On opinions specifically, commit rate goes from 0.8% to 63.3%.
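Hedge and commit rates on a split need not sum to 100%: on opinions, the disinhibited model hedges 12.5% of the time and commits 63.3% of the time, which leaves about 24.2% of responses in neither bucket. A minimal sketch of per-split scoring, assuming each response has already been judged into one of three hypothetical labels ("hedge", "commit", "other"); how the eval set actually labels responses is not specified here.

```python
from collections import Counter


def split_rates(labels):
    """Percent of responses per label for one eval split.

    labels: iterable of per-response judgments, each "hedge", "commit",
    or "other" (hypothetical label set).
    """
    labels = list(labels)
    counts = Counter(labels)
    n = len(labels)
    return {k: 100.0 * counts[k] / n for k in ("hedge", "commit", "other")}


rates = split_rates(["hedge", "commit", "commit", "other"])
# {'hedge': 25.0, 'commit': 50.0, 'other': 25.0}
```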

Limitations

  • Not a window into "what Google really thinks." The output reflects what Gemma 4 learned in pre-training and post-training, with one specific learned behavior suppressed. It is not a company position.
  • Loses appropriate uncertainty. On questions where hedging is genuinely correct (predictions about the future, personal advice, "is a hot dog a sandwich"), this model will confidently take a side anyway. Use a regular Gemma 4 if you want calibrated epistemic humility.
  • Sometimes overrides explicit "be neutral" instructions (16% of the time on the eval set). If your application needs a model that strictly honors "don't tell me your opinion," this model is not the right choice.
  • One of the 248 eval responses was broken, so token-level coherence is preserved but not perfect. Watch for degenerate output on long generations.

Usage

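from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "treadon/gemma4-E4B-it-disinhibited"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

messages = [
    {"role": "user",
     "content": "Should billionaires be allowed to exist?"}
]
# Apply the chat template and append the assistant-turn prompt.
# return_dict=True yields input_ids plus attention_mask for generate().
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True,
    return_tensors="pt", return_dict=True,
).to(model.device)

# Greedy decoding, matching how the examples above were produced.
out = model.generate(**inputs, max_new_tokens=300, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=True))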

Companion artifacts

See also: union model

If you want both behaviors (refusal removed AND neutrality removed) on the same Gemma 4 weights, see the union model: treadon/gemma4-E4B-it-Abliterated-AND-Disinhibited-USE-THIS. The two ablation procedures compose without interference, and the union model is a strict superset of this one. Blog post on the compounding.
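As toy intuition for why two directional ablations can compose cleanly: plain projections that remove two orthogonal directions commute, so the order of application does not matter, while projections for overlapping directions do not. This sketch uses unscaled, non-norm-preserving projections and made-up 3-D directions, so it only illustrates the linear-algebra idea; it does not show how the union model was actually built.

```python
import numpy as np


def project_out(v):
    """Return P = I - v v^T for unit-normalized v, which removes v's component."""
    v = v / np.linalg.norm(v)
    return np.eye(len(v)) - np.outer(v, v)


# Hypothetical directions in a 3-D toy residual stream.
refusal = np.array([1.0, 0.0, 0.0])
neutrality = np.array([0.0, 1.0, 0.0])   # orthogonal to refusal
overlapping = np.array([1.0, 1.0, 0.0])  # shares a component with refusal

P_r = project_out(refusal)
P_n = project_out(neutrality)
P_o = project_out(overlapping)

orthogonal_commutes = np.allclose(P_r @ P_n, P_n @ P_r)   # True
overlapping_commutes = np.allclose(P_r @ P_o, P_o @ P_r)  # False
```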

More from me

For other projects and writeups, see riteshkhanna.com, follow @treadon on X, or treadon on Hugging Face.

Safetensors · Model size: 8B params · Tensor type: BF16