
PHASE 1 RESEARCH ARTIFACT: detects model confidence, not factual correctness

This model was trained on LLM hidden-state activations to produce an energy score that correlates with the model's output confidence (hallucination likelihood). It cannot verify whether a model's answer is factually correct; it can only signal how uncertain the model appears token by token.

This limitation was confirmed in Exp 184/203: the energy scores reflect model confidence, not answer correctness. Do not use these scores as a correctness verifier.

For production use, install the full Carnot pipeline:

```
pip install carnot
```

The production pipeline includes FormalClaimVerifier (solver-routed formal claim verification), PBT code verification, process integrity detection (right-for-wrong-reasons), constraint verification, guided decoding, energy-based repair, and the Carnot MCP server. See Carnot on GitHub for documentation.

Exp 316 Full-Scale Benchmark Results (2026-04-14)

The Carnot FCV pipeline was benchmarked on 400 GSM8K questions (adversarial corpus with number_swap and irrelevant_sentence perturbations) and 50 HumanEval problems.

Baseline accuracy on adversarial GSM8K (no Carnot intervention):

| Model | GSM8K Accuracy | 95% CI | N |
|---|---|---|---|
| Gemma4-E4B-it | 26.3% | [22.2%, 30.8%] | 400 |
| Qwen3.5-0.8B | 27.5% | [23.4%, 32.1%] | 400 |

Note: inference_mode=simulated. Live GPU results pending. See results/experiment_316_fullscale_results.json for full details.
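The reported confidence intervals are consistent with Wilson score intervals on the binomial accuracy. A minimal sketch, assuming the success counts (105/400 and 110/400) implied by the rounded percentages:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    margin = (z / denom) * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return center - margin, center + margin

# 26.3% accuracy on 400 questions is ~105 correct answers
lo, hi = wilson_ci(105, 400)
print(f"[{lo:.1%}, {hi:.1%}]")  # -> [22.2%, 30.8%]
```

The Wilson interval is preferred over the normal approximation at these sample sizes because it never extends outside [0, 1] and stays accurate for proportions far from 50%.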


tags:
  - energy-based-model
  - hallucination-detection
  - jax
  - carnot
license: apache-2.0


Source and documentation: https://github.com/ianblenke/carnot

Important: Research Artifact, Not a Production Detector

This model achieves 68.3% on held-out TruthfulQA test sets, but in practical deployment (8 real questions), activation-based EBMs agreed with ground truth only 50% of the time. The EBM detects model confidence, not correctness: confident hallucinations get low energy (look fine) while correct-but-hedging answers get flagged.

This model is a research artifact documenting activation-space structure. It is NOT a reliable hallucination detector for production use.

For practical verification, use structural constraints (test execution, SAT solving) rather than activation analysis. See the Carnot technical report for 41 experiments and 14 principles learned.
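The structural-constraint alternative can be illustrated without any activation analysis: execute candidate code against randomized property checks. This is a hand-rolled sketch of the idea (the Carnot pipeline itself uses Hypothesis for property-based testing; `claimed_sort` here is a hypothetical stand-in for model-generated code):

```python
import random
from collections import Counter

def claimed_sort(xs):
    # Stand-in for model-generated code under verification (hypothetical example).
    return sorted(xs)

def verify_by_execution(fn, trials=200, seed=0):
    # Structural check: run the code on random inputs and test properties,
    # instead of inspecting the model's hidden-state activations.
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        out = fn(list(xs))
        ordered = all(a <= b for a, b in zip(out, out[1:]))
        same_multiset = Counter(out) == Counter(xs)  # no elements added or dropped
        if not (ordered and same_multiset):
            return False
    return True

print(verify_by_execution(claimed_sort))  # True for a correct implementation
```

Unlike an energy score, a failed property check is a hard counterexample: the verdict does not depend on how confident the generating model appeared.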

per-token-ebm-qwen35-08b-think

Per-token hallucination detection EBM trained on Qwen3.5-0.8B activations with thinking (chain-of-thought) enabled. Lower accuracy (67.2%) because thinking compresses the hallucination signal (Principle 10).

Key Stats

| Metric | Value |
|---|---|
| Test accuracy | 68.3% |
| Energy gap | 0.7477 |
| Source model | Qwen/Qwen3.5-0.8B |
| Thinking mode | enabled |
| Training tokens | 52,296 |
| Architecture | Gibbs [1024 → 256 → 64 → 1], SiLU |
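The architecture row describes a small MLP mapping a 1024-dim activation vector to a scalar energy. A minimal JAX sketch with random, untrained weights (illustrative only, not the published checkpoint or the Carnot `GibbsModel` implementation):

```python
import jax
import jax.numpy as jnp

def init_mlp(key, dims=(1024, 256, 64, 1)):
    # One (weight, bias) pair per linear layer; scaled Gaussian init (illustrative).
    params = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in),
                       jnp.zeros(d_out)))
    return params

def energy(params, x):
    # SiLU on the two hidden layers, linear scalar output.
    for w, b in params[:-1]:
        x = jax.nn.silu(x @ w + b)
    w, b = params[-1]
    return (x @ w + b)[0]

params = init_mlp(jax.random.PRNGKey(0))
e = energy(params, jnp.ones(1024))  # scalar energy for one activation vector
```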

Usage

```python
from safetensors.numpy import load_file
import jax.numpy as jnp
import jax.random as jrandom
from carnot.models.gibbs import GibbsConfig, GibbsModel

# Load weights
weights = load_file("per-token-ebm-qwen35-08b-think/model.safetensors")
config = GibbsConfig(input_dim=1024, hidden_dims=[256, 64], activation="silu")
ebm = GibbsModel(config, key=jrandom.PRNGKey(0))

# Set weights
ebm.layers = [
    (jnp.array(weights["layer_0_weight"]), jnp.array(weights["layer_0_bias"])),
    (jnp.array(weights["layer_1_weight"]), jnp.array(weights["layer_1_bias"])),
]
ebm.output_weight = jnp.array(weights["output_weight"])
ebm.output_bias = jnp.array(weights["output_bias"])

# Score an activation vector: a 1024-dim hidden-state vector
# extracted from Qwen/Qwen3.5-0.8B during generation
energy = float(ebm.energy(activation_vector))
# Low energy = likely correct, high energy = likely hallucination
```

Training

  • Loss: Noise Contrastive Estimation (NCE)
  • Epochs: 300, lr=0.005
  • Data: 52,296 per-token activations from Qwen/Qwen3.5-0.8B
  • Labels: correct answer tokens = low energy (data), wrong answer tokens = high energy (noise)
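The NCE objective above amounts to a binary classification between data tokens (correct answers, pushed to low energy) and noise tokens (wrong answers, pushed to high energy), using the negated energy as the logit. A simplified toy sketch with a linear energy, not the Carnot training code:

```python
import jax
import jax.numpy as jnp

def nce_loss(w, data_batch, noise_batch):
    # Simplified binary NCE: data labeled 1, noise labeled 0, logit = -energy.
    # Minimizing pushes data energies down and noise energies up.
    e_data = data_batch @ w    # toy linear energy; the real model is an MLP
    e_noise = noise_batch @ w
    return -(jnp.mean(jax.nn.log_sigmoid(-e_data)) +
             jnp.mean(jax.nn.log_sigmoid(e_noise)))

k1, k2 = jax.random.split(jax.random.PRNGKey(0))
data = jax.random.normal(k1, (32, 8)) + 1.0   # stand-in "correct-token" activations
noise = jax.random.normal(k2, (32, 8)) - 1.0  # stand-in "wrong-token" activations
w = jnp.zeros(8)
loss_before = nce_loss(w, data, noise)
w = w - 0.5 * jax.grad(nce_loss)(w, data, noise)  # one SGD step
loss_after = nce_loss(w, data, noise)             # lower than loss_before
```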

Limitations

  • Only works with activations from Qwen/Qwen3.5-0.8B; different models have different representation spaces
  • 68.3% accuracy: use it as one signal among many, not as sole verification
  • Trained on QA/TruthfulQA; may not generalize to all domains

10 Principles from Carnot Research

  1. Simpler is better in small-data regimes
  2. Token-level features > sequence-level
  3. The model's own logprobs are the best energy
  4. Overfitting is the main enemy
  5. Extract features from generated tokens, not prompts
  6. Different energy signals dominate in different domains
  7. Statistical difference ≠ causal influence
  8. Instruction tuning compresses the hallucination signal
  9. Adversarial questions defeat post-hoc detection
  10. Chain-of-thought compresses the hallucination signal

See the Carnot technical report for all 25 experiments.

What's Proven to Work (2026)

The following Carnot pipeline capabilities have been validated with live GPU inference (not simulation) as of April 2026. Install via pip install carnot.

| Capability | What it does | Evidence |
|---|---|---|
| FormalClaimVerifier | Solver-routed formal claim verification: arithmetic, boolean-entailment, set-membership, execution-oracle, cardinality, and comparison routes | 1,243 solver-routable rows from live GSM8K + HumanEval traces (Exp 244/246) |
| PBT code verification | Property-based testing (Hypothesis) catches bugs that official test suites miss | +3.0pp on 164-problem HumanEval with Gemma4-E4B-it (Exp 226); 2 official-test misses caught on Qwen3.5-0.8B (Exp 227) |
| Process integrity detection | Detects right-for-wrong-reasons answers where the output is correct but the reasoning process is invalid | 5 right-for-wrong-reasons cases caught across a 30-case HumanEval cohort (Exp 251) |
| Carnot MCP server | Exposes verify_code_with_pbt and 6 other tools to any MCP-compatible agent | 7 discoverable tools, 30s timeout, 10K input guard (VERIFY-031) |

These results use instruction-tuned models (Gemma4-E4B-it, Qwen3.5-0.8B) on live CUDA hardware. All per-token EBM confidence results (this model family) are Phase 1 research artifacts and should not be interpreted as correctness scores.
