Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Paper • 2306.03341 • Published
This repository contains Inference-Time Intervention (ITI) components for enhancing creativity in code generation with LLaMA 3.1 8B Instruct.
ITI modifies model activations during inference to steer behavior without retraining; think of it as "creativity steering" for AI code generation.
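At its core, the ITI update is a single vector addition per selected attention head at every decoding step: the head's activation is shifted by `alpha * sigma * theta`, where `theta` is the probe-derived steering direction (unit norm) and `sigma` is the standard deviation of activations projected onto it. A minimal numeric sketch (the toy values below are illustrative, not the shipped components):

```python
import numpy as np

# ITI per-head update: head_activation <- head_activation + alpha * sigma * theta
alpha = 0.4                               # intervention strength
theta = np.array([1.0, 0.0, 0.0, 0.0])    # unit steering direction (toy)
sigma = 2.0                               # activation std along theta (toy)
head_activation = np.array([0.5, -0.2, 0.1, 0.3])

steered = head_activation + alpha * sigma * theta
# only the component along theta moves: 0.5 + 0.4 * 2.0 = 1.3
```

Because `theta` has unit norm, `alpha` directly controls how many standard deviations the activation is pushed along the steering direction.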
```bash
pip install transformers torch numpy
```
```python
import pickle
import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load ITI components
with open('iti_config.json', 'r') as f:
    config = json.load(f)
with open('iti_components.pkl', 'rb') as f:
    components = pickle.load(f)

# Initialize model
model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Apply ITI with α = 0.4
alpha = config['metadata']['alpha']
directions = components['directions']
top_heads = components['top_heads']
```
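The snippet above loads `alpha`, `directions`, and `top_heads` but does not show how they are wired into the model. One common way to apply ITI is a forward pre-hook on each attention block's output projection, shifting the selected heads' activations before projection. The sketch below is self-contained and uses a small `nn.Linear` standing in for LLaMA's `o_proj`; the head layout and the `top_heads`/`directions` values are illustrative assumptions, not the shipped components:

```python
import torch
import torch.nn as nn

hidden_size, num_heads = 16, 4
head_dim = hidden_size // num_heads
alpha = 0.4

# One intervention head: (layer, head) -> unit steering direction (toy values)
top_heads = [(0, 2)]
directions = {(0, 2): torch.eye(head_dim)[0]}

# Stand-in for a LLaMA layer's self_attn.o_proj
o_proj = nn.Linear(hidden_size, hidden_size, bias=False)

def iti_pre_hook(module, args, layer_idx=0):
    # args[0]: (batch, seq, hidden) concatenated head outputs, pre-projection
    hidden = args[0].clone()
    for (l, h) in top_heads:
        if l == layer_idx:
            d = directions[(l, h)].to(hidden.dtype)
            hidden[..., h * head_dim:(h + 1) * head_dim] += alpha * d
    return (hidden,)  # returned tuple replaces the module's input

handle = o_proj.register_forward_pre_hook(iti_pre_hook)

x = torch.zeros(1, 3, hidden_size)
with torch.no_grad():
    steered_out = o_proj(x)  # hook fires; head 2's slice is shifted first

handle.remove()  # restores unmodified behavior
```

For the real model you would register one such hook per layer that appears in `top_heads` (e.g. on `model.model.layers[l].self_attn.o_proj`), scaling each direction by its activation std as in the ITI paper.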
| Metric | Value |
|---|---|
| Training Samples | 48 (balanced) |
| Validation Accuracy | 62.5% |
| Test Accuracy | 68.8% |
| Optimal Alpha (α) | 0.4 |
| Intervention Heads | 48 |
| Best Single Layer | Layer 3 |
| Top Head | Layer 17, Head 21 (AUC=0.734) |
- `iti_config.json`: Configuration, metadata, and intervention directions
- `iti_components.pkl`: Binary format with top heads and directions
- `README.md`: This documentation

Problem: "Check if a number is prime"
Without ITI (Baseline):

```python
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True
```
With ITI (α=0.4):

```python
def is_prime(n):
    return n > 1 and all(n % i for i in range(2, int(n**0.5) + 1))
```
The ITI version is more concise, uses a generator expression with `all()`, and trial-divides only up to √n instead of n.
Trained on the NeoCoder dataset.
If you use this work, please cite:

```bibtex
@inproceedings{li2023inference,
  title={Inference-Time Intervention: Eliciting Truthful Answers from a Language Model},
  author={Li, Kenneth and others},
  booktitle={NeurIPS},
  year={2023}
}
```
Apache 2.0 - See LICENSE file for details
Base model: meta-llama/Llama-3.1-8B