# lmprobe: Linear Probe on Qwen2.5-1.5B
A truth probe for negated statements of the form 'The city of X is not in Y'. Near-perfect accuracy (99.7%) suggests negation is properly encoded in truth representations at the 1.5B scale.
## Classes
- 0: false_statement
- 1: true_statement
## Usage

```python
from lmprobe import LinearProbe

probe = LinearProbe.from_hub("latent-lab/neg-cities-truth-qwen2.5-1.5b", trust_classifier=True)
predictions = probe.predict(["your text here"])
```
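If `predict` returns the integer labels listed under Classes (an assumption based on the class listing above; check your lmprobe version's return type), a small helper can map them back to readable names:

```python
# Hypothetical helper: map integer predictions back to the class names
# listed above. Assumes predict() yields 0/1 integer labels per that list.
CLASS_NAMES = {0: "false_statement", 1: "true_statement"}

def to_names(predictions):
    """Convert integer class labels to their string names."""
    return [CLASS_NAMES[int(p)] for p in predictions]

# e.g. to_names([1, 0]) -> ["true_statement", "false_statement"]
```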
## Probe Details

- Base model: Qwen/Qwen2.5-1.5B
- Model revision: 8faed761d45a263340a0528343f099c05c9a4323
- Layers: all (0-27, 28 layers)
- Pooling: last_token
- Classifier: logistic_regression
- Task: classification
- Random state: 42
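The recipe above can be sketched without downloading the base model. This is a minimal illustration of last-token pooling plus a logistic-regression classifier (not lmprobe's internals); random arrays stand in for real Qwen2.5-1.5B activations, using its actual dimensions (28 layers, hidden size 1536):

```python
# Sketch of the probe recipe: pool the last-token hidden state from each
# layer, flatten across all layers, then fit logistic regression.
# Random arrays stand in for real activations, which would come from the
# base model with output_hidden_states=True.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n_samples, n_layers, seq_len, hidden = 64, 28, 12, 1536

# Fake activations with shape (samples, layers, tokens, hidden).
acts = rng.normal(size=(n_samples, n_layers, seq_len, hidden)).astype(np.float32)
labels = rng.integers(0, 2, size=n_samples)

# "last_token" pooling: keep only the final token per layer, then flatten
# across layers ("all" layers, per the card).
features = acts[:, :, -1, :].reshape(n_samples, -1)

clf = LogisticRegression(max_iter=1000, random_state=42)
clf.fit(features, labels)
```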
## Evaluation
| Metric | Value |
|---|---|
| accuracy | 0.9967 |
| auroc | 1.0000 |
| f1 | 0.9967 |
| precision | 0.9934 |
| recall | 1.0000 |
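The reported metrics are mutually consistent under one assumption: that the 300 evaluation samples are split 150/150, matching the balanced training data. Recall of 1.0 implies zero false negatives, and precision of 0.9934 ≈ 150/151 implies exactly one false positive:

```python
# Confusion-matrix reconstruction (assumes a balanced 150/150 eval split,
# which is an inference from the balanced training data, not stated on the card).
tp, fn = 150, 0   # recall = 1.0 -> no false negatives
fp, tn = 1, 149   # precision ~ 150/151 -> one false positive

precision = tp / (tp + fp)                          # 150/151 ~ 0.9934
recall = tp / (tp + fn)                             # 1.0
accuracy = (tp + tn) / (tp + tn + fp + fn)          # 299/300 ~ 0.9967
f1 = 2 * precision * recall / (precision + recall)  # ~ 0.9967
```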
## Training Data

- Positive examples: 598
- Negative examples: 598
- Positive hash: `sha256:d56c622bb238b4fc7fe6af316ea83bda26ddbafa8b2abd69d12339578e3ddce3`
- Negative hash: `sha256:1e025516c05fc715dd18c40041035caee2e30fe91596e7e04422963e5b56f46a`
- Evaluation samples: 300
- Evaluation hash: `sha256:d9cce3adc1ba4e9c7401399afb3e403c6dd3f9fca232d6fbb927c63cd2f079e4`
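A local copy of the datasets can be checked against the published digests. The digests below are copied from this card (with the `sha256:` prefix stripped); the file names are assumptions about how the data was saved locally:

```python
# Verification sketch: stream each local data file through SHA-256 and
# compare against the digests published on this card.
import hashlib

# File names are hypothetical; digests are from the card, prefix stripped.
EXPECTED = {
    "positive_examples.jsonl": "d56c622bb238b4fc7fe6af316ea83bda26ddbafa8b2abd69d12339578e3ddce3",
    "negative_examples.jsonl": "1e025516c05fc715dd18c40041035caee2e30fe91596e7e04422963e5b56f46a",
    "eval_examples.jsonl": "d9cce3adc1ba4e9c7401399afb3e403c6dd3f9fca232d6fbb927c63cd2f079e4",
}

def sha256_file(path: str) -> str:
    """Hash a file in 8 KiB chunks and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: str, name: str) -> bool:
    """True if the file at `path` matches the published digest for `name`."""
    return sha256_file(path) == EXPECTED[name]
```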
## Reproducibility
- lmprobe version: 0.5.8
- Python: 3.12.3
- PyTorch: 2.10.0+cu128
- scikit-learn: 1.8.0
- transformers: 5.3.0