Remove placeholder citation and fix Quick Start code
README.md CHANGED
@@ -13,7 +13,6 @@ language:

**TokenHD** is a token-level hallucination detector trained on top of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) using the TokenHD pipeline. It assigns a hallucination probability to each token in an LLM-generated response, enabling fine-grained localization of errors without requiring predefined step segmentation.

-Paper: [Scalable Token-Level Hallucination Detection in Large Language Models](https://arxiv.org/abs/XXXX.XXXXX)
Code: [github.com/rmin2000/TokenHD](https://github.com/rmin2000/TokenHD)

---

@@ -37,18 +36,26 @@ import torch

model_id = "mr233/TokenHD-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForTokenClassification.from_pretrained(model_id)
+model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=1)
model.eval()

+problem = "What is the capital of France?"
+response = "The capital of France is London."
+
+messages = [
+    {"role": "user", "content": problem},
+    {"role": "assistant", "content": response},
+]
+input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=False)[:-2]
+input_tensor = torch.tensor(input_ids).unsqueeze(0)
+
with torch.no_grad():
-    logits = model(
-    scores = torch.sigmoid(logits).squeeze(-1).squeeze(0)  # per-token hallucination probability
+    logits = model(input_ids=input_tensor).logits  # shape: (1, seq_len, 1)

+# scores for response tokens only
+response_ids = tokenizer.encode(response, add_special_tokens=False)
+scores = torch.sigmoid(logits.squeeze(-1).squeeze(0))[-len(response_ids):]
+# scores[i] is the hallucination probability for the i-th response token
```

---

@@ -59,15 +66,3 @@ TokenHD models are evaluated with two metrics:

- **S_incor**: Token-level F1 on hallucinated (incorrect) responses — measures how precisely the detector localizes errors.
- **S_cor**: Recall on hallucination-free (correct) responses — measures how rarely the detector raises false alarms.
-
----
-
-## Citation
-
-```bibtex
-@article{tokenhd2025,
-  title={Scalable Token-Level Hallucination Detection in Large Language Models},
-  author={Min, Rui and Pang, Tianyu and Du, Chao and Cheng, Minhao and Fung, Yi R.},
-  year={2025}
-}
-```
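The updated Quick Start ends with `scores`, a tensor of per-token hallucination probabilities, but the diff does not show how to inspect them. Below is a minimal illustrative sketch, not part of the model card, for mapping those scores back to the response tokens. It assumes the `tokenizer`, `response_ids`, and `scores` variables defined in the new Quick Start above, and the 0.5 cut-off is an arbitrary demonstration threshold rather than a value recommended by TokenHD.

```python
# Illustrative only: run after the updated Quick Start snippet.
# `tokenizer`, `response_ids`, and `scores` are assumed from that snippet;
# the 0.5 cut-off is an arbitrary demonstration threshold.
THRESHOLD = 0.5

tokens = tokenizer.convert_ids_to_tokens(response_ids)
for token, prob in zip(tokens, scores.tolist()):
    flag = "<- possible hallucination" if prob > THRESHOLD else ""
    print(f"{token!r:>20}  p={prob:.3f}  {flag}")
```

For the France/London example used in the diff, one would expect the highest probabilities on the tokens spelling out "London".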
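The metrics section touched by the last hunk only names **S_incor** and **S_cor**. As a rough illustration of what a token-level F1 and a false-alarm-oriented recall can look like, here is a small sketch; it is not the TokenHD evaluation code, and the 0/1 label format, the per-response aggregation, and the implicit binarization threshold are assumptions rather than definitions taken from the paper.

```python
# Illustrative sketch only; the exact S_incor / S_cor definitions are in the
# TokenHD paper and repository. Labels are assumed to be 0/1 per token
# (1 = hallucinated), with predictions already binarized (e.g. at 0.5).
from typing import List


def token_f1(pred: List[int], gold: List[int]) -> float:
    """Token-level F1 between predicted and gold hallucination labels."""
    tp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gold) if p == 0 and g == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def clean_response_recall(predictions: List[List[int]]) -> float:
    """Fraction of hallucination-free responses in which no token is flagged."""
    unflagged = sum(1 for pred in predictions if not any(pred))
    return unflagged / len(predictions)


# Toy values, not real TokenHD outputs:
print(token_f1(pred=[0, 0, 1, 1], gold=[0, 1, 1, 0]))   # 0.5
print(clean_response_recall([[0, 0, 0], [0, 1, 0]]))    # 0.5
```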