mr233 committed on
Commit 9f88fee · verified · 1 Parent(s): 45350f4

Remove placeholder citation and fix Quick Start code

Files changed (1):
  1. README.md (+16 −21)
README.md CHANGED
@@ -13,7 +13,6 @@ language:
 
 **TokenHD** is a token-level hallucination detector trained on top of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) using the TokenHD pipeline. It assigns a hallucination probability to each token in an LLM-generated response, enabling fine-grained localization of errors without requiring predefined step segmentation.
 
-Paper: [Scalable Token-Level Hallucination Detection in Large Language Models](https://arxiv.org/abs/XXXX.XXXXX)
 Code: [github.com/rmin2000/TokenHD](https://github.com/rmin2000/TokenHD)
 
 ---
@@ -37,18 +36,26 @@ import torch
 
 model_id = "mr233/TokenHD-8B"
 tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForTokenClassification.from_pretrained(model_id)
+model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=1)
 model.eval()
 
-text = "The capital of France is London."
-inputs = tokenizer(text, return_tensors="pt")
+problem = "What is the capital of France?"
+response = "The capital of France is London."
+
+messages = [
+    {"role": "user", "content": problem},
+    {"role": "assistant", "content": response},
+]
+input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=False)[:-2]
+input_tensor = torch.tensor(input_ids).unsqueeze(0)
+
 with torch.no_grad():
-    logits = model(**inputs).logits  # shape: (1, seq_len, 1)
-    scores = torch.sigmoid(logits).squeeze(-1).squeeze(0)  # per-token hallucination probability
+    logits = model(input_ids=input_tensor).logits  # shape: (1, seq_len, 1)
 
-tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
-for tok, score in zip(tokens, scores.tolist()):
-    print(f"{tok:20s} {score:.3f}")
+# scores for response tokens only
+response_ids = tokenizer.encode(response, add_special_tokens=False)
+scores = torch.sigmoid(logits.squeeze(-1).squeeze(0))[-len(response_ids):]
+# scores[i] is the hallucination probability for the i-th response token
 ```
 
 ---
@@ -59,15 +66,3 @@ TokenHD models are evaluated with two metrics:
 
 - **S_incor**: Token-level F1 on hallucinated (incorrect) responses — measures how precisely the detector localizes errors.
 - **S_cor**: Recall on hallucination-free (correct) responses — measures how rarely the detector raises false alarms.
-
----
-
-## Citation
-
-```bibtex
-@article{tokenhd2025,
-  title={Scalable Token-Level Hallucination Detection in Large Language Models},
-  author={Min, Rui and Pang, Tianyu and Du, Chao and Cheng, Minhao and Fung, Yi R.},
-  year={2025}
-}
-```
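The updated Quick Start ends with `scores` holding one hallucination probability per response token. As a usage note, the snippet below sketches one way to pair those scores with readable token strings and flag suspicious tokens. It is a self-contained illustration, not part of the committed README: the token strings and scores are stand-ins for `tokenizer.convert_ids_to_tokens(response_ids)` and the model's sigmoid outputs, and the 0.5 threshold is an assumption.

```python
# Hypothetical follow-up to the Quick Start: pair each response token with its
# hallucination probability and flag high-scoring tokens. The token strings and
# scores below are stand-ins for tokenizer.convert_ids_to_tokens(response_ids)
# and torch.sigmoid(logits); the 0.5 threshold is an assumption.

def flag_tokens(token_strings, scores, threshold=0.5):
    """Return (token, score, is_flagged) triples for one response."""
    assert len(token_strings) == len(scores)
    return [(t, s, s > threshold) for t, s in zip(token_strings, scores)]

# Stub data for the example response "The capital of France is London."
token_strings = ["The", "Ġcapital", "Ġof", "ĠFrance", "Ġis", "ĠLondon", "."]
scores = [0.02, 0.03, 0.02, 0.05, 0.10, 0.97, 0.60]

for tok, s, flagged in flag_tokens(token_strings, scores):
    marker = "  <-- flagged" if flagged else ""
    print(f"{tok:10s} {s:.3f}{marker}")
```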
 
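Under one plausible reading of the two metrics above — binary per-token labels, F1 over tokens flagged as hallucinated, and recall as the fraction of clean tokens left unflagged — they might be computed as sketched below. This is an illustrative sketch, not the official TokenHD evaluation code; the function names and label format are assumptions.

```python
# Illustrative sketch of S_incor-style token-level F1 and S_cor-style recall
# for binary labels (1 = hallucinated token, 0 = correct token). Not the
# official TokenHD evaluation code; names and label format are assumptions.

def token_f1(pred, gold):
    """F1 over tokens flagged as hallucinated (label 1)."""
    tp = sum(p == 1 and g == 1 for p, g in zip(pred, gold))
    fp = sum(p == 1 and g == 0 for p, g in zip(pred, gold))
    fn = sum(p == 0 and g == 1 for p, g in zip(pred, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def clean_recall(pred):
    """Fraction of tokens in a hallucination-free response left unflagged."""
    return sum(p == 0 for p in pred) / len(pred)

# Example: the detector flags the last two of six tokens; ground truth says
# the last three are wrong -> precision 1.0, recall 2/3, F1 0.8.
print(round(token_f1([0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 1, 1]), 3))  # 0.8
print(clean_recall([0, 0, 0, 0, 0, 0]))  # 1.0
```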