Remove placeholder citation and fix Quick Start code
README.md CHANGED
@@ -13,7 +13,6 @@ language:

**TokenHD** is a token-level hallucination detector trained on top of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) using the TokenHD pipeline. It assigns a hallucination probability to each token in an LLM-generated response, enabling fine-grained localization of errors without requiring predefined step segmentation.

-Paper: [Scalable Token-Level Hallucination Detection in Large Language Models](https://arxiv.org/abs/XXXX.XXXXX)
Code: [github.com/rmin2000/TokenHD](https://github.com/rmin2000/TokenHD)

---

@@ -37,18 +36,26 @@ import torch

model_id = "mr233/TokenHD-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForTokenClassification.from_pretrained(model_id)
+model = AutoModelForTokenClassification.from_pretrained(model_id, num_labels=1)
model.eval()

+problem = "What is the capital of France?"
+response = "The capital of France is London."
+
+messages = [
+    {"role": "user", "content": problem},
+    {"role": "assistant", "content": response},
+]
+input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=False)[:-2]
+input_tensor = torch.tensor(input_ids).unsqueeze(0)
+
with torch.no_grad():
-    logits = model(
-    scores = torch.sigmoid(logits).squeeze(-1).squeeze(0)  # per-token hallucination probability
+    logits = model(input_ids=input_tensor).logits  # shape: (1, seq_len, 1)

+# scores for response tokens only
+response_ids = tokenizer.encode(response, add_special_tokens=False)
+scores = torch.sigmoid(logits.squeeze(-1).squeeze(0))[-len(response_ids):]
+# scores[i] is the hallucination probability for the i-th response token
```

---

@@ -59,15 +66,3 @@ TokenHD models are evaluated with two metrics:

- **S_incor**: Token-level F1 on hallucinated (incorrect) responses — measures how precisely the detector localizes errors.
- **S_cor**: Recall on hallucination-free (correct) responses — measures how rarely the detector raises false alarms.
-
----
-
-## Citation
-
-```bibtex
-@article{tokenhd2025,
-  title={Scalable Token-Level Hallucination Detection in Large Language Models},
-  author={Min, Rui and Pang, Tianyu and Du, Chao and Cheng, Minhao and Fung, Yi R.},
-  year={2025}
-}
-```
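The updated Quick Start ends with `scores`, a tensor of per-token hallucination probabilities, but the diff does not show how to inspect them. Below is a minimal illustrative sketch, not part of the model card, for mapping those scores back to the response tokens. It assumes the `tokenizer`, `response_ids`, and `scores` variables defined in the new Quick Start above, and the 0.5 cut-off is an arbitrary demonstration threshold rather than a value recommended by TokenHD.

```python
# Illustrative only: run after the updated Quick Start snippet.
# `tokenizer`, `response_ids`, and `scores` are assumed from that snippet;
# the 0.5 cut-off is an arbitrary demonstration threshold.
THRESHOLD = 0.5

tokens = tokenizer.convert_ids_to_tokens(response_ids)
for token, prob in zip(tokens, scores.tolist()):
    flag = "<- possible hallucination" if prob > THRESHOLD else ""
    print(f"{token!r:>20}  p={prob:.3f}  {flag}")
```

For the France/London example used in the diff, one would expect the highest probabilities on the tokens spelling out "London".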
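The metrics section touched by the last hunk only names **S_incor** and **S_cor**. As a rough illustration of what a token-level F1 and a false-alarm-oriented recall can look like, here is a small sketch; it is not the TokenHD evaluation code, and the 0/1 label format, the per-response aggregation, and the implicit binarization threshold are assumptions rather than definitions taken from the paper.

```python
# Illustrative sketch only; the exact S_incor / S_cor definitions are in the
# TokenHD paper and repository. Labels are assumed to be 0/1 per token
# (1 = hallucinated), with predictions already binarized (e.g. at 0.5).
from typing import List


def token_f1(pred: List[int], gold: List[int]) -> float:
    """Token-level F1 between predicted and gold hallucination labels."""
    tp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gold) if p == 0 and g == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def clean_response_recall(predictions: List[List[int]]) -> float:
    """Fraction of hallucination-free responses in which no token is flagged."""
    unflagged = sum(1 for pred in predictions if not any(pred))
    return unflagged / len(predictions)


# Toy values, not real TokenHD outputs:
print(token_f1(pred=[0, 0, 1, 1], gold=[0, 1, 1, 0]))   # 0.5
print(clean_response_recall([[0, 0, 0], [0, 1, 0]]))    # 0.5
```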