# 🛡️ HALT-RAG: Hallucination-Aware Retrieval-Augmented Generation
A complete, end-to-end research-style demo system for Google Colab (A100 GPU).
## Quick Start

- Download `HALT_RAG_Demo.ipynb`
- Open it in Google Colab
- Select the A100 GPU runtime (Runtime → Change runtime type → A100)
- Run all cells
Direct Colab link: Open in Colab (or upload manually)
## What's Included
| Section | Description |
|---|---|
| 1. Setup | Package installation + GPU verification |
| 2. Dataset | 55 synthetic test cases (5 domains × 3 difficulty levels) |
| 3. Retrieval | 3 strategies: Hybrid (BM25+FAISS RRF), Dense, Two-stage rerank |
| 4-5. RAG + Models | TinyLlama-1.1B-Chat, DistilGPT2, Extractive-Fallback |
| 6. Agents | PlannerAgent, ExecutorAgent, CriticAgent, LoggingAgent |
| 7. Tools | RetrievalTool, VerificationTool, KeywordSearchTool |
| 8. Hallucination Detection | Multi-signal: overlap, grounding, semantic similarity, factual indicators |
| 9. Logging | Structured [PLANNER], [EXECUTOR], [CRITIC], [LOG] output |
| 10. Dynamic Update | `add_new_document()` → live KB updates without retraining |
| 11. Evaluation | Full pipeline over 495 total runs (55 cases × 3 strategies × 3 models) |
| 12. Plots | 6 matplotlib visualizations |
| 13. Summary | Best strategy/model, observations, limitations |
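The hybrid retrieval strategy above fuses the BM25 (lexical) and FAISS (dense) rankings with Reciprocal Rank Fusion (RRF). A minimal sketch of the fusion step, using hypothetical document IDs and ranked lists (the notebook's actual BM25/FAISS scoring is not reproduced here):

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical ranked lists from the lexical and dense retrievers
bm25_ranking = ["doc3", "doc1", "doc7"]
dense_ranking = ["doc1", "doc7", "doc2"]
print(rrf_fuse([bm25_ranking, dense_ranking]))  # ['doc1', 'doc7', 'doc3', 'doc2']
```

Documents that rank well in both lists (here `doc1` and `doc7`) float to the top even when neither retriever placed them first, which is the usual motivation for RRF over raw score mixing.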
## Requirements
- Google Colab Pro with A100 GPU
- ~3 GB VRAM (TinyLlama + DistilGPT2 + embeddings)
- ~10 minutes total runtime
## Key Features
- No external agent frameworks: all agents are simple Python classes
- No fabricated results: all metrics are computed from actual model outputs
- Explicit limitations stated at the end
- Reproducible: deterministic generation (`do_sample=False`)
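The multi-signal hallucination detector from Section 8 combines several cheap checks. A simplified sketch of just the token-overlap/grounding signal, with a hypothetical threshold (the notebook's exact signals, weights, and semantic-similarity check are not reproduced here):

```python
def overlap_score(answer, context):
    """Fraction of answer tokens that also appear in the retrieved context."""
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

def hallucination_risk(answer, context, threshold=0.5):
    """Flag the answer as risky when too few of its tokens are grounded."""
    return overlap_score(answer, context) < threshold

ctx = "TinyLlama is a 1.1B parameter chat model."
print(hallucination_risk("TinyLlama is a 1.1B parameter chat model.", ctx))  # False (grounded)
print(hallucination_risk("TinyLlama was trained by OpenAI in 2015.", ctx))   # True (ungrounded)
```

In practice a single overlap signal is noisy, which is why the demo blends it with grounding, semantic-similarity, and factual-indicator signals before flagging an answer.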
## License
MIT
## Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kevindoescode/HALT-RAG-Demo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.