| --- |
| tags: |
| - ml-intern |
| --- |
| # 🛡️ HALT-RAG: Hallucination-Aware Retrieval-Augmented Generation |
|
|
| A complete, end-to-end research-style demo system for Google Colab (A100 GPU). |
|
|
| ## Quick Start |
|
|
| 1. Download `HALT_RAG_Demo.ipynb` |
| 2. Open in Google Colab |
| 3. Select **A100 GPU** runtime (Runtime → Change runtime type → A100) |
| 4. Run all cells |
|
|
| **Direct Colab link:** [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/main/HALT_RAG_Demo.ipynb) *(or upload manually)* |
|
|
| ## What's Included |
|
|
| | Section | Description | |
| |---------|-------------| |
| | 1. Setup | Package installation + GPU verification | |
| | 2. Dataset | 55 synthetic test cases (5 domains, 3 difficulties) | |
| | 3. Retrieval | 3 strategies: Hybrid (BM25+FAISS RRF), Dense, Two-stage rerank | |
| | 4-5. RAG + Models | TinyLlama-1.1B-Chat, DistilGPT2, Extractive-Fallback | |
| | 6. Agents | PlannerAgent, ExecutorAgent, CriticAgent, LoggingAgent | |
| | 7. Tools | RetrievalTool, VerificationTool, KeywordSearchTool | |
| | 8. Hallucination Detection | Multi-signal: overlap, grounding, semantic similarity, factual indicators | |
| | 9. Logging | Structured `[PLANNER]`, `[EXECUTOR]`, `[CRITIC]`, `[LOG]` output | |
| | 10. Dynamic Update | `add_new_document()`: live KB updates without retraining | |
| | 11. Evaluation | Full pipeline over 495 total runs (55 cases × 3 retrieval strategies × 3 models) | |
| | 12. Plots | 6 matplotlib visualizations | |
| | 13. Summary | Best strategy/model, observations, limitations | |
|
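The hybrid strategy in the table fuses BM25 and FAISS results with reciprocal rank fusion (RRF). The notebook's own implementation is not shown here, so the following is only a minimal sketch of the standard RRF formula. The function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives sum(1 / (k + rank)) over the lists in which it
    appears, with ranks starting at 1. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists standing in for BM25 and FAISS output.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why RRF needs no score normalization between the sparse and dense retrievers.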
|
| ## Requirements |
|
|
| - Google Colab Pro with A100 GPU |
| - ~3 GB VRAM (TinyLlama + DistilGPT2 + embeddings) |
| - ~10 minutes total runtime |
|
|
| ## Key Features |
|
|
| - **No external agent frameworks**: all agents are simple Python classes |
| - **No fabricated results**: all metrics are computed from actual model outputs |
| - **Explicit limitations** stated at the end |
| - **Reproducible**: deterministic generation (`do_sample=False`) |
| |
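To illustrate what "agents as simple Python classes" can look like, here is a hedged sketch of a planner, executor, critic, and logger wired together with the `[PLANNER]`, `[EXECUTOR]`, `[CRITIC]`, and `[LOG]` tags. The class names come from the table above, but every method, the `answer_fn` callable, the token-overlap score, and the `0.5` threshold are assumptions for illustration, not the notebook's actual code.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class LoggingAgent:
    """Collects and prints tagged log lines."""

    def __init__(self):
        self.records = []

    def log(self, tag, message):
        line = f"[{tag}] {message}"
        self.records.append(line)
        print(line)


class PlannerAgent:
    """Returns a fixed plan (hypothetical; a real planner could branch)."""

    def plan(self, question):
        return ["retrieve", "generate", "verify"]


class ExecutorAgent:
    """Picks the document sharing the most tokens with the question,
    then calls answer_fn (a stand-in for a language model)."""

    def execute(self, question, documents, answer_fn):
        q_tokens = tokenize(question)
        context = max(documents, key=lambda d: len(q_tokens & tokenize(d)))
        return context, answer_fn(question, context)


class CriticAgent:
    """Flags answers whose tokens are poorly covered by the context."""

    def __init__(self, threshold=0.5):  # assumed threshold
        self.threshold = threshold

    def review(self, answer, context):
        answer_tokens = tokenize(answer)
        if not answer_tokens:
            return 0.0, True
        overlap = len(answer_tokens & tokenize(context)) / len(answer_tokens)
        return overlap, overlap < self.threshold


def run_pipeline(question, documents, answer_fn):
    logger = LoggingAgent()
    steps = PlannerAgent().plan(question)
    logger.log("PLANNER", f"steps={steps}")
    context, answer = ExecutorAgent().execute(question, documents, answer_fn)
    logger.log("EXECUTOR", f"answer={answer!r}")
    overlap, flagged = CriticAgent().review(answer, context)
    logger.log("CRITIC", f"overlap={overlap:.2f} flagged={flagged}")
    logger.log("LOG", "run complete")
    return overlap, flagged, logger.records


question = "What is the capital of France?"
documents = [
    "Paris is the capital of France.",
    "FAISS is a library for vector search.",
]

# An extractive answer copies the context; the second answer ignores it.
grounded = run_pipeline(question, documents, lambda q, ctx: ctx)
ungrounded = run_pipeline(
    question, documents, lambda q, ctx: "Berlin hosts the parliament"
)
```

Token overlap is only one of the signals listed in the table. A fuller critic would combine it with grounding, semantic similarity, and factual indicators.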
| ## License |
| |
| MIT |
| |
| <!-- ml-intern-provenance --> |
| ## Generated by ML Intern |
| |
| This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub. |
| |
| - Try ML Intern: https://smolagents-ml-intern.hf.space |
| - Source code: https://github.com/huggingface/ml-intern |
| |
| ## Usage |
| |
| ```python |
| from transformers import AutoModelForCausalLM, AutoTokenizer |
| |
| # Hub repository ID for this project |
| model_id = "kevindoescode/HALT-RAG-Demo" |
| # Load the tokenizer and causal LM weights published under that ID |
| tokenizer = AutoTokenizer.from_pretrained(model_id) |
| model = AutoModelForCausalLM.from_pretrained(model_id) |
| ``` |
| |
| For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class. |
| |