---
tags:
- ml-intern
---
# 🛡️ HALT-RAG: Hallucination-Aware Retrieval-Augmented Generation
A complete, end-to-end research-style demo system for Google Colab (A100 GPU).
## Quick Start
1. Download `HALT_RAG_Demo.ipynb`
2. Open in Google Colab
3. Select **A100 GPU** runtime (Runtime → Change runtime type → A100)
4. Run all cells
**Direct Colab link:** [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/main/HALT_RAG_Demo.ipynb) *(or upload manually)*
## What's Included
| Section | Description |
|---------|-------------|
| 1. Setup | Package installation + GPU verification |
| 2. Dataset | 55 synthetic test cases (5 domains, 3 difficulties) |
| 3. Retrieval | 3 strategies: Hybrid (BM25+FAISS RRF), Dense, Two-stage rerank |
| 4-5. RAG + Models | TinyLlama-1.1B-Chat, DistilGPT2, Extractive-Fallback |
| 6. Agents | PlannerAgent, ExecutorAgent, CriticAgent, LoggingAgent |
| 7. Tools | RetrievalTool, VerificationTool, KeywordSearchTool |
| 8. Hallucination Detection | Multi-signal: overlap, grounding, semantic similarity, factual indicators |
| 9. Logging | Structured `[PLANNER]`, `[EXECUTOR]`, `[CRITIC]`, `[LOG]` output |
| 10. Dynamic Update | `add_new_document()` — live KB updates without retraining |
| 11. Evaluation | Full pipeline over 495 total runs (55 test cases × 3 retrieval strategies × 3 models) |
| 12. Plots | 6 matplotlib visualizations |
| 13. Summary | Best strategy/model, observations, limitations |
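
The hybrid strategy in the Retrieval row combines BM25 and FAISS rankings with RRF (reciprocal rank fusion). The following is a minimal sketch of that fusion step, not the notebook's code: the function name `reciprocal_rank_fusion`, the document IDs, and the constant `k=60` (a common default) are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a value taken
    from the notebook.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a sparse (BM25) and a dense (FAISS) retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Because RRF uses only ranks, it needs no calibration between BM25 scores and embedding similarities, which is one reason it is a popular way to merge sparse and dense results.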
## Requirements
- Google Colab Pro with A100 GPU
- ~3 GB VRAM (TinyLlama + DistilGPT2 + embeddings)
- ~10 minutes total runtime
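
To confirm that the runtime meets these requirements before running all cells, a check along the following lines may help. It is a sketch under assumptions: the helper name `describe_gpu` is hypothetical and is not taken from the notebook's own setup cell.

```python
def describe_gpu():
    """Return a one-line summary of the first visible CUDA device, if any."""
    try:
        import torch  # preinstalled on Colab; may be missing elsewhere
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA GPU visible; check Runtime > Change runtime type"
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / 1024**3  # bytes to GiB
    return f"{props.name}: {total_gb:.1f} GB VRAM"


print(describe_gpu())
```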
## Key Features
- **No external agent frameworks**: all agents are simple Python classes
- **No fabricated results**: all metrics computed from actual model outputs
- **Explicit limitations** stated at the end
- **Reproducible**: deterministic generation (`do_sample=False`)
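
To illustrate what "agents as simple Python classes" can look like, here is a toy sketch. Only the class names and log tags come from this README. The method names, the token-overlap retrieval, the echo-style generator, and the 0.5 grounding threshold are assumptions for illustration and do not reproduce the notebook's implementation.

```python
import re


def tokenize(text):
    """Lowercase a string and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class PlannerAgent:
    def plan(self, question):
        steps = ["retrieve", "generate", "verify"]
        print(f"[PLANNER] steps={steps} question={question!r}")
        return steps


class ExecutorAgent:
    def __init__(self, documents):
        self.documents = documents

    def retrieve(self, question):
        # Toy retrieval: pick the document sharing the most tokens.
        q = tokenize(question)
        best = max(self.documents, key=lambda d: len(q & tokenize(d)))
        print(f"[EXECUTOR] retrieved={best!r}")
        return best

    def generate(self, context):
        # Stand-in for a model call: echo the retrieved context.
        return context


class CriticAgent:
    def __init__(self, threshold=0.5):  # assumed threshold
        self.threshold = threshold

    def verify(self, answer, context):
        # Overlap signal: fraction of answer tokens found in the context.
        a = tokenize(answer)
        overlap = len(a & tokenize(context)) / len(a) if a else 0.0
        grounded = overlap >= self.threshold
        print(f"[CRITIC] overlap={overlap:.2f} grounded={grounded}")
        return overlap, grounded


docs = [
    "FAISS is a library for efficient similarity search.",
    "BM25 is a ranking function used in keyword search.",
]
question = "What is BM25?"

planner, executor, critic = PlannerAgent(), ExecutorAgent(docs), CriticAgent()
planner.plan(question)
context = executor.retrieve(question)
answer = executor.generate(context)
overlap, grounded = critic.verify(answer, context)

# An answer with little support in the context gets flagged.
_, unsupported_ok = critic.verify("BM25 was invented on Mars in 1850.", context)
```

The overlap ratio is only one of the signals listed for hallucination detection; a fuller critic would combine it with grounding, semantic similarity, and factual indicators.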
## License
MIT
<!-- ml-intern-provenance -->
## Generated by ML Intern
This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository ID on the Hugging Face Hub
model_id = "kevindoescode/HALT-RAG-Demo"

# Download (or load from cache) the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.