Add README
README.md
CHANGED
@@ -1,26 +1,46 @@
----
-tags:
-- ml-intern
----

-

-
-## Generated by ML Intern

-

-
-- Source code: https://github.com/huggingface/ml-intern

-##

-
-

-
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(model_id)
-```

-
+# 🛡️ HALT-RAG: Hallucination-Aware Retrieval-Augmented Generation

+A complete, end-to-end research-style demo system for Google Colab (A100 GPU).

+## Quick Start

+1. Download `HALT_RAG_Demo.ipynb`
+2. Open in Google Colab
+3. Select **A100 GPU** runtime (Runtime → Change runtime type → A100)
+4. Run all cells

+**Direct Colab link:** [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/main/HALT_RAG_Demo.ipynb) *(or upload manually)*
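Before running all cells, it can help to confirm that the runtime chosen in step 3 actually exposes a GPU. The following is a minimal sketch, assuming PyTorch is available in the Colab runtime. The README does not show how the notebook performs its own GPU verification, and `describe_gpu` is a hypothetical helper, not a function from the notebook.

```python
def describe_gpu() -> str:
    """Return a short description of the visible CUDA device, if any."""
    try:
        import torch  # assumption: PyTorch is installed in the runtime
    except ImportError:
        return "PyTorch is not installed; cannot check for a GPU."
    if not torch.cuda.is_available():
        return "No CUDA device visible; check Runtime -> Change runtime type."
    # Report the name and total memory of the first visible device.
    name = torch.cuda.get_device_name(0)
    total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
    return f"{name} ({total_gib:.1f} GiB)"


print(describe_gpu())
```

The function returns a message instead of raising, so it can sit at the top of a notebook without stopping execution on a CPU-only runtime.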

+## What's Included

+| Section | Description |
+|---------|-------------|
+| 1. Setup | Package installation + GPU verification |
+| 2. Dataset | 55 synthetic test cases (5 domains, 3 difficulties) |
+| 3. Retrieval | 3 strategies: Hybrid (BM25+FAISS RRF), Dense, Two-stage rerank |
+| 4-5. RAG + Models | TinyLlama-1.1B-Chat, DistilGPT2, Extractive-Fallback |
+| 6. Agents | PlannerAgent, ExecutorAgent, CriticAgent, LoggingAgent |
+| 7. Tools | RetrievalTool, VerificationTool, KeywordSearchTool |
+| 8. Hallucination Detection | Multi-signal: overlap, grounding, semantic similarity, factual indicators |
+| 9. Logging | Structured `[PLANNER]`, `[EXECUTOR]`, `[CRITIC]`, `[LOG]` output |
+| 10. Dynamic Update | `add_new_document()`: live KB updates without retraining |
+| 11. Evaluation | Full pipeline over 495 total runs |
+| 12. Plots | 6 matplotlib visualizations |
+| 13. Summary | Best strategy/model, observations, limitations |
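The Retrieval row names a hybrid strategy that fuses BM25 and FAISS results with RRF (reciprocal rank fusion). The README does not show the notebook's implementation, so the following is only a minimal sketch of the general technique. The function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions, not values taken from the notebook.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical ranked outputs of a BM25 retriever and a dense FAISS retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
dense_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF works on ranks alone, the BM25 and embedding scores never need to be normalized onto a common scale, which is one reason it is a common choice for hybrid retrieval.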

+## Requirements

+- Google Colab Pro with A100 GPU
+- ~3 GB VRAM (TinyLlama + DistilGPT2 + embeddings)
+- ~10 minutes total runtime
+
+## Key Features
+
+- **No external agent frameworks**: all agents are simple Python classes
+- **No fabricated results**: all metrics computed from actual model outputs
+- **Explicit limitations** stated at the end
+- **Reproducible**: deterministic generation (do_sample=False)
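To illustrate what "simple Python classes" can look like, here is a hedged sketch of a planner, executor, critic, and logger wired together without any framework. Only the four class names and the log tags come from the README. Every method, the fixed plan, the extractive executor, and the token-overlap score are assumptions made for illustration, and overlap is just one of the hallucination signals the README lists.

```python
import re


def _tokens(text):
    """Lowercased alphanumeric tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class LoggingAgent:
    """Collects and prints tagged log lines such as '[PLANNER] ...'."""
    def __init__(self):
        self.lines = []

    def log(self, tag, message):
        line = f"[{tag}] {message}"
        self.lines.append(line)
        print(line)


class PlannerAgent:
    def plan(self, question):
        # Hypothetical fixed plan; a real planner could branch on the question.
        return ["retrieve", "answer", "verify"]


class ExecutorAgent:
    def __init__(self, documents):
        self.documents = documents

    def retrieve(self, question):
        # Pick the document sharing the most tokens with the question.
        q = _tokens(question)
        return max(self.documents, key=lambda d: len(q & _tokens(d)))

    def answer(self, context):
        # Extractive stand-in for a language model: return the context as-is.
        return context


class CriticAgent:
    def score(self, answer, context):
        # Fraction of answer tokens that also occur in the retrieved context.
        a = _tokens(answer)
        return len(a & _tokens(context)) / len(a) if a else 0.0


def run(question, documents):
    logger = LoggingAgent()
    logger.log("PLANNER", " -> ".join(PlannerAgent().plan(question)))
    executor = ExecutorAgent(documents)
    context = executor.retrieve(question)
    answer = executor.answer(context)
    logger.log("EXECUTOR", answer)
    overlap = CriticAgent().score(answer, context)
    logger.log("CRITIC", f"overlap={overlap:.2f}")
    logger.log("LOG", "run complete")
    return answer, overlap, logger.lines


docs = ["Paris is the capital of France.",
        "FAISS is a library for similarity search."]
answer, overlap, lines = run("What is the capital of France?", docs)
```

A pure overlap score is easy to fool: an answer that reuses the context's wording but swaps one entity still scores well. That is a plausible reason to combine it with other signals.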
+
+## License
+
+MIT