kevindoescode/HALT-RAG-Demo


kevindoescode committed about 19 hours ago

Commit 1ebeb7f · verified · 1 Parent(s): 9b88a42

Add README

Files changed (1)

README.md +38 -18


@@ -1,26 +1,46 @@
----
-tags:
-- ml-intern
----
-# kevindoescode/HALT-RAG-Demo
-<!-- ml-intern-provenance -->
-## Generated by ML Intern
-This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.
-- Try ML Intern: https://smolagents-ml-intern.hf.space
-- Source code: https://github.com/huggingface/ml-intern
-## Usage
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-model_id = "kevindoescode/HALT-RAG-Demo"
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-model = AutoModelForCausalLM.from_pretrained(model_id)
-```
-For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.

+# 🛡️ HALT-RAG: Hallucination-Aware Retrieval-Augmented Generation
+A complete, end-to-end research-style demo system for Google Colab (A100 GPU).
+## Quick Start
+1. Download `HALT_RAG_Demo.ipynb`
+2. Open in Google Colab
+3. Select **A100 GPU** runtime (Runtime → Change runtime type → A100)
+4. Run all cells
+**Direct Colab link:** [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/main/HALT_RAG_Demo.ipynb) *(or upload manually)*
+## What's Included
+| Section | Description |
+|---------|-------------|
+| 1. Setup | Package installation + GPU verification |
+| 2. Dataset | 55 synthetic test cases (5 domains, 3 difficulties) |
+| 3. Retrieval | 3 strategies: Hybrid (BM25+FAISS RRF), Dense, Two-stage rerank |
+| 4-5. RAG + Models | TinyLlama-1.1B-Chat, DistilGPT2, Extractive-Fallback |
+| 6. Agents | PlannerAgent, ExecutorAgent, CriticAgent, LoggingAgent |
+| 7. Tools | RetrievalTool, VerificationTool, KeywordSearchTool |
+| 8. Hallucination Detection | Multi-signal: overlap, grounding, semantic similarity, factual indicators |
+| 9. Logging | Structured `[PLANNER]`, `[EXECUTOR]`, `[CRITIC]`, `[LOG]` output |
+| 10. Dynamic Update | `add_new_document()` — live KB updates without retraining |
+| 11. Evaluation | Full pipeline over 495 total runs |
+| 12. Plots | 6 matplotlib visualizations |
+| 13. Summary | Best strategy/model, observations, limitations |
+## Requirements
+- Google Colab Pro with A100 GPU
+- ~3 GB VRAM (TinyLlama + DistilGPT2 + embeddings)
+- ~10 minutes total runtime
+## Key Features
+- **No external agent frameworks** — all agents are simple Python classes
+- **No fabricated results** — all metrics computed from actual model outputs
+- **Explicit limitations** stated at the end
+- **Reproducible** — deterministic generation (do_sample=False)
+## License
+MIT
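
Note on the retrieval row: the new README lists a hybrid strategy that combines BM25 and FAISS results with RRF (reciprocal rank fusion). The notebook's implementation is not part of this diff, so the following is only a minimal sketch of the general technique. The function name `rrf_fuse`, the document IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration, not taken from the notebook.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. `k` is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a sparse (BM25) and a dense (FAISS) retriever.
bm25_ranking = ["d3", "d1", "d2"]
dense_ranking = ["d1", "d4", "d3"]
fused = rrf_fuse([bm25_ranking, dense_ranking])
print(fused)
```

Because RRF only uses ranks, it does not need the BM25 and dense similarity scores to be on the same scale, which is the usual reason for choosing it in hybrid retrieval.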
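
Note on the hallucination detection row: the README names "overlap" as one of several signals but does not define it. One plausible reading, sketched here under that assumption, is the fraction of answer tokens that also occur in the retrieved context. The function name `token_overlap` and the tokenization rule are hypothetical; the notebook may compute this signal differently.

```python
import re


def token_overlap(answer, context):
    """Return the fraction of answer tokens that also appear in the context.

    Tokens are lowercase alphanumeric runs (an assumed, simplistic tokenizer).
    An empty answer scores 0.0.
    """
    answer_tokens = re.findall(r"[a-z0-9]+", answer.lower())
    if not answer_tokens:
        return 0.0
    context_tokens = set(re.findall(r"[a-z0-9]+", context.lower()))
    supported = sum(1 for token in answer_tokens if token in context_tokens)
    return supported / len(answer_tokens)


context = "The capital of France is Paris."
grounded = token_overlap("Paris is the capital", context)
ungrounded = token_overlap("Berlin is big", context)
```

A low score suggests the answer contains material not found in the retrieved passages, which is why a signal like this is typically combined with others rather than used alone.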
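
Note on the evaluation row: the README reports 495 total runs without showing how that number arises. It is consistent with crossing the 55 test cases with the 3 retrieval strategies and the 3 models (55 × 3 × 3 = 495), although the README does not state this breakdown explicitly. The sketch below enumerates such a grid; the strategy labels are hypothetical shorthand for the names in the table.

```python
from itertools import product

NUM_CASES = 55  # synthetic test cases listed in the README
STRATEGIES = ["hybrid", "dense", "two_stage_rerank"]  # assumed labels
MODELS = ["TinyLlama-1.1B-Chat", "DistilGPT2", "Extractive-Fallback"]

# One run per (test case, retrieval strategy, model) combination.
runs = list(product(range(NUM_CASES), STRATEGIES, MODELS))
total_runs = len(runs)
print(total_runs)
```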