---
tags:
- ml-intern
---
# 🛡️ HALT-RAG: Hallucination-Aware Retrieval-Augmented Generation

A complete, end-to-end, research-style demo of hallucination-aware RAG, packaged as a Google Colab notebook and designed for an A100 GPU runtime.

## Quick Start

1. Download `HALT_RAG_Demo.ipynb` from this repository.
2. Open it in Google Colab.
3. Select an A100 GPU runtime (Runtime → Change runtime type → A100 GPU).
4. Run all cells (Runtime → Run all).

Direct Colab link: [Open in Colab](https://colab.research.google.com/github/huggingface/notebooks/blob/main/HALT_RAG_Demo.ipynb). If the link does not open the notebook, upload `HALT_RAG_Demo.ipynb` manually (File → Upload notebook).

## What's Included

| Section | Description |
|---------|-------------|
| 1. Setup | Package installation and GPU verification |
| 2. Dataset | 55 synthetic test cases spanning 5 domains and 3 difficulty levels |
| 3. Retrieval | 3 strategies: hybrid (BM25 + FAISS, fused with Reciprocal Rank Fusion), dense, and two-stage (retrieve, then rerank) |
| 4-5. RAG + Models | TinyLlama-1.1B-Chat, DistilGPT2, and an extractive fallback |
| 6. Agents | `PlannerAgent`, `ExecutorAgent`, `CriticAgent`, `LoggingAgent` |
| 7. Tools | `RetrievalTool`, `VerificationTool`, `KeywordSearchTool` |
| 8. Hallucination Detection | Multi-signal scoring: overlap, grounding, semantic similarity, and factual indicators |
| 9. Logging | Structured `[PLANNER]`, `[EXECUTOR]`, `[CRITIC]`, and `[LOG]` output |
| 10. Dynamic Update | `add_new_document()`: live knowledge-base updates without retraining |
| 11. Evaluation | Full pipeline over 495 runs (55 cases × 3 retrieval strategies × 3 models) |
| 12. Plots | 6 matplotlib visualizations |
| 13. Summary | Best strategy and model, observations, and limitations |
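The hybrid strategy in section 3 fuses BM25 and FAISS rankings with Reciprocal Rank Fusion (RRF). As a minimal sketch of that fusion step, assuming each retriever returns an ordered list of document IDs and using the commonly cited constant `k = 60` (the notebook's actual value is not stated here), RRF can be written in plain Python. The function name and document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    where rank is 1-based. k = 60 is an assumed default, not the notebook's value.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical rankings from a BM25 retriever and a FAISS dense retriever.
bm25_ranking = ["doc3", "doc1", "doc7"]
dense_ranking = ["doc1", "doc5", "doc3"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print([doc_id for doc_id, _ in fused])  # → ['doc1', 'doc3', 'doc5', 'doc7']
```

Because RRF uses only ranks, it needs no normalization between BM25 scores and embedding similarities, which is one reason it is a common choice for hybrid retrieval.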
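Section 8 lists overlap as one of several hallucination signals. The sketch below shows one plausible form of such a signal, assuming a simple regex tokenizer and a small hypothetical stopword list: the share of an answer's content words that also appear in the retrieved context. `content_tokens` and `overlap_score` are illustrative names, not the notebook's API.

```python
import re

# Small illustrative stopword list (an assumption; the notebook's list may differ).
STOPWORDS = {"a", "an", "and", "are", "by", "in", "is", "of", "the", "to", "was"}


def content_tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return {tok for tok in re.findall(r"[a-z0-9]+", text.lower()) if tok not in STOPWORDS}


def overlap_score(answer: str, context: str) -> float:
    """Fraction of the answer's content tokens that also occur in the context.

    1.0 means every content word is lexically supported by the context;
    lower values mean the answer introduces words the context lacks.
    """
    answer_tokens = content_tokens(answer)
    if not answer_tokens:
        return 1.0  # An empty answer makes no unsupported claims.
    return len(answer_tokens & content_tokens(context)) / len(answer_tokens)


context = "Refunds are processed within 14 days and store credit never expires."
grounded_answer = "Refunds are processed within 14 days."
altered_answer = "Refunds are processed within 30 days."
print(overlap_score(grounded_answer, context))  # → 1.0
print(overlap_score(altered_answer, context))   # → 0.8
```

A single wrong number only lowers the score from 1.0 to 0.8, which illustrates why lexical overlap alone is a weak detector and is combined with other signals such as grounding and semantic similarity.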
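Section 10's `add_new_document()` can update the knowledge base without retraining because the documents live in retrieval indexes rather than in model weights. A minimal sketch of that idea, assuming a toy inverted index (the `KnowledgeBase` class, its method signatures, and the sample documents are hypothetical): in the notebook, the equivalent step would presumably also embed the new text, add the vector to the FAISS index with `index.add()`, and refresh the BM25 statistics.

```python
import re
from collections import defaultdict


class KnowledgeBase:
    """Toy in-memory knowledge base backed by an inverted index (hypothetical)."""

    def __init__(self) -> None:
        self.documents: dict[str, str] = {}
        # token -> IDs of documents containing that token
        self.inverted_index: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _tokenize(text: str) -> list[str]:
        return re.findall(r"[a-z0-9]+", text.lower())

    def add_new_document(self, doc_id: str, text: str) -> None:
        """Make a document searchable immediately; no model weights change."""
        self.documents[doc_id] = text
        for token in self._tokenize(text):
            self.inverted_index[token].add(doc_id)

    def keyword_search(self, query: str) -> list[str]:
        """Return IDs of documents sharing tokens with the query,
        most distinct matching tokens first (ties broken by ID)."""
        hits: dict[str, int] = defaultdict(int)
        for token in set(self._tokenize(query)):
            for doc_id in self.inverted_index.get(token, ()):
                hits[doc_id] += 1
        return sorted(hits, key=lambda doc_id: (-hits[doc_id], doc_id))


kb = KnowledgeBase()
kb.add_new_document("d1", "Refunds are processed within 14 days.")
print(kb.keyword_search("warranty batteries"))  # → []
kb.add_new_document("d2", "The updated warranty policy covers batteries for two years.")
print(kb.keyword_search("warranty batteries"))  # → ['d2']
```

Re-adding an existing ID in this sketch would leave stale index entries; a fuller version would remove the old document's tokens before indexing the new text.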

## Requirements

- Google Colab Pro with an A100 GPU runtime
- About 3 GB of VRAM (TinyLlama, DistilGPT2, and the embedding model)
- About 10 minutes of total runtime

## Key Features

- No external agent frameworks: all agents are plain Python classes.
- No fabricated results: all metrics are computed from actual model outputs.
- Limitations are stated explicitly at the end of the notebook.
- Reproducible: generation is deterministic (`do_sample=False`).
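Because the agents are plain Python classes, the planner → executor → critic flow and the tagged log output listed in sections 6 and 9 can be sketched without any framework. Everything below other than the four agent names and the log tags is hypothetical: the method names, the fixed plan, the word-overlap retrieval stand-in, and the first-sentence "generation" stand-in (which loosely mimics an extractive fallback).

```python
import re
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class LoggingAgent:
    """Records and prints structured lines such as "[PLANNER] ..."."""
    lines: list[str] = field(default_factory=list)

    def log(self, tag: str, message: str) -> None:
        line = f"[{tag}] {message}"
        self.lines.append(line)
        print(line)


class PlannerAgent:
    def plan(self, question: str) -> list[str]:
        # Fixed two-step plan; a real planner could adapt it per question.
        return ["retrieve", "generate"]


class ExecutorAgent:
    def __init__(self, corpus: list[str]) -> None:
        self.corpus = corpus

    def run(self, steps: list[str], question: str) -> dict[str, str]:
        state: dict[str, str] = {}
        for step in steps:
            if step == "retrieve":
                # Stand-in for a retrieval tool: document sharing the most tokens.
                state["context"] = max(
                    self.corpus, key=lambda doc: len(tokens(question) & tokens(doc))
                )
            elif step == "generate":
                # Stand-in for the LLM: return the context's first sentence.
                state["answer"] = state["context"].split(".")[0] + "."
        return state


class CriticAgent:
    def is_grounded(self, answer: str, context: str) -> bool:
        # Accept the answer only if every token in it also occurs in the context.
        return tokens(answer) <= tokens(context)


corpus = [
    "Refunds are processed within 14 days. Store credit never expires.",
    "Standard shipping takes 3 to 5 business days.",
]
logger = LoggingAgent()
question = "How long do refunds take?"

steps = PlannerAgent().plan(question)
logger.log("PLANNER", f"plan={steps}")
state = ExecutorAgent(corpus).run(steps, question)
logger.log("EXECUTOR", f"answer={state['answer']!r}")
grounded = CriticAgent().is_grounded(state["answer"], state["context"])
logger.log("CRITIC", f"grounded={grounded}")
logger.log("LOG", f"{len(logger.lines)} events recorded")
```

Running it prints one tagged line per agent, ending with `[LOG] 3 events recorded`. Keeping each role in its own small class makes it easy to swap a stand-in (for example, the first-sentence generator) for a real model call without touching the others.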

## License

MIT

<!-- ml-intern-provenance -->
## Generated by ML Intern

This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern

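## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hub repository ID for this project
model_id = "kevindoescode/HALT-RAG-Demo"

# Download (or load from the local cache) the tokenizer and causal LM
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```

For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate auto class (for example, `AutoModelForSeq2SeqLM`, or `AutoModel` for the bare backbone).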