---
license: apache-2.0
language:
  - en
tags:
  - gguf
  - rag
  - healthcare
  - clinical-decision-support
  - medical
  - merck-manual
  - retrieval-augmented-generation
  - mistral
base_model:
  - TheBloke/Mistral-7B-v0.1-GGUF
pipeline_tag: text-generation
datasets:
  - custom
library_name: llama-cpp-python
---

# FetchMerck_AI

**A RAG-based clinical decision support system powered by the Merck Manuals**

FetchMerck_AI is a Retrieval-Augmented Generation (RAG) solution designed to help healthcare providers streamline clinical decision-making by surfacing relevant medical knowledge from the Merck Manuals in real time. The system retrieves contextually relevant passages from over 4,000 pages of medical reference content spanning 23 clinical sections, then generates grounded, citation-backed responses using a quantized Mistral-7B model.

## Key Objectives

- **Streamline clinical decision-making** — Surface relevant diagnostic and treatment information at the point of care
- **Analyze impact on diagnostics and patient outcomes** — Evaluate how RAG-assisted retrieval affects clinical reasoning quality
- **Standardize care practices** — Leverage a trusted, evidence-based reference to reduce variation in clinical decisions
- **Demonstrate feasibility** — Provide a functional prototype showing real-world applicability of RAG in healthcare settings

## Architecture

| Component | Details |
|-----------|---------|
| **LLM** | Mistral-7B-v0.1 (GGUF quantized) |
| **Retrieval** | RAG pipeline over vectorized Merck Manual content |
| **Knowledge Base** | Merck Manuals — 4,000+ page PDF covering 23 medical sections (disorders, diagnostics, drugs, tests) |
| **Framework** | LangChain + llama-cpp-python |
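
As a rough illustration of the LLM row above, the quantized model could be loaded through llama-cpp-python along the following lines. This is a sketch, not the project's actual loading code: the GGUF file name, the context size, and the sampling settings are assumptions, and `load_llm`, `generation_kwargs`, and `answer` are hypothetical helper names.

```python
# Sketch only: file name and generation settings are assumptions,
# not confirmed project settings.
REPO_ID = "jeremygracey-ai/FetchMerck_AI"
GGUF_FILENAME = "mistral-7b-instruct-v0.1.Q4_K_M.gguf"  # assumed file name in the repo


def load_llm(n_ctx: int = 4096):
    """Download (on first use) and load the quantized model."""
    # Imported lazily so this module can be imported without llama-cpp-python installed.
    from llama_cpp import Llama

    return Llama.from_pretrained(
        repo_id=REPO_ID,
        filename=GGUF_FILENAME,
        n_ctx=n_ctx,    # context window; must fit the question plus retrieved passages
        verbose=False,
    )


def generation_kwargs(max_tokens: int = 512, temperature: float = 0.0) -> dict:
    """Sampling settings; a low temperature keeps answers close to the supplied context."""
    return {"max_tokens": max_tokens, "temperature": temperature}


def answer(llm, prompt: str) -> str:
    """Run a single completion and return only the generated text."""
    output = llm(prompt, **generation_kwargs())
    return output["choices"][0]["text"].strip()
```

Calling `answer(load_llm(), prompt)` would then return the model's text for a prompt that already contains the retrieved passages. The model download is several gigabytes, so `load_llm` is kept separate from the helpers that can be exercised without it.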

### How It Works

1. **Document Ingestion** — The Merck Manual PDF is chunked and embedded into a vector store
2. **Query Processing** — A provider's clinical question is embedded and matched against the knowledge base
3. **Contextual Retrieval** — The most relevant passages are retrieved with source attribution
4. **Grounded Generation** — Mistral-7B generates a response grounded in the retrieved evidence, reducing hallucination risk
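
The four steps above can be sketched end to end in plain Python. This is a minimal illustration under stated assumptions, not the project's LangChain pipeline: a word-count vector stands in for a real embedding model, an in-memory list stands in for the vector store, the passages are placeholders rather than Merck Manual text, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # section or page label, kept for source attribution


def chunk_text(text: str, source: str, size: int = 40, overlap: int = 10) -> list[Chunk]:
    """Step 1a: split text into overlapping windows of `size` words (size > overlap)."""
    words = text.split()
    step = size - overlap
    return [
        Chunk(" ".join(words[i:i + size]), source)
        for i in range(0, max(len(words) - overlap, 1), step)
    ]


def embed(text: str) -> Counter:
    """Step 1b / 2: toy embedding (lowercase word counts), a stand-in for a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, index: list[tuple[Chunk, Counter]], k: int = 2) -> list[Chunk]:
    """Step 3: return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, passages: list[Chunk]) -> str:
    """Step 4: assemble a prompt that restricts the model to the retrieved passages."""
    context = "\n".join(f"[{i}] ({p.source}) {p.text}" for i, p in enumerate(passages, 1))
    return (
        "Answer using only the numbered context passages and cite them by number. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Placeholder passages (not Merck Manual content).
documents = {
    "placeholder-section-1": "Placeholder passage describing the evaluation of chest pain.",
    "placeholder-section-2": "Placeholder passage describing the management of asthma.",
}
index = [(c, embed(c.text)) for src, txt in documents.items() for c in chunk_text(txt, src)]

question = "How is chest pain evaluated?"
top = retrieve(question, index, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

The resulting `prompt` string is what would be handed to the language model. Keeping the source label next to each passage is what lets the generated answer cite where its evidence came from.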

## About the Merck Manuals

The Merck Manuals are medical reference books that the American pharmaceutical company Merck & Co. has published since 1899. They cover disorders, tests, diagnoses, and drugs across 23 clinical sections and are widely regarded as among the most trusted general medical references available.

## Intended Use

- **Primary users:** Healthcare providers, clinical researchers, medical educators
- **Use case:** Point-of-care decision support, clinical education, care standardization research
- **Setting:** Research and prototyping — not intended for production clinical deployment without further validation

## Limitations

- This is a **research prototype** demonstrating RAG feasibility in healthcare; it has not been validated for clinical production use
- Responses are grounded in the Merck Manual content and may not reflect the latest clinical guidelines or institution-specific protocols
- The system should augment — never replace — clinical judgment
- Performance depends on retrieval quality; edge cases or highly specialized queries may yield suboptimal results
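
Because answer quality depends on retrieval quality, one simple way to quantify it is a hit rate at k over a small set of questions with known relevant sources. The sketch below assumes such a labeled set exists; the question IDs, source labels, and the `hit_rate_at_k` helper are hypothetical, and no evaluation results for FetchMerck_AI are implied.

```python
def hit_rate_at_k(
    results: dict[str, list[str]],
    relevant: dict[str, set[str]],
    k: int = 3,
) -> float:
    """Fraction of questions with at least one relevant source among the top-k retrieved."""
    hits = sum(
        1
        for question, ranked_sources in results.items()
        if relevant.get(question, set()) & set(ranked_sources[:k])
    )
    return hits / len(results) if results else 0.0


# Hypothetical retrieval output: question ID -> ranked source labels.
results = {"q1": ["s3", "s1"], "q2": ["s4", "s5"]}
# Hypothetical ground truth: question ID -> sources judged relevant.
relevant = {"q1": {"s1"}, "q2": {"s2"}}

print(hit_rate_at_k(results, relevant, k=2))
```

A low hit rate on specialized questions would point to chunking or embedding choices as the place to look before adjusting the generation step.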

## Ethical Considerations

- **Patient safety:** This tool is designed as a decision *support* system, not an autonomous diagnostic agent
- **Bias:** The knowledge base reflects the scope and perspective of the Merck Manuals; providers should cross-reference with additional sources for complex cases
- **Privacy:** The system processes queries only — no patient data is stored or transmitted

## Citation

If you use FetchMerck_AI in your research, please cite:

```bibtex
@misc{gracey2026fetchmerck,
  title={FetchMerck_AI: RAG-Based Clinical Decision Support Using the Merck Manuals},
  author={Gracey, Jeremy},
  year={2026},
  publisher={Hugging Face},
  doi={10.57967/hf/8101},
  url={https://huggingface.co/jeremygracey-ai/FetchMerck_AI}
}
```

## Author

**Jeremy Gracey** — Clinical healthcare professional (8+ years) transitioning into healthcare AI/ML. Currently completing an AI/ML certificate at UT Austin McCombs School of Business.

- Hugging Face: @jeremygracey-ai
- Background: Anesthesia Technician & Psychiatric Technician → AI/ML Engineer
- Focus: Building AI systems that bridge the gap between clinical frontline experience and modern ML infrastructure