<div align="center">
<div style="margin: 20px 0;">
<img src="./assets/logo.png" width="120" height="120" alt="RAG-Anything Logo" style="border-radius: 20px; box-shadow: 0 8px 32px rgba(0, 217, 255, 0.3);">
</div>
# 🚀 RAG-Anything: All-in-One RAG Framework
<a href="https://trendshift.io/repositories/14959" target="_blank"><img src="https://trendshift.io/api/badge/repositories/14959" alt="HKUDS%2FRAG-Anything | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a>
<div align="center">
<img src="https://readme-typing-svg.herokuapp.com?font=Orbitron&size=24&duration=3000&pause=1000&color=00D9FF&center=true&vCenter=true&width=600&lines=Welcome+to+RAG-Anything;Next-Gen+Multimodal+RAG+System;Powered+by+Advanced+AI+Technology" alt="Typing Animation" />
</div>
<div align="center">
<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 15px; padding: 25px; text-align: center;">
<p>
<a href='https://github.com/HKUDS/RAG-Anything'><img src='https://img.shields.io/badge/🔥Project-Page-00d9ff?style=for-the-badge&logo=github&logoColor=white&labelColor=1a1a2e'></a>
<a href='https://arxiv.org/abs/2410.05779'><img src='https://img.shields.io/badge/📄arXiv-2410.05779-ff6b6b?style=for-the-badge&logo=arxiv&logoColor=white&labelColor=1a1a2e'></a>
<a href='https://github.com/HKUDS/LightRAG'><img src='https://img.shields.io/badge/⚡Based%20on-LightRAG-4ecdc4?style=for-the-badge&logo=lightning&logoColor=white&labelColor=1a1a2e'></a>
</p>
<p>
<a href="https://github.com/HKUDS/RAG-Anything/stargazers"><img src='https://img.shields.io/github/stars/HKUDS/RAG-Anything?color=00d9ff&style=for-the-badge&logo=star&logoColor=white&labelColor=1a1a2e' /></a>
<img src="https://img.shields.io/badge/🐍Python-3.10-4ecdc4?style=for-the-badge&logo=python&logoColor=white&labelColor=1a1a2e">
<a href="https://pypi.org/project/raganything/"><img src="https://img.shields.io/pypi/v/raganything.svg?style=for-the-badge&logo=pypi&logoColor=white&labelColor=1a1a2e&color=ff6b6b"></a>
<a href="https://github.com/astral-sh/uv"><img src="https://img.shields.io/badge/⚡uv-Ready-ff6b6b?style=for-the-badge&logo=python&logoColor=white&labelColor=1a1a2e"></a>
</p>
<p>
<a href="https://discord.gg/yF2MmDJyGJ"><img src="https://img.shields.io/badge/💬Discord-Community-7289da?style=for-the-badge&logo=discord&logoColor=white&labelColor=1a1a2e"></a>
<a href="https://github.com/HKUDS/RAG-Anything/issues/7"><img src="https://img.shields.io/badge/💬WeChat-Group-07c160?style=for-the-badge&logo=wechat&logoColor=white&labelColor=1a1a2e"></a>
</p>
<p>
<a href="README_zh.md"><img src="https://img.shields.io/badge/🇨🇳中文版-1a1a2e?style=for-the-badge"></a>
<a href="README.md"><img src="https://img.shields.io/badge/🇺🇸English-1a1a2e?style=for-the-badge"></a>
</p>
</div>
</div>
</div>
<div align="center">
<div style="width: 100%; height: 2px; margin: 20px 0; background: linear-gradient(90deg, transparent, #00d9ff, transparent);"></div>
</div>
<div align="center">
<a href="#-quick-start" style="text-decoration: none;">
<img src="https://img.shields.io/badge/Quick%20Start-Get%20Started%20Now-00d9ff?style=for-the-badge&logo=rocket&logoColor=white&labelColor=1a1a2e">
</a>
</div>
---
## 🎉 News
- [X] [2025.08.12]🎯📢 🔍 RAG-Anything now features **VLM-Enhanced Query** mode! When documents include images, the system seamlessly integrates them into VLM for advanced multimodal analysis, combining visual and textual context for deeper insights.
- [X] [2025.07.05]🎯📢 RAG-Anything now features a [context configuration module](docs/context_aware_processing.md), enabling intelligent integration of relevant contextual information to enhance multimodal content processing.
- [X] [2025.07.04]🎯📢 🚀 RAG-Anything now supports multimodal query capabilities, enabling enhanced RAG with seamless processing of text, images, tables, and equations.
- [X] [2025.07.03]🎯📢 🎉 RAG-Anything has reached 1k🌟 stars on GitHub! Thank you for your incredible support and valuable contributions to the project.
---
## 🌟 System Overview
*Next-Generation Multimodal Intelligence*
<div style="background: linear-gradient(135deg, #1a1a2e 0%, #16213e 50%, #0f3460 100%); border-radius: 15px; padding: 25px; margin: 20px 0; border: 2px solid #00d9ff; box-shadow: 0 0 30px rgba(0, 217, 255, 0.3);">
Modern documents increasingly contain diverse multimodal content—text, images, tables, equations, charts, and multimedia—that traditional text-focused RAG systems cannot effectively process. **RAG-Anything** addresses this challenge as a comprehensive **All-in-One Multimodal Document Processing RAG system** built on [LightRAG](https://github.com/HKUDS/LightRAG).
As a unified solution, RAG-Anything **eliminates the need for multiple specialized tools**. It provides **seamless processing and querying across all content modalities** within a single integrated framework. Unlike conventional RAG approaches that struggle with non-textual elements, our all-in-one system delivers **comprehensive multimodal retrieval capabilities**.
Users can query documents containing **interleaved text**, **visual diagrams**, **structured tables**, and **mathematical formulations** through **one cohesive interface**. This consolidated approach makes RAG-Anything particularly valuable for academic research, technical documentation, financial reports, and enterprise knowledge management where rich, mixed-content documents demand a **unified processing framework**.
<img src="assets/rag_anything_framework.png" alt="RAG-Anything" />
</div>
### 🎯 Key Features
<div style="background: linear-gradient(135deg, #1a1a2e 0%, #16213e 100%); border-radius: 15px; padding: 25px; margin: 20px 0;">
- **🔄 End-to-End Multimodal Pipeline** - Complete workflow from document ingestion and parsing to intelligent multimodal query answering
- **📄 Universal Document Support** - Seamless processing of PDFs, Office documents, images, and diverse file formats
- **🧠 Specialized Content Analysis** - Dedicated processors for images, tables, mathematical equations, and heterogeneous content types
- **🔗 Multimodal Knowledge Graph** - Automatic entity extraction and cross-modal relationship discovery for enhanced understanding
- **⚡ Adaptive Processing Modes** - Flexible MinerU-based parsing or direct multimodal content injection workflows
- **📋 Direct Content List Insertion** - Bypass document parsing by directly inserting pre-parsed content lists from external sources
- **🎯 Hybrid Intelligent Retrieval** - Advanced search capabilities spanning textual and multimodal content with contextual understanding
</div>
---
## 🏗️ Algorithm & Architecture
<div style="background: linear-gradient(135deg, #0f0f23 0%, #1a1a2e 100%); border-radius: 15px; padding: 25px; margin: 20px 0; border-left: 5px solid #00d9ff;">
### Core Algorithm
**RAG-Anything** implements an effective **multi-stage multimodal pipeline** that fundamentally extends traditional RAG architectures to seamlessly handle diverse content modalities through intelligent orchestration and cross-modal understanding.
</div>
<div align="center">
<div style="width: 100%; max-width: 600px; margin: 20px auto; padding: 20px; background: linear-gradient(135deg, rgba(0, 217, 255, 0.1) 0%, rgba(0, 217, 255, 0.05) 100%); border-radius: 15px; border: 1px solid rgba(0, 217, 255, 0.2);">
<div style="display: flex; justify-content: space-around; align-items: center; flex-wrap: wrap; gap: 20px;">
<div style="text-align: center;">
<div style="font-size: 24px; margin-bottom: 10px;">📄</div>
<div style="font-size: 14px; color: #00d9ff;">Document Parsing</div>
</div>
<div style="font-size: 20px; color: #00d9ff;">→</div>
<div style="text-align: center;">
<div style="font-size: 24px; margin-bottom: 10px;">🧠</div>
<div style="font-size: 14px; color: #00d9ff;">Content Analysis</div>
</div>
<div style="font-size: 20px; color: #00d9ff;">→</div>
<div style="text-align: center;">
<div style="font-size: 24px; margin-bottom: 10px;">🔍</div>
<div style="font-size: 14px; color: #00d9ff;">Knowledge Graph</div>
</div>
<div style="font-size: 20px; color: #00d9ff;">→</div>
<div style="text-align: center;">
<div style="font-size: 24px; margin-bottom: 10px;">🎯</div>
<div style="font-size: 14px; color: #00d9ff;">Intelligent Retrieval</div>
</div>
</div>
</div>
</div>
### 1. Document Parsing Stage
<div style="background: linear-gradient(90deg, #1a1a2e 0%, #16213e 100%); border-radius: 10px; padding: 20px; margin: 15px 0; border-left: 4px solid #4ecdc4;">
The system provides high-fidelity document extraction through adaptive content decomposition. It intelligently segments heterogeneous elements while preserving contextual relationships. Universal format compatibility is achieved via specialized, format-optimized parsers.
**Key Components:**
- **⚙️ MinerU Integration**: Leverages [MinerU](https://github.com/opendatalab/MinerU) for high-fidelity document structure extraction and semantic preservation across complex layouts.
- **🧩 Adaptive Content Decomposition**: Automatically segments documents into coherent text blocks, visual elements, structured tables, mathematical equations, and specialized content types while preserving contextual relationships.
- **📁 Universal Format Support**: Provides comprehensive handling of PDFs, Office documents (DOC/DOCX/PPT/PPTX/XLS/XLSX), images, and emerging formats through specialized parsers with format-specific optimization.
</div>
### 2. Multi-Modal Content Understanding & Processing
<div style="background: linear-gradient(90deg, #16213e 0%, #0f3460 100%); border-radius: 10px; padding: 20px; margin: 15px 0; border-left: 4px solid #ff6b6b;">
The system automatically categorizes and routes content through optimized channels. It uses concurrent pipelines for parallel text and multimodal processing. Document hierarchy and relationships are preserved during transformation.
**Key Components:**
- **🎯 Autonomous Content Categorization and Routing**: Automatically identifies, categorizes, and routes different content types through optimized execution channels.
- **⚡ Concurrent Multi-Pipeline Architecture**: Implements concurrent execution of textual and multimodal content through dedicated processing pipelines. This approach maximizes throughput efficiency while preserving content integrity (a minimal routing sketch follows this list).
- **🏗️ Document Hierarchy Extraction**: Extracts and preserves original document hierarchy and inter-element relationships during content transformation.
</div>
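To illustrate the routing idea, here is a minimal sketch. It is not RAG-Anything's internal code; the handler names and block shapes are hypothetical stand-ins for the dedicated pipelines:

```python
import asyncio

# Hypothetical per-modality handlers standing in for the dedicated pipelines
async def process_text(block):
    return f"text indexed: {block['text'][:40]}"

async def process_image(block):
    return f"image analyzed: {block['img_path']}"

ROUTES = {"text": process_text, "image": process_image}

async def run_pipelines(content_list):
    # Categorize each block by type and execute all pipelines concurrently
    tasks = [ROUTES[b["type"]](b) for b in content_list if b["type"] in ROUTES]
    return await asyncio.gather(*tasks)

blocks = [
    {"type": "text", "text": "This is the introduction section..."},
    {"type": "image", "img_path": "/absolute/path/to/figure1.jpg"},
]
print(asyncio.run(run_pipelines(blocks)))
```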
### 3. Multimodal Analysis Engine
<div style="background: linear-gradient(90deg, #0f3460 0%, #1a1a2e 100%); border-radius: 10px; padding: 20px; margin: 15px 0; border-left: 4px solid #00d9ff;">
The system deploys modality-aware processing units for heterogeneous data modalities:
**Specialized Analyzers:**
- **🔍 Visual Content Analyzer**:
- Integrates vision models for image analysis.
- Generates context-aware descriptive captions based on visual semantics.
- Extracts spatial relationships and hierarchical structures between visual elements.
- **📊 Structured Data Interpreter**:
- Performs systematic interpretation of tabular and structured data formats.
- Implements statistical pattern recognition algorithms for data trend analysis.
- Identifies semantic relationships and dependencies across multiple tabular datasets.
- **📐 Mathematical Expression Parser**:
- Parses complex mathematical expressions and formulas with high accuracy.
- Provides native LaTeX format support for seamless integration with academic workflows.
- Establishes conceptual mappings between mathematical equations and domain-specific knowledge bases.
- **🔧 Extensible Modality Handler**:
- Provides configurable processing framework for custom and emerging content types.
- Enables dynamic integration of new modality processors through plugin architecture.
- Supports runtime configuration of processing pipelines for specialized use cases.
</div>
### 4. Multimodal Knowledge Graph Index
<div style="background: linear-gradient(90deg, #1a1a2e 0%, #16213e 100%); border-radius: 10px; padding: 20px; margin: 15px 0; border-left: 4px solid #4ecdc4;">
The multi-modal knowledge graph construction module transforms document content into structured semantic representations. It extracts multimodal entities, establishes cross-modal relationships, and preserves hierarchical organization. The system applies weighted relevance scoring for optimized knowledge retrieval. An illustrative entity record follows the list below.
**Core Functions:**
- **🔍 Multi-Modal Entity Extraction**: Transforms significant multimodal elements into structured knowledge graph entities. The process includes semantic annotations and metadata preservation.
- **🔗 Cross-Modal Relationship Mapping**: Establishes semantic connections and dependencies between textual entities and multimodal components. This is achieved through automated relationship inference algorithms.
- **🏗️ Hierarchical Structure Preservation**: Maintains original document organization through "belongs_to" relationship chains. These chains preserve logical content hierarchy and sectional dependencies.
- **⚖️ Weighted Relationship Scoring**: Assigns quantitative relevance scores to relationship types. Scoring is based on semantic proximity and contextual significance within the document structure.
</div>
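As a concrete illustration, a multimodal entity and its weighted, hierarchy-preserving relationships might be shaped as below. This is a hypothetical sketch of the idea, not the library's actual storage schema:

```python
# Hypothetical shapes, for illustration only (not the library's storage schema)
figure_entity = {
    "entity_name": "Figure 1: System Architecture",
    "entity_type": "image",
    "description": "Diagram of the end-to-end multimodal pipeline...",
    "source_id": "research_paper.pdf",  # provenance metadata
}

relationships = [
    # "belongs_to" chains preserve the original document hierarchy
    {"src": "Figure 1: System Architecture", "tgt": "Section 3: Method",
     "type": "belongs_to", "weight": 1.0},
    # cross-modal link inferred between the figure and a textual entity
    {"src": "Figure 1: System Architecture", "tgt": "multimodal pipeline",
     "type": "illustrates", "weight": 0.8},
]
```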
### 5. Modality-Aware Retrieval
<div style="background: linear-gradient(90deg, #16213e 0%, #0f3460 100%); border-radius: 10px; padding: 20px; margin: 15px 0; border-left: 4px solid #ff6b6b;">
The hybrid retrieval system combines vector similarity search with graph traversal algorithms for comprehensive content retrieval. It implements modality-aware ranking mechanisms and maintains relational coherence between retrieved elements to ensure contextually integrated information delivery. A toy scoring example follows the list below.
**Retrieval Mechanisms:**
- **🔀 Vector-Graph Fusion**: Integrates vector similarity search with graph traversal algorithms. This approach leverages both semantic embeddings and structural relationships for comprehensive content retrieval.
- **📊 Modality-Aware Ranking**: Implements adaptive scoring mechanisms that weight retrieval results based on content type relevance. The system adjusts rankings according to query-specific modality preferences.
- **🔗 Relational Coherence Maintenance**: Maintains semantic and structural relationships between retrieved elements. This ensures coherent information delivery and contextual integrity.
</div>
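As a toy illustration of vector-graph fusion with modality-aware weighting (the blending constants and the function itself are invented for this sketch, not the library's actual ranking code):

```python
import math

def fused_score(query_vec, node, graph_distance, modality_weights):
    """Toy fusion: cosine similarity blended with graph proximity,
    scaled by a per-modality preference. Illustrative only."""
    dot = sum(q * v for q, v in zip(query_vec, node["embedding"]))
    norm = math.sqrt(sum(q * q for q in query_vec)) * math.sqrt(
        sum(v * v for v in node["embedding"])
    )
    cosine = dot / norm if norm else 0.0
    proximity = 1.0 / (1.0 + graph_distance)  # closer in the graph -> higher score
    return modality_weights.get(node["modality"], 1.0) * (0.7 * cosine + 0.3 * proximity)

node = {"embedding": [0.1, 0.9, 0.2], "modality": "table"}
print(fused_score([0.2, 0.8, 0.1], node, graph_distance=2,
                  modality_weights={"table": 1.2, "text": 1.0}))
```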
---
## 🚀 Quick Start
*Initialize Your AI Journey*
<div align="center">
<img src="https://user-images.githubusercontent.com/74038190/212284158-e840e285-664b-44d7-b79b-e264b5e54825.gif" width="400">
</div>
### Installation
#### Option 1: Install from PyPI (Recommended)
```bash
# Basic installation
pip install raganything
# With optional dependencies for extended format support:
pip install 'raganything[all]' # All optional features
pip install 'raganything[image]' # Image format conversion (BMP, TIFF, GIF, WebP)
pip install 'raganything[text]' # Text file processing (TXT, MD)
pip install 'raganything[image,text]' # Multiple features
```
#### Option 2: Install from Source
```bash
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone and setup the project with uv
git clone https://github.com/HKUDS/RAG-Anything.git
cd RAG-Anything
# Install the package and dependencies in a virtual environment
uv sync
# If you encounter network timeouts (especially for opencv packages):
# UV_HTTP_TIMEOUT=120 uv sync
# Run commands directly with uv (recommended approach)
uv run python examples/raganything_example.py --help
# Install with optional dependencies
uv sync --extra image --extra text # Specific extras
uv sync --all-extras # All optional features
```
#### Optional Dependencies
- **`[image]`** - Enables processing of BMP, TIFF, GIF, WebP image formats (requires Pillow)
- **`[text]`** - Enables processing of TXT and MD files (requires ReportLab)
- **`[all]`** - Includes all Python optional dependencies
> **⚠️ Office Document Processing Requirements:**
> - Office documents (.doc, .docx, .ppt, .pptx, .xls, .xlsx) require **LibreOffice** installation
> - Download from [LibreOffice official website](https://www.libreoffice.org/download/download/)
> - **Windows**: Download installer from official website
> - **macOS**: `brew install --cask libreoffice`
> - **Ubuntu/Debian**: `sudo apt-get install libreoffice`
> - **CentOS/RHEL**: `sudo yum install libreoffice`
**Check MinerU installation:**
```bash
# Verify installation
mineru --version
# Check if properly configured
python -c "from raganything import RAGAnything; rag = RAGAnything(); print('✅ MinerU installed properly' if rag.check_parser_installation() else '❌ MinerU installation issue')"
```
Models are downloaded automatically on first use. For manual download, refer to [MinerU Model Source Configuration](https://github.com/opendatalab/MinerU/blob/master/README.md#22-model-source-configuration).
### Usage Examples
#### 1. End-to-End Document Processing
```python
import asyncio
from raganything import RAGAnything, RAGAnythingConfig
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc
async def main():
    # Set up API configuration
    api_key = "your-api-key"
    base_url = "your-base-url"  # Optional

    # Create RAGAnything configuration
    config = RAGAnythingConfig(
        working_dir="./rag_storage",
        parser="mineru",  # Parser selection: mineru or docling
        parse_method="auto",  # Parse method: auto, ocr, or txt
        enable_image_processing=True,
        enable_table_processing=True,
        enable_equation_processing=True,
    )

    # Define LLM model function
    def llm_model_func(prompt, system_prompt=None, history_messages=[], **kwargs):
        return openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages,
            api_key=api_key,
            base_url=base_url,
            **kwargs,
        )

    # Define vision model function for image processing
    def vision_model_func(
        prompt, system_prompt=None, history_messages=[], image_data=None, messages=None, **kwargs
    ):
        # If messages format is provided (for multimodal VLM enhanced query), use it directly
        if messages:
            return openai_complete_if_cache(
                "gpt-4o",
                "",
                system_prompt=None,
                history_messages=[],
                messages=messages,
                api_key=api_key,
                base_url=base_url,
                **kwargs,
            )
        # Traditional single image format
        elif image_data:
            # Build the message list, omitting the system message when no system prompt is given
            image_messages = []
            if system_prompt:
                image_messages.append({"role": "system", "content": system_prompt})
            image_messages.append(
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
                        },
                    ],
                }
            )
            return openai_complete_if_cache(
                "gpt-4o",
                "",
                system_prompt=None,
                history_messages=[],
                messages=image_messages,
                api_key=api_key,
                base_url=base_url,
                **kwargs,
            )
        # Pure text format
        else:
            return llm_model_func(prompt, system_prompt, history_messages, **kwargs)

    # Define embedding function
    embedding_func = EmbeddingFunc(
        embedding_dim=3072,
        max_token_size=8192,
        func=lambda texts: openai_embed(
            texts,
            model="text-embedding-3-large",
            api_key=api_key,
            base_url=base_url,
        ),
    )

    # Initialize RAGAnything
    rag = RAGAnything(
        config=config,
        llm_model_func=llm_model_func,
        vision_model_func=vision_model_func,
        embedding_func=embedding_func,
    )

    # Process a document
    await rag.process_document_complete(
        file_path="path/to/your/document.pdf",
        output_dir="./output",
        parse_method="auto"
    )

    # Query the processed content
    # Pure text query - for basic knowledge base search
    text_result = await rag.aquery(
        "What are the main findings shown in the figures and tables?",
        mode="hybrid"
    )
    print("Text query result:", text_result)

    # Multimodal query with specific multimodal content
    multimodal_result = await rag.aquery_with_multimodal(
        "Explain this formula and its relevance to the document content",
        multimodal_content=[{
            "type": "equation",
            "latex": "P(d|q) = \\frac{P(q|d) \\cdot P(d)}{P(q)}",
            "equation_caption": "Document relevance probability"
        }],
        mode="hybrid"
    )
    print("Multimodal query result:", multimodal_result)

if __name__ == "__main__":
    asyncio.run(main())
```
#### 2. Direct Multimodal Content Processing
```python
import asyncio
from lightrag import LightRAG
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc
from raganything.modalprocessors import ImageModalProcessor, TableModalProcessor
async def process_multimodal_content():
    # Set up API configuration
    api_key = "your-api-key"
    base_url = "your-base-url"  # Optional

    # Initialize LightRAG
    rag = LightRAG(
        working_dir="./rag_storage",
        llm_model_func=lambda prompt, system_prompt=None, history_messages=[], **kwargs: openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages,
            api_key=api_key,
            base_url=base_url,
            **kwargs,
        ),
        embedding_func=EmbeddingFunc(
            embedding_dim=3072,
            max_token_size=8192,
            func=lambda texts: openai_embed(
                texts,
                model="text-embedding-3-large",
                api_key=api_key,
                base_url=base_url,
            ),
        )
    )
    await rag.initialize_storages()

    # Caption function: route image requests to the vision model,
    # plain text requests to the text model
    def image_caption_func(prompt, system_prompt=None, history_messages=[], image_data=None, **kwargs):
        if image_data:
            # Build the message list, omitting the system message when no system prompt is given
            image_messages = []
            if system_prompt:
                image_messages.append({"role": "system", "content": system_prompt})
            image_messages.append({
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}}
                ],
            })
            return openai_complete_if_cache(
                "gpt-4o",
                "",
                system_prompt=None,
                history_messages=[],
                messages=image_messages,
                api_key=api_key,
                base_url=base_url,
                **kwargs,
            )
        return openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages,
            api_key=api_key,
            base_url=base_url,
            **kwargs,
        )

    # Process an image
    image_processor = ImageModalProcessor(
        lightrag=rag,
        modal_caption_func=image_caption_func,
    )
    image_content = {
        "img_path": "path/to/image.jpg",
        "image_caption": ["Figure 1: Experimental results"],
        "image_footnote": ["Data collected in 2024"]
    }
    description, entity_info = await image_processor.process_multimodal_content(
        modal_content=image_content,
        content_type="image",
        file_path="research_paper.pdf",
        entity_name="Experimental Results Figure"
    )

    # Process a table
    table_processor = TableModalProcessor(
        lightrag=rag,
        modal_caption_func=lambda prompt, system_prompt=None, history_messages=[], **kwargs: openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages,
            api_key=api_key,
            base_url=base_url,
            **kwargs,
        )
    )
    table_content = {
        "table_body": """
        | Method | Accuracy | F1-Score |
        |--------|----------|----------|
        | RAGAnything | 95.2% | 0.94 |
        | Baseline | 87.3% | 0.85 |
        """,
        "table_caption": ["Performance Comparison"],
        "table_footnote": ["Results on test dataset"]
    }
    description, entity_info = await table_processor.process_multimodal_content(
        modal_content=table_content,
        content_type="table",
        file_path="research_paper.pdf",
        entity_name="Performance Results Table"
    )

if __name__ == "__main__":
    asyncio.run(process_multimodal_content())
```
#### 3. Batch Processing
```python
# Process multiple documents
await rag.process_folder_complete(
    folder_path="./documents",
    output_dir="./output",
    file_extensions=[".pdf", ".docx", ".pptx"],
    recursive=True,
    max_workers=4
)
```
#### 4. Custom Modal Processors
```python
from raganything.modalprocessors import GenericModalProcessor
class CustomModalProcessor(GenericModalProcessor):
    async def process_multimodal_content(self, modal_content, content_type, file_path, entity_name):
        # Your custom processing logic
        enhanced_description = await self.analyze_custom_content(modal_content)
        entity_info = self.create_custom_entity(enhanced_description, entity_name)
        return await self._create_entity_and_chunk(enhanced_description, entity_info, file_path)
```
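A custom processor is then used like the built-in ones. The sketch below assumes the `rag` LightRAG instance and `llm_model_func` from the earlier examples, and that `analyze_custom_content`/`create_custom_entity` are implemented by you; the `audio_transcript` payload is a hypothetical custom content type:

```python
# Hypothetical usage, mirroring the built-in processors above
custom_processor = CustomModalProcessor(
    lightrag=rag,                       # existing LightRAG instance
    modal_caption_func=llm_model_func,  # description/caption function
)

description, entity_info = await custom_processor.process_multimodal_content(
    modal_content={"content": "Transcript text goes here..."},  # your custom payload
    content_type="audio_transcript",
    file_path="meeting_recording.pdf",
    entity_name="Q3 Planning Meeting Transcript",
)
```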
#### 5. Query Options
RAG-Anything provides three types of query methods:
**Pure Text Queries** - Direct knowledge base search using LightRAG:
```python
# Different query modes for text queries
text_result_hybrid = await rag.aquery("Your question", mode="hybrid")
text_result_local = await rag.aquery("Your question", mode="local")
text_result_global = await rag.aquery("Your question", mode="global")
text_result_naive = await rag.aquery("Your question", mode="naive")
# Synchronous version
sync_text_result = rag.query("Your question", mode="hybrid")
```
**VLM Enhanced Queries** - Automatically analyze images in retrieved context using VLM:
```python
# VLM enhanced query (automatically enabled when vision_model_func is provided)
vlm_result = await rag.aquery(
    "Analyze the charts and figures in the document",
    mode="hybrid"
    # vlm_enhanced=True is automatically set when vision_model_func is available
)

# Manually control VLM enhancement
vlm_enabled = await rag.aquery(
    "What do the images show in this document?",
    mode="hybrid",
    vlm_enhanced=True  # Force enable VLM enhancement
)
vlm_disabled = await rag.aquery(
    "What do the images show in this document?",
    mode="hybrid",
    vlm_enhanced=False  # Force disable VLM enhancement
)
# When documents contain images, VLM can see and analyze them directly
# The system will automatically:
# 1. Retrieve relevant context containing image paths
# 2. Load and encode images as base64
# 3. Send both text context and images to VLM for comprehensive analysis
```
**Multimodal Queries** - Enhanced queries with specific multimodal content analysis:
```python
# Query with table data
table_result = await rag.aquery_with_multimodal(
    "Compare these performance metrics with the document content",
    multimodal_content=[{
        "type": "table",
        "table_data": """Method,Accuracy,Speed
RAGAnything,95.2%,120ms
Traditional,87.3%,180ms""",
        "table_caption": "Performance comparison"
    }],
    mode="hybrid"
)

# Query with equation content
equation_result = await rag.aquery_with_multimodal(
    "Explain this formula and its relevance to the document content",
    multimodal_content=[{
        "type": "equation",
        "latex": "P(d|q) = \\frac{P(q|d) \\cdot P(d)}{P(q)}",
        "equation_caption": "Document relevance probability"
    }],
    mode="hybrid"
)
```
#### 6. Loading Existing LightRAG Instance
```python
import asyncio
from raganything import RAGAnything, RAGAnythingConfig
from lightrag import LightRAG
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.kg.shared_storage import initialize_pipeline_status
from lightrag.utils import EmbeddingFunc
import os
async def load_existing_lightrag():
    # Set up API configuration
    api_key = "your-api-key"
    base_url = "your-base-url"  # Optional

    # First, create or load an existing LightRAG instance
    lightrag_working_dir = "./existing_lightrag_storage"

    # Check if a previous LightRAG instance exists
    if os.path.exists(lightrag_working_dir) and os.listdir(lightrag_working_dir):
        print("✅ Found existing LightRAG instance, loading...")
    else:
        print("❌ No existing LightRAG instance found, will create new one")

    # Create/load LightRAG instance with your configuration
    lightrag_instance = LightRAG(
        working_dir=lightrag_working_dir,
        llm_model_func=lambda prompt, system_prompt=None, history_messages=[], **kwargs: openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages,
            api_key=api_key,
            base_url=base_url,
            **kwargs,
        ),
        embedding_func=EmbeddingFunc(
            embedding_dim=3072,
            max_token_size=8192,
            func=lambda texts: openai_embed(
                texts,
                model="text-embedding-3-large",
                api_key=api_key,
                base_url=base_url,
            ),
        )
    )

    # Initialize storage (this will load existing data if available)
    await lightrag_instance.initialize_storages()
    await initialize_pipeline_status()

    # Define vision model function for image processing
    def vision_model_func(
        prompt, system_prompt=None, history_messages=[], image_data=None, messages=None, **kwargs
    ):
        # If messages format is provided (for multimodal VLM enhanced query), use it directly
        if messages:
            return openai_complete_if_cache(
                "gpt-4o",
                "",
                system_prompt=None,
                history_messages=[],
                messages=messages,
                api_key=api_key,
                base_url=base_url,
                **kwargs,
            )
        # Traditional single image format
        elif image_data:
            # Build the message list, omitting the system message when no system prompt is given
            image_messages = []
            if system_prompt:
                image_messages.append({"role": "system", "content": system_prompt})
            image_messages.append(
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {
                            "type": "image_url",
                            "image_url": {"url": f"data:image/jpeg;base64,{image_data}"},
                        },
                    ],
                }
            )
            return openai_complete_if_cache(
                "gpt-4o",
                "",
                system_prompt=None,
                history_messages=[],
                messages=image_messages,
                api_key=api_key,
                base_url=base_url,
                **kwargs,
            )
        # Pure text format
        else:
            return lightrag_instance.llm_model_func(prompt, system_prompt, history_messages, **kwargs)

    # Now use the existing LightRAG instance to initialize RAGAnything
    rag = RAGAnything(
        lightrag=lightrag_instance,  # Pass existing LightRAG instance
        vision_model_func=vision_model_func,
        # Note: working_dir, llm_model_func, embedding_func, etc. are inherited from lightrag_instance
    )

    # Query the existing knowledge base
    result = await rag.aquery(
        "What data has been processed in this LightRAG instance?",
        mode="hybrid"
    )
    print("Query result:", result)

    # Add a new multimodal document to the existing LightRAG instance
    await rag.process_document_complete(
        file_path="path/to/new/multimodal_document.pdf",
        output_dir="./output"
    )

if __name__ == "__main__":
    asyncio.run(load_existing_lightrag())
```
#### 7. Direct Content List Insertion
For scenarios where you already have a pre-parsed content list (e.g., from external parsers or previous processing), you can directly insert it into RAGAnything without document parsing:
```python
import asyncio
from raganything import RAGAnything, RAGAnythingConfig
from lightrag.llm.openai import openai_complete_if_cache, openai_embed
from lightrag.utils import EmbeddingFunc
async def insert_content_list_example():
    # Set up API configuration
    api_key = "your-api-key"
    base_url = "your-base-url"  # Optional

    # Create RAGAnything configuration
    config = RAGAnythingConfig(
        working_dir="./rag_storage",
        enable_image_processing=True,
        enable_table_processing=True,
        enable_equation_processing=True,
    )

    # Define model functions
    def llm_model_func(prompt, system_prompt=None, history_messages=[], **kwargs):
        return openai_complete_if_cache(
            "gpt-4o-mini",
            prompt,
            system_prompt=system_prompt,
            history_messages=history_messages,
            api_key=api_key,
            base_url=base_url,
            **kwargs,
        )

    def vision_model_func(prompt, system_prompt=None, history_messages=[], image_data=None, messages=None, **kwargs):
        # If messages format is provided (for multimodal VLM enhanced query), use it directly
        if messages:
            return openai_complete_if_cache(
                "gpt-4o",
                "",
                system_prompt=None,
                history_messages=[],
                messages=messages,
                api_key=api_key,
                base_url=base_url,
                **kwargs,
            )
        # Traditional single image format
        elif image_data:
            # Build the message list, omitting the system message when no system prompt is given
            image_messages = []
            if system_prompt:
                image_messages.append({"role": "system", "content": system_prompt})
            image_messages.append({
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_data}"}}
                ],
            })
            return openai_complete_if_cache(
                "gpt-4o",
                "",
                system_prompt=None,
                history_messages=[],
                messages=image_messages,
                api_key=api_key,
                base_url=base_url,
                **kwargs,
            )
        # Pure text format
        else:
            return llm_model_func(prompt, system_prompt, history_messages, **kwargs)

    embedding_func = EmbeddingFunc(
        embedding_dim=3072,
        max_token_size=8192,
        func=lambda texts: openai_embed(
            texts,
            model="text-embedding-3-large",
            api_key=api_key,
            base_url=base_url,
        ),
    )

    # Initialize RAGAnything
    rag = RAGAnything(
        config=config,
        llm_model_func=llm_model_func,
        vision_model_func=vision_model_func,
        embedding_func=embedding_func,
    )

    # Example: pre-parsed content list from an external source
    content_list = [
        {
            "type": "text",
            "text": "This is the introduction section of our research paper.",
            "page_idx": 0  # Page number where this content appears
        },
        {
            "type": "image",
            "img_path": "/absolute/path/to/figure1.jpg",  # IMPORTANT: use an absolute path
            "image_caption": ["Figure 1: System Architecture"],
            "image_footnote": ["Source: Authors' original design"],
            "page_idx": 1
        },
        {
            "type": "table",
            "table_body": "| Method | Accuracy | F1-Score |\n|--------|----------|----------|\n| Ours | 95.2% | 0.94 |\n| Baseline | 87.3% | 0.85 |",
            "table_caption": ["Table 1: Performance Comparison"],
            "table_footnote": ["Results on test dataset"],
            "page_idx": 2
        },
        {
            "type": "equation",
            "latex": "P(d|q) = \\frac{P(q|d) \\cdot P(d)}{P(q)}",
            "text": "Document relevance probability formula",
            "page_idx": 3
        },
        {
            "type": "text",
            "text": "In conclusion, our method demonstrates superior performance across all metrics.",
            "page_idx": 4
        }
    ]

    # Insert the content list directly
    await rag.insert_content_list(
        content_list=content_list,
        file_path="research_paper.pdf",  # Reference file name for citation
        split_by_character=None,  # Optional text splitting
        split_by_character_only=False,  # Optional text splitting mode
        doc_id=None,  # Optional custom document ID (auto-generated if not provided)
        display_stats=True  # Show content statistics
    )

    # Query the inserted content
    result = await rag.aquery(
        "What are the key findings and performance metrics mentioned in the research?",
        mode="hybrid"
    )
    print("Query result:", result)

    # You can also insert multiple content lists with different document IDs
    another_content_list = [
        {
            "type": "text",
            "text": "This is content from another document.",
            "page_idx": 0
        },
        {
            "type": "table",
            "table_body": "| Feature | Value |\n|---------|-------|\n| Speed | Fast |\n| Accuracy | High |",
            "table_caption": ["Feature Comparison"],
            "page_idx": 1
        }
    ]

    await rag.insert_content_list(
        content_list=another_content_list,
        file_path="another_document.pdf",
        doc_id="custom-doc-id-123"  # Custom document ID
    )

if __name__ == "__main__":
    asyncio.run(insert_content_list_example())
```
**Content List Format:**
The `content_list` should follow the standard format with each item being a dictionary containing:
- **Text content**: `{"type": "text", "text": "content text", "page_idx": 0}`
- **Image content**: `{"type": "image", "img_path": "/absolute/path/to/image.jpg", "image_caption": ["caption"], "image_footnote": ["note"], "page_idx": 1}`
- **Table content**: `{"type": "table", "table_body": "markdown table", "table_caption": ["caption"], "table_footnote": ["note"], "page_idx": 2}`
- **Equation content**: `{"type": "equation", "latex": "LaTeX formula", "text": "description", "page_idx": 3}`
- **Generic content**: `{"type": "custom_type", "content": "any content", "page_idx": 4}`
**Important Notes:**
- **`img_path`**: Must be an absolute path to the image file (e.g., `/home/user/images/chart.jpg` or `C:\Users\user\images\chart.jpg`); if your parser emits relative paths, see the normalization snippet below
- **`page_idx`**: Represents the page number where the content appears in the original document (0-based indexing)
- **Content ordering**: Items are processed in the order they appear in the list
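If an upstream parser produces relative image paths, a small normalization pass before insertion avoids broken `img_path` references. This helper is a minimal sketch, not part of the RAGAnything API:

```python
import os

def absolutize_image_paths(content_list, base_dir):
    """Rewrite relative img_path entries as absolute paths rooted at base_dir."""
    for item in content_list:
        path = item.get("img_path")
        if item.get("type") == "image" and path and not os.path.isabs(path):
            item["img_path"] = os.path.abspath(os.path.join(base_dir, path))
    return content_list

# e.g. content_list = absolutize_image_paths(content_list, "./parsed_output/images")
```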
This method is particularly useful when:
- You have content from external parsers (non-MinerU/Docling)
- You want to process programmatically generated content
- You need to insert content from multiple sources into a single knowledge base
- You have cached parsing results that you want to reuse
---
## 🛠️ Examples
*Practical Implementation Demos*
<div align="center">
<img src="https://user-images.githubusercontent.com/74038190/212257455-13e3e01e-d6a6-45dc-bb92-3ab87b12dfc1.gif" width="300">
</div>
The `examples/` directory contains comprehensive usage examples:
- **`raganything_example.py`**: End-to-end document processing with MinerU
- **`modalprocessors_example.py`**: Direct multimodal content processing
- **`office_document_test.py`**: Office document parsing test with MinerU (no API key required)
- **`image_format_test.py`**: Image format parsing test with MinerU (no API key required)
- **`text_format_test.py`**: Text format parsing test with MinerU (no API key required)
**Run examples:**
```bash
# End-to-end processing with parser selection
python examples/raganything_example.py path/to/document.pdf --api-key YOUR_API_KEY --parser mineru
# Direct modal processing
python examples/modalprocessors_example.py --api-key YOUR_API_KEY
# Office document parsing test (MinerU only)
python examples/office_document_test.py --file path/to/document.docx
# Image format parsing test (MinerU only)
python examples/image_format_test.py --file path/to/image.bmp
# Text format parsing test (MinerU only)
python examples/text_format_test.py --file path/to/document.md
# Check LibreOffice installation
python examples/office_document_test.py --check-libreoffice --file dummy
# Check PIL/Pillow installation
python examples/image_format_test.py --check-pillow --file dummy
# Check ReportLab installation
python examples/text_format_test.py --check-reportlab --file dummy
```
---
## 🔧 Configuration
*System Optimization Parameters*
### Environment Variables
Create a `.env` file (refer to `.env.example`):
```bash
OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=your_base_url # Optional
OUTPUT_DIR=./output # Default output directory for parsed documents
PARSER=mineru # Parser selection: mineru or docling
PARSE_METHOD=auto # Parse method: auto, ocr, or txt
```
**Note:** For backward compatibility, legacy environment variable names are still supported:
- `MINERU_PARSE_METHOD` is deprecated; use `PARSE_METHOD` instead
> **Note**: API keys are only required for full RAG processing with LLM integration. The parsing test files (`office_document_test.py`, `image_format_test.py`, and `text_format_test.py`) only test parser functionality and do not require API keys.
### Parser Configuration
RAGAnything now supports multiple parsers, each with specific advantages; a short selection snippet follows the comparison below:
#### MinerU Parser
- Supports PDF, images, Office documents, and more formats
- Powerful OCR and table extraction capabilities
- GPU acceleration support
#### Docling Parser
- Optimized for Office documents and HTML files
- Better document structure preservation
- Native support for multiple Office formats
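Parser choice is just a configuration field, so switching is a one-line change. A minimal sketch, assuming both parsers are installed:

```python
from raganything import RAGAnythingConfig

# Choose the parser per configuration; "mineru" is the default shown in the examples above
config = RAGAnythingConfig(
    working_dir="./rag_storage",
    parser="docling",     # or "mineru"
    parse_method="auto",  # auto, ocr, or txt
)

# The parser can also be overridden per document:
# await rag.process_document_complete("report.docx", parser="docling")
```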
### MinerU Configuration
```bash
# MinerU 2.0 uses command-line parameters instead of config files
# Check available options:
mineru --help
# Common configurations:
mineru -p input.pdf -o output_dir -m auto # Automatic parsing mode
mineru -p input.pdf -o output_dir -m ocr # OCR-focused parsing
mineru -p input.pdf -o output_dir -b pipeline --device cuda # GPU acceleration
```
You can also configure parsing through RAGAnything parameters:
```python
# Basic parsing configuration with parser selection
await rag.process_document_complete(
    file_path="document.pdf",
    output_dir="./output/",
    parse_method="auto",  # or "ocr", "txt"
    parser="mineru"  # Optional: "mineru" or "docling"
)

# Advanced parsing configuration with special parameters
await rag.process_document_complete(
    file_path="document.pdf",
    output_dir="./output/",
    parse_method="auto",  # Parsing method: "auto", "ocr", "txt"
    parser="mineru",  # Parser selection: "mineru" or "docling"

    # MinerU special parameters - all supported kwargs:
    lang="ch",  # Document language for OCR optimization (e.g., "ch", "en", "ja")
    device="cuda:0",  # Inference device: "cpu", "cuda", "cuda:0", "npu", "mps"
    start_page=0,  # Starting page number (0-based, for PDF)
    end_page=10,  # Ending page number (0-based, for PDF)
    formula=True,  # Enable formula parsing
    table=True,  # Enable table parsing
    backend="pipeline",  # Parsing backend: pipeline|vlm-transformers|vlm-sglang-engine|vlm-sglang-client
    source="huggingface",  # Model source: "huggingface", "modelscope", "local"
    # vlm_url="http://127.0.0.1:3000",  # Service address when backend=vlm-sglang-client

    # Standard RAGAnything parameters
    display_stats=True,  # Display content statistics
    split_by_character=None,  # Optional character to split text by
    doc_id=None  # Optional document ID
)
```
> **Note**: MinerU 2.0 no longer uses the `magic-pdf.json` configuration file. All settings are now passed as command-line parameters or function arguments. RAG-Anything now supports multiple document parsers - you can choose between MinerU and Docling based on your needs.
### Processing Requirements
Different content types require specific optional dependencies:
- **Office Documents** (.doc, .docx, .ppt, .pptx, .xls, .xlsx): Install [LibreOffice](https://www.libreoffice.org/download/download/)
- **Extended Image Formats** (.bmp, .tiff, .gif, .webp): Install with `pip install raganything[image]`
- **Text Files** (.txt, .md): Install with `pip install raganything[text]`
> **📋 Quick Install**: Use `pip install raganything[all]` to enable all format support (Python dependencies only - LibreOffice still needs separate installation)
---
## 🧪 Supported Content Types
### Document Formats
- **PDFs** - Research papers, reports, presentations
- **Office Documents** - DOC, DOCX, PPT, PPTX, XLS, XLSX
- **Images** - JPG, PNG, BMP, TIFF, GIF, WebP
- **Text Files** - TXT, MD
### Multimodal Elements
- **Images** - Photographs, diagrams, charts, screenshots
- **Tables** - Data tables, comparison charts, statistical summaries
- **Equations** - Mathematical formulas in LaTeX format
- **Generic Content** - Custom content types via extensible processors
*For installation of format-specific dependencies, see the [Configuration](#-configuration) section.*
---
## 📖 Citation
*Academic Reference*
<div align="center">
<div style="width: 60px; height: 60px; margin: 20px auto; position: relative;">
<div style="width: 100%; height: 100%; border: 2px solid #00d9ff; border-radius: 50%; position: relative;">
<div style="position: absolute; top: 50%; left: 50%; transform: translate(-50%, -50%); font-size: 24px; color: #00d9ff;">📖</div>
</div>
<div style="position: absolute; bottom: -5px; left: 50%; transform: translateX(-50%); width: 20px; height: 20px; background: white; border-right: 2px solid #00d9ff; border-bottom: 2px solid #00d9ff; transform: rotate(45deg);"></div>
</div>
</div>
If you find RAG-Anything useful in your research, please cite our paper:
```bibtex
@article{guo2024lightrag,
  title={LightRAG: Simple and Fast Retrieval-Augmented Generation},
  author={Zirui Guo and Lianghao Xia and Yanhua Yu and Tu Ao and Chao Huang},
  year={2024},
  eprint={2410.05779},
  archivePrefix={arXiv},
  primaryClass={cs.IR}
}
```
---
## 🔗 Related Projects
*Ecosystem & Extensions*
<div align="center">
<table>
<tr>
<td align="center">
<a href="https://github.com/HKUDS/LightRAG">
<div style="width: 100px; height: 100px; background: linear-gradient(135deg, rgba(0, 217, 255, 0.1) 0%, rgba(0, 217, 255, 0.05) 100%); border-radius: 15px; border: 1px solid rgba(0, 217, 255, 0.2); display: flex; align-items: center; justify-content: center; margin-bottom: 10px;">
<span style="font-size: 32px;">⚡</span>
</div>
<b>LightRAG</b><br>
<sub>Simple and Fast RAG</sub>
</a>
</td>
<td align="center">
<a href="https://github.com/HKUDS/VideoRAG">
<div style="width: 100px; height: 100px; background: linear-gradient(135deg, rgba(0, 217, 255, 0.1) 0%, rgba(0, 217, 255, 0.05) 100%); border-radius: 15px; border: 1px solid rgba(0, 217, 255, 0.2); display: flex; align-items: center; justify-content: center; margin-bottom: 10px;">
<span style="font-size: 32px;">🎥</span>
</div>
<b>VideoRAG</b><br>
<sub>Extreme Long-Context Video RAG</sub>
</a>
</td>
<td align="center">
<a href="https://github.com/HKUDS/MiniRAG">
<div style="width: 100px; height: 100px; background: linear-gradient(135deg, rgba(0, 217, 255, 0.1) 0%, rgba(0, 217, 255, 0.05) 100%); border-radius: 15px; border: 1px solid rgba(0, 217, 255, 0.2); display: flex; align-items: center; justify-content: center; margin-bottom: 10px;">
<span style="font-size: 32px;">✨</span>
</div>
<b>MiniRAG</b><br>
<sub>Extremely Simple RAG</sub>
</a>
</td>
</tr>
</table>
</div>
---
## ⭐ Star History
*Community Growth Trajectory*
<div align="center">
<a href="https://star-history.com/#HKUDS/RAG-Anything&Date">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=HKUDS/RAG-Anything&type=Date&theme=dark" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=HKUDS/RAG-Anything&type=Date" />
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=HKUDS/RAG-Anything&type=Date" style="border-radius: 15px; box-shadow: 0 0 30px rgba(0, 217, 255, 0.3);" />
</picture>
</a>
</div>
---
## 🤝 Contribution
*Join the Innovation*
<div align="center">
We thank all our contributors for their valuable contributions.
</div>
<div align="center">
<a href="https://github.com/HKUDS/RAG-Anything/graphs/contributors">
<img src="https://contrib.rocks/image?repo=HKUDS/RAG-Anything" style="border-radius: 15px; box-shadow: 0 0 20px rgba(0, 217, 255, 0.3);" />
</a>
</div>
---
<div align="center" style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 15px; padding: 30px; margin: 30px 0;">
<div>
<img src="https://user-images.githubusercontent.com/74038190/212284100-561aa473-3905-4a80-b561-0d28506553ee.gif" width="500">
</div>
<div style="margin-top: 20px;">
<a href="https://github.com/HKUDS/RAG-Anything" style="text-decoration: none;">
<img src="https://img.shields.io/badge/⭐%20Star%20us%20on%20GitHub-1a1a2e?style=for-the-badge&logo=github&logoColor=white">
</a>
<a href="https://github.com/HKUDS/RAG-Anything/issues" style="text-decoration: none;">
<img src="https://img.shields.io/badge/🐛%20Report%20Issues-ff6b6b?style=for-the-badge&logo=github&logoColor=white">
</a>
<a href="https://github.com/HKUDS/RAG-Anything/discussions" style="text-decoration: none;">
<img src="https://img.shields.io/badge/💬%20Discussions-4ecdc4?style=for-the-badge&logo=github&logoColor=white">
</a>
</div>
</div>
<div align="center">
<div style="width: 100%; max-width: 600px; margin: 20px auto; padding: 20px; background: linear-gradient(135deg, rgba(0, 217, 255, 0.1) 0%, rgba(0, 217, 255, 0.05) 100%); border-radius: 15px; border: 1px solid rgba(0, 217, 255, 0.2);">
<div style="display: flex; justify-content: center; align-items: center; gap: 15px;">
<span style="font-size: 24px;">⭐</span>
<span style="color: #00d9ff; font-size: 18px;">Thank you for visiting RAG-Anything!</span>
<span style="font-size: 24px;">⭐</span>
</div>
<div style="margin-top: 10px; color: #00d9ff; font-size: 16px;">Building the Future of Multimodal AI</div>
</div>
</div>