Spaces:

SEUyishu
/

MatTableGPT

Sleeping

App Files Files Community

MatTableGPT / README.md

SEUyishu

Upload 6 files

1742f51 verified 5 months ago

preview code

raw

history blame contribute delete

8.44 kB

	---
	title: MaTableGPT MCP
	emoji: 🔬
	colorFrom: blue
	colorTo: green
	sdk: docker
	pinned: false
	license: mit
	app_port: 7860
	---

	# MaTableGPT MCP Service

	[![HuggingFace Spaces](https://img.shields.io/badge/🤗-HuggingFace%20Spaces-blue)](https://huggingface.co/spaces)
	[![MCP](https://img.shields.io/badge/MCP-Compatible-green)](https://modelcontextprotocol.io/)

	GPT-based Table Data Extractor from Materials Science Literature

	A Model Context Protocol (MCP) service that extracts structured catalyst performance data from HTML tables in materials science publications.

	## 🌟 Features

	### Table Representation
	- HTML to TSV: Convert HTML tables to tab-separated format with preserved structure
	- HTML to JSON: Convert HTML tables to nested JSON format
	- Table Splitting: Break down complex tables with multiple headers into simpler components

	### GPT-based Extraction
	- Zero-shot: Multi-step questioning approach without examples
	- Few-shot: Guided extraction with input/output examples
	- Fine-tuned: Use pre-trained specialized models

	### Session Management
	- Track multiple table processing workflows
	- Store representations and extractions
	- Export session data for analysis

	## 🚀 Quick Start (HuggingFace Space SSE Mode)

	This service runs as a pure MCP SSE server on HuggingFace Space, accessible via SSE endpoint.

	SSE Endpoint: `https://your-space-name.hf.space/sse`

	### Connect from Cursor/Claude Desktop

	```json
	{
	"mcpServers": {
	"matablgpt": {
	"url": "https://your-space-name.hf.space/sse"
	}
	}
	}
	```

	## 📦 Installation

	### Prerequisites
	- Python 3.8+
	- OpenAI-compatible API key (for GPT extraction)

	### Local Installation

	```bash
	# Clone or copy the mcp_output folder
	cd mcp_output

	# Create virtual environment
	python -m venv venv

	# Activate (Windows)
	venv\Scripts\activate
	# Activate (Unix/Mac)
	source venv/bin/activate

	# Install dependencies
	pip install -r requirements.txt

	# Set API configuration (use your third-party API service info)
	# Windows PowerShell
	$env:LLM_API_KEY = "your_api_key"
	$env:LLM_API_BASE = "https://api.your-service.com/v1"
	$env:LLM_MODEL = "gpt-4-turbo-preview"

	# Windows CMD
	set LLM_API_KEY=your_api_key
	set LLM_API_BASE=https://api.your-service.com/v1
	set LLM_MODEL=gpt-4-turbo-preview

	# Unix/Mac
	export LLM_API_KEY=your_api_key
	export LLM_API_BASE=https://api.your-service.com/v1
	export LLM_MODEL=gpt-4-turbo-preview
	```

	## 🔑 Environment Variables

	This service supports third-party API services (reverse proxy, OneAPI, API aggregators, etc.)

	\| Variable \| Required \| Description \|
	\|----------\|----------\|-------------\|
	\| `LLM_API_KEY` \| ✅ Yes \| Your API key from the service provider \|
	\| `LLM_API_BASE` \| ✅ Yes \| API base URL, e.g., `https://api.your-service.com/v1` \|
	\| `LLM_MODEL` \| ❌ No \| Model name (default: gpt-4-turbo-preview) \|

	Alternative variable names (also supported):
	\| Variable \| Description \|
	\|----------\|-------------\|
	\| `OPENAI_API_KEY` \| Alternative to LLM_API_KEY \|
	\| `OPENAI_API_BASE` \| Alternative to LLM_API_BASE \|
	\| `OPENAI_MODEL` \| Alternative to LLM_MODEL \|

	## 🚀 Usage

	### Start MCP Server (SSE mode - Default for HuggingFace Space)

	```bash
	# Default: SSE mode on port 7860
	python start_mcp.py

	# Custom port
	python start_mcp.py --mode sse --port 8080
	```

	### Start MCP Server (stdio mode - For local Cursor integration)

	```bash
	python start_mcp.py --mode stdio
	```

	## 🔧 MCP Tools Reference

	### Session Management

	\| Tool \| Description \|
	\|------\|-------------\|
	\| `create_session` \| Create a new extraction session \|
	\| `get_session_data` \| Retrieve all data from a session \|

	### Table Processing

	\| Tool \| Description \|
	\|------\|-------------\|
	\| `html_to_tsv_representation` \| Convert HTML table to TSV format \|
	\| `html_to_json_representation` \| Convert HTML table to JSON format \|
	\| `analyze_table_structure` \| Analyze table structure (headers, merged cells) \|
	\| `split_complex_table` \| Split tables with multiple internal headers \|

	### Data Extraction

	\| Tool \| Description \|
	\|------\|-------------\|
	\| `extract_catalyst_data_zero_shot` \| Extract using zero-shot GPT \|
	\| `extract_catalyst_data_few_shot` \| Extract with example pairs \|
	\| `extract_catalyst_data_fine_tuned` \| Extract using fine-tuned model \|
	\| `batch_extract_tables` \| Extract from multiple tables in batch \|

	### Follow-up & Refinement

	\| Tool \| Description \|
	\|------\|-------------\|
	\| `apply_follow_up_questions` \| Refine extraction with iterative Q&A (from original MaTableGPT) \|

	### Evaluation

	\| Tool \| Description \|
	\|------\|-------------\|
	\| `evaluate_extraction` \| Compute Structure F1 Score and Value Accuracy \|
	\| `validate_extraction_result` \| Validate extraction against schema \|

	### Utilities

	\| Tool \| Description \|
	\|------\|-------------\|
	\| `list_performance_types` \| List supported catalyst performance types \|
	\| `get_extraction_code_template` \| Get Python code for local extraction \|
	\| `get_environment_requirements` \| Get setup requirements \|

	## 📋 Supported Performance Types

	The following catalyst performance types can be extracted:

	- `overpotential`, `tafel_slope`, `Rct`, `stability`, `Cdl`
	- `onset_potential`, `current_density`, `potential`, `TOF`, `ECSA`
	- `water_splitting_potential`, `mass_activity`, `exchange_current_density`
	- `Rs`, `specific_activity`, `onset_overpotential`, `BET`, `surface_area`
	- `loading`, `apparent_activation_energy`

	## 🔄 Workflow Example

	### 1. Create a session

	```python
	result = create_session()
	session_id = result["session_id"]
	```

	### 2. Convert HTML table to representation

	```python
	html = "<table>...</table>"
	tsv = html_to_tsv_representation(
	html_table=html,
	title="Table 1: Catalyst Performance",
	caption="OER performance in 1M KOH",
	session_id=session_id,
	table_name="table1"
	)
	```

	### 3. Extract catalyst data

	```python
	extraction = extract_catalyst_data_zero_shot(
	table_representation=tsv["representation"],
	session_id=session_id,
	table_name="table1"
	)
	```

	### 4. Validate and export

	```python
	validation = validate_extraction_result(extraction["extraction"])
	session_data = get_session_data(session_id)
	```

	## 🐳 Docker Deployment

	### Build image

	```bash
	docker build -t matablgpt-mcp .
	```

	### Run container (SSE mode)

	```bash
	docker run -p 7860:7860 \
	-e LLM_API_KEY=your_key \
	-e LLM_API_BASE=https://api.your-service.com/v1 \
	matablgpt-mcp
	```

	## 🤗 HuggingFace Spaces Deployment

	1. Create a new Space with Docker SDK
	2. Upload all files from `mcp_output/`
	3. Add secrets in Space settings:
	- `LLM_API_KEY`: Your API key
	- `LLM_API_BASE`: Your API base URL (e.g., `https://api.your-service.com/v1`)
	- `LLM_MODEL`: (Optional) Model name
	4. Space will auto-build and deploy the MCP SSE service
	5. Connect via: `https://your-space-name.hf.space/sse`

	## 📝 MCP Client Configuration

	### For Cursor (SSE mode - HuggingFace Space)

	Add to `~/.cursor/mcp.json`:

	```json
	{
	"mcpServers": {
	"matablgpt": {
	"url": "https://your-space-name.hf.space/sse"
	}
	}
	}
	```

	### For Cursor (stdio mode - Local)

	```json
	{
	"mcpServers": {
	"matablgpt": {
	"command": "python",
	"args": ["F:/Material_Agent/MaTableGPT/mcp_output/start_mcp.py", "--mode", "stdio"],
	"env": {
	"LLM_API_KEY": "your_key",
	"LLM_API_BASE": "https://api.your-service.com/v1"
	}
	}
	}
	}
	```

	### For Claude Desktop

	```json
	{
	"mcpServers": {
	"matablgpt": {
	"url": "https://your-space-name.hf.space/sse"
	}
	}
	}
	```

	## 📄 Output Format

	Extracted data follows this JSON schema:

	```json
	{
	"catalyst_name": {
	"overpotential": {
	"electrolyte": "1.0 M KOH",
	"reaction_type": "OER",
	"value": "230 mV",
	"current_density": "10 mA/cm²"
	},
	"tafel_slope": {
	"electrolyte": "1.0 M KOH",
	"reaction_type": "OER",
	"value": "45 mV/dec"
	}
	}
	}
	```

	## 🙏 Acknowledgments

	Based on [MaTableGPT](https://github.com/KIST-CSRC/MaTableGPT) - GPT-based Table Data Extractor from Materials Science Literature.

	## 📜 License

	MIT License