---
title: MaTableGPT MCP
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
app_port: 7860
---
# MaTableGPT MCP Service
[![HuggingFace Spaces](https://img.shields.io/badge/🤗-HuggingFace%20Spaces-blue)](https://huggingface.co/spaces)
[![MCP](https://img.shields.io/badge/MCP-Compatible-green)](https://modelcontextprotocol.io/)
**GPT-based Table Data Extractor from Materials Science Literature**
A Model Context Protocol (MCP) service that extracts structured catalyst performance data from HTML tables in materials science publications.
## 🌟 Features
### Table Representation
- **HTML to TSV**: Convert HTML tables to tab-separated format with preserved structure
- **HTML to JSON**: Convert HTML tables to nested JSON format
- **Table Splitting**: Break down complex tables with multiple headers into simpler components
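To illustrate the HTML-to-TSV idea, here is a minimal sketch built only on Python's standard library. It is not the service's actual implementation: the class and function names are hypothetical, it expands `colspan` by repeating the cell text, and it ignores `rowspan` and nested tables.

```python
from html.parser import HTMLParser


class TableToTSV(HTMLParser):
    """Collect cell text row by row, repeating merged cells across their colspan."""

    def __init__(self):
        super().__init__()
        self.rows = []       # finished rows, each a list of cell strings
        self._row = None     # row currently being built
        self._cell = None    # text fragments of the cell currently being read
        self._colspan = 1

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
            self._colspan = int(dict(attrs).get("colspan") or 1)

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace so tabs and newlines cannot break the TSV.
            text = " ".join("".join(self._cell).split())
            self._row.extend([text] * self._colspan)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_tsv(html):
    """Hypothetical helper: return the table as tab-separated lines."""
    parser = TableToTSV()
    parser.feed(html)
    parser.close()
    return "\n".join("\t".join(row) for row in parser.rows)


example = (
    "<table><tr><th colspan='2'>OER</th></tr>"
    "<tr><td>NiFe</td><td>230 mV</td></tr></table>"
)
print(html_table_to_tsv(example))
```

Repeating the header text across merged columns keeps every row the same width, which is one simple way to preserve structure in a flat format.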
### GPT-based Extraction
- **Zero-shot**: Multi-step questioning approach without examples
- **Few-shot**: Guided extraction with input/output examples
- **Fine-tuned**: Use a specialized fine-tuned model
### Session Management
- Track multiple table processing workflows
- Store representations and extractions
- Export session data for analysis
## 🚀 Quick Start (HuggingFace Space SSE Mode)
This service runs as a **pure MCP SSE server** on a HuggingFace Space and is reached through its SSE endpoint.
**SSE Endpoint**: `https://your-space-name.hf.space/sse`
### Connect from Cursor/Claude Desktop
```json
{
"mcpServers": {
"matablgpt": {
"url": "https://your-space-name.hf.space/sse"
}
}
}
```
## 📦 Installation
### Prerequisites
- Python 3.8+
- OpenAI-compatible API key (for GPT extraction)
### Local Installation
```bash
# Clone or copy the mcp_output folder
cd mcp_output
# Create virtual environment
python -m venv venv
# Activate (Windows)
venv\Scripts\activate
# Activate (Unix/Mac)
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Set API configuration (use your third-party API service info); run only the block for your shell
# Windows PowerShell
$env:LLM_API_KEY = "your_api_key"
$env:LLM_API_BASE = "https://api.your-service.com/v1"
$env:LLM_MODEL = "gpt-4-turbo-preview"
# Windows CMD
set LLM_API_KEY=your_api_key
set LLM_API_BASE=https://api.your-service.com/v1
set LLM_MODEL=gpt-4-turbo-preview
# Unix/Mac
export LLM_API_KEY=your_api_key
export LLM_API_BASE=https://api.your-service.com/v1
export LLM_MODEL=gpt-4-turbo-preview
```
## 🔑 Environment Variables
This service works with OpenAI-compatible third-party API services (reverse proxies, OneAPI, API aggregators, etc.).
| Variable | Required | Description |
|----------|----------|-------------|
| `LLM_API_KEY` | ✅ Yes | Your API key from the service provider |
| `LLM_API_BASE` | ✅ Yes | API base URL, e.g., `https://api.your-service.com/v1` |
| `LLM_MODEL` | ❌ No | Model name (default: `gpt-4-turbo-preview`) |
**Alternative variable names (also supported):**
| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | Alternative to LLM_API_KEY |
| `OPENAI_API_BASE` | Alternative to LLM_API_BASE |
| `OPENAI_MODEL` | Alternative to LLM_MODEL |
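The fallback between the two sets of variable names can be sketched as follows. This is an illustration, not the service's code: `resolve_llm_config` is a hypothetical helper, and the precedence (the `LLM_*` name wins when both are set) is an assumption the tables above do not state.

```python
import os


def resolve_llm_config(env=None):
    """Hypothetical helper: read settings, trying LLM_* first, then OPENAI_*."""
    env = os.environ if env is None else env

    def first(*names, default=None):
        # Return the first non-empty value among the candidate variable names.
        for name in names:
            value = env.get(name)
            if value:
                return value
        return default

    config = {
        "api_key": first("LLM_API_KEY", "OPENAI_API_KEY"),
        "api_base": first("LLM_API_BASE", "OPENAI_API_BASE"),
        "model": first("LLM_MODEL", "OPENAI_MODEL", default="gpt-4-turbo-preview"),
    }
    # The key and the base URL are marked as required in the table above.
    missing = [key for key in ("api_key", "api_base") if not config[key]]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return config


# Example with an explicit mapping instead of the real environment.
print(resolve_llm_config({
    "OPENAI_API_KEY": "your_api_key",
    "LLM_API_BASE": "https://api.your-service.com/v1",
}))
```

Failing early on a missing key or base URL gives a clearer error than a failed request later in the extraction.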
## 🚀 Usage
### Start MCP Server (SSE mode - Default for HuggingFace Space)
```bash
# Default: SSE mode on port 7860
python start_mcp.py
# Custom port
python start_mcp.py --mode sse --port 8080
```
### Start MCP Server (stdio mode - For local Cursor integration)
```bash
python start_mcp.py --mode stdio
```
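For readers who want to see how the two flags fit together, the sketch below mirrors the command lines above with `argparse`. It only illustrates the documented defaults (SSE mode, port 7860). The real `start_mcp.py` may parse its arguments differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Hypothetical parser that mirrors the documented --mode and --port flags."""
    parser = argparse.ArgumentParser(description="Start the MCP server (illustrative sketch)")
    parser.add_argument("--mode", choices=["sse", "stdio"], default="sse",
                        help="transport: 'sse' for HTTP clients, 'stdio' for local integration")
    parser.add_argument("--port", type=int, default=7860,
                        help="port for SSE mode (assumed to be unused in stdio mode)")
    return parser


args = build_parser().parse_args(["--mode", "sse", "--port", "8080"])
print(args.mode, args.port)
```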
## 🔧 MCP Tools Reference
### Session Management
| Tool | Description |
|------|-------------|
| `create_session` | Create a new extraction session |
| `get_session_data` | Retrieve all data from a session |
### Table Processing
| Tool | Description |
|------|-------------|
| `html_to_tsv_representation` | Convert HTML table to TSV format |
| `html_to_json_representation` | Convert HTML table to JSON format |
| `analyze_table_structure` | Analyze table structure (headers, merged cells) |
| `split_complex_table` | Split tables with multiple internal headers |
### Data Extraction
| Tool | Description |
|------|-------------|
| `extract_catalyst_data_zero_shot` | Extract using zero-shot GPT |
| `extract_catalyst_data_few_shot` | Extract with example pairs |
| `extract_catalyst_data_fine_tuned` | Extract using fine-tuned model |
| `batch_extract_tables` | Extract from multiple tables in batch |
### Follow-up & Refinement
| Tool | Description |
|------|-------------|
| `apply_follow_up_questions` | Refine an extraction with iterative Q&A (approach from the original MaTableGPT) |
### Evaluation
| Tool | Description |
|------|-------------|
| `evaluate_extraction` | Compute Structure F1 Score and Value Accuracy |
| `validate_extraction_result` | Validate extraction against schema |
### Utilities
| Tool | Description |
|------|-------------|
| `list_performance_types` | List supported catalyst performance types |
| `get_extraction_code_template` | Get Python code for local extraction |
| `get_environment_requirements` | Get setup requirements |
## 📋 Supported Performance Types
The following catalyst performance types can be extracted:
- `overpotential`, `tafel_slope`, `Rct`, `stability`, `Cdl`
- `onset_potential`, `current_density`, `potential`, `TOF`, `ECSA`
- `water_splitting_potential`, `mass_activity`, `exchange_current_density`
- `Rs`, `specific_activity`, `onset_overpotential`, `BET`, `surface_area`
- `loading`, `apparent_activation_energy`
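A client can check performance-type names against this list before using an extraction. The sketch below assumes exact, case-sensitive matching, which this README does not confirm. `split_supported` is a hypothetical helper, not one of the MCP tools.

```python
# The 20 performance types listed above, copied verbatim.
SUPPORTED_PERFORMANCE_TYPES = frozenset({
    "overpotential", "tafel_slope", "Rct", "stability", "Cdl",
    "onset_potential", "current_density", "potential", "TOF", "ECSA",
    "water_splitting_potential", "mass_activity", "exchange_current_density",
    "Rs", "specific_activity", "onset_overpotential", "BET", "surface_area",
    "loading", "apparent_activation_energy",
})


def split_supported(names):
    """Hypothetical helper: partition names into (supported, unknown), keeping order."""
    supported = [name for name in names if name in SUPPORTED_PERFORMANCE_TYPES]
    unknown = [name for name in names if name not in SUPPORTED_PERFORMANCE_TYPES]
    return supported, unknown


print(split_supported(["TOF", "tof", "BET"]))
```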
## 🔄 Workflow Example
### 1. Create a session
```python
result = create_session()
session_id = result["session_id"]
```
### 2. Convert HTML table to representation
```python
html = "<table>...</table>"
tsv = html_to_tsv_representation(
html_table=html,
title="Table 1: Catalyst Performance",
caption="OER performance in 1M KOH",
session_id=session_id,
table_name="table1"
)
```
### 3. Extract catalyst data
```python
extraction = extract_catalyst_data_zero_shot(
table_representation=tsv["representation"],
session_id=session_id,
table_name="table1"
)
```
### 4. Validate and export
```python
validation = validate_extraction_result(extraction["extraction"])
session_data = get_session_data(session_id)
```
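The four steps can be chained into a single function, sketched here. The transport is abstracted as a `call_tool(name, arguments)` callable returning a dict. With the MCP Python SDK this could wrap `ClientSession.call_tool`, with the tool result decoded into a dict first. The function name is hypothetical. The argument names `extraction` (for `validate_extraction_result`) and `session_id` (for `get_session_data`) are assumptions, since the steps above pass those values positionally.

```python
def run_extraction_workflow(call_tool, html, table_name="table1"):
    """Hypothetical driver chaining the four workflow steps through call_tool."""
    # 1. Create a session.
    session_id = call_tool("create_session", {})["session_id"]

    # 2. Convert the HTML table to its TSV representation.
    tsv = call_tool("html_to_tsv_representation", {
        "html_table": html,
        "session_id": session_id,
        "table_name": table_name,
    })

    # 3. Extract catalyst data with the zero-shot tool.
    extraction = call_tool("extract_catalyst_data_zero_shot", {
        "table_representation": tsv["representation"],
        "session_id": session_id,
        "table_name": table_name,
    })

    # 4. Validate and export (argument names here are assumptions).
    validation = call_tool("validate_extraction_result",
                           {"extraction": extraction["extraction"]})
    session_data = call_tool("get_session_data", {"session_id": session_id})

    return {
        "extraction": extraction["extraction"],
        "validation": validation,
        "session_data": session_data,
    }
```

Passing `call_tool` in as a parameter keeps the workflow independent of the transport, so the same function runs against SSE, stdio, or a stub in tests.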
## 🐳 Docker Deployment
### Build image
```bash
docker build -t matablgpt-mcp .
```
### Run container (SSE mode)
```bash
docker run -p 7860:7860 \
-e LLM_API_KEY=your_key \
-e LLM_API_BASE=https://api.your-service.com/v1 \
matablgpt-mcp
```
## 🤗 HuggingFace Spaces Deployment
1. Create a new Space with the **Docker SDK**.
2. Upload all files from `mcp_output/`.
3. Add secrets in the Space settings:
   - `LLM_API_KEY`: Your API key
   - `LLM_API_BASE`: Your API base URL (e.g., `https://api.your-service.com/v1`)
   - `LLM_MODEL`: (Optional) Model name
4. The Space builds automatically and deploys the MCP SSE service.
5. Connect via `https://your-space-name.hf.space/sse`.
## 📝 MCP Client Configuration
### For Cursor (SSE mode - HuggingFace Space)
Add to `~/.cursor/mcp.json`:
```json
{
"mcpServers": {
"matablgpt": {
"url": "https://your-space-name.hf.space/sse"
}
}
}
```
### For Cursor (stdio mode - Local)
```json
{
"mcpServers": {
"matablgpt": {
"command": "python",
"args": ["/absolute/path/to/mcp_output/start_mcp.py", "--mode", "stdio"],
"env": {
"LLM_API_KEY": "your_key",
"LLM_API_BASE": "https://api.your-service.com/v1"
}
}
}
}
```
### For Claude Desktop
```json
{
"mcpServers": {
"matablgpt": {
"url": "https://your-space-name.hf.space/sse"
}
}
}
```
## 📄 Output Format
Extracted data is nested JSON keyed first by catalyst name, then by performance type, as in this example:
```json
{
"catalyst_name": {
"overpotential": {
"electrolyte": "1.0 M KOH",
"reaction_type": "OER",
"value": "230 mV",
"current_density": "10 mA/cm²"
},
"tafel_slope": {
"electrolyte": "1.0 M KOH",
"reaction_type": "OER",
"value": "45 mV/dec"
}
}
}
```
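For analysis in a spreadsheet or data frame, the nested output can be flattened into one record per catalyst and performance type. This is a minimal sketch that assumes every performance entry is a single object, as in the example above. `flatten_extraction` and the `catalyst` and `performance_type` column names are hypothetical.

```python
def flatten_extraction(extraction):
    """Hypothetical helper: turn nested extraction output into a list of flat records."""
    rows = []
    for catalyst, performances in extraction.items():
        for performance_type, fields in performances.items():
            row = {"catalyst": catalyst, "performance_type": performance_type}
            row.update(fields)  # electrolyte, reaction_type, value, conditions, ...
            rows.append(row)
    return rows


sample = {
    "catalyst_name": {
        "overpotential": {
            "electrolyte": "1.0 M KOH",
            "reaction_type": "OER",
            "value": "230 mV",
            "current_density": "10 mA/cm²",
        },
        "tafel_slope": {
            "electrolyte": "1.0 M KOH",
            "reaction_type": "OER",
            "value": "45 mV/dec",
        },
    }
}

for record in flatten_extraction(sample):
    print(record)
```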
## 🙏 Acknowledgments
Based on [MaTableGPT](https://github.com/KIST-CSRC/MaTableGPT) - GPT-based Table Data Extractor from Materials Science Literature.
## 📜 License
MIT License