---
title: MaTableGPT MCP
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
app_port: 7860
---

# MaTableGPT MCP Service

[![HuggingFace Spaces](https://img.shields.io/badge/🤗-HuggingFace%20Spaces-blue)](https://huggingface.co/spaces)
[![MCP](https://img.shields.io/badge/MCP-Compatible-green)](https://modelcontextprotocol.io/)

**GPT-based Table Data Extractor from Materials Science Literature**

A Model Context Protocol (MCP) service that extracts structured catalyst performance data from HTML tables in materials science publications.

## 🌟 Features

### Table Representation

- **HTML to TSV**: Convert HTML tables to tab-separated format with preserved structure
- **HTML to JSON**: Convert HTML tables to nested JSON format
- **Table Splitting**: Break down complex tables with multiple headers into simpler components

### GPT-based Extraction

- **Zero-shot**: Multi-step questioning approach without examples
- **Few-shot**: Guided extraction with input/output examples
- **Fine-tuned**: Use pre-trained specialized models

### Session Management

- Track multiple table processing workflows
- Store representations and extractions
- Export session data for analysis

## 🚀 Quick Start (HuggingFace Space SSE Mode)

This service runs as a **pure MCP SSE server** on HuggingFace Space, accessible via an SSE endpoint.
**SSE Endpoint**: `https://your-space-name.hf.space/sse`

### Connect from Cursor/Claude Desktop

```json
{
  "mcpServers": {
    "matablgpt": {
      "url": "https://your-space-name.hf.space/sse"
    }
  }
}
```

## 📦 Installation

### Prerequisites

- Python 3.8+
- OpenAI-compatible API key (for GPT extraction)

### Local Installation

```bash
# Clone or copy the mcp_output folder
cd mcp_output

# Create virtual environment
python -m venv venv

# Activate (Windows)
venv\Scripts\activate

# Activate (Unix/Mac)
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Set API configuration (use your third-party API service info)

# Windows PowerShell
$env:LLM_API_KEY = "your_api_key"
$env:LLM_API_BASE = "https://api.your-service.com/v1"
$env:LLM_MODEL = "gpt-4-turbo-preview"

# Windows CMD
set LLM_API_KEY=your_api_key
set LLM_API_BASE=https://api.your-service.com/v1
set LLM_MODEL=gpt-4-turbo-preview

# Unix/Mac
export LLM_API_KEY=your_api_key
export LLM_API_BASE=https://api.your-service.com/v1
export LLM_MODEL=gpt-4-turbo-preview
```

## 🔑 Environment Variables

This service supports third-party API services (reverse proxy, OneAPI, API aggregators, etc.).
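As a rough sketch of the precedence described below, a client could check the primary `LLM_*` variables first and fall back to the `OPENAI_*` aliases. The `resolve_env` helper is purely illustrative and not part of the service's code:

```python
import os

def resolve_env(*names, default=None):
    """Return the value of the first environment variable in `names` that is set."""
    for name in names:
        value = os.environ.get(name)
        if value:
            return value
    return default

# LLM_* takes precedence; OPENAI_* is the documented fallback.
api_key = resolve_env("LLM_API_KEY", "OPENAI_API_KEY")
api_base = resolve_env("LLM_API_BASE", "OPENAI_API_BASE")
model = resolve_env("LLM_MODEL", "OPENAI_MODEL", default="gpt-4-turbo-preview")
```

With neither `LLM_MODEL` nor `OPENAI_MODEL` set, `model` falls back to the default shown in the table below.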
| Variable | Required | Description |
|----------|----------|-------------|
| `LLM_API_KEY` | ✅ Yes | Your API key from the service provider |
| `LLM_API_BASE` | ✅ Yes | API base URL, e.g., `https://api.your-service.com/v1` |
| `LLM_MODEL` | ❌ No | Model name (default: `gpt-4-turbo-preview`) |

**Alternative variable names (also supported):**

| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | Alternative to `LLM_API_KEY` |
| `OPENAI_API_BASE` | Alternative to `LLM_API_BASE` |
| `OPENAI_MODEL` | Alternative to `LLM_MODEL` |

## 🚀 Usage

### Start MCP Server (SSE mode - Default for HuggingFace Space)

```bash
# Default: SSE mode on port 7860
python start_mcp.py

# Custom port
python start_mcp.py --mode sse --port 8080
```

### Start MCP Server (stdio mode - For local Cursor integration)

```bash
python start_mcp.py --mode stdio
```

## 🔧 MCP Tools Reference

### Session Management

| Tool | Description |
|------|-------------|
| `create_session` | Create a new extraction session |
| `get_session_data` | Retrieve all data from a session |

### Table Processing

| Tool | Description |
|------|-------------|
| `html_to_tsv_representation` | Convert HTML table to TSV format |
| `html_to_json_representation` | Convert HTML table to JSON format |
| `analyze_table_structure` | Analyze table structure (headers, merged cells) |
| `split_complex_table` | Split tables with multiple internal headers |

### Data Extraction

| Tool | Description |
|------|-------------|
| `extract_catalyst_data_zero_shot` | Extract using zero-shot GPT |
| `extract_catalyst_data_few_shot` | Extract with example pairs |
| `extract_catalyst_data_fine_tuned` | Extract using fine-tuned model |
| `batch_extract_tables` | Extract from multiple tables in batch |

### Follow-up & Refinement

| Tool | Description |
|------|-------------|
| `apply_follow_up_questions` | Refine extraction with iterative Q&A (from original MaTableGPT) |

### Evaluation

| Tool | Description |
|------|-------------|
| `evaluate_extraction` | Compute Structure F1 Score and Value Accuracy |
| `validate_extraction_result` | Validate extraction against schema |

### Utilities

| Tool | Description |
|------|-------------|
| `list_performance_types` | List supported catalyst performance types |
| `get_extraction_code_template` | Get Python code for local extraction |
| `get_environment_requirements` | Get setup requirements |

## 📋 Supported Performance Types

The following catalyst performance types can be extracted:

- `overpotential`, `tafel_slope`, `Rct`, `stability`, `Cdl`
- `onset_potential`, `current_density`, `potential`, `TOF`, `ECSA`
- `water_splitting_potential`, `mass_activity`, `exchange_current_density`
- `Rs`, `specific_activity`, `onset_overpotential`, `BET`, `surface_area`
- `loading`, `apparent_activation_energy`

## 🔄 Workflow Example

### 1. Create a session

```python
result = create_session()
session_id = result["session_id"]
```

### 2. Convert HTML table to representation

```python
html = "..."
tsv = html_to_tsv_representation(
    html_table=html,
    title="Table 1: Catalyst Performance",
    caption="OER performance in 1M KOH",
    session_id=session_id,
    table_name="table1",
)
```

### 3. Extract catalyst data

```python
extraction = extract_catalyst_data_zero_shot(
    table_representation=tsv["representation"],
    session_id=session_id,
    table_name="table1",
)
```

### 4. Validate and export

```python
validation = validate_extraction_result(extraction["extraction"])
session_data = get_session_data(session_id)
```

## 🐳 Docker Deployment

### Build image

```bash
docker build -t matablgpt-mcp .
```

### Run container (SSE mode)

```bash
docker run -p 7860:7860 \
  -e LLM_API_KEY=your_key \
  -e LLM_API_BASE=https://api.your-service.com/v1 \
  matablgpt-mcp
```

## 🤗 HuggingFace Spaces Deployment

1. Create a new Space with the **Docker SDK**
2. Upload all files from `mcp_output/`
3. Add secrets in the Space settings:
   - `LLM_API_KEY`: Your API key
   - `LLM_API_BASE`: Your API base URL (e.g., `https://api.your-service.com/v1`)
   - `LLM_MODEL`: (Optional) Model name
4. The Space will auto-build and deploy the MCP SSE service
5. Connect via: `https://your-space-name.hf.space/sse`

## 📝 MCP Client Configuration

### For Cursor (SSE mode - HuggingFace Space)

Add to `~/.cursor/mcp.json`:

```json
{
  "mcpServers": {
    "matablgpt": {
      "url": "https://your-space-name.hf.space/sse"
    }
  }
}
```

### For Cursor (stdio mode - Local)

```json
{
  "mcpServers": {
    "matablgpt": {
      "command": "python",
      "args": ["F:/Material_Agent/MaTableGPT/mcp_output/start_mcp.py", "--mode", "stdio"],
      "env": {
        "LLM_API_KEY": "your_key",
        "LLM_API_BASE": "https://api.your-service.com/v1"
      }
    }
  }
}
```

### For Claude Desktop

```json
{
  "mcpServers": {
    "matablgpt": {
      "url": "https://your-space-name.hf.space/sse"
    }
  }
}
```

## 📄 Output Format

Extracted data follows this JSON schema:

```json
{
  "catalyst_name": {
    "overpotential": {
      "electrolyte": "1.0 M KOH",
      "reaction_type": "OER",
      "value": "230 mV",
      "current_density": "10 mA/cm²"
    },
    "tafel_slope": {
      "electrolyte": "1.0 M KOH",
      "reaction_type": "OER",
      "value": "45 mV/dec"
    }
  }
}
```

## 🙏 Acknowledgments

Based on [MaTableGPT](https://github.com/KIST-CSRC/MaTableGPT) - GPT-based Table Data Extractor from Materials Science Literature.

## 📜 License

MIT License