metadata
title: MaTableGPT MCP
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
app_port: 7860

MaTableGPT MCP Service


GPT-based Table Data Extractor from Materials Science Literature

A Model Context Protocol (MCP) service that extracts structured catalyst performance data from HTML tables in materials science publications.

🌟 Features

Table Representation

  • HTML to TSV: Convert HTML tables to tab-separated format with preserved structure
  • HTML to JSON: Convert HTML tables to nested JSON format
  • Table Splitting: Break down complex tables with multiple headers into simpler components
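To make the HTML-to-TSV idea concrete, here is a minimal sketch using only the Python standard library. It is not the service's implementation: the class `TableToTSV` and the function `html_table_to_tsv` are hypothetical names, and repeating a cell across its `colspan` is just one assumed way to keep columns aligned.

```python
from html.parser import HTMLParser


class TableToTSV(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read
        self._span = 1      # colspan of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
            self._span = int(dict(attrs).get("colspan") or 1)

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace, then repeat the cell over its span
            # so that every row keeps the same number of columns.
            text = " ".join("".join(self._cell).split())
            self._row.extend([text] * self._span)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_tsv(html: str) -> str:
    """Return the table as tab-separated text, one line per <tr>."""
    parser = TableToTSV()
    parser.feed(html)
    parser.close()
    return "\n".join("\t".join(row) for row in parser.rows)


sample = (
    "<table><tr><th colspan='2'>Catalyst</th><th>Overpotential</th></tr>"
    "<tr><td>NiFe</td><td>LDH</td><td> 230 mV </td></tr></table>"
)
print(html_table_to_tsv(sample))
```

The sketch ignores `rowspan` and nested tables, which a real converter would need to handle.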

GPT-based Extraction

  • Zero-shot: Multi-step questioning approach without examples
  • Few-shot: Guided extraction with input/output examples
  • Fine-tuned: Use a fine-tuned, task-specific model

Session Management

  • Track multiple table processing workflows
  • Store representations and extractions
  • Export session data for analysis

🚀 Quick Start (HuggingFace Space SSE Mode)

On a HuggingFace Space, this service runs as a pure MCP server over SSE (Server-Sent Events). Clients connect to it through the SSE endpoint.

SSE Endpoint: https://your-space-name.hf.space/sse

Connect from Cursor/Claude Desktop

{
  "mcpServers": {
    "matablgpt": {
      "url": "https://your-space-name.hf.space/sse"
    }
  }
}

📦 Installation

Prerequisites

  • Python 3.8+
  • OpenAI-compatible API key (for GPT extraction)

Local Installation

# Clone or copy the mcp_output folder
cd mcp_output

# Create virtual environment
python -m venv venv

# Activate (Windows)
venv\Scripts\activate
# Activate (Unix/Mac)
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Set API configuration (use your third-party API service info)
# Windows PowerShell
$env:LLM_API_KEY = "your_api_key"
$env:LLM_API_BASE = "https://api.your-service.com/v1"
$env:LLM_MODEL = "gpt-4-turbo-preview"

# Windows CMD
set LLM_API_KEY=your_api_key
set LLM_API_BASE=https://api.your-service.com/v1
set LLM_MODEL=gpt-4-turbo-preview

# Unix/Mac
export LLM_API_KEY=your_api_key
export LLM_API_BASE=https://api.your-service.com/v1
export LLM_MODEL=gpt-4-turbo-preview

🔑 Environment Variables

The service works with any OpenAI-compatible API endpoint, including third-party services (reverse proxies, OneAPI, API aggregators, etc.).

| Variable | Required | Description |
|---|---|---|
| LLM_API_KEY | ✅ Yes | Your API key from the service provider |
| LLM_API_BASE | ✅ Yes | API base URL, e.g., https://api.your-service.com/v1 |
| LLM_MODEL | ❌ No | Model name (default: gpt-4-turbo-preview) |

Alternative variable names (also supported):

| Variable | Description |
|---|---|
| OPENAI_API_KEY | Alternative to LLM_API_KEY |
| OPENAI_API_BASE | Alternative to LLM_API_BASE |
| OPENAI_MODEL | Alternative to LLM_MODEL |
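The lookup described above can be sketched as follows. This is an illustration, not the service's code: `first_set` and `load_llm_config` are hypothetical helpers, and the assumption that the `LLM_*` names take precedence over the `OPENAI_*` names is not stated in this README.

```python
import os
from typing import Mapping, Optional

DEFAULT_MODEL = "gpt-4-turbo-preview"  # default model documented above


def first_set(env: Mapping[str, str], *names: str) -> Optional[str]:
    """Return the first non-empty value among the given variable names."""
    for name in names:
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def load_llm_config(env: Mapping[str, str] = os.environ) -> dict:
    """Resolve key, base URL, and model; raise if a required value is missing."""
    # Assumption: LLM_* is checked before the OPENAI_* alternative.
    api_key = first_set(env, "LLM_API_KEY", "OPENAI_API_KEY")
    api_base = first_set(env, "LLM_API_BASE", "OPENAI_API_BASE")
    model = first_set(env, "LLM_MODEL", "OPENAI_MODEL") or DEFAULT_MODEL

    required = (("LLM_API_KEY", api_key), ("LLM_API_BASE", api_base))
    missing = [name for name, value in required if not value]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return {"api_key": api_key, "api_base": api_base, "model": model}
```

Passing a plain dictionary instead of `os.environ` makes the function easy to try without touching the real environment, for example `load_llm_config({"OPENAI_API_KEY": "k", "OPENAI_API_BASE": "https://api.your-service.com/v1"})`.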

🚀 Usage

Start MCP Server (SSE mode - Default for HuggingFace Space)

# Default: SSE mode on port 7860
python start_mcp.py

# Custom port
python start_mcp.py --mode sse --port 8080

Start MCP Server (stdio mode - For local Cursor integration)

python start_mcp.py --mode stdio

🔧 MCP Tools Reference

Session Management

| Tool | Description |
|---|---|
| create_session | Create a new extraction session |
| get_session_data | Retrieve all data from a session |

Table Processing

| Tool | Description |
|---|---|
| html_to_tsv_representation | Convert HTML table to TSV format |
| html_to_json_representation | Convert HTML table to JSON format |
| analyze_table_structure | Analyze table structure (headers, merged cells) |
| split_complex_table | Split tables with multiple internal headers |

Data Extraction

| Tool | Description |
|---|---|
| extract_catalyst_data_zero_shot | Extract using zero-shot GPT |
| extract_catalyst_data_few_shot | Extract with example pairs |
| extract_catalyst_data_fine_tuned | Extract using fine-tuned model |
| batch_extract_tables | Extract from multiple tables in batch |

Follow-up & Refinement

| Tool | Description |
|---|---|
| apply_follow_up_questions | Refine extraction with iterative Q&A (from original MaTableGPT) |

Evaluation

| Tool | Description |
|---|---|
| evaluate_extraction | Compute Structure F1 Score and Value Accuracy |
| validate_extraction_result | Validate extraction against schema |
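This README does not define the two evaluation metrics, so the following is only a sketch under stated assumptions: structure F1 is computed over the sets of key paths in the predicted and reference JSON, and value accuracy is the share of paths present in both whose leaf values match exactly. The helper names `flatten`, `structure_f1`, and `value_accuracy` are hypothetical and may differ from what `evaluate_extraction` actually does.

```python
def flatten(tree: dict, prefix: tuple = ()):
    """Yield (key_path, leaf_value) pairs from a nested dict."""
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


def structure_f1(predicted: dict, reference: dict) -> float:
    """F1 over key paths (assumed definition): do both trees have the same shape?"""
    pred_paths = {path for path, _ in flatten(predicted)}
    ref_paths = {path for path, _ in flatten(reference)}
    shared = len(pred_paths & ref_paths)
    if shared == 0:
        return 0.0
    precision = shared / len(pred_paths)
    recall = shared / len(ref_paths)
    return 2 * precision * recall / (precision + recall)


def value_accuracy(predicted: dict, reference: dict) -> float:
    """Fraction of shared key paths whose values match exactly (assumed definition)."""
    pred = dict(flatten(predicted))
    ref = dict(flatten(reference))
    shared = pred.keys() & ref.keys()
    if not shared:
        return 0.0
    return sum(pred[path] == ref[path] for path in shared) / len(shared)


reference = {"NiFe": {"overpotential": {"value": "230 mV", "electrolyte": "1.0 M KOH"}}}
predicted = {"NiFe": {"overpotential": {"value": "230 mV", "electrolyte": "1 M KOH",
                                        "reaction_type": "OER"}}}
print(structure_f1(predicted, reference), value_accuracy(predicted, reference))
```

Exact string comparison is deliberately strict here; a real evaluator might normalize units or whitespace before comparing values.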

Utilities

| Tool | Description |
|---|---|
| list_performance_types | List supported catalyst performance types |
| get_extraction_code_template | Get Python code for local extraction |
| get_environment_requirements | Get setup requirements |

📋 Supported Performance Types

The following catalyst performance types can be extracted:

  • overpotential, tafel_slope, Rct, stability, Cdl
  • onset_potential, current_density, potential, TOF, ECSA
  • water_splitting_potential, mass_activity, exchange_current_density
  • Rs, specific_activity, onset_overpotential, BET, surface_area
  • loading, apparent_activation_energy
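A client could use this list to flag property names that fall outside it. The sketch below copies the names exactly as listed above. `SUPPORTED_PERFORMANCE_TYPES` and `unsupported_types` are hypothetical names, not part of the service, and the input shape assumed here is a dict that maps each catalyst name to a dict keyed by performance type.

```python
# Names copied verbatim from the list above.
SUPPORTED_PERFORMANCE_TYPES = frozenset({
    "overpotential", "tafel_slope", "Rct", "stability", "Cdl",
    "onset_potential", "current_density", "potential", "TOF", "ECSA",
    "water_splitting_potential", "mass_activity", "exchange_current_density",
    "Rs", "specific_activity", "onset_overpotential", "BET", "surface_area",
    "loading", "apparent_activation_energy",
})


def unsupported_types(extraction: dict) -> dict:
    """Map each catalyst to the sorted property names that are not supported."""
    problems = {}
    for catalyst, properties in extraction.items():
        unknown = sorted(set(properties) - SUPPORTED_PERFORMANCE_TYPES)
        if unknown:
            problems[catalyst] = unknown
    return problems


result = {"NiFe-LDH": {"overpotential": {}, "tafel": {}}}
print(unsupported_types(result))
```

The comparison is case-sensitive, so `tof` would be reported as unknown even though `TOF` is supported.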

🔄 Workflow Example

1. Create a session

result = create_session()
session_id = result["session_id"]

2. Convert HTML table to representation

html = "<table>...</table>"
tsv = html_to_tsv_representation(
    html_table=html,
    title="Table 1: Catalyst Performance",
    caption="OER performance in 1M KOH",
    session_id=session_id,
    table_name="table1"
)

3. Extract catalyst data

extraction = extract_catalyst_data_zero_shot(
    table_representation=tsv["representation"],
    session_id=session_id,
    table_name="table1"
)

4. Validate and export

validation = validate_extraction_result(extraction["extraction"])
session_data = get_session_data(session_id)

🐳 Docker Deployment

Build image

docker build -t matablgpt-mcp .

Run container (SSE mode)

docker run -p 7860:7860 \
    -e LLM_API_KEY=your_key \
    -e LLM_API_BASE=https://api.your-service.com/v1 \
    matablgpt-mcp

🤗 HuggingFace Spaces Deployment

  1. Create a new Space with Docker SDK
  2. Upload all files from mcp_output/
  3. Add secrets in Space settings:
    • LLM_API_KEY: Your API key
    • LLM_API_BASE: Your API base URL (e.g., https://api.your-service.com/v1)
    • LLM_MODEL: (Optional) Model name
  4. Space will auto-build and deploy the MCP SSE service
  5. Connect via: https://your-space-name.hf.space/sse

📝 MCP Client Configuration

For Cursor (SSE mode - HuggingFace Space)

Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "matablgpt": {
      "url": "https://your-space-name.hf.space/sse"
    }
  }
}

For Cursor (stdio mode - Local)

{
  "mcpServers": {
    "matablgpt": {
      "command": "python",
      "args": ["/path/to/mcp_output/start_mcp.py", "--mode", "stdio"],
      "env": {
        "LLM_API_KEY": "your_key",
        "LLM_API_BASE": "https://api.your-service.com/v1"
      }
    }
  }
}

For Claude Desktop

{
  "mcpServers": {
    "matablgpt": {
      "url": "https://your-space-name.hf.space/sse"
    }
  }
}

📄 Output Format

Extracted data follows this JSON schema:

{
  "catalyst_name": {
    "overpotential": {
      "electrolyte": "1.0 M KOH",
      "reaction_type": "OER",
      "value": "230 mV",
      "current_density": "10 mA/cm²"
    },
    "tafel_slope": {
      "electrolyte": "1.0 M KOH",
      "reaction_type": "OER",
      "value": "45 mV/dec"
    }
  }
}
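For analysis in a spreadsheet or DataFrame, the nested output is often easier to handle as flat rows. The following is a minimal sketch, assuming the shape shown above (catalyst, then performance type, then fields). `extraction_to_rows` and `rows_to_csv` are hypothetical helpers, not tools exposed by the service.

```python
import csv
import io


def extraction_to_rows(extraction: dict) -> list:
    """Turn {catalyst: {performance_type: {field: value}}} into a list of flat dicts."""
    rows = []
    for catalyst, properties in extraction.items():
        for performance_type, fields in properties.items():
            row = {"catalyst": catalyst, "performance_type": performance_type}
            row.update(fields)
            rows.append(row)
    return rows


def rows_to_csv(rows: list) -> str:
    """Serialize rows to CSV text; columns are the union of keys in first-seen order."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


example = {
    "catalyst_name": {
        "overpotential": {"electrolyte": "1.0 M KOH", "reaction_type": "OER",
                          "value": "230 mV", "current_density": "10 mA/cm²"},
        "tafel_slope": {"electrolyte": "1.0 M KOH", "reaction_type": "OER",
                        "value": "45 mV/dec"},
    }
}
rows = extraction_to_rows(example)
print(rows_to_csv(rows))
```

Fields that a property does not report, such as `current_density` for `tafel_slope`, are left as empty cells.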

🙏 Acknowledgments

Based on MaTableGPT - GPT-based Table Data Extractor from Materials Science Literature.

📜 License

MIT License