Spaces:
Sleeping
Sleeping
metadata
title: MaTableGPT MCP
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
app_port: 7860
MaTableGPT MCP Service
GPT-based Table Data Extractor from Materials Science Literature
A Model Context Protocol (MCP) service that extracts structured catalyst performance data from HTML tables in materials science publications.
🌟 Features
Table Representation
- HTML to TSV: Convert HTML tables to tab-separated format with preserved structure
- HTML to JSON: Convert HTML tables to nested JSON format
- Table Splitting: Break down complex tables with multiple headers into simpler components
GPT-based Extraction
- Zero-shot: Multi-step questioning approach without examples
- Few-shot: Guided extraction with input/output examples
- Fine-tuned: Use pre-trained specialized models
Session Management
- Track multiple table processing workflows
- Store representations and extractions
- Export session data for analysis
🚀 Quick Start (HuggingFace Space SSE Mode)
This service runs as a pure MCP SSE server on HuggingFace Space, accessible via SSE endpoint.
SSE Endpoint: https://your-space-name.hf.space/sse
Connect from Cursor/Claude Desktop
{
"mcpServers": {
"matablgpt": {
"url": "https://your-space-name.hf.space/sse"
}
}
}
📦 Installation
Prerequisites
- Python 3.8+
- OpenAI-compatible API key (for GPT extraction)
Local Installation
# Clone or copy the mcp_output folder
cd mcp_output
# Create virtual environment
python -m venv venv
# Activate (Windows)
venv\Scripts\activate
# Activate (Unix/Mac)
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Set API configuration (use your third-party API service info)
# Windows PowerShell
$env:LLM_API_KEY = "your_api_key"
$env:LLM_API_BASE = "https://api.your-service.com/v1"
$env:LLM_MODEL = "gpt-4-turbo-preview"
# Windows CMD
set LLM_API_KEY=your_api_key
set LLM_API_BASE=https://api.your-service.com/v1
set LLM_MODEL=gpt-4-turbo-preview
# Unix/Mac
export LLM_API_KEY=your_api_key
export LLM_API_BASE=https://api.your-service.com/v1
export LLM_MODEL=gpt-4-turbo-preview
🔑 Environment Variables
This service supports third-party API services (reverse proxy, OneAPI, API aggregators, etc.)
| Variable | Required | Description |
|---|---|---|
LLM_API_KEY |
✅ Yes | Your API key from the service provider |
LLM_API_BASE |
✅ Yes | API base URL, e.g., https://api.your-service.com/v1 |
LLM_MODEL |
❌ No | Model name (default: gpt-4-turbo-preview) |
Alternative variable names (also supported):
| Variable | Description |
|---|---|
OPENAI_API_KEY |
Alternative to LLM_API_KEY |
OPENAI_API_BASE |
Alternative to LLM_API_BASE |
OPENAI_MODEL |
Alternative to LLM_MODEL |
🚀 Usage
Start MCP Server (SSE mode - Default for HuggingFace Space)
# Default: SSE mode on port 7860
python start_mcp.py
# Custom port
python start_mcp.py --mode sse --port 8080
Start MCP Server (stdio mode - For local Cursor integration)
python start_mcp.py --mode stdio
🔧 MCP Tools Reference
Session Management
| Tool | Description |
|---|---|
create_session |
Create a new extraction session |
get_session_data |
Retrieve all data from a session |
Table Processing
| Tool | Description |
|---|---|
html_to_tsv_representation |
Convert HTML table to TSV format |
html_to_json_representation |
Convert HTML table to JSON format |
analyze_table_structure |
Analyze table structure (headers, merged cells) |
split_complex_table |
Split tables with multiple internal headers |
Data Extraction
| Tool | Description |
|---|---|
extract_catalyst_data_zero_shot |
Extract using zero-shot GPT |
extract_catalyst_data_few_shot |
Extract with example pairs |
extract_catalyst_data_fine_tuned |
Extract using fine-tuned model |
batch_extract_tables |
Extract from multiple tables in batch |
Follow-up & Refinement
| Tool | Description |
|---|---|
apply_follow_up_questions |
Refine extraction with iterative Q&A (from original MaTableGPT) |
Evaluation
| Tool | Description |
|---|---|
evaluate_extraction |
Compute Structure F1 Score and Value Accuracy |
validate_extraction_result |
Validate extraction against schema |
Utilities
| Tool | Description |
|---|---|
list_performance_types |
List supported catalyst performance types |
get_extraction_code_template |
Get Python code for local extraction |
get_environment_requirements |
Get setup requirements |
📋 Supported Performance Types
The following catalyst performance types can be extracted:
overpotential,tafel_slope,Rct,stability,Cdlonset_potential,current_density,potential,TOF,ECSAwater_splitting_potential,mass_activity,exchange_current_densityRs,specific_activity,onset_overpotential,BET,surface_arealoading,apparent_activation_energy
🔄 Workflow Example
1. Create a session
result = create_session()
session_id = result["session_id"]
2. Convert HTML table to representation
html = "<table>...</table>"
tsv = html_to_tsv_representation(
html_table=html,
title="Table 1: Catalyst Performance",
caption="OER performance in 1M KOH",
session_id=session_id,
table_name="table1"
)
3. Extract catalyst data
extraction = extract_catalyst_data_zero_shot(
table_representation=tsv["representation"],
session_id=session_id,
table_name="table1"
)
4. Validate and export
validation = validate_extraction_result(extraction["extraction"])
session_data = get_session_data(session_id)
🐳 Docker Deployment
Build image
docker build -t matablgpt-mcp .
Run container (SSE mode)
docker run -p 7860:7860 \
-e LLM_API_KEY=your_key \
-e LLM_API_BASE=https://api.your-service.com/v1 \
matablgpt-mcp
🤗 HuggingFace Spaces Deployment
- Create a new Space with Docker SDK
- Upload all files from
mcp_output/ - Add secrets in Space settings:
LLM_API_KEY: Your API keyLLM_API_BASE: Your API base URL (e.g.,https://api.your-service.com/v1)LLM_MODEL: (Optional) Model name
- Space will auto-build and deploy the MCP SSE service
- Connect via:
https://your-space-name.hf.space/sse
📝 MCP Client Configuration
For Cursor (SSE mode - HuggingFace Space)
Add to ~/.cursor/mcp.json:
{
"mcpServers": {
"matablgpt": {
"url": "https://your-space-name.hf.space/sse"
}
}
}
For Cursor (stdio mode - Local)
{
"mcpServers": {
"matablgpt": {
"command": "python",
"args": ["F:/Material_Agent/MaTableGPT/mcp_output/start_mcp.py", "--mode", "stdio"],
"env": {
"LLM_API_KEY": "your_key",
"LLM_API_BASE": "https://api.your-service.com/v1"
}
}
}
}
For Claude Desktop
{
"mcpServers": {
"matablgpt": {
"url": "https://your-space-name.hf.space/sse"
}
}
}
📄 Output Format
Extracted data follows this JSON schema:
{
"catalyst_name": {
"overpotential": {
"electrolyte": "1.0 M KOH",
"reaction_type": "OER",
"value": "230 mV",
"current_density": "10 mA/cm²"
},
"tafel_slope": {
"electrolyte": "1.0 M KOH",
"reaction_type": "OER",
"value": "45 mV/dec"
}
}
}
🙏 Acknowledgments
Based on MaTableGPT - GPT-based Table Data Extractor from Materials Science Literature.
📜 License
MIT License