Spaces:
Sleeping
Sleeping
| title: MaTableGPT MCP | |
| emoji: 🔬 | |
| colorFrom: blue | |
| colorTo: green | |
| sdk: docker | |
| pinned: false | |
| license: mit | |
| app_port: 7860 | |
| # MaTableGPT MCP Service | |
| [](https://huggingface.co/spaces) | |
| [](https://modelcontextprotocol.io/) | |
| **GPT-based Table Data Extractor from Materials Science Literature** | |
| A Model Context Protocol (MCP) service that extracts structured catalyst performance data from HTML tables in materials science publications. | |
| ## 🌟 Features | |
| ### Table Representation | |
| - **HTML to TSV**: Convert HTML tables to tab-separated format with preserved structure | |
| - **HTML to JSON**: Convert HTML tables to nested JSON format | |
| - **Table Splitting**: Break down complex tables with multiple headers into simpler components | |
| ### GPT-based Extraction | |
| - **Zero-shot**: Multi-step questioning approach without examples | |
| - **Few-shot**: Guided extraction with input/output examples | |
| - **Fine-tuned**: Use pre-trained specialized models | |
| ### Session Management | |
| - Track multiple table processing workflows | |
| - Store representations and extractions | |
| - Export session data for analysis | |
| ## 🚀 Quick Start (HuggingFace Space SSE Mode) | |
| This service runs as a **pure MCP SSE server** on HuggingFace Space, accessible via SSE endpoint. | |
| **SSE Endpoint**: `https://your-space-name.hf.space/sse` | |
| ### Connect from Cursor/Claude Desktop | |
| ```json | |
| { | |
| "mcpServers": { | |
| "matablgpt": { | |
| "url": "https://your-space-name.hf.space/sse" | |
| } | |
| } | |
| } | |
| ``` | |
| ## 📦 Installation | |
| ### Prerequisites | |
| - Python 3.8+ | |
| - OpenAI-compatible API key (for GPT extraction) | |
| ### Local Installation | |
| ```bash | |
| # Clone or copy the mcp_output folder | |
| cd mcp_output | |
| # Create virtual environment | |
| python -m venv venv | |
| # Activate (Windows) | |
| venv\Scripts\activate | |
| # Activate (Unix/Mac) | |
| source venv/bin/activate | |
| # Install dependencies | |
| pip install -r requirements.txt | |
| # Set API configuration (use your third-party API service info) | |
| # Windows PowerShell | |
| $env:LLM_API_KEY = "your_api_key" | |
| $env:LLM_API_BASE = "https://api.your-service.com/v1" | |
| $env:LLM_MODEL = "gpt-4-turbo-preview" | |
| # Windows CMD | |
| set LLM_API_KEY=your_api_key | |
| set LLM_API_BASE=https://api.your-service.com/v1 | |
| set LLM_MODEL=gpt-4-turbo-preview | |
| # Unix/Mac | |
| export LLM_API_KEY=your_api_key | |
| export LLM_API_BASE=https://api.your-service.com/v1 | |
| export LLM_MODEL=gpt-4-turbo-preview | |
| ``` | |
| ## 🔑 Environment Variables | |
| This service supports third-party API services (reverse proxy, OneAPI, API aggregators, etc.) | |
| | Variable | Required | Description | | |
| |----------|----------|-------------| | |
| | `LLM_API_KEY` | ✅ Yes | Your API key from the service provider | | |
| | `LLM_API_BASE` | ✅ Yes | API base URL, e.g., `https://api.your-service.com/v1` | | |
| | `LLM_MODEL` | ❌ No | Model name (default: gpt-4-turbo-preview) | | |
| **Alternative variable names (also supported):** | |
| | Variable | Description | | |
| |----------|-------------| | |
| | `OPENAI_API_KEY` | Alternative to LLM_API_KEY | | |
| | `OPENAI_API_BASE` | Alternative to LLM_API_BASE | | |
| | `OPENAI_MODEL` | Alternative to LLM_MODEL | | |
| ## 🚀 Usage | |
| ### Start MCP Server (SSE mode - Default for HuggingFace Space) | |
| ```bash | |
| # Default: SSE mode on port 7860 | |
| python start_mcp.py | |
| # Custom port | |
| python start_mcp.py --mode sse --port 8080 | |
| ``` | |
| ### Start MCP Server (stdio mode - For local Cursor integration) | |
| ```bash | |
| python start_mcp.py --mode stdio | |
| ``` | |
| ## 🔧 MCP Tools Reference | |
| ### Session Management | |
| | Tool | Description | | |
| |------|-------------| | |
| | `create_session` | Create a new extraction session | | |
| | `get_session_data` | Retrieve all data from a session | | |
| ### Table Processing | |
| | Tool | Description | | |
| |------|-------------| | |
| | `html_to_tsv_representation` | Convert HTML table to TSV format | | |
| | `html_to_json_representation` | Convert HTML table to JSON format | | |
| | `analyze_table_structure` | Analyze table structure (headers, merged cells) | | |
| | `split_complex_table` | Split tables with multiple internal headers | | |
| ### Data Extraction | |
| | Tool | Description | | |
| |------|-------------| | |
| | `extract_catalyst_data_zero_shot` | Extract using zero-shot GPT | | |
| | `extract_catalyst_data_few_shot` | Extract with example pairs | | |
| | `extract_catalyst_data_fine_tuned` | Extract using fine-tuned model | | |
| | `batch_extract_tables` | Extract from multiple tables in batch | | |
| ### Follow-up & Refinement | |
| | Tool | Description | | |
| |------|-------------| | |
| | `apply_follow_up_questions` | Refine extraction with iterative Q&A (from original MaTableGPT) | | |
| ### Evaluation | |
| | Tool | Description | | |
| |------|-------------| | |
| | `evaluate_extraction` | Compute Structure F1 Score and Value Accuracy | | |
| | `validate_extraction_result` | Validate extraction against schema | | |
| ### Utilities | |
| | Tool | Description | | |
| |------|-------------| | |
| | `list_performance_types` | List supported catalyst performance types | | |
| | `get_extraction_code_template` | Get Python code for local extraction | | |
| | `get_environment_requirements` | Get setup requirements | | |
| ## 📋 Supported Performance Types | |
| The following catalyst performance types can be extracted: | |
| - `overpotential`, `tafel_slope`, `Rct`, `stability`, `Cdl` | |
| - `onset_potential`, `current_density`, `potential`, `TOF`, `ECSA` | |
| - `water_splitting_potential`, `mass_activity`, `exchange_current_density` | |
| - `Rs`, `specific_activity`, `onset_overpotential`, `BET`, `surface_area` | |
| - `loading`, `apparent_activation_energy` | |
| ## 🔄 Workflow Example | |
| ### 1. Create a session | |
| ```python | |
| result = create_session() | |
| session_id = result["session_id"] | |
| ``` | |
| ### 2. Convert HTML table to representation | |
| ```python | |
| html = "<table>...</table>" | |
| tsv = html_to_tsv_representation( | |
| html_table=html, | |
| title="Table 1: Catalyst Performance", | |
| caption="OER performance in 1M KOH", | |
| session_id=session_id, | |
| table_name="table1" | |
| ) | |
| ``` | |
| ### 3. Extract catalyst data | |
| ```python | |
| extraction = extract_catalyst_data_zero_shot( | |
| table_representation=tsv["representation"], | |
| session_id=session_id, | |
| table_name="table1" | |
| ) | |
| ``` | |
| ### 4. Validate and export | |
| ```python | |
| validation = validate_extraction_result(extraction["extraction"]) | |
| session_data = get_session_data(session_id) | |
| ``` | |
| ## 🐳 Docker Deployment | |
| ### Build image | |
| ```bash | |
| docker build -t matablgpt-mcp . | |
| ``` | |
| ### Run container (SSE mode) | |
| ```bash | |
| docker run -p 7860:7860 \ | |
| -e LLM_API_KEY=your_key \ | |
| -e LLM_API_BASE=https://api.your-service.com/v1 \ | |
| matablgpt-mcp | |
| ``` | |
| ## 🤗 HuggingFace Spaces Deployment | |
| 1. Create a new Space with **Docker SDK** | |
| 2. Upload all files from `mcp_output/` | |
| 3. Add secrets in Space settings: | |
| - `LLM_API_KEY`: Your API key | |
| - `LLM_API_BASE`: Your API base URL (e.g., `https://api.your-service.com/v1`) | |
| - `LLM_MODEL`: (Optional) Model name | |
| 4. Space will auto-build and deploy the MCP SSE service | |
| 5. Connect via: `https://your-space-name.hf.space/sse` | |
| ## 📝 MCP Client Configuration | |
| ### For Cursor (SSE mode - HuggingFace Space) | |
| Add to `~/.cursor/mcp.json`: | |
| ```json | |
| { | |
| "mcpServers": { | |
| "matablgpt": { | |
| "url": "https://your-space-name.hf.space/sse" | |
| } | |
| } | |
| } | |
| ``` | |
| ### For Cursor (stdio mode - Local) | |
| ```json | |
| { | |
| "mcpServers": { | |
| "matablgpt": { | |
| "command": "python", | |
| "args": ["F:/Material_Agent/MaTableGPT/mcp_output/start_mcp.py", "--mode", "stdio"], | |
| "env": { | |
| "LLM_API_KEY": "your_key", | |
| "LLM_API_BASE": "https://api.your-service.com/v1" | |
| } | |
| } | |
| } | |
| } | |
| ``` | |
| ### For Claude Desktop | |
| ```json | |
| { | |
| "mcpServers": { | |
| "matablgpt": { | |
| "url": "https://your-space-name.hf.space/sse" | |
| } | |
| } | |
| } | |
| ``` | |
| ## 📄 Output Format | |
| Extracted data follows this JSON schema: | |
| ```json | |
| { | |
| "catalyst_name": { | |
| "overpotential": { | |
| "electrolyte": "1.0 M KOH", | |
| "reaction_type": "OER", | |
| "value": "230 mV", | |
| "current_density": "10 mA/cm²" | |
| }, | |
| "tafel_slope": { | |
| "electrolyte": "1.0 M KOH", | |
| "reaction_type": "OER", | |
| "value": "45 mV/dec" | |
| } | |
| } | |
| } | |
| ``` | |
| ## 🙏 Acknowledgments | |
| Based on [MaTableGPT](https://github.com/KIST-CSRC/MaTableGPT) - GPT-based Table Data Extractor from Materials Science Literature. | |
| ## 📜 License | |
| MIT License | |