---
title: MaTableGPT MCP
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
app_port: 7860
---

# MaTableGPT MCP Service

[](https://huggingface.co/spaces) [](https://modelcontextprotocol.io/)

**GPT-based Table Data Extractor from Materials Science Literature**

A Model Context Protocol (MCP) service that extracts structured catalyst performance data from HTML tables in materials science publications.

## 🌟 Features

### Table Representation

- **HTML to TSV**: Convert HTML tables to tab-separated format with preserved structure
- **HTML to JSON**: Convert HTML tables to nested JSON format
- **Table Splitting**: Break down complex tables with multiple headers into simpler components

### GPT-based Extraction

- **Zero-shot**: Multi-step questioning approach without examples
- **Few-shot**: Guided extraction with input/output examples
- **Fine-tuned**: Use pre-trained specialized models

### Session Management

- Track multiple table processing workflows
- Store representations and extractions
- Export session data for analysis

## 🚀 Quick Start (HuggingFace Space SSE Mode)

This service runs as a **pure MCP SSE server** on HuggingFace Space, accessible via SSE endpoint.

**SSE Endpoint**: `https://your-space-name.hf.space/sse`

### Connect from Cursor/Claude Desktop

```json
{
  "mcpServers": {
    "matablgpt": {
      "url": "https://your-space-name.hf.space/sse"
    }
  }
}
```

## 📦 Installation

### Prerequisites

- Python 3.8+
- OpenAI-compatible API key (for GPT extraction)

### Local Installation

```bash
# Clone or copy the mcp_output folder
cd mcp_output

# Create virtual environment
python -m venv venv

# Activate (Windows)
venv\Scripts\activate

# Activate (Unix/Mac)
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Set API configuration (use your third-party API service info)

# Windows PowerShell
$env:LLM_API_KEY = "your_api_key"
$env:LLM_API_BASE = "https://api.your-service.com/v1"
$env:LLM_MODEL = "gpt-4-turbo-preview"

# Windows CMD
set LLM_API_KEY=your_api_key
set LLM_API_BASE=https://api.your-service.com/v1
set LLM_MODEL=gpt-4-turbo-preview

# Unix/Mac
export LLM_API_KEY=your_api_key
export LLM_API_BASE=https://api.your-service.com/v1
export LLM_MODEL=gpt-4-turbo-preview
```

## 🔑 Environment Variables

This service supports third-party API services (reverse proxy, OneAPI, API aggregators, etc.).
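
As a rough illustration of how a script might resolve these settings, the sketch below reads the primary `LLM_*` variables and falls back to the `OPENAI_*` alternatives listed in this section. The helper names `resolve_llm_config` and `_first_set`, and the precedence order (primary name first), are assumptions made for illustration; the service's actual lookup logic is not documented here.

```python
import os
from typing import Mapping, Optional

# Default model name as stated in this section's variable table.
DEFAULT_MODEL = "gpt-4-turbo-preview"


def _first_set(env: Mapping[str, str], *names: str) -> Optional[str]:
    """Return the first non-empty value among the given variable names."""
    for name in names:
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def resolve_llm_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Hypothetical helper: collect API settings from environment variables.

    Assumption: LLM_* names take precedence over the OPENAI_* alternatives.
    """
    env = os.environ if env is None else env
    api_key = _first_set(env, "LLM_API_KEY", "OPENAI_API_KEY")
    api_base = _first_set(env, "LLM_API_BASE", "OPENAI_API_BASE")
    model = _first_set(env, "LLM_MODEL", "OPENAI_MODEL") or DEFAULT_MODEL

    # The key and base URL are marked as required; the model is optional.
    required = (("LLM_API_KEY", api_key), ("LLM_API_BASE", api_base))
    missing = [name for name, value in required if value is None]
    if missing:
        raise RuntimeError("Missing required setting(s): " + ", ".join(missing))

    return {"api_key": api_key, "api_base": api_base, "model": model}
```

Calling `resolve_llm_config()` with no arguments reads from `os.environ`; passing a plain dictionary instead makes the lookup easy to check without touching the real environment.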

| Variable | Required | Description |
|----------|----------|-------------|
| `LLM_API_KEY` | ✅ Yes | Your API key from the service provider |
| `LLM_API_BASE` | ✅ Yes | API base URL, e.g., `https://api.your-service.com/v1` |
| `LLM_MODEL` | ❌ No | Model name (default: gpt-4-turbo-preview) |

**Alternative variable names (also supported):**

| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | Alternative to LLM_API_KEY |
| `OPENAI_API_BASE` | Alternative to LLM_API_BASE |
| `OPENAI_MODEL` | Alternative to LLM_MODEL |

## 🚀 Usage

### Start MCP Server (SSE mode - Default for HuggingFace Space)

```bash
# Default: SSE mode on port 7860
python start_mcp.py

# Custom port
python start_mcp.py --mode sse --port 8080
```

### Start MCP Server (stdio mode - For local Cursor integration)

```bash
python start_mcp.py --mode stdio
```

## 🔧 MCP Tools Reference

### Session Management

| Tool | Description |
|------|-------------|
| `create_session` | Create a new extraction session |
| `get_session_data` | Retrieve all data from a session |

### Table Processing

| Tool | Description |
|------|-------------|
| `html_to_tsv_representation` | Convert HTML table to TSV format |
| `html_to_json_representation` | Convert HTML table to JSON format |
| `analyze_table_structure` | Analyze table structure (headers, merged cells) |
| `split_complex_table` | Split tables with multiple internal headers |

### Data Extraction

| Tool | Description |
|------|-------------|
| `extract_catalyst_data_zero_shot` | Extract using zero-shot GPT |
| `extract_catalyst_data_few_shot` | Extract with example pairs |
| `extract_catalyst_data_fine_tuned` | Extract using fine-tuned model |
| `batch_extract_tables` | Extract from multiple tables in batch |

### Follow-up & Refinement

| Tool | Description |
|------|-------------|
| `apply_follow_up_questions` | Refine extraction with iterative Q&A (from original MaTableGPT) |

### Evaluation

| Tool | Description |
|------|-------------|
| `evaluate_extraction` | Compute Structure F1 Score and Value Accuracy |
| `validate_extraction_result` | Validate extraction against schema |

### Utilities

| Tool | Description |
|------|-------------|
| `list_performance_types` | List supported catalyst performance types |
| `get_extraction_code_template` | Get Python code for local extraction |
| `get_environment_requirements` | Get setup requirements |

## 📋 Supported Performance Types

The following catalyst performance types can be extracted:

- `overpotential`, `tafel_slope`, `Rct`, `stability`, `Cdl`
- `onset_potential`, `current_density`, `potential`, `TOF`, `ECSA`
- `water_splitting_potential`, `mass_activity`, `exchange_current_density`
- `Rs`, `specific_activity`, `onset_overpotential`, `BET`, `surface_area`
- `loading`, `apparent_activation_energy`

## 🔄 Workflow Example

### 1. Create a session

```python
result = create_session()
session_id = result["session_id"]
```

### 2. Convert HTML table to representation

```python
html = "