Spaces:
Sleeping
Sleeping
File size: 8,444 Bytes
84a8f07 1742f51 84a8f07 1742f51 84a8f07 1742f51 84a8f07 1742f51 84a8f07 1742f51 84a8f07 1742f51 84a8f07 1742f51 84a8f07 1742f51 84a8f07 1742f51 84a8f07 1742f51 84a8f07 1742f51 84a8f07 1742f51 84a8f07 1742f51 84a8f07 1742f51 84a8f07 1742f51 84a8f07 1742f51 84a8f07 1742f51 84a8f07 1742f51 84a8f07 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 | ---
title: MaTableGPT MCP
emoji: 🔬
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
license: mit
app_port: 7860
---
# MaTableGPT MCP Service
[](https://huggingface.co/spaces)
[](https://modelcontextprotocol.io/)
**GPT-based Table Data Extractor from Materials Science Literature**
A Model Context Protocol (MCP) service that extracts structured catalyst performance data from HTML tables in materials science publications.
## 🌟 Features
### Table Representation
- **HTML to TSV**: Convert HTML tables to tab-separated format with preserved structure
- **HTML to JSON**: Convert HTML tables to nested JSON format
- **Table Splitting**: Break down complex tables with multiple headers into simpler components
### GPT-based Extraction
- **Zero-shot**: Multi-step questioning approach without examples
- **Few-shot**: Guided extraction with input/output examples
- **Fine-tuned**: Use pre-trained specialized models
### Session Management
- Track multiple table processing workflows
- Store representations and extractions
- Export session data for analysis
## 🚀 Quick Start (HuggingFace Space SSE Mode)
This service runs as a **pure MCP SSE server** on HuggingFace Space, accessible via SSE endpoint.
**SSE Endpoint**: `https://your-space-name.hf.space/sse`
### Connect from Cursor/Claude Desktop
```json
{
"mcpServers": {
"matablgpt": {
"url": "https://your-space-name.hf.space/sse"
}
}
}
```
## 📦 Installation
### Prerequisites
- Python 3.8+
- OpenAI-compatible API key (for GPT extraction)
### Local Installation
```bash
# Clone or copy the mcp_output folder
cd mcp_output
# Create virtual environment
python -m venv venv
# Activate (Windows)
venv\Scripts\activate
# Activate (Unix/Mac)
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Set API configuration (use your third-party API service info)
# Windows PowerShell
$env:LLM_API_KEY = "your_api_key"
$env:LLM_API_BASE = "https://api.your-service.com/v1"
$env:LLM_MODEL = "gpt-4-turbo-preview"
# Windows CMD
set LLM_API_KEY=your_api_key
set LLM_API_BASE=https://api.your-service.com/v1
set LLM_MODEL=gpt-4-turbo-preview
# Unix/Mac
export LLM_API_KEY=your_api_key
export LLM_API_BASE=https://api.your-service.com/v1
export LLM_MODEL=gpt-4-turbo-preview
```
## 🔑 Environment Variables
This service supports third-party API services (reverse proxy, OneAPI, API aggregators, etc.)
| Variable | Required | Description |
|----------|----------|-------------|
| `LLM_API_KEY` | ✅ Yes | Your API key from the service provider |
| `LLM_API_BASE` | ✅ Yes | API base URL, e.g., `https://api.your-service.com/v1` |
| `LLM_MODEL` | ❌ No | Model name (default: gpt-4-turbo-preview) |
**Alternative variable names (also supported):**
| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | Alternative to LLM_API_KEY |
| `OPENAI_API_BASE` | Alternative to LLM_API_BASE |
| `OPENAI_MODEL` | Alternative to LLM_MODEL |
## 🚀 Usage
### Start MCP Server (SSE mode - Default for HuggingFace Space)
```bash
# Default: SSE mode on port 7860
python start_mcp.py
# Custom port
python start_mcp.py --mode sse --port 8080
```
### Start MCP Server (stdio mode - For local Cursor integration)
```bash
python start_mcp.py --mode stdio
```
## 🔧 MCP Tools Reference
### Session Management
| Tool | Description |
|------|-------------|
| `create_session` | Create a new extraction session |
| `get_session_data` | Retrieve all data from a session |
### Table Processing
| Tool | Description |
|------|-------------|
| `html_to_tsv_representation` | Convert HTML table to TSV format |
| `html_to_json_representation` | Convert HTML table to JSON format |
| `analyze_table_structure` | Analyze table structure (headers, merged cells) |
| `split_complex_table` | Split tables with multiple internal headers |
### Data Extraction
| Tool | Description |
|------|-------------|
| `extract_catalyst_data_zero_shot` | Extract using zero-shot GPT |
| `extract_catalyst_data_few_shot` | Extract with example pairs |
| `extract_catalyst_data_fine_tuned` | Extract using fine-tuned model |
| `batch_extract_tables` | Extract from multiple tables in batch |
### Follow-up & Refinement
| Tool | Description |
|------|-------------|
| `apply_follow_up_questions` | Refine extraction with iterative Q&A (from original MaTableGPT) |
### Evaluation
| Tool | Description |
|------|-------------|
| `evaluate_extraction` | Compute Structure F1 Score and Value Accuracy |
| `validate_extraction_result` | Validate extraction against schema |
### Utilities
| Tool | Description |
|------|-------------|
| `list_performance_types` | List supported catalyst performance types |
| `get_extraction_code_template` | Get Python code for local extraction |
| `get_environment_requirements` | Get setup requirements |
## 📋 Supported Performance Types
The following catalyst performance types can be extracted:
- `overpotential`, `tafel_slope`, `Rct`, `stability`, `Cdl`
- `onset_potential`, `current_density`, `potential`, `TOF`, `ECSA`
- `water_splitting_potential`, `mass_activity`, `exchange_current_density`
- `Rs`, `specific_activity`, `onset_overpotential`, `BET`, `surface_area`
- `loading`, `apparent_activation_energy`
## 🔄 Workflow Example
### 1. Create a session
```python
result = create_session()
session_id = result["session_id"]
```
### 2. Convert HTML table to representation
```python
html = "<table>...</table>"
tsv = html_to_tsv_representation(
html_table=html,
title="Table 1: Catalyst Performance",
caption="OER performance in 1M KOH",
session_id=session_id,
table_name="table1"
)
```
### 3. Extract catalyst data
```python
extraction = extract_catalyst_data_zero_shot(
table_representation=tsv["representation"],
session_id=session_id,
table_name="table1"
)
```
### 4. Validate and export
```python
validation = validate_extraction_result(extraction["extraction"])
session_data = get_session_data(session_id)
```
## 🐳 Docker Deployment
### Build image
```bash
docker build -t matablgpt-mcp .
```
### Run container (SSE mode)
```bash
docker run -p 7860:7860 \
-e LLM_API_KEY=your_key \
-e LLM_API_BASE=https://api.your-service.com/v1 \
matablgpt-mcp
```
## 🤗 HuggingFace Spaces Deployment
1. Create a new Space with **Docker SDK**
2. Upload all files from `mcp_output/`
3. Add secrets in Space settings:
- `LLM_API_KEY`: Your API key
- `LLM_API_BASE`: Your API base URL (e.g., `https://api.your-service.com/v1`)
- `LLM_MODEL`: (Optional) Model name
4. Space will auto-build and deploy the MCP SSE service
5. Connect via: `https://your-space-name.hf.space/sse`
## 📝 MCP Client Configuration
### For Cursor (SSE mode - HuggingFace Space)
Add to `~/.cursor/mcp.json`:
```json
{
"mcpServers": {
"matablgpt": {
"url": "https://your-space-name.hf.space/sse"
}
}
}
```
### For Cursor (stdio mode - Local)
```json
{
"mcpServers": {
"matablgpt": {
"command": "python",
"args": ["F:/Material_Agent/MaTableGPT/mcp_output/start_mcp.py", "--mode", "stdio"],
"env": {
"LLM_API_KEY": "your_key",
"LLM_API_BASE": "https://api.your-service.com/v1"
}
}
}
}
```
### For Claude Desktop
```json
{
"mcpServers": {
"matablgpt": {
"url": "https://your-space-name.hf.space/sse"
}
}
}
```
## 📄 Output Format
Extracted data follows this JSON schema:
```json
{
"catalyst_name": {
"overpotential": {
"electrolyte": "1.0 M KOH",
"reaction_type": "OER",
"value": "230 mV",
"current_density": "10 mA/cm²"
},
"tafel_slope": {
"electrolyte": "1.0 M KOH",
"reaction_type": "OER",
"value": "45 mV/dec"
}
}
}
```
## 🙏 Acknowledgments
Based on [MaTableGPT](https://github.com/KIST-CSRC/MaTableGPT) - GPT-based Table Data Extractor from Materials Science Literature.
## 📜 License
MIT License
|