Space: SEUyishu/MatTableGPT

SEUyishu committed on Dec 4, 2025

Commit 1742f51 (verified) · 1 parent: dd5e12b

Upload 6 files


Files changed (5)

Dockerfile +15 -12
README.md +71 -22
mcp_service.py +378 -0
requirements.txt +24 -26
start_mcp.py +35 -13

Dockerfile CHANGED

@@ -1,6 +1,6 @@
 # MaTableGPT MCP Service Docker Image
 # ====================================
-# For HuggingFace Spaces Deployment
 FROM python:3.10-slim
@@ -10,13 +10,17 @@ WORKDIR /app
 # Set environment variables
 ENV PYTHONDONTWRITEBYTECODE=1
 ENV PYTHONUNBUFFERED=1
-ENV GRADIO_SERVER_NAME=0.0.0.0
-ENV GRADIO_SERVER_PORT=7860
 # Install system dependencies
 RUN apt-get update && apt-get install -y --no-install-recommends \
     build-essential \
     git \
     && rm -rf /var/lib/apt/lists/*
 # Copy requirements first for better caching
@@ -27,7 +31,7 @@ RUN pip install --no-cache-dir --upgrade pip && \
     pip install --no-cache-dir -r requirements.txt
 # Download NLTK data for table splitting
-RUN python -c "import nltk; nltk.download('punkt')"
 # Copy application code
 COPY . .
@@ -38,13 +42,12 @@ RUN mkdir -p /app/sessions /app/temp
 # Set permissions for HuggingFace Spaces
 RUN chmod -R 777 /app/sessions /app/temp
-# Expose ports
-# 7860 for Gradio, 7865 for MCP SSE
-EXPOSE 7860 7865
-# Health check
-HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
-    CMD python -c "import requests; requests.get('http://localhost:7860/')" || exit 1
-# Run the application
-CMD ["python", "app.py"]

 # MaTableGPT MCP Service Docker Image
 # ====================================
+# For HuggingFace Spaces Deployment (SSE Mode)
 FROM python:3.10-slim
 # Set environment variables
 ENV PYTHONDONTWRITEBYTECODE=1
 ENV PYTHONUNBUFFERED=1
+# MCP SSE Server Configuration
+# HuggingFace Spaces uses port 7860
+ENV MCP_HOST=0.0.0.0
+ENV MCP_PORT=7860
 # Install system dependencies
 RUN apt-get update && apt-get install -y --no-install-recommends \
     build-essential \
     git \
+    curl \
     && rm -rf /var/lib/apt/lists/*
 # Copy requirements first for better caching
     pip install --no-cache-dir -r requirements.txt
 # Download NLTK data for table splitting
+RUN python -c "import nltk; nltk.download('punkt')" || true
 # Copy application code
 COPY . .
 # Set permissions for HuggingFace Spaces
 RUN chmod -R 777 /app/sessions /app/temp
+# Expose MCP SSE port (HuggingFace Spaces uses 7860)
+EXPOSE 7860
+# Health check for MCP SSE endpoint
+HEALTHCHECK --interval=30s --timeout=30s --start-period=10s --retries=3 \
+    CMD curl -f http://localhost:7860/sse || exit 1
+# Run MCP service in SSE mode
+CMD ["python", "start_mcp.py", "--mode", "sse", "--host", "0.0.0.0", "--port", "7860"]
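
Note on the new health check: an SSE response normally stays open for the lifetime of the connection, so `curl -f` against `/sse` may not exit before the 30 s timeout; whether that happens here depends on the server code, which this diff does not show. One alternative is a probe that inspects only the status line and then closes the connection. The sketch below is a minimal illustration using the Python standard library; `sse_endpoint_is_up` is a hypothetical helper, not part of this commit.

```python
import urllib.error
import urllib.request


def sse_endpoint_is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a 2xx status; the body is never read."""
    try:
        # urlopen returns as soon as the response headers arrive, so an
        # endless event stream does not block this call.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

A check of this kind reports on reachability only; it does not confirm that the MCP handshake itself succeeds.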

README.md CHANGED

@@ -35,11 +35,29 @@ A Model Context Protocol (MCP) service that extracts structured catalyst perform
 - Store representations and extractions
 - Export session data for analysis
 ## 📦 Installation
 ### Prerequisites
 - Python 3.8+
-- OpenAI API key (for GPT extraction)
 ### Local Installation
@@ -94,22 +112,20 @@ This service supports third-party API services (reverse proxy, OneAPI, API aggre
 ## 🚀 Usage
-### Start MCP Server (stdio mode)
 ```bash
 python start_mcp.py
-```
-### Start MCP Server (SSE mode for web integration)
-```bash
-python start_mcp.py --mode sse --port 7865
 ```
-### Start Gradio Web Interface
 ```bash
-python app.py
 ```
 ## 🔧 MCP Tools Reference
@@ -137,13 +153,26 @@ python app.py
 | `extract_catalyst_data_zero_shot` | Extract using zero-shot GPT |
 | `extract_catalyst_data_few_shot` | Extract with example pairs |
 | `extract_catalyst_data_fine_tuned` | Extract using fine-tuned model |
 ### Utilities
 | Tool | Description |
 |------|-------------|
 | `list_performance_types` | List supported catalyst performance types |
-| `validate_extraction_result` | Validate extraction against schema |
 | `get_extraction_code_template` | Get Python code for local extraction |
 | `get_environment_requirements` | Get setup requirements |
@@ -204,46 +233,66 @@ session_data = get_session_data(session_id)
 docker build -t matablgpt-mcp .
 ```
-### Run container
 ```bash
-docker run -p 7860:7860 -p 7865:7865 \
-    -e OPENAI_API_KEY=your_key \
     matablgpt-mcp
 ```
 ## 🤗 HuggingFace Spaces Deployment
-1. Create a new Space with Docker SDK
 2. Upload all files from `mcp_output/`
-3. Add `OPENAI_API_KEY` as a secret in Space settings
-4. Space will auto-build and deploy
 ## 📝 MCP Client Configuration
-Add to your MCP client configuration (e.g., Claude Desktop):
 ```json
 {
   "mcpServers": {
     "matablgpt": {
       "command": "python",
-      "args": ["path/to/mcp_output/start_mcp.py"],
       "env": {
-        "OPENAI_API_KEY": "your_key"
       }
     }
   }
 }
 ```
-Or for SSE mode:
 ```json
 {
   "mcpServers": {
     "matablgpt": {
-      "url": "http://localhost:7865/sse"
     }
   }
 }
@@ -273,7 +322,7 @@ Extracted data follows this JSON schema:
 ## 🙏 Acknowledgments
-Based on [MaTableGPT](https://github.com/your-repo/MaTableGPT) - GPT-based Table Data Extractor from Materials Science Literature.
 ## 📜 License

 - Store representations and extractions
 - Export session data for analysis
+## 🚀 Quick Start (HuggingFace Space SSE Mode)
+This service runs as a **pure MCP SSE server** on a HuggingFace Space and is reachable through its SSE endpoint.
+**SSE Endpoint**: `https://your-space-name.hf.space/sse`
+### Connect from Cursor/Claude Desktop
+```json
+{
+  "mcpServers": {
+    "matablgpt": {
+      "url": "https://your-space-name.hf.space/sse"
+    }
+  }
+}
+```
 ## 📦 Installation
 ### Prerequisites
 - Python 3.8+
+- OpenAI-compatible API key (for GPT extraction)
 ### Local Installation
 ## 🚀 Usage
+### Start MCP Server (SSE mode - Default for HuggingFace Space)
 ```bash
+# Default: SSE mode on port 7860
 python start_mcp.py
+# Custom port
+python start_mcp.py --mode sse --port 8080
 ```
+### Start MCP Server (stdio mode - For local Cursor integration)
 ```bash
+python start_mcp.py --mode stdio
 ```
 ## 🔧 MCP Tools Reference
 | `extract_catalyst_data_zero_shot` | Extract using zero-shot GPT |
 | `extract_catalyst_data_few_shot` | Extract with example pairs |
 | `extract_catalyst_data_fine_tuned` | Extract using fine-tuned model |
+| `batch_extract_tables` | Extract from multiple tables in batch |
+### Follow-up & Refinement
+| Tool | Description |
+|------|-------------|
+| `apply_follow_up_questions` | Refine extraction with iterative Q&A (from original MaTableGPT) |
+### Evaluation
+| Tool | Description |
+|------|-------------|
+| `evaluate_extraction` | Compute Structure F1 Score and Value Accuracy |
+| `validate_extraction_result` | Validate extraction against schema |
 ### Utilities
 | Tool | Description |
 |------|-------------|
 | `list_performance_types` | List supported catalyst performance types |
 | `get_extraction_code_template` | Get Python code for local extraction |
 | `get_environment_requirements` | Get setup requirements |
 docker build -t matablgpt-mcp .
 ```
+### Run container (SSE mode)
 ```bash
+docker run -p 7860:7860 \
+    -e LLM_API_KEY=your_key \
+    -e LLM_API_BASE=https://api.your-service.com/v1 \
     matablgpt-mcp
 ```
 ## 🤗 HuggingFace Spaces Deployment
+1. Create a new Space with **Docker SDK**
 2. Upload all files from `mcp_output/`
+3. Add secrets in Space settings:
+   - `LLM_API_KEY`: Your API key
+   - `LLM_API_BASE`: Your API base URL (e.g., `https://api.your-service.com/v1`)
+   - `LLM_MODEL`: (Optional) Model name
+4. Space will auto-build and deploy the MCP SSE service
+5. Connect via: `https://your-space-name.hf.space/sse`
 ## 📝 MCP Client Configuration
+### For Cursor (SSE mode - HuggingFace Space)
+Add to `~/.cursor/mcp.json`:
+```json
+{
+  "mcpServers": {
+    "matablgpt": {
+      "url": "https://your-space-name.hf.space/sse"
+    }
+  }
+}
+```
+### For Cursor (stdio mode - Local)
 ```json
 {
   "mcpServers": {
     "matablgpt": {
       "command": "python",
+      "args": ["F:/Material_Agent/MaTableGPT/mcp_output/start_mcp.py", "--mode", "stdio"],
       "env": {
+        "LLM_API_KEY": "your_key",
+        "LLM_API_BASE": "https://api.your-service.com/v1"
       }
     }
   }
 }
 ```
+### For Claude Desktop
 ```json
 {
   "mcpServers": {
     "matablgpt": {
+      "url": "https://your-space-name.hf.space/sse"
     }
   }
 }
 ## 🙏 Acknowledgments
+Based on [MaTableGPT](https://github.com/KIST-CSRC/MaTableGPT) - GPT-based Table Data Extractor from Materials Science Literature.
 ## 📜 License

mcp_service.py CHANGED

@@ -1323,6 +1323,384 @@ print(json.dumps(json.loads(result), indent=2))
     }
 @mcp.tool()
 def get_environment_requirements() -> Dict:
     """

     }
+@mcp.tool()
+def apply_follow_up_questions(
+    extraction_result: Dict,
+    table_representation: str,
+    session_id: str = "",
+    table_name: str = ""
+) -> Dict:
+    """
+    Apply follow-up questions to refine and validate extraction results.
+    This implements the iterative questioning process from the original MaTableGPT
+    to improve extraction accuracy by:
+    1. Verifying catalyst names against the table
+    2. Checking performance types
+    3. Validating property values
+    4. Checking for reaction_type, electrolyte, substrate in title/caption
+    Args:
+        extraction_result: Initial extraction result to refine
+        table_representation: Original table representation for verification
+        session_id: Optional session ID to save refined results
+        table_name: Optional table name
+    Returns:
+        Dictionary containing refined extraction result
+    """
+    try:
+        extractor = get_extractor()
+        # Initialize message context
+        system_prompt = """You need to modify the JSON representing the table.
+JSON template: {'catalyst_name': {'performance_name': {property_template}}}
+property_template: {'electrolyte': '', 'reaction_type': '', 'value': '', 'current_density': '', 'overpotential': '', 'potential': '', 'substrate': '', 'versus': '', 'condition': ''}
+performance_list = """ + str(GPTExtractor.PERFORMANCE_LIST) + """
+Replace 'catalyst_name' and 'performance_name' with actual names from the table."""
+        messages = [{"role": "system", "content": system_prompt}]
+        # Step 1: Verify catalysts in table
+        verify_q = f"""<input representation>
+{table_representation}
+Question 1: List all catalyst names in the table representation as a Python list. Only output the Python list."""
+        messages.append({"role": "user", "content": verify_q})
+        response = extractor.client.chat.completions.create(
+            model=extractor.get_model(),
+            messages=messages,
+            temperature=0
+        )
+        catalysts_in_table = response.choices[0].message.content.strip()
+        messages.append({"role": "assistant", "content": catalysts_in_table})
+        # Step 2: Get catalysts from extraction
+        extraction_catalysts_q = f"""<input json>
+{json.dumps(extraction_result)}
+Question 2: List all catalyst names from the input json as a Python list. Only output the Python list."""
+        messages.append({"role": "user", "content": extraction_catalysts_q})
+        response = extractor.client.chat.completions.create(
+            model=extractor.get_model(),
+            messages=messages,
+            temperature=0
+        )
+        catalysts_in_json = response.choices[0].message.content.strip()
+        messages.append({"role": "assistant", "content": catalysts_in_json})
+        # Step 3: Reconcile catalysts
+        reconcile_q = """Question 3: Based on answers to Question 1 and 2, modify or remove any catalysts
+from Question 2 that don't match Question 1. Output the corrected Python list."""
+        messages.append({"role": "user", "content": reconcile_q})
+        response = extractor.client.chat.completions.create(
+            model=extractor.get_model(),
+            messages=messages,
+            temperature=0
+        )
+        reconciled_catalysts = response.choices[0].message.content.strip()
+        messages.append({"role": "assistant", "content": reconciled_catalysts})
+        # Step 4: Check for title/caption info
+        title_caption_q = f"""<input representation>
+{table_representation}
+Question 4: Check the title and caption of the table.
+- Is there reaction type info (OER, HER, oxygen evolution, hydrogen evolution)?
+- Is there electrolyte info?
+- Is there substrate info?
+Answer in format: {{"reaction_type": "yes/no", "electrolyte": "yes/no", "substrate": "yes/no"}}"""
+        messages.append({"role": "user", "content": title_caption_q})
+        response = extractor.client.chat.completions.create(
+            model=extractor.get_model(),
+            messages=messages,
+            temperature=0
+        )
+        metadata_check = response.choices[0].message.content.strip()
+        messages.append({"role": "assistant", "content": metadata_check})
+        # Step 5: Apply refinements
+        refine_q = f"""<input json>
+{json.dumps(extraction_result)}
+Based on the above analysis:
+1. Keep only catalysts that exist in the table
+2. Remove any 'NA', 'unknown', or empty values
+3. If title/caption lacks reaction_type/electrolyte/substrate info, remove those keys
+4. Output the refined JSON only. No explanation."""
+        messages.append({"role": "user", "content": refine_q})
+        response = extractor.client.chat.completions.create(
+            model=extractor.get_model(),
+            messages=messages,
+            temperature=0
+        )
+        refined_result = response.choices[0].message.content.strip()
+        # Parse result
+        if "```" in refined_result:
+            refined_result = refined_result.replace("```json", "").replace("```", "")
+        try:
+            refined_json = json.loads(refined_result)
+        except json.JSONDecodeError:
+            refined_json = extraction_result  # Fall back to original
+        # Save if session provided
+        if session_id:
+            extraction_record = ExtractionResult(
+                session_id=session_id,
+                table_name=table_name or "unnamed",
+                model_type="follow-up-refined",
+                result=refined_json,
+                timestamp=datetime.now().isoformat(),
+                follow_up_applied=True
+            )
+            session_manager.save_extraction(session_id, extraction_record)
+        return {
+            "success": True,
+            "original": extraction_result,
+            "refined": refined_json,
+            "follow_up_applied": True,
+            "verification_steps": {
+                "catalysts_in_table": catalysts_in_table,
+                "catalysts_in_json": catalysts_in_json,
+                "reconciled": reconciled_catalysts,
+                "metadata_check": metadata_check
+            }
+        }
+    except Exception as e:
+        return {
+            "success": False,
+            "error": str(e),
+            "original": extraction_result,
+            "follow_up_applied": False
+        }
+@mcp.tool()
+def evaluate_extraction(
+    prediction: Dict,
+    ground_truth: Dict,
+    evaluation_type: str = "both"
+) -> Dict:
+    """
+    Evaluate extraction results against ground truth.
+    Computes metrics from the original MaTableGPT evaluation:
+    - Structure F1 Score: Measures correctness of JSON structure
+    - Value Accuracy: Measures correctness of extracted values
+    Args:
+        prediction: The extracted/predicted result
+        ground_truth: The expected correct result
+        evaluation_type: "structure", "value", or "both"
+    Returns:
+        Dictionary containing evaluation metrics
+    """
+    import re
+    import unicodedata
+    def normalize_text(text: str) -> str:
+        """Normalize text for comparison."""
+        if not isinstance(text, str):
+            return str(text)
+        # Remove unicode variations
+        text = unicodedata.normalize('NFKD', text)
+        # Common substitutions
+        text = re.sub(r'–|−', '-', text)
+        text = re.sub(r'<sup>|</sup>', '', text)
+        # Minus signs were already mapped to '-' above, so match the ASCII form
+        text = re.sub(r'm2 g-1', 'm2/g', text)
+        text = re.sub(r'mA cm-2', 'mA/cm2', text)
+        text = re.sub(r'\s+', '', text)
+        return text.lower()
+    def get_all_keys(d: Dict, parent_key: str = '', sep: str = '//') -> List[str]:
+        """Recursively get all keys from nested dict."""
+        keys = []
+        if isinstance(d, dict):
+            for k, v in d.items():
+                new_key = f"{parent_key}{sep}{k}" if parent_key else k
+                keys.append(new_key)
+                keys.extend(get_all_keys(v, new_key, sep))
+        elif isinstance(d, list):
+            for i, item in enumerate(d):
+                keys.extend(get_all_keys(item, f"{parent_key}[{i}]", sep))
+        return keys
+    def get_key_value_pairs(d: Dict, parent_key: str = '') -> List[tuple]:
+        """Get all key-value pairs from nested dict."""
+        pairs = []
+        if isinstance(d, dict):
+            for k, v in d.items():
+                new_key = f"{parent_key}//{k}" if parent_key else k
+                if isinstance(v, (dict, list)):
+                    pairs.extend(get_key_value_pairs(v, new_key))
+                else:
+                    pairs.append((new_key, normalize_text(str(v))))
+        elif isinstance(d, list):
+            for i, item in enumerate(d):
+                pairs.extend(get_key_value_pairs(item, f"{parent_key}[{i}]"))
+        return pairs
+    results = {"success": True}
+    try:
+        # Normalize both inputs
+        pred_keys = get_all_keys(prediction)
+        gt_keys = get_all_keys(ground_truth)
+        # Structure F1 Score
+        if evaluation_type in ["structure", "both"]:
+            # Remove 'condition' keys as per original
+            pred_keys = [k for k in pred_keys if 'condition' not in k]
+            gt_keys = [k for k in gt_keys if 'condition' not in k]
+            # Calculate TP, FP, FN for structure
+            tp = len(set(pred_keys) & set(gt_keys))
+            fp = len(set(pred_keys) - set(gt_keys))
+            fn = len(set(gt_keys) - set(pred_keys))
+            if tp + fp + fn > 0:
+                f1_score = tp / (tp + 0.5 * (fp + fn))
+            else:
+                f1_score = 1.0 if len(gt_keys) == 0 else 0.0
+            results["structure_f1"] = round(f1_score, 4)
+            results["structure_details"] = {
+                "true_positives": tp,
+                "false_positives": fp,
+                "false_negatives": fn,
+                "matched_keys": list(set(pred_keys) & set(gt_keys))[:10],  # Sample
+                "missing_keys": list(set(gt_keys) - set(pred_keys))[:10],
+                "extra_keys": list(set(pred_keys) - set(gt_keys))[:10]
+            }
+        # Value Accuracy
+        if evaluation_type in ["value", "both"]:
+            pred_pairs = get_key_value_pairs(prediction)
+            gt_pairs = get_key_value_pairs(ground_truth)
+            # Compare values
+            correct = 0
+            total = len(gt_pairs)
+            pred_dict = {k: v for k, v in pred_pairs}
+            for key, value in gt_pairs:
+                if key in pred_dict:
+                    # Normalize and compare
+                    if normalize_text(pred_dict[key]) == normalize_text(value):
+                        correct += 1
+            value_accuracy = correct / total if total > 0 else 1.0
+            results["value_accuracy"] = round(value_accuracy, 4)
+            results["value_details"] = {
+                "correct_values": correct,
+                "total_values": total,
+                "accuracy_percentage": round(value_accuracy * 100, 2)
+            }
+        # Overall score
+        if evaluation_type == "both":
+            results["overall_score"] = round(
+                (results["structure_f1"] + results["value_accuracy"]) / 2, 4
+            )
+    except Exception as e:
+        results["success"] = False
+        results["error"] = str(e)
+    return results
+@mcp.tool()
+def batch_extract_tables(
+    tables: List[Dict],
+    model_type: str = "zero-shot",
+    apply_follow_up: bool = False,
+    session_id: str = ""
+) -> Dict:
+    """
+    Extract data from multiple tables in batch.
+    Args:
+        tables: List of {"html": html_table, "title": title, "caption": caption, "name": table_name}
+        model_type: "zero-shot", "few-shot", or "fine-tuning"
+        apply_follow_up: Whether to apply follow-up questions for refinement
+        session_id: Optional session ID
+    Returns:
+        Dictionary containing all extraction results
+    """
+    if not session_id:
+        session_id = session_manager.create_session()
+    results = {
+        "success": True,
+        "session_id": session_id,
+        "total_tables": len(tables),
+        "extractions": []
+    }
+    for i, table_info in enumerate(tables):
+        html = table_info.get("html", "")
+        title = table_info.get("title", "")
+        caption = table_info.get("caption", "")
+        table_name = table_info.get("name", f"table_{i+1}")
+        try:
+            # Convert to representation
+            representation = table_representer.html_to_tsv(html, title, caption)
+            # Extract based on model type
+            extractor = get_extractor()
+            if model_type == "zero-shot":
+                extraction = extractor.extract_zero_shot(representation)
+            elif model_type == "few-shot":
+                extraction = extractor.extract_few_shot(representation)
+            else:
+                extraction = {"error": "Fine-tuning requires model_name parameter"}
+            # Apply follow-up if requested
+            if apply_follow_up and "error" not in extraction:
+                from copy import deepcopy
+                follow_up_result = apply_follow_up_questions(
+                    deepcopy(extraction),
+                    representation,
+                    session_id,
+                    table_name
+                )
+                if follow_up_result.get("success"):
+                    extraction = follow_up_result.get("refined", extraction)
+            results["extractions"].append({
+                "table_name": table_name,
+                "success": True,
+                "extraction": extraction
+            })
+        except Exception as e:
+            results["extractions"].append({
+                "table_name": table_name,
+                "success": False,
+                "error": str(e)
+            })
+    results["successful_extractions"] = sum(1 for e in results["extractions"] if e["success"])
+    results["failed_extractions"] = results["total_tables"] - results["successful_extractions"]
+    return results
 @mcp.tool()
 def get_environment_requirements() -> Dict:
     """
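
Worked example for the Structure F1 metric added in `evaluate_extraction`: the tool flattens both dictionaries into `parent//child` key paths and scores their overlap as `tp / (tp + 0.5 * (fp + fn))`. The sketch below reproduces only that calculation in isolation, without the `condition` filter or list handling. `flatten_keys` and `structure_f1` are hypothetical names and the sample catalyst data is invented for illustration.

```python
from typing import Any, List


def flatten_keys(node: Any, parent: str = "", sep: str = "//") -> List[str]:
    """Collect every nested dict key as a 'parent//child' path."""
    keys: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{parent}{sep}{key}" if parent else key
            keys.append(path)
            keys.extend(flatten_keys(value, path, sep))
    return keys


def structure_f1(prediction: dict, ground_truth: dict) -> float:
    """F1 over key paths: tp / (tp + 0.5 * (fp + fn))."""
    pred = set(flatten_keys(prediction))
    truth = set(flatten_keys(ground_truth))
    tp = len(pred & truth)   # paths present in both
    fp = len(pred - truth)   # paths only in the prediction
    fn = len(truth - pred)   # paths only in the ground truth
    if tp + fp + fn == 0:
        return 1.0  # both structures are empty
    return tp / (tp + 0.5 * (fp + fn))


# Invented sample data: the prediction misses 'electrolyte' and adds 'substrate'.
truth = {"NiFe-LDH": {"overpotential": {"value": "270 mV", "electrolyte": "1 M KOH"}}}
pred = {"NiFe-LDH": {"overpotential": {"value": "270 mV", "substrate": "Ni foam"}}}
print(structure_f1(pred, truth))  # 3 shared paths, 1 extra, 1 missing -> 0.75
```

Because every intermediate path counts as a key, a prediction that gets the catalyst and performance names right already earns partial credit even when leaf properties differ.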

requirements.txt CHANGED

@@ -1,26 +1,24 @@
-# MaTableGPT MCP Service Requirements
-# ====================================
-# Core MCP Framework
-mcp>=0.1.0
-# OpenAI-compatible API client
-openai>=1.0.0
-# HTML Parsing
-beautifulsoup4>=4.12.0
-lxml>=4.9.0
-# Data Processing
-pandas>=2.0.0
-# Web Framework for HuggingFace Space
-# Pin to stable version with compatible huggingface_hub
-gradio==4.44.0
-huggingface_hub>=0.24.0,<1.0.0
-# Async Support
-httpx>=0.25.0
-# Optional: For table splitting analysis
-nltk>=3.8.0

+# MaTableGPT MCP Service Requirements
+# ====================================
+# Core MCP Framework (with SSE support)
+mcp[cli]>=1.0.0
+# OpenAI-compatible API client
+openai>=1.0.0
+# HTML Parsing
+beautifulsoup4>=4.12.0
+lxml>=4.9.0
+# Data Processing
+pandas>=2.0.0
+# SSE/HTTP Support
+starlette>=0.27.0
+uvicorn>=0.23.0
+sse-starlette>=1.6.0
+httpx>=0.25.0
+# Optional: For table splitting analysis
+nltk>=3.8.0

start_mcp.py CHANGED

@@ -11,8 +11,15 @@ Usage:
 Arguments:
     --host      Host address (default: 0.0.0.0)
-    --port      Port number (default: 7865)
-    --mode      Run mode: 'stdio' or 'sse' (default: stdio)
 """
 import os
@@ -35,13 +42,19 @@ def check_environment():
     """Check if required environment variables are set."""
     warnings = []
-    if not os.environ.get('OPENAI_API_KEY'):
         warnings.append(
-            "OPENAI_API_KEY not set. GPT extraction features will not work. "
-            "Set it with: export OPENAI_API_KEY=your_key (Unix) or "
-            "set OPENAI_API_KEY=your_key (Windows)"
         )
     return warnings
@@ -50,7 +63,7 @@ def check_dependencies():
     missing = []
     required = [
-        ('mcp', 'mcp'),
         ('openai', 'openai'),
         ('bs4', 'beautifulsoup4'),
         ('pandas', 'pandas'),
@@ -68,25 +81,29 @@ def check_dependencies():
 def main():
     """Main entry point."""
     parser = argparse.ArgumentParser(
         description="MaTableGPT MCP Server - Table Data Extraction from Materials Science Literature"
     )
     parser.add_argument(
         '--host',
-        default='0.0.0.0',
-        help='Host address (default: 0.0.0.0)'
     )
     parser.add_argument(
         '--port',
         type=int,
-        default=7865,
-        help='Port number (default: 7865)'
     )
     parser.add_argument(
         '--mode',
         choices=['stdio', 'sse'],
-        default='stdio',
-        help='Run mode: stdio for standard I/O, sse for Server-Sent Events (default: stdio)'
     )
     parser.add_argument(
         '--debug',
@@ -119,6 +136,7 @@ def main():
     if args.mode == 'sse':
         logger.info(f"Host: {args.host}")
         logger.info(f"Port: {args.port}")
     logger.info("=" * 60)
     # Import and run MCP service
@@ -130,13 +148,17 @@ def main():
             mcp.run()
         else:
             logger.info(f"Starting MCP server in SSE mode on {args.host}:{args.port}...")
             mcp.run(transport='sse', host=args.host, port=args.port)
     except ImportError as e:
         logger.error(f"Failed to import MCP service: {e}")
         sys.exit(1)
     except Exception as e:
         logger.error(f"Error starting MCP server: {e}")
         sys.exit(1)

 Arguments:
     --host      Host address (default: 0.0.0.0)
+    --port      Port number (default: 7860)
+    --mode      Run mode: 'stdio' or 'sse' (default: sse for HuggingFace Space)
+Environment Variables:
+    LLM_API_KEY / OPENAI_API_KEY     - API key for LLM service
+    LLM_API_BASE / OPENAI_API_BASE   - Custom API base URL (for third-party services)
+    LLM_MODEL / OPENAI_MODEL         - Model name (default: gpt-4-turbo-preview)
+    MCP_HOST                          - Server host (default: 0.0.0.0)
+    MCP_PORT                          - Server port (default: 7860)
 """
 import os
     """Check if required environment variables are set."""
     warnings = []
+    # Check for API key (support both naming conventions)
+    api_key = os.environ.get('LLM_API_KEY') or os.environ.get('OPENAI_API_KEY')
+    if not api_key:
         warnings.append(
+            "LLM_API_KEY/OPENAI_API_KEY not set. GPT extraction features will not work. "
+            "Set it in HuggingFace Space secrets or environment variables."
         )
+    # Check for API base (for third-party services)
+    api_base = os.environ.get('LLM_API_BASE') or os.environ.get('OPENAI_API_BASE')
+    if api_base:
+        logger.info(f"Using custom API base: {api_base}")
     return warnings
     missing = []
     required = [
+        ('mcp', 'mcp[cli]'),
         ('openai', 'openai'),
         ('bs4', 'beautifulsoup4'),
         ('pandas', 'pandas'),
 def main():
     """Main entry point."""
+    # Get default values from environment variables
+    default_host = os.environ.get('MCP_HOST', '0.0.0.0')
+    default_port = int(os.environ.get('MCP_PORT', '7860'))
     parser = argparse.ArgumentParser(
         description="MaTableGPT MCP Server - Table Data Extraction from Materials Science Literature"
     )
     parser.add_argument(
         '--host',
+        default=default_host,
+        help=f'Host address (default: {default_host})'
     )
     parser.add_argument(
         '--port',
         type=int,
+        default=default_port,
+        help=f'Port number (default: {default_port})'
     )
     parser.add_argument(
         '--mode',
         choices=['stdio', 'sse'],
+        default='sse',
+        help='Run mode: stdio for standard I/O, sse for Server-Sent Events (default: sse)'
     )
     parser.add_argument(
         '--debug',
     if args.mode == 'sse':
         logger.info(f"Host: {args.host}")
         logger.info(f"Port: {args.port}")
+        logger.info(f"SSE Endpoint: http://{args.host}:{args.port}/sse")
     logger.info("=" * 60)
     # Import and run MCP service
             mcp.run()
         else:
             logger.info(f"Starting MCP server in SSE mode on {args.host}:{args.port}...")
+            logger.info("MCP SSE service is ready to accept connections!")
             mcp.run(transport='sse', host=args.host, port=args.port)
     except ImportError as e:
         logger.error(f"Failed to import MCP service: {e}")
+        logger.error("Make sure mcp_service.py is in the same directory")
         sys.exit(1)
     except Exception as e:
         logger.error(f"Error starting MCP server: {e}")
+        import traceback
+        traceback.print_exc()
         sys.exit(1)
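
The precedence introduced in `main()` is: command-line flag first, then the `MCP_HOST` / `MCP_PORT` environment variable, then the built-in default. The sketch below isolates that pattern so it can be exercised without starting a server. `parse_server_args` is a hypothetical helper, and it takes the environment as an explicit mapping (an assumption made for testability; the commit reads `os.environ` directly).

```python
import argparse
from typing import Mapping, Sequence


def parse_server_args(env: Mapping[str, str], argv: Sequence[str]) -> argparse.Namespace:
    """Parse server options, using environment values as argparse defaults."""
    parser = argparse.ArgumentParser()
    # Environment values become defaults, so explicit flags still override them.
    parser.add_argument("--host", default=env.get("MCP_HOST", "0.0.0.0"))
    parser.add_argument("--port", type=int, default=int(env.get("MCP_PORT", "7860")))
    parser.add_argument("--mode", choices=["stdio", "sse"], default="sse")
    return parser.parse_args(list(argv))


# No environment, no flags: built-in defaults apply.
print(parse_server_args({}, []).port)
# The flag wins over the environment variable.
print(parse_server_args({"MCP_PORT": "8080"}, ["--port", "9000"]).port)
```

One consequence of this design is that a non-numeric `MCP_PORT` raises `ValueError` while the parser is being built, before any flag is read.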