| --- |
| title: Multi-LLM API Gateway |
| emoji: π‘οΈ |
| colorFrom: indigo |
| colorTo: red |
| sdk: docker |
| pinned: true |
| license: apache-2.0 |
| short_description: Secure Multi-LLM Gateway β (Streamable HTTP / SSE) |
| --- |
| |
| # Multi-LLM API Gateway |
|
|
| β or Universal MCP Hub (Sandboxed) |
| β or secure AI wrapper with dual interface: REST + MCP |
|
|
| aka: a clean, secure starting point for your own projects. |
| Pick the description that fits your use case. They're all correct. |
|
|
| > A production-grade **the-thing** that actually thinks about security. |
| > Built on [PyFundaments](PyFundaments.md) β running on **simpleCity**. |
|
|
| ``` |
| No key β no tool β no crash β no exposed secrets |
| ``` |
|
|
| > [!WARNING] |
| > Most MCP servers are prompts dressed up as servers. This one has a real architecture. |
|
|
| --- |
|
|
| > [!IMPORTANT] |
| > This project is under active development β always use the latest release from [Codey Lab](https://github.com/Codey-LAB/Multi-LLM-API-Gateway) *(more stable builds land here first)*. |
| > This repo ([DEV](https://github.com/VolkanSah/Multi-LLM-API-Gateway)) is where the chaos happens. π¬ A β on the repos will be cool π |
|
|
| --- |
|
|
| ## Why this exists |
|
|
| The AI ecosystem is full of servers with hardcoded keys, `os.environ` scattered everywhere, zero sandboxing. One misconfigured fork and your API keys are gone. |
|
|
| This is exactly the kind of negligence (and worse β outright fraud) that [Wall of Shames](https://github.com/Wall-of-Shames) documents: fake "AI tools" exploiting non-technical users β API wrappers dressed up as custom models, Telegram payment funnels, bought stars. If you build on open source, you should know this exists. |
|
|
| This hub is the antidote: |
|
|
| - **Structural sandboxing** β `app/*` can never touch `fundaments/` or `.env`. Not by convention. By design. |
| - **Guardian pattern** β `main.py` is the only process that reads secrets. It injects validated services as a dict. `app/*` never sees the raw environment. |
| - **Graceful degradation** β No key? Tool doesn't register. Server still starts. No crash, no error, no empty `None` floating around. |
| - **Single source of truth** β All tool/provider/model config lives in `app/.pyfun`. Adding a provider = edit one file. No code changes. |
|
|
| --- |
|
|
| ## Two Interfaces β One Server |
|
|
| This hub exposes **two completely independent interfaces** on the same hypercorn instance: |
|
|
| ``` |
| POST /api β REST interface β for custom clients, desktop apps, CMS plugins |
| GET+POST /mcp β MCP interface β for Claude Desktop, Cursor, Windsurf, any MCP client |
| GET / β Health check β uptime, status |
| ``` |
|
|
| They share the same tool registry, provider config, and fallback chain. Adding a tool once makes it available on both interfaces automatically. |
|
|
| ### REST API (`/api`) |
|
|
| Simple JSON POST β no protocol overhead, works with any HTTP client: |
|
|
| ```json |
| POST /api |
| {"tool": "llm_complete", "params": {"prompt": "Hello", "provider": "anthropic"}} |
| ``` |
|
|
| Used by: Desktop Client (`DESKTOP_CLIENT/hub.py`), WordPress plugin, any custom integration. |
|
|
| ### MCP Interface (`/mcp`) |
|
|
| Full MCP protocol β tool discovery, structured calls, streaming responses. |
|
|
| **Primary transport: Streamable HTTP** (MCP spec 2025-11-25) |
| **Fallback transport: SSE** (legacy, configurable via `.pyfun`) |
|
|
| Configured via `HUB_TRANSPORT` in `app/.pyfun [HUB]`: |
|
|
| ```ini |
| HUB_TRANSPORT = "streamable-http" # default β MCP spec 2025-11-25 |
| # HUB_TRANSPORT = "sse" # legacy fallback for older clients |
| ``` |
|
|
| Used by: Claude Desktop, Cursor, Windsurf, any MCP-compatible client. |
|
|
| --- |
|
|
| ## Architecture |
|
|
| ``` |
| main.py (Guardian) |
| β |
| β reads .env / HF Secrets |
| β initializes fundaments/* conditionally |
| β injects validated services as dict |
| β |
| ββββΊ app/app.py (Orchestrator, sandboxed) |
| β |
| β unpacks fundaments ONCE, at startup, never stores globally |
| β starts hypercorn (async ASGI) |
| β routes: GET / | POST /api | /mcp (transport-dependent) |
| β |
| βββ app/mcp.py β FastMCP + transport handler (Streamable HTTP / SSE) |
| βββ app/tools.py β Tool registry (key-gated) |
| βββ app/providers.py β LLM + Search execution + fallback chain |
| βββ app/models.py β Model limits, costs, capabilities |
| βββ app/config.py β .pyfun parser (single source of truth) |
| βββ app/db_sync.py β Internal SQLite IPC (app/* state only) |
| β fundaments/postgresql.py (Guardian-only) |
| ``` |
|
|
| **The sandbox is structural:** |
|
|
| ```python |
| # app/app.py β fundaments unpacked ONCE, NEVER stored globally |
| async def start_application(fundaments: Dict[str, Any]) -> None: |
| config_service = fundaments["config"] |
| db_service = fundaments["db"] # None if not configured |
| encryption_service = fundaments["encryption"] # None if keys missing |
| access_control_service = fundaments["access_control"] |
| ... |
| # From here: app/* reads its own config from app/.pyfun only. |
| # fundaments are never passed into other app/* modules. |
| ``` |
|
|
| `app/app.py` never calls `os.environ`. Never imports from `fundaments/`. Never reads `.env`. |
| This isn't documentation. It's enforced by the import structure. |
|
|
| ### Why Quart + hypercorn? |
|
|
| **Quart** is async Flask β fully `async/await` native. FastMCP's handlers are async; mixing sync Flask would require thread hacks. With Quart, `/mcp` hands off directly to FastMCP β no bridging, no blocking. |
|
|
| **hypercorn** is an ASGI server (vs. waitress/gunicorn which are WSGI). WSGI servers handle one request per thread β wrong for long-lived MCP connections. hypercorn handles both Streamable HTTP and SSE natively, and runs without extra config on HuggingFace Spaces. HTTP/2 support (`config.h2 = True`) is built-in β relevant for Streamable HTTP performance at scale. |
|
|
| The `/mcp` route in `app.py` remains the natural interception point regardless of transport β auth checks, rate limiting, and logging can all be added there before the request reaches FastMCP. |
|
|
| --- |
|
|
| ## Two Databases β One Architecture |
|
|
|
|
| ``` |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ |
| β Guardian Layer (fundaments/*) β |
| β β |
| β postgresql.py β Cloud DB (e.g. Neon, Supabase) β |
| β asyncpg pool, SSL enforced β |
| β β |
| β user_handler.py β SQLite (users + sessions tables) β |
| β PBKDF2-SHA256 password hashing β |
| β Session validation incl. IP + UserAgent β |
| β Account lockout after 5 failed attempts β |
| β β |
| ββββββββββββββββββββββββ¬βββββββββββββββββββββββββββββββββββββββ |
| β inject as fundaments dict |
| βΌ |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ |
| β App Layer (app/*) β |
| β β |
| β db_sync.py β SQLite (hub_state + tool_cache tables) β |
| β aiosqlite (async, non-blocking) β |
| β NEVER touches users/sessions tables β |
| β Relocated to /tmp/ on HF Spaces auto β |
| β β |
| βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ |
| ``` |
|
|
| **Table ownership β hard rule:** |
|
|
| | Table | Owner | Access | |
| | :--- | :--- | :--- | |
| | `users` | `fundaments/user_handler.py` | Guardian only | |
| | `sessions` | `fundaments/user_handler.py` | Guardian only | |
| | `hub_state` | `app/db_sync.py` | app/* only | |
| | `tool_cache` | `app/db_sync.py` | app/* only | |
| | `hub_results` | PostgreSQL / Guardian | via `persist_result` tool | |
|
|
| --- |
|
|
| ## Tools |
|
|
| Tools register at startup β only if the required API key exists. No key, no tool. Server always starts. |
|
|
| | ENV Secret | Tool | Notes | |
| | :--- | :--- | :--- | |
| | `ANTHROPIC_API_KEY` | `llm_complete` | Claude Haiku / Sonnet / Opus | |
| | `GEMINI_API_KEY` | `llm_complete` | Gemini 2.0 / 2.5 / 3.x Flash & Pro | |
| | `OPENROUTER_API_KEY` | `llm_complete` | 100+ models via OpenRouter | |
| | `HF_TOKEN` | `llm_complete` | HuggingFace Inference API | |
| | `BRAVE_API_KEY` | `web_search` | Independent web index | |
| | `TAVILY_API_KEY` | `web_search` | AI-optimized search with synthesized answers | |
| | `DATABASE_URL` | `cloud DB` | e.g. Neon, Supabase | |
| | `DATABASE_URL` | `db_query`, `persist_result` | SQLite read + PostgreSQL write | |
| | *(always)* | `list_active_tools` | Shows key names only β never values | |
| | *(always)* | `health_check` | Status + uptime + active transport | |
| | *(always)* | `get_model_info` | Limits, costs, capabilities per model | |
|
|
| For all key names see [`app/.pyfun`](app/.pyfun). |
|
|
| **Tools are configured in `.pyfun` β including system prompts:** |
|
|
| ```ini |
| [TOOL.code_review] |
| active = "true" |
| description = "Review code for bugs, security issues and improvements" |
| provider_type = "llm" |
| default_provider = "anthropic" |
| timeout_sec = "60" |
| system_prompt = "You are an expert code reviewer. Analyze the given code for bugs, security issues, and improvements. Be specific and concise." |
| [TOOL.code_review_END] |
| ``` |
|
|
| Current built-in tools: `llm_complete`, `code_review`, `summarize`, `translate`, `web_search`, `db_query` |
| Future hooks (commented, ready): `image_gen`, `code_exec`, `shellmaster_2.0`, Discord, GitHub webhooks |
|
|
| --- |
|
|
| ## LLM Fallback Chain |
|
|
| All LLM providers share one `llm_complete` tool. If a provider fails, the hub walks the fallback chain from `.pyfun`: |
|
|
| ``` |
| e.g. anthropic β gemini β openrouter β huggingface |
| ``` |
|
|
| ```ini |
| [LLM_PROVIDER.anthropic] |
| fallback_to = "gemini" |
| [LLM_PROVIDER.anthropic_END] |
| |
| [LLM_PROVIDER.gemini] |
| fallback_to = "openrouter" |
| [LLM_PROVIDER.gemini_END] |
| ``` |
|
|
| Same pattern applies to search providers (`brave β tavily`). |
|
|
| --- |
|
|
| ## Quick Start |
|
|
| ### HuggingFace Spaces (recommended) |
|
|
| 1. Fork / duplicate this Space |
| 2. Go to **Settings β Variables and secrets** |
| 3. Add the API keys you have (any subset works) |
| 4. Space starts automatically β only tools with valid keys register |
|
|
| [β Live Demo Space](https://huggingface.co/spaces/codey-lab/Multi-LLM-API-Gateway) (no LLM keys set) |
|
|
| ### Local / Docker |
|
|
| ```bash |
| git clone https://github.com/VolkanSah/Multi-LLM-API-Gateway |
| cd Multi-LLM-API-Gateway |
| cp example-mcp___.env .env |
| # fill in your keys |
| pip install -r requirements.txt |
| python main.py |
| ``` |
|
|
| Minimum required ENV vars (everything else is optional): |
|
|
| ```env |
| PYFUNDAMENTS_DEBUG="" |
| LOG_LEVEL="INFO" |
| LOG_TO_TMP="" |
| ENABLE_PUBLIC_LOGS="true" |
| HF_TOKEN="" |
| HUB_SPACE_URL="" |
| ``` |
|
|
| Transport is configured in `app/.pyfun [HUB]` β not via ENV. |
|
|
| --- |
|
|
| ## Connect an MCP Client |
|
|
| ### Streamable HTTP (default β MCP spec 2025-11-25) |
|
|
| ```json |
| { |
| "mcpServers": { |
| "universal-mcp-hub": { |
| "url": "https://YOUR_USERNAME-universal-mcp-hub.hf.space/mcp" |
| } |
| } |
| } |
| ``` |
|
|
| ### Streamable HTTP β Private Space (with HF token) |
|
|
| ```json |
| { |
| "mcpServers": { |
| "universal-mcp-hub": { |
| "url": "https://YOUR_USERNAME-universal-mcp-hub.hf.space/mcp", |
| "headers": { |
| "Authorization": "Bearer hf_..." |
| } |
| } |
| } |
| } |
| ``` |
|
|
| ### SSE legacy fallback (set `HUB_TRANSPORT = "sse"` in `.pyfun`) |
| |
| ```json |
| { |
| "mcpServers": { |
| "universal-mcp-hub": { |
| "url": "https://YOUR_USERNAME-universal-mcp-hub.hf.space/mcp" |
| } |
| } |
| } |
| ``` |
| |
| > Same URL (`/mcp`) for both transports β the protocol is negotiated automatically. |
| > SSE fallback is for older clients that don't support Streamable HTTP yet. |
|
|
| --- |
|
|
| ## Desktop Client |
| ###### (experimental β ~80% AI generated) |
|
|
| A full PySide6 desktop client is included in `DESKTOP_CLIENT/hub.py`. |
| Communicates via the REST `/api` endpoint β no MCP protocol overhead. |
| Ideal for private or non-public Spaces. |
|
|
| ```bash |
| pip install PySide6 httpx |
| # optional file handling: |
| pip install Pillow PyPDF2 pandas openpyxl |
| python DESKTOP_CLIENT/hub.py |
| ``` |
|
|
| **Features:** |
| - Multi-chat with persistent history |
| - Tool / Provider / Model selector loaded live from your Hub |
| - File attachments: images, PDF, CSV, Excel, ZIP, source code |
| - Connect tab with health check + auto-load |
| - Settings: HF Token + Hub URL saved locally, never sent anywhere except your own Hub |
| - Full request/response log with timestamps |
| - Runs on Windows, Linux, macOS |
|
|
| [β Desktop Client docs](DESKTOP_CLIENT/README.md) |
|
|
| --- |
|
|
| ## CMS & Custom Clients |
|
|
| | Client | Interface used | Notes | |
| | :--- | :--- | :--- | |
| | [Desktop Client](DESKTOP_CLIENT/hub.py) | REST `/api` | PySide6, local | |
| | [WP AI Hub](https://github.com/VolkanSah/WP-AI-HUB/) | REST `/api` | WordPress plugin | |
| | TYPO3 (soon) | REST `/api` | β | |
| | Claude Desktop | MCP `/mcp` | Streamable HTTP | |
| | Cursor / Windsurf | MCP `/mcp` | Streamable HTTP | |
|
|
| --- |
|
|
| ## Configuration (.pyfun) |
|
|
| `app/.pyfun` is the single source of truth for all app behavior. Three tiers: |
|
|
| ``` |
| LAZY: [HUB] + one [LLM_PROVIDER.*] β works |
| NORMAL: + [SEARCH_PROVIDER.*] + [MODELS.*] β works better |
| PRODUCTIVE: + [TOOLS] + [HUB_LIMITS] + [DB_SYNC] β full power |
| ``` |
|
|
| Key settings in `[HUB]`: |
|
|
| ```ini |
| [HUB] |
| HUB_TRANSPORT = "streamable-http" # streamable-http | sse |
| HUB_STATELESS = "true" # true = HF Spaces safe, no session state |
| HUB_PORT = "7860" |
| [HUB_END] |
| ``` |
|
|
| Adding a new LLM provider β two steps: |
|
|
| ```ini |
| # 1. app/.pyfun |
| [LLM_PROVIDER.mistral] |
| active = "true" |
| base_url = "https://api.mistral.ai/v1" |
| env_key = "MISTRAL_API_KEY" |
| default_model = "mistral-large-latest" |
| models = "mistral-large-latest, mistral-small-latest" |
| fallback_to = "" |
| [LLM_PROVIDER.mistral_END] |
| ``` |
|
|
| ```python |
| # 2. app/providers.py β uncomment the dummy |
| _PROVIDER_CLASSES = { |
| ... |
| "mistral": MistralProvider, # β uncomment to activate |
| } |
| ``` |
|
|
| --- |
|
|
| ## Dependencies |
|
|
| ``` |
| # PyFundaments Core (always required) |
| asyncpg β async PostgreSQL pool (Guardian/cloud DB) |
| python-dotenv β .env loading |
| passlib β PBKDF2 password hashing in user_handler.py |
| cryptography β encryption layer in fundaments/ |
| |
| # MCP Hub |
| mcp β MCP protocol + FastMCP (Streamable HTTP + SSE) |
| httpx β async HTTP for all provider API calls |
| quart β async Flask (ASGI) β needed for MCP + hypercorn |
| hypercorn β ASGI server β Streamable HTTP + SSE, HF Spaces native |
| requests β sync HTTP for tool workers |
| |
| # Optional (uncomment in requirements.txt as needed) |
| # aiofiles β async file ops (ML pipelines, file uploads) |
| # discord.py β Discord bot integration (planned) |
| # PyNaCl β Discord signature verification |
| # psycopg2-binary β alternative PostgreSQL driver |
| ``` |
|
|
| > **Note:** The package is `mcp` (not `fastmcp`) β `FastMCP` is imported from `mcp.server.fastmcp`. |
| > Streamable HTTP support requires `mcp >= 1.6.0`. |
|
|
| --- |
|
|
| ## Security Design |
|
|
| - API keys live in HF Secrets / `.env` β never in `.pyfun`, never in code |
| - `list_active_tools` returns key **names** only β never values |
| - `db_query` is SELECT-only, enforced at application level (not just docs) |
| - `app/*` has zero import access to `fundaments/` internals |
| - Direct execution of `app/app.py` blocked by design β warning + null-fundaments fallback |
| - `fundaments/` initialized conditionally β missing services degrade gracefully, never crash |
| - Streamable HTTP uses standard Bearer headers β no token-in-URL (unlike SSE) |
|
|
| > PyFundaments is not perfect. But it's more secure than most of what runs in production today. |
|
|
| [β Full Security Policy](SECURITY.md) |
|
|
| --- |
|
|
| ## Foundation |
|
|
| Built on [PyFundaments](PyFundaments.md) β a security-first Python boilerplate: |
|
|
| - `config_handler.py` β env loading with validation |
| - `postgresql.py` β async DB pool (Guardian-only) |
| - `encryption.py` β key-based encryption layer |
| - `access_control.py` β role/permission management |
| - `user_handler.py` β user lifecycle management |
| - `security.py` β unified security manager composing the above |
|
|
| None accessible from `app/*`. Injected as a validated dict by `main.py`. |
|
|
| [β PyFundaments Function Overview](PyFundaments%20β%20Function%20Overview.md) |
| [β Module Docs](docs/app/) |
| [β Source Repo](https://github.com/VolkanSah/Multi-LLM-API-Gateway) |
|
|
| --- |
|
|
| ## Related Projects |
|
|
| - [Customs LLMs for free β Build Your Own LLM Service](https://github.com/VolkanSah/SmolLM2-customs/) |
| - [WP AI Hub (WordPress Client)](https://github.com/VolkanSah/WP-AI-HUB/) |
| - [ShellMaster (2023 precursor)](https://github.com/VolkanSah/ChatGPT-ShellMaster) |
|
|
| --- |
|
|
| ## History |
|
|
| [ShellMaster](https://github.com/VolkanSah/ChatGPT-ShellMaster) (2023, MIT) was the precursor β browser-accessible shell for ChatGPT with session memory, built before MCP was a concept. Universal MCP Hub is its natural evolution: same idea, proper architecture, dual interface. |
|
|
| --- |
|
|
| ## License |
|
|
| Dual-licensed: |
|
|
| - [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0) |
| - [Ethical Security Operations License v1.1 (ESOL)](ESOL) β mandatory, non-severable |
|
|
| By using this software you agree to all ethical constraints defined in ESOL v1.1. |
|
|
| --- |
|
|
| *Architecture, security decisions, and PyFundaments by Volkan KΓΌcΓΌkbudak.* |
| *Built with Claude (Anthropic) as a typing assistant for docs (and the occasional bug).* |
|
|
| > crafted with passion β just wanted to understand how it works, don't actually need it, have a CLI π |