cacodex committed on
Commit a86303a · verified · 1 Parent(s): 78d28fd

Upload 12 files

Files changed (3):
  1. README.md +61 -70
  2. app/main.py +344 -0
  3. static/style.css +3 -4
README.md CHANGED
@@ -1,100 +1,91 @@
- ---
- title: NVIDIA NIM Response Gateway
  sdk: docker
  app_port: 7860
  pinned: false
  ---

- # NVIDIA NIM Response Gateway
-
- This is a public-facing gateway that adapts NVIDIA NIM to the OpenAI-compatible `/v1/responses` API.
-
- It does not store any user's NIM API Key locally. Callers carry their own NIM Key in the request headers; the gateway only handles protocol conversion, performance optimization, aggregate statistics, and the official model catalog display.
-
- ## Key Features
-
- - Converts the official NVIDIA `POST /v1/chat/completions` into an OpenAI-style `POST /v1/responses`
- - Supports tool calling / function calling
- - Supports feeding `function_call_output` back in
- - Supports continuing a conversation via `previous_response_id`
- - Uses the caller's own NIM Key for authentication and upstream forwarding on `/v1/responses` and `/v1/responses/{response_id}`
- - `/v1/models` returns the synced result of the official NVIDIA `/v1/models`, keeping the OpenAI-style structure
- - `/` is a white-themed model health page that shows the models in MODEL_LIST as a 10-minute success-rate matrix
- - `/models` is a standalone white-themed official model list page with per-provider filtering
- - Provider cards use a fixed height so they do not grow too tall when a provider has many models
- - Uses a shared HTTP connection pool, SQLite WAL, and threaded async writes to improve forwarding performance under high concurrency

- ## How to Call the Gateway

- For `POST /v1/responses`, pass your own NVIDIA NIM Key in either of the following ways:

- - `Authorization: Bearer <your NIM Key>`
- - `X-API-Key: <your NIM Key>`

- The gateway never persists the raw Key to the database; it is held in memory for the current request only, and a hash of the Key is used to isolate response records.

- ## Official Model Catalog Sync
-
- The project periodically pulls the model list from the official endpoint:

- `https://integrate.api.nvidia.com/v1/models`

- The synced catalog is used by:

- - `GET /v1/models`
- - `GET /models`
- - `GET /api/catalog`

- ## Pages and Endpoints

- Pages:

- - `GET /`: model health page
- - `GET /models`: official model list page

- Frontend data APIs:

  - `GET /api/dashboard`
  - `GET /api/catalog`

- Compatibility endpoints:
-
- - `POST /v1/responses`
- - `GET /v1/responses/{response_id}`
- - `GET /v1/models`
-
- ## Environment Variables

- - `NVIDIA_API_BASE`: defaults to `https://integrate.api.nvidia.com/v1`
- - `MODEL_LIST`: comma-separated list of models monitored on the health page
- - `MODEL_SYNC_INTERVAL_MINUTES`: official catalog sync interval in minutes, defaults to `30`
- - `PUBLIC_HISTORY_BUCKETS`: number of recent 10-minute buckets shown on the health page, defaults to `6`
- - `REQUEST_TIMEOUT_SECONDS`: upstream request timeout, defaults to `90`
- - `MAX_UPSTREAM_CONNECTIONS`: maximum connections in the shared pool, defaults to `512`
- - `MAX_KEEPALIVE_CONNECTIONS`: maximum keep-alive connections in the shared pool, defaults to `128`
- - `DATABASE_PATH`: defaults to `./data.sqlite3`

- ## Local Validation

- Two layers of local validation were completed:

- 1. Mock validation:
- - `scripts/local_smoke_test.py` verified protocol conversion, official model sync, user-Key authentication, `previous_response_id`, tool calls, the health-page data API, the model-page data API, and both standalone page routes.

- 2. Live upstream validation:
- - `scripts/live_e2e_validation.py` used the provided test NIM Key to call the real NVIDIA model catalog and real model responses.
- - Measured result: `live_gateway_ok`, with `z-ai/glm5` successfully returning `OK`.

- ## Deploying to a Hugging Face Space

- 1. Create a Hugging Face Space with the SDK set to `Docker`
- 2. Upload the contents of the `hf_space` directory as the Space root
- 3. Configure `MODEL_LIST` and other environment variables as needed
- 4. Once started, the gateway is publicly usable

- ## References

- - OpenAI Responses API: https://platform.openai.com/docs/guides/responses-vs-chat-completions
- - OpenAI Function Calling: https://platform.openai.com/docs/guides/function-calling
- - NVIDIA Build: https://build.nvidia.com/
- - NVIDIA NIM API docs: https://docs.api.nvidia.com/
+ ---
+ title: NVIDIA NIM Response Gateway
  sdk: docker
  app_port: 7860
  pinned: false
  ---

+ # NVIDIA NIM Response Gateway

+ This is a public-facing gateway that adapts NVIDIA NIM to both the OpenAI Responses API and the Anthropic Claude Messages API.

+ The gateway does not store any user's NIM Key locally. Callers carry their own Key with each request; the Key is held in memory only, and a hash of the Key is used to isolate records.

+ ## Compatible Endpoints

+ OpenAI-compatible:

+ - `POST /v1/responses`
+ - `POST /responses`
+ - `GET /v1/responses/{response_id}`
+ - `GET /responses/{response_id}`
+ - `GET /v1/models`
+ - `GET /models`

+ Anthropic Claude-compatible:

+ - `POST /v1/messages`
+ - `POST /messages`

+ ## Claude Compatibility

+ - Supports the Anthropic `Messages API` fields `messages`, `system`, `max_tokens`, `tools`, and `tool_choice`
+ - Supports Claude-style `tool_use` and `tool_result` blocks
+ - Supports the Anthropic-defined tools used by Claude Code, including:
+   - `bash_20250124`
+   - `text_editor_20250728`
+ - Supports Claude-style SSE streaming events:
+   - `message_start`
+   - `content_block_start`
+   - `content_block_delta`
+   - `content_block_stop`
+   - `message_delta`
+   - `message_stop`

+ ## Pages

+ - `GET /`: model health page
+ - `GET /model_list`: official model list page

+ ## Frontend Data APIs

  - `GET /api/dashboard`
  - `GET /api/catalog`

+ ## Catalog Sync and Performance

+ - Periodically pulls the official NVIDIA model catalog from `https://integrate.api.nvidia.com/v1/models`
+ - The synced catalog backs the model list page and the data APIs
+ - The model list page supports filtering by provider
+ - Provider cards use a fixed height so they do not grow too tall when a provider has many models
+ - Uses a shared HTTP connection pool, SQLite WAL, and threaded async writes to improve forwarding performance under high concurrency

+ ## Environment Variables

+ - `NVIDIA_API_BASE`: defaults to `https://integrate.api.nvidia.com/v1`
+ - `MODEL_LIST`: comma-separated list of models monitored on the health page
+ - `APP_TIMEZONE`: defaults to `Asia/Shanghai`
+ - `MODEL_SYNC_INTERVAL_MINUTES`: official catalog sync interval in minutes, defaults to `30`
+ - `PUBLIC_HISTORY_BUCKETS`: number of recent 10-minute buckets shown on the health page, defaults to `22`
+ - `REQUEST_TIMEOUT_SECONDS`: upstream request timeout, defaults to `90`
+ - `MAX_UPSTREAM_CONNECTIONS`: maximum connections in the shared pool, defaults to `512`
+ - `MAX_KEEPALIVE_CONNECTIONS`: maximum keep-alive connections in the shared pool, defaults to `128`
+ - `DATABASE_PATH`: defaults to `./data.sqlite3`

+ ## Local Validation

+ Two layers of local validation were completed:

+ 1. Mock validation:
+ - `scripts/local_smoke_test.py` verified OpenAI Responses, Claude Messages, `tool_use`/`tool_result` handling, user-Key authentication, and the page routes.

+ 2. Live upstream validation:
+ - `scripts/live_e2e_validation.py` used a test NIM Key to call the real NVIDIA model catalog and real model responses.
+ - Measured result: `live_gateway_ok`, with `z-ai/glm5` returning `OK`.

+ ## Deploying to a Hugging Face Space

+ 1. Create a Hugging Face Space with the SDK set to `Docker`
+ 2. Upload the contents of the `hf_space` directory as the Space root
+ 3. Configure environment variables as needed
+ 4. Once started, the gateway is publicly usable
app/main.py CHANGED
@@ -619,6 +619,302 @@ def chat_completion_to_response(body: dict[str, Any], upstream_json: dict[str, A
  }


+ def anthropic_text_from_blocks(blocks: list[dict[str, Any]] | str | None) -> str:
+     if blocks is None:
+         return ""
+     if isinstance(blocks, str):
+         return blocks
+     if not isinstance(blocks, list):
+         return json_dumps(blocks)
+     parts: list[str] = []
+     for block in blocks:
+         if not isinstance(block, dict):
+             parts.append(str(block))
+             continue
+         if block.get("type") == "text":
+             text_value = block.get("text")
+             if text_value:
+                 parts.append(str(text_value))
+     return "\n".join(parts).strip()
+
+
+ def anthropic_tool_result_to_text(content: Any) -> str:
+     if isinstance(content, str):
+         return content
+     if isinstance(content, list):
+         text_value = anthropic_text_from_blocks(content)
+         return text_value if text_value else json_dumps(content)
+     if isinstance(content, dict):
+         if content.get("type") == "text":
+             return str(content.get("text", ""))
+         return json_dumps(content)
+     if content is None:
+         return ""
+     return str(content)
+
+
+ def anthropic_system_to_text(system: Any) -> str:
+     if isinstance(system, str):
+         return system
+     if isinstance(system, list):
+         return anthropic_text_from_blocks(system)
+     return ""
+
+
+ def anthropic_defined_tool_to_chat_tool(tool: dict[str, Any]) -> dict[str, Any]:
+     tool_type = str(tool.get("type") or "")
+     name = tool.get("name") or ("bash" if tool_type.startswith("bash_") else "str_replace_based_edit_tool")
+     if tool_type.startswith("bash_"):
+         return {
+             "type": "function",
+             "function": {
+                 "name": name,
+                 "description": "Run shell commands in a persistent bash session. Use command for execution, and restart=true to reset the shell session.",
+                 "parameters": {
+                     "type": "object",
+                     "properties": {
+                         "command": {"type": "string", "description": "The bash command to run."},
+                         "restart": {"type": "boolean", "description": "Set to true to restart the bash session."},
+                     },
+                     "additionalProperties": False,
+                 },
+             },
+         }
+     if tool_type.startswith("text_editor_"):
+         return {
+             "type": "function",
+             "function": {
+                 "name": name,
+                 "description": "View and edit text files. Supported commands are view, str_replace, create, and insert.",
+                 "parameters": {
+                     "type": "object",
+                     "properties": {
+                         "command": {
+                             "type": "string",
+                             "enum": ["view", "str_replace", "create", "insert"],
+                             "description": "The text editor command to execute.",
+                         },
+                         "path": {"type": "string", "description": "Path to the target file or directory."},
+                         "view_range": {
+                             "type": "array",
+                             "items": {"type": "integer"},
+                             "minItems": 2,
+                             "maxItems": 2,
+                             "description": "Optional line range when using view.",
+                         },
+                         "old_str": {"type": "string", "description": "Text to replace when using str_replace."},
+                         "new_str": {"type": "string", "description": "Replacement text when using str_replace."},
+                         "file_text": {"type": "string", "description": "Content to write when using create."},
+                         "insert_line": {"type": "integer", "description": "Line index after which to insert text."},
+                         "insert_text": {"type": "string", "description": "Text to insert when using insert."},
+                     },
+                     "required": ["command", "path"],
+                     "additionalProperties": False,
+                 },
+             },
+         }
+     raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=f"Unsupported Claude tool type: {tool_type}.")
+
+
+ def anthropic_tools_to_chat_tools(tools: list[dict[str, Any]] | None) -> list[dict[str, Any]]:
+     normalized: list[dict[str, Any]] = []
+     for tool in tools or []:
+         if not isinstance(tool, dict):
+             continue
+         tool_type = tool.get("type")
+         if isinstance(tool_type, str) and (tool_type.startswith("bash_") or tool_type.startswith("text_editor_")):
+             normalized.append(anthropic_defined_tool_to_chat_tool(tool))
+             continue
+         name = tool.get("name")
+         if not name:
+             continue
+         normalized.append(
+             {
+                 "type": "function",
+                 "function": {
+                     "name": name,
+                     "description": tool.get("description"),
+                     "parameters": tool.get("input_schema") or {"type": "object", "properties": {}},
+                 },
+             }
+         )
+     return normalized
+
+
+ def anthropic_tool_choice_to_chat(tool_choice: dict[str, Any] | None) -> Any:
+     if not tool_choice:
+         return None
+     choice_type = tool_choice.get("type")
+     if choice_type == "auto":
+         return "auto"
+     if choice_type == "any":
+         return "required"
+     if choice_type == "none":
+         return "none"
+     if choice_type == "tool":
+         return {"type": "function", "function": {"name": tool_choice.get("name")}}
+     return None
+
+
+ def anthropic_messages_to_chat_messages(body: dict[str, Any]) -> list[dict[str, Any]]:
+     messages: list[dict[str, Any]] = []
+     system_text = anthropic_system_to_text(body.get("system"))
+     if system_text:
+         messages.append({"role": "system", "content": system_text})
+
+     for message in body.get("messages") or []:
+         role = message.get("role", "user")
+         content = message.get("content")
+         if isinstance(content, str):
+             messages.append({"role": role, "content": content})
+             continue
+         if not isinstance(content, list):
+             continue
+
+         text_parts: list[str] = []
+         tool_calls: list[dict[str, Any]] = []
+         tool_results: list[dict[str, Any]] = []
+         for block in content:
+             if not isinstance(block, dict):
+                 continue
+             block_type = block.get("type")
+             if block_type == "text":
+                 text_value = block.get("text")
+                 if text_value:
+                     text_parts.append(str(text_value))
+             elif block_type == "tool_use" and role == "assistant":
+                 tool_input = block.get("input") if isinstance(block.get("input"), dict) else {}
+                 tool_calls.append(
+                     {
+                         "id": block.get("id") or f"toolu_{uuid.uuid4().hex[:24]}",
+                         "type": "function",
+                         "function": {
+                             "name": block.get("name"),
+                             "arguments": json_dumps(tool_input),
+                         },
+                     }
+                 )
+             elif block_type == "tool_result" and role == "user":
+                 result_text = anthropic_tool_result_to_text(block.get("content"))
+                 if block.get("is_error"):
+                     result_text = f"[tool_error]\n{result_text}"
+                 tool_results.append(
+                     {
+                         "role": "tool",
+                         "tool_call_id": block.get("tool_use_id"),
+                         "content": result_text,
+                     }
+                 )
+         if role == "assistant":
+             if tool_calls:
+                 messages.append({"role": "assistant", "content": "\n".join(text_parts), "tool_calls": tool_calls})
+             elif text_parts:
+                 messages.append({"role": "assistant", "content": "\n".join(text_parts)})
+         elif role == "user":
+             messages.extend(tool_results)
+             if text_parts:
+                 messages.append({"role": "user", "content": "\n".join(text_parts)})
+     return messages
+
+
+ def build_chat_payload_from_anthropic(body: dict[str, Any]) -> dict[str, Any]:
+     payload: dict[str, Any] = {
+         "model": body.get("model"),
+         "messages": anthropic_messages_to_chat_messages(body),
+         "max_tokens": body.get("max_tokens"),
+     }
+     if body.get("temperature") is not None:
+         payload["temperature"] = body.get("temperature")
+     if body.get("top_p") is not None:
+         payload["top_p"] = body.get("top_p")
+     if body.get("stop_sequences"):
+         payload["stop"] = body.get("stop_sequences")
+     tools = anthropic_tools_to_chat_tools(body.get("tools"))
+     if tools:
+         payload["tools"] = tools
+     tool_choice = anthropic_tool_choice_to_chat(body.get("tool_choice"))
+     if tool_choice is not None:
+         payload["tool_choice"] = tool_choice
+     return payload
+
+
+ def anthropic_stop_reason(finish_reason: str | None, has_tool_use: bool) -> str:
+     if has_tool_use or finish_reason in {"tool_calls", "tool_call"}:
+         return "tool_use"
+     if finish_reason == "length":
+         return "max_tokens"
+     if finish_reason == "stop_sequence":
+         return "stop_sequence"
+     return "end_turn"
+
+
+ def chat_completion_to_anthropic_message(body: dict[str, Any], upstream_json: dict[str, Any]) -> dict[str, Any]:
+     upstream_message, finish_reason = extract_upstream_message(upstream_json)
+     assistant_text, tool_calls = extract_text_and_tool_calls(upstream_message)
+     content_blocks: list[dict[str, Any]] = []
+     if assistant_text:
+         content_blocks.append({"type": "text", "text": assistant_text})
+     for tool_call in tool_calls:
+         arguments = tool_call.get("arguments") or "{}"
+         try:
+             parsed_input = json.loads(arguments)
+         except Exception:
+             parsed_input = {"raw": arguments}
+         content_blocks.append(
+             {
+                 "type": "tool_use",
+                 "id": tool_call["id"],
+                 "name": tool_call.get("name"),
+                 "input": parsed_input,
+             }
+         )
+     usage = upstream_json.get("usage") or {}
+     return {
+         "id": f"msg_{uuid.uuid4().hex}",
+         "type": "message",
+         "role": "assistant",
+         "model": body.get("model"),
+         "content": content_blocks,
+         "stop_reason": anthropic_stop_reason(finish_reason, bool(tool_calls)),
+         "stop_sequence": None,
+         "usage": {
+             "input_tokens": usage.get("prompt_tokens"),
+             "output_tokens": usage.get("completion_tokens"),
+         },
+     }
+
+
+ def anthropic_message_start_payload(message: dict[str, Any]) -> dict[str, Any]:
+     usage = message.get("usage") or {}
+     return {
+         "type": "message_start",
+         "message": {
+             "id": message.get("id"),
+             "type": "message",
+             "role": "assistant",
+             "model": message.get("model"),
+             "content": [],
+             "stop_reason": None,
+             "stop_sequence": None,
+             "usage": {
+                 "input_tokens": usage.get("input_tokens"),
+                 "output_tokens": 0,
+             },
+         },
+     }
+
+
+ def anthropic_message_delta_payload(message: dict[str, Any]) -> dict[str, Any]:
+     return {
+         "type": "message_delta",
+         "delta": {
+             "stop_reason": message.get("stop_reason"),
+             "stop_sequence": message.get("stop_sequence"),
+         },
+         "usage": message.get("usage") or {},
+     }
+
+
  def store_success_record(api_key_hash: str, model_id: str, request_body: dict[str, Any], input_items: list[dict[str, Any]], response_payload: dict[str, Any], latency_ms: float) -> None:
      conn = get_db_connection()
      try:
@@ -947,6 +1243,54 @@ async def get_response(response_id: str, api_key: str = Depends(extract_user_api
      return await fetch_response_record(response_id, api_key)


+ @app.post("/v1/messages")
+ async def create_messages_v1(request: Request, api_key: str = Depends(extract_user_api_key)):
+     return await create_messages_impl(request, api_key)
+
+
+ @app.post("/messages")
+ async def create_messages(request: Request, api_key: str = Depends(extract_user_api_key)):
+     return await create_messages_impl(request, api_key)
+
+
+ async def create_messages_impl(request: Request, api_key: str):
+     body = await request.json()
+     if not isinstance(body, dict):
+         raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Request body must be a JSON object.")
+     if not body.get("model"):
+         raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Missing model field.")
+     if body.get("max_tokens") is None:
+         raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Missing max_tokens field.")
+     if not body.get("messages"):
+         raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Missing messages field.")
+
+     chat_payload = build_chat_payload_from_anthropic(body)
+     try:
+         upstream_json, _latency_ms = await post_nvidia_chat_completion(api_key, chat_payload)
+     except HTTPException as exc:
+         await run_db(store_failure_metric, body.get("model"), exc.detail)
+         raise exc
+
+     anthropic_message = chat_completion_to_anthropic_message(body, upstream_json)
+
+     if body.get("stream"):
+         async def event_stream() -> Any:
+             yield f"event: message_start\ndata: {json_dumps(anthropic_message_start_payload(anthropic_message))}\n\n"
+             for index, block in enumerate(anthropic_message.get("content") or []):
+                 yield f"event: content_block_start\ndata: {json_dumps({'type': 'content_block_start', 'index': index, 'content_block': block})}\n\n"
+                 if block.get("type") == "text":
+                     yield f"event: content_block_delta\ndata: {json_dumps({'type': 'content_block_delta', 'index': index, 'delta': {'type': 'text_delta', 'text': block.get('text', '')}})}\n\n"
+                 elif block.get("type") == "tool_use":
+                     partial_json = json_dumps(block.get("input") or {})
+                     yield f"event: content_block_delta\ndata: {json_dumps({'type': 'content_block_delta', 'index': index, 'delta': {'type': 'input_json_delta', 'partial_json': partial_json}})}\n\n"
+                 yield f"event: content_block_stop\ndata: {json_dumps({'type': 'content_block_stop', 'index': index})}\n\n"
+             yield f"event: message_delta\ndata: {json_dumps(anthropic_message_delta_payload(anthropic_message))}\n\n"
+             yield "event: message_stop\ndata: {\"type\":\"message_stop\"}\n\n"
+         return StreamingResponse(event_stream(), media_type="text/event-stream")
+
+     return anthropic_message
+
+
  @app.post("/v1/responses")
  async def create_response_v1(request: Request, api_key: str = Depends(extract_user_api_key)):
      return await create_response_impl(request, api_key)
static/style.css CHANGED
@@ -468,15 +468,14 @@ body {
  .provider-model-chip {
    display: flex;
    align-items: center;
-   min-height: 46px;
-   height: 46px;
-   padding: 0 14px;
+   min-height: 48px;
+   padding: 11px 14px;
    border-radius: 16px;
    background: var(--surface-soft);
    border: 1px solid var(--line);
    font-family: var(--font-display);
    font-size: 13px;
-   line-height: 1;
+   line-height: 1.35;
    color: var(--text);
    white-space: nowrap;
    overflow: hidden;