cacodex committed on
Commit a86303a · verified · 1 Parent(s): 78d28fd

Upload 12 files

Files changed (3):
  1. README.md +61 -70
  2. app/main.py +344 -0
  3. static/style.css +3 -4
README.md CHANGED
@@ -1,100 +1,91 @@
- ---
- title: NVIDIA NIM Response Gateway
  sdk: docker
  app_port: 7860
  pinned: false
  ---

- # NVIDIA NIM Response Gateway
-
- This is a public-facing gateway that adapts NVIDIA NIM to the OpenAI-compatible `/v1/responses` API.
-
- It does not store any user's NIM API Key locally. Callers carry their own NIM Key in the request headers; the gateway only handles protocol conversion, performance optimization, aggregate statistics, and the official model catalog display.
-
- ## Key Features
-
- - Converts the official NVIDIA `POST /v1/chat/completions` into an OpenAI-style `POST /v1/responses`
- - Supports tool calling / function calling
- - Supports feeding `function_call_output` back in
- - Supports continuing a conversation via `previous_response_id`
- - Uses the caller's own NIM Key for authentication and upstream forwarding on `/v1/responses` and `/v1/responses/{response_id}`
- - `/v1/models` returns the synced result of the official NVIDIA `/v1/models`, keeping the OpenAI-style structure
- - `/` is a white-themed model health page that shows the models in MODEL_LIST as a 10-minute success-rate matrix
- - `/models` is a standalone white-themed official model list page with per-provider filtering
- - Provider cards use a fixed height so they do not grow too tall when a provider has many models
- - Uses a shared HTTP connection pool, SQLite WAL, and threaded async writes to improve forwarding performance under high concurrency

- ## How to Call the Gateway

- For `POST /v1/responses`, pass your own NVIDIA NIM Key in either of the following ways:

- - `Authorization: Bearer <your NIM Key>`
- - `X-API-Key: <your NIM Key>`

- The gateway never persists the raw Key to the database; it is held in memory for the current request only, and a hash of the Key is used to isolate response records.

- ## Official Model Catalog Sync
-
- The project periodically pulls the model list from the official endpoint:

- `https://integrate.api.nvidia.com/v1/models`

- The synced catalog is used by:

- - `GET /v1/models`
- - `GET /models`
- - `GET /api/catalog`

- ## Pages and Endpoints

- Pages:

- - `GET /`: model health page
- - `GET /models`: official model list page

- Frontend data APIs:

  - `GET /api/dashboard`
  - `GET /api/catalog`

- Compatibility endpoints:
-
- - `POST /v1/responses`
- - `GET /v1/responses/{response_id}`
- - `GET /v1/models`
-
- ## Environment Variables

- - `NVIDIA_API_BASE`: defaults to `https://integrate.api.nvidia.com/v1`
- - `MODEL_LIST`: comma-separated list of models monitored on the health page
- - `MODEL_SYNC_INTERVAL_MINUTES`: official catalog sync interval in minutes, defaults to `30`
- - `PUBLIC_HISTORY_BUCKETS`: number of recent 10-minute buckets shown on the health page, defaults to `6`
- - `REQUEST_TIMEOUT_SECONDS`: upstream request timeout, defaults to `90`
- - `MAX_UPSTREAM_CONNECTIONS`: maximum connections in the shared pool, defaults to `512`
- - `MAX_KEEPALIVE_CONNECTIONS`: maximum keep-alive connections in the shared pool, defaults to `128`
- - `DATABASE_PATH`: defaults to `./data.sqlite3`

- ## Local Validation

- Two layers of local validation were completed:

- 1. Mock validation:
- - `scripts/local_smoke_test.py` verified protocol conversion, official model sync, user-Key authentication, `previous_response_id`, tool calls, the health-page data API, the model-page data API, and both standalone page routes.

- 2. Live upstream validation:
- - `scripts/live_e2e_validation.py` used the provided test NIM Key to call the real NVIDIA model catalog and real model responses.
- - Measured result: `live_gateway_ok`, with `z-ai/glm5` successfully returning `OK`.

- ## Deploying to a Hugging Face Space

- 1. Create a Hugging Face Space with the SDK set to `Docker`
- 2. Upload the contents of the `hf_space` directory as the Space root
- 3. Configure `MODEL_LIST` and other environment variables as needed
- 4. Once started, the gateway is publicly usable

- ## References

- - OpenAI Responses API: https://platform.openai.com/docs/guides/responses-vs-chat-completions
- - OpenAI Function Calling: https://platform.openai.com/docs/guides/function-calling
- - NVIDIA Build: https://build.nvidia.com/
- - NVIDIA NIM API docs: https://docs.api.nvidia.com/
+ ---
+ title: NVIDIA NIM Response Gateway
  sdk: docker
  app_port: 7860
  pinned: false
  ---

+ # NVIDIA NIM Response Gateway

+ This is a public-facing gateway that adapts NVIDIA NIM to both the OpenAI Responses API and the Anthropic Claude Messages API.

+ The gateway does not store any user's NIM Key locally. Callers carry their own Key with each request; the Key is held in memory only, and a hash of the Key is used to isolate records.

+ ## Compatible Endpoints

+ OpenAI-compatible:

+ - `POST /v1/responses`
+ - `POST /responses`
+ - `GET /v1/responses/{response_id}`
+ - `GET /responses/{response_id}`
+ - `GET /v1/models`
+ - `GET /models`

+ Anthropic Claude-compatible:

+ - `POST /v1/messages`
+ - `POST /messages`

+ ## Claude Compatibility

+ - Supports the Anthropic `Messages API` fields `messages`, `system`, `max_tokens`, `tools`, and `tool_choice`
+ - Supports Claude-style `tool_use` and `tool_result` blocks
+ - Supports the Anthropic-defined tools used by Claude Code, including:
+   - `bash_20250124`
+   - `text_editor_20250728`
+ - Supports Claude-style SSE streaming events:
+   - `message_start`
+   - `content_block_start`
+   - `content_block_delta`
+   - `content_block_stop`
+   - `message_delta`
+   - `message_stop`

+ ## Pages

+ - `GET /`: model health page
+ - `GET /model_list`: official model list page

+ ## Frontend Data APIs

  - `GET /api/dashboard`
  - `GET /api/catalog`

+ ## Catalog Sync and Performance

+ - Periodically pulls the official NVIDIA model catalog from `https://integrate.api.nvidia.com/v1/models`
+ - The synced catalog backs the model list page and the data APIs
+ - The model list page supports filtering by provider
+ - Provider cards use a fixed height so they do not grow too tall when a provider has many models
+ - Uses a shared HTTP connection pool, SQLite WAL, and threaded async writes to improve forwarding performance under high concurrency

+ ## Environment Variables

+ - `NVIDIA_API_BASE`: defaults to `https://integrate.api.nvidia.com/v1`
+ - `MODEL_LIST`: comma-separated list of models monitored on the health page
+ - `APP_TIMEZONE`: defaults to `Asia/Shanghai`
+ - `MODEL_SYNC_INTERVAL_MINUTES`: official catalog sync interval in minutes, defaults to `30`
+ - `PUBLIC_HISTORY_BUCKETS`: number of recent 10-minute buckets shown on the health page, defaults to `22`
+ - `REQUEST_TIMEOUT_SECONDS`: upstream request timeout, defaults to `90`
+ - `MAX_UPSTREAM_CONNECTIONS`: maximum connections in the shared pool, defaults to `512`
+ - `MAX_KEEPALIVE_CONNECTIONS`: maximum keep-alive connections in the shared pool, defaults to `128`
+ - `DATABASE_PATH`: defaults to `./data.sqlite3`

+ ## Local Validation

+ Two layers of local validation were completed:

+ 1. Mock validation:
+ - `scripts/local_smoke_test.py` verified OpenAI Responses, Claude Messages, `tool_use`/`tool_result` handling, user-Key authentication, and the page routes.

+ 2. Live upstream validation:
+ - `scripts/live_e2e_validation.py` used a test NIM Key to call the real NVIDIA model catalog and real model responses.
+ - Measured result: `live_gateway_ok`, with `z-ai/glm5` returning `OK`.

+ ## Deploying to a Hugging Face Space

+ 1. Create a Hugging Face Space with the SDK set to `Docker`
+ 2. Upload the contents of the `hf_space` directory as the Space root
+ 3. Configure environment variables as needed
+ 4. Once started, the gateway is publicly usable
app/main.py CHANGED
@@ -619,6 +619,302 @@ def chat_completion_to_response(body: dict[str, Any], upstream_json: dict[str, A
  }


+ def anthropic_text_from_blocks(blocks: list[dict[str, Any]] | str | None) -> str:
+     if blocks is None:
+         return ""
+     if isinstance(blocks, str):
+         return blocks
+     if not isinstance(blocks, list):
+         return json_dumps(blocks)
+     parts: list[str] = []
+     for block in blocks:
+         if not isinstance(block, dict):
+             parts.append(str(block))
+             continue
+         if block.get("type") == "text":
+             text_value = block.get("text")
+             if text_value:
+                 parts.append(str(text_value))
+     return "\n".join(parts).strip()
+
+
+ def anthropic_tool_result_to_text(content: Any) -> str:
+     if isinstance(content, str):
+         return content
+     if isinstance(content, list):
+         text_value = anthropic_text_from_blocks(content)
+         return text_value if text_value else json_dumps(content)
+     if isinstance(content, dict):
+         if content.get("type") == "text":
+             return str(content.get("text", ""))
+         return json_dumps(content)
+     if content is None:
+         return ""
+     return str(content)
+
+
+ def anthropic_system_to_text(system: Any) -> str:
+     if isinstance(system, str):
+         return system
+     if isinstance(system, list):
+         return anthropic_text_from_blocks(system)
+     return ""
+
+
+ def anthropic_defined_tool_to_chat_tool(tool: dict[str, Any]) -> dict[str, Any]:
+     tool_type = str(tool.get("type") or "")
+     name = tool.get("name") or ("bash" if tool_type.startswith("bash_") else "str_replace_based_edit_tool")
+     if tool_type.startswith("bash_"):
+         return {
+             "type": "function",
+             "function": {
+                 "name": name,
+                 "description": "Run shell commands in a persistent bash session. Use command for execution, and restart=true to reset the shell session.",
+                 "parameters": {
+                     "type": "object",
+                     "properties": {
+                         "command": {"type": "string", "description": "The bash command to run."},
+                         "restart": {"type": "boolean", "description": "Set to true to restart the bash session."},
+                     },
+                     "additionalProperties": False,
+                 },
+             },
+         }
+     if tool_type.startswith("text_editor_"):
+         return {
+             "type": "function",
+             "function": {
+                 "name": name,
+                 "description": "View and edit text files. Supported commands are view, str_replace, create, and insert.",
+                 "parameters": {
+                     "type": "object",
+                     "properties": {
+                         "command": {
+                             "type": "string",
+                             "enum": ["view", "str_replace", "create", "insert"],
+                             "description": "The text editor command to execute.",
+                         },
+                         "path": {"type": "string", "description": "Path to the target file or directory."},
+                         "view_range": {
+                             "type": "array",
+                             "items": {"type": "integer"},
+                             "minItems": 2,
+                             "maxItems": 2,
+                             "description": "Optional line range when using view.",
+                         },
+                         "old_str": {"type": "string", "description": "Text to replace when using str_replace."},
+                         "new_str": {"type": "string", "description": "Replacement text when using str_replace."},
+                         "file_text": {"type": "string", "description": "Content to write when using create."},
+                         "insert_line": {"type": "integer", "description": "Line index after which to insert text."},
+                         "insert_text": {"type": "string", "description": "Text to insert when using insert."},
+                     },
+                     "required": ["command", "path"],
+                     "additionalProperties": False,
+                 },
+             },
+         }
+     raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail=f"Unsupported Claude tool type: {tool_type}.")
+
+
+ def anthropic_tools_to_chat_tools(tools: list[dict[str, Any]] | None) -> list[dict[str, Any]]:
+     normalized: list[dict[str, Any]] = []
+     for tool in tools or []:
+         if not isinstance(tool, dict):
+             continue
+         tool_type = tool.get("type")
+         if isinstance(tool_type, str) and (tool_type.startswith("bash_") or tool_type.startswith("text_editor_")):
+             normalized.append(anthropic_defined_tool_to_chat_tool(tool))
+             continue
+         name = tool.get("name")
+         if not name:
+             continue
+         normalized.append(
+             {
+                 "type": "function",
+                 "function": {
+                     "name": name,
+                     "description": tool.get("description"),
+                     "parameters": tool.get("input_schema") or {"type": "object", "properties": {}},
+                 },
+             }
+         )
+     return normalized
+
+
+ def anthropic_tool_choice_to_chat(tool_choice: dict[str, Any] | None) -> Any:
+     if not tool_choice:
+         return None
+     choice_type = tool_choice.get("type")
+     if choice_type == "auto":
+         return "auto"
+     if choice_type == "any":
+         return "required"
+     if choice_type == "none":
+         return "none"
+     if choice_type == "tool":
+         return {"type": "function", "function": {"name": tool_choice.get("name")}}
+     return None
+
+
+ def anthropic_messages_to_chat_messages(body: dict[str, Any]) -> list[dict[str, Any]]:
+     messages: list[dict[str, Any]] = []
+     system_text = anthropic_system_to_text(body.get("system"))
+     if system_text:
+         messages.append({"role": "system", "content": system_text})
+
+     for message in body.get("messages") or []:
+         role = message.get("role", "user")
+         content = message.get("content")
+         if isinstance(content, str):
+             messages.append({"role": role, "content": content})
+             continue
+         if not isinstance(content, list):
+             continue
+
+         text_parts: list[str] = []
+         tool_calls: list[dict[str, Any]] = []
+         tool_results: list[dict[str, Any]] = []
+         for block in content:
+             if not isinstance(block, dict):
+                 continue
+             block_type = block.get("type")
+             if block_type == "text":
+                 text_value = block.get("text")
+                 if text_value:
+                     text_parts.append(str(text_value))
+             elif block_type == "tool_use" and role == "assistant":
+                 tool_input = block.get("input") if isinstance(block.get("input"), dict) else {}
+                 tool_calls.append(
+                     {
+                         "id": block.get("id") or f"toolu_{uuid.uuid4().hex[:24]}",
+                         "type": "function",
+                         "function": {
+                             "name": block.get("name"),
+                             "arguments": json_dumps(tool_input),
+                         },
+                     }
+                 )
+             elif block_type == "tool_result" and role == "user":
+                 result_text = anthropic_tool_result_to_text(block.get("content"))
+                 if block.get("is_error"):
+                     result_text = f"[tool_error]\n{result_text}"
+                 tool_results.append(
+                     {
+                         "role": "tool",
+                         "tool_call_id": block.get("tool_use_id"),
+                         "content": result_text,
+                     }
+                 )
+         if role == "assistant":
+             if tool_calls:
+                 messages.append({"role": "assistant", "content": "\n".join(text_parts), "tool_calls": tool_calls})
+             elif text_parts:
+                 messages.append({"role": "assistant", "content": "\n".join(text_parts)})
+         elif role == "user":
+             messages.extend(tool_results)
+             if text_parts:
+                 messages.append({"role": "user", "content": "\n".join(text_parts)})
+     return messages
+
+
+ def build_chat_payload_from_anthropic(body: dict[str, Any]) -> dict[str, Any]:
+     payload: dict[str, Any] = {
+         "model": body.get("model"),
+         "messages": anthropic_messages_to_chat_messages(body),
+         "max_tokens": body.get("max_tokens"),
+     }
+     if body.get("temperature") is not None:
+         payload["temperature"] = body.get("temperature")
+     if body.get("top_p") is not None:
+         payload["top_p"] = body.get("top_p")
+     if body.get("stop_sequences"):
+         payload["stop"] = body.get("stop_sequences")
+     tools = anthropic_tools_to_chat_tools(body.get("tools"))
+     if tools:
+         payload["tools"] = tools
+     tool_choice = anthropic_tool_choice_to_chat(body.get("tool_choice"))
+     if tool_choice is not None:
+         payload["tool_choice"] = tool_choice
+     return payload
+
+
+ def anthropic_stop_reason(finish_reason: str | None, has_tool_use: bool) -> str:
+     if has_tool_use or finish_reason in {"tool_calls", "tool_call"}:
+         return "tool_use"
+     if finish_reason == "length":
+         return "max_tokens"
+     if finish_reason == "stop_sequence":
+         return "stop_sequence"
+     return "end_turn"
+
+
+ def chat_completion_to_anthropic_message(body: dict[str, Any], upstream_json: dict[str, Any]) -> dict[str, Any]:
+     upstream_message, finish_reason = extract_upstream_message(upstream_json)
+     assistant_text, tool_calls = extract_text_and_tool_calls(upstream_message)
+     content_blocks: list[dict[str, Any]] = []
+     if assistant_text:
+         content_blocks.append({"type": "text", "text": assistant_text})
+     for tool_call in tool_calls:
+         arguments = tool_call.get("arguments") or "{}"
+         try:
+             parsed_input = json.loads(arguments)
+         except Exception:
+             parsed_input = {"raw": arguments}
+         content_blocks.append(
+             {
+                 "type": "tool_use",
+                 "id": tool_call["id"],
+                 "name": tool_call.get("name"),
+                 "input": parsed_input,
+             }
+         )
+     usage = upstream_json.get("usage") or {}
+     return {
+         "id": f"msg_{uuid.uuid4().hex}",
+         "type": "message",
+         "role": "assistant",
+         "model": body.get("model"),
+         "content": content_blocks,
+         "stop_reason": anthropic_stop_reason(finish_reason, bool(tool_calls)),
+         "stop_sequence": None,
+         "usage": {
+             "input_tokens": usage.get("prompt_tokens"),
+             "output_tokens": usage.get("completion_tokens"),
+         },
+     }
+
+
+ def anthropic_message_start_payload(message: dict[str, Any]) -> dict[str, Any]:
+     usage = message.get("usage") or {}
+     return {
+         "type": "message_start",
+         "message": {
+             "id": message.get("id"),
+             "type": "message",
+             "role": "assistant",
+             "model": message.get("model"),
+             "content": [],
+             "stop_reason": None,
+             "stop_sequence": None,
+             "usage": {
+                 "input_tokens": usage.get("input_tokens"),
+                 "output_tokens": 0,
+             },
+         },
+     }
+
+
+ def anthropic_message_delta_payload(message: dict[str, Any]) -> dict[str, Any]:
+     return {
+         "type": "message_delta",
+         "delta": {
+             "stop_reason": message.get("stop_reason"),
+             "stop_sequence": message.get("stop_sequence"),
+         },
+         "usage": message.get("usage") or {},
+     }
+
+
  def store_success_record(api_key_hash: str, model_id: str, request_body: dict[str, Any], input_items: list[dict[str, Any]], response_payload: dict[str, Any], latency_ms: float) -> None:
      conn = get_db_connection()
      try:
@@ -947,6 +1243,54 @@ async def get_response(response_id: str, api_key: str = Depends(extract_user_api
      return await fetch_response_record(response_id, api_key)


+ @app.post("/v1/messages")
+ async def create_messages_v1(request: Request, api_key: str = Depends(extract_user_api_key)):
+     return await create_messages_impl(request, api_key)
+
+
+ @app.post("/messages")
+ async def create_messages(request: Request, api_key: str = Depends(extract_user_api_key)):
+     return await create_messages_impl(request, api_key)
+
+
+ async def create_messages_impl(request: Request, api_key: str):
+     body = await request.json()
+     if not isinstance(body, dict):
+         raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Request body must be a JSON object.")
+     if not body.get("model"):
+         raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Missing model field.")
+     if body.get("max_tokens") is None:
+         raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Missing max_tokens field.")
+     if not body.get("messages"):
+         raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Missing messages field.")
+
+     chat_payload = build_chat_payload_from_anthropic(body)
+     try:
+         upstream_json, _latency_ms = await post_nvidia_chat_completion(api_key, chat_payload)
+     except HTTPException as exc:
+         await run_db(store_failure_metric, body.get("model"), exc.detail)
+         raise exc
+
+     anthropic_message = chat_completion_to_anthropic_message(body, upstream_json)
+
+     if body.get("stream"):
+         async def event_stream() -> Any:
+             yield f"event: message_start\ndata: {json_dumps(anthropic_message_start_payload(anthropic_message))}\n\n"
+             for index, block in enumerate(anthropic_message.get("content") or []):
+                 yield f"event: content_block_start\ndata: {json_dumps({'type': 'content_block_start', 'index': index, 'content_block': block})}\n\n"
+                 if block.get("type") == "text":
+                     yield f"event: content_block_delta\ndata: {json_dumps({'type': 'content_block_delta', 'index': index, 'delta': {'type': 'text_delta', 'text': block.get('text', '')}})}\n\n"
+                 elif block.get("type") == "tool_use":
+                     partial_json = json_dumps(block.get("input") or {})
+                     yield f"event: content_block_delta\ndata: {json_dumps({'type': 'content_block_delta', 'index': index, 'delta': {'type': 'input_json_delta', 'partial_json': partial_json}})}\n\n"
+                 yield f"event: content_block_stop\ndata: {json_dumps({'type': 'content_block_stop', 'index': index})}\n\n"
+             yield f"event: message_delta\ndata: {json_dumps(anthropic_message_delta_payload(anthropic_message))}\n\n"
+             yield "event: message_stop\ndata: {\"type\":\"message_stop\"}\n\n"
+         return StreamingResponse(event_stream(), media_type="text/event-stream")
+
+     return anthropic_message
+
+
  @app.post("/v1/responses")
  async def create_response_v1(request: Request, api_key: str = Depends(extract_user_api_key)):
      return await create_response_impl(request, api_key)
static/style.css CHANGED
@@ -468,15 +468,14 @@ body {
  .provider-model-chip {
    display: flex;
    align-items: center;
-   min-height: 46px;
-   height: 46px;
-   padding: 0 14px;
+   min-height: 48px;
+   padding: 11px 14px;
    border-radius: 16px;
    background: var(--surface-soft);
    border: 1px solid var(--line);
    font-family: var(--font-display);
    font-size: 13px;
-   line-height: 1;
+   line-height: 1.35;
    color: var(--text);
    white-space: nowrap;
    overflow: hidden;