dqy08 committed
Commit 494c9e4 · 0 Parent(s)

initial beta release

This view is limited to 50 files because the commit contains too many changes; see the raw diff for the full list.
Files changed (50)
  1. .cursorindexingignore +3 -0
  2. .dockerignore +51 -0
  3. .gitattributes +36 -0
  4. .gitignore +42 -0
  5. Dockerfile +66 -0
  6. LICENSE +201 -0
  7. README.md +58 -0
  8. backend/__init__.py +18 -0
  9. backend/access_log.py +233 -0
  10. backend/api/__init__.py +2 -0
  11. backend/api/analyze.py +412 -0
  12. backend/api/analyze_semantic.py +212 -0
  13. backend/api/demo.py +183 -0
  14. backend/api/fetch_url.py +221 -0
  15. backend/api/folder.py +102 -0
  16. backend/api/model_switch.py +229 -0
  17. backend/api/openai_completions.py +379 -0
  18. backend/api/prediction_attribute.py +79 -0
  19. backend/api/sse_utils.py +181 -0
  20. backend/api/static.py +60 -0
  21. backend/api/utils.py +118 -0
  22. backend/app_context.py +110 -0
  23. backend/class_register.py +16 -0
  24. backend/completion_generator.py +558 -0
  25. backend/data_utils.py +97 -0
  26. backend/demo_folder.py +339 -0
  27. backend/device.py +97 -0
  28. backend/language_checker.py +422 -0
  29. backend/load_utils.py +69 -0
  30. backend/logging_config.py +37 -0
  31. backend/model_loader.py +169 -0
  32. backend/model_manager.py +233 -0
  33. backend/next_token_topk.py +26 -0
  34. backend/oom.py +55 -0
  35. backend/path_utils.py +92 -0
  36. backend/pred_topk_format.py +44 -0
  37. backend/prediction_attributor.py +185 -0
  38. backend/project_registry.py +72 -0
  39. backend/quantization_config.py +42 -0
  40. backend/runtime_config.py +402 -0
  41. backend/schemas.py +43 -0
  42. backend/semantic_analyzer.py +280 -0
  43. client/src/analysis.html +188 -0
  44. client/src/attribution.html +166 -0
  45. client/src/chat.html +171 -0
  46. client/src/compare.html +69 -0
  47. client/src/content/home.en.html +91 -0
  48. client/src/content/home.zh.html +68 -0
  49. client/src/content/images/attribute-dark.png +3 -0
  50. client/src/content/images/attribute.png +3 -0
.cursorindexingignore ADDED
@@ -0,0 +1,3 @@
+
+ # Don't index SpecStory auto-save files, but allow explicit context inclusion via @ references
+ .specstory/**
.dockerignore ADDED
@@ -0,0 +1,51 @@
+ # --- Core language & dependency files ---
+ __pycache__/
+ *.py[cod]
+ .venv/
+ venv/
+ env/
+ node_modules/
+ client/src/node_modules/
+ client/src/.cache-loader/
+
+ # --- Build artifacts & caches ---
+ client/dist/
+ build/
+ dist/
+ *.egg-info/
+ .cache_huggingface/
+ *.tsbuildinfo
+
+ # --- Project-specific rules ---
+ # Ignore all data to avoid accidentally shipping large files
+ data/*
+ # Whitelist: keep only the public folder
+ !data/demo/
+ data/demo/*
+ !data/demo/public/
+
+ # Ignore temporary files and logs
+ notes.md
+ .env
+ *.log
+ npm-debug.log*
+ yarn-debug.log*
+ yarn-error.log*
+
+ # --- OS & IDE ---
+ .DS_Store
+ .DS_Store?
+ ._*
+ .Spotlight-V100
+ .Trashes
+ ehthumbs.db
+ Thumbs.db
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # --- Git ---
+ .git
+ .gitignore
.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,42 @@
+ .cursor/
+ .cache_huggingface
+ # --- Core language & dependency files ---
+ __pycache__/
+ *.py[cod]
+ .venv/
+ venv/
+ env/
+ node_modules/
+ client/src/node_modules/
+ client/src/.cache-loader/
+
+ # --- Build artifacts ---
+ client/dist/
+ build/
+ dist/
+ *.egg-info/
+
+ # --- Project-specific rules ---
+ # Ignore all data to avoid accidentally committing large files
+ data/*
+ # Whitelist: keep only the GLTR demo data
+ !data/demo/
+ data/demo/*
+ !data/demo/public/
+ data/demo/public/.deleted/
+
+ # Ignore temporary notes and the HuggingFace cache
+ notes/*
+ user_dialog_history/*
+ .cache_huggingface/
+ .env
+
+ # --- OS & IDE ---
+ .DS_Store
+ .idea/
+ .vscode/
+ *.swp
+ *.log
+ .specstory
+ scripts/log.py
+ scripts/results/
Dockerfile ADDED
@@ -0,0 +1,66 @@
+ # syntax=docker/dockerfile:1
+
+ # -----------------------------------------------------------------------------
+ # Frontend build stage (stable Node toolchain for webpack/TS)
+ # -----------------------------------------------------------------------------
+ FROM node:20-bookworm-slim AS frontend
+ WORKDIR /app/client/src
+
+ COPY client/src/package.json client/src/package-lock.json ./
+ RUN npm ci
+
+ COPY client/src/ ./
+ # JSON that the prebuild step needs to read; otherwise updateIntroHTML.js fails with ENOENT
+ COPY data/demo/public/ /app/data/demo/public/
+ RUN npm run build
+
+ # -----------------------------------------------------------------------------
+ # Runtime stage (Hugging Face Spaces runs container as UID 1000)
+ # Reference: https://huggingface.co/docs/hub/spaces-sdks-docker
+ # -----------------------------------------------------------------------------
+ FROM python:3.10-slim
+
+ # System deps (git for Hugging Face Hub downloads, build-essential for triton/AWQ CUDA kernel compilation)
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     git \
+     build-essential \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Create a non-root user with UID 1000 (mandatory in Spaces)
+ RUN useradd -m -u 1000 user
+ USER user
+
+ # Set only the environment variables needed at build time (pip install needs these paths)
+ ENV HOME=/home/user \
+     PATH=/home/user/.local/bin:$PATH
+
+ WORKDIR $HOME/app
+
+ # Python deps (installed to user site-packages when system site is not writable)
+ COPY --chown=user:users requirements.txt ./
+ RUN pip install --no-cache-dir --upgrade pip && \
+     pip install --no-cache-dir -r requirements.txt
+
+ # Runtime environment variables are set after dependency installation (they do not affect it)
+ ENV PYTHONUNBUFFERED=1
+
+ # Enable hf-transfer to speed up downloads
+ ENV HF_HUB_ENABLE_HF_TRANSFER=1
+
+ # App source (copy only the paths needed at runtime)
+ COPY --chown=user:users *.py *.yaml ./
+ COPY --chown=user:users backend/ ./backend/
+ COPY --chown=user:users data/demo/public/ ./data/demo/public/
+
+ # Frontend build artifacts
+ COPY --chown=user:users --from=frontend /app/client/dist ./client/dist
+
+ # ENV FORCE_INT8=1
+
+ EXPOSE 7860
+ # Model sizing per hardware:
+ # On CPU basic, the 0.6b model reaches acceptable speed
+ # On CPU upgrade, the 1.7b model reaches acceptable speed
+ # On a local M5 16G machine, the 4b model reaches acceptable speed (memory size is the bottleneck); 16G of RAM can hold only one analysis model at a time (information-density analysis or semantic analysis)
+ CMD ["python", "run.py", "--no_auto_load", "--port", "7860", "--model", "qwen3-1.7b", "--semantic_model", "qwen3-1.7b-instruct"]
+ # CMD ["python", "run.py", "--no_auto_load", "--port", "7860", "--model", "qwen3-0.6b", "--semantic_model", "qwen3-0.6b-instruct"]
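
The commented-out CMD above is the smaller-model variant for weaker hardware. As a sketch only (the `inforadar` image tag follows the README's build example; the flags are copied from the commented CMD), the same switch can be made at `docker run` time by overriding the default command instead of editing the Dockerfile:

```bash
# Build once, as in the README
docker build -t inforadar .

# Override the default CMD to use the 0.6b models (e.g. on the CPU basic tier)
docker run -p 7860:7860 inforadar \
  python run.py --no_auto_load --port 7860 \
  --model qwen3-0.6b --semantic_model qwen3-0.6b-instruct
```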
LICENSE ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright [yyyy] [name of copyright owner]
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
README.md ADDED
@@ -0,0 +1,58 @@
+ ---
+ title: Info Lens
+ emoji: 🔭
+ colorFrom: blue
+ colorTo: red
+ sdk: docker
+ short_description: Explore the informational nature of LLMs and language.
+ tags:
+ - nlp
+ - text-analysis
+ - information
+ - visualization
+ - reading-tools
+ app_port: 7860
+ pinned: false
+ license: apache-2.0
+ ---
+
+ # Info Lens
+
+ **Info Lens** is a small toolbox for exploring the informational nature of LLMs and language.
+
+ ## Legacy name: InfoRadar
+
+ InfoRadar is the former project and repo name. It still appears in parts of the codebase.
+
+ ## 📦 Quick Start
+
+ ### Using Docker (Recommended)
+
+ This is the simplest way to run Info Lens:
+
+ ```bash
+ # 1. Build the image
+ docker build -t inforadar .
+
+ # 2. Run the container (map port 7860)
+ docker run -p 7860:7860 inforadar
+ ```
+ Once running, visit `http://localhost:7860` in your browser.
+
+ ### Local Development
+
+ **Backend Environment**:
+ ```bash
+ pip install -r requirements.txt
+ python server.py
+ ```
+
+ **Frontend Build**:
+ ```bash
+ cd client/src && npm install && npm run build
+ ```
+
+ ## 📜 License
+
+ Apache 2.0
+
backend/__init__.py ADDED
@@ -0,0 +1,18 @@
+ from .class_register import REGISTERED_MODELS
+
+ '''
+ Import all classes in this directory so that classes with
+ @register_model are registered.
+ '''
+
+ from os.path import basename, dirname, join
+ from glob import glob
+ pwd = dirname(__file__)
+ for x in glob(join(pwd, '*.py')):
+     if not basename(x).startswith('__'):
+         __import__('backend.' + basename(x)[:-3],
+                    globals(), locals())
+
+ __all__ = [
+     'REGISTERED_MODELS'
+ ]
backend/access_log.py ADDED
@@ -0,0 +1,233 @@
+ """Service access log"""
+ from datetime import datetime
+ from typing import Optional
+
+ from flask import request
+ import threading
+
+ # Global request counter and its lock
+ _request_counter = 0
+ _request_counter_lock = threading.Lock()
+
+
+ def _get_client_ip():
+     """Get the source IP of the request"""
+     try:
+         if request.headers.get('X-Forwarded-For'):
+             return request.headers.get('X-Forwarded-For').split(',')[0].strip()
+         elif request.headers.get('X-Real-IP'):
+             return request.headers.get('X-Real-IP')
+         else:
+             return request.remote_addr
+     except RuntimeError as e:
+         if "Working outside of request context" in str(e):
+             # Return a placeholder when there is no request context
+             return "unknown"
+         else:
+             raise
+
+
+ def get_client_ip():
+     """Get the client IP (for use by other modules)"""
+     return _get_client_ip()
+
+
+ def _log_request(event_type: str, details: str = "", client_ip: str = None):
+     """Print a service request log line"""
+     timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+     ip = client_ip if client_ip is not None else _get_client_ip()
+
+     log_msg = f"[{timestamp}] {ip:15s} | {event_type}"
+     if details:
+         log_msg += f" | {details}"
+
+     print(log_msg)
+
+
+ def log_page_load(path: str):
+     """Log a page visit (including the ?ref= parameter)"""
+     details = f"path='{path}'"
+     try:
+         ref = request.args.get("ref")
+         if ref:
+             details += f", ref='{ref}'"
+     except RuntimeError:
+         pass
+     _log_request("📄 page visit", details)
+
+
+ def log_demo_file(path: str):
+     """Log a demo file request"""
+     _log_request("🎯 demo file", f"file='{path}'")
+
+
+ def log_analyze_request(text: str, stream_mode: bool = False, client_ip: str = None):
+     """
+     Log an incoming analysis request
+
+     Returns:
+         int: request ID
+     """
+     global _request_counter
+
+     # Generate the request ID
+     with _request_counter_lock:
+         _request_counter += 1
+         request_id = _request_counter
+
+     preview_length = 100
+     text_preview = text[:preview_length] + '......' if text and len(text) > preview_length else (text if text else '')
+     char_count = len(text) if text else 0
+     byte_count = len(text.encode('utf-8')) if text else 0
+     mode_str = "(stream)" if stream_mode else ""
+
+     details = f"req_id={request_id}, text='{text_preview}', chars={char_count}, bytes={byte_count}"
+     _log_request(f"📥 request received{mode_str}", details, client_ip)
+
+     return request_id
+
+
+ def log_analyze_start(request_id: int, wait_time: float, stream_mode: bool = False):
+     """Log the start of analysis processing (internal event)"""
+     from backend.app_context import get_verbose
+     if not get_verbose():
+         return
+     mode_str = "(stream)" if stream_mode else ""
+     print(f"\t🔄 API analyze {mode_str} start: req_id={request_id}, wait_time={wait_time:.2f}s")
+
+
+ def log_fetch_url(url: str, char_count: int = None):
+     """Log a URL fetch request"""
+     details = f"url='{url}'"
+     if char_count is not None:
+         details += f", chars={char_count}"
+     _log_request("🌐 URL fetch", details)
+
+
+ def log_check_admin(success: bool, token: str = None):
+     """Log an admin permission check"""
+     status = "success" if success else "failure"
+     details = f"result={status}"
+     if not success and token:
+         details += f", token='{token}'"
+     _log_request("🔐 admin permission check", details)
+
+
+ def log_analyze_semantic_start(request_id: int, wait_time: float, stream_mode: bool = False):
+     """Log the start of semantic analysis processing (internal event)"""
+     from backend.app_context import get_verbose
+     if not get_verbose():
+         return
+     mode_str = "(stream)" if stream_mode else ""
+     print(f"\t🔄 API analyze_semantic {mode_str} start: req_id={request_id}, wait_time={wait_time:.2f}s")
+
+
+ def log_analyze_semantic_request(query: str, text: str, client_ip: str = None):
+     """
+     Log an incoming semantic analysis request
+
+     Returns:
+         int: request ID
+     """
+     global _request_counter
+
+     with _request_counter_lock:
+         _request_counter += 1
+         request_id = _request_counter
+
+     preview = 50
+     q_preview = query[:preview] + "..." if len(query) > preview else query
+     t_preview = text[:preview] + "..." if len(text) > preview else text
+     details = f"req_id={request_id}, query='{q_preview}', text='{t_preview}', chars={len(text)}"
+     _log_request("📥 semantic analysis request", details, client_ip)
+     return request_id
+
+
+ def log_openai_completions_start(request_id: int, wait_time: float):
+     """Log the start of an OpenAI completions request (internal event)"""
+     from backend.app_context import get_verbose
+     if not get_verbose():
+         return
+     print(f"\t🔄 API openai_completions start: req_id={request_id}, wait_time={wait_time:.2f}s")
+
+
+ def log_openai_completions_request(
+     model: str, prompt: str, client_ip: str = None,
+ ):
+     """
+     Log an incoming OpenAI completions request
+
+     Returns:
+         int: request ID
+     """
+     global _request_counter
+
+     with _request_counter_lock:
+         _request_counter += 1
+         request_id = _request_counter
+
+     preview = 50
+     p_preview = prompt[:preview] + "..." if len(prompt) > preview else prompt
+     details = (
+         f"req_id={request_id}, model='{model}', "
+         f"prompt='{p_preview}', chars={len(prompt)}"
+     )
+     _log_request("📥 openai completions request", details, client_ip)
+     return request_id
+
+
+ def log_prediction_attribute_request(
+     context: str,
+     target_prediction: Optional[str],
+     model: str,
+     client_ip: str = None,
+ ) -> int:
+     """
+     Log an incoming prediction_attribute request.
+
+     Returns:
+         int: request ID (incremented from the same counter as the other APIs' req_id)
+     """
+     global _request_counter
+
+     with _request_counter_lock:
+         _request_counter += 1
+         request_id = _request_counter
+
+     preview = 50
+     c_preview = context[:preview] + "..." if len(context) > preview else context
+     if target_prediction is None:
+         t_preview = "<top-1>"
+     else:
+         t_preview = (
+             target_prediction[:preview] + "..."
+             if len(target_prediction) > preview
+             else target_prediction
+         )
+     details = (
+         f"req_id={request_id}, model={model!r}, context='{c_preview}', target='{t_preview}', "
+         f"context_chars={len(context)}"
+     )
+     _log_request("📥 prediction_attribute request", details, client_ip)
+     return request_id
+
+
+ def log_openai_completions_prompt_request(
+     model: str,
+     user_prompt: str,
+     system: Optional[str] = None,
+     client_ip: str = None,
+ ) -> None:
+     """Log POST /v1/completions/prompt (only assembles the chat template; no req_id is allocated)."""
+     preview = 50
+
+     def _pv(s: str) -> str:
+         return s[:preview] + "..." if len(s) > preview else s
+
+     up = _pv(user_prompt)
+     if system is None:
+         details = f"model='{model}', user_prompt='{up}'"
+     else:
+         details = f"model='{model}', system='{_pv(system)}', user_prompt='{up}'"
+     _log_request("📥 openai completions/prompt request", details, client_ip)
+
backend/api/__init__.py ADDED
@@ -0,0 +1,2 @@
+ """API route module"""
+
backend/api/analyze.py ADDED
@@ -0,0 +1,412 @@
+ """Text analysis API"""
+ import gc
+ import json
+ import time
+ import queue
+ import threading
+ from typing import Optional
+ from backend.schemas import create_empty_analysis_result
+ from backend.model_manager import project_registry, DEFAULT_MODEL, _inference_lock
+ from model_paths import resolve_hf_path
+ from backend.oom import exit_if_oom
+ from backend.api.sse_utils import (
+     SSEProgressReporter,
+     consume_progress_queue,
+     send_result_event,
+     send_error_event,
+ )
+
+
+ # Custom exception: queue wait timeout
+ class QueueTimeoutError(Exception):
+     """Timed out while waiting in the queue for the inference lock"""
+     pass
+
+
+ # Uses the shared inference lock from model_manager (shared with analyze_semantic)
+ # Total processing time limit for a single analysis (seconds)
+ ANALYSIS_TIMEOUT = 60.0
+ # Maximum time to wait for the lock (seconds) - reject the request if the queue is too long
+ LOCK_WAIT_TIMEOUT = 10.0
+
+
+ def _analyze_result_model_display(model: Optional[str]) -> Optional[str]:
+     """Main analysis result.model: exposed as the HuggingFace repo id (consistent with model_paths.resolve_hf_path)."""
+     if not model or not str(model).strip():
+         return None
+     return resolve_hf_path(str(model).strip())
+
+
+ def _build_response(model: str, text: str, result):
+     """Build the standard response"""
+     # Add model to the result and make sure it comes first
+     if not isinstance(result, dict):
+         result = {}
+     result = result.copy()
+     # If the result already contains a model, remove it first
+     if 'model' in result:
+         model_value = result.pop('model')
+     else:
+         model_value = model
+     # Rebuild the result so that model comes first
+     result = {'model': _analyze_result_model_display(model_value), **result}
+     return {
+         "request": {'text': text},
+         "result": result
+     }
+
+
+ def _error_response(model: str, text: str, message: str, status_code: int):
+     """Build an error response (unified format)"""
+     # Unified error format: includes success=false and message
+     result = create_empty_analysis_result(message, _analyze_result_model_display(model))
+     return {
+         "success": False,
+         "message": message,
+         "request": {'text': text or ''},
+         "result": result
+     }, status_code
+
+
+ def _validate_and_prepare_request(analyze_request):
+     """
+     Validate the request and prepare parameters
+
+     Returns:
+         (model, text, error_msg, status_code) tuple
+         On validation failure, returns (None, None, error_msg, status_code)
+         On success, returns (model, text, None, None)
+     """
+     model = analyze_request.get('model')
+     text = analyze_request.get('text')
+
+     if not text:
+         return None, None, "Missing text to analyze; please provide the text field", 400
+
+     # Get the default model (use the module-level context to read the persisted active model)
+     from backend.app_context import get_app_context
+     context = get_app_context(prefer_module_context=True)
+     default_model = context.model_name if context.model_name else DEFAULT_MODEL
+
+     # Treat default, None, or an empty string as the default model
+     if not model or model == 'default' or model == '':
+         model = default_model
+     else:
+         # Only the default model is allowed; requests for other models are rejected
+         if model != default_model:
+             return None, None, f"Only the default model '{default_model}' is currently supported; other models are not allowed", 400
+
+     return model, text, None, None
+
+
+ def _load_project_with_error_handling(model):
+     """
+     Get the loaded model; if it is not loaded, lazy-load it according to the configuration or return an error.
+
+     Returns:
+         (project_obj, error_msg, status_code) tuple
+         On success, returns (project_obj, None, None)
+         On failure, returns (None, error_msg, status_code)
+     """
+     # Check whether the model is in the registry
+     if not project_registry.is_available(model):
+         available_models = list(project_registry.available_model_names())
+         error_msg = f"❌ Model '{model}' is not registered. Available models: {available_models}"
+         print(error_msg)
+         return None, error_msg, 404
+
+     # Check whether the model is already loaded
+     p = project_registry.get(model)
+     if p is None:
+         from backend.app_context import get_app_context
+         from backend.model_manager import ensure_main_slot_ready
+
+         context = get_app_context(prefer_module_context=True)
+         if context.model_loading:
+             error_msg = f"Model '{model}' is still loading in the background; please retry later"
+             print(f"⚠️ {error_msg}")
+             return None, error_msg, 503
+         # Lazy-load mode (--no_auto_load): the first request only initializes the main slot (weights + the QwenLM project)
+         if getattr(context.args, 'no_auto_load', False):
+             try:
+                 ensure_main_slot_ready()
+                 p = project_registry.get(model)
+             except Exception as e:  # noqa: BLE001
+                 import traceback
+                 print(f"⚠️ Lazy model loading failed: {e}")
+                 traceback.print_exc()
+                 return None, f"Model loading failed: {str(e)}", 500
+         if p is None:
+             error_msg = f"Model '{model}' is not loaded; please contact the administrator"
+             print(f"⚠️ {error_msg}")
+             return None, error_msg, 503
+     return p, None, None
+
+
+ def _log_request(text, stream_mode=False, client_ip=None):
+     """
+     Print the request log
+
+     Returns:
+         int: request ID
+     """
+     from backend.access_log import log_analyze_request
+     return log_analyze_request(text, stream_mode, client_ip)
+
+
+ def _log_response(res, char_count, elapsed_time, stream_mode=False, request_id=None, wait_time=None):
+     """Print the response log"""
+     tokens = len(res.get('bpe_strings', []))
+     text_length = char_count
+     mode_str = "(stream)" if stream_mode else ""
+
+     # Build the log message
+     msg = f"\t📤 API analyze {mode_str} response:"
+     if request_id is not None:
+         msg += f" req_id={request_id},"
+     msg += f" tokens={tokens}, text_length={text_length}"
+     msg += f", response_time={elapsed_time:.4f}s"
+
+     print(msg)
+
+
+ def _validate_and_fix_result(res):
+     """Validate and repair the result structure"""
+     if not isinstance(res, dict):
+         res = {'bpe_strings': []}
+     if 'bpe_strings' not in res or not isinstance(res.get('bpe_strings'), list):
+         res['bpe_strings'] = res.get('bpe_strings', []) if isinstance(res.get('bpe_strings'), list) else []
+     return res
+
+
+ def analyze(analyze_request):
+     """
+     Analyze text
+
+     Args:
+         analyze_request: analysis request dict containing:
+             - model: model name
+             - text: text to analyze
+             - stream: optional; if True, return an SSE streaming response (with progress events)
+
+     Returns:
+         If stream=True: an SSE response object
+         Otherwise: a (response dict, status code) tuple
+     """
+     # Check whether a model is currently loading (use the module-level context)
+     from backend.app_context import get_app_context
+     context = get_app_context(prefer_module_context=True)
+     if context.model_loading:
+         return _error_response('', '', 'The model is still loading; please retry later', 503)
+
+     # Get client_ip inside the request context; it may no longer be available inside the streaming generator
+     from backend.access_log import get_client_ip
+     client_ip = get_client_ip()
+
+     # Check whether streaming is enabled
+     stream = analyze_request.get('stream', False)
+     if stream:
+         return _analyze_with_stream(analyze_request, client_ip)
+     return _analyze_plain(analyze_request, client_ip)
+
+
+ def _analyze_with_stream(analyze_request, client_ip):
+     """
+     Streamed text analysis; returns progress and the result over SSE (internal function)
+
+     Args:
+         analyze_request: analysis request dict containing model and text
+         client_ip: client IP, captured at the entry point and passed in
+
+     Returns:
+         SSE response object
+     """
+     reporter = SSEProgressReporter(lambda: _generate_analyze_events(analyze_request, client_ip))
+     return reporter.create_response()
+
+
+ def _analyze_plain(analyze_request, client_ip):
+     """
+     Non-streaming analysis: wraps the streaming implementation, consumes the event stream, and returns JSON.
+     Intended for scripts and other simple clients.
+     """
+     result = None
+     error_msg = None
+     status_code = 500
+     try:
+         for event_str in _generate_analyze_events(analyze_request, client_ip):
+             if not event_str.startswith('data: '):
+                 continue
+             data = json.loads(event_str[6:].strip())
+             t = data.get('type')
+             if t == 'result':
+                 result = data.get('data')
+             elif t == 'error':
+                 error_msg = data.get('message', 'Analysis failed')
+                 status_code = data.get('status_code', 500)
+                 break
+     except Exception as e:
+         import traceback
+         traceback.print_exc()
+         exit_if_oom(e, defer_seconds=1)
+         error_msg = f"Analysis failed: {str(e)}"
+     finally:
+         gc.collect()
+
+     if error_msg:
+         model = analyze_request.get('model') or ''
+         text = analyze_request.get('text') or ''
+         return _error_response(model, text, error_msg, status_code)
+     if result is None:
+         return _error_response('', '', 'Analysis failed: no result was produced', 500)
+     return result, 200
+
+
+ def _generate_analyze_events(analyze_request, client_ip):
+     """
+     Core of the streamed analysis: generates the SSE event stream (progress + result/error).
+     Reused by _analyze_with_stream and _analyze_plain.
+     client_ip must be captured at the entry point and passed in, because the request context may already be gone when the generator runs.
+     """
+     # Check the model loading state again (inside the generator, using the module-level context)
+     from backend.app_context import get_app_context
+     context = get_app_context(prefer_module_context=True)
+     if context.model_loading:
+         yield send_error_event('The model is still loading; please retry later', 503)
+         return
+
+     start_time = time.perf_counter()
+
+     # Validate and prepare the request
+     model, text, error_msg, status_code = _validate_and_prepare_request(analyze_request)
+     if error_msg:
+         yield send_error_event(error_msg, status_code or 400)
+         return
+
+     # Load the model
+     p, error_msg, status_code = _load_project_with_error_handling(model)
+     if error_msg:
+         yield send_error_event(error_msg, status_code or 500)
+         return
+
+     try:
+         char_count = len(text) if text else 0
+         request_id = _log_request(text, stream_mode=True, client_ip=client_ip)
+
+         # Create a thread-safe progress queue
+         progress_queue = queue.Queue()
+         analysis_done = threading.Event()
+         analysis_result = None
+         analysis_error = None
+         lock_wait_time = None  # time spent waiting for the lock
+
+         def progress_callback_func(step: int, total_steps: int, stage: str, percentage: Optional[int]):
+             """Progress callback: push events onto the queue"""
+             progress_queue.put(('progress', step, total_steps, stage, percentage))
+
+         def run_analysis():
+             """Run the analysis in a separate thread"""
+             nonlocal analysis_result, analysis_error, lock_wait_time
+             try:
+                 # Record when we start waiting for the lock
+                 lock_wait_start = time.perf_counter()
+
+                 # Try to acquire the lock with a timeout to avoid queuing for too long
+                 lock_acquired = _inference_lock.acquire(timeout=LOCK_WAIT_TIMEOUT)
+                 if not lock_acquired:
+                     # Lock acquisition timed out: a long-running task is ahead of us
+                     analysis_error = QueueTimeoutError(
+                         f"Waited in the queue for more than {LOCK_WAIT_TIMEOUT} seconds; the service is busy, please retry later"
+                     )
+                     return
+
+                 # Record the wait time
+                 lock_wait_time = time.perf_counter() - lock_wait_start
+
+                 try:
+                     from backend.access_log import log_analyze_start
+                     log_analyze_start(request_id, lock_wait_time, stream_mode=True)
+
+                     # Run the analysis while holding the lock
+                     # Note: this execution time is also monitored against ANALYSIS_TIMEOUT (in the outer loop)
+                     res = p.lm.analyze_text(text, progress_callback=progress_callback_func)
+                     analysis_result = res
+                 finally:
+                     # Make sure the lock is always released
+                     _inference_lock.release()
+             except Exception as e:
+                 analysis_error = e
+             finally:
+                 analysis_done.set()
+                 progress_queue.put(('done', None, None))  # send the completion signal
+
+         # Start the analysis thread
+         analysis_thread = threading.Thread(target=run_analysis, daemon=True)
+         analysis_thread.start()
+
+         # Emit progress events in real time and check for timeouts
+         timeout_reached = False
+         for kind, event_str in consume_progress_queue(
+             progress_queue, analysis_done, start_time, ANALYSIS_TIMEOUT, "analysis"
+         ):
+             if kind == 'timeout':
+                 timeout_reached = True
+                 yield event_str
+                 break
+             if kind == 'progress':
+                 yield event_str
+             elif kind == 'done':
+                 break
+
+         # On timeout, return immediately instead of waiting for the analysis to finish
+         if timeout_reached:
+             gc.collect()
+             return
+
+         # Check for errors
+         # Note: the 'done' signal has been received, so the analysis thread has finished its work (or failed)
+         # The thread is a daemon and cleans up automatically; no explicit join is needed
+         if analysis_error:
+             # Queue timeout: return a friendly error message
+             if isinstance(analysis_error, QueueTimeoutError):
+                 print(f"⏱️ Queue timeout: {analysis_error}")
+                 yield send_error_event(str(analysis_error), 503)
+                 gc.collect()
+                 return
+             # Other errors: re-raise and let the outer try-except handle them
+             raise analysis_error
+
+         # Check for an empty result (should not happen: there is either a result or an error)
+         if analysis_result is None:
+             print("⚠️ Analysis result is empty but no error was reported")
+             yield send_error_event("Analysis failed: no result was produced", 500)
+             gc.collect()
+             return
+
+         res = analysis_result
+
+         elapsed_time = time.perf_counter() - start_time
+         _log_response(res, char_count, elapsed_time, stream_mode=True,
+                       request_id=request_id, wait_time=lock_wait_time)
+
+         # Validate and repair the result
+         res = _validate_and_fix_result(res)
+
+         # Build the final response
+         final_response = _build_response(model, text, res)
+
+         # Send the final result
+         yield send_result_event(final_response)
+
+         # Force garbage collection to release memory
+         gc.collect()
+
+     except Exception as e:
+         import traceback
+         traceback.print_exc()
+         exit_if_oom(e, defer_seconds=1)
+         yield send_error_event(str(e), 500)
+         # Garbage-collect even on error
+         gc.collect()
+
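
A minimal command-line sketch of calling this endpoint. The `/api/analyze` route path is an assumption (route registration is not part of this diff); the `model`, `text`, and `stream` fields come from `_validate_and_prepare_request` and `analyze` above. With `stream: true` the reply is an SSE stream of `data: {...}` lines whose `type` is `progress`, `result`, or `error`; without it a single JSON response is returned.

```bash
# Assumed route: /api/analyze (the URL wiring lives outside this diff).
# -N disables curl's output buffering so SSE progress events appear as they arrive.
curl -N -s -X POST http://localhost:7860/api/analyze \
  -H 'Content-Type: application/json' \
  -d '{"model": "default", "text": "The quick brown fox jumps over the lazy dog.", "stream": true}'
```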
backend/api/analyze_semantic.py ADDED
@@ -0,0 +1,212 @@
1
+ """Semantic analysis API:返回原文各 token 对 prompt 的平均关注度"""
2
+ import gc
3
+ import json
4
+ import queue
5
+ import threading
6
+ import time
7
+ from typing import Optional
8
+
9
+ from backend.model_manager import _inference_lock
10
+ from backend.oom import exit_if_oom
11
+ from backend.semantic_analyzer import analyze_semantic as _analyze_semantic
12
+ from backend.api.sse_utils import (
13
+ SSEProgressReporter,
14
+ consume_progress_queue,
15
+ send_result_event,
16
+ send_error_event,
17
+ )
18
+ from backend.access_log import get_client_ip
19
+ from backend.api.analyze import QueueTimeoutError, ANALYSIS_TIMEOUT, LOCK_WAIT_TIMEOUT
20
+
21
+
22
+ def _log_request(query, text, client_ip=None):
23
+ from backend.access_log import log_analyze_semantic_request
24
+ return log_analyze_semantic_request(query, text, client_ip)
25
+
26
+
27
+ def _build_success_response(result, debug_info: bool = False):
28
+ """构建成功响应。debug_info=True 时包含 debug_info 对象(abbrev、topk_tokens、topk_probs)"""
29
+ resp = {
30
+ "success": True,
31
+ "model": result["model"],
32
+ "token_attention": result["token_attention"],
33
+ "full_match_degree": result["full_match_degree"],
34
+ }
35
+ if debug_info and "debug_info" in result:
36
+ resp["debug_info"] = result["debug_info"]
37
+ return resp
38
+
39
+
40
+ def _generate_semantic_events(
41
+ query: str, text: str, submode: Optional[str] = None, debug_info: bool = False,
42
+ full_match_degree_only: bool = False, client_ip: Optional[str] = None
43
+ ):
44
+ """
45
+ 流式语义分析核心:生成 SSE 事件流(progress + result/error)。
46
+ 供 _analyze_semantic_with_stream 和 _analyze_semantic_plain 复用。
47
+ client_ip 需在入口处获取并传入,因流式响应时生成器执行时请求上下文已失效。
48
+ """
49
+ if client_ip is None:
50
+ client_ip = get_client_ip()
51
+ start_time = time.perf_counter()
52
+ request_id = _log_request(query, text, client_ip)
53
+
54
+ progress_queue = queue.Queue()
55
+ analysis_done = threading.Event()
56
+ analysis_result = None
57
+ analysis_error = None
58
+ lock_wait_time = None
59
+
60
+ def progress_callback(step: int, total_steps: int, stage: str, percentage: Optional[int]):
61
+ progress_queue.put(("progress", step, total_steps, stage, percentage))
62
+
63
+ def run_analysis():
64
+ nonlocal analysis_result, analysis_error, lock_wait_time
65
+ try:
66
+ lock_wait_start = time.perf_counter()
67
+ lock_acquired = _inference_lock.acquire(timeout=LOCK_WAIT_TIMEOUT)
68
+ if not lock_acquired:
69
+ analysis_error = QueueTimeoutError(
70
+ f"排队等待超过 {LOCK_WAIT_TIMEOUT} 秒,服务繁忙,请稍后重试"
71
+ )
72
+ return
73
+ lock_wait_time = time.perf_counter() - lock_wait_start
74
+
75
+ try:
76
+ from backend.access_log import log_analyze_semantic_start
77
+ log_analyze_semantic_start(request_id, lock_wait_time, stream_mode=True)
78
+ result = _analyze_semantic(query, text, submode_override=submode, progress_callback=progress_callback, debug_info=debug_info, full_match_degree_only=full_match_degree_only)
79
+ analysis_result = result
80
+ finally:
81
+ _inference_lock.release()
82
+ except Exception as e:
83
+ analysis_error = e
84
+ finally:
85
+ analysis_done.set()
86
+ progress_queue.put(("done", None, None))
87
+
88
+ try:
89
+ analysis_thread = threading.Thread(target=run_analysis, daemon=True)
90
+ analysis_thread.start()
91
+
92
+ timeout_reached = False
93
+ for kind, event_str in consume_progress_queue(
94
+ progress_queue, analysis_done, start_time, ANALYSIS_TIMEOUT, "语义分析"
95
+ ):
96
+ if kind == 'timeout':
97
+ timeout_reached = True
98
+ yield event_str
99
+ break
100
+ if kind == 'progress':
101
+ yield event_str
102
+ elif kind == 'done':
103
+ break
104
+
105
+ if timeout_reached:
106
+ gc.collect()
107
+ return
108
+
109
+ if analysis_error:
110
+ if isinstance(analysis_error, QueueTimeoutError):
111
+ print(f"⏱️ 排队超时: {analysis_error}")
112
+ yield send_error_event(str(analysis_error), 503)
113
+ gc.collect()
114
+ return
115
+ raise analysis_error
116
+
117
+ if analysis_result is None:
118
+ print("⚠️ 语义分析结果为空,但没有错误信息")
119
+ yield send_error_event("分析失败:未获取到结果", 500)
120
+ gc.collect()
121
+ return
122
+
123
+ elapsed = time.perf_counter() - start_time
124
+ tokens = len(analysis_result.get("token_attention", []))
125
+ print(
126
+ f"\t📤 API analyze_semantic (stream) response: req_id={request_id}, "
127
+ f"tokens={tokens}, response_time={elapsed:.4f}s"
128
+ )
129
+ yield send_result_event(_build_success_response(analysis_result, debug_info))
130
+ except Exception as e:
131
+ import traceback
132
+ traceback.print_exc()
133
+ exit_if_oom(e, defer_seconds=1)
134
+ yield send_error_event(str(e), 500)
135
+ finally:
136
+ gc.collect()
137
+
138
+
139
+ def _analyze_semantic_with_stream(
140
+ query: str, text: str, submode: Optional[str] = None, debug_info: bool = False,
141
+ full_match_degree_only: bool = False, client_ip: Optional[str] = None
142
+ ):
143
+ """流式语义分析,通过 SSE 返回阶段级进度"""
144
+ return SSEProgressReporter(
145
+ lambda: _generate_semantic_events(query, text, submode, debug_info, full_match_degree_only, client_ip)
146
+ ).create_response()
147
+
148
+
149
+ def _analyze_semantic_plain(
150
+ query: str, text: str, submode: Optional[str] = None, debug_info: bool = False,
151
+ full_match_degree_only: bool = False, client_ip: Optional[str] = None
152
+ ):
153
+ """
154
+ 非流式语义分析:封装流式实现,消费事件流后返回 JSON。
155
+ 供脚本等简单客户端使用。
156
+ """
157
+ result = None
158
+ error_msg = None
159
+ status_code = 500
160
+ try:
161
+ for event_str in _generate_semantic_events(query, text, submode, debug_info, full_match_degree_only, client_ip):
162
+ if not event_str.startswith('data: '):
163
+ continue
164
+ data = json.loads(event_str[6:].strip())
165
+ t = data.get('type')
166
+ if t == 'result':
167
+ result = data.get('data')
168
+ elif t == 'error':
169
+ error_msg = data.get('message', '分析失败')
170
+ status_code = data.get('status_code', 500)
171
+ break
172
+ except Exception as e:
173
+ import traceback
174
+ traceback.print_exc()
175
+ exit_if_oom(e, defer_seconds=1)
176
+ error_msg = str(e)
177
+ finally:
178
+ gc.collect()
179
+
180
+ if error_msg:
181
+ return {"success": False, "message": error_msg}, status_code
182
+ if result is None:
183
+ return {"success": False, "message": "分析失败:未获取到结果"}, 500
184
+ return result, 200
185
+
186
+
187
+ def analyze_semantic(semantic_request):
188
+ """
189
+ 分析原文 token 对 prompt 的关注度。
190
+
191
+ Args:
192
+ semantic_request: 包含 query、text、stream(可选)、submode(可选)的字典
193
+
194
+ Returns:
195
+ stream=True 时返回 SSE 响应;否则返回 (响应字典, 状态码) 元组
196
+ """
197
+ query = (semantic_request.get("query") or "")
198
+ text = semantic_request.get("text") or ""
199
+ stream = semantic_request.get("stream", False)
200
+ submode = (semantic_request.get("submode") or "").strip() or None
201
+ debug_info = bool(semantic_request.get("debug_info", False))
202
+ full_match_degree_only = bool(semantic_request.get("full_match_degree_only", False))
203
+
204
+ if not query:
205
+ return {"success": False, "message": "缺少 query 字段"}, 400
206
+ if not text:
207
+ return {"success": False, "message": "缺少 text 字段"}, 400
208
+
209
+ client_ip = get_client_ip()
210
+ if stream:
211
+ return _analyze_semantic_with_stream(query, text, submode, debug_info, full_match_degree_only, client_ip)
212
+ return _analyze_semantic_plain(query, text, submode, debug_info, full_match_degree_only, client_ip)
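
For comparison, a hedged sketch of calling the semantic endpoint; the `/api/analyze_semantic` path is again an assumption, while `query`, `text`, `stream`, `submode`, `debug_info`, and `full_match_degree_only` are the request fields documented in `analyze_semantic` above.

```bash
# Assumed route: /api/analyze_semantic (not registered in this diff).
# query and text are required; submode, debug_info and full_match_degree_only are optional.
curl -N -s -X POST http://localhost:7860/api/analyze_semantic \
  -H 'Content-Type: application/json' \
  -d '{"query": "What is the bottleneck?", "text": "Memory size limits the 4b model.", "stream": true}'
```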
backend/api/demo.py ADDED
@@ -0,0 +1,183 @@
1
+ """Demo 文件管理 API"""
2
+ from backend.data_utils import save_demo_payload
3
+ from backend.demo_folder import (
4
+ list_demo_items,
5
+ move_demo_file,
6
+ rename_demo_file,
7
+ delete_demo_file,
8
+ move_folder,
9
+ )
10
+ from backend.api.utils import (
11
+ get_demo_directory,
12
+ handle_api_error,
13
+ handle_api_success,
14
+ require_admin,
15
+ validate_admin_token,
16
+ )
17
+ from backend.access_log import log_check_admin
18
+
19
+
20
+ def list_demos(path: str = ""):
21
+ """
22
+ 扫描demo目录下的文件夹和文件,返回列表
23
+ 支持指定路径参数,返回指定路径下的内容
24
+ 文件名(去掉.json后缀)作为demo名称
25
+ 支持中文文件名和路径
26
+ 从data/demo目录读取(更专业的数据目录结构)
27
+
28
+ Args:
29
+ path: 可选,指定要列出的路径,默认为根目录(空字符串)
30
+ """
31
+ demo_dir = get_demo_directory(create=False)
32
+ try:
33
+ result = list_demo_items(demo_dir, path)
34
+ # if not result.get("items"):
35
+ # print(f"⚠️ 路径 '{path}' 下没有内容: {demo_dir}")
36
+ # else:
37
+ # items_count = len(result["items"])
38
+ # folders_count = sum(1 for item in result["items"] if item["type"] == "folder")
39
+ # files_count = sum(1 for item in result["items"] if item["type"] == "file")
40
+ # print(f"✓ 路径 '{path}': {folders_count} 个文件夹, {files_count} 个文件 (共 {items_count} 项)")
41
+ return result
42
+ except Exception as e:
43
+ error_result = handle_api_error("Failed to scan demo directory", e)
44
+ return {"path": path, "items": []}
45
+
46
+
47
+ @require_admin
48
+ def save_demo(save_request):
49
+ """
50
+ 保存demo文件到data/demo目录
51
+ 请求格式: { name: string, data: AnalyzeResponse, path?: string, overwrite?: boolean }
52
+ path: 可选,保存路径,默认为根目录("/")
53
+ overwrite: 可选,是否覆盖已存在的文件,默认为False
54
+ """
55
+ name = save_request.get('name')
56
+ data = save_request.get('data')
57
+ path = save_request.get('path', '/') # 默认为根目录
58
+ overwrite = save_request.get('overwrite', False) # 默认为False
59
+
60
+ if not name or not data:
61
+ return {
62
+ 'success': False,
63
+ 'message': 'Missing required parameters: name or data'
64
+ }
65
+
66
+ try:
67
+ demo_dir = get_demo_directory(create=True)
68
+ result = save_demo_payload(demo_dir, name, data, path, overwrite)
69
+ if result.get('success'):
70
+ print(f"✓ Demo已保存: {demo_dir / result['file']}")
71
+ else:
72
+ print(f"❌ Save failed: {result.get('message')}")
73
+ return result
74
+ except Exception as e:
75
+ return handle_api_error('Save failed', e)
76
+
77
+
78
+ @require_admin
79
+ def delete_demo(delete_request):
80
+ """
81
+ 将demo文件移动到deleted文件夹(软删除)
82
+ 请求格式: { file: string } # 文件名(包含.json后缀)
83
+ """
84
+ file = delete_request.get('file')
85
+
86
+ if not file:
87
+ return {
88
+ 'success': False,
89
+ 'message': 'Missing required parameter: file'
90
+ }
91
+
92
+ try:
93
+ demo_dir = get_demo_directory(create=False)
94
+ result = delete_demo_file(demo_dir, file)
95
+ return handle_api_success(result)
96
+ except Exception as e:
97
+ return handle_api_error('Delete failed', e)
98
+
99
+
100
+ @require_admin
101
+ def move_demo(move_request):
102
+ """
103
+ 移动demo文件或文件夹
104
+ 请求格式: { file: string, target_path: string } 或 { path: string, target_path: string }
105
+ """
106
+ file = move_request.get('file')
107
+ path = move_request.get('path')
108
+ target_path = move_request.get('target_path', '')
109
+
110
+ if not target_path and target_path != '':
111
+ return {
112
+ 'success': False,
113
+ 'message': 'Missing required parameter: target_path'
114
+ }
115
+
116
+ if not file and not path:
117
+ return {
118
+ 'success': False,
119
+ 'message': 'Missing required parameter: file or path'
120
+ }
121
+
122
+ try:
123
+ demo_dir = get_demo_directory(create=False)
124
+
125
+ if file:
126
+ # 移动文件
127
+ result = move_demo_file(demo_dir, file, target_path)
128
+ else:
129
+ # 移动文件夹
130
+ result = move_folder(demo_dir, path, target_path)
131
+
132
+ return handle_api_success(result)
133
+ except Exception as e:
134
+ return handle_api_error('Move failed', e)
135
+
136
+
137
+ @require_admin
138
+ def rename_demo(rename_request):
139
+ """
140
+ 重命名demo文件
141
+ 请求格式: { file: string, new_name: string }
142
+ """
143
+ file = rename_request.get('file')
144
+ new_name = rename_request.get('new_name')
145
+
146
+ if not file or not new_name:
147
+ return {
148
+ 'success': False,
149
+ 'message': 'Missing required parameter: file or new_name'
150
+ }
151
+
152
+ try:
153
+ demo_dir = get_demo_directory(create=False)
154
+ result = rename_demo_file(demo_dir, file, new_name)
155
+ return handle_api_success(result)
156
+ except Exception as e:
157
+ return handle_api_error('Rename failed', e)
158
+
159
+
160
+ def check_admin(check_request):
161
+ """
162
+ 检查管理员token是否有效
163
+ 请求格式: { token: string }
164
+ """
165
+ from flask import request
166
+
167
+ # 从请求体或请求头获取token
168
+ request_token = check_request.get('token') or request.headers.get('X-Admin-Token')
169
+
170
+ # 验证token
171
+ is_valid, error_message = validate_admin_token(request_token)
172
+
173
+ # 记录管理员权限检查
174
+ log_check_admin(is_valid, token=request_token)
175
+
176
+ if is_valid:
177
+ return {'success': True}
178
+ else:
179
+ return {
180
+ 'success': False,
181
+ 'message': error_message
182
+ }
183
+
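
A hedged sketch of the admin check performed by `check_admin` above: the token may travel in the JSON body or in the `X-Admin-Token` header. The `/api/check_admin` path is an assumption; only the field and header names come from this diff.

```bash
# Assumed route: /api/check_admin (not registered in this diff).
curl -s -X POST http://localhost:7860/api/check_admin \
  -H 'Content-Type: application/json' \
  -H 'X-Admin-Token: <your-admin-token>' \
  -d '{}'
```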
backend/api/fetch_url.py ADDED
@@ -0,0 +1,221 @@
1
+ """URL 文本提取 API"""
2
+ import json
3
+ import re
4
+ from urllib.parse import urlparse
5
+ import trafilatura
6
+ import requests
7
+ from backend.api.utils import handle_api_error
8
+
9
+ # 单次提取的最大字符数上限(防止异常大页面影响性能)
10
+ MAX_EXTRACTED_TEXT_LENGTH = 20000
11
+
12
+
13
+ def _is_valid_url(url: str) -> bool:
14
+ """验证 URL 格式"""
15
+ try:
16
+ result = urlparse(url)
17
+ return all([result.scheme in ['http', 'https'], result.netloc])
18
+ except Exception:
19
+ return False
20
+
21
+
22
+ def _is_local_or_private(url: str) -> bool:
23
+ """检查是否为本地或私有网络地址(防止 SSRF 攻击)"""
24
+ try:
25
+ parsed = urlparse(url)
26
+ hostname = parsed.hostname
27
+
28
+ if not hostname:
29
+ return True
30
+
31
+ # 检查是否为 localhost
32
+ if hostname in ['localhost', '127.0.0.1', '::1']:
33
+ return True
34
+
35
+ # 检查是否为私有 IP 地址
36
+ private_patterns = [
37
+ r'^10\.', # 10.0.0.0/8
38
+ r'^172\.(1[6-9]|2[0-9]|3[0-1])\.', # 172.16.0.0/12
39
+ r'^192\.168\.', # 192.168.0.0/16
40
+ r'^169\.254\.', # 169.254.0.0/16 (link-local)
41
+ ]
42
+
43
+ for pattern in private_patterns:
44
+ if re.match(pattern, hostname):
45
+ return True
46
+
47
+ return False
48
+ except Exception:
49
+ return True # 解析失败时保守处理,拒绝访问
50
+
51
+
52
+ def _format_article_text(metadata: dict) -> str:
53
+ """
54
+ 将元数据和正文格式化为类似网页显示的纯文本
55
+
56
+ Args:
57
+ metadata: trafilatura 提取的 JSON 数据(已解析为字典)
58
+
59
+ Returns:
60
+ 格式化后的文章文本
61
+ """
62
+ lines = []
63
+
64
+ # 标题
65
+ if metadata.get('title'):
66
+ lines.append(metadata['title'])
67
+ lines.append('')
68
+
69
+ # 元数据信息(无标签,直接显示内容)
70
+ meta_parts = []
71
+ if metadata.get('author'):
72
+ meta_parts.append(metadata['author'])
73
+ if metadata.get('date'):
74
+ meta_parts.append(metadata['date'])
75
+ # if metadata.get('hostname'):
76
+ # meta_parts.append(metadata['hostname'])
77
+ if metadata.get('source-hostname'):
78
+ meta_parts.append(metadata['source-hostname'])
79
+ # if metadata.get('filedate'):
80
+ # meta_parts.append(metadata['filedate'])
81
+
82
+ if meta_parts:
83
+ lines.append(' | '.join(meta_parts))
84
+ lines.append('')
85
+
86
+ # 正文
87
+ if metadata.get('text'):
88
+ lines.append(metadata['text'])
89
+
90
+ return '\n'.join(lines)
91
+
92
+
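
A worked example of _format_article_text with a minimal metadata dict (values are made up for illustration, not part of the file):

meta = {'title': 'Example Title', 'author': 'Jane Doe', 'date': '2024-01-01',
        'source-hostname': 'example.com', 'text': 'Body paragraph.'}
print(_format_article_text(meta))
# Example Title
#
# Jane Doe | 2024-01-01 | example.com
#
# Body paragraph.
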
93
+ def fetch_url(fetch_request):
94
+ """
95
+ 从 URL 提取文本内容
96
+
97
+ Args:
98
+ fetch_request: 包含 url 字段的字典
99
+
100
+ Returns:
101
+ (响应字典, 状态码) 元组
102
+ """
103
+ url = fetch_request.get('url', '').strip()
104
+
105
+ # 验证 URL
106
+ if not url:
107
+ return {
108
+ 'success': False,
109
+ 'message': '缺少 URL 参数,请提供 url 字段'
110
+ }, 400
111
+
112
+ if not _is_valid_url(url):
113
+ return {
114
+ 'success': False,
115
+ 'message': f'无效的 URL 格式: {url}'
116
+ }, 400
117
+
118
+ # 安全检查:防止 SSRF 攻击
119
+ if _is_local_or_private(url):
120
+ return {
121
+ 'success': False,
122
+ 'message': '不允许访问本地或私有网络地址'
123
+ }, 400
124
+
125
+ # 提取文本和元数据
126
+ try:
127
+ from backend.access_log import log_fetch_url
128
+ log_fetch_url(url)
129
+
130
+ # 使用 requests 下载网页,设置浏览器 User-Agent 和请求头
131
+ headers = {
132
+ 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
133
+ 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
134
+ 'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
135
+ 'Accept-Encoding': 'gzip, deflate',  # 不声明 br:未安装 brotli 解码库时 response.text 会是未解压内容
136
+ 'Connection': 'keep-alive',
137
+ 'Upgrade-Insecure-Requests': '1',
138
+ }
139
+
140
+ # 下载网页内容(设置超时和请求头)
141
+ response = requests.get(url, headers=headers, timeout=10, allow_redirects=True)
142
+ response.raise_for_status()
143
+
144
+ # 检查响应内容类型
145
+ content_type = response.headers.get('Content-Type', '').lower()
146
+ if 'text/html' not in content_type and 'text/xml' not in content_type:
147
+ return {
148
+ 'success': False,
149
+ 'message': f'不支持的内容类型: {content_type},仅支持 HTML/XML 页面'
150
+ }, 400
151
+
152
+ # 使用 trafilatura 提取结构化数据(包含元数据和正文)
153
+ result_json = trafilatura.extract(
154
+ response.text,
155
+ url=url,
156
+ with_metadata=True,
157
+ output_format='json'
158
+ )
159
+
160
+ if not result_json:
161
+ print("⚠️ 无法提取页面内容")
162
+ return {
163
+ 'success': False,
164
+ 'message': '无法从网页中提取文本内容,可能不是文章页面或页面需要验证'
165
+ }, 400
166
+
167
+ # 解析 JSON 数据
168
+ metadata = json.loads(result_json)
169
+
170
+ # 检查是否有正文内容
171
+ if not metadata.get('text') or not metadata['text'].strip():
172
+ print("⚠️ 提取到元数据但无正文内容")
173
+ print("元数据:", json.dumps(metadata, ensure_ascii=False, indent=2))
174
+ return {
175
+ 'success': False,
176
+ 'message': '无法从网页中提取正文内容'
177
+ }, 400
178
+
179
+ # 格式化文本(元数据 + 正文)
180
+ formatted_text = _format_article_text(metadata)
181
+ original_char_count = len(formatted_text)
182
+
183
+ # 构建返回消息(如果截断了,添加提示)
184
+ message = None
185
+ # 检查并截断超长文本
186
+ if original_char_count > MAX_EXTRACTED_TEXT_LENGTH:
187
+ formatted_text = formatted_text[:MAX_EXTRACTED_TEXT_LENGTH]
188
+ message = f'内容较长,已截断为前 {MAX_EXTRACTED_TEXT_LENGTH} 字符(原始长度: {original_char_count} 字符)'
189
+
190
+ char_count = len(formatted_text)
191
+
192
+ # 打印提取结果
193
+ # print(formatted_text.split('\n')[:4])
194
+ # print(f"✓ 提取成功: {char_count} 字符" + (f" (截断前: {original_char_count} 字符)" if original_char_count > char_count else ""))
195
+ # 打印除正文外的metadata内容
196
+ metadata_less = metadata.copy()
197
+ metadata_less['raw_text'] = ''
198
+ metadata_less['text'] = ''
199
+ # print(json.dumps(metadata_less, ensure_ascii=False, indent=2))
200
+
201
+ return {
202
+ 'success': True,
203
+ 'text': formatted_text,
204
+ 'url': url,
205
+ 'char_count': char_count,
206
+ 'message': message
207
+ }, 200
208
+
209
+ except requests.exceptions.Timeout:
210
+ return {
211
+ 'success': False,
212
+ 'message': '请求超时,请检查网络连接或稍后重试'
213
+ }, 400
214
+ except requests.exceptions.RequestException as e:
215
+ return {
216
+ 'success': False,
217
+ 'message': f'无法访问 URL: {str(e)}'
218
+ }, 400
219
+ except Exception as e: # noqa: BLE001
220
+ error_response = handle_api_error('URL 文本提取失败', e)
221
+ return error_response, 500
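
The regex in _is_local_or_private covers the common dotted-quad private ranges but not every way a private host can be reached (other IPv6 ranges, DNS names that resolve to internal IPs, redirects, etc.). A stricter variant could resolve the hostname and check it with the stdlib ipaddress module; this is only a sketch of that idea, not part of the module above:

import ipaddress
import socket
from urllib.parse import urlparse

def is_private_address(url: str) -> bool:
    """Resolve the host and reject loopback/private/link-local/reserved addresses."""
    host = urlparse(url).hostname
    if not host:
        return True
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # unresolvable: treat as unsafe, same conservative default as above
    for info in infos:
        ip = ipaddress.ip_address(info[4][0].split('%')[0])  # strip IPv6 scope id if present
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return True
    return False
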
backend/api/folder.py ADDED
@@ -0,0 +1,102 @@
1
+ """文件夹管理 API"""
2
+ from backend.demo_folder import (
3
+ get_all_folders,
4
+ move_folder,
5
+ rename_folder,
6
+ delete_folder,
7
+ create_folder,
8
+ )
9
+ from backend.api.utils import (
10
+ get_demo_directory,
11
+ handle_api_error,
12
+ handle_api_success,
13
+ require_admin,
14
+ )
15
+
16
+
17
+ def _move_folder_internal(demo_dir, path, target_path):
18
+ """内部函数:移动文件夹"""
19
+ return move_folder(demo_dir, path, target_path)
20
+
21
+
22
+ @require_admin
23
+ def rename_folder_api(rename_request):
24
+ """
25
+ 重命名文件夹
26
+ 请求格式: { path: string, new_name: string }
27
+ """
28
+ path = rename_request.get('path')
29
+ new_name = rename_request.get('new_name')
30
+
31
+ if not path or not new_name:
32
+ return {
33
+ 'success': False,
34
+ 'message': 'Missing required parameter: path or new_name'
35
+ }
36
+
37
+ try:
38
+ demo_dir = get_demo_directory(create=False)
39
+ result = rename_folder(demo_dir, path, new_name)
40
+ return handle_api_success(result)
41
+ except Exception as e:
42
+ return handle_api_error('Rename failed', e)
43
+
44
+
45
+ @require_admin
46
+ def delete_folder_api(delete_request):
47
+ """
48
+ 删除文件夹(移动到.deleted目录)
49
+ 请求格式: { path: string }
50
+ """
51
+ path = delete_request.get('path')
52
+
53
+ if not path:
54
+ return {
55
+ 'success': False,
56
+ 'message': 'Missing required parameter: path'
57
+ }
58
+
59
+ try:
60
+ demo_dir = get_demo_directory(create=False)
61
+ result = delete_folder(demo_dir, path)
62
+ return handle_api_success(result)
63
+ except Exception as e:
64
+ return handle_api_error('Delete failed', e)
65
+
66
+
67
+ def list_all_folders():
68
+ """
69
+ 获取所有文件夹列表(用于移动操作的选择器)
70
+ 返回格式: { folders: string[] }
71
+ """
72
+ try:
73
+ demo_dir = get_demo_directory(create=False)
74
+ folders = get_all_folders(demo_dir)
75
+ return {'folders': folders}
76
+ except Exception as e:
77
+ handle_api_error("Failed to get folder list", e)
78
+ return {'folders': []}
79
+
80
+
81
+ @require_admin
82
+ def create_folder_api(create_request):
83
+ """
84
+ 创建新文件夹
85
+ 请求格式: { parent_path: string, folder_name: string }
86
+ """
87
+ parent_path = create_request.get('parent_path', '/')
88
+ folder_name = create_request.get('folder_name')
89
+
90
+ if not folder_name:
91
+ return {
92
+ 'success': False,
93
+ 'message': 'Missing required parameter: folder_name'
94
+ }
95
+
96
+ try:
97
+ demo_dir = get_demo_directory(create=False)
98
+ result = create_folder(demo_dir, parent_path, folder_name)
99
+ return handle_api_success(result)
100
+ except Exception as e:
101
+ return handle_api_error('Create failed', e)
102
+
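
Because require_admin reads the X-Admin-Token header from Flask's request object, the decorated handlers above can only be exercised inside a request context. A minimal test sketch (assumes the Flask app object is importable as app; the token value is illustrative):

import os
os.environ['INFORADAR_ADMIN_TOKEN'] = 'secret'
with app.test_request_context(headers={'X-Admin-Token': 'secret'}):
    result = create_folder_api({'parent_path': '/', 'folder_name': 'examples'})
    # result is whatever create_folder returns, passed through handle_api_success
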
backend/api/model_switch.py ADDED
@@ -0,0 +1,229 @@
1
+ """模型切换 API"""
2
+ import gc
3
+ import os
4
+ from typing import Optional
5
+
6
+ import torch
7
+ from backend import REGISTERED_MODELS
8
+ from backend.model_manager import project_registry
9
+ from backend.app_context import get_app_context
10
+ from backend.api.utils import require_admin
11
+
12
+
13
+ def get_available_models():
14
+ """获取所有可用的模型列表"""
15
+ return {
16
+ 'success': True,
17
+ 'models': list(REGISTERED_MODELS.keys())
18
+ }, 200
19
+
20
+
21
+ def _get_device_type() -> str:
22
+ """获取当前设备类型"""
23
+ if torch.cuda.is_available():
24
+ return "cuda"
25
+ elif hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
26
+ return "mps"
27
+ else:
28
+ return "cpu"
29
+
30
+
31
+ def _restore_env_vars(old_force_int8: Optional[str], old_force_bfloat16: Optional[str]) -> None:
32
+ """恢复环境变量配置"""
33
+ if old_force_int8 is not None:
34
+ os.environ['FORCE_INT8'] = old_force_int8
35
+ else:
36
+ os.environ.pop('FORCE_INT8', None)
37
+
38
+ if old_force_bfloat16 is not None:
39
+ os.environ['CPU_FORCE_BFLOAT16'] = old_force_bfloat16
40
+ else:
41
+ os.environ.pop('CPU_FORCE_BFLOAT16', None)
42
+
43
+
44
+ def get_current_model():
45
+ """获取当前使用的模型及量化配置"""
46
+ # 使用模块级上下文以获取持久化的模型状态
47
+ context = get_app_context(prefer_module_context=True)
48
+ device_type = _get_device_type()
49
+
50
+ return {
51
+ 'success': True,
52
+ 'model': context.model_name,
53
+ 'loading': context.model_loading,
54
+ 'device_type': device_type,
55
+ 'use_int8': os.environ.get('FORCE_INT8') == '1',
56
+ 'use_bfloat16': os.environ.get('CPU_FORCE_BFLOAT16') == '1'
57
+ }, 200
58
+
59
+
60
+ @require_admin
61
+ def switch_model(switch_request):
62
+ """
63
+ 切换模型(需要管理员权限)
64
+
65
+ Args:
66
+ switch_request: 切换请求字典,包含:
67
+ - model: 目标模型名称
68
+ - use_int8: 是否使用 INT8 量化(可选)
69
+ - use_bfloat16: 是否使用 bfloat16(可选,仅CPU)
70
+
71
+ Returns:
72
+ (响应字典, 状态码) 元组
73
+ """
74
+ if False: # 原在线切换逻辑保留,不执行;恢复时请删除此守卫并测试
75
+ target_model = switch_request.get('model')
76
+ use_int8 = switch_request.get('use_int8', False)
77
+ use_bfloat16 = switch_request.get('use_bfloat16', False)
78
+
79
+ # 验证请求
80
+ if not target_model:
81
+ return {
82
+ 'success': False,
83
+ 'message': 'Missing model parameter'
84
+ }, 400
85
+
86
+ # 检查模型是否可用
87
+ if target_model not in REGISTERED_MODELS:
88
+ available_models = list(REGISTERED_MODELS.keys())
89
+ return {
90
+ 'success': False,
91
+ 'message': f'Model {target_model} does not exist. Available models: {", ".join(available_models)}'
92
+ }, 404
93
+
94
+ # 获取设备类型
95
+ device_type = _get_device_type()
96
+
97
+ # 验证量化参数与设备兼容性
98
+ if use_int8 and device_type == "mps":
99
+ return {
100
+ 'success': False,
101
+ 'message': 'INT8 quantization is not supported on MPS device'
102
+ }, 400
103
+
104
+ if use_bfloat16 and device_type != "cpu":
105
+ return {
106
+ 'success': False,
107
+ 'message': 'bfloat16 quantization is only supported on CPU device'
108
+ }, 400
109
+
110
+ if use_int8 and use_bfloat16:
111
+ return {
112
+ 'success': False,
113
+ 'message': 'Cannot enable both INT8 and bfloat16 quantization'
114
+ }, 400
115
+
116
+ # 使用模块级上下文以确保状态修改持久化(不会被后续请求重置)
117
+ context = get_app_context(prefer_module_context=True)
118
+ current_model = context.model_name
119
+
120
+ # 保存当前环境变量配置(用于回滚)
121
+ old_force_int8 = os.environ.get('FORCE_INT8')
122
+ old_force_bfloat16 = os.environ.get('CPU_FORCE_BFLOAT16')
123
+
124
+ # 检查是否已经是目标模型且量化配置相同
125
+ current_int8 = os.environ.get('FORCE_INT8') == '1'
126
+ current_bfloat16 = os.environ.get('CPU_FORCE_BFLOAT16') == '1'
127
+
128
+ if (current_model == target_model and
129
+ current_int8 == use_int8 and
130
+ current_bfloat16 == use_bfloat16):
131
+ return {
132
+ 'success': True,
133
+ 'message': f'Already using model {target_model} (same quantization configuration)',
134
+ 'model': target_model
135
+ }, 200
136
+
137
+ # 检查模型是否正在加载中(初始加载或切换)
138
+ if context.model_loading:
139
+ return {
140
+ 'success': False,
141
+ 'message': 'Model is currently loading, please try again later'
142
+ }, 503
143
+
144
+ try:
145
+ # 标记开始加载
146
+ context.set_model_loading(True)
147
+ print(f"🔄 开始切换模型: {current_model} -> {target_model}")
148
+
149
+ # 设置新的量化环境变量
150
+ if use_int8:
151
+ os.environ['FORCE_INT8'] = '1'
152
+ print(" 设置量化: INT8")
153
+ else:
154
+ os.environ.pop('FORCE_INT8', None)
155
+
156
+ if use_bfloat16:
157
+ os.environ['CPU_FORCE_BFLOAT16'] = '1'
158
+ print(" 设置量化: bfloat16")
159
+ else:
160
+ os.environ.pop('CPU_FORCE_BFLOAT16', None)
161
+
162
+ # 卸载旧模型
163
+ if current_model and current_model in project_registry:
164
+ print(f" 卸载旧模型: {current_model}")
165
+ project_registry.unload(current_model)
166
+ gc.collect()
167
+ if device_type == "cuda":
168
+ torch.cuda.empty_cache()
169
+ elif device_type == "mps":
170
+ torch.mps.empty_cache()
171
+
172
+ # 加载新模型
173
+ print(f" 加载新模型: {target_model}")
174
+ project_registry.ensure_loaded(target_model)
175
+
176
+ # 更新当前模型
177
+ context.set_current_model(target_model)
178
+
179
+ print(f"✅ 模型切换成功: {target_model}")
180
+
181
+ return {
182
+ 'success': True,
183
+ 'message': f'Model switched to {target_model}',
184
+ 'model': target_model
185
+ }, 200
186
+
187
+ except KeyError:
188
+ # 模型不存在(虽然前面已经检查过,但以防万一)
189
+ print(f"❌ 模型切换失败: 模型 {target_model} 未注册")
190
+ # 回滚:恢复旧模型名称和环境变量
191
+ context.set_current_model(current_model)
192
+ _restore_env_vars(old_force_int8, old_force_bfloat16)
193
+ return {
194
+ 'success': False,
195
+ 'message': f'Model {target_model} is not registered'
196
+ }, 404
197
+
198
+ except Exception as e:
199
+ # 加载失败,尝试回滚
200
+ print(f"❌ 模型切换失败: {e}")
201
+ print(f" 尝试回滚到旧模型: {current_model}")
202
+
203
+ try:
204
+ # 回滚:恢复环境变量和重新加载旧模型
205
+ _restore_env_vars(old_force_int8, old_force_bfloat16)
206
+ if current_model:
207
+ project_registry.ensure_loaded(current_model)
208
+ context.set_current_model(current_model)
209
+ print(f"✅ 已回滚到旧模型: {current_model}")
210
+ except Exception as rollback_error:
211
+ print(f"⚠️ 回滚失败: {rollback_error}")
212
+
213
+ return {
214
+ 'success': False,
215
+ 'message': f'Model switch failed: {str(e)}'
216
+ }, 500
217
+
218
+ finally:
219
+ # 无论成功还是失败,都要清除加载标志
220
+ context.set_model_loading(False)
221
+ gc.collect()
222
+
223
+ return (
224
+ {
225
+ 'success': False,
226
+ 'message': '在线模型切换已禁用,请通过命令行 --model / --semantic_model 指定后重启服务',
227
+ },
228
+ 501,
229
+ )
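
With the if False: guard in place, switch_model always falls through to the 501 tuple above, while get_available_models and get_current_model stay live. For reference, get_current_model returns a (payload, 200) tuple shaped like this (model name and device are illustrative):

({'success': True, 'model': 'qwen', 'loading': False, 'device_type': 'cuda',
  'use_int8': False, 'use_bfloat16': False}, 200)
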
backend/api/openai_completions.py ADDED
@@ -0,0 +1,379 @@
1
+ """OpenAI 兼容 /v1/completions:语义分析同款模型续写,其余响应字段固定。"""
2
+
3
+ import gc
4
+ import queue
5
+ import threading
6
+ import time
7
+ import traceback
8
+ from typing import Any, Callable, Dict, List, Optional, Tuple
9
+
10
+ from backend.model_manager import _inference_lock, get_semantic_model_display_name
11
+ from backend.oom import exit_if_oom, is_oom_error
12
+ from backend.completion_generator import (
13
+ PromptTooLongError,
14
+ apply_chat_template_for_completion,
15
+ completion_cancel_requested,
16
+ generate_completion_text,
17
+ global_completion_stop_event,
18
+ inference_shutdown_event,
19
+ )
20
+ from backend.api.analyze import LOCK_WAIT_TIMEOUT, QueueTimeoutError
21
+ from backend.api.sse_utils import (
22
+ SSEProgressReporter,
23
+ send_completion_delta_event,
24
+ send_error_event,
25
+ send_result_event,
26
+ )
27
+ from backend.access_log import get_client_ip
28
+
29
+ # 单次续写 SSE:从进入流式生成器起算的墙钟上限(含排队等推理锁 + 生成)。
30
+ COMPLETION_WALL_CLOCK_TIMEOUT_SEC = 300.0
31
+
32
+
33
+ def _log_cmpl_issue(request_id: int, msg: str) -> None:
34
+ """续写非正常结束时一行说明(与成功时的 ``_log_completion_finished`` 二选一)。"""
35
+ print(f"\t⚠️ openai_completions req_id={request_id}: {msg}")
36
+
37
+
38
+ def _log_request(model: str, prompt: str, client_ip=None):
39
+ from backend.access_log import log_openai_completions_request
40
+ return log_openai_completions_request(model, prompt, client_ip)
41
+
42
+
43
+ def _build_response(
44
+ completion_text: str,
45
+ finish_reason: str,
46
+ prompt_tokens: int,
47
+ completion_tokens: int,
48
+ bpe_strings: List[Dict[str, Any]],
49
+ ):
50
+ """OpenAICompletionsResponse:choices + usage;info_radar 为续写 token 级数据。"""
51
+ total = prompt_tokens + completion_tokens
52
+ return {
53
+ "id": "cmpl-stub-info-radar",
54
+ "object": "text_completion",
55
+ "created": int(time.time()),
56
+ "model": get_semantic_model_display_name(),
57
+ "choices": [
58
+ {
59
+ "text": completion_text,
60
+ "index": 0,
61
+ "finish_reason": finish_reason,
62
+ }
63
+ ],
64
+ "usage": {
65
+ "prompt_tokens": prompt_tokens,
66
+ "completion_tokens": completion_tokens,
67
+ "total_tokens": total,
68
+ },
69
+ "info_radar": {
70
+ "bpe_strings": bpe_strings,
71
+ },
72
+ }
73
+
74
+
75
+ # 与 generate_completion_text 返回一致(末项 TTFT 秒;未生成时为 None)
76
+ CompletionRunResult = Tuple[str, str, int, int, List[Dict[str, Any]], Optional[float]]
77
+
78
+
79
+ def _completion_inference_after_lock(
80
+ prompt: str,
81
+ request_id: int,
82
+ lock_wait_time: float,
83
+ *,
84
+ stream_delta: Optional[Callable[[str, bool], None]] = None,
85
+ max_tokens: Optional[int] = None,
86
+ ) -> CompletionRunResult:
87
+ """
88
+ 在已持有推理锁的上下文中执行续写(旧版非流式路径的持锁体内逻辑)。
89
+ 流式可传 stream_delta;中止由 ``completion_cancel_requested()`` 统一判断。
90
+ """
91
+ from backend.access_log import log_openai_completions_start
92
+
93
+ log_openai_completions_start(request_id, lock_wait_time)
94
+ return generate_completion_text(prompt, stream_delta=stream_delta, max_tokens=max_tokens)
95
+
96
+
97
+ def _log_completion_finished(
98
+ request_id: int,
99
+ prompt_tokens: int,
100
+ completion_tokens: int,
101
+ elapsed: float,
102
+ ttft_s: Optional[float],
103
+ ) -> None:
104
+ """旧非流式分支在返回 JSON 前、流式在发出末条 result 前的同一行日志。
105
+
106
+ prompt tokens/s = prompt_tokens / TTFT;generate tokens/s = completion_tokens / (elapsed − TTFT)。
107
+ ``elapsed`` 为 SSE 起点至结束;与 TTFT 计时原点不完全一致时,吞吐率为近似值。
108
+ 无 TTFT(``ttft_s`` 为 ``None``)时不输出时间与吞吐字段。
109
+ """
110
+ if ttft_s is None:
111
+ tps_part = ""
112
+ else:
113
+ decode_s = elapsed - ttft_s
114
+ prompt_time_s = f"{ttft_s:.4f}" if ttft_s > 0 else "n/a"
115
+ gen_time_s = f"{decode_s:.4f}" if decode_s > 0 else "n/a"
116
+ prompt_part = f"{prompt_tokens / ttft_s:.2f}" if ttft_s > 0 else "n/a"
117
+ gen_part = (
118
+ f"{completion_tokens / decode_s:.2f}"
119
+ if completion_tokens and decode_s > 0
120
+ else "n/a"
121
+ )
122
+ tps_part = (
123
+ f", time= {prompt_time_s} / {gen_time_s}s, "
124
+ f"tokens/s= {prompt_part} / {gen_part}"
125
+ )
126
+ print(
127
+ f"\t📤 API openai_completions response: req_id={request_id}, "
128
+ f"prompt/generate tokens= {prompt_tokens} / {completion_tokens}"
129
+ f"{tps_part}"
130
+ )
131
+
132
+
133
+ def _generate_completion_events(
134
+ prompt: str,
135
+ request_id: int,
136
+ *,
137
+ max_tokens: Optional[int] = None,
138
+ ):
139
+ global_completion_stop_event.clear()
140
+ q: queue.Queue = queue.Queue()
141
+ start_time = time.perf_counter()
142
+
143
+ def run():
144
+ try:
145
+ lock_wait_start = time.perf_counter()
146
+ lock_acquired = _inference_lock.acquire(timeout=LOCK_WAIT_TIMEOUT)
147
+ if not lock_acquired:
148
+ q.put(("error", QueueTimeoutError(
149
+ f"排队等待超过 {LOCK_WAIT_TIMEOUT} 秒,服务繁忙,请稍后重试"
150
+ )))
151
+ return
152
+ lock_wait_time = time.perf_counter() - lock_wait_start
153
+ try:
154
+ def stream_delta(text: str, stream_end: bool) -> None:
155
+ if completion_cancel_requested():
156
+ return
157
+ q.put(("delta", text, stream_end))
158
+
159
+ result = _completion_inference_after_lock(
160
+ prompt,
161
+ request_id,
162
+ lock_wait_time,
163
+ stream_delta=stream_delta,
164
+ max_tokens=max_tokens,
165
+ )
166
+ finally:
167
+ _inference_lock.release()
168
+ gc.collect()
169
+ q.put(("result", result))
170
+ except Exception as e:
171
+ q.put(("error", e))
172
+
173
+ worker = threading.Thread(target=run, daemon=True)
174
+ worker.start()
175
+
176
+ try:
177
+ while True:
178
+ elapsed = time.perf_counter() - start_time
179
+ if elapsed >= COMPLETION_WALL_CLOCK_TIMEOUT_SEC:
180
+ try:
181
+ item = q.get_nowait()
182
+ except queue.Empty:
183
+ global_completion_stop_event.set()
184
+ _log_cmpl_issue(
185
+ request_id,
186
+ f"墙钟超时 {elapsed:.1f}s / 上限 {COMPLETION_WALL_CLOCK_TIMEOUT_SEC:.0f}s",
187
+ )
188
+ yield send_error_event(
189
+ f"续写处理超过 {COMPLETION_WALL_CLOCK_TIMEOUT_SEC:.0f} 秒(墙钟限制),已中止",
190
+ 504,
191
+ )
192
+ return
193
+ else:
194
+ try:
195
+ # 每 100ms 醒一次,检查是否已达到墙钟上限 COMPLETION_WALL_CLOCK_TIMEOUT_SEC
196
+ item = q.get(timeout=0.1)
197
+ except queue.Empty:
198
+ continue
199
+ kind = item[0]
200
+ if kind == "delta":
201
+ _, text, stream_end = item
202
+ if text or stream_end:
203
+ yield send_completion_delta_event(text, stream_end)
204
+ elif kind == "result":
205
+ (
206
+ _completion_text,
207
+ finish_reason,
208
+ prompt_tokens,
209
+ completion_tokens,
210
+ bpe_strings,
211
+ ttft_s,
212
+ ) = item[1]
213
+ elapsed = time.perf_counter() - start_time
214
+ if global_completion_stop_event.is_set() or inference_shutdown_event.is_set():
215
+ finish_reason = "abort"
216
+ if inference_shutdown_event.is_set():
217
+ _log_cmpl_issue(
218
+ request_id,
219
+ f"进程终止,续写中止 elapsed={elapsed:.2f}s "
220
+ f"tokens={prompt_tokens}/{completion_tokens}",
221
+ )
222
+ elif global_completion_stop_event.is_set():
223
+ _log_cmpl_issue(
224
+ request_id,
225
+ f"用户 Stop,续写中止 elapsed={elapsed:.2f}s "
226
+ f"tokens={prompt_tokens}/{completion_tokens}",
227
+ )
228
+ else:
229
+ _log_completion_finished(
230
+ request_id,
231
+ prompt_tokens,
232
+ completion_tokens,
233
+ elapsed,
234
+ ttft_s,
235
+ )
236
+ yield send_result_event(
237
+ _build_response(
238
+ _completion_text,
239
+ finish_reason,
240
+ prompt_tokens,
241
+ completion_tokens,
242
+ bpe_strings,
243
+ )
244
+ )
245
+ return
246
+ elif kind == "error":
247
+ err = item[1]
248
+ if isinstance(err, PromptTooLongError):
249
+ _log_cmpl_issue(request_id, f"prompt too long: {err}")
250
+ yield send_error_event(str(err), 400)
251
+ elif isinstance(err, QueueTimeoutError):
252
+ _log_cmpl_issue(request_id, f"排队超时: {err}")
253
+ yield send_error_event(str(err), 503)
254
+ else:
255
+ exit_if_oom(err, defer_seconds=1)
256
+ if is_oom_error(err):
257
+ yield send_error_event(str(err), 500)
258
+ return
259
+ _log_cmpl_issue(
260
+ request_id,
261
+ "".join(
262
+ traceback.format_exception(
263
+ type(err), err, err.__traceback__
264
+ )
265
+ ).strip(),
266
+ )
267
+ yield send_error_event(str(err), 500)
268
+ return
269
+ finally:
270
+ gc.collect()
271
+
272
+
273
+ def _completions_sse_response(
274
+ prompt: str,
275
+ request_id: int,
276
+ *,
277
+ max_tokens: Optional[int] = None,
278
+ ):
279
+ return SSEProgressReporter(
280
+ lambda: _generate_completion_events(prompt, request_id, max_tokens=max_tokens)
281
+ ).create_response()
282
+
283
+
284
+ def completions_stop():
285
+ """
286
+ 单用户串行:置位全局停止标志,使当前续写在 generate 与 SSE 回调中尽快结束。
287
+ 无需 body;新一次 POST /v1/completions 时会在流式生成器入口清除该标志。
288
+ """
289
+ global_completion_stop_event.set()
290
+ return {"ok": True}, 200
291
+
292
+
293
+ def completions_prompt(completions_prompt_request):
294
+ """
295
+ 将用户原文套用 chat template,返回实际送入续写接口的完整 prompt 字符串(JSON)。
296
+
297
+ Args:
298
+ completions_prompt_request: 含 model、prompt(用户输入),见 server_openai_definitions.yaml
299
+
300
+ Returns:
301
+ (dict with prompt_used, 200) 或校验/过长错误
302
+ """
303
+ if not isinstance(completions_prompt_request, dict):
304
+ completions_prompt_request = {}
305
+ model = completions_prompt_request.get("model")
306
+ prompt = completions_prompt_request.get("prompt")
307
+
308
+ if not model:
309
+ return {"success": False, "message": "缺少 model 字段"}, 400
310
+ if prompt is None:
311
+ return {"success": False, "message": "缺少 prompt 字段"}, 400
312
+ if not isinstance(prompt, str):
313
+ return {"success": False, "message": "prompt 必须为字符串"}, 400
314
+
315
+ system_opt: Optional[str]
316
+ if "system" not in completions_prompt_request:
317
+ system_opt = None
318
+ else:
319
+ system_raw = completions_prompt_request.get("system")
320
+ if not isinstance(system_raw, str):
321
+ return {"success": False, "message": "system 必须为字符串"}, 400
322
+ system_opt = system_raw
323
+
324
+ client_ip = get_client_ip()
325
+ from backend.access_log import log_openai_completions_prompt_request
326
+
327
+ log_openai_completions_prompt_request(
328
+ model,
329
+ user_prompt=prompt,
330
+ system=system_opt,
331
+ client_ip=client_ip,
332
+ )
333
+
334
+ try:
335
+ prompt_used = apply_chat_template_for_completion(prompt, system_opt)
336
+ except PromptTooLongError as e:
337
+ return {"success": False, "message": str(e)}, 400
338
+
339
+ return {"prompt_used": prompt_used}, 200
340
+
341
+
342
+ def completions(completions_request):
343
+ """
344
+ 文本补写:与 analyze_semantic 共用推理锁与 semantic 模型;响应恒为 text/event-stream(SSE)。
345
+ ``prompt`` 须为已确定的完整模型输入(需 chat template 时请先调 POST /v1/completions/prompt)。
346
+
347
+ Args:
348
+ completions_request: 含 model、prompt 等,见 server_openai_definitions.yaml
349
+
350
+ Returns:
351
+ SSE Response;校验失败时 (错误体, 400/503/500)
352
+ """
353
+ if not isinstance(completions_request, dict):
354
+ completions_request = {}
355
+ model = completions_request.get("model")
356
+ prompt = completions_request.get("prompt")
357
+
358
+ if not model:
359
+ return {"success": False, "message": "缺少 model 字段"}, 400
360
+ if prompt is None:
361
+ return {"success": False, "message": "缺少 prompt 字段"}, 400
362
+ if not isinstance(prompt, str):
363
+ return {"success": False, "message": "prompt 必须为字符串"}, 400
364
+
365
+ max_tokens_raw = completions_request.get("max_tokens")
366
+ max_tokens: Optional[int]
367
+ if max_tokens_raw is None:
368
+ max_tokens = None
369
+ elif type(max_tokens_raw) is not int:
370
+ return {"success": False, "message": "max_tokens 须为正整数"}, 400
371
+ elif max_tokens_raw <= 0:
372
+ return {"success": False, "message": "max_tokens 须 > 0"}, 400
373
+ else:
374
+ max_tokens = max_tokens_raw
375
+
376
+ client_ip = get_client_ip()
377
+ request_id = _log_request(model, prompt, client_ip)
378
+
379
+ return _completions_sse_response(prompt, request_id, max_tokens=max_tokens)
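
The /v1/completions response is always an SSE stream whose data: payloads follow sse_utils.py: type=delta carries incremental text, type=result the final OpenAI-style body (plus info_radar), type=error a failure. A minimal client sketch; the host/port and model value are assumptions:

import json
import requests

resp = requests.post(
    "http://localhost:7860/v1/completions",
    json={"model": "info-radar", "prompt": "你好", "max_tokens": 64},
    stream=True,
)
for raw in resp.iter_lines(decode_unicode=True):
    if not raw or not raw.startswith("data: "):
        continue
    event = json.loads(raw[len("data: "):])
    if event["type"] == "delta":
        print(event.get("text", ""), end="", flush=True)
    elif event["type"] == "result":
        final = event["data"]  # completion payload with choices/usage/info_radar
        break
    elif event["type"] == "error":
        raise RuntimeError(event["message"])
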
backend/api/prediction_attribute.py ADDED
@@ -0,0 +1,79 @@
1
+ """预测归因 API"""
2
+ import gc
3
+ import time
4
+
5
+ from backend.model_manager import _inference_lock
6
+ from backend.oom import exit_if_oom
7
+ from backend.prediction_attributor import analyze_prediction_attribution
8
+ from backend.api.analyze import LOCK_WAIT_TIMEOUT
9
+ from backend.access_log import get_client_ip, log_prediction_attribute_request
10
+
11
+
12
+ def prediction_attribute(attribution_request):
13
+ """
14
+ 对上下文文本的下一 token 预测做归因分析。
15
+
16
+ Args:
17
+ attribution_request: 包含 context 和 target_prediction 的字典
18
+
19
+ Returns:
20
+ (响应字典, 状态码) 元组
21
+ """
22
+ context = attribution_request.get("context")
23
+ target_prediction = attribution_request.get("target_prediction")
24
+ model = attribution_request.get("model")
25
+
26
+ if context is None:
27
+ return {"success": False, "message": "Missing required field: context"}, 400
28
+ if not isinstance(context, str):
29
+ return {"success": False, "message": "context must be a string"}, 400
30
+ if context == "":
31
+ return {"success": False, "message": "Missing required field: context"}, 400
32
+
33
+ if target_prediction is not None and not isinstance(target_prediction, str):
34
+ return {"success": False, "message": "target_prediction must be a string"}, 400
35
+ if target_prediction == "":
36
+ return {"success": False, "message": "target_prediction must not be empty"}, 400
37
+
38
+ if model is None:
39
+ return {"success": False, "message": "Missing required field: model"}, 400
40
+ if not isinstance(model, str):
41
+ return {"success": False, "message": "model must be a string"}, 400
42
+ if model not in ("base", "instruct"):
43
+ return {"success": False, "message": 'model must be "base" or "instruct"'}, 400
44
+
45
+ client_ip = get_client_ip()
46
+ start_time = time.perf_counter()
47
+ request_id = log_prediction_attribute_request(context, target_prediction, model, client_ip)
48
+
49
+ lock_acquired = _inference_lock.acquire(timeout=LOCK_WAIT_TIMEOUT)
50
+ if not lock_acquired:
51
+ return {
52
+ "success": False,
53
+ "message": (
54
+ f"Queue wait exceeded {LOCK_WAIT_TIMEOUT} seconds; "
55
+ "server is busy, please try again later."
56
+ ),
57
+ }, 503
58
+
59
+ try:
60
+ result = analyze_prediction_attribution(context, target_prediction, model=model)
61
+ except ValueError as e:
62
+ return {"success": False, "message": str(e)}, 400
63
+ except Exception as e:
64
+ import traceback
65
+ traceback.print_exc()
66
+ exit_if_oom(e, defer_seconds=1)
67
+ return {"success": False, "message": str(e)}, 500
68
+ finally:
69
+ _inference_lock.release()
70
+ gc.collect()
71
+
72
+ elapsed = time.perf_counter() - start_time
73
+ tokens = len(result.get("token_attribution", []))
74
+ print(
75
+ f"\t📤 API prediction_attribute response: req_id={request_id}, "
76
+ f"tokens={tokens}, response_time={elapsed:.4f}s"
77
+ )
78
+
79
+ return {"success": True, **result}, 200
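
A request body that passes the validation above looks like this (text values are illustrative; model must be "base" or "instruct", and target_prediction may be omitted or null, in which case its handling is up to analyze_prediction_attribution):

attribution_request = {
    "context": "The capital of France is",
    "target_prediction": " Paris",
    "model": "base",
}
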
backend/api/sse_utils.py ADDED
@@ -0,0 +1,181 @@
1
+ """Server-Sent Events (SSE) 工具模块"""
2
+ import json
3
+ import queue
4
+ import time
5
+ from typing import Callable, Generator, Optional, Tuple
6
+ from flask import Response
7
+
8
+
9
+ class SSEProgressReporter:
10
+ """SSE进度报告器"""
11
+
12
+ def __init__(self, generator_func: Callable):
13
+ """
14
+ 初始化SSE进度报告器
15
+
16
+ Args:
17
+ generator_func: 生成器函数,用于生成SSE事件
18
+ """
19
+ self.generator_func = generator_func
20
+
21
+ def generate(self):
22
+ """生成SSE事件流"""
23
+ try:
24
+ for event in self.generator_func():
25
+ yield event
26
+ except Exception as e:
27
+ # 发送错误事件
28
+ error_data = {
29
+ 'type': 'error',
30
+ 'message': str(e)
31
+ }
32
+ yield f"data: {json.dumps(error_data)}\n\n"
33
+
34
+ def create_response(self) -> Response:
35
+ """创建SSE响应"""
36
+ return Response(
37
+ self.generate(),
38
+ mimetype='text/event-stream',
39
+ headers={
40
+ 'Cache-Control': 'no-cache',
41
+ 'X-Accel-Buffering': 'no', # 禁用nginx缓冲
42
+ 'Connection': 'keep-alive'
43
+ }
44
+ )
45
+
46
+
47
+ def send_progress_event(step: int, total_steps: int, stage: str, percentage: Optional[int] = None, message: Optional[str] = None) -> str:
48
+ """
49
+ 生成SSE进度事件
50
+
51
+ Args:
52
+ step: 当前步骤 (1-based)
53
+ total_steps: 总步骤数
54
+ stage: 阶段名称 (encoding, inference, processing)
55
+ percentage: 可选的进度百分比 (0-100),仅在需要显示百分比的阶段提供
56
+ message: 可选的进度消息
57
+
58
+ Returns:
59
+ SSE格式的事件字符串
60
+ """
61
+ data = {
62
+ 'type': 'progress',
63
+ 'step': step,
64
+ 'total_steps': total_steps,
65
+ 'stage': stage
66
+ }
67
+ if percentage is not None:
68
+ data['percentage'] = percentage
69
+ if message:
70
+ data['message'] = message
71
+ return f"data: {json.dumps(data)}\n\n"
72
+
73
+
74
+ def send_result_event(result: dict) -> str:
75
+ """
76
+ 生成SSE结果事件
77
+
78
+ Args:
79
+ result: 分析结果字典
80
+
81
+ Returns:
82
+ SSE格式的事件字符串
83
+ """
84
+ data = {
85
+ 'type': 'result',
86
+ 'data': result
87
+ }
88
+ return f"data: {json.dumps(data)}\n\n"
89
+
90
+
91
+ def send_completion_delta_event(text: str, stream_end: bool) -> str:
92
+ """续写流式:与 analyze 的 progress/result 并列,type=delta。"""
93
+ data = {
94
+ "type": "delta",
95
+ "text": text,
96
+ }
97
+ if stream_end:
98
+ data["stream_end"] = True
99
+ return f"data: {json.dumps(data)}\n\n"
100
+
101
+
102
+ def send_prompt_used_event(prompt_used: str) -> str:
103
+ """续写流式:在首条 delta 之前下发实际送入模型的 prompt 原文。"""
104
+ data = {
105
+ "type": "prompt_used",
106
+ "prompt_used": prompt_used,
107
+ }
108
+ return f"data: {json.dumps(data)}\n\n"
109
+
110
+
111
+ def send_error_event(message: str, status_code: Optional[int] = None) -> str:
112
+ """
113
+ 生成SSE错误事件
114
+
115
+ Args:
116
+ message: 错误消息
117
+ status_code: 可选 HTTP 状态码,供非流式封装解析
118
+
119
+ Returns:
120
+ SSE格式的事件字符串
121
+ """
122
+ data = {'type': 'error', 'message': message}
123
+ if status_code is not None:
124
+ data['status_code'] = status_code
125
+ return f"data: {json.dumps(data)}\n\n"
126
+
127
+
128
+ def consume_progress_queue(
129
+ progress_queue: queue.Queue,
130
+ analysis_done,
131
+ start_time: float,
132
+ timeout_seconds: float,
133
+ timeout_label: str = "分析",
134
+ ) -> Generator[Tuple[str, str], None, None]:
135
+ """
136
+ 消费进度队列,yield (kind, event_str)。
137
+ kind: 'progress' | 'timeout' | 'done'
138
+ event_str: SSE 格式字符串(timeout 时含错误信息,done 时为空)
139
+ """
140
+ done_received = False
141
+ last_progress_info = None
142
+
143
+ while True:
144
+ elapsed = time.perf_counter() - start_time
145
+ if elapsed >= timeout_seconds:
146
+ progress_str = f" | {last_progress_info}" if last_progress_info else ""
147
+ print(f"⏱️ {timeout_label}超时: 处理时长 {elapsed:.2f}s 超过限制 {timeout_seconds}s,已放弃{progress_str}")
148
+ yield ('timeout', send_error_event(f"分析超时:处理时长超过 {timeout_seconds} 秒限制,已放弃"))
149
+ return
150
+
151
+ try:
152
+ event_data = progress_queue.get(timeout=0.1)
153
+ event_type = event_data[0]
154
+ if event_type == 'progress':
155
+ _, step, total_steps, stage, percentage = event_data
156
+ if total_steps > 0:
157
+ last_progress_info = f"step={step}/{total_steps}"
158
+ else:
159
+ last_progress_info = f"step={step}"
160
+ if stage:
161
+ last_progress_info += f" stage={stage}"
162
+ if percentage is not None:
163
+ last_progress_info += f" {percentage}%"
164
+ yield ('progress', send_progress_event(step, total_steps, stage, percentage))
165
+ elif event_type == 'done':
166
+ done_received = True
167
+ while not progress_queue.empty():
168
+ try:
169
+ remaining = progress_queue.get_nowait()
170
+ if remaining[0] == 'progress':
171
+ _, step, total_steps, stage, percentage = remaining
172
+ yield ('progress', send_progress_event(step, total_steps, stage, percentage))
173
+ except queue.Empty:
174
+ break
175
+ yield ('done', '')
176
+ return
177
+ except queue.Empty:
178
+ if analysis_done.is_set() and done_received:
179
+ yield ('done', '')
180
+ return
181
+
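
Each helper above emits a single data: line per event. For example, send_progress_event produces:

send_progress_event(2, 3, "inference", percentage=45)
# -> 'data: {"type": "progress", "step": 2, "total_steps": 3, "stage": "inference", "percentage": 45}\n\n'
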
backend/api/static.py ADDED
@@ -0,0 +1,60 @@
1
+ """静态文件路由"""
2
+ import mimetypes
3
+ from pathlib import Path
4
+ from urllib.parse import unquote
5
+
6
+ from flask import Response, redirect, abort, request
7
+ from werkzeug.utils import safe_join
8
+
9
+ from backend.access_log import log_page_load, log_demo_file
10
+
11
+
12
+ def _read_static_file(directory: str, path: str) -> Response:
13
+ """读取静态文件并返回 Response,避免 send_from_directory 在 ASGI/a2wsgi 下
14
+ 流式传输导致的 Content-Length 不匹配(RuntimeError: Response content shorter than Content-Length)。
15
+ """
16
+ base = Path(directory).resolve()
17
+ safe_path = safe_join(str(base), path)
18
+ if safe_path is None:
19
+ abort(404)
20
+ full_path = Path(safe_path)
21
+ if not full_path.is_file() or not str(full_path.resolve()).startswith(str(base)):
22
+ abort(404)
23
+ content = full_path.read_bytes()
24
+ mimetype, _ = mimetypes.guess_type(path)
25
+ mimetype = mimetype or "application/octet-stream"
26
+ return Response(content, mimetype=mimetype, headers={"Content-Length": str(len(content))})
27
+
28
+
29
+ def register_static_routes(app):
30
+ """注册静态文件路由"""
31
+
32
+ @app.route('/')
33
+ def redir():
34
+ target = 'client/index.html'
35
+ if request.query_string:
36
+ target += '?' + request.query_string.decode()
37
+ return redirect(target)
38
+
39
+ @app.route('/client/<path:path>')
40
+ def send_static(path):
41
+ """serves all files from ./client/dist/ to ``/client/<path:path>``"""
42
+ if path.endswith('.html'):
43
+ log_page_load(path)
44
+ return _read_static_file('client/dist', path)
45
+
46
+ @app.route('/demo/<path:path>')
47
+ def send_demo(path):
48
+ """serves all demo files from the demo dir to ``/demo/<path:path>``"""
49
+ from backend.app_context import get_data_dir
50
+ data_dir = get_data_dir()
51
+ log_demo_file(path)
52
+ try:
53
+ decoded_path = unquote(path)
54
+ return _read_static_file(str(data_dir), decoded_path)
55
+ except Exception:
56
+ try:
57
+ return _read_static_file(str(data_dir), path)
58
+ except Exception:
59
+ abort(404)
60
+
backend/api/utils.py ADDED
@@ -0,0 +1,118 @@
1
+ """API 工具函数"""
2
+ import math
3
+ import os
4
+ import traceback
5
+ from functools import wraps
6
+ from pathlib import Path
7
+ from flask import request, jsonify
8
+
9
+
10
+ def round_to_sig_figs(x: float, n: int = 7) -> float:
11
+ """将浮点数舍入为 n 位有效数字。0 或非有限值原样返回。"""
12
+ if x == 0 or not math.isfinite(x):
13
+ return x
14
+ return float(f"{x:.{n}g}")
15
+
16
+
17
+ def get_demo_directory(create=False):
18
+ """获取 demo 目录路径"""
19
+ from backend.app_context import get_demo_directory as _get_demo_dir
20
+ return _get_demo_dir(create=create)
21
+
22
+
23
+ def handle_api_error(operation_name: str, error: Exception) -> dict:
24
+ """
25
+ 统一的 API 错误处理
26
+
27
+ Args:
28
+ operation_name: 操作名称(如 'Save failed'、'Delete failed')
29
+ error: 异常对象
30
+
31
+ Returns:
32
+ 标准错误响应字典
33
+ """
34
+ error_msg = f'{operation_name}: {str(error)}'
35
+ print(f"❌ {error_msg}")
36
+ traceback.print_exc()
37
+ return {
38
+ 'success': False,
39
+ 'message': error_msg
40
+ }
41
+
42
+
43
+ def handle_api_success(result: dict, operation_name: str = None) -> dict:
44
+ """
45
+ 处理 API 成功响应,打印日志
46
+
47
+ Args:
48
+ result: 操作结果字典
49
+ operation_name: 可选的操作名称,用于日志
50
+
51
+ Returns:
52
+ 结果字典
53
+ """
54
+ if result.get('success'):
55
+ if operation_name:
56
+ print(f"✓ {operation_name}")
57
+ elif result.get('message'):
58
+ print(f"✓ {result.get('message')}")
59
+ else:
60
+ message = result.get('message', 'Operation failed')
61
+ print(f"❌ {message}")
62
+ return result
63
+
64
+
65
+ def get_admin_token() -> str:
66
+ """
67
+ 获取管理员token(从环境变量读取)
68
+
69
+ Returns:
70
+ 管理员token字符串,如果未设置则返回None
71
+ """
72
+ return os.environ.get('INFORADAR_ADMIN_TOKEN')
73
+
74
+
75
+ def validate_admin_token(request_token: str) -> tuple[bool, str]:
76
+ """
77
+ 验证管理员token是否有效
78
+
79
+ Args:
80
+ request_token: 要验证的token
81
+
82
+ Returns:
83
+ (是否有效, 错误信息)
84
+ """
85
+ admin_token = get_admin_token()
86
+
87
+ # 如果未配置INFORADAR_ADMIN_TOKEN,返回未启用状态
88
+ if admin_token is None:
89
+ return False, 'Admin features are not enabled'
90
+
91
+ # 验证token
92
+ if request_token == admin_token:
93
+ return True, ''
94
+ else:
95
+ return False, 'Invalid admin token'
96
+
97
+
98
+ def require_admin(f):
99
+ """
100
+ 装饰器:要求管理员权限才能访问的API
101
+
102
+ 检查请求头中的 X-Admin-Token 是否与配置的 INFORADAR_ADMIN_TOKEN 匹配
103
+ 如果未配置 INFORADAR_ADMIN_TOKEN,视为全是普通用户,拒绝所有写操作
104
+ """
105
+ @wraps(f)
106
+ def wrapper(*args, **kwargs):
107
+ request_token = request.headers.get('X-Admin-Token')
108
+ is_valid, error_message = validate_admin_token(request_token)
109
+
110
+ if not is_valid:
111
+ return {
112
+ 'success': False,
113
+ 'message': 'Admin permission required'
114
+ }, 403
115
+
116
+ return f(*args, **kwargs)
117
+ return wrapper
118
+
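
The expected behaviour of validate_admin_token, spelled out (the token value is illustrative; the function re-reads INFORADAR_ADMIN_TOKEN on every call):

import os

os.environ['INFORADAR_ADMIN_TOKEN'] = 's3cret'
assert validate_admin_token('s3cret') == (True, '')
assert validate_admin_token('wrong') == (False, 'Invalid admin token')

del os.environ['INFORADAR_ADMIN_TOKEN']
assert validate_admin_token('s3cret') == (False, 'Admin features are not enabled')
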
backend/app_context.py ADDED
@@ -0,0 +1,110 @@
1
+ """应用上下文管理
2
+
3
+ 使用类级别单例模式,提供进程级共享状态。
4
+ """
5
+
6
+ import sys
7
+ from pathlib import Path
8
+ from typing import Optional
9
+ from argparse import Namespace
10
+
11
+
12
+ class AppContext:
13
+ """
14
+ 应用上下文(进程级单例)
15
+
16
+ 通过 AppContext.init() 初始化,通过 AppContext.get() 获取。
17
+ 单例模式确保整个进程共享同一个上下文,避免模块重新导入导致的状态不一致。
18
+ """
19
+
20
+ _instance: Optional['AppContext'] = None
21
+
22
+ @classmethod
23
+ def get(cls) -> 'AppContext':
24
+ """获取上下文单例(必须先调用 init)"""
25
+ if cls._instance is None:
26
+ raise RuntimeError("AppContext 未初始化,请先调用 AppContext.init()")
27
+ return cls._instance
28
+
29
+ @classmethod
30
+ def init(cls, args: Namespace, data_dir: Path) -> 'AppContext':
31
+ """
32
+ 初始化上下文单例(幂等操作)
33
+
34
+ 如果已初始化则返回现有实例,确保模块重新导入时不会覆盖状态。
35
+ """
36
+ if cls._instance is not None:
37
+ return cls._instance
38
+ cls._instance = cls(args, data_dir)
39
+ gc = getattr(args, "gradient_checkpointing", True)
40
+ print(
41
+ f"[Info Radar] gradient_checkpointing={'on' if gc else 'off'}",
42
+ file=sys.stderr,
43
+ flush=True,
44
+ )
45
+ return cls._instance
46
+
47
+ @classmethod
48
+ def is_initialized(cls) -> bool:
49
+ """检查上下文是否已初始化"""
50
+ return cls._instance is not None
51
+
52
+ def __init__(self, args: Namespace, data_dir: Path):
53
+ """私有构造函数,请使用 AppContext.init()"""
54
+ self.args = args
55
+ self.data_dir = data_dir
56
+ self._model_loading = True # 初始时处于加载状态
57
+ self._current_model_name = getattr(args, 'model', None)
58
+
59
+ @property
60
+ def model_name(self) -> Optional[str]:
61
+ """当前模型名称"""
62
+ return self._current_model_name
63
+
64
+ @property
65
+ def model_loading(self) -> bool:
66
+ """模型是否正在加载"""
67
+ return self._model_loading
68
+
69
+ def set_current_model(self, model_name: str):
70
+ """设置当前模型名称"""
71
+ self._current_model_name = model_name
72
+
73
+ def set_model_loading(self, loading: bool):
74
+ """设置模型加载状态"""
75
+ self._model_loading = loading
76
+
77
+ def get_demo_dir(self, create: bool = False) -> Path:
78
+ """获取 demo 目录路径"""
79
+ from backend.data_utils import get_demo_dir
80
+ return get_demo_dir(self.data_dir, create=create)
81
+
82
+
83
+ # ============= 兼容性接口(供旧代码平滑迁移)=============
84
+
85
+ def get_app_context(prefer_module_context: bool = False) -> AppContext:
86
+ """获取应用上下文(兼容旧接口,prefer_module_context 参数已忽略)"""
87
+ return AppContext.get()
88
+
89
+
90
+ def get_args() -> Namespace:
91
+ """获取命令行参数"""
92
+ return AppContext.get().args
93
+
94
+
95
+ def get_verbose() -> bool:
96
+ """是否输出详细调试信息(由 --verbose 控制)"""
97
+ try:
98
+ return getattr(get_args(), "verbose", False)
99
+ except RuntimeError:
100
+ return False
101
+
102
+
103
+ def get_data_dir() -> Path:
104
+ """获取数据目录"""
105
+ return AppContext.get().data_dir
106
+
107
+
108
+ def get_demo_directory(create: bool = False) -> Path:
109
+ """获取 demo 目录"""
110
+ return AppContext.get().get_demo_dir(create=create)
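
A minimal sketch of the intended lifecycle (the Namespace fields shown are only the ones this module reads; real startup code builds args from the CLI parser):

from argparse import Namespace
from pathlib import Path
from backend.app_context import AppContext, get_data_dir

args = Namespace(model='base', verbose=False, gradient_checkpointing=True)
ctx = AppContext.init(args, Path('data'))   # idempotent; later calls return the same instance
assert ctx is AppContext.get()
print(ctx.model_name, ctx.model_loading)    # 'base' True, until the loader clears the flag
print(get_data_dir())                       # data
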
backend/class_register.py ADDED
@@ -0,0 +1,16 @@
1
+ REGISTERED_MODELS = {}
2
+
3
+
4
+ def register_model(name):
5
+ """
6
+ 注册模型类的装饰器
7
+
8
+ 自动将注册的模型名保存到类属性 _registered_model_name 中,
9
+ 避免在子类初始化时重复指定模型名
10
+ """
11
+ def decorator(cls):
12
+ REGISTERED_MODELS[name] = cls
13
+ # 将注册的模型名保存到类属性中
14
+ cls._registered_model_name = name
15
+ return cls
16
+ return decorator
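
Usage sketch of the decorator (class and model name are illustrative):

@register_model('demo-model')
class DemoModelProject:
    pass

assert REGISTERED_MODELS['demo-model'] is DemoModelProject
assert DemoModelProject._registered_model_name == 'demo-model'
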
backend/completion_generator.py ADDED
@@ -0,0 +1,558 @@
1
+ """
2
+ OpenAI /v1/completions:core_generate_from_text 为唯一续写入口。
3
+
4
+ Chat 模板拼装见 apply_chat_template_for_completion(供 POST /v1/completions/prompt);
5
+ POST /v1/completions 的 prompt 须为已确定的模型输入字符串。
6
+ 整段上下文 token 上限(prompt + 续写合计)为本模块 ``completion_max_token_length``;
7
+ 可选 max_tokens 限制续写长度,且与 prompt 之和不超过该上限。
8
+ """
9
+
10
+ import signal
11
+ import sys
12
+ import threading
13
+ import time
14
+ from typing import Any, Callable, Dict, List, Optional, Tuple
15
+
16
+ import torch
17
+ from transformers import StoppingCriteria, StoppingCriteriaList, TextStreamer
18
+
19
+ from backend.api.utils import round_to_sig_figs
20
+ from backend.app_context import get_verbose
21
+ from backend.device import DeviceManager
22
+ from backend.model_manager import ensure_semantic_slot_ready
23
+ from backend.pred_topk_format import pred_topk_pairs_from_probs_1d
24
+ from backend.runtime_config import DEFAULT_TOPK
25
+
26
+ # 续写路径:prompt + 续写合计不得超过该 token 数(与语义分析 runtime 无关)。
27
+ completion_max_token_length = 1000
28
+
29
+ # 特殊 token 亦视为分析/展示内容,故不跳过。
30
+ _COMPLETION_DECODE_SKIP_SPECIAL = False
31
+
32
+ # 进程收到 SIGTERM / SIGINT 时置位。
33
+ inference_shutdown_event = threading.Event()
34
+
35
+ # 单用户串行:用户 POST /v1/completions/stop、或 SSE 墙钟超时,与 inference_shutdown 一起在续写路径检查。
36
+ # 新一次 POST /v1/completions(SSE 入口)时由 openai_completions clear。
37
+ global_completion_stop_event = threading.Event()
38
+
39
+
40
+ def completion_cancel_requested() -> bool:
41
+ """是否应停止当前续写(进程退出或全局停止)。"""
42
+ return inference_shutdown_event.is_set() or global_completion_stop_event.is_set()
43
+
44
+
45
+ def register_inference_shutdown_handlers() -> None:
46
+ """注册 SIGTERM / SIGINT:置位 inference_shutdown_event,使 model.generate 尽快在下一步停止。
47
+
48
+ 应在主线程、进程启动早期调用一次(如 server 加载时)。SIGINT 在置位后抛出 KeyboardInterrupt,便于开发态 Ctrl+C 退出。
49
+ """
50
+
51
+ def _on_sigterm(signum: int, frame: Any) -> None:
52
+ inference_shutdown_event.set()
53
+
54
+ def _on_sigint(signum: int, frame: Any) -> None:
55
+ inference_shutdown_event.set()
56
+ raise KeyboardInterrupt
57
+
58
+ try:
59
+ signal.signal(signal.SIGTERM, _on_sigterm)
60
+ except (ValueError, OSError):
61
+ pass
62
+ try:
63
+ signal.signal(signal.SIGINT, _on_sigint)
64
+ except (ValueError, OSError):
65
+ pass
66
+
67
+
68
+ class PromptTooLongError(ValueError):
69
+ """prompt 过长或占满上下文导致无法续写(``input_len >= ctx_limit`` 时由 ``core_generate_from_text`` 抛出)。"""
70
+
71
+
72
+ def _completion_without_generate(
73
+ prompt_tokens: int,
74
+ ) -> Tuple[str, str, int, int, List[Dict[str, Any]], Optional[float]]:
75
+ """取消续写时未进入 ``model.generate`` 的返回(与前端 ``abort`` 展示一致)。"""
76
+ return "", "abort", prompt_tokens, 0, [], None
77
+
78
+
79
+ def _print_completion_stream_delta(text: str, stream_end: bool) -> None:
80
+ """接收 TextStreamer 切分好的增量片段,由本模块打印(与默认 TextStreamer 输出一致)。"""
81
+ # 仅在verbose时打印
82
+ if get_verbose():
83
+ print(text, flush=True, end="" if not stream_end else None)
84
+
85
+
86
+ def _compose_stream_delta(
87
+ stream_delta: Optional[Callable[[str, bool], None]],
88
+ ) -> Callable[[str, bool], None]:
89
+ """
90
+ 将可选的 SSE/外部 stream_delta 与本地 verbose 打印组合:二者互不替代,可同时生效。
91
+ """
92
+ def on_delta(text: str, stream_end: bool) -> None:
93
+ if stream_delta is not None:
94
+ stream_delta(text, stream_end)
95
+ _print_completion_stream_delta(text, stream_end)
96
+
97
+ return on_delta
98
+
99
+
100
+ class _DeltaTextStreamer(TextStreamer):
101
+ """继承 put/end 的增量切分逻辑,只把片段交给回调,不直接 print。"""
102
+
103
+ def __init__(
104
+ self,
105
+ tokenizer,
106
+ on_delta: Callable[[str, bool], None],
107
+ *,
108
+ skip_prompt: bool = False,
109
+ **decode_kwargs: Any,
110
+ ) -> None:
111
+ super().__init__(tokenizer, skip_prompt=skip_prompt, **decode_kwargs)
112
+ self._on_delta = on_delta
113
+
114
+ def on_finalized_text(self, text: str, stream_end: bool = False) -> None:
115
+ self._on_delta(text, stream_end)
116
+
117
+
118
+ class _CancelOnEventStoppingCriteria(StoppingCriteria):
119
+ """每步检查 ``completion_cancel_requested()``,尽快结束 generate。"""
120
+
121
+ def __call__(
122
+ self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs: Any
123
+ ) -> torch.BoolTensor:
124
+ # StoppingCriteria 约定:返回与 batch 等长的 bool 向量,True 表示该行本步停止生成。
125
+ batch_size = input_ids.shape[0]
126
+ cancel_requested = completion_cancel_requested()
127
+ return torch.full(
128
+ (batch_size,),
129
+ fill_value=cancel_requested,
130
+ device=input_ids.device,
131
+ dtype=torch.bool,
132
+ )
133
+
134
+
135
+ def _stack_scores_to_cpu(
136
+ scores: Tuple[torch.Tensor, ...],
137
+ ) -> torch.Tensor:
138
+ """将 ``generate(..., output_scores=True)`` 的 scores 元组沿 batch 维拼成 ``[n, vocab]``,并一次搬到 CPU。"""
139
+ if not scores:
140
+ return torch.empty(0, 0)
141
+ # 每步形状为 (batch, vocab),greedy batch=1 时 cat(dim=0) -> (n, vocab)
142
+ return torch.cat(scores, dim=0).detach().cpu()
143
+
144
+
145
+ def _print_completion_warning(msg: str) -> None:
146
+ print(msg, file=sys.stderr, flush=True)
147
+
148
+
149
+ def _completion_one_token_debug(tokenizer, token_id: int) -> str:
150
+ """续写路径调试用:单 token 的 id 与 decode(repr 便于观察空白/换行)。"""
151
+ decoded = tokenizer.decode([token_id], skip_special_tokens=False)
152
+ return f"id={token_id}, decode={decoded!r}"
153
+
154
+
155
+ def _warn_decode_reencode_mismatch(
156
+ tokenizer,
157
+ *,
158
+ n: int,
159
+ mismatch_count: int,
160
+ first: int,
161
+ new_cpu: torch.Tensor,
162
+ reencoded: torch.Tensor,
163
+ ) -> None:
164
+ """token 序列不一致时警告(文案与原 RuntimeError 一致),随后走增量 decode offset。"""
165
+ g0 = int(new_cpu[first].item())
166
+ r0 = int(reencoded[first].item())
167
+ lines = [
168
+ "续写段 decode→encode 与 generate 的 token 序列不一致,无法使用 offset_mapping。",
169
+ f" 共 {n} token,其中 {mismatch_count} 处 id 不同(首处 index={first})。",
170
+ " 首处:",
171
+ f" generate {_completion_one_token_debug(tokenizer, g0)}",
172
+ f" reencode {_completion_one_token_debug(tokenizer, r0)}",
173
+ ]
174
+ nxt = first + 1
175
+ if nxt < n:
176
+ g1 = int(new_cpu[nxt].item())
177
+ r1 = int(reencoded[nxt].item())
178
+ lines.extend(
179
+ [
180
+ f" 后一处 (index={nxt}):",
181
+ f" generate {_completion_one_token_debug(tokenizer, g1)}",
182
+ f" reencode {_completion_one_token_debug(tokenizer, r1)}",
183
+ ]
184
+ )
185
+ _print_completion_warning("\n".join(lines))
186
+
187
+
188
+ def _warn_decode_reencode_length_mismatch(
189
+ new_cpu: torch.Tensor,
190
+ reencoded: torch.Tensor,
191
+ ) -> None:
192
+ msg = (
193
+ "续写段 decode→encode 与 generate 的 token 序列不一致(长度不同),无法使用 offset_mapping。\n"
194
+ f" new_ids: shape={tuple(new_cpu.shape)}\n"
195
+ f" reencode: shape={tuple(reencoded.shape)}"
196
+ )
197
+ _print_completion_warning(msg)
198
+
199
+
200
+ def _lcp_prefix_len(a: str, b: str) -> int:
201
+ """``a`` 与 ``b`` 的最长公共前缀长度(Python ``str`` 下标,Unicode 标量)。 """
202
+ k, n = 0, min(len(a), len(b))
203
+ while k < n and a[k] == b[k]:
204
+ k += 1
205
+ return k
206
+
207
+
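
A couple of concrete cases for _lcp_prefix_len, since the incremental-offset path below leans on it (illustrative only, not part of the file):

assert _lcp_prefix_len("hello world", "hello there") == 6   # common prefix "hello "
assert _lcp_prefix_len("abc", "abcdef") == 3
assert _lcp_prefix_len("abc", "xyz") == 0
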
208
+ def _verbose_incremental_offset_step(
209
+ *,
210
+ step_1based: int,
211
+ n_tokens: int,
212
+ token_id: int,
213
+ tokenizer,
214
+ skip: bool,
215
+ offset: Tuple[int, int],
216
+ matched: int,
217
+ curr_len: int,
218
+ raw: str,
219
+ ) -> None:
220
+ """verbose:本步 ``offset``/``raw``;LCP 未盖满前缀时附 ``single_decode``。"""
221
+ if not get_verbose():
222
+ return
223
+ s, e = offset
224
+ raw_show = raw if len(raw) <= 240 else raw[:237] + "..."
225
+ line = (
226
+ f"[incremental-offset] step {step_1based}/{n_tokens} id={token_id} "
227
+ f"offset=[{s},{e}) raw={raw_show!r}"
228
+ )
229
+ if matched < curr_len:
230
+ one = tokenizer.decode([token_id], skip_special_tokens=skip)
231
+ line += f" (bpe mismatch) single_decode={one!r}"
232
+ _print_completion_warning(line)
233
+
234
+
235
+ def _print_full_decode_text_mismatch(full_decode: str, text: str) -> None:
236
+ """整段 ``decode(ids)`` 与 ``completion_text`` 不等时打印一行级诊断。"""
237
+ lines = [
238
+ "续写段整段 decode 与 completion_text 不一致:",
239
+ f" len(decode)={len(full_decode)}, len(text)={len(text)}",
240
+ ]
241
+ n = min(len(full_decode), len(text))
242
+ first_diff = next((k for k in range(n) if full_decode[k] != text[k]), None)
243
+ if first_diff is not None:
244
+ a, b = full_decode[first_diff], text[first_diff]
245
+ lines.append(f" 首处 index={first_diff}: {a!r} vs {b!r}")
246
+ elif len(full_decode) != len(text):
247
+ lines.append(" 同源码点前缀一致,仅长度不同。")
248
+ _print_completion_warning("\n".join(lines))
249
+
250
+
251
+ def _completion_incremental_offsets_and_raws(
252
+ tokenizer,
253
+ new_ids: torch.Tensor,
254
+ completion_text: str,
255
+ *,
256
+ skip: bool,
257
+ ) -> Tuple[List[Tuple[int, int]], List[str]]:
258
+ """
259
+ 慢路径:解码器码点。第 ``i`` 步 ``curr = decode(ids[:i+1])``,
260
+ ``matched = LCP(curr, completion_text)``(自 0 全量比较,避免 decode 非单调时增量 LCP 偏差);
261
+ ``offset``:若 ``matched < len(curr)``(前缀与全文前沿未对齐),则 ``(off_left, off_left)``;
262
+ 否则 ``(off_left, len(curr))``。``raw`` 恒为 ``curr[off_left:]``。
263
+ 未对齐时 BPE 与全文对不齐,乱码段码点数、``offset`` 无可靠展示语义;右界收拢为左界仅为避免
264
+ 前端按 ``completion_text`` 切片校验 ``raw`` 时报错(零宽区间不取切片)。
265
+ ``off_left``:首步 ``0``;若上一步 ``matched == len(curr)``,则 ``off_left = matched``;若上一步
266
+ ``matched < len(curr)``,则冻结 ``off_left`` 直至再次出现完全对齐步。
267
+ 须 ``decode(ids) == completion_text``,否则报错。
268
+ """
269
+ ids = [int(t) for t in new_ids.tolist()]
270
+ n_tok = len(ids)
271
+
272
+ offsets: List[Tuple[int, int]] = []
273
+ raws: List[str] = []
274
+ off_left = 0
275
+
276
+ # 每步对前缀 ``ids[:i+1]`` 整段 decode;重复切片为语义所需,非疏忽。
277
+ for i in range(n_tok):
278
+ curr = tokenizer.decode(ids[: i + 1], skip_special_tokens=skip)
279
+ matched = _lcp_prefix_len(curr, completion_text)
280
+ curr_len = len(curr)
281
+ raw = curr[off_left:]
282
+ # 未对齐:乱码长度与 offset 无可靠意义;右界=左界,避免前端 text[s:e]==raw 类校验失败。
283
+ if matched < curr_len:
284
+ off = (off_left, off_left)
285
+ else:
286
+ off = (off_left, curr_len)
287
+ # _verbose_incremental_offset_step(
288
+ # step_1based=i + 1,
289
+ # n_tokens=n_tok,
290
+ # token_id=ids[i],
291
+ # tokenizer=tokenizer,
292
+ # skip=skip,
293
+ # offset=off,
294
+ # matched=matched,
295
+ # curr_len=curr_len,
296
+ # raw=raw,
297
+ # )
298
+ offsets.append(off)
299
+ raws.append(raw)
300
+ if matched == len(curr):
301
+ off_left = matched
302
+
303
+ full = tokenizer.decode(ids, skip_special_tokens=skip)
304
+ if full != completion_text:
305
+ _print_full_decode_text_mismatch(full, completion_text)
306
+ raise RuntimeError(
307
+ "续写段 decode(ids) 与 completion_text 不一致,无法填解码器坐标 offset/raw。"
308
+ )
309
+ return offsets, raws
310
+
311
+
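The offset rule documented above is easiest to see on a toy trace. The sketch below is illustrative only: a hard-coded list of per-step decoded prefixes stands in for tokenizer.decode, and the strings are invented for demonstration.

completion_text = "héllo wörld"
# Hypothetical per-step prefixes; step 2 contains a malformed partial decode.
step_prefixes = ["héllo", "héllo w\ufffd", "héllo wörld"]

def lcp(a: str, b: str) -> int:
    k, n = 0, min(len(a), len(b))
    while k < n and a[k] == b[k]:
        k += 1
    return k

off_left = 0
for curr in step_prefixes:
    matched = lcp(curr, completion_text)
    raw = curr[off_left:]
    # Misaligned step: zero-width offset so a text[s:e] == raw check cannot fail.
    offset = (off_left, off_left) if matched < len(curr) else (off_left, len(curr))
    print(offset, repr(raw))   # (0, 5) / (5, 5) / (5, 11)
    if matched == len(curr):
        off_left = matched     # advance only after a fully aligned step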
312
+ def _build_generated_bpe_strings(
313
+ tokenizer,
314
+ new_ids: torch.Tensor,
315
+ scores_logits: torch.Tensor,
316
+ top_k: int,
317
+ completion_text: str,
318
+ ) -> List[Dict[str, Any]]:
319
+ """
320
+ 续写段每个生成 token 的信息密度风格条目:offset/raw(相对续写全文)、real_topk、pred_topk。
321
+
322
+ new_ids:1D int64,须已在 CPU,与 generate 输出一致。
323
+ scores_logits:float,形状 ``[n, vocab]``,须已在 CPU(避免逐步 GPU softmax / .item() 往返)。
324
+ completion_text:与 ``tokenizer.decode(new_ids, skip_special_tokens=...)`` 使用同一套参数得到的续写原文(调用方已 decode 一次,避免重复)。
325
+ 若整段 encode 与 ``new_ids`` 一致则用 ``offset_mapping``(快路径,offset 为 ``completion_text`` 内下标);
326
+ 否则用增量 decode(慢路径):LCP 未盖满前缀时 ``offset`` 为 ``(off_left, off_left)``(见该函数注释:主要为避免前端切片校验报错),否则 ``(off_left, len(curr))``;``raw`` 恒为 ``curr[off_left:]``。
327
+ """
328
+ n = int(new_ids.numel())
329
+ if n == 0:
330
+ return []
331
+ if scores_logits.dim() != 2 or scores_logits.shape[0] != n:
332
+ raise RuntimeError(
333
+ f"scores_logits 形状与 new_ids 不一致:scores_logits.shape={tuple(scores_logits.shape)}, n={n}"
334
+ )
335
+ top_k = min(top_k, int(scores_logits.shape[-1]))
336
+ new_cpu = new_ids.detach().cpu()
337
+ skip = _COMPLETION_DECODE_SKIP_SPECIAL
338
+
339
+ enc = tokenizer(
340
+ completion_text,
341
+ return_tensors="pt",
342
+ return_offsets_mapping=True,
343
+ add_special_tokens=False,
344
+ )
345
+ reencoded = enc["input_ids"][0]
346
+ ids_match = reencoded.shape == new_cpu.shape and torch.equal(reencoded, new_cpu)
347
+
348
+ incremental_raws: Optional[List[str]]
349
+ if ids_match:
350
+ offset_mapping = enc["offset_mapping"][0].tolist()
351
+ incremental_raws = None
352
+ else:
353
+ if reencoded.shape != new_cpu.shape:
354
+ _warn_decode_reencode_length_mismatch(new_cpu, reencoded)
355
+ else:
356
+ diff = reencoded != new_cpu
357
+ first = int(torch.where(diff)[0][0].item())
358
+ _warn_decode_reencode_mismatch(
359
+ tokenizer,
360
+ n=n,
361
+ mismatch_count=int(diff.sum().item()),
362
+ first=first,
363
+ new_cpu=new_cpu,
364
+ reencoded=reencoded,
365
+ )
366
+ print("已使用增量 decode 对齐路径;结果不受影响。", flush=True)
367
+ offset_mapping, incremental_raws = _completion_incremental_offsets_and_raws(
368
+ tokenizer, new_cpu, completion_text, skip=skip
369
+ )
370
+
371
+ out: List[Dict[str, Any]] = []
372
+ for step in range(n):
373
+ logits = scores_logits[step]
374
+ probs = torch.softmax(logits, dim=-1)
375
+ tid = int(new_ids[step].item())
376
+ s, e = offset_mapping[step]
377
+ if incremental_raws is not None:
378
+ raw = incremental_raws[step]
379
+ else:
380
+ raw = completion_text[s:e] if s < e else ""
381
+ out.append(
382
+ {
383
+ "offset": [s, e],
384
+ "raw": raw,
385
+ "real_topk": [0, round_to_sig_figs(float(probs[tid].item()))],
386
+ "pred_topk": pred_topk_pairs_from_probs_1d(probs, tokenizer, top_k),
387
+ }
388
+ )
389
+ return out
390
+
391
+
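For reference, a rough sketch of the shape that pred_topk_pairs_from_probs_1d (defined in backend/pred_topk_format.py, not shown here) is expected to produce per step: pairs of decoded token text and probability. The helper below is a hypothetical stand-in, not the project's implementation; rounding and decode flags may differ.

import torch

def pred_topk_pairs_sketch(probs_1d: torch.Tensor, tokenizer, top_k: int):
    # Illustrative only: top-k over a 1-D probability vector, decoded one id at a time.
    vals, ids = torch.topk(probs_1d, k=min(top_k, int(probs_1d.shape[-1])))
    tokens = tokenizer.batch_decode(
        [[int(i)] for i in ids.tolist()], skip_special_tokens=False
    )
    return [(tok, round(float(v), 5)) for tok, v in zip(tokens, vals.tolist())]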
392
+ def core_generate_from_text(
393
+ formatted_text: str,
394
+ *,
395
+ stream_delta: Optional[Callable[[str, bool], None]] = None,
396
+ max_tokens: Optional[int] = None,
397
+ ) -> Tuple[str, str, int, int, List[Dict[str, Any]], Optional[float]]:
398
+ """
399
+ 对一段已确定的模型输入字符串做自回归续写(默认贪心;函数内 ``_use_low_temp_sampling`` 可临时切到低温采样)。
400
+
401
+ 编码后 prompt token 数不得超过上下文上限;续写步数不超过「剩余上下文」且不超过可选 ``max_tokens``。
402
+
403
+ 中止条件见 ``completion_cancel_requested()``(进程信号、全局停止含用户 Stop / 墙钟超时)。
404
+
405
+ Args:
406
+ stream_delta: 可选;若提供则额外调用(如 SSE)。本地 verbose 打印由 ``_print_completion_stream_delta`` 单独控制,与是否传入 stream_delta 无关。
407
+ max_tokens: 可选;正整数,限制本次最多生成多少个新 token(与 ``min(max_tokens, 上限 − prompt)`` 取小)。省略则用尽剩余上下文额度。
408
+
409
+ Returns:
410
+ (续写文本, finish_reason, prompt_tokens, completion_tokens, 续写段 bpe_strings, ttft_s)。
411
+ ttft_s 为自 ``model.generate`` 起至首次产出续写片段的秒数;仅取消时为 ``None``。
412
+ """
413
+ tokenizer, model, device = ensure_semantic_slot_ready()
414
+ ctx_limit = completion_max_token_length
415
+
416
+ model.eval()
417
+ enc = tokenizer(formatted_text, return_tensors="pt")
418
+ input_ids = enc["input_ids"].to(device)
419
+ input_len = input_ids.shape[1]
420
+ n = int(input_len)
421
+ if n >= ctx_limit:
422
+ raise PromptTooLongError(
423
+ "Prompt too long: "
424
+ f"{n} tokens (context limit is {ctx_limit} tokens; prompt plus completion must not exceed this limit)."
425
+ )
426
+
427
+ remaining = ctx_limit - n
428
+ if max_tokens is None:
429
+ effective_max_new = remaining
430
+ else:
431
+ effective_max_new = min(max_tokens, remaining)
432
+ if get_verbose():
433
+ print(
434
+ f"📌 completion: 推理原文 (tokens={input_len}, ctx_limit={ctx_limit}, max_new={effective_max_new}):\n"
435
+ f"{formatted_text}",
436
+ end="", # 不换行, 用于和后续打印推理结果拼在一起
437
+ )
438
+
439
+ prompt_tokens = int(input_len)
440
+ # 主要防止:排队等推理锁期间用户已取消,拿到锁后在此短路,避免无意义进入 generate。
441
+ # 墙钟 / 进程信号等其它情况较少见。
442
+ if completion_cancel_requested():
443
+ return _completion_without_generate(prompt_tokens)
444
+
445
+ try:
446
+ base_on_delta = _compose_stream_delta(stream_delta)
447
+ ttft_seconds: Optional[float] = None
448
+ gen_start_t0 = 0.0
449
+
450
+ def on_delta_with_ttft(text: str, stream_end: bool) -> None:
451
+ nonlocal ttft_seconds
452
+ if ttft_seconds is None:
453
+ ttft_seconds = time.perf_counter() - gen_start_t0
454
+ base_on_delta(text, stream_end)
455
+
456
+ streamer = _DeltaTextStreamer(
457
+ tokenizer,
458
+ on_delta_with_ttft,
459
+ skip_prompt=True,
460
+ skip_special_tokens=_COMPLETION_DECODE_SKIP_SPECIAL,
461
+ )
462
+ # 临时实验:置 True 启用低温采样;默认 False 为贪心解码(可复现)。
463
+ _use_low_temp_sampling = False
464
+ _low_temperature = 0.2
465
+
466
+ gen_kw: Dict[str, Any] = {
467
+ "input_ids": input_ids,
468
+ "max_new_tokens": effective_max_new,
469
+ "return_dict_in_generate": True,
470
+ "output_scores": True,
471
+ "streamer": streamer,
472
+ "stopping_criteria": StoppingCriteriaList([_CancelOnEventStoppingCriteria()]),
473
+ }
474
+ if _use_low_temp_sampling:
475
+ gen_kw["do_sample"] = True
476
+ gen_kw["temperature"] = _low_temperature
477
+ else:
478
+ gen_kw["do_sample"] = False
479
+
480
+ gen_start_t0 = time.perf_counter()
481
+ with torch.inference_mode():
482
+ outputs = model.generate(**gen_kw)
483
+ if device.type == "cuda":
484
+ torch.cuda.synchronize(device)
485
+ elif device.type == "mps":
486
+ torch.mps.synchronize()
487
+
488
+ gen = outputs.sequences
489
+ new_ids = gen[0, input_len:].detach().cpu().contiguous()
490
+ text = tokenizer.decode(new_ids, skip_special_tokens=_COMPLETION_DECODE_SKIP_SPECIAL)
491
+
492
+ if outputs.scores is None:
493
+ raise RuntimeError("model.generate 未返回 scores(需 output_scores=True)")
494
+
495
+ if new_ids.numel() == 0:
496
+ bpe_strings: List[Dict[str, Any]] = []
497
+ else:
498
+ # [len, vocab_size] 的 float32 logits
499
+ # 内存开销 1000 token x qwen 150k ~= 600MB
500
+ scores_cpu = _stack_scores_to_cpu(outputs.scores)
501
+ bpe_strings = _build_generated_bpe_strings(
502
+ tokenizer, new_ids, scores_cpu, DEFAULT_TOPK, text
503
+ )
504
+
505
+ # 续写增量已由 _print_completion_stream_delta 打印,此处不再重复打印全文
506
+ if completion_cancel_requested():
507
+ # 用户 Stop / 进程中止等:StoppingCriteria 提前结束时 new_ids 常少于上限,
508
+ # 勿用 "stop"(OpenAI 语义多为自然结束),否则前端会误显示为 EOS。
509
+ finish_reason = "abort"
510
+ else:
511
+ finish_reason = "length" if new_ids.numel() >= effective_max_new else "stop"
512
+ prompt_tokens = int(input_len)
513
+ completion_tokens = int(new_ids.numel())
514
+ return text, finish_reason, prompt_tokens, completion_tokens, bpe_strings, ttft_seconds
515
+ finally:
516
+ DeviceManager.clear_cache(device)
517
+
518
+
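_CancelOnEventStoppingCriteria used above is defined elsewhere in this file; a minimal sketch of what such a cancel-driven criterion typically looks like is shown below. The class name and constructor argument here are hypothetical, not the project's actual implementation.

import torch
from transformers import StoppingCriteria

class CancelRequestedStoppingCriteria(StoppingCriteria):
    """Illustrative only: ends generation as soon as a cancel flag is raised."""

    def __init__(self, cancel_requested):
        # cancel_requested: zero-argument callable, e.g. completion_cancel_requested.
        self._cancel_requested = cancel_requested

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        return bool(self._cancel_requested())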
519
+ def apply_chat_template_for_completion(
520
+ user_content: str,
521
+ system: Optional[str] = None,
522
+ ) -> str:
523
+ """
524
+ 将单条 user 文本套用到 tokenizer chat template,返回实际送入 core_generate_from_text 的字符串。
525
+
526
+ 调用方未传入 ``system``(即 ``None``)时仅拼装单条 user 消息;传入字符串时(含 ``\"\"``、仅空白)
527
+ 原样作为 chat template 的 system 段,不做裁剪或改写。长度与上下文上限由 ``core_generate_from_text``
528
+ 在生成前校验。
529
+ """
530
+ tokenizer, _, _ = ensure_semantic_slot_ready()
531
+ if system is None:
532
+ messages = [{"role": "user", "content": user_content}]
533
+ else:
534
+ messages = [
535
+ {"role": "system", "content": system},
536
+ {"role": "user", "content": user_content},
537
+ ]
538
+ return tokenizer.apply_chat_template(
539
+ messages,
540
+ tokenize=False,
541
+ add_generation_prompt=True,
542
+ enable_thinking=False,
543
+ )
544
+
545
+
546
+ def generate_completion_text(
547
+ prompt: str,
548
+ stream_delta: Optional[Callable[[str, bool], None]] = None,
549
+ *,
550
+ max_tokens: Optional[int] = None,
551
+ ) -> Tuple[str, str, int, int, List[Dict[str, Any]], Optional[float]]:
552
+ """
553
+ ``prompt`` 须为已确定的完整模型输入(不再在服务端套用 chat template)。
554
+
555
+ 流式可传 stream_delta;中止由 ``completion_cancel_requested()`` 统一判断。
556
+ ``max_tokens`` 为可选的正整数续写上限(与 API 约定一致)。
557
+ """
558
+ return core_generate_from_text(prompt, stream_delta=stream_delta, max_tokens=max_tokens)
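A hypothetical end-to-end call of the two public entry points above. It assumes the application context is initialised and the semantic-slot model can be loaded; the prompt text is invented.

prompt = apply_chat_template_for_completion("Explain BPE in one sentence.", system=None)
text, finish_reason, n_prompt, n_completion, bpe_strings, ttft = generate_completion_text(
    prompt, max_tokens=64
)
print(finish_reason, n_prompt, n_completion, ttft)
print(bpe_strings[0] if bpe_strings else "no tokens generated")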
backend/data_utils.py ADDED
@@ -0,0 +1,97 @@
1
+ import json
2
+ import os
3
+ from pathlib import Path
4
+ from typing import Any, Dict, List, Optional
5
+
6
+ DEFAULT_DATA_DIR = Path(os.path.abspath("data/demo/public"))
7
+
8
+
9
+ def resolve_data_dir(dir_arg: Optional[str]) -> Path:
10
+ """
11
+ Resolve the base data directory from CLI args or fall back to demo/public.
12
+ """
13
+ if dir_arg:
14
+ return Path(dir_arg).expanduser().absolute()
15
+ return DEFAULT_DATA_DIR
16
+
17
+
18
+ def get_demo_dir(data_dir: Path, create: bool = False) -> Path:
19
+ """Return the demo directory under the given data dir, optionally creating it."""
20
+ # data_dir 此时默认就是 data/demo/public 的绝对路径
21
+ demo_dir = data_dir
22
+ if create:
23
+ demo_dir.mkdir(parents=True, exist_ok=True)
24
+ return demo_dir
25
+
26
+
27
+ def list_demo_files(demo_dir: Path) -> List[Dict[str, str]]:
28
+ """Return sorted demo metadata from a directory. Missing dirs result in empty list."""
29
+ if not demo_dir.exists():
30
+ return []
31
+
32
+ demo_list = []
33
+ for file_path in demo_dir.glob("*.json"):
34
+ demo_list.append(
35
+ {
36
+ "name": file_path.stem,
37
+ "file": file_path.name,
38
+ }
39
+ )
40
+ demo_list.sort(key=lambda item: item["name"])
41
+ return demo_list
42
+
43
+
44
+ def sanitize_demo_name(name: str) -> str:
45
+ """Remove unsafe characters from a demo name to create a safe filename."""
46
+ unsafe_chars = ['/', '\\', ':', '*', '?', '"', '<', '>', '|']
47
+ safe_name = name or ""
48
+ for char in unsafe_chars:
49
+ safe_name = safe_name.replace(char, '_')
50
+ safe_name = safe_name.strip(' .')
51
+ return safe_name[:200]
52
+
53
+
54
+ def save_demo_payload(demo_dir: Path, name: str, data: Dict[str, Any], path: str = "", overwrite: bool = False) -> Dict[str, Any]:
55
+ """
56
+ Persist an AnalyzeResponse payload as a demo JSON file.
57
+
58
+ Args:
59
+ demo_dir: demo目录的绝对路径
60
+ name: demo文件名(不含扩展名)
61
+ data: 要保存的数据
62
+ path: 保存路径,可以是 ""、"/" 或以 "/" 开头的路径,默认为根目录
63
+ overwrite: 是否覆盖已存在的文件,默认为False
64
+ """
65
+ from backend.path_utils import resolve_demo_path
66
+
67
+ safe_name = sanitize_demo_name(name)
68
+ if not safe_name:
69
+ return {"success": False, "message": "文件名无效"}
70
+
71
+ # 解析目标路径
72
+ target_dir = resolve_demo_path(demo_dir, path)
73
+ if target_dir is None:
74
+ return {"success": False, "message": f"无效的保存路径: {path}"}
75
+
76
+ # 确保目标目录存在
77
+ target_dir.mkdir(parents=True, exist_ok=True)
78
+ file_path = target_dir / f"{safe_name}.json"
79
+
80
+ # 检查文件是否存在
81
+ if file_path.exists() and not overwrite:
82
+ return {
83
+ "success": False,
84
+ "exists": True,
85
+ "message": f'文件 "{safe_name}.json" 已存在',
86
+ "file": file_path.name,
87
+ }
88
+
89
+ with open(file_path, "w", encoding="utf-8") as f:
90
+ json.dump(data, f, ensure_ascii=False, indent=2)
91
+
92
+ return {
93
+ "success": True,
94
+ "message": f'Demo "{name}" 保存成功',
95
+ "file": file_path.name,
96
+ }
97
+
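A small usage sketch for the helpers above; the payload fields and the "/examples" sub-path are invented for illustration.

data_dir = resolve_data_dir(None)                 # defaults to data/demo/public
demo_dir = get_demo_dir(data_dir, create=True)
payload = {"bpe_strings": [], "model": "example-model"}
result = save_demo_payload(demo_dir, "my: demo?", payload, path="/examples", overwrite=False)
print(result)                                     # name is sanitized to "my_ demo_"
print(list_demo_files(demo_dir))                  # only top-level *.json files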
backend/demo_folder.py ADDED
@@ -0,0 +1,339 @@
1
+ """
2
+ Demo文件夹操作模块
3
+ 提供文件夹和文件的列表、移动、重命名、删除等功能
4
+ """
5
+ import os
6
+ import shutil
7
+ import time
8
+ from pathlib import Path
9
+ from typing import Dict, List, Optional
10
+
11
+ from backend.path_utils import (
12
+ normalize_path,
13
+ check_path_in_demo_dir,
14
+ validate_demo_path,
15
+ resolve_demo_path
16
+ )
17
+
18
+
19
+ # ==================== 辅助函数 ====================
20
+
21
+ def _normalize_path(path: str) -> str:
22
+ """统一处理路径:将空字符串转换为 "/" (向后兼容包装器)"""
23
+ return normalize_path(path)
24
+
25
+
26
+ def _build_api_path(parent_path: str, item_name: str) -> str:
27
+ """构建API路径格式(统一使用 "/" 开头的格式)"""
28
+ if parent_path and parent_path != "/":
29
+ return f"{parent_path}/{item_name}"
30
+ return f"/{item_name}"
31
+
32
+
33
+ def _error_response(message: str) -> Dict[str, any]:
34
+ """统一错误响应格式"""
35
+ return {"success": False, "message": message}
36
+
37
+
38
+ def _success_response(message: str) -> Dict[str, any]:
39
+ """统一成功响应格式"""
40
+ return {"success": True, "message": message}
41
+
42
+
43
+ def _get_timestamped_name(base_name: str, extension: str = "") -> str:
44
+ """生成带时间戳的名称"""
45
+ timestamp = int(time.time())
46
+ return f"{base_name}_{timestamp}{extension}"
47
+
48
+
49
+ def _ensure_deleted_dir(demo_dir: Path) -> Path:
50
+ """确保.deleted目录存在并返回路径"""
51
+ deleted_dir = demo_dir.resolve() / '.deleted'
52
+ deleted_dir.mkdir(parents=True, exist_ok=True)
53
+ return deleted_dir
54
+
55
+
56
+ def _validate_json_file(file_path: Path) -> Optional[str]:
57
+ """验证文件存在且为JSON文件,返回错误消息或None"""
58
+ if not file_path.exists():
59
+ return "文件不存在"
60
+ if not file_path.is_file():
61
+ return "路径不是文件"
62
+ if file_path.suffix != '.json':
63
+ return "只能操作JSON文件"
64
+ return None
65
+
66
+
67
+ def _validate_folder(folder_path: Path) -> Optional[str]:
68
+ """验证文件夹存在,返回错误消息或None"""
69
+ if not folder_path.exists():
70
+ return "文件夹不存在"
71
+ if not folder_path.is_dir():
72
+ return "路径不是文件夹"
73
+ return None
74
+
75
+
76
+ # ==================== 文件系统操作函数 ====================
77
+ # 核心路径处理函数已移至 backend/path_utils.py
78
+
79
+ def list_demo_items(demo_dir: Path, path: str = "") -> Dict[str, any]:
80
+ """返回指定路径下的文件夹和文件列表,自动忽略隐藏文件夹"""
81
+ normalized_path = _normalize_path(path)
82
+ target_dir = resolve_demo_path(demo_dir, normalized_path)
83
+
84
+ if not target_dir or not target_dir.exists():
85
+ return {"path": normalized_path, "items": []}
86
+
87
+ items = []
88
+
89
+ try:
90
+ for item_path in target_dir.iterdir():
91
+ if item_path.name.startswith('.'):
92
+ continue
93
+
94
+ if item_path.is_dir():
95
+ items.append({
96
+ "type": "folder",
97
+ "name": item_path.name,
98
+ "path": _build_api_path(normalized_path, item_path.name)
99
+ })
100
+ elif item_path.is_file() and item_path.suffix == '.json':
101
+ items.append({
102
+ "type": "file",
103
+ "name": item_path.stem,
104
+ "path": _build_api_path(normalized_path, item_path.name)
105
+ })
106
+ except Exception as e:
107
+ import traceback
108
+ print(f"❌ 扫描目录失败: {e}")
109
+ traceback.print_exc()
110
+ return {"path": normalized_path, "items": []}
111
+
112
+ # 排序:文件夹在前,文件在后,各自按名称排序
113
+ folders = sorted([item for item in items if item["type"] == "folder"], key=lambda x: x["name"])
114
+ files = sorted([item for item in items if item["type"] == "file"], key=lambda x: x["name"])
115
+
116
+ return {"path": normalized_path, "items": folders + files}
117
+
118
+
119
+ def get_all_folders(demo_dir: Path, exclude_path: Optional[str] = None) -> List[str]:
120
+ """递归获取所有文件夹列表(用于移动操作),自动忽略隐藏文件夹"""
121
+ folders = []
122
+
123
+ def _scan_directory(current_dir: Path, current_path: str):
124
+ """递归扫描目录"""
125
+ try:
126
+ for item in current_dir.iterdir():
127
+ if item.name.startswith('.'):
128
+ continue
129
+
130
+ if item.is_dir():
131
+ folder_path = _build_api_path(current_path, item.name)
132
+
133
+ if exclude_path and (folder_path == exclude_path or folder_path.startswith(exclude_path + "/")):
134
+ continue
135
+
136
+ folders.append(folder_path)
137
+ _scan_directory(item, folder_path)
138
+ except Exception as e:
139
+ import traceback
140
+ print(f"❌ 扫描文件夹失败: {e}")
141
+ traceback.print_exc()
142
+
143
+ _scan_directory(demo_dir, "/")
144
+ folders.insert(0, "/")
145
+ return folders
146
+
147
+
148
+ def move_demo_file(demo_dir: Path, source_path: str, target_path: str) -> Dict[str, any]:
149
+ """移动demo文件"""
150
+ source_file = resolve_demo_path(demo_dir, source_path)
151
+ if not source_file:
152
+ return _error_response(f"源文件不存在: {source_path}")
153
+
154
+ error_msg = _validate_json_file(source_file)
155
+ if error_msg:
156
+ return _error_response(f"源文件{error_msg}: {source_path}" if "不存在" not in error_msg else error_msg)
157
+
158
+ target_dir = resolve_demo_path(demo_dir, target_path)
159
+ if not target_dir:
160
+ return _error_response(f"无效的目标路径: {target_path}")
161
+
162
+ target_dir.mkdir(parents=True, exist_ok=True)
163
+ target_file = target_dir / source_file.name
164
+
165
+ if target_file.exists() and target_file != source_file:
166
+ return _error_response(f"目标位置已存在同名文件: {source_file.name}")
167
+
168
+ try:
169
+ shutil.move(str(source_file), str(target_file))
170
+ return _success_response(f"文件已移动到 {target_path}")
171
+ except Exception as e:
172
+ return _error_response(f"移动失败: {str(e)}")
173
+
174
+
175
+ def rename_demo_file(demo_dir: Path, file_path: str, new_name: str) -> Dict[str, any]:
176
+ """重命名demo文件"""
177
+ from backend.data_utils import sanitize_demo_name
178
+
179
+ source_file = resolve_demo_path(demo_dir, file_path)
180
+ if not source_file:
181
+ return _error_response(f"文件不存在: {file_path}")
182
+
183
+ error_msg = _validate_json_file(source_file)
184
+ if error_msg:
185
+ return _error_response(error_msg)
186
+
187
+ safe_name = sanitize_demo_name(new_name)
188
+ if not safe_name:
189
+ return _error_response("新名称无效")
190
+
191
+ target_file = source_file.parent / f"{safe_name}.json"
192
+
193
+ if target_file.exists() and target_file != source_file:
194
+ return _error_response(f"文件 '{safe_name}.json' 已存在")
195
+
196
+ try:
197
+ source_file.rename(target_file)
198
+ return _success_response(f"文件已重命名为 '{safe_name}.json'")
199
+ except Exception as e:
200
+ return _error_response(f"重命名失败: {str(e)}")
201
+
202
+
203
+ def move_folder(demo_dir: Path, source_path: str, target_path: str) -> Dict[str, any]:
204
+ """移动文件夹(递归)"""
205
+ source_folder = resolve_demo_path(demo_dir, source_path)
206
+ if not source_folder:
207
+ return _error_response(f"源文件夹不存在: {source_path}")
208
+
209
+ error_msg = _validate_folder(source_folder)
210
+ if error_msg:
211
+ return _error_response(f"源{error_msg}: {source_path}" if "不存在" not in error_msg else error_msg)
212
+
213
+ target_dir = resolve_demo_path(demo_dir, target_path)
214
+ if not target_dir:
215
+ return _error_response(f"无效的目标路径: {target_path}")
216
+
217
+ target_dir.mkdir(parents=True, exist_ok=True)
218
+ target_folder = target_dir / source_folder.name
219
+
220
+ if target_folder.exists():
221
+ return _error_response(f"目标位置已存在同名文件夹: {source_folder.name}")
222
+
223
+ # 检查是否尝试移动到自己的子目录
224
+ if check_path_in_demo_dir(target_folder.resolve(), source_folder.resolve()):
225
+ return _error_response("不能将文件夹移动到自己的子目录")
226
+
227
+ try:
228
+ shutil.move(str(source_folder), str(target_folder))
229
+ return _success_response(f"文件夹已移动到 {target_path}")
230
+ except Exception as e:
231
+ return _error_response(f"移动失败: {str(e)}")
232
+
233
+
234
+ def rename_folder(demo_dir: Path, folder_path: str, new_name: str) -> Dict[str, any]:
235
+ """重命名文件夹"""
236
+ from backend.data_utils import sanitize_demo_name
237
+
238
+ source_folder = resolve_demo_path(demo_dir, folder_path)
239
+ if not source_folder:
240
+ return _error_response(f"文件夹不存在: {folder_path}")
241
+
242
+ error_msg = _validate_folder(source_folder)
243
+ if error_msg:
244
+ return _error_response(error_msg)
245
+
246
+ safe_name = sanitize_demo_name(new_name)
247
+ if not safe_name:
248
+ return _error_response("新名称无效")
249
+
250
+ target_folder = source_folder.parent / safe_name
251
+
252
+ if target_folder.exists():
253
+ return _error_response(f"文件夹 '{safe_name}' 已存在")
254
+
255
+ try:
256
+ source_folder.rename(target_folder)
257
+ return _success_response(f"文件夹已重命名为 '{safe_name}'")
258
+ except Exception as e:
259
+ return _error_response(f"重命名失败: {str(e)}")
260
+
261
+
262
+ def create_folder(demo_dir: Path, parent_path: str, folder_name: str) -> Dict[str, any]:
263
+ """创建新文件夹"""
264
+ from backend.data_utils import sanitize_demo_name
265
+
266
+ parent_dir = resolve_demo_path(demo_dir, parent_path)
267
+ if not parent_dir:
268
+ return _error_response(f"无效的父路径: {parent_path}")
269
+
270
+ safe_name = sanitize_demo_name(folder_name)
271
+ if not safe_name:
272
+ return _error_response("文件夹名称无效")
273
+
274
+ target_folder = parent_dir / safe_name
275
+
276
+ if target_folder.exists():
277
+ return _error_response(f"文件夹 '{safe_name}' 已存在")
278
+
279
+ try:
280
+ target_folder.mkdir(parents=True, exist_ok=False)
281
+ return _success_response(f"文件夹 '{safe_name}' 已创建")
282
+ except Exception as e:
283
+ return _error_response(f"创建失败: {str(e)}")
284
+
285
+
286
+ def delete_folder(demo_dir: Path, folder_path: str) -> Dict[str, any]:
287
+ """删除文件夹(移动到 .deleted 隐藏目录)"""
288
+ source_folder = resolve_demo_path(demo_dir, folder_path)
289
+ if not source_folder:
290
+ return _error_response(f"文件夹不存在: {folder_path}")
291
+
292
+ error_msg = _validate_folder(source_folder)
293
+ if error_msg:
294
+ return _error_response(error_msg)
295
+
296
+ deleted_dir = _ensure_deleted_dir(demo_dir)
297
+ target_folder = deleted_dir / source_folder.name
298
+
299
+ if target_folder.exists():
300
+ target_folder = deleted_dir / _get_timestamped_name(source_folder.name)
301
+
302
+ try:
303
+ shutil.move(str(source_folder), str(target_folder))
304
+ return _success_response("文件夹已移动到 .deleted 目录")
305
+ except Exception as e:
306
+ return _error_response(f"删除失败: {str(e)}")
307
+
308
+
309
+ def delete_demo_file(demo_dir: Path, file_path: str) -> Dict[str, any]:
310
+ """删除demo文件(移动到 .deleted 隐藏目录)"""
311
+ demo_dir_resolved = demo_dir.resolve()
312
+ source_file = resolve_demo_path(demo_dir_resolved, file_path)
313
+
314
+ if not source_file:
315
+ return _error_response(f"文件不存在: {file_path}")
316
+
317
+ error_msg = _validate_json_file(source_file)
318
+ if error_msg:
319
+ return _error_response(error_msg)
320
+
321
+ try:
322
+ relative_path = source_file.relative_to(demo_dir_resolved)
323
+ except ValueError:
324
+ return _error_response("无效的文件路径")
325
+
326
+ deleted_dir = _ensure_deleted_dir(demo_dir_resolved)
327
+ target_file = deleted_dir / relative_path
328
+ target_parent = target_file.parent
329
+ target_parent.mkdir(parents=True, exist_ok=True)
330
+
331
+ if target_file.exists():
332
+ target_file = target_parent / _get_timestamped_name(source_file.stem, ".json")
333
+
334
+ try:
335
+ shutil.move(str(source_file), str(target_file))
336
+ return _success_response(f"文件已移动到 .deleted 目录: {relative_path.as_posix()}")
337
+ except Exception as e:
338
+ return _error_response(f"删除失败: {str(e)}")
339
+
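A hypothetical maintenance session against the default demo directory; the file name old_example.json is invented and must exist for the move and delete calls to succeed.

from pathlib import Path

demo_dir = Path("data/demo/public")
print(create_folder(demo_dir, "/", "archive"))
print(list_demo_items(demo_dir, "/"))
print(move_demo_file(demo_dir, "/old_example.json", "/archive"))
print(delete_demo_file(demo_dir, "/archive/old_example.json"))  # soft delete into .deleted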
backend/device.py ADDED
@@ -0,0 +1,97 @@
1
+ """设备管理:CPU/CUDA/MPS 检测与内存统计"""
2
+
3
+ import os
4
+ import torch
5
+
6
+
7
+ class DeviceManager:
8
+ """设备管理工具类,统一处理设备相关的操作"""
9
+
10
+ @staticmethod
11
+ def clear_cache(device: torch.device) -> None:
12
+ """清理设备缓存"""
13
+ if device.type == "cuda":
14
+ torch.cuda.empty_cache()
15
+ elif device.type == "mps":
16
+ torch.mps.empty_cache()
17
+
18
+ @staticmethod
19
+ def synchronize(device: torch.device) -> None:
20
+ """同步设备操作"""
21
+ if device.type == "cuda":
22
+ torch.cuda.synchronize()
23
+ elif device.type == "mps":
24
+ torch.mps.synchronize()
25
+
26
+ @staticmethod
27
+ def get_device() -> torch.device:
28
+ """
29
+ 获取计算设备
30
+ 优先级:1. FORCE_CPU=1 强制 CPU 2. cuda > mps > cpu
31
+ """
32
+ if os.environ.get('FORCE_CPU') == '1':
33
+ return torch.device("cpu")
34
+ if torch.cuda.is_available():
35
+ return torch.device("cuda")
36
+ if hasattr(torch.backends, 'mps') and torch.backends.mps.is_available():
37
+ return torch.device("mps")
38
+ return torch.device("cpu")
39
+
40
+ @staticmethod
41
+ def get_device_name(device: torch.device) -> str:
42
+ """获取设备显示名称"""
43
+ if device.type == "cuda":
44
+ return "GPU"
45
+ elif device.type == "mps":
46
+ return "Apple Silicon"
47
+ else:
48
+ return "CPU"
49
+
50
+ @staticmethod
51
+ def print_model_load_stats(model: torch.nn.Module, load_time: float) -> None:
52
+ """打印模型加载统计信息(大小、时间、速度)"""
53
+ # 计算模型大小
54
+ model_size_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
55
+ model_size_mb = model_size_bytes / (1024 * 1024)
56
+ # 计算加载速度
57
+ load_speed_mb_per_sec = model_size_mb / load_time if load_time > 0 else 0
58
+ # 格式化大小
59
+ size_str = f"{model_size_mb:.1f}MB" if model_size_mb < 1024 else f"{model_size_mb / 1024:.2f}GB"
60
+ # 格式化时间
61
+ if load_time < 1:
62
+ time_str = f"{load_time * 1000:.1f}ms"
63
+ elif load_time < 60:
64
+ time_str = f"{load_time:.2f}s"
65
+ else:
66
+ time_str = f"{int(load_time // 60)}m{load_time % 60:.1f}s"
67
+ print(f"✅ 模型加载完成 [大小: {size_str}, 耗时: {time_str}, 速度: {load_speed_mb_per_sec:.1f}MB/s]")
68
+
69
+ @staticmethod
70
+ def print_cuda_memory_summary(title="GPU 内存统计", device=0):
71
+ """打印详细的 CUDA 内存统计信息"""
72
+ if not torch.cuda.is_available():
73
+ return
74
+ print(f"\n{'='*60}")
75
+ print(f"🔍 {title}")
76
+ print(f"{'='*60}")
77
+ # 基本统计
78
+ allocated = torch.cuda.memory_allocated(device) / 1024**3
79
+ reserved = torch.cuda.memory_reserved(device) / 1024**3
80
+ max_allocated = torch.cuda.max_memory_allocated(device) / 1024**3
81
+ total = torch.cuda.get_device_properties(device).total_memory / 1024**3
82
+ print(f"📊 总显存: {total:.2f} GB")
83
+ print(f"✅ 已分配 (allocated): {allocated:.2f} GB ({allocated/total*100:.1f}%)")
84
+ print(f"📦 已预留 (reserved): {reserved:.2f} GB ({reserved/total*100:.1f}%)")
85
+ print(f"📈 峰值分配: {max_allocated:.2f} GB")
86
+ print(f"💚 可用空间: {total - reserved:.2f} GB ({(total-reserved)/total*100:.1f}%)")
87
+ print(f"🔸 碎片化: {reserved - allocated:.2f} GB")
88
+ # 详细统计(简化版)
89
+ try:
90
+ stats = torch.cuda.memory_stats(device)
91
+ num_allocs = stats.get("num_alloc_retries", 0)
92
+ num_ooms = stats.get("num_ooms", 0)
93
+ if num_allocs > 0 or num_ooms > 0:
94
+ print(f"⚠️ 分配重试: {num_allocs} 次, OOM: {num_ooms} 次")
95
+ except Exception:
96
+ pass
97
+ print(f"{'='*60}\n")
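A quick usage sketch of the helpers above; nothing here loads a model.

import os

os.environ["FORCE_CPU"] = "1"                     # overrides CUDA/MPS detection
assert DeviceManager.get_device().type == "cpu"
del os.environ["FORCE_CPU"]

device = DeviceManager.get_device()
print(DeviceManager.get_device_name(device))      # "GPU" / "Apple Silicon" / "CPU"
DeviceManager.clear_cache(device)                 # no-op on CPU, empty_cache on CUDA/MPS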
backend/language_checker.py ADDED
@@ -0,0 +1,422 @@
1
+ import torch
2
+ import gc
3
+ from typing import Callable, Dict, List, Optional, Tuple
4
+
5
+ from .api.utils import round_to_sig_figs
6
+ from .pred_topk_format import pred_topk_pairs_from_flat_ids_and_probs
7
+ from .class_register import register_model, REGISTERED_MODELS
8
+ from .device import DeviceManager
9
+ from .model_manager import ensure_model_loaded
10
+ from .runtime_config import load_runtime_config, DEFAULT_TOPK
11
+ from model_paths import DEFAULT_MODEL, MODEL_PATHS, SEMANTIC_MODEL_PATHS, resolve_hf_path
12
+
13
+ # 按 id(model) 缓存「仅含 BOS/等价起始符一步 forward」得到的末位词表 logits(全词表,不随分析文本变)
14
+ _bos_first_position_logits_cache: Dict[int, torch.Tensor] = {}
15
+
16
+
17
+ def compute_first_token_lm_with_bos_prefix_cache(
18
+ model: torch.nn.Module,
19
+ tokenizer,
20
+ device: torch.device,
21
+ first_token_id: int,
22
+ effective_topk: int,
23
+ ) -> Tuple[float, List[Tuple[str, float]]]:
24
+ """
25
+ 首 token 无左文时的 workaround:与旧版 BOS 前缀一致,对单 token 输入 [bos] 做一步 forward,
26
+ 将末位 logits(预测首段文本第一个 token 的分布)缓存到 CPU,再在 CPU 上 softmax/topk。
27
+
28
+ 同一 model 实例复用同一份词表 logits,不在每次分析时重复 forward。
29
+ """
30
+ mid = id(model)
31
+ if mid not in _bos_first_position_logits_cache:
32
+ if tokenizer.bos_token_id is not None:
33
+ bos_id = int(tokenizer.bos_token_id)
34
+ elif tokenizer.eos_token_id is not None:
35
+ bos_id = int(tokenizer.eos_token_id)
36
+ else:
37
+ bos_id = 0
38
+ with torch.inference_mode():
39
+ bos_in = torch.tensor([[bos_id]], device=device, dtype=torch.long)
40
+ out = model(input_ids=bos_in)
41
+ # [V]:在 BOS 条件下预测「第一个文本 token」的分布
42
+ row = out.logits[0, -1, :].detach().float()
43
+ _bos_first_position_logits_cache[mid] = row.cpu()
44
+
45
+ logits = _bos_first_position_logits_cache[mid]
46
+ probs = torch.softmax(logits, dim=-1)
47
+ p = float(probs[first_token_id].item())
48
+
49
+ topk_vals, topk_inds = torch.topk(probs, k=min(effective_topk, probs.shape[0]), dim=-1)
50
+ topk_vals = topk_vals.float().numpy()
51
+ topk_inds_flat = topk_inds.flatten().tolist()
52
+ topk_tokens_decoded = tokenizer.batch_decode(
53
+ [[tid] for tid in topk_inds_flat],
54
+ skip_special_tokens=False,
55
+ )
56
+ pred_topk = [
57
+ (topk_tokens_decoded[j], round_to_sig_figs(float(topk_vals[j])))
58
+ for j in range(len(topk_tokens_decoded))
59
+ ]
60
+ return p, pred_topk
61
+
62
+
63
+ class AbstractLanguageChecker:
64
+ """
65
+ Abstract Class that defines the Backend API of GLTR.
66
+
67
+ To extend the GLTR interface, you need to inherit this and
68
+ fill in the defined functions.
69
+ """
70
+
71
+ def __init__(self):
72
+ """
73
+ In the subclass, you need to load all necessary components
74
+ for the other functions.
75
+ Typically, this will comprise a tokenizer and a model.
76
+ """
77
+ self.device = DeviceManager.get_device()
78
+
79
+
80
+ def analyze_text(self, in_text):
81
+ """
82
+ Function that GLTR interacts with to analyze text and get token probabilities
83
+
84
+ Params:
85
+ - in_text: str -- The text that you want to analyze
86
+ - topk: int, optional -- Desired pred_topk count (default from runtime_config.DEFAULT_TOPK)
87
+
88
+ Output:
89
+ - payload: dict -- The wrapper for results in this function, described below
90
+
91
+ Payload values
92
+ ==============
93
+ bpe_strings: list of dict -- Each dict contains {"offset": [start, end], "raw": str,
94
+ "real_topk": [rank, prob], "pred_topk": [(token, prob), ...]}
95
+ - offset: character offsets in the original text [start, end]
96
+ - raw: token text extracted from original text
97
+ - real_topk: (ranking, prob) of each token (the rank slot is a placeholder, currently always 0)
98
+ - pred_topk: list of top-k (token, prob) candidates (empty array if unavailable)
99
+ """
100
+ raise NotImplementedError
101
+
102
+
103
+
104
+ @register_model(name='qwen2.5-0.5b')
105
+ class QwenLM(AbstractLanguageChecker):
106
+ """
107
+ Qwen 系列模型支持
108
+ 默认使用 Qwen2.5-0.5B Base 模型(适合计算 surprisal 和信息量)
109
+ """
110
+ def __init__(self, model_path=None, model_name=None):
111
+ super(QwenLM, self).__init__()
112
+ model_name = model_name or getattr(self.__class__, '_registered_model_name', DEFAULT_MODEL)
113
+ if model_path is not None and str(model_path).strip():
114
+ resolved = str(model_path).strip()
115
+ else:
116
+ resolved = resolve_hf_path(model_name)
117
+
118
+ # 加载运行时配置(支持部分覆盖)
119
+ self._load_runtime_config(model_name)
120
+
121
+ self.tokenizer, self.model, self.device = ensure_model_loaded(resolved)
122
+
123
+ # ============================================================
124
+ # 关于 torch.compile() 的性能优化讨论结论:
125
+ #
126
+ # CPU 环境:
127
+ # - 成本 > 收益,不推荐使用
128
+ #
129
+ # CUDA 环境(如果未来升级到 GPU Space):
130
+ # - 加速比:30-70%(显著提升)
131
+ # - 编译时间:相对推理时间更短
132
+ # - Triton 内核优化:显著减少显存读写
133
+ # - 结论:强烈推荐使用,需配合预热确保形状覆盖
134
+ # 如需启用,可在此处添加:
135
+ # if torch.cuda.is_available() and hasattr(torch, 'compile'):
136
+ # self.model = torch.compile(self.model, mode="default")
137
+ # # 并在启动时运行预热推理覆盖 chunk_size 长度
138
+ # ============================================================
139
+
140
+ # 初始化分析计数器(用于控制GPU内存统计打印频率)
141
+ self._analysis_count = 0
142
+
143
+ def _load_runtime_config(self, model_name: Optional[str]):
144
+ """
145
+ 加载运行时配置:基于模型和平台的四层配置合并
146
+
147
+ Args:
148
+ model_name: 模型标识符(如 'qwen3-1.7b')
149
+ """
150
+ # 调用配置模块的完整加载流程
151
+ # 返回: (platform, max_token_length, chunk_size)
152
+ self.platform, self.max_length, self.chunk_size = load_runtime_config(
153
+ model_name=model_name or "default_model"
154
+ )
155
+
156
+ def _encode_text(self, in_text: str) -> Tuple[torch.Tensor, List[Tuple[int, int]]]:
157
+ """编码文本并返回 token_ids 和 offsets"""
158
+ # 使用 tokenizer 的原生截断功能
159
+ enc_out = self.tokenizer(
160
+ in_text,
161
+ return_tensors='pt',
162
+ return_offsets_mapping=True,
163
+ max_length=self.max_length,
164
+ truncation=True
165
+ )
166
+ token_ids = enc_out['input_ids']
167
+ token_offsets = enc_out['offset_mapping'][0].tolist()
168
+
169
+ # 通过最后一个 offset 和文本长度对比判断是否截断
170
+ if token_offsets:
171
+ last_offset_end = token_offsets[-1][1]
172
+ if last_offset_end < len(in_text):
173
+ # 文本被截断:打印 token 上限与截断前后的字符数,便于排查
174
+ print(f"⚠️ 文本过长,已截断至前 {self.max_length} token ({len(in_text)} char -> {last_offset_end} char)")
175
+
176
+ token_ids = token_ids.to(self.device)
177
+
178
+ return token_ids, token_offsets
179
+
180
+ def _run_inference_and_process_chunked(
181
+ self,
182
+ token_ids: torch.Tensor,
183
+ effective_topk: int,
184
+ progress_callback: Optional[Callable[[int, int, str, Optional[int]], None]] = None
185
+ ) -> Tuple[List[List[Tuple[str, float]]], List[float]]:
186
+ """
187
+ 分块推理并即时处理:核心内存优化逻辑
188
+ 利用 KV Cache 分段计算 Logits,计算完立即释放,避免保留全量 Logits。
189
+
190
+ 数值说明:在 float16(如 MPS)上,在「仅前缀 forward」vs「整段 forward」同位置 logits 的逐元素对比,可能出现微小差异;
191
+ float16(MPS/CUDA)可能因实现路径出现约 1% 量级的差异,非掩码错误。CPU float32 下则完全一致。
192
+ """
193
+ seq_len = token_ids.shape[1]
194
+
195
+ # 使用初始化时根据平台确定的 chunk_size
196
+ chunk_size = self.chunk_size
197
+
198
+ real_probs_list = []
199
+ pred_topk_list = []
200
+ past_key_values = None
201
+
202
+ # 预先清理
203
+ DeviceManager.clear_cache(self.device)
204
+
205
+ full_input_ids = token_ids
206
+
207
+ # 因果 LM:logits[i] 预测 input_ids[i+1];首 token 无左文,不在此循环中计分
208
+
209
+ # 我们使用 past_key_values 增量推理
210
+ # 第一次:输入 input_ids[:, :chunk_size],输出 logits 对应位置 0..chunk_size-1 (预测 1..chunk_size)
211
+
212
+ total_chunks = (seq_len + chunk_size - 1) // chunk_size
213
+
214
+ with torch.inference_mode():
215
+ for i in range(total_chunks):
216
+ start_idx = i * chunk_size
217
+ end_idx = min((i + 1) * chunk_size, seq_len)
218
+ current_chunk_len = end_idx - start_idx
219
+
220
+ # 准备输入(统一逻辑,避免边界 token 重复)
221
+ if i == 0:
222
+ input_chunk = full_input_ids[:, :end_idx]
223
+ else:
224
+ input_chunk = full_input_ids[:, start_idx:end_idx]
225
+
226
+ # 1. 运行推理
227
+ outputs = self.model(
228
+ input_ids=input_chunk,
229
+ past_key_values=past_key_values,
230
+ use_cache=True
231
+ )
232
+
233
+ past_key_values = outputs.past_key_values
234
+
235
+ logits = outputs.logits
236
+
237
+ # 获取 targets
238
+ # full_input_ids[:, 1:] 是所有 targets
239
+ # 当前块 targets 范围: [start_idx : end_idx]
240
+ chunk_targets = full_input_ids[:, 1+start_idx : 1+end_idx]
241
+ valid_len = chunk_targets.shape[1]
242
+ if valid_len == 0:
243
+ continue
244
+ # 最后一块覆盖到序列末尾时,最后一个 logit 位预测的是「下一 token」,需裁掉
245
+ current_logits = logits[:, :valid_len, :]
246
+
247
+ # 2. 处理当前块的 Softmax 和 TopK
248
+ probs_chunk = torch.softmax(current_logits, dim=2)
249
+
250
+ # 提取真实概率
251
+ chunk_target_probs = torch.gather(probs_chunk, 2, chunk_targets.unsqueeze(-1))
252
+ real_probs_list.extend(chunk_target_probs.flatten().detach().cpu().float().numpy().tolist())
253
+
254
+ # 提取 TopK
255
+ # 由于 chunk_size 已确保小于 MPS_TOPK_BUG_THRESHOLD,所以直接计算
256
+ topk_vals, topk_inds = torch.topk(probs_chunk, k=effective_topk, dim=2)
257
+ chunk_pred_topk = self._decode_topk_tokens(
258
+ topk_vals, topk_inds, effective_topk, valid_len
259
+ )
260
+ pred_topk_list.extend(chunk_pred_topk)
261
+
262
+ # 3. 立即释放内存
263
+ del logits
264
+ del current_logits
265
+ del probs_chunk
266
+ del chunk_target_probs
267
+ # outputs 会在下一次循环时被覆盖,无需手动处理
268
+
269
+ # 进度更新(基于实际处理的 token 数量)
270
+ if progress_callback:
271
+ pct = int(end_idx / seq_len * 100) # 推理阶段独立的 0-100%
272
+ progress_callback(2, 3, 'inference', pct)
273
+
274
+ # 循环结束,清理 KV Cache
275
+ del past_key_values
276
+ DeviceManager.clear_cache(self.device)
277
+
278
+ return pred_topk_list, real_probs_list
279
+
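The claim in the docstring above (chunked forward with past_key_values reproduces full-sequence logits in float32 on CPU) can be spot-checked with a throwaway script like the one below; the checkpoint name and chunk size are arbitrary choices for the check, not project settings.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "sshleifer/tiny-gpt2"                      # any small causal LM works
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

ids = tok("The quick brown fox jumps over the lazy dog", return_tensors="pt")["input_ids"]
with torch.inference_mode():
    full = model(input_ids=ids).logits
    past, chunks = None, []
    for start in range(0, ids.shape[1], 4):       # chunk_size = 4, arbitrary
        out = model(input_ids=ids[:, start:start + 4], past_key_values=past, use_cache=True)
        past = out.past_key_values
        chunks.append(out.logits)
    chunked = torch.cat(chunks, dim=1)

print(torch.allclose(full, chunked, atol=1e-5))   # expected True in float32 on CPU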
280
+ def _decode_topk_tokens(
281
+ self,
282
+ topk_prob_values: torch.Tensor,
283
+ topk_prob_inds: torch.Tensor,
284
+ effective_topk: int,
285
+ seq_len: int
286
+ ) -> List[List[Tuple[str, float]]]:
287
+ """解码 TopK tokens 并构建预测列表(长度等于参与 topk 的序列长度)"""
288
+ topk_prob_values_cpu = topk_prob_values[0].detach().cpu().float().numpy()
289
+ topk_prob_inds_flat = topk_prob_inds[0].cpu().flatten().tolist()
290
+ probs_flat = topk_prob_values_cpu.flatten().tolist()
291
+ flat_pairs = pred_topk_pairs_from_flat_ids_and_probs(
292
+ topk_prob_inds_flat, probs_flat, self.tokenizer
293
+ )
294
+ return [
295
+ flat_pairs[i * effective_topk : (i + 1) * effective_topk]
296
+ for i in range(seq_len)
297
+ ]
298
+
299
+ def _build_bpe_strings(
300
+ self,
301
+ token_offsets: List[Tuple[int, int]],
302
+ real_topk: List[Tuple[int, float]],
303
+ pred_topk: List[List[Tuple[str, float]]],
304
+ in_text: str
305
+ ) -> List[Dict]:
306
+ """构建最终的 BPE 字符串列表"""
307
+ # 确保长度一致
308
+ min_len = min(len(token_offsets), len(real_topk), len(pred_topk) if pred_topk else len(token_offsets))
309
+
310
+ bpe_strings = []
311
+ for idx in range(min_len):
312
+ start, end = token_offsets[idx]
313
+ raw_text = in_text[start:end] if start < end else ""
314
+ token_payload = {
315
+ "offset": [start, end],
316
+ "raw": raw_text,
317
+ "real_topk": list(real_topk[idx]),
318
+ "pred_topk": pred_topk[idx] if pred_topk else []
319
+ }
320
+ bpe_strings.append(token_payload)
321
+
322
+ return bpe_strings
323
+
324
+ def analyze_text(self, in_text: str, progress_callback: Optional[Callable[[int, int, str, Optional[int]], None]] = None) -> Dict[str, List[Dict]]:
325
+ """
326
+ 计算文本中每个 token 的概率
327
+
328
+ 进度回调参数: (step: int, total_steps: int, stage: str, percentage: Optional[int])
329
+ - step: 当前步骤 (1-based)
330
+ - total_steps: 总步骤数 (固定为 3)
331
+ - stage: 阶段名称 (encoding/inference/processing)
332
+ - percentage: 可选的百分比,仅在 inference 阶段提供
333
+ """
334
+ TOTAL_STEPS = 3
335
+
336
+ try:
337
+ # Step 1: 编码文本
338
+ if progress_callback:
339
+ progress_callback(1, TOTAL_STEPS, 'encoding', None)
340
+ token_ids, token_offsets = self._encode_text(in_text)
341
+
342
+ # Step 2: 分块推理并处理(带百分比进度)
343
+ # 这取代了原来的 _run_model_inference, MPS 流式处理, 和 _process_topk
344
+
345
+ if progress_callback:
346
+ progress_callback(2, 3, 'inference', 0)
347
+ pred_topk, real_topk_probs = self._run_inference_and_process_chunked(
348
+ token_ids, DEFAULT_TOPK, progress_callback
349
+ )
350
+
351
+ # Step 3: 构建结果
352
+ if progress_callback:
353
+ progress_callback(3, TOTAL_STEPS, 'processing', None)
354
+
355
+ if token_ids.shape[1] >= 1:
356
+ p0, pred0 = compute_first_token_lm_with_bos_prefix_cache(
357
+ self.model,
358
+ self.tokenizer,
359
+ self.device,
360
+ int(token_ids[0, 0].item()),
361
+ DEFAULT_TOPK,
362
+ )
363
+ pred_topk.insert(0, pred0)
364
+ real_topk_probs.insert(0, p0)
365
+
366
+ seq_len = len(real_topk_probs)
367
+ real_topk = list(zip([0] * seq_len, [round_to_sig_figs(p) for p in real_topk_probs]))
368
+
369
+ bpe_strings = self._build_bpe_strings(token_offsets, real_topk, pred_topk, in_text)
370
+
371
+ # 最终清理
372
+ DeviceManager.clear_cache(self.device)
373
+ gc.collect()
374
+
375
+ # 更新分析计数器
376
+ self._analysis_count += 1
377
+
378
+ # 打印分析任务完成后的内存统计(第1、11、21...次分析后打印)
379
+ if self.device.type == "cuda" and (self._analysis_count - 1) % 10 == 0:
380
+ device_idx = self.device.index if self.device.index is not None else 0
381
+ DeviceManager.print_cuda_memory_summary(device=device_idx)
382
+
383
+ return {'bpe_strings': bpe_strings}
384
+
385
+ except Exception as e:
386
+ import traceback
387
+ print(f"❌ Error in QwenLM.analyze_text: {e}")
388
+ traceback.print_exc()
389
+ return {'bpe_strings': []}
390
+
391
+ # _cleanup_tensors 方法已被移除,因为不再需要显式清理小张量
392
+
393
+
394
+ # ============================================================
395
+ # 自动注册:根据 MODEL_PATHS 与 SEMANTIC_MODEL_PATHS 自动注册所有模型
396
+ # ============================================================
397
+ # 只需要在 model_paths.py 中添加模型路径,即可自动注册
398
+ # 无需手动创建子类,实现 DRY 原则
399
+ def _auto_register_models():
400
+ """自动注册 MODEL_PATHS 与 SEMANTIC_MODEL_PATHS 中的所有模型"""
401
+ for model_name in (*MODEL_PATHS.keys(), *SEMANTIC_MODEL_PATHS.keys()):
402
+ if model_name not in REGISTERED_MODELS:
403
+ # 动态创建模型类并注册
404
+ # 子类 __init__ 不带参数;模型名由 QwenLM.__init__ 按类属性 _registered_model_name 解析
405
+ def make_init():
406
+ def __init__(self):
407
+ QwenLM.__init__(self)
408
+ return __init__
409
+
410
+ model_class = type(
411
+ f'QwenLM_{model_name.replace(".", "_").replace("-", "_")}',
412
+ (QwenLM,),
413
+ {
414
+ '__init__': make_init(),
415
+ '__doc__': f'{model_name} 模型支持(自动注册)'
416
+ }
417
+ )
418
+ register_model(model_name)(model_class)
419
+
420
+ # 执行自动注册
421
+ _auto_register_models()
422
+
backend/load_utils.py ADDED
@@ -0,0 +1,69 @@
1
+ """HuggingFace 模型下载与加载:下载独立,加载仅考虑本地"""
2
+
3
+ import json
4
+ import os
5
+ from typing import Callable, TypeVar
6
+
7
+ T = TypeVar("T")
8
+
9
+ # 与 transformers 的 checkpoint 命名一致
10
+ _SAFE_WEIGHTS = "model.safetensors"
11
+ _SAFE_WEIGHTS_INDEX = "model.safetensors.index.json"
12
+ _WEIGHTS = "pytorch_model.bin"
13
+ _WEIGHTS_INDEX = "pytorch_model.bin.index.json"
14
+
15
+
16
+ def _is_model_cache_complete(local_path: str) -> bool:
17
+ """
18
+ 本地检查模型权重是否完整。与 transformers 的 _get_resolved_checkpoint_files 逻辑一致。
19
+ """
20
+ def _p(f: str) -> str:
21
+ return os.path.join(local_path, f)
22
+
23
+ if os.path.isfile(_p(_SAFE_WEIGHTS)):
24
+ return True
25
+ index_file = _p(_SAFE_WEIGHTS_INDEX)
26
+ if os.path.isfile(index_file):
27
+ with open(index_file) as f:
28
+ index = json.load(f)
29
+ shards = set(index.get("weight_map", {}).values())
30
+ return all(os.path.isfile(_p(s)) for s in shards)
31
+ if os.path.isfile(_p(_WEIGHTS)):
32
+ return True
33
+ index_file = _p(_WEIGHTS_INDEX)
34
+ if os.path.isfile(index_file):
35
+ with open(index_file) as f:
36
+ index = json.load(f)
37
+ shards = set(index.get("weight_map", {}).values())
38
+ return all(os.path.isfile(_p(s)) for s in shards)
39
+ return False
40
+
41
+
42
+ def ensure_model_local(model_path: str, *, force_download: bool = False) -> str:
43
+ """
44
+ 确保模型在本地可用,返回本地路径。
45
+ - 本地目录:直接返回
46
+ - HuggingFace ID:优先用本地缓存(不联网),缓存不完整时 force_download 可触发下载
47
+ """
48
+ if os.path.isdir(model_path):
49
+ return model_path
50
+ if "/" in model_path and not os.path.exists(model_path):
51
+ from huggingface_hub import snapshot_download
52
+ if force_download:
53
+ return snapshot_download(model_path)
54
+ try:
55
+ path = snapshot_download(model_path, local_files_only=True)
56
+ if not _is_model_cache_complete(path):
57
+ return snapshot_download(model_path)
58
+ return path
59
+ except Exception:
60
+ return snapshot_download(model_path)
61
+ return model_path
62
+
63
+
64
+ def resolve_and_load(model_path: str, loader: Callable[[str, bool], T]) -> T:
65
+ """
66
+ 先确保模型本地可用,再加载。加载时始终使用 local_files_only=True。
67
+ """
68
+ path = ensure_model_local(model_path)
69
+ return loader(path, True)
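resolve_and_load accepts any loader with the (path, local_files_only) signature, so it also works for lightweight objects such as configs. The repo id below is just an example; a first run needs either network access or a complete local cache.

from transformers import AutoConfig

def _load_config(path: str, local_files_only: bool):
    return AutoConfig.from_pretrained(path, local_files_only=local_files_only)

config = resolve_and_load("Qwen/Qwen2.5-0.5B", _load_config)
print(config.model_type)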
backend/logging_config.py ADDED
@@ -0,0 +1,37 @@
1
+ """
2
+ 日志配置模块
3
+ 统一管理应用的日志配置
4
+ """
5
+
6
+ import logging
7
+
8
+
9
+ def configure_logging(app=None):
10
+ """
11
+ 配置应用日志:完全屏蔽所有连接和请求相关的日志
12
+
13
+ Args:
14
+ app: Connexion/Flask 应用实例(可选)
15
+ """
16
+ # 屏蔽第三方库的日志
17
+ logging.getLogger('werkzeug').setLevel(logging.CRITICAL)
18
+ logging.getLogger('connexion').setLevel(logging.CRITICAL)
19
+ logging.getLogger('flask_cors').setLevel(logging.CRITICAL)
20
+ logging.getLogger('flask').setLevel(logging.CRITICAL)
21
+ logging.getLogger('urllib3').setLevel(logging.CRITICAL)
22
+ logging.getLogger('transformers').setLevel(logging.CRITICAL)
23
+ logging.getLogger('torch').setLevel(logging.CRITICAL)
24
+
25
+ # 设置根日志级别,只显示严重错误
26
+ logging.basicConfig(level=logging.CRITICAL, format='%(message)s')
27
+
28
+ # 配置 Flask app logger(如果提供了应用实例)
29
+ if app:
30
+ try:
31
+ app.app.logger.setLevel(logging.CRITICAL)
32
+ # 禁用 Werkzeug 的访问日志
33
+ import werkzeug.serving
34
+ werkzeug.serving.WSGIRequestHandler.log_request = lambda *args, **kwargs: None
35
+ except Exception:
36
+ pass
37
+
backend/model_loader.py ADDED
@@ -0,0 +1,169 @@
1
+ """
2
+ Causal LM 模型加载:设备策略与加载逻辑统一封装
3
+
4
+ 供 language_checker.QwenLM(信息密度分析)与 model_manager.ensure_model_loaded 共用,
5
+ 消除重复的设备分支、量化配置、加载后处理等逻辑。
6
+
7
+ 加载策略说明:
8
+ - INT8 量化:bitsandbytes 8bit,device_map="cpu"/"auto",减少约 4 倍内存
9
+ - CPU 手动模式:无 device_map,.to(device),默认 float32
10
+ - GPU/MPS 自动模式:device_map="auto",float16
11
+
12
+ dtype/设备与因果 LM 在「仅前缀 forward」vs「整段 forward」同位置 logits 的逐元素对比:
13
+ float32(CPU)常完全一致;float16(MPS/CUDA)可能因实现路径出现约 1e-2 量级差,非掩码错误。
14
+ 复现与说明见 scripts/reproduce_logits_triple_path.py、scripts/prove_fp16_gemm_shape_sensitivity.py。
15
+ """
16
+
17
+ import os
18
+ import time
19
+ from typing import Any, Dict, Optional
20
+
21
+ import torch
22
+ from transformers import AutoModelForCausalLM, AutoTokenizer
23
+ from transformers.utils import is_flash_attn_2_available
24
+
25
+ from .device import DeviceManager
26
+ from .load_utils import resolve_and_load
27
+ from .quantization_config import get_quantization_config
28
+
29
+
30
+ def get_device_load_strategy(device: torch.device) -> Dict[str, Any]:
31
+ """
32
+ 根据设备推断加载策略(device_map、dtype、use_int8 等)。
33
+
34
+ 打印设备模式说明,与 QwenLM 风格一致。
35
+ 环境变量:FORCE_INT8=1 / CPU_FORCE_BFLOAT16=1
36
+ 返回供 load_causal_lm 使用的参数字典。
37
+ """
38
+ qconfig = get_quantization_config(device)
39
+ use_int8 = qconfig.use_int8
40
+ device_map = None
41
+ dtype = qconfig.dtype
42
+ use_low_cpu_mem = False
43
+
44
+ if device.type == "cpu":
45
+ print("🔧 CPU 模式:手动控制设备分配")
46
+ if use_int8:
47
+ device_map = "cpu"
48
+ print("⚠️ 启用 INT8 量化(FORCE_INT8=1,实验性,在某些情况下会降低性能)")
49
+ elif dtype == torch.bfloat16:
50
+ use_low_cpu_mem = True
51
+ print("⚠️ 启用 bfloat16(CPU_FORCE_BFLOAT16=1,需硬件支持 AVX-512_BF16 或 AMX,否则可能极慢)")
52
+ else:
53
+ use_low_cpu_mem = True
54
+ print("🔧 dtype: float32") # 默认: float32
55
+ elif device.type == "cuda":
56
+ print("🔧 CUDA 模式:自动设备分配")
57
+ device_map = "auto"
58
+ use_low_cpu_mem = True
59
+ if use_int8:
60
+ print("⚠️ 启用 INT8 量化(FORCE_INT8=1)")
61
+ else:
62
+ print("🔧 dtype: float16")
63
+ print("🔧 device_map: auto")
64
+ else:
65
+ # MPS 模式:自动设备分配 + float16(MPS 不支持 INT8 量化)
66
+ print(f"🔧 {device.type.upper()} 模式:自动设备分配")
67
+ if os.environ.get("FORCE_INT8") == "1":
68
+ print("⚠️ MPS 不支持 INT8 量化,已忽略 FORCE_INT8=1 环境变量")
69
+ device_map = "auto"
70
+ use_low_cpu_mem = True
71
+ print("🔧 dtype: float16")
72
+ print("🔧 device_map: auto")
73
+
74
+ return {
75
+ "device_map": device_map,
76
+ "dtype": dtype,
77
+ "use_low_cpu_mem": use_low_cpu_mem,
78
+ "use_int8": use_int8,
79
+ }
80
+
81
+
82
+ def attn_implementation_for_device(device: torch.device) -> str:
83
+ """
84
+ 非 CUDA:eager,兼容性最好(CPU / MPS 等)。
85
+ CUDA:已安装 flash-attn 时用 flash_attention_2;否则 eager(不使用 sdpa)。
86
+ """
87
+ if device.type != "cuda":
88
+ return "eager"
89
+ if is_flash_attn_2_available():
90
+ return "flash_attention_2"
91
+ return "eager"
92
+
93
+
94
+ def load_causal_lm(
95
+ model_path: str,
96
+ device: torch.device,
97
+ *,
98
+ attn_implementation: Optional[str] = None,
99
+ extra_model_kwargs: Optional[Dict[str, Any]] = None,
100
+ ) -> torch.nn.Module:
101
+ """
102
+ 加载 Causal LM 模型,统一处理设备策略、量化、加载后处理。
103
+
104
+ Args:
105
+ model_path: HuggingFace 模型路径或本地路径
106
+ device: 目标设备
107
+ attn_implementation: 可选;未传时可在外层用 attn_implementation_for_device(device)
108
+ extra_model_kwargs: 可选,额外传给 from_pretrained 的参数
109
+
110
+ Returns:
111
+ 已 eval() 的模型
112
+ """
113
+ strategy = get_device_load_strategy(device)
114
+ device_map = strategy["device_map"]
115
+ dtype = strategy["dtype"]
116
+ use_low_cpu_mem = strategy["use_low_cpu_mem"]
117
+ use_int8 = strategy["use_int8"]
118
+
119
+ load_kw: Dict[str, Any] = {
120
+ "trust_remote_code": True,
121
+ "low_cpu_mem_usage": use_low_cpu_mem or use_int8,
122
+ }
123
+ if attn_implementation is not None:
124
+ load_kw["attn_implementation"] = attn_implementation
125
+ if extra_model_kwargs:
126
+ load_kw.update(extra_model_kwargs)
127
+
128
+ def _load(path: str, lf: bool):
129
+ kw = dict(local_files_only=lf, **load_kw)
130
+ if use_int8:
131
+ from transformers import BitsAndBytesConfig
132
+ return AutoModelForCausalLM.from_pretrained(
133
+ path,
134
+ quantization_config=BitsAndBytesConfig(load_in_8bit=True),
135
+ device_map=device_map,
136
+ **kw,
137
+ )
138
+ if device_map:
139
+ return AutoModelForCausalLM.from_pretrained(
140
+ path,
141
+ device_map=device_map,
142
+ dtype=dtype,
143
+ **kw,
144
+ )
145
+ return AutoModelForCausalLM.from_pretrained(
146
+ path, dtype=dtype, **kw
147
+ ).to(device)
148
+
149
+ t0 = time.perf_counter()
150
+ model = resolve_and_load(model_path, _load)
151
+ load_time = time.perf_counter() - t0
152
+
153
+ DeviceManager.print_model_load_stats(model, load_time)
154
+ model.eval()
155
+ if device.type == "cuda":
156
+ device_idx = device.index if device.index is not None else 0
157
+ DeviceManager.print_cuda_memory_summary(device=device_idx)
158
+ return model
159
+
160
+
161
+ def load_tokenizer(model_path: str):
162
+ """加载 tokenizer。本地优先时先解析为缓存路径,避免 tokenizer 内部 model_info 联网。"""
163
+
164
+ def _load(path: str, lf: bool):
165
+ return AutoTokenizer.from_pretrained(
166
+ path, trust_remote_code=True, local_files_only=lf
167
+ )
168
+
169
+ return resolve_and_load(model_path, _load)
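A hypothetical direct load outside the slot/manager layer, combining the three helpers above; it assumes the weights are available locally or downloadable.

device = DeviceManager.get_device()
tokenizer = load_tokenizer("Qwen/Qwen2.5-0.5B")
model = load_causal_lm(
    "Qwen/Qwen2.5-0.5B",
    device,
    attn_implementation=attn_implementation_for_device(device),
)
ids = tokenizer("hello", return_tensors="pt")["input_ids"].to(next(model.parameters()).device)
with torch.inference_mode():
    print(model(input_ids=ids).logits.shape)      # [1, seq_len, vocab_size]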
backend/model_manager.py ADDED
@@ -0,0 +1,233 @@
1
+ """模型管理模块:主槽位与语义槽位对称配置,权重缓存共用。"""
2
+ from enum import Enum
3
+ import threading
4
+
5
+ from backend import REGISTERED_MODELS
6
+ from backend.project_registry import ModelRegistry
7
+ from backend.device import DeviceManager
8
+ from backend.model_loader import attn_implementation_for_device, load_causal_lm, load_tokenizer
9
+
10
+ from model_paths import DEFAULT_MODEL, DEFAULT_SEMANTIC_MODEL, resolve_hf_path
11
+
12
+ project_registry = ModelRegistry(REGISTERED_MODELS)
13
+ _init_lock = threading.Lock()
14
+
15
+ # 统一推理锁:信息密度分析与 Semantic 分析共用,确保模型推理串行执行
16
+ _inference_lock = threading.Lock()
17
+
18
+ # 按 HuggingFace 路径去重的已加载模型缓存(主分析 / 语义 / 续写共用)
19
+ _hf_load_lock = threading.Lock()
20
+ _hf_loaded: dict[str, tuple] = {}
21
+
22
+
23
+ class ModelSlot(str, Enum):
24
+ """与 CLI --model / --semantic_model 对应的两个对等槽位。"""
25
+
26
+ MAIN = "main"
27
+ SEMANTIC = "semantic"
28
+
29
+
30
+ # 启动预载与「全部权重」枚举时使用的槽位顺序(对等、无主次)
31
+ CONFIGURED_SLOTS: tuple[ModelSlot, ...] = (ModelSlot.MAIN, ModelSlot.SEMANTIC)
32
+
33
+
34
+ def _resolved_hf_path_for_slot(slot: ModelSlot) -> str:
35
+ """由应用上下文解析槽位对应的 HuggingFace 路径(或本地路径字符串)。"""
36
+ if slot == ModelSlot.MAIN:
37
+ try:
38
+ from backend.app_context import get_app_context
39
+
40
+ context = get_app_context(prefer_module_context=True)
41
+ model_name = context.model_name or DEFAULT_MODEL
42
+ except RuntimeError:
43
+ model_name = DEFAULT_MODEL
44
+ return resolve_hf_path(model_name)
45
+ if slot == ModelSlot.SEMANTIC:
46
+ try:
47
+ from backend.app_context import get_args
48
+
49
+ raw = getattr(get_args(), "semantic_model", DEFAULT_SEMANTIC_MODEL)
50
+ except RuntimeError:
51
+ raw = DEFAULT_SEMANTIC_MODEL
52
+ return resolve_hf_path(raw)
53
+ raise ValueError(f"unknown ModelSlot: {slot!r}")
54
+
55
+
56
+ def ensure_slot_weights_loaded(slot: ModelSlot):
57
+ """
58
+ 加载指定槽位权重(若未缓存);主 / 语义完全相同的入口。
59
+ 返回 (tokenizer, model, device)。
60
+ """
61
+ return ensure_model_loaded(_resolved_hf_path_for_slot(slot))
62
+
63
+
64
+ def ensure_model_loaded(resolved_hf_path: str):
65
+ """
66
+ 唯一底层加载入口:保证 resolved_hf_path 对应权重已加载。
67
+ 返回 (tokenizer, model, device),其中 device 为模型参数所在 device。
68
+ """
69
+ with _hf_load_lock:
70
+ hit = _hf_loaded.get(resolved_hf_path)
71
+ if hit is not None:
72
+ return hit
73
+
74
+ device = DeviceManager.get_device()
75
+ display = resolved_hf_path.split("/")[-1] if "/" in resolved_hf_path else resolved_hf_path
76
+ print(f"📦 正在加载模型权重: {display}")
77
+ tokenizer = load_tokenizer(resolved_hf_path)
78
+ model = load_causal_lm(
79
+ resolved_hf_path,
80
+ device,
81
+ attn_implementation=attn_implementation_for_device(device),
82
+ )
83
+ for p in model.parameters():
84
+ p.requires_grad_(False)
85
+ model_device = next(model.parameters()).device
86
+ device_name = DeviceManager.get_device_name(device)
87
+ print(f"✓ {display} 已加载 ({device_name})")
88
+ out = (tokenizer, model, model_device)
89
+ _hf_loaded[resolved_hf_path] = out
90
+ return out
91
+
92
+
93
+ def ensure_project_loaded(project_name: str):
94
+ """确保项目已加载,如果未加载则加载它"""
95
+ if not project_name:
96
+ raise ValueError("model name is required")
97
+ if not project_registry.is_available(project_name):
98
+ raise KeyError(project_name)
99
+ try:
100
+ return project_registry.ensure_loaded(project_name)
101
+ except KeyError:
102
+ # Re-raise to allow caller to format message uniformly.
103
+ raise
104
+ except Exception as exc: # noqa: BLE001 - propagate detailed message
105
+ raise RuntimeError(f"模型 '{project_name}' 加载失败: {exc}") from exc
106
+
107
+
108
+ def _register_main_qwenlm_if_needed():
109
+ """
110
+ 信息密度路径:在 MAIN 槽位权重已就绪后,注册 project_registry 中的 QwenLM 实例。
111
+ 语义槽位无对应 registry 包装,故仅此槽位需要。
112
+ """
113
+ from backend.app_context import get_app_context
114
+
115
+ context = get_app_context(prefer_module_context=True)
116
+ selected_name = context.model_name
117
+
118
+ if not selected_name:
119
+ raise ValueError("未指定模型名称")
120
+
121
+ if selected_name in project_registry:
122
+ _ensure_default_project_ready(selected_name)
123
+ return
124
+
125
+ if not project_registry.is_available(selected_name):
126
+ raise KeyError(f"模型 '{selected_name}' 未找到,可用模型: {list(REGISTERED_MODELS.keys())}")
127
+
128
+ try:
129
+ project_registry.load(selected_name)
130
+ _ensure_default_project_ready(selected_name)
131
+ except Exception as exc: # noqa: BLE001
132
+ raise RuntimeError(f"模型 '{selected_name}' 加载失败: {exc}") from exc
133
+
134
+
135
+ def preload_all_slots():
136
+ """
137
+ 启动预载(非 --no_auto_load):对 CONFIGURED_SLOTS 各解析 HF 路径,去重后加载全部权重,
138
+ 再注册主槽位 QwenLM 项目。两槽位在「先加载权重」层面完全对等。
139
+ """
140
+ from backend.app_context import get_app_context
141
+
142
+ get_app_context(prefer_module_context=True)
143
+
144
+ paths = {_resolved_hf_path_for_slot(s) for s in CONFIGURED_SLOTS}
145
+
146
+ with _init_lock:
147
+ for path in paths:
148
+ ensure_model_loaded(path)
149
+ _register_main_qwenlm_if_needed()
150
+
151
+
152
+ def ensure_slot_ready(slot: ModelSlot):
153
+ """
154
+ Slot readiness (symmetric API): ensure everything the slot needs for subsequent inference is prepared.
155
+
156
+ - Both slots first ensure the HF weights are loaded and return (tokenizer, model, device).
157
+ - MAIN additionally hooks QwenLM into project_registry (information-density pipeline); SEMANTIC has no registry step.
158
+
159
+ With lazy loading: information density calls ensure_main_slot_ready(); semantic/continuation calls ensure_semantic_slot_ready().
160
+ """
161
+ from backend.app_context import get_app_context
162
+
163
+ get_app_context(prefer_module_context=True)
164
+
165
+ if slot == ModelSlot.MAIN:
166
+ with _init_lock:
167
+ out = ensure_slot_weights_loaded(ModelSlot.MAIN)
168
+ _register_main_qwenlm_if_needed()
169
+ return out
170
+ if slot == ModelSlot.SEMANTIC:
171
+ return ensure_slot_weights_loaded(ModelSlot.SEMANTIC)
172
+ raise ValueError(f"unknown ModelSlot: {slot!r}")
173
+
174
+
175
+ def ensure_main_slot_ready():
176
+ """懒加载首次信息密度:同 ensure_slot_ready(ModelSlot.MAIN)。"""
177
+ return ensure_slot_ready(ModelSlot.MAIN)
178
+
179
+
180
+ def ensure_semantic_slot_ready():
181
+ """懒加载首次语义类请求:同 ensure_slot_ready(ModelSlot.SEMANTIC)。"""
182
+ return ensure_slot_ready(ModelSlot.SEMANTIC)
183
+
184
+
185
+ def get_current_model_max_token_length() -> int:
186
+ """
187
+ Query the max_token_length setting of the currently active model.
188
+ Prefer the loaded model instance; fall back to the default_model.default_cpu_machine config when nothing is loaded.
189
+ """
190
+ from backend.app_context import get_app_context
191
+ from backend.runtime_config import RUNTIME_CONFIGS
192
+
193
+ try:
194
+ context = get_app_context(prefer_module_context=True)
195
+ model_name = context.model_name or DEFAULT_MODEL
196
+ except RuntimeError:
197
+ model_name = "default_model"
198
+
199
+ project = project_registry.get(model_name)
200
+ if project is not None and hasattr(project.lm, "max_length"):
201
+ return project.lm.max_length
202
+ return RUNTIME_CONFIGS["default_model"]["default_cpu_machine"]["max_token_length"]
203
+
204
+
205
+ def _ensure_default_project_ready(selected_name: str):
206
+ """确保默认项目已准备好"""
207
+ if not selected_name:
208
+ return
209
+ if selected_name in project_registry:
210
+ return
211
+ print(f"⚠️ 默认模型未缓存,正在预加载: {selected_name}")
212
+ project_registry.ensure_loaded(selected_name)
213
+
214
+
215
+ # Legacy names kept (equivalent to the slot-readiness API)
216
+ ensure_semantic_loaded = ensure_semantic_slot_ready
217
+ ensure_main_project_ready = ensure_main_slot_ready
218
+
219
+ def get_semantic_model_display_name() -> str:
220
+ """返回 semantic 槽位 HuggingFace 路径(用于结果中的 model 字段)"""
221
+ return _resolved_hf_path_for_slot(ModelSlot.SEMANTIC)
222
+
223
+
224
+ def ensure_main_model_loaded():
225
+ """
226
+ For cases that only need a main-model forward pass and do not go through project_registry (e.g. attribution): MAIN slot weights.
227
+ """
228
+ return ensure_slot_weights_loaded(ModelSlot.MAIN)
229
+
230
+
231
+ def get_main_model_display_name() -> str:
232
+ """返回主槽位 HuggingFace 路径(用于结果中的 model 字段)"""
233
+ return _resolved_hf_path_for_slot(ModelSlot.MAIN)
backend/next_token_topk.py ADDED
@@ -0,0 +1,26 @@
1
+ """
2
+ Top-k decoding for the next token: consistent with the semantic-analysis logits_gradient path, reused by semantic / attribution.
3
+ """
4
+ from typing import List, Tuple
5
+
6
+ import torch
7
+
8
+ from .api.utils import round_to_sig_figs
9
+
10
+ DEFAULT_NEXT_TOKEN_TOPK = 10
11
+
12
+
13
+ def decode_topk_ids_to_strings_and_rounded_probs(
14
+ probs_1d: torch.Tensor,
15
+ tokenizer,
16
+ topk_ids_1d: torch.Tensor,
17
+ ) -> Tuple[List[str], List[float]]:
18
+ """
19
+ probs_1d: softmax over a single position's logits, shape [vocab_size].
20
+ topk_ids_1d: indices returned by torch.topk(logits, k), shape [k].
21
+ Returns topk_tokens and topk_probs in the same shape as the semantic-analysis debug_info (probabilities already rounded with round_to_sig_figs).
22
+ """
23
+ ids_list = topk_ids_1d.tolist()
24
+ topk_tokens = [tokenizer.decode([int(tid)]) for tid in ids_list]
25
+ topk_probs = [round_to_sig_figs(probs_1d[int(tid)].item()) for tid in ids_list]
26
+ return topk_tokens, topk_probs
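+
+ # Usage sketch (assumed call site; prediction_attributor.py in this commit follows the same pattern):
+ # probs = torch.softmax(logits, dim=-1); _, ids = torch.topk(logits, DEFAULT_NEXT_TOKEN_TOPK)
+ # tokens, rounded_probs = decode_topk_ids_to_strings_and_rounded_probs(probs, tokenizer, ids)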
backend/oom.py ADDED
@@ -0,0 +1,55 @@
1
+ """OOM 处理:MPS/CUDA 显存或 CPU 内存不足时退出进程,由进程管理器重启"""
2
+ import os
3
+ import threading
4
+ import time
5
+
6
+
7
+ def _check_oom_msg(msg: str) -> bool:
8
+ patterns = (
9
+ "out of memory",
10
+ "out of memory error",
11
+ "memory allocation",
12
+ "cannot allocate memory",
13
+ "insufficient memory",
14
+ "ran out of memory",
15
+ "resource exhausted",
16
+ "cuda error: out of memory",
17
+ "mps backend out of memory",
18
+ )
19
+ return any(p in msg.lower() for p in patterns)
20
+
21
+
22
+ def is_oom_error(e: Exception) -> bool:
23
+ """检测是否为 OOM(含 MPS/CUDA 显存、CPU 内存),此类错误后进程无法恢复,需重启"""
24
+ if isinstance(e, MemoryError):
25
+ return True
26
+ if _check_oom_msg(str(e)):
27
+ return True
28
+ # Check the exception chain (e.g. an OOM wrapped in a RuntimeError)
29
+ for exc in (getattr(e, "__cause__", None), getattr(e, "__context__", None)):
30
+ if exc is not None and (isinstance(exc, MemoryError) or _check_oom_msg(str(exc))):
31
+ return True
32
+ return False
33
+
34
+
35
+ def exit_if_oom(e: Exception, defer_seconds: float = 0) -> None:
36
+ """若为 OOM 则退出进程,由进程管理器重启以恢复内存。
37
+
38
+ defer_seconds: 延迟退出秒数,用于先返回错误响应再退出(非流式需 > 0)
39
+ """
40
+ if not is_oom_error(e):
41
+ return
42
+ msg = f"🛑 OOM 检测到,进程退出以便重启: {e}"
43
+ if defer_seconds > 0:
44
+ msg = f"🛑 OOM 检测到,{defer_seconds}s 后进程退出以便重启: {e}"
45
+ print(msg)
46
+
47
+ def _exit():
48
+ if defer_seconds > 0:
49
+ time.sleep(defer_seconds)
50
+ os._exit(1)
51
+
52
+ if defer_seconds > 0:
53
+ threading.Thread(target=_exit, daemon=False).start()
54
+ else:
55
+ os._exit(1)
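+
+ # Usage sketch (assumed caller, not part of this module): inside an API handler's
+ # `except Exception as e:` block, build the error response first, then call
+ # exit_if_oom(e, defer_seconds=1) so the response can be flushed before the process
+ # exits and is restarted by the process manager.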
backend/path_utils.py ADDED
@@ -0,0 +1,92 @@
1
+ """
2
+ Path handling utilities
3
+ Centralizes path validation, normalization and resolution logic
4
+ """
5
+
6
+ import os
7
+ from pathlib import Path
8
+ from typing import Optional
9
+
10
+
11
+ def normalize_path(path: str) -> str:
12
+ """
13
+ Normalize a path: convert an empty string to "/"
14
+
15
+ Args:
16
+ path: the input path
17
+
18
+ Returns:
19
+ The normalized path
20
+ """
21
+ return path if path else "/"
22
+
23
+
24
+ def check_path_in_demo_dir(path: Path, demo_dir: Path) -> bool:
25
+ """
26
+ Check whether a path is inside the demo directory (Python 3.8 compatible)
27
+
28
+ Args:
29
+ path: the path to check
30
+ demo_dir: path of the demo directory
31
+
32
+ Returns:
33
+ True if the path is inside the demo directory
34
+ """
35
+ try:
36
+ return path.is_relative_to(demo_dir)
37
+ except AttributeError:
38
+ # Python 3.8 compatibility: use os.path.commonpath
39
+ path_str = str(path)
40
+ demo_dir_str = str(demo_dir)
41
+ common = os.path.commonpath([path_str, demo_dir_str])
42
+ return common == demo_dir_str
43
+
44
+
45
+ def validate_demo_path(path: str, demo_dir: Path) -> bool:
46
+ """
47
+ Validate the path for safety, guarding against path traversal attacks
48
+
49
+ Args:
50
+ path: the relative path to validate
51
+ demo_dir: absolute path of the demo directory
52
+
53
+ Returns:
54
+ True if the path is safe
55
+ """
56
+ if not path or path == "/":
57
+ return True
58
+
59
+ # 移除首尾斜杠并规范化路径
60
+ normalized_path = path.strip('/').replace('\\', '/')
61
+
62
+ # 检查路径是否包含 ".." 或其他危险字符
63
+ if '..' in normalized_path.split('/'):
64
+ return False
65
+
66
+ try:
67
+ resolved_path = (demo_dir / normalized_path).resolve()
68
+ demo_dir_resolved = demo_dir.resolve()
69
+ return check_path_in_demo_dir(resolved_path, demo_dir_resolved)
70
+ except Exception:
71
+ return False
72
+
73
+
74
+ def resolve_demo_path(demo_dir: Path, path: str) -> Optional[Path]:
75
+ """
76
+ Resolve and validate a path, returning an absolute path
77
+
78
+ Args:
79
+ demo_dir: absolute path of the demo directory
80
+ path: the relative path to resolve
81
+
82
+ Returns:
83
+ The resolved absolute path, or None if validation fails
84
+ """
85
+ if not validate_demo_path(path, demo_dir):
86
+ return None
87
+
88
+ if not path or path == "/":
89
+ return demo_dir
90
+
91
+ return (demo_dir / path.lstrip('/')).resolve()
92
+
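+ # Illustrative behaviour (hypothetical paths): resolve_demo_path(Path("/data/demo"), "public/a.json")
+ # yields /data/demo/public/a.json, while resolve_demo_path(Path("/data/demo"), "../secret")
+ # fails validation and returns None.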
backend/pred_topk_format.py ADDED
@@ -0,0 +1,44 @@
1
+ """
2
+ Formatting of pred_topk lists: semantically identical to batch_decode + round_to_sig_figs in language_checker, shared by information density and continuation.
3
+ """
4
+
5
+ from typing import List, Tuple
6
+
7
+ import torch
8
+
9
+ from backend.api.utils import round_to_sig_figs
10
+
11
+
12
+ def pred_topk_pairs_from_flat_ids_and_probs(
13
+ ids_flat: List[int],
14
+ probs_flat: List[float],
15
+ tokenizer,
16
+ ) -> List[Tuple[str, float]]:
17
+ """
18
+ Decode flattened id / probability sequences from torch.topk into [(token text, probability), ...].
19
+ Matches the inner logic of QwenLM._decode_topk_tokens (a single batch_decode call).
20
+ """
21
+ if len(ids_flat) != len(probs_flat):
22
+ raise ValueError("ids_flat 与 probs_flat 长度须一致")
23
+ if not ids_flat:
24
+ return []
25
+ decoded = tokenizer.batch_decode([[tid] for tid in ids_flat], skip_special_tokens=False)
26
+ return [
27
+ (decoded[j], round_to_sig_figs(float(probs_flat[j])))
28
+ for j in range(len(ids_flat))
29
+ ]
30
+
31
+
32
+ def pred_topk_pairs_from_probs_1d(
33
+ probs: torch.Tensor,
34
+ tokenizer,
35
+ top_k: int,
36
+ ) -> List[Tuple[str, float]]:
37
+ """单步 1D softmax 概率向量上的 top-k,用于续写 generate 的每步 scores。"""
38
+ top_k = min(int(top_k), int(probs.numel()))
39
+ if top_k <= 0:
40
+ return []
41
+ topk_probs, topk_ids = torch.topk(probs, top_k, dim=-1)
42
+ ids_flat = topk_ids.cpu().flatten().tolist()
43
+ probs_flat = topk_probs.detach().cpu().float().numpy().flatten().tolist()
44
+ return pred_topk_pairs_from_flat_ids_and_probs(ids_flat, probs_flat, tokenizer)
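+
+ # Usage sketch (assumed generate() call site): for each step's scores tensor from
+ # model.generate(..., return_dict_in_generate=True, output_scores=True), take
+ # probs = torch.softmax(scores[0], dim=-1) and call pred_topk_pairs_from_probs_1d(probs, tokenizer, top_k=10).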
backend/prediction_attributor.py ADDED
@@ -0,0 +1,185 @@
1
+ """
2
+ Prediction attribution: for the next-token prediction of an arbitrary context, compute the gradient of a chosen candidate token's logit
3
+ with respect to each input token embedding, and use the gradient's L2 norm as the attribution score.
4
+
5
+ The request parameter `model` selects the weight slot: base is the main slot (--model), instruct is the semantic slot (--semantic_model).
6
+ """
7
+
8
+ import math
9
+ from typing import Dict, Optional
10
+
11
+ import torch
12
+
13
+ from .api.utils import round_to_sig_figs
14
+ from .device import DeviceManager
15
+ from .model_manager import (
16
+ ModelSlot,
17
+ ensure_slot_weights_loaded,
18
+ get_main_model_display_name,
19
+ get_semantic_model_display_name,
20
+ )
21
+ from .next_token_topk import decode_topk_ids_to_strings_and_rounded_probs, DEFAULT_NEXT_TOKEN_TOPK
22
+
23
+
24
+ def _get_gradient_checkpointing() -> bool:
25
+ """默认 True;``--no-gradient-checkpointing`` 关闭。"""
26
+ try:
27
+ from backend.app_context import get_args
28
+
29
+ return getattr(get_args(), "gradient_checkpointing", True)
30
+ except RuntimeError:
31
+ return True
32
+
33
+
34
+ # Upper limit on attribution input length (in tokens); longer inputs raise an error
35
+ ATTRIBUTION_MAX_TOKEN_LENGTH = 500
36
+
37
+ # Matches the API request body field `model`: base = main slot, instruct = semantic slot
38
+ PREDICTION_ATTR_MODEL_BASE = "base"
39
+ PREDICTION_ATTR_MODEL_INSTRUCT = "instruct"
40
+
41
+
42
+ def _slot_for_prediction_attr_model(model: str) -> ModelSlot:
43
+ if model == PREDICTION_ATTR_MODEL_BASE:
44
+ return ModelSlot.MAIN
45
+ if model == PREDICTION_ATTR_MODEL_INSTRUCT:
46
+ return ModelSlot.SEMANTIC
47
+ raise ValueError(
48
+ f"Unsupported model {model!r}; only {PREDICTION_ATTR_MODEL_BASE!r} and "
49
+ f"{PREDICTION_ATTR_MODEL_INSTRUCT!r} are supported."
50
+ )
51
+
52
+
53
+ def analyze_prediction_attribution(
54
+ context: str, target_prediction: Optional[str] = None, *, model: str
55
+ ) -> Dict:
56
+ """
57
+ Compute each context token's attribution score for the prediction of target_prediction's first token.
58
+
59
+ Args:
60
+ context: input context text (must not exceed ATTRIBUTION_MAX_TOKEN_LENGTH tokens, otherwise ValueError is raised)
61
+ target_prediction: target prediction text; after tokenization its first token is used as the attribution target.
62
+ When omitted or None, the top-1 (greedy) token is used automatically.
63
+ model: ``base`` uses the main-slot weights, ``instruct`` the semantic-slot weights (matching the API request body)
64
+
65
+ Returns:
66
+ {
67
+ "model": str,
68
+ "target_token": str, # 归因目标 token 的字符串
69
+ "target_prob": float, # 该 token 在 next-token 分布中的预测概率
70
+ "token_attribution": [{"offset": [s, e], "raw": str, "score": float}, ...],
71
+ "debug_info": {"topk_tokens": [...], "topk_probs": [...]}, # 与语义分析同形(下一 token top10)
72
+ "is_eos": bool, # target_token 是否为 EOS token
73
+ }
74
+ """
75
+ slot = _slot_for_prediction_attr_model(model)
76
+ tokenizer, hf_model, device = ensure_slot_weights_loaded(slot)
77
+ model_display = (
78
+ get_main_model_display_name() if slot == ModelSlot.MAIN else get_semantic_model_display_name()
79
+ )
80
+
81
+ # The attribution target id is resolved only after the forward pass yields logits: top-1 uses argmax; an explicit target uses encode (may differ from argmax).
82
+ use_top1 = target_prediction is None
83
+
84
+ # Encode the context, keeping offset_mapping to recover character positions
85
+ enc = tokenizer(context, return_tensors="pt", return_offsets_mapping=True)
86
+ input_ids = enc["input_ids"].to(device)
87
+ offset_mapping = enc["offset_mapping"][0].tolist()
88
+ n_tokens = input_ids.shape[1]
89
+ if n_tokens > ATTRIBUTION_MAX_TOKEN_LENGTH:
90
+ raise ValueError(
91
+ "Context exceeds attribution length limit "
92
+ f"({ATTRIBUTION_MAX_TOKEN_LENGTH} tokens); current length is {n_tokens} tokens."
93
+ )
94
+
95
+ # Go through the embedding layer to obtain differentiable inputs
96
+ embed_layer = hf_model.get_input_embeddings()
97
+ embeds = embed_layer(input_ids).detach().clone().requires_grad_(True)
98
+
99
+ use_gc = _get_gradient_checkpointing()
100
+ try:
101
+ hf_model.eval()
102
+ if use_gc:
103
+ hf_model.gradient_checkpointing_enable()
104
+ with torch.set_grad_enabled(True):
105
+ # Attribution only needs the last-step logits, not the KV cache; disabling the cache markedly lowers peak memory for long contexts.
106
+ outputs = hf_model(inputs_embeds=embeds, output_attentions=False, use_cache=False)
107
+
108
+ # Explicit synchronization to make sure the forward pass has finished (consistent with semantic logits_gradient)
109
+ if device.type == "cuda":
110
+ torch.cuda.synchronize(device)
111
+ elif device.type == "mps":
112
+ torch.mps.synchronize()
113
+
114
+ logits = outputs.logits[0, -1, :] # next-token logits,shape: [vocab_size]
115
+ probs = torch.softmax(logits, dim=-1)
116
+ _, topk_ids = torch.topk(logits, DEFAULT_NEXT_TOKEN_TOPK)
117
+ topk_tokens, topk_probs = decode_topk_ids_to_strings_and_rounded_probs(
118
+ probs, tokenizer, topk_ids
119
+ )
120
+
121
+ if use_top1:
122
+ target_token_id = int(topk_ids[0].item())
123
+ target_token = tokenizer.decode([target_token_id])
124
+ else:
125
+ assert target_prediction is not None
126
+ target_ids = tokenizer.encode(target_prediction, add_special_tokens=False)
127
+ if not target_ids:
128
+ raise ValueError(f"Cannot tokenize target_prediction: {target_prediction!r}")
129
+ target_token_id = target_ids[0]
130
+ target_token = tokenizer.decode([target_token_id])
131
+
132
+ target_prob = round_to_sig_figs(probs[target_token_id].item())
133
+
134
+ # Backpropagate from the target token's raw logit (no softmax, avoiding saturation and competition effects)
135
+ logits[target_token_id].backward()
136
+
137
+ grad = embeds.grad
138
+ if grad is None:
139
+ raise RuntimeError(
140
+ "Gradient did not propagate; this model may not support attribution (e.g. int8 quantization)."
141
+ )
142
+
143
+ # Explicit synchronization so the backward pass has finished before reading gradients (consistent with semantic logits_gradient)
144
+ if device.type == "cuda":
145
+ torch.cuda.synchronize(device)
146
+ elif device.type == "mps":
147
+ torch.mps.synchronize()
148
+
149
+ norms = grad[0].float().norm(dim=-1).cpu().tolist()
150
+
151
+ # Filter special tokens by offset (BOS/EOS spans have length 0)
152
+ token_attribution = []
153
+ nan_count = 0
154
+ for (s, e), norm in zip(offset_mapping, norms):
155
+ if s >= e:
156
+ continue
157
+ if not math.isfinite(norm):
158
+ score = 0.0
159
+ nan_count += 1
160
+ else:
161
+ score = round_to_sig_figs(norm)
162
+ token_attribution.append({
163
+ "offset": [s, e],
164
+ "raw": context[s:e],
165
+ "score": score,
166
+ })
167
+ if nan_count > 0:
168
+ print(f"⚠️ token_attribution 中有 {nan_count} 个 score 为 NaN/Inf,已替换为 0。")
169
+
170
+ eos_id = tokenizer.eos_token_id
171
+ is_eos = eos_id is not None and target_token_id == int(eos_id)
172
+
173
+ return {
174
+ "model": model_display,
175
+ "target_token": target_token,
176
+ "target_prob": target_prob,
177
+ "token_attribution": token_attribution,
178
+ "debug_info": {"topk_tokens": topk_tokens, "topk_probs": topk_probs},
179
+ "is_eos": is_eos,
180
+ }
181
+ finally:
182
+ if use_gc:
183
+ hf_model.gradient_checkpointing_disable()
184
+ # Consistent with semantic_analyzer._analyze_logits_gradient: clean up after every inference to avoid MPS/CUDA accumulation
185
+ DeviceManager.clear_cache(device)
backend/project_registry.py ADDED
@@ -0,0 +1,72 @@
1
+ from typing import Dict, Iterable, Optional, Sequence, Tuple
2
+
3
+
4
+ class ModelInstance:
5
+ """Lightweight wrapper holding a configured language model instance."""
6
+
7
+ def __init__(self, model_cls, config):
8
+ self.config = config
9
+ self.lm = model_cls()
10
+
11
+
12
+ class ModelRegistry:
13
+ """Manages lazy loading and caching of backend language models."""
14
+
15
+ def __init__(self, available_models: Dict[str, object]):
16
+ self._available_models = available_models
17
+ self._projects: Dict[str, ModelInstance] = {}
18
+
19
+ def __contains__(self, project_name: str) -> bool:
20
+ return project_name in self._projects
21
+
22
+ def get(self, project_name: str) -> Optional[ModelInstance]:
23
+ return self._projects.get(project_name)
24
+
25
+ def configs(self) -> Dict[str, object]:
26
+ return {name: project.config for name, project in self._projects.items()}
27
+
28
+ def available_model_names(self) -> Sequence[str]:
29
+ return tuple(self._available_models.keys())
30
+
31
+ def is_available(self, project_name: str) -> bool:
32
+ return project_name in self._available_models
33
+
34
+ def load(self, project_name: str) -> ModelInstance:
35
+ if project_name not in self._available_models:
36
+ raise KeyError(f"模型 '{project_name}' 未在 REGISTERED_MODELS 中注册")
37
+
38
+ project = ModelInstance(self._available_models[project_name], project_name)
39
+ self._projects[project_name] = project
40
+ return project
41
+
42
+ def ensure_loaded(self, project_name: str) -> ModelInstance:
43
+ """Return a project instance, loading it if necessary."""
44
+ if project_name in self._projects:
45
+ return self._projects[project_name]
46
+ return self.load(project_name)
47
+
48
+ def unload(self, project_name: str) -> bool:
49
+ """卸载指定模型,释放内存"""
50
+ if project_name in self._projects:
51
+ del self._projects[project_name]
52
+ return True
53
+ return False
54
+
55
+ def ensure_any(self, candidates: Iterable[str]) -> Tuple[str, ModelInstance]:
56
+ """Load (or reuse) the first successfully instantiated project."""
57
+ last_error: Optional[Exception] = None
58
+ for candidate in candidates:
59
+ if not candidate:
60
+ continue
61
+ if candidate in self._projects:
62
+ return candidate, self._projects[candidate]
63
+ try:
64
+ project = self.load(candidate)
65
+ return candidate, project
66
+ except Exception as exc: # noqa: BLE001 - bubble up aggregated info
67
+ last_error = exc
68
+ continue
69
+ if last_error:
70
+ raise last_error
71
+ raise ValueError("没有可用的模型!")
72
+
backend/quantization_config.py ADDED
@@ -0,0 +1,42 @@
1
+ """
2
+ Quantization config (shared by semantic analysis and information-density analysis)
3
+
4
+ Reads environment variables and returns a device-dependent quantization policy:
5
+ - FORCE_INT8=1: INT8 quantization (supported on CPU/CUDA, not on MPS)
6
+ - CPU_FORCE_BFLOAT16=1: use bfloat16 on CPU
7
+ """
8
+
9
+ import os
10
+ from typing import NamedTuple
11
+
12
+ import torch
13
+
14
+
15
+ class QuantizationConfig(NamedTuple):
16
+ """量化配置,语义模型和信息密度模型共用"""
17
+ use_int8: bool
18
+ dtype: torch.dtype
19
+
20
+
21
+ def get_quantization_config(device: torch.device) -> QuantizationConfig:
22
+ """
23
+ Return the quantization config based on the device and environment variables.
24
+
25
+ Returns:
26
+ QuantizationConfig: use_int8, dtype
27
+ """
28
+ force_int8 = os.environ.get("FORCE_INT8") == "1"
29
+ force_bfloat16 = os.environ.get("CPU_FORCE_BFLOAT16") == "1"
30
+
31
+ if device.type == "cpu":
32
+ use_int8 = force_int8
33
+ dtype = torch.bfloat16 if force_bfloat16 else torch.float32
34
+ elif device.type == "cuda":
35
+ use_int8 = force_int8
36
+ dtype = torch.float16
37
+ else:
38
+ # MPS does not support INT8
39
+ use_int8 = False
40
+ dtype = torch.float16
41
+
42
+ return QuantizationConfig(use_int8=use_int8, dtype=dtype)
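+
+ # Net effect of the policy above: CPU -> float32 (bfloat16 when CPU_FORCE_BFLOAT16=1),
+ # CUDA -> float16, MPS -> float16; INT8 is only enabled by FORCE_INT8=1 on CPU/CUDA, never on MPS.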
backend/runtime_config.py ADDED
@@ -0,0 +1,402 @@
1
+ """
2
+ Runtime configuration management module
3
+
4
+ Manages runtime parameters for each model on each platform, including:
5
+ - max_token_length: maximum number of tokens for text analysis (information-density analysis)
6
+ - chunk_size: chunk size used during inference
7
+ - semantic analysis has its own SEMANTIC_RUNTIME_CONFIGS, containing only max_token_length
8
+
9
+ Platform IDs:
10
+ - local_mps: local Apple Silicon (M1/M2/M3)
11
+ - cloud_cuda: cloud CUDA GPU
12
+ - cloud_cpu_16g: cloud large-memory CPU (e.g. HF Space free tier, 16G RAM)
13
+ - cloud_cpu_32g: cloud large-memory CPU (e.g. HF Space CPU upgrade, 32G RAM)
14
+ - default_cpu_machine: default CPU machine (unknown or unrecognized CPU environments)
15
+ - future extensions: cloud_cuda_a100, cloud_cuda_24g, etc.
16
+ """
17
+
18
+ import os
19
+ import torch
20
+ import sys
21
+ from typing import Dict, Optional
22
+
23
+
24
+ # ============= Platform-level constants =============
25
+
26
+ # Default number of pred_topk entries for the analysis API (number of candidate tokens)
27
+ # The number shown in the frontend tooltip is kept in sync with this
28
+ DEFAULT_TOPK = 10
29
+
30
+ # Safe sequence-length upper bound for a single TopK op on MPS (works around an MPS bug)
31
+ # chunk_size must stay below this value so each chunk's TopK computation is safe
32
+ MPS_TOPK_BUG_THRESHOLD = 2048
33
+
34
+
35
+ # ============= Runtime parameter table (Model × Platform) =============
36
+ #
37
+ # Two-dimensional table: each model configures max_token_length and chunk_size per platform
38
+ #
39
+ # Four-layer override priority (high to low):
40
+ # 1. (model_name, platform) - model-specific config for that platform (most precise)
41
+ # 2. (model_name, "default_cpu_machine") - model-wide config (cross-platform)
42
+ # 3. ("default_model", platform) - platform-wide config (cross-model)
43
+ # 4. ("default_model", "default_cpu_machine") - global fallback config
44
+ #
45
+ # Each layer supports partial overrides: setting only max_token_length or only chunk_size is fine, as illustrated below
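+ #
+ # Worked example (hypothetical model name): for ("qwen3-1.7b", "local_mps"), a key set in
+ # ("qwen3-1.7b", "local_mps") beats ("qwen3-1.7b", "default_cpu_machine"), which beats
+ # ("default_model", "local_mps"), which beats ("default_model", "default_cpu_machine").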
46
+
47
+ RUNTIME_CONFIGS = {
48
+ # Global default-model configuration
49
+ "default_model": {
50
+ # Default CPU machine config (most conservative, for unrecognized CPU environments)
51
+ "default_cpu_machine": {
52
+ "max_token_length": 2000,
53
+ "chunk_size": 256
54
+ },
55
+ # Cloud CPU (16G), e.g. HF Spaces CPU basic
56
+ "cloud_cpu_16g": {
57
+ "max_token_length": 2000,
58
+ "chunk_size": 256
59
+ },
60
+ # Cloud CPU (32G), e.g. HF Spaces CPU upgrade
61
+ "cloud_cpu_32g": {
62
+ "max_token_length": 5000,
63
+ "chunk_size": 512
64
+ },
65
+ # Cloud GPU with ample VRAM
66
+ "cloud_cuda": {
67
+ # "max_token_length": 10000,
68
+ "max_token_length": 5000,
69
+ "chunk_size": 1024
70
+ },
71
+ # Local Apple Silicon
72
+ "local_mps": {
73
+ "max_token_length": 2000,
74
+ "chunk_size": 512
75
+ }
76
+ },
77
+ # # Qwen3-1.7B
78
+ # "qwen3-1.7b": {
79
+ # "local_mps": {
80
+ # "max_token_length": 2000,
81
+ # "chunk_size": 128
82
+ # }
83
+ # }
84
+ }
85
+
86
+
87
+ # ============= Semantic-analysis runtime configs (max_token_length only) =============
88
+ # Configured per platform; semantic analysis is independent of the information-density model
89
+
90
+ SEMANTIC_RUNTIME_CONFIGS = {
91
+ "default_cpu_machine": {"max_token_length": 300},
92
+ "cloud_cpu_16g": {"max_token_length": 300},
93
+ "cloud_cpu_32g": {"max_token_length": 1000},
94
+ "cloud_cuda": {"max_token_length": 1000},
95
+ "local_mps": {"max_token_length": 300},
96
+ }
97
+
98
+
99
+ # ============= Platform detection and config resolution =============
100
+
101
+ def detect_platform(verbose: bool = True) -> str:
102
+ """
103
+ Auto-detect the current runtime platform
104
+
105
+ Priority:
106
+ 1. FORCE_CPU environment variable (explicitly force CPU mode)
107
+ 2. Hardware auto-detection (cuda/mps/cpu)
108
+ 3. CPU sub-type detection (e.g. cloud_cpu_16g)
109
+
110
+ Args:
111
+ verbose: whether to print detection info
112
+
113
+ Returns:
114
+ Platform ID string (e.g. 'local_mps', 'cloud_cuda', 'cloud_cpu_16g', 'cloud_cpu_32g', 'default_cpu_machine')
115
+ """
116
+ # 1. 显式强制 CPU(可通过环境变量 FORCE_CPU=1 启用)
117
+ if os.environ.get("FORCE_CPU") == "1":
118
+ print(f"🔧 强制 CPU 模式")
119
+ return _detect_cpu_variant()
120
+
121
+ # 2. 自动探测 GPU/MPS
122
+ if torch.cuda.is_available():
123
+ platform = "cloud_cuda"
124
+ elif torch.backends.mps.is_available():
125
+ platform = "local_mps"
126
+ else:
127
+ # 3. 细分 CPU 类型
128
+ platform = _detect_cpu_variant()
129
+
130
+ print(f"🔍 自动检测平台配置: {platform}")
131
+ return platform
132
+
133
+
134
+ def _detect_cpu_variant() -> str:
135
+ """
136
+ Detect the specific CPU environment variant (internal helper)
137
+ Classifies the CPU environment by memory size:
138
+ - >= 30GB: cloud_cpu_32g (32G memory environment)
139
+ - >= 15GB: cloud_cpu_16g (16G memory environment)
140
+ - otherwise: default_cpu_machine (default configuration)
141
+
142
+ Container memory limits (cgroup) are checked first; if unavailable, fall back to system memory detection.
143
+ """
144
+ total_memory = 0
145
+
146
+ try:
147
+ # 优先检测容器内存限制(cgroup)
148
+ # cgroup v2: /sys/fs/cgroup/memory.max
149
+ # cgroup v1: /sys/fs/cgroup/memory/memory.limit_in_bytes
150
+ cgroup_paths = [
151
+ "/sys/fs/cgroup/memory.max", # cgroup v2
152
+ "/sys/fs/cgroup/memory/memory.limit_in_bytes", # cgroup v1
153
+ ]
154
+
155
+ for cgroup_path in cgroup_paths:
156
+ try:
157
+ if os.path.exists(cgroup_path):
158
+ with open(cgroup_path, 'r') as f:
159
+ limit_str = f.read().strip()
160
+ # cgroup v2 可能返回 "max" 表示无限制
161
+ if limit_str == "max":
162
+ break
163
+ limit_bytes = int(limit_str)
164
+ if limit_bytes > 0 and limit_bytes < (2 ** 63): # 合理范围
165
+ total_memory = limit_bytes
166
+ print(f"🔍 从 cgroup 检测到容器内存限制: {total_memory / (1024 ** 3):.2f} GB")
167
+ break
168
+ except (ValueError, IOError, OSError):
169
+ continue
170
+
171
+ # 如果 cgroup 检测失败,回退到系统内存检测
172
+ if total_memory == 0 and sys.platform != "win32":
173
+ try:
174
+ page_size = os.sysconf('SC_PAGE_SIZE')
175
+ phys_pages = os.sysconf('SC_PHYS_PAGES')
176
+ total_memory = page_size * phys_pages
177
+ print(f"🔍 从系统配置检测到内存: {total_memory / (1024 ** 3):.2f} GB")
178
+ except (ValueError, AttributeError):
179
+ pass
180
+
181
+ # 转换为 GB
182
+ total_memory_gb = total_memory / (1024 ** 3)
183
+
184
+ # Classification thresholds:
185
+ # - >= 30GB: cloud_cpu_32g (HF Spaces CPU upgrade typically exposes ~30.x GB)
186
+ # - >= 15GB: cloud_cpu_16g (HF Spaces CPU basic typically exposes ~15.x GB)
187
+ if total_memory_gb >= 30.0:
188
+ return "cloud_cpu_32g"
189
+ elif total_memory_gb >= 15.0:
190
+ return "cloud_cpu_16g"
191
+
192
+ except Exception as e:
193
+ print(f"⚠️ CPU 环境检测失败,回退到默认配置: {e}")
194
+
195
+ return "default_cpu_machine"
196
+
197
+
198
+ def merge_runtime_config(model_name: str, platform: str, verbose: bool = True) -> Dict[str, int]:
199
+ """
200
+ Four-layer config merge: supports partial overrides and tracks where each value comes from
201
+
202
+ Priority (high to low):
203
+ 1. (model_name, platform) - model-specific config for that platform
204
+ 2. (model_name, "default_cpu_machine") - model-wide config
205
+ 3. ("default_model", platform) - platform-wide config
206
+ 4. ("default_model", "default_cpu_machine") - global fallback
207
+
208
+ Args:
209
+ model_name: model name (e.g. 'qwen3-1.7b')
210
+ platform: platform ID (e.g. 'local_mps')
211
+ verbose: whether to print where each value came from
212
+
213
+ Returns:
214
+ Merged config dict {"max_token_length": int, "chunk_size": int}
215
+
216
+ Raises:
217
+ ValueError: raised when the config is incomplete
218
+ """
219
+ # 准备四层配置(从低优先级到高优先级)
220
+ layers = [
221
+ {
222
+ "name": "default_model.default_cpu_machine",
223
+ "config": RUNTIME_CONFIGS.get("default_model", {}).get("default_cpu_machine", {})
224
+ },
225
+ {
226
+ "name": f"default_model.{platform}",
227
+ "config": RUNTIME_CONFIGS.get("default_model", {}).get(platform, {})
228
+ },
229
+ {
230
+ "name": f"{model_name}.default_cpu_machine",
231
+ "config": RUNTIME_CONFIGS.get(model_name, {}).get("default_cpu_machine", {})
232
+ },
233
+ {
234
+ "name": f"{model_name}.{platform}",
235
+ "config": RUNTIME_CONFIGS.get(model_name, {}).get(platform, {})
236
+ }
237
+ ]
238
+
239
+ # 追踪每个配置项的来源
240
+ config_sources = {} # {"max_token_length": "层级名称", "chunk_size": "层级名称"}
241
+ merged = {}
242
+
243
+ # 依次合并(后面的覆盖前面的)
244
+ for layer in layers:
245
+ layer_config = layer["config"]
246
+ for key, value in layer_config.items():
247
+ merged[key] = value
248
+ config_sources[key] = layer["name"]
249
+
250
+ # 确保必需字段存在
251
+ if "max_token_length" not in merged or "chunk_size" not in merged:
252
+ raise ValueError(
253
+ f"配置不完整: model={model_name}, platform={platform}, "
254
+ f"merged={merged}. 缺少必需字段!"
255
+ )
256
+
257
+ # 打印当前使用的配置项的配置来源
258
+ for key, source in config_sources.items():
259
+ actual_value = merged[key]
260
+ print(f"\t{key}={actual_value} ( {source})")
261
+
262
+ return merged
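+
+ # Example with the table above (no model-specific entries defined): merge_runtime_config("qwen3-1.7b", "local_mps")
+ # resolves to {"max_token_length": 2000, "chunk_size": 512}, both taken from the default_model.local_mps layer.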
263
+
264
+
265
+ _semantic_max_token_length_cache: Optional[int] = None
266
+
267
+
268
+ def get_semantic_max_token_length(verbose: bool = False) -> int:
269
+ """
270
+ Get max_token_length for semantic analysis (read from SEMANTIC_RUNTIME_CONFIGS by platform)
271
+ The platform detection result is cached so it is not repeated on every analysis.
272
+ """
273
+ global _semantic_max_token_length_cache
274
+ if _semantic_max_token_length_cache is not None:
275
+ return _semantic_max_token_length_cache
276
+ platform = detect_platform(verbose=verbose)
277
+ config = SEMANTIC_RUNTIME_CONFIGS.get(platform, SEMANTIC_RUNTIME_CONFIGS["default_cpu_machine"])
278
+ _semantic_max_token_length_cache = config["max_token_length"]
279
+ return _semantic_max_token_length_cache
280
+
281
+
282
+ def validate_platform_config(platform: str, chunk_size: int, verbose: bool = True) -> None:
283
+ """
284
+ Platform-level safety validation (moved up to the initialization phase)
285
+
286
+ Args:
287
+ platform: platform ID
288
+ chunk_size: configured chunk_size
289
+ verbose: whether to print validation info
290
+
291
+ Raises:
292
+ ValueError: raised when the config violates platform limits
293
+ """
294
+ # Special restriction for the MPS platform
295
+ if "mps" in platform.lower():
296
+ if chunk_size > MPS_TOPK_BUG_THRESHOLD:
297
+ raise ValueError(
298
+ f"❌ MPS 平台配置错误: chunk_size ({chunk_size}) "
299
+ f"超过安全上限 ({MPS_TOPK_BUG_THRESHOLD})\n"
300
+ f" 平台: {platform}\n"
301
+ f" 建议: 调整 RUNTIME_CONFIGS 中 {platform} 的 chunk_size"
302
+ )
303
+ if verbose:
304
+ print(f"✓ MPS 平台安全检查通过: chunk_size={chunk_size} (上限={MPS_TOPK_BUG_THRESHOLD})")
305
+
306
+
307
+ def _get_cpu_info() -> Optional[str]:
308
+ """
309
+ Read the CPU model string (display only)
310
+
311
+ Returns:
312
+ The CPU model name, or None if it cannot be determined (callers fall back to "未知")
313
+ """
314
+ model_name = None
315
+
316
+ try:
317
+ if sys.platform == 'linux':
318
+ with open('/proc/cpuinfo', 'r') as f:
319
+ for line in f:
320
+ # 读取 model name
321
+ if model_name is None and 'model name' in line.lower():
322
+ model_name = line.split(':', 1)[1].strip()
323
+
324
+ # 如果已经读取到所需信息,可以提前退出
325
+ if model_name:
326
+ break
327
+ except Exception:
328
+ pass
329
+
330
+ return model_name
331
+
332
+
333
+ def _print_cpu_info() -> None:
334
+ """
335
+ Print the CPU model (printed on every platform)
336
+ """
337
+ try:
338
+ cpu_model = _get_cpu_info()
339
+ model = cpu_model or "未知"
340
+
341
+ print(f"💻 CPU 型号: {model}")
342
+ except Exception as e:
343
+ print(f"⚠️ CPU 信息获取失败: {e}")
344
+
345
+
346
+ def _print_cpu_thread_info() -> None:
347
+ """打印 CPU 线程配置信息(PyTorch 默认配置)"""
348
+ try:
349
+ intra_threads = torch.get_num_threads()
350
+ inter_threads = torch.get_num_interop_threads()
351
+ print(f"🧵 PyTorch 线程配置: intra-op={intra_threads}, inter-op={inter_threads}")
352
+ except Exception as e:
353
+ print(f"⚠️ CPU 线程信息获取失败: {e}")
354
+
355
+
356
+ def load_runtime_config(model_name: str, verbose: bool = False) -> tuple[str, int, int]:
357
+ """
358
+ Full runtime-config loading flow: detect platform -> merge config -> validate -> CPU debug info
359
+
360
+ This is the main entry point for config loading and wraps the complete loading logic.
361
+
362
+ Args:
363
+ model_name: model identifier (e.g. 'qwen3-1.7b')
364
+ verbose: whether to print detailed config info
365
+
366
+ Returns:
367
+ tuple[platform, max_token_length, chunk_size]
368
+
369
+ Raises:
370
+ ValueError: raised when the config is incomplete or violates platform limits
371
+ """
372
+ # 1. 检测平台
373
+ platform = detect_platform(verbose=verbose)
374
+
375
+ # 2. 四层配置合并(支持部分覆盖,并追踪配置来源)
376
+ config = merge_runtime_config(
377
+ model_name=model_name or "default_model",
378
+ platform=platform,
379
+ verbose=verbose
380
+ )
381
+
382
+ # 3. 提取配置
383
+ max_token_length = config["max_token_length"]
384
+ chunk_size = config["chunk_size"]
385
+
386
+ # 4. 平台级安全校验(MPS 限制等)
387
+ validate_platform_config(platform, chunk_size, verbose=verbose)
388
+
389
+ # 5. 打印 CPU 信息(所有平台都打印)
390
+ _print_cpu_info()
391
+
392
+ # 6. CPU 线程配置信息打印(仅针对 CPU 平台)
393
+ if "cpu" in platform.lower():
394
+ _print_cpu_thread_info() # 打印调试信息
395
+
396
+ # 7. 打印配置摘要
397
+ print(
398
+ f"⚙️ 运行时配置已加载 [model={model_name}, platform={platform}]: "
399
+ f"max_token_length={max_token_length}, chunk_size={chunk_size}"
400
+ )
401
+
402
+ return platform, max_token_length, chunk_size
backend/schemas.py ADDED
@@ -0,0 +1,43 @@
1
+ from dataclasses import asdict, dataclass, field
2
+ from typing import Dict, List, Optional, Tuple
3
+
4
+
5
+ @dataclass
6
+ class TokenWithOffset:
7
+ offset: Tuple[int, int]
8
+ raw: str
9
+ real_topk: Optional[Tuple[int, float]] = None
10
+ pred_topk: List[Tuple[str, float]] = field(default_factory=list)
11
+
12
+
13
+ @dataclass
14
+ class AnalyzeResult:
15
+ model: Optional[str] = None
16
+ bpe_strings: List[TokenWithOffset] = field(default_factory=list)
17
+ error: Optional[str] = None
18
+
19
+
20
+ @dataclass
21
+ class AnalyzeRequest:
22
+ model: str
23
+ text: str
24
+
25
+
26
+ @dataclass
27
+ class AnalyzeResponse:
28
+ request: AnalyzeRequest
29
+ result: AnalyzeResult
30
+
31
+
32
+ def serialize_analyze_result(result: AnalyzeResult) -> Dict:
33
+ return asdict(result)
34
+
35
+
36
+ def create_empty_analysis_result(error: Optional[str] = None, model: Optional[str] = None) -> Dict:
37
+ result = AnalyzeResult()
38
+ if error:
39
+ result.error = error
40
+ if model:
41
+ result.model = model
42
+ return serialize_analyze_result(result)
43
+
backend/semantic_analyzer.py ADDED
@@ -0,0 +1,280 @@
1
+ """
2
+ Semantic analysis: use the instruct model to extract how relevant each source-text token is to the query
3
+
4
+ Uses the logits_gradient attribution strategy (more consistent with prediction); the sub-strategy is chosen by --logits_gradient_submode:
5
+ - count: gradient of the top-10 logits (excluding "0"), prompt asks for a "count". With 0.6b only suitable for judging whether the whole article is related; with 1.7b it handles everything
6
+ - match_score: gradient of the target token's logit, prompt asks for a "relevance score". Not competitive with either 0.6b or 1.7b. [Deprecated]
7
+ - fill_blank: fill-in-the-blank, gradient of the top-10 logits (excluding "无"), prompt asks for "the most relevant single word". With 0.6b only suitable for scoring tokens; with 1.7b it handles everything
8
+
9
+ count/fill_blank are probability-weighted (Σ pᵢ·zᵢ).
10
+
11
+ The model is chosen by the --semantic_model argument, default qwen3-0.6b-instruct
12
+ """
13
+
14
+ import gc
15
+ import math
16
+ from typing import Callable, Dict, List, Optional
17
+
18
+ import torch
19
+
20
+ from .api.utils import round_to_sig_figs
21
+ from .device import DeviceManager
22
+ from .model_manager import ensure_semantic_slot_ready, get_semantic_model_display_name
23
+ from .next_token_topk import decode_topk_ids_to_strings_and_rounded_probs, DEFAULT_NEXT_TOKEN_TOPK
24
+ from .runtime_config import get_semantic_max_token_length
25
+
26
+
27
+
28
+ def _get_logits_gradient_submode() -> str:
29
+ """logits_gradient 子策略:count / match_score(已废弃) / fill_blank"""
30
+ try:
31
+ from backend.app_context import get_args
32
+ return getattr(get_args(), "logits_gradient_submode", "fill_blank")
33
+ except RuntimeError:
34
+ return "fill_blank"
35
+
36
+
37
+ def _truncate_text_by_tokens(tokenizer, text: str, max_tokens: int) -> str:
38
+ """将 text 截断至最多 max_tokens 个 token;超长时打印提示。"""
39
+ text_ids = tokenizer.encode(text, add_special_tokens=False)
40
+ if len(text_ids) > max_tokens:
41
+ print(f"⚠️ 原文过长,已截断至前 {max_tokens} token")
42
+ return tokenizer.decode(text_ids[:max_tokens])
43
+ return text
44
+
45
+
46
+ def _get_gradient_checkpointing() -> bool:
47
+ """默认 True(run.py);``--no-gradient-checkpointing`` 关闭。"""
48
+ try:
49
+ from backend.app_context import get_args
50
+ return getattr(get_args(), "gradient_checkpointing", True)
51
+ except RuntimeError:
52
+ return True
53
+
54
+
55
+ def _get_verbose() -> bool:
56
+ """是否输出详细调试信息(由 --verbose 控制)"""
57
+ from backend.app_context import get_verbose
58
+ return get_verbose()
59
+
60
+
61
+ def _analyze_logits_gradient(
62
+ query: str,
63
+ text: str,
64
+ tokenizer,
65
+ model,
66
+ device,
67
+ submode_override: Optional[str] = None,
68
+ progress_callback: Optional[Callable[[int, int, str, Optional[int]], None]] = None,
69
+ debug_info: bool = False,
70
+ full_match_degree_only: bool = False,
71
+ ) -> Dict:
72
+ """
73
+ Gradient attribution: gradient of the logits with respect to the input embeddings.
74
+ Sub-strategy: count / match_score (deprecated) / fill_blank, chosen by --logits_gradient_submode.
75
+ submode_override: optional override for evaluation, to test different submodes within the same process.
76
+ """
77
+ TOTAL_STEPS = 4
78
+
79
+ submode = submode_override if submode_override is not None else _get_logits_gradient_submode()
80
+ max_length = get_semantic_max_token_length()
81
+
82
+ if progress_callback:
83
+ progress_callback(1, TOTAL_STEPS, "encoding", None)
84
+ # Choose the instruction according to the submode
85
+ # Put \n\n before the document so the tokenizer does not merge its first character with a space, which would corrupt the offset_mapping
86
+ if submode == "count":
87
+ instruction = f"请问下面文字中有多少个词与查询主题({query})相关?文字内容:\n\n"
88
+ elif submode == "match_score": # 已废弃
89
+ instruction = f"请问下面文字与查询主题({query})的相关程度是多少?请回答0/1/2(2为最高相关)。文字内容:\n\n"
90
+ elif submode == "fill_blank":
91
+ instruction = f"请问下面文字中哪个词与查询主题({query})最相关?如无相关词则回答“无”。文字内容:\n\n"
92
+ else:
93
+ raise ValueError(f"未知子模式: {submode}")
94
+
95
+ # Truncate text to max_length tokens, then assemble the prompt
96
+ truncated_text = _truncate_text_by_tokens(tokenizer, text, max_length)
97
+
98
+ messages = [{"role": "user", "content": instruction + truncated_text}]
99
+ formatted = tokenizer.apply_chat_template(
100
+ messages, tokenize=False, add_generation_prompt=True,
101
+ enable_thinking=False
102
+ )
103
+ # Generation guide: the chat template only supports complete messages, so the guide text is appended to formatted
104
+ if submode == "count":
105
+ generation_guide = f"原文中与查询主题({query})相关的词的数量 = **"
106
+ elif submode == "match_score": # 已废弃
107
+ generation_guide = f"文章和查询主题({query})的相关程度(0-2)打分为:**"
108
+ elif submode == "fill_blank":
109
+ # The opening “ is intentional, to discourage the model from emitting a quotation mark itself
110
+ generation_guide = f"原文中与查询主题({query})最相关的一个词是:**“"
111
+ else:
112
+ raise ValueError(f"未知子模式: {submode}")
113
+ formatted += generation_guide
114
+
115
+ # top-k for logits_gradient count/fill_blank; controls how many candidate tokens the gradient target covers
116
+ LOGITS_GRADIENT_TOPK = DEFAULT_NEXT_TOKEN_TOPK
117
+
118
+ idx = formatted.find(instruction)
119
+ instruction_start_char = idx if idx >= 0 else 0
120
+ text_start_char = instruction_start_char + len(instruction)
121
+ text_end_char = text_start_char + len(truncated_text)
122
+ lines = truncated_text.splitlines()
123
+ abbrev_text = truncated_text if len(lines) <= 2 else f"{lines[0]}\n...\n{lines[-1]}"
124
+ abbrev = formatted[:text_start_char] + abbrev_text + formatted[text_end_char:]
125
+
126
+ enc = tokenizer(
127
+ formatted,
128
+ return_tensors="pt",
129
+ return_offsets_mapping=True,
130
+ )
131
+
132
+ input_ids = enc["input_ids"].to(device)
133
+ offset_mapping = enc["offset_mapping"][0].tolist()
134
+
135
+ prompt_end = len(offset_mapping)
136
+ for i, (s, _) in enumerate(offset_mapping):
137
+ if s >= text_start_char:
138
+ prompt_end = i
139
+ break
140
+
141
+ embed_layer = model.get_input_embeddings()
142
+ embeds = embed_layer(input_ids).detach().clone().requires_grad_(True)
143
+
144
+ use_gc = _get_gradient_checkpointing()
145
+ if _get_verbose():
146
+ print(f"📌 logits_gradient: 推理原文 (tokens={len(offset_mapping)}):\n{abbrev}")
147
+ if progress_callback:
148
+ progress_callback(2, TOTAL_STEPS, "inference", None)
149
+ model.eval()
150
+ if use_gc:
151
+ model.gradient_checkpointing_enable()
152
+ try:
153
+ with torch.set_grad_enabled(not full_match_degree_only):
154
+ outputs = model(
155
+ inputs_embeds=embeds,
156
+ output_attentions=False,
157
+ )
158
+ # Explicit synchronization to make sure the step has finished, so the progress_callback timing is accurate
159
+ if device.type == "cuda":
160
+ torch.cuda.synchronize(device)
161
+ elif device.type == "mps":
162
+ torch.mps.synchronize()
163
+
164
+ logits = outputs.logits[:, -1, :]
165
+ topk_vals, topk_ids = torch.topk(logits, LOGITS_GRADIENT_TOPK, dim=-1)
166
+ probs = torch.softmax(logits, dim=-1)
167
+ topk_tokens, topk_probs = decode_topk_ids_to_strings_and_rounded_probs(
168
+ probs[0], tokenizer, topk_ids[0]
169
+ )
170
+ if _get_verbose():
171
+ print(f"top{LOGITS_GRADIENT_TOPK}: {[f'{t}({p*100:.1f}%)' for t, p in zip(topk_tokens, topk_probs)]}")
172
+
173
+ neg_token = "无" if submode == "fill_blank" else "0"
174
+ neg_id = tokenizer.encode(neg_token, add_special_tokens=False)[0]
175
+ # Whole-text match degree: count/match_score (deprecated) use 1-P("0"), fill_blank uses 1-P("无")
176
+ p_neg = probs[0, neg_id].item()
177
+ full_match_degree = round(1.0 - p_neg, 4)
178
+
179
+ if full_match_degree_only:
180
+ return {
181
+ "model": get_semantic_model_display_name(),
182
+ "token_attention": [],
183
+ "full_match_degree": full_match_degree,
184
+ }
185
+
186
+ if progress_callback:
187
+ progress_callback(3, TOTAL_STEPS, "backward", None)
188
+ # Attribution target: raw logits (no softmax in the backward pass), avoiding saturation and competition effects.
189
+ if submode == "count" or submode == "fill_blank":
190
+ # count/fill_blank both use the top-10, probability-weighted Σ pᵢ·zᵢ, and exclude neg_token (0/无) so the gradient direction stays aligned with "relevant".
191
+ vals = topk_vals[0]
192
+ w = probs[0, topk_ids[0]].detach().clone()
193
+ # Exclude neg_token
194
+ w[topk_ids[0] == neg_id] = 0
195
+
196
+ target_logit = (w * vals).sum()
197
+ elif submode == "match_score": # 已废弃
198
+ target_ids = tokenizer.encode("2", add_special_tokens=False)
199
+ if not target_ids:
200
+ raise ValueError("tokenizer 无法编码 '2'")
201
+ target_logit = logits[0, target_ids[0]]
202
+ else:
203
+ raise ValueError(f"未知 submode: {submode}")
204
+ target_logit.backward()
205
+ grad = embeds.grad
206
+ if grad is None:
207
+ raise RuntimeError("logits_gradient: 梯度未回传,可能模型不支持(如 int8 量化)")
208
+
209
+ # Explicit synchronization to make sure the backward pass has finished, so the progress_callback timing is accurate
210
+ if device.type == "cuda":
211
+ torch.cuda.synchronize(device)
212
+ elif device.type == "mps":
213
+ torch.mps.synchronize()
214
+ if progress_callback:
215
+ progress_callback(4, TOTAL_STEPS, "processing", None)
216
+
217
+ text_token_end = len(offset_mapping)
218
+ # Compute ‖∇f‖ for all tokens on the GPU in one go, avoiding ~500 GPU→CPU syncs from calling .item() in a loop
219
+ grad_slice = grad[0, prompt_end:text_token_end].float()
220
+ norms = grad_slice.norm(dim=-1).cpu().tolist()
221
+ token_attention: List[Dict] = []
222
+ nan_count = 0
223
+ for i in range(prompt_end, text_token_end):
224
+ s, e = offset_mapping[i]
225
+ if s >= text_start_char and e <= text_end_char:
226
+ s_rel, e_rel = s - text_start_char, e - text_start_char
227
+ score = norms[i - prompt_end]
228
+ if not math.isfinite(score):
229
+ score = 0.0
230
+ nan_count += 1
231
+ else:
232
+ score = round_to_sig_figs(score)
233
+ token_attention.append({"offset": [s_rel, e_rel], "raw": truncated_text[s_rel:e_rel], "score": score})
234
+ if nan_count > 0:
235
+ print(f"⚠️ token_attention 中有 {nan_count} 个 score 为 NaN/Inf,已替换为 0。")
236
+
237
+ out = {
238
+ "model": get_semantic_model_display_name(),
239
+ "token_attention": token_attention,
240
+ "full_match_degree": full_match_degree,
241
+ }
242
+ if debug_info:
243
+ out["debug_info"] = {"abbrev": abbrev, "topk_tokens": topk_tokens, "topk_probs": topk_probs}
244
+ return out
245
+ finally:
246
+ if use_gc:
247
+ model.gradient_checkpointing_disable()
248
+ # Clean up after every inference: avoids MPS/CUDA memory build-up (and hangs) across repeated calls
249
+ DeviceManager.clear_cache(device)
250
+
251
+
252
+ def analyze_semantic(
253
+ query: str,
254
+ text: str,
255
+ submode_override: Optional[str] = None,
256
+ progress_callback: Optional[Callable[[int, int, str, Optional[int]], None]] = None,
257
+ debug_info: bool = False,
258
+ full_match_degree_only: bool = False,
259
+ ) -> Dict:
260
+ """
261
+ Analyze how relevant each source-text token is to the query (using logits_gradient attribution).
262
+
263
+ Args:
264
+ query: the query topic
265
+ text: the source text
266
+ submode_override: optional submode override for evaluation (count / match_score deprecated / fill_blank)
267
+ progress_callback: optional progress callback (step, total_steps, stage, percentage)
268
+ debug_info: when True, returns debug_abbrev (abbreviated inference text); topk_tokens and topk_probs are always in the result
269
+
270
+ Returns:
271
+ {"model", "token_attention", "full_match_degree"}; includes a debug_info object when debug_info=True
272
+ """
273
+ tokenizer, model, device = ensure_semantic_slot_ready()
274
+ return _analyze_logits_gradient(
275
+ query, text, tokenizer, model, device,
276
+ submode_override=submode_override,
277
+ progress_callback=progress_callback,
278
+ debug_info=debug_info,
279
+ full_match_degree_only=full_match_degree_only,
280
+ )
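+
+ # Usage sketch (illustrative values only): analyze_semantic("climate change", article_text) returns a dict like
+ # {"model": "<semantic model path>", "token_attention": [{"offset": [0, 2], "raw": "气候", "score": 0.12}, ...],
+ #  "full_match_degree": 0.87}.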
client/src/analysis.html ADDED
@@ -0,0 +1,188 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+
4
+ <head>
5
+ <meta charset="UTF-8">
6
+ <title></title>
7
+ <meta name="description"
8
+ content="Info Highlight visualizes token-level information density in text using LLMs, helping you quickly find key content and skip redundancy.">
9
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
10
+ <link rel="stylesheet" type="text/css" href="start.css">
11
+ </head>
12
+
13
+ <body>
14
+
15
+ <main class="main_frame">
16
+ <section class="left_panel">
17
+ <div class="floating_content">
18
+ <header class="app-page-toolbar app-page-toolbar--bleed">
19
+ <h1 class="page-toolbar-title"><span class="title-main-line"><span data-page-title data-i18n></span><span class="title-tagline" data-page-subtitle data-i18n></span></span></h1>
20
+ <div class="app-page-toolbar-actions">
21
+ <a href="index.html" class="home-link" title="InfoLens Home" data-i18n="text,title">InfoLens Home</a>
22
+ <a href="compare.html?showTextRender=1&demos=/quick-start-1.json,/quick-start-2.json" target="_blank" class="compare-link" style="display: none;" title="Compare analysis results" data-i18n="text,title">Compare results</a>
23
+ <div class="settings-menu-wrapper">
24
+ <button id="settings_btn" class="settings-btn" title="Settings" data-i18n="title">
25
+ <span class="settings-icon">⚙️</span>
26
+ </button>
27
+ <div id="settings_menu" class="settings-menu" style="display: none;">
28
+ <!-- INCLUDE partials/settings-menu-analysis.html -->
29
+ <!-- INCLUDE partials/settings-menu-common-mid.html -->
30
+ <!-- INCLUDE partials/settings-menu-trailing-admin.html -->
31
+ </div>
32
+ </div>
33
+ </div>
34
+ </header>
35
+
36
+ <!-- 首页介绍内容容器(由 JS 动态加载) -->
37
+ <section id="home-intro-content" class="intro-section">
38
+ <!-- Content will be loaded dynamically -->
39
+ </section>
40
+
41
+ <section class="demo-section">
42
+ <div class="demo-header">
43
+ <span id="demo_header_text" data-i18n>Quick start - select a demo:</span>
44
+ <button id="refresh_demo_btn" class="refresh-btn" title="Refresh demo list" data-i18n="title">↻</button>
45
+ <div class="file-input-wrapper">
46
+ <button id="open_local_demo_btn" class="file-input-button" type="button" title="Open demo file from local" data-i18n="text,title">Select local</button>
47
+ <span id="open_local_demo_filename" class="file-input-filename" data-i18n>No file
48
+ selected</span>
49
+ <input type="file" id="open_local_demo_input" style="display: none;"
50
+ accept=".json,application/json">
51
+ </div>
52
+ <span id="demos_loading" class="demos-loading" data-i18n>Refreshing...</span>
53
+ </div>
54
+ <div class="demos"></div>
55
+ </section>
56
+
57
+ <section class="input-section">
58
+ <div class="input-header">
59
+ <span id="input_header_text"><span class="demo" data-i18n>or enter text:</span></span>
60
+ <div class="text-action-buttons-top">
61
+ <div class="textarea-counter" id="text_count_display">
62
+ <span id="text_count_value">0</span> <span id="char_unit" data-i18n>chars</span>
63
+ </div>
64
+ <button id="clear_text_btn" class="text-action-btn" data-i18n>Clear</button>
65
+ <button id="paste_text_btn" class="text-action-btn" data-i18n>Paste</button>
66
+ <button id="load_url_btn" class="text-action-btn" title="Load text from URL and analyze"
67
+ data-i18n="text,title">Analyze URL</button>
68
+ <button id="analyze_save_btn" class="text-action-btn" data-i18n>Analyze&Upload</button>
69
+ </div>
70
+ </div>
71
+ <div class="textarea-wrapper">
72
+ <textarea id="test_text"></textarea>
73
+ <div class="button-group">
74
+ <div class="button-left">
75
+ <button id="submit_text_btn" class="primary-btn" data-i18n>Analyze</button>
76
+ <div class="loadersmall loader-small-container"></div>
77
+ <span id="analyze_progress" class="analyze-progress"></span>
78
+ </div>
79
+ <div id="text_metrics" class="text-metrics">
80
+ <div class="text-metrics-primary">
81
+ <span id="metric_bytes">0 B</span>
82
+ <span class="text-metrics-divider">|</span>
83
+ <span id="metric_chars">0 chars</span>
84
+ <span class="text-metrics-divider">|</span>
85
+ <span id="metric_tokens">0 tokens</span>
86
+ </div>
87
+ <div id="metric_total_surprisal" class="text-metrics-secondary">total information = 0
88
+ bits</div>
89
+ <div id="metric_model" class="text-metrics-secondary">model: </div>
90
+ </div>
91
+ <div class="button-right">
92
+ <button id="save_demo_btn" class="primary-btn inactive" data-i18n>Upload</button>
93
+ <button id="save_local_demo_btn" class="primary-btn inactive" title="Save to local file"
94
+ data-i18n="text,title">Save</button>
95
+ </div>
96
+ </div>
97
+ </div>
98
+ </section>
99
+
100
+ <section id="semantic_analysis_section" class="semantic-analysis-section" style="display: none;">
101
+ <div class="semantic-analysis-controls">
102
+ <div class="semantic-search-row">
103
+ <div class="semantic-search-input-wrapper">
104
+ <input type="text" id="semantic_search_input" class="semantic-search-input" placeholder="Enter query for semantic analysis">
105
+ <button type="button" id="semantic_search_clear" class="semantic-search-clear demo-delete-btn" title="Clear" aria-label="Clear" data-i18n="title,aria-label">×</button>
106
+ <ul id="semantic_search_history_dropdown" class="semantic-search-history-dropdown"></ul>
107
+ </div>
108
+ <div class="semantic-search-actions">
109
+ <button id="semantic_search_btn" class="primary-btn" data-i18n>Search</button>
110
+ <span id="semantic_match_degree" class="semantic-match-degree" style="display: none;"></span>
111
+ <div id="semantic_search_loader" class="semantic-search-loader" style="visibility: hidden;"></div>
112
+ <span id="semantic_progress" class="semantic-progress"></span>
113
+ </div>
114
+ </div>
115
+ <div id="semantic_submode_row" class="semantic-submode-row" data-admin-only style="display: none;">
116
+ <span class="semantic-submode-group">
117
+ <label><input type="checkbox" id="semantic_chunked_mode" title="analyse in chunks" checked> chunked</label>
118
+ </span>
119
+ <span class="semantic-submode-group">
120
+ <label class="semantic-submode-label" for="semantic_submode_select">submode: </label>
121
+ <select id="semantic_submode_select" class="semantic-submode-select">
122
+ <option value="count">count</option>
123
+ <option value="fill_blank">fill_blank</option>
124
+ <option value="hybrid" selected>hybrid</option>
125
+ </select>
126
+ </span>
127
+ <span id="semantic_threshold_item" class="semantic-submode-group" data-admin-only style="display: none;">
128
+ <label class="semantic-submode-label" for="semantic_threshold_input">Match threshold:</label>
129
+ <input type="number" id="semantic_threshold_input" class="semantic-threshold-input" min="0" max="1">
130
+ </span>
131
+ <span class="semantic-submode-group semantic-submode-group-right">
132
+ <label class="semantic-submode-label" for="semantic_color_source_select">color source: </label>
133
+ <select id="semantic_color_source_select" class="semantic-submode-select">
134
+ <option value="raw_score_normed">raw score normed</option>
135
+ <option value="signal_probability">signal probability</option>
136
+ <option value="pw_score" selected>pw score</option>
137
+ </select>
138
+ </span>
139
+ </div>
140
+ </div>
141
+ </section>
142
+ </div>
143
+
144
+ <section id="all_result" class="results-section">
145
+ <div id="stats" class="stats-container">
146
+ <div id="match_score_progress_item" class="histogram-item" style="display: none;">
147
+ <div id="match_score_progress_title"></div>
148
+ <svg id="stats_match_score_progress"></svg>
149
+ </div>
150
+ <div id="raw_score_normed_histogram_item" class="histogram-item" style="display: none;">
151
+ <div id="raw_score_normed_histogram_title"></div>
152
+ <svg id="stats_raw_score_normed"></svg>
153
+ </div>
154
+ <div id="token_histogram_item" class="histogram-item" style="display: none;">
155
+ <div id="token_histogram_title"></div>
156
+ <svg id="stats_frac"></svg>
157
+ </div>
158
+ <div id="surprisal_progress_item" class="histogram-item" style="display: none;">
159
+ <div id="surprisal_progress_title"></div>
160
+ <svg id="stats_surprisal_progress"></svg>
161
+ </div>
162
+ </div>
163
+ </section>
164
+ </section>
165
+
166
+ <div class="resizer" id="resizer"></div>
167
+
168
+ <section class="right_panel">
169
+ <div id="results">
170
+ <div id="major_tooltip" class="tooltip">
171
+ <div class="currentToken"></div>
172
+ <div class="myDetail"></div>
173
+ <br />
174
+ <div class="predictions predictions-table"></div>
175
+ </div>
176
+ </div>
177
+ </section>
178
+ </main>
179
+
180
+ <div id="toast" class="toast"></div>
181
+
182
+ <!-- INCLUDE partials/attribution-sidebar.html -->
183
+
184
+ <script src="vendor.js"></script>
185
+ <script src="start.js"></script>
186
+ </body>
187
+
188
+ </html>
client/src/attribution.html ADDED
@@ -0,0 +1,166 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+
4
+ <head>
5
+ <meta charset="UTF-8">
6
+ <title></title>
7
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
8
+ <link rel="stylesheet" type="text/css" href="attribution.css">
9
+ </head>
10
+
11
+ <body>
12
+
13
+ <main class="main_frame">
14
+ <section class="left_panel">
15
+ <div class="floating_content">
16
+ <header class="app-page-toolbar app-page-toolbar--bleed">
17
+ <h1 class="page-toolbar-title"><span class="title-main-line"><span data-page-title data-i18n></span><span class="title-tagline" data-page-subtitle data-i18n></span></span></h1>
18
+ <div class="app-page-toolbar-actions">
19
+ <a href="index.html" class="home-link" title="InfoLens Home" data-i18n="text,title">InfoLens Home</a>
20
+ <div class="settings-menu-wrapper">
21
+ <button id="settings_btn" class="settings-btn" title="Settings" data-i18n="title">
22
+ <span class="settings-icon">⚙️</span>
23
+ </button>
24
+ <div id="settings_menu" class="settings-menu" style="display: none;">
25
+ <!-- INCLUDE partials/settings-menu-common-mid.html -->
26
+ <!-- INCLUDE partials/settings-menu-trailing-admin.html -->
27
+ </div>
28
+ </div>
29
+ </div>
30
+ </header>
31
+
32
+ <div class="chat-cached-history-bar">
33
+ <div class="semantic-search-input-wrapper chat-prompt-history-wrapper">
34
+ <button type="button" id="attribution_cached_history_btn" class="text-action-btn" data-i18n>Cached history</button>
35
+ <ul id="attribution_cached_history_dropdown" class="semantic-search-history-dropdown"></ul>
36
+ </div>
37
+ </div>
38
+
39
+ <section class="input-section">
40
+ <div class="chat-prompt-panel">
41
+ <div class="input-header">
42
+ <span data-i18n>Context</span>
43
+ <div class="text-action-buttons-top">
44
+ <div class="textarea-counter" id="context_count_display">
45
+ <span id="context_count_value">0</span> <span data-i18n>chars</span>
46
+ </div>
47
+ <button type="button" id="clear_context_btn" class="text-action-btn" data-i18n>Clear</button>
48
+ <button type="button" id="paste_context_btn" class="text-action-btn" data-i18n>Paste</button>
49
+ <button type="button" id="context_history_btn" class="text-action-btn" data-i18n>History</button>
50
+ </div>
51
+ </div>
52
+ <div class="textarea-wrapper chat-prompt-textarea-block">
53
+ <div class="semantic-search-input-wrapper chat-prompt-history-wrapper">
54
+ <textarea id="context_text"></textarea>
55
+ <ul id="context_history_dropdown" class="semantic-search-history-dropdown"></ul>
56
+ </div>
57
+ </div>
58
+ </div>
59
+
60
+ <div class="chat-prompt-panel attribution-target-panel">
61
+ <div class="input-header">
62
+ <span data-i18n>Target prediction</span>
63
+ <div class="text-action-buttons-top">
64
+ <div class="textarea-counter" id="target_count_display">
65
+ <span id="target_count_value">0</span> <span data-i18n>chars</span>
66
+ </div>
67
+ <button type="button" id="clear_target_btn" class="text-action-btn" data-i18n>Clear</button>
68
+ <button type="button" id="paste_target_btn" class="text-action-btn" data-i18n>Paste</button>
69
+ <button type="button" id="target_history_btn" class="text-action-btn" data-i18n>History</button>
70
+ </div>
71
+ </div>
72
+ <div class="textarea-wrapper chat-prompt-textarea-block">
73
+ <div class="semantic-search-input-wrapper chat-prompt-history-wrapper">
74
+ <textarea id="target_text"></textarea>
75
+ <ul id="target_history_dropdown" class="semantic-search-history-dropdown"></ul>
76
+ </div>
77
+ </div>
78
+ </div>
79
+
80
+ <div class="textarea-wrapper chat-prompt-actions-row">
81
+ <div class="semantic-submode-row chat-completion-options-row attribution-model-variant-row">
82
+ <span class="semantic-submode-group">
83
+ <label class="semantic-submode-label" for="attribution_model_variant" data-i18n>Model</label>
84
+ <select id="attribution_model_variant" class="semantic-submode-select" aria-label="Attribution model slot" data-i18n="aria-label">
85
+ <option value="base">base</option>
86
+ <option value="instruct">instruct</option>
87
+ </select>
88
+ </span>
89
+ </div>
90
+ <div class="button-group">
91
+ <div class="button-left">
92
+ <button type="button" id="analyze_btn" class="primary-btn inactive" disabled data-i18n>Analyze attribution</button>
93
+ <div class="loadersmall loader-small-container"></div>
94
+ </div>
95
+ <div id="attribution_result_info" class="text-metrics is-hidden"></div>
96
+ <div class="button-right">
97
+ <button type="button" id="force_retry_btn" class="primary-btn inactive" disabled title="Fetch again without using cached result" data-i18n="text,title">Force retry</button>
98
+ </div>
99
+ </div>
100
+ </div>
101
+
102
+ <div class="semantic-submode-row attribution-max-score-row">
103
+ <span class="semantic-submode-group">
104
+ <label class="attribution-use-mapping-label">
105
+ <input type="checkbox" id="attribution_use_mapping">
106
+ <span></span>
107
+ </label>
108
+ </span>
109
+ <span class="semantic-submode-group attribution-max-score-slider-group">
110
+ <label class="semantic-submode-label" for="attribution_max_score_range" data-i18n>Max score</label>
111
+ <input type="range" id="attribution_max_score_range" class="attribution-max-score-range"
112
+ min="0.01" max="1" step="0.01" value="1"
113
+ title="For threshold x∈(0,1]: map normalized scores in [0,x] linearly to display intensities [0,1]; scores above x saturate at maximum intensity. At x=1, equivalent to disabling mapping."
114
+ data-i18n="title"
115
+ disabled>
116
+ <span id="attribution_max_score_value" class="attribution-max-score-value" aria-live="polite">1.00</span>
117
+ </span>
118
+ </div>
119
+
120
+ <div class="attribution-exclude-prompt-patterns-row">
121
+ <div class="semantic-submode-row attribution-exclude-prompt-patterns-header">
122
+ <span class="semantic-submode-group">
123
+ <label class="attribution-use-mapping-label"
124
+ title="When enabled, each line below is a regex with the global flag, matched only within the context field below. If a token offset lies fully inside a match, its score is treated as 0."
125
+ data-i18n="title">
126
+ <input type="checkbox" id="attribution_exclude_prompt_patterns_enable" checked>
127
+ <span></span>
128
+ </label>
129
+ </span>
130
+ <span class="semantic-submode-group">
131
+ <label class="semantic-submode-label" for="attribution_exclude_prompt_patterns" data-i18n>Exclude prompt patterns</label>
132
+ </span>
133
+ </div>
134
+ <textarea id="attribution_exclude_prompt_patterns" class="attribution-exclude-prompt-patterns-input" rows="2"
135
+ placeholder="One regex per line (context only)"
136
+ spellcheck="false"
137
+ autocomplete="off"
138
+ title="One regex per line (global flag), matched only within the context text; if a token offset lies fully inside a match, its score is treated as 0."
139
+ data-i18n="placeholder,title"></textarea>
140
+ </div>
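The two tooltips above define how exclusion works: each non-empty line of the textarea is a regular expression matched against the context text only, and any token whose character span lies entirely inside a match has its attribution score forced to 0. A minimal Python sketch of that rule follows; the function name and the (start, end) offset representation are assumptions for illustration, not the project's actual implementation.

    import re

    def zero_excluded_token_scores(context, token_offsets, scores, patterns):
        """Zero the score of every token fully contained in an excluded span.

        token_offsets holds (start, end) character offsets of each context
        token; patterns is one regex per textarea line, applied to the
        context text only (finditer stands in for a JS global-flag match).
        """
        spans = [(m.start(), m.end())
                 for p in patterns
                 for m in re.finditer(p, context)]
        return [0.0 if any(s <= start and end <= e for s, e in spans) else score
                for (start, end), score in zip(token_offsets, scores)]

A token that only partially overlaps a matched span keeps its score, which matches the "lies fully inside a match" wording of the tooltip.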
141
+ </section>
142
+ </div>
143
+ </section>
144
+
145
+ <div class="resizer" id="resizer"></div>
146
+
147
+ <section class="right_panel">
148
+ <div id="results" class="attribution-inspector-surface">
149
+ <div id="major_tooltip" class="tooltip">
150
+ <div class="currentToken"></div>
151
+ <div class="myDetail"></div>
152
+ <br />
153
+ <div class="predictions predictions-table"></div>
154
+ </div>
155
+ </div>
156
+ </section>
157
+ </main>
158
+
159
+ <div id="toast" class="toast"></div>
160
+
161
+ <script src="vendor.js"></script>
162
+ <script src="attribution.js"></script>
163
+
164
+ </body>
165
+
166
+ </html>
client/src/chat.html ADDED
@@ -0,0 +1,171 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+
4
+ <head>
5
+ <meta charset="UTF-8">
6
+ <title></title>
7
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
8
+ <link rel="stylesheet" type="text/css" href="chat.css">
9
+ </head>
10
+
11
+ <body>
12
+
13
+ <main class="main_frame">
14
+ <section class="left_panel">
15
+ <div class="floating_content">
16
+ <header class="app-page-toolbar app-page-toolbar--bleed">
17
+ <h1 class="page-toolbar-title"><span class="title-main-line"><span data-page-title data-i18n></span><span class="title-tagline" data-page-subtitle data-i18n></span></span></h1>
18
+ <div class="app-page-toolbar-actions">
19
+ <a href="index.html" class="home-link" title="InfoLens Home" data-i18n="text,title">InfoLens Home</a>
20
+ <div class="settings-menu-wrapper">
21
+ <button id="settings_btn" class="settings-btn" title="Settings" data-i18n="title">
22
+ <span class="settings-icon">⚙️</span>
23
+ </button>
24
+ <div id="settings_menu" class="settings-menu" style="display: none;">
25
+ <!-- INCLUDE partials/settings-menu-common-mid.html -->
26
+ <!-- INCLUDE partials/settings-menu-trailing-admin.html -->
27
+ </div>
28
+ </div>
29
+ </div>
30
+ </header>
31
+
32
+ <div class="chat-cached-history-bar">
33
+ <div class="semantic-search-input-wrapper chat-prompt-history-wrapper">
34
+ <button type="button" id="chat_cached_history_btn" class="text-action-btn" data-i18n>Cached history</button>
35
+ <ul id="chat_cached_history_dropdown" class="semantic-search-history-dropdown"></ul>
36
+ </div>
37
+ </div>
38
+
39
+ <section class="input-section">
40
+ <div class="semantic-submode-row chat-raw-prompt-mode-row">
41
+ <span class="semantic-submode-group">
42
+ <label for="chat_skip_chat_template">
43
+ <input type="checkbox" id="chat_skip_chat_template" />
44
+ <span data-i18n>Raw prompt mode</span>
45
+ </label>
46
+ </span>
47
+ </div>
48
+ <div id="raw_input_panel" class="chat-prompt-panel">
49
+ <div class="input-header">
50
+ <span><span class="demo" data-i18n>Raw prompt</span></span>
51
+ <div class="text-action-buttons-top">
52
+ <div class="textarea-counter" id="text_count_display">
53
+ <span id="text_count_value">0</span> <span data-i18n>chars</span>
54
+ </div>
55
+ <button type="button" id="clear_text_btn" class="text-action-btn">Clear</button>
56
+ <button type="button" id="paste_text_btn" class="text-action-btn">Paste</button>
57
+ <button type="button" id="chat_raw_input_history_btn" class="text-action-btn" data-i18n>History</button>
58
+ </div>
59
+ </div>
60
+ <div class="textarea-wrapper chat-prompt-textarea-block">
61
+ <div class="semantic-search-input-wrapper chat-prompt-history-wrapper">
62
+ <textarea id="test_text"></textarea>
63
+ <ul id="chat_raw_input_history_dropdown" class="semantic-search-history-dropdown"></ul>
64
+ </div>
65
+ </div>
66
+ </div>
67
+ <div id="chat_input_panel" hidden>
68
+ <div class="chat-prompt-panel" id="chat_system_prompt_panel">
69
+ <div class="input-header">
70
+ <label class="chat-use-system-label">
71
+ <input type="checkbox" id="chat_use_system_prompt" checked />
72
+ <span class="demo" data-i18n>System</span>
73
+ </label>
74
+ <div class="text-action-buttons-top">
75
+ <div class="textarea-counter" id="chat_system_text_count_display">
76
+ <span id="chat_system_text_count_value">0</span> <span data-i18n>chars</span>
77
+ </div>
78
+ <button type="button" id="chat_system_clear_text_btn" class="text-action-btn">Clear</button>
79
+ <button type="button" id="chat_system_paste_text_btn" class="text-action-btn">Paste</button>
80
+ <button type="button" id="chat_system_prompt_history_btn" class="text-action-btn">History</button>
81
+ </div>
82
+ </div>
83
+ <div class="textarea-wrapper chat-prompt-textarea-block">
84
+ <div class="semantic-search-input-wrapper chat-prompt-history-wrapper">
85
+ <textarea id="chat_system_text">You are a helpful assistant.</textarea>
86
+ <ul id="chat_system_prompt_history_dropdown" class="semantic-search-history-dropdown"></ul>
87
+ </div>
88
+ </div>
89
+ </div>
90
+ <div class="chat-prompt-panel">
91
+ <div class="input-header">
92
+ <span><span class="demo" data-i18n>User</span></span>
93
+ <div class="text-action-buttons-top">
94
+ <div class="textarea-counter" id="chat_user_text_count_display">
95
+ <span id="chat_user_text_count_value">0</span> <span data-i18n>chars</span>
96
+ </div>
97
+ <button type="button" id="chat_user_clear_text_btn" class="text-action-btn">Clear</button>
98
+ <button type="button" id="chat_user_paste_text_btn" class="text-action-btn">Paste</button>
99
+ <button type="button" id="chat_user_prompt_history_btn" class="text-action-btn">History</button>
100
+ </div>
101
+ </div>
102
+ <div class="textarea-wrapper chat-prompt-textarea-block">
103
+ <div class="semantic-search-input-wrapper chat-prompt-history-wrapper">
104
+ <textarea id="chat_user_text"></textarea>
105
+ <ul id="chat_user_prompt_history_dropdown" class="semantic-search-history-dropdown"></ul>
106
+ </div>
107
+ </div>
108
+ </div>
109
+ </div>
110
+ <div class="textarea-wrapper chat-prompt-actions-row">
111
+ <div class="semantic-submode-row chat-completion-options-row">
112
+ <span class="semantic-submode-group">
113
+ <label class="chat-max-new-tokens-label" for="chat_max_new_tokens">
114
+ <span class="semantic-submode-label" data-i18n>Max new tokens:</span>
115
+ <input type="text" id="chat_max_new_tokens" class="semantic-threshold-input chat-max-new-tokens-input" inputmode="numeric" autocomplete="off" />
116
+ </label>
117
+ </span>
118
+ </div>
119
+ <div class="button-group">
120
+ <div class="button-left">
121
+ <button type="button" id="submit_text_btn" class="primary-btn inactive" disabled data-i18n>Ask</button>
122
+ <div class="generation-status-slot loader-small-container">
123
+ <div class="loadersmall"></div>
124
+ <span id="chat_complete_reason" class="generation-end-reason"></span>
125
+ </div>
126
+ <span id="analyze_progress" class="analyze-progress"></span>
127
+ </div>
128
+ <div id="text_metrics" class="text-metrics text-metrics-chat">
129
+ <div id="metric_usage" class="text-metrics-secondary"></div>
130
+ <div id="metric_model" class="text-metrics-secondary">model: </div>
131
+ </div>
132
+ <div class="button-right">
133
+ <button type="button" id="force_retry_btn" class="primary-btn inactive" disabled title="Fetch again without using cached result" data-i18n="text,title">Force retry</button>
134
+ </div>
135
+ </div>
136
+ </div>
137
+ </section>
138
+ </div>
139
+ </section>
140
+
141
+ <div class="resizer" id="resizer"></div>
142
+
143
+ <section class="right_panel">
144
+ <div class="chat-right-stack">
145
+ <div id="chat_prompt_used" class="chat-prompt-used truncated-text" hidden></div>
146
+ <div id="chat_streaming_preview" class="chat-streaming-preview" hidden></div>
147
+ <div id="results">
148
+ <div id="major_tooltip" class="tooltip">
149
+ <div class="currentToken"></div>
150
+ <div class="myDetail"></div>
151
+ <br />
152
+ <div class="predictions predictions-table"></div>
153
+ </div>
154
+ </div>
155
+ <div class="chat-copy-fulltext-row">
156
+ <button type="button" id="chat_copy_fulltext_btn" class="text-action-btn">Copy full text</button>
157
+ </div>
158
+ </div>
159
+ </section>
160
+ </main>
161
+
162
+ <div id="toast" class="toast"></div>
163
+
164
+ <!-- INCLUDE partials/attribution-sidebar.html -->
165
+
166
+ <script src="vendor.js"></script>
167
+ <script src="chat.js"></script>
168
+
169
+ </body>
170
+
171
+ </html>
client/src/compare.html ADDED
@@ -0,0 +1,69 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+
4
+ <head>
5
+ <meta charset="UTF-8">
6
+ <title>Info Highlight / Compare</title>
7
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
8
+ <link rel="stylesheet" type="text/css" href="compare.css">
9
+ <!--<link rel="stylesheet" type="text/css" href="vendor.css">-->
10
+
11
+ </head>
12
+
13
+ <body>
14
+
15
+ <div class="main_frame">
16
+ <!-- Grid 容器包裹工具栏和内容区 -->
17
+ <div class="compare-wrapper">
18
+ <!-- 顶部工具栏 -->
19
+ <div class="app-page-toolbar">
20
+ <h1 class="compare-toolbar-title">
21
+ <span class="compare-toolbar-title-app" data-i18n>Info Highlight</span>
22
+ <span class="compare-toolbar-title-sep" aria-hidden="true">/</span>
23
+ <span class="compare-toolbar-title-page" data-i18n>Compare</span>
24
+ </h1>
25
+ <div class="app-page-toolbar-actions">
26
+ <label style="display: flex; align-items: center; gap: 5px; cursor: pointer;">
27
+ <input type="checkbox" id="show_text_render_toggle">
28
+ <span data-i18n>Show Text Rendering</span>
29
+ </label>
30
+ <label style="display: flex; align-items: center; gap: 5px; cursor: pointer;">
31
+ <input type="checkbox" id="model_diff_mode_toggle">
32
+ <span data-i18n>Diff Mode</span>
33
+ </label>
34
+ <button id="edit_mode_toggle" data-i18n>Edit</button>
35
+ <button id="clear_demos_btn" data-i18n>Clear</button>
36
+ <button id="add_demos_btn" data-i18n>Add</button>
37
+ </div>
38
+ </div>
39
+
40
+ <!-- 对比结果展示区 -->
41
+ <div id="compare-container" class="compare-container">
42
+ <!-- 空状态提示(自动显示/隐藏) -->
43
+ <div class="compare-empty-state">
44
+ <div class="empty-icon">📊</div>
45
+ <div class="empty-title" data-i18n>No comparison data</div>
46
+ </div>
47
+ <!-- Demo 列将通过 JavaScript 动态创建 -->
48
+ </div>
49
+ </div>
50
+ </div>
51
+
52
+ <!-- Toast通知容器 -->
53
+ <div id="toast" class="toast"></div>
54
+
55
+ <!-- 全局tooltip容器(用于对比模式) -->
56
+ <div id="global_tooltip" class="tooltip">
57
+ <div class="currentToken"></div>
58
+ <div class="myDetail"></div>
59
+ <br />
60
+ <div class="predictions predictions-table"></div>
61
+ </div>
62
+
63
+ <script src="vendor.js"></script>
64
+ <script src="compare.js"></script>
65
+
66
+ </body>
67
+
68
+ </html>
69
+
client/src/content/home.en.html ADDED
@@ -0,0 +1,91 @@
1
+ <!-- 简介 / Hero(始终可见) -->
2
+ <div class="intro-brief" style="--intro-rgb: 255, 71, 64">
3
+ <span class="intro-token" style="--a:0.56">Want</span><span class="intro-token" style="--a:0.53"> key</span><span class="intro-token" style="--a:0.29"> points</span><span class="intro-token" style="--a:0.31"> at</span><span class="intro-token" style="--a:0.09"> a</span><span class="intro-token" style="--a:0.00"> glance</span><span class="intro-token" style="--a:0.04">?</span><span class="intro-token" style="--a:0.26"> Or</span><span class="intro-token" style="--a:0.31"> simply</span><span class="intro-token" style="--a:0.29"> curious</span><span class="intro-token" style="--a:0.03"> about</span><span class="intro-token" style="--a:0.08"> the</span><span class="intro-token" style="--a:0.33"> information</span><span class="intro-token" style="--a:0.68">-the</span><span class="intro-token" style="--a:0.02">oret</span><span class="intro-token" style="--a:0.00">ic</span><span class="intro-token" style="--a:0.29"> nature</span><span class="intro-token" style="--a:0.00"> of</span><span class="intro-token" style="--a:0.31"> language</span><span class="intro-token" style="--a:0.19">?</span><br><br><span class="intro-token" style="--a:0.32">Try</span><span class="intro-token" style="--a:0.47"> Info</span><span class="intro-token" style="--a:0.70"> Highlight</span><span class="intro-token" style="--a:0.17">.</span><span class="intro-token" style="--a:0.06"> It</span><span class="intro-token" style="--a:0.25"> uses</span><span class="intro-token" style="--a:0.34"> large</span><span class="intro-token" style="--a:0.02"> language</span><span class="intro-token" style="--a:0.00"> models</span><span class="intro-token" style="--a:0.02"> to</span><span class="intro-token" style="--a:0.23"> analyze</span><span class="intro-token" style="--a:0.14"> text</span><span class="intro-token" style="--a:0.37"> information</span><span class="intro-token" style="--a:0.19"> density</span><span class="intro-token" style="--a:0.05"> and</span><span class="intro-token" style="--a:0.34"> visual</span><span class="intro-token" style="--a:0.01">izes</span><span class="intro-token" style="--a:0.39"> where</span><span class="intro-token" style="--a:0.08"> the</span><span class="intro-token" style="--a:0.26"> important</span><span class="intro-token" style="--a:0.13"> parts</span><span class="intro-token" style="--a:0.05"> are</span><span class="intro-token" style="--a:0.08">.</span><br><br><span class="intro-token" style="--a:0.17">The</span><span class="intro-token" style="--a:0.40"> color</span><span class="intro-token" style="--a:0.17"> intensity</span><span class="intro-token" style="--a:0.07"> of</span><span class="intro-token" style="--a:0.06"> each</span><span class="intro-token" style="--a:0.27"> token</span><span class="intro-token" style="--a:0.10"> indicates</span><span class="intro-token" style="--a:0.07"> how</span><span class="intro-token" style="--a:0.04"> much</span><span class="intro-token" style="--a:0.03"> information</span><span class="intro-token" style="--a:0.03"> it</span><span class="intro-token" style="--a:0.09"> carries</span><span class="intro-token" style="--a:0.04">.</span><span class="intro-token" style="--a:0.39"> Try</span><span class="intro-token" style="--a:0.04"> it</span><span class="intro-token" style="--a:0.12"> yourself</span><span class="intro-token" style="--a:0.21">!</span>
4
+ </div>
5
+
6
+ <!-- 了解更多(默认折叠) -->
7
+ <details class="intro-more">
8
+ <summary>
9
+ <span class="intro-summary-when-closed">Learn more</span>
10
+ <span class="intro-summary-when-open">Hide</span>
11
+ </summary>
12
+
13
+ <!-- 原理直觉 -->
14
+ <div class="intro-block">
15
+ <h4>Intuitive Understanding of Information</h4>
16
+ <p>From a linguistic perspective, information represents the novelty/surprise/importance of a word. Words that
17
+ are harder to predict from context typically carry more information. A simple example: "This morning I opened the door and saw a 'UFO'."
18
+ vs "This morning I opened the door and saw a 'cat'." — clearly "UFO" carries more information.</p>
19
+ </div>
20
+
21
+ <!-- 技术定义 -->
22
+ <div class="intro-block intro-technical">
23
+ <h4>Information-Theoretic Perspective</h4>
24
+ <p>In our implementation, the information content of each token comes from how difficult it is for the LLM to
25
+ predict that token from left to right.</p>
26
+ <p>
27
+ From an information-theoretic perspective, this can be expressed as the conditional information of a token
28
+ given the model and the preceding context:
29
+ </p>
30
+ <pre>
31
+ Information of tokenᵢ in a text = -log₂P(tokenᵢ | model, token₀, …, tokenᵢ₋₁)
32
+ </pre>
33
+ <p>The core assumption behind Info Highlight is that this conditional information aligns with human subjective
34
+ perception, such as novelty, surprise, and potential importance.
35
+ </p>
36
+ </div>
37
+
38
+ <!-- 误差与局限 -->
39
+ <div class="intro-block">
40
+ <h4>Ideal vs Reality</h4>
41
+ <p>
42
+ For an ideal model, whose knowledge and contextual understanding match those of the reader, the evaluation
43
+ would perfectly align with human subjective perception.
44
+ </p>
45
+ <p>Therefore, the gap between current results and reader perception mainly comes from two aspects:</p>
46
+ <ul>
47
+ <li><strong>Model capability vs human reader:</strong> The model's understanding and knowledge may be generally less than,
48
+ or possibly exceed, the reader's. Imagine comparing a state-of-the-art LLM with a ten-year-old reader.</li>
49
+ <li><strong>Model context vs human reader:</strong> The model only has the text read so far as context, much less
50
+ than the reader's. Info Highlight uses base models without instruction tuning or prompts (which actually
51
+ gives the best results).</li>
52
+ </ul>
53
+ <p>The good news is that LLMs are improving so fast: current analysis results already reflect mainstream
54
+ readers' subjective perception to some extent, and can be used to evaluate article information content and
55
+ improve reading speed.</p>
56
+ </div>
57
+
58
+ <!-- Tribute -->
59
+ <div class="intro-block">
60
+ <h4>Tribute</h4>
61
+ <p>Built on the classic project <a href="http://gltr.io" target="_blank" rel="noopener">GLTR.io</a>,
62
+ developed by Hendrik Strobelt et al. in 2019. GLTR was a web demo that pioneered using GPT-2 prediction
63
+ probabilities to detect generated text.</p>
64
+ <p>However, Info Highlight is not meant to detect AI text, but to evaluate the "information quality" of text.</p>
65
+ </div>
66
+
67
+ <!-- FAQ -->
68
+ <div class="intro-block intro-faq">
69
+ <h4>FAQ</h4>
70
+
71
+ <p><strong>Is it an AI text detector?</strong></p>
72
+ <p>No.</p>
73
+ <p>When we dislike AI text, we actually dislike low-quality text. We dislike low-quality human-written text,
74
+ rather than high-quality AI-generated content. So the key is the "information quality" of the text.
75
+ Info Highlight aims to detect "information quality" rather than "AI signs", though it can be used to detect
76
+ AI-generated nonsense with no information content.</p>
77
+
78
+ <p><strong>What LLM is currently used?</strong></p>
79
+ <p>Currently the open-source <strong>Qwen3-0.6B/1.7B/4B/14B-Base</strong> is used. Among them, the 4B model gives
80
+ results quite close to most people's subjective perception among the models the author has tested (note that
81
+ larger model does not necessarily lead to more consistency with the reader's subjective perception). When
82
+ hardware is limited, 0.6B/1.7B models are used; they perform slightly worse than 4B (information
83
+ content difference is within ~15%), but the trend is similar.</p>
84
+
85
+ <p><strong>Why does information content affect text quality?</strong></p>
86
+ <p>Low information content means the LLM can easily predict it from context. If even a machine can predict it,
87
+ how important can it be? Conversely, high information content means the LLM has difficulty predicting it
88
+ from context. Assuming it is not simply a mistake, it then represents key information the author wants to convey
89
+ that the machine doesn't know.</p>
90
+ </div>
91
+ </details>
client/src/content/home.zh.html ADDED
@@ -0,0 +1,68 @@
1
+ <!-- 简介 / Hero(始终可见) -->
2
+ <div class="intro-brief" style="--intro-rgb: 255, 71, 64">
3
+ <span class="intro-token" style="--a:0.63">想</span><span class="intro-token" style="--a:0.58">一眼</span><span class="intro-token" style="--a:0.43">找到</span><span class="intro-token" style="--a:0.52">文章</span><span class="intro-token" style="--a:0.35">的关键</span><span class="intro-token" style="--a:0.13">点</span><span class="intro-token" style="--a:0.31">?</span><span class="intro-token" style="--a:0.29">或者</span><span class="intro-token" style="--a:0.27">只是</span><span class="intro-token" style="--a:0.37">好奇</span><span class="intro-token" style="--a:0.38">文字</span><span class="intro-token" style="--a:0.48">的信息</span><span class="intro-token" style="--a:0.56">论</span><span class="intro-token" style="--a:0.48">奥</span><span class="intro-token" style="--a:0.03">秘</span><span class="intro-token" style="--a:0.18">?</span><br><br><span class="intro-token" style="--a:0.47">试试</span><span class="intro-token" style="--a:0.38">Info</span><span class="intro-token" style="--a:0.70"> Highlight</span><span class="intro-token" style="--a:0.36">.</span><span class="intro-token" style="--a:0.16"> 它</span><span class="intro-token" style="--a:0.00">它</span><span class="intro-token" style="--a:0.27">用</span><span class="intro-token" style="--a:0.29">大</span><span class="intro-token" style="--a:0.13">语言</span><span class="intro-token" style="--a:0.00">模型</span><span class="intro-token" style="--a:0.18">分析</span><span class="intro-token" style="--a:0.15">文本</span><span class="intro-token" style="--a:0.34">的信息</span><span class="intro-token" style="--a:0.09">密度</span><span class="intro-token" style="--a:0.03">,</span><span class="intro-token" style="--a:0.48">可视化</span><span class="intro-token" style="--a:0.19">展示</span><span class="intro-token" style="--a:0.49">哪里</span><span class="intro-token" style="--a:0.30">更重要</span><span class="intro-token" style="--a:0.11">。</span><br><br><span class="intro-token" style="--a:0.41">每个</span><span class="intro-token" style="--a:0.21">字</span><span class="intro-token" style="--a:0.41">的颜色</span><span class="intro-token" style="--a:0.18">深</span><span class="intro-token" style="--a:0.00">浅</span><span class="intro-token" style="--a:0.11">,</span><span class="intro-token" style="--a:0.18">表示</span><span class="intro-token" style="--a:0.11">它</span><span class="intro-token" style="--a:0.31">承载</span><span class="intro-token" style="--a:0.02">的信息</span><span class="intro-token" style="--a:0.03">量</span><span class="intro-token" style="--a:0.11">大小</span><span class="intro-token" style="--a:0.05">。</span><span class="intro-token" style="--a:0.49">自己</span><span class="intro-token" style="--a:0.28">试试</span><span class="intro-token" style="--a:0.06">吧</span><span class="intro-token" style="--a:0.11">!</span>
4
+ </div>
5
+
6
+ <!-- 了解更多(默认折叠) -->
7
+ <details class="intro-more">
8
+ <summary>
9
+ <span class="intro-summary-when-closed">了解更多</span>
10
+ <span class="intro-summary-when-open">收起</span>
11
+ </summary>
12
+
13
+ <!-- 原理直觉 -->
14
+ <div class="intro-block">
15
+ <h4>信息量的直观理解</h4>
16
+ <p>从语言学角度看,信息量代表一个词所包含的新意/意外性/关键程度。越难从上下文中预测出来的词,通常携带的信息就越多。一个简单的例子:"今天早上我打开门看见了一只'飞碟'" 和
17
+ "今天早上我打开门看见了一只'猫'":在这里显然"飞碟"的信息量更大。</p>
18
+ </div>
19
+
20
+ <!-- 技术定义 -->
21
+ <div class="intro-block intro-technical">
22
+ <h4>信息论视角</h4>
23
+ <p>在工程实现中,每个 token 的信息量,来自大模型从左到右预测当前 token 的难度。</p>
24
+ <p>从信息论角度,它可以表示为当前 token 相对于大模型和已读上下文的条件信息量:</p>
25
+ <pre>
26
+ 一段文本中的 tokenᵢ 的信息量 = -log₂P(tokenᵢ | model, token₀, …, tokenᵢ₋₁)
27
+ </pre>
28
+ <p>Info Highlight 的核心假设就是,这个条件信息量的大小和人类的主观感受(新意/意外性/潜在关键程度)是一致的。</p>
29
+ </div>
30
+
31
+ <!-- 误差与局限 -->
32
+ <div class="intro-block">
33
+ <h4>理想与现实</h4>
34
+ <p>对于一个想象中的理想模型(它的包含了上下文的知识量和阅读者一致),那么它评估出的结果应该和阅读者的主观感受是完全一致的。</p>
35
+ <p>所以,目前的实际结果和阅读者主观感受之间的差距,主要来自两个方面:</p>
36
+ <ul>
37
+ <li><strong>模型能力和阅读者的差异:</strong>模型的理解能力和知识量很可能不如阅读者,也有小可能性过剩,想象一下目前的SOTA大模型和一个十岁孩子阅读者相��。</li>
38
+ <li><strong>模型上下文和阅读者的差异:</strong>模型只有文章已读部分作为上下文,远小于阅读者。Info Highlight 使用没有 instruct 微调的 base 模型,也没有任何提示词(其实这样效果已经是最好了)。
39
+ </li>
40
+ </ul>
41
+ <p>好消息是,大模型进步实在太快了:目前的分析结果已经在一定程度上反映了主流阅读者的主观感受,可以用来评估文章的信息含量,还可以提高阅读速度。</p>
42
+ </div>
43
+
44
+ <!-- 致谢 -->
45
+ <div class="intro-block">
46
+ <h4>致谢</h4>
47
+ <p>基于 2019 年 Hendrik Strobelt 等人开发的经典项目 <a href="http://gltr.io" target="_blank" rel="noopener">GLTR.io</a>。GLTR 是一个网页演示,率先用 GPT-2 的预测概率来检测生成文本。</p>
48
+ <p>不过 Info Highlight 的目标不是检测 AI 文本,而是评估文本的“信息质量”。</p>
49
+ </div>
50
+
51
+ <!-- FAQ -->
52
+ <div class="intro-block intro-faq">
53
+ <h4>常见问题</h4>
54
+
55
+ <p><strong>它是 AI 文本检测器吗?</strong></p>
56
+ <p>不是。</p>
57
+ <p>当我们反感AI文本时,我们其实是反感低质量的文本。我们更反感低质量的真人写的文本,而不是AI生成的高质量内容。所以,关键是文本的"信息质量"。Info Highlight 的目标是检测"信息质量"而不是“AI痕迹”,虽然它可以用来检测没有信息量的AI胡编文本。
58
+ </p>
59
+
60
+ <p><strong>目前使用的是什么大模型?</strong></p>
61
+ <p>当前使用的是开源的 <strong>Qwen3-0.6B/1.7B/4B/14B-Base</strong>,其中4B模型是作者测试过的模型里结果挺接近大部分人主观感受的一个(注意并不一定是模型越大越符合阅读者的主观感受)。
62
+ 当硬件配置限制时,会用0.6B/1.7B模型,它们效果比4B稍差(信息量评估结果差异约15%以内),但趋势是类似的。</p>
63
+
64
+ <p><strong>说到底,为什么信息量会影响文本的质量?</strong></p>
65
+ <p>一个词的信息量低,意味着大模型能很容易从上文预测出来。既然机器都能预测出来,那它还能有多关键呢?反之,一个词的信息量高,意味着大模型很难从上文预测出来。(如果不是错误表达的话)那它就代表了作者想要表达,而机器不知道的关键信息。
66
+ </p>
67
+ </div>
68
+ </details>
client/src/content/images/attribute-dark.png ADDED

Git LFS Details

  • SHA256: 5bfb1953f9e589e42663a72254d9cd6b461852596524928f630a88bb97746c4d
  • Pointer size: 130 Bytes
  • Size of remote file: 88 kB
client/src/content/images/attribute.png ADDED

Git LFS Details

  • SHA256: fc87ac09ae4aed3732f6abf90b92a439c553e953530bcc7d0f4793522f8eff55
  • Pointer size: 130 Bytes
  • Size of remote file: 91.8 kB