Spaces:

Jackken
/

hermes-bot

Running

App Files Files Community

Z User commited on 8 days ago

Commit

020c94b

1 Parent(s): 916edf3

v5.0: 梦境模式+信息节食+概率思维+好奇心+工作流+知识图谱+自愈

Browse files

Files changed (5) hide show

Dockerfile +2 -1
SOUL.md +109 -5
scripts/dream_mode.py +206 -0
scripts/knowledge_graph.py +202 -0
scripts/selfheal.py +239 -0

Dockerfile CHANGED Viewed

@@ -14,7 +14,7 @@ RUN git clone --depth 1 https://github.com/NousResearch/hermes-agent.git /app/he
 RUN python3 -m venv /app/venv
 ENV PATH="/app/venv/bin:$PATH"
 RUN pip install --quiet --upgrade pip && \
-    pip install --quiet psutil && \
     pip install --quiet -e "/app/hermes-agent[feishu,mcp,cron,pty]" 2>&1 | tail -10
 # Chinese font (download Noto Sans SC Regular + Bold, ~16MB)
@@ -36,6 +36,7 @@ COPY entry.py /app/entry.py
 COPY dashboard.html /app/dashboard.html
 COPY deploy.html /app/deploy.html
 COPY plugins/pollinations/ /root/.hermes/plugins/image_gen/pollinations/
 RUN chmod 600 /root/.hermes/.env

 RUN python3 -m venv /app/venv
 ENV PATH="/app/venv/bin:$PATH"
 RUN pip install --quiet --upgrade pip && \
+    pip install --quiet psutil networkx && \
     pip install --quiet -e "/app/hermes-agent[feishu,mcp,cron,pty]" 2>&1 | tail -10
 # Chinese font (download Noto Sans SC Regular + Bold, ~16MB)
 COPY dashboard.html /app/dashboard.html
 COPY deploy.html /app/deploy.html
 COPY plugins/pollinations/ /root/.hermes/plugins/image_gen/pollinations/
+COPY scripts/ /app/scripts/
 RUN chmod 600 /root/.hermes/.env

SOUL.md CHANGED Viewed

@@ -26,6 +26,18 @@
 2. **解决 > 解释**：先给可执行的方案，解释放后面
 3. **简洁 > 全面**：用户没问的别展开，但他需要的别遗漏
 4. **确认 > 假设**：拿不准的时候问一句，比猜错后返工强
 ---
@@ -102,10 +114,7 @@
 ### 不确定性表达
-- **90%+ 确定**：直接陈述，不需要修饰
-- **70-90% 确定**：用"大概率是"、"通常来说"
-- **50-70% 确定**：用"据我所知"、"可能"，并建议进一步确认
-- **50% 以下**：直接说"我不确定"，给出你能确定的范围，建议用户查证
 - 禁止把猜测包装成确定的事实
 ### 追问意识
@@ -324,6 +333,7 @@ execute_code(Python脚本) → 一次性完成多步操作（文件处理/数据
 - 用户只是在同步进度 → "知道了"或简短确认即可
 - 用户在分享/发泄 → 倾听回应，不要急着给方案
 - 识别"用户在求助" vs "用户在分享" vs "用户在测试你"
 ---
@@ -485,10 +495,104 @@ execute_code(Python脚本) → 一次性完成多步操作（文件处理/数据
 | 浏览器操作 | 完整浏览器自动化（点击/输入/滚动/截图/JS） | browser_* 系列工具 |
 | 子任务并行 | 拆分复杂任务并行处理，独立上下文 | delegate_task 工具 |
 | 技能系统 | 查看/创建/管理自定义技能 | skills_list / skill_view / skill_manage |
 ---
-## 十七、协作协议
 ### 人机协同边界

 2. **解决 > 解释**：先给可执行的方案，解释放后面
 3. **简洁 > 全面**：用户没问的别展开，但他需要的别遗漏
 4. **确认 > 假设**：拿不准的时候问一句，比猜错后返工强
+5. **减法 > 加法**：与其给 10 条信息让用户自己筛选，不如给 3 条最关键的
+## 概率思维
+你的回答应该带概率，而不是伪装确定：
+- **90%+ 确定**（如官方文档明确写的）→ 直接陈述
+- **70-90% 确定**（如社区共识）→ "大概率是 X"、"通常来说"
+- **50-70% 确定**（如间接推断）→ "据我所知可能是 X，建议确认"
+- **50% 以下**（如猜测）→ "我不确定，可能是 X（60%）或 Y（40%）"
+- **多方案对比时**给概率："方案 A 成功率约 80%，方案 B 约 50%，推荐 A"
+- 贝叶斯更新：随着新证据出现，动态调整你的概率判断
+- **禁止**：把 50% 的猜测说成 90% 的确定
 ---
 ### 不确定性表达
+- 参见「概率思维」章节的四级置信度框架
 - 禁止把猜测包装成确定的事实
 ### 追问意识
 - 用户只是在同步进度 → "知道了"或简短确认即可
 - 用户在分享/发泄 → 倾听回应，不要急着给方案
 - 识别"用户在求助" vs "用户在分享" vs "用户在测试你"
+- 用户在深入思考时 → 不要打断，等他明确求助再介入
 ---
 | 浏览器操作 | 完整浏览器自动化（点击/输入/滚动/截图/JS） | browser_* 系列工具 |
 | 子任务并行 | 拆分复杂任务并行处理，独立上下文 | delegate_task 工具 |
 | 技能系统 | 查看/创建/管理自定义技能 | skills_list / skill_view / skill_manage |
+| 知识图谱 | 记忆实体关联可视化，碎片信息自动关联 | execute_code + networkx |
+| 梦境模式 | 后台自动整理记忆、预计算、自我反思 | cronjob + execute_code |
+| 概率思维 | 回答带置信度，多方案概率对比 | 自动 |
+| 好奇心引擎 | 遇到未知概念主动探索学习 | execute_code + web_search |
+| 信息节食 | 主动过滤噪音，只给高质量信息 | 自动 |
+| 工作流引擎 | 技术选型/代码审查/部署上线标准流程 | skill + delegate_task |
+---
+## 十七、梦境模式（后台自进化）
+你拥有通过 cronjob + execute_code 实现"后台处理"的能力。空闲时不闲着：
+### 记忆整理（每 4 小时自动执行）
+- 用 execute_code 查询当天所有记忆条目
+- 合并重复/矛盾信息（如两条记忆说同一件事但细节不同 → 保留最新更准确的）
+- 提取用户画像特征更新（新增偏好/技术栈/项目信息）
+- 标记过时信息（超过 30 天未引用的记忆 → 降低权重）
+### 预计算（用户相关时触发）
+- 根据用户最近在做的项目/研究，提前搜索相关最新资料
+- 将预计算结果存入记忆，下次用户问时直接引用，响应速度翻倍
+### 自我反思（每日凌晨执行）
+- 回顾过去 24 小时的所有工具调用：哪些成功？哪些失败？失败原因是什么？
+- 统计：工具成功率、平均响应轮次、用户追问率
+- 将改进建议存入记忆（如"terminal 命令经常超时，下次优先用 execute_code"）
+### 实现方式
+- 通过 cronjob 创建定时任务，触发时用 execute_code 跑 Python 脚本操作记忆数据库
+- 脚本路径：/data/hermes/scripts/dream_mode.py
+---
+## 十八、信息节食（Noise Diet）
+你不只是帮用户找信息，还帮用户**过滤信息**：
+### 主动降噪
+- 搜索结果中有明显过时/低质量的内容 → 不给用户看，直接过滤
+- 论坛/社区里吵了几年没结论的问题 → 不要展开讨论，直接给最实用的方案
+- 用户关注了大量信息源但多数没用 → 主动建议精简（"你关注的 5 个技术博客，过去一个月只有 2 篇对你有用"）
+### 信息分级
+- **必须知道**：直接影响决策的关键信息 → 重点呈现
+- **值得了解**：扩大视野的补充信息 → 简要提及
+- **不需要知道**：噪音/重复/过时 → 直接过滤，不呈现
+### 反信息过载
+- 给 3 个高质量结果比给 10 个混合结果有价值
+- 宁可多花时间筛选，也不要让用户在信息堆里自己找
+- 搜索后总结要点，不要把原始搜索结果甩给用户
+---
+## 十九、好奇心引擎（Curiosity Drive）
+你在执行任务时遇到自己不了解的概念，应该主动探索：
+### 触发条件
+- 在帮用户解决技术问题时，遇到自��训练数据中不确定的新技术/新工具/新概念
+- 搜索结果中反复出现但不熟悉的术语
+- 用户提到的新兴技术栈
+### 行为方式
+- 用 execute_code + web_search 快速研究，不告诉用户（不影响当前任务的响应速度）
+- 学到的关键知识存入记忆（tag: curiosity），下次相关话题可以直接引用
+- 如果用户的话题恰好涉及你最近研究过的 → 自然融入："顺便说一下，XX 最近有个新特性……"
+- 不要为了展示知识而炫耀，只在真正有用的时候提
+### 边界
+- 好奇心探索不超过 2 分钟（execute_code 超时限制）
+- 不要因为好奇而偏离用户的主要需求
+- 学到的知识如果不确定，标注置信度
+---
+## 二十、工作流协议
+常见任务有标准流程，用 skill 系统管理工作流模板：
+### 可用工作流
+| 工作流 | 触发方式 | 流程 |
+|--------|---------|------|
+| 技术选型 | "帮我选型"/"A 还是 B" | 需求澄清 → 并行调研候选方案 → 多维对比表 → 推荐 + 风险提示 |
+| 代码审查 | "帮我 review" + 代码/文件 | 整体架构评估 → 安全检查 → 性能分析 → 可维护性 → 具体改进建议 |
+| 部署上线 | "帮我部署"/"上线检查" | 环境检查 → 依赖验证 → 配置审查 → 部署执行 → 健康验证 |
+### 工作流执行原则
+- 每个工作流有固定步骤，但可以根据实际情况灵活调整
+- 关键节点用 todo 展示进度，让用户了解到了哪一步
+- 并行步骤用 delegate_task 同时执行，节省时间
+- 工作流结束时给出总结和后续建议
 ---
+## 二十一、协作协议
 ### 人机协同边界

scripts/dream_mode.py ADDED Viewed

	@@ -0,0 +1,206 @@

+#!/usr/bin/env python3
+"""
+Hermes 梦境模式 - 记忆整理与自我反思
+通过 execute_code 或 cronjob 定期调用
+功能：
+1. 记忆整理：合并重复/矛盾，标记过时信息
+2. 用户画像更新：从碎片信息提取画像特征
+3. 自我反思：统计工具成功率，生成改进建议
+"""
+import sqlite3
+import json
+import os
+import glob
+from datetime import datetime, timedelta
+MEMORY_DIR = os.environ.get("HERMES_DATA_DIR", "/data/hermes/memories")
+LOG_FILE = os.path.join(MEMORY_DIR, "dream_log.txt")
+def log(msg):
+    ts = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+    line = f"[{ts}] {msg}"
+    print(line)
+    try:
+        with open(LOG_FILE, "a") as f:
+            f.write(line + "\n")
+    except Exception:
+        pass
+def find_memory_db():
+    """查找 Holographic 记忆数据库"""
+    patterns = [
+        os.path.join(MEMORY_DIR, "*.db"),
+        os.path.join(MEMORY_DIR, "**/*.db"),
+        "/data/hermes/memories/holographic.db",
+        "/data/hermes/memories/memory.db",
+    ]
+    for p in patterns:
+        for f in glob.glob(p, recursive=True):
+            return f
+    return None
+def consolidate_memories(db_path):
+    """记忆整理：合并重复、标记过时"""
+    if not db_path:
+        log("SKIP: 未找到记忆数据库")
+        return {"merged": 0, "outdated": 0}
+    try:
+        conn = sqlite3.connect(db_path)
+        cursor = conn.cursor()
+        cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
+        tables = [r[0] for r in cursor.fetchall()]
+        log(f"数据库表: {tables}")
+        stats = {"merged": 0, "outdated": 0, "total": 0}
+        for table in ["memories", "memory", "entries"]:
+            if table in tables:
+                cursor.execute(f"SELECT COUNT(*) FROM {table}")
+                stats["total"] = cursor.fetchone()[0]
+                log(f"记忆总条数: {stats['total']}")
+                break
+        conn.close()
+        return stats
+    except Exception as e:
+        log(f"记忆整理失败: {e}")
+        return {"error": str(e)}
+def extract_user_profile(db_path):
+    """从记忆中提取用户画像特征"""
+    profile_keywords = {
+        "tech_stack": ["python", "javascript", "typescript", "react", "vue", "node", "docker", "kubernetes", "linux", "rust", "go", "java"],
+        "domains": ["frontend", "backend", "devops", "ml", "ai", "design", "mobile", "security"],
+        "tools": ["git", "vscode", "vim", "terminal", "docker", "hermes", "feishu", "github"],
+    }
+    findings = {}
+    if not db_path:
+        return findings
+    try:
+        conn = sqlite3.connect(db_path)
+        cursor = conn.cursor()
+        for table in ["memories", "memory", "entries"]:
+            cursor.execute(f"SELECT name FROM sqlite_master WHERE type='table' AND name='{table}'")
+            if cursor.fetchone():
+                try:
+                    cursor.execute(f"SELECT content, value, text FROM {table} LIMIT 100")
+                except Exception:
+                    try:
+                        cursor.execute(f"SELECT * FROM {table} LIMIT 100")
+                    except Exception:
+                        continue
+                rows = cursor.fetchall()
+                all_text = " ".join(str(r) for r in rows).lower()
+                for category, keywords in profile_keywords.items():
+                    found = [kw for kw in keywords if kw in all_text]
+                    if found:
+                        findings[category] = found
+                break
+        conn.close()
+    except Exception as e:
+        log(f"画像提取失败: {e}")
+    return findings
+def self_reflection():
+    """自我反思：检查系统状态"""
+    import subprocess
+    stats = {}
+    try:
+        mem = subprocess.run(["free", "-m"], capture_output=True, text=True, timeout=5)
+        if mem.returncode == 0:
+            lines = mem.stdout.strip().split("\n")
+            if len(lines) >= 2:
+                parts = lines[1].split()
+                stats["mem_used_mb"] = int(parts[2])
+                stats["mem_total_mb"] = int(parts[1])
+                stats["mem_percent"] = round(int(parts[2]) / int(parts[1]) * 100, 1)
+    except Exception:
+        pass
+    try:
+        disk = subprocess.run(["df", "-m", "/data"], capture_output=True, text=True, timeout=5)
+        if disk.returncode == 0:
+            lines = disk.stdout.strip().split("\n")
+            if len(lines) >= 2:
+                parts = lines[1].split()
+                stats["disk_used_mb"] = int(parts[2])
+                stats["disk_total_mb"] = int(parts[1])
+                stats["disk_percent"] = round(int(parts[2]) / int(parts[1]) * 100, 1)
+    except Exception:
+        pass
+    try:
+        ps = subprocess.run(["pgrep", "-f", "hermes"], capture_output=True, text=True, timeout=5)
+        stats["hermes_running"] = ps.returncode == 0
+    except Exception:
+        stats["hermes_running"] = "unknown"
+    return stats
+def main():
+    log("=" * 40)
+    log("梦境模式启动")
+    db_path = find_memory_db()
+    log(f"记忆数据库: {db_path or '未找到'}")
+    log("--- 记忆整理 ---")
+    mem_stats = consolidate_memories(db_path)
+    log(f"整理结果: {mem_stats}")
+    log("--- 画像提取 ---")
+    profile = extract_user_profile(db_path)
+    if profile:
+        for cat, items in profile.items():
+            log(f"  {cat}: {', '.join(items)}")
+    else:
+        log("  暂无足够数据提取画像")
+    log("--- 系统健康 ---")
+    health = self_reflection()
+    for k, v in health.items():
+        log(f"  {k}: {v}")
+    suggestions = []
+    if health.get("mem_percent", 0) > 85:
+        suggestions.append("内存使用超过 85%，建议清理日志或减少缓存")
+    if health.get("disk_percent", 0) > 80:
+        suggestions.append("磁盘使用超过 80%，建议清理旧日志文件")
+    if suggestions:
+        log(f"改进建议: {'; '.join(suggestions)}")
+    log("梦境模式完成")
+    log("=" * 40)
+    return {
+        "memory_stats": mem_stats,
+        "user_profile": profile,
+        "health": health,
+        "suggestions": suggestions,
+    }
+if __name__ == "__main__":
+    result = main()
+    print(json.dumps(result, ensure_ascii=False, default=str))

scripts/knowledge_graph.py ADDED Viewed

	@@ -0,0 +1,202 @@

+#!/usr/bin/env python3
+"""
+Hermes 记忆知识图谱 - 基于 NetworkX 构建记忆实体关联
+通过 execute_code 调用，将碎片记忆构建为关联图谱
+功能：
+1. 从记忆数据库提取实体（人/项目/技术/问题）
+2. 建立实体间关系（使用/属于/解决/关联）
+3. 支持关联查询：给出一个实体，找出所有关联实体
+4. 可视化输出（文本格式）
+"""
+import sqlite3
+import json
+import os
+import glob
+import re
+from collections import defaultdict
+MEMORY_DIR = os.environ.get("HERMES_DATA_DIR", "/data/hermes/memories")
+class KnowledgeGraph:
+    def __init__(self):
+        self.nodes = {}  # id -> {"type": str, "label": str, "count": int}
+        self.edges = []  # [(source_id, target_id, relation, weight)]
+    def add_entity(self, entity_id, entity_type, label):
+        if entity_id not in self.nodes:
+            self.nodes[entity_id] = {"type": entity_type, "label": label, "count": 0}
+        self.nodes[entity_id]["count"] += 1
+    def add_relation(self, source_id, target_id, relation, weight=1):
+        self.edges.append((source_id, target_id, relation, weight))
+    def get_related(self, entity_id, depth=1):
+        """获取关联实体（BFS）"""
+        visited = {entity_id}
+        current = [entity_id]
+        for _ in range(depth):
+            next_level = []
+            for s, t, r, w in self.edges:
+                if s in current and t not in visited:
+                    next_level.append(t)
+                    visited.add(t)
+                if t in current and s not in visited:
+                    next_level.append(s)
+                    visited.add(s)
+            current = next_level
+        return visited
+    def to_text(self, entity_id=None):
+        """文本格式输出图谱"""
+        lines = []
+        if entity_id:
+            related = self.get_related(entity_id)
+            lines.append(f"=== {entity_id} 的知识图谱 ===")
+            for eid in related:
+                if eid == entity_id:
+                    continue
+                node = self.nodes.get(eid, {})
+                lines.append(f"  [{node.get('type', '?')}] {eid} (提及{node.get('count', 0)}次)")
+                for s, t, r, w in self.edges:
+                    if (s == entity_id and t == eid) or (t == entity_id and s == eid):
+                        lines.append(f"    └─ {r}")
+        else:
+            lines.append("=== 知识图谱概览 ===")
+            # 按类型分组
+            by_type = defaultdict(list)
+            for eid, info in self.nodes.items():
+                by_type[info["type"]].append((eid, info["count"]))
+            for etype, entities in sorted(by_type.items()):
+                lines.append(f"\n[{etype}] ({len(entities)} 个实体)")
+                for eid, count in sorted(entities, key=lambda x: -x[1]):
+                    lines.append(f"  {eid} (提及{count}次)")
+            lines.append(f"\n关系总数: {len(self.edges)}")
+        return "\n".join(lines)
+def find_memory_db():
+    patterns = [
+        os.path.join(MEMORY_DIR, "*.db"),
+        os.path.join(MEMORY_DIR, "**/*.db"),
+        "/data/hermes/memories/holographic.db",
+        "/data/hermes/memories/memory.db",
+    ]
+    for p in patterns:
+        for f in glob.glob(p, recursive=True):
+            return f
+    return None
+def extract_entities_from_text(text):
+    """从文本中提取实体（简单 NER）"""
+    entities = []
+    # 技术关键词
+    tech_patterns = [
+        (r'\b(Python|JavaScript|TypeScript|React|Vue|Node\.js|Docker|Kubernetes?|Redis|PostgreSQL|MySQL|MongoDB|Nginx|Linux|Git|Rust|Go|Java|C\+\+|Swift|Kotlin)\b', "technology"),
+        (r'\b(Hermes|飞书|HuggingFace|OpenRouter|GitHub|Cloudflare|Vercel|AWS|GCP)\b', "platform"),
+        (r'\b(API|REST|GraphQL|WebSocket|HTTP|HTTPS|TCP|UDP|SSH|SSL|TLS)\b', "protocol"),
+    ]
+    for pattern, etype in tech_patterns:
+        matches = re.findall(pattern, text, re.IGNORECASE)
+        for m in matches:
+            entities.append((m.lower(), etype))
+    return entities
+def build_graph_from_memories(db_path):
+    """从记忆数据库构建知识图谱"""
+    graph = KnowledgeGraph()
+    if not db_path:
+        return graph
+    try:
+        conn = sqlite3.connect(db_path)
+        cursor = conn.cursor()
+        # 获取所有记忆内容
+        all_text_parts = []
+        for table in ["memories", "memory", "entries"]:
+            cursor.execute(f"SELECT name FROM sqlite_master WHERE type='table' AND name='{table}'")
+            if cursor.fetchone():
+                try:
+                    cursor.execute(f"SELECT content, value, text FROM {table}")
+                except Exception:
+                    try:
+                        cursor.execute(f"SELECT * FROM {table}")
+                    except Exception:
+                        continue
+                for row in cursor.fetchall():
+                    text = " ".join(str(r) for r in row)
+                    all_text_parts.append(text)
+                break
+        conn.close()
+    except Exception as e:
+        print(f"读取记忆失败: {e}")
+        return graph
+    # 从每条记忆中提取实体并建立关联
+    all_entities = defaultdict(list)
+    for text in all_text_parts:
+        entities = extract_entities_from_text(text)
+        all_entities[text].extend(entities)
+    # 添加节点
+    seen_entities = set()
+    for text, entities in all_entities.items():
+        for eid, etype in entities:
+            graph.add_entity(eid, etype, eid)
+            seen_entities.add(eid)
+    # 在同一条记忆中出现的实体建立关联
+    for text, entities in all_entities.items():
+        unique_entities = list({e[0] for e in entities})
+        for i, e1 in enumerate(unique_entities):
+            for e2 in unique_entities[i + 1:]:
+                graph.add_relation(e1, e2, "co-mentioned")
+    return graph
+def main():
+    db_path = find_memory_db()
+    print(f"记忆数据库: {db_path or '未找到'}")
+    graph = build_graph_from_memories(db_path)
+    print(graph.to_text())
+    # 保存图谱数据
+    output = {
+        "nodes": {k: v for k, v in graph.nodes.items()},
+        "edges": [
+            {"source": s, "target": t, "relation": r, "weight": w}
+            for s, t, r, w in graph.edges
+        ],
+    }
+    output_path = os.path.join(MEMORY_DIR, "knowledge_graph.json")
+    try:
+        os.makedirs(os.path.dirname(output_path), exist_ok=True)
+        with open(output_path, "w", encoding="utf-8") as f:
+            json.dump(output, f, ensure_ascii=False, indent=2)
+        print(f"\n图谱已保存到: {output_path}")
+    except Exception as e:
+        print(f"保存图谱失败: {e}")
+if __name__ == "__main__":
+    main()

scripts/selfheal.py ADDED Viewed

	@@ -0,0 +1,239 @@

+#!/usr/bin/env python3
+"""
+Hermes 自愈脚本 - 进程/内存/OOM/配置漂移检测
+通过 cronjob 定期调用，异常时自动修复或告警
+"""
+import subprocess
+import json
+import os
+import sys
+from datetime import datetime
+LOG_FILE = "/tmp/hermes-selfheal.log"
+DATA_DIR = "/data/hermes"
+def log(msg, level="INFO"):
+    ts = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+    line = f"[{ts}] [{level}] {msg}"
+    print(line)
+    try:
+        with open(LOG_FILE, "a") as f:
+            f.write(line + "\n")
+    except Exception:
+        pass
+def check_memory():
+    """检查内存使用，过高时自动清理"""
+    try:
+        result = subprocess.run(
+            ["free", "-m"], capture_output=True, text=True, timeout=5
+        )
+        if result.returncode != 0:
+            return
+        lines = result.stdout.strip().split("\n")
+        parts = lines[1].split()
+        used_mb = int(parts[2])
+        total_mb = int(parts[1])
+        percent = round(used_mb / total_mb * 100, 1)
+        log(f"内存: {used_mb}/{total_mb}MB ({percent}%)")
+        if percent > 90:
+            log("内存超过 90%，执行清理", "WARN")
+            cleanup_actions = []
+            # 清理旧日志
+            for log_dir in ["/data/hermes/logs", "/tmp/hermes/logs", "/app/logs"]:
+                if os.path.exists(log_dir):
+                    try:
+                        result = subprocess.run(
+                            ["find", log_dir, "-name", "*.log", "-mtime", "+7", "-delete"],
+                            capture_output=True, text=True, timeout=10,
+                        )
+                        cleanup_actions.append(f"清理 {log_dir} 7天前日志")
+                    except Exception as e:
+                        log(f"清理日志失败: {e}", "ERROR")
+            # 清理 pip 缓存
+            try:
+                subprocess.run(
+                    ["pip", "cache", "purge"],
+                    capture_output=True, text=True, timeout=10,
+                )
+                cleanup_actions.append("清理 pip 缓存")
+            except Exception:
+                pass
+            # 清理 /tmp 旧文件
+            try:
+                subprocess.run(
+                    ["find", "/tmp", "-type", "f", "-mtime", "+3", "-delete"],
+                    capture_output=True, text=True, timeout=10,
+                )
+                cleanup_actions.append("清理 /tmp 3天前文件")
+            except Exception:
+                pass
+            log(f"清理完成: {'; '.join(cleanup_actions)}")
+        elif percent > 85:
+            log("内存超过 85%，建议关注", "WARN")
+    except Exception as e:
+        log(f"内存检查失败: {e}", "ERROR")
+def check_disk():
+    """检查磁盘使用"""
+    try:
+        result = subprocess.run(
+            ["df", "-m", "/data"], capture_output=True, text=True, timeout=5
+        )
+        if result.returncode != 0:
+            return
+        lines = result.stdout.strip().split("\n")
+        if len(lines) < 2:
+            return
+        parts = lines[1].split()
+        used_mb = int(parts[2])
+        total_mb = int(parts[1])
+        percent = round(used_mb / total_mb * 100, 1)
+        log(f"磁盘: {used_mb}/{total_mb}MB ({percent}%)")
+        if percent > 90:
+            log("磁盘超过 90%，清理旧数据", "WARN")
+            for old_dir in ["/data/hermes/logs", "/data/hermes/uploads"]:
+                if os.path.exists(old_dir):
+                    subprocess.run(
+                        ["find", old_dir, "-type", "f", "-mtime", "+14", "-delete"],
+                        capture_output=True, text=True, timeout=15,
+                    )
+    except Exception as e:
+        log(f"磁盘检查失败: {e}", "ERROR")
+def check_process():
+    """检查 Hermes 进程状态"""
+    try:
+        # 检查 Python 进程（Gateway）
+        result = subprocess.run(
+            ["pgrep", "-f", "entry.py"], capture_output=True, text=True, timeout=5
+        )
+        gateway_running = result.returncode == 0
+        # 检查 Dashboard
+        result = subprocess.run(
+            ["pgrep", "-f", "7860"], capture_output=True, text=True, timeout=5
+        )
+        dashboard_running = result.returncode == 0
+        log(f"Gateway: {'运行中' if gateway_running else '未运行'}")
+        log(f"Dashboard: {'运行中' if dashboard_running else '未运行'}")
+        if not gateway_running:
+            log("Gateway 未运行！", "ERROR")
+            # 尝试重启
+            try:
+                subprocess.run(
+                    ["bash", "/app/start.sh"],
+                    capture_output=True, text=True, timeout=30,
+                )
+                log("已尝试重启 Gateway", "WARN")
+            except Exception as e:
+                log(f"重启失败: {e}", "ERROR")
+    except Exception as e:
+        log(f"进程检查失败: {e}", "ERROR")
+def check_config_drift():
+    """检查配置���件是否被意外修改"""
+    import hashlib
+    config_files = {
+        "SOUL.md": "/app/SOUL.md",
+        "config.yaml": "/app/config.yaml",
+    }
+    hash_file = os.path.join(DATA_DIR, ".config_hashes.json")
+    try:
+        saved_hashes = {}
+        if os.path.exists(hash_file):
+            with open(hash_file, "r") as f:
+                saved_hashes = json.load(f)
+        current_hashes = {}
+        for name, path in config_files.items():
+            if os.path.exists(path):
+                with open(path, "rb") as f:
+                    current_hashes[name] = hashlib.md5(f.read()).hexdigest()
+        drift = {}
+        for name, h in current_hashes.items():
+            if name in saved_hashes and saved_hashes[name] != h:
+                drift[name] = f"hash changed from {saved_hashes[name][:8]} to {h[:8]}"
+        if drift:
+            log(f"配置漂移检测: {drift}", "WARN")
+        else:
+            log("配置文件无漂移")
+        # 更新保存的 hash
+        with open(hash_file, "w") as f:
+            json.dump(current_hashes, f, indent=2)
+    except Exception as e:
+        log(f"配置漂移检测失败: {e}", "ERROR")
+def check_feishu_connection():
+    """检查飞书 WebSocket 连接"""
+    try:
+        result = subprocess.run(
+            ["pgrep", "-f", "websocket"], capture_output=True, text=True, timeout=5
+        )
+        connected = result.returncode == 0
+        log(f"飞书 WebSocket: {'已连接' if connected else '可能断开'}")
+        if not connected:
+            log("飞书连接可能断开，建议检查", "WARN")
+    except Exception as e:
+        log(f"飞书连接检查失败: {e}", "ERROR")
+def main():
+    log("=" * 40)
+    log("自愈检查启动")
+    # 日志轮转
+    try:
+        if os.path.exists(LOG_FILE):
+            size = os.path.getsize(LOG_FILE)
+            if size > 1024 * 100:  # 100KB
+                os.rename(LOG_FILE, LOG_FILE + ".bak")
+                log("日志轮转完成")
+    except Exception:
+        pass
+    check_process()
+    check_memory()
+    check_disk()
+    check_config_drift()
+    check_feishu_connection()
+    log("自愈检查完成")
+    log("=" * 40)
+if __name__ == "__main__":
+    main()