🏗️ ScholarMind: Production-Grade Academic Knowledge-Base QA & Knowledge Graph System

For the complete architecture design document, see docs/ARCHITECTURE.md

Documentation index

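Document	Description
docs/ARCHITECTURE.md	Core architecture design: system overview, detailed design of each layer, code examples
docs/DATAFLOW.md	Data flow design: end-to-end flow, concurrency model, caching strategy, monitoring
docs/CACHING.md	🆕 7-layer cache acceleration scheme: semantic cache, provider cache, vLLM APC, KV compression, and more
docs/ADR.md	Technology selection decision records: the rationale and paper source for each technology choice
docs/PAPERS.md	Paper index: quick reference for 14 core papers and 15 open-source projects
docs/requirements.txt	Core dependencies: the full Python dependency list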

System overview

ScholarMind is a production-grade intelligent knowledge system for 1000+ academic PDF papers. It integrates:

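PDF deep parsing: high-accuracy OCR (formulas, tables, figures) based on the MinerU 2.5 VLM
Automatic knowledge graph construction: entities and relations are extracted automatically from the papers to build a domain knowledge graph
Hybrid retrieval QA: three-way fusion of GraphRAG, dense vector retrieval, and BM25 sparse retrieval
Multi-model support: both local deployment (vLLM/Ollama) and external APIs (OpenAI/Anthropic/DeepSeek)
Agent orchestration: multi-agent collaboration based on LangGraph, with support for multi-hop reasoning
7-layer cache acceleration: semantic cache + provider cache + vLLM prefix cache + KV compression, bringing P50 latency down to ~400 ms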
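
The three-way fusion in the hybrid retrieval item can be illustrated with Reciprocal Rank Fusion (RRF), which the architecture diagram names as the fusion step. The following is a minimal sketch, not the project's implementation: the function name `rrf_fuse`, the constant `k=60` (a commonly used default), and the sample document IDs are all assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60, top_n=5):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by doc ID for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_n]


# Hypothetical results from the three retrieval routes.
dense_hits = ["paper_12", "paper_07", "paper_33"]   # dense vector search
bm25_hits = ["paper_07", "paper_12", "paper_41"]    # BM25 sparse search
graph_hits = ["paper_07", "paper_58"]               # graph query

fused = rrf_fuse([dense_hits, bm25_hits, graph_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.4f}")
```

In this toy input, `paper_07` ranks first because it appears near the top of all three lists. In the described system the fused candidates would then go to the reranker; that step is omitted here.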

Core architecture diagram

┌────────────────────────────────────────────────────────┐
│               User layer (Web UI / API)                │
├────────────────────────────────────────────────────────┤
│         Agent orchestration layer (LangGraph)          │
│  Router → Retrieval → Reasoning → Summary agents       │
├────────────────────────────────────────────────────────┤
│        Unified LLM access layer (LiteLLM Proxy)        │
│  vLLM | Ollama | OpenAI | Anthropic | DeepSeek         │
├──────── 7-layer cache acceleration stack ──────────────┤
│  L1 semantic cache (GPTCache) → L2 retrieval cache     │
│  → L3 provider cache → L4 vLLM APC → L5 CacheBlend     │
│  → L6 SnapKV → L7 conversation cache                   │
├────────────────────────────────────────────────────────┤
│           Retrieval layer (Hybrid Retrieval)           │
│  Dense Vector + Sparse BM25 + Graph Query              │
│  → RRF fusion → bge-reranker-large reranking           │
├────────────────────────────────────────────────────────┤
│               Index layer (Multi-Index)                │
│  Qdrant (vectors) | Neo4j (knowledge graph)            │
│  | RAPTOR (hierarchical summary tree)                  │
├────────────────────────────────────────────────────────┤
│               Knowledge extraction layer               │
│  GLiNER (NER) → LLMGraphTransformer (RE) → Graphusion  │
├────────────────────────────────────────────────────────┤
│          PDF parsing layer (MinerU Pipeline)           │
│  PDF router → MinerU 2.5 VLM / PyMuPDF → JSON+Markdown │
├────────────────────────────────────────────────────────┤
│                     Storage layer                      │
│  PostgreSQL | Qdrant | Neo4j | Redis | MinIO           │
└────────────────────────────────────────────────────────┘
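
To make the first cache layer concrete, here is a minimal sketch of a semantic cache that sits in front of the rest of the pipeline. It is not GPTCache's API and not the project's code. The `SemanticCache` class, the `answer_with_cache` helper, the 0.8 similarity threshold, and the toy bag-of-words `embed` function (standing in for a real embedding model) are all assumptions for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache keyed by query similarity instead of exact match."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed value; tune per workload
        self.entries = []           # list of (embedding, answer) pairs

    def get(self, query):
        q = embed(query)
        best_score, best_answer = 0.0, None
        for emb, answer in self.entries:
            score = cosine(q, emb)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query, answer):
        self.entries.append((embed(query), answer))


def answer_with_cache(cache, query, run_pipeline):
    """Return (answer, cache_hit); run the full pipeline only on a miss."""
    cached = cache.get(query)
    if cached is not None:
        return cached, True
    answer = run_pipeline(query)
    cache.put(query, answer)
    return answer, False


calls = []


def fake_pipeline(query):
    # Placeholder for retrieval + LLM generation.
    calls.append(query)
    return f"answer to: {query}"


cache = SemanticCache(threshold=0.8)
first, hit1 = answer_with_cache(cache, "what is RAPTOR hierarchical indexing", fake_pipeline)
second, hit2 = answer_with_cache(cache, "what is RAPTOR hierarchical indexing ?", fake_pipeline)
third, hit3 = answer_with_cache(cache, "how does SnapKV compress the KV cache", fake_pipeline)
print(hit1, hit2, hit3, len(calls))
```

The near-duplicate second query is served from the cache, while the unrelated third query falls through to the pipeline. A linear scan is used here for brevity; a real deployment would presumably use a vector index for the lookup.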

Performance metrics

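Metric	Without cache	With 7-layer cache
QA response latency (P50)	~1.5 s	~400 ms
QA response latency (P99)	~4 s	~1.5 s
Latency on cache hit	N/A	~5 ms
API cost	Baseline	Reduced by 60%+
PDF parsing speed (A100)	2.12 pages/s	N/A
Full parse of 1000 papers	~80 min	N/A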
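
As a sanity check on how the figures in the table relate, the short calculation below derives the page count implied by the parsing rows and the latency ratios implied by the QA rows. It assumes a single A100 parsing sequentially at the stated rate. The resulting pages-per-paper average is inferred from the table, not a number given in the source.

```python
# Figures taken from the table above (all approximate).
PAGES_PER_SECOND = 2.12   # PDF parsing speed on an A100
TOTAL_MINUTES = 80        # full parse of the corpus
NUM_PAPERS = 1000

# Implied corpus size, assuming one GPU parsing sequentially.
total_pages = PAGES_PER_SECOND * TOTAL_MINUTES * 60
pages_per_paper = total_pages / NUM_PAPERS

# Latency improvement ratios implied by the QA rows.
p50_speedup = 1.5 / 0.4   # ~1.5 s down to ~400 ms
p99_speedup = 4.0 / 1.5   # ~4 s down to ~1.5 s

print(f"implied pages: {total_pages:.0f}")
print(f"implied pages per paper: {pages_per_paper:.1f}")
print(f"P50 speedup: {p50_speedup:.2f}x, P99 speedup: {p99_speedup:.2f}x")
```

Under these assumptions the 80-minute figure corresponds to roughly 10,000 pages, or about 10 pages per paper, and the cache stack gives about a 3.75× improvement at P50 and about 2.7× at P99.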

Core technology stack

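Component	Choice	Basis (paper or rationale)
PDF parsing	MinerU 2.5 VLM	arxiv:2509.22186 (OmniDocBench SOTA)
NER	GLiNER (440M)	arxiv:2311.08526 (F1=47.8, zero-shot)
KG fusion	Graphusion	arxiv:2410.17600 (+9.2% QA accuracy)
GraphRAG	LightRAG	arxiv:2410.05779 (34k⭐, incremental updates)
Hierarchical index	RAPTOR	arxiv:2401.18059 (+20% accuracy)
Retrieval reranking	bge-reranker-large	arxiv:2502.11371 (consensus best)
Semantic cache	GPTCache	7k⭐, hits based on semantic similarity
KV reuse	CacheBlend/LMCache	arxiv:2405.16444 (2.2-3.3× TTFT)
KV compression	SnapKV	arxiv:2404.14469 (3.6× decoding speedup)
Agent	LangGraph	Stateful graph, conditional branching, production-grade
LLM	LiteLLM	Unified interface for local models and APIs
Vector store	Qdrant	High-performance Rust implementation, native hybrid search
Graph database	Neo4j 5.x	Native LangChain integration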

License

MIT

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Try ML Intern: https://smolagents-ml-intern.hf.space
Source code: https://github.com/huggingface/ml-intern

Usage

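from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository ID on the Hugging Face Hub
model_id = 'heyingyue/scholarmind-architecture'
# Download (or load from the local cache) the tokenizer and model weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)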

For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.

Papers for heyingyue/scholarmind-architecture

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing. arXiv:2509.22186, published Sep 26, 2025.

RAG vs. GraphRAG: A Systematic Evaluation and Key Insights. arXiv:2502.11371, published Feb 17, 2025.

Graphusion: A RAG Framework for Knowledge Graph Construction with a Global Perspective. arXiv:2410.17600, published Oct 23, 2024.

LightRAG: Simple and Fast Retrieval-Augmented Generation. arXiv:2410.05779, published Oct 8, 2024.

CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion. arXiv:2405.16444, published May 26, 2024.