Add demo mode, fix scoring, improve preferences

Demo mode:
- Ships 420 pre-scored AI/ML papers as JSON for HF Spaces
- Entrypoint auto-bootstraps DB when no config/API key exists
- Dashboard shows Demo Mode banner, pipelines disabled
- Remove CLAUDE.md from public release
Scoring fixes:
- Generate axis field names from config (was hardcoded wrong:
code_weights vs code_and_weights, has_code vs has_code_poc)
- Fix filter 422: accept empty min_score as string, convert manually
- Fix min_score=0 displaying as blank in template
Preferences:
- Increase boost range from [-1, 1.5] to [-2, 3] for stronger signal
- CLAUDE.md +0 -73
- Dockerfile +2 -0
- data/demo-config.yaml +64 -0
- data/demo-data.json +0 -0
- entrypoint.sh +15 -0
- src/config.py +15 -4
- src/demo.py +93 -0
- src/preferences.py +5 -5
- src/web/app.py +15 -3
- src/web/templates/dashboard.html +9 -0
- src/web/templates/papers.html +1 -1
CLAUDE.md
DELETED
@@ -1,73 +0,0 @@
-# Research Intelligence System
-
-## Architecture
-
-- **Web dashboard**: FastAPI + Jinja2 + HTMX on port 8888
-- **Database**: SQLite at `data/researcher.db` (configurable in `config.yaml`)
-- **Config**: YAML-driven via `config.yaml` (generated by setup wizard on first run)
-- **Pipelines**: `src/pipelines/aiml.py` (HF + arXiv), `src/pipelines/security.py` (arXiv cs.CR)
-- **Scoring**: `src/scoring.py` — Claude API batch scoring with configurable axes
-- **Preferences**: `src/preferences.py` — learns from user signals (upvote/downvote/save/dismiss)
-- **Scheduler**: APScheduler runs on configurable cron schedule
-
-## Key Files
-
-| File | Purpose |
-|------|---------|
-| `src/config.py` | YAML config loader, scoring prompt builder, defaults |
-| `src/db.py` | SQLite schema + query helpers |
-| `src/scoring.py` | Unified Claude API scorer |
-| `src/preferences.py` | Preference computation from user signals |
-| `src/pipelines/aiml.py` | AI/ML paper fetching (HF + arXiv) |
-| `src/pipelines/security.py` | Security paper fetching (arXiv cs.CR) |
-| `src/pipelines/github.py` | GitHub trending projects via OSSInsight |
-| `src/pipelines/events.py` | Conferences, releases, RSS news |
-| `src/web/app.py` | FastAPI routes, middleware, report generation |
-| `src/scheduler.py` | APScheduler weekly trigger |
-
-## Config System
-
-`src/config.py` loads `config.yaml` and exposes module-level constants:
-
-- `FIRST_RUN` — True when `config.yaml` doesn't exist (triggers setup wizard)
-- `SCORING_CONFIGS` — Dict of domain scoring configs (axes, weights, prompts)
-- `DB_PATH` — Path to SQLite database
-- `ANTHROPIC_API_KEY` — From `.env` or environment
-
-Scoring prompts are built dynamically from `scoring_axes` and `preferences` in config.
-
-## Working with the Database
-
-```bash
-sqlite3 data/researcher.db
-
-# Top papers
-SELECT title, composite, summary FROM papers
-WHERE domain='aiml' AND composite IS NOT NULL
-ORDER BY composite DESC LIMIT 10;
-
-# Signal counts
-SELECT action, COUNT(*) FROM signals GROUP BY action;
-
-# Preference profile
-SELECT * FROM preferences ORDER BY abs(pref_value) DESC LIMIT 20;
-```
-
-## Docker
-
-```bash
-docker compose up --build
-# Dashboard at http://localhost:9090
-# Setup wizard runs on first visit
-
-# Trigger pipelines
-curl -X POST http://localhost:9090/run/aiml
-curl -X POST http://localhost:9090/run/security
-```
-
-## Allowed Tools
-
-When working with this project in Claude Code:
-- **Bash**: python, sqlite3, curl, docker commands
-- **WebSearch/WebFetch**: arXiv, GitHub, HuggingFace for paper details
-- **Read/Edit**: all project files and data/
Dockerfile
CHANGED
@@ -12,6 +12,8 @@ RUN useradd -m -u 1000 -s /bin/bash appuser
 # Copy source
 COPY src/ src/
 COPY data/seed_papers.json data/seed_papers.json
+COPY data/demo-data.json data/demo-data.json
+COPY data/demo-config.yaml data/demo-config.yaml
 COPY entrypoint.sh .
 RUN chmod +x entrypoint.sh
 
data/demo-config.yaml
ADDED
@@ -0,0 +1,64 @@
+scoring:
+  model: claude-haiku-4-5-20251001
+  rescore_model: claude-sonnet-4-5-20250929
+  rescore_top_n: 15
+  batch_size: 20
+domains:
+  aiml:
+    enabled: true
+    label: AI / ML
+    sources:
+      - huggingface
+      - arxiv
+    arxiv_categories:
+      - cs.CV
+      - cs.CL
+      - cs.LG
+    scoring_axes:
+      - name: Code & Weights
+        weight: 0.3
+        description: Open weights on HF, code on GitHub
+      - name: Novelty
+        weight: 0.35
+        description: Paradigm shifts over incremental
+      - name: Practical Applicability
+        weight: 0.35
+        description: Usable by practitioners soon
+    include_patterns: []
+    exclude_patterns: []
+    preferences:
+      boost_topics: []
+      penalize_topics: []
+  security:
+    enabled: true
+    label: Security
+    sources:
+      - arxiv
+    arxiv_categories:
+      - cs.CR
+    scoring_axes:
+      - name: Has Code/PoC
+        weight: 0.25
+        description: Working tools, repos, artifacts
+      - name: Novel Attack Surface
+        weight: 0.4
+        description: First-of-kind research
+      - name: Real-World Impact
+        weight: 0.35
+        description: Affects production systems
+    include_patterns: []
+    exclude_patterns: []
+    preferences:
+      boost_topics: []
+      penalize_topics: []
+  github:
+    enabled: true
+  events:
+    enabled: true
+schedule:
+  cron: 0 22 * * 0
+database:
+  path: data/researcher.db
+web:
+  host: 0.0.0.0
+  port: 8888
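Each domain's axis weights in this config already sum to 1.0, so a weighted composite stays on the same 1-10 scale as the per-axis scores. A quick sanity check (the scores here are made-up illustrative values, not data from the demo set):

```python
# Axis weights for the aiml domain, as declared in data/demo-config.yaml.
aiml_weights = {"code_and_weights": 0.3, "novelty": 0.35, "practical_applicability": 0.35}

# Hypothetical per-axis scores for one paper, on the 1-10 scale.
scores = {"code_and_weights": 8, "novelty": 6, "practical_applicability": 7}

# Weighted composite: stays within 1-10 because the weights sum to 1.0.
composite = sum(aiml_weights[k] * scores[k] for k in aiml_weights)
```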
data/demo-data.json
ADDED
The diff for this file is too large to render.
entrypoint.sh
CHANGED
@@ -4,6 +4,21 @@ set -e
 PORT="${PORT:-8888}"
 
 echo "=== Research Intelligence ==="
+
+# Bootstrap demo data if no config exists and no API key set
+if [ -n "$SPACE_ID" ]; then
+    CONFIG_PATH="/data/config.yaml"
+else
+    CONFIG_PATH="config.yaml"
+fi
+
+if [ ! -f "$CONFIG_PATH" ] && [ -f "data/demo-data.json" ] && [ -z "$ANTHROPIC_API_KEY" ]; then
+    echo "No config found and no API key set — loading demo data..."
+    python -c "from src.demo import load_demo; load_demo()"
+    export DEMO_MODE=1
+    echo "Demo mode active. Deploy locally with an API key for full functionality."
+fi
+
 echo "Starting web server + scheduler on port ${PORT} ..."
 
 exec python -m uvicorn src.web.app:app --host 0.0.0.0 --port "${PORT}"
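The bootstrap gate in entrypoint.sh combines three checks (no config yet, bundled demo JSON present, no API key) plus a Spaces-aware config path. A Python mirror of the same decision logic, with `should_load_demo` and `pick_config_path` as hypothetical helper names:

```python
def should_load_demo(env: dict, config_exists: bool, demo_json_exists: bool) -> bool:
    # Mirrors the entrypoint: bootstrap only when there is no config yet,
    # the bundled demo JSON is present, and no Anthropic key is configured.
    return (not config_exists) and demo_json_exists and not env.get("ANTHROPIC_API_KEY")

def pick_config_path(env: dict) -> str:
    # On HF Spaces (SPACE_ID set) the config lives on the persistent /data
    # volume; locally it sits next to the app.
    return "/data/config.yaml" if env.get("SPACE_ID") else "config.yaml"
```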
src/config.py
CHANGED
@@ -31,6 +31,7 @@ log = logging.getLogger(__name__)
 # ---------------------------------------------------------------------------
 
 IS_HF_SPACE = bool(os.environ.get("SPACE_ID"))
+DEMO_MODE = bool(os.environ.get("DEMO_MODE"))
 
 
 def _spaces_data_dir() -> Path:
@@ -255,7 +256,7 @@ def _build_aiml_prompt(axes: list[dict], boost: list[str], penalize: list[str])
     for i, ax in enumerate(axes, 1):
         name = ax.get("name", f"axis_{i}")
         desc = ax.get("description", "")
-        field = name.lower().replace(" ", "_").replace("&", "and").replace("/", "_")
+        field = name.lower().replace(" ", "_").replace("&", "and").replace("/", "_").replace("-", "_")
         axis_fields.append(field)
         axis_section.append(f"{i}. **{field}** — {name}: {desc}")
 
@@ -297,7 +298,7 @@ def _build_security_prompt(axes: list[dict], boost: list[str], penalize: list[st
     for i, ax in enumerate(axes, 1):
         name = ax.get("name", f"axis_{i}")
         desc = ax.get("description", "")
-        field = name.lower().replace(" ", "_").replace("&", "and").replace("/", "_")
+        field = name.lower().replace(" ", "_").replace("&", "and").replace("/", "_").replace("-", "_")
         axis_fields.append(field)
         axes_section.append(f"{i}. **{field}** (1-10) — {name}: {desc}")
 
@@ -367,9 +368,14 @@ def _build_scoring_configs() -> dict:
         aiml_weights[key] = ax.get("weight", 1.0 / len(aiml_axes_cfg))
     aiml_weights = _normalize_weights(aiml_weights)
 
+    aiml_axis_fields = [
+        ax.get("name", f"axis_{i+1}").lower().replace(" ", "_").replace("&", "and").replace("/", "_").replace("-", "_")
+        for i, ax in enumerate(aiml_axes_cfg)
+    ]
+
     configs["aiml"] = {
         "weights": aiml_weights,
-        "axes": ["code_weights", "novelty", "practical_applicability"],
+        "axes": aiml_axis_fields,
         "axis_labels": [ax.get("name", f"Axis {i+1}") for i, ax in enumerate(aiml_axes_cfg)],
         "prompt": _build_scoring_prompt("aiml", aiml_axes_cfg, aiml_prefs),
     }
@@ -388,9 +394,14 @@ def _build_scoring_configs() -> dict:
         sec_weights[key] = ax.get("weight", 1.0 / len(sec_axes_cfg))
     sec_weights = _normalize_weights(sec_weights)
 
+    sec_axis_fields = [
+        ax.get("name", f"axis_{i+1}").lower().replace(" ", "_").replace("&", "and").replace("/", "_").replace("-", "_")
+        for i, ax in enumerate(sec_axes_cfg)
+    ]
+
     configs["security"] = {
         "weights": sec_weights,
-        "axes": ["has_code", "novel_attack_surface", "real_world_impact"],
+        "axes": sec_axis_fields,
         "axis_labels": [ax.get("name", f"Axis {i+1}") for i, ax in enumerate(sec_axes_cfg)],
         "prompt": _build_scoring_prompt("security", sec_axes_cfg, sec_prefs),
     }
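`_normalize_weights` is called in both hunks but not shown in this diff. Assuming it rescales the axis weights so they sum to 1.0, a minimal sketch would be:

```python
def normalize_weights(weights: dict[str, float]) -> dict[str, float]:
    # Assumed behavior of src/config.py's _normalize_weights: rescale so
    # the axis weights sum to 1.0; leave an empty or all-zero dict untouched.
    total = sum(weights.values())
    if not total:
        return weights
    return {k: v / total for k, v in weights.items()}
```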
src/demo.py
ADDED
@@ -0,0 +1,93 @@
+"""Demo data loader — bootstraps a pre-scored DB from bundled JSON."""
+
+import json
+import logging
+import shutil
+from pathlib import Path
+
+log = logging.getLogger(__name__)
+
+
+def load_demo():
+    """Load demo data into a fresh database and copy demo config."""
+    from src.config import IS_HF_SPACE, SPACES_DATA_DIR
+
+    json_path = Path("data/demo-data.json")
+    config_src = Path("data/demo-config.yaml")
+
+    if not json_path.exists():
+        log.warning("Demo data not found at %s", json_path)
+        return
+
+    # Determine target paths
+    if IS_HF_SPACE:
+        config_dst = SPACES_DATA_DIR / "config.yaml"
+        db_path = SPACES_DATA_DIR / "researcher.db"
+    else:
+        config_dst = Path("config.yaml")
+        db_path = Path("data/researcher.db")
+
+    # Copy config
+    if config_src.exists() and not config_dst.exists():
+        config_dst.parent.mkdir(parents=True, exist_ok=True)
+        shutil.copy2(config_src, config_dst)
+        log.info("Demo config copied to %s", config_dst)
+
+    # Skip if DB already has data
+    if db_path.exists():
+        log.info("DB already exists at %s — skipping demo load", db_path)
+        return
+
+    # Initialize DB with current schema
+    import os
+    os.environ["DB_PATH"] = str(db_path)
+
+    # Re-import to pick up new path
+    import importlib
+    import src.config
+    importlib.reload(src.config)
+
+    from src.db import init_db, get_conn
+    init_db()
+
+    # Load JSON
+    data = json.loads(json_path.read_text())
+    runs = data.get("runs", [])
+    papers = data.get("papers", [])
+
+    with get_conn() as conn:
+        # Insert runs
+        for r in runs:
+            conn.execute(
+                """INSERT OR IGNORE INTO runs (id, domain, started_at, finished_at,
+                       date_start, date_end, paper_count, status)
+                   VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
+                (r["id"], r["domain"], r["started_at"], r["finished_at"],
+                 r["date_start"], r["date_end"], r["paper_count"], r["status"]),
+            )
+
+        # Insert papers
+        for p in papers:
+            conn.execute(
+                """INSERT INTO papers (run_id, domain, arxiv_id, entry_id, title,
+                       authors, abstract, published, categories, pdf_url, arxiv_url,
+                       comment, source, github_repo, github_stars, hf_upvotes,
+                       hf_models, hf_datasets, hf_spaces, score_axis_1, score_axis_2,
+                       score_axis_3, composite, summary, reasoning, code_url,
+                       s2_tldr, s2_paper_id, topics)
+                   VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)""",
+                (p.get("run_id"), p.get("domain"), p.get("arxiv_id"), p.get("entry_id"),
+                 p.get("title"), p.get("authors"), p.get("abstract"), p.get("published"),
+                 p.get("categories"), p.get("pdf_url"), p.get("arxiv_url"),
+                 p.get("comment"), p.get("source"), p.get("github_repo"),
+                 p.get("github_stars"), p.get("hf_upvotes"), p.get("hf_models"),
+                 p.get("hf_datasets"), p.get("hf_spaces"), p.get("score_axis_1"),
+                 p.get("score_axis_2"), p.get("score_axis_3"), p.get("composite"),
+                 p.get("summary"), p.get("reasoning"), p.get("code_url"),
+                 p.get("s2_tldr"), p.get("s2_paper_id"), p.get("topics")),
+            )
+
+        # Rebuild FTS index
+        conn.execute("INSERT INTO papers_fts(papers_fts) VALUES('rebuild')")
+
+    log.info("Demo data loaded: %d runs, %d papers into %s", len(runs), len(papers), db_path)
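The run inserts use `INSERT OR IGNORE` so re-running the bootstrap can't duplicate rows whose primary key already exists. The semantics in miniature:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, domain TEXT)")
conn.execute("INSERT OR IGNORE INTO runs (id, domain) VALUES (?, ?)", (1, "aiml"))
# Same primary key: the row is silently skipped instead of raising
# sqlite3.IntegrityError, so the loader is safe to run twice.
conn.execute("INSERT OR IGNORE INTO runs (id, domain) VALUES (?, ?)", (1, "security"))
count, domain = conn.execute("SELECT COUNT(*), MIN(domain) FROM runs").fetchone()
```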
src/preferences.py
CHANGED
@@ -1,6 +1,6 @@
 """Preference engine — learns from user signals to personalize paper rankings.
 
-Adds a preference_boost (max +1.5 / min -1.0) on top of stored composite scores.
+Adds a preference_boost (max +3.0 / min -2.0) on top of stored composite scores.
 Never re-scores papers. Papers with composite >= 8 are never penalized.
 """
 
@@ -190,7 +190,7 @@ def compute_paper_boost(paper: dict, preferences: dict[str, float]) -> tuple[flo
     """Compute preference boost for a single paper.
 
     Returns (boost_value, list_of_reasons).
-    Boost is clamped to [-1.0, +1.5].
+    Boost is clamped to [-2.0, +3.0].
     Papers with composite >= 8 are never penalized (boost >= 0).
     """
     if not preferences:
@@ -284,11 +284,11 @@ def compute_paper_boost(paper: dict, preferences: dict[str, float]) -> tuple[flo
     if total_weight > 0:
         boost = boost / total_weight  # Normalize by actual weight used
 
-    # Scale to boost range: preferences are [-1, 1], we want [-1, 1.5]
-    boost = boost * 1.5
+    # Scale to boost range: preferences are [-1, 1], we want [-2, 3]
+    boost = boost * 3.0
 
     # Clamp
-    boost = max(-1.0, min(1.5, boost))
+    boost = max(-2.0, min(3.0, boost))
 
     # Safety net: high-scoring papers never penalized
     composite = paper.get("composite") or 0
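The new range in one function, a condensed sketch of the scale, clamp, and safety-net steps (not the full `compute_paper_boost`):

```python
def scaled_boost(pref: float, composite: float) -> float:
    # pref is the normalized preference in [-1, 1]; scale by 3.0 and
    # clamp to the new [-2.0, +3.0] window.
    boost = max(-2.0, min(3.0, pref * 3.0))
    # Safety net: papers scoring composite >= 8 are never penalized.
    if composite >= 8 and boost < 0:
        boost = 0.0
    return boost
```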
src/web/app.py
CHANGED
@@ -269,6 +269,7 @@ async def dashboard(request: Request):
         "running_pipelines": running,
         "show_seed_banner": show_seed_banner,
         "has_papers": (aiml_count + security_count) > 0,
+        "demo_mode": bool(os.environ.get("DEMO_MODE")),
     })
 
 
@@ -284,7 +285,7 @@ async def papers_list(
     offset: int = 0,
     limit: int = 50,
     search: str | None = None,
-    min_score: float | None = None,
+    min_score: str | None = None,
     has_code: bool = False,
     topic: str | None = None,
     sort: str | None = None,
@@ -292,6 +293,14 @@ async def papers_list(
     if domain not in ("aiml", "security"):
         return RedirectResponse("/")
 
+    # Convert min_score from string (empty string from blank input → None)
+    min_score_val: float | None = None
+    if min_score:
+        try:
+            min_score_val = float(min_score)
+        except ValueError:
+            min_score_val = None
+
     config = SCORING_CONFIGS[domain]
     run = get_latest_run(domain) or {}
 
@@ -307,7 +316,7 @@ async def papers_list(
     papers, total = get_papers_page(
         domain, run_id=run.get("id"),
         offset=offset, limit=limit,
-        min_score=min_score,
+        min_score=min_score_val,
         has_code=has_code if has_code else None,
         search=search,
         topic=topic,
@@ -341,7 +350,7 @@ async def papers_list(
         "offset": offset,
         "limit": limit,
         "search": search,
-        "min_score": min_score,
+        "min_score": min_score_val,
         "has_code": has_code,
         "topic": topic,
         "sort": sort,
@@ -631,6 +640,9 @@ async def trigger_run(domain: str):
     if domain not in ("aiml", "security", "github", "events"):
         return RedirectResponse("/", status_code=303)
 
+    if os.environ.get("DEMO_MODE"):
+        return RedirectResponse("/", status_code=303)
+
     from src.config import is_pipeline_enabled
     if not is_pipeline_enabled(domain):
         return RedirectResponse("/", status_code=303)
src/web/templates/dashboard.html
CHANGED
@@ -1,6 +1,15 @@
 {% extends "base.html" %}
 {% block title %}Dashboard — Research Intelligence{% endblock %}
 {% block content %}
+{% if demo_mode %}
+<div style="background:linear-gradient(135deg, rgba(251,191,36,0.1), rgba(251,146,60,0.06)); border:1px solid rgba(251,191,36,0.3); border-radius:var(--radius-xl); padding:1rem 1.5rem; margin-bottom:1.5rem; display:flex; align-items:center; gap:0.75rem">
+  <span style="font-size:1.2rem">⚠</span>
+  <div>
+    <span style="font-weight:600; font-size:0.9rem">Demo Mode</span>
+    <span style="font-size:0.85rem; color:var(--text-muted)"> — Browsing sample data. Pipelines and scoring are disabled. To run your own instance, deploy locally with Docker Compose and an Anthropic API key.</span>
+  </div>
+</div>
+{% endif %}
 <div class="page-header">
   <h1>Week of {{ week_label }}</h1>
   <div class="subtitle">Research triage overview</div>
src/web/templates/papers.html
CHANGED
@@ -16,7 +16,7 @@
 <input type="search" name="search" value="{{ search or '' }}" placeholder="Search papers...">
 <label>
   Min score
-  <input type="number" name="min_score" value="{{ min_score or '' }}" min="0" max="10" step="0.5">
+  <input type="number" name="min_score" value="{{ min_score if min_score is not none else '' }}" min="0" max="10" step="0.5">
 </label>
 <label>
   <input type="checkbox" name="has_code" value="1" {% if has_code %}checked{% endif %}>
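The template fix works because Jinja's `or` follows Python truthiness, where `0` is falsy, so `min_score or ''` blanks out a legitimate zero filter:

```python
min_score = 0
old_value = min_score or ''                              # falsy zero collapses to ''
new_value = min_score if min_score is not None else ''   # explicit None check keeps 0
```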