feat(v8+autonomy): research-driven trainer + 4 daemons + 9-layer safety gate
Synthesised from 4 parallel research streams (~1.6k lines of dense notes
in knowledge/trends-2026/) into one shippable change.
V8 trainer (kaggle-trainer.sh) – 5 research-grounded additions:
  • PiSSA SVD init (replaces LoftQ default; LoftQ kept as fallback via
    SUR_LORA_INIT=loftq) – Meng '24, +1-3pp on code benchmarks.
  • LoRA+ optimizer with lr_B = 16·lr_A – Hayou '24, free +1-2pp,
    via peft.optimizers.create_loraplus_optimizer with manual-split
    fallback for older peft.
  • V8 dataset blend via merge_external() – ToolACE 1.5×, Multi-IaC-Eval
    2×, xLAM-fn-call-60k 1×, ITBench-Trajectories 2×, Code-Feedback 1×.
    Each take/weight env-tunable; format-tolerant via extract_pair().
  • GRPO Phase-2 scaffold (RUN_GRPO=1) – DeepSeekMath/RLVR-Code, post-SFT
    booster with execution-pass reward function. Disabled by default
    (needs TRL ≥0.12 + ≥30GB VRAM headroom).
  • Hub bumped: axentx/surrogate-1-7B-v1.2-research.
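The blend's format tolerance rests on a best-effort pair extractor. A minimal sketch of the idea, assuming common dataset schemas (hypothetical; the real extract_pair() in kaggle-trainer.sh may differ):

```python
def extract_pair(ex: dict):
    """Best-effort (prompt, response) extraction across common schemas.
    Illustrative sketch only, not the shipped extract_pair()."""
    # Alpaca-style: instruction / (input) / output|response
    if ex.get("instruction") and (ex.get("output") or ex.get("response")):
        prompt = ex["instruction"]
        if ex.get("input"):
            prompt += "\n\n" + ex["input"]
        return prompt, ex.get("output") or ex.get("response")
    # ShareGPT-style: conversations=[{"from": "human", "value": ...}, ...]
    conv = ex.get("conversations") or ex.get("messages")
    if isinstance(conv, list) and len(conv) >= 2:
        human = next((m for m in conv if m.get("from") in ("human", "user")
                      or m.get("role") == "user"), None)
        gpt = next((m for m in conv if m.get("from") in ("gpt", "assistant")
                    or m.get("role") == "assistant"), None)
        if human and gpt:
            return (human.get("value") or human.get("content"),
                    gpt.get("value") or gpt.get("content"))
    return None  # unrecognized schema -> caller skips the row
```

Rows the extractor cannot recognize return None and are silently skipped, which is what lets one merge loop consume five differently shaped datasets.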
Autonomous daemons (4) – all share verifier-ensemble.py + outcome-log.py:
  • autonomous-sre.sh – 5-min sweep: HF Space stages, dataset staleness,
    ZeroGPU smoke, GH Action failure rate, outcome-log self-health. On
    anomaly: build prompt → call Surrogate → idempotency dedupe → 9-layer
    verifier → apply OR queue. Whitelisted scope: only systems Surrogate
    owns (no AWS/prod).
  • autonomous-release.sh – hourly recon: HN + GH-trending + ProductHunt
    clustered by owner-relevant keywords, build SDD spec, generate 3 patch
    candidates with CISC self-consistency voting (research §autonomous-24x7
    pattern 1), pick best by verifier+confidence, open draft PR via gh.
  • self-improve.sh – daily/weekly flywheel: outcomes.jsonl → SFT replay
    (success-only, RLEF-aligned), KTO unpaired (every label, lossless
    on logs), skill library (verified procedures by trigger). Pushes to
    axentx/surrogate-1-{self-traces,pref-kto,skills}; flags next training
    when SFT ≥200 or KTO ≥500.
  • watchdog.sh – independent observer with kill-switch. Detects loops
    (≥5 same trigger in 15m), failure cascades (≥5 consecutive non-success),
    rate spikes (≥30/min), audit gaps (applied without verdict), disk
    fill. Never calls Surrogate, never applies; only kills + records.
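The loop-detection rule (≥5 identical triggers within 15 minutes) amounts to a sliding-window count per trigger name. A minimal sketch, with illustrative names rather than watchdog.sh's actual internals:

```python
import time
from collections import defaultdict, deque

WINDOW_SEC = 15 * 60      # 15-minute window, per the watchdog rule
LOOP_THRESHOLD = 5        # >=5 identical triggers inside the window

_trigger_times = defaultdict(deque)  # trigger name -> recent timestamps

def is_looping(trigger, now=None):
    """Record one firing of `trigger`; True once the window holds >=5."""
    now = time.time() if now is None else now
    q = _trigger_times[trigger]
    q.append(now)
    # evict firings older than the window before counting
    while q and now - q[0] > WINDOW_SEC:
        q.popleft()
    return len(q) >= LOOP_THRESHOLD
```

A True result is the watchdog's cue to kill the offending daemon and record the event; it never attempts a fix itself.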
Safety gate (verifier-ensemble.py) – single source of truth, 9 layers:
ast / lint (ruff/shellcheck/cfn-lint/tflint) / typecheck / tests /
policy (14-rule HardGuard list – terraform destroy, kubectl delete ns
prod-*, IAM Allow*:*, ec2 terminate w/o dry-run, rds delete w/o final
snapshot, helm install w/o digest pin, AKIA/private-key/sk-/hf_ leaks,
MFA bypass, force-push to main, etc.) / security (gitleaks+semgrep+
cfn-guard) / diff sanity (≤300 lines, ≤8 files) / sandbox (docker
--network=none --read-only --cap-drop=ALL) / confidence (≥0.95 floor
for destructive-class actions). All non-SKIP must PASS, ≥3 verifiers
must run.
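The verdict rule ("all non-SKIP must PASS, ≥3 verifiers must run") reduces to a small aggregation over per-layer results. A sketch under assumed names (verifier-ensemble.py's real aggregation and output fields may differ; `ok`/`n_pass` are taken from the verdict JSON the daemons read):

```python
def aggregate(verdicts):
    """verdicts: layer name -> 'PASS' | 'FAIL' | 'SKIP'.
    ok only when >=3 layers actually ran and every one passed."""
    ran = {k: v for k, v in verdicts.items() if v != "SKIP"}
    ok = len(ran) >= 3 and all(v == "PASS" for v in ran.values())
    return {
        "ok": ok,
        "n_pass": sum(v == "PASS" for v in ran.values()),
        "n_ran": len(ran),
        "failed": [k for k, v in ran.items() if v == "FAIL"],
    }
```

The ≥3-ran floor guards against a degenerate "all verifiers skipped, nothing failed" verdict counting as a pass.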
Helpers: surrogate-call.py (strict-JSON LLM call w/ retries + schema
validation for diagnosis|spec|patch), outcome-log.py (append-only JSONL),
idempotency.py (sha256(plan) ledger w/ TTL → prevents replay storms when
same anomaly fires twice).
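The sha256(plan) ledger with TTL can be sketched as follows (illustrative; idempotency.py may be file-backed and use different names — the canonical-JSON digest is an assumption):

```python
import hashlib
import json
import time

class IdempotencyLedger:
    """Dedupe identical remediation plans within a TTL window."""

    def __init__(self, ttl_sec=3600.0):
        self.ttl = ttl_sec
        self._seen = {}  # plan digest -> first-seen timestamp

    def _digest(self, plan):
        # sort_keys gives a canonical form, so key order can't defeat dedupe
        blob = json.dumps(plan, sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()

    def should_apply(self, plan, now=None):
        """False if an identical plan already fired within the TTL."""
        now = time.time() if now is None else now
        self._seen = {d: t for d, t in self._seen.items()
                      if now - t < self.ttl}  # drop expired entries
        d = self._digest(plan)
        if d in self._seen:
            return False
        self._seen[d] = now
        return True
```

When the same anomaly fires twice in quick succession, the second plan hashes identically and is suppressed; after the TTL lapses the fix may legitimately fire again.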
Bench (bench-v1-vs-v15.sh): added 4th model (v1.2-research) + 2 new evals
(Multi-IaC-Eval CFN/TF/CDK pass-rate, ITBench-lite K8s SRE scenarios).
Now 4-way × 9 evals.
Architecture map: knowledge/surrogate-1-autonomous-arch.md – single
on-ramp doc with all components, file paths, run/disarm commands, the
14 HardGuards, and the V9 stretch ladder.
V7 train.py at ~/Desktop/surrogate-1-train-v7-7B-extended-plus.py is now
superseded by ~/Desktop/surrogate-1-train-v8-research.py. User uploads
the V8 file via Kaggle UI Replace File → Save Version when ready.
- bin/kaggle-trainer.sh +157 -14
- bin/v2/autonomous-release.sh +425 -0
- bin/v2/autonomous-sre.sh +346 -0
- bin/v2/bench-v1-vs-v15.sh +24 -1
- bin/v2/idempotency.py +118 -0
- bin/v2/outcome-log.py +98 -0
- bin/v2/self-improve.sh +283 -0
- bin/v2/surrogate-call.py +177 -0
- bin/v2/verifier-ensemble.py +404 -0
- bin/v2/watchdog.sh +195 -0
bin/kaggle-trainer.sh
@@ -206,8 +206,8 @@ EPOCHS = float(os.environ.get("EPOCHS", "1"))
 _default_hub = {
     32.0: "axentx/surrogate-1-coder-32B-v1.5",
     14.0: "axentx/surrogate-1-coder-14B-v1.5-mid",
-    7.0: "axentx/surrogate-1-7B-v1.
-}.get(_auto_size, "axentx/surrogate-1-7B-v1.
+    7.0: "axentx/surrogate-1-7B-v1.2-research",  # ← V8: research-driven stack
+}.get(_auto_size, "axentx/surrogate-1-7B-v1.2-research")
 HUB_ID = os.environ.get("HUB_MODEL_ID", _default_hub)
 # seq_len auto-shrinks for smaller hardware budget
 _default_seq = {32.0: 2048, 14.0: 4096, 7.0: 8192}.get(_auto_size, 2048)

@@ -339,6 +339,44 @@ try:
 except Exception as e:
     print(f" ⚠ Magpie skip (repo not yet published): {type(e).__name__}: {str(e)[:80]}")

+# ── V8 RESEARCH-DRIVEN DATASET BLEND ────────────────────────────────────────
+# From research §devsecops-sre-agentic.md (top-5 datasets) + §coding-llm-frontier
+# (#5 Code-Feedback). Each blend is opt-in via env knob (default ON).
+# Format-tolerant extract_pair() handles ShareGPT, instruction/output, etc.
+def merge_external(repo: str, take: int, weight: float, name: str):
+    """Stream-and-merge a HF dataset with weight oversampling."""
+    if take <= 0:
+        print(f"   - {name}: disabled (take=0)")
+        return 0
+    try:
+        # Many of these datasets are gated; use HF_TOKEN automatically
+        ds = load_dataset(repo, split="train", streaming=True)
+        n = 0
+        replicate = max(1, int(round(weight)))
+        for ex in ds:
+            if n >= take: break
+            pair = extract_pair(ex)
+            if not pair: continue
+            p, r = pair
+            for _ in range(replicate):
+                rows.append({"prompt": p, "response": r})
+            n += 1
+        print(f"   + {name}: {n:,} pairs × {replicate} = {n*replicate:,} rows merged")
+        return n
+    except Exception as e:
+        msg = f"{type(e).__name__}: {str(e)[:90]}"
+        print(f"   ⚠ {name} skip ({repo}): {msg}")
+        return 0
+
+# Research-recommended weights – see knowledge/trends-2026/devsecops-sre-agentic.md
+merge_external("Team-ACE/ToolACE", int(os.environ.get("TAKE_TOOLACE", "8000")), 1.5, "ToolACE")
+merge_external("AmazonScience/Multi-IaC-Eval", int(os.environ.get("TAKE_MULTIIAC", "5000")), 2.0, "Multi-IaC-Eval")
+merge_external("Salesforce/xlam-function-calling-60k", int(os.environ.get("TAKE_XLAM", "10000")), 1.0, "xLAM-fn-call-60k")
+merge_external("ibm-research/ITBench-Trajectories", int(os.environ.get("TAKE_ITBENCH", "3000")), 2.0, "ITBench-Trajectories")
+merge_external("m-a-p/Code-Feedback", int(os.environ.get("TAKE_CODEFB", "8000")), 1.0, "Code-Feedback")
+
+print(f"   total rows after V8 blend: {len(rows):,}")
+
 raw = Dataset.from_list(rows)
 # (Active-learning teachable filter applied AFTER model load – see below.
 # Filtering needs the 4-bit base model to score perplexity, which doesn't

@@ -445,21 +483,28 @@ lora_kwargs = dict(
     use_dora=True,  # R2: DoRA
     task_type="CAUSAL_LM",
 )
-#
+# V8: PiSSA init by default (research §coding-llm-frontier #4) – SVD of base
+# weights gives a much better starting point than gaussian. LoftQ/gaussian
+# remain as env-controlled fallback for A/B comparison.
+LORA_INIT = os.environ.get("SUR_LORA_INIT", "pissa_niter_4")
 try:
     from peft import LoraConfig as _Probe
     import inspect
     _sig = inspect.signature(_Probe).parameters
     if "use_rslora" in _sig: lora_kwargs["use_rslora"] = True
     if "init_lora_weights" in _sig:
-
-
-
-
-
-
-
-
+        if LORA_INIT.startswith("pissa"):
+            lora_kwargs["init_lora_weights"] = LORA_INIT  # "pissa" or "pissa_niter_K"
+        elif LORA_INIT == "loftq":
+            try:
+                from peft import LoftQConfig
+                lora_kwargs["init_lora_weights"] = "loftq"
+                lora_kwargs["loftq_config"] = LoftQConfig(loftq_bits=4, loftq_iter=5)
+            except Exception as e:
+                print(f" ⚠ LoftQ unavailable, falling back to gaussian: {e}")
+        # else: gaussian default
+except Exception as e:
+    print(f" ⚠ LoRA config probe failed: {e}")
 print(f" LoRA config: r={LORA_R}, DoRA={lora_kwargs.get('use_dora')}, "
       f"RSLoRA={lora_kwargs.get('use_rslora', False)}, "
       f"init={lora_kwargs.get('init_lora_weights', 'gaussian')}, "

@@ -469,6 +514,44 @@ lora = LoraConfig(**lora_kwargs)
 model = get_peft_model(model, lora)
 model.print_trainable_parameters()

+# ── V8: LoRA+ optimizer (research §coding-llm-frontier #3) ──────────────────
+# Hayou et al 2024 (arxiv 2402.12354): the B matrix in LoRA needs a learning
+# rate ~16× higher than A for fastest convergence + +1-2pp benchmark lift.
+# Free improvement – no extra memory cost. Activated via SUR_LORA_PLUS_RATIO.
+LORA_PLUS_RATIO = float(os.environ.get("SUR_LORA_PLUS_RATIO", "16"))
+LORA_PLUS_OPT = None  # set later if available
+if LORA_PLUS_RATIO > 1.0:
+    try:
+        # peft.optimizers.create_loraplus_optimizer is the canonical helper
+        # (peft>=0.13). For older peft we fall back to manual param-group split.
+        from peft.optimizers import create_loraplus_optimizer  # type: ignore
+        import bitsandbytes as bnb_lib
+        LORA_PLUS_OPT = create_loraplus_optimizer(
+            model=model,
+            optimizer_cls=bnb_lib.optim.PagedAdamW8bit,
+            lr=float(os.environ.get("LEARNING_RATE", "7e-5")),
+            loraplus_lr_ratio=LORA_PLUS_RATIO,
+            weight_decay=0.01,
+        )
+        print(f" LoRA+ optimizer: lr_B/lr_A = {LORA_PLUS_RATIO}x (paged AdamW 8-bit)")
+    except Exception as e:
+        print(f" ⚠ LoRA+ helper unavailable ({type(e).__name__}: {e}) – manual split")
+        try:
+            import bitsandbytes as bnb_lib
+            param_groups = [
+                {"params": [p for n, p in model.named_parameters()
+                            if "lora_A" in n], "lr": float(os.environ.get("LEARNING_RATE", "7e-5"))},
+                {"params": [p for n, p in model.named_parameters()
+                            if "lora_B" in n], "lr": float(os.environ.get("LEARNING_RATE", "7e-5")) * LORA_PLUS_RATIO},
+            ]
+            LORA_PLUS_OPT = bnb_lib.optim.PagedAdamW8bit(param_groups, weight_decay=0.01)
+            print(f" LoRA+ manual split: lr_B/lr_A = {LORA_PLUS_RATIO}x")
+        except Exception as e2:
+            print(f" ⚠ LoRA+ manual split also failed ({e2}) – using SFTTrainer default optim")
+            LORA_PLUS_OPT = None
+else:
+    print(" LoRA+ disabled (SUR_LORA_PLUS_RATIO ≤ 1.0)")
+
 # ── Format chat template (system + user + assistant) ────────────────────────
 def fmt(ex):
     msgs = [

@@ -522,12 +605,17 @@ sft_cfg = SFTConfig(
     report_to="none",
 )

-trainer = SFTTrainer(
+trainer_kwargs = dict(
     model=model,
     args=sft_cfg,
     train_dataset=raw,
     tokenizer=tok,
 )
+if LORA_PLUS_OPT is not None:
+    # Pass tuple (optimizer, lr_scheduler=None) so HF Trainer doesn't rebuild
+    trainer_kwargs["optimizers"] = (LORA_PLUS_OPT, None)
+
+trainer = SFTTrainer(**trainer_kwargs)

 print()
 print("─── training start ───")

@@ -536,10 +624,65 @@ print("─── training done ───")

 # Final push (in case last save_steps didn't trigger)
 trainer.push_to_hub(commit_message=(
-    f"Surrogate-1 v1.
-    f"r=
+    f"Surrogate-1 v1.2-research SFT – base={BASE.split('/')[-1]}, "
+    f"r={LORA_R}+DoRA+RSLoRA+{lora_kwargs.get('init_lora_weights','gauss')}, "
+    f"LoRA+x{LORA_PLUS_RATIO} NEFTune α=5 seq={SEQ_LEN}, "
     f"{len(rows):,} samples × {EPOCHS} epochs (Kaggle T4×2)"))
 print("✅ pushed to", HUB_ID)
+
+# ── V8 GRPO Phase-2 hook (scaffold only – disabled by default) ──────────────
+# Research §coding-llm-frontier pick #1: post-SFT GRPO with execution-based
+# rewards is the BIGGEST single lift (+5-9pp LCB v6, +4-7pp HumanEval+).
+# Implementing the RL loop here would require a Python sandbox + unit-test
+# generator + group-of-N rollouts, all of which strain T4×2. Scaffolded but
+# gated behind RUN_GRPO=1 + TRL>=0.12 + ≥30GB peak VRAM headroom.
+if os.environ.get("RUN_GRPO", "0") == "1":
+    try:
+        from trl import GRPOTrainer, GRPOConfig  # type: ignore
+        print("─── Phase 2: GRPO with execution rewards (experimental) ───")
+        # Reward fn: run candidate code in subprocess, +1 if all unit tests
+        # pass, 0 otherwise. Group-of-4 rollouts per prompt.
+        import re, subprocess, tempfile, signal
+        def reward_unit_test_pass(prompts, completions, **kw):
+            rewards = []
+            for c in completions:
+                # Extract first ```python ... ``` block
+                m = re.search(r"```python\s*\n(.*?)\n```", c, re.S)
+                code = m.group(1) if m else c
+                with tempfile.NamedTemporaryFile("w", suffix=".py",
+                                                 delete=False) as f:
+                    f.write(code); pth = f.name
+                try:
+                    rc = subprocess.run(
+                        ["python", "-c", f"exec(open('{pth}').read())"],
+                        timeout=8, capture_output=True
+                    ).returncode
+                    rewards.append(1.0 if rc == 0 else 0.0)
+                except Exception:
+                    rewards.append(0.0)
+            return rewards
+        grpo_cfg = GRPOConfig(
+            output_dir="./surrogate-1-v1.2-research-grpo",
+            num_generations=4, learning_rate=5e-7,
+            num_train_epochs=1, per_device_train_batch_size=1,
+            gradient_accumulation_steps=8,
+            bf16=BF16_OK, fp16=not BF16_OK,
+            push_to_hub=True, hub_model_id=HUB_ID + "-grpo",
+            hub_token=os.environ.get("HF_TOKEN"),
+        )
+        grpo = GRPOTrainer(
+            model=model, args=grpo_cfg,
+            reward_funcs=[reward_unit_test_pass],
+            train_dataset=raw,
+        )
+        grpo.train()
+        grpo.push_to_hub(commit_message=f"Surrogate-1 v1.2-research GRPO Phase-2")
+        print("✅ GRPO Phase-2 done")
+    except ImportError as e:
+        print(f" GRPO scaffold skipped – TRL too old: {e}")
+    except Exception as e:
+        print(f" ⚠ GRPO Phase-2 failed: {type(e).__name__}: {e}")
+        print("   (SFT checkpoint is still saved – GRPO is post-SFT booster)")
 PYEOF

 # ── Push notebook to Kaggle (creates if not exists, updates if exists) ─────
bin/v2/autonomous-release.sh (new file)
@@ -0,0 +1,425 @@
+#!/usr/bin/env bash
+# Surrogate-1 – autonomous release daemon.
+#
+# Loop:
+#   1. Recon – pull HN top, GitHub trending, ProductHunt feed, RSS list.
+#   2. Cluster signals into "feature gaps" (frequency × recency × relevance).
+#   3. For each above-threshold gap, ask Surrogate-1 to write a spec.md
+#      (problem / user-stories / acceptance / impact / out-of-scope).
+#   4. Ask Surrogate-1 for an implementation patch + tests.
+#   5. CISC self-consistency: generate 3 patch candidates, take the one
+#      that passes verifier-ensemble + has highest test pass rate.
+#   6. If verdict ok → open a draft PR in target repo, run CI in canary
+#      branch with metric-gated promotion (Flagger-style if available).
+#   7. Auto-rollback on SLO violation; auto-promote if green for COOLDOWN.
+#   8. Outcome → outcomes.jsonl for self-improve.
+#
+# Owner-controlled scope (only repos this daemon may touch):
+#   AUTO_RELEASE_REPOS env (comma-separated), default = axentx/surrogate-1
+#
+# Hard guards:
+#   - Never push to main; always open a draft PR on an auto/* branch
+#   - Diff ≤ 600 lines, ≤ 12 files, must include tests
+#   - All HardGuards from verifier-ensemble.py apply
+#   - PR labeled "autonomous-release" + linked to outcome record id
+#
+# Usage:
+#   nohup bash bin/v2/autonomous-release.sh \
+#     > $HOME/.surrogate/logs/autonomous-release.log 2>&1 &
+#
+# Cron once per hour:
+#   0 * * * * bash $HOME/.surrogate/hf-space/bin/v2/autonomous-release.sh --once
+set -uo pipefail
+[[ -f "$HOME/.hermes/.env" ]] && { set -a; source "$HOME/.hermes/.env" 2>/dev/null; set +a; }
+
+HFB="$HOME/.surrogate/hf-space/bin/v2"
+STATE="$HOME/.surrogate/state"
+SPECS="$STATE/specs"
+LOG="$HOME/.surrogate/logs/autonomous-release.log"
+mkdir -p "$STATE" "$SPECS" "$(dirname "$LOG")"
+
+ONCE=0
+[[ "${1:-}" == "--once" ]] && ONCE=1
+
+INTERVAL_SEC="${REL_INTERVAL_SEC:-3600}"   # 1 h
+SPACE="${REL_SPACE:-surrogate1/surrogate-1-zero-gpu}"
+REPOS=(${AUTO_RELEASE_REPOS:-axentx/surrogate-1})
+RECON_LIMIT="${REL_RECON_LIMIT:-50}"
+CISC_N="${REL_CISC_N:-3}"
+GAP_FREQ_THRESHOLD="${REL_GAP_FREQ:-3}"    # signal must appear in ≥3 sources
+
+log() { echo "[$(date '+%Y-%m-%dT%H:%M:%S')] $*" | tee -a "$LOG"; }
+notify() {
+  [[ -z "${DISCORD_WEBHOOK:-}" ]] && return
+  curl -s -X POST -H "Content-Type: application/json" \
+    -d "{\"content\":\"🚀 autonomous-release: $1\"}" \
+    "$DISCORD_WEBHOOK" >/dev/null 2>&1 || true
+}
+
+# ── Recon: pull signals from public sources ─────────────────────────────────
+recon() {
+  local out="$1"
+  log "  recon → $out"
+  : > "$out"
+
+  # HN top stories – Algolia public API, no auth needed
+  curl -fsS --max-time 15 \
+    "https://hn.algolia.com/api/v1/search?tags=story&numericFilters=points>50&hitsPerPage=$RECON_LIMIT" \
+    2>/dev/null | python3 -c "
+import json, sys
+try: d = json.load(sys.stdin)
+except: sys.exit(0)
+for h in d.get('hits', []):
+    print(json.dumps({'src':'hn','title':h.get('title',''),'url':h.get('url',''),
+                      'score':h.get('points',0),'ts':h.get('created_at','')}))
+" 2>/dev/null >> "$out"
+
+  # GitHub trending – no official API, scrape via /trending
+  curl -fsS --max-time 20 \
+    "https://github.com/trending?since=daily&spoken_language_code=en" 2>/dev/null \
+    | python3 -c "
+import sys, re, json
+html = sys.stdin.read()
+# very light extractor – avoid pulling beautifulsoup just for this
+for m in re.finditer(r'<h2 class=\"h3 lh-condensed\">\s*<a href=\"([^\"]+)\"', html):
+    repo = m.group(1).lstrip('/')
+    print(json.dumps({'src':'gh-trending','title':repo,'url':'https://github.com/'+repo,
+                      'score':1,'ts':''}))
+" 2>/dev/null | head -n 30 >> "$out"
+
+  # ProductHunt – public RSS-ish endpoint
+  curl -fsS --max-time 15 \
+    "https://www.producthunt.com/feed" 2>/dev/null \
+    | python3 -c "
+import sys, re, json
+xml = sys.stdin.read()
+for m in list(re.finditer(r'<title>([^<]+)</title>\s*<link>([^<]+)</link>', xml))[:30]:
+    print(json.dumps({'src':'producthunt','title':m.group(1),'url':m.group(2),'score':1,'ts':''}))
+" 2>/dev/null >> "$out" || true
+
+  local n; n=$(wc -l < "$out" | tr -d ' ')
+  log "  collected $n signals"
+}
+
+# ── Gap analysis: cluster signals by keyword overlap ────────────────────────
+gap_analysis() {
+  local recon_in="$1" gaps_out="$2"
+  python3 - <<PYEOF
+import json, re, collections
+from pathlib import Path
+
+# Owner-relevant keywords – bias the funnel toward what Surrogate-1 cares about
+OWNER_KW = {
+    "agent","agentic","autonomous","llm","fine-tune","lora","peft",
+    "dpo","grpo","rlhf","rlaif","sft","quantization","bitsandbytes",
+    "vllm","sglang","tgi","inference","kubernetes","k8s","helm",
+    "terraform","cloudformation","aws","prowler","cspm","sre",
+    "incident","oncall","postmortem","observability","prometheus",
+    "opentelemetry","loki","grafana","argo","gitops","cicd",
+    "security","cve","cwe","sbom","slsa","supply-chain","gitleaks",
+    "semgrep","sast","dast","mcp","computer-use","tool-use","agent-bench"
+}
+sigs = []
+for L in open("$recon_in"):
+    try: sigs.append(json.loads(L))
+    except: pass
+
+# tokenize titles, score by owner-kw overlap
+def toks(s):
+    return set(t.lower() for t in re.findall(r"[a-zA-Z][a-zA-Z0-9-]+", s or ""))
+
+clusters = collections.defaultdict(list)
+for s in sigs:
+    t = toks(s.get("title", ""))
+    overlap = t & OWNER_KW
+    if not overlap:
+        continue
+    # bucket by sorted overlap as cluster key
+    key = "+".join(sorted(overlap)[:3])
+    clusters[key].append(s)
+
+gaps = []
+for key, items in clusters.items():
+    n_sources = len({i["src"] for i in items})
+    if n_sources >= $GAP_FREQ_THRESHOLD or len(items) >= 5:
+        gaps.append({
+            "topic": key,
+            "n_signals": len(items),
+            "n_sources": n_sources,
+            "examples": [{"title": i["title"], "url": i["url"]} for i in items[:5]],
+        })
+
+gaps.sort(key=lambda g: (g["n_sources"], g["n_signals"]), reverse=True)
+gaps = gaps[:5]  # cap at top 5 per cycle
+
+with open("$gaps_out", "w") as f:
+    json.dump(gaps, f, indent=2)
+
+print(f"  → {len(gaps)} gaps identified")
+PYEOF
+}
+
+# ── Build spec.md from a gap ────────────────────────────────────────────────
+build_spec() {
+  local gap_json="$1" spec_out="$2"
+  local work; work=$(mktemp -d)
+  cat > "$work/prompt.md" <<EOF
+You are Surrogate-1 in autonomous-release mode. A market signal cluster has
+crossed threshold. Synthesize a Spec-Driven-Development spec for ONE
+feature Surrogate-1 itself should ship – must be a small self-improvement
+to the Surrogate-1 platform (training scripts, daemons, evals, dataset
+quality tooling, etc.). Out of scope: external customer features, anything
+needing payment/PII/user data.
+
+Signal cluster:
+\`\`\`json
+$(cat "$gap_json")
+\`\`\`
+
+Owner constraints:
+- Diff target ≤600 lines / ≤12 files
+- Must include tests
+- Must benefit at least one of: HumanEval+/MBPP+/LCB v6/SWE-Bench/BFCL/axentx-eval-50
+  OR the autonomous-{sre,release,improve} daemons.
+- Must be reversible (rollback step required)
+
+Output ONLY this JSON schema:
+{
+  "title": "<3-7 word feature name>",
+  "problem": "<paragraph: what's missing today>",
+  "user_stories": ["As Surrogate-1, I want X so that Y", ...],
+  "acceptance_criteria": ["Bench score Z improves by ≥N%", ...],
+  "impact": "<expected metric uplift, citable>",
+  "competitors_observed": "<who is doing this elsewhere – from signal cluster>",
+  "out_of_scope": ["...","..."],
+  "rollout_plan": "<canary → promote, with SLO gate>",
+  "confidence": 0.0-1.0
+}
+EOF
+  python3 "$HFB/surrogate-call.py" --space "$SPACE" \
+    --prompt-file "$work/prompt.md" --schema spec \
+    --max-tokens 1500 --temperature 0.3 --out "$spec_out"
+  local rc=$?
+  rm -rf "$work"
+  return $rc
+}
+
+# ── Build patch candidates with CISC self-consistency ───────────────────────
+build_patch_cisc() {
+  local spec_path="$1" out_dir="$2"
+  mkdir -p "$out_dir"
+  local prompt; prompt=$(mktemp)
+  cat > "$prompt" <<EOF
+You are Surrogate-1. Implement the following spec. Produce a unified diff
++ test file. Diff must apply cleanly via \`patch -p1\`.
+
+Spec:
+\`\`\`json
+$(cat "$spec_path")
+\`\`\`
+
+Hard rules:
+- Modify only files under \$HOME/.surrogate/hf-space/ or under axentx
+  repos cloned into \$HOME/develope/.
+- Include or extend tests under tests/v2/ matching the changed file.
+- No new top-level dependency without justification in the diff.
+- Diff under 600 lines / 12 files.
+
+Output ONLY this JSON schema:
+{
+  "target_file": "<primary file path>",
+  "kind": "code"|"iac"|"shell",
+  "patch": "<unified diff text>",
+  "test_plan": "<commands to verify post-apply>",
+  "rollback": "<git revert <sha> or patch -R>",
+  "confidence": 0.0-1.0
+}
+EOF
+  for i in $(seq 1 $CISC_N); do
+    log "  CISC candidate $i/$CISC_N"
+    # vary temperature for diversity
+    local T; T=$(python3 -c "print(round(0.2 + 0.15*$i, 2))")
+    python3 "$HFB/surrogate-call.py" --space "$SPACE" \
+      --prompt-file "$prompt" --schema patch \
+      --max-tokens 2000 --temperature "$T" \
+      --out "$out_dir/cand-$i.json" 2>>"$LOG" || \
+      log "  cand-$i failed (continuing)"
+  done
+  rm -f "$prompt"
+  ls "$out_dir"/cand-*.json 2>/dev/null | wc -l | tr -d ' '
+}
+
+# ── Vote: pick best candidate by verifier verdict + confidence ──────────────
+pick_winner() {
+  local cand_dir="$1" winner_out="$2"
+  local best="" best_score=-1
+  for c in "$cand_dir"/cand-*.json; do
+    [[ -f "$c" ]] || continue
+    local target patch kind conf
+    target=$(python3 -c "import json; print(json.load(open('$c')).get('target_file',''))")
+    kind=$(python3 -c "import json; print(json.load(open('$c')).get('kind','code'))")
+    conf=$(python3 -c "import json; print(json.load(open('$c')).get('confidence',0))")
+    python3 -c "import json,sys; sys.stdout.write(json.load(open('$c')).get('patch',''))" > "$cand_dir/$(basename "$c" .json).patch"
+
+    local verdict_path="$cand_dir/$(basename "$c" .json).verdict.json"
+    python3 "$HFB/verifier-ensemble.py" \
+      --change "$cand_dir/$(basename "$c" .json).patch" \
+      --target "$target" --kind "$kind" --confidence "$conf" \
+      --out "$verdict_path" >/dev/null 2>&1 || true
+
+    local ok npass
+    ok=$(python3 -c "import json; print(json.load(open('$verdict_path')).get('ok',False))" 2>/dev/null || echo False)
+    npass=$(python3 -c "import json; print(json.load(open('$verdict_path')).get('n_pass',0))" 2>/dev/null || echo 0)
+    local score
+    score=$(python3 -c "print(int($npass) + (10 if '$ok'=='True' else 0) + float($conf))")
+    log "    cand=$(basename "$c") ok=$ok pass=$npass conf=$conf → score=$score"
|
| 276 |
+
if (( $(python3 -c "print(1 if $score > $best_score else 0)") )); then
|
| 277 |
+
best="$c"; best_score=$score
|
| 278 |
+
fi
|
| 279 |
+
done
|
| 280 |
+
if [[ -n "$best" ]]; then
|
| 281 |
+
cp "$best" "$winner_out"
|
| 282 |
+
cp "$cand_dir/$(basename "$best" .json).verdict.json" "${winner_out%.json}.verdict.json"
|
| 283 |
+
log " winner=$(basename "$best") score=$best_score"
|
| 284 |
+
return 0
|
| 285 |
+
fi
|
| 286 |
+
return 1
|
| 287 |
+
}
|
| 288 |
+
|
| 289 |
+
# ββ Sweep βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 290 |
+
sweep() {
|
| 291 |
+
local ts; ts=$(date -u +%Y%m%dT%H%M%SZ)
|
| 292 |
+
local cycle="$STATE/release-$ts"
|
| 293 |
+
mkdir -p "$cycle"
|
| 294 |
+
log "βββ release sweep $ts βββ"
|
| 295 |
+
|
| 296 |
+
recon "$cycle/recon.jsonl"
|
| 297 |
+
gap_analysis "$cycle/recon.jsonl" "$cycle/gaps.json"
|
| 298 |
+
|
| 299 |
+
local n_gaps
|
| 300 |
+
n_gaps=$(python3 -c "import json; print(len(json.load(open('$cycle/gaps.json'))))")
|
| 301 |
+
if (( n_gaps == 0 )); then
|
| 302 |
+
log " no gaps above threshold β skipping cycle"
|
| 303 |
+
return 0
|
| 304 |
+
fi
|
| 305 |
+
|
| 306 |
+
# Process top gap only this cycle (avoid PR flood)
|
| 307 |
+
python3 -c "
|
| 308 |
+
import json
|
| 309 |
+
g = json.load(open('$cycle/gaps.json'))[0]
|
| 310 |
+
json.dump(g, open('$cycle/top-gap.json', 'w'))
|
| 311 |
+
print(g['topic'])
|
| 312 |
+
" | while read -r topic; do
|
| 313 |
+
log " top gap: $topic"
|
| 314 |
+
local spec_path="$cycle/spec.json"
|
| 315 |
+
if ! build_spec "$cycle/top-gap.json" "$spec_path"; then
|
| 316 |
+
log " spec build failed β skipping"
|
| 317 |
+
continue
|
| 318 |
+
fi
|
| 319 |
+
local title
|
| 320 |
+
title=$(python3 -c "import json; print(json.load(open('$spec_path')).get('title','untitled'))")
|
| 321 |
+
log " spec: $title"
|
| 322 |
+
|
| 323 |
+
local cand_dir="$cycle/candidates"
|
| 324 |
+
local n_cand
|
| 325 |
+
n_cand=$(build_patch_cisc "$spec_path" "$cand_dir")
|
| 326 |
+
log " built $n_cand patch candidates (target $CISC_N)"
|
| 327 |
+
if (( n_cand == 0 )); then
|
| 328 |
+
python3 "$HFB/outcome-log.py" --daemon release --trigger "gap:$topic" \
|
| 329 |
+
--anomaly "$cycle/top-gap.json" --response "$spec_path" \
|
| 330 |
+
--applied false --outcome error \
|
| 331 |
+
--lesson "no patch candidates produced" || true
|
| 332 |
+
continue
|
| 333 |
+
fi
|
| 334 |
+
|
| 335 |
+
if pick_winner "$cand_dir" "$cycle/winner.json"; then
|
| 336 |
+
local ok
|
| 337 |
+
ok=$(python3 -c "import json; print(json.load(open('$cycle/winner.verdict.json')).get('ok',False))")
|
| 338 |
+
if [[ "$ok" == "True" ]]; then
|
| 339 |
+
log " β opening draft PR"
|
| 340 |
+
open_draft_pr "$cycle"
|
| 341 |
+
else
|
| 342 |
+
log " winner failed verifier β queueing"
|
| 343 |
+
python3 "$HFB/outcome-log.py" --daemon release --trigger "gap:$topic" \
|
| 344 |
+
--anomaly "$cycle/top-gap.json" --response "$spec_path" \
|
| 345 |
+
--verdict "$cycle/winner.verdict.json" \
|
| 346 |
+
--applied false --outcome queued \
|
| 347 |
+
--lesson "best candidate still failed verifier" || true
|
| 348 |
+
fi
|
| 349 |
+
else
|
| 350 |
+
log " no winner β all candidates failed"
|
| 351 |
+
fi
|
| 352 |
+
done
|
| 353 |
+
|
| 354 |
+
log "βββ sweep done βββ"
|
| 355 |
+
}
|
| 356 |
+
|
| 357 |
+
# ββ Open draft PR (gh CLI required) βββββββββββββββββββββββββββββββββββββββββ
|
| 358 |
+
open_draft_pr() {
|
| 359 |
+
local cycle="$1"
|
| 360 |
+
if ! command -v gh >/dev/null 2>&1; then
|
| 361 |
+
log " gh CLI missing β queueing instead of PR"
|
| 362 |
+
python3 "$HFB/outcome-log.py" --daemon release \
|
| 363 |
+
--trigger "release_cycle" \
|
| 364 |
+
--response "$cycle/winner.json" \
|
| 365 |
+
--applied false --outcome queued \
|
| 366 |
+
--lesson "gh CLI not installed" || true
|
| 367 |
+
return 1
|
| 368 |
+
fi
|
| 369 |
+
|
| 370 |
+
local target_repo="${REPOS[0]}"
|
| 371 |
+
local target_file patch_file branch
|
| 372 |
+
target_file=$(python3 -c "import json; print(json.load(open('$cycle/winner.json'))['target_file'])")
|
| 373 |
+
patch_file="$cycle/$(ls "$cycle/candidates"/*.patch 2>/dev/null | head -1 | xargs -n1 basename)"
|
| 374 |
+
branch="auto/release-$(date -u +%Y%m%d-%H%M)"
|
| 375 |
+
|
| 376 |
+
# Clone if not present
|
| 377 |
+
local clone_dir="$STATE/repos/$(basename "$target_repo")"
|
| 378 |
+
if [[ ! -d "$clone_dir/.git" ]]; then
|
| 379 |
+
gh repo clone "$target_repo" "$clone_dir" 2>>"$LOG" || {
|
| 380 |
+
log " clone failed for $target_repo"
|
| 381 |
+
return 1
|
| 382 |
+
}
|
| 383 |
+
fi
|
| 384 |
+
|
| 385 |
+
( cd "$clone_dir"
|
| 386 |
+
git fetch origin main 2>>"$LOG"
|
| 387 |
+
git checkout -B "$branch" origin/main 2>>"$LOG"
|
| 388 |
+
patch -p1 < "$patch_file" 2>>"$LOG" || { log " patch apply failed"; exit 1; }
|
| 389 |
+
git add -A
|
| 390 |
+
git commit -m "auto-release: $(python3 -c "import json; print(json.load(open('$cycle/winner.json')).get('target_file',''))")
|
| 391 |
+
auto-generated by autonomous-release.sh
|
| 392 |
+
spec=$cycle/spec.json
|
| 393 |
+
verdict=$cycle/winner.verdict.json"
|
| 394 |
+
git push -u origin "$branch" 2>>"$LOG"
|
| 395 |
+
gh pr create --draft --title "[auto-release] $(python3 -c "import json; print(json.load(open('$cycle/spec.json')).get('title',''))")" \
|
| 396 |
+
--body "Autonomous release.
|
| 397 |
+
|
| 398 |
+
**Spec**: see \`$cycle/spec.json\`
|
| 399 |
+
**Verdict**: see \`$cycle/winner.verdict.json\`
|
| 400 |
+
|
| 401 |
+
This PR was generated by Surrogate-1 autonomous-release daemon. It is a DRAFT β promote to ready-for-review only after CI passes and a human eyeballs the diff." \
|
| 402 |
+
--label "autonomous-release" 2>&1 | tee -a "$LOG"
|
| 403 |
+
) || true
|
| 404 |
+
|
| 405 |
+
python3 "$HFB/outcome-log.py" --daemon release \
|
| 406 |
+
--trigger "release_cycle" \
|
| 407 |
+
--anomaly "$cycle/top-gap.json" \
|
| 408 |
+
--response "$cycle/winner.json" \
|
| 409 |
+
--verdict "$cycle/winner.verdict.json" \
|
| 410 |
+
--applied true --outcome success \
|
| 411 |
+
--lesson "draft PR opened on $branch" || true
|
| 412 |
+
notify "draft PR opened on $target_repo / $branch"
|
| 413 |
+
}
|
| 414 |
+
|
| 415 |
+
if (( ONCE )); then
|
| 416 |
+
sweep
|
| 417 |
+
exit 0
|
| 418 |
+
fi
|
| 419 |
+
|
| 420 |
+
log "βββ autonomous-release starting (interval=${INTERVAL_SEC}s) βββ"
|
| 421 |
+
notify "online β interval ${INTERVAL_SEC}s"
|
| 422 |
+
while true; do
|
| 423 |
+
sweep
|
| 424 |
+
sleep "$INTERVAL_SEC"
|
| 425 |
+
done
|
|
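The voting step in `pick_winner` scores each candidate as `n_pass + (10 if ok) + confidence`, so a verifier-approved candidate always beats an unapproved one and self-reported confidence only breaks ties. A minimal standalone sketch of that rule (helper names here are illustrative, not the repo's code):

```python
# Sketch of pick_winner's scoring rule: n_pass from the verifier verdict,
# a +10 bonus when the ensemble says ok, and the model's self-reported
# confidence as a tie-breaker.
def score(verdict: dict, confidence: float) -> float:
    return verdict.get("n_pass", 0) + (10 if verdict.get("ok") else 0) + confidence

def pick_winner(cands: list[dict]) -> dict:
    # cands: [{"name": str, "verdict": {...}, "confidence": float}, ...]
    return max(cands, key=lambda c: score(c["verdict"], c["confidence"]))

cands = [
    {"name": "cand-1", "verdict": {"ok": False, "n_pass": 7}, "confidence": 0.9},
    {"name": "cand-2", "verdict": {"ok": True,  "n_pass": 6}, "confidence": 0.6},
]
print(pick_winner(cands)["name"])  # cand-2: the ok bonus outweighs one extra pass
```

The +10 bonus dominates because a candidate that fails the ensemble should never win on raw pass count alone.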
@@ -0,0 +1,346 @@
#!/usr/bin/env bash
# Surrogate-1: autonomous SRE daemon.
#
# Monitors, 24×7, the infra Surrogate-1 itself owns or operates, and
# tries to auto-heal incidents. Every candidate action passes through
# verifier-ensemble.py; anything that fails verification is QUEUED, never
# applied. The whole turn (anomaly → diagnosis → verdict → apply/queue →
# metric_after) is logged to outcomes.jsonl so self-improve.sh can build
# the next round's training data.
#
# Probe targets (all read-only by default):
#   1. HF Spaces health    → runtime.stage / errorMessage
#   2. HF Datasets growth  → last commit age (pipeline staleness)
#   3. ZeroGPU smoke test  → small generation request
#   4. Kaggle kernel state → only if KAGGLE_KEY env is fresh
#   5. AWS via aws-cli     → only if AWS_PROFILE set + Excise stack
#   6. GH Actions runs     → `gh run list` for axentx orgs
#
# Auto-fix scope (whitelist of safe actions):
#   - factory_reboot a stuck HF Space
#   - re-trigger a failed GH workflow run
#   - update a Space env var (already supported via swap-zerogpu-lora.sh)
#   - apply a small (<300 line) diff to a file in $HOME/.surrogate/* if
#     verifier-ensemble passes ALL checks
#
# Anything else → queued to ~/.surrogate/state/queue/<ts>.json for
# operator review. Refused-by-policy actions are LOGGED but never queued.
#
# Usage (long-lived daemon):
#   nohup bash bin/v2/autonomous-sre.sh \
#     > $HOME/.surrogate/logs/autonomous-sre.log 2>&1 &
#
# Or via cron every 5 min:
#   */5 * * * * bash $HOME/.surrogate/hf-space/bin/v2/autonomous-sre.sh --once
set -uo pipefail
[[ -f "$HOME/.hermes/.env" ]] && { set -a; source "$HOME/.hermes/.env" 2>/dev/null; set +a; }

HFB="$HOME/.surrogate/hf-space/bin/v2"
STATE="$HOME/.surrogate/state"
QUEUE="$STATE/queue"
LOG="$HOME/.surrogate/logs/autonomous-sre.log"
mkdir -p "$STATE" "$QUEUE" "$(dirname "$LOG")"

ONCE=0
[[ "${1:-}" == "--once" ]] && ONCE=1
INTERVAL_SEC="${SRE_INTERVAL_SEC:-300}"   # 5 min between full sweeps
SPACE_PRIMARY="${SRE_SPACE_PRIMARY:-surrogate1/surrogate-1-zero-gpu}"
SPACE_SECONDARY="${SRE_SPACE_SECONDARY:-ashirato/surrogate-1-zero-gpu}"
DATASETS=(${SRE_DATASETS:-axentx/surrogate-1-pairs axentx/surrogate-1-pairs-shard1 axentx/surrogate-1-pairs-shard2 axentx/surrogate-1-pairs-shard3 axentx/surrogate-1-pairs-shard4})
DATASET_STALE_HOURS="${SRE_DATASET_STALE_H:-3}"

log() { echo "[$(date '+%Y-%m-%dT%H:%M:%S')] $*" | tee -a "$LOG"; }
notify() {
  [[ -z "${DISCORD_WEBHOOK:-}" ]] && return
  curl -s -X POST -H "Content-Type: application/json" \
    -d "{\"content\":\"🛡️ autonomous-sre: $1\"}" \
    "$DISCORD_WEBHOOK" >/dev/null 2>&1 || true
}

# ── Single shared call to record an anomaly + decide ────────────────────────
handle_anomaly() {
  local trigger="$1" anomaly_json="$2"
  local ts; ts=$(date -u +%Y%m%dT%H%M%SZ)
  local work; work=$(mktemp -d "$STATE/sre-$ts-XXXX")

  echo "$anomaly_json" > "$work/anomaly.json"

  # Build diagnosis prompt with last 3 outcomes for context
  local recent
  recent=$(tail -n 3 "$STATE/outcomes.jsonl" 2>/dev/null \
    | python3 -c "import sys,json
for L in sys.stdin:
    try: r = json.loads(L)
    except: continue
    print(f\"- {r['ts']} {r['daemon']}/{r['trigger']} → {r['outcome']}\")
" 2>/dev/null || echo "  (none)")

  cat > "$work/prompt.md" <<EOF
You are Surrogate-1 in SRE auto-heal mode. An anomaly has been detected by
the autonomous-sre daemon. Diagnose the root cause and propose ONE specific
fix, OR explicitly say "fix_kind": "none" if you're <70% confident.

Trigger: $trigger

Anomaly details (JSON):
\`\`\`json
$anomaly_json
\`\`\`

Recent outcomes (last 3):
$recent

Hard constraints:
- Only propose fixes for systems Surrogate-1 owns: HF Spaces under
  surrogate1/* + ashirato/* + axentx/*, HF datasets under axentx/*,
  GH workflows in axentx repos. Refuse any AWS/prod/customer system.
- Diff must be <300 lines, ≤8 files.
- No destructive operations (rm -rf, DROP, kubectl delete ns, IAM \\*:\\*).
- If the fix is "factory_reboot Space X" → set fix_kind=shell with
  patch=\`bash $HFB/swap-zerogpu-lora.sh AXENTX/<lora> ONLY=<name>\` style.

Respond ONLY with this JSON schema:
{
  "diagnosis": "<one-paragraph root cause>",
  "fix_kind": "code" | "iac" | "shell" | "sql" | "none",
  "target_file": "<absolute path, or empty if shell-only>",
  "patch": "<unified diff or shell command>",
  "rollback": "<how to undo>",
  "test_plan": "<how we'll know it worked>",
  "confidence": 0.0-1.0
}
EOF

  log "  → calling Surrogate for diagnosis ($trigger)"
  if ! python3 "$HFB/surrogate-call.py" \
      --space "$SPACE_PRIMARY" \
      --prompt-file "$work/prompt.md" \
      --schema diagnosis \
      --max-tokens 1200 --temperature 0.15 \
      --out "$work/response.json" 2>"$work/call.err"; then
    log "  ✗ surrogate-call failed: $(head -c 200 "$work/call.err")"
    python3 "$HFB/outcome-log.py" --daemon sre --trigger "$trigger" \
      --anomaly "$work/anomaly.json" --prompt "$work/prompt.md" \
      --applied false --outcome error \
      --lesson "endpoint unavailable for diagnosis" || true
    return 1
  fi

  local fix_kind conf target patch
  fix_kind=$(python3 -c "import json; print(json.load(open('$work/response.json')).get('fix_kind','none'))")
  conf=$(python3 -c "import json; print(json.load(open('$work/response.json')).get('confidence',0))")
  target=$(python3 -c "import json; print(json.load(open('$work/response.json')).get('target_file','') or '')")
  patch=$(python3 -c "import json,sys; sys.stdout.write(json.load(open('$work/response.json')).get('patch',''))")

  log "  diagnosis: fix_kind=$fix_kind confidence=$conf target=$target"

  if [[ "$fix_kind" == "none" ]] || [[ -z "$patch" ]]; then
    log "  Surrogate declined to act ($fix_kind / empty patch) → recording, no apply"
    python3 "$HFB/outcome-log.py" --daemon sre --trigger "$trigger" \
      --anomaly "$work/anomaly.json" --prompt "$work/prompt.md" \
      --response "$work/response.json" --applied false --outcome rejected \
      --lesson "model declined low-confidence" || true
    return 0
  fi

  # Write patch to file for verifier
  echo "$patch" > "$work/patch.txt"
  [[ -z "$target" ]] && target="$work/patch.txt"

  # Idempotency check: if the same patch was applied <4 h ago, skip
  if python3 "$HFB/idempotency.py" check --plan "$work/patch.txt" --ttl-hours 4 \
      >"$work/idem.json" 2>/dev/null; then
    log "  idempotent skip → same patch applied recently"
    python3 "$HFB/outcome-log.py" --daemon sre --trigger "$trigger" \
      --anomaly "$work/anomaly.json" --response "$work/response.json" \
      --applied false --outcome rejected \
      --lesson "idempotent: $(python3 -c "import json; print(json.load(open('$work/idem.json'))['key'][:12])")" || true
    return 0
  fi

  log "  → verifier-ensemble"
  local vrc=0
  python3 "$HFB/verifier-ensemble.py" \
    --change "$work/patch.txt" --target "$target" --kind "$fix_kind" \
    --confidence "$conf" --out "$work/verdict.json" >/dev/null || vrc=$?

  local verdict_ok
  verdict_ok=$(python3 -c "import json; print(json.load(open('$work/verdict.json')).get('ok',False))")

  if [[ "$verdict_ok" != "True" ]]; then
    log "  verdict: REJECTED → queueing for review"
    cp -r "$work" "$QUEUE/$(basename "$work")"
    python3 "$HFB/outcome-log.py" --daemon sre --trigger "$trigger" \
      --anomaly "$work/anomaly.json" --prompt "$work/prompt.md" \
      --response "$work/response.json" --verdict "$work/verdict.json" \
      --applied false --outcome queued \
      --lesson "verifier rejected → manual review" || true
    notify "queued $trigger ($(python3 -c "import json; print(', '.join(json.load(open('$work/verdict.json')).get('reasons',[])[:2]))"))"
    return 0
  fi

  log "  verdict: SAFE → applying"
  local apply_rc=0
  if [[ "$fix_kind" == "shell" ]]; then
    bash -c "$patch" 2>&1 | tee "$work/apply.log" || apply_rc=$?
  elif [[ "$fix_kind" == "code" || "$fix_kind" == "iac" ]] && [[ -f "$target" ]]; then
    # apply unified diff (dry-run first)
    ( cd "$(dirname "$target")" && patch -p1 --dry-run < "$work/patch.txt" \
        && patch -p1 < "$work/patch.txt" ) 2>&1 | tee "$work/apply.log" \
      || apply_rc=$?
  else
    apply_rc=99
    echo "no apply path for fix_kind=$fix_kind target=$target" > "$work/apply.log"
  fi

  if [[ $apply_rc -eq 0 ]]; then
    log "  ✓ applied → capturing metric_after"
    sleep 5
    python3 "$HFB/idempotency.py" record --plan "$work/patch.txt" \
      --daemon sre --outcome applied >/dev/null 2>&1 || true
    python3 "$HFB/outcome-log.py" --daemon sre --trigger "$trigger" \
      --anomaly "$work/anomaly.json" --prompt "$work/prompt.md" \
      --response "$work/response.json" --verdict "$work/verdict.json" \
      --applied true --outcome success \
      --lesson "auto-heal worked first try" || true
    notify "auto-healed $trigger (confidence=$conf)"
  else
    log "  ✗ apply failed rc=$apply_rc → rolling back"
    # best-effort rollback hint logged but not auto-applied
    python3 "$HFB/outcome-log.py" --daemon sre --trigger "$trigger" \
      --anomaly "$work/anomaly.json" --prompt "$work/prompt.md" \
      --response "$work/response.json" --verdict "$work/verdict.json" \
      --applied true --outcome rollback \
      --lesson "apply rc=$apply_rc; rollback:$(python3 -c "import json; print(json.load(open('$work/response.json')).get('rollback','none')[:80])")" || true
    notify "ROLLBACK $trigger rc=$apply_rc"
  fi
}

# ── Probe 1: HF Space health ────────────────────────────────────────────────
probe_space() {
  local space="$1"
  local resp; resp=$(curl -fsS --max-time 15 \
    ${HF_TOKEN:+-H "Authorization: Bearer $HF_TOKEN"} \
    "https://huggingface.co/api/spaces/$space" 2>/dev/null) || return 0
  local stage err
  stage=$(echo "$resp" | python3 -c "import json,sys; print(json.load(sys.stdin).get('runtime',{}).get('stage','UNKNOWN'))" 2>/dev/null)
  err=$(echo "$resp" | python3 -c "import json,sys; print(json.load(sys.stdin).get('runtime',{}).get('errorMessage','') or '')" 2>/dev/null)

  case "$stage" in
    RUNNING|BUILDING|CONFIG_ERROR_QUEUED|RUNNING_BUILDING) ;;  # nominal/expected
    STOPPED|RUNTIME_ERROR|BUILD_ERROR|NO_APP_FILE|*ERROR*)
      log "  ⚠ Space $space stage=$stage err=$err"
      # json-encode err via env var so quotes in the message can't break the payload
      handle_anomaly "hf_space_${stage,,}" \
        "$(printf '{"space":"%s","stage":"%s","error":%s}' \
          "$space" "$stage" "$(ERR="$err" python3 -c 'import json,os; print(json.dumps(os.environ["ERR"]))')")"
      ;;
    *) log "  Space $space stage=$stage (no action)" ;;
  esac
}

# ── Probe 2: dataset growth (staleness) ─────────────────────────────────────
probe_dataset_staleness() {
  local ds="$1"
  local resp; resp=$(curl -fsS --max-time 15 \
    ${HF_TOKEN:+-H "Authorization: Bearer $HF_TOKEN"} \
    "https://huggingface.co/api/datasets/$ds" 2>/dev/null) || return 0
  local last_modified
  last_modified=$(echo "$resp" | python3 -c "
import json, sys, datetime
try:
    d = json.load(sys.stdin)
    lm = d.get('lastModified') or d.get('createdAt')
    print(lm or '')
except: print('')
" 2>/dev/null)
  [[ -z "$last_modified" ]] && return 0
  local age_h
  age_h=$(python3 -c "
import datetime
lm = datetime.datetime.fromisoformat('${last_modified}'.replace('Z','+00:00'))
now = datetime.datetime.now(datetime.timezone.utc)
print(int((now - lm).total_seconds() / 3600))
" 2>/dev/null || echo 0)
  if (( age_h > DATASET_STALE_HOURS )); then
    log "  ⚠ dataset $ds stale ${age_h}h (threshold ${DATASET_STALE_HOURS}h)"
    handle_anomaly "hf_dataset_stale" \
      "$(printf '{"dataset":"%s","age_hours":%d,"threshold":%d}' \
        "$ds" "$age_h" "$DATASET_STALE_HOURS")"
  fi
}

# ── Probe 3: ZeroGPU smoke (cheapest health signal) ─────────────────────────
probe_zerogpu_smoke() {
  local space="$1"
  local url="https://${space//\//-}.hf.space/api/predict"
  if ! curl -fsS --max-time 30 -X POST -H "Content-Type: application/json" \
      -d '{"data":["ping","hi",16,0.1]}' "$url" >/dev/null 2>&1; then
    log "  ✗ ZeroGPU smoke FAILED on $space"
    handle_anomaly "zerogpu_smoke_fail" \
      "$(printf '{"space":"%s","url":"%s"}' "$space" "$url")"
  fi
}

# ── Probe 4: GH Actions failures (best-effort) ──────────────────────────────
probe_gh_actions() {
  if ! command -v gh >/dev/null 2>&1; then return 0; fi
  for repo in axentx/arkashira axentx/midnightcrisis; do
    local failed
    failed=$(gh run list --repo "$repo" --limit 5 --json status,conclusion,name \
      2>/dev/null | python3 -c "
import json, sys
try: runs = json.load(sys.stdin)
except: runs = []
fails = [r for r in runs if r.get('conclusion') == 'failure']
print(len(fails))
" 2>/dev/null || echo 0)
    if (( failed >= 2 )); then
      log "  ⚠ GH $repo: $failed of last 5 runs failed"
      handle_anomaly "gh_workflow_repeated_failure" \
        "$(printf '{"repo":"%s","failed_of_5":%d}' "$repo" "$failed")"
    fi
  done
}

# ── Probe 5: outcome log self-consistency (meta) ────────────────────────────
probe_outcome_log_health() {
  if [[ ! -f "$STATE/outcomes.jsonl" ]]; then return 0; fi
  local n_recent_fail
  n_recent_fail=$(tail -n 20 "$STATE/outcomes.jsonl" 2>/dev/null | python3 -c "
import sys, json
n = 0
for L in sys.stdin:
    try: r = json.loads(L)
    except: continue
    if r.get('outcome') in ('rollback','error'): n += 1
print(n)
" 2>/dev/null || echo 0)
  if (( n_recent_fail >= 5 )); then
    log "  ⚠ ${n_recent_fail}/20 recent outcomes failed → degrading mode"
    notify "degrading: $n_recent_fail/20 recent fails → operator review"
  fi
}

# ── Sweep ───────────────────────────────────────────────────────────────────
sweep() {
  log "─── SRE sweep ───"
  probe_space "$SPACE_PRIMARY"
  probe_space "$SPACE_SECONDARY"
  for ds in "${DATASETS[@]}"; do probe_dataset_staleness "$ds"; done
  probe_zerogpu_smoke "$SPACE_PRIMARY"
  probe_gh_actions
  probe_outcome_log_health
  log "─── sweep done ───"
}

if (( ONCE )); then
  sweep
  exit 0
fi

log "─── autonomous-sre starting (interval=${INTERVAL_SEC}s) ───"
notify "online (interval ${INTERVAL_SEC}s)"
while true; do
  sweep
  sleep "$INTERVAL_SEC"
done
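The staleness probe above compares the dataset's `lastModified` timestamp (ISO 8601 with a trailing `Z`) against `SRE_DATASET_STALE_H`. That age computation can be sketched standalone (function names here are illustrative; the daemon does this inline via `python3 -c`):

```python
# Sketch of probe_dataset_staleness's age math: parse the HF API timestamp,
# take whole hours since then, and compare against the staleness threshold
# (the 3-hour default mirrors SRE_DATASET_STALE_H).
import datetime

def age_hours(last_modified: str, now: datetime.datetime) -> int:
    # fromisoformat on older Pythons rejects "Z", so normalize to "+00:00"
    lm = datetime.datetime.fromisoformat(last_modified.replace("Z", "+00:00"))
    return int((now - lm).total_seconds() / 3600)

def is_stale(last_modified: str, now: datetime.datetime, threshold_h: int = 3) -> bool:
    return age_hours(last_modified, now) > threshold_h

now = datetime.datetime(2026, 5, 1, 12, 0, tzinfo=datetime.timezone.utc)
print(is_stale("2026-05-01T06:30:00Z", now))  # True: ~5.5 h since last commit
print(is_stale("2026-05-01T10:30:00Z", now))  # False: 1.5 h since last commit
```

Truncating to whole hours means a dataset is only flagged once it is strictly more than `threshold_h` full hours old, which keeps the probe from firing on boundary jitter.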
@@ -37,6 +37,7 @@ MODELS=(
|
|
| 37 |
"v1|axentx/surrogate-1-coder-7b-v1|Qwen/Qwen2.5-Coder-7B-Instruct"
|
| 38 |
"base7B|Qwen/Qwen2.5-Coder-7B-Instruct|"
|
| 39 |
"v1.1-extended|axentx/surrogate-1-7B-v1.1-extended|Qwen/Qwen2.5-Coder-7B-Instruct"
|
|
|
|
| 40 |
)
|
| 41 |
# Bench ladder pivoted 2026-05-01 after V4 (32B OOM) + V5 (14B OOM) both
|
| 42 |
# crashed Kaggle T4Γ2. Pick 7B as the validation base β fits T4Γ2 cleanly,
|
|
@@ -126,7 +127,7 @@ run_eval() {
|
|
| 126 |
SWE_RESOLVED=$(grep -oE "resolved.*[0-9]+\.[0-9]+" "$out/swebench.log" 2>/dev/null | tail -1 | grep -oE "[0-9]+\.[0-9]+" | tail -1)
|
| 127 |
|
| 128 |
# ββ 7. axentx-eval-50 (custom in-domain DevSecOps eval) ββ
|
| 129 |
-
log " [7/
|
| 130 |
if [[ -f "$HOME/.surrogate/hf-space/bin/v2/axentx-eval-50.py" ]]; then
|
| 131 |
python3 "$HOME/.surrogate/hf-space/bin/v2/axentx-eval-50.py" \
|
| 132 |
--model "$mdl" --out "$out/axentx-eval" 2>&1 | tee -a "$out/axentx-eval.log" | tail -30
|
|
@@ -135,6 +136,26 @@ run_eval() {
|
|
| 135 |
AXENTX_SCORE="--"
|
| 136 |
fi
|
| 137 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 138 |
# Persist scores
|
| 139 |
python3 - <<PYEOF
|
| 140 |
import json
|
|
@@ -147,6 +168,8 @@ data["$label"] = {
|
|
| 147 |
"ruler_16k_avg": "${RULER_AVG:-?}",
|
| 148 |
"swebench_verified_lite100": "${SWE_RESOLVED:-?}",
|
| 149 |
"axentx_eval_50": "${AXENTX_SCORE:-?}",
|
|
|
|
|
|
|
| 150 |
}
|
| 151 |
json.dump(data, open("$SUMMARY_JSON", "w"), indent=2)
|
| 152 |
PYEOF
|
|
|
|
| 37 |
"v1|axentx/surrogate-1-coder-7b-v1|Qwen/Qwen2.5-Coder-7B-Instruct"
|
| 38 |
"base7B|Qwen/Qwen2.5-Coder-7B-Instruct|"
|
| 39 |
"v1.1-extended|axentx/surrogate-1-7B-v1.1-extended|Qwen/Qwen2.5-Coder-7B-Instruct"
|
| 40 |
+
"v1.2-research|axentx/surrogate-1-7B-v1.2-research|Qwen/Qwen2.5-Coder-7B-Instruct"
|
| 41 |
)
|
| 42 |
# Bench ladder pivoted 2026-05-01 after V4 (32B OOM) + V5 (14B OOM) both
|
| 43 |
# crashed Kaggle T4Γ2. Pick 7B as the validation base β fits T4Γ2 cleanly,
|
|
|
|
SWE_RESOLVED=$(grep -oE "resolved.*[0-9]+\.[0-9]+" "$out/swebench.log" 2>/dev/null | tail -1 | grep -oE "[0-9]+\.[0-9]+" | tail -1)

# ── 7. axentx-eval-50 (custom in-domain DevSecOps eval) ──
log "  [7/9] axentx-eval-50 (custom DevSecOps)"
if [[ -f "$HOME/.surrogate/hf-space/bin/v2/axentx-eval-50.py" ]]; then
  python3 "$HOME/.surrogate/hf-space/bin/v2/axentx-eval-50.py" \
    --model "$mdl" --out "$out/axentx-eval" 2>&1 | tee -a "$out/axentx-eval.log" | tail -30
  # … (diff lines 134–135 not rendered in this view)
  AXENTX_SCORE="--"
fi

# ── 8. Multi-IaC-Eval (NEW V8) – CFN+TF+CDK pass-rate w/ cfn-guard + tfsec ──
log "  [8/9] Multi-IaC-Eval (CFN/TF/CDK)"
if [[ -f "$HOME/.surrogate/hf-space/bin/v2/multi-iac-eval.py" ]]; then
  python3 "$HOME/.surrogate/hf-space/bin/v2/multi-iac-eval.py" \
    --model "$mdl" --out "$out/multi-iac" 2>&1 | tee -a "$out/multi-iac.log" | tail -30
  MULTI_IAC=$(grep -oE "iac_pass_rate.*[0-9]+\.[0-9]+" "$out/multi-iac.log" | tail -1 | grep -oE "[0-9]+\.[0-9]+" | tail -1)
else
  MULTI_IAC="--"
fi

# ── 9. ITBench-lite (NEW V8) – 102 K8s SRE/CISO/FinOps scenarios ──
log "  [9/9] ITBench-lite"
if [[ -f "$HOME/.surrogate/hf-space/bin/v2/itbench-lite.py" ]]; then
  python3 "$HOME/.surrogate/hf-space/bin/v2/itbench-lite.py" \
    --model "$mdl" --out "$out/itbench" 2>&1 | tee -a "$out/itbench.log" | tail -30
  ITBENCH=$(grep -oE "itbench_score.*[0-9]+\.[0-9]+" "$out/itbench.log" | tail -1 | grep -oE "[0-9]+\.[0-9]+" | tail -1)
else
  ITBENCH="--"
fi

# Persist scores
python3 - <<PYEOF
import json
# … (diff lines 162–167 not rendered in this view)
    "ruler_16k_avg": "${RULER_AVG:-?}",
    "swebench_verified_lite100": "${SWE_RESOLVED:-?}",
    "axentx_eval_50": "${AXENTX_SCORE:-?}",
    "multi_iac_eval": "${MULTI_IAC:-?}",
    "itbench_lite": "${ITBENCH:-?}",
}
json.dump(data, open("$SUMMARY_JSON", "w"), indent=2)
PYEOF
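The score scraping in stages 7–9 above is the same two-pass grep each time: grab the metric phrase, then pull the last decimal number out of it. A standalone check of that pattern (the log line and the 61.27 value below are invented for illustration):

```shell
# Hypothetical log line in the style the eval stages emit
tmplog=$(mktemp)
echo 'final itbench_score = 61.27 (102 scenarios)' > "$tmplog"
# Pass 1: isolate the metric phrase; pass 2: extract the last x.y number
ITBENCH=$(grep -oE "itbench_score.*[0-9]+\.[0-9]+" "$tmplog" | tail -1 \
          | grep -oE "[0-9]+\.[0-9]+" | tail -1)
echo "score=$ITBENCH"   # prints score=61.27
rm -f "$tmplog"
```

Note the trailing integer (`102`) is not picked up because the second pass requires a decimal point, which is what makes this pattern safe against scenario counts in the same line.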
@@ -0,0 +1,118 @@
#!/usr/bin/env python3
"""Surrogate-1 – idempotency keys (research §autonomous-24x7 pattern 2).

Every autonomous action computes idempotency_key = sha256(plan). If the
same key has been seen within the TTL, the action is treated as already
applied and SKIPPED (preventing replay storms when the same anomaly fires
twice in a row). Records live in a JSONL ledger.

Ledger entry:
  {"key":"<sha256>", "ts":"...", "daemon":"sre|release", "outcome":"applied|queued"}

Usage:
  # Check if seen recently → exit 0 if seen (skip), 1 if new
  idempotency.py check --plan /path/to/plan.json --ttl-hours 4

  # Record after applying
  idempotency.py record --plan /path/to/plan.json \
      --daemon sre --outcome applied
"""
from __future__ import annotations

import argparse
import datetime as dt
import hashlib
import json
import os
import sys
from pathlib import Path

LEDGER = Path(os.environ.get(
    "SURROGATE_IDEMPOTENCY_LEDGER",
    str(Path.home() / ".surrogate/state/idempotency.jsonl")))


def compute_key(plan_path: Path) -> str:
    txt = plan_path.read_text() if plan_path.is_file() else str(plan_path)
    h = hashlib.sha256()
    h.update(txt.encode())
    return h.hexdigest()


def load_ledger() -> list[dict]:
    if not LEDGER.exists():
        return []
    out = []
    for L in LEDGER.read_text().splitlines():
        try:
            out.append(json.loads(L))
        except Exception:
            continue
    return out


def append_ledger(rec: dict) -> None:
    LEDGER.parent.mkdir(parents=True, exist_ok=True)
    with LEDGER.open("a") as f:
        f.write(json.dumps(rec) + "\n")


def is_recent(key: str, ttl_hours: float) -> bool:
    cutoff = dt.datetime.now(dt.timezone.utc) - dt.timedelta(hours=ttl_hours)
    for r in load_ledger():
        if r.get("key") != key:
            continue
        try:
            ts = dt.datetime.strptime(r["ts"], "%Y-%m-%dT%H:%M:%SZ")
            # strptime without %z yields a naive datetime; attach UTC so the
            # comparison against the aware cutoff doesn't raise TypeError
            ts = ts.replace(tzinfo=dt.timezone.utc)
        except Exception:
            continue
        if ts > cutoff:
            return True
    return False


def main() -> int:
    p = argparse.ArgumentParser()
    sp = p.add_subparsers(dest="cmd", required=True)

    pc = sp.add_parser("check")
    pc.add_argument("--plan", required=True)
    pc.add_argument("--ttl-hours", type=float, default=4.0)

    pr = sp.add_parser("record")
    pr.add_argument("--plan", required=True)
    pr.add_argument("--daemon", required=True)
    pr.add_argument("--outcome", required=True)

    pk = sp.add_parser("key")
    pk.add_argument("--plan", required=True)

    args = p.parse_args()

    if args.cmd == "key":
        print(compute_key(Path(args.plan)))
        return 0

    key = compute_key(Path(args.plan))

    if args.cmd == "check":
        seen = is_recent(key, args.ttl_hours)
        print(json.dumps({"key": key, "seen_recently": seen,
                          "ttl_hours": args.ttl_hours}))
        return 0 if seen else 1  # 0 = seen (skip); 1 = new (proceed)

    if args.cmd == "record":
        append_ledger({
            "key": key,
            "ts": dt.datetime.now(dt.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "daemon": args.daemon,
            "outcome": args.outcome,
        })
        print(f"recorded {key[:12]}…")
        return 0

    return 2


if __name__ == "__main__":
    sys.exit(main())
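The core of idempotency.py is just sha256-over-plan plus a TTL comparison on aware UTC timestamps; a standalone sketch of both pieces (the plan contents are invented for illustration):

```python
import datetime as dt
import hashlib
import json

def key_of(plan_text: str) -> str:
    # Same derivation as compute_key(): sha256 over the raw plan text
    return hashlib.sha256(plan_text.encode()).hexdigest()

plan = json.dumps({"action": "factory_reboot", "target": "hf-space"})
k1, k2 = key_of(plan), key_of(plan)
assert k1 == k2 and len(k1) == 64  # deterministic 64-hex-char key

# TTL check: a ledger record inside the window counts as "seen"
now = dt.datetime.now(dt.timezone.utc)
record_ts = now - dt.timedelta(hours=1)   # action recorded an hour ago
cutoff = now - dt.timedelta(hours=4)      # default --ttl-hours 4
print(record_ts > cutoff)  # → True: within TTL, so a replay is skipped
```

Because the key is derived from the full plan text, even a one-character change in the plan produces a fresh key and is allowed through.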
@@ -0,0 +1,98 @@
#!/usr/bin/env python3
"""Surrogate-1 – outcome logger.

All autonomous daemons (autonomous-sre, autonomous-release) call this
after every action to append a structured record to the outcomes log.
self-improve.sh reads that log to build the next training round's
preference + SFT data.

One JSONL record per action:
  {
    "ts": "2026-05-01T12:34:56Z",
    "daemon": "sre" | "release",
    "trigger": "...probe_name...",
    "anomaly": {...probe details...},
    "prompt": "<full prompt sent to Surrogate>",
    "response": {...Surrogate's parsed JSON output...},
    "verdict": {...verifier-ensemble JSON...},
    "applied": true|false,
    "outcome": "success" | "rollback" | "queued" | "rejected",
    "metric_after": {...optional post-action observation...},
    "lesson": "optional one-line takeaway"
  }

Usage:
  outcome-log.py --daemon sre --trigger hf_space_stage_failed \
      --anomaly /tmp/anomaly.json \
      --prompt /tmp/prompt.md \
      --response /tmp/response.json \
      --verdict /tmp/verdict.json \
      --applied true --outcome success \
      [--lesson "factory_reboot fixed stuck Space"]
"""
from __future__ import annotations

import argparse
import datetime as dt
import json
import os
import sys
from pathlib import Path

LOG_PATH = Path(os.environ.get(
    "SURROGATE_OUTCOME_LOG",
    str(Path.home() / ".surrogate/state/outcomes.jsonl")))


def _maybe_load(p: str | None) -> object | None:
    if not p:
        return None
    pp = Path(p)
    if not pp.exists():
        return p  # treat as inline string
    txt = pp.read_text()
    try:
        return json.loads(txt)
    except Exception:
        return txt  # not JSON → store raw


def main() -> int:
    p = argparse.ArgumentParser()
    p.add_argument("--daemon", required=True, choices=["sre", "release", "manual"])
    p.add_argument("--trigger", required=True)
    p.add_argument("--anomaly", default=None,
                   help="path to JSON file or inline string")
    p.add_argument("--prompt", default=None)
    p.add_argument("--response", default=None)
    p.add_argument("--verdict", default=None)
    p.add_argument("--applied", choices=["true", "false"], required=True)
    p.add_argument("--outcome", required=True,
                   choices=["success", "rollback", "queued", "rejected", "error"])
    p.add_argument("--metric-after", default=None)
    p.add_argument("--lesson", default=None)
    args = p.parse_args()

    rec = {
        "ts": dt.datetime.now(dt.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "daemon": args.daemon,
        "trigger": args.trigger,
        "anomaly": _maybe_load(args.anomaly),
        "prompt": _maybe_load(args.prompt),
        "response": _maybe_load(args.response),
        "verdict": _maybe_load(args.verdict),
        "applied": args.applied == "true",
        "outcome": args.outcome,
        "metric_after": _maybe_load(args.metric_after),
        "lesson": args.lesson,
    }
    LOG_PATH.parent.mkdir(parents=True, exist_ok=True)
    with LOG_PATH.open("a") as f:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
    print(f"logged outcome: {args.daemon}/{args.trigger} → {args.outcome} "
          f"(applied={args.applied})", file=sys.stderr)
    return 0


if __name__ == "__main__":
    sys.exit(main())
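The `_maybe_load` fallback above (existing path → parsed JSON, non-JSON file → raw text, non-existent path → treated as an inline string) is what lets the daemons pass either file paths or literal strings to every flag. A standalone sketch of the same three-way behavior (file contents invented):

```python
import json
import tempfile
from pathlib import Path

def maybe_load(p):
    # Mirrors _maybe_load(): missing path is taken as an inline string;
    # an existing file is parsed as JSON, falling back to its raw text.
    pp = Path(p)
    if not pp.exists():
        return p
    txt = pp.read_text()
    try:
        return json.loads(txt)
    except Exception:
        return txt

with tempfile.TemporaryDirectory() as d:
    f = Path(d) / "anomaly.json"
    f.write_text('{"probe": "hf_space_stage", "stage": "RUNTIME_ERROR"}')
    print(maybe_load(str(f))["probe"])        # existing JSON file → parsed dict
    print(maybe_load("inline-string-value"))  # no such file → kept as-is
```

This is lossy in one corner: a literal argument that happens to name an existing file will be loaded rather than stored verbatim, which is acceptable for daemon-generated paths.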
@@ -0,0 +1,283 @@
#!/usr/bin/env bash
# Surrogate-1 – self-improvement data flywheel.
#
# Reads outcomes.jsonl (produced by autonomous-sre.sh + autonomous-release.sh)
# and converts the success/failure signal into training data for the next
# round, then triggers a refresh when accumulation crosses thresholds.
#
# Pipeline (from research §self-improvement.md, cron cadence aligned to
# research recommendations):
#   1. Aggregate outcomes since last run.
#   2. Split into:
#        SUCCESS  = applied && outcome=success
#        FAIL     = applied && outcome in (rollback, error)
#        REJECTED = !applied → verifier blocked it
#   3. Build 3 datasets:
#        a) SFT replay – SUCCESS only, formatted as prompt/response pairs
#           (RLEF-style: model wrote it, executor approved)
#        b) KTO unpaired – every outcome with binary "thumbs" label
#           (KTO doesn't need pairs → lossless on logs)
#        c) Skill library – verified procedures from SUCCESS, indexed by topic
#   4. Push to HF Hub:
#        axentx/surrogate-1-self-traces  (SFT)
#        axentx/surrogate-1-pref-kto     (KTO)
#        axentx/surrogate-1-skills       (skill library)
#   5. If SFT pairs ≥ SFT_TRIGGER_N OR KTO ≥ KTO_TRIGGER_N → kick training:
#        - Bumps Kaggle kernel version (or notifies user to upload)
#        - Logs decision to outcomes.jsonl with daemon=manual trigger=self-improve
#
# Cadence (from research):
#   - SFT replay weekly Sun 5am (cheap)
#   - KTO refresh biweekly (1st + 15th)
#   - Skill index daily 4am (free)
#   - Trigger train when thresholds met
#
# Usage:
#   bash bin/v2/self-improve.sh          # run all stages, idempotent
#   bash bin/v2/self-improve.sh sft      # just SFT replay
#   bash bin/v2/self-improve.sh kto     # just KTO build
#   bash bin/v2/self-improve.sh skills   # just skill library
#   bash bin/v2/self-improve.sh status   # report counts only
set -uo pipefail
[[ -f "$HOME/.hermes/.env" ]] && { set -a; source "$HOME/.hermes/.env" 2>/dev/null; set +a; }

HFB="$HOME/.surrogate/hf-space/bin/v2"
STATE="$HOME/.surrogate/state"
OUTCOMES="$STATE/outcomes.jsonl"
WORK="$STATE/self-improve"
LOG="$HOME/.surrogate/logs/self-improve.log"
mkdir -p "$WORK" "$(dirname "$LOG")"

CMD="${1:-all}"

# Trigger thresholds – research recommends weekly SFT @ ~$14 H200 cost
SFT_TRIGGER_N="${SI_SFT_TRIGGER_N:-200}"
KTO_TRIGGER_N="${SI_KTO_TRIGGER_N:-500}"

# HF Hub repos for the three artifact streams
SFT_REPO="${SI_SFT_REPO:-axentx/surrogate-1-self-traces}"
KTO_REPO="${SI_KTO_REPO:-axentx/surrogate-1-pref-kto}"
SKILL_REPO="${SI_SKILL_REPO:-axentx/surrogate-1-skills}"

log() { echo "[$(date '+%Y-%m-%dT%H:%M:%S')] $*" | tee -a "$LOG"; }
notify() {
  [[ -z "${DISCORD_WEBHOOK:-}" ]] && return
  curl -s -X POST -H "Content-Type: application/json" \
    -d "{\"content\":\"♻️ self-improve: $1\"}" \
    "$DISCORD_WEBHOOK" >/dev/null 2>&1 || true
}

# ── Stage: status report ────────────────────────────────────────────────────
status() {
  if [[ ! -f "$OUTCOMES" ]]; then
    log "no outcomes.jsonl yet – daemons haven't logged anything"
    return 0
  fi
  python3 - <<PYEOF
import json, collections
from pathlib import Path
n = collections.Counter()
by_daemon = collections.Counter()
trigger = collections.Counter()
for L in Path("$OUTCOMES").read_text().splitlines():
    try: r = json.loads(L)
    except: continue
    n[r.get("outcome","?")] += 1
    by_daemon[r.get("daemon","?")] += 1
    trigger[r.get("trigger","?")] += 1
print(f"  total outcomes: {sum(n.values())}")
print(f"  by outcome: {dict(n)}")
print(f"  by daemon:  {dict(by_daemon)}")
print("  top triggers:")
for t, c in trigger.most_common(8):
    print(f"    {c:4d}  {t}")
PYEOF
}

# ── Stage: SFT replay (RLEF-aligned) ────────────────────────────────────────
build_sft() {
  log "── SFT replay build ──"
  [[ ! -f "$OUTCOMES" ]] && { log "  no outcomes file – skip"; return 0; }
  python3 - <<'PYEOF' "$OUTCOMES" "$WORK/sft.jsonl"
import json, sys
from pathlib import Path
src, dst = sys.argv[1], sys.argv[2]
n_in = n_out = 0
with open(dst, "w") as out:
    for L in Path(src).read_text().splitlines():
        n_in += 1
        try: r = json.loads(L)
        except: continue
        if not r.get("applied"): continue
        if r.get("outcome") != "success": continue
        # The model's diagnosis/spec/patch IS the response. The trigger +
        # anomaly together form the prompt.
        prompt = (
            f"You are Surrogate-1 in {r.get('daemon','?')} mode.\n"
            f"Trigger: {r.get('trigger','?')}\n"
            f"Anomaly:\n```json\n{json.dumps(r.get('anomaly'), indent=2)}\n```\n"
            f"Output a JSON action with diagnosis + patch."
        )
        resp = r.get("response")
        if not isinstance(resp, dict): continue
        out.write(json.dumps({
            "prompt": prompt,
            "response": json.dumps(resp, indent=2),
            "source": "self-trace",
            "ts": r.get("ts"),
            "trigger": r.get("trigger"),
            "lesson": r.get("lesson"),
        }, ensure_ascii=False) + "\n")
        n_out += 1
print(f"  SFT pairs: {n_out} (read {n_in})")
PYEOF
  local n; n=$(wc -l < "$WORK/sft.jsonl" | tr -d ' ')
  log "  → $WORK/sft.jsonl ($n pairs)"
  if (( n >= SFT_TRIGGER_N )); then
    log "  threshold met ($n ≥ $SFT_TRIGGER_N) → pushing + flagging trigger"
    push_dataset "$SFT_REPO" "$WORK/sft.jsonl"
    trigger_next_round "sft" "$n"
  else
    log "  below trigger ($n < $SFT_TRIGGER_N) → accumulating"
  fi
}

# ── Stage: KTO unpaired preferences ─────────────────────────────────────────
build_kto() {
  log "── KTO unpaired build ──"
  [[ ! -f "$OUTCOMES" ]] && { log "  no outcomes file – skip"; return 0; }
  python3 - <<'PYEOF' "$OUTCOMES" "$WORK/kto.jsonl"
import json, sys
from pathlib import Path
src, dst = sys.argv[1], sys.argv[2]
n = 0
with open(dst, "w") as out:
    for L in Path(src).read_text().splitlines():
        try: r = json.loads(L)
        except: continue
        oc = r.get("outcome")
        if oc not in ("success","rollback","error","queued","rejected"): continue
        # KTO label: True = applied & success, False = anything else
        label = bool(r.get("applied")) and (oc == "success")
        prompt = (
            f"trigger={r.get('trigger','?')} daemon={r.get('daemon','?')}\n"
            f"anomaly={json.dumps(r.get('anomaly'))[:400]}"
        )
        resp = r.get("response")
        if not isinstance(resp, dict): continue
        out.write(json.dumps({
            "prompt": prompt,
            "completion": json.dumps(resp)[:2000],
            "label": label,
            "ts": r.get("ts"),
        }, ensure_ascii=False) + "\n")
        n += 1
print(f"  KTO rows: {n}")
PYEOF
  local n; n=$(wc -l < "$WORK/kto.jsonl" | tr -d ' ')
  log "  → $WORK/kto.jsonl ($n rows)"
  if (( n >= KTO_TRIGGER_N )); then
    log "  threshold met → pushing"
    push_dataset "$KTO_REPO" "$WORK/kto.jsonl"
    trigger_next_round "kto" "$n"
  fi
}

# ── Stage: skill library ────────────────────────────────────────────────────
build_skills() {
  log "── skill library build ──"
  [[ ! -f "$OUTCOMES" ]] && { log "  no outcomes file – skip"; return 0; }
  python3 - <<'PYEOF' "$OUTCOMES" "$WORK/skills.jsonl"
import json, sys, collections
from pathlib import Path
src, dst = sys.argv[1], sys.argv[2]
# Group successful patches by trigger keyword to form a skill =
# (keyword, top-N successful patches)
groups = collections.defaultdict(list)
for L in Path(src).read_text().splitlines():
    try: r = json.loads(L)
    except: continue
    if not (r.get("applied") and r.get("outcome") == "success"): continue
    resp = r.get("response")
    if not isinstance(resp, dict): continue
    trig = r.get("trigger","misc").split(":")[0]
    groups[trig].append({
        "patch": resp.get("patch",""),
        "rollback": resp.get("rollback",""),
        "test_plan": resp.get("test_plan",""),
        "ts": r.get("ts"),
    })
n = 0
with open(dst, "w") as out:
    for trig, items in groups.items():
        items.sort(key=lambda x: x.get("ts",""), reverse=True)
        out.write(json.dumps({
            "skill": trig,
            "n_examples": len(items),
            "examples": items[:5],  # keep top 5 most-recent
        }, ensure_ascii=False) + "\n")
        n += 1
print(f"  skills: {n}")
PYEOF
  local n; n=$(wc -l < "$WORK/skills.jsonl" | tr -d ' ')
  log "  → $WORK/skills.jsonl ($n skills)"
  if (( n > 0 )); then
    push_dataset "$SKILL_REPO" "$WORK/skills.jsonl"
  fi
}

# ── Push to HF Hub via huggingface_hub Python API ───────────────────────────
push_dataset() {
  local repo="$1" path="$2"
  if [[ -z "${HF_TOKEN:-}" ]]; then
    log "  HF_TOKEN missing – saving locally only"
    return 0
  fi
  python3 - <<PYEOF
import os
from huggingface_hub import HfApi, create_repo
api = HfApi(token=os.environ["HF_TOKEN"])
try:
    # pass the token explicitly: module-level create_repo doesn't see
    # the token given to HfApi above
    create_repo("$repo", repo_type="dataset", exist_ok=True, private=False,
                token=os.environ["HF_TOKEN"])
except Exception as e:
    print(f"  create_repo: {type(e).__name__}: {e}")
api.upload_file(
    path_or_fileobj="$path",
    path_in_repo="$(basename "$path")",
    repo_id="$repo",
    repo_type="dataset",
    commit_message="self-improve: $(basename "$path") $(date -u +%Y%m%dT%H%MZ)",
)
print(f"  pushed → https://huggingface.co/datasets/$repo")
PYEOF
}

# ── Trigger next training round ─────────────────────────────────────────────
trigger_next_round() {
  local stage="$1" n="$2"
  log "  TRIGGER next training round (stage=$stage n=$n)"
  notify "$stage threshold reached ($n) → flagging next training round"
  python3 "$HFB/outcome-log.py" --daemon manual --trigger "self-improve-trigger-$stage" \
    --applied false --outcome queued \
    --lesson "$stage threshold reached ($n) → V8 training queued" || true
  # If Kaggle CLI ever returns to a working state, this is where we'd
  # call `kaggle kernels push`. For now, write a flag file the user
  # checks manually.
  echo "$(date -u +%Y%m%dT%H%MZ) $stage n=$n" >> "$STATE/training-queue.log"
}

# ── Dispatcher ──────────────────────────────────────────────────────────────
case "$CMD" in
  status) status ;;
  sft)    build_sft ;;
  kto)    build_kto ;;
  skills) build_skills ;;
  all)
    status
    build_skills
    build_sft
    build_kto
    ;;
  *) echo "usage: $0 {all|sft|kto|skills|status}" >&2; exit 2 ;;
esac

log "done"
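The KTO labeling rule in build_kto above ("True = applied & success, False = anything else") is the whole reason the log is lossless for preference data: every record gets a label, no pairing required. A standalone sketch of the rule (the records below are invented):

```python
def kto_label(rec: dict) -> bool:
    # Same rule as build_kto(): only applied-and-successful actions
    # become positive examples; rollbacks, rejections, and queued
    # actions all count as negatives
    return bool(rec.get("applied")) and rec.get("outcome") == "success"

rows = [
    {"applied": True,  "outcome": "success"},   # verifier passed, action stuck
    {"applied": True,  "outcome": "rollback"},  # applied, then reverted
    {"applied": False, "outcome": "rejected"},  # verifier blocked it
]
print([kto_label(r) for r in rows])  # → [True, False, False]
```

Note that a rejected-but-would-have-succeeded action still trains as a negative; the scheme trades that bias for never discarding a log line.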
@@ -0,0 +1,177 @@
| 1 |
+
#!/usr/bin/env python3
|
| 2 |
+
"""Surrogate-1 β single-shot call to the ZeroGPU endpoint with strict JSON parse.
|
| 3 |
+
|
| 4 |
+
Used by autonomous-sre.sh + autonomous-release.sh to ask Surrogate-1 for
|
| 5 |
+
a structured diagnosis/spec/patch. Returns parsed JSON on stdout, exits 0
|
| 6 |
+
if the response is valid + matches the schema, else non-zero.
|
| 7 |
+
|
| 8 |
+
Usage:
|
| 9 |
+
surrogate-call.py \
|
| 10 |
+
--space surrogate1/surrogate-1-zero-gpu \
|
| 11 |
+
--prompt-file /tmp/prompt.md \
|
| 12 |
+
--schema diagnosis|spec|patch \
|
| 13 |
+
[--max-tokens 1024] [--temperature 0.2] \
|
| 14 |
+
[--retries 2] [--out /tmp/response.json]
|
| 15 |
+
|
| 16 |
+
Env:
|
| 17 |
+
HF_TOKEN (or HF_TOKEN_PRO) β required for private/queued Space
|
| 18 |
+
SURROGATE_TIMEOUT_SEC=120 β per-call timeout
|
| 19 |
+
SURROGATE_RETRY_BACKOFF_SEC=15 β sleep between retries
|
| 20 |
+
"""
|
| 21 |
+
from __future__ import annotations
|
| 22 |
+
|
| 23 |
+
import argparse
|
| 24 |
+
import json
|
| 25 |
+
import os
|
| 26 |
+
import re
|
| 27 |
+
import sys
|
| 28 |
+
import time
|
| 29 |
+
from pathlib import Path
|
| 30 |
+
from urllib import request, error
|
| 31 |
+
|
| 32 |
+
TIMEOUT = int(os.environ.get("SURROGATE_TIMEOUT_SEC", "120"))
|
| 33 |
+
BACKOFF = int(os.environ.get("SURROGATE_RETRY_BACKOFF_SEC", "15"))
|
| 34 |
+
|
| 35 |
+
SCHEMAS = {
|
| 36 |
+
"diagnosis": {
|
| 37 |
+
"required": ["diagnosis", "fix_kind", "confidence"],
|
| 38 |
+
"fix_kind_enum": ["code", "iac", "shell", "sql", "none"],
|
| 39 |
+
"extras": ["patch", "target_file", "rollback", "test_plan"],
|
| 40 |
+
},
|
| 41 |
+
"spec": {
|
| 42 |
+
"required": ["title", "problem", "user_stories",
|
| 43 |
+
"acceptance_criteria", "impact", "confidence"],
|
| 44 |
+
"extras": ["competitors_observed", "out_of_scope", "rollout_plan"],
|
| 45 |
+
},
|
| 46 |
+
"patch": {
|
| 47 |
+
"required": ["target_file", "patch", "kind",
|
| 48 |
+
"test_plan", "rollback", "confidence"],
|
| 49 |
+
"extras": ["fix_kind", "diagnosis"],
|
| 50 |
+
},
|
| 51 |
+
}
|
| 52 |
+
|
| 53 |
+
|
| 54 |
+
def _hf_token() -> str | None:
|
| 55 |
+
return (os.environ.get("HF_TOKEN")
|
| 56 |
+
or os.environ.get("HF_TOKEN_PRO")
|
| 57 |
+
or os.environ.get("HF_TOKEN_PRO_WRITE"))
|
| 58 |
+
|
| 59 |
+
|
| 60 |
+
def _post_json(url: str, body: dict, token: str | None) -> dict:
|
| 61 |
+
headers = {"Content-Type": "application/json"}
|
| 62 |
+
if token:
|
| 63 |
+
headers["Authorization"] = f"Bearer {token}"
|
| 64 |
+
req = request.Request(url, data=json.dumps(body).encode(),
|
| 65 |
+
headers=headers, method="POST")
|
| 66 |
+
with request.urlopen(req, timeout=TIMEOUT) as resp:
|
| 67 |
+
return json.loads(resp.read().decode())
|
| 68 |
+
|
| 69 |
+
|
| 70 |
+
def _extract_json(text: str) -> dict | None:
|
| 71 |
+
# Try fenced ```json β¦ ``` first, then loose {...} sweep
|
| 72 |
+
m = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, flags=re.S)
|
| 73 |
+
candidates = [m.group(1)] if m else []
|
| 74 |
+
# also try the longest balanced {..} substring
|
| 75 |
+
depth = 0; start = -1; longest = ""
|
| 76 |
+
for i, ch in enumerate(text):
|
| 77 |
+
if ch == "{":
|
| 78 |
+
if depth == 0:
|
| 79 |
+
start = i
|
| 80 |
+
depth += 1
|
| 81 |
+
elif ch == "}":
|
| 82 |
+
depth -= 1
|
| 83 |
+
if depth == 0 and start >= 0:
|
| 84 |
+
blob = text[start:i + 1]
|
| 85 |
+
if len(blob) > len(longest):
|
| 86 |
+
longest = blob
|
| 87 |
+
if longest:
|
| 88 |
+
candidates.append(longest)
|
| 89 |
+
for c in candidates:
|
| 90 |
+
try:
|
| 91 |
+
return json.loads(c)
|
| 92 |
+
except Exception:
|
| 93 |
+
continue
|
| 94 |
+
return None
|
| 95 |
+
|
| 96 |
+
|
| 97 |
+
def _validate(parsed: dict, schema: str) -> tuple[bool, str]:
|
| 98 |
+
spec = SCHEMAS.get(schema)
|
| 99 |
+
if not spec:
|
| 100 |
+
return False, f"unknown schema: {schema}"
|
| 101 |
+
missing = [k for k in spec["required"] if k not in parsed]
|
| 102 |
+
if missing:
|
| 103 |
+
return False, f"missing required keys: {missing}"
|
| 104 |
+
if schema == "diagnosis":
|
| 105 |
+
if parsed.get("fix_kind") not in spec["fix_kind_enum"]:
|
| 106 |
+
return False, f"fix_kind must be one of {spec['fix_kind_enum']}"
|
| 107 |
+
try:
|
| 108 |
+
c = float(parsed.get("confidence", -1))
|
| 109 |
+
if not (0.0 <= c <= 1.0):
|
| 110 |
+
return False, f"confidence out of [0,1]: {c}"
|
| 111 |
+
except Exception:
|
| 112 |
+
return False, "confidence not numeric"
|
| 113 |
+
return True, "ok"
|
| 114 |
+
|
| 115 |
+
|
| 116 |
+
def _call_gradio(space: str, prompt: str, max_tokens: int,
|
| 117 |
+
temperature: float) -> str:
|
| 118 |
+
# Most Surrogate ZeroGPU Spaces expose /run/predict or /api/predict.
|
| 119 |
+
# Try modern /api/predict first, fall back to /run/predict.
|
| 120 |
+
base = f"https://{space.replace('/', '-')}.hf.space"
|
| 121 |
+
body = {"data": [prompt, "", max_tokens, temperature]}
|
| 122 |
+
    for path in ("/api/predict", "/run/predict"):
        try:
            r = _post_json(base + path, body, _hf_token())
            if isinstance(r, dict) and "data" in r and r["data"]:
                first = r["data"][0]
                if isinstance(first, str):
                    return first
                if isinstance(first, list) and first:
                    return str(first[0])
                return json.dumps(r)
        except error.HTTPError as e:
            if e.code in (404, 405):
                continue
            raise
    raise RuntimeError(f"no working endpoint on {base}")


def main() -> int:
    p = argparse.ArgumentParser()
    p.add_argument("--space", required=True,
                   help="HF Space repo, e.g. surrogate1/surrogate-1-zero-gpu")
    p.add_argument("--prompt-file", required=True)
    p.add_argument("--schema", required=True, choices=list(SCHEMAS.keys()))
    p.add_argument("--max-tokens", type=int, default=1024)
    p.add_argument("--temperature", type=float, default=0.2)
    p.add_argument("--retries", type=int, default=2)
    p.add_argument("--out", default=None)
    args = p.parse_args()

    prompt = Path(args.prompt_file).read_text()
    last_err = ""
    for attempt in range(args.retries + 1):
        try:
            raw = _call_gradio(args.space, prompt, args.max_tokens, args.temperature)
            parsed = _extract_json(raw)
            if parsed is None:
                last_err = f"no JSON in response (preview: {raw[:200]})"
            else:
                ok, msg = _validate(parsed, args.schema)
                if ok:
                    out = json.dumps(parsed, indent=2)
                    print(out)
                    if args.out:
                        Path(args.out).write_text(out)
                    return 0
                last_err = f"schema validation failed: {msg}"
        except Exception as e:
            last_err = f"{type(e).__name__}: {e}"
        if attempt < args.retries:
            time.sleep(BACKOFF * (attempt + 1))
    sys.stderr.write(f"surrogate-call failed: {last_err}\n")
    return 2


if __name__ == "__main__":
    sys.exit(main())
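The retry loop in `main()` sleeps `BACKOFF * (attempt + 1)` between attempts, i.e. linear backoff (2s, 4s, ... for `BACKOFF=2`). A minimal sketch of the same shape; names here are illustrative, and the `BACKOFF` value is an assumption (the real constant lives at the elided top of the script — a small value is used below only so the example runs fast):

```python
import time

BACKOFF = 0.1  # illustrative; the real script defines its own constant
RETRIES = 2


def call_with_retry(fn):
    """Try fn up to RETRIES+1 times with linear backoff; raise on exhaustion."""
    last_err = ""
    for attempt in range(RETRIES + 1):
        try:
            return fn()
        except Exception as e:
            last_err = f"{type(e).__name__}: {e}"
        if attempt < RETRIES:
            time.sleep(BACKOFF * (attempt + 1))  # 1x, then 2x BACKOFF

    raise RuntimeError(f"all attempts failed: {last_err}")


calls = []


def flaky():
    # fails twice, succeeds on the third call
    calls.append(1)
    if len(calls) < 3:
        raise ValueError("transient")
    return "ok"


print(call_with_retry(flaky))  # "ok" on the third attempt
```

The same shape appears in `main()` above, except the success path writes the validated JSON and returns 0 instead of returning a value.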
@@ -0,0 +1,404 @@
#!/usr/bin/env python3
"""Surrogate-1 — verifier ensemble (single source of truth for "safe to apply").

Used by autonomous-sre.sh + autonomous-release.sh BEFORE any action is
applied to the user's real systems. Returns a JSON verdict; the caller
applies only if verdict.ok == True.

Layers (each returns PASS / FAIL / SKIP):
  1. ast        — Python/JS AST parses
  2. lint       — ruff (.py) / shellcheck (.sh) / hadolint (Dockerfile)
                  / cfn-lint (CF) / tflint (TF)
  3. typecheck  — mypy / tsc if available
  4. tests      — pytest against the repo's tests/ dir if present
  5. policy     — refuse-list of destructive patterns (rm -rf /, DROP DATABASE,
                  iam:* on Resource: "*", DELETE FROM <table> without WHERE…)
  6. security   — semgrep --config=p/ci, gitleaks for secrets, cfn-guard if CF
  7. diff       — change must be reversible, scoped, ≤ MAX_LINES_CHANGED
  8. sandbox    — exec in throwaway docker/E2B if marked executable
  9. confidence — caller passes model logprob; threshold check

DECISION:
  ALL non-SKIP must be PASS, AND at least MIN_VERIFIERS_RUN actually executed.
  Any FAIL → ok=False with reasons.

Usage:
  verifier-ensemble.py \
      --change /path/to/patch.diff \
      --target /path/to/file/being/changed \
      --kind iac|code|sql|shell \
      --confidence 0.92 \
      --out /tmp/verdict.json
"""
from __future__ import annotations

import argparse
import json
import os
import re
import shlex
import subprocess
import sys
import tempfile
from dataclasses import dataclass, field, asdict
from pathlib import Path

MIN_VERIFIERS_RUN = int(os.environ.get("VERIFIER_MIN_RUN", "3"))
MAX_LINES_CHANGED = int(os.environ.get("VERIFIER_MAX_LINES", "300"))
CONFIDENCE_FLOOR = float(os.environ.get("VERIFIER_CONFIDENCE_FLOOR", "0.55"))

# Hard refuse list — patterns that auto-FAIL regardless of other checks.
# Each entry: (regex, reason). Sourced from research (autonomous-24x7.md
# §HardGuards) — 14+ canonical rules. NEVER auto-override these in code.
REFUSE_PATTERNS = [
    # 1. Filesystem destruction
    (r"\brm\s+-rf\s+/(?!tmp|var/tmp|home/[^/]+/\.surrogate)", "rm -rf on real fs root"),
    (r"\bchmod\s+-R\s+777\s+/(?!tmp)", "chmod 777 outside /tmp"),
    (r"\bchown\s+-R\s+\S+\s+/(?!tmp|home/)", "chown -R on system path"),
    # 2. Database destruction
    (r"\bDROP\s+(DATABASE|TABLE|SCHEMA)\b", "destructive SQL DDL"),
    (r"\bDELETE\s+FROM\b(?![^;]*\bWHERE\b)", "DELETE without WHERE"),
    (r"\bTRUNCATE\s+TABLE\b", "TRUNCATE TABLE"),
    # 3. IaC destructive ops on prod
    (r"\bterraform\s+destroy\b", "terraform destroy"),
    (r"\bterraform\s+(apply|plan).*\bworkspace.*\bprd\b", "terraform on prd workspace"),
    (r"\bcdk\s+destroy\b.*\b(prd|prod)\b", "cdk destroy on prod"),
    # 4. Cloud destructive ops
    (r"\baws\s+s3\s+rb\s+--force\b", "aws s3 rb --force"),
    (r"\baws\s+ec2\s+terminate-instances\b(?!.*--dry-run)", "ec2 terminate w/o dry-run"),
    (r"\baws\s+rds\s+delete-db-instance\b(?!.*--final-db-snapshot-identifier)",
     "rds delete w/o final snapshot"),
    (r"\baws\s+route53\s+change-resource-record-sets\b.*\bDELETE\b", "Route53 DELETE record"),
    # 5. Kubernetes destructive ops
    (r"\bkubectl\s+delete\s+ns\b", "kubectl delete namespace"),
    (r"\bkubectl\s+delete\s+\S+\s+\S*prod\S*\b", "kubectl delete *prod*"),
    (r"\bhelm\s+install\b.*\b(http://|registry\.\S+)\b(?!.*allowlist)",
     "helm install from non-allowlist registry"),
    # 6. Git/source destruction
    (r"\bgit\s+push\s+(--force|--force-with-lease).*\b(main|master|prod)\b",
     "force-push to main/prod"),
    (r"\bgit\s+filter-(branch|repo)\b", "git history rewrite"),
    # 7. IAM / auth weakening
    (r'"Action"\s*:\s*"\*".*"Resource"\s*:\s*"\*"', "IAM Allow * on *"),
    (r'"Effect"\s*:\s*"Allow".*"Principal"\s*:\s*"\*"', "IAM Allow Principal *"),
    (r"\baws\s+iam\s+(delete-user|delete-role|update-assume-role-policy)\b.*\b(admin|root|prod)\b",
     "IAM destructive op on privileged identity"),
    (r"\baws\s+ec2\s+revoke-security-group-(ingress|egress)\b.*\bprod\b",
     "revoke prod SG rule"),
    # 8. Disk / network
    (r"\bdd\s+if=/dev/(zero|random)\s+of=/dev/[shv]d", "raw disk overwrite"),
    (r"\biptables\s+-F\b", "iptables flush"),
    # 9. Untrusted execution
    (r"\b(curl|wget)\b\s+\S+\s*\|\s*(sudo\s+)?(bash|sh|zsh|python\d?)\b",
     "curl | sh from network"),
    (r"\bnpx\s+\S+\b(?!.*--package-lock-only)", "npx of untrusted package"),
    # 10. Secrets in patch (must never land)
    (r"AKIA[0-9A-Z]{16}", "AWS access key in patch"),
    (r"-----BEGIN\s+(RSA|OPENSSH|EC|DSA)\s+PRIVATE\s+KEY-----", "private key in patch"),
    (r"\bsk-[A-Za-z0-9]{32,}", "OpenAI/Anthropic-style API key"),
    (r"\bhf_[A-Za-z0-9]{34}\b", "HuggingFace token in patch"),
    # 11. MFA / security degradation
    (r"\baws\s+iam\s+deactivate-mfa-device\b", "MFA deactivation"),
    (r'"MultiFactorAuthPresent"\s*:\s*\{\s*"Bool"\s*:\s*"false"', "IAM bypass MFA"),
    # 12. Supply-chain risk
    # Lookahead placed BEFORE \S+ so a digest-pinned pull cannot slip through
    # via backtracking (with the lookahead after \S+, "docker pull img@sha256:x"
    # would still match).
    (r"\bdocker\s+pull\s+(?!\S*@sha256:)\S+", "docker pull without digest pin"),
]

# Destructive-class actions require >=0.95 confidence (from research §HardGuards)
DESTRUCTIVE_KEYWORDS = (
    "destroy", "delete", "drop", "truncate", "force-push", "rm -rf",
    "terminate", "revoke", "deactivate-mfa", "filter-branch",
)
DESTRUCTIVE_CONFIDENCE_FLOOR = float(
    os.environ.get("VERIFIER_DESTRUCTIVE_FLOOR", "0.95"))


def _is_destructive(change: str) -> bool:
    low = change.lower()
    return any(kw in low for kw in DESTRUCTIVE_KEYWORDS)


@dataclass
class CheckResult:
    name: str
    status: str  # PASS / FAIL / SKIP
    detail: str = ""

    def passed(self) -> bool:
        return self.status == "PASS"

    def failed(self) -> bool:
        return self.status == "FAIL"


@dataclass
class Verdict:
    ok: bool
    reasons: list[str] = field(default_factory=list)
    checks: list[CheckResult] = field(default_factory=list)
    n_pass: int = 0
    n_fail: int = 0
    n_skip: int = 0


def _run(cmd: list[str], timeout: int = 60, cwd: str | None = None) -> tuple[int, str, str]:
    try:
        p = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout, cwd=cwd)
        return p.returncode, p.stdout, p.stderr
    except subprocess.TimeoutExpired:
        return 124, "", "timeout"
    except FileNotFoundError:
        return 127, "", f"binary not found: {cmd[0]}"


def _have(binary: str) -> bool:
    return _run(["which", binary])[0] == 0


# ── Layer 1: AST parse ──────────────────────────────────────────────────────
def check_ast(target: Path, kind: str) -> CheckResult:
    if not target.exists():
        return CheckResult("ast", "SKIP", "target file does not exist yet")
    if kind == "code" and target.suffix == ".py":
        try:
            import ast
            ast.parse(target.read_text())
            return CheckResult("ast", "PASS", "python AST parses")
        except SyntaxError as e:
            return CheckResult("ast", "FAIL", f"py syntax: {e}")
    if kind == "code" and target.suffix in (".js", ".ts", ".tsx", ".jsx"):
        if _have("node"):
            rc, _, err = _run(["node", "--check", str(target)], timeout=15)
            return CheckResult("ast", "PASS" if rc == 0 else "FAIL", err.strip()[:200] or "ok")
        return CheckResult("ast", "SKIP", "node not installed")
    if kind == "shell" or target.suffix == ".sh":
        rc, _, err = _run(["bash", "-n", str(target)], timeout=15)
        return CheckResult("ast", "PASS" if rc == 0 else "FAIL", err.strip()[:200] or "ok")
    if kind == "iac" and target.suffix in (".yml", ".yaml", ".json"):
        try:
            txt = target.read_text()
            if target.suffix == ".json":
                json.loads(txt)
            else:
                import yaml  # type: ignore
                yaml.safe_load(txt)
            return CheckResult("ast", "PASS", "yaml/json parses")
        except Exception as e:
            return CheckResult("ast", "FAIL", f"parse: {e}")
    return CheckResult("ast", "SKIP", f"no AST parser for {target.suffix} (kind={kind})")


# ── Layer 2: lint ───────────────────────────────────────────────────────────
def check_lint(target: Path, kind: str) -> CheckResult:
    if not target.exists():
        return CheckResult("lint", "SKIP", "no file")
    sx = target.suffix
    if sx == ".py" and _have("ruff"):
        rc, out, _ = _run(["ruff", "check", str(target), "--quiet"], timeout=30)
        return CheckResult("lint", "PASS" if rc == 0 else "FAIL", out.strip()[:300] or "clean")
    if sx == ".sh" and _have("shellcheck"):
        rc, out, _ = _run(["shellcheck", "-S", "warning", str(target)], timeout=30)
        return CheckResult("lint", "PASS" if rc == 0 else "FAIL", out.strip()[:300] or "clean")
    if target.name == "Dockerfile" and _have("hadolint"):
        # no --no-fail here: with it hadolint always exits 0 and findings never FAIL
        rc, out, _ = _run(["hadolint", str(target)], timeout=30)
        return CheckResult("lint", "PASS" if rc == 0 else "FAIL", out.strip()[:300] or "clean")
    if kind == "iac" and "cf" in str(target).lower() and _have("cfn-lint"):
        rc, out, _ = _run(["cfn-lint", str(target)], timeout=60)
        return CheckResult("lint", "PASS" if rc == 0 else "FAIL", out.strip()[:300] or "clean")
    if kind == "iac" and target.suffix == ".tf" and _have("tflint"):
        rc, out, _ = _run(["tflint", str(target)], timeout=60)
        return CheckResult("lint", "PASS" if rc == 0 else "FAIL", out.strip()[:300] or "clean")
    return CheckResult("lint", "SKIP", "no linter for file type or binary missing")


# ── Layer 3: typecheck ──────────────────────────────────────────────────────
def check_typecheck(target: Path, kind: str) -> CheckResult:
    if not target.exists() or kind != "code":
        return CheckResult("typecheck", "SKIP", "n/a")
    if target.suffix == ".py" and _have("mypy"):
        rc, out, _ = _run(["mypy", "--ignore-missing-imports", "--no-error-summary",
                           str(target)], timeout=45)
        return CheckResult("typecheck", "PASS" if rc == 0 else "FAIL", out.strip()[:300] or "ok")
    if target.suffix in (".ts", ".tsx") and _have("tsc"):
        rc, out, _ = _run(["tsc", "--noEmit", "--allowJs", str(target)], timeout=60)
        return CheckResult("typecheck", "PASS" if rc == 0 else "FAIL", out.strip()[:300] or "ok")
    return CheckResult("typecheck", "SKIP", "no typechecker available")


# ── Layer 4: tests ──────────────────────────────────────────────────────────
def check_tests(target: Path, kind: str) -> CheckResult:
    repo = target.parent
    while repo != repo.parent and not (repo / ".git").exists():
        repo = repo.parent
    if not (repo / ".git").exists():
        return CheckResult("tests", "SKIP", "not a git repo")
    test_dir = next((repo / d for d in ("tests", "test", "__tests__") if (repo / d).is_dir()), None)
    if test_dir is None:
        return CheckResult("tests", "SKIP", "no tests/ dir")
    if _have("pytest"):
        rc, out, _ = _run(["pytest", "-x", "--tb=line", "-q", str(test_dir)],
                          timeout=180, cwd=str(repo))
        return CheckResult("tests", "PASS" if rc == 0 else "FAIL",
                           out.strip().splitlines()[-1][:200] if out.strip() else "no output")
    return CheckResult("tests", "SKIP", "pytest not installed")


# ── Layer 5: policy (refuse-list) ───────────────────────────────────────────
def check_policy(change: str) -> CheckResult:
    hits = []
    for pat, reason in REFUSE_PATTERNS:
        if re.search(pat, change, flags=re.IGNORECASE):
            hits.append(reason)
    if hits:
        return CheckResult("policy", "FAIL", f"refused: {'; '.join(hits)}")
    return CheckResult("policy", "PASS", "no refuse-list patterns matched")


# ── Layer 6: security ───────────────────────────────────────────────────────
def check_security(target: Path, change: str) -> CheckResult:
    detail = []
    # secrets — gitleaks if available, else regex fallback
    if _have("gitleaks"):
        with tempfile.NamedTemporaryFile("w", suffix=".diff", delete=False) as f:
            f.write(change)
            patch = f.name
        rc, out, _ = _run(["gitleaks", "detect", "--no-git", "--source", patch,
                           "--report-format", "json"], timeout=30)
        if rc != 0 and out.strip() and out.strip() != "[]":
            detail.append(f"gitleaks hit: {out[:200]}")
    else:
        for pat in (r"AKIA[0-9A-Z]{16}", r"AIza[0-9A-Za-z\-_]{35}",
                    r"sk-[a-zA-Z0-9]{32,}", r"hf_[a-zA-Z0-9]{34}"):
            if re.search(pat, change):
                detail.append(f"secret pattern: {pat[:20]}…")
    # semgrep
    if _have("semgrep") and target.exists():
        rc, out, _ = _run(["semgrep", "--config=p/ci", "--quiet", "--error",
                           "--timeout", "30", str(target)], timeout=90)
        if rc not in (0, 1):  # 1 = findings (recorded below); >1 = real error
            detail.append(f"semgrep err: {out[:120]}")
        elif rc == 1:
            detail.append(f"semgrep findings: {out.strip().splitlines()[-1][:150]}")
    # iac scanners
    if "cf" in str(target).lower() and _have("cfn-guard"):
        rules = os.environ.get("CFN_GUARD_RULES", "")
        if rules:
            rc, out, _ = _run(["cfn-guard", "validate", "-d", str(target), "-r", rules],
                              timeout=60)
            if rc != 0:
                detail.append(f"cfn-guard: {out[:200]}")
    if not detail:
        return CheckResult("security", "PASS", "no findings")
    return CheckResult("security", "FAIL", " | ".join(detail))


# ── Layer 7: diff sanity ────────────────────────────────────────────────────
def check_diff(change: str) -> CheckResult:
    lines = change.splitlines()
    add = sum(1 for ln in lines if ln.startswith("+") and not ln.startswith("+++"))
    rem = sum(1 for ln in lines if ln.startswith("-") and not ln.startswith("---"))
    total = add + rem
    if total == 0:
        return CheckResult("diff", "FAIL", "empty diff")
    if total > MAX_LINES_CHANGED:
        return CheckResult("diff", "FAIL",
                           f"{total} lines changed > limit {MAX_LINES_CHANGED}")
    files_changed = sum(1 for ln in lines if ln.startswith("+++ b/"))
    if files_changed > 8:
        return CheckResult("diff", "FAIL", f"{files_changed} files in one change > 8")
    return CheckResult("diff", "PASS", f"+{add}/-{rem} lines, {files_changed} files")


# ── Layer 8: sandbox exec (best-effort) ─────────────────────────────────────
def check_sandbox(target: Path, kind: str) -> CheckResult:
    if kind != "shell" or target.suffix != ".sh" or not target.exists():
        return CheckResult("sandbox", "SKIP", "not a shell script or no target")
    if not _have("docker"):
        # Fall back to a stripped environment; this is only a syntax check
        # (bash -n), not a real execution.
        rc, out, err = _run(["env", "-i", "PATH=/usr/bin:/bin",
                             "bash", "-c", f"set -e; bash -n {shlex.quote(str(target))}"],
                            timeout=10)
        return CheckResult("sandbox", "PASS" if rc == 0 else "FAIL",
                           (err or out).strip()[:200] or "ok-no-exec")
    # docker — run in network=none, read-only, dropped caps.
    # bash:5.2 image rather than alpine: alpine ships no bash, so the script
    # could never actually run there.
    rc, out, err = _run([
        "docker", "run", "--rm", "--network=none", "--read-only",
        "--cap-drop=ALL", "--memory=256m", "--cpus=0.5",
        "-v", f"{target}:/script.sh:ro",
        "bash:5.2", "bash", "-c", "bash /script.sh --dry-run --help 2>&1 | head -20",
    ], timeout=30)
    return CheckResult("sandbox", "PASS" if rc == 0 else "FAIL",
                       (out or err).strip()[:200] or "ok")


# ── Layer 9: confidence (with destructive-class escalation) ─────────────────
def check_confidence(conf: float | None, change: str) -> CheckResult:
    if conf is None:
        return CheckResult("confidence", "SKIP", "no confidence supplied")
    floor = CONFIDENCE_FLOOR
    if _is_destructive(change):
        floor = max(floor, DESTRUCTIVE_CONFIDENCE_FLOOR)
        suffix = " (destructive-class)"
    else:
        suffix = ""
    if conf < floor:
        return CheckResult("confidence", "FAIL",
                           f"{conf:.2f} below floor {floor}{suffix}")
    return CheckResult("confidence", "PASS", f"{conf:.2f} ≥ {floor}{suffix}")


# ── Orchestrator ────────────────────────────────────────────────────────────
def verify(change: str, target: Path, kind: str, confidence: float | None) -> Verdict:
    checks = [
        check_diff(change),                    # 7
        check_policy(change),                  # 5 — fail-fast hard
        check_ast(target, kind),               # 1
        check_lint(target, kind),              # 2
        check_typecheck(target, kind),         # 3
        check_tests(target, kind),             # 4
        check_security(target, change),        # 6
        check_sandbox(target, kind),           # 8
        check_confidence(confidence, change),  # 9 (with destructive escalation)
    ]
    n_pass = sum(c.passed() for c in checks)
    n_fail = sum(c.failed() for c in checks)
    n_skip = sum(c.status == "SKIP" for c in checks)
    reasons = [f"{c.name}: {c.detail}" for c in checks if c.failed()]
    n_run = n_pass + n_fail
    ok = (n_fail == 0) and (n_run >= MIN_VERIFIERS_RUN)
    if not ok and n_run < MIN_VERIFIERS_RUN:
        reasons.append(f"only {n_run} verifiers ran (min {MIN_VERIFIERS_RUN}) — install missing tools")
    return Verdict(ok=ok, reasons=reasons, checks=checks,
                   n_pass=n_pass, n_fail=n_fail, n_skip=n_skip)


def main() -> int:
    p = argparse.ArgumentParser()
    p.add_argument("--change", required=True,
                   help="path to unified-diff or raw patch text")
    p.add_argument("--target", required=True,
                   help="primary file the change applies to")
    p.add_argument("--kind", required=True, choices=["code", "iac", "sql", "shell"])
    p.add_argument("--confidence", type=float, default=None,
                   help="model logprob-derived confidence in [0,1]")
    p.add_argument("--out", default=None, help="write verdict JSON to this path")
    args = p.parse_args()

    change_path = Path(args.change)
    change_txt = change_path.read_text() if change_path.exists() else args.change
    verdict = verify(change_txt, Path(args.target), args.kind, args.confidence)
    j = json.dumps({
        "ok": verdict.ok,
        "reasons": verdict.reasons,
        "n_pass": verdict.n_pass,
        "n_fail": verdict.n_fail,
        "n_skip": verdict.n_skip,
        "checks": [asdict(c) for c in verdict.checks],
    }, indent=2)
    print(j)
    if args.out:
        Path(args.out).write_text(j)
    return 0 if verdict.ok else 1


if __name__ == "__main__":
    sys.exit(main())
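The DECISION rule in the docstring (any FAIL vetoes; SKIPs are ignored; at least MIN_VERIFIERS_RUN checks must actually execute) is small enough to check in isolation. A self-contained sketch with a stripped-down stand-in for `CheckResult`:

```python
from dataclasses import dataclass

MIN_VERIFIERS_RUN = 3  # mirrors the VERIFIER_MIN_RUN default


@dataclass
class CheckResult:
    name: str
    status: str  # PASS / FAIL / SKIP


def decide(checks: list[CheckResult]) -> bool:
    """ok iff no FAIL and enough checks actually ran (SKIPs don't count)."""
    n_pass = sum(c.status == "PASS" for c in checks)
    n_fail = sum(c.status == "FAIL" for c in checks)
    n_run = n_pass + n_fail
    return n_fail == 0 and n_run >= MIN_VERIFIERS_RUN


# three passes, one skip -> ok
print(decide([CheckResult("ast", "PASS"), CheckResult("lint", "PASS"),
              CheckResult("policy", "PASS"), CheckResult("tests", "SKIP")]))  # True
# a single FAIL vetoes everything
print(decide([CheckResult("ast", "PASS"), CheckResult("lint", "PASS"),
              CheckResult("policy", "FAIL")]))  # False
# only two verifiers actually ran -> below MIN_VERIFIERS_RUN, refuse
print(decide([CheckResult("ast", "PASS"), CheckResult("lint", "PASS")]))  # False
```

The third case is the important one: an environment with most tools missing degrades to "refuse" rather than "pass vacuously".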
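The refuse-list entries are plain `(regex, reason)` pairs matched with `re.IGNORECASE` in `check_policy`. For instance, the DELETE-without-WHERE rule behaves like this (pattern copied verbatim from REFUSE_PATTERNS above):

```python
import re

# negative lookahead: no WHERE before the next statement terminator
PAT = r"\bDELETE\s+FROM\b(?![^;]*\bWHERE\b)"


def refused(change: str) -> bool:
    return re.search(PAT, change, flags=re.IGNORECASE) is not None


print(refused("DELETE FROM users;"))                # True  — unscoped delete
print(refused("DELETE FROM users WHERE id = 42;"))  # False — scoped, allowed
```

The `[^;]*` is what bounds the lookahead to the current statement: a WHERE in a *later* statement of the same patch does not rescue an unscoped DELETE.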
@@ -0,0 +1,195 @@
#!/usr/bin/env bash
# Surrogate-1 — watchdog (research §autonomous-24x7 pattern 6).
#
# Runs as a SEPARATE process from autonomous-sre / autonomous-release. Its
# only job is to observe outcomes.jsonl + the daemon processes and KILL
# them if it detects:
#   - Loop / runaway (same trigger fires N times in M minutes)
#   - Audit-gap (daemon logs an "applied" outcome but verifier was missing)
#   - Failure cascade (≥X consecutive rollback/error outcomes)
#   - Disk fill (state dir > $STATE_GB_LIMIT GB)
#   - Outcome rate spike (≥X outcomes in 1 min → likely runaway)
#
# Watchdog must NEVER apply patches itself (no Surrogate calls, no patch
# tool). It only observes and kills. Restart of daemons is a human
# decision after reading the kill reason.
#
# Usage (run on a machine separate from the daemons in the hardened setup;
# for now we run it as a sibling process):
#   nohup bash bin/v2/watchdog.sh \
#     > $HOME/.surrogate/logs/watchdog.log 2>&1 &
set -uo pipefail
[[ -f "$HOME/.hermes/.env" ]] && { set -a; source "$HOME/.hermes/.env" 2>/dev/null; set +a; }

STATE="$HOME/.surrogate/state"
OUTCOMES="$STATE/outcomes.jsonl"
LOG="$HOME/.surrogate/logs/watchdog.log"
KILLED="$STATE/watchdog-killed"
mkdir -p "$STATE" "$(dirname "$LOG")"
export OUTCOMES  # check_audit_gap reads it from the environment

INTERVAL_SEC="${WD_INTERVAL_SEC:-60}"     # check every minute
LOOP_THRESHOLD_N="${WD_LOOP_N:-5}"        # same trigger ≥5×
LOOP_WINDOW_MIN="${WD_LOOP_WIN_MIN:-15}"  # in 15 min
CASCADE_THRESHOLD="${WD_CASCADE_N:-5}"    # ≥5 consecutive failures
RATE_SPIKE_PER_MIN="${WD_RATE_SPIKE:-30}" # ≥30 outcomes/min
STATE_GB_LIMIT="${WD_STATE_GB:-5}"
DAEMONS=(
  "autonomous-sre.sh"
  "autonomous-release.sh"
  "auto-swap-and-bench.sh"
)

log() { echo "[$(date '+%Y-%m-%dT%H:%M:%S')] $*" | tee -a "$LOG"; }
notify() {
  [[ -z "${DISCORD_WEBHOOK:-}" ]] && return
  curl -s -X POST -H "Content-Type: application/json" \
    -d "{\"content\":\"🚨 watchdog: $1\"}" \
    "$DISCORD_WEBHOOK" >/dev/null 2>&1 || true
}

kill_daemons() {
  local reason="$1"
  log "─── KILL: $reason ───"
  notify "KILL — $reason"
  : > "$KILLED"; date -u +%Y-%m-%dT%H:%M:%SZ >> "$KILLED"
  echo "$reason" >> "$KILLED"
  for d in "${DAEMONS[@]}"; do
    if pgrep -f "$d" >/dev/null; then
      log "  pkill -f $d"
      pkill -f "$d" || true
    fi
  done
  sleep 5
  for d in "${DAEMONS[@]}"; do
    if pgrep -f "$d" >/dev/null; then
      log "  pkill -9 -f $d (still alive)"
      pkill -9 -f "$d" || true
    fi
  done
}

# Detect: same trigger fires N times in M minutes
check_loop() {
  [[ ! -f "$OUTCOMES" ]] && return 0
  python3 - <<PYEOF
import json, datetime as dt, collections, sys
cutoff = dt.datetime.now(dt.timezone.utc) - dt.timedelta(minutes=$LOOP_WINDOW_MIN)
recent = collections.Counter()
for line in open("$OUTCOMES"):
    try:
        r = json.loads(line)
        # timestamps are logged as UTC; attach tzinfo so the comparison
        # against the aware cutoff doesn't raise TypeError
        ts = dt.datetime.strptime(r["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=dt.timezone.utc)
    except Exception:
        continue
    if ts < cutoff:
        continue
    recent[r.get("trigger", "?")] += 1
for trig, n in recent.items():
    if n >= $LOOP_THRESHOLD_N:
        sys.exit(11)  # loop detected
sys.exit(0)
PYEOF
  return $?
}

# Detect: ≥X consecutive non-success outcomes
check_cascade() {
  [[ ! -f "$OUTCOMES" ]] && return 0
  python3 - <<PYEOF
import json, sys
recent = []
for line in open("$OUTCOMES"):
    try:
        recent.append(json.loads(line))
    except Exception:
        continue
recent = recent[-$CASCADE_THRESHOLD:]
if len(recent) < $CASCADE_THRESHOLD:
    sys.exit(0)
if all(r.get("outcome") in ("rollback", "error") for r in recent):
    sys.exit(12)
sys.exit(0)
PYEOF
  return $?
}

# Detect: outcome rate spike (>X in last minute)
check_rate_spike() {
  [[ ! -f "$OUTCOMES" ]] && return 0
  python3 - <<PYEOF
import json, datetime as dt, sys
cutoff = dt.datetime.now(dt.timezone.utc) - dt.timedelta(minutes=1)
n = 0
for line in open("$OUTCOMES"):
    try:
        r = json.loads(line)
        ts = dt.datetime.strptime(r["ts"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=dt.timezone.utc)
    except Exception:
        continue
    if ts >= cutoff:
        n += 1
if n >= $RATE_SPIKE_PER_MIN:
    sys.exit(13)
sys.exit(0)
PYEOF
  return $?
}

# Detect: applied without a verdict (audit gap)
check_audit_gap() {
  [[ ! -f "$OUTCOMES" ]] && return 0
  python3 - <<'PYEOF'
import json, sys, os
gaps = 0
with open(os.environ["OUTCOMES"]) as f:
    for line in f.readlines()[-50:]:
        try:
            r = json.loads(line)
        except Exception:
            continue
        if r.get("applied") and not r.get("verdict"):
            gaps += 1
if gaps >= 3:
    sys.exit(14)
sys.exit(0)
PYEOF
  return $?
}

# Detect: state dir disk fill
check_disk() {
  local kb gb
  kb=$(du -sk "$STATE" 2>/dev/null | awk '{print $1}')
  kb=${kb:-0}  # guard against empty du output under set -u
  gb=$(( kb / 1048576 ))
  if (( gb > STATE_GB_LIMIT )); then
    log "state dir = ${gb}GB > limit ${STATE_GB_LIMIT}GB"
    return 15
  fi
  return 0
}

log "─── watchdog starting (interval=${INTERVAL_SEC}s) ───"
notify "watchdog online"

while true; do
  if [[ -f "$KILLED" ]]; then
    log "killed marker present — staying dormant. Remove $KILLED to re-arm."
    sleep "$INTERVAL_SEC"
    continue
  fi

  rc=0
  check_loop || rc=$?
  [[ $rc -eq 11 ]] && { kill_daemons "LOOP detected (≥$LOOP_THRESHOLD_N same trigger in ${LOOP_WINDOW_MIN}m)"; continue; }

  check_cascade || rc=$?
  [[ $rc -eq 12 ]] && { kill_daemons "CASCADE: $CASCADE_THRESHOLD consecutive rollback/error"; continue; }

  check_rate_spike || rc=$?
  [[ $rc -eq 13 ]] && { kill_daemons "RATE SPIKE: ≥$RATE_SPIKE_PER_MIN outcomes in 60s"; continue; }

  check_audit_gap || rc=$?
  [[ $rc -eq 14 ]] && { kill_daemons "AUDIT GAP: ≥3 applied actions without verdict"; continue; }

  check_disk || rc=$?
  [[ $rc -eq 15 ]] && { kill_daemons "DISK: $STATE >${STATE_GB_LIMIT}GB"; continue; }

  sleep "$INTERVAL_SEC"
done
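`check_loop` embeds its detection logic in a heredoc; the same sliding-window counter in plain Python, with synthetic timestamps so the behavior is easy to see. Count outcomes per trigger inside the window and trip when any trigger crosses the threshold:

```python
import collections
import datetime as dt

LOOP_THRESHOLD_N = 5   # same trigger >= 5 times
LOOP_WINDOW_MIN = 15   # within 15 minutes


def loop_detected(outcomes, now):
    """outcomes: dicts with tz-aware "ts" and "trigger" keys."""
    cutoff = now - dt.timedelta(minutes=LOOP_WINDOW_MIN)
    recent = collections.Counter(
        r["trigger"] for r in outcomes if r["ts"] >= cutoff)
    return any(n >= LOOP_THRESHOLD_N for n in recent.values())


now = dt.datetime(2026, 1, 1, 12, 0, tzinfo=dt.timezone.utc)
fresh = now - dt.timedelta(minutes=5)
stale = now - dt.timedelta(minutes=30)

# 5 firings of one trigger inside the window -> runaway, kill
print(loop_detected([{"trigger": "space-restart", "ts": fresh}] * 5, now))  # True
# same volume but outside the window -> fine
print(loop_detected([{"trigger": "space-restart", "ts": stale}] * 5, now))  # False
```

The real script derives `ts` by parsing each JSONL record's `"%Y-%m-%dT%H:%M:%SZ"` string; everything after that is this counter.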