OGrohit committed on
Commit
580afcf
·
1 Parent(s): 1972eae

Updated README

Files changed (1): README.md +46 -41
README.md CHANGED
@@ -409,48 +409,53 @@ tags:
 
 ## 12. Baseline Inference Script
 
- `inference.py` uses an OpenAI-compatible client with configurable provider settings to run `llama-3.3-70b-versatile` as a zero-shot agent against all 3 tasks and reports scores.
 
- ```python
- # inference.py (structure)
- import os
- from openai import OpenAI
- import requests
-
- BASE_URL = os.getenv("ENV_URL", "http://localhost:7860")
- client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
-
- def run_task(task_id: str) -> float:
-     # reset environment and get the first observation
-     obs = requests.post(f"{BASE_URL}/reset", json={"task": task_id}).json()
-
-     done = False
-     while not done:
-         # build prompt from observation
-         prompt = build_prompt(obs)
-
-         # call the LLM
-         response = client.chat.completions.create(
-             model="llama-3.3-70b-versatile",
-             messages=[{"role": "user", "content": prompt}]
-         )
-
-         # parse action from the response
-         action = parse_action(response.choices[0].message.content)
-
-         # step the environment
-         result = requests.post(f"{BASE_URL}/step", json=action).json()
-         obs = result
-         done = result["done"]
-
-     # get the final grader score
-     score = requests.post(f"{BASE_URL}/grader").json()["score"]
-     return score
-
- if __name__ == "__main__":
-     for task in ["single_crash", "cascading_failure", "silent_degradation"]:
-         score = run_task(task)
-         print(f"{task}: {score:.3f}")
- ```
 
 
+ `inference.py` uses an OpenAI-compatible client with configurable provider settings to run any LLM (default: `meta-llama/Llama-3.3-70B-Instruct` via the Hugging Face router) as a zero-shot SRE agent against all 3 tasks and reports structured scores.
+
+ ### Environment Variables
+
+ | Variable | Default | Description |
+ |---|---|---|
+ | `HF_TOKEN` | *(required)* | API key for the LLM provider |
+ | `API_BASE_URL` | `https://router.huggingface.co/v1` | API endpoint |
+ | `MODEL_NAME` | `meta-llama/Llama-3.3-70B-Instruct` | Model identifier |
+ | `ENV_URL` | `http://localhost:7860` | LogTriageEnv server |
+
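The table above maps directly onto a small startup configuration block. A minimal sketch of how these variables might be resolved (the variable names and defaults come from the table; the exact structure inside `inference.py` is an assumption):

```python
import os

def load_config(env=os.environ):
    """Resolve provider settings using the defaults from the table above.

    Sketch only: how inference.py actually organizes this is an assumption.
    """
    if "HF_TOKEN" not in env:
        # HF_TOKEN has no default; fail fast with a clear message
        raise KeyError("HF_TOKEN is required (API key for the LLM provider)")
    return {
        "api_key": env["HF_TOKEN"],
        "api_base_url": env.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        "model_name": env.get("MODEL_NAME", "meta-llama/Llama-3.3-70B-Instruct"),
        "env_url": env.get("ENV_URL", "http://localhost:7860"),
    }
```

Because everything funnels through one dict, switching providers is just a matter of exporting different values, as the Usage section below shows.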
+ ### Key Features
+
+ - **System prompt** — Structured SRE triage persona with an action schema enforced via JSON output
+ - **Conversation history** — Bounded to 8 turns to stay within context limits
+ - **Fallback logic** — Heuristic fallback when the LLM call fails or its output fails to parse; avoids episode crashes
+ - **Step rate limiting** — 200 ms sleep between steps to avoid provider rate limits
+ - **Health check** — Validates that the environment is reachable before running tasks
+ - **Seeded reproducibility** — All tasks run with `seed=42`
+
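Two of the features above, the bounded history and the parse fallback, fit in a few lines. The function names, the action schema, and the `read_logs` default below are illustrative assumptions, not `inference.py`'s actual API:

```python
import json

MAX_TURNS = 8  # matches the "bounded to 8 turns" feature above
FALLBACK_ACTION = {"action": "read_logs"}  # hypothetical safe default action

def trim_history(messages, max_turns=MAX_TURNS):
    """Keep the system prompt plus only the last max_turns messages."""
    system, rest = messages[:1], messages[1:]
    return system + rest[-max_turns:]

def parse_action(reply):
    """Parse a JSON action from the model reply, with a heuristic fallback."""
    try:
        action = json.loads(reply)
        if isinstance(action, dict) and "action" in action:
            return action
    except json.JSONDecodeError:
        pass
    return dict(FALLBACK_ACTION)  # keep the episode alive instead of crashing
```

The fallback matters because a single malformed reply would otherwise abort the episode and zero out the task score.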
+ ### Usage
+
+ ```bash
+ export HF_TOKEN=your_key_here
+ export API_BASE_URL=https://api.groq.com/openai/v1  # or keep the default HF router
+ export MODEL_NAME=llama-3.3-70b-versatile
+
+ python inference.py
+ ```
+
+ ### Output
+
+ The script prints a per-task score bar and a JSON block with the full breakdown:
+
+ ```json
+ {
+   "api_base_url": "https://api.groq.com/openai/v1",
+   "model_name": "llama-3.3-70b-versatile",
+   "seed": 42,
+   "results": [
+     { "task_id": "single_crash", "score": 1.0, "steps_taken": 5, "breakdown": {} },
+     { "task_id": "cascading_failure", "score": 0.65, "steps_taken": 9, "breakdown": {} },
+     { "task_id": "silent_degradation", "score": 1.0, "steps_taken": 12, "breakdown": {} }
+   ],
+   "average_score": 0.8833,
+   "runtime_seconds": 45.2
+ }
+ ```
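The `average_score` field is just the mean of the per-task scores, so a report can be sanity-checked after a run. A small sketch (the inline string stands in for the JSON block above):

```python
import json
import statistics

# Stand-in for the JSON report shown above, trimmed to the relevant fields
report = json.loads("""
{
  "results": [
    {"task_id": "single_crash", "score": 1.0},
    {"task_id": "cascading_failure", "score": 0.65},
    {"task_id": "silent_degradation", "score": 1.0}
  ],
  "average_score": 0.8833
}
""")

# Recompute the mean and compare against the reported value
avg = statistics.mean(r["score"] for r in report["results"])
assert round(avg, 4) == report["average_score"]
```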
 
 ---