# Fixed Levels Submission Dataset

This folder contains a fixed three-level OSINT benchmark set built on one shared base graph.

## Files

- `seed_fixed_levels.json`: master fixed seed with an expanded canonical graph and 30 fixed questions.
- `fixed_graph_questions.json`: extracted fixed dataset snapshot for submission packaging.
- `shared_config_fixed_levels.json`: run config used for generation and evaluation.
- `complete_dataset_qwen_generated.json`: full dataset produced after Qwen (`qwen3:2b` via Ollama) expands the graph.
- `qwen_swarm_eval_fixed_levels.json`: legacy Qwen swarm evaluation summary from the earlier, smaller version of this set.
- `qwen_swarm_benchmark_fixed_levels.json`: legacy benchmark output from the earlier, smaller version of this set.
- `leaderboard_fixed_levels.json`: leaderboard file for this dataset.
- `dashboard_fixed_levels.html`: interactive dashboard generated from the benchmark run.
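Because `fixed_graph_questions.json` is extracted from `seed_fixed_levels.json`, a quick consistency check before packaging can catch stale snapshots. The sketch below is a minimal example that rests on one assumption: both files keep their questions under a top-level `"questions"` key. That key name and the helper functions are hypothetical, not taken from this repo, so adjust them to the real schema.

```python
import json
from pathlib import Path

# Hypothetical: assumes both files store questions under this top-level key.
QUESTIONS_KEY = "questions"


def load_questions(path: Path) -> list:
    """Return the question list from a dataset JSON file."""
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    return data[QUESTIONS_KEY]


def snapshot_matches_seed(seed_path: Path, snapshot_path: Path) -> bool:
    """True if the snapshot holds exactly the seed's questions, in the same order."""
    return load_questions(seed_path) == load_questions(snapshot_path)


if __name__ == "__main__":
    base = Path("datasets/fixed_levels")
    seed = base / "seed_fixed_levels.json"
    snapshot = base / "fixed_graph_questions.json"
    # Only read the files if they exist, so the sketch fails gracefully elsewhere.
    if seed.is_file() and snapshot.is_file():
        print(len(load_questions(seed)), "questions in seed")
        print("snapshot matches seed:", snapshot_matches_seed(seed, snapshot))
    else:
        print("dataset files not found; run from the repo root")
```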

## Difficulty Design

- Easy: 10 questions. These reuse the multi-hop traces that formed the hard tier in the older set, so the previous ceiling is now the floor.
- Mid: 10 questions. Each question spans roughly 15-20 supporting nodes.
- High: 10 questions. Each question spans roughly 50 supporting nodes.

All 30 questions are fixed and share the same larger seeded graph.
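The level design above can be written as a small validation check. This sketch assumes a hypothetical question schema with a `"level"` field (`"easy"`, `"mid"`, or `"high"`) and a `"supporting_nodes"` list. Neither field name is confirmed here. The ±5 tolerance around "roughly 50" is an illustrative choice and is not part of the dataset spec.

```python
from collections import Counter

# Hypothetical schema: each question dict has "level" and "supporting_nodes".
EXPECTED_PER_LEVEL = 10
# Node-span targets from the Difficulty Design bullets. The (45, 55) window
# around "roughly 50" is an assumed tolerance, not a documented bound.
NODE_SPAN = {"mid": (15, 20), "high": (45, 55)}


def check_levels(questions: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the set looks consistent."""
    problems = []
    counts = Counter(q["level"] for q in questions)
    for level in ("easy", "mid", "high"):
        if counts[level] != EXPECTED_PER_LEVEL:
            problems.append(f"{level}: {counts[level]} questions, expected {EXPECTED_PER_LEVEL}")
    for i, q in enumerate(questions):
        bounds = NODE_SPAN.get(q["level"])
        if bounds is None:
            continue  # easy questions have no node-span target in this README
        n = len(set(q["supporting_nodes"]))  # count distinct nodes only
        lo, hi = bounds
        if not lo <= n <= hi:
            problems.append(f"question {i} ({q['level']}): {n} supporting nodes, expected ~{lo}-{hi}")
    return problems
```

Duplicate node IDs are collapsed with `set` so that a trace revisiting a node does not inflate its span.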

## Regenerate Artifacts

```bash
source ~/arl/bin/activate
cd /home/ritish/test1
PYTHONPATH=src python scripts/build_fixed_levels_dataset.py \
  --seed-file datasets/fixed_levels/seed_fixed_levels.json \
  --shared-config datasets/fixed_levels/shared_config_fixed_levels.json \
  --output-dir datasets/fixed_levels
```
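Before or after running the build, it can help to confirm that the inputs and the extracted snapshot are where the command expects them. The filenames below come from the Files section. Treating `fixed_graph_questions.json` as the script's output is an assumption, and the `missing_artifacts` helper is hypothetical.

```python
from pathlib import Path

# Inputs passed to the build command plus the snapshot it is assumed to write.
EXPECTED_FILES = (
    "seed_fixed_levels.json",
    "shared_config_fixed_levels.json",
    "fixed_graph_questions.json",
)


def missing_artifacts(output_dir: Path) -> list[str]:
    """Return the expected filenames that are absent from output_dir."""
    return [name for name in EXPECTED_FILES if not (output_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts(Path("datasets/fixed_levels"))
    print("all artifacts present" if not missing else f"missing: {missing}")
```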

## Evaluate Qwen Swarm

```bash
source ~/arl/bin/activate
cd /home/ritish/test1
PYTHONPATH=src osint-env eval \
  --config datasets/fixed_levels/shared_config_fixed_levels.json \
  --seed-file datasets/fixed_levels/seed_fixed_levels.json \
  --agent-mode swarm \
  --llm-provider ollama \
  --llm-model qwen3:2b \
  --episodes 15
```
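To script repeated runs, for example with a different model tag or episode count, the same invocation can be issued from Python. The flags and paths are copied from the command above. The `build_eval_cmd` and `run_eval` helpers are hypothetical, and the sketch assumes the `~/arl` virtualenv is already active so that `osint-env` is on `PATH`.

```python
import os
import subprocess

# Paths and flags mirror the shell command; the helpers are illustrative only.
BASE = "datasets/fixed_levels"


def build_eval_cmd(model: str = "qwen3:2b", episodes: int = 15) -> list[str]:
    """Argument list equivalent to the `osint-env eval` command above."""
    return [
        "osint-env", "eval",
        "--config", f"{BASE}/shared_config_fixed_levels.json",
        "--seed-file", f"{BASE}/seed_fixed_levels.json",
        "--agent-mode", "swarm",
        "--llm-provider", "ollama",
        "--llm-model", model,
        "--episodes", str(episodes),
    ]


def run_eval(model: str = "qwen3:2b", episodes: int = 15) -> int:
    """Run the eval from the repo root and return the exit code."""
    # Equivalent of the `PYTHONPATH=src` prefix in the shell snippet.
    env = {**os.environ, "PYTHONPATH": "src"}
    return subprocess.run(build_eval_cmd(model, episodes), env=env, check=False).returncode
```

Passing a list instead of a shell string avoids quoting issues with model tags such as `qwen3:2b`.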

## Benchmark + Dashboard

```bash
source ~/arl/bin/activate
cd /home/ritish/test1
PYTHONPATH=src osint-env benchmark \
  --config datasets/fixed_levels/shared_config_fixed_levels.json \
  --seed-file datasets/fixed_levels/seed_fixed_levels.json \
  --agent-mode swarm \
  --llm-provider ollama \
  --llm-model qwen3:2b \
  --episodes 15 \
  --name fixed_levels_qwen_swarm \
  --leaderboard datasets/fixed_levels/leaderboard_fixed_levels.json \
  --dashboard datasets/fixed_levels/dashboard_fixed_levels.html
```
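The dashboard is a static HTML file, so one way to view it after the benchmark finishes is to open it with Python's standard `webbrowser` module. This is a minimal sketch that assumes it is run from the repo root. The `dashboard_uri` helper is hypothetical.

```python
import webbrowser
from pathlib import Path

# Path taken from the --dashboard flag above.
DASHBOARD = Path("datasets/fixed_levels/dashboard_fixed_levels.html")


def dashboard_uri(path: Path = DASHBOARD) -> str:
    """Absolute file:// URI for the generated dashboard."""
    return path.resolve().as_uri()


if __name__ == "__main__":
    if DASHBOARD.is_file():
        webbrowser.open(dashboard_uri())
    else:
        print(f"{DASHBOARD} not found; run the benchmark command first")
```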