OSINT / datasets /fixed_levels /README.md
siddeshwar-kagatikar
fix(rewards): never crash GRPO on malformed completions
d814291

Fixed Levels Submission Dataset

This folder contains a fixed three-level OSINT benchmark set built on one shared base graph.

Files

  • seed_fixed_levels.json: master fixed seed with an expanded canonical graph and 30 fixed questions.
  • fixed_graph_questions.json: extracted fixed dataset snapshot for submission packaging.
  • shared_config_fixed_levels.json: run config used for generation and evaluation.
  • complete_dataset_qwen_generated.json: full dataset after Qwen (qwen3:2b via Ollama) expands the graph.
  • qwen_swarm_eval_fixed_levels.json: legacy Qwen swarm evaluation summary from the older smaller version of the set.
  • qwen_swarm_benchmark_fixed_levels.json: legacy benchmark output from the older smaller version of the set.
  • leaderboard_fixed_levels.json: leaderboard file for this dataset.
  • dashboard_fixed_levels.html: interactive dashboard generated from the benchmark run.

Difficulty Design

  • Easy: 10 questions. These now use the older hard-style multi-hop traces as the new floor.
  • Mid: 10 questions. Each question spans roughly 15-20 supporting nodes.
  • High: 10 questions. Each question spans roughly 50 supporting nodes.

All 30 questions are fixed and share the same larger seeded graph.

Regenerate Artifacts

source ~/arl/bin/activate
cd /home/ritish/test1
PYTHONPATH=src python scripts/build_fixed_levels_dataset.py \
  --seed-file datasets/fixed_levels/seed_fixed_levels.json \
  --shared-config datasets/fixed_levels/shared_config_fixed_levels.json \
  --output-dir datasets/fixed_levels

Evaluate Qwen Swarm

source ~/arl/bin/activate
cd /home/ritish/test1
PYTHONPATH=src osint-env eval \
  --config datasets/fixed_levels/shared_config_fixed_levels.json \
  --seed-file datasets/fixed_levels/seed_fixed_levels.json \
  --agent-mode swarm \
  --llm-provider ollama \
  --llm-model qwen3:2b \
  --episodes 15

Benchmark + Dashboard

source ~/arl/bin/activate
cd /home/ritish/test1
PYTHONPATH=src osint-env benchmark \
  --config datasets/fixed_levels/shared_config_fixed_levels.json \
  --seed-file datasets/fixed_levels/seed_fixed_levels.json \
  --agent-mode swarm \
  --llm-provider ollama \
  --llm-model qwen3:2b \
  --episodes 15 \
  --name fixed_levels_qwen_swarm \
  --leaderboard datasets/fixed_levels/leaderboard_fixed_levels.json \
  --dashboard datasets/fixed_levels/dashboard_fixed_levels.html