Fixed Levels Submission Dataset

This folder contains a fixed three-level OSINT benchmark set built on one shared base graph.

Files

seed_fixed_levels.json: master fixed seed with an expanded canonical graph and 30 fixed questions.
fixed_graph_questions.json: extracted fixed dataset snapshot for submission packaging.
shared_config_fixed_levels.json: run config used for generation and evaluation.
complete_dataset_qwen_generated.json: full dataset after Qwen (qwen3:2b via Ollama) expands the graph.
qwen_swarm_eval_fixed_levels.json: legacy Qwen swarm evaluation summary from the older smaller version of the set.
qwen_swarm_benchmark_fixed_levels.json: legacy benchmark output from the older smaller version of the set.
leaderboard_fixed_levels.json: leaderboard file for this dataset.
dashboard_fixed_levels.html: interactive dashboard generated from the benchmark run.
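
As a quick sanity check before packaging, the file list above can be compared against what is actually on disk. The following is a minimal sketch, not part of the repository: the helper name `missing_files` is hypothetical, and the folder path is assumed to be `datasets/fixed_levels` relative to the repository root, as in the commands in this README.

```python
from pathlib import Path

# File names exactly as listed in this README.
EXPECTED_FILES = [
    "seed_fixed_levels.json",
    "fixed_graph_questions.json",
    "shared_config_fixed_levels.json",
    "complete_dataset_qwen_generated.json",
    "qwen_swarm_eval_fixed_levels.json",
    "qwen_swarm_benchmark_fixed_levels.json",
    "leaderboard_fixed_levels.json",
    "dashboard_fixed_levels.html",
]


def missing_files(folder):
    """Return the expected file names that are not present in `folder`.

    Hypothetical helper for illustration; it only checks existence,
    not file contents.
    """
    root = Path(folder)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed location, relative to the repository root.
    missing = missing_files("datasets/fixed_levels")
    print("missing:", missing if missing else "none")
```

An empty result means every listed artifact is present; generated files such as the dashboard will appear in the result until the benchmark has been run.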

Difficulty Design

Easy: 10 questions. These reuse the hard-style multi-hop traces from the older version of the set, which now serve as the difficulty floor.
Mid: 10 questions. Each question spans roughly 15-20 supporting nodes.
High: 10 questions. Each question spans roughly 50 supporting nodes.

All 30 questions are fixed and share the same larger seeded graph.
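
The 10/10/10 split described above can be verified with a short script. This is a sketch under stated assumptions: it assumes the questions are available as a list of dicts with a difficulty label under a `level` key holding `easy`, `mid`, or `high`. That key name and those label values are hypothetical and may not match the real schema of `fixed_graph_questions.json`, so adjust `level_key` and `EXPECTED_COUNTS` accordingly.

```python
from collections import Counter

# Per-level counts stated in the Difficulty Design section.
# The lowercase label spellings are an assumption.
EXPECTED_COUNTS = {"easy": 10, "mid": 10, "high": 10}


def check_level_counts(questions, level_key="level"):
    """Compare per-level question counts against EXPECTED_COUNTS.

    `questions` is assumed to be a list of dicts; `level_key` is a
    hypothetical field name, not taken from the dataset.
    Returns {level: (found, expected)} for every mismatching level.
    """
    found = Counter(str(q.get(level_key, "")).lower() for q in questions)
    levels = set(EXPECTED_COUNTS) | set(found)
    return {
        lvl: (found.get(lvl, 0), EXPECTED_COUNTS.get(lvl, 0))
        for lvl in sorted(levels)
        if found.get(lvl, 0) != EXPECTED_COUNTS.get(lvl, 0)
    }
```

An empty dict means the split matches; any other result names the levels whose counts differ, including unexpected labels.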

Regenerate Artifacts

source ~/arl/bin/activate
cd /home/ritish/test1
PYTHONPATH=src python scripts/build_fixed_levels_dataset.py \
  --seed-file datasets/fixed_levels/seed_fixed_levels.json \
  --shared-config datasets/fixed_levels/shared_config_fixed_levels.json \
  --output-dir datasets/fixed_levels

Evaluate Qwen Swarm

source ~/arl/bin/activate
cd /home/ritish/test1
PYTHONPATH=src osint-env eval \
  --config datasets/fixed_levels/shared_config_fixed_levels.json \
  --seed-file datasets/fixed_levels/seed_fixed_levels.json \
  --agent-mode swarm \
  --llm-provider ollama \
  --llm-model qwen3:2b \
  --episodes 15

Benchmark + Dashboard

source ~/arl/bin/activate
cd /home/ritish/test1
PYTHONPATH=src osint-env benchmark \
  --config datasets/fixed_levels/shared_config_fixed_levels.json \
  --seed-file datasets/fixed_levels/seed_fixed_levels.json \
  --agent-mode swarm \
  --llm-provider ollama \
  --llm-model qwen3:2b \
  --episodes 15 \
  --name fixed_levels_qwen_swarm \
  --leaderboard datasets/fixed_levels/leaderboard_fixed_levels.json \
  --dashboard datasets/fixed_levels/dashboard_fixed_levels.html
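
For scripted runs (for example, sweeping several models), the benchmark invocation above can be assembled programmatically. This is only a sketch: the function `benchmark_command` is hypothetical, and it uses just the flags and paths shown in the command above. It builds and prints the command rather than running it; an actual run would still need the virtual environment activated and `PYTHONPATH=src` set, as in the shell version.

```python
import shlex


def benchmark_command(name, model="qwen3:2b", episodes=15,
                      base="datasets/fixed_levels"):
    """Build the `osint-env benchmark` argument list shown in this README.

    Hypothetical helper; only flags that appear in the README are used.
    """
    return [
        "osint-env", "benchmark",
        "--config", f"{base}/shared_config_fixed_levels.json",
        "--seed-file", f"{base}/seed_fixed_levels.json",
        "--agent-mode", "swarm",
        "--llm-provider", "ollama",
        "--llm-model", model,
        "--episodes", str(episodes),
        "--name", name,
        "--leaderboard", f"{base}/leaderboard_fixed_levels.json",
        "--dashboard", f"{base}/dashboard_fixed_levels.html",
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection or copy-paste.
    print(shlex.join(benchmark_command("fixed_levels_qwen_swarm")))
```

Returning a list rather than a single string keeps each argument separate, so the result can be passed directly to `subprocess.run` without shell quoting concerns.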