Model Card for Asymmetric-Executor-Swarm
Model Summary
This model is a fine-tuned version of Qwen3-VL-8B-Instruct, specialized as a Low-Level Executor for large-scale Swarm Confrontation tasks (e.g., 15 vs 15 UAV/UGV battles).
Unlike traditional RL agents (e.g., MADDPG, QMIX) that rely on compact state vectors, this model processes raw visual observations to make decentralized tactical decisions. It operates within an Asymmetric Cognitive Architecture, receiving macro-tactical instructions from a High-Level Planner (Gemini 3 Pro) and grounding them into atomic combat maneuvers.
Model Details
Base Model: Qwen3-VL-8B-Instruct
Architecture: Decentralized Vision-Language Policy
Task: Multi-Agent Pathfinding, Adversarial Combat, Formation Control
Dataset: Expert Trajectories generated via Rule-based Self-Play (Curriculum Level 0-2)
Paper: Strategic Planning, Precise Execution: An Asymmetric Cognitive Architecture for Long-Horizon VLM Agents (ICML Submission)
Intended Use
This model is designed to control individual units in a distributed swarm system.
Input:
Visual Observation: Top-down local view (RGB) containing allies, enemies, and terrain.
Tactical Instruction: High-level command from the Planner (e.g., "Maintain Delta formation and engage flank").
Output:
A structured Chain-of-Thought (CoT) followed by an atomic action.
Situation Awareness: "Enemy detected at 2 o'clock, Ally at 9 o'clock."
Tactical Verification: "Am I in position? NO."
Action: Move(North-East) or Attack(Target_ID).
Performance
Evaluated in a 15 vs 15 heterogeneous swarm simulation:
Win Rate: Significantly outperforms standard RL baselines (MAPPO/QMIX) in complex obstacle environments.
Kill/Death (KD) Ratio: Demonstrates superior survivability through vision-based obstacle avoidance.
Robustness: Maintained formation integrity in 92% of engagement scenarios.
Training Data & Methodology
Data Source: 5,000 episodes of expert self-play (Blue Team vs. Red Team).
Curriculum Learning:
Stage 1: Basic Navigation & Obstacle Avoidance.
Stage 2: 1v1 and 3v3 Skirmishes.
Stage 3: Full-scale 15v15 Team Battles.
Fine-Tuning: LoRA (Rank 8, Alpha 16) on Qwen3-VL-8B-Instruct.
- Downloads last month
- 4
Model tree for Wuduandaun/curr_swarm
Base model
Qwen/Qwen3-VL-8B-Instruct