Model Card for Asymmetric-Executor-Swarm

Model Summary

This model is a fine-tuned version of Qwen3-VL-8B-Instruct, specialized as a Low-Level Executor for large-scale Swarm Confrontation tasks (e.g., 15 vs 15 UAV/UGV battles).

Unlike traditional RL agents (e.g., MADDPG, QMIX) that rely on compact state vectors, this model processes raw visual observations to make decentralized tactical decisions. It operates within an Asymmetric Cognitive Architecture, receiving macro-tactical instructions from a High-Level Planner (Gemini 3 Pro) and grounding them into atomic combat maneuvers.

Model Details

Base Model: Qwen3-VL-8B-Instruct

Architecture: Decentralized Vision-Language Policy

Task: Multi-Agent Pathfinding, Adversarial Combat, Formation Control

Dataset: Expert Trajectories generated via Rule-based Self-Play (Curriculum Level 0-2)

Paper: Strategic Planning, Precise Execution: An Asymmetric Cognitive Architecture for Long-Horizon VLM Agents (ICML Submission)

Intended Use

This model is designed to control individual units in a distributed swarm system.

Input:

Visual Observation: Top-down local view (RGB) containing allies, enemies, and terrain.

Tactical Instruction: High-level command from the Planner (e.g., "Maintain Delta formation and engage flank").

Output:

A structured Chain-of-Thought (CoT) followed by an atomic action.

Situation Awareness: "Enemy detected at 2 o'clock, Ally at 9 o'clock."

Tactical Verification: "Am I in position? NO."

Action: Move(North-East) or Attack(Target_ID).

Performance

Evaluated in a 15 vs 15 heterogeneous swarm simulation:

Win Rate: Significantly outperforms standard RL baselines (MAPPO/QMIX) in complex obstacle environments.

Kill/Death (KD) Ratio: Demonstrates superior survivability through vision-based obstacle avoidance.

Robustness: Maintained formation integrity in 92% of engagement scenarios.

Training Data & Methodology

Data Source: 5,000 episodes of expert self-play (Blue Team vs. Red Team).

Curriculum Learning:

Stage 1: Basic Navigation & Obstacle Avoidance.

Stage 2: 1v1 and 3v3 Skirmishes.

Stage 3: Full-scale 15v15 Team Battles.

Fine-Tuning: LoRA (Rank 8, Alpha 16) on Qwen3-VL-8B-Instruct.

Downloads last month: 4

Safetensors

Model size

9B params

Tensor type

BF16

Model tree for Wuduandaun/curr_swarm

Base model

Qwen/Qwen3-VL-8B-Instruct

Finetuned

(238)

this model