Model Overview
- Base model: Qwen/Qwen3-14B
- Parameter count: 64,225,280 trainable / 14,832,532,480 total (trainable: 0.433%)
- Track: Track 2
- Adaptation method: LoRA (PEFT)
- Task: Pokemon Red action prediction (completion-only SFT)
Provenance
This model is based on the publicly released Qwen/Qwen3-14B model. No modifications were made to the base weights. Task-specific behavior is introduced via LoRA adapters trained by the team.
Finetuning Data
The raw data was collected by directly playing the game.
- Format: JSONL
- File: small-lit/overfit_small-aicrowd-pokemon_red-training-data-14
- Structure: system / user / assistant conversations
- Content:
- System: Pokemon Red action-inference rules and playbook constraints
- User: Structured game state observations (title/dialog/field/battle)
- Assistant: One action per step in the required "### Actions" format
- Example fields per row:
- id: string
- history: list of {role, content} excluding the final assistant response
- response: assistant response text to learn
- messages: history + assistant (kept for inspection)
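A single row of the JSONL file might look like the following sketch (the `id`, observation text, and action are illustrative values, not taken from the actual dataset):

```python
import json

# Illustrative example of one training row; field values are hypothetical.
row = {
    "id": "pokemon_red-000001",
    "history": [
        {"role": "system", "content": "Pokemon Red action-inference rules. Respond in the '### Actions' format."},
        {"role": "user", "content": "screen: field\nplayer: Pallet Town (5, 6)\nmenu: closed"},
    ],
    "response": "### Actions\nup",
}
# `messages` keeps the full conversation, including the assistant turn.
row["messages"] = row["history"] + [{"role": "assistant", "content": row["response"]}]

line = json.dumps(row)      # one JSON object per line in the JSONL file
restored = json.loads(line)
```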
Training (High-level)
- Training type: Supervised fine-tuning (completion-only)
- Objective: Predict the correct action completion given game state context
- Loss masking:
- Loss is applied only to the assistant completion tokens
- System and user tokens are excluded from loss
- Tokenization:
- Uses tokenizer chat template when available
- Falls back to a role-tagged plain-text prompt otherwise
- Assistant responses always end with an EOS token
- Max sequence length: 16384
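The completion-only masking above can be sketched as follows. Token ids here are toy values; in practice they come from the tokenizer's chat template. `-100` is the label id that Transformers' cross-entropy loss ignores.

```python
IGNORE_INDEX = -100  # label id skipped by the loss in Transformers

def mask_prompt_tokens(prompt_ids, completion_ids, eos_id):
    """Build inputs and labels so loss applies only to the completion + EOS."""
    input_ids = prompt_ids + completion_ids + [eos_id]
    labels = [IGNORE_INDEX] * len(prompt_ids) + completion_ids + [eos_id]
    return input_ids, labels

prompt = [101, 7, 8, 9]   # system + user tokens (toy ids)
completion = [42, 43]     # "### Actions ..." tokens (toy ids)
input_ids, labels = mask_prompt_tokens(prompt, completion, eos_id=2)
```

Every prompt position carries `-100`, so gradients flow only through the assistant completion and the trailing EOS token.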
Training Configuration
- Epochs: 15
- Optimizer: AdamW (via Transformers Trainer default)
- Learning rate: 5e-5
- LR scheduler: Trainer default (linear with warmup)
- Batch size (per device): 1
- Gradient accumulation steps: 8
- Precision: fp16
- Gradient checkpointing: Enabled
- Weight decay: 0.01
- Warmup steps: 50
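The settings above map roughly onto the following Transformers `TrainingArguments` (a sketch; the output path is hypothetical, and remaining arguments are left at their Trainer defaults):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs/pokemon-red-lora",  # hypothetical path
    num_train_epochs=15,
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size of 8 per device
    fp16=True,
    gradient_checkpointing=True,
    weight_decay=0.01,
    warmup_steps=50,
)
```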
LoRA Configuration
- Rank (r): 16
- Alpha: 32
- Dropout: 0.0 (disabled in code)
- Target modules:
- q_proj, k_proj, v_proj, o_proj
- gate_proj, up_proj, down_proj
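Expressed as a PEFT `LoraConfig`, the configuration above looks like this sketch (`task_type` is an assumption for a causal-LM setup):

```python
from peft import LoraConfig

# LoRA settings matching the configuration above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",  # assumption: causal language modeling
)
```

All attention and MLP projection matrices are adapted, which is the common choice for full-coverage LoRA on Qwen-style architectures.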
Run Instructions
Load the base model, attach the LoRA adapter with PEFT, and run inference:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-14B"
adapter_id = "small-lit/overfit_small-aicrowd-pokemon_red-lora-14B"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Example: generate one action from a game-state observation
# (the observation text below is illustrative, not from the dataset).
messages = [
    {"role": "user", "content": "screen: field\nplayer: Pallet Town (5, 6)"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```
License & Usage
- Base model license: Qwen3 License (see the original model card)
- Adapter weights: Released for evaluation and research purposes only
- This model is intended solely for use within the AIcrowd Orak Game Agent Challenge evaluation