
Model Overview

  • Base model: Qwen/Qwen3-14B
  • Parameter count: 64,225,280 trainable / 14,832,532,480 total (0.433% trainable)
  • Track: Track 2
  • Adaptation method: LoRA (PEFT)
  • Task: Pokemon Red action prediction (completion-only SFT)

Provenance

This model is based on the publicly released Qwen/Qwen3-14B model. No modifications were made to the base weights. Task-specific behavior is introduced via LoRA adapters trained by the team.

Finetuning Data

The raw data was collected by directly playing the game.

  • Format: JSONL
  • File: small-lit/overfit_small-aicrowd-pokemon_red-training-data-14
  • Structure: system / user / assistant conversations
  • Content:
    • System: Pokemon Red action-inference rules and playbook constraints
    • User: Structured game state observations (title/dialog/field/battle)
    • Assistant: One action per step in the required "### Actions" format
  • Example fields per row:
    • id: string
    • history: list of {role, content} excluding the final assistant response
    • response: assistant response text to learn
    • messages: history + assistant (kept for inspection)
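To make the row schema concrete, here is a minimal sketch of constructing and round-tripping one JSONL row. All field values (the `id`, observation text, and action string) are placeholders invented for illustration, not real training data.

```python
import json

# Hypothetical example row matching the schema above; contents are placeholders.
row = {
    "id": "step-000123",
    "history": [
        {"role": "system", "content": "Pokemon Red action-inference rules ..."},
        {"role": "user", "content": "Field state: standing in Pallet Town, facing north ..."},
    ],
    "response": "### Actions\nUP",
}
# messages = history + the final assistant turn (kept for inspection)
row["messages"] = row["history"] + [
    {"role": "assistant", "content": row["response"]}
]

line = json.dumps(row)                   # one JSONL line
parsed = json.loads(line)
print(parsed["messages"][-1]["role"])    # → assistant
```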

Training (High-level)

  • Training type: Supervised fine-tuning (completion-only)
  • Objective: Predict the correct action completion given game state context
  • Loss masking:
    • Loss is applied only to the assistant completion tokens
    • System and user tokens are excluded from loss
  • Tokenization:
    • Uses tokenizer chat template when available
    • Falls back to a role-tagged plain-text prompt otherwise
    • Assistant responses always end with an EOS token
  • Max sequence length: 16384
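The loss-masking scheme above can be sketched in a few lines. This is an illustrative reconstruction, not the training code: token IDs and the EOS ID are made up, and `-100` is the standard ignore index for PyTorch cross-entropy.

```python
# Sketch of completion-only loss masking, assuming the prompt (system + user)
# and the assistant completion have already been tokenized separately.
IGNORE_INDEX = -100   # ignored by PyTorch cross-entropy loss
EOS_ID = 2            # placeholder; the real value comes from the tokenizer

def build_labels(prompt_ids, completion_ids):
    """Mask prompt tokens; compute loss only on the assistant completion."""
    completion = completion_ids + [EOS_ID]  # responses always end with EOS
    input_ids = prompt_ids + completion
    labels = [IGNORE_INDEX] * len(prompt_ids) + completion
    return input_ids, labels

input_ids, labels = build_labels([10, 11, 12], [20, 21])
print(labels)  # → [-100, -100, -100, 20, 21, 2]
```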

Training Configuration

  • Epochs: 15
  • Optimizer: AdamW (via Transformers Trainer default)
  • Learning rate: 5e-5
  • LR scheduler: Trainer default
  • Batch size (per device): 1
  • Gradient accumulation steps: 8
  • Precision: fp16
  • Gradient checkpointing: Enabled
  • Weight decay: 0.01
  • Warmup steps: 50
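The hyperparameters above map onto `transformers.TrainingArguments` roughly as follows. This is a sketch for orientation, not the exact training script; `output_dir` is an assumed placeholder.

```python
from transformers import TrainingArguments

# Sketch mapping the listed hyperparameters onto TrainingArguments.
# "out" is an assumed output path, not from the original run.
args = TrainingArguments(
    output_dir="out",                  # assumption
    num_train_epochs=15,
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    fp16=True,
    gradient_checkpointing=True,
    weight_decay=0.01,
    warmup_steps=50,
)
```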

LoRA Configuration

  • Rank (r): 16
  • Alpha: 32
  • Dropout: 0.0 (disabled in code)
  • Target modules:
    • q_proj, k_proj, v_proj, o_proj
    • gate_proj, up_proj, down_proj
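The adapter configuration above corresponds to a `peft.LoraConfig` along these lines (a sketch under the stated settings; `task_type` is the standard choice for causal LMs and is assumed here):

```python
from peft import LoraConfig

# Sketch of the LoRA configuration described above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",  # assumed; standard for decoder-only models
)
```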

Run Instructions

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-14B"
adapter_id = "small-lit/overfit_small-aicrowd-pokemon_red-lora-14B"

# Load the base model, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```

License & Usage

  • Base model license: Qwen3 License (see the original model card)
  • Adapter weights: Released for evaluation and research purposes only
  • This model is intended solely for use within the AIcrowd Orak Game Agent Challenge evaluation