
Model Overview

  • Base model: Qwen/Qwen3-14B
  • Parameter count: 64,225,280 trainable / 14,832,532,480 total (0.433% trainable)
  • Track: Track 2
  • Adaptation method: LoRA (PEFT)
  • Task: Pokemon Red action prediction (completion-only SFT)

Provenance

This model is based on the publicly released Qwen/Qwen3-14B model. No modifications were made to the base weights. Task-specific behavior is introduced via LoRA adapters trained by the team.

Finetuning Data

The raw data was collected by directly playing the game.

  • Format: JSONL
  • File: small-lit/overfit_small-aicrowd-pokemon_red-training-data-14
  • Structure: system / user / assistant conversations
  • Content:
    • System: Pokemon Red action-inference rules and playbook constraints
    • User: Structured game state observations (title/dialog/field/battle)
    • Assistant: One action per step in the required "### Actions" format
  • Example fields per row:
    • id: string
    • history: list of {role, content} excluding the final assistant response
    • response: assistant response text to learn
    • messages: history + assistant (kept for inspection)
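To make the row schema concrete, here is a minimal sketch of constructing and round-tripping one JSONL row. All field values (the `id`, observation text, and action string) are placeholders invented for illustration, not real training data.

```python
import json

# Hypothetical example row matching the schema above; contents are placeholders.
row = {
    "id": "step-000123",
    "history": [
        {"role": "system", "content": "Pokemon Red action-inference rules ..."},
        {"role": "user", "content": "Field state: standing in Pallet Town, facing north ..."},
    ],
    "response": "### Actions\nUP",
}
# messages = history + the final assistant turn (kept for inspection)
row["messages"] = row["history"] + [
    {"role": "assistant", "content": row["response"]}
]

line = json.dumps(row)                   # one JSONL line
parsed = json.loads(line)
print(parsed["messages"][-1]["role"])    # → assistant
```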

Training (High-level)

  • Training type: Supervised fine-tuning (completion-only)
  • Objective: Predict the correct action completion given game state context
  • Loss masking:
    • Loss is applied only to the assistant completion tokens
    • System and user tokens are excluded from loss
  • Tokenization:
    • Uses tokenizer chat template when available
    • Falls back to a role-tagged plain-text prompt otherwise
    • Assistant responses always end with an EOS token
  • Max sequence length: 16384
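The loss-masking scheme above can be sketched in a few lines. This is an illustrative reconstruction, not the training code: token IDs and the EOS ID are made up, and `-100` is the standard ignore index for PyTorch cross-entropy.

```python
# Sketch of completion-only loss masking, assuming the prompt (system + user)
# and the assistant completion have already been tokenized separately.
IGNORE_INDEX = -100   # ignored by PyTorch cross-entropy loss
EOS_ID = 2            # placeholder; the real value comes from the tokenizer

def build_labels(prompt_ids, completion_ids):
    """Mask prompt tokens; compute loss only on the assistant completion."""
    completion = completion_ids + [EOS_ID]  # responses always end with EOS
    input_ids = prompt_ids + completion
    labels = [IGNORE_INDEX] * len(prompt_ids) + completion
    return input_ids, labels

input_ids, labels = build_labels([10, 11, 12], [20, 21])
print(labels)  # → [-100, -100, -100, 20, 21, 2]
```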

Training Configuration

  • Epochs: 15
  • Optimizer: AdamW (via Transformers Trainer default)
  • Learning rate: 5e-5
  • LR scheduler: Trainer default
  • Batch size (per device): 1
  • Gradient accumulation steps: 8
  • Precision: fp16
  • Gradient checkpointing: Enabled
  • Weight decay: 0.01
  • Warmup steps: 50
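The hyperparameters above map onto `transformers.TrainingArguments` roughly as follows. This is a sketch for orientation, not the exact training script; `output_dir` is an assumed placeholder.

```python
from transformers import TrainingArguments

# Sketch mapping the listed hyperparameters onto TrainingArguments.
# "out" is an assumed output path, not from the original run.
args = TrainingArguments(
    output_dir="out",                  # assumption
    num_train_epochs=15,
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    fp16=True,
    gradient_checkpointing=True,
    weight_decay=0.01,
    warmup_steps=50,
)
```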

LoRA Configuration

  • Rank (r): 16
  • Alpha: 32
  • Dropout: 0.0 (disabled in code)
  • Target modules:
    • q_proj, k_proj, v_proj, o_proj
    • gate_proj, up_proj, down_proj
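The adapter configuration above corresponds to a `peft.LoraConfig` along these lines (a sketch under the stated settings; `task_type` is the standard choice for causal LMs and is assumed here):

```python
from peft import LoraConfig

# Sketch of the LoRA configuration described above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",  # assumed; standard for decoder-only models
)
```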

Run Instructions

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-14B"
adapter_id = "small-lit/overfit_small-aicrowd-pokemon_red-lora-14B"

# Load the base model, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```

License & Usage

  • Base model license: Qwen3 License (see the original model card)
  • Adapter weights: Released for evaluation and research purposes only
  • This model is intended solely for use within the AIcrowd Orak Game Agent Challenge evaluation