# Model Card for Model ID
Pre-trained adapters for investigative question generation on police body-worn camera footage, designed to work with Qwen3-4B-Thinking-2507. Trained with reinforcement learning using Group Relative Policy Optimization (GRPO).
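For readers unfamiliar with GRPO: it samples a group of completions per prompt, scores each with a reward function, and normalizes rewards within the group instead of learning a value function. A minimal sketch of the group-relative advantage computation (illustrative only, not the project's actual training code):

```python
# Minimal sketch of GRPO's group-relative advantage computation.
# Illustrative only; actual training used a full GRPO trainer.
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-4):
    """For one prompt, GRPO samples a group of completions and scores each.
    Each completion's advantage is its reward normalized by the group's
    mean and standard deviation (no learned value function needed)."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled questions for one clip, scored by the reward function.
advantages = group_relative_advantages([1.0, 0.5, 0.0, 0.5])
```

Completions scoring above the group mean receive positive advantages and are reinforced; those below the mean are discouraged.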
## Usage
```python
from unsloth import FastLanguageModel

adapter_path = "ADAPTER/PATH"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=adapter_path,
    max_seq_length=4096,
    dtype=None,          # auto-detect dtype
    load_in_4bit=True,   # 4-bit quantization to reduce memory
)
FastLanguageModel.for_inference(model)
```
## Architecture
- Base Model: Qwen3-4B-Thinking-2507
## Training
- Trained on high-quality investigative questions and the corresponding chain-of-thought (CoT) tokens generated by DeepSeek V3.2 (Reasoner)
- Utilized a multi-objective reward function:
  - **Rule-based reward:** encourages structural integrity of the generated output. It assigns high values for correct use of the chain-of-thought and answer tags, applies a length penalty to overly short questions, and adds a keyword bonus that prioritizes interrogative words (e.g., how, describe, identify).
  - **Model-based reward:** to capture semantic nuance, a large foundation model serves as an automated judge. The judge scores each generated question on a binary scale {0, 1} based on its relevance to the specific video context and its investigative quality.
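The two reward components described above could be sketched as follows. This is illustrative only: the tag names, weights, thresholds, and judge-output parsing here are assumptions, not the exact values from the repository.

```python
import re

# Hypothetical keyword list and weights -- illustrative, not the repo's values.
INTERROGATIVE_KEYWORDS = ("how", "describe", "identify", "what", "why")

def rule_based_reward(completion: str) -> float:
    """Structural reward: tag integrity + length penalty + keyword bonus."""
    reward = 0.0
    # Reward a well-formed chain-of-thought block (<think>...</think> assumed,
    # matching the Qwen3-Thinking output style).
    if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
        reward += 1.0
    # Strip the CoT block; what remains is the generated question.
    question = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
    # Penalize very short questions.
    if len(question.split()) < 5:
        reward -= 0.5
    # Bonus for interrogative/investigative keywords.
    if any(k in question.lower() for k in INTERROGATIVE_KEYWORDS):
        reward += 0.5
    return reward

def parse_judge_score(judge_output: str) -> int:
    """Model-based reward: the judge replies with a binary 0/1 verdict;
    parse the last 0/1 digit it emits, defaulting to 0."""
    digits = re.findall(r"[01]", judge_output)
    return int(digits[-1]) if digits else 0
```

During GRPO training, the two components would be summed (possibly with weights) into a single scalar reward per sampled completion.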
## GitHub Repository
Full Codebase: https://github.com/Karish-Gupta/BodyCam-VQA/tree/main/fine_tuning