Model Card: Question-Generation Adapters for Body-Worn Camera Footage

Pre-trained adapters for question generation on police body-worn camera footage, designed to work with Qwen3-4B-Thinking-2507. Trained with reinforcement learning using GRPO (Group Relative Policy Optimization).
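The core idea of GRPO is group-relative credit assignment: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and standard deviation of its own group to form an advantage. A minimal sketch of that normalization step (reward values below are illustrative, not from this model's training run):

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's own mean and std,
    as in GRPO's group-relative advantage estimate."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled questions for one prompt, scored by the reward function:
advs = group_relative_advantages([0.2, 0.8, 0.5, 0.5])
```

Completions scoring above their group's mean get positive advantages and are reinforced; those below get negative advantages, without needing a separate value network.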

Usage

from unsloth import FastLanguageModel

adapter_path = "ADAPTER/PATH"  # path to the trained LoRA adapters

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = adapter_path,
    max_seq_length = 4096,
    dtype = None,          # let Unsloth auto-detect the dtype
    load_in_4bit = True,   # 4-bit quantization to reduce memory use
)

FastLanguageModel.for_inference(model)  # enable fast inference mode

Architecture

  • Base Model: Qwen3-4B-Thinking-2507

Training

  • Trained on high-quality investigative questions and the corresponding chain-of-thought (CoT) tokens generated by DeepSeek V3.2 (Reasoner)
  • Utilized a multi-objective reward function
  • Rule-based Reward: This component encourages the structural integrity of the generated output. It assigns high values for correct use of chain-of-thought and answer tags, applies a length penalty to overly short questions, and adds a keyword bonus that prioritizes interrogative words (e.g., how, describe, identify).
  • Model-based Reward: To capture semantic nuance, we employ a large foundation model as an automated judge. The judge evaluates each generated question on a binary scale {0, 1} based on its relevance to the specific video context and investigative quality.
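The rule-based component above can be sketched as a simple scoring function. This is a hedged illustration only: the tag name (`<think>`, as used by Qwen3-Thinking models), the keyword list, the length threshold, and the score magnitudes are assumptions, not the exact values used in training.

```python
import re

INTERROGATIVES = ("how", "describe", "identify", "what", "why")
MIN_WORDS = 8  # assumed minimum-length threshold

def rule_based_reward(completion: str) -> float:
    """Illustrative rule-based reward: structural tag check,
    length penalty, and interrogative-keyword bonus."""
    reward = 0.0
    # Structural check: a well-formed <think>...</think> CoT block
    # followed by a final question (tag name is an assumption).
    m = re.search(r"<think>(.*?)</think>\s*(.+)", completion, re.DOTALL)
    if not m:
        return reward
    reward += 1.0
    question = m.group(2).strip()
    # Length penalty: discourage overly short questions.
    if len(question.split()) < MIN_WORDS:
        reward -= 0.5
    # Keyword bonus: prioritize interrogative phrasing.
    if question.lower().startswith(INTERROGATIVES):
        reward += 0.5
    return reward

good = ("<think>The officer's approach matters.</think> "
        "Describe how the officer approached the vehicle and what was said first.")
bad = "No reasoning tags here."
```

In training, a score like this would be combined with the binary model-based judge score to form the multi-objective reward that GRPO optimizes.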

GitHub Repository

Full Codebase: https://github.com/Karish-Gupta/BodyCam-VQA/tree/main/fine_tuning
