# Model Card for Model ID
Pre-trained adapters for investigative question generation on police body-worn camera footage, designed to work with Qwen3-4B-Thinking-2507. Trained with reinforcement learning using Group Relative Policy Optimization (GRPO).
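For readers unfamiliar with GRPO: it samples a group of completions per prompt, scores each with a reward function, and normalizes rewards within the group instead of learning a value function. A minimal sketch of the group-relative advantage computation (illustrative only, not the project's actual training code):

```python
# Minimal sketch of GRPO's group-relative advantage computation.
# Illustrative only; actual training used a full GRPO trainer.
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-4):
    """For one prompt, GRPO samples a group of completions and scores each.
    Each completion's advantage is its reward normalized by the group's
    mean and standard deviation (no learned value function needed)."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four sampled questions for one clip, scored by the reward function.
advantages = group_relative_advantages([1.0, 0.5, 0.0, 0.5])
```

Completions scoring above the group mean receive positive advantages and are reinforced; those below the mean are discouraged.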
## Usage
```python
from unsloth import FastLanguageModel

adapter_path = "ADAPTER/PATH"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=adapter_path,
    max_seq_length=4096,
    dtype=None,          # auto-detect dtype
    load_in_4bit=True,   # 4-bit quantization to reduce memory
)
FastLanguageModel.for_inference(model)
```
## Architecture
- Base Model: Qwen3-4B-Thinking-2507
## Training
- Trained on high-quality investigative questions and the corresponding chain-of-thought (CoT) tokens generated by DeepSeek V3.2 (Reasoner)
- Utilized a multi-objective reward function:
  - **Rule-based reward:** encourages structural integrity of the generated output. It assigns high values for correct use of the chain-of-thought and answer tags, applies a length penalty to overly short questions, and adds a keyword bonus that prioritizes interrogative words (e.g., how, describe, identify).
  - **Model-based reward:** to capture semantic nuance, a large foundation model serves as an automated judge. The judge scores each generated question on a binary scale {0, 1} based on its relevance to the specific video context and its investigative quality.
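The two reward components described above could be sketched as follows. This is illustrative only: the tag names, weights, thresholds, and judge-output parsing here are assumptions, not the exact values from the repository.

```python
import re

# Hypothetical keyword list and weights -- illustrative, not the repo's values.
INTERROGATIVE_KEYWORDS = ("how", "describe", "identify", "what", "why")

def rule_based_reward(completion: str) -> float:
    """Structural reward: tag integrity + length penalty + keyword bonus."""
    reward = 0.0
    # Reward a well-formed chain-of-thought block (<think>...</think> assumed,
    # matching the Qwen3-Thinking output style).
    if re.search(r"<think>.*?</think>", completion, flags=re.DOTALL):
        reward += 1.0
    # Strip the CoT block; what remains is the generated question.
    question = re.sub(r"<think>.*?</think>", "", completion, flags=re.DOTALL).strip()
    # Penalize very short questions.
    if len(question.split()) < 5:
        reward -= 0.5
    # Bonus for interrogative/investigative keywords.
    if any(k in question.lower() for k in INTERROGATIVE_KEYWORDS):
        reward += 0.5
    return reward

def parse_judge_score(judge_output: str) -> int:
    """Model-based reward: the judge replies with a binary 0/1 verdict;
    parse the last 0/1 digit it emits, defaulting to 0."""
    digits = re.findall(r"[01]", judge_output)
    return int(digits[-1]) if digits else 0
```

During GRPO training, the two components would be summed (possibly with weights) into a single scalar reward per sampled completion.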
## GitHub Repository
Full Codebase: https://github.com/Karish-Gupta/BodyCam-VQA/tree/main/fine_tuning