Qwen3-VL Pose Stage 1
This repository stores the stage 1 pose-only alignment artifacts for a Qwen3-VL based exercise feedback model.
What This Stage Does
Stage 1 focuses on aligning structured pose features to the Qwen3-VL token embedding space before running a later joint multimodal stage.
The model setup for this stage is:
- Base model: Qwen/Qwen3-VL-4B-Instruct
- Modalities used during training: pose + text
- Image branch: disabled for this stage
- Vision encoder: frozen
- Language model: frozen
- LoRA: disabled
- Pose adapter: enabled in last_linear mode
- Pose projector: trainable
- Pose placeholder tokens: 8
In this stage, pose features are encoded, projected into the Qwen embedding space, and injected into reserved pose token positions. Training is supervised with generated exercise descriptions and feedback text.
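The injection step described above can be illustrated with a minimal, framework-free sketch. Plain Python lists stand in for tensors, and `POSE_TOKEN_ID`, `inject_pose_embeddings`, and the toy shapes are hypothetical names and values chosen for illustration; they are not taken from this repository's code.

```python
# Hypothetical sketch: plain Python lists stand in for embedding tensors.
POSE_TOKEN_ID = 999_999   # assumed id of the reserved pose placeholder token
NUM_POSE_TOKENS = 8       # number of pose placeholder tokens used in this stage


def inject_pose_embeddings(input_ids, token_embeddings, pose_embeddings):
    """Return a copy of token_embeddings with placeholder rows replaced.

    input_ids:        list[int], one id per sequence position
    token_embeddings: list[list[float]], one row per sequence position
    pose_embeddings:  list[list[float]], one projected row per pose token
    """
    # Locate every reserved pose position in the sequence.
    positions = [i for i, tok in enumerate(input_ids) if tok == POSE_TOKEN_ID]
    if len(positions) != len(pose_embeddings):
        raise ValueError(
            f"expected {len(pose_embeddings)} pose placeholders, "
            f"found {len(positions)}"
        )
    # Copy the rows so the original text embeddings are left untouched.
    merged = [row[:] for row in token_embeddings]
    for position, pose_row in zip(positions, pose_embeddings):
        merged[position] = list(pose_row)
    return merged


# Toy example: 2 text tokens, 8 pose placeholders, 1 text token, hidden size 4.
input_ids = [11, 12] + [POSE_TOKEN_ID] * NUM_POSE_TOKENS + [13]
token_embeddings = [[0.0] * 4 for _ in input_ids]
pose_embeddings = [[float(k + 1)] * 4 for k in range(NUM_POSE_TOKENS)]
merged = inject_pose_embeddings(input_ids, token_embeddings, pose_embeddings)
```

In an actual training setup the same replacement would typically be done on batched tensors with a boolean mask over the placeholder positions; the list version only shows the bookkeeping.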
Data Used
Training used the following sources:
- processed/generated_descriptions.jsonl
- train/unimodal/training.csv
- train/unimodal/validation.csv
The supervision target is built from:
- response.description
- response.feedback
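As a rough illustration of how the two fields might be combined into one target string, the sketch below reads a single JSONL record and joins the description and feedback. The newline separator, the `build_target` helper, and the sample record are assumptions made for this example; only the `response.description` and `response.feedback` field names come from this card.

```python
import json


def build_target(record):
    """Join response.description and response.feedback into one string.

    The newline separator is an assumption; the actual training script
    may format the target differently.
    """
    response = record.get("response", {})
    parts = [response.get("description", ""), response.get("feedback", "")]
    # Skip missing or blank fields so the target has no stray separators.
    return "\n".join(part.strip() for part in parts if part and part.strip())


# Invented sample line, shaped like one row of a JSONL file.
line = json.dumps(
    {
        "response": {
            "description": "Squat with knees tracking over toes.",
            "feedback": "Keep the chest up.",
        }
    }
)
target = build_target(json.loads(line))
```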
Training Configuration
- Epochs: 5
- Per-device batch size: 1
- Gradient accumulation: 8
- Effective batch size: 8
- Learning rate: 5e-6
- Pose projector learning rate: 5e-6
- Pose adapter learning rate: 1e-6
- Warmup ratio: 0.05
- Max grad norm: 1.0
- Logging backend: TensorBoard
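The effective batch size and the per-component learning rates listed above can be checked with a short sketch. The single-device assumption and the name-matching rules in `learning_rate_for` are hypothetical; the card does not state how many devices were used or how parameter groups were built.

```python
PER_DEVICE_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 8
NUM_DEVICES = 1  # assumption: consistent with the reported effective batch size of 8

effective_batch_size = (
    PER_DEVICE_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * NUM_DEVICES
)

LEARNING_RATES = {
    "default": 5e-6,
    "pose_projector": 5e-6,
    "pose_adapter": 1e-6,
}


def learning_rate_for(param_name):
    """Pick a learning rate from a parameter name (hypothetical matching rules)."""
    # The leading dot lets ".pose_adapter." match at the start of a name too.
    if ".pose_adapter." in f".{param_name}":
        return LEARNING_RATES["pose_adapter"]
    if param_name.startswith("pose_projector."):
        return LEARNING_RATES["pose_projector"]
    return LEARNING_RATES["default"]
```

With a PyTorch optimizer, rates like these are usually supplied as separate parameter groups, each carrying its own `lr` entry.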
Final Metrics
- Train loss: 22.7354
- Eval loss: 2.6238
- Train samples: 8989
- Eval samples: 612
- Train runtime: 21133.94s
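For a sense of scale, the reported runtime can be converted into hours and an approximate per-sample cost. This sketch assumes the runtime covers all 5 epochs and that the 8989 train samples are each seen once per epoch; neither assumption is stated on this card.

```python
train_runtime_s = 21133.94
train_samples = 8989
epochs = 5  # from the training configuration

runtime_hours = train_runtime_s / 3600

# Assumption: every train sample is processed once per epoch.
samples_processed = train_samples * epochs
seconds_per_sample = train_runtime_s / samples_processed

print(f"{runtime_hours:.2f} h, {seconds_per_sample:.3f} s/sample")
# prints "5.87 h, 0.470 s/sample"
```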
Trainable Components
This stage trains only a small pose-side subset:
- pose_feature_encoder.pose_adapter.pose_embedding_projector.4.*
- pose_projector.output_gate_logit
- pose_projector.input_norm.*
- pose_projector.proj.*
Everything in the main Qwen language backbone remains frozen.
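One way to express this selection is to match parameter names against the patterns above and freeze everything else. The sketch below uses only the standard library; `is_trainable` and the example parameter names are hypothetical, and real parameter names may carry an extra prefix depending on how the model is wrapped.

```python
from fnmatch import fnmatchcase

# Patterns for the parameters this stage trains.
TRAINABLE_PATTERNS = [
    "pose_feature_encoder.pose_adapter.pose_embedding_projector.4.*",
    "pose_projector.output_gate_logit",
    "pose_projector.input_norm.*",
    "pose_projector.proj.*",
]


def is_trainable(param_name):
    """Return True if the name matches one of the trainable patterns."""
    return any(fnmatchcase(param_name, pattern) for pattern in TRAINABLE_PATTERNS)


# Hypothetical parameter names, for illustration only.
example_names = [
    "pose_projector.proj.weight",
    "pose_projector.output_gate_logit",
    "pose_feature_encoder.pose_adapter.pose_embedding_projector.4.bias",
    "pose_feature_encoder.pose_adapter.pose_embedding_projector.0.weight",
    "language_model.layers.0.self_attn.q_proj.weight",
]
trainable = [name for name in example_names if is_trainable(name)]
```

In PyTorch, the same predicate could drive a loop over `model.named_parameters()` that sets `requires_grad` on each parameter.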
Repository Contents
This repository is intended to keep the most relevant stage 1 artifacts:
- pose_projector.pt: learned pose projector weights
- pose_adapter.pt: stage 1 tuned pose adapter weights
- pose_bridge_config.json: pose token injection metadata
- stage_manifest.json: training-stage manifest
- training_args.bin: Hugging Face training arguments
- train_results.json: final train metrics
- eval_results.json: final eval metrics
- all_results.json: aggregate run metrics
- logs/: TensorBoard event files
Depending on what was uploaded, intermediate checkpoints or full Qwen weights may be omitted on purpose.
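Because some files may be absent, it can help to check a local copy of the repository before relying on a given artifact. The following is a minimal sketch; `check_artifacts` is a hypothetical helper and the local directory path is whatever location the repository was downloaded to.

```python
from pathlib import Path

# Artifact names as listed on this card.
EXPECTED_ARTIFACTS = [
    "pose_projector.pt",
    "pose_adapter.pt",
    "pose_bridge_config.json",
    "stage_manifest.json",
    "training_args.bin",
    "train_results.json",
    "eval_results.json",
    "all_results.json",
    "logs",
]


def check_artifacts(repo_dir):
    """Map each expected artifact name to whether it exists under repo_dir."""
    root = Path(repo_dir)
    return {name: (root / name).exists() for name in EXPECTED_ARTIFACTS}


def missing_artifacts(repo_dir):
    """List the expected artifacts that are not present under repo_dir."""
    return [name for name, present in check_artifacts(repo_dir).items() if not present]
```

Calling `missing_artifacts` on the download directory returns the names that still need to be fetched or that were left out of the upload.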
Intended Next Step
This stage is not the final model. It is the pose alignment stage before a later joint image + pose + text training stage.