YOLO11x-pose MMA Fighter Pose Estimator

Fine-tuned YOLO11x-pose for 17-keypoint pose estimation of fighters in MMA and boxing footage. Part of the fight-judge project, an AI pipeline for automated combat sports scoring.

Model Description

This model jointly detects fighters and estimates their 17 COCO keypoints in a single forward pass. It is the second stage in the fight-judge pipeline, providing skeleton sequences that feed into downstream action recognition.

  • Architecture: YOLO11x-pose (extra-large)
  • Task: Pose estimation (single class: fighter, 17 COCO keypoints)
  • Input: RGB image, 640×640 px
  • Output: Bounding boxes + 17 keypoints per fighter [x, y, visibility]
  • Base model: yolo11x-pose.pt (COCO pretrained), continued from epoch 90 checkpoint
  • Finetuned on: MMA Fighter Pose Estimation Dataset: 5,106 images, 10,155 fighter instances with auto-generated keypoint annotations
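Since this model's role in the pipeline is to emit skeleton sequences for downstream action recognition, it can help to see what that hand-off might look like. The sketch below is illustrative only (the function names and the box-relative normalization scheme are assumptions, not part of the released pipeline): it maps each fighter's 17 keypoints into the unit square of their bounding box and stacks frames into a sequence.

```python
def normalize_skeleton(kps, box):
    """Map one fighter's 17 (x, y) keypoints into the unit square of their
    bounding box (x1, y1, x2, y2), giving scale- and position-invariant
    coordinates. Hypothetical helper, not part of the released code."""
    x1, y1, x2, y2 = box
    w, h = max(x2 - x1, 1e-6), max(y2 - y1, 1e-6)
    return [((x - x1) / w, (y - y1) / h) for x, y in kps]

def frames_to_sequence(per_frame_kps, per_frame_boxes):
    """Collect T frames of keypoints into a T x 17 x 2 nested list,
    ready to feed a downstream sequence model."""
    return [normalize_skeleton(k, b)
            for k, b in zip(per_frame_kps, per_frame_boxes)]
```

A downstream action-recognition model would then consume the resulting T x 17 x 2 tensor per fighter.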

Keypoints (COCO 17-point schema)

| Index | Keypoint       | Index | Keypoint    |
|-------|----------------|-------|-------------|
| 0     | nose           | 9     | left_wrist  |
| 1     | left_eye       | 10    | right_wrist |
| 2     | right_eye      | 11    | left_hip    |
| 3     | left_ear       | 12    | right_hip   |
| 4     | right_ear      | 13    | left_knee   |
| 5     | left_shoulder  | 14    | right_knee  |
| 6     | right_shoulder | 15    | left_ankle  |
| 7     | left_elbow     | 16    | right_ankle |
| 8     | right_elbow    |       |             |
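For programmatic access, the schema above can be expressed as a name-to-index mapping (a small convenience sketch; the constant names are my own, not part of the model's API):

```python
# COCO 17-keypoint schema used by this model, ordered by index.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]
# Reverse lookup: keypoint name -> row index in the model output.
KEYPOINT_INDEX = {name: i for i, name in enumerate(COCO_KEYPOINTS)}
```

With this, `kps[KEYPOINT_INDEX["left_wrist"]]` selects a fighter's left wrist from a `[17, 2]` keypoint array instead of relying on the bare index 9.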

Training

Training was conducted in two phases on Kaggle (Tesla P100-PCIE-16GB):

Phase 1 (epochs 1–90): Initial finetuning from COCO pretrained yolo11x-pose.pt

Phase 2 (epochs 91–150): Resumed from the epoch 90 checkpoint with AdamW optimizer

| Parameter       | Value                    |
|-----------------|--------------------------|
| Total epochs    | 150 (resumed at ep. 91)  |
| Batch size      | 8                        |
| Image size      | 640                      |
| Optimizer       | AdamW                    |
| LR initial      | 0.001                    |
| LR final factor | 0.01                     |
| Warmup epochs   | 3                        |
| Weight decay    | 0.0005                   |
| Patience        | 50                       |
| Save period     | every 10 epochs          |
| AMP             | true                     |
| Hardware        | 1× Tesla P100-PCIE-16GB  |
| Training time   | ~7.4 hours               |

Augmentations: HSV jitter, horizontal flip (p=0.5), scale, mosaic, random erasing (p=0.4), RandAugment.
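The phase-2 configuration above maps roughly onto standard Ultralytics training arguments as follows. This is a sketch, not the released training script: the checkpoint and dataset filenames are placeholders, and the exact augmentation flag values beyond the listed probabilities are assumptions.

```python
# Phase-2 hyperparameters, mirroring the table above.
HPARAMS = dict(
    epochs=150, batch=8, imgsz=640, optimizer="AdamW",
    lr0=0.001, lrf=0.01, warmup_epochs=3, weight_decay=0.0005,
    patience=50, save_period=10, amp=True,
    fliplr=0.5, erasing=0.4,  # horizontal flip / random erasing from the list above
)

def resume_finetune(checkpoint="epoch90.pt", data="mma-pose.yaml"):
    """Continue finetuning from the epoch-90 checkpoint (sketch only;
    both paths are placeholders)."""
    from ultralytics import YOLO  # imported lazily: heavy dependency
    model = YOLO(checkpoint)
    return model.train(data=data, **HPARAMS)
```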

Evaluation Results

Evaluated on the held-out test split of the MMA Fighter Pose Estimation Dataset at epoch 150.

Detection (box)

| Metric           | Value  |
|------------------|--------|
| mAP50-95 (box)   | 0.9859 |
| mAP50 (box)      | 0.9950 |
| Precision        | 0.9990 |
| Recall           | 0.9995 |
| Val box loss     | 0.1687 |
| Val cls loss     | 0.1092 |

Pose (keypoints)

| Metric            | Value  |
|-------------------|--------|
| mAP50-95 (pose)   | 0.9198 |
| mAP50 (pose)      | 0.9932 |
| Precision (pose)  | 0.9924 |
| Recall (pose)     | 0.9907 |
| Val pose loss     | 0.5105 |
| Val kobj loss     | 0.0016 |

Training started from a strong checkpoint (epoch 90, pose mAP50-95 ≈ 0.875) and improved steadily to 0.920 by epoch 150, with losses still decreasing at the end of training.
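For context, pose mAP is computed against Object Keypoint Similarity (OKS) rather than box IoU: each keypoint's distance to ground truth is weighted by the object's scale and a per-keypoint tolerance. A minimal sketch of the standard COCO OKS formula (the sigmas are the standard COCO constants; the function itself is illustrative, not part of this repo):

```python
import math

# Standard COCO per-keypoint sigmas (nose ... right_ankle).
COCO_SIGMAS = [0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072,
               0.072, 0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089]

def oks(pred, gt, vis, area):
    """Object Keypoint Similarity between predicted and ground-truth
    17-point skeletons. pred/gt: lists of (x, y); vis: visibility flags
    (>0 means labeled); area: ground-truth box area in pixels^2."""
    num = den = 0.0
    for (px, py), (gx, gy), v, s in zip(pred, gt, vis, COCO_SIGMAS):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        num += math.exp(-d2 / (2.0 * area * (2.0 * s) ** 2))
        den += 1.0
    return num / den if den else 0.0
```

mAP50-95 (pose) then averages precision over OKS thresholds from 0.50 to 0.95, analogous to IoU thresholds for boxes.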

Dataset

MMA Fighter Pose Estimation Dataset

  • 5,106 images extracted from 20 UFC fights (stand-up phases only)
  • 640×640 px, YOLO-Pose format, single class: fighter
  • 10,186 fighter instances; 10,155 (99.7%) successfully labeled with 17 COCO keypoints
  • Keypoint annotations auto-generated using pretrained yolo11x-pose with IoU ≥ 0.6 matching against ground-truth fighter bboxes
  • Images sourced from the MMA Fighter Detection Dataset (CC BY-NC-SA 4.0)
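The pseudo-labeling step above (matching the pretrained model's poses to ground-truth fighter boxes at IoU ≥ 0.6) can be sketched as follows. The greedy one-to-one assignment strategy is an assumption; the dataset's actual matching code may differ.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def match_poses(gt_boxes, pred_boxes, thr=0.6):
    """Greedily assign each ground-truth fighter box the best-overlapping
    predicted pose, keeping matches with IoU >= thr.
    Returns {gt_index: pred_index}."""
    matches, used = {}, set()
    for gi, g in enumerate(gt_boxes):
        best, best_iou = None, thr
        for pi, p in enumerate(pred_boxes):
            if pi in used:
                continue  # each prediction labels at most one fighter
            v = iou(g, p)
            if v >= best_iou:
                best, best_iou = pi, v
        if best is not None:
            matches[gi] = best
            used.add(best)
    return matches
```

Ground-truth instances left unmatched under this threshold would go unlabeled, which is consistent with the 99.7% labeling rate reported above.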

Split:

| Split | Images | Fighter instances |
|-------|--------|-------------------|
| Train | 3,635  | ~7,250            |
| Valid | 980    | ~1,960            |
| Test  | 491    | ~980              |

Usage

```python
from ultralytics import YOLO

model = YOLO("hasanfaesal/yolov11x-pose-mma-fighter")

# Image inference
results = model("fight_frame.jpg")
results[0].show()

# Access keypoints directly
for r in results:
    keypoints = r.keypoints.xy    # [N, 17, 2] : x, y pixel coords
    visibility = r.keypoints.conf # [N, 17]    : confidence per keypoint
    boxes = r.boxes.xyxy          # [N, 4]     : bounding boxes

# Video inference (streamed frame by frame)
results = model("fight.mp4", stream=True)
for r in results:
    kps = r.keypoints.data  # [N, 17, 3] : x, y, conf
```

Limitations

  • Trained exclusively on UFC stand-up footage. Performance may degrade on:
    • Ground game sequences (grappling, guard, mount) where limbs are heavily occluded or interleaved
    • Non-UFC broadcasts with unusual camera angles or significant lens distortion
    • Extreme close-ups where only part of the fighter's body is visible
  • Keypoint annotations were auto-generated (pseudo-labels); rare edge cases may contain labeling noise
  • Visibility flags may be unreliable for highly occluded keypoints in clinch situations

Pipeline Context

YOLOv8s MMA Fighter Detector → [This model] → Action Recognition (planned)

See the fight-judge repository for the full pipeline including data preparation scripts and methodology.

Citation

If you use this model or the dataset, please cite:

```bibtex
@misc{faisal2025mmafighter,
  author    = {Hasan Faisal},
  title     = {MMA Fighter Detection Dataset},
  year      = {2025},
  publisher = {Mendeley Data},
  version   = {V1},
  doi       = {10.17632/c456bnk8bm.1}
}
```

License

  • Code and model weights: MIT License
  • Dataset: CC BY-NC-SA 4.0
