YOLO11x-pose MMA Fighter Pose Estimator

Fine-tuned YOLO11x-pose for 17-keypoint pose estimation of fighters in MMA and boxing footage. Part of the fight-judge project, an AI pipeline for automated combat sports scoring.

Model Description

This model jointly detects fighters and estimates their 17 COCO keypoints in a single forward pass. It is the second stage in the fight-judge pipeline, providing skeleton sequences that feed into downstream action recognition.

  • Architecture: YOLO11x-pose (extra-large)
  • Task: Pose estimation (single class: fighter, 17 COCO keypoints)
  • Input: RGB image, 640×640 px
  • Output: Bounding boxes + 17 keypoints per fighter [x, y, visibility]
  • Base model: yolo11x-pose.pt (COCO pretrained), continued from epoch 90 checkpoint
  • Finetuned on: MMA Fighter Pose Estimation Dataset: 5,106 images, 10,155 fighter instances with auto-generated keypoint annotations
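Since this model's role in the pipeline is to emit skeleton sequences for downstream action recognition, it can help to see what that hand-off might look like. The sketch below is illustrative only (the function names and the box-relative normalization scheme are assumptions, not part of the released pipeline): it maps each fighter's 17 keypoints into the unit square of their bounding box and stacks frames into a sequence.

```python
def normalize_skeleton(kps, box):
    """Map one fighter's 17 (x, y) keypoints into the unit square of their
    bounding box (x1, y1, x2, y2), giving scale- and position-invariant
    coordinates. Hypothetical helper, not part of the released code."""
    x1, y1, x2, y2 = box
    w, h = max(x2 - x1, 1e-6), max(y2 - y1, 1e-6)
    return [((x - x1) / w, (y - y1) / h) for x, y in kps]

def frames_to_sequence(per_frame_kps, per_frame_boxes):
    """Collect T frames of keypoints into a T x 17 x 2 nested list,
    ready to feed a downstream sequence model."""
    return [normalize_skeleton(k, b)
            for k, b in zip(per_frame_kps, per_frame_boxes)]
```

A downstream action-recognition model would then consume the resulting T x 17 x 2 tensor per fighter.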

Keypoints (COCO 17-point schema)

| Index | Keypoint       | Index | Keypoint    |
|-------|----------------|-------|-------------|
| 0     | nose           | 9     | left_wrist  |
| 1     | left_eye       | 10    | right_wrist |
| 2     | right_eye      | 11    | left_hip    |
| 3     | left_ear       | 12    | right_hip   |
| 4     | right_ear      | 13    | left_knee   |
| 5     | left_shoulder  | 14    | right_knee  |
| 6     | right_shoulder | 15    | left_ankle  |
| 7     | left_elbow     | 16    | right_ankle |
| 8     | right_elbow    |       |             |
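For programmatic access, the schema above can be expressed as a name-to-index mapping (a small convenience sketch; the constant names are my own, not part of the model's API):

```python
# COCO 17-keypoint schema used by this model, ordered by index.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]
# Reverse lookup: keypoint name -> row index in the model output.
KEYPOINT_INDEX = {name: i for i, name in enumerate(COCO_KEYPOINTS)}
```

With this, `kps[KEYPOINT_INDEX["left_wrist"]]` selects a fighter's left wrist from a `[17, 2]` keypoint array instead of relying on the bare index 9.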

Training

Training was conducted in two phases on Kaggle (Tesla P100-PCIE-16GB):

Phase 1 (epochs 1–90): Initial finetuning from COCO pretrained yolo11x-pose.pt

Phase 2 (epochs 91–150): Resumed from the epoch 90 checkpoint with AdamW optimizer

| Parameter       | Value                    |
|-----------------|--------------------------|
| Total epochs    | 150 (resumed at ep. 91)  |
| Batch size      | 8                        |
| Image size      | 640                      |
| Optimizer       | AdamW                    |
| LR initial      | 0.001                    |
| LR final factor | 0.01                     |
| Warmup epochs   | 3                        |
| Weight decay    | 0.0005                   |
| Patience        | 50                       |
| Save period     | every 10 epochs          |
| AMP             | true                     |
| Hardware        | 1× Tesla P100-PCIE-16GB  |
| Training time   | ~7.4 hours               |

Augmentations: HSV jitter, horizontal flip (p=0.5), scale, mosaic, random erasing (p=0.4), RandAugment.
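The phase-2 configuration above maps roughly onto standard Ultralytics training arguments as follows. This is a sketch, not the released training script: the checkpoint and dataset filenames are placeholders, and the exact augmentation flag values beyond the listed probabilities are assumptions.

```python
# Phase-2 hyperparameters, mirroring the table above.
HPARAMS = dict(
    epochs=150, batch=8, imgsz=640, optimizer="AdamW",
    lr0=0.001, lrf=0.01, warmup_epochs=3, weight_decay=0.0005,
    patience=50, save_period=10, amp=True,
    fliplr=0.5, erasing=0.4,  # horizontal flip / random erasing from the list above
)

def resume_finetune(checkpoint="epoch90.pt", data="mma-pose.yaml"):
    """Continue finetuning from the epoch-90 checkpoint (sketch only;
    both paths are placeholders)."""
    from ultralytics import YOLO  # imported lazily: heavy dependency
    model = YOLO(checkpoint)
    return model.train(data=data, **HPARAMS)
```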

Evaluation Results

Evaluated on the held-out test split of the MMA Fighter Pose Estimation Dataset at epoch 150.

Detection (box)

| Metric           | Value  |
|------------------|--------|
| mAP50-95 (box)   | 0.9859 |
| mAP50 (box)      | 0.9950 |
| Precision        | 0.9990 |
| Recall           | 0.9995 |
| Val box loss     | 0.1687 |
| Val cls loss     | 0.1092 |

Pose (keypoints)

| Metric            | Value  |
|-------------------|--------|
| mAP50-95 (pose)   | 0.9198 |
| mAP50 (pose)      | 0.9932 |
| Precision (pose)  | 0.9924 |
| Recall (pose)     | 0.9907 |
| Val pose loss     | 0.5105 |
| Val kobj loss     | 0.0016 |

Training started from a strong checkpoint (epoch 90, pose mAP50-95 ≈ 0.875) and improved steadily to 0.920 by epoch 150, with losses still decreasing at the end of training.
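For context, pose mAP is computed against Object Keypoint Similarity (OKS) rather than box IoU: each keypoint's distance to ground truth is weighted by the object's scale and a per-keypoint tolerance. A minimal sketch of the standard COCO OKS formula (the sigmas are the standard COCO constants; the function itself is illustrative, not part of this repo):

```python
import math

# Standard COCO per-keypoint sigmas (nose ... right_ankle).
COCO_SIGMAS = [0.026, 0.025, 0.025, 0.035, 0.035, 0.079, 0.079, 0.072,
               0.072, 0.062, 0.062, 0.107, 0.107, 0.087, 0.087, 0.089, 0.089]

def oks(pred, gt, vis, area):
    """Object Keypoint Similarity between predicted and ground-truth
    17-point skeletons. pred/gt: lists of (x, y); vis: visibility flags
    (>0 means labeled); area: ground-truth box area in pixels^2."""
    num = den = 0.0
    for (px, py), (gx, gy), v, s in zip(pred, gt, vis, COCO_SIGMAS):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        num += math.exp(-d2 / (2.0 * area * (2.0 * s) ** 2))
        den += 1.0
    return num / den if den else 0.0
```

mAP50-95 (pose) then averages precision over OKS thresholds from 0.50 to 0.95, analogous to IoU thresholds for boxes.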

Dataset

MMA Fighter Pose Estimation Dataset

  • 5,106 images extracted from 20 UFC fights (stand-up phases only)
  • 640×640 px, YOLO-Pose format, single class: fighter
  • 10,186 fighter instances; 10,155 (99.7%) successfully labeled with 17 COCO keypoints
  • Keypoint annotations auto-generated using pretrained yolo11x-pose with IoU ≥ 0.6 matching against ground-truth fighter bboxes
  • Images sourced from the MMA Fighter Detection Dataset (CC BY-NC-SA 4.0)
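The pseudo-labeling step above (matching the pretrained model's poses to ground-truth fighter boxes at IoU ≥ 0.6) can be sketched as follows. The greedy one-to-one assignment strategy is an assumption; the dataset's actual matching code may differ.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def match_poses(gt_boxes, pred_boxes, thr=0.6):
    """Greedily assign each ground-truth fighter box the best-overlapping
    predicted pose, keeping matches with IoU >= thr.
    Returns {gt_index: pred_index}."""
    matches, used = {}, set()
    for gi, g in enumerate(gt_boxes):
        best, best_iou = None, thr
        for pi, p in enumerate(pred_boxes):
            if pi in used:
                continue  # each prediction labels at most one fighter
            v = iou(g, p)
            if v >= best_iou:
                best, best_iou = pi, v
        if best is not None:
            matches[gi] = best
            used.add(best)
    return matches
```

Ground-truth instances left unmatched under this threshold would go unlabeled, which is consistent with the 99.7% labeling rate reported above.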

Split:

| Split | Images | Fighter instances |
|-------|--------|-------------------|
| Train | 3,635  | ~7,250            |
| Valid | 980    | ~1,960            |
| Test  | 491    | ~980              |

Usage

```python
from ultralytics import YOLO

model = YOLO("hasanfaesal/yolov11x-pose-mma-fighter")

# Image inference
results = model("fight_frame.jpg")
results[0].show()

# Access keypoints directly
for r in results:
    keypoints = r.keypoints.xy    # [N, 17, 2] : x, y pixel coords
    visibility = r.keypoints.conf # [N, 17]    : confidence per keypoint
    boxes = r.boxes.xyxy          # [N, 4]     : bounding boxes

# Video inference (streamed frame by frame)
results = model("fight.mp4", stream=True)
for r in results:
    kps = r.keypoints.data  # [N, 17, 3] : x, y, conf
```

Limitations

  • Trained exclusively on UFC stand-up footage. Performance may degrade on:
    • Ground game sequences (grappling, guard, mount) where limbs are heavily occluded or interleaved
    • Non-UFC broadcasts with unusual camera angles or significant lens distortion
    • Extreme close-ups where only part of the fighter's body is visible
  • Keypoint annotations were auto-generated (pseudo-labels); rare edge cases may contain labeling noise
  • Visibility flags may be unreliable for highly occluded keypoints in clinch situations

Pipeline Context

YOLOv8s MMA Fighter Detector → [This model] → Action Recognition (planned)

See the fight-judge repository for the full pipeline including data preparation scripts and methodology.

Citation

If you use this model or the dataset, please cite:

```bibtex
@misc{faisal2025mmafighter,
  author    = {Hasan Faisal},
  title     = {MMA Fighter Detection Dataset},
  year      = {2025},
  publisher = {Mendeley Data},
  version   = {V1},
  doi       = {10.17632/c456bnk8bm.1}
}
```

License

  • Code and model weights: MIT License
  • Dataset: CC BY-NC-SA 4.0
