# YOLO11x-pose MMA Fighter Pose Estimator

Fine-tuned YOLO11x-pose for 17-keypoint pose estimation of fighters in MMA and boxing footage. Part of the fight-judge project, an AI pipeline for automated combat sports scoring.
## Model Description
This model jointly detects fighters and estimates their 17 COCO keypoints in a single forward pass. It is the second stage in the fight-judge pipeline, providing skeleton sequences that feed into downstream action recognition.
- **Architecture:** YOLO11x-pose (extra-large)
- **Task:** Pose estimation (single class: `fighter`, 17 COCO keypoints)
- **Input:** RGB image, 640×640 px
- **Output:** Bounding boxes + 17 keypoints per fighter `[x, y, visibility]`
- **Base model:** `yolo11x-pose.pt` (COCO pretrained), continued from epoch 90 checkpoint
- **Finetuned on:** MMA Fighter Pose Estimation Dataset (5,106 images, 10,155 fighter instances with auto-generated keypoint annotations)
### Keypoints (COCO 17-point schema)
| Index | Keypoint | Index | Keypoint |
|---|---|---|---|
| 0 | nose | 9 | left_wrist |
| 1 | left_eye | 10 | right_wrist |
| 2 | right_eye | 11 | left_hip |
| 3 | left_ear | 12 | right_hip |
| 4 | right_ear | 13 | left_knee |
| 5 | left_shoulder | 14 | right_knee |
| 6 | right_shoulder | 15 | left_ankle |
| 7 | left_elbow | 16 | right_ankle |
| 8 | right_elbow | | |
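Keypoint indices in the model's output correspond one-to-one to the rows above. As a sketch of how the indexed output can be consumed downstream, the hypothetical helper below computes the angle at a joint (e.g. the left elbow from indices 5, 7, 9); `COCO_KEYPOINTS` and `joint_angle` are illustrative names, not part of the model:

```python
import math

# COCO 17-keypoint order used by this model (indices match the table above)
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def joint_angle(a, b, c):
    """Angle at point b (degrees) formed by segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    cos = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos))

# Left-elbow angle from shoulder (5), elbow (7), wrist (9) pixel coords;
# collinear points (a fully extended arm) give approx. 180 degrees.
angle = joint_angle((100, 100), (150, 150), (200, 200))
```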
## Training
Training was conducted in two phases on Kaggle (Tesla P100-PCIE-16GB):
**Phase 1 (epochs 1–90):** initial finetuning from the COCO-pretrained `yolo11x-pose.pt`

**Phase 2 (epochs 91–150):** resumed from the epoch 90 checkpoint with the AdamW optimizer
| Parameter | Value |
|---|---|
| Total epochs | 150 (resumed at ep. 91) |
| Batch size | 8 |
| Image size | 640 |
| Optimizer | AdamW |
| LR initial | 0.001 |
| LR final factor | 0.01 |
| Warmup epochs | 3 |
| Weight decay | 0.0005 |
| Patience | 50 |
| Save period | every 10 epochs |
| AMP | true |
| Hardware | 1× Tesla P100-PCIE-16GB |
| Training time | ~7.4 hours |
Augmentations: HSV jitter, horizontal flip (p=0.5), scale, mosaic, random erasing (p=0.4), RandAugment.
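The hyperparameters in the table map directly onto Ultralytics `train()` keyword arguments. A minimal sketch of the Phase 2 resume is below; the dataset YAML path and checkpoint filename are hypothetical placeholders:

```python
# Hyperparameters from the table above, as Ultralytics train() kwargs.
TRAIN_CFG = dict(
    data="mma-pose.yaml",   # hypothetical dataset YAML path
    epochs=150,
    batch=8,
    imgsz=640,
    optimizer="AdamW",
    lr0=0.001,              # initial learning rate
    lrf=0.01,               # final LR factor
    warmup_epochs=3,
    weight_decay=0.0005,
    patience=50,
    save_period=10,         # checkpoint every 10 epochs
    amp=True,
)

def resume_phase2(checkpoint="epoch90.pt"):
    """Phase 2: continue finetuning from the epoch-90 checkpoint."""
    from ultralytics import YOLO
    model = YOLO(checkpoint)
    return model.train(**TRAIN_CFG)
```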
## Evaluation Results
Evaluated on the held-out test split of the MMA Fighter Pose Estimation Dataset at epoch 150.
### Detection (box)
| Metric | Value |
|---|---|
| mAP50-95 (box) | 0.9859 |
| mAP50 (box) | 0.9950 |
| Precision | 0.9990 |
| Recall | 0.9995 |
| Val box loss | 0.1687 |
| Val cls loss | 0.1092 |
### Pose (keypoints)
| Metric | Value |
|---|---|
| mAP50-95 (pose) | 0.9198 |
| mAP50 (pose) | 0.9932 |
| Precision (pose) | 0.9924 |
| Recall (pose) | 0.9907 |
| Val pose loss | 0.5105 |
| Val kobj loss | 0.0016 |
Training started from a strong checkpoint (epoch 90, pose mAP50-95 ≈ 0.875) and improved steadily to 0.920 by epoch 150, with losses still decreasing at the end of training.
## Dataset

**MMA Fighter Pose Estimation Dataset**
- 5,106 images extracted from 20 UFC fights (stand-up phases only)
- 640×640 px, YOLO-Pose format, single class: `fighter`
- 10,186 fighter instances; 10,155 (99.7%) successfully labeled with 17 COCO keypoints
- Keypoint annotations auto-generated using pretrained `yolo11x-pose` with IoU ≥ 0.6 matching against ground-truth fighter bboxes
- Images sourced from the MMA Fighter Detection Dataset (CC BY-NC-SA 4.0)
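A minimal sketch of that IoU ≥ 0.6 matching step, assuming plain `[x1, y1, x2, y2]` boxes; the helper names are illustrative, and the actual data-preparation scripts live in the fight-judge repository:

```python
def iou(box_a, box_b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_pose_to_gt(pred_box, gt_boxes, thresh=0.6):
    """Index of the best-matching GT fighter box, or None if all are below threshold."""
    best_i, best_iou = None, thresh
    for i, gt in enumerate(gt_boxes):
        v = iou(pred_box, gt)
        if v >= best_iou:
            best_i, best_iou = i, v
    return best_i
```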
Split:
| Split | Images | Fighter instances |
|---|---|---|
| Train | 3,635 | ~7,250 |
| Valid | 980 | ~1,960 |
| Test | 491 | ~980 |
## Usage
```python
from ultralytics import YOLO

model = YOLO("hasanfaesal/yolov11x-pose-mma-fighter")
results = model("fight_frame.jpg")
results[0].show()
```

```python
# Access keypoints directly
results = model("fight_frame.jpg")
for r in results:
    keypoints = r.keypoints.xy     # [N, 17, 2] -> x, y pixel coords
    visibility = r.keypoints.conf  # [N, 17] -> confidence per keypoint
    boxes = r.boxes.xyxy           # [N, 4] -> bounding boxes
```

```python
# Video inference
results = model("fight.mp4", stream=True)
for r in results:
    kps = r.keypoints.data  # [N, 17, 3] -> x, y, conf
```
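Since these skeletons feed a downstream action-recognition stage, one plausible next step is assembling per-frame keypoints into a fixed-length sequence. The helper below is a sketch under assumptions (zero-fill for missed detections, coordinates normalized by frame size); it is not part of the released pipeline:

```python
def build_skeleton_sequence(per_frame_kps, img_w, img_h, num_kps=17):
    """per_frame_kps: list over frames; each entry is a [17][3] list of
    (x, y, conf) in pixels, or None when the fighter was not detected.
    Returns a (T, 17, 3) nested list with x, y normalized to [0, 1]."""
    zero = [[0.0, 0.0, 0.0] for _ in range(num_kps)]
    seq = []
    for kps in per_frame_kps:
        if kps is None:
            seq.append([row[:] for row in zero])  # zero-fill keeps T constant
        else:
            seq.append([[x / img_w, y / img_h, c] for x, y, c in kps])
    return seq
```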
## Limitations
- Trained exclusively on UFC stand-up footage. Performance may degrade on:
  - Ground game sequences (grappling, guard, mount) where limbs are heavily occluded or interleaved
  - Non-UFC broadcasts with unusual camera angles or significant lens distortion
  - Extreme close-ups where only part of the fighter's body is visible
- Keypoint annotations were auto-generated (pseudo-labels); rare edge cases may contain labeling noise
- Visibility flags may be unreliable for highly occluded keypoints in clinch situations
## Pipeline Context

YOLOv8s MMA Fighter Detector → [This model] → Action Recognition (planned)
See the fight-judge repository for the full pipeline including data preparation scripts and methodology.
## Citation
If you use this model or the dataset, please cite:
```bibtex
@misc{faisal2025mmafighter,
  author    = {Hasan Faisal},
  title     = {MMA Fighter Detection Dataset},
  year      = {2025},
  publisher = {Mendeley Data},
  version   = {V1},
  doi       = {10.17632/c456bnk8bm.1}
}
```
## License

- Code and model weights: MIT License
- Dataset: CC BY-NC-SA 4.0