RF-DETR: Neural Architecture Search for Real-Time Detection Transformers
Paper: arXiv:2511.09554
This model was converted to MLX format from RF-DETR (ICLR 2026) using mlx-vlm version 0.4.3.
RF-DETR Seg-Small supports both object detection and instance segmentation across the 80 COCO classes.

Install or upgrade `mlx-vlm`:

    pip install -U mlx-vlm

Python usage:

    from pathlib import Path
    from PIL import Image
    from mlx_vlm.utils import load_model
    from mlx_vlm.models.rfdetr.processing_rfdetr import RFDETRProcessor
    from mlx_vlm.models.rfdetr.generate import RFDETRPredictor

    # Load the converted weights and the matching processor
    model = load_model(Path("mlx-community/rfdetr-seg-small-fp32"))
    processor = RFDETRProcessor.from_pretrained("mlx-community/rfdetr-seg-small-fp32")

    # Build the predictor and run it on a single image
    predictor = RFDETRPredictor(model, processor, score_threshold=0.3, nms_threshold=0.5)
    result = predictor.predict(Image.open("image.jpg"))

    # result.boxes       - (N, 4) xyxy pixel coordinates
    # result.scores      - (N,) confidence scores
    # result.masks       - (N, H, W) binary instance masks
    # result.class_names - list of class names

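
The result fields listed above line up index by index, so they can be zipped into per-object records. The following is a minimal sketch of that idea, not part of `mlx-vlm`: `Detection` and `summarize_detections` are hypothetical helpers, and the sample values are made up to mimic the documented shapes. It assumes the fields can be iterated like plain Python sequences; if they are array objects, converting them to lists first may be necessary.

```python
from typing import NamedTuple, Tuple


class Detection(NamedTuple):
    """Hypothetical record pairing one box with its label and score."""
    class_name: str
    score: float
    box_xyxy: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    box_area: float


def summarize_detections(boxes, scores, class_names, min_score=0.5):
    """Keep detections scoring at least min_score, sorted by descending score."""
    detections = []
    for box, score, name in zip(boxes, scores, class_names):
        if score < min_score:
            continue
        x1, y1, x2, y2 = (float(v) for v in box)
        # Clamp to zero so a degenerate box never yields a negative area.
        area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        detections.append(Detection(name, float(score), (x1, y1, x2, y2), area))
    return sorted(detections, key=lambda d: d.score, reverse=True)


# Made-up stand-ins shaped like result.boxes, result.scores, result.class_names.
boxes = [[10, 20, 110, 220], [50, 60, 80, 90], [0, 0, 5, 5]]
scores = [0.92, 0.41, 0.77]
class_names = ["person", "dog", "cup"]

for det in summarize_detections(boxes, scores, class_names, min_score=0.5):
    print(f"{det.class_name}: {det.score:.2f}, box area {det.box_area:.0f} px")
```

Filtering here is applied on top of whatever `score_threshold` the predictor already used, which can be handy for trying stricter cutoffs without rerunning inference.
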
Command-line usage:

    # Image segmentation
    python -m mlx_vlm.models.rfdetr.generate --image photo.jpg --model mlx-community/rfdetr-seg-small-fp32

    # Video segmentation
    python -m mlx_vlm.models.rfdetr.generate --video input.mp4 --model mlx-community/rfdetr-seg-small-fp32

    # Realtime camera
    python -m mlx_vlm.models.rfdetr.generate --task realtime --model mlx-community/rfdetr-seg-small-fp32

    # Blur background (focus on detections)
    python -m mlx_vlm.models.rfdetr.generate --image photo.jpg --model mlx-community/rfdetr-seg-small-fp32 --annotator blur+bg

    # Pixelate detections
    python -m mlx_vlm.models.rfdetr.generate --image photo.jpg --model mlx-community/rfdetr-seg-small-fp32 --annotator pixelate

| Preset | Effect |
|---|---|
| `mask+box` | Mask overlay + boxes + labels (default) |
| `blur` | Blur detections |
| `blur+bg` | Blur background |
| `pixelate` | Pixelate detections |
| `pixelate+bg` | Pixelate background |
| `halo+box` | Halo effect + boxes |
| `box` | Boxes + labels only |
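
To compare presets side by side, the CLI invocation can be assembled once per preset. This is a rough sketch that only builds and prints the command lines; it uses the `--image`, `--model`, and `--annotator` flags shown in the examples, while `build_command` and the `photo.jpg` path are illustrative placeholders rather than anything provided by `mlx-vlm`.

```python
import shlex

MODULE = "mlx_vlm.models.rfdetr.generate"
MODEL = "mlx-community/rfdetr-seg-small-fp32"

# Preset names copied from the table above.
PRESETS = ["mask+box", "blur", "blur+bg", "pixelate", "pixelate+bg", "halo+box", "box"]


def build_command(image, annotator, model=MODEL):
    """Return the argument list for one CLI run (hypothetical helper)."""
    if annotator not in PRESETS:
        raise ValueError(f"unknown annotator preset: {annotator!r}")
    return [
        "python", "-m", MODULE,
        "--image", image,
        "--model", model,
        "--annotator", annotator,
    ]


for preset in PRESETS:
    # shlex.join produces a shell-safe string for display or logging.
    print(shlex.join(build_command("photo.jpg", preset)))
```

Each argument list can be handed to `subprocess.run(cmd, check=True)` to actually execute the run.
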
| Property | Value |
|---|---|
| Architecture | DINOv2-small backbone + C2f projector + Deformable DETR decoder + Segmentation head |
| Task | Object detection + instance segmentation (COCO 80 classes) |
| Parameters | ~34M |
| Input resolution | 384x384 |
| Dtype | float32 |
| Inference (M4 Max) | |
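
The parameter count and dtype listed for this model give a rough lower bound on weight memory. The sketch below is back-of-the-envelope arithmetic only: it assumes about 34 million parameters stored at 4 bytes each for float32, ignores activations and runtime overhead, and `approx_weight_megabytes` is a hypothetical helper.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_DTYPE = {"float32": 4, "float16": 2, "bfloat16": 2}


def approx_weight_megabytes(num_params, dtype="float32"):
    """Estimate raw weight storage in megabytes (10**6 bytes)."""
    return num_params * BYTES_PER_DTYPE[dtype] / 1e6


# ~34M parameters in float32
print(f"{approx_weight_megabytes(34_000_000):.0f} MB")  # 136 MB
```
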