Midjourney Text-to-Image Prompts Dataset

This dataset contains cleaned, deduplicated Midjourney text prompts extracted from Discord messages.

Dataset Structure

The dataset contains three splits:

  • train: 53,183 unique prompts (80%)
  • validation: 6,648 unique prompts (10%)
  • test: 6,648 unique prompts (10%)
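The card does not say how the splits were produced. As a minimal sketch, assuming a seeded shuffle followed by an 80/10/10 cut, the following reproduces the split sizes listed above for 66,479 prompts. The function name `split_80_10_10` and the seed value are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_80_10_10(
    items: Sequence[str], seed: int = 0
) -> Tuple[List[str], List[str], List[str]]:
    """Shuffle items and cut them into train/validation/test parts.

    Assumption: train gets floor(80%) of the items and the remainder
    is halved between validation and test.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    n_train = len(shuffled) * 8 // 10          # integer arithmetic avoids float rounding
    n_val = (len(shuffled) - n_train) // 2

    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test
```

Fixing the seed keeps the split reproducible across runs; a different seed changes which prompts land in each split but not the sizes.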

Data Format

Each line is a JSON object with a single field:

  • text: The Midjourney prompt text
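Records in this format can be read with the standard library alone. The sketch below assumes one JSON object per line with a `text` field, as described above; the helper names `parse_prompts` and `load_prompts` are illustrative, and the file path passed to `load_prompts` is whatever local file holds a split.

```python
import json
from typing import Iterable, List


def parse_prompts(lines: Iterable[str]) -> List[str]:
    """Return the `text` field from each non-empty JSON line."""
    prompts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. a trailing newline at end of file
        record = json.loads(line)
        prompts.append(record["text"])
    return prompts


def load_prompts(path: str) -> List[str]:
    """Read a JSON Lines file from disk and return its prompts."""
    with open(path, encoding="utf-8") as f:
        return parse_prompts(f)
```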

Extraction Logic

Prompts were extracted from Discord messages using the following process:

  1. Filtered for message types 0 (INITIAL_OR_VARIATION) and 19 (UPSCALE)
  2. Extracted text between double asterisks (**)
  3. Removed embedded image URLs (e.g., https://s.mj.run/...)
  4. Removed duplicates to ensure unique prompts
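The four steps above can be sketched roughly as follows. This is not the original extraction script. It assumes each message is a dict with an integer `type` and a string `content`, that the prompt is the first span wrapped in double asterisks, and that URLs may appear wrapped in angle brackets. The whitespace normalization after URL removal is also an assumption.

```python
import re
from typing import Dict, Iterable, List

PROMPT_RE = re.compile(r"\*\*(.+?)\*\*", re.DOTALL)   # text between ** and **
URL_RE = re.compile(r"<?https?://[^\s>]+>?")          # e.g. <https://s.mj.run/...>
KEPT_TYPES = {0, 19}                                  # message types named in step 1


def extract_prompts(messages: Iterable[Dict]) -> List[str]:
    """Filter by type, pull the bolded prompt, strip URLs, and deduplicate."""
    seen = set()
    prompts = []
    for msg in messages:
        if msg.get("type") not in KEPT_TYPES:
            continue
        match = PROMPT_RE.search(msg.get("content", ""))
        if match is None:
            continue
        text = URL_RE.sub("", match.group(1))
        text = " ".join(text.split())   # collapse gaps left by removed URLs
        if text and text not in seen:   # keep first occurrence only
            seen.add(text)
            prompts.append(text)
    return prompts
```

Deduplicating after URL removal means two messages that differ only in their image link collapse to a single prompt.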

Use Case

This dataset is suitable for fine-tuning language models on prompt generation or other NLP tasks related to text-to-image prompts.
