Midjourney Text-to-Image Prompts Dataset
This dataset contains cleaned Midjourney text prompts extracted from Discord messages.
Dataset Structure
The dataset contains three splits:
- train: 53,183 unique prompts (80%)
- validation: 6,648 unique prompts (10%)
- test: 6,648 unique prompts (10%)
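The split sizes are consistent with a plain 80/10/10 partition of 66,479 unique prompts. How the splits were actually produced is not stated here, so the following is only a sketch of one way to get the same counts. The function name `split_80_10_10`, the seeded shuffle, and the placeholder prompt strings are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Sequence


def split_80_10_10(items: Sequence[str], seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle items and cut them into 10% validation, 10% test, rest train.

    Hypothetical helper: the real split procedure is not documented.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_eval = round(len(shuffled) * 0.1)    # size of each held-out split
    return {
        "validation": shuffled[:n_eval],
        "test": shuffled[n_eval:2 * n_eval],
        "train": shuffled[2 * n_eval:],
    }


# Placeholder data with the same total as the three published splits.
total = 53_183 + 6_648 + 6_648
splits = split_80_10_10([f"prompt {i}" for i in range(total)])
sizes = {name: len(rows) for name, rows in splits.items()}
print(sizes)
```

Giving the train split whatever remains after the two held-out slices avoids rounding drift, which is why train is 53,183 rather than an exact 80%.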
Data Format
Each line is a JSON object with a single field:
text: The Midjourney prompt text
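As a minimal sketch of reading this format, the snippet below parses JSON Lines records and yields the `text` field. The helper name `iter_prompts` and the two sample prompts are made up for illustration. In practice the lines would come from an open file handle for one of the split files, whose names are not given here.

```python
import json
from typing import Iterable, Iterator


def iter_prompts(lines: Iterable[str]) -> Iterator[str]:
    """Yield the `text` field of each non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(line)
        yield record["text"]


# Illustrative records only; these are not taken from the dataset.
sample_lines = [
    '{"text": "a lighthouse at dusk, oil painting --ar 16:9"}',
    "",
    '{"text": "isometric city block, pastel colors"}',
]
prompts = list(iter_prompts(sample_lines))
print(prompts)
```

Because any iterable of strings works, an open file can be passed directly, for example `iter_prompts(open(path, encoding="utf-8"))`.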
Extraction Logic
Prompts were extracted from Discord messages using the following process:
- Filtered for message types 0 (INITIAL_OR_VARIATION) and 19 (UPSCALE)
- Extracted text between double asterisks (**)
- Removed embedded image URLs (e.g., https://s.mj.run/...)
- Removed duplicates to ensure unique prompts
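The steps above can be sketched roughly as follows. This is not the original extraction script. It assumes each message is a dict with `type` and `content` keys, that only the first `**...**` span in a message holds the prompt, and that image URLs live on `s.mj.run` and may be wrapped in angle brackets. The whitespace normalization is an extra step added for the sketch, and all function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional

ALLOWED_TYPES = {0, 19}  # message types named in the list above
PROMPT_RE = re.compile(r"\*\*(.+?)\*\*", re.DOTALL)  # text between ** and **
# Assumption: image links use the s.mj.run host, optionally inside <...>.
IMAGE_URL_RE = re.compile(r"<?https?://s\.mj\.run/[^\s>]+>?")


def extract_prompt(message: dict) -> Optional[str]:
    """Return the cleaned prompt for one message, or None if it is filtered out."""
    if message.get("type") not in ALLOWED_TYPES:
        return None
    match = PROMPT_RE.search(message.get("content", ""))
    if match is None:
        return None
    text = IMAGE_URL_RE.sub("", match.group(1))
    text = " ".join(text.split())  # collapse whitespace left by removed URLs
    return text or None


def extract_unique_prompts(messages: Iterable[dict]) -> List[str]:
    """Extract prompts, keeping only the first occurrence of each."""
    seen = set()
    unique = []
    for message in messages:
        prompt = extract_prompt(message)
        if prompt is not None and prompt not in seen:
            seen.add(prompt)
            unique.append(prompt)
    return unique


# Illustrative messages only; these are not taken from the dataset.
messages = [
    {"type": 0, "content": "**https://s.mj.run/abc a red fox in snow --v 4** - <@1> (fast)"},
    {"type": 19, "content": "**a red fox in snow --v 4** - Upscaled by <@1> (fast)"},
    {"type": 7, "content": "**ignored because of its type**"},
    {"type": 0, "content": "no bold text here"},
]
unique_prompts = extract_unique_prompts(messages)
print(unique_prompts)
```

Deduplicating after URL removal means two messages that differ only in their embedded image link collapse to a single prompt, as the first two sample messages do.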
Use Case
This dataset is suitable for fine-tuning language models on prompt generation or other NLP tasks related to text-to-image prompts.