StoryBox
Introduction
This is the repository for the paper StoryBox: Collaborative Multi-Agent Simulation for Hybrid Bottom-Up Long-Form Story Generation Using Large Language Models, accepted by AAAI 2026.
StoryBox is a framework that leverages collaborative multi-agent simulation for hybrid bottom-up long-form story generation. By combining bottom-up character-driven agent interactions with top-down narrative planning, it dynamically constructs deep, coherent, and engaging story worlds.
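For orientation, the overall loop can be pictured as below. This is only an illustrative sketch of the bottom-up/top-down interplay; the class names (`Persona`, `Storyteller`) and methods are hypothetical stand-ins for the components under `reverie/`, not the repository's actual API.

```python
# Illustrative sketch only: hypothetical names, not the real reverie/ implementation.

class Persona:
    """A character agent acting bottom-up inside the sandbox world."""
    def __init__(self, name):
        self.name = name

    def act(self, world_state):
        # In the real system this step would call an LLM to decide the next action.
        return f"{self.name} acts during {world_state}"

class Storyteller:
    """Top-down narrator that turns accumulated events into story text."""
    def write_chapter(self, events):
        # In the real system this step would call an LLM with a narrative-planning prompt.
        return "Chapter: " + "; ".join(events)

def simulate(personas, days):
    storyteller, chapters = Storyteller(), []
    for day in range(days):
        events = [p.act(f"day {day}") for p in personas]   # bottom-up: agent interactions
        chapters.append(storyteller.write_chapter(events))  # top-down: narrative planning
    return chapters
```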
Quick Start (with uv)
# 1. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# 2. Clone
git clone https://huggingface.co/raazkumar/storybox-reproduction
cd storybox-reproduction
# 3. Install dependencies (uv handles everything)
uv sync
# 4. Run quick test (1 day, mock LLM, no API key needed)
uv run python reverie/test_run.py
# 5. Run full simulation (14 days, requires API key)
export OPENAI_API_KEY="sk-..."
uv run python reverie/run.py
Installation Options
# Base install (OpenAI only)
uv sync
# Apple Silicon: native MLX (fastest on M1/M2/M3/M4)
uv sync --extra mlx
# NVIDIA NIM inference
uv sync --extra nim
# Ollama local inference
uv sync --extra ollama
# All local backends (MLX + Ollama)
uv sync --extra local
# Everything (all providers + dev tools)
uv sync --extra all --extra dev
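If you are unsure which optional backends ended up in your environment, a quick import check like the one below can help. The package names (`openai`, `mlx_lm`, `ollama`) are assumptions about what each extra installs; adjust them to match your setup.

```python
# Check which LLM backend packages are importable in the current environment.
# Package names are assumptions about what each uv extra pulls in.
import importlib.util

for pkg in ("openai", "mlx_lm", "ollama"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg:8s}: {'available' if found else 'missing'}")
```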
Supported LLM Providers
| Provider | Config | Speed | Setup |
|---|---|---|---|
| OpenAI | gpt-4o-mini | Fastest | API key |
| Ollama | gemma4 | Fast | brew install ollama |
| MLX (Apple) ⭐ | llama3.1-8b-mlx | Fastest on Mac | uv sync --extra mlx |
| NVIDIA NIM | nvidia/meta/llama-3.1-8b-instruct | Fast | API key |
Switch providers by changing one line in reverie/config/config.py:
llm_model_name = 'llama3.1-8b-mlx' # MLX native (Apple)
llm_model_name = 'gemma4' # Ollama
llm_model_name = 'nvidia/meta/llama-3.1-8b-instruct' # NIM
llm_model_name = 'gpt-4o-mini' # OpenAI
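For intuition, a provider router along the lines of `reverie/common/llm.py` might dispatch on the model name roughly as sketched below. The function body and branching rules here are assumptions, not the repository's actual API; only the OpenAI branch uses a real client call.

```python
# Hypothetical sketch of name-based provider dispatch; not the real common/llm.py.
from openai import OpenAI

def complete(prompt: str, llm_model_name: str) -> str:
    if llm_model_name.startswith("gpt-"):
        client = OpenAI()  # reads OPENAI_API_KEY from the environment
        resp = client.chat.completions.create(
            model=llm_model_name,
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content
    if llm_model_name.endswith("-mlx"):
        raise NotImplementedError("route to the native MLX backend (common/mlx_llm.py)")
    if llm_model_name.startswith("nvidia/"):
        raise NotImplementedError("route to the NVIDIA NIM endpoint")
    raise NotImplementedError("route other local model names to Ollama")
```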
Project Structure
storybox/
├── reverie/
│   ├── run.py               # Main entry point
│   ├── test_run.py          # Quick 1-day test
│   ├── config/config.py     # All settings
│   ├── agent/storyteller.py # Story generation
│   ├── persona/             # Characters + cognition
│   ├── environment/world.py # Sandbox world
│   ├── common/llm.py        # LLM provider router
│   ├── common/mlx_llm.py    # Native MLX (Apple)
│   └── prompts/prompt-1/    # 30+ prompt templates
├── data/story01-20/         # 20 story settings
├── pyproject.toml           # uv project config
└── uv.lock                  # Locked dependencies
Common Commands
# Run with uv (recommended)
uv run python reverie/run.py
uv run python reverie/test_run.py
# Install additional dependencies
uv add gradio
uv add --dev pytest
# Lock dependencies
uv lock
# Update dependencies
uv sync --upgrade
# Run tests
uv run pytest
# Format code
uv run ruff format .
uv run ruff check --fix .
# Type check
uv run mypy reverie/
Hindi / Multilingual Stories
# Generate English story
uv run python reverie/run.py
# Translate to Hindi (post-generation)
# See GRADIO_UI_GUIDE.md for full pipeline
Recommended approach: English simulation → Hindi storyteller. Characters plan and chat in English; the final story is written in Hindi.
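A post-generation Hindi pass could look roughly like the snippet below. The `write_hindi_story` helper, its prompt wording, and the default model are hypothetical; the actual pipeline is documented in GRADIO_UI_GUIDE.md.

```python
# Hypothetical post-generation step: English event log in, Hindi story out.
from openai import OpenAI

def write_hindi_story(english_events: str, model: str = "gpt-4o-mini") -> str:
    client = OpenAI()  # requires OPENAI_API_KEY
    prompt = (
        "You are a storyteller. Using the following English event log from a "
        "multi-agent simulation, write the final long-form story in Hindi:\n\n"
        + english_events
    )
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return resp.choices[0].message.content
```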
Synthetic Data Generation
# Generate 1000 stories for training data
uv run python scripts/generate_synthetic_dataset.py \
--num-stories 1000 \
--output stories.jsonl
# Convert to instruction format
uv run python scripts/to_instruction_format.py \
--input stories.jsonl \
--output train.json
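The instruction-format conversion amounts to mapping each generated story onto an instruction/response pair. The sketch below shows the general idea; the field names (`premise`, `story`, `instruction`, `output`) are assumptions about the JSONL schema, so check `scripts/to_instruction_format.py` for the real format.

```python
# Hypothetical sketch of the JSONL -> instruction-format conversion.
# Field names are assumptions; see scripts/to_instruction_format.py for the real schema.
import json

def convert(input_path="stories.jsonl", output_path="train.json"):
    examples = []
    with open(input_path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            examples.append({
                "instruction": f"Write a long-form story based on this premise: {record['premise']}",
                "output": record["story"],
            })
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump(examples, f, ensure_ascii=False, indent=2)
```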
Citation
@inproceedings{chen2026storybox,
title = {StoryBox: Collaborative Multi-Agent Simulation for Hybrid Bottom-Up Long-Form Story Generation Using Large Language Models},
author = {Chen, Zehao and Pan, Rong and Li, Haoran},
booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
volume = {40},
number = {36},
pages = {30359--30367},
year = {2026}
}
License
This project is licensed under the MIT License.
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = 'raazkumar/storybox-reproduction'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class.
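If the checkpoint is a causal language model, text generation then follows the standard transformers pattern, continuing from the `model` and `tokenizer` loaded above; the prompt and sampling settings below are just an example.

```python
# Standard transformers generation, assuming the checkpoint above is a causal LM.
prompt = "Once upon a time in a small coastal town,"
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```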
