---
tags:
- ml-intern
---
# StoryBox
## 📝 Introduction
This is the repository for the paper [StoryBox: Collaborative Multi-Agent Simulation for Hybrid Bottom-Up Long-Form Story Generation Using Large Language Models](https://ojs.aaai.org/index.php/AAAI/article/view/40288), accepted at **AAAI 2026**.

**StoryBox** is a framework that leverages collaborative multi-agent simulation for hybrid bottom-up long-form story generation. By combining bottom-up character-driven agent interactions with top-down narrative planning, it dynamically constructs deep, coherent, and engaging story worlds.
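
For intuition, the two layers can be pictured as a character-driven simulation loop followed by a storytelling pass. The sketch below is purely conceptual; every name in it is an illustrative placeholder, not the actual API under `reverie/`.

```python
# Conceptual sketch of the hybrid bottom-up / top-down loop.
# All names here are illustrative placeholders, not the StoryBox API.
from dataclasses import dataclass

@dataclass
class Persona:
    name: str

    def act(self, event_log: list, day: int) -> None:
        # Bottom-up: a character agent perceives the world and contributes an event.
        event_log.append(f"Day {day}: {self.name} acts in the sandbox world.")

def generate_story(personas: list, num_days: int = 14) -> str:
    event_log: list = []
    for day in range(1, num_days + 1):
        for persona in personas:
            persona.act(event_log, day)
    # Top-down: a storyteller pass organizes the accumulated events into a narrative.
    return "\n".join(event_log)

print(generate_story([Persona("Ava"), Persona("Ben")], num_days=2))
```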
---
## ⚡ Quick Start (with uv)
```bash
# 1. Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# 2. Clone
git clone https://huggingface.co/raazkumar/storybox-reproduction
cd storybox-reproduction
# 3. Install dependencies (uv handles everything)
uv sync
# 4. Run quick test (1 day, mock LLM — no API key needed)
uv run python reverie/test_run.py
# 5. Run full simulation (14 days, requires API key)
export OPENAI_API_KEY="sk-..."
uv run python reverie/run.py
```
---
## ⚙️ Installation Options
```bash
# Base install (OpenAI only)
uv sync
# Apple Silicon — native MLX (fastest on M1/M2/M3/M4)
uv sync --extra mlx
# NVIDIA NIM inference
uv sync --extra nim
# Ollama local inference
uv sync --extra ollama
# All local backends (MLX + Ollama)
uv sync --extra local
# Everything (all providers + dev tools)
uv sync --extra all --extra dev
```
---
## 🚀 Supported LLM Providers
| Provider | Config | Speed | Setup |
|----------|--------|-------|-------|
| **OpenAI** | `gpt-4o-mini` | Fastest | API key |
| **Ollama** | `gemma4` | Fast | `brew install ollama` |
| **MLX (Apple)** ⭐ | `llama3.1-8b-mlx` | **Fastest on Mac** | `uv sync --extra mlx` |
| **NVIDIA NIM** | `nvidia/meta/llama-3.1-8b-instruct` | Fast | API key |
Switch providers by changing **one line** in `reverie/config/config.py`:
```python
# Set exactly one of the following:
llm_model_name = 'llama3.1-8b-mlx'                    # MLX native (Apple)
llm_model_name = 'gemma4'                             # Ollama
llm_model_name = 'nvidia/meta/llama-3.1-8b-instruct'  # NIM
llm_model_name = 'gpt-4o-mini'                        # OpenAI
```
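
Per the project structure below, `reverie/common/llm.py` routes requests to the selected backend. The sketch below shows how such a router could dispatch on the model name; the dispatch rules and function name are assumptions, not the file's actual contents.

```python
# Hypothetical sketch of a provider router keyed on llm_model_name.
# The rules and helper name are assumptions, not reverie/common/llm.py itself.

def route_provider(llm_model_name: str) -> str:
    """Pick a backend from the configured model name."""
    if llm_model_name.endswith("-mlx"):
        return "mlx"        # native Apple Silicon backend (common/mlx_llm.py)
    if llm_model_name.startswith("nvidia/"):
        return "nim"        # NVIDIA NIM hosted inference
    if llm_model_name.startswith("gpt-"):
        return "openai"     # OpenAI API (requires OPENAI_API_KEY)
    return "ollama"         # default to a local Ollama model

for name in ["llama3.1-8b-mlx", "gemma4",
             "nvidia/meta/llama-3.1-8b-instruct", "gpt-4o-mini"]:
    print(name, "->", route_provider(name))
```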
---
## 📁 Project Structure
```
storybox/
├── reverie/
│   ├── run.py               # Main entry point
│   ├── test_run.py          # Quick 1-day test
│   ├── config/config.py     # All settings
│   ├── agent/storyteller.py # Story generation
│   ├── persona/             # Characters + cognition
│   ├── environment/world.py # Sandbox world
│   ├── common/llm.py        # LLM provider router
│   ├── common/mlx_llm.py    # Native MLX (Apple)
│   └── prompts/prompt-1/    # 30+ prompt templates
├── data/story01-20/         # 20 story settings
├── pyproject.toml           # uv project config
└── uv.lock                  # Locked dependencies
```
---
## 🛠️ Common Commands
```bash
# Run with uv (recommended)
uv run python reverie/run.py
uv run python reverie/test_run.py
# Install additional dependencies
uv add gradio
uv add --dev pytest
# Lock dependencies
uv lock
# Update dependencies
uv sync --upgrade
# Run tests
uv run pytest
# Format code
uv run ruff format .
uv run ruff check --fix .
# Type check
uv run mypy reverie/
```
---
## 🇮🇳 Hindi / Multilingual Stories
```bash
# Generate English story
uv run python reverie/run.py
# Translate to Hindi (post-generation)
# See GRADIO_UI_GUIDE.md for full pipeline
```
**Recommended approach**: English simulation → Hindi storyteller. Characters plan and chat in English; the final story is written in Hindi.
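
A rough sketch of that post-generation step, assuming an OpenAI-compatible client; the prompt, model choice, and function name are illustrative and not the pipeline documented in GRADIO_UI_GUIDE.md.

```python
# Illustrative post-generation translation step (not the actual pipeline
# from GRADIO_UI_GUIDE.md). Assumes the openai package and OPENAI_API_KEY.
from openai import OpenAI

def retell_in_hindi(english_story: str, model: str = "gpt-4o-mini") -> str:
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You are a Hindi storyteller. Rewrite the story in natural, literary Hindi."},
            {"role": "user", "content": english_story},
        ],
    )
    return response.choices[0].message.content
```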
---
## 🏭 Synthetic Data Generation
```bash
# Generate 1000 stories for training data
uv run python scripts/generate_synthetic_dataset.py \
    --num-stories 1000 \
    --output stories.jsonl
# Convert to instruction format
uv run python scripts/to_instruction_format.py \
    --input stories.jsonl \
    --output train.json
```
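
Conceptually, the conversion step turns each generated story record into an instruction/response pair. The sketch below assumes hypothetical field names (`premise`, `story`, `instruction`, `output`); check `scripts/to_instruction_format.py` for the real schema.

```python
# Hedged sketch of the jsonl -> instruction-format conversion.
# Field names are assumptions, not the script's actual schema.
import json

def to_instruction_records(jsonl_path: str) -> list:
    records = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            story = json.loads(line)
            records.append({
                "instruction": f"Write a long-form story based on this premise: {story['premise']}",
                "output": story["story"],
            })
    return records

with open("train.json", "w", encoding="utf-8") as f:
    json.dump(to_instruction_records("stories.jsonl"), f, ensure_ascii=False, indent=2)
```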
---
## 📚 Citation
```bibtex
@inproceedings{chen2026storybox,
  title     = {StoryBox: Collaborative Multi-Agent Simulation for Hybrid Bottom-Up Long-Form Story Generation Using Large Language Models},
  author    = {Chen, Zehao and Pan, Rong and Li, Haoran},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume    = {40},
  number    = {36},
  pages     = {30359--30367},
  year      = {2026}
}
```
---
## ⚖️ License
This project is licensed under the [MIT License](https://opensource.org/license/MIT).
## Generated by ML Intern
This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = 'raazkumar/storybox-reproduction'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```
For non-causal architectures, replace `AutoModelForCausalLM` with the appropriate `AutoModel` class.
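
Continuing from the loading snippet above, a minimal generation call for a causal checkpoint might look like this; the prompt and sampling parameters are illustrative.

```python
# Example generation call (illustrative prompt; tune max_new_tokens as needed).
inputs = tokenizer("Once upon a time in a quiet village,", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```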