Jinyang23 commited on
Commit
6bc5a60
·
verified ·
1 Parent(s): 260e92e

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +199 -0
README.md CHANGED
@@ -1,3 +1,202 @@
1
  ---
2
  license: mit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
3
  ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
  license: mit
3
+ language:
4
+ - en
5
+ tags:
6
+ - reinforcement-learning
7
+ - multimodal
8
+ - agent
9
+ - tool-use
10
+ - orchestration
11
+ - model-routing
12
+ - qwen3-vl
13
+ - grpo
14
+ library_name: transformers
15
+ pipeline_tag: image-text-to-text
16
+ base_model:
17
+ - Qwen/Qwen3-VL-4B-Thinking
18
+ metrics:
19
+ - accuracy
20
  ---
21
+
22
+ <h1 align="center">
23
+ MAESTRO-4B: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles
24
+ </h1>
25
+
26
+ <div align="center">
27
+ <p>
28
+ <a href="https://arxiv.org/pdf/2605.22177">
29
+ <img src="https://img.shields.io/badge/Paper-arxiv%3A2605.22177-blue" alt="Paper"/>
30
+ </a>
31
+ <a href="https://huggingface.co/papers/2605.22177">
32
+ <img src="https://img.shields.io/badge/Daily%20Paper-HuggingFace-yellow" alt="HF Daily Paper"/>
33
+ </a>
34
+ <a href="https://github.com/jinyangwu/Maestro">
35
+ <img src="https://img.shields.io/badge/Code-GitHub-black" alt="Code"/>
36
+ </a>
37
+ </p>
38
+ </div>
39
+
40
+ ## Overview
41
+
42
+ **MAESTRO-4B** is the lightweight multimodal orchestrator used in **MAESTRO: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles**.
43
+
44
+ Rather than solving every task with a single monolithic model, MAESTRO frames multimodal agent execution as a sequential decision-making problem over a hierarchical model-skill registry. At each reasoning step, the 4B orchestrator decides:
45
+
46
+ - whether to invoke an external expert,
47
+ - which expert model to call,
48
+ - which task-specific skill to use,
49
+ - and when to terminate with a final answer.
50
+
51
+ The full MAESTRO system is available at [jinyangwu/Maestro](https://github.com/jinyangwu/Maestro). The repository includes example train/validation data under `data/` and skill implementations under `skills/`.
52
+
53
+ > **Important**
54
+ > This checkpoint is an **orchestrator policy**, not a standalone all-purpose VLM. To reproduce MAESTRO-style rollout, use this model together with the skill registry and auxiliary model services provided in the GitHub repository.
55
+
56
+ ## Key Features
57
+
58
+ - **RL-trained orchestration policy**: Learns model-skill routing through outcome-based reinforcement learning.
59
+ - **Hierarchical skill registry**: Selects coarse Level-1 skills and dispatches to fine-grained Level-2 solvers.
60
+ - **Model-skill composition**: Treats expert model selection and skill invocation as a unified action.
61
+ - **Plug-and-play extensibility**: Can exploit newly added experts and skills without retraining in the reported setup.
62
+ - **Efficient 4B controller**: Uses a compact orchestrator to coordinate larger or specialized frozen expert models.
63
+
64
+ ## Performance Highlights
65
+
66
+ The MAESTRO paper evaluates the full orchestration system across representative multimodal benchmarks covering mathematical reasoning, chart understanding, high-resolution perception, and domain-specific analysis.
67
+
68
+ | Setting | Result |
69
+ | --- | --- |
70
+ | In-domain multimodal benchmarks | 70.1% average accuracy |
71
+ | Closed-source reference baselines | GPT-5: 69.3%, Gemini-2.5-Pro: 68.7% |
72
+ | Augmented out-of-domain registry without retraining | 59.5% average accuracy |
73
+ | Average latency in the reported setup | 2.88s |
74
+
75
+ These numbers describe the **full MAESTRO system** with its model-skill registry and external services, not isolated single-model inference from this checkpoint alone.
76
+
77
+ ## Quickstart
78
+
79
+ ### Load the orchestrator checkpoint
80
+
81
+ Below is a minimal Transformers-style loading example. Full model-skill orchestration requires the MAESTRO repository and the auxiliary services described below.
82
+
83
+ ```python
84
+ import torch
85
+ from transformers import AutoProcessor, AutoModelForImageTextToText
86
+
87
+ model_id = "Jinyang23/Maestro-4B"
88
+
89
+ model = AutoModelForImageTextToText.from_pretrained(
90
+ model_id,
91
+ torch_dtype=torch.bfloat16,
92
+ device_map="auto",
93
+ trust_remote_code=True,
94
+ )
95
+ processor = AutoProcessor.from_pretrained(
96
+ model_id,
97
+ trust_remote_code=True,
98
+ )
99
+ ```
100
+
101
+ ### Run the full MAESTRO framework
102
+
103
+ Clone the project repository:
104
+
105
+ ```bash
106
+ git clone https://github.com/jinyangwu/Maestro
107
+ cd Maestro
108
+ ```
109
+
110
+ Create the Python environment and install dependencies:
111
+
112
+ ```bash
113
+ conda create -n maestro python=3.10 -y
114
+ conda activate maestro
115
+ pip install -r requirements.txt
116
+ ```
117
+
118
+ Set an OpenAI API key before training or rollout:
119
+
120
+ ```bash
121
+ export OPENAI_API_KEY=<your_api_key>
122
+ ```
123
+
124
+ Before training, deploy the auxiliary model services. Replace each `/path/to/<model>` placeholder with a local model directory or Hugging Face model id.
125
+
126
+ Example:
127
+
128
+ ```bash
129
+ vllm serve /path/to/Intern-S1-mini --served-model-name Intern-S1-mini --tensor_parallel_size 1 --max-num-seqs 512 --trust-remote-code --port 2368 --gpu_memory_utilization 0.9
130
+ ```
131
+
132
+ Default service ports used by the skills:
133
+
134
+ | Port | Model service |
135
+ | --- | --- |
136
+ | `2362` | `qwen3-VL-8B-Instruct` |
137
+ | `2364` | `Chart-R1` |
138
+ | `2368` | `Intern-S1-mini` |
139
+ | `2369` | `medgemma-1.5-4b-it` |
140
+ | `2370` | `DeepEyes-7B` |
141
+ | `2376` | `GLM-4.6V-Flash` |
142
+ | `2388` | `GLM-OCR` |
143
+ | `2389` | `PR1-Qwen2.5-VL-3B-Detection` |
144
+
145
+ Start training with:
146
+
147
+ ```bash
148
+ bash train.sh
149
+ ```
150
+
151
+ To train from a local checkpoint or a different model id, override `MODEL_NAME`:
152
+
153
+ ```bash
154
+ MODEL_NAME=/path/to/Qwen3-VL-4B-Thinking bash train.sh
155
+ ```
156
+
157
+ ## Model Details
158
+
159
+ - **Model name**: `Jinyang23/Maestro-4B`
160
+ - **Role**: MAESTRO multimodal orchestration policy
161
+ - **Base model**: `Qwen3-VL-4B-Thinking`
162
+ - **Training method**: outcome-based reinforcement learning with GRPO-style optimization
163
+ - **Action space**: latent reasoning, model-skill search actions, and terminal answers
164
+ - **Skill interface**: hierarchical skill registry from the MAESTRO repository
165
+ - **Expected usage**: high-level controller for external expert models and modular skills
166
+
167
+ ## Intended Use
168
+
169
+ This model is intended for research on:
170
+
171
+ - multimodal agent orchestration,
172
+ - reinforcement learning for tool and skill use,
173
+ - model routing and expert selection,
174
+ - hierarchical skill libraries,
175
+ - agentic evaluation across heterogeneous tasks.
176
+
177
+ It is especially useful when integrated with the full MAESTRO framework, where the orchestrator can call external expert services during rollout.
178
+
179
+ ## Citation
180
+
181
+ If you use this model or the MAESTRO framework in your research, please cite:
182
+
183
+ ```bibtex
184
+ @misc{wu2026maestro,
185
+ title={MAESTRO: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles},
186
+ author={Jinyang Wu and Guocheng Zhai and Ruihan Jin and Yuhao Shen and Zhengxi Lu and Fan Zhang and Haoran Luo and Zheng Lian and Zhengqi Wen and Jianhua Tao},
187
+ year={2026},
188
+ eprint={2605.22177},
189
+ archivePrefix={arXiv},
190
+ primaryClass={cs.LG},
191
+ url={https://arxiv.org/abs/2605.22177},
192
+ }
193
+ ```
194
+
195
+ ## Links
196
+
197
+ - Code: [https://github.com/jinyangwu/Maestro](https://github.com/jinyangwu/Maestro)
198
+ - Model: [https://huggingface.co/Jinyang23/Maestro-4B](https://huggingface.co/Jinyang23/Maestro-4B)
199
+
200
+ ## Acknowledgement
201
+
202
+ This project builds on open-source reinforcement learning and model-serving ecosystems, including `verl` and vLLM. We thank the authors and contributors of these projects, as well as the developers of the expert models and skill implementations used by MAESTRO.