POMO: Policy Optimization with Multiple Optima for Reinforcement Learning
Paper • 2010.16011 • Published
ONNX-exported neural network for solving Capacitated Vehicle Routing Problems (CVRP) in drone delivery contexts.
The model supports dynamic batch sizes and problem sizes of up to 200 nodes.
| Port | Name | Shape | Type |
|---|---|---|---|
| Input | locs | [batch, nodes, 3] | float32 |
| Input | demand | [batch, nodes, 1] | float32 |
| Input | capacity | [batch, 1] | float32 |
| Output | actions | [batch, max_steps] | int64 |
| Output | log_p | [batch, max_steps] | float32 |
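As a rough illustration of the input layout in the table, the sketch below builds a feed dictionary with matching shapes and dtypes. The helper name `make_inputs` is hypothetical, and sampling coordinates uniformly from the unit cube is an assumption; the demand range {1..9} and default capacity of 40 are taken from the training settings.

```python
import numpy as np

def make_inputs(batch: int, nodes: int, capacity: float = 40.0, seed: int = 0):
    """Build random inputs shaped like the table above (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Assumption: customer coordinates lie in the unit cube.
    locs = rng.random((batch, nodes, 3), dtype=np.float32)
    # Integer demands in {1..9}, stored as float32 with a trailing axis of size 1.
    demand = rng.integers(1, 10, size=(batch, nodes, 1)).astype(np.float32)
    # One capacity value per instance.
    cap = np.full((batch, 1), capacity, dtype=np.float32)
    return {"locs": locs, "demand": demand, "capacity": cap}

feeds = make_inputs(batch=4, nodes=10)
for name, array in feeds.items():
    print(name, array.shape, array.dtype)
```

The resulting dictionary can be passed as the second argument to `session.run`.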
The wrapper automatically prepends a depot at (0.5, 0.5, 0.0) with zero demand. Set capacity high (e.g. 999) for unconstrained TSP optimization, or use the trained value (40) for standard CVRP-50 benchmark routing.
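The wrapper performs the depot step internally, so callers pass customer nodes only. To make the resulting layout concrete, here is a minimal NumPy sketch of what prepending a depot looks like; `prepend_depot` is a hypothetical name and this is not the wrapper's actual code.

```python
import numpy as np

# Depot position stated above; demand at the depot is zero.
DEPOT = np.array([0.5, 0.5, 0.0], dtype=np.float32)

def prepend_depot(locs: np.ndarray, demand: np.ndarray):
    """Illustrative only: place the depot at node index 0 of every instance."""
    batch = locs.shape[0]
    depot_loc = np.broadcast_to(DEPOT, (batch, 1, 3))
    depot_demand = np.zeros((batch, 1, 1), dtype=demand.dtype)
    full_locs = np.concatenate([depot_loc, locs], axis=1)
    full_demand = np.concatenate([depot_demand, demand], axis=1)
    return full_locs, full_demand

locs = np.random.rand(2, 5, 3).astype(np.float32)
demand = np.ones((2, 5, 1), dtype=np.float32)
full_locs, full_demand = prepend_depot(locs, demand)
print(full_locs.shape, full_demand.shape)  # (2, 6, 3) (2, 6, 1)
```

Under this layout, an instance with N customers has N + 1 nodes, with the depot at index 0.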
| Parameter | Value |
|---|---|
| Problem size | 50 customers |
| Batch size | 512–1024 |
| Epochs | 500–1000 |
| Learning rate | 1e-4 (cosine annealing) |
| Warmup | 10 epochs |
| Demand range | {1..9} |
| Vehicle capacity | 40 |
| Entropy bonus | 0.01 |
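The learning-rate row can be read as a warmup phase followed by cosine decay. The sketch below shows one common way to combine the two; the linear warmup shape, the decay to zero, and the choice of 500 total epochs are assumptions, since the table gives only the base rate, the schedule type, and the warmup length.

```python
import math

def lr_at_epoch(epoch: int, total_epochs: int = 500,
                base_lr: float = 1e-4, warmup_epochs: int = 10) -> float:
    """Hypothetical schedule: linear warmup, then cosine annealing to zero."""
    if epoch < warmup_epochs:
        # Ramp up linearly so the last warmup epoch reaches base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Fraction of the post-warmup training that has elapsed, in [0, 1].
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))

for epoch in (0, 9, 10, 255, 499):
    print(epoch, lr_at_epoch(epoch))
```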
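```python
import numpy as np
import onnxruntime as ort

# Load the exported model.
session = ort.InferenceSession("cvrp50_model.onnx")

# One instance with 10 customers; the wrapper adds the depot itself.
locs = np.random.rand(1, 10, 3).astype(np.float32)
demand = np.ones((1, 10, 1), dtype=np.float32)
# A high capacity makes the routing effectively unconstrained.
capacity = np.array([[999.0]], dtype=np.float32)

actions, log_p = session.run(None, {
    "locs": locs,
    "demand": demand,
    "capacity": capacity,
})

# First 20 decoded steps of the first instance.
print(actions[0, :20])
```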
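The `actions` output is a flat sequence of node indices per instance. One plausible way to turn it into per-vehicle routes is sketched below. It assumes that index 0 denotes the depot, that customers are numbered from 1, and that any steps after the tour is complete are padded with depot visits; none of these conventions are confirmed by the model card, and `split_routes` is a hypothetical helper.

```python
import numpy as np

def split_routes(actions, depot: int = 0):
    """Split one action sequence into routes at each depot visit (assumed index 0)."""
    routes, current = [], []
    for node in actions:
        node = int(node)
        if node == depot:
            # A depot visit closes the current route; repeated depots are skipped.
            if current:
                routes.append(current)
                current = []
        else:
            current.append(node)
    if current:
        routes.append(current)
    return routes

example = np.array([3, 1, 0, 2, 4, 0, 0, 0], dtype=np.int64)
print(split_routes(example))  # [[3, 1], [2, 4]]
```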
Try it live in the Drone VRP Space, which offers a side-by-side comparison with a nearest-neighbor heuristic, physics-based energy modeling, and interactive route visualization.
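For readers unfamiliar with the baseline, a capacity-aware nearest-neighbor heuristic can be sketched as follows. This is a generic version and not the implementation used in the Space: the function name, the 0-based customer indexing (depot excluded), and the Euclidean distance in 3D are all assumptions.

```python
import numpy as np

def nearest_neighbor_routes(locs, demand, capacity, depot=(0.5, 0.5, 0.0)):
    """Greedy baseline: always visit the closest customer that still fits."""
    locs = np.asarray(locs, dtype=np.float64)
    demand = np.asarray(demand, dtype=np.float64)
    unvisited = set(range(len(locs)))
    routes = []
    while unvisited:
        # Each vehicle starts empty at the depot.
        position = np.asarray(depot, dtype=np.float64)
        load, route = 0.0, []
        while True:
            feasible = [i for i in unvisited if load + demand[i] <= capacity]
            if not feasible:
                break  # vehicle is full or no customers remain: return to depot
            nxt = min(feasible, key=lambda i: np.linalg.norm(locs[i] - position))
            route.append(nxt)
            load += demand[nxt]
            position = locs[nxt]
            unvisited.remove(nxt)
        if not route:
            raise ValueError("a customer demand exceeds the vehicle capacity")
        routes.append(route)
    return routes

locs = [[0.6, 0.5, 0.0], [0.9, 0.5, 0.0], [0.1, 0.5, 0.0]]
print(nearest_neighbor_routes(locs, demand=[1, 1, 1], capacity=2))  # [[0, 1], [2]]
```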