Yes, absolutely viable — and it's probably the right move for the next batch of tasks. Here's the key tradeoff:

## Why it works

PyTorch → ONNX export is straightforward for the ops we need:

```python
torch.onnx.export(model, dummy_input, "task.onnx", opset_version=10)
```

This lets us use PyTorch's optimizers (Adam, SGD) to train small networks per task via gradient descent instead of lstsq. That's strictly more powerful:

|  | lstsq (current) | PyTorch + gradient descent |
| --- | --- | --- |
| Architecture | Single conv layer only | Multi-layer, nonlinear (ReLU, etc.) |
| Fitting | Closed-form, one-shot | Iterative, can fit complex patterns |
| Failure mode | Underfits if the pattern isn't linear | Can overfit (which is GOOD here: we WANT to memorize the training examples) |
| Speed | Fast for small kernels | Slower, but GPU helps |

The 94 unsolved tasks fail because a single conv layer can't express the transformation. A 2-3 layer network with ReLU activations could.

## Why it's not trivial

Opset 10 restrictions. Not every PyTorch op exports cleanly to opset 10. You need to verify the exported model actually runs. The main gotchas:

- torch.nn.Conv2d → Conv ✅ (opset 1)
- torch.relu → Relu ✅ (opset 1)
- torch.argmax → ArgMax ✅ (opset 1)
- torch.gather → Gather ✅ (opset 1)
- BUT: F.one_hot → OneHot ❌ (opset 9, but it has CUDA issues; we already work around this with Equal+Cast)
- Dynamic shapes / control flow → ❌ forbidden

So the pattern would be: build a small static PyTorch network, train it to perfectly fit the task's train+test examples, then export.
The network must have fixed tensor shapes throughout (no data-dependent branching).
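
As a concrete sketch of that pattern, the export-and-verify step could look like this (a minimal sketch, assuming onnxruntime is installed; it uses the TinyARC net defined in the next section):

```python
import numpy as np
import onnxruntime as ort
import torch
import torch.nn.functional as F

model = TinyARC().eval()  # the trained per-task net (defined below)

# Random one-hot dummy input so the valid-pixel mask is nonzero.
# F.one_hot is fine here: it builds test data on the PyTorch side
# and never enters the exported graph.
idx = torch.randint(0, 10, (1, 30, 30))
dummy = F.one_hot(idx, 10).permute(0, 3, 1, 2).float()  # [1,10,30,30]

torch.onnx.export(model, dummy, "task.onnx", opset_version=10)

# Verify the opset-10 graph actually runs and matches PyTorch.
sess = ort.InferenceSession("task.onnx")
feed = {sess.get_inputs()[0].name: dummy.numpy()}
onnx_out = sess.run(None, feed)[0]
torch_out = model(dummy).detach().numpy()
assert np.allclose(onnx_out, torch_out, atol=1e-5)
```
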

## What architecture to try

For the 94 unsolved tasks, a 2-layer conv net would be the first thing to try:

```python
import torch
import torch.nn as nn

class TinyARC(nn.Module):
    def __init__(self, hidden=32, ks=5):
        super().__init__()
        self.conv1 = nn.Conv2d(10, hidden, ks, padding=ks // 2)
        self.conv2 = nn.Conv2d(hidden, 10, ks, padding=ks // 2)

    def forward(self, x):  # x: [1,10,30,30] one-hot color planes
        mask = x.sum(dim=1, keepdim=True)  # [1,1,30,30] valid-pixel mask
        h = torch.relu(self.conv1(x))
        logits = self.conv2(h)  # [1,10,30,30]
        # argmax + equal + cast: a one-hot output without OneHot, so it
        # exports to opset 10. argmax has no gradient; train on `logits`.
        out = (logits.argmax(dim=1, keepdim=True)
               == torch.arange(10).reshape(1, 10, 1, 1).to(x.device)).float()
        return out * mask
```

Train with cross-entropy on the one-hot targets, export to ONNX. The argmax+equal+cast trick we already use would work in the export.
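
A minimal per-task fitting sketch (the helper name fit_task is an assumption, not existing code; inputs are the [1,10,30,30] one-hot grids, targets are [1,30,30] int64 color labels):

```python
import torch
import torch.nn as nn

def fit_task(model, inputs, targets, steps=2000, lr=1e-2):
    """Overfit one task: inputs/targets are lists of per-example tensors."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.tensor(0.0)
        for x, y in zip(inputs, targets):
            # Take the loss on raw logits: forward()'s argmax head has no
            # gradient, so it is export/inference-only.
            logits = model.conv2(torch.relu(model.conv1(x)))  # [1,10,30,30]
            loss = loss + loss_fn(logits, y)  # y: [1,30,30] class indices
        loss.backward()
        opt.step()
    return model
```

Since the goal is memorization, run until the loss is effectively zero and every example is reproduced exactly, not until some validation metric plateaus.
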

## The real blocker isn't PyTorch → ONNX

It's the variable output shapes. 60+ of the 94 unsolved tasks have outputs that are a different size than the input, and that size varies across examples.
No static ONNX graph can produce a 5×3 output for one input and a 7×9 output for another — the graph is fixed.

For those, the only hope is that the conv learns to put valid content in the right region and zeros elsewhere (which our mask = ReduceSum(input) trick handles when the output region fits inside the input bounds). But when the output is LARGER than the input, we're stuck without a way to derive the output mask.

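Which tasks the solver should even attempt can be checked up front. A rough triage sketch (the function name shape_class is an assumption; tasks are assumed to follow the standard ARC JSON layout of "train" pairs with "input"/"output" grids):

```python
def shape_class(task) -> str:
    """Classify a task by how its output shape relates to its input shape."""
    diffs = set()
    for pair in task["train"]:
        ih, iw = len(pair["input"]), len(pair["input"][0])
        oh, ow = len(pair["output"]), len(pair["output"][0])
        diffs.add((oh - ih, ow - iw))
    if diffs == {(0, 0)}:
        return "same-shape"        # directly expressible as a static graph
    if len(diffs) == 1:
        return "fixed-diff-shape"  # constant offset: pad/crop around the net
    return "variable-diff-shape"   # no single static graph fits these
```
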
Bottom line: Yes, build in PyTorch, export to ONNX. Focus the PyTorch solver on the ~10 same-shape tasks and ~7 fixed-diff-shape tasks where lstsq conv failed but a deeper network might succeed. The 77 variable-diff-shape tasks are a harder structural problem regardless of framework.