rogermt committed
Commit 1cc31ee · verified · 1 Parent(s): 1b5636f

Move own-solver/PyTorch -to- ONNX.md to own-solver/

Files changed (1)
  1. own-solver/PyTorch -to- ONNX.md +52 -0
own-solver/PyTorch -to- ONNX.md ADDED
@@ -0,0 +1,52 @@

Yes, absolutely viable, and it's probably the right move for the next batch of tasks. Here's the key tradeoff:

## Why it works

PyTorch → ONNX export is straightforward for the ops we need:

```python
torch.onnx.export(model, dummy_input, "task.onnx", opset_version=10)
```

This lets us use PyTorch's optimizers (Adam, SGD) to train small networks per-task via gradient descent instead of lstsq. That's strictly more powerful:

|  | lstsq (current) | PyTorch + gradient descent |
|---|---|---|
| Architecture | Single conv layer only | Multi-layer, nonlinear (ReLU, etc.) |
| Fitting | Closed-form, one-shot | Iterative, can fit complex patterns |
| Failure mode | Underfits if the pattern isn't linear | Can overfit, which is GOOD here (we WANT to memorize the training examples) |
| Speed | Fast for small kernels | Slower, but GPU helps |

The 94 unsolved tasks fail because a single conv layer can't express the transformation. A 2-3 layer network with ReLU activations could.
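
For reference, here's a hedged sketch of what the closed-form lstsq fit looks like, assuming one-hot `[N, 10, H, W]` grids and an im2col-style `F.unfold`; the actual solver code may differ:

```python
import torch
import torch.nn.functional as F

def fit_conv_lstsq(x, y, ks=5):
    """Closed-form fit of a single 10->10 conv layer via least squares.

    Sketch only; the actual solver code may differ.
    x, y: [N, 10, H, W] one-hot input/output grids (same shape).
    """
    n, c, h, w = x.shape
    # im2col: each pixel's ks*ks*10 receptive field becomes one row
    patches = F.unfold(x, ks, padding=ks // 2)             # [N, 10*ks*ks, H*W]
    A = patches.permute(0, 2, 1).reshape(-1, c * ks * ks)  # [N*H*W, 10*ks*ks]
    B = y.flatten(2).permute(0, 2, 1).reshape(-1, c)       # [N*H*W, 10]
    W = torch.linalg.lstsq(A, B).solution                  # [10*ks*ks, 10]
    return W.T.reshape(c, c, ks, ks)                       # Conv2d weight layout
```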

## Why it's not trivial

**Opset 10 restrictions.** Not every PyTorch op exports cleanly to opset 10; you need to verify the exported model actually runs. The main gotchas:

- torch.nn.Conv2d → Conv ✅ (opset 1)
- torch.relu → Relu ✅ (opset 1)
- torch.argmax → ArgMax ✅ (opset 1)
- torch.gather → Gather ✅ (opset 1)
- BUT: F.one_hot → OneHot ❌ (opset 9, but has CUDA issues; we already work around this with Equal+Cast, sketched below)
- Dynamic shapes / control flow → ❌ forbidden

So the pattern would be: build a small static PyTorch network, train it to perfectly fit the task's train+test examples, then export. The network must have fixed tensor shapes throughout (no data-dependent branching).
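
That Equal+Cast workaround, as a minimal sketch (the helper name is illustrative; the existing one may differ):

```python
import torch

def one_hot_equal_cast(indices, num_classes=10):
    """One-hot encode via comparison, exporting as ONNX Equal + Cast.

    Illustrative stand-in for the existing workaround; avoids F.one_hot,
    whose OneHot op is opset 9 and has CUDA issues.
    indices: [1, 1, H, W] integer class map -> [1, num_classes, H, W] float.
    """
    classes = torch.arange(num_classes, device=indices.device).reshape(1, num_classes, 1, 1)
    return (indices == classes).float()  # Equal node, then Cast to float
```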

## What architecture to try

For the 94 unsolved tasks, a 2-layer conv net would be the first thing to try:

```python
import torch
import torch.nn as nn

class TinyARC(nn.Module):
    def __init__(self, hidden=32, ks=5):
        super().__init__()
        self.conv1 = nn.Conv2d(10, hidden, ks, padding=ks // 2)
        self.conv2 = nn.Conv2d(hidden, 10, ks, padding=ks // 2)

    def forward(self, x):  # x: [1, 10, 30, 30] one-hot grid
        mask = x.sum(dim=1, keepdim=True)  # [1, 1, 30, 30] valid-pixel mask
        h = torch.relu(self.conv1(x))
        logits = self.conv2(h)             # [1, 10, 30, 30]
        # one-hot the prediction via argmax + equal + cast (opset 10 safe)
        out = (logits.argmax(dim=1, keepdim=True)
               == torch.arange(10).reshape(1, 10, 1, 1).to(x.device)).float()
        return out * mask
```

Train with cross-entropy on the one-hot targets, then export to ONNX. The argmax+equal+cast trick we already use would work in the export.
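
A minimal per-task train-and-export loop might look like this (the `fit_and_export` name, hyperparameters, and the onnxruntime sanity check are illustrative assumptions, not fixed choices):

```python
import torch
import torch.nn.functional as F

def fit_and_export(pairs, path="task.onnx", steps=2000, lr=1e-2):
    """Overfit one TinyARC net on a task's example pairs, then export it.

    pairs: list of (x, y) one-hot [1, 10, 30, 30] tensors for the task.
    Hyperparameters are illustrative, not tuned.
    """
    model = TinyARC()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = 0.0
        for x, y in pairs:
            # Train on raw logits: the argmax in forward() is not differentiable
            logits = model.conv2(torch.relu(model.conv1(x)))
            loss = loss + F.cross_entropy(logits, y.argmax(dim=1))
        loss.backward()
        opt.step()

    model.eval()
    torch.onnx.export(model, pairs[0][0], path, opset_version=10)

    # Verify the exported graph actually runs (the opset-10 gotcha above)
    import onnxruntime as ort
    sess = ort.InferenceSession(path)
    sess.run(None, {sess.get_inputs()[0].name: pairs[0][0].numpy()})
```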

## The real blocker isn't PyTorch → ONNX

It's the variable output shapes. 60+ of the 94 unsolved tasks have outputs that are a different size than the input, and that size varies across examples. No static ONNX graph can produce a 5×3 output for one input and a 7×9 output for another; the graph is fixed.

For those, the only hope is that the conv learns to put valid content in the right region and zeros elsewhere (which our mask = ReduceSum(input) trick handles when output ⊆ input bounds). But when the output is LARGER than the input, we're stuck without a way to derive the output mask.

**Bottom line:** Yes, build in PyTorch, export to ONNX. Focus the PyTorch solver on the ~10 same-shape tasks and ~7 fixed-diff-shape tasks where lstsq conv failed but a deeper network might succeed. The 77 variable-diff-shape tasks are a harder structural problem regardless of framework.