A13 / 3rd-gen Neural Engine: text_encoder + vector_estimator plan build fails; 8-step too slow, 4-step quality drops
#1
by shlaikov - opened
Device: iPhone 11 Pro Max, A13 Bionic (3rd-gen ANE), iOS 26.0.1.
Models: Reza2kn/supertonic-3-coreml/fp16, compiled to .mlmodelc via xcrun coremlc compile. CoreMLTools 8.3.0.
What we observed
Tried compute_units ladder [.all → .cpuAndGPU → .cpuOnly] per stage. First success wins:
| Stage | .all (ANE+GPU+CPU) | .cpuAndGPU | .cpuOnly |
|---|---|---|---|
| duration_predictor | ok ~1.1s | - | - |
| text_encoder | FAIL "Error in building plan." (~5s) | FAIL same (~5s) | ok ~1s |
| vector_estimator | (skipped, planner can wedge for minutes) | ok ~3.6s | ok |
| vocoder | ok ~2.3s | - | - |
So on A13: duration_predictor (DP) and vocoder (Voc) run on ANE, text_encoder (TE) on scalar CPU, vector_estimator (VE) on Metal GPU.
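For anyone trying to reproduce the placement above, here is a minimal sketch of the per-stage fallback ladder. It is written in plain Python rather than against the Core ML API so it can run anywhere; `load_with_fallback`, `fake_loader`, and the `FAILS` set are hypothetical names, and the loader only replays the outcomes from the table instead of loading a real model.

```python
# Compute-unit names mirror the ladder in the post, most accelerated first.
LADDER = ("all", "cpuAndGPU", "cpuOnly")


def load_with_fallback(stage, loader, ladder=LADDER, skip=()):
    """Return (compute_unit, model) for the first rung that loads, else raise."""
    errors = []
    for unit in ladder:
        if unit in skip:
            # Some rungs are skipped on purpose, e.g. when the planner can hang.
            errors.append(f"{unit}: skipped")
            continue
        try:
            return unit, loader(stage, unit)
        except RuntimeError as exc:
            errors.append(f"{unit}: {exc}")
    raise RuntimeError(f"{stage}: no compute unit worked ({'; '.join(errors)})")


# Hypothetical stand-in loader: replays the failures reported in the table.
FAILS = {("text_encoder", "all"), ("text_encoder", "cpuAndGPU")}


def fake_loader(stage, unit):
    if (stage, unit) in FAILS:
        raise RuntimeError("Error in building plan.")
    return f"{stage}@{unit}"


placement = {}
for stage in ("duration_predictor", "text_encoder", "vector_estimator", "vocoder"):
    # vector_estimator skips .all, as in the post.
    skip = ("all",) if stage == "vector_estimator" else ()
    unit, _model = load_with_fallback(stage, fake_loader, skip=skip)
    placement[stage] = unit

print(placement)
```

In a real app the loader would wrap the actual model load with the chosen compute units, and the failed rungs would be worth caching per device so the ~5 s plan-build failures are not paid on every launch.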
Per-chunk timing (T=L=320, FP16, voice F1, 8 Euler steps)
- DP: 15–60 ms · TE: 17–50 ms · Voc: 250–400 ms
- VE: 10–14 s per chunk (~1.3 s/step on A13 GPU; cold 10 s, throttles to 14 s after ~80 s sustained)
- Total per chunk: 10.5–14 s.
Trade-off we hit
- 8 steps → above timings: too slow for our use.
- 4 steps → VE halves to ~6.3 s, but audio quality drops audibly on Russian (RU): grainy sustained vowels, smudged plosives, artifact bursts at chunk boundaries. Not usable.
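To illustrate why halving the step count costs accuracy in general, here is a toy sketch of fixed-step Euler integration from t=0 to t=1. The velocity field `toy_velocity` is an assumption chosen because it has a closed-form solution; it is not the model's vector_estimator, so this shows only the generic discretization effect, not how audible the difference is for this model.

```python
import math


def euler_integrate(velocity, x0, steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with equal Euler steps."""
    x = x0
    dt = 1.0 / steps
    for i in range(steps):
        # One explicit Euler update using the velocity at the start of the step.
        x = x + dt * velocity(x, i * dt)
    return x


def toy_velocity(x, t):
    # Toy field (assumption): dx/dt = -3x, whose exact solution is x0 * exp(-3t).
    return -3.0 * x


exact = math.exp(-3.0)
errors = {
    n: abs(euler_integrate(toy_velocity, 1.0, n) - exact)
    for n in (4, 8, 16)
}
for n, err in errors.items():
    print(f"{n:2d} steps: abs error {err:.4f}")
```

Plain Euler is first-order, so the error here shrinks roughly in proportion to the step size. Whether 4 steps is enough for a given flow-matching model depends on how curved its learned trajectories are, which is the open question in this thread.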
Questions
- Is the ANE planner rejection on A13 expected? Could a re-export using op patterns that the 3rd-gen Neural Engine accepts (e.g. unfused rotary cos/sin) enable ANE for TE/VE?
- Any INT8 or step-distilled variant you're considering for older devices?
- Is the 4-step quality drop expected for flow-matching at this scale, or is it FP16- or voice-specific?