PaddleOCR-VL-1.5 0.9B request

#3
by snowfluke - opened

Hello, community.

I've been struggling to convert PaddleOCR-VL-1.5 0.9B to ONNX, and I've hit a brick wall.

Here's the summary:

Root Problem
We're trying to convert PaddleOCR-VL-1.5 (a custom VLM by PaddlePaddle) to ONNX for use in Bun.js via transformers.js.

Specific Problems

  1. Custom architecture - PaddleOCRVLConfig is not registered in transformers' AutoModel registry, so standard export tools don't recognize it

  2. Dependency hell - The conversion requires:

    • optimum 1.x for the main_export API (moved to optimum-onnx in 2.x)
    • transformers >= 4.51 for the model's own code (masking_utils, create_causal_mask, etc.)
    • But optimum 1.x was written for transformers ~4.43, so these two requirements are mutually exclusive

  3. Removed symbols - Every optimum 1.x file imports symbols that newer transformers deleted (is_tf_available, TF2_WEIGHTS_NAME, download_url, is_remote_url, get_parameter_dtype, SlidingWindowCache), so patching them one by one is endless

  4. Model patches needed โ€” The model itself needs patching (flash_attn hard import, rope_config_validation signature)
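
To make problem 3 concrete, here is a minimal sketch of the kind of shim it calls for. It is an illustration under assumptions, not a working fix: the helper name backfill_missing is hypothetical, the fallback values are placeholders rather than the originals' real behavior, and a stand-in module is used so the snippet runs without transformers or optimum installed.

```python
import types


def backfill_missing(module, fallbacks):
    """Set each fallback on `module` only if that name is absent.

    Returns the names that were actually added, which shows which
    symbols the installed version no longer provides.
    """
    added = []
    for name, value in fallbacks.items():
        if not hasattr(module, name):
            setattr(module, name, value)
            added.append(name)
    return added


# Stand-in for a real module, so the sketch runs on its own.
fake_utils = types.ModuleType("fake_utils")
fake_utils.is_remote_url = lambda url: url.startswith("http")  # still present

# Placeholder fallbacks (assumptions): they only need to exist at import time.
fallbacks = {
    "is_tf_available": lambda: False,
    "TF2_WEIGHTS_NAME": "tf_model.h5",
    "is_remote_url": lambda url: False,  # must NOT overwrite the existing one
}

added = backfill_missing(fake_utils, fallbacks)
print(added)
```

In a real attempt the target would be whichever transformers module optimum imports each name from, and the shim would have to run before optimum is imported. Symbols like get_parameter_dtype and SlidingWindowCache need real behavior rather than a placeholder, which is where this approach stops scaling.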

Can someone help me?
