(sigma_vla) root@C.28189995:/workspace$ cd /workspace && \
for SEED in 1 2 3 4 5 6 7; do
  echo "===== START TELEPATHY SEED ${SEED} ====="
  python eval_sigma_vla_rollout.py \
    --base_model_id "lerobot/pi05_base" \
    --tokenizer_id "google/paligemma-3b-pt-224" \
    --artifacts_repo_id "Veltraxor/Sigma" \
    --output_dir "/workspace/storage/sigma_eval_out_telepathy_seed${SEED}" \
    --batch_size 4 \
    --num_workers 2 \
    --dtype bf16 \
    --shuffle \
    --seed ${SEED}
  echo "===== END TELEPATHY SEED ${SEED} ====="
done
===== START TELEPATHY SEED 1 =====
/venv/sigma_vla/lib/python3.10/site-packages/huggingface_hub/file_download.py:982: UserWarning: `local_dir_use_symlinks` parameter is deprecated and will be ignored. The process to download files to a local folder has been updated and do not rely on symlinks anymore. You only need to pass a destination folder as`local_dir`.
For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
  warnings.warn(
Fetching 6 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 8645.08it/s]
[INFO] Using cached shard_dir: /workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace
[INFO] Using cached telepathy_heads_path: /workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_lora_out/sigma_telepathy_heads.pt
/venv/sigma_vla/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
WARNING:bitsandbytes.cextension:Could not find the bitsandbytes CUDA binary at PosixPath('/venv/sigma_vla/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda126.so')
WARNING:bitsandbytes.cextension:The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[policies_init] WARNING: optional groot deps missing: Failed to import diffusers.models.modeling_utils because of the following error (look up to see its traceback):
No module named 'triton.ops'
The PI05 model is a direct port of the OpenPI implementation.
This implementation follows the original OpenPI structure for compatibility.
Original implementation: https://github.com/Physical-Intelligence/openpi
WARNING:lerobot.configs.policies:Device 'mps' is not available. Switching to 'cuda'.
WARNING:lerobot.configs.policies:Device 'mps' is not available. Switching to 'cuda'.
/venv/sigma_vla/lib/python3.10/site-packages/transformers/models/paligemma/configuration_paligemma.py:137: FutureWarning: The `vocab_size` attribute is deprecated and will be removed in v4.44, Please use `text_config.vocab_size` instead.
  warnings.warn(
WARNING:root:[patch_pi05] Could not run transformers version guard (An incorrect transformer version is used, please create an issue on https://github.com/huggingface/lerobot/issues). Continuing without strict transformers check.
cannot import name 'check' from 'transformers.models.siglip' (/venv/sigma_vla/lib/python3.10/site-packages/transformers/models/siglip/__init__.py)
Loading model from: lerobot/pi05_base
✓ Loaded state dict from model.safetensors
WARNING:root:Vision embedding key might need handling: paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.bias
WARNING:root:Vision embedding key might need handling: paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.weight
Remapped: action_in_proj.bias -> model.action_in_proj.bias
Remapped: action_in_proj.weight -> model.action_in_proj.weight
Remapped: action_out_proj.bias -> model.action_out_proj.bias
Remapped: action_out_proj.weight -> model.action_out_proj.weight
Remapped: paligemma_with_expert.gemma_expert.lm_head.weight -> model.paligemma_with_expert.gemma_expert.lm_head.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias -> model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.down_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.down_proj.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.gate_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.gate_proj.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.up_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.up_proj.weight
Remapped 812 state dict keys
Warning: Could not remap state dict keys: Error(s) in loading state_dict for PI05Policy:
	Missing key(s) in state_dict: "model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.patch_embedding.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.patch_embedding.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.position_embedding.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.post_layernorm.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.post_layernorm.bias", "model.paligemma_with_expert.paligemma.multi_modal_projector.linear.weight", "model.paligemma_with_expert.paligemma.multi_modal_projector.linear.bias", "model.paligemma_with_expert.paligemma.language_model.model.embed_tokens.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.o_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.norm.weight", "model.paligemma_with_expert.paligemma.language_model.lm_head.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.weight", 
"model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.norm.weight". 
Unexpected key(s) in state_dict: "model.paligemma_with_expert.paligemma.lm_head.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.norm.weight", "model.paligemma_with_expert.paligemma.model.multi_modal_projector.linear.bias", 
"model.paligemma_with_expert.paligemma.model.multi_modal_projector.linear.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.position_embedding.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.post_layernorm.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.post_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.dense.bias", 
"model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.dense.weight", 
"model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.dense.bias", 
"model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.dense.weight", 
"model.paligemma_with_expert.gemma_expert.model.norm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.norm.dense.weight".
/venv/sigma_vla/lib/python3.10/site-packages/torch/nn/modules/transformer.py:382: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
  warnings.warn(
[CHECK-A] disable_telepathy=False
[CHECK-A] telepathy_heads_path=/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_lora_out/sigma_telepathy_heads.pt size=561.95MB
[CHECK-A] heads_tensors=325 mean=0.002335 std=0.106945 rms=0.106970
[CHECK-A] heads fully matched (no missing/unexpected).
[INFO] Found 3 shard files. Example: ['/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00000.pt', '/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00001.pt', '/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00002.pt']
[CHECK-B] telepathy_effect_mean_abs_diff(action_vector)=1.849936
batch=0 mse_vec=91.1453 mse_chk=172.6581 mse_trj=149.0812 tau_l2=51.5999 sem_align=0.0451
batch=20 mse_vec=68.8681 mse_chk=116.7822 mse_trj=94.6013 tau_l2=51.6103 sem_align=0.0451
batch=40 mse_vec=93.6274 mse_chk=164.4543 mse_trj=141.4758 tau_l2=51.6014 sem_align=0.0451
batch=60 mse_vec=74.1309 mse_chk=145.8682 mse_trj=120.7362 tau_l2=51.6046 sem_align=0.0451
batch=80 mse_vec=109.8682 mse_chk=203.3160 mse_trj=173.1136 tau_l2=51.5970 sem_align=0.0451
batch=100 mse_vec=62.8688 mse_chk=230.1524 mse_trj=199.0094 tau_l2=51.5977 sem_align=0.0451
batch=120 mse_vec=105.8950 mse_chk=241.3408 mse_trj=208.7307 tau_l2=51.5932 sem_align=0.0451
batch=140 mse_vec=102.8104 mse_chk=252.0382 mse_trj=206.8071 tau_l2=51.5942 sem_align=0.0451
batch=160 mse_vec=99.5331 mse_chk=161.0820 mse_trj=141.8014 tau_l2=51.6012 sem_align=0.0451
batch=180 mse_vec=71.2653 mse_chk=120.7163 mse_trj=109.6980 tau_l2=51.6042 sem_align=0.0451
[DONE] Saved report: {'num_samples': 723, 'num_batches': 181, 'avg_mse_vector': 79.0839328449734, 'avg_mse_chunk': 202.99226299306963, 'avg_mse_traj': 174.65617357960065, 'avg_tau_l2': 51.59832373771878, 'avg_semantic_text_alignment': 0.04513881746576636, 'hard_thresholds': {'vec': 0.1, 'chk': 0.2, 'trj': 0.2}, 'avg_hard_mse_vector': 79.09474697086657, 'avg_hard_mse_chunk': 203.10606166897637, 'avg_hard_mse_traj': 174.7460184479815, 'hard_sample_fraction': 1.0, 'total_hard_samples': 723}
===== END TELEPATHY SEED 1 =====
===== START TELEPATHY SEED 2 =====
/venv/sigma_vla/lib/python3.10/site-packages/huggingface_hub/file_download.py:982: UserWarning: `local_dir_use_symlinks` parameter is deprecated and will be ignored. The process to download files to a local folder has been updated and do not rely on symlinks anymore. You only need to pass a destination folder as`local_dir`. For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
  warnings.warn(
Fetching 6 files: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 47482.69it/s]
[INFO] Using cached shard_dir: /workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace
[INFO] Using cached telepathy_heads_path: /workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_lora_out/sigma_telepathy_heads.pt
/venv/sigma_vla/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
WARNING:bitsandbytes.cextension:Could not find the bitsandbytes CUDA binary at PosixPath('/venv/sigma_vla/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda126.so')
WARNING:bitsandbytes.cextension:The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[policies_init] WARNING: optional groot deps missing: Failed to import diffusers.models.modeling_utils because of the following error (look up to see its traceback):
No module named 'triton.ops'
The PI05 model is a direct port of the OpenPI implementation.
This implementation follows the original OpenPI structure for compatibility.
Original implementation: https://github.com/Physical-Intelligence/openpi
WARNING:lerobot.configs.policies:Device 'mps' is not available. Switching to 'cuda'.
WARNING:lerobot.configs.policies:Device 'mps' is not available. Switching to 'cuda'.
/venv/sigma_vla/lib/python3.10/site-packages/transformers/models/paligemma/configuration_paligemma.py:137: FutureWarning: The `vocab_size` attribute is deprecated and will be removed in v4.44, Please use `text_config.vocab_size` instead.
  warnings.warn(
WARNING:root:[patch_pi05] Could not run transformers version guard (An incorrect transformer version is used, please create an issue on https://github.com/huggingface/lerobot/issues). Continuing without strict transformers check.
cannot import name 'check' from 'transformers.models.siglip' (/venv/sigma_vla/lib/python3.10/site-packages/transformers/models/siglip/__init__.py)
Loading model from: lerobot/pi05_base
✓ Loaded state dict from model.safetensors
WARNING:root:Vision embedding key might need handling: paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.bias
WARNING:root:Vision embedding key might need handling: paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.weight
Remapped: action_in_proj.bias -> model.action_in_proj.bias
Remapped: action_in_proj.weight -> model.action_in_proj.weight
Remapped: action_out_proj.bias -> model.action_out_proj.bias
Remapped: action_out_proj.weight -> model.action_out_proj.weight
Remapped: paligemma_with_expert.gemma_expert.lm_head.weight -> model.paligemma_with_expert.gemma_expert.lm_head.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias -> model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.down_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.down_proj.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.gate_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.gate_proj.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.up_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.up_proj.weight
Remapped 812 state dict keys
Warning: Could not remap state dict keys: Error(s) in loading state_dict for PI05Policy:
Missing key(s) in state_dict: "model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.patch_embedding.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.patch_embedding.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.position_embedding.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.post_layernorm.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.post_layernorm.bias", "model.paligemma_with_expert.paligemma.multi_modal_projector.linear.weight", "model.paligemma_with_expert.paligemma.multi_modal_projector.linear.bias", "model.paligemma_with_expert.paligemma.language_model.model.embed_tokens.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.o_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.norm.weight", "model.paligemma_with_expert.paligemma.language_model.lm_head.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.weight", 
"model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.norm.weight". 
Unexpected key(s) in state_dict: "model.paligemma_with_expert.paligemma.lm_head.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.norm.weight", "model.paligemma_with_expert.paligemma.model.multi_modal_projector.linear.bias", 
"model.paligemma_with_expert.paligemma.model.multi_modal_projector.linear.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.position_embedding.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.post_layernorm.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.post_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.dense.bias", 
"model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.dense.weight", 
"model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.dense.bias", 
"model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.dense.weight", 
"model.paligemma_with_expert.gemma_expert.model.norm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.norm.dense.weight".
/venv/sigma_vla/lib/python3.10/site-packages/torch/nn/modules/transformer.py:382: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
  warnings.warn(
[CHECK-A] disable_telepathy=False
[CHECK-A] telepathy_heads_path=/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_lora_out/sigma_telepathy_heads.pt size=561.95MB
[CHECK-A] heads_tensors=325 mean=0.002335 std=0.106945 rms=0.106970
[CHECK-A] heads fully matched (no missing/unexpected).
[INFO] Found 3 shard files. Example: ['/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00000.pt', '/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00001.pt', '/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00002.pt']
[CHECK-B] telepathy_effect_mean_abs_diff(action_vector)=1.849930
batch=0 mse_vec=53.4704 mse_chk=200.9438 mse_trj=167.1021 tau_l2=51.5998 sem_align=0.0730
batch=20 mse_vec=100.6955 mse_chk=209.7466 mse_trj=174.2008 tau_l2=51.5959 sem_align=0.0729
batch=40 mse_vec=89.1137 mse_chk=206.1420 mse_trj=184.1404 tau_l2=51.5968 sem_align=0.0730
batch=60 mse_vec=64.5972 mse_chk=228.2550 mse_trj=187.9087 tau_l2=51.5955 sem_align=0.0729
batch=80 mse_vec=59.2233 mse_chk=101.2093 mse_trj=99.0826 tau_l2=51.6064 sem_align=0.0729
batch=100 mse_vec=97.7206 mse_chk=229.9763 mse_trj=210.1980 tau_l2=51.5925 sem_align=0.0730
batch=120 mse_vec=95.5264 mse_chk=306.5800 mse_trj=242.9746 tau_l2=51.5914 sem_align=0.0730
batch=140 mse_vec=81.5597 mse_chk=276.1557 mse_trj=236.3938 tau_l2=51.5909 sem_align=0.0730
batch=160 mse_vec=47.0134 mse_chk=170.0403 mse_trj=143.9573 tau_l2=51.6031 sem_align=0.0729
batch=180 mse_vec=78.9725 mse_chk=190.2414 mse_trj=170.5679 tau_l2=51.5976 sem_align=0.0729
[DONE] Saved report: {'num_samples': 723, 'num_batches': 181, 'avg_mse_vector': 79.09448435029931, 'avg_mse_chunk': 203.08874456921993, 'avg_mse_traj': 174.74055544184057, 'avg_tau_l2': 51.59744106735314, 'avg_semantic_text_alignment': 0.07294989883570381, 'hard_thresholds': {'vec': 0.1, 'chk': 0.2, 'trj': 0.2}, 'avg_hard_mse_vector': 79.09465275892413, 'avg_hard_mse_chunk': 203.1065152116831, 'avg_hard_mse_traj': 174.74632362376275, 'hard_sample_fraction': 1.0, 'total_hard_samples': 723}
===== END TELEPATHY SEED 2 =====
===== START TELEPATHY SEED 3 =====
/venv/sigma_vla/lib/python3.10/site-packages/huggingface_hub/file_download.py:982: UserWarning: `local_dir_use_symlinks` parameter is deprecated and will be ignored. The process to download files to a local folder has been updated and do not rely on symlinks anymore. You only need to pass a destination folder as`local_dir`. For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
  warnings.warn(
Fetching 6 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 5716.91it/s]
[INFO] Using cached shard_dir: /workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace
[INFO] Using cached telepathy_heads_path: /workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_lora_out/sigma_telepathy_heads.pt
/venv/sigma_vla/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
WARNING:bitsandbytes.cextension:Could not find the bitsandbytes CUDA binary at PosixPath('/venv/sigma_vla/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda126.so')
WARNING:bitsandbytes.cextension:The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[policies_init] WARNING: optional groot deps missing: Failed to import diffusers.models.modeling_utils because of the following error (look up to see its traceback): No module named 'triton.ops'
The PI05 model is a direct port of the OpenPI implementation.
This implementation follows the original OpenPI structure for compatibility.
Original implementation: https://github.com/Physical-Intelligence/openpi
WARNING:lerobot.configs.policies:Device 'mps' is not available. Switching to 'cuda'.
WARNING:lerobot.configs.policies:Device 'mps' is not available. Switching to 'cuda'.
/venv/sigma_vla/lib/python3.10/site-packages/transformers/models/paligemma/configuration_paligemma.py:137: FutureWarning: The `vocab_size` attribute is deprecated and will be removed in v4.44, Please use `text_config.vocab_size` instead.
  warnings.warn(
WARNING:root:[patch_pi05] Could not run transformers version guard (An incorrect transformer version is used, please create an issue on https://github.com/huggingface/lerobot/issues). Continuing without strict transformers check.
cannot import name 'check' from 'transformers.models.siglip' (/venv/sigma_vla/lib/python3.10/site-packages/transformers/models/siglip/__init__.py)
Loading model from: lerobot/pi05_base
✓ Loaded state dict from model.safetensors
WARNING:root:Vision embedding key might need handling: paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.bias
WARNING:root:Vision embedding key might need handling: paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.weight
Remapped: action_in_proj.bias -> model.action_in_proj.bias
Remapped: action_in_proj.weight -> model.action_in_proj.weight
Remapped: action_out_proj.bias -> model.action_out_proj.bias
Remapped: action_out_proj.weight -> model.action_out_proj.weight
Remapped: paligemma_with_expert.gemma_expert.lm_head.weight -> model.paligemma_with_expert.gemma_expert.lm_head.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias -> model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.down_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.down_proj.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.gate_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.gate_proj.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.up_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.up_proj.weight
Remapped 812 state dict keys
Warning: Could not remap state dict keys: Error(s) in loading state_dict for PI05Policy:
Missing key(s) in state_dict: "model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.patch_embedding.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.patch_embedding.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.position_embedding.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.post_layernorm.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.post_layernorm.bias", "model.paligemma_with_expert.paligemma.multi_modal_projector.linear.weight", "model.paligemma_with_expert.paligemma.multi_modal_projector.linear.bias", "model.paligemma_with_expert.paligemma.language_model.model.embed_tokens.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.o_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.norm.weight", "model.paligemma_with_expert.paligemma.language_model.lm_head.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.weight", 
"model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.norm.weight". 
Unexpected key(s) in state_dict: "model.paligemma_with_expert.paligemma.lm_head.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.norm.weight", "model.paligemma_with_expert.paligemma.model.multi_modal_projector.linear.bias", 
"model.paligemma_with_expert.paligemma.model.multi_modal_projector.linear.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.position_embedding.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.post_layernorm.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.post_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.dense.bias", 
"model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.dense.weight", 
"model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.dense.bias", 
"model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.dense.weight", 
"model.paligemma_with_expert.gemma_expert.model.norm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.norm.dense.weight".
/venv/sigma_vla/lib/python3.10/site-packages/torch/nn/modules/transformer.py:382: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
  warnings.warn(
[CHECK-A] disable_telepathy=False
[CHECK-A] telepathy_heads_path=/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_lora_out/sigma_telepathy_heads.pt size=561.95MB
[CHECK-A] heads_tensors=325 mean=0.002335 std=0.106945 rms=0.106970
[CHECK-A] heads fully matched (no missing/unexpected).
[INFO] Found 3 shard files. Example: ['/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00000.pt', '/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00001.pt', '/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00002.pt']
[CHECK-B] telepathy_effect_mean_abs_diff(action_vector)=1.850036
batch=0 mse_vec=50.3579 mse_chk=161.7480 mse_trj=125.3196 tau_l2=51.6060 sem_align=0.0212
batch=20 mse_vec=65.6432 mse_chk=154.8831 mse_trj=151.7508 tau_l2=51.6013 sem_align=0.0212
batch=40 mse_vec=62.7463 mse_chk=212.1584 mse_trj=188.7695 tau_l2=51.5975 sem_align=0.0212
batch=60 mse_vec=95.6893 mse_chk=245.5913 mse_trj=212.9743 tau_l2=51.5935 sem_align=0.0212
batch=80 mse_vec=65.8146 mse_chk=207.9756 mse_trj=182.5015 tau_l2=51.5969 sem_align=0.0212
batch=100 mse_vec=76.7893 mse_chk=219.2563 mse_trj=190.6781 tau_l2=51.5975 sem_align=0.0212
batch=120 mse_vec=81.8499 mse_chk=292.7881 mse_trj=249.0664 tau_l2=51.5902 sem_align=0.0212
batch=140 mse_vec=122.3695 mse_chk=156.1362 mse_trj=138.5501 tau_l2=51.5987 sem_align=0.0212
batch=160 mse_vec=109.8613 mse_chk=230.8596 mse_trj=193.3161 tau_l2=51.5942 sem_align=0.0212
batch=180 mse_vec=84.0838 mse_chk=214.7054 mse_trj=178.4077 tau_l2=51.5970 sem_align=0.0212
[DONE] Saved report: {'num_samples': 723, 'num_batches': 181, 'avg_mse_vector': 79.10140119731756, 'avg_mse_chunk': 203.1217087382111, 'avg_mse_traj': 174.75099873411062, 'avg_tau_l2': 51.59852753950087, 'avg_semantic_text_alignment': 0.021217601188556267, 'hard_thresholds': {'vec': 0.1, 'chk': 0.2, 'trj': 0.2}, 'avg_hard_mse_vector': 79.09450918279413, 'avg_hard_mse_chunk': 203.10568836681742, 'avg_hard_mse_traj': 174.74593812301447, 'hard_sample_fraction': 1.0, 'total_hard_samples': 723}
===== END TELEPATHY SEED 3 =====
===== START TELEPATHY SEED 4 =====
/venv/sigma_vla/lib/python3.10/site-packages/huggingface_hub/file_download.py:982: UserWarning: `local_dir_use_symlinks` parameter is deprecated and will be ignored. The process to download files to a local folder has been updated and do not rely on symlinks anymore. You only need to pass a destination folder as`local_dir`. For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
  warnings.warn(
Fetching 6 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 1428.50it/s]
[INFO] Using cached shard_dir: /workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace
[INFO] Using cached telepathy_heads_path: /workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_lora_out/sigma_telepathy_heads.pt
/venv/sigma_vla/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
WARNING:bitsandbytes.cextension:Could not find the bitsandbytes CUDA binary at PosixPath('/venv/sigma_vla/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda126.so')
WARNING:bitsandbytes.cextension:The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[policies_init] WARNING: optional groot deps missing: Failed to import diffusers.models.modeling_utils because of the following error (look up to see its traceback):
No module named 'triton.ops'
The PI05 model is a direct port of the OpenPI implementation.
This implementation follows the original OpenPI structure for compatibility.
Original implementation: https://github.com/Physical-Intelligence/openpi
WARNING:lerobot.configs.policies:Device 'mps' is not available. Switching to 'cuda'.
WARNING:lerobot.configs.policies:Device 'mps' is not available. Switching to 'cuda'.
/venv/sigma_vla/lib/python3.10/site-packages/transformers/models/paligemma/configuration_paligemma.py:137: FutureWarning: The `vocab_size` attribute is deprecated and will be removed in v4.44, Please use `text_config.vocab_size` instead.
  warnings.warn(
WARNING:root:[patch_pi05] Could not run transformers version guard (An incorrect transformer version is used, please create an issue on https://github.com/huggingface/lerobot/issues). Continuing without strict transformers check.
cannot import name 'check' from 'transformers.models.siglip' (/venv/sigma_vla/lib/python3.10/site-packages/transformers/models/siglip/__init__.py)
Loading model from: lerobot/pi05_base
✓ Loaded state dict from model.safetensors
WARNING:root:Vision embedding key might need handling: paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.bias
WARNING:root:Vision embedding key might need handling: paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.weight
Remapped: action_in_proj.bias -> model.action_in_proj.bias
Remapped: action_in_proj.weight -> model.action_in_proj.weight
Remapped: action_out_proj.bias -> model.action_out_proj.bias
Remapped: action_out_proj.weight -> model.action_out_proj.weight
Remapped: paligemma_with_expert.gemma_expert.lm_head.weight -> model.paligemma_with_expert.gemma_expert.lm_head.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias -> model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.down_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.down_proj.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.gate_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.gate_proj.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.up_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.up_proj.weight
Remapped 812 state dict keys
Warning: Could not remap state dict keys: Error(s) in loading state_dict for PI05Policy:
	Missing key(s) in state_dict: "model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.patch_embedding.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.patch_embedding.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.position_embedding.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.post_layernorm.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.post_layernorm.bias", "model.paligemma_with_expert.paligemma.multi_modal_projector.linear.weight", "model.paligemma_with_expert.paligemma.multi_modal_projector.linear.bias", "model.paligemma_with_expert.paligemma.language_model.model.embed_tokens.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.o_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.norm.weight", "model.paligemma_with_expert.paligemma.language_model.lm_head.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.weight", 
"model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.norm.weight". 
Unexpected key(s) in state_dict: "model.paligemma_with_expert.paligemma.lm_head.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.norm.weight", "model.paligemma_with_expert.paligemma.model.multi_modal_projector.linear.bias", 
"model.paligemma_with_expert.paligemma.model.multi_modal_projector.linear.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.position_embedding.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.post_layernorm.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.post_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.dense.bias", 
"model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.dense.weight", 
"model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.dense.bias", 
"model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.dense.weight", 
"model.paligemma_with_expert.gemma_expert.model.norm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.norm.dense.weight".
/venv/sigma_vla/lib/python3.10/site-packages/torch/nn/modules/transformer.py:382: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
  warnings.warn(
[CHECK-A] disable_telepathy=False
[CHECK-A] telepathy_heads_path=/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_lora_out/sigma_telepathy_heads.pt size=561.95MB
[CHECK-A] heads_tensors=325 mean=0.002335 std=0.106945 rms=0.106970
[CHECK-A] heads fully matched (no missing/unexpected).
[INFO] Found 3 shard files. Example: ['/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00000.pt', '/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00001.pt', '/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00002.pt']
[CHECK-B] telepathy_effect_mean_abs_diff(action_vector)=1.849815
batch=0 mse_vec=59.5022 mse_chk=262.8067 mse_trj=232.0481 tau_l2=51.5932 sem_align=-0.0202
batch=20 mse_vec=69.7500 mse_chk=214.8774 mse_trj=176.4210 tau_l2=51.5972 sem_align=-0.0202
batch=40 mse_vec=78.8262 mse_chk=176.1915 mse_trj=161.7353 tau_l2=51.5991 sem_align=-0.0202
batch=60 mse_vec=83.2721 mse_chk=219.7870 mse_trj=186.8151 tau_l2=51.5959 sem_align=-0.0202
batch=80 mse_vec=84.8577 mse_chk=217.7641 mse_trj=180.5562 tau_l2=51.5949 sem_align=-0.0202
batch=100 mse_vec=90.0467 mse_chk=226.7838 mse_trj=191.9354 tau_l2=51.5934 sem_align=-0.0202
batch=120 mse_vec=52.9022 mse_chk=212.5053 mse_trj=189.4557 tau_l2=51.5972 sem_align=-0.0202
batch=140 mse_vec=59.6478 mse_chk=225.3488 mse_trj=189.5491 tau_l2=51.5980 sem_align=-0.0202
batch=160 mse_vec=105.7627 mse_chk=187.3916 mse_trj=154.2318 tau_l2=51.5977 sem_align=-0.0202
batch=180 mse_vec=47.1299 mse_chk=143.0543 mse_trj=118.0026 tau_l2=51.6054 sem_align=-0.0202
[DONE] Saved report: {'num_samples': 723, 'num_batches': 181, 'avg_mse_vector': 79.05039384747079, 'avg_mse_chunk': 203.02360113154458, 'avg_mse_traj': 174.66807813275585, 'avg_tau_l2': 51.59752522252541, 'avg_semantic_text_alignment': -0.020162050010255686, 'hard_thresholds': {'vec': 0.1, 'chk': 0.2, 'trj': 0.2}, 'avg_hard_mse_vector': 79.0945430982492, 'avg_hard_mse_chunk': 203.10654733322798, 'avg_hard_mse_traj': 174.74645160342646, 'hard_sample_fraction': 1.0, 'total_hard_samples': 723}
===== END TELEPATHY SEED 4 =====
===== START TELEPATHY SEED 5 =====
/venv/sigma_vla/lib/python3.10/site-packages/huggingface_hub/file_download.py:982: UserWarning: `local_dir_use_symlinks` parameter is deprecated and will be ignored. The process to download files to a local folder has been updated and do not rely on symlinks anymore. You only need to pass a destination folder as`local_dir`. For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
  warnings.warn(
Fetching 6 files: 100%|████████████████████| 6/6 [00:00<00:00, 24696.59it/s]
[INFO] Using cached shard_dir: /workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace
[INFO] Using cached telepathy_heads_path: /workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_lora_out/sigma_telepathy_heads.pt
/venv/sigma_vla/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
WARNING:bitsandbytes.cextension:Could not find the bitsandbytes CUDA binary at PosixPath('/venv/sigma_vla/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda126.so')
WARNING:bitsandbytes.cextension:The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
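Each seed in the loop writes its own report to `/workspace/storage/sigma_eval_out_telepathy_seed${SEED}`, and the per-batch `mse_vec` values above swing from roughly 47 to 106 within a single seed. For that reason, comparing seeds is more informative as a mean and spread than as one number. The sketch below is a minimal, hypothetical helper for that comparison. It assumes each report is saved as a JSON file whose keys match the `[DONE] Saved report` line (`avg_mse_vector`, `avg_mse_chunk`, and so on). The filename `report.json` is a placeholder, not something the log confirms.

```python
import glob
import json
import statistics
from typing import Dict, Iterable, List

# Scalar metrics as they appear in the "[DONE] Saved report" line of this log.
METRIC_KEYS = (
    "avg_mse_vector",
    "avg_mse_chunk",
    "avg_mse_traj",
    "avg_tau_l2",
    "avg_semantic_text_alignment",
)


def load_reports(pattern: str) -> List[dict]:
    """Load one report per matching file.

    ASSUMPTION: reports are stored as JSON; the glob pattern is hypothetical and
    should be adjusted to the filename the eval script actually writes.
    """
    reports = []
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", encoding="utf-8") as f:
            reports.append(json.load(f))
    return reports


def summarize_seed_reports(reports: Iterable[dict]) -> Dict[str, dict]:
    """Return mean, sample standard deviation, and count of each metric across seeds."""
    reports = list(reports)
    summary: Dict[str, dict] = {}
    for key in METRIC_KEYS:
        values = [float(r[key]) for r in reports if key in r]
        if not values:
            continue  # metric absent from every report
        std = statistics.stdev(values) if len(values) > 1 else 0.0
        summary[key] = {"mean": statistics.fmean(values), "std": std, "n": len(values)}
    return summary


if __name__ == "__main__":
    # Hypothetical path: one report per per-seed output_dir.
    reps = load_reports("/workspace/storage/sigma_eval_out_telepathy_seed*/report.json")
    for metric, stats in summarize_seed_reports(reps).items():
        print(f"{metric}: mean={stats['mean']:.4f} std={stats['std']:.4f} n={stats['n']}")
```

The sample standard deviation (`statistics.stdev`) is used because seven seeds are a small sample of possible shuffles, not the whole population.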
[policies_init] WARNING: optional groot deps missing: Failed to import diffusers.models.modeling_utils because of the following error (look up to see its traceback):
No module named 'triton.ops'
The PI05 model is a direct port of the OpenPI implementation.
This implementation follows the original OpenPI structure for compatibility.
Original implementation: https://github.com/Physical-Intelligence/openpi
WARNING:lerobot.configs.policies:Device 'mps' is not available. Switching to 'cuda'.
WARNING:lerobot.configs.policies:Device 'mps' is not available. Switching to 'cuda'.
/venv/sigma_vla/lib/python3.10/site-packages/transformers/models/paligemma/configuration_paligemma.py:137: FutureWarning: The `vocab_size` attribute is deprecated and will be removed in v4.44, Please use `text_config.vocab_size` instead.
  warnings.warn(
WARNING:root:[patch_pi05] Could not run transformers version guard (An incorrect transformer version is used, please create an issue on https://github.com/huggingface/lerobot/issues). Continuing without strict transformers check.
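The key lists in this log point to a prefix mismatch in the vision tower. The checkpoint's vision keys use `model.paligemma_with_expert.paligemma.model.vision_tower.` (they show up in the unexpected-key list and in the "Vision embedding key might need handling" warnings). `PI05Policy`, however, reports the same parameters missing under `model.paligemma_with_expert.paligemma.vision_tower.`.

Below is a minimal sketch of a rename pass. It assumes the tensors are otherwise shape-compatible and that the pass runs after the script's own `model.` prefixing. The constant and function names are hypothetical and not part of lerobot. The sketch does not address the `gemma_expert` `*_layernorm.dense.*` keys, which are also listed as unexpected. Renaming cannot fix those, so they pass through unchanged.

```python
from typing import Dict, Mapping, TypeVar

V = TypeVar("V")

# Prefix carried by the checkpoint's vision keys vs. the prefix the PI05Policy
# module expects, as read off the unexpected/missing key lists in this log.
CKPT_VISION_PREFIX = "model.paligemma_with_expert.paligemma.model.vision_tower."
MODEL_VISION_PREFIX = "model.paligemma_with_expert.paligemma.vision_tower."


def remap_vision_tower_keys(state_dict: Mapping[str, V]) -> Dict[str, V]:
    """Return a copy of state_dict with the vision-tower prefix rewritten.

    Only keys starting with CKPT_VISION_PREFIX are renamed; all other keys
    (including gemma_expert *_layernorm.dense.*) are copied unchanged.
    Raises KeyError if a rename would collide with an existing key.
    """
    remapped: Dict[str, V] = {}
    for key, value in state_dict.items():
        if key.startswith(CKPT_VISION_PREFIX):
            key = MODEL_VISION_PREFIX + key[len(CKPT_VISION_PREFIX):]
        if key in remapped:
            raise KeyError(f"remapping produced a duplicate key: {key}")
        remapped[key] = value
    return remapped
```

A hypothetical call would be `result = policy.load_state_dict(remap_vision_tower_keys(sd), strict=False)`, where `policy` and `sd` stand in for the loaded module and its state dict. Inspecting `result.missing_keys` and `result.unexpected_keys` then shows whether the vision-tower entries cleared and which keys still need a separate fix.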
cannot import name 'check' from 'transformers.models.siglip' (/venv/sigma_vla/lib/python3.10/site-packages/transformers/models/siglip/__init__.py)
Loading model from: lerobot/pi05_base
✓ Loaded state dict from model.safetensors
WARNING:root:Vision embedding key might need handling: paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.bias
WARNING:root:Vision embedding key might need handling: paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.weight
Remapped: action_in_proj.bias -> model.action_in_proj.bias
Remapped: action_in_proj.weight -> model.action_in_proj.weight
Remapped: action_out_proj.bias -> model.action_out_proj.bias
Remapped: action_out_proj.weight -> model.action_out_proj.weight
Remapped: paligemma_with_expert.gemma_expert.lm_head.weight -> model.paligemma_with_expert.gemma_expert.lm_head.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias -> model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.down_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.down_proj.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.gate_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.gate_proj.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.up_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.up_proj.weight
Remapped 812 state dict keys
Warning: Could not remap state dict keys: Error(s) in loading state_dict for PI05Policy:
Missing key(s) in state_dict: "model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.patch_embedding.weight",
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.patch_embedding.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.position_embedding.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.post_layernorm.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.post_layernorm.bias", "model.paligemma_with_expert.paligemma.multi_modal_projector.linear.weight", "model.paligemma_with_expert.paligemma.multi_modal_projector.linear.bias", "model.paligemma_with_expert.paligemma.language_model.model.embed_tokens.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.o_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.norm.weight", "model.paligemma_with_expert.paligemma.language_model.lm_head.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.weight", 
"model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.norm.weight". 
Unexpected key(s) in state_dict: "model.paligemma_with_expert.paligemma.lm_head.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.norm.weight", "model.paligemma_with_expert.paligemma.model.multi_modal_projector.linear.bias", 
"model.paligemma_with_expert.paligemma.model.multi_modal_projector.linear.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.position_embedding.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.post_layernorm.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.post_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.dense.bias", 
"model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.dense.weight", 
"model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.dense.bias", 
"model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.dense.weight", 
"model.paligemma_with_expert.gemma_expert.model.norm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.norm.dense.weight".
/venv/sigma_vla/lib/python3.10/site-packages/torch/nn/modules/transformer.py:382: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
  warnings.warn(
[CHECK-A] disable_telepathy=False
[CHECK-A] telepathy_heads_path=/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_lora_out/sigma_telepathy_heads.pt size=561.95MB
[CHECK-A] heads_tensors=325 mean=0.002335 std=0.106945 rms=0.106970
[CHECK-A] heads fully matched (no missing/unexpected).
[INFO] Found 3 shard files. Example: ['/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00000.pt', '/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00001.pt', '/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00002.pt']
[CHECK-B] telepathy_effect_mean_abs_diff(action_vector)=1.849881
batch=0 mse_vec=84.9373 mse_chk=211.5600 mse_trj=183.6418 tau_l2=51.5972 sem_align=-0.0170
batch=20 mse_vec=73.5967 mse_chk=197.4458 mse_trj=168.2540 tau_l2=51.5991 sem_align=-0.0170
batch=40 mse_vec=81.2562 mse_chk=228.1887 mse_trj=190.8593 tau_l2=51.5960 sem_align=-0.0170
batch=60 mse_vec=67.7136 mse_chk=251.3496 mse_trj=207.3730 tau_l2=51.5935 sem_align=-0.0170
batch=80 mse_vec=101.6583 mse_chk=182.1198 mse_trj=169.3136 tau_l2=51.5947 sem_align=-0.0170
batch=100 mse_vec=65.7184 mse_chk=110.3085 mse_trj=89.5234 tau_l2=51.6092 sem_align=-0.0170
batch=120 mse_vec=73.0949 mse_chk=157.1021 mse_trj=137.9820 tau_l2=51.6010 sem_align=-0.0170
batch=140 mse_vec=74.0891 mse_chk=192.5568 mse_trj=158.6031 tau_l2=51.6023 sem_align=-0.0170
batch=160 mse_vec=52.7596 mse_chk=181.6425 mse_trj=150.3264 tau_l2=51.6027 sem_align=-0.0170
batch=180 mse_vec=63.1710 mse_chk=214.4721 mse_trj=184.6560 tau_l2=51.5997 sem_align=-0.0170
[DONE] Saved report: {'num_samples': 723, 'num_batches': 181, 'avg_mse_vector': 79.07247373675773, 'avg_mse_chunk': 203.12175118462156, 'avg_mse_traj': 174.75985953821004, 'avg_tau_l2': 51.59806983905602, 'avg_semantic_text_alignment': -0.017001329720514255, 'hard_thresholds': {'vec': 0.1, 'chk': 0.2, 'trj': 0.2}, 'avg_hard_mse_vector': 79.09446769070658, 'avg_hard_mse_chunk': 203.10605191854685, 'avg_hard_mse_traj': 174.7461692208571, 'hard_sample_fraction': 1.0, 'total_hard_samples': 723}
===== END TELEPATHY SEED 5 =====
===== START TELEPATHY SEED 6 =====
/venv/sigma_vla/lib/python3.10/site-packages/huggingface_hub/file_download.py:982: UserWarning: `local_dir_use_symlinks` parameter is deprecated and will be ignored. The process to download files to a local folder has been updated and do not rely on symlinks anymore. You only need to pass a destination folder as`local_dir`. For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
  warnings.warn(
Fetching 6 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 9649.47it/s]
[INFO] Using cached shard_dir: /workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace
[INFO] Using cached telepathy_heads_path: /workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_lora_out/sigma_telepathy_heads.pt
/venv/sigma_vla/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
WARNING:bitsandbytes.cextension:Could not find the bitsandbytes CUDA binary at PosixPath('/venv/sigma_vla/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda126.so')
WARNING:bitsandbytes.cextension:The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[policies_init] WARNING: optional groot deps missing: Failed to import diffusers.models.modeling_utils because of the following error (look up to see its traceback):
No module named 'triton.ops'
The PI05 model is a direct port of the OpenPI implementation.
This implementation follows the original OpenPI structure for compatibility.
Original implementation: https://github.com/Physical-Intelligence/openpi
WARNING:lerobot.configs.policies:Device 'mps' is not available. Switching to 'cuda'.
WARNING:lerobot.configs.policies:Device 'mps' is not available. Switching to 'cuda'.
/venv/sigma_vla/lib/python3.10/site-packages/transformers/models/paligemma/configuration_paligemma.py:137: FutureWarning: The `vocab_size` attribute is deprecated and will be removed in v4.44, Please use `text_config.vocab_size` instead.
  warnings.warn(
WARNING:root:[patch_pi05] Could not run transformers version guard (An incorrect transformer version is used, please create an issue on https://github.com/huggingface/lerobot/issues). Continuing without strict transformers check.
cannot import name 'check' from 'transformers.models.siglip' (/venv/sigma_vla/lib/python3.10/site-packages/transformers/models/siglip/__init__.py)
Loading model from: lerobot/pi05_base
✓ Loaded state dict from model.safetensors
WARNING:root:Vision embedding key might need handling: paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.bias
WARNING:root:Vision embedding key might need handling: paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.weight
Remapped: action_in_proj.bias -> model.action_in_proj.bias
Remapped: action_in_proj.weight -> model.action_in_proj.weight
Remapped: action_out_proj.bias -> model.action_out_proj.bias
Remapped: action_out_proj.weight -> model.action_out_proj.weight
Remapped: paligemma_with_expert.gemma_expert.lm_head.weight -> model.paligemma_with_expert.gemma_expert.lm_head.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias -> model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.down_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.down_proj.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.gate_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.gate_proj.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.up_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.up_proj.weight
Remapped 812 state dict keys
Warning: Could not remap state dict keys: Error(s) in loading state_dict for PI05Policy:
Missing key(s) in state_dict: "model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.patch_embedding.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.patch_embedding.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.position_embedding.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.post_layernorm.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.post_layernorm.bias", "model.paligemma_with_expert.paligemma.multi_modal_projector.linear.weight", "model.paligemma_with_expert.paligemma.multi_modal_projector.linear.bias", "model.paligemma_with_expert.paligemma.language_model.model.embed_tokens.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.o_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.norm.weight", "model.paligemma_with_expert.paligemma.language_model.lm_head.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.weight", 
"model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.norm.weight". 
Unexpected key(s) in state_dict: "model.paligemma_with_expert.paligemma.lm_head.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.norm.weight", "model.paligemma_with_expert.paligemma.model.multi_modal_projector.linear.bias", 
"model.paligemma_with_expert.paligemma.model.multi_modal_projector.linear.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.position_embedding.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.post_layernorm.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.post_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.dense.bias", 
"model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.dense.weight", 
"model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.dense.bias", 
"model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.dense.weight", 
"model.paligemma_with_expert.gemma_expert.model.norm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.norm.dense.weight".
/venv/sigma_vla/lib/python3.10/site-packages/torch/nn/modules/transformer.py:382: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
warnings.warn(
[CHECK-A] disable_telepathy=False
[CHECK-A] telepathy_heads_path=/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_lora_out/sigma_telepathy_heads.pt size=561.95MB
[CHECK-A] heads_tensors=325 mean=0.002335 std=0.106945 rms=0.106970
[CHECK-A] heads fully matched (no missing/unexpected).
[INFO] Found 3 shard files. Example: ['/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00000.pt', '/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00001.pt', '/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00002.pt']
[CHECK-B] telepathy_effect_mean_abs_diff(action_vector)=1.849967
batch=0 mse_vec=97.3612 mse_chk=148.2933 mse_trj=137.6667 tau_l2=51.6018 sem_align=-0.0546
batch=20 mse_vec=58.6521 mse_chk=222.3761 mse_trj=184.3230 tau_l2=51.5995 sem_align=-0.0546
batch=40 mse_vec=79.1016 mse_chk=183.8757 mse_trj=161.7383 tau_l2=51.6002 sem_align=-0.0546
batch=60 mse_vec=73.7469 mse_chk=172.4991 mse_trj=145.1159 tau_l2=51.6007 sem_align=-0.0546
batch=80 mse_vec=52.5184 mse_chk=129.6025 mse_trj=107.8588 tau_l2=51.6095 sem_align=-0.0546
batch=100 mse_vec=95.1627 mse_chk=188.9693 mse_trj=161.0506 tau_l2=51.5985 sem_align=-0.0546
batch=120 mse_vec=99.8815 mse_chk=190.9049 mse_trj=156.4361 tau_l2=51.6004 sem_align=-0.0546
batch=140 mse_vec=86.2409 mse_chk=165.1766 mse_trj=143.0694 tau_l2=51.6028 sem_align=-0.0546
batch=160 mse_vec=37.2533 mse_chk=117.7467 mse_trj=102.0079 tau_l2=51.6097 sem_align=-0.0546
batch=180 mse_vec=97.1493 mse_chk=196.6505 mse_trj=166.4396 tau_l2=51.5978 sem_align=-0.0546
[DONE] Saved report: {'num_samples': 723, 'num_batches': 181,
'avg_mse_vector': 79.11951998715901, 'avg_mse_chunk': 203.09671829813752, 'avg_mse_traj': 174.7343520275137, 'avg_tau_l2': 51.59876276511514, 'avg_semantic_text_alignment': -0.05463192686026926, 'hard_thresholds': {'vec': 0.1, 'chk': 0.2, 'trj': 0.2}, 'avg_hard_mse_vector': 79.09458201554801, 'avg_hard_mse_chunk': 203.1056346339309, 'avg_hard_mse_traj': 174.7458247478902, 'hard_sample_fraction': 1.0, 'total_hard_samples': 723}
===== END TELEPATHY SEED 6 =====
===== START TELEPATHY SEED 7 =====
/venv/sigma_vla/lib/python3.10/site-packages/huggingface_hub/file_download.py:982: UserWarning: `local_dir_use_symlinks` parameter is deprecated and will be ignored. The process to download files to a local folder has been updated and do not rely on symlinks anymore. You only need to pass a destination folder as`local_dir`. For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder.
warnings.warn(
Fetching 6 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 1775.49it/s]
[INFO] Using cached shard_dir: /workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace
[INFO] Using cached telepathy_heads_path: /workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_lora_out/sigma_telepathy_heads.pt
/venv/sigma_vla/lib/python3.10/site-packages/transformers/utils/hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
warnings.warn(
WARNING:bitsandbytes.cextension:Could not find the bitsandbytes CUDA binary at PosixPath('/venv/sigma_vla/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda126.so')
WARNING:bitsandbytes.cextension:The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
[policies_init] WARNING: optional groot deps missing: Failed to import diffusers.models.modeling_utils because of the following error (look up to see its traceback):
No module named 'triton.ops'
The PI05 model is a direct port of the OpenPI implementation.
This implementation follows the original OpenPI structure for compatibility.
Original implementation: https://github.com/Physical-Intelligence/openpi
WARNING:lerobot.configs.policies:Device 'mps' is not available. Switching to 'cuda'.
WARNING:lerobot.configs.policies:Device 'mps' is not available. Switching to 'cuda'.
/venv/sigma_vla/lib/python3.10/site-packages/transformers/models/paligemma/configuration_paligemma.py:137: FutureWarning: The `vocab_size` attribute is deprecated and will be removed in v4.44, Please use `text_config.vocab_size` instead.
warnings.warn(
WARNING:root:[patch_pi05] Could not run transformers version guard (An incorrect transformer version is used, please create an issue on https://github.com/huggingface/lerobot/issues). Continuing without strict transformers check.
cannot import name 'check' from 'transformers.models.siglip' (/venv/sigma_vla/lib/python3.10/site-packages/transformers/models/siglip/__init__.py)
Loading model from: lerobot/pi05_base
✓ Loaded state dict from model.safetensors
WARNING:root:Vision embedding key might need handling: paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.bias
WARNING:root:Vision embedding key might need handling: paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.weight
Remapped: action_in_proj.bias -> model.action_in_proj.bias
Remapped: action_in_proj.weight -> model.action_in_proj.weight
Remapped: action_out_proj.bias -> model.action_out_proj.bias
Remapped: action_out_proj.weight -> model.action_out_proj.weight
Remapped: paligemma_with_expert.gemma_expert.lm_head.weight -> model.paligemma_with_expert.gemma_expert.lm_head.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias -> model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.down_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.down_proj.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.gate_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.gate_proj.weight
Remapped: paligemma_with_expert.gemma_expert.model.layers.0.mlp.up_proj.weight -> model.paligemma_with_expert.gemma_expert.model.layers.0.mlp.up_proj.weight
Remapped 812 state dict keys
Warning: Could not remap state dict keys: Error(s) in loading state_dict for PI05Policy: Missing key(s) in state_dict: "model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.patch_embedding.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.patch_embedding.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.embeddings.position_embedding.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.24.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.25.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.post_layernorm.weight", "model.paligemma_with_expert.paligemma.vision_tower.vision_model.post_layernorm.bias", "model.paligemma_with_expert.paligemma.multi_modal_projector.linear.weight", "model.paligemma_with_expert.paligemma.multi_modal_projector.linear.bias", "model.paligemma_with_expert.paligemma.language_model.model.embed_tokens.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.1.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.3.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.5.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.6.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.8.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.10.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.self_attn.o_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.language_model.model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.language_model.model.norm.weight", "model.paligemma_with_expert.paligemma.language_model.lm_head.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.weight", 
"model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.norm.weight". 
Unexpected key(s) in state_dict: "model.paligemma_with_expert.paligemma.lm_head.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.1.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.mlp.up_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.12.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.down_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.4.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.post_attention_layernorm.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.language_model.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.input_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.down_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.gate_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.mlp.up_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.post_attention_layernorm.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.o_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.language_model.norm.weight", "model.paligemma_with_expert.paligemma.model.multi_modal_projector.linear.bias", 
"model.paligemma_with_expert.paligemma.model.multi_modal_projector.linear.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.patch_embedding.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.embeddings.position_embedding.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.1.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.10.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.11.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.12.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.13.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.14.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.15.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.bias", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.16.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.17.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.18.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.19.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.2.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.20.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.21.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.22.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.23.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.k_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.24.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.bias", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.25.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.out_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.26.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.q_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.3.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.4.self_attn.v_proj.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.5.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.6.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.layer_norm2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.7.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc1.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.mlp.fc2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.8.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.layer_norm2.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc1.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc1.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc2.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.mlp.fc2.weight", 
"model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.k_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.out_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.q_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.encoder.layers.9.self_attn.v_proj.weight", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.post_layernorm.bias", "model.paligemma_with_expert.paligemma.model.vision_tower.vision_model.post_layernorm.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.0.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.0.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.1.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.1.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.dense.bias", 
"model.paligemma_with_expert.gemma_expert.model.layers.2.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.2.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.3.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.3.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.4.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.4.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.5.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.5.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.6.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.6.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.7.input_layernorm.dense.weight", 
"model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.7.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.8.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.8.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.9.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.9.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.10.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.10.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.11.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.11.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.12.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.dense.bias", 
"model.paligemma_with_expert.gemma_expert.model.layers.12.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.13.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.13.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.14.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.14.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.15.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.15.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.16.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.16.post_attention_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.17.input_layernorm.dense.weight", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.layers.17.post_attention_layernorm.dense.weight", 
"model.paligemma_with_expert.gemma_expert.model.norm.dense.bias", "model.paligemma_with_expert.gemma_expert.model.norm.dense.weight".
/venv/sigma_vla/lib/python3.10/site-packages/torch/nn/modules/transformer.py:382: UserWarning: enable_nested_tensor is True, but self.use_nested_tensor is False because encoder_layer.norm_first was True
  warnings.warn(
[CHECK-A] disable_telepathy=False
[CHECK-A] telepathy_heads_path=/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_lora_out/sigma_telepathy_heads.pt size=561.95MB
[CHECK-A] heads_tensors=325 mean=0.002335 std=0.106945 rms=0.106970
[CHECK-A] heads fully matched (no missing/unexpected).
[INFO] Found 3 shard files. Example: ['/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00000.pt', '/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00001.pt', '/workspace/.hf_sigma_cache/Veltraxor__Sigma/storage/sigma_pickplace/shard_00002.pt']
[CHECK-B] telepathy_effect_mean_abs_diff(action_vector)=1.849864
batch=0 mse_vec=71.4803 mse_chk=192.0109 mse_trj=178.6827 tau_l2=51.5960 sem_align=-0.0300
batch=20 mse_vec=75.9183 mse_chk=210.7663 mse_trj=180.5662 tau_l2=51.5957 sem_align=-0.0300
batch=40 mse_vec=109.6518 mse_chk=253.0076 mse_trj=212.1419 tau_l2=51.5915 sem_align=-0.0300
batch=60 mse_vec=40.1021 mse_chk=110.8871 mse_trj=93.0982 tau_l2=51.6098 sem_align=-0.0300
batch=80 mse_vec=104.5201 mse_chk=161.9205 mse_trj=151.1866 tau_l2=51.5965 sem_align=-0.0300
batch=100 mse_vec=53.7836 mse_chk=159.0046 mse_trj=139.1124 tau_l2=51.6006 sem_align=-0.0300
batch=120 mse_vec=73.0655 mse_chk=228.6042 mse_trj=196.1060 tau_l2=51.5951 sem_align=-0.0300
batch=140 mse_vec=61.7474 mse_chk=221.5039 mse_trj=199.8378 tau_l2=51.5955 sem_align=-0.0300
batch=160 mse_vec=93.2828 mse_chk=223.5394 mse_trj=204.1155 tau_l2=51.5933 sem_align=-0.0300
batch=180 mse_vec=89.0465 mse_chk=172.3793 mse_trj=139.7676 tau_l2=51.6009 sem_align=-0.0300
[DONE] Saved report: {'num_samples': 723, 'num_batches': 181, 'avg_mse_vector': 79.10848227653715, 'avg_mse_chunk': 203.06493508354734, 'avg_mse_traj': 174.69874151240396, 'avg_tau_l2': 51.59656528873338, 'avg_semantic_text_alignment': -0.029960344807960053, 'hard_thresholds': {'vec': 0.1, 'chk': 0.2, 'trj': 0.2}, 'avg_hard_mse_vector': 79.0947371149129, 'avg_hard_mse_chunk': 203.10737700614033, 'avg_hard_mse_traj': 174.74705524365436, 'hard_sample_fraction': 1.0, 'total_hard_samples': 723}
===== END TELEPATHY SEED 7 =====
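To compare the seed runs, the per-batch progress lines and the final "[DONE] Saved report:" dict can be pulled out of the captured console output with a short script. This is a minimal sketch, assuming the loop's output was saved to a text file (for example via tee) with each record on its own line, and that the report is printed as a Python dict literal as shown above. The function names are hypothetical helpers and are not part of eval_sigma_vla_rollout.py.

```python
import ast
import re
import statistics

# Matches numeric "key=value" pairs in progress lines such as
# "batch=0 mse_vec=71.4803 ... sem_align=-0.0300".
_KV = re.compile(r"(\w+)=(-?\d+(?:\.\d+)?)")
# Prefix of the final summary line (assumed to hold the whole dict on one line).
_REPORT_PREFIX = "[DONE] Saved report: "


def parse_batch_line(line: str) -> dict:
    """Hypothetical helper: return the numeric fields of one 'batch=...' line as floats."""
    return {key: float(value) for key, value in _KV.findall(line)}


def parse_report_line(line: str) -> dict:
    """Hypothetical helper: parse the Python-literal dict after '[DONE] Saved report: '."""
    start = line.index(_REPORT_PREFIX) + len(_REPORT_PREFIX)
    # The report uses single quotes, so it is a Python literal rather than JSON.
    return ast.literal_eval(line[start:].strip())


def summarize_seeds(log_text: str, key: str = "avg_mse_vector") -> tuple:
    """Hypothetical helper: collect `key` from every report line in a concatenated
    multi-seed log and return (mean, sample stdev, number of reports)."""
    values = [
        parse_report_line(line)[key]
        for line in log_text.splitlines()
        if _REPORT_PREFIX in line
    ]
    if not values:
        raise ValueError("no '[DONE] Saved report:' lines found")
    stdev = statistics.stdev(values) if len(values) > 1 else 0.0
    return statistics.mean(values), stdev, len(values)
```

ast.literal_eval is used instead of json.loads because the report dict is printed with single-quoted keys, which is valid Python but not valid JSON.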