1. Base Model Information
✅ This model is built on the publicly released pre-trained SoftFold weights from the X-VLA team, used as the backbone.
✅ It inherits SoftFold’s native ability to model the soft deformation of clothing and category-specific garment features, making it well suited to fine-grained manipulation tasks involving fabric.
2. Core Model Training Details
Training Method
Uses the LoRA (Low-Rank Adaptation) lightweight fine-tuning scheme: the core capabilities of the base model are retained while parameters are fine-tuned specifically for the dedicated clothes-folding task. Only the low-rank adaptation matrices are trained, which significantly reduces training cost and avoids degrading the base model’s performance.
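The parameter savings behind this scheme can be illustrated with a minimal sketch. The hidden size and rank below are illustrative placeholders, not SoftFold’s actual layer dimensions or the rank used in this fine-tune:

```python
# Minimal sketch of the LoRA idea (hypothetical dimensions, not the
# actual SoftFold layer sizes): instead of updating a full d x d weight
# matrix, train two low-rank factors A (r x d) and B (d x r).

d = 1024   # hidden size of one linear layer (illustrative)
r = 16     # LoRA rank (illustrative)

full_params = d * d          # parameters touched by full fine-tuning
lora_params = 2 * d * r      # parameters trained under LoRA (A and B)

print(f"full fine-tune: {full_params:,} params per layer")
print(f"LoRA (r={r}):   {lora_params:,} params per layer "
      f"({lora_params / full_params:.1%} of full)")
```

Because the trained parameter count scales with 2·d·r rather than d², the per-layer update is a small fraction of a full fine-tune, which is what keeps training cost low and the frozen backbone intact.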
Training Data Scale
Training was completed on 2 independent task-specific datasets for clothes folding, with a total of 100 full episodes of training data.
Training Iterations & Final Deployment Weights
✅ Final version used in the deployment environment: the checkpoint (ckpt) weight file saved after 25,000 training steps (25k steps).
This checkpoint was verified as optimal, balancing folding-action accuracy, stability of clothing-pose convergence, and inference efficiency.
3. Model Architecture & Fine-Tuning Features
Backbone: X-VLA SoftFold (vision-language-action fusion architecture, suitable for robotic manipulation tasks).
Fine-Tuning Strategy: LoRA lightweight adaptation. Only the low-rank matrix parameters of the model are fine-tuned, while the backbone weights of the base model are frozen, ensuring training efficiency and model stability.
Weight Form: The final deployment ckpt is a fused weight package of "SoftFold base weights + LoRA fine-tuned adaptation weights", which can be directly loaded for inference without additional fusion operations.
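The reason no extra fusion step is needed at load time can be sketched as follows: merging folds the low-rank update into the base weight once, at save time, so the stored matrix already equals W + (alpha/r)·B·A. The toy matrices, rank, and scaling below are illustrative, not SoftFold’s actual values:

```python
# Sketch of weight fusion at save time (illustrative toy values, not
# SoftFold's real dimensions): the fused ckpt stores W_merged directly,
# so inference loads a single matrix with no fusion step.

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

d, r, alpha = 3, 1, 2.0
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # frozen base
B = [[0.5], [0.0], [0.0]]   # d x r, trained LoRA factor
A = [[0.0, 1.0, 0.0]]       # r x d, trained LoRA factor

scale = alpha / r
delta = matmul(B, A)                      # low-rank update, d x d
W_merged = [[w + scale * dl for w, dl in zip(wr, dr)]
            for wr, dr in zip(W, delta)]  # what the fused ckpt stores

print(W_merged[0])  # first row carries the merged low-rank update
```

Once merged, the checkpoint is structurally identical to a plain weight file, which is why it loads directly into the original inference path.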
4. Key Strengths
Lightweight Fine-Tuning: The LoRA scheme only updates a small number of parameters, achieving high training efficiency and avoiding catastrophic forgetting of the base model.
Task-Specific Data: Trained on 100 episodes of clothes folding data.
Optimal Deployment Version: The 25k-step ckpt is the verified optimal weight, balancing accuracy and inference speed against the real-time requirements of the deployment environment.
5. Weight File Note
Deployment Weight Identifier: 25k steps folding lora ckpt
Weight Source: Complete checkpoint file retained at 25,000 training steps after fine-tuning based on the base model + LoRA.
Compatibility: Fully compatible with the original inference framework and code of X-VLA SoftFold, and can be used to replace base weights seamlessly.
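The drop-in-replacement claim rests on a simple equivalence: applying the merged weight in one matmul gives the same output as running the base weight plus a separate adapter branch, so the inference code need not change. A pure-Python sketch with illustrative toy values (not SoftFold’s actual weights or scaling):

```python
# Sketch of why a merged LoRA weight is a drop-in replacement
# (hypothetical toy values, not SoftFold's): both paths below
# produce identical outputs.

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

W = [[1.0, 2.0], [3.0, 4.0]]      # frozen base weight (illustrative)
B = [[1.0], [0.0]]                # 2 x 1 LoRA factor
A = [[0.0, 1.0]]                  # 1 x 2 LoRA factor
scale = 1.0                       # alpha / r, illustrative
x = [1.0, 1.0]                    # toy input

# Path 1: base output plus the low-rank adapter branch
adapter = matvec(B, [scale * s for s in matvec(A, x)])
y_branch = [b + a for b, a in zip(matvec(W, x), adapter)]

# Path 2: a single matmul with the merged weight (what the ckpt stores)
W_merged = [[w + scale * b_row[0] * a for w, a in zip(wr, A[0])]
            for wr, b_row in zip(W, B)]
y_merged = matvec(W_merged, x)

print(y_branch, y_merged)  # identical outputs
```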
6. Limitations
Training data comprises only 100 episodes from 2 specified datasets, so generalization to other clothing items is limited.
This weight is a task-specific fine-tuned version suitable only for clothes folding tasks; retraining is required to transfer it to other manipulation tasks.
7. Version Information
Core Identifier: SoftFold-Base + LoRA (100 episodes) + 25k steps ckpt
Status: Training Completed
Supplementary Notes
Core Pipeline: X-VLA SoftFold Pre-trained Weights → LoRA Low-Rank Fine-Tuning (2 datasets/100 episodes) → Multi-step Checkpoint Validation → Selection of 25k steps weights as the final deployment version.
Model Positioning: Task-specific fine-tuned weights, not an update to the general base model.