xiaomoguhzz committed on
Commit 19cba99 · verified · 1 Parent(s): 05695ac

docs: add README describing checkpoint layout and training code mapping

---
license: apache-2.0
library_name: transformers
tags:
- vision-encoder
- distillation
- video-language
- siglip2
- dinov3
---

# VisionEncoder Checkpoints

Final model checkpoints from the **VisionEncoder** research project.

**Training code**: https://github.com/xiaomoguhz/VisionEncoder

## Contents

Each directory corresponds to one training pipeline in the code repo:

| Directory | Training code |
|---|---|
| `declip_siglip2/spatial_align/` | `declip_siglip2/` — DeCLIP spatial alignment distillation on SigLIP2 using DINOv2 / DINOv3 as teacher |
| `kd_mllm/s1_kd_pretrain/` | `ms-swift/kd_mllm/` stage-1 pretrain (`ms-swift/run_s1.sh`) |
| `kd_mllm/s1_siglip2_qwen3_4b/` | `ms-swift/kd_mllm/` stage-1, SigLIP2 + Qwen3-4B backbone |
| `kd_mllm/s2_siglip2_qwen3_4b_10pct/` | `ms-swift/kd_mllm/` stage-2 SFT on 10% data (`run_s2.sh`) |
| `self_refine/qwen3vl_2b_10pct/` | `ms-swift/self_refine/` — register token injection + auto-calibrated GP threshold loss |
| `video_mllm_swift/s1_siglip2_qwen3_1.7b/` | `ms-swift/video_mllm/` stage-1 with SigLIP2 encoder |
| `video_mllm_swift/s1_declip_siglip2_qwen3_1.7b/` | `ms-swift/video_mllm/` stage-1 with DeCLIP-SigLIP2 encoder |
| `video_mllm_swift/s2_siglip2_qwen3_1.7b_10pct/` | `ms-swift/video_mllm/` stage-2 SFT, SigLIP2 |
| `video_mllm_swift/s2_declip_siglip2_qwen3_1.7b_10pct/` | `ms-swift/video_mllm/` stage-2 SFT, DeCLIP-SigLIP2 |
| `video_mllm_swift/s2_image_only_10pct/` | Ablation: image-only stage-2 training |
| `ms-swift-data/` | Not a checkpoint — preprocessed SFT training data (`ms-swift/data/`) used by the pipelines above |

## Related repositories

- **Code**: https://github.com/xiaomoguhz/VisionEncoder
- **Evaluation data (~323 GB tarballs)**: https://huggingface.co/datasets/xiaomoguhzz/R3-Bench-data