omar-ah/ViL-DLM-0.6B
Tags: Image-Text-to-Text, English, vision-language, diffusion, xlstm, vision-lstm, masked-diffusion, mdlm, multimodal
arxiv: 7 papers
License: apache-2.0
main
1.95 MB
1 contributor
History: 27 commits
Latest commit: omar-ah, "Remove final model artifact from repo" (516d0c0), 13 days ago
Name | Scan | Size | Last commit | Updated
code | | | Add timestep-aware sparse KD weighting | 13 days ago
external | | | Add Vision Transformer and utility functions for sequence processing | 14 days ago
.gitattributes | Safe | 1.52 kB | Remove final model artifact from repo | 13 days ago
.gitignore | Safe | 38 Bytes | Add Vision Transformer and utility functions for sequence processing | 14 days ago
README.md | Safe | 8.62 kB | Add timestep-aware sparse KD weighting | 13 days ago
pyproject.toml | Safe | 720 Bytes | Implement stage-aware real-run training pipeline | 14 days ago
train_production.py | Safe | 172 Bytes | Update model configuration and training scripts with new vision backbone support and dependencies | 15 days ago