# Codon Motif Vision 1

## Model Introduction
Codon Motif Vision 1 is an experimental Visual Tokenizer model that supports images of arbitrary scales.
## Technical Specifications
The model is based on the VQ-GAN architecture and integrates the following key technologies:
- Quantizer: Adopts LFQ (Lookup-Free Quantization), a vector quantization scheme that requires no codebook lookup; the quantizer also tracks codebook entropy and perplexity.
- Positional Encoding: Integrates 2D RoPE (Rotary Positional Embedding) to enhance spatial awareness.
- Attention Mechanism: Both Encoder and Decoder use Spatial Multi-Head Attention.
- Encoder: Combines ConvBlock and ResBasicBlock, supporting dynamic input sizes.
- Decoder: Uses PixelShuffle for upsampling reconstruction.
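The LFQ idea in the quantizer bullet above can be sketched in a few lines: each latent dimension is binarized by its sign, so the token index is simply the resulting bit pattern, and codebook usage can be summarized as the perplexity of the empirical index distribution. This is a minimal NumPy sketch of the general technique, not the orbit-torch implementation; the function names are illustrative.

```python
import numpy as np

def lfq_quantize(z):
    """Lookup-free quantization: binarize each latent dimension by sign,
    so the code index is the resulting bit pattern -- no nearest-neighbour
    codebook search is needed."""
    bits = (z > 0).astype(np.int64)          # [N, D] binary code
    weights = 1 << np.arange(z.shape[-1])    # bit weights 1, 2, 4, ...
    indices = (bits * weights).sum(axis=-1)  # [N] integer token indices
    z_q = np.where(z > 0, 1.0, -1.0)         # quantized vectors in {-1, +1}^D
    return indices, z_q

def codebook_perplexity(indices, num_codes):
    """Perplexity of codebook usage: exp(entropy) of the empirical index
    distribution; equals num_codes when every code is used uniformly."""
    counts = np.bincount(indices, minlength=num_codes)
    probs = counts / counts.sum()
    probs = probs[probs > 0]
    entropy = -(probs * np.log(probs)).sum()
    return float(np.exp(entropy))
```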
## Model Parameters and Files

### File List
| Filename | Size | Description |
|---|---|---|
| motif-v1.safetensors | 84.92 MB | Full model weights |
| motif-v1_encoder.safetensors | 43.32 MB | Encoder weights only |
| motif-v1_decoder.safetensors | 41.56 MB | Decoder weights only |
| motif-v1_quantizer.safetensors | 34.37 KB | Quantizer weights only |
### Default Configuration

The following are the default initialization parameters (i.e., the v1 standard configuration):
- Input/Output Channels: 3 (RGB)
- Patch Size: 16
- Latent Dim: 256
- Codebook Dim: 16
- Encoder: Hidden Dim 256, 1 ResBlock, 4 Heads
- Decoder: Hidden Dim 256, 3 ResBlocks, 4 Heads
- RoPE Max Len: 4096
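With a patch size of 16, an H×W image maps to an (H/16)×(W/16) token grid, and under LFQ a codebook dimension of 16 implies 2^16 = 65,536 possible codes. A small sketch of that arithmetic; it assumes inputs are padded or cropped to multiples of the patch size, which the model card does not specify:

```python
PATCH_SIZE = 16     # default patch size from the configuration above
CODEBOOK_DIM = 16   # with LFQ this implies 2 ** 16 = 65,536 codes

def token_grid(height, width, patch_size=PATCH_SIZE):
    """Token grid size for an input image; assumes dimensions are already
    multiples of the patch size (padding/cropping policy is unspecified)."""
    assert height % patch_size == 0 and width % patch_size == 0
    return height // patch_size, width // patch_size
```

For example, a 512×320 input yields a 32×20 token grid, each token an index into the 65,536-entry implicit codebook.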
## Usage

### 1. Installation
Ensure the `orbit-torch` library is installed:

```bash
pip install orbit-torch
```
### 2. Model Loading

Use the `load_pretrained` method to load weights. Weights come either as the full model or as partial weights (Encoder, Decoder, Quantizer).
#### Import Module

```python
from orbit.model.motif.vision.v1 import MotifV1
```
#### Scenario A: Load Full Model

```python
model = MotifV1()
model.load_pretrained('motif-v1.safetensors')
```
#### Scenario B: Use Encoder Only

Note: when using the Encoder to extract tokens, the Quantizer must also be loaded. Load weights via the submodules of the MotifV1 instance.
```python
# 1. Instantiate the main model
model = MotifV1()

# 2. Load weights separately
model.encoder.load_pretrained('motif-v1_encoder.safetensors')
model.quantizer.load_pretrained('motif-v1_quantizer.safetensors')

# 3. Usage example (using the wrapped encode method)
# x = torch.randn(1, 3, 256, 256)    # [B, 3, H, W]
# indices, mask, z_q = model.encode(x)
# print(indices.shape)               # [B, H, W]
```
#### Scenario C: Use Decoder Only

Note: when using the Decoder to reconstruct images, the Quantizer must also be loaded (it restores vectors from token indices).
```python
# 1. Instantiate the main model
model = MotifV1()

# 2. Load weights separately
model.decoder.load_pretrained('motif-v1_decoder.safetensors')
model.quantizer.load_pretrained('motif-v1_quantizer.safetensors')

# 3. Usage example (using the wrapped decode method)
# indices = ...  # [B, H, W]
# reconstruction = model.decode(indices)
```