# Codon Motif Vision 1

## Model Introduction
Codon Motif Vision 1 is an experimental Visual Tokenizer model that supports images of arbitrary scales.
## Technical Specifications
The model is based on the VQ-GAN architecture and integrates the following key technologies:
- Quantizer: Adopts LFQ (Lookup-Free Quantization), a vector quantization scheme that requires no codebook lookup; the quantizer also tracks codebook entropy and perplexity.
- Positional Encoding: Integrates 2D RoPE (Rotary Positional Embedding) to enhance spatial awareness.
- Attention Mechanism: Both Encoder and Decoder use Spatial Multi-Head Attention.
- Encoder: Combines ConvBlock and ResBasicBlock, supporting dynamic input sizes.
- Decoder: Uses PixelShuffle for upsampling reconstruction.
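The LFQ idea in the quantizer bullet above can be sketched in a few lines: each latent dimension is binarized by its sign, so the token index is simply the resulting bit pattern, and codebook usage can be summarized as the perplexity of the empirical index distribution. This is a minimal NumPy sketch of the general technique, not the orbit-torch implementation; the function names are illustrative.

```python
import numpy as np

def lfq_quantize(z):
    """Lookup-free quantization: binarize each latent dimension by sign,
    so the code index is the resulting bit pattern -- no nearest-neighbour
    codebook search is needed."""
    bits = (z > 0).astype(np.int64)          # [N, D] binary code
    weights = 1 << np.arange(z.shape[-1])    # bit weights 1, 2, 4, ...
    indices = (bits * weights).sum(axis=-1)  # [N] integer token indices
    z_q = np.where(z > 0, 1.0, -1.0)         # quantized vectors in {-1, +1}^D
    return indices, z_q

def codebook_perplexity(indices, num_codes):
    """Perplexity of codebook usage: exp(entropy) of the empirical index
    distribution; equals num_codes when every code is used uniformly."""
    counts = np.bincount(indices, minlength=num_codes)
    probs = counts / counts.sum()
    probs = probs[probs > 0]
    entropy = -(probs * np.log(probs)).sum()
    return float(np.exp(entropy))
```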
## Model Parameters and Files

### File List
| Filename | Size | Description |
|---|---|---|
| motif-v1.safetensors | 84.92 MB | Full model weights |
| motif-v1_encoder.safetensors | 43.32 MB | Encoder weights only |
| motif-v1_decoder.safetensors | 41.56 MB | Decoder weights only |
| motif-v1_quantizer.safetensors | 34.37 KB | Quantizer weights only |
### Default Configuration

The following are the default initialization parameters (i.e., the v1 standard configuration):
- Input/Output Channels: 3 (RGB)
- Patch Size: 16
- Latent Dim: 256
- Codebook Dim: 16
- Encoder: Hidden Dim 256, 1 ResBlock, 4 Heads
- Decoder: Hidden Dim 256, 3 ResBlocks, 4 Heads
- RoPE Max Len: 4096
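With a patch size of 16, an H×W image maps to an (H/16)×(W/16) token grid, and under LFQ a codebook dimension of 16 implies 2^16 = 65,536 possible codes. A small sketch of that arithmetic; it assumes inputs are padded or cropped to multiples of the patch size, which the model card does not specify:

```python
PATCH_SIZE = 16     # default patch size from the configuration above
CODEBOOK_DIM = 16   # with LFQ this implies 2 ** 16 = 65,536 codes

def token_grid(height, width, patch_size=PATCH_SIZE):
    """Token grid size for an input image; assumes dimensions are already
    multiples of the patch size (padding/cropping policy is unspecified)."""
    assert height % patch_size == 0 and width % patch_size == 0
    return height // patch_size, width // patch_size
```

For example, a 512×320 input yields a 32×20 token grid, each token an index into the 65,536-entry implicit codebook.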
## Usage

### 1. Installation
Ensure the `orbit-torch` library is installed:

```bash
pip install orbit-torch
```
### 2. Model Loading

Use the `load_pretrained` method to load weights. Weights come either as the full model or as partial weights (Encoder, Decoder, Quantizer).
#### Import Module

```python
from orbit.model.motif.vision.v1 import MotifV1
```
#### Scenario A: Load Full Model

```python
model = MotifV1()
model.load_pretrained('motif-v1.safetensors')
```
#### Scenario B: Use Encoder Only

Note: when using the Encoder to extract tokens, the Quantizer must also be loaded. Load weights via the submodules of the MotifV1 instance.
```python
# 1. Instantiate the main model
model = MotifV1()

# 2. Load weights separately
model.encoder.load_pretrained('motif-v1_encoder.safetensors')
model.quantizer.load_pretrained('motif-v1_quantizer.safetensors')

# 3. Usage example (using the wrapped encode method)
# x = torch.randn(1, 3, 256, 256)    # [B, 3, H, W]
# indices, mask, z_q = model.encode(x)
# print(indices.shape)               # [B, H, W]
```
#### Scenario C: Use Decoder Only

Note: when using the Decoder to reconstruct images, the Quantizer must also be loaded (it restores vectors from token indices).
```python
# 1. Instantiate the main model
model = MotifV1()

# 2. Load weights separately
model.decoder.load_pretrained('motif-v1_decoder.safetensors')
model.quantizer.load_pretrained('motif-v1_quantizer.safetensors')

# 3. Usage example (using the wrapped decode method)
# indices = ...  # [B, H, W]
# reconstruction = model.decode(indices)
```