Codon Motif Vision 1

Model Introduction

Codon Motif Vision 1 is an experimental Visual Tokenizer model that supports images of arbitrary resolution.

Technical Specifications

The model is based on the VQ-GAN architecture and integrates the following key technologies:

  • Quantizer: Adopts LFQ (Lookup-Free Quantization), a vector quantization scheme that replaces explicit codebook lookups with per-dimension binary quantization; the quantizer also reports codebook entropy and perplexity statistics.
  • Positional Encoding: Integrates 2D RoPE (Rotary Positional Embedding) to enhance spatial awareness.
  • Attention Mechanism: Both Encoder and Decoder use Spatial Multi-Head Attention.
  • Encoder: Combines ConvBlock and ResBasicBlock, supporting dynamic input sizes.
  • Decoder: Uses PixelShuffle for upsampling reconstruction.
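To make the LFQ idea concrete, here is a minimal numpy sketch (not the orbit-torch implementation; the helper name `lfq_quantize` is hypothetical): each latent dimension is quantized to its sign, and the resulting bits are packed into an integer token index, so no explicit codebook lookup is needed.

```python
import numpy as np

def lfq_quantize(z: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Toy LFQ: quantize each latent dim to {-1, +1} and pack bits into an index.

    z: [..., D] continuous latent vectors (D = codebook dim, e.g. 16).
    Returns (z_q, indices), where indices range over the 2**D implicit codes.
    """
    z_q = np.where(z >= 0, 1.0, -1.0)          # sign quantization, no codebook lookup
    bits = (z_q > 0).astype(np.int64)          # map {-1, +1} -> {0, 1}
    weights = 2 ** np.arange(z.shape[-1])      # bit weights, LSB = dimension 0
    indices = (bits * weights).sum(axis=-1)    # pack bits into one token index
    return z_q, indices

# With a 16-dim code, the implicit codebook has 2**16 = 65536 entries.
z = np.random.randn(4, 16)
z_q, idx = lfq_quantize(z)
```

Because the code is implicit in the bit pattern, the "codebook" never has to be stored or searched, which is what makes the quantization lookup-free.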

Model Parameters and Files

File List

| Filename | Size | Description |
| --- | --- | --- |
| motif-v1.safetensors | 84.92 MB | Full model weights |
| motif-v1_encoder.safetensors | 43.32 MB | Encoder weights only |
| motif-v1_decoder.safetensors | 41.56 MB | Decoder weights only |
| motif-v1_quantizer.safetensors | 34.37 KB | Quantizer weights only |

Default Configuration

The following are the default initialization parameters (i.e., the v1 standard configuration):

  • Input/Output Channels: 3 (RGB)
  • Patch Size: 16
  • Latent Dim: 256
  • Codebook Dim: 16
  • Encoder: Hidden Dim 256, 1 ResBlock, 4 Heads
  • Decoder: Hidden Dim 256, 3 ResBlocks, 4 Heads
  • RoPE Max Len: 4096
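These defaults determine the token-grid geometry: with a patch size of 16, an H×W image is tokenized into an (H/16)×(W/16) grid, and a 16-dimensional binary code implies 2^16 possible token indices. A quick sanity check (plain arithmetic, no API calls assumed):

```python
PATCH_SIZE = 16    # v1 default patch size
CODEBOOK_DIM = 16  # v1 default codebook dim

def token_grid(height: int, width: int) -> tuple[int, int]:
    """Token-grid size for an image, assuming one token per 16x16 patch."""
    return height // PATCH_SIZE, width // PATCH_SIZE

h, w = token_grid(256, 256)
print(h, w)               # -> 16 16 (a 16x16 token grid)
print(h * w)              # -> 256 tokens per 256x256 image
print(2 ** CODEBOOK_DIM)  # -> 65536 implicit codebook entries
```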

Usage

1. Installation

Ensure the orbit-torch library is installed:

pip install orbit-torch

2. Model Loading

Use the load_pretrained method to load weights. Weights are provided both as a full checkpoint and as partial checkpoints (Encoder, Decoder, Quantizer).

Import Module

from orbit.model.motif.vision.v1 import MotifV1

Scenario A: Load Full Model

model = MotifV1()
model.load_pretrained('motif-v1.safetensors')

Scenario B: Use Encoder Only

Note: When using the Encoder to extract tokens, you must also load the Quantizer. Both are loaded via submodules of the MotifV1 instance.

# 1. Instantiate the main model
model = MotifV1()

# 2. Load weights separately
model.encoder.load_pretrained('motif-v1_encoder.safetensors')
model.quantizer.load_pretrained('motif-v1_quantizer.safetensors')

# 3. Usage example (using the wrapped encode method)
# x = torch.randn(1, 3, 256, 256) # [B, 3, H, W]
# indices, mask, z_q = model.encode(x)
# print(indices.shape) # [B, H, W]

Scenario C: Use Decoder Only

Note: When using the Decoder to reconstruct images, you must also load the Quantizer (used to restore vectors from indices).

# 1. Instantiate the main model
model = MotifV1()

# 2. Load weights separately
model.decoder.load_pretrained('motif-v1_decoder.safetensors')
model.quantizer.load_pretrained('motif-v1_quantizer.safetensors')

# 3. Usage example (using the wrapped decode method)
# indices = ... # [B, H, W]
# reconstruction = model.decode(indices)
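The Quantizer's role in decoding, restoring vectors from indices, can be illustrated with a toy numpy sketch (not the orbit-torch API; `index_to_code` is a hypothetical helper): each integer token index is unpacked back into its {-1, +1} code vector, which the Decoder would then upsample into pixels.

```python
import numpy as np

def index_to_code(indices: np.ndarray, dim: int = 16) -> np.ndarray:
    """Unpack integer token indices into {-1, +1} code vectors (LSB = dim 0)."""
    bits = (indices[..., None] >> np.arange(dim)) & 1  # [..., dim] bit planes
    return np.where(bits == 1, 1.0, -1.0)

# Index 0 is the all-(-1) code; index 65535 is the all-(+1) code for dim=16.
codes = index_to_code(np.array([0, 5, 65535]))
```

This is the inverse of the bit-packing step in LFQ, so encoding followed by decoding the indices recovers the quantized latents exactly.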