Tags: Transformers, ONNX, English, mla-attention, multi-head-latent-attention, flow-matching, rectified-flow, on-device, efficient-attention, smol-scale, research, proof-of-concept
How to use Tinman-Lab/Tinman-SmolOmni-MLA-256M with Transformers:

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Tinman-Lab/Tinman-SmolOmni-MLA-256M", dtype="auto")
"""
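ONNX Runtime ODE Solver for SmolOmni Flow-Matching Image Generation

Usage:

    import numpy as np
    import onnxruntime as ort

    # Prompt encoder and single-step flow head, exported as separate graphs.
    sess_ctx = ort.InferenceSession("smolomni_256M_gen_context.onnx")
    sess_flow = ort.InferenceSession("smolomni_256M_flow_head_step.onnx")

    def generate_image(prompt_tokens, num_steps=50):
        # prompt_tokens: NumPy array of token IDs; check the expected dtype
        # and shape with sess_ctx.get_inputs().
        # Encode the prompt once; the context is reused at every step.
        ctx = sess_ctx.run(None, {"input_ids": prompt_tokens})[0]

        # Start from Gaussian noise in the (1, 4, 32, 32) latent space.
        latents = np.random.randn(1, 4, 32, 32).astype(np.float32)

        # Fixed-step explicit Euler integration of the predicted velocity field.
        dt = 1.0 / num_steps
        for i in range(num_steps):
            # t = i / num_steps in [0, 1), scaled by 1000 for the flow head.
            t = np.array([i * dt * 1000], dtype=np.float32)
            velocity = sess_flow.run(None, {
                "noisy_latents": latents,
                "timestep": t,
                "context": ctx,
            })[0]
            latents = latents + velocity * dt

        return latents  # Pass to VAE decoder for final image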
"""
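The generation loop in the usage docstring is a fixed-step explicit Euler solver for the flow ODE dx/dt = v(x, t), integrated from noise at t = 0 toward data at t = 1. The sketch below factors that loop into a standalone function so it can be exercised without the ONNX files. The names `euler_sample`, `make_straight_line_field`, and `timestep_scale` are hypothetical, introduced here for illustration, and the oracle velocity field assumes the rectified-flow convention x_t = (1 - t) * x0 + t * x1 (noise at t = 0, data at t = 1), which matches the loop direction in the docstring but is not confirmed by the source.

```python
import numpy as np


def euler_sample(velocity_fn, latents, num_steps=50, timestep_scale=1000.0):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step explicit Euler.

    velocity_fn(x, t_scaled) must return an array shaped like x.
    t_scaled mirrors the docstring loop, which passes i * dt * 1000.
    """
    dt = 1.0 / num_steps
    x = latents.astype(np.float32, copy=True)
    for i in range(num_steps):
        t = np.array([i * dt * timestep_scale], dtype=np.float32)
        v = velocity_fn(x, t)
        x = x + v * dt  # one Euler step
    return x


def make_straight_line_field(target, timestep_scale=1000.0):
    """Hypothetical oracle velocity field for a single known target x1.

    On the path x_t = (1 - t) * x0 + t * x1 the velocity (x1 - x_t) / (1 - t)
    equals the constant x1 - x0, so Euler integration follows it exactly.
    """
    def velocity_fn(x, t_scaled):
        t = float(t_scaled[0]) / timestep_scale  # undo the 1000x scaling
        return (target - x) / (1.0 - t)  # t < 1 for every step of the loop
    return velocity_fn


# Hypothetical wiring to the exported flow head (not executed here; needs the .onnx files):
# velocity_fn = lambda x, t: sess_flow.run(
#     None, {"noisy_latents": x, "timestep": t, "context": ctx})[0]

rng = np.random.default_rng(0)
noise = rng.standard_normal((1, 4, 32, 32)).astype(np.float32)
target = rng.standard_normal((1, 4, 32, 32)).astype(np.float32)

result = euler_sample(make_straight_line_field(target), noise, num_steps=50)
max_err = float(np.abs(result - target).max())
print(max_err)  # close to zero: Euler is exact on straight paths up to float32 rounding
```

With this oracle field even a single Euler step lands on the target, because the velocity is constant along a straight path. A learned velocity field is only approximately straight, so raising `num_steps` trades extra flow-head calls for a more accurate integration.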