# Model Card for Meow-Omni 1-Base
Meow-Omni 1-Base is the foundational quad-modal Multimodal Large Language Model (MLLM) architecture. It represents the successful "model surgery" and latent-space alignment of text, vision, audio, and biological time-series, prior to specific fine-tuning for feline intention decoding.
## Model Description
Meow-Omni 1-Base was engineered to bridge the "modality gap" in foundation models. While standard MLLMs are typically limited to text, vision, and audio, this base model natively integrates high-frequency biological time-series (TS) into the linguistic latent space.
It serves as a scalable template for researchers who wish to apply quad-modal reasoning to other non-human species or human clinical diagnostics.
- Model Type: Quad-modal Omni-MLLM (Text, Video, Audio, Time-Series)
- Base Backbones: MiniCPM-o 4.5 & Intern-S1 Pro (Scientific TS Encoders)
- License: Apache 2.0
## Technical Specifications
### Architectural "Model Surgery"
The base model is the result of deep architectural integration:
- Backbone: Utilizes MiniCPM-o 4.5 for core reasoning.
- TS Integration: Specialized scientific encoders from Intern-S1 Pro were grafted into the architecture via a custom-designed Linear Projection Layer.
- Tokenizer Expansion: The tokenizer was expanded to handle biological streams natively through the introduction of `<|ts_start|>`, `<|ts_unit|>`, and `<|ts_end|>` control tokens (see the sketch after this list).
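The minimal sketch below illustrates this grafting pattern in PyTorch. The dimensions, class name, and commented tokenizer calls are illustrative assumptions about how such a projector and token expansion could look, not the released implementation.

```python
import torch
import torch.nn as nn

class TSProjector(nn.Module):
    """Linear projection grafting time-series (TS) encoder outputs into the
    LLM latent space. Dimensions are assumptions, not official values:
    ts_dim = TS encoder width, llm_dim = backbone hidden size."""
    def __init__(self, ts_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(ts_dim, llm_dim)

    def forward(self, ts_embeddings: torch.Tensor) -> torch.Tensor:
        # (batch, ts_len, ts_dim) -> (batch, ts_len, llm_dim)
        return self.proj(ts_embeddings)

projector = TSProjector()
dummy_ts = torch.randn(2, 128, 1024)   # e.g. 128 biosignal frames per sample
print(projector(dummy_ts).shape)       # torch.Size([2, 128, 4096])

# Tokenizer expansion follows the standard Hugging Face pattern;
# `tokenizer` and `model` are placeholders for your own objects:
# tokenizer.add_special_tokens(
#     {"additional_special_tokens": ["<|ts_start|>", "<|ts_unit|>", "<|ts_end|>"]}
# )
# model.resize_token_embeddings(len(tokenizer))
```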
### Latent Space Alignment
This base version has not undergone the initial alignment of the time-series projector; the projector is deliberately left unaligned so that it can be aligned for other species and domains.
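As a rough illustration of what this alignment stage could look like, the sketch below freezes the backbone and trains only the projector. The MSE objective, batch format, and hyperparameters are assumptions for illustration, not the authors' recipe.

```python
import torch
import torch.nn.functional as F

def align_projector(model, projector, dataloader, epochs: int = 1, lr: float = 1e-4):
    """Train only the TS projector against a frozen backbone.
    `model`, `projector`, and `dataloader` are placeholders; each batch is
    assumed to pair TS encoder outputs with target text-side hidden states."""
    for p in model.parameters():
        p.requires_grad_(False)  # freeze the MLLM backbone
    optimizer = torch.optim.AdamW(projector.parameters(), lr=lr)
    for _ in range(epochs):
        for ts_embeddings, target_hidden in dataloader:
            projected = projector(ts_embeddings)
            # Pull projected TS features toward paired linguistic features.
            loss = F.mse_loss(projected, target_hidden)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```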
## Uses
### Direct Use
- Research Foundation: A starting point for fine-tuning quad-modal models on different animal species (e.g., canines, primates, or endangered wildlife).
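For loading the base checkpoint as a fine-tuning starting point, a hedged sketch using the standard transformers API is shown below; the repository id is a hypothetical placeholder, and `trust_remote_code=True` is an assumption for the custom quad-modal architecture.

```python
from transformers import AutoModel, AutoTokenizer

# Hypothetical hub path; replace with this repository's actual id.
repo_id = "Meow-Omni/Meow-Omni-1-Base"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
```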
### Out-of-Scope Use
- Direct Intent Inference: This base model has not been intent-aligned via the time-series projector or the Meow-10K dataset, and may not provide accurate feline intent decoding without further training.
- Clinical Use: Not certified for immediate veterinary or medical diagnostic use.
## The Meow-Omni Ecosystem
To facilitate reproducibility and further research in computational ethology, we have released the following components:
- Main Model: Meow-Omni 1, the full fine-tuned quad-modal MLLM aligned for intent.
- Training Dataset: Meow-10K, the 10,000-sample dataset used for training.
- Evaluation Benchmark: MeowBench, the expert-verified quad-modal benchmark suite.
## Citation
If you find our work helpful, please cite us using the following BibTeX entry:
```bibtex
@misc{hu2026meowomni1multimodallarge,
  title={Meow-Omni 1: A Multimodal Large Language Model for Feline Ethology},
  author={Jucheng Hu and Zhangquan Chen and Yulin Chen and Chengjie Hong and Liang Zhou and Tairan Wang and Sifei Li and Giulio Zhu and Feng Zhou and Yiheng Zeng and Suorong Yang and Dongzhan Zhou},
  year={2026},
  eprint={2605.09152},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.09152},
}
```