# Model Card for Meow-Omni 1-Base
Meow-Omni 1-Base is the foundational quad-modal Multimodal Large Language Model (MLLM) architecture. It represents the successful "model surgery" and latent-space alignment of text, vision, audio, and biological time-series, prior to specific fine-tuning for feline intention decoding.
## Model Description
Meow-Omni 1-Base was engineered to bridge the "modality gap" in foundation models. While standard MLLMs are typically limited to text, vision, and audio, this base model natively integrates high-frequency biological time-series (TS) into the linguistic latent space.
It serves as a scalable template for researchers who wish to apply quad-modal reasoning to other non-human species or human clinical diagnostics.
- Model Type: Quad-modal Omni-MLLM (Text, Video, Audio, Time-Series)
- Base Backbones: MiniCPM-o 4.5 & Intern-S1 Pro (Scientific TS Encoders)
- License: Apache 2.0
## Technical Specifications
### Architectural "Model Surgery"
The base model is the result of deep architectural integration:
- Backbone: Utilizes MiniCPM-o 4.5 for core reasoning.
- TS Integration: Specialized scientific encoders from Intern-S1 Pro were grafted into the architecture via a custom-designed Linear Projection Layer.
- Tokenizer Expansion: The tokenizer was expanded to handle biological streams natively through the introduction of `<|ts_start|>`, `<|ts_unit|>`, and `<|ts_end|>` control tokens (see the sketch after this list).
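The minimal sketch below illustrates this grafting pattern in PyTorch. The dimensions, class name, and commented tokenizer calls are illustrative assumptions about how such a projector and token expansion could look, not the released implementation.

```python
import torch
import torch.nn as nn

class TSProjector(nn.Module):
    """Linear projection grafting time-series (TS) encoder outputs into the
    LLM latent space. Dimensions are assumptions, not official values:
    ts_dim = TS encoder width, llm_dim = backbone hidden size."""
    def __init__(self, ts_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(ts_dim, llm_dim)

    def forward(self, ts_embeddings: torch.Tensor) -> torch.Tensor:
        # (batch, ts_len, ts_dim) -> (batch, ts_len, llm_dim)
        return self.proj(ts_embeddings)

projector = TSProjector()
dummy_ts = torch.randn(2, 128, 1024)   # e.g. 128 biosignal frames per sample
print(projector(dummy_ts).shape)       # torch.Size([2, 128, 4096])

# Tokenizer expansion follows the standard Hugging Face pattern;
# `tokenizer` and `model` are placeholders for your own objects:
# tokenizer.add_special_tokens(
#     {"additional_special_tokens": ["<|ts_start|>", "<|ts_unit|>", "<|ts_end|>"]}
# )
# model.resize_token_embeddings(len(tokenizer))
```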
### Latent Space Alignment
This base version has not undergone the initial alignment of the time-series projector; the projector is deliberately left unaligned so that it can be aligned for other species and domains.
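As a rough illustration of what this alignment stage could look like, the sketch below freezes the backbone and trains only the projector. The MSE objective, batch format, and hyperparameters are assumptions for illustration, not the authors' recipe.

```python
import torch
import torch.nn.functional as F

def align_projector(model, projector, dataloader, epochs: int = 1, lr: float = 1e-4):
    """Train only the TS projector against a frozen backbone.
    `model`, `projector`, and `dataloader` are placeholders; each batch is
    assumed to pair TS encoder outputs with target text-side hidden states."""
    for p in model.parameters():
        p.requires_grad_(False)  # freeze the MLLM backbone
    optimizer = torch.optim.AdamW(projector.parameters(), lr=lr)
    for _ in range(epochs):
        for ts_embeddings, target_hidden in dataloader:
            projected = projector(ts_embeddings)
            # Pull projected TS features toward paired linguistic features.
            loss = F.mse_loss(projected, target_hidden)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```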
## Uses
### Direct Use
- Research Foundation: A starting point for fine-tuning quad-modal models on different animal species (e.g., canines, primates, or endangered wildlife).
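For loading the base checkpoint as a fine-tuning starting point, a hedged sketch using the standard transformers API is shown below; the repository id is a hypothetical placeholder, and `trust_remote_code=True` is an assumption for the custom quad-modal architecture.

```python
from transformers import AutoModel, AutoTokenizer

# Hypothetical hub path; replace with this repository's actual id.
repo_id = "Meow-Omni/Meow-Omni-1-Base"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
```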
### Out-of-Scope Use
- Direct Intent Inference: This base model has not been intent-aligned via the time-series projector or the Meow-10K dataset, and may not provide accurate feline intent decoding without further training.
- Clinical Use: Not certified for immediate veterinary or medical diagnostic use.
## The Meow-Omni Ecosystem
To facilitate reproducibility and further research in computational ethology, we have released the following components:
- Main Model: Meow-Omni 1, the full fine-tuned quad-modal MLLM aligned for intent.
- Training Dataset: Meow-10K, the 10,000-sample dataset used for training.
- Evaluation Benchmark: MeowBench, the expert-verified quad-modal benchmark suite.
## Citation
If you find our work helpful, please cite us using the following BibTeX entry:
```bibtex
@misc{hu2026meowomni1multimodallarge,
  title={Meow-Omni 1: A Multimodal Large Language Model for Feline Ethology},
  author={Jucheng Hu and Zhangquan Chen and Yulin Chen and Chengjie Hong and Liang Zhou and Tairan Wang and Sifei Li and Giulio Zhu and Feng Zhou and Yiheng Zeng and Suorong Yang and Dongzhan Zhou},
  year={2026},
  eprint={2605.09152},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2605.09152},
}
```