Supported Models

Contents

Supported Models#

mstar ships the following model families. The table below summarizes the registered families, their registry key (the value of model: in a config YAML), and a representative Hugging Face identifier.

Registry keys live in mstar/model/registry.py (MODEL_REGISTRY / HF_MODELS).

Registered model families#
Registry key	Example Hugging Face model ID	Description
`bagel`	`ByteDance-Seed/BAGEL-7B-MoT`	Unified multimodal model (text + image understanding and generation).
`orpheus`	`canopylabs/orpheus-3b-0.1-ft`	TTS: Llama 3.2 3B LLM emitting audio tokens + SNAC 24 kHz decoder.
`pi05`	`lerobot/pi05_base`	Pi0.5 vision-language-action robotics model (ViT encoder + LLM + flow action expert).
`qwen3_omni`	`Qwen/Qwen3-Omni-30B-A3B-Instruct`	Omni-modal (text/image/audio/video in, text/audio out): Thinker + Talker + codec.
`vjepa2`	`facebook/vjepa2-vitl-fpc64-256`	V-JEPA 2 video encoder + masked predictor.
`vjepa2_ac`	`vjepa2-ac-vitg`	V-JEPA 2-AC encoder + action-conditioned predictor.

Notes#

The IDs above are representative. You may use local paths or compatible variants.
Some families accept multimodal input (image/audio/video); see the model’s process_prompt for the inputs it expects.
To add a new family, see Adding a New Model.