Supported Models

mstar ships the model families listed below. For each registered family, the table gives its registry key (the value of model: in a config YAML), a representative Hugging Face identifier, and a short description.

Registry keys live in mstar/model/registry.py (MODEL_REGISTRY / HF_MODELS).

Registered model families

| Registry key | Example Hugging Face model ID | Description |
|---|---|---|
| bagel | ByteDance-Seed/BAGEL-7B-MoT | Unified multimodal model (text + image understanding and generation). |
| orpheus | canopylabs/orpheus-3b-0.1-ft | TTS: Llama 3.2 3B LLM emitting audio tokens + SNAC 24 kHz decoder. |
| pi05 | lerobot/pi05_base | Pi0.5 vision-language-action robotics model (ViT encoder + LLM + flow action expert). |
| qwen3_omni | Qwen/Qwen3-Omni-30B-A3B-Instruct | Omni-modal (text/image/audio/video in, text/audio out): Thinker + Talker + codec. |
| vjepa2 | facebook/vjepa2-vitl-fpc64-256 | V-JEPA 2 video encoder + masked predictor. |
| vjepa2_ac | vjepa2-ac-vitg | V-JEPA 2-AC encoder + action-conditioned predictor. |
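
The relationship between a registry key and its representative model ID can be sketched as a plain lookup. This is only an illustration: EXAMPLE_HF_IDS and resolve_model_id are hypothetical names, not part of mstar, and the actual structure of MODEL_REGISTRY / HF_MODELS in mstar/model/registry.py may differ. The sketch assumes a local path or compatible variant, when given, simply replaces the representative ID.

```python
from typing import Optional

# Hypothetical stand-in for HF_MODELS, filled in from the table above.
# The real mapping lives in mstar/model/registry.py and may be shaped differently.
EXAMPLE_HF_IDS = {
    "bagel": "ByteDance-Seed/BAGEL-7B-MoT",
    "orpheus": "canopylabs/orpheus-3b-0.1-ft",
    "pi05": "lerobot/pi05_base",
    "qwen3_omni": "Qwen/Qwen3-Omni-30B-A3B-Instruct",
    "vjepa2": "facebook/vjepa2-vitl-fpc64-256",
    "vjepa2_ac": "vjepa2-ac-vitg",
}


def resolve_model_id(registry_key: str, override: Optional[str] = None) -> str:
    """Return the override (e.g. a local path) if given, else the example ID."""
    if registry_key not in EXAMPLE_HF_IDS:
        known = ", ".join(sorted(EXAMPLE_HF_IDS))
        raise KeyError(
            f"unknown registry key {registry_key!r}; expected one of: {known}"
        )
    # An explicit override wins over the representative ID from the table.
    return override or EXAMPLE_HF_IDS[registry_key]
```

Calling resolve_model_id("orpheus") returns the representative ID from the table, while passing a second argument such as a local checkpoint directory returns that path unchanged. An unrecognized key raises a KeyError that lists the known keys.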

Notes

  • The IDs above are representative. You may use local paths or compatible variants.

  • Some families accept multimodal input (image/audio/video); see the model’s process_prompt for the inputs it expects.

  • To add a new family, see Adding a New Model.