mstar.model.pi05.config
Configuration for the Pi0.5 vision-language-action model.
Functions
| load_pi05_config | Build a Pi05Config, optionally overlaying values from an HF config dict. |
Classes
| Pi05Config | Pi0.5 model configuration. |
- class mstar.model.pi05.config.Pi05Config(vit_hidden_size=1152, vit_num_layers=27, vit_num_heads=16, vit_intermediate_size=4304, vit_patch_size=14, vit_image_size=224, tokens_per_image=256, num_cameras=3, num_layers=18, num_qo_heads=8, num_kv_heads=1, head_dim=256, rms_norm_eps=1e-06, rope_theta=10000.0, hidden_size=2048, pali_intermediate_size=16384, vocab_size=257152, pad_token_id=0, action_hidden_size=1024, action_intermediate_size=4096, num_flow_steps=10, action_horizon=50, action_dim=32, state_dim=32, state_token_bins=256, state_token_offset=0, max_lang_tokens=200, max_position_embeddings=2048, timestep_min_period=0.004, timestep_max_period=4.0, default_action_dtype='float32', extra=<factory>)
Bases: object
Pi0.5 model configuration.
Pi0.5 combines a SigLIP vision encoder with two Gemma transformer experts. PaliGemma (Gemma-2B backbone) processes the prefix (image, text, and state tokens) and writes a KV cache. An action expert (also Gemma) reads that frozen cache and runs an Euler flow-matching loop with adaRMS timestep conditioning (10 steps by default, see num_flow_steps) to produce a robot action trajectory (50 steps by default, see action_horizon).
Both experts share KV-cache dimensions (num_kv_heads, head_dim) so that the action expert can attend to the cache written by PaliGemma.
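The Euler flow-matching loop mentioned above can be illustrated with a small, dependency-free sketch. This is not the library's implementation: `toy_velocity` is a hypothetical stand-in for the action expert, the time convention (t running from 0 at noise to 1 at actions) is an assumption, and adaRMS conditioning and the KV cache are omitted entirely. Only the step count and output shape are taken from the Pi05Config defaults.

```python
import random

NUM_FLOW_STEPS = 10   # Pi05Config.num_flow_steps default
ACTION_HORIZON = 50   # Pi05Config.action_horizon default
ACTION_DIM = 32       # Pi05Config.action_dim default


def toy_velocity(x, t, target):
    """Hypothetical stand-in for the action expert's velocity prediction.

    Points every element straight at `target`, scaled so that a
    straight-line path arrives at the target when t reaches 1.
    """
    return [[(g - v) / (1.0 - t) for v, g in zip(row, goal)]
            for row, goal in zip(x, target)]


def euler_sample(velocity_fn, target, num_steps=NUM_FLOW_STEPS, seed=0):
    rng = random.Random(seed)
    # Start from Gaussian noise shaped (action_horizon, action_dim).
    x = [[rng.gauss(0.0, 1.0) for _ in range(ACTION_DIM)]
         for _ in range(ACTION_HORIZON)]
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt  # assumed convention: 0 = noise, 1 = actions
        v = velocity_fn(x, t, target)
        # Plain Euler update: x <- x + dt * v(x, t)
        x = [[xi + dt * vi for xi, vi in zip(xr, vr)]
             for xr, vr in zip(x, v)]
    return x


target = [[0.5] * ACTION_DIM for _ in range(ACTION_HORIZON)]
actions = euler_sample(toy_velocity, target)
print(len(actions), len(actions[0]))
```

In the real model the velocity would come from a forward pass of the action expert attending to the cached prefix; the toy field here only makes the integration loop itself observable.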
- Parameters:
vit_hidden_size (int)
vit_num_layers (int)
vit_num_heads (int)
vit_intermediate_size (int)
vit_patch_size (int)
vit_image_size (int)
tokens_per_image (int)
num_cameras (int)
num_layers (int)
num_qo_heads (int)
num_kv_heads (int)
head_dim (int)
rms_norm_eps (float)
rope_theta (float)
hidden_size (int)
pali_intermediate_size (int)
vocab_size (int)
pad_token_id (int)
action_hidden_size (int)
action_intermediate_size (int)
num_flow_steps (int)
action_horizon (int)
action_dim (int)
state_dim (int)
state_token_bins (int)
state_token_offset (int)
max_lang_tokens (int)
max_position_embeddings (int)
timestep_min_period (float)
timestep_max_period (float)
default_action_dtype (str)
extra (dict)
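As a sanity check on how the default values relate to each other, the arithmetic below uses only numbers from the signature above. The interpretation of each derived quantity (for example, that a KV-cache entry stores one key and one value vector per layer and KV head) is an assumption based on standard transformer layouts, not something this page states.

```python
# Defaults copied from the Pi05Config signature.
vit_image_size, vit_patch_size = 224, 14
tokens_per_image, num_cameras = 256, 3
num_layers, num_qo_heads, num_kv_heads, head_dim = 18, 8, 1, 256
hidden_size = 2048

# A 224x224 image cut into 14x14 patches gives a 16x16 grid.
patches_per_side = vit_image_size // vit_patch_size
assert patches_per_side ** 2 == tokens_per_image

# Image tokens contributed by all cameras together.
image_tokens = num_cameras * tokens_per_image

# Query width matches hidden_size; with a single KV head the
# key/value width is just head_dim (multi-query attention).
q_width = num_qo_heads * head_dim
kv_width = num_kv_heads * head_dim
assert q_width == hidden_size

# Assumed layout: one K and one V vector per layer per KV head.
kv_values_per_token = 2 * num_layers * num_kv_heads * head_dim

print(patches_per_side, image_tokens, q_width, kv_width, kv_values_per_token)
```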
- mstar.model.pi05.config.load_pi05_config(hf_config=None)
Build a Pi05Config, optionally overlaying values from an HF config dict.
Auto-maps any HF key that matches a Pi05Config field name. For the few HF keys whose names differ (e.g. num_hidden_layers -> num_layers), an explicit rename dict is used. Unrecognised keys are silently ignored.
- Parameters:
hf_config (dict | None)
- Return type:
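The overlay behaviour described for load_pi05_config can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's code: `ToyConfig`, `RENAMES`, and `load_toy_config` are hypothetical names, the toy class carries only three of the real fields, and the only rename pair taken from the documentation is num_hidden_layers -> num_layers.

```python
from dataclasses import dataclass, fields, replace


@dataclass
class ToyConfig:
    """Hypothetical stand-in with a few Pi05Config fields and defaults."""
    num_layers: int = 18
    hidden_size: int = 2048
    vocab_size: int = 257152


# HF key -> config field name. Only this pair appears in the docs;
# the real rename dict may contain more entries.
RENAMES = {"num_hidden_layers": "num_layers"}


def load_toy_config(hf_config=None):
    cfg = ToyConfig()
    if not hf_config:
        return cfg  # no overlay requested: defaults only
    known = {f.name for f in fields(ToyConfig)}
    overrides = {}
    for key, value in hf_config.items():
        name = RENAMES.get(key, key)  # apply explicit rename if any
        if name in known:             # auto-map matching field names
            overrides[name] = value
        # anything else is silently ignored
    return replace(cfg, **overrides)


cfg = load_toy_config({
    "num_hidden_layers": 12,    # renamed to num_layers
    "hidden_size": 1024,        # matches a field name directly
    "torch_dtype": "bfloat16",  # unrecognised, dropped
})
print(cfg)
```

Silently dropping unknown keys keeps the loader tolerant of HF configs that carry unrelated entries, at the cost of hiding typos in key names; a stricter variant could log the ignored keys.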