mstar.model.pi05.components.tokenization
Tokenization wrapper for Pi0.5: PaliGemma tokenizer + state discretization.
Functions
|
Lowercase + strip whitespace, matching openpi's PaligemmaTokenizer. |
Classes
Pi05Tokenizer | Wrapper around the HF PaliGemma tokenizer that also tokenizes robot state. |
- class mstar.model.pi05.components.tokenization.Pi05Tokenizer(hf_tokenizer, config)
Bases: object
Wrapper around the HF PaliGemma tokenizer that also tokenizes robot state.
Robot state values are discretized into state_token_bins bins and mapped to language token IDs starting at state_token_offset. Pi0.5 reuses bottom-of-vocab tokens for state bins so that PaliGemma's embedding table can embed them directly.
- Parameters:
config (Pi05Config)
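The state-to-token mapping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Pi05Tokenizer implementation: the function name discretize_state is hypothetical, and the sketch assumes state values are already normalized to [-1, 1], that bins are equal-width, and that the defaults shown are placeholders. Only the names state_token_bins and state_token_offset come from the description.

```python
import numpy as np


def discretize_state(state, state_token_bins=256, state_token_offset=0):
    """Map normalized robot state values to token IDs (illustrative sketch).

    Hypothetical helper: the defaults and the [-1, 1] range are assumptions,
    not values taken from Pi05Config.
    """
    state = np.asarray(state, dtype=np.float32)
    # Assumption: state is normalized to [-1, 1]; clip anything outside.
    clipped = np.clip(state, -1.0, 1.0)
    # Interior edges of state_token_bins equal-width bins over [-1, 1].
    edges = np.linspace(-1.0, 1.0, state_token_bins + 1)[1:-1]
    # np.digitize yields a bin index in [0, state_token_bins - 1].
    bin_ids = np.digitize(clipped, edges)
    # Shift bin indices into the vocabulary range reserved for state tokens.
    return bin_ids + state_token_offset


# Usage: each state dimension becomes one token ID.
token_ids = discretize_state([-1.0, 0.25, 1.0], state_token_offset=10)
```

Because the resulting IDs fall inside the existing vocabulary, they can be looked up in the same embedding table as ordinary text tokens, which is the design point the class description makes.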