mstar.api_server.media_io

Media decode/encode helpers shared by the native and OpenAI-compatible APIs.

Two directions:

  • Inbound — turn media referenced by an OpenAI-style request (data: URLs, http(s) URLs, or base64 blobs) into files under the API server’s upload_dir, so model.load_image / load_audio / load_video can read them by path. This is the same contract /generate already uses for multipart uploads.

  • Outbound — wrap raw model audio output (16-bit PCM, no container header) into a real audio container (WAV by default) and encode image bytes (PNG) as a data: URL for OpenAI chat image output.

Only the standard library and numpy are required. mp3 / flac / ogg encoding is opt-in and falls back to WAV when the optional soundfile backend is unavailable, so the base install stays slim.

Functions

modality_from_mime(mime)

Map a MIME type to one of our modality strings (image/audio/video).

pcm16_to_container(pcm, sample_rate[, fmt])

Encode raw 16-bit PCM into fmt.

pcm16_to_wav_bytes(pcm, sample_rate[, ...])

Wrap raw little-endian 16-bit PCM (the model's audio output) into a WAV blob.

png_to_data_url(png_bytes)

Encode PNG image bytes (the model's image output) as a data URL.

resolve_media_ref(ref, upload_dir, *[, ...])

Resolve a media reference (data URL, http(s) URL, or local path).

save_base64(b64, fmt, modality_hint, upload_dir)

Persist a bare base64 blob with a known fmt (e.g. "wav").

save_data_url(data_url, upload_dir)

Persist a data:<mime>;base64,<payload> URL.

save_remote_url(url, upload_dir[, timeout])

Download an http(s) URL into upload_dir.

wav_stream_header(sample_rate[, ...])

A 44-byte WAV header with streaming (unknown-length) size fields.

mstar.api_server.media_io.modality_from_mime(mime)

Map a MIME type to one of our modality strings (image/audio/video).

Parameters:

mime (str)

Return type:

str
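
As a rough illustration of this kind of mapping, the sketch below classifies a MIME type by its top-level type. It is a hypothetical stand-in, not the module's code: the name sketch_modality_from_mime and the choice to raise ValueError for unsupported types are assumptions.

```python
def sketch_modality_from_mime(mime: str) -> str:
    """Hypothetical stand-in: classify a MIME type by its top-level type."""
    # Drop parameters such as "; charset=utf-8", then normalise case.
    essence = mime.split(";", 1)[0].strip().lower()
    top_level = essence.split("/", 1)[0]
    if top_level in ("image", "audio", "video"):
        return top_level
    # Assumption: unsupported types are rejected rather than guessed.
    raise ValueError(f"unsupported media type: {mime!r}")
```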

mstar.api_server.media_io.pcm16_to_container(pcm, sample_rate, fmt='wav')

Encode raw 16-bit PCM into fmt. Returns (bytes, mime_type).

wav and pcm use the stdlib (the bytes are already PCM_16). Compressed formats need the optional soundfile backend; if it is missing we fall back to WAV and log once.

Parameters:
  • pcm

  • sample_rate (int)

  • fmt (str)

Return type:

tuple[bytes, str]
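
The fallback described above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the function name, the MIME strings (in particular "audio/pcm"), the logger name, and the use of soundfile.write with an in-memory buffer are all choices made for the example.

```python
from __future__ import annotations

import io
import logging
import wave

log = logging.getLogger("media_io_sketch")  # hypothetical logger name
_COMPRESSED_MIME = {"mp3": "audio/mpeg", "flac": "audio/flac", "ogg": "audio/ogg"}
_warned = False  # ensures the fallback is logged only once


def _wav_bytes(pcm: bytes, sample_rate: int) -> bytes:
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def sketch_pcm16_to_container(pcm: bytes, sample_rate: int, fmt: str = "wav") -> tuple[bytes, str]:
    global _warned
    if fmt == "pcm":
        # Already raw PCM_16; the MIME string is an assumption.
        return pcm, "audio/pcm"
    if fmt in _COMPRESSED_MIME:
        try:
            import numpy as np
            import soundfile as sf  # optional backend
        except (ImportError, OSError):
            if not _warned:
                log.warning("soundfile unavailable; falling back to WAV")
                _warned = True
        else:
            samples = np.frombuffer(pcm, dtype="<i2").copy()
            buf = io.BytesIO()
            sf.write(buf, samples, sample_rate, format=fmt.upper())
            return buf.getvalue(), _COMPRESSED_MIME[fmt]
    return _wav_bytes(pcm, sample_rate), "audio/wav"
```

Because the fallback still returns a matching MIME type alongside the bytes, a caller can set Content-Type from the second tuple element without caring which path was taken.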

mstar.api_server.media_io.pcm16_to_wav_bytes(pcm, sample_rate, num_channels=1)

Wrap raw little-endian 16-bit PCM (the model’s audio output) into a WAV blob.

Parameters:
  • pcm

  • sample_rate (int)

  • num_channels (int)

Return type:

bytes
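
For readers unfamiliar with the wrapping step, here is a minimal sketch that uses the standard-library wave module. The name sketch_pcm16_to_wav_bytes and the assumption that pcm arrives as a bytes object are illustrative; the real function may accept other input types.

```python
import io
import wave


def sketch_pcm16_to_wav_bytes(pcm: bytes, sample_rate: int, num_channels: int = 1) -> bytes:
    """Hypothetical stand-in: prepend a WAV header to raw 16-bit PCM."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(num_channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)  # header sizes are patched when the writer closes
    return buf.getvalue()
```

The wave writer emits little-endian PCM with a 44-byte header, so the result is the input payload plus 44 bytes.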

mstar.api_server.media_io.png_to_data_url(png_bytes)

Encode PNG image bytes (the model’s image output) as a data URL.

Parameters:

png_bytes (bytes)

Return type:

str
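
A data URL for PNG bytes is the base64 payload behind a fixed data:image/png;base64, prefix. The sketch below shows that shape; the function name is hypothetical and the real helper may differ in detail.

```python
import base64


def sketch_png_to_data_url(png_bytes: bytes) -> str:
    """Hypothetical stand-in: base64-encode PNG bytes into a data URL."""
    payload = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{payload}"
```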

mstar.api_server.media_io.resolve_media_ref(ref, upload_dir, *, allow_remote=True)

Resolve a media reference (data URL, http(s) URL, or local path).

Returns (modality, path). Local paths are passed through unchanged (modality inferred from extension).

Parameters:
  • ref

  • upload_dir

  • allow_remote (bool)

Return type:

tuple[str, str]
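
The three kinds of reference can be told apart by prefix. The sketch below shows only that dispatch decision; the function name, the returned labels, and raising ValueError when allow_remote is false are assumptions, and the real function goes on to save or download the media.

```python
def sketch_classify_media_ref(ref: str, allow_remote: bool = True) -> str:
    """Hypothetical stand-in: decide which handler a reference needs."""
    lowered = ref.lower()
    if lowered.startswith("data:"):
        return "data_url"
    if lowered.startswith(("http://", "https://")):
        if not allow_remote:
            # Assumption: a disabled remote fetch is reported as an error.
            raise ValueError("remote media fetch is disabled")
        return "remote_url"
    # Anything else is treated as a local path and passed through.
    return "local_path"
```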

mstar.api_server.media_io.save_base64(b64, fmt, modality_hint, upload_dir)

Persist a bare base64 blob with a known fmt (e.g. "wav").

Parameters:
  • b64

  • fmt (str)

  • modality_hint

  • upload_dir

Return type:

tuple[str, str]

mstar.api_server.media_io.save_data_url(data_url, upload_dir)

Persist a data:<mime>;base64,<payload> URL. Returns (modality, path).

Parameters:
  • data_url

  • upload_dir

Return type:

tuple[str, str]
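
To make the data-URL contract concrete, here is a minimal sketch of parsing and persisting one. It is not the module's code: the function name, the random uuid-based filename, the ".bin" fallback extension, and the ValueError on malformed input are all assumptions.

```python
from __future__ import annotations

import base64
import mimetypes
import os
import uuid


def sketch_save_data_url(data_url: str, upload_dir: str) -> tuple[str, str]:
    """Hypothetical stand-in: write a base64 data URL to a file."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected data:<mime>;base64,<payload>")
    # Strip the "data:" prefix and ";base64" suffix, plus any MIME parameters.
    mime = header[len("data:"):-len(";base64")].split(";", 1)[0].strip().lower()
    modality = mime.split("/", 1)[0]
    ext = mimetypes.guess_extension(mime) or ".bin"
    os.makedirs(upload_dir, exist_ok=True)
    path = os.path.join(upload_dir, uuid.uuid4().hex + ext)
    with open(path, "wb") as fh:
        fh.write(base64.b64decode(payload))
    return modality, path
```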

mstar.api_server.media_io.save_remote_url(url, upload_dir, timeout=30.0)

Download an http(s) URL into upload_dir. Returns (modality, path).

Note: fetching arbitrary URLs exposes the server to server-side request forgery (SSRF). Callers exposing this publicly should allowlist hosts or disable remote fetch (data-URL only).

Parameters:
  • url

  • upload_dir

  • timeout (float)

Return type:

tuple[str, str]
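
One way a caller might implement the host allowlist mentioned in the note is sketched below. Nothing here is part of the module: ALLOWED_HOSTS, is_fetch_allowed, and the example hostname are hypothetical, and a production check would usually also guard against redirects and DNS rebinding.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the hosts your deployment trusts.
ALLOWED_HOSTS = frozenset({"media.example.com"})


def is_fetch_allowed(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Return True only for http(s) URLs whose host is explicitly allowed."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    # .hostname strips userinfo and port, so "allowed@evil.test" resolves to evil.test.
    host = (parts.hostname or "").lower()
    return host in allowed_hosts
```

A caller would run this check before handing the URL to save_remote_url, and reject the request otherwise.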

mstar.api_server.media_io.wav_stream_header(sample_rate, num_channels=1, bits=16)

A 44-byte WAV header with streaming (unknown-length) size fields.

Used to stream TTS audio over a single HTTP response: emit this header, then 16-bit PCM frames as they arrive. The 0xFFFFFFFF placeholders signal an open-ended stream, which players and the OpenAI client’s stream_to_file handle.

Parameters:
  • sample_rate (int)

  • num_channels (int)

  • bits (int)

Return type:

bytes
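
The layout of such a header can be sketched with struct. This follows the canonical 44-byte PCM WAV layout with both size fields set to 0xFFFFFFFF; the function name is hypothetical and the module's actual byte-for-byte output is not confirmed here.

```python
import struct


def sketch_wav_stream_header(sample_rate: int, num_channels: int = 1, bits: int = 16) -> bytes:
    """Hypothetical stand-in: 44-byte WAV header for an open-ended stream."""
    block_align = num_channels * bits // 8  # bytes per sample frame
    byte_rate = sample_rate * block_align
    unknown = struct.pack("<I", 0xFFFFFFFF)  # placeholder for unknown length
    fmt_chunk = struct.pack(
        "<IHHIIHH",
        16,            # fmt chunk size for plain PCM
        1,             # audio format 1 = uncompressed PCM
        num_channels,
        sample_rate,
        byte_rate,
        block_align,
        bits,
    )
    return b"RIFF" + unknown + b"WAVE" + b"fmt " + fmt_chunk + b"data" + unknown
```

A streaming response would yield this header first and then each PCM chunk as it is produced, without ever going back to patch the size fields.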