Using a Server

Once a server is running (see Serving), you can reach it three ways: the native /generate endpoint, the Python SDK, or the OpenAI-compatible API. Every model is reachable via /generate and the SDK; the OpenAI routes cover the chat, speech, and image models.

Native `/generate`

POST /generate takes a multipart form and returns either a single JSON document or an NDJSON stream.

Form fields
Field	Default	Meaning
`text`	—	Text prompt (optional if media is provided).
`files`	—	One or more media uploads; each file’s modality is inferred from its extension.
`input_modalities`	auto	Comma-separated input modalities; auto-detected from the data when omitted.
`output_modalities`	`text`	Comma-separated desired outputs (e.g. `text`, `image`, `audio`, `video`, `action`).
`streaming`	`true`	`true` → NDJSON stream of chunks; `false` → one JSON document.
`model_kwargs`	—	JSON object of model-specific parameters (e.g. `{"voice": "tara"}`).
`request_id`	(uuid)	Optional client-supplied id; the server generates one when omitted.
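A minimal sketch of assembling these form fields from Python with `requests`. It assumes the conventions in the table: comma-separated modality lists, `model_kwargs` sent as a JSON string, and `streaming` as the literal `true`/`false`. The helper name `build_generate_form` is hypothetical, and the SDK described below does this for you.

```python
import json


def build_generate_form(text=None, output_modalities=("text",), input_modalities=None,
                        streaming=True, model_kwargs=None, request_id=None):
    """Return the non-file form fields for POST /generate (hypothetical helper)."""
    form = {
        "output_modalities": ",".join(output_modalities),  # e.g. "text,audio"
        "streaming": "true" if streaming else "false",     # literal values from the table
    }
    if text is not None:
        form["text"] = text
    if input_modalities:
        # Leave this field out to let the server auto-detect modalities.
        form["input_modalities"] = ",".join(input_modalities)
    if model_kwargs:
        form["model_kwargs"] = json.dumps(model_kwargs)     # JSON object as a string
    if request_id is not None:
        form["request_id"] = request_id                     # otherwise the server assigns one
    return form


# Usage (needs a running server; file parts make requests send multipart):
# import requests
# with open("cat.jpg", "rb") as f:
#     r = requests.post("http://localhost:8000/generate",
#                       data=build_generate_form("What is in this image?", streaming=False),
#                       files=[("files", ("cat.jpg", f))])
```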

A non-streaming response groups outputs by modality, each payload base64-encoded:

{
  "request_id": "…",
  "outputs": {
    "text":  [{"data": "<base64>",     "metadata": {}}],
    "image": [{"data": "<base64-png>", "metadata": {}}]
  }
}
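Every payload is base64-encoded, so a client has to decode `outputs` before it can use them. The following is a minimal sketch that assumes the response shape shown above. The function name `decode_outputs` is hypothetical. The usage line also assumes text payloads are UTF-8 bytes, which this page does not state.

```python
import base64


def decode_outputs(doc):
    """Map each modality in a non-streaming /generate response to its raw payloads."""
    return {
        modality: [base64.b64decode(item["data"]) for item in items]
        for modality, items in doc.get("outputs", {}).items()
    }


# Usage, with (None, value) tuples sending plain multipart fields like curl -F:
# import requests
# r = requests.post("http://localhost:8000/generate",
#                   files={"text": (None, "Hello"), "streaming": (None, "false")})
# print(decode_outputs(r.json())["text"][0].decode("utf-8"))  # assumes UTF-8 text
```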

A streaming response has content type application/x-ndjson: one JSON object per line, sent as each chunk is produced. A separate GET /health endpoint returns {"status": "healthy"} when the server is up.
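An NDJSON stream can be consumed line by line. The following is a minimal sketch using `requests`' `stream=True` and `iter_lines()`. The helper `iter_ndjson` is hypothetical. This page does not document the keys inside each chunk, so the sketch yields plain dicts and makes no assumptions about their contents.

```python
import json


def iter_ndjson(lines):
    """Yield one decoded JSON object per non-empty NDJSON line (bytes or str)."""
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Usage against a live server (streaming defaults to true):
# import requests
# with requests.post("http://localhost:8000/generate",
#                    files={"text": (None, "Tell me a story")}, stream=True) as r:
#     for chunk in iter_ndjson(r.iter_lines()):
#         print(chunk)
```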

# text (non-streaming → JSON)
curl -s http://localhost:8000/generate -F 'text=Hello' -F 'streaming=false'

# image understanding (image in, text out; streams NDJSON by default)
curl -s http://localhost:8000/generate -F 'text=What is in this image?' -F 'files=@cat.jpg'

# text-to-speech (base64 PCM in outputs.audio)
curl -s http://localhost:8000/generate \
  -F 'text=hello there' -F 'output_modalities=audio' \
  -F 'model_kwargs={"voice":"tara"}' -F 'streaming=false'
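The TTS response carries raw PCM rather than a playable file. The following minimal sketch wraps the PCM in a WAV container using the standard-library `wave` module. It assumes the samples are 16-bit, little-endian, and mono at 24 kHz. Those format values are assumptions, not documented ones, so check the item's `metadata` or the model config before relying on them. `pcm_to_wav` is a hypothetical helper.

```python
import base64
import io
import wave


def pcm_to_wav(pcm_b64, sample_rate=24_000, channels=1, sample_width=2):
    """Wrap base64 PCM in a WAV container; the default format values are assumptions."""
    pcm = base64.b64decode(pcm_b64)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)  # bytes per sample: 2 means 16-bit
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()


# Usage with the non-streaming TTS response above saved as resp.json:
# import json
# with open("resp.json") as f:
#     doc = json.load(f)
# with open("out.wav", "wb") as f:
#     f.write(pcm_to_wav(doc["outputs"]["audio"][0]["data"]))
```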

Python SDK

The SDK (mstar.client.MStarClient) is a thin HTTP client over /generate. It depends only on requests (plus numpy for the audio helpers) and does not need torch, so it can run on machines without PyTorch or a GPU:

from mstar import MStarClient
client = MStarClient("http://localhost:8000")   # optional: timeout=600.0

The core method is generate:

generate(*, text=None, images=None, audio=None, video=None, output_modalities=("text",), input_modalities=None, stream=False, request_id=None, **model_kwargs): Submit a request. images / audio / video accept a path, raw bytes, a (filename, bytes) tuple, or a list of those. Extra keyword args are forwarded as the model’s model_kwargs (e.g. voice="tara", temperature=0.7, max_output_tokens=256); None values are dropped. Returns a GenerateResult when stream=False, or an iterator of stream events when stream=True.
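One way to picture the accepted shapes for `images` / `audio` / `video` is as a normalization step. Each argument becomes a list of `(filename, bytes)` pairs that are ready for a multipart upload, and `None` keyword arguments are dropped. This is a hypothetical sketch of that documented behavior, not the SDK's code. `normalize_media` and `clean_kwargs` are invented names. Because the server infers modality from the file extension, raw bytes need a caller-chosen default name with a suitable extension.

```python
import os


def normalize_media(value, default_name):
    """Turn a path, raw bytes, a (filename, bytes) tuple, or a list of these into
    a list of (filename, bytes) pairs. Hypothetical; mirrors the documented inputs."""
    if value is None:
        return []
    if isinstance(value, list):
        return [pair for item in value for pair in normalize_media(item, default_name)]
    if isinstance(value, tuple):
        name, data = value
        return [(name, data)]
    if isinstance(value, (bytes, bytearray)):
        return [(default_name, bytes(value))]
    # Anything else is treated as a filesystem path.
    with open(value, "rb") as f:
        return [(os.path.basename(value), f.read())]


def clean_kwargs(**model_kwargs):
    """Drop None values before forwarding them as model_kwargs."""
    return {k: v for k, v in model_kwargs.items() if v is not None}
```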

Convenience wrappers:

Method	Returns
`chat(prompt, *, images=None, audio=None, output_modalities=("text",), stream=False, **kw)`	Text generation (and, with `output_modalities=("text", "audio")`, speech).
`generate_image(prompt, **kw)`	PNG `bytes` (e.g. BAGEL text-to-image).
`tts(text, *, voice=None, **kw)`	An `AudioBuffer` (`.to_wav(path)`, `.to_numpy()`, `len(...)` samples).
`stream(**kw)`	Sugar for `generate(stream=True, ...)`.
`health()`	`True` if the server is healthy.
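The following short sketch shows the `chat` wrapper requesting speech alongside text. Per the table, `output_modalities=("text", "audio")` asks for both. The sketch assumes the served model can produce audio and relies on `GenerateResult.audio` being `None` when no audio came back. `chat_with_speech` is a hypothetical helper that takes the client as a parameter, so any object with the same `chat` method works.

```python
def chat_with_speech(client, prompt, wav_path):
    """Request text and speech in one call; save the audio if any was returned."""
    res = client.chat(prompt, output_modalities=("text", "audio"))
    if res.audio is not None:  # GenerateResult.audio is None when no audio came back
        res.audio.to_wav(wav_path)
    return res.text


# Usage:
# from mstar import MStarClient
# client = MStarClient("http://localhost:8000")
# print(chat_with_speech(client, "Say hi in one sentence.", "reply.wav"))
```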

Result and event types live in mstar.client:

GenerateResult — .text, .images (list of PNG bytes), .audio (an AudioBuffer or None), .raw; plus .save_image(path) / .save_audio(path).
AudioBuffer — decoded PCM with .sample_rate; .to_wav(path), .to_numpy(), len(...).
Stream events — TextChunk(text), ImageChunk(data) (.save(path)), AudioChunk(pcm, sample_rate).
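A stream mixes these event types, so a consumer usually dispatches on each one. The following minimal sketch collects joined text, image payloads, and `(pcm, sample_rate)` audio pairs. It matches on the documented field names so that it runs without mstar installed. With the real package, `isinstance` checks against `TextChunk`, `ImageChunk`, and `AudioChunk` from mstar.client read more directly. `collect_stream` is a hypothetical helper.

```python
def collect_stream(events):
    """Split SDK stream events into joined text, image payloads, and audio chunks."""
    text_parts, images, audio = [], [], []
    for event in events:
        if hasattr(event, "text"):      # TextChunk(text)
            text_parts.append(event.text)
        elif hasattr(event, "pcm"):     # AudioChunk(pcm, sample_rate)
            audio.append((event.pcm, event.sample_rate))
        elif hasattr(event, "data"):    # ImageChunk(data)
            images.append(event.data)
    return "".join(text_parts), images, audio


# Usage:
# text, images, audio = collect_stream(
#     client.stream(text="Tell me a story", output_modalities=("text", "audio")))
```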

res = client.chat("Hello!")                       # GenerateResult
print(res.text)

with open("cat.png", "wb") as f:
    f.write(client.generate_image("a cat in a hat"))

client.tts("Hi there", voice="tara").to_wav("out.wav")

for event in client.stream(text="Tell me a story"):
    print(getattr(event, "text", ""), end="", flush=True)

OpenAI-compatible API

mstar mounts OpenAI-style routes under /v1 for the models with standard OpenAI semantics. Point any OpenAI client at http://<host>:<port>/v1:

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

Endpoints and model coverage:

Endpoint	Models	Notes
`GET /v1/models`	all	Lists the served model.
`POST /v1/chat/completions`	`bagel`, `qwen3_omni`	Text chat (streaming + non-streaming). Qwen3-Omni can also emit speech.
`POST /v1/audio/speech`	`orpheus`, `qwen3_omni`	Text-to-speech.
`POST /v1/images/generations`	`bagel`	Text-to-image.
`POST /v1/images/edits`	`bagel`	Image editing (image + prompt → image).

Models without an OpenAI surface (pi05, vjepa2, vjepa2_ac) return 404 on /v1/*; use /generate or the SDK for them.
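The endpoint table lists streaming support for chat completions. With the official OpenAI Python client, `stream=True` returns an iterator of chunks, and each chunk's `choices[0].delta.content` holds the next text fragment. The following minimal sketch assumes mstar emits standard-shaped chunks. `stream_chat_text` is a hypothetical helper.

```python
def stream_chat_text(client, model, prompt):
    """Stream a chat completion and return the concatenated text deltas."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        # Some chunks (e.g. a final one) carry no choices or no content.
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


# Usage:
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
# print(stream_chat_text(client, "bagel", "hi"))
```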

# chat
client.chat.completions.create(model="bagel", messages=[{"role": "user", "content": "hi"}])

# text-to-speech
client.audio.speech.create(model="orpheus", input="hello there", voice="tara")

# image generation
client.images.generate(model="bagel", prompt="a cat in a hat")
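In the OpenAI Python client, `images.generate` returns a response whose `data` list holds items with `b64_json` and `url` fields. The following minimal sketch extracts image bytes from such a response. It assumes mstar fills `b64_json`, which this page does not confirm. `first_image_bytes` is a hypothetical helper, and the same approach would apply to `images.edit` responses.

```python
import base64


def first_image_bytes(response):
    """Return the first image's bytes from an images response (assumes b64_json is set)."""
    item = response.data[0]
    if item.b64_json is None:
        raise ValueError("no b64_json in response; the server may have returned a URL instead")
    return base64.b64decode(item.b64_json)


# Usage:
# resp = client.images.generate(model="bagel", prompt="a cat in a hat")
# with open("cat.png", "wb") as f:
#     f.write(first_image_bytes(resp))
```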

Per-model notes:

BAGEL — chat returns text only; use /v1/images/generations and /v1/images/edits for image output.
Qwen3-Omni — text sampling uses thinker_* keys and speech uses talker_*; set the speaker with voice (default Ethan) and request audio output by including "audio" in modalities. Non-OpenAI knobs (e.g. talker_top_k) go through extra_body.
Orpheus — set the speaker with voice, choosing one of tara (default), zoe, zac, jess, leo, mia, julia, or leah (the available_voices list in the Orpheus config).
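The following sketch builds keyword arguments for a Qwen3-Omni chat request that asks for speech. Per the note above, it includes "audio" in `modalities` and routes non-OpenAI knobs such as `talker_top_k` through `extra_body`. Both `modalities` and `extra_body` are parameters of `chat.completions.create` in recent versions of the openai package. One assumption is that the server reads `voice` as a top-level request field, which is why the sketch sends it in `extra_body`. If mstar instead follows OpenAI's `audio={"voice": ...}` shape, move it there. `qwen3_omni_speech_kwargs` is a hypothetical helper.

```python
def qwen3_omni_speech_kwargs(prompt, voice="Ethan", **talker_knobs):
    """Build chat.completions.create(...) kwargs asking Qwen3-Omni for text + speech.

    Assumption: voice and talker_* knobs are read from the request body via extra_body."""
    return {
        "model": "qwen3_omni",
        "messages": [{"role": "user", "content": prompt}],
        "modalities": ["text", "audio"],  # include "audio" to request speech
        "extra_body": {"voice": voice, **talker_knobs},
    }


# Usage:
# resp = client.chat.completions.create(**qwen3_omni_speech_kwargs("hello", talker_top_k=50))
```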
