Audio

OpenGateway exposes OpenAI-compatible audio endpoints, with request and response shapes normalized across Hugging Face providers.

Text to speech

POST /<lane>/v1/audio/speech accepts a JSON body and returns the synthesized audio as raw bytes.

curl "https://api.opengateway.one/oss/v1/audio/speech" \
  -H "Authorization: Bearer $OPENGATEWAY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "model": "hexgrad/Kokoro-82M", "input": "Welcome to the edge.", "voice": "default" }' \
  --output speech.mp3
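
The same request can be issued from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names build_speech_request and synthesize_to_file are illustrative and not part of any OpenGateway SDK; the oss lane, model, and voice are carried over from the example.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.opengateway.one"  # host taken from the curl example


def build_speech_request(text, model="hexgrad/Kokoro-82M", voice="default",
                         lane="oss", api_key=""):
    """Build (but do not send) the POST request for /<lane>/v1/audio/speech."""
    body = json.dumps({"model": model, "input": text, "voice": voice})
    return urllib.request.Request(
        f"{BASE_URL}/{lane}/v1/audio/speech",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def synthesize_to_file(text, path, **kwargs):
    """Send the request and write the returned audio bytes to `path`."""
    req = build_speech_request(
        text, api_key=os.environ["OPENGATEWAY_API_KEY"], **kwargs
    )
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

Calling synthesize_to_file("Welcome to the edge.", "speech.mp3") would then correspond to the curl command. Keeping request construction separate from sending makes the payload easy to inspect without a network call.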

Transcription

POST /<lane>/v1/audio/transcriptions accepts multipart form data.

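curl:

curl "https://api.opengateway.one/oss/v1/audio/transcriptions" \
  -H "Authorization: Bearer $OPENGATEWAY_API_KEY" \
  -F "model=openai/whisper-large-v3" \
  -F "file=@sample.mp3"

OpenAI (Python):

import os
from openai import OpenAI

# base_url is the lane root from the curl example; the SDK appends /audio/transcriptions.
client = OpenAI(base_url="https://api.opengateway.one/oss/v1",
                api_key=os.environ["OPENGATEWAY_API_KEY"])

with open("sample.mp3", "rb") as f:
    tx = client.audio.transcriptions.create(
        model="openai/whisper-large-v3",
        file=f,
    )
print(tx.text)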

Example response:

{ "text": "Welcome to the edge." }

Translation is available at /<lane>/v1/audio/translations. Requesting a model that does not support the audio task returns HTTP 422 with the error code model_task_mismatch.
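
As a rough illustration of handling the 422 case, the standard-library sketch below posts a file to either audio endpoint and raises a clearer error on model_task_mismatch. Several things here are assumptions rather than documented behavior: that translations accepts the same multipart form and returns the same { "text": ... } shape as transcriptions, that the error body contains the string model_task_mismatch somewhere (its exact layout is not specified here), and the helper names encode_multipart, is_model_task_mismatch, and post_audio, which are hypothetical.

```python
import json
import os
import urllib.error
import urllib.request
import uuid

BASE_URL = "https://api.opengateway.one"  # host taken from the curl examples


def encode_multipart(fields, filename, file_bytes):
    """Encode text fields plus one `file` part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, value in fields.items():
        part = (f'--{boundary}\r\nContent-Disposition: form-data; '
                f'name="{name}"\r\n\r\n{value}\r\n')
        chunks.append(part.encode("utf-8"))
    file_header = (f'--{boundary}\r\nContent-Disposition: form-data; '
                   f'name="file"; filename="{filename}"\r\n'
                   'Content-Type: application/octet-stream\r\n\r\n')
    chunks.append(file_header.encode("utf-8"))
    chunks.append(file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def is_model_task_mismatch(status, body_text):
    """True for a 422 whose body mentions model_task_mismatch.

    Assumption: only the presence of the code is checked, not a JSON layout.
    """
    return status == 422 and "model_task_mismatch" in body_text


def post_audio(endpoint, path, model="openai/whisper-large-v3", lane="oss"):
    """POST a file to /<lane>/v1/audio/<endpoint> and return the text field.

    `endpoint` is "transcriptions" or "translations" (the latter is assumed
    to take the same form fields).
    """
    with open(path, "rb") as f:
        body, content_type = encode_multipart(
            {"model": model}, os.path.basename(path), f.read()
        )
    req = urllib.request.Request(
        f"{BASE_URL}/{lane}/v1/audio/{endpoint}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {os.environ['OPENGATEWAY_API_KEY']}",
            "Content-Type": content_type,
        },
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["text"]
    except urllib.error.HTTPError as err:
        detail = err.read().decode("utf-8", errors="replace")
        if is_model_task_mismatch(err.code, detail):
            raise ValueError(f"{model} cannot serve audio/{endpoint}: {detail}") from err
        raise
```

With these helpers, post_audio("translations", "sample.mp3") would attempt a translation, and passing a non-audio model would surface as a ValueError instead of a bare HTTP error.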