1M context
Several models served through OpenGateway support context windows up to 1M tokens. The gateway advertises real per-model windows and routes large requests to a provider whose window actually fits.
Honest, per-model windows
Model discovery advertises a truthful context trio for each model:

    {
      "context_window": 1000000,
      "context_window_tokens": 1000000,
      "max_context_tokens": 1000000
    }

The default discovery floor is 200k, but long-context models advertise their real maximum so clients enable long-context paths. The window is per provider × model: the same model can be 1M tokens on one provider and 64K on another, so OpenGateway routes to a provider whose window meets the request.
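As a rough illustration of the two ideas above, the sketch below reads the context trio from a discovery entry and picks a provider whose window fits a request. The function names, the provider names, the fallback to the 200k floor when the trio is missing, and the "smallest window that fits" tie-break are all assumptions made for the example; they are not taken from OpenGateway's actual routing logic.

```python
from typing import Optional

DISCOVERY_FLOOR = 200_000  # default discovery floor mentioned in the text

# The three keys of the advertised context trio.
TRIO_KEYS = ("context_window", "context_window_tokens", "max_context_tokens")


def advertised_window(model_entry: dict) -> int:
    """Return the advertised window for one discovery entry.

    Assumption: if the three values ever disagree, take the smallest;
    if none are present, fall back to the discovery floor.
    """
    values = [model_entry[k] for k in TRIO_KEYS if isinstance(model_entry.get(k), int)]
    return min(values) if values else DISCOVERY_FLOOR


def pick_provider(windows_by_provider: dict[str, int], needed_tokens: int) -> Optional[str]:
    """Pick a provider whose window holds the request, or None if none fits.

    Assumption: prefer the smallest window that still fits.
    """
    fitting = [(window, name) for name, window in windows_by_provider.items()
               if window >= needed_tokens]
    return min(fitting)[1] if fitting else None
```

With hypothetical providers `{"provider-a": 1_000_000, "provider-b": 64_000}`, a 300,000-token request can only go to `provider-a`, while a 10,000-token request fits either and this sketch would choose `provider-b`.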
Anthropic 1M, headerless
On Claude 4.6+ models, 1M context is GA and headerless: no beta flag is
required. The legacy header anthropic-beta: context-1m-2025-08-07 was retired
2026-04-30.
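Because the flag is no longer required, a client that wants to tidy its configuration could drop it before sending. The helper below is a minimal sketch of that cleanup; the function name is hypothetical, and it assumes `anthropic-beta` may carry several comma-separated flags, of which only the retired one should be removed.

```python
RETIRED_BETAS = {"context-1m-2025-08-07"}  # the retired flag named in the text


def drop_retired_betas(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of `headers` with retired anthropic-beta flags removed."""
    cleaned = dict(headers)
    for key in list(cleaned):
        if key.lower() != "anthropic-beta":
            continue
        # Assumption: multiple beta flags are comma-separated in one header value.
        flags = [flag.strip() for flag in cleaned[key].split(",")]
        kept = [flag for flag in flags if flag and flag not in RETIRED_BETAS]
        if kept:
            cleaned[key] = ",".join(kept)
        else:
            # Nothing left to send, so omit the header entirely.
            del cleaned[key]
    return cleaned
```

Other headers, such as `anthropic-version`, pass through unchanged.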
OpenGateway accepts and ignores the retired header — it will never 400
on it — so older Claude Code/SDK configurations keep working unchanged:

    curl "https://api.opengateway.one/frontier/v1/messages" \
      -H "Authorization: Bearer $OPENGATEWAY_API_KEY" \
      -H "anthropic-version: 2023-06-01" \
      -H "anthropic-beta: context-1m-2025-08-07" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "turbo-agent-model-claude-opus-4-7",
        "max_tokens": 512,
        "messages": [{ "role": "user", "content": "Summarize this repo." }]
      }'

OpenAI clients
OpenAI Chat/Responses have no context header, so clients rely on the model’s
advertised window. OpenGateway routes to a large-window upstream and clamps
max_tokens / max_output_tokens sanely. Codex additionally honors a
client-side model_context_window that it truncates to — set it to match the
model you target (see Codex setup).
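The exact clamping rule is not spelled out here. One plausible reading, shown purely as a sketch, is that the output budget is capped to whatever room the prompt leaves in the window. The function name, the `floor` parameter, and the choice to raise an error when the prompt alone overflows the window are all assumptions for illustration.

```python
def clamp_output_tokens(requested: int, window: int, prompt_tokens: int, floor: int = 1) -> int:
    """Cap a requested max_tokens / max_output_tokens to the room left in the window.

    Assumption: the prompt and the output share a single context window.
    """
    remaining = window - prompt_tokens
    if remaining < floor:
        # The prompt leaves no room for output in this window.
        raise ValueError("prompt does not fit in the context window")
    return max(floor, min(requested, remaining))
```

On the client side, the same arithmetic suggests why a `model_context_window` larger than the routed model's real window would be unhelpful: the client would budget for room that the upstream does not have.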