1M context

Several models served through OpenGateway support context windows up to 1M tokens. The gateway advertises real per-model windows and routes large requests to a provider whose window actually fits.

Honest, per-model windows

Model discovery reports the same three context fields for every model, each set to that model’s actual window:

{
  "context_window": 1000000,
  "context_window_tokens": 1000000,
  "max_context_tokens": 1000000
}
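As a rough client-side sketch, the three fields can be read by taking whichever one is present. The helper name `advertised_window` and the idea that a client may see only a subset of the fields are assumptions for illustration; only the three field names come from the payload above.

```python
from typing import Optional

# Field names taken from the discovery payload; the lookup order is arbitrary.
CONTEXT_KEYS = ("context_window", "context_window_tokens", "max_context_tokens")


def advertised_window(model_entry: dict) -> Optional[int]:
    """Return the first positive integer window found in a discovery entry.

    Hypothetical helper: returns None when no context field is present,
    leaving the fallback policy to the caller.
    """
    for key in CONTEXT_KEYS:
        value = model_entry.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            return value
    return None


entry = {
    "context_window": 1000000,
    "context_window_tokens": 1000000,
    "max_context_tokens": 1000000,
}
window = advertised_window(entry)
```

A client that only understands one of the three names still gets the right value, which is presumably why the fields are advertised together.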

The default discovery floor is 200K tokens, but long-context models advertise their real maximum so that clients enable their long-context paths. The window is defined per provider × model: the same model can offer 1M tokens on one provider and 64K on another, so OpenGateway routes each request to a provider whose window is large enough for it.
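The routing idea can be sketched in a few lines. This is not OpenGateway's implementation: the `ProviderWindow` type, the provider names, and the tie-break (prefer the smallest window that still fits) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProviderWindow:
    provider: str        # hypothetical provider name
    context_window: int  # tokens this provider accepts for the model


def pick_provider(
    candidates: List[ProviderWindow], needed_tokens: int
) -> Optional[ProviderWindow]:
    """Return a provider whose window covers needed_tokens, or None.

    Assumed policy: among providers that fit, take the smallest window.
    """
    fitting = [c for c in candidates if c.context_window >= needed_tokens]
    return min(fitting, key=lambda c: c.context_window, default=None)


# The same model exposed by two providers with different windows.
candidates = [
    ProviderWindow("provider-a", 1_000_000),
    ProviderWindow("provider-b", 64_000),
]

large = pick_provider(candidates, 300_000)  # only provider-a fits
small = pick_provider(candidates, 10_000)   # both fit; smaller window wins
```

Here `needed_tokens` stands for the prompt size plus the requested output budget. A request larger than every window yields `None`, which a gateway would have to surface as an error.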

Anthropic 1M, headerless

On Claude 4.6+ models, 1M context is GA and headerless: no beta flag is required. The legacy header anthropic-beta: context-1m-2025-08-07 was retired on 2026-04-30.

OpenGateway accepts and ignores the retired header, and never returns HTTP 400 because of it, so older Claude Code/SDK configurations keep working unchanged:

curl "https://api.opengateway.one/frontier/v1/messages" \
  -H "Authorization: Bearer $OPENGATEWAY_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: context-1m-2025-08-07" \
  -H "Content-Type: application/json" \
  -d '{ "model": "turbo-agent-model-claude-opus-4-7", "max_tokens": 512,
        "messages": [{ "role": "user", "content": "Summarize this repo." }] }'
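Because the header is no longer required, a new client can simply leave it out. The sketch below builds the same request with Python's standard library and no anthropic-beta header. The function name `build_request` and the "test-key" fallback are illustrative assumptions; the URL, model name, and remaining headers are copied from the curl example.

```python
import json
import os
import urllib.request

URL = "https://api.opengateway.one/frontier/v1/messages"


def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a Messages request without the beta header."""
    body = {
        "model": "turbo-agent-model-claude-opus-4-7",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json",
        # No "anthropic-beta" entry: 1M context is headerless on these models.
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# "test-key" is a placeholder used only when the variable is unset.
req = build_request(
    "Summarize this repo.",
    os.environ.get("OPENGATEWAY_API_KEY", "test-key"),
)
# To send it: urllib.request.urlopen(req)
```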

OpenAI clients

The OpenAI Chat and Responses APIs have no context header, so clients rely on the model’s advertised window. OpenGateway routes to a large-window upstream and clamps max_tokens / max_output_tokens to sane values. Codex additionally honors a client-side model_context_window setting and truncates its context to that value; set it to match the model you target (see Codex setup).
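The exact clamping rule is not specified here, so the following is only one plausible reading: cap the output budget at whatever room the prompt leaves in the window, and at a per-model output limit. The function name and the `prompt_tokens`, `context_window`, and `output_cap` parameters are hypothetical.

```python
from typing import Optional


def clamp_max_tokens(
    requested: Optional[int],
    prompt_tokens: int,
    context_window: int,
    output_cap: int,
) -> int:
    """Clamp an output-token budget so prompt plus output fits the window.

    Assumed rule, not OpenGateway's documented behavior.
    """
    remaining = context_window - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt does not fit in the context window")
    limit = min(remaining, output_cap)
    if requested is None:
        # No explicit budget: allow as much as the limits permit.
        return limit
    # Keep at least one output token, never more than the limit.
    return max(1, min(requested, limit))


# A 900K-token prompt on a 1M window leaves 100K tokens for output.
unchanged = clamp_max_tokens(512, 900_000, 1_000_000, 128_000)
clamped = clamp_max_tokens(200_000, 900_000, 1_000_000, 128_000)
```

In this sketch a modest request passes through untouched, while an oversized one is reduced to the remaining room rather than rejected.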