
1M context

Several models served through OpenGateway support context windows up to 1M tokens. The gateway advertises real per-model windows and routes large requests to a provider whose window actually fits.

Model discovery reports each model's actual context window in three fields:

{
  "context_window": 1000000,
  "context_window_tokens": 1000000,
  "max_context_tokens": 1000000
}
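
A client could read the advertised window along these lines. This is a minimal sketch: the three field names come from the payload above, while the helper name `advertised_window` and its fallback behaviour are assumptions made for illustration, not part of any OpenGateway SDK.

```python
from typing import Optional

# Field names taken from the discovery payload shown above.
CONTEXT_KEYS = ("context_window", "context_window_tokens", "max_context_tokens")


def advertised_window(model_entry: dict, default: Optional[int] = None) -> Optional[int]:
    """Return the first positive integer found under the three context keys.

    Hypothetical helper: falls back to `default` when no key is usable.
    """
    for key in CONTEXT_KEYS:
        value = model_entry.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            return value
    return default
```

Checking all three keys lets a client keep working if it only knows one of the names.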

The default discovery floor is 200K tokens, but long-context models advertise their real maximum so that clients enable their long-context paths. The window is defined per provider × model: the same model can offer 1M tokens on one provider and 64K on another. OpenGateway therefore routes each request to a provider whose window is large enough for it.
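
The per provider × model rule can be pictured with a small selection function. This is a sketch under stated assumptions, not OpenGateway's actual router: `ProviderRoute`, `pick_route`, the provider names, and the tie-breaking policy (smallest window that still fits) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ProviderRoute:
    provider: str        # hypothetical provider name
    model: str
    context_window: int  # tokens this provider accepts for this model


def pick_route(routes: Iterable[ProviderRoute], model: str,
               needed_tokens: int) -> Optional[ProviderRoute]:
    """Pick a provider whose window for `model` covers `needed_tokens`."""
    fitting = [r for r in routes
               if r.model == model and r.context_window >= needed_tokens]
    # Assumption: prefer the smallest window that fits; a real policy may
    # weigh cost, latency, or availability instead.
    return min(fitting, key=lambda r: r.context_window, default=None)


routes = [
    ProviderRoute("provider-a", "example-model", 1_000_000),
    ProviderRoute("provider-b", "example-model", 64_000),
]
large = pick_route(routes, "example-model", 300_000)  # only provider-a fits
small = pick_route(routes, "example-model", 10_000)   # both fit
```

Here `needed_tokens` is assumed to be the prompt size plus the requested output budget. A request larger than every window yields `None`, which a caller would turn into an error.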

On Claude 4.6+ models, 1M context is generally available (GA) and requires no header or beta flag. The legacy header anthropic-beta: context-1m-2025-08-07 was retired on 2026-04-30.

OpenGateway accepts and ignores the retired header and never returns HTTP 400 because of it, so older Claude Code and SDK configurations keep working unchanged:

curl "https://api.opengateway.one/frontier/v1/messages" \
  -H "Authorization: Bearer $OPENGATEWAY_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: context-1m-2025-08-07" \
  -H "Content-Type: application/json" \
  -d '{ "model": "turbo-agent-model-claude-opus-4-7", "max_tokens": 512,
        "messages": [{ "role": "user", "content": "Summarize this repo." }] }'

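One way a gateway can "accept and ignore" the retired flag is to drop it before forwarding the request. The sketch below is illustrative only: the source does not say whether OpenGateway strips the header or passes it through, and `drop_retired_betas` is a hypothetical name. It also assumes that several beta flags may arrive comma-separated in one header value.

```python
# The retired flag named in the text above.
RETIRED_BETAS = {"context-1m-2025-08-07"}


def drop_retired_betas(headers: dict) -> dict:
    """Return a copy of `headers` without retired anthropic-beta flags."""
    cleaned = {}
    for name, value in headers.items():
        if name.lower() != "anthropic-beta":
            cleaned[name] = value
            continue
        # Assumption: multiple flags are comma-separated in a single value.
        flags = [flag.strip() for flag in value.split(",")]
        kept = [flag for flag in flags if flag and flag not in RETIRED_BETAS]
        if kept:
            # Keep the header only if some non-retired flag remains.
            cleaned[name] = ",".join(kept)
    return cleaned
```

Unknown or still-valid flags pass through untouched, so only the retired value is silenced.
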
The OpenAI Chat and Responses APIs have no context header, so clients rely on the model's advertised window. OpenGateway routes to a large-window upstream and clamps max_tokens / max_output_tokens to a sane value. Codex additionally honors a client-side model_context_window setting and truncates context to that size, so set it to match the model you target (see Codex setup).
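
The source does not define how the output limit is clamped, so the following is only a plausible reading: cap the requested output at whatever room the window leaves after the prompt, and optionally at a provider output limit. The function name, parameters, and policy are assumptions.

```python
from typing import Optional


def clamp_output_tokens(requested: int, prompt_tokens: int, context_window: int,
                        provider_output_cap: Optional[int] = None) -> int:
    """Clamp a requested max_tokens / max_output_tokens value.

    Hypothetical policy: never exceed the space left in the window, nor an
    optional provider-specific output cap, and never go below zero.
    """
    room = max(context_window - prompt_tokens, 0)
    limit = room if provider_output_cap is None else min(room, provider_output_cap)
    return max(min(requested, limit), 0)
```

For example, a request for 100,000 output tokens after a 990,000-token prompt on a 1M window would be reduced to 10,000 under this policy.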