
Effort levels

Reasoning models expose a “how hard should I think” dial, and every API spells it differently. OpenGateway normalizes these spellings to a single internal enum and translates it to whatever the upstream model expects, regardless of which inbound API you used.

off · minimal · low · medium · high · max

The default is roughly medium. If the upstream model does not support an effort knob, the gateway drops the parameter silently; it never hard-fails a coding client over an unsupported parameter.
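The default-and-drop behavior described above can be sketched as follows. This is only an illustration under stated assumptions: `Effort`, `UpstreamModel`, `supportsEffort`, and `resolveEffort` are hypothetical names, and the exact default is assumed to be `medium` because the text only says "roughly medium".

```typescript
// Hypothetical names throughout; this is not the gateway's actual code.
type Effort = "off" | "minimal" | "low" | "medium" | "high" | "max";

// Assumed default; the text only says "roughly medium".
const DEFAULT_EFFORT: Effort = "medium";

interface UpstreamModel {
  id: string;
  supportsEffort: boolean; // assumed capability flag
}

// Returns the tier to forward upstream, or undefined when the
// parameter should be dropped from the outgoing request.
function resolveEffort(
  requested: Effort | undefined,
  upstream: UpstreamModel,
): Effort | undefined {
  if (!upstream.supportsEffort) {
    return undefined; // dropped silently: no error reaches the client
  }
  return requested ?? DEFAULT_EFFORT;
}

console.log(resolveEffort("high", { id: "reasoning-model", supportsEffort: true }));
console.log(resolveEffort("high", { id: "plain-model", supportsEffort: false }));
```

Returning `undefined` rather than throwing is the design point: the caller simply leaves the field out of the upstream request.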

| Tier | OpenAI Chat | OpenAI Responses | Anthropic (4.6+) | Anthropic (legacy ≤4.5) |
| --- | --- | --- | --- | --- |
| Off | reasoning_effort: "none" | reasoning.effort: "none" | omit thinking | thinking.type: "disabled" |
| Minimal | "minimal" | "minimal" | thinking.type: "adaptive" + output_config.effort: "low" | thinking.budget_tokens: ~1–2k |
| Low | "low" | "low" | output_config.effort: "low" | ~4–8k |
| Medium | "medium" | "medium" | output_config.effort: "medium" | ~8–16k |
| High | "high" | "high" | output_config.effort: "high" | ~16–32k |
| Max | "xhigh" | "xhigh" | output_config.effort: "xhigh"/"max" | ~32–64k (< max_tokens) |
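Read as code, the table is a lookup from (tier, wire) to a request fragment. The sketch below is one possible rendering, not the gateway's implementation: `Wire`, `toWire`, and the `maxTokens` parameter are hypothetical. The legacy budgets use the lower bound of each range in the table, `"max"` is picked arbitrarily where the table lists `"xhigh"/"max"`, and `thinking.type: "enabled"` for legacy budgets is assumed from Anthropic's extended thinking request shape.

```typescript
// Sketch only: Effort, Wire, and toWire are hypothetical names.
type Effort = "off" | "minimal" | "low" | "medium" | "high" | "max";
type Wire = "openai-chat" | "openai-responses" | "anthropic" | "anthropic-legacy";

// OpenAI spellings per the table: Off is "none", Max is "xhigh".
const OPENAI: Record<Effort, string> = {
  off: "none", minimal: "minimal", low: "low",
  medium: "medium", high: "high", max: "xhigh",
};

// Legacy Anthropic budgets: lower bound of each table range (an assumption).
const LEGACY_BUDGET: Record<Exclude<Effort, "off">, number> = {
  minimal: 1024, low: 4096, medium: 8192, high: 16384, max: 32768,
};

function toWire(effort: Effort, wire: Wire, maxTokens = 64000): Record<string, unknown> {
  if (wire === "openai-chat") {
    return { reasoning_effort: OPENAI[effort] };
  }
  if (wire === "openai-responses") {
    return { reasoning: { effort: OPENAI[effort] } };
  }
  if (wire === "anthropic") {
    if (effort === "off") return {}; // omit thinking entirely
    if (effort === "minimal") {
      return { thinking: { type: "adaptive" }, output_config: { effort: "low" } };
    }
    // Low, Medium, High pass through; Max is sent as "max" (arbitrary choice).
    return { output_config: { effort } };
  }
  // Legacy Anthropic (≤4.5): explicit token budgets.
  if (effort === "off") return { thinking: { type: "disabled" } };
  // The table requires the budget to stay below max_tokens.
  const budget = Math.min(LEGACY_BUDGET[effort], maxTokens - 1);
  return { thinking: { type: "enabled", budget_tokens: budget } };
}

console.log(toWire("high", "openai-chat"));
console.log(toWire("max", "anthropic-legacy", 20000));
```

Note that the Anthropic (4.6+) column is lossy in this reading: Minimal and Low both end up with `output_config.effort: "low"`.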

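// Assumes `client` is an OpenAI-compatible SDK client pointed at the gateway.
await client.chat.completions.create({
  model: 'claude-turbo-hub-qwen3-coder',
  messages: [{ role: 'user', content: 'Find the bug.' }],
  reasoning_effort: 'high', // translated by the gateway for the upstream model
});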

The point of the normalized enum is that effort survives a protocol mismatch:

  • Claude Code (Anthropic thinking) → routed to an HF chat model ⇒ emitted as OpenAI reasoning_effort.
  • Codex (OpenAI Responses reasoning.effort) → routed to a Claude frontier model ⇒ emitted as Anthropic thinking / output_config.effort.
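The two routes above amount to "parse whatever the inbound protocol sent into a tier, then emit that tier in the upstream's spelling". A minimal sketch of that round trip follows. All names (`InboundBody`, `normalize`, `tierFromBudget`, `emitOpenAIChat`, `emitAnthropic`) are hypothetical, and the budget thresholds are an assumption derived from the lower bounds of the legacy ranges in the table.

```typescript
// Sketch only: every name here is hypothetical, not the gateway's API.
type Effort = "off" | "minimal" | "low" | "medium" | "high" | "max";

interface InboundBody {
  reasoning_effort?: string;                            // OpenAI Chat
  reasoning?: { effort?: string };                      // OpenAI Responses
  thinking?: { type?: string; budget_tokens?: number }; // Anthropic
  output_config?: { effort?: string };                  // Anthropic (4.6+)
}

const FROM_OPENAI = new Map<string, Effort>([
  ["none", "off"], ["minimal", "minimal"], ["low", "low"],
  ["medium", "medium"], ["high", "high"], ["xhigh", "max"],
]);
const FROM_ANTHROPIC = new Map<string, Effort>([
  ["low", "low"], ["medium", "medium"], ["high", "high"],
  ["xhigh", "max"], ["max", "max"],
]);

// Assumed thresholds: the lower bound of each legacy budget range.
function tierFromBudget(budget: number): Effort {
  if (budget < 4096) return "minimal";
  if (budget < 8192) return "low";
  if (budget < 16384) return "medium";
  if (budget < 32768) return "high";
  return "max";
}

// Inbound request (any protocol) -> normalized tier, or undefined if absent.
function normalize(body: InboundBody): Effort | undefined {
  const openai = body.reasoning_effort ?? body.reasoning?.effort;
  if (openai !== undefined) return FROM_OPENAI.get(openai);
  if (body.thinking?.type === "disabled") return "off";
  const anthropic = body.output_config?.effort;
  if (anthropic !== undefined) return FROM_ANTHROPIC.get(anthropic);
  const budget = body.thinking?.budget_tokens;
  return budget === undefined ? undefined : tierFromBudget(budget);
}

function emitOpenAIChat(effort: Effort): Record<string, unknown> {
  const name = effort === "off" ? "none" : effort === "max" ? "xhigh" : effort;
  return { reasoning_effort: name };
}

function emitAnthropic(effort: Effort): Record<string, unknown> {
  if (effort === "off") return {};
  if (effort === "minimal") {
    return { thinking: { type: "adaptive" }, output_config: { effort: "low" } };
  }
  return { output_config: { effort } };
}

// Anthropic-style inbound, emitted for an OpenAI-style chat upstream.
const fromClaudeCode = normalize({ thinking: { type: "enabled", budget_tokens: 16384 } });
console.log(fromClaudeCode && emitOpenAIChat(fromClaudeCode));

// Responses-style inbound, emitted for an Anthropic upstream.
const fromCodex = normalize({ reasoning: { effort: "high" } });
console.log(fromCodex && emitAnthropic(fromCodex));
```

Because both directions go through the same `Effort` type, adding a new inbound or upstream protocol in this design means writing one parser or one emitter rather than a translation for every pair.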

The branded SDK exposes a single effort option that maps to whichever wire your call uses.