Effort levels

Reasoning models expose a “how hard should I think” dial, and every API spells it differently. OpenGateway normalizes these spellings into a single internal enum and translates it to whatever the upstream model expects, regardless of which inbound API you used.

The normalized enum

off · minimal · low · medium · high · max

The default is roughly medium. If the upstream model does not support an effort knob, the gateway silently drops the parameter; it never hard-fails a coding client over an unsupported parameter.
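
The defaulting and drop-on-unsupported rules can be pictured with a minimal sketch. The names `EFFORT_LEVELS`, `Effort`, and `resolveEffort` are hypothetical and chosen for illustration only; they are not OpenGateway's actual internals, and the hard-coded "medium" stands in for a default the text describes only as "roughly medium".

```typescript
// The six normalized tiers, in ascending order of effort.
const EFFORT_LEVELS = ["off", "minimal", "low", "medium", "high", "max"] as const;
type Effort = (typeof EFFORT_LEVELS)[number];

// Hypothetical helper (assumption, not the gateway's real code).
// Returns the tier to forward upstream, or undefined when nothing should be sent.
function resolveEffort(
  requested: Effort | undefined,
  upstreamSupportsEffort: boolean,
): Effort | undefined {
  // Unsupported upstream: drop the parameter instead of raising an error.
  if (!upstreamSupportsEffort) return undefined;
  // No explicit request: fall back to the assumed default tier.
  return requested ?? "medium";
}
```

Under these assumptions, `resolveEffort("high", false)` yields `undefined`, so the request goes upstream without an effort field rather than failing.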

How each API signals effort

| Tier | OpenAI Chat | OpenAI Responses | Anthropic (4.6+) | Anthropic (legacy ≤4.5) |
| --- | --- | --- | --- | --- |
| Off | reasoning_effort: "none" | reasoning.effort: "none" | omit thinking | thinking.type: "disabled" |
| Minimal | "minimal" | "minimal" | thinking.type: "adaptive" + output_config.effort: "low" | thinking.budget_tokens: ~1–2k |
| Low | "low" | "low" | output_config.effort: "low" | ~4–8k |
| Medium | "medium" | "medium" | output_config.effort: "medium" | ~8–16k |
| High | "high" | "high" | output_config.effort: "high" | ~16–32k |
| Max | "xhigh" | "xhigh" | output_config.effort: "xhigh"/"max" | ~32–64k (< max_tokens) |
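
One way to read the table is as a lookup from the normalized tier to a request fragment per target API. The sketch below is illustrative only: `toWire`, `Target`, and the lookup tables are hypothetical names, and it makes several assumptions. The legacy budgets take the low end of each listed range, the Max tier is emitted as "max" even though the table lists both "xhigh" and "max", `thinking.type: "adaptive"` is attached to every non-off tier on 4.6+, and the legacy shape uses `thinking.type: "enabled"` next to `budget_tokens`.

```typescript
type Effort = "off" | "minimal" | "low" | "medium" | "high" | "max";
type Target = "openai-chat" | "openai-responses" | "anthropic" | "anthropic-legacy";

// OpenAI spellings for each tier, per the table.
const OPENAI_VALUE: Record<Effort, string> = {
  off: "none", minimal: "minimal", low: "low",
  medium: "medium", high: "high", max: "xhigh",
};

// Assumption: the lower bound of each budget range in the table.
const LEGACY_BUDGET: Record<Exclude<Effort, "off">, number> = {
  minimal: 1024, low: 4096, medium: 8192, high: 16384, max: 32768,
};

// Hypothetical translator: returns the fields to merge into the upstream request.
function toWire(effort: Effort, target: Target): Record<string, unknown> {
  if (target === "openai-chat") {
    return { reasoning_effort: OPENAI_VALUE[effort] };
  }
  if (target === "openai-responses") {
    return { reasoning: { effort: OPENAI_VALUE[effort] } };
  }
  if (target === "anthropic") {
    // Off is expressed by omitting the thinking fields entirely.
    if (effort === "off") return {};
    // Minimal has no tier of its own here, so it shares "low".
    const wire = effort === "minimal" ? "low" : effort;
    return { thinking: { type: "adaptive" }, output_config: { effort: wire } };
  }
  // Legacy Anthropic: a token budget, which must stay below max_tokens.
  if (effort === "off") return { thinking: { type: "disabled" } };
  return { thinking: { type: "enabled", budget_tokens: LEGACY_BUDGET[effort] } };
}
```

A real translator would also need to clamp the legacy budget against the request's `max_tokens`, which this sketch leaves to the caller.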

Setting effort

// OpenAI Chat Completions: effort is the top-level reasoning_effort field.
await client.chat.completions.create({
  model: 'claude-turbo-hub-qwen3-coder',
  messages: [{ role: 'user', content: 'Find the bug.' }],
  reasoning_effort: 'high',
});

// OpenAI Responses: effort is nested under reasoning.effort.
await client.responses.create({
  model: 'claude-turbo-hub-gpt-oss-120b',
  input: 'Diagnose this stack trace.',
  reasoning: { effort: 'high' },
});

// Anthropic Messages (4.6+ style): adaptive thinking plus output_config.effort.
await client.messages.create({
  model: 'claude-turbo-hub-qwen3-coder',
  max_tokens: 1024,
  thinking: { type: 'adaptive' },
  output_config: { effort: 'high' },
  messages: [{ role: 'user', content: 'Review this diff.' }],
});

Cross-API translation

The point of the normalized enum is that effort survives a protocol mismatch:

Claude Code (Anthropic thinking) → routed to an HF chat model ⇒ emitted as OpenAI reasoning_effort.
Codex (OpenAI Responses reasoning.effort) → routed to a Claude frontier model ⇒ emitted as Anthropic thinking / output_config.effort.
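
The first route above (Anthropic-style inbound, OpenAI Chat outbound) can be sketched as a parse step followed by an emit step. Everything here is hypothetical: `InboundBody`, `parseEffort`, and `anthropicToChat` are illustrative names, not gateway APIs. The parser is also simplified. It ignores legacy `budget_tokens`, and it treats a request with no effort fields as "unspecified" rather than "off", which a real gateway would have to decide per inbound API.

```typescript
type Effort = "off" | "minimal" | "low" | "medium" | "high" | "max";

// Wire spellings accepted on the way in, mapped to the normalized tier.
const FROM_WIRE: Record<string, Effort> = {
  none: "off", minimal: "minimal", low: "low",
  medium: "medium", high: "high", xhigh: "max", max: "max",
};

// OpenAI Chat spellings used on the way out.
const TO_OPENAI: Record<Effort, string> = {
  off: "none", minimal: "minimal", low: "low",
  medium: "medium", high: "high", max: "xhigh",
};

// Only the effort-related fields of the three inbound request shapes.
interface InboundBody {
  reasoning_effort?: string;               // OpenAI Chat
  reasoning?: { effort?: string };         // OpenAI Responses
  thinking?: { type?: string };            // Anthropic
  output_config?: { effort?: string };     // Anthropic (4.6+)
}

// Hypothetical parser: inbound body -> normalized tier (or undefined if unset/unknown).
function parseEffort(body: InboundBody): Effort | undefined {
  const spelled =
    body.reasoning_effort ?? body.reasoning?.effort ?? body.output_config?.effort;
  if (spelled !== undefined) return FROM_WIRE[spelled];
  if (body.thinking?.type === "disabled") return "off";
  return undefined;
}

// Anthropic-style inbound request re-emitted as OpenAI Chat fields.
function anthropicToChat(body: InboundBody): { reasoning_effort?: string } {
  const effort = parseEffort(body);
  return effort === undefined ? {} : { reasoning_effort: TO_OPENAI[effort] };
}
```

With this sketch, a body carrying `thinking: { type: "adaptive" }` and `output_config: { effort: "high" }` comes out as `{ reasoning_effort: "high" }`. The reverse route would pair `parseEffort` with an Anthropic-side emitter in the same way.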

The branded SDK exposes a single effort option that maps to whichever wire format your call uses.