Table of Contents
- What Sets Claude Opus 4.6 Apart from Earlier Models?
- How Do You Set Up Claude Opus 4.6 via the API?
- What Is Adaptive Thinking and How Should You Use It?
- How Does Context Compaction Handle Unlimited Session Lengths?
- Is Claude Opus 4.6 Worth the Cost Compared to Cheaper Models?
- Common Questions About Claude Opus 4.6
- Conclusion
Key Takeaways
- Claude Opus 4.6 ships with a 1M token context window — roughly 750,000 words — now generally available at standard $5/$25 per million token pricing with no surcharge.
- The new Adaptive Thinking mode replaces the deprecated budget_tokens approach; set thinking: {type: "adaptive"} with effort level high (default) or max for the hardest tasks.
- Context Compaction automatically summarizes older conversation history when context fills, enabling effectively unlimited session lengths for long-running agents.
- Prompt caching cuts cache reads from $5.00 to $0.50 per million tokens — a 90% reduction — making Opus 4.6 viable for large-scale production workflows.
- Prefilling assistant messages is now a breaking change: requests with prefills return a 400 error. Switch to structured outputs or system prompts instead.
Most AI models hit a wall at 128,000 tokens. Claude Opus 4.6 pushed that wall out to one million tokens — enough to hold an entire enterprise codebase, a year of Slack history, or a 600-page legal contract in a single conversation. Anthropic released Opus 4.6 on February 5, 2026, and made the 1M context window generally available at standard pricing on March 13, 2026.

But the context window is only one of several major upgrades. Adaptive Thinking, Context Compaction, Fast Mode, and free code execution alongside web tools all shipped at the same time. Running prompts the same way you did on Opus 4.5 means leaving significant performance — and cost savings — on the table.
This guide covers every key feature, shows you the exact API patterns to use, and tells you when each capability pays off. By the end, you will know how to configure Claude Opus 4.6 for coding agents, document analysis, and long-running multi-step workflows.
What Sets Claude Opus 4.6 Apart from Earlier Models?
Claude Opus 4.6 builds on Opus 4.5 with meaningful improvements in agentic benchmarks. On Terminal Bench — which tests long-horizon autonomous coding — Opus 4.6 scores 65.4%, up from 59.8% for Opus 4.5. On OSWorld, which measures agentic computer-use performance, scores rose from 66.3% to 72.7%. SWE-bench Verified sits at 80.8%.
The context retrieval numbers tell a stronger story. On the MRCR v2 benchmark — which hides eight needles deep in a long document — Opus 4.6 scores 93% accuracy at 256K context and 76% at the full 1M. That is four to nine times more reliable than Sonnet 4.5 for deep-context retrieval tasks, making it the right choice for any workflow that requires finding specific facts inside large documents.
Three architectural changes drive most practical improvements: Adaptive Thinking replaces the old extended-thinking approach, Context Compaction keeps long agent sessions alive indefinitely, and a 128k output token ceiling (double the previous limit) allows far longer thinking budgets. Together, these make Opus 4.6 the first Claude model designed for multi-hour agentic workflows. For broader context on how Opus 4.6 fits the current AI landscape, explore our AI coverage.
How Do You Set Up Claude Opus 4.6 via the API?

Getting started takes under five minutes if you already have an Anthropic API key. First, update the Python SDK to the latest version:
```bash
pip install --upgrade anthropic
```
A minimal request with Adaptive Thinking enabled looks like this:
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={"type": "adaptive"},
    messages=[
        {"role": "user", "content": "Audit this codebase and list all security issues."}
    ],
)
print(response.content)
```
Set max_tokens to at least 8,000 for complex tasks — Opus 4.6 supports up to 128,000 output tokens per request. For responses over approximately 4,000 tokens, the SDK requires streaming to avoid HTTP timeouts. Use .stream() with .get_final_message() to capture the full response without handling each streaming event individually.
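As a minimal sketch, the streaming pattern might look like the following. The model ID and thinking parameter follow this guide, and build_audit_request is a hypothetical helper; the .stream() context manager and get_final_message() are the anthropic SDK's streaming helpers, but confirm the exact signatures against the current SDK reference:

```python
def build_audit_request(prompt: str) -> dict:
    # Request body using the parameter names shown in this guide
    # (claude-opus-4-6 model ID, adaptive thinking).
    return {
        "model": "claude-opus-4-6",
        "max_tokens": 16000,
        "thinking": {"type": "adaptive"},
        "messages": [{"role": "user", "content": prompt}],
    }

def run_streamed(client, prompt: str) -> str:
    # client is an anthropic.Anthropic instance. Streaming avoids
    # HTTP timeouts on long outputs; get_final_message() collects
    # the complete response without handling individual events.
    with client.messages.stream(**build_audit_request(prompt)) as stream:
        message = stream.get_final_message()
    return "".join(b.text for b in message.content if b.type == "text")
```

The builder keeps request parameters in one place, so switching effort levels or model IDs later touches a single function.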
If you are building on AWS Bedrock, the model ID is anthropic.claude-opus-4-6-v1. On Google Vertex AI, use claude-opus-4-6. Both platforms support the same features and pricing as the direct Anthropic API.
What Is Adaptive Thinking and How Should You Use It?
Adaptive Thinking replaces the deprecated thinking: {type: "enabled", budget_tokens: N} pattern. Instead of hard-coding a thinking budget, you set an effort level and let the model decide how much reasoning to apply. Easy questions skip thinking entirely, saving tokens. Hard problems receive deep deliberation automatically.
Four effort levels are available: low, medium, high (default), and max. At high, Opus 4.6 thinks on almost every request. At max, it applies maximum reasoning depth — best for mathematical proofs, complex multi-file refactors, or long legal analysis. At low, it skips thinking for straightforward tasks, cutting token usage significantly.
The old interleaved-thinking-2025-05-14 beta header is deprecated on Opus 4.6. Adaptive Thinking automatically enables interleaved thinking, so remove that header from your requests. Interleaved thinking lets the model pause and reason mid-response rather than only at the start, which improves accuracy on step-by-step problems. See the official Adaptive Thinking documentation for the full parameter reference.
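One practical way to apply the effort levels is a small dispatcher that maps task categories to effort settings. The category names and mappings below are illustrative assumptions for this sketch; only the effort values (low, medium, high, max) come from the API:

```python
# Illustrative mapping from task category to Adaptive Thinking effort.
# Tune these assignments against your own latency and cost targets.
EFFORT_BY_TASK = {
    "autocomplete": "low",
    "summarization": "low",
    "code_review": "high",
    "multi_file_refactor": "max",
    "formal_proof": "max",
}

def thinking_config(task: str) -> dict:
    # Fall back to the documented default (high) for unknown tasks.
    return {"type": "adaptive", "effort": EFFORT_BY_TASK.get(task, "high")}
```

Passing the result as the thinking parameter keeps effort decisions centralized instead of scattered across call sites.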
How Does Context Compaction Handle Unlimited Session Lengths?
Context Compaction is a beta feature that automatically summarizes older conversation turns when the context window approaches its limit. The API triggers compaction silently — your code does not need to detect it or restructure requests. When the context fills, the server replaces earlier turns with a compact summary, then continues the conversation without losing the thread.
This matters most for coding agents that iterate across dozens of tool calls. Without compaction, you would need to build sliding-window logic, prune conversation history manually, or restart sessions and re-inject context. Compaction handles all of that automatically, making it feasible to run an agent for hours on a single task without interruption.
One important caveat: compaction operates server-side, so you cannot control exactly what gets summarized. For workflows where specific early context must persist — a strict compliance ruleset or a security policy — put that information in the system prompt rather than relying on it surviving compaction. Always test your agent’s behavior under compaction before deploying to production.
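The caveat above is handled by pinning non-negotiable rules into the system prompt, which travels with every request. A sketch, using a hypothetical compliance ruleset:

```python
COMPLIANCE_RULES = """\
Never output customer PII.
All SQL must be parameterized.
"""  # hypothetical ruleset, for illustration only

def agent_request(history: list) -> dict:
    # The system prompt is resent with every call, so these rules
    # persist even if earlier conversation turns are compacted away.
    return {
        "model": "claude-opus-4-6",
        "max_tokens": 16000,
        "system": COMPLIANCE_RULES,
        "messages": history,  # older turns here may be summarized server-side
    }
```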
Is Claude Opus 4.6 Worth the Cost Compared to Cheaper Models?
Standard Opus 4.6 pricing sits at $5 per million input tokens and $25 per million output tokens. That is higher than Sonnet 4.6 or Haiku 4.5, but the gap narrows significantly with prompt caching and the Batch API.
| Model | Input ($/MTok) | Output ($/MTok) | Max Context | Best For |
|---|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | 1M tokens | Complex agents, deep analysis |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 1M tokens | Balanced speed + intelligence |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K tokens | High-volume, low-latency tasks |
Prompt caching dramatically reduces effective cost. Cache reads on Opus 4.6 cost just $0.50 per million tokens — a 90% reduction from the standard input rate. For agentic workflows that repeatedly inject the same system prompt or codebase context, caching can cut your bill by 60-80% in practice. The Batch API adds another 50% discount on all token usage, making Opus 4.6 viable for large-scale offline document processing pipelines.
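The savings claim is easy to check with back-of-the-envelope arithmetic, using the rates quoted above (the 80% cache hit rate is an illustrative assumption):

```python
INPUT_RATE = 5.00        # $ per million input tokens (standard)
CACHE_READ_RATE = 0.50   # $ per million cached input tokens

def input_cost(mtok: float, cache_hit_fraction: float) -> float:
    # Blend cached and uncached input costs for one request mix.
    cached = mtok * cache_hit_fraction * CACHE_READ_RATE
    fresh = mtok * (1 - cache_hit_fraction) * INPUT_RATE
    return cached + fresh

# An agent re-reading a 100K-token codebase on every call:
baseline = input_cost(0.1, 0.0)  # no caching: $0.50 per call
cached = input_cost(0.1, 0.8)    # 80% cache hits: $0.14 per call
print(f"saving: {1 - cached / baseline:.0%}")
```

At an 80% hit rate the input bill drops 72%, squarely inside the 60-80% range cited above; higher hit rates push it toward the 90% ceiling.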
Use Opus 4.6 when task complexity justifies the cost — multi-step coding agents, legal document review, or deep research synthesis. For conversational features, auto-complete, or routine summarization, Sonnet 4.6 or Haiku 4.5 will be more cost-efficient. For guidance on building cost-efficient AI pipelines, see our Dev/IT Ops guides.
Common Questions About Claude Opus 4.6
Q: What is the model ID for Claude Opus 4.6 in the API?
A: Use claude-opus-4-6 as the model string in your API requests. Anthropic recommends pinning to the exact model ID rather than a floating alias to avoid unintended behavior changes when new versions roll out.
Q: Can I still use budget_tokens with Claude Opus 4.6?
A: The thinking: {type: "enabled", budget_tokens: N} pattern is deprecated on Opus 4.6 and will be removed in a future release. It still functions for now, but migrate to thinking: {type: "adaptive"} with the effort parameter as soon as possible.
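If you have many call sites still passing budget_tokens, a small shim can translate the old shape to the new one. The budget-to-effort thresholds below are illustrative assumptions, not part of the API:

```python
def migrate_thinking(old: dict) -> dict:
    # Convert the deprecated budget form to the adaptive form.
    # Threshold values are guesses; tune against your workloads.
    if old.get("type") != "enabled":
        return old  # already adaptive or unrecognized: leave untouched
    budget = old.get("budget_tokens", 0)
    if budget >= 32_000:
        effort = "max"
    elif budget >= 4_000:
        effort = "high"
    else:
        effort = "low"
    return {"type": "adaptive", "effort": effort}
```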
Q: Does the 1M context window cost extra on Opus 4.6?
A: No. As of March 13, 2026, the full 1M token context window is generally available at standard pricing — $5/$25 per million tokens input/output. There is no surcharge for using more of the context window on Opus 4.6 or Sonnet 4.6.
Q: What broke in the upgrade from Opus 4.5 to Opus 4.6?
A: The main breaking change is the removal of assistant-turn prefills. Any request with a prefilled assistant message now returns a 400 error. Switch to structured outputs using output_config.format or system-prompt instructions to guide response format instead.
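A migration sketch for code that used prefills to force JSON output. The output_config.format field follows this guide, but the exact format type and schema wrapping shown here are assumptions; verify them against the current structured-outputs reference before use:

```python
# Old pattern (now returns a 400 error): prefill the assistant turn
# with a partial JSON string to force JSON output.

# New pattern: declare the desired shape up front instead. The schema
# below is a made-up example for a security-audit response.
def structured_request(prompt: str) -> dict:
    return {
        "model": "claude-opus-4-6",
        "max_tokens": 16000,
        "output_config": {
            "format": {
                "type": "json_schema",
                "schema": {
                    "type": "object",
                    "properties": {
                        "issues": {"type": "array", "items": {"type": "string"}}
                    },
                    "required": ["issues"],
                },
            }
        },
        "messages": [{"role": "user", "content": prompt}],
    }
```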
Conclusion
Claude Opus 4.6 delivers three advances worth acting on immediately: a 1M context window at standard pricing, Adaptive Thinking that calibrates reasoning depth automatically, and Context Compaction for unlimited session lengths. Paired with prompt caching (90% cheaper reads) and the Batch API (50% discount), the effective cost drops well below list price for most production workloads.
The migration is straightforward: update the SDK, switch to claude-opus-4-6, replace budget_tokens with thinking: {type: "adaptive"}, and move any prefill logic to output_config.format or system prompts. Explore more guides in our How-to section.
Last Updated: April 15, 2026