AI News · April 16, 2026

Claude Opus 4.7 Released: Benchmarks, Pricing, and What Changed From Opus 4.6

Anthropic's Claude Opus 4.7 is now available with major gains in agentic coding, high-resolution vision, and a new xhigh effort level. Full benchmarks, pricing, migration guide, and hands-on findings.

By NeuralStackly
Last Updated: April 16, 2026 | Reading Time: 12 minutes | Trend Alert: 🔥 Just Released

On April 16, 2026, Anthropic released Claude Opus 4.7, its most capable generally available model to date. The model delivers a step-change improvement in agentic coding over Opus 4.6, introduces high-resolution image support (up to 3.75 megapixels), and ships with a new "xhigh" effort level for fine-grained control over reasoning depth vs. cost.

This is not a minor point release. Opus 4.7 represents a meaningful architecture update with a new tokenizer, removed legacy sampling parameters, and adaptive thinking as the only reasoning mode. If you're using Opus 4.6 in production, this post covers everything you need to know before upgrading.


What's New in Claude Opus 4.7

1. Agentic Coding Performance

The headline improvement is in software engineering, particularly for long-running autonomous tasks. Opus 4.7 was designed for the hardest coding work: the kind where you hand a model a complex task and let it run.

Key findings from Anthropic's early-access testers:

| Tester | Result |
| --- | --- |
| Cursor (CursorBench) | 70% pass rate vs. Opus 4.6's 58% |
| CodeRabbit | 10%+ recall improvement in code review, stable precision |
| Rakuten (Rakuten-SWE-Bench) | 3x more production tasks resolved than Opus 4.6 |
| Notion | 14% improvement over Opus 4.6, 1/3 fewer tool errors |
| Factory | 10-15% lift in task success for autonomous engineering |
| Vercel | More correct and complete one-shot coding; no regressions |

Replit's President Michele Catasta noted that Opus 4.7 achieves "the same quality at lower cost" compared to Opus 4.6 for tasks like log analysis, bug finding, and fix proposals.

Hex's co-founder Caitlin Colgrove put it plainly: "low-effort Opus 4.7 is roughly equivalent to medium-effort Opus 4.6." That's a direct cost efficiency win.

2. High-Resolution Vision (3.75MP)

Opus 4.7 is the first Claude model with high-resolution image support. The maximum resolution jumped from 1,568px (1.15MP) to 2,576px (3.75MP) on the long edge.

This matters for:

- Computer-use agents reading dense UI screenshots
- Document extraction from complex diagrams and charts
- Pixel-perfect coordinate mapping (no more scale-factor math; coordinates are 1:1 with actual pixels)
- Scientific and medical imaging workflows

XBOW reported a massive jump on their visual-acuity benchmark: 98.5% for Opus 4.7 vs. 54.5% for Opus 4.6. Solve Intelligence noted improved multimodal understanding for chemical structures and technical diagrams.

The tradeoff: high-res images use more tokens. Downsample if you don't need the detail.
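The downsampling arithmetic is simple. A minimal sketch of computing a target size before upload (the resize itself would be done with an image library such as Pillow, which is not shown here):

```python
def fit_long_edge(width: int, height: int, long_edge: int = 1568) -> tuple[int, int]:
    """Cap the long edge at `long_edge` px, preserving aspect ratio; never upscale."""
    scale = min(1.0, long_edge / max(width, height))
    return round(width * scale), round(height * scale)

# A full-resolution 2,576 x 1,932 screenshot resized back to the old 1,568px cap:
print(fit_long_edge(2576, 1932))  # → (1568, 1176)
```

Keeping the old 1,568px cap as the default means you only pay the high-res token cost when you opt into it.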

3. New `xhigh` Effort Level

Opus 4.7 introduces a new effort level between high and max, called xhigh. The effort parameter controls the tradeoff between reasoning depth and token spend.

Anthropic recommends:

| Effort Level | Use Case |
| --- | --- |
| low / medium | Quick tasks, simple queries |
| high | Most intelligence-sensitive tasks (minimum recommended) |
| xhigh | Coding and agentic use cases (new default for Claude Code) |
| max | Maximum reasoning, highest cost |

The effort parameter is available on the Messages API only; Claude Managed Agents sets effort automatically.
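As a sketch, effort rides in `output_config` on a per-request basis. The field shape here mirrors the task-budget example in the next section; treat it as an assumption and verify against the current API reference:

```python
def request_params(effort: str, prompt: str, max_tokens: int = 64_000) -> dict:
    """Build Messages API kwargs with an explicit effort level (illustrative)."""
    assert effort in {"low", "medium", "high", "xhigh", "max"}
    return {
        "model": "claude-opus-4-7",
        "max_tokens": max_tokens,
        "output_config": {"effort": effort},
        "messages": [{"role": "user", "content": prompt}],
    }

# client.messages.create(**request_params("xhigh", "Refactor this module for testability."))
```

Centralizing the effort choice in one helper makes it easy to A/B low vs. high effort on real traffic.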

4. Task Budgets (Beta)

A new feature that gives Claude an advisory token budget for the full agentic loop, including thinking, tool calls, and final output. The model sees a running countdown and prioritizes work accordingly.

response = client.beta.messages.create(
    model="claude-opus-4-7",
    max_tokens=128000,
    output_config={
        "effort": "high",
        "task_budget": {"type": "tokens", "total": 128000},
    },
    messages=[
        {"role": "user", "content": "Review the codebase and propose a refactor plan."}
    ],
    betas=["task-budgets-2026-03-13"],
)

Key points:

- This is an advisory cap, not a hard limit (unlike max_tokens)
- Minimum value: 20k tokens
- Best for workloads where you need the model to scope its own work
- For open-ended tasks where quality matters over speed, skip the budget

5. Improved Memory and Knowledge Work

Opus 4.7 is better at maintaining and using file-system-based memory across long sessions. If your agent uses a scratchpad or notes file, Opus 4.7 should improve at writing useful notes and leveraging them in subsequent tasks.
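To illustrate the pattern (this is a hypothetical helper, not an Anthropic API), a scratchpad can be as simple as an append-only notes file the agent writes through a tool:

```python
from datetime import datetime, timezone
from pathlib import Path

class Scratchpad:
    """Append-only notes file an agent can re-read across sessions (hypothetical helper)."""

    def __init__(self, path: str = "agent_notes.md"):
        self.path = Path(path)

    def note(self, text: str) -> None:
        # Timestamped bullet so later sessions can judge recency.
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        with self.path.open("a") as f:
            f.write(f"- [{stamp}] {text}\n")

    def recall(self) -> str:
        return self.path.read_text() if self.path.exists() else ""
```

Expose `note` and `recall` as tools and the improvement described above shows up as better-chosen notes, not a new API surface.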

Knowledge work improvements include:

- Better .docx redlining and .pptx editing with self-verification
- Improved chart and figure analysis with programmatic tool-calling
- Harvey's BigLaw Bench: 90.9% at high effort with better legal reasoning calibration
- Bloomberg's research-agent benchmark: top overall score at 0.715, best long-context consistency

Benchmarks: Opus 4.7 vs. Opus 4.6 vs. GPT-5.4 vs. Gemini 3.1 Pro

Anthropic published a comprehensive benchmark table comparing Opus 4.7 against Opus 4.6, Sonnet 4.6, GPT-5.4, and Gemini 3.1 Pro. Here are the key results from the official announcement:

Coding Benchmarks

| Benchmark | Opus 4.7 | Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
| --- | --- | --- | --- | --- |
| SWE-bench Verified | State of the art | Below | Below | Below |
| SWE-bench Pro | Leading | Below | Below | Below |
| SWE-bench Multilingual | Leading | Below | Below | Below |
| Terminal-Bench 2.0 | Best | Below | Below | Below |
| CursorBench | 70% | 58% | - | - |
| Rakuten-SWE-Bench | 3x Opus 4.6 | Baseline | - | - |

Vision Benchmarks

| Benchmark | Opus 4.7 | Opus 4.6 |
| --- | --- | --- |
| XBOW Visual Acuity | 98.5% | 54.5% |
| SWE-bench Multimodal | Leading | Below |
| Image Localization | Improved | Baseline |

Knowledge Work Benchmarks

| Benchmark | Opus 4.7 | Opus 4.6 |
| --- | --- | --- |
| Harvey BigLaw Bench | 90.9% | Below |
| GDPval-AA | State of the art | Below |
| Finance Agent Eval | State of the art | Below |
| Research Agent (Bloomberg) | 0.715 | 0.767 (Finance: 0.813 vs 0.767) |
| Databricks OfficeQA Pro | 21% fewer errors | Baseline |

Safety

Opus 4.7 shows a similar safety profile to Opus 4.6 with improvements in:

- Honesty metrics
- Resistance to prompt injection attacks
- Overall misaligned behavior score (modest improvement over Opus 4.6)

It ships with new real-time cybersecurity safeguards that detect and block prohibited or high-risk requests. Security professionals can apply to the Cyber Verification Program for legitimate use cases.


Pricing and Availability

| Feature | Details |
| --- | --- |
| Input | $5 / million tokens |
| Output | $25 / million tokens |
| Context Window | 1M tokens (no long-context premium) |
| Max Output | 128k tokens |
| API Model ID | claude-opus-4-7 |
| AWS Bedrock | anthropic.claude-opus-4-7 |
| GCP Vertex AI | claude-opus-4-7 |

Pricing is identical to Opus 4.6. However, note the new tokenizer uses roughly 1.0-1.35x as many tokens for the same input text. Your bill may increase on text-heavy workloads even at the same per-token price.
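A back-of-envelope check of that effect, assuming the worst-case 1.35x expansion applies to both input and output tokens (the split between input and output here is an illustrative assumption):

```python
PRICE_IN, PRICE_OUT = 5.00, 25.00  # $ per million tokens, unchanged from Opus 4.6

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload at Opus pricing."""
    return input_tokens * PRICE_IN / 1e6 + output_tokens * PRICE_OUT / 1e6

base = cost_usd(800_000, 200_000)     # same workload counted by the old tokenizer: $9.00
worst = cost_usd(1_080_000, 270_000)  # the same text at a 1.35x token count: $12.15
```

So a flat per-token price can still mean up to a 35% larger bill on text-heavy traffic; measure your own ratio before budgeting.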

Available on: Claude API, Amazon Bedrock, Google Cloud Vertex AI, Microsoft Foundry, and all Claude consumer products.


Breaking Changes for Developers

If you're upgrading from Opus 4.6, these changes require code updates:

1. Extended Thinking Budgets Removed

Setting thinking: {"type": "enabled", "budget_tokens": N} will return a 400 error. The only thinking-on mode is now adaptive:

# Before (Opus 4.6)
thinking = {"type": "enabled", "budget_tokens": 32000}

# After (Opus 4.7)
thinking = {"type": "adaptive"}
output_config = {"effort": "high"}

Adaptive thinking is off by default. You must set it explicitly to enable it.

2. Sampling Parameters Removed

Setting temperature, top_p, or top_k to any non-default value returns a 400 error. Remove these parameters entirely and use prompting to control behavior.
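Both removals can be handled with a small request-sanitizing shim at the call site. This is an illustrative helper, not an official migration tool:

```python
def migrate_request(params: dict) -> dict:
    """Return a copy of Opus 4.6 request kwargs with fields Opus 4.7 rejects removed."""
    p = dict(params)
    for key in ("temperature", "top_p", "top_k"):
        p.pop(key, None)  # any non-default value now returns a 400
    thinking = p.get("thinking")
    if isinstance(thinking, dict) and thinking.get("type") == "enabled":
        p["thinking"] = {"type": "adaptive"}  # budget_tokens is no longer supported
    return p
```

Running old request configs through a shim like this makes the cutover a one-line change rather than a hunt through every call site.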

3. Thinking Content Omitted by Default

Thinking blocks still stream, but their content field is empty unless you opt in:

thinking = {
    "type": "adaptive",
    "display": "summarized",  # opts back in to visible thinking
}

Without this, users will see a long pause while the model thinks before any output appears.

4. New Tokenizer

The updated tokenizer uses 1.0-1.35x more tokens for the same text. Update your max_tokens parameters and compaction triggers to account for this.
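A rough rule for updating those limits, assuming the worst-case 1.35x ratio (your real ratio depends on your text and should be measured):

```python
TOKENIZER_EXPANSION = 1.35  # worst-case token-count ratio vs. the Opus 4.6 tokenizer

def rescale_budget(old_budget: int, ratio: float = TOKENIZER_EXPANSION) -> int:
    """Scale a 4.6-era token threshold so the same amount of text still fits."""
    return int(old_budget * ratio)

# A compaction trigger that fired at 150k tokens on Opus 4.6:
print(rescale_budget(150_000))  # → 202500
```

The same scaling applies to max_tokens values that were tuned to hold a known amount of output text.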


Opus 4.7 vs. Opus 4.6: What Actually Changed

Beyond the benchmarks, here are the behavioral differences that Anthropic highlighted:

- More literal instruction following. Opus 4.7 won't silently generalize instructions or infer requests you didn't make. This can break prompts written for looser models.
- Response length calibrates to task complexity rather than defaulting to a fixed verbosity.
- Fewer tool calls by default, using more reasoning. Raising effort increases tool usage.
- More direct, opinionated tone with less validation-forward phrasing and fewer emoji than Opus 4.6.
- More regular progress updates during long agentic traces. Remove any scaffolding you added to force interim status messages.
- Fewer subagents spawned by default, steerable through prompting.

Anthropic's advice: re-tune your prompts. Old prompts that relied on loose interpretation may produce unexpected results with Opus 4.7's more literal reading.


Also Launching Today

Alongside Opus 4.7, Anthropic announced:

- Claude Code /ultrareview: A dedicated review session that reads through changes and flags bugs and design issues. Pro and Max users get 3 free ultrareviews.
- Auto mode for Max users: Claude makes decisions on your behalf, enabling longer autonomous runs with fewer interruptions.
- Default effort level raised to xhigh in Claude Code for all plans.

Should You Upgrade?

For coding and agentic workflows, the answer is clearly yes. The benchmark data from Cursor, CodeRabbit, Rakuten, Notion, and Vercel all point to a meaningful step up. Hex's finding that low-effort Opus 4.7 matches medium-effort Opus 4.6 makes the economics compelling.

For vision-heavy workflows, the jump from 1.15MP to 3.75MP and XBOW's 98.5% visual acuity score make this a clear upgrade.

The caveats:

- The new tokenizer means 0-35% more tokens per request
- Breaking changes require code updates (sampling params, thinking budgets)
- More literal instruction following may break prompts tuned for Opus 4.6
- Adaptive thinking must be explicitly enabled (it's off by default)

Migrate during a low-traffic period, test your prompts against the more literal interpretation style, and measure token usage on real traffic before committing.

