Anthropic · Claude 4

Claude Sonnet 4.6

Anthropic's workhorse: 63.4% on DualEntry, a GDPVal-AA Elo of 1,674, and a strong balance of capability and cost.

Accounting overall: 63.4%
Input / Output: $3.00 / $15.00 per MTok
Context: 1M
Speed: ~120 tok/s
Released: 2025-12
Cutoff: 2025-08
GDPVal-AA Elo: 1,674

01 Accounting Task Breakdown

The eight accounting-task categories below are borrowed from DualEntry's 101-task benchmark. Scores are measured where published and synthesized from adjacent benchmarks otherwise.

Category               Score
Transaction Class.     76.0
Journal Entry          74.0
Accounts Payable       66.0
Accounts Receivable    64.0
Bank Reconciliation    60.0
Financial Reporting    50.0
Month-End Close        38.0
Accounting Knowledge   74.0
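
As a rough cross-check of how the category scores relate to the 63.4% headline, the sketch below takes the unweighted mean of the eight values listed above. Equal weighting is an assumption; the source does not say how the overall figure is aggregated, so a small gap between the two numbers is to be expected. Variable names are illustrative.

```python
# Category scores exactly as listed in the breakdown above.
scores = {
    "Transaction Class.": 76.0,
    "Journal Entry": 74.0,
    "Accounts Payable": 66.0,
    "Accounts Receivable": 64.0,
    "Bank Reconciliation": 60.0,
    "Financial Reporting": 50.0,
    "Month-End Close": 38.0,
    "Accounting Knowledge": 74.0,
}

HEADLINE = 63.4  # "Accounting overall" figure from the page

# ASSUMPTION: equal weight per category (not stated in the source).
unweighted_mean = sum(scores.values()) / len(scores)

# Identify the weakest and strongest categories.
weakest = min(scores, key=scores.get)
strongest = max(scores, key=scores.get)

print(f"Unweighted mean: {unweighted_mean:.2f}")
print(f"Gap to headline: {HEADLINE - unweighted_mean:.2f}")
print(f"Weakest: {weakest} ({scores[weakest]})")
print(f"Strongest: {strongest} ({scores[strongest]})")
```

Under this assumption the mean comes to 62.75, slightly below the headline, and Month-End Close stands out as the category pulling the average down.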
02 Research

Sonnet 4.6 is Anthropic's mid-tier offering: reliable, well understood by developers, and present in many production agentic accounting tools at the time this issue publishes. At 63.4% on DualEntry it trails GPT-5.4-Mini (74.3%) and GPT-5.4-Nano (75.2%) by 10.9 and 11.8 points respectively on accounting work, despite similar per-token pricing.
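
To make the pricing side of that comparison concrete, here is a minimal sketch of per-request cost at the Sonnet 4.6 rates listed on this page ($3.00 input and $15 output per million tokens). The token counts are hypothetical, chosen only for illustration, and `request_cost_usd` is an illustrative helper, not part of any SDK.

```python
# Rates from the page header, in USD per million tokens.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request at the listed rates."""
    total = (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    )
    return total / 1_000_000


# HYPOTHETICAL workload: 20,000 input tokens and 2,000 output tokens.
cost = request_cost_usd(20_000, 2_000)
print(f"${cost:.4f} per request")
```

Because output tokens cost five times as much as input tokens at these rates, workloads that generate long outputs are more expensive than the input rate alone would suggest.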

The newly added GDPVal-AA Elo of 1,674 (Max Effort configuration, tied with GPT-5.4 xhigh) is worth noting because it reframes the value-for-money story: on broad agentic workloads, Sonnet 4.6 punches materially above its DualEntry-specific score. For general agentic accounting work that extends beyond structured bookkeeping tasks (controller-style reasoning, financial-analyst drafting, audit-prep narrative), Sonnet 4.6's position is stronger than DualEntry alone implies.

For tools built on Sonnet 4.6 today, the upgrade path is either Opus 4.7 (capability-first) or GPT-5.4-Mini (cost-optimized). The question for tool builders is whether the current Sonnet 4.6 integration delivers the accuracy customers expect. Taken alone, the DualEntry numbers suggest that the ceiling on structured accounting tasks is lower than everyday use makes it seem.

Citations