Anthropic · Claude 4

Claude Sonnet 4.6

Anthropic's workhorse: 63.4% on DualEntry, 1,674 GDPVal-AA Elo, and a strong balance of capability and cost.

Accounting overall: 63.4%
Input / Output: $3.00 / $15.00 per MTok
Context
Speed: ~120 tok/s
Released: 2025-12
Cutoff: 2025-08
GDPVal-AA Elo: 1,674

01 Accounting Task Breakdown

Eight accounting-task categories taken from DualEntry's 101-task benchmark. Scores are measured where published and synthesized from adjacent benchmarks otherwise.

Category               Score
Transaction Class.     76.0
Journal Entry          74.0
Accounts Payable       66.0
Accounts Receivable    64.0
Bank Reconciliation    60.0
Financial Reporting    50.0
Month-End Close        38.0
Accounting Knowledge   74.0
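
As a quick sanity check on the breakdown, the sketch below averages the eight category scores and compares the result with the 63.4% headline figure. The source does not say how the overall score is aggregated, so the unweighted mean used here is an assumption; the variable names are illustrative only.

```python
# Category scores as listed in the breakdown.
scores = {
    "Transaction Class.": 76.0,
    "Journal Entry": 74.0,
    "Accounts Payable": 66.0,
    "Accounts Receivable": 64.0,
    "Bank Reconciliation": 60.0,
    "Financial Reporting": 50.0,
    "Month-End Close": 38.0,
    "Accounting Knowledge": 74.0,
}

REPORTED_OVERALL = 63.4  # headline "Accounting overall" figure

# Assumption: every category counts equally (the source does not confirm this).
unweighted_mean = sum(scores.values()) / len(scores)
gap = REPORTED_OVERALL - unweighted_mean

# The two lowest-scoring categories, weakest first.
weakest = sorted(scores, key=scores.get)[:2]

print(f"Unweighted mean: {unweighted_mean:.2f}")   # 62.75
print(f"Gap to reported overall: {gap:+.2f}")      # +0.65
print("Weakest categories:", ", ".join(weakest))   # Month-End Close, Financial Reporting
```

The unweighted mean lands slightly below the reported 63.4%, which suggests the headline number is not a simple average of these eight figures; it may be weighted by task count or computed directly over the 101 tasks.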

02 Research

Sonnet 4.6 is Anthropic's mid-tier offering: reliable, well understood by developers, and present in many production agentic accounting tools as of this issue's publication. At 63.4% on DualEntry, it trails GPT-5.4-Mini (74.3%) and GPT-5.4-Nano (75.2%) by roughly 11 to 12 percentage points on accounting work despite similar per-token pricing.

The newly added GDPVal-AA Elo of 1,674 (Max Effort configuration, tied with GPT-5.4 xhigh) matters because it reframes the value-for-money story: on broad agentic workloads, Sonnet 4.6 performs materially better than its DualEntry score suggests. For general agentic accounting work that extends beyond structured bookkeeping tasks, such as controller-style reasoning, financial-analyst drafting, and audit-prep narrative, Sonnet 4.6's position is stronger than DualEntry alone implies.

For tools built on Sonnet 4.6 today, the upgrade path is either Opus 4.7 (capability-first) or GPT-5.4-Mini (cost-optimized). The question for tool builders is whether the current Sonnet 4.6 integration delivers the accuracy customers expect. The DualEntry numbers alone suggest that the ceiling on structured accounting tasks is lower than it feels in day-to-day use.
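
To put the value-for-money discussion in concrete terms, here is a minimal sketch that turns the list prices ($3.00 input / $15 output per MTok) and the approximate speed (~120 tok/s) quoted on this page into a per-task cost and generation time. The workload size (8,000 input tokens, 1,200 output tokens) and the function name `estimate` are hypothetical, chosen only for illustration.

```python
# List prices quoted on this page, in USD per million tokens (MTok).
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00
OUTPUT_TOKENS_PER_SEC = 120  # approximate speed quoted on this page


def estimate(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (cost in USD, generation time in seconds) for one request."""
    cost = (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000
    seconds = output_tokens / OUTPUT_TOKENS_PER_SEC
    return cost, seconds


# Hypothetical workload (an assumption, not from the source):
# 8,000 input tokens and 1,200 output tokens per task.
cost, seconds = estimate(8_000, 1_200)
print(f"Cost per task: ${cost:.4f}")         # $0.0420
print(f"Generation time: {seconds:.0f} s")   # 10 s
```

This is a rough estimate only: it ignores any discounts, retries, and latency before the first token, and it assumes the quoted speed applies uniformly to output tokens.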

Citations

DualEntry benchmark (Sonnet 4.6 63.4%): dualentry.com/blog/claude-opus-4-7-accounting-ai-benchmark-results
Artificial Analysis, GDPVal-AA leaderboard (Sonnet 4.6 Max Effort 1,674 Elo): artificialanalysis.ai/evaluations/gdpval-aa