Claude Sonnet 4.6
Anthropic's workhorse — 63.4% on DualEntry, 1,674 GDPVal-AA Elo, strong balance of capability and cost.
Accounting overall: 63.4%
Input / Output: $3.00 / $15.00 per MTok
Context: 1M tokens
Speed: ~120 tok/s
Released: 2025-12
Cutoff: 2025-08
GDPVal-AA Elo: 1,674
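To make the list prices concrete, here is a minimal per-request cost and latency estimate. It assumes the $3.00 / $15.00 per MTok rates and the ~120 tok/s output speed listed above, and it ignores prompt caching, batch discounts, and any long-context surcharge. The 40k-token input and 2k-token output workload is hypothetical.

```python
# Minimal cost/latency sketch for Sonnet 4.6 at list prices.
# Assumptions: flat $3.00/MTok input, $15.00/MTok output, ~120 tok/s output;
# no caching, batch, or long-context pricing adjustments.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00
OUTPUT_TOKENS_PER_SEC = 120


def task_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD for a single request."""
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


def est_generation_seconds(output_tokens: int) -> float:
    """Rough time to stream the output at the listed average speed."""
    return output_tokens / OUTPUT_TOKENS_PER_SEC


if __name__ == "__main__":
    # Hypothetical reconciliation step: 40k tokens of ledger context in,
    # a 2k-token answer out.
    print(f"${task_cost_usd(40_000, 2_000):.3f} per task")
    print(f"~{est_generation_seconds(2_000):.1f} s to generate")
```

Output tokens cost five times as much as input tokens here, so verbose agentic traces can dominate the bill even when the ledger context is large.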
The accounting breakdown covers eight task categories drawn from DualEntry's 101-task benchmark. Category scores are measured where DualEntry published them and synthesized from adjacent benchmarks otherwise.
Sonnet 4.6 is Anthropic's mid-tier offering: reliable, well understood by developers, and present in many production agentic accounting tools as of this issue's publication. At 63.4% on DualEntry, it meaningfully underperforms GPT-5.4-Mini (74.3%) and GPT-5.4-Nano (75.2%) on accounting work despite similar per-token pricing.
The newly added GDPVal-AA Elo of 1,674 (Max Effort configuration, tied with GPT-5.4 xhigh) is worth noting because it reframes the value-for-money story: on broad agentic workloads, Sonnet 4.6 performs materially better than its DualEntry score would suggest. For agentic accounting work that extends beyond structured bookkeeping, such as controller-style reasoning, financial-analyst drafting, and audit-prep narrative, Sonnet 4.6 is a stronger fit than its bookkeeping score implies.
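For readers less familiar with Elo ratings, this sketch uses the standard chess-style logistic formula to turn a rating gap into an expected head-to-head win rate. It assumes GDPVal-AA ratings follow the conventional 400-point scale, which has not been checked against Artificial Analysis's published methodology; the 100-point gap in the example is hypothetical.

```python
# Standard Elo expected-score formula (400-point logistic scale).
# Assumption: GDPVal-AA Elo behaves like conventional Elo; unverified.


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected probability that A is preferred over B in a pairwise comparison."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


if __name__ == "__main__":
    # Tied ratings, e.g. Sonnet 4.6 Max Effort vs GPT-5.4 xhigh at 1,674.
    print(f"{expected_score(1674, 1674):.2f}")
    # Hypothetical 100-point advantage.
    print(f"{expected_score(1774, 1674):.2f}")
```

Under these assumptions, a tie means neither model is favored in a pairwise comparison, and a 100-point gap corresponds to roughly a 64% expected preference rate.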
For tools built on Sonnet 4.6 today, the upgrade path is either Opus 4.7 (capability-first) or GPT-5.4-Mini (cost-optimized). The question for tool builders is whether a current Sonnet 4.6 integration delivers the accuracy customers expect. The DualEntry numbers suggest that its ceiling on structured accounting tasks is lower than day-to-day use may make it feel.
Citations
- DualEntry benchmark (Sonnet 4.6: 63.4%): dualentry.com/blog/claude-opus-4-7-accounting-ai-benchmark-results
- Artificial Analysis, GDPVal-AA leaderboard (Sonnet 4.6 Max Effort: 1,674 Elo): artificialanalysis.ai/evaluations/gdpval-aa