Models Index Q2 · 2026

Foundation models on accounting tasks.

Tools are products. Models are the capability they sit on top of. This index benchmarks 15 foundation LLMs on eight accounting-task categories — anchored to DualEntry's public 101-task eval where published, synthesized from adjacent public signals otherwise.

15 models tracked · 9 measured · 6 synthesized · 8 task categories
01 Models Leaderboard

Ranked by the composite Models Index (70% accounting-task mean + 15% cost efficiency + 10% context + 5% speed). Accounting % is the DualEntry overall score where published; otherwise it is the mean of our eight sub-category scores.
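As a rough illustration of the weighting described above, the following Python sketch combines four component scores into one index value. The weights come from the text; everything else is an assumption made for illustration: the function names `accounting_score` and `composite_index` are hypothetical, and each component is assumed to be already normalized to a 0-100 scale. This section does not say how cost efficiency, context, and speed are normalized, so the sketch cannot reproduce the Index column of the leaderboard.

```python
from statistics import mean
from typing import Optional, Sequence

# Weights as stated in the text (70% + 15% + 10% + 5% = 100%).
WEIGHTS = {"accounting": 0.70, "cost": 0.15, "context": 0.10, "speed": 0.05}


def accounting_score(dualentry: Optional[float], sub_scores: Sequence[float]) -> float:
    """Use the published DualEntry overall score when there is one,
    otherwise fall back to the mean of the sub-category scores."""
    if dualentry is not None:
        return dualentry
    return mean(sub_scores)


def composite_index(accounting: float, cost: float, context: float, speed: float) -> float:
    """Weighted sum of four component scores.

    Assumption: every input is already on a 0-100 scale. The normalization
    of cost, context, and speed is not given in the source.
    """
    parts = {"accounting": accounting, "cost": cost, "context": context, "speed": speed}
    total = sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)
    return round(total, 1)  # the leaderboard shows one decimal place


if __name__ == "__main__":
    # Hypothetical inputs, not taken from the leaderboard.
    acc = accounting_score(None, [60.0, 70.0])
    print(composite_index(acc, cost=90.0, context=100.0, speed=80.0))
```

Because accounting carries 70% of the weight, a model with a modest accounting score can still outrank a stronger one only if it scores much higher on cost, context, and speed combined.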

| # | Ticker | Model | Provider | Accounting | Δ Q/Q | Cost (per MTok) | Ctx | Index | Notes |
|---|---|---|---|---|---|---|---|---|---|
| 01 | GPT54N | GPT-5.4-Nano | OpenAI | 75.2% | +6.9 | $0.10/$0.40 | 1M | 81.8 | OpenAI's fastest, cheapest GPT-5.4 variant: 75.2% on DualEntry, best speed-per-dollar. |
| 02 | GPT54M | GPT-5.4-Mini | OpenAI | 74.3% | +5.3 | $0.50/$2.00 | 1M | 79.3 | The price-performance workhorse: 74.3% on DualEntry at a fraction of flagship cost. |
| 03 | CL47 | Claude Opus 4.7 | Anthropic | 79.2% | +1.4 | $5.00/$25.00 | 1M | 78.2 | Anthropic's flagship reasoning model |
| 04 | MMX27 | MiniMax M2.7 | MiniMax | 71.3% | +5.2 | $0.80/$2.20 | 1M | 75.2 | MiniMax's frontier: 71.3% on DualEntry, competitive mid-tier pricing. |
| 05 | DSV4 | DeepSeek V4 (Synth) | DeepSeek | 68.8% | +2.7 | $0.30/$0.90 | 1M | 74.7 | 1T MoE open-weights model: synthesized ~70% accounting capability at roughly 1/50th of GPT-5.4's cost. |
| 06 | GRK420 | Grok 4.20 (Synth) | xAI | 66.0% | 0.0 | $2.00/$6.00 | 2M | 74.7 | xAI's reasoning flagship: AA Intelligence Index 49, 2M context, $2/$6 per MTok. |
| 07 | QWN35 | Qwen3.5 Plus (Synth) | | 66.6% | 0.0 | $0.26/$1.56 | 1M | 73.9 | Alibaba's open-weights frontier: 1M context at $0.26/$1.56 per MTok, strong on coding and math. |
| 08 | GM31P | Gemini 3.1 Pro | Google | 66.0% | +7.3 | $2.00/$12.00 | 1M | 71.8 | Google's flagship for agentic deployment: 66% on DualEntry, strong long-context story. |
| 09 | GPT54 | GPT-5.4 | OpenAI | 77.3% | -6.6 | $2.50/$15.00 | 272K | 70.8 | OpenAI's flagship: 77.3% on DualEntry; 272K standard context, 1M context available at a premium. |
| 10 | CL46S | Claude Sonnet 4.6 | Anthropic | 63.4% | +5.8 | $3.00/$15.00 | 1M | 69.6 | Anthropic's workhorse: 63.4% on DualEntry, 1,674 GDPVal-AA Elo, strong balance of capability and cost. |
| 11 | GLM51 | Z.ai GLM-5.1 (Synth) | Z.ai | 71.3% | 0.0 | $1.40/$4.40 | 200K | 68.8 | Z.ai's next-gen open-weights flagship |
| 12 | GLM5 | Z.ai GLM-5 | Z.ai | 72.3% | -2.2 | $0.60/$1.80 | 200K | 68.6 | Strong Chinese-origin model: 72.3% on DualEntry, aggressive price point. |
| 13 | GRK41 | Grok 4.1 Fast (Synth) | xAI | 57.5% | +11.0 | $0.20/$0.50 | 2M | 68.5 | xAI's value tier: synthesized ~58% on DualEntry, massive 2M context at rock-bottom pricing. |
| 14 | KIMI26 | Kimi K2.6 (Synth) | | 68.1% | 0.0 | $0.95/$4.00 | 256K | 67.9 | Moonshot's 1T open-weights MoE: AA Intelligence Index 54, top open-model on HLE with tools. |
| 15 | CL45H | Claude Haiku 4.5 | Anthropic | 61.4% | +3.7 | $1.00/$5.00 | 200K | 63.9 | Anthropic's fastest tier: 61.4% on DualEntry, strong for high-volume classification at $1/$5 per MTok. |
02 Sources