Models Index Q2 · 2026

Foundation models on accounting tasks.

Tools are products. Models are the capability they sit on top of. This index benchmarks 15 foundation LLMs on eight accounting-task categories — anchored to DualEntry's public 101-task eval where published, synthesized from adjacent public signals otherwise.

Models tracked: 15
Measured: 9
Synthesized: 6
Task categories: 8

01 Models Leaderboard

Ranked by the composite Models Index (70% accounting-task mean + 15% cost efficiency + 10% context + 5% speed). Accounting % is the DualEntry overall where published, otherwise the mean of our eight sub-category scores.

#	Ticker	Model	Provider	Basis	Accounting	Δ Q/Q	Cost ($ in/out per MTok)	Context	Index	Notes
01	GPT54N	GPT-5.4-Nano	OpenAI	Measured	75.2%	▲+6.9	$0.10/$0.40	1M	81.8	OpenAI's fastest, cheapest GPT-5.4 variant: 75.2% on DualEntry, best speed-per-dollar.
02	GPT54M	GPT-5.4-Mini	OpenAI	Measured	74.3%	▲+5.3	$0.50/$2.00	1M	79.3	The price-performance workhorse: 74.3% on DualEntry at a fraction of flagship cost.
03	CL47	Claude Opus 4.7	Anthropic	Measured	79.2%	▲+1.4	$5.00/$25	1M	78.2	Anthropic's flagship reasoning model.
04	MMX27	MiniMax M2.7	MiniMax	Measured	71.3%	▲+5.2	$0.80/$2.20	1M	75.2	MiniMax's frontier: 71.3% on DualEntry, competitive mid-tier pricing.
05	DSV4	DeepSeek V4	DeepSeek	Synth	68.8%	▲+2.7	$0.30/$0.90	1M	74.7	1T MoE open-weights model: synthesized ~70% accounting capability at roughly 1/50th of GPT-5.4's cost.
06	GRK420	Grok 4.20	xAI	Synth	66.0%	•0.0	$2.00/$6.00	2M	74.7	xAI's reasoning flagship: AA Intelligence Index 49, 2M context, $2/$6 per MTok.
07	QWN35	Qwen3.5 Plus	Alibaba	Synth	66.6%	•0.0	$0.26/$1.56	1M	73.9	Alibaba's open-weights frontier: 1M context at $0.26/$1.56 per MTok, strong on coding and math.
08	GM31P	Gemini 3.1 Pro	Google	Measured	66.0%	▲+7.3	$2.00/$12	1M	71.8	Google's flagship for agentic deployment: 66% on DualEntry, strong long-context story.
09	GPT54	GPT-5.4	OpenAI	Measured	77.3%	▼-6.6	$2.50/$15	272K	70.8	OpenAI's flagship: 77.3% on DualEntry; 272K standard context, 1M context available at a premium.
10	CL46S	Claude Sonnet 4.6	Anthropic	Measured	63.4%	▲+5.8	$3.00/$15	1M	69.6	Anthropic's workhorse: 63.4% on DualEntry, 1,674 GDPVal-AA Elo, strong balance of capability and cost.
11	GLM51	Z.ai GLM-5.1	Z.ai	Synth	71.3%	•0.0	$1.40/$4.40	200K	68.8	Z.ai's next-gen open-weights flagship.
12	GLM5	Z.ai GLM-5	Z.ai	Measured	72.3%	▼-2.2	$0.60/$1.80	200K	68.6	Strong Chinese-origin model: 72.3% on DualEntry, aggressive price point.
13	GRK41	Grok 4.1 Fast	xAI	Synth	57.5%	▲+11.0	$0.20/$0.50	2M	68.5	xAI's value tier: synthesized ~58% on DualEntry, massive 2M context at rock-bottom pricing.
14	KIMI26	Kimi K2.6	Moonshot	Synth	68.1%	•0.0	$0.95/$4.00	256K	67.9	Moonshot's 1T open-weights MoE: AA Intelligence Index 54, top open-model on HLE with tools.
15	CL45H	Claude Haiku 4.5	Anthropic	Measured	61.4%	▲+3.7	$1.00/$5.00	200K	63.9	Anthropic's fastest tier: 61.4% on DualEntry, strong for high-volume classification at $1/$5 per MTok.
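
The composite weighting stated at the top of this section (70% accounting, 15% cost efficiency, 10% context, 5% speed) and the fallback rule for the Accounting column can be sketched in a few lines of Python. This is a minimal illustration, not the index's actual implementation. It assumes every sub-score has already been normalized to a 0-100 scale, which the page does not specify, so it will not necessarily reproduce the Index column. The names `WEIGHTS`, `accounting_score`, and `composite_index` are hypothetical.

```python
from statistics import mean

# Weights as stated for the composite Models Index.
WEIGHTS = {
    "accounting": 0.70,
    "cost_efficiency": 0.15,
    "context": 0.10,
    "speed": 0.05,
}


def accounting_score(dualentry_overall, sub_category_scores):
    """Return the published DualEntry overall if present, else the sub-category mean."""
    if dualentry_overall is not None:
        return dualentry_overall
    if len(sub_category_scores) != 8:
        raise ValueError("expected eight sub-category scores")
    return mean(sub_category_scores)


def composite_index(scores):
    """Weighted sum of sub-scores. ASSUMPTION: each sub-score is on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Illustrative inputs only; these are not values taken from the leaderboard.
example = {
    "accounting": accounting_score(None, [70, 72, 68, 75, 66, 71, 69, 73]),
    "cost_efficiency": 90.0,
    "context": 50.0,
    "speed": 60.0,
}
print(round(composite_index(example), 1))
```

Because accounting carries 70% of the weight, a model can still rank below a cheaper one with a lower accounting score when the cost, context, and speed terms differ enough, which is consistent with the ordering in the table.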

02 Sources