TOON Benchmarks

Token-Oriented Object Notation (TOON) is a new format that has been proposed as “a good way to pass structured data to Large Language Models with significantly reduced token usage.”

It is (fairly) token-efficient. But do LLMs understand it as well as better-known formats?

We ran some tests to try to find out.

Test 1: Understanding of Tabular Data

We recently tested how well an LLM (GPT-4.1 nano) understood table data in a variety of different formats.

In this test, we added TOON to the comparison.

Here’s what we found…

Good Accuracy for the Token Costs

Looking at accuracy vs. token costs, TOON was amongst the strongest performers at the token-efficient end of the spectrum.

Lower Accuracy Than With More Token-Hungry Formats

Accuracy with TOON wasn’t as good as with either our slightly less token-efficient ‘markdown table’ format or, unsurprisingly, with more token-hungry formats including markdown-kv, XML, YAML, HTML and JSON.

The difference in accuracy that we saw between using TOON and using the even more token-efficient CSV format wasn’t statistically significant.

(See our original comparison article for details of our methodology.)

Format	Accuracy	95% Confidence Interval	Tokens
Markdown-KV	60.7%	57.6% – 63.7%	52,104
XML	56.0%	52.9% – 59.0%	76,114
INI	55.7%	52.6% – 58.8%	48,100
YAML	54.7%	51.6% – 57.8%	55,395
HTML	53.6%	50.5% – 56.7%	75,204
JSON	52.3%	49.2% – 55.4%	66,396
Markdown-Table	51.9%	48.8% – 55.0%	25,140
Natural-Language	49.6%	46.5% – 52.7%	43,411
TOON	47.5%	44.4% – 50.6%	21,518
JSONL	45.0%	41.9% – 48.1%	54,407
CSV	44.3%	41.2% – 47.4%	19,524
Pipe-Delimited	41.1%	38.1% – 44.2%	43,098

Test 2: Understanding of Nested Data

Format	Accuracy	95% CI	Tokens
YAML	62.1%	[59.1%, 65.1%]	42,477
Markdown	54.3%	[51.2%, 57.4%]	38,357
JSON	50.3%	[47.2%, 53.4%]	57,933
XML	44.4%	[41.3%, 47.5%]	68,804
TOON	43.1%	[40.0%, 46.2%]	45,436

In our test of LLM understanding of nested data, this time using GPT-5 nano, TOON performed worse than the other formats we tested, including YAML and markdown that both also used fewer tokens.

Other Observations

Interestingly, the data retrieval benchmarks shared in the TOON GitHub repository showed TOON performing significantly better than other formats with GPT-5 nano, on a seemingly similar kind of test to the ones we ran:

TOON data retrieval benchmark from GitHub repo

We have run those tests ourselves and found similar results.

We’ve also reviewed the code for the tests and it looks good to us.

Conclusions

We like the idea of designing a format with LLM token efficiency specifically in mind.

It’s unclear at this stage how well LLMs can retrieve information from data provided to them in the TOON format.

On the one hand, in our tests, we failed to find circumstances where TOON was the best-performing format. (And in our test of retrieval from nested data, it performed relatively poorly.)

On the other, in the tests provided in the TOON GitHub repo, TOON performed well.

Let us know if you run any tests on TOON yourself. We’d be interested in your findings.