TOON Benchmarks

research ai-evaluation llms
TOON Benchmarks

Token-Oriented Object Notation (TOON) is a new format that has been proposed as “a good way to pass structured data to Large Language Models with significantly reduced token usage.”

It is (fairly) token-efficient. But do LLMs understand it as well as better-known formats?

We ran some tests to try to find out.

Test 1: Understanding of Tabular Data

We recently tested how well an LLM (GPT-4.1 nano) understood table data in a variety of different formats.

In this test, we added TOON to the comparison.

Here’s what we found…

Good Accuracy for the Token Costs

Looking at accuracy vs. token costs, TOON was amongst the strongest performers at the token-efficient end of the spectrum.

Lower Accuracy Than With More Token-Hungry Formats

Accuracy with TOON wasn’t as good as with either our slightly less token-efficient ‘markdown table’ format or, unsurprisingly, with more token-hungry formats including markdown-kv, XML, YAML, HTML and JSON.

The difference in accuracy that we saw between using TOON and using the even more token-efficient CSV format wasn’t statistically significant.

(See our original comparison article for details of our methodology.)
FormatAccuracy95% Confidence IntervalTokens
Markdown-KV60.7%57.6% – 63.7%52,104
XML56.0%52.9% – 59.0%76,114
INI55.7%52.6% – 58.8%48,100
YAML54.7%51.6% – 57.8%55,395
HTML53.6%50.5% – 56.7%75,204
JSON52.3%49.2% – 55.4%66,396
Markdown-Table51.9%48.8% – 55.0%25,140
Natural-Language49.6%46.5% – 52.7%43,411
TOON47.5%44.4% – 50.6%21,518
JSONL45.0%41.9% – 48.1%54,407
CSV44.3%41.2% – 47.4%19,524
Pipe-Delimited41.1%38.1% – 44.2%43,098

Test 2: Understanding of Nested Data

FormatAccuracy95% CITokens
YAML62.1%[59.1%, 65.1%]42,477
Markdown54.3%[51.2%, 57.4%]38,357
JSON50.3%[47.2%, 53.4%]57,933
XML44.4%[41.3%, 47.5%]68,804
TOON43.1%[40.0%, 46.2%]45,436

In our test of LLM understanding of nested data, this time using GPT-5 nano, TOON performed worse than the other formats we tested, including YAML and markdown that both also used fewer tokens.

Other Observations

Interestingly, the data retrieval benchmarks shared in the TOON GitHub repository showed TOON performing significantly better than other formats with GPT-5 nano, on a seemingly similar kind of test to the ones we ran:

TOON data retrieval benchmark from GitHub repo

We have run those tests ourselves and found similar results.

We’ve also reviewed the code for the tests and it looks good to us.

Conclusions

We like the idea of designing a format with LLM token efficiency specifically in mind.

It’s unclear at this stage how well LLMs can retrieve information from data provided to them in the TOON format.

On the one hand, in our tests, we failed to find circumstances where TOON was the best-performing format. (And in our test of retrieval from nested data, it performed relatively poorly.)

On the other, in the tests provided in the TOON GitHub repo, TOON performed well.

Let us know if you run any tests on TOON yourself. We’d be interested in your findings.

Enjoyed This Article?

Get more tactical AI agent insights delivered to your inbox

We respect your privacy. Unsubscribe at any time.