MCP Token Efficiency: Mad Lit vs Notion

27 June 2026 · Model claude-sonnet-4-6 · Measured with Anthropic's count_tokens API

Summary

We compared the Mad Lit MCP server with Notion's hosted MCP server on a single metric: how many tokens their tool definitions consume. Mad Lit's full tool set costs 4,667 tokens against Notion's 25,409, about one-fifth as many, even though Mad Lit exposes more tools (27 versus 18).

An MCP server's tool definitions are loaded into the model's context on every request, before any work happens. A leaner schema leaves more of the context window for actual work and costs less per call. This report measures token usage only; it does not assess answer quality, latency, or task success.

Method

Token counts come from Anthropic's count_tokens endpoint – the same tokenizer the model uses – run against claude-sonnet-4-6. For each server we captured the complete live tool list and measured the tokens those definitions add to a request.

The per-tool figures below exclude the fixed 497-token “tool-use preamble” that the API adds to the request whenever any tool is present (identical for both servers), leaving each tool's own schema size. The server totals include this preamble once. Notion's schema was captured from its hosted MCP server (18 tools) in June 2026.
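The measurement described above can be sketched as follows. This is a minimal illustration, not the benchmark's actual script: the placeholder "hi" message, the function names (`countTokens`, `ownSchemaTokens`, `measureTool`) and the choice to measure each tool in its own request are assumptions. The endpoint path, the `anthropic-version` header and the `input_tokens` response field follow Anthropic's public Messages API.

```typescript
// Sketch only: one way to measure a single tool's own schema size with
// Anthropic's count_tokens endpoint. Assumes Node 18+ (global fetch).

interface AnthropicTool {
  name: string;
  description?: string;
  input_schema: Record<string, unknown>;
}

const MODEL = "claude-sonnet-4-6";
// Fixed tool-use preamble reported for claude-sonnet-4-6 in this study.
const TOOL_PREAMBLE_TOKENS = 497;

// POST /v1/messages/count_tokens with a placeholder user message.
// The "hi" message is an arbitrary baseline, not taken from the report.
async function countTokens(apiKey: string, tools: AnthropicTool[]): Promise<number> {
  const body: Record<string, unknown> = {
    model: MODEL,
    messages: [{ role: "user", content: "hi" }],
  };
  if (tools.length > 0) body.tools = tools; // the no-tool baseline omits the field
  const res = await fetch("https://api.anthropic.com/v1/messages/count_tokens", {
    method: "POST",
    headers: {
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`count_tokens failed with HTTP ${res.status}`);
  const data = (await res.json()) as { input_tokens: number };
  return data.input_tokens;
}

// A tool's own size: the request with only that tool, minus the no-tool
// baseline, minus the preamble the API adds once any tool is present.
function ownSchemaTokens(
  withTool: number,
  baseline: number,
  preamble: number = TOOL_PREAMBLE_TOKENS,
): number {
  return withTool - baseline - preamble;
}

// Runs the baseline and single-tool requests, then applies the subtraction.
async function measureTool(apiKey: string, tool: AnthropicTool): Promise<number> {
  const [baseline, withTool] = await Promise.all([
    countTokens(apiKey, []),
    countTokens(apiKey, [tool]),
  ]);
  return ownSchemaTokens(withTool, baseline);
}
```

Isolating the preamble as a named constant keeps the per-tool numbers comparable across servers, since it is the same for both and would otherwise be charged to whichever tool happened to be measured.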

Results

Total schema cost

Server	Tools	Schema tokens
Mad Lit	27	4,667
Notion	18	25,409

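Notion's full tool set costs 5.4× as many tokens as Mad Lit's, even though Mad Lit exposes nine more tools. Per tool the gap is wider: with the 497-token preamble removed, Mad Lit's definitions average 154 tokens ((4,667 − 497) / 27) against Notion's 1,384 ((25,409 − 497) / 18), roughly 9× smaller.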
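The headline ratios follow directly from the table. This short sketch recomputes them, assuming (as the arithmetic indicates) that the schema totals include the 497-token preamble once while the per-tool averages exclude it; the `ServerSchema` type and `perToolAverage` helper are illustrative names.

```typescript
// Token figures from the table above; the preamble value is from the Method section.
const PREAMBLE_TOKENS = 497;

interface ServerSchema {
  name: string;
  tools: number;
  schemaTokens: number; // full tool set, preamble included once
}

const madLit: ServerSchema = { name: "Mad Lit", tools: 27, schemaTokens: 4667 };
const notion: ServerSchema = { name: "Notion", tools: 18, schemaTokens: 25409 };

// Mean size of one tool definition with the shared preamble removed.
function perToolAverage(s: ServerSchema): number {
  return (s.schemaTokens - PREAMBLE_TOKENS) / s.tools;
}

const totalRatio = notion.schemaTokens / madLit.schemaTokens;
const madLitAvg = perToolAverage(madLit); // 4,170 / 27
const notionAvg = perToolAverage(notion); // 24,912 / 18
const perToolRatio = notionAvg / madLitAvg;

console.log(totalRatio.toFixed(1), Math.round(madLitAvg), Math.round(notionAvg), perToolRatio.toFixed(1));
// 5.4 154 1384 9.0
```

The two ratios differ because the fixed preamble is a larger share of Mad Lit's small total than of Notion's, so removing it widens the per-tool gap.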

Per-capability comparison

For operations both servers provide, each tool's own definition size (preamble excluded) compares as follows:

Operation	Mad Lit (tokens)	Notion (tokens)	Ratio (Notion ÷ Mad Lit)
Overwrite page content	162	4,076	25×
Create a page	163	3,028	19×
Search by text	115	2,102	18×
Start a comment thread	190	2,875	15×
Read a page	100	795	8×
Read comment threads	166	441	2.7×
Move a page	229	602	2.6×

Notion's largest single tool (4,368 tokens) is nearly as large as Mad Lit's entire 27-tool schema (4,667 tokens).
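The Ratio column is Notion's token count divided by Mad Lit's. The sketch below recomputes it from the two token columns. The rounding rule in the hypothetical `formatRatio` helper (whole numbers at 5× and above, one decimal place below) is an assumption chosen because it reproduces the published column; the report does not state its rounding convention.

```typescript
// [operation, Mad Lit tokens, Notion tokens], preamble excluded, from the table above.
const matched: Array<[string, number, number]> = [
  ["Overwrite page content", 162, 4076],
  ["Create a page", 163, 3028],
  ["Search by text", 115, 2102],
  ["Start a comment thread", 190, 2875],
  ["Read a page", 100, 795],
  ["Read comment threads", 166, 441],
  ["Move a page", 229, 602],
];

// Assumed rounding: whole numbers at 5x and above, one decimal place below.
function formatRatio(ratio: number): string {
  return ratio >= 5 ? `${Math.round(ratio)}×` : `${ratio.toFixed(1)}×`;
}

// Pair each operation with its formatted Notion / Mad Lit ratio.
const ratios = matched.map(([op, madLit, notion]) => [op, formatRatio(notion / madLit)]);
for (const [op, r] of ratios) console.log(`${op}\t${r}`);
```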

Workflow totals

Fifteen representative tasks were replayed against both servers with identical synthetic content, and total tokens were counted across all turns. Averages by task complexity:

Task tier	Mad Lit (avg. tokens)	Notion (avg. tokens)
Tier 1 – single operation	4,818	25,560
Tier 2 – chained operation	4,897	25,639
Tier 3 – multi-step workflow	5,180	25,922

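Because the tool schema dominates each request, the absolute gap is the same 20,742 tokens in every tier, exactly the difference between the two schema totals (25,409 − 4,667). The task content adds identical amounts for both servers (151, 230 and 513 tokens in Tiers 1 to 3), so the ratio narrows only slightly as tasks grow, from about 5.3× in Tier 1 to 5.0× in Tier 3.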
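The constant gap can be checked directly from the tier averages and the schema totals. This sketch subtracts each server's schema total to isolate the tokens contributed by the task content itself; the variable names are illustrative.

```typescript
// Schema totals (preamble included) and tier averages from the two tables above.
const schemaTotals = { madLit: 4667, notion: 25409 };

const tiers = [
  { tier: "Tier 1", madLit: 4818, notion: 25560 },
  { tier: "Tier 2", madLit: 4897, notion: 25639 },
  { tier: "Tier 3", madLit: 5180, notion: 25922 },
];

const schemaGap = schemaTotals.notion - schemaTotals.madLit; // 20742

const breakdown = tiers.map((t) => ({
  tier: t.tier,
  gap: t.notion - t.madLit,                   // total difference in this tier
  taskMadLit: t.madLit - schemaTotals.madLit, // tokens beyond the schema
  taskNotion: t.notion - schemaTotals.notion,
  ratio: t.notion / t.madLit,
}));

for (const b of breakdown) {
  console.log(b.tier, b.gap, b.taskMadLit, b.taskNotion, b.ratio.toFixed(2));
}
// Tier 1 20742 151 151 5.31
// Tier 2 20742 230 230 5.24
// Tier 3 20742 513 513 5.00
```

Because the task content adds the same tokens to both servers, the ratio moves toward 1 as tasks grow, while the absolute saving stays fixed at the schema difference.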

Limitations

The report measures token cost only. It does not measure answer quality, tool-selection accuracy, latency, or task success.
Lower token cost does not imply greater capability. The per-capability table compares matched operations, not feature scope.
Figures are specific to claude-sonnet-4-6; token counts, including the size of the tool-use preamble, may differ for other models.
Tool schemas change over time. These figures reflect both servers as of June 2026 and should be re-measured if either server's tool set changes.

Reproducibility

The benchmark snapshots each server's live tool list and counts tokens via the Anthropic API. Raw per-scenario and per-tool results are stored as JSON alongside the script.

npm run benchmark            # both servers, 15 scenarios
npm run benchmark:per-tool   # per-capability breakdown
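The snapshot step is not shown in the report. A plausible sketch, assuming the script converts each MCP `Tool` (with `name`, `description` and `inputSchema`) into the Anthropic tool format (`name`, `description`, `input_schema`) before counting, and stores the result as name-sorted JSON so that later re-measurements can be diffed. The `McpTool` type covers only a subset of the MCP specification's fields, and `toAnthropicTool` and `snapshot` are hypothetical names.

```typescript
// Subset of the MCP specification's Tool object (other fields omitted).
interface McpTool {
  name: string;
  title?: string;
  description?: string;
  inputSchema: { type: "object"; [key: string]: unknown };
  annotations?: Record<string, unknown>;
}

// Tool shape accepted by Anthropic's Messages and count_tokens endpoints.
interface AnthropicTool {
  name: string;
  description?: string;
  input_schema: Record<string, unknown>;
}

// Keep only the fields the Anthropic tool format uses; title and
// annotations are dropped in this sketch.
function toAnthropicTool(t: McpTool): AnthropicTool {
  const tool: AnthropicTool = { name: t.name, input_schema: t.inputSchema };
  if (t.description !== undefined) tool.description = t.description;
  return tool;
}

// Serialise a snapshot sorted by tool name so re-runs produce stable, diffable JSON.
function snapshot(server: string, capturedAt: string, tools: McpTool[]): string {
  const converted = tools
    .map(toAnthropicTool)
    .sort((a, b) => a.name.localeCompare(b.name));
  return JSON.stringify(
    { server, capturedAt, toolCount: converted.length, tools: converted },
    null,
    2,
  );
}
```

Storing the converted tool list, rather than only the token counts, makes it possible to tell whether a changed figure comes from a changed schema or from a changed tokenizer.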