Report

MCP Token Efficiency: Mad Lit vs Notion

27 June 2026 · Model claude-sonnet-4-6 · Measured with Anthropic's count_tokens API


Summary

We compared the Mad Lit MCP server against the Notion hosted MCP server on one metric: how many tokens their tool definitions consume. Mad Lit uses about 5× fewer tokens – 4,667 versus 25,409 for the full tool set – despite exposing more tools (27 versus 18).

An MCP server's tool definitions are loaded into the model's context on every request, before any work happens. A leaner schema leaves more of the context window for actual work and costs less per call. This report measures token usage only; it does not assess answer quality, latency, or task success.

Method

Token counts come from Anthropic's count_tokens endpoint – the same tokenizer the model uses – run against claude-sonnet-4-6. For each server we captured the complete live tool list and measured the tokens those definitions add to a request.

The per-tool figures below exclude the fixed 497-token “tool-use preamble” the model adds whenever any tool is present (identical for both servers), leaving each tool's own schema size. Notion's schema was captured from its hosted MCP server (18 tools) in June 2026.
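
The subtraction described above can be sketched in Python. This is a minimal illustration, not the benchmark's own script: the names `CountFn`, `schema_tokens`, `per_tool_tokens` and `fake_count` are hypothetical, and the stand-in counter charges each tool an invented `size` so the example runs offline. In a real run the counter would wrap a call to the count_tokens endpoint.

```python
from typing import Callable, Sequence

PREAMBLE_TOKENS = 497  # fixed tool-use preamble reported in this section

# Hypothetical counter signature: given a list of tool definitions, return the
# input-token count of one fixed request that carries exactly those tools.
# A real counter might wrap the Anthropic Python SDK's
# client.messages.count_tokens(...) and return its input_tokens field.
CountFn = Callable[[Sequence[dict]], int]


def schema_tokens(count: CountFn, tools: Sequence[dict]) -> int:
    """Tokens the tool definitions add on top of the same request without tools."""
    return count(tools) - count([])


def per_tool_tokens(count: CountFn, tool: dict) -> int:
    """One tool's own schema size, with the shared preamble removed."""
    return schema_tokens(count, [tool]) - PREAMBLE_TOKENS


def fake_count(tools: Sequence[dict]) -> int:
    """Offline stand-in for the API: invented sizes, for illustration only."""
    base = 12  # made-up cost of the bare message
    if not tools:
        return base
    return base + PREAMBLE_TOKENS + sum(t["size"] for t in tools)


if __name__ == "__main__":
    example_tool = {"name": "read_page", "size": 100}  # hypothetical tool
    print(per_tool_tokens(fake_count, example_tool))
```

Passing the counter in as a parameter keeps the arithmetic separate from the network call, so the same functions work against a recorded snapshot or the live API.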

Results

Total schema cost

Server     Tools   Schema tokens
Mad Lit    27      4,667
Notion     18      25,409

Mad Lit's full tool set is 5.4× smaller than Notion's despite exposing nine more tools. Per tool, with the 497-token preamble subtracted from each total, Mad Lit's definitions average roughly 9× smaller (154 vs 1,384 tokens per tool).
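
The ratios quoted here can be reproduced from the totals in the table. The short Python check below assumes that each total includes the 497-token preamble once, which is the reading under which the per-tool averages of 154 and 1,384 come out; the variable and function names are illustrative.

```python
PREAMBLE = 497  # fixed tool-use preamble, identical for both servers

# (tool count, schema tokens) as reported in the table above
SERVERS = {
    "Mad Lit": (27, 4_667),
    "Notion": (18, 25_409),
}


def per_tool_average(tool_count: int, total_tokens: int) -> float:
    """Average definition size per tool after removing the shared preamble."""
    return (total_tokens - PREAMBLE) / tool_count


# Ratio of the full tool sets, preamble included
total_ratio = SERVERS["Notion"][1] / SERVERS["Mad Lit"][1]

mad_lit_avg = per_tool_average(*SERVERS["Mad Lit"])
notion_avg = per_tool_average(*SERVERS["Notion"])
per_tool_ratio = notion_avg / mad_lit_avg

if __name__ == "__main__":
    print(f"total ratio:    {total_ratio:.1f}x")
    print(f"per-tool mean:  {mad_lit_avg:.0f} vs {notion_avg:.0f}")
    print(f"per-tool ratio: {per_tool_ratio:.1f}x")
```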

Per-capability comparison

For operations both servers provide, comparing each tool's own definition size:

Operation                 Mad Lit   Notion   Ratio
Overwrite page content    162       4,076    25×
Create a page             163       3,028    19×
Search by text            115       2,102    18×
Start a comment thread    190       2,875    15×
Read a page               100       795      8×
Read comment threads      166       441      2.7×
Move a page               229       602      2.6×

Notion's largest single tool (4,368 tokens) is close to the size of Mad Lit's entire 27-tool schema.
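
The Ratio column is each Notion figure divided by the matching Mad Lit figure. As a quick check, the Python snippet below recomputes it from the per-tool sizes listed above; the dictionary layout and names are illustrative only.

```python
# (Mad Lit tokens, Notion tokens) per matched operation, from the table above
PER_TOOL = {
    "Overwrite page content": (162, 4_076),
    "Create a page": (163, 3_028),
    "Search by text": (115, 2_102),
    "Start a comment thread": (190, 2_875),
    "Read a page": (100, 795),
    "Read comment threads": (166, 441),
    "Move a page": (229, 602),
}

# Ratio = Notion definition size / Mad Lit definition size
RATIOS = {op: notion / mad_lit for op, (mad_lit, notion) in PER_TOOL.items()}

if __name__ == "__main__":
    for op, ratio in RATIOS.items():
        print(f"{op:<24} {ratio:5.2f}x")
```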

Workflow totals

Fifteen representative tasks were replayed with identical synthetic content, counting total tokens across all turns. Averages by task complexity:

Task tier                        Mad Lit   Notion
Tier 1 – single operation        4,818     25,560
Tier 2 – chained operation       4,897     25,639
Tier 3 – multi-step workflow     5,180     25,922

Because the tool schema dominates the per-request total, the gap between the two servers is the same in every tier: 20,742 tokens, which equals the difference between the two schema totals (25,409 − 4,667).
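
To make the schema's share of each workflow total explicit, the tier averages can be split into a schema part and a remainder. The Python sketch below assumes each tier average contains the full schema total exactly once; that is an inference from the published figures, not something the benchmark script is known to do, and all names are illustrative.

```python
# Schema totals and tier averages as reported in this document
SCHEMA = {"Mad Lit": 4_667, "Notion": 25_409}
TIER_TOTALS = {
    "Tier 1": {"Mad Lit": 4_818, "Notion": 25_560},
    "Tier 2": {"Mad Lit": 4_897, "Notion": 25_639},
    "Tier 3": {"Mad Lit": 5_180, "Notion": 25_922},
}


def breakdown(tier: str):
    """Return (non-schema remainder, schema share, Notion minus Mad Lit gap)."""
    totals = TIER_TOTALS[tier]
    remainder = {s: totals[s] - SCHEMA[s] for s in SCHEMA}
    share = {s: SCHEMA[s] / totals[s] for s in SCHEMA}
    gap = totals["Notion"] - totals["Mad Lit"]
    return remainder, share, gap


if __name__ == "__main__":
    for tier in TIER_TOTALS:
        remainder, share, gap = breakdown(tier)
        print(tier, remainder, {s: f"{v:.1%}" for s, v in share.items()}, gap)
```

Under that assumption the non-schema remainder is 151, 230 and 513 tokens for the three tiers on both servers, so the schema accounts for roughly 90% to 97% of Mad Lit's totals and 98% to 99% of Notion's.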

Limitations

  • The report measures token cost only. It does not measure answer quality, tool-selection accuracy, latency, or task success.
  • Lower token cost does not imply greater capability. The per-capability table compares matched operations, not feature scope.
  • Figures are specific to claude-sonnet-4-6; token counts differ by model.
  • Tool schemas change over time. These figures reflect both servers as of June 2026 and should be re-measured if either server's tool set changes.

Reproducibility

The benchmark snapshots each server's live tool list and counts tokens via the Anthropic API. Raw per-scenario and per-tool results are stored as JSON alongside the script.

npm run benchmark            # both servers, 15 scenarios
npm run benchmark:per-tool   # per-capability breakdown