Report
MCP Token Efficiency: Mad Lit vs Notion
27 June 2026 · Model claude-sonnet-4-6 · Measured with Anthropic's count_tokens API
Summary
We compared the Mad Lit MCP server against the Notion hosted MCP server on one metric: how many tokens their tool definitions consume. Mad Lit uses about 5× fewer tokens – 4,667 versus 25,409 for the full tool set – despite exposing more tools (27 versus 18).
An MCP server's tool definitions are loaded into the model's context on every request, before any work happens. A leaner schema leaves more of the context window for actual work and costs less per call. This report measures token usage only; it does not assess answer quality, latency, or task success.
Method
Token counts come from Anthropic's count_tokens endpoint – the same tokenizer the model uses – run against claude-sonnet-4-6. For each server we captured the complete live tool list and measured the tokens those definitions add to a request.
The per-tool figures below exclude the fixed 497-token “tool-use preamble” the model adds whenever any tool is present (identical for both servers), leaving each tool's own schema size. Notion's schema was captured from its hosted MCP server (18 tools) in June 2026.
Results
Total schema cost
| Server | Tools | Schema tokens |
|---|---|---|
| Mad Lit | 27 | 4,667 |
| Notion | 18 | 25,409 |
Mad Lit's full tool set is 5.4× smaller than Notion's despite exposing nine more tools. On a per-tool average, Mad Lit's definitions are roughly 9× smaller (154 vs 1,384 tokens per tool).
Per-capability comparison
For operations both servers provide, comparing each tool's own definition size:
| Operation | Mad Lit | Notion | Ratio |
|---|---|---|---|
| Overwrite page content | 162 | 4,076 | 25× |
| Create a page | 163 | 3,028 | 19× |
| Search by text | 115 | 2,102 | 18× |
| Start a comment thread | 190 | 2,875 | 15× |
| Read a page | 100 | 795 | 8× |
| Read comment threads | 166 | 441 | 2.7× |
| Move a page | 229 | 602 | 2.6× |
Notion's largest single tool (4,368 tokens) is close to the size of Mad Lit's entire 27-tool schema.
Workflow totals
Fifteen representative tasks were replayed with identical synthetic content, counting total tokens across all turns. Averages by task complexity:
| Task tier | Mad Lit | Notion |
|---|---|---|
| Tier 1 – single operation | 4,818 | 25,560 |
| Tier 2 – chained operation | 4,897 | 25,639 |
| Tier 3 – multi-step workflow | 5,180 | 25,922 |
Because the tool schema dominates the per-request total, the difference is roughly constant across task complexity.
Limitations
- The report measures token cost only. It does not measure answer quality, tool-selection accuracy, latency, or task success.
- Lower token cost does not imply greater capability. The per-capability table compares matched operations, not feature scope.
- Figures are specific to
claude-sonnet-4-6; token counts differ by model. - Tool schemas change over time. These figures reflect both servers as of June 2026 and should be re-measured if either server's tool set changes.
Reproducibility
The benchmark snapshots each server's live tool list and counts tokens via the Anthropic API. Raw per-scenario and per-tool results are stored as JSON alongside the script.
npm run benchmark # both servers, 15 scenarios npm run benchmark:per-tool # per-capability breakdown