Track your MCP server's token use
Today's LLMs still have limited context windows. Models like Claude 4.5 Sonnet have a 200k-token context window, and an MCP server with 20 tools can consume over a fifth of that window before a single message is sent.
When designing MCP servers, it is crucial to optimize token use to prevent performance degradation, context loss, and hallucinations.
Track token use in MCPJam Playground
You can now track your MCP server's token use in the LLM playground:
- Input and output tokens
- System prompt token use (estimation)
- Tool context use (estimation)
Note that the system prompt and tool context figures are estimates: both are baked into the input token count, so we can only approximate their share. The input and output token counts themselves are exact, as they come directly from the LLM providers' APIs.
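As an illustration of what such an estimate looks like, a common rough heuristic is about four characters per token for English text. Here is a minimal sketch using that heuristic (hypothetical helper names; this is not the estimator the playground actually uses):

```typescript
// Rough token estimation for tool definitions.
// Heuristic: ~4 characters per token for English text. This is only an
// approximation, not the tokenizer any specific LLM actually uses.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Estimate how much context a set of tool definitions consumes
// before any conversation happens.
function estimateToolContext(
  tools: { name: string; description: string }[]
): number {
  return tools.reduce(
    (sum, tool) => sum + estimateTokens(JSON.stringify(tool)),
    0
  );
}

const tools = [
  { name: "search", description: "Search projects, portfolios, or milestones" },
];
console.log(estimateToolContext(tools));
```

Multiply that across 20 tools with long descriptions and schemas, and it becomes clear how tool definitions alone can eat a meaningful slice of the context window.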

Tips to reduce token usage
Here are some ways you can optimize your MCP server's token usage. All of these tips do one thing: reduce redundancy and stay concise.
Consolidate similar tools
I often see a pattern of redundant tools, where the functionality is the same but split across different tools:
search_projects
search_portfolios
search_milestones

Since all of these are search tools, just for different object types, you can consolidate them into a single tool with the branching expressed as an enum parameter:

{
  "name": "search",
  "description": "Search for an object by type and ID",
  "parameters": {
    "type": "projects" | "portfolios" | "milestones",
    "objectId": "number"
  }
}

Avoid verbose tool / parameter descriptions
Less is more when writing tool and parameter descriptions. An LLM overloaded with context is more likely to hallucinate or to ignore descriptions outright.
// bad
This parameter is used by the system to determine the amount of time, expressed in seconds, that the API should wait before terminating a request that has exceeded the allowed duration for processing.
// good
request timeout (seconds)

Do not return raw JSON in tool return values
JSON is for APIs to ingest, not for LLMs. LLMs can handle JSON just fine, but there are more token-efficient ways to provide structured data to an LLM, one of them being TOON (Token-Oriented Object Notation). TOON is more context efficient, on average using about half the tokens of the equivalent JSON.
JSON
[
{
"id": 1,
"name": "Alice",
"role": "admin"
},
{
"id": 2,
"name": "Bob",
"role": "user"
}
]

TOON equivalent
users[2]{id,name,role}:
1,Alice,admin
2,Bob,user
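A uniform JSON array like the one above can be converted to the TOON tabular form mechanically. The sketch below is a hypothetical helper, not an official TOON library; it assumes flat objects that all share the same keys and whose values contain no commas:

```typescript
// Convert a uniform array of flat objects into TOON-style tabular text.
// Assumes every object has the same keys and no value contains a comma.
function toToon(
  label: string,
  rows: Record<string, string | number>[]
): string {
  if (rows.length === 0) return `${label}[0]{}:`;
  const keys = Object.keys(rows[0]);
  const header = `${label}[${rows.length}]{${keys.join(",")}}:`;
  const lines = rows.map((row) => keys.map((k) => String(row[k])).join(","));
  return [header, ...lines].join("\n");
}

const users = [
  { id: 1, name: "Alice", role: "admin" },
  { id: 2, name: "Bob", role: "user" },
];
console.log(toToon("users", users));
// users[2]{id,name,role}:
// 1,Alice,admin
// 2,Bob,user
```

The savings come from stating the keys once in the header instead of repeating them for every row, which is exactly the redundancy that makes JSON expensive in tool results.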