Track your MCP server's token use
Today's LLMs still have limited context windows. Models like Claude 4.5 Sonnet have a 200k-token context window, and an MCP server with 20 tools can consume over a fifth of that window before a single message is sent.
When designing MCP servers, it is crucial to optimize token use to prevent performance degradation, context loss, and hallucinations.
Track token use in MCPJam Playground
You can now track your MCP server's token use in the LLM playground:
- Input and output tokens
- System prompt token use (estimation)
- Tool context use (estimation)
Note that the system prompt and tool context figures are estimates: both are baked into the input token count, so we can only approximate their share. The input and output token counts themselves are exact, as they come directly from the LLM providers' APIs.
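As an illustration of what such an estimate looks like, a common rough heuristic is about four characters per token for English text. Here is a minimal sketch using that heuristic (hypothetical helper names; this is not the estimator the playground actually uses):

```typescript
// Rough token estimation for tool definitions.
// Heuristic: ~4 characters per token for English text. This is only an
// approximation, not the tokenizer any specific LLM actually uses.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Estimate how much context a set of tool definitions consumes
// before any conversation happens.
function estimateToolContext(
  tools: { name: string; description: string }[]
): number {
  return tools.reduce(
    (sum, tool) => sum + estimateTokens(JSON.stringify(tool)),
    0
  );
}

const tools = [
  { name: "search", description: "Search projects, portfolios, or milestones" },
];
console.log(estimateToolContext(tools));
```

Multiply that across 20 tools with long descriptions and schemas, and it becomes clear how tool definitions alone can eat a meaningful slice of the context window.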

Tips to reduce token usage
Here are some ways you can optimize your MCP server's token usage. All of these tips do one thing: reduce redundancy and stay concise.
Consolidate similar tools
I often see a pattern of redundant tools, where the functionality is the same but split across different tools:
search_projects
search_portfolios
search_milestones

Since all of these are search tools, just for different object types, you can consolidate them into a single tool with the branching expressed as an enum parameter:

{
  "name": "search",
  "description": "Search for an object by type and ID",
  "parameters": {
    "type": "projects" | "portfolios" | "milestones",
    "objectId": "number"
  }
}

Avoid verbose tool / parameter descriptions
Less is more when writing tool and parameter descriptions. An LLM overloaded with context is more likely to hallucinate or to ignore descriptions outright.
// bad
This parameter is used by the system to determine the amount of time, expressed in seconds, that the API should wait before terminating a request that has exceeded the allowed duration for processing.
// good
request timeout (seconds)

Do not return raw JSON in tool return values
JSON is for APIs to ingest, not for LLMs. LLMs can handle JSON just fine, but there are more token-efficient ways to provide structured data to an LLM, one of them being TOON (Token-Oriented Object Notation). TOON is more context efficient, on average using about half the tokens of the equivalent JSON.
JSON
[
{
"id": 1,
"name": "Alice",
"role": "admin"
},
{
"id": 2,
"name": "Bob",
"role": "user"
}
]

TOON equivalent
users[2]{id,name,role}:
1,Alice,admin
2,Bob,user
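A uniform JSON array like the one above can be converted to the TOON tabular form mechanically. The sketch below is a hypothetical helper, not an official TOON library; it assumes flat objects that all share the same keys and whose values contain no commas:

```typescript
// Convert a uniform array of flat objects into TOON-style tabular text.
// Assumes every object has the same keys and no value contains a comma.
function toToon(
  label: string,
  rows: Record<string, string | number>[]
): string {
  if (rows.length === 0) return `${label}[0]{}:`;
  const keys = Object.keys(rows[0]);
  const header = `${label}[${rows.length}]{${keys.join(",")}}:`;
  const lines = rows.map((row) => keys.map((k) => String(row[k])).join(","));
  return [header, ...lines].join("\n");
}

const users = [
  { id: 1, name: "Alice", role: "admin" },
  { id: 2, name: "Bob", role: "user" },
];
console.log(toToon("users", users));
// users[2]{id,name,role}:
// 1,Alice,admin
// 2,Bob,user
```

The savings come from stating the keys once in the header instead of repeating them for every row, which is exactly the redundancy that makes JSON expensive in tool results.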