Context Management
You don’t need bigger prompts


Before you jump in, here’s something you’ll love.
I’ve put together a curated list of MCP servers and MCP registries to support your development workflows. When you subscribe to the newsletter, you’ll get instant access to this resource in your welcome email.
Bonus: I’m also working on a collection of free educational resources for building AI agents at every skill level. Coming soon for subscribers. Stay tuned!

Is Cursor or Claude still giving out AI slop?
The NoLiMa long-context benchmark revealed that most leading LLMs see a sharp drop in performance as context length grows. At 32k tokens, 11 out of 12 models scored less than half as well as they did with shorter inputs.

performance degradations for popular models at longer context lengths
You can read about the benchmarks in the paper here. It was published a while ago, so it doesn't cover the newest models.
Here's another benchmark I found, from Fiction.liveBench, showcasing comprehension levels at different context lengths.

fiction.live benchmark
I also wrote about this a month back, on why filling up more context does more harm than good: Debugging Decay in AI coding tools.
So what do we do? Just reduce your context
- **MCP Servers:** Default configs are stealth token hogs; a single Playwright server can consume 11k+ tokens, ~12% of a long window. Avoid a global `mcp.json` and spin up only the servers you need with lean configs (see the config sketch after this list). In my own setup, MCP tools were taking about 17k tokens (~8.5%). I showcased how to load MCP configs selectively; read the "MCP with multiple Configurations" section here.
- **CLAUDE.md & AGENTS.md:** These memory files typically occupy more than 10% of the context. If they're eating too many tokens, remove irrelevant rules. Keep them tiny and limited to universal truths; avoid dumping everything into one single file (an example follows the list).
- **Reusable commands:** Use reusable commands (`/bug`, `/feature`) to load task-specific context on demand (READMEs, diffs, directory notes). This keeps the initial load sharp and controllable (a command sketch follows the list).
- **Directory-level memory:** Use scoped memory files so they load only when relevant. For example, `frontend/CLAUDE.md` is only loaded when the model is working on frontend tasks in that session.
- **Fresh sessions:** One task, one thread. Avoid dragging old conversation history into new work.
- **Strip irrelevant rules:** Backend rules usually don't belong in CSS fixes. Mixed instructions confuse the model and waste tokens.
- **Sub-agents:** Claude-specific sub-agents are powerful for keeping the main agent's context under control. Each sub-agent gets its own 200k context window. Let them fetch docs, triage bugs, or scrape data while the primary agent stays lean (a sample definition follows the list).
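For a concrete sketch, here's what a lean, project-scoped MCP config might look like: a single `.mcp.json` that loads only the Playwright server, for sessions that actually need browser automation (the package name and args follow the commonly documented Playwright MCP setup; adjust for yours):

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```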
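As an illustration of a trimmed memory file, the root file keeps only universal truths; the rules below are placeholders, not recommendations:

```markdown
# CLAUDE.md — root level, keep it tiny
- Package manager: pnpm; Node 20
- Run `pnpm test` before committing
- Never hand-edit generated files in `dist/`
```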
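In Claude Code, reusable commands are plain markdown files under `.claude/commands/`; this hypothetical `/bug` command pulls in only the context a bug fix needs ($ARGUMENTS is replaced by whatever you type after the command):

```markdown
<!-- .claude/commands/bug.md — invoked as: /bug <description> -->
Fix the following bug: $ARGUMENTS

Before changing code:
1. Read the failing test output and the affected module's README.
2. Check `git diff main` for recent changes that could have caused it.

Keep the fix minimal; do not refactor unrelated code.
```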
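A sub-agent is likewise just a markdown file with YAML frontmatter, under `.claude/agents/`; this hypothetical docs-fetcher keeps research out of the main thread (the name, tool list, and prompt are illustrative):

```markdown
---
name: docs-fetcher
description: Fetches and summarizes library docs so the main agent stays lean.
tools: WebFetch, Read
---
You are a research assistant. Given a library or API name, fetch the
relevant documentation and return a concise summary containing only the
sections needed for the current task. Never paste full pages back.
```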
Also, keep monitoring the window:
- **`/context`:** Claude Code can inspect token breakdowns in real time to catch hidden hogs (like the MCP servers or memory bloat above).
- **Effective context length:** Know your model's limits. GPT-5 holds up near ~200k tokens; Claude Sonnet 4 starts slipping past 60–120k.
Takeaway
Great context management creates focused, single-purpose agents that solve problems cleanly, with fewer retries and less oversight.
Less is better.
If you’re not a subscriber, here’s what you missed this month
Subscribe to get posts like this in your email every week.


🚢 What’s shipping this week?
Codebuff released an open-source multi-agent CLI coding tool that beats Claude Code (61% vs 53%) on 175+ real-world coding tasks.
Anthropic finally released the MCP Registry, an open catalogue and API for publicly available MCP servers to improve discoverability and implementation.
OpenAI added full MCP support for all its tools, both read and write.


👓️ Worth reading for Devs

🤠 What you may have missed last month

My Recommendations
Techpresso gives you a daily rundown of what's happening in tech and is read by 300,000+ professionals.
The Deep View: the go-to daily newsletter for 250k+ founders and knowledge workers who want to stay up to date with artificial intelligence.
Looking for more updates like this in your inbox? Discover other newsletters our audience loves to read here.
