Context Management

You don’t need bigger prompts

Before you jump in, here’s something you’ll love.
I’ve put together a curated list of MCP servers and MCP registries to support your development workflows. When you subscribe to the newsletter, you’ll get instant access to this resource in your welcome email.

Bonus: I’m also working on a collection of free educational resources for building AI agents at every skill level. Coming soon for subscribers. Stay tuned!

Is Cursor or Claude still giving you AI slop?

The NoLiMa long-context benchmark revealed that most leading LLMs see a sharp drop in performance as context length grows. At 32k tokens, 11 out of 12 models scored less than half as well as they did with shorter inputs.

Image: performance degradation for popular models at longer context lengths

You can read about the benchmark in the paper here. It was published a while ago, so it’s missing the newer models available now.

Here’s another benchmark I found, Fiction.liveBench, which shows comprehension levels at different context lengths.

Image: the Fiction.liveBench results

I also wrote about this a month back, on why filling up more context does more harm than good: Debugging Decay in AI coding tools.

So what do we do? Just reduce your context

  • MCP Servers: Default configs are stealth token hogs. A single Playwright server can consume 11k+ tokens (over 5% of a 200k window). Avoid a global mcp.json and spin up only the servers you need with lean configs (see the config sketch after this list).

    Image: my MCP tools taking about 17k tokens (~8.5% of the window)


    I’ve showcased how to load MCP configs selectively before; read the “MCP with multiple Configurations” section here.

  • CLAUDE.md & AGENTS.md: These memory files typically occupy more than 10% of the context. If they’re eating too many tokens, prune the irrelevant rules. Keep them tiny and limited to universal truths; avoid dumping everything into one single file (a CLAUDE.md sketch follows this list).

  • Reusable commands: Use slash commands (/bug, /feature) to load task-specific context on demand (readmes, diffs, directory notes). This keeps the initial load sharp and controllable (a command sketch follows this list).

  • Directory-level memory: Use scoped memory files so they load only when relevant (e.g., frontend/CLAUDE.md is only pulled in when the model works on frontend tasks in that session). See the layout sketch after this list.

  • Fresh Sessions: One task, one thread. Avoid dragging old conversation history into new work.

  • Strip irrelevant rules: Backend rules usually don’t belong in CSS fixes. Mixed instructions confuse the model and waste tokens.

  • Sub-Agents: Claude-specific sub-agents are powerful for keeping the main agent’s context under control. Each sub-agent gets its own 200k context window. Let them fetch docs, triage bugs, or scrape data while the primary agent stays lean (a sub-agent sketch follows this list).
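To make the MCP point concrete, here’s a minimal sketch of a lean, project-scoped config. I’m assuming Claude Code’s .mcp.json format, and the single server entry is just an example; declare only what the task needs so nothing else loads into the window:

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp@latest"]
    }
  }
}
```

Drop this next to the project instead of using a global config, and Playwright is the only server that spins up for that session.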
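For the memory files, a tight CLAUDE.md looks something like this. The rules below are hypothetical; the point is that only short, universal truths belong here:

```markdown
# CLAUDE.md (keep it tiny)

- Package manager: pnpm, never npm
- Run `pnpm test` before declaring a task done
- TypeScript strict mode; no `any`
```

Anything task- or directory-specific goes into scoped files or on-demand commands instead.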
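Here’s what a reusable command might look like, assuming Claude Code’s custom slash commands (markdown files under .claude/commands/); the file name and the referenced paths are illustrative:

```markdown
<!-- .claude/commands/bug.md, invoked as /bug <description> -->
Investigate this bug: $ARGUMENTS

Load only what you need:
1. Read docs/architecture.md for module boundaries.
2. Run `git diff main` to see recent changes.
3. Reproduce the failure before proposing a fix.
```

The bug-triage context is loaded only when /bug is typed, not at session start.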
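And a sketch of directory-level memory, with an illustrative repo layout:

```
repo/
├── CLAUDE.md            # universal rules only
├── frontend/
│   └── CLAUDE.md        # component conventions, loaded only for frontend work
└── backend/
    └── CLAUDE.md        # API and DB rules, loaded only for backend work
```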
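Finally, a sub-agent sketch, assuming Claude Code’s sub-agent format (a markdown file with YAML frontmatter under .claude/agents/); the name, tools, and prompt are illustrative:

```markdown
---
name: docs-fetcher
description: Fetches library docs and returns only the relevant APIs
tools: WebFetch, Read
---
You are a documentation researcher. Fetch the requested docs,
extract only the APIs relevant to the current task, and return
a short summary. Never return full pages to the caller.
```

The raw pages burn the sub-agent’s own 200k window; the main agent only ever sees the summary.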

Also, keep monitoring the window

  • /context: Run /context in Claude Code to inspect the token breakdown in real time and catch hidden hogs (like the MCP servers or memory bloat above).

  • Effective Context Length: Know your model’s limits. GPT-5 holds up near ~200k tokens; Claude Sonnet 4 starts slipping past 60–120k.

Takeaway

Great context management creates focused, single-purpose agents that solve problems cleanly, with fewer retries and less oversight.

Less is better.

If you’re not a subscriber, here’s what you missed this month

Subscribe to get access to such posts every week in your email.

🚢 What’s shipping this week?

👓️ Worth reading for Devs

My Recommendations

Techpresso gives you a daily rundown of what’s happening in tech and is read by 300,000+ professionals.

The Deep View: the go-to daily newsletter for 250k+ founders and knowledge workers who want to stay up to date with artificial intelligence.

Looking for more such updates in your inbox? Discover other newsletters that our audience loves to read here

📱 Stay connected: Follow me on LinkedIn and Twitter/X for daily AI tool breakdowns and quick wins
