
GitHub, Google, and Anthropic all repriced their AI tools in the same month. Here's the local memory server I built in response.

Every AI session starts cold. I got tired of paying for that.

In April 2026, GitHub, Google, and Anthropic all repriced their AI tools in the same month. Token budgets shrank. Costs went up. We had already paid, through our subscriptions, to get those models to where they are today. But the part that bothered me most was:

The context tax.

Every new chat session, I was spending 500 to 1,000 tokens just getting the AI back up to speed. Re-explaining the stack. Re-explaining conventions. Re-explaining decisions I had already made three sessions ago. That overhead was not doing anything useful. It was pure waste, and under the new per-token pricing, it was measurable waste.
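To make that overhead concrete, here is a back-of-envelope estimate. The 500 to 1,000 token range is from above; the session count and workday count are illustrative assumptions, not measured figures.

```python
# Back-of-envelope context-tax estimate. The per-session overhead range
# comes from the post; sessions_per_day and workdays_per_year are
# illustrative assumptions.
overhead_tokens = 750            # midpoint of the 500-1,000 token re-introduction
sessions_per_day = 10            # assumed
workdays_per_year = 250          # assumed

wasted_tokens_per_year = overhead_tokens * sessions_per_day * workdays_per_year
print(wasted_tokens_per_year)    # roughly 1.9 million tokens a year re-explaining the project
```

Under per-token pricing, every one of those tokens is billed again in every session, which is what makes the waste measurable rather than merely annoying.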

So I built Zerikai Memory: a local Python MCP server that gives any AI-powered IDE persistent, workspace-isolated memory.

It scans your codebase once, stores compressed semantic summaries in a local vector database, and auto-generates a Project Brief that gets prepended to every query. Because that brief is identical in each session, DeepSeek's KV caching kicks in after the first query and drops the cost from $0.14 per million tokens to $0.0028 per million. A 50x reduction. And because the auto-router sends routine lookups to a local Ollama instance (in hybrid mode), 70 to 80 percent of daily queries never hit the cloud at all. The server runs in one of three modes:

  • cloud: DeepSeek for all operations. Maximum brief quality. Recommended for most developers.
  • hybrid: Ollama for file scanning and routine queries, DeepSeek for briefs and escalated queries.
  • local: Ollama for everything. Zero cost, fully private, lower brief quality.
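The mode-based routing above can be sketched roughly as follows. All names here (`Mode`, `route_query`, `is_routine`) are hypothetical illustrations of the idea, not the actual Zerikai Memory implementation.

```python
# Illustrative sketch of mode-based query routing. The heuristic in
# is_routine is a made-up placeholder; a real router would use something
# smarter (e.g. query classification or retrieval confidence).
from enum import Enum


class Mode(Enum):
    CLOUD = "cloud"
    HYBRID = "hybrid"
    LOCAL = "local"


def is_routine(query: str) -> bool:
    """Hypothetical heuristic: short lookups stay local in hybrid mode."""
    return len(query.split()) < 20


def route_query(query: str, mode: Mode) -> str:
    """Return which backend a query should be sent to."""
    if mode is Mode.LOCAL:
        return "ollama"       # everything local: zero cost, fully private
    if mode is Mode.CLOUD:
        return "deepseek"     # everything cloud: maximum brief quality
    # hybrid: routine lookups go local, everything else escalates
    return "ollama" if is_routine(query) else "deepseek"
```

Keeping the routing decision in one small function like this is what lets most routine daily queries resolve locally while still escalating the hard ones.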

The result is an AI that remembers your decisions, knows your architecture, and does not need to be reintroduced to your project every morning.

The deeper lesson was not about cost. It was about what persistent memory actually changes. An AI that remembers prior decisions can flag when you are drifting from them. An AI that knows your architecture can reason about it rather than describe it. That shift, from reactive tool to informed collaborator, is what I was actually building toward.

The repo is open and takes about five minutes to set up.

[Download & Install] | GitHub repo

Posted to Ideas and Validation on May 4, 2026

    You’re solving the right infra problem.

    Most “AI coding” tools still act stateless, which makes them assistants in the UI but not in the workflow. Persistent memory is the real unlock because it shifts the model from answering prompts to operating with context continuity.

    That’s the layer that actually compounds.

    Zerikai is clear, but it still sounds project-like for something much more foundational. If this keeps moving toward persistent memory infra for developer systems, the naming can carry more weight.

    Vroth.com fits that direction better.

    It sounds more like core memory infrastructure than a tool wrapper, which is where this gets more valuable.


      I appreciate the insight on the infra layer. You hit it on the head: stateless AI is just a fancy search engine. Zerikai Memory is built as an MCP server precisely because I wanted it to be an invisible background utility, not another UI wrapper.

      I’m focused on solving the 'Context Tax' and the 'Junk Context' problem that makes agents hallucinate in complex codebases. Regarding the name, I'm keeping it grounded in 'Architectural AI' for now, but I agree the goal is core memory infra.

      I appreciate your feedback.


        That makes sense.

        “Context Tax” and “Junk Context” are actually much stronger than the current brand frame.

        That’s the part I’d lean into harder.

        If Zerikai Memory stays as a project name, fine.

        But if the real product is solving persistent context for complex codebases, the name eventually has to carry that infrastructure weight.

        Right now the problem language feels sharper than the product name.

        That’s usually the signal.
