I tested Anthropic's reference MCP servers and found one with 72% of its parameters undocumented. So I built a quality gate.

Three weeks ago, I was debugging a Claude Code integration that kept making wrong tool calls. The LLM would call a file-reading tool but pass arguments in the wrong format. Every time.

I dug into the MCP server's tool schema and found the problem: most parameters had no descriptions. Claude was guessing what format to use for file paths, which flags to set, and which values were valid. The "documentation" was just a parameter name and a type, with nothing about expected format, constraints, or behavior.
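To make the gap concrete, here is a sketch of the same tool with and without parameter descriptions. The tool name, parameters, and wording are hypothetical and not copied from any real server; only the `name` / `description` / `inputSchema` layout follows the MCP tool listing format.

```typescript
// Shape of one entry in an MCP tool listing (only the fields used here).
interface ToolSchema {
  name: string;
  description?: string;
  inputSchema: {
    type: "object";
    properties?: Record<string, { type?: string; description?: string }>;
    required?: string[];
  };
}

// Hypothetical tool as the model sees it: a name and a type, nothing else.
const undocumented: ToolSchema = {
  name: "read_text",
  inputSchema: {
    type: "object",
    properties: {
      path: { type: "string" },
      head: { type: "number" },
    },
    required: ["path"],
  },
};

// The same hypothetical tool with format, constraints, and behavior spelled out.
const documented: ToolSchema = {
  name: "read_text",
  description: "Read a UTF-8 text file and return its contents.",
  inputSchema: {
    type: "object",
    properties: {
      path: {
        type: "string",
        description:
          "Absolute path to the file, e.g. /home/user/notes.txt. Must be inside an allowed directory.",
      },
      head: {
        type: "number",
        description: "If set, return only the first N lines. Must be a positive integer.",
      },
    },
    required: ["path"],
  },
};
```

With the first version the model has to guess whether `path` is relative or absolute and what `head` counts. The second version answers both questions in the schema itself.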

That's when I started wondering: how widespread is this problem?

Testing Anthropic's own servers

I wrote a script that connects to an MCP server, inspects all its tools, and grades the schemas. Then I ran it against Anthropic's official reference servers, the ones they publish as examples for the community.
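The core of the "undocumented parameters" number can be sketched in a few lines. This is an illustration of the metric rather than the script's actual code: `ListedTool` and `parameterCoverage` are hypothetical names, and the input is assumed to be the tool array a server returns when you list its tools.

```typescript
interface ParamSchema {
  type?: string;
  description?: string;
}

// Minimal view of a listed tool: just enough to count parameters.
interface ListedTool {
  name: string;
  inputSchema: { properties?: Record<string, ParamSchema> };
}

interface Coverage {
  total: number;
  documented: number;
  undocumentedPct: number;
}

// Count top-level parameters and how many carry a non-empty description.
function parameterCoverage(tools: ListedTool[]): Coverage {
  let total = 0;
  let documented = 0;
  for (const tool of tools) {
    for (const param of Object.values(tool.inputSchema.properties ?? {})) {
      total += 1;
      if ((param.description ?? "").trim().length > 0) {
        documented += 1;
      }
    }
  }
  // A server with no parameters at all has nothing undocumented.
  const undocumentedPct =
    total === 0 ? 0 : Math.round(((total - documented) / total) * 100);
  return { total, documented, undocumentedPct };
}
```

This sketch only looks at top-level properties. Nested object parameters would need a recursive walk.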

Results:

Memory server: 98/100. Pretty clean, but 50% of parameters still have no descriptions.
Sequential Thinking: 98/100. Works well, but ships a 500+ character description that wastes context tokens.
Everything server: 88/100. Exposes a get-env tool. Every environment variable on the host is one tool call away.
Filesystem server: 81/100. 72% of parameters undocumented. A deprecated tool (read_file) is still in the listing. Duplicate schemas.
Playwright server: 81/100. 21 tools consuming 3,000+ schema tokens. Multiple code execution surfaces.

These are the reference implementations. If Anthropic's own servers have these issues, imagine what the rest of the ecosystem looks like.
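Several of the findings above (overlong descriptions, deprecated tools still listed, duplicate schemas) can be caught with simple heuristics. The sketch below is an assumption about how such checks might look, not the tool's real rules: the 500-character limit mirrors the Sequential Thinking finding, the "deprecated" keyword match is a guess, and all names are hypothetical.

```typescript
interface ToolEntry {
  name: string;
  description?: string;
  inputSchema: unknown;
}

interface Finding {
  tool: string;
  issue: string;
}

// Assumed limit, taken from the "500+ character description" finding.
const MAX_DESCRIPTION_CHARS = 500;

function findSchemaIssues(tools: ToolEntry[]): Finding[] {
  const findings: Finding[] = [];
  // Serialized schema -> name of the first tool that used it.
  const seenSchemas = new Map<string, string>();

  for (const tool of tools) {
    const description = tool.description ?? "";

    if (description.length > MAX_DESCRIPTION_CHARS) {
      findings.push({ tool: tool.name, issue: "long-description" });
    }
    // Heuristic: the description itself says the tool is deprecated.
    if (/deprecated/i.test(description)) {
      findings.push({ tool: tool.name, issue: "deprecated" });
    }

    // Exact-match duplicate detection; sensitive to property order.
    const key = JSON.stringify(tool.inputSchema);
    const firstUser = seenSchemas.get(key);
    if (firstUser !== undefined) {
      findings.push({ tool: tool.name, issue: "duplicate-schema-of:" + firstUser });
    } else {
      seenSchemas.set(key, tool.name);
    }
  }
  return findings;
}
```

Comparing serialized schemas is crude, since two schemas that differ only in key order will not match, but it is enough to flag a deprecated tool that shares its schema with its replacement.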

The market context

The MCP ecosystem is growing fast. Bloomberry's analysis tracked it going from around 425 servers to over 1,400 in about 6 months. For comparison, Zapier took 5 years to reach 1,400 integrations.

But I couldn't find a single tool that answers a basic question: "Is this MCP server good enough to hand to an LLM?"

There's the MCP Inspector (the official visual debugger, but manual only, with no scoring). There's the MCP Validator from Janix (checks protocol compliance but ignores quality, security, and efficiency). There's mcp-tef from Stacklok (tests descriptions only). None of them gives you a composite score or integrates into CI/CD.

What I built

mcp-quality-gate runs 17 live tests against any MCP server and scores it across 4 dimensions:

Compliance (40 points): Does it follow the MCP spec? Tests lifecycle, tool listing, tool calls, resources, prompts, error handling.
Quality (25 points): Are tool descriptions good enough for an LLM to use correctly? Parameter coverage, description length, deprecated tools, duplicate schemas.
Security (20 points): Is it leaking environment variables? Exposing code execution surfaces? Missing destructive operation warnings?
Efficiency (15 points): How many tools? How many schema tokens? Are you burning the context window before the conversation starts?
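One way to picture how four dimensions roll up into a single number: give each dimension its point budget and award points in proportion to the checks passed. The budgets (40/25/20/15) come from the list above; the proportional mapping, the rounding, and every name in this sketch are assumptions for illustration, not a description of the tool's internals.

```typescript
type Dimension = "compliance" | "quality" | "security" | "efficiency";

// Point budgets for the four dimensions; they sum to 100.
const MAX_POINTS: Record<Dimension, number> = {
  compliance: 40,
  quality: 25,
  security: 20,
  efficiency: 15,
};

interface ScoreReport {
  total: number;
  breakdown: Record<Dimension, number>;
}

// passRates: fraction of checks passed per dimension, from 0 to 1.
// Proportional scoring is an assumption made for this sketch.
function compositeScore(passRates: Record<Dimension, number>): ScoreReport {
  const breakdown: Record<Dimension, number> = {
    compliance: 0,
    quality: 0,
    security: 0,
    efficiency: 0,
  };
  let total = 0;
  for (const dim of Object.keys(MAX_POINTS) as Dimension[]) {
    // Clamp so a bad input cannot push a dimension past its budget.
    const rate = Math.min(1, Math.max(0, passRates[dim]));
    const points = Math.round(rate * MAX_POINTS[dim]);
    breakdown[dim] = points;
    total += points;
  }
  return { total, breakdown };
}

// A server that passes everything except some quality and efficiency checks.
const example = compositeScore({
  compliance: 1,
  quality: 0.6,
  security: 1,
  efficiency: 0.8,
});
// example.total is 87 (40 + 15 + 20 + 12)
```

Weighting compliance highest means a server with perfect descriptions still cannot score well if it breaks the protocol.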

One command:

npx mcp-quality-gate validate "your-server-command"

Output: a composite 0-100 score, a detailed breakdown, and specific recommendations. It supports JSON output for CI/CD pipelines and a --threshold flag to gate deployments.

The build

Stack: TypeScript, @modelcontextprotocol/sdk, Commander, Zod, Vitest, tsup. The whole thing is about 2,500 lines. I built it in roughly 2 weeks alongside my other projects.

The trickiest part was auto-generating tool arguments for live testing. You can't just send empty args: most tools need a file path, a query, or a resource URI. I built a generator that reads the schema, identifies types and constraints, and creates plausible arguments for each tool. It's not perfect (some tools will still fail if they need specific state), but it catches about 80% of issues.
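A stripped-down sketch of that idea: walk the tool's JSON Schema and pick a plausible value for each required parameter. It illustrates the approach under simplifying assumptions and is not the generator that ships in the tool. The name-based guesses (anything that looks like a path gets `/tmp/example.txt`) and all function names are hypothetical.

```typescript
// The subset of JSON Schema this sketch understands.
interface JsonSchema {
  type?: string;
  enum?: unknown[];
  default?: unknown;
  minimum?: number;
  format?: string;
  items?: JsonSchema;
  properties?: Record<string, JsonSchema>;
  required?: string[];
}

// Pick one plausible value for a parameter from its schema and its name.
function sampleValue(name: string, schema: JsonSchema): unknown {
  if (schema.default !== undefined) return schema.default;
  if (schema.enum && schema.enum.length > 0) return schema.enum[0];

  switch (schema.type) {
    case "string":
      // Name-based guesses; purely heuristic.
      if (schema.format === "uri" || /ur[il]$/i.test(name)) return "file:///tmp/example.txt";
      if (/path|file|dir/i.test(name)) return "/tmp/example.txt";
      return "test";
    case "integer":
    case "number":
      return schema.minimum ?? 1;
    case "boolean":
      return false;
    case "array":
      return schema.items ? [sampleValue(name, schema.items)] : [];
    case "object":
      return sampleArguments(schema);
    default:
      return null;
  }
}

// Build an argument object that fills only the required parameters.
function sampleArguments(schema: JsonSchema): Record<string, unknown> {
  const args: Record<string, unknown> = {};
  const properties = schema.properties ?? {};
  for (const name of schema.required ?? []) {
    const prop = properties[name];
    if (prop) args[name] = sampleValue(name, prop);
  }
  return args;
}
```

For a schema requiring `path` (string), `limit` (integer, minimum 5), and `mode` (enum of `"fast"` and `"slow"`), this returns `{ path: "/tmp/example.txt", limit: 5, mode: "fast" }`. A call built this way can still fail when the tool depends on state, such as a file that has to exist first.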

Numbers

Revenue: $0. This is open source, MIT licensed.
Downloads: Too early to report; it just launched.
Time invested: ~2 weeks of development time.
Lines of code: ~2,500
Cost to build: $0 out of pocket (used my existing dev environment + Claude Code for pair programming)

Why open source?

This is the second open-source dev tool I've shipped (the first is aiqt, a code quality linter for AI-generated code). My bet: the MCP ecosystem is early enough that the developer who builds the quality standard gets to define what "good" means. That's worth more than any SaaS revenue at this stage.

If mcp-quality-gate becomes the default quality check for MCP servers, it creates a foundation for everything else I'm building.

What's next

Better HTTP/SSE transport support
More security checks (especially around prompt injection vectors)
Community-contributed test suites
A "recommended score" badge that server authors can display

What I'd tell my past self

Start with the scoring framework, not the tests. Having a clear 4-dimension model made every implementation decision easier.
Test against real servers early. My first version only used synthetic schemas. The moment I pointed it at Anthropic's servers and found real issues, it validated the entire project.
Ship before it's polished. v0.1.1 has rough edges. But MCP servers are shipping faster than anyone's testing them, and a useful tool now beats a perfect tool in 3 months.

npm install -g mcp-quality-gate
GitHub: https://github.com/bhvbhushan/mcp-quality-gate

If you build or maintain MCP servers, I'd love for you to run this and file issues. What's your current process for testing MCP integrations before deploying them?