Three weeks ago, I was debugging a Claude Code integration that kept making wrong tool calls. The LLM would call a file-reading tool but pass arguments in the wrong format. Every time.
I dug into the MCP server's tool schema and found the problem: most parameters had no descriptions. Claude was guessing what format to use for file paths, what flags to set, what types to pass. The "documentation" was just a parameter name and a type. Nothing about expected format, constraints, or behavior.
That's when I started wondering: how widespread is this problem?
I wrote a script that connects to an MCP server, inspects all its tools, and grades the schemas. Then I ran it against Anthropic's official reference servers, the ones they publish as examples for the community.
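To make "grades the schemas" concrete, here is a minimal sketch of one such check: how many of a tool's parameters carry a description. It is not the actual script. The `ToolInfo` shape mirrors what an MCP `tools/list` response returns, but `gradeTool`, the 30/70 scoring rule, and the sample tool are assumptions made up for illustration.

```typescript
// Minimal shape of a tool entry as listed by an MCP server.
interface ToolInfo {
  name: string;
  description?: string;
  inputSchema: {
    type: "object";
    properties?: Record<string, { type?: string; description?: string }>;
    required?: string[];
  };
}

interface SchemaGrade {
  tool: string;
  totalParams: number;
  undocumentedParams: string[];
  hasToolDescription: boolean;
  score: number; // 0-100, using a hypothetical scoring rule
}

// Grade one tool by description coverage.
// Hypothetical rule: 30 points for a tool-level description,
// up to 70 points scaled by the share of documented parameters.
function gradeTool(tool: ToolInfo): SchemaGrade {
  const props = tool.inputSchema.properties ?? {};
  const names = Object.keys(props);
  const undocumentedParams = names.filter(
    (n) => (props[n]?.description ?? "").trim() === ""
  );
  const hasToolDescription = (tool.description ?? "").trim() !== "";
  const coverage =
    names.length === 0
      ? 1
      : (names.length - undocumentedParams.length) / names.length;
  const score = Math.round((hasToolDescription ? 30 : 0) + 70 * coverage);
  return {
    tool: tool.name,
    totalParams: names.length,
    undocumentedParams,
    hasToolDescription,
    score,
  };
}

// Hypothetical tool: one parameter documented, one not.
const sampleTool: ToolInfo = {
  name: "read_file",
  description: "Read a file from disk.",
  inputSchema: {
    type: "object",
    properties: {
      path: { type: "string" },
      encoding: { type: "string", description: "Text encoding, e.g. utf-8." },
    },
    required: ["path"],
  },
};

const sampleGrade = gradeTool(sampleTool);
console.log(sampleGrade.undocumentedParams, sampleGrade.score);
```

A real grader would look at more than presence (length, mention of formats and constraints), but even this crude check flags the kind of bare name-and-type parameter that caused my original bug.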
Results:
get-env tool. Every environment variable on the host is one tool call away.
read_file) is still in the listing. Duplicate schemas.
These are the reference implementations. If Anthropic's own servers have these issues, imagine what the rest of the ecosystem looks like.
The MCP ecosystem is growing fast. Bloomberry's analysis tracked it going from around 425 servers to over 1,400 in about 6 months. For comparison, Zapier took 5 years to reach 1,400 integrations.
But I couldn't find a single tool that answers a basic question: "Is this MCP server good enough to hand to an LLM?"
There's the MCP Inspector (the official visual debugger, but manual only, with no scoring). There's the MCP Validator from Janix (checks protocol compliance but ignores quality, security, and efficiency). There's mcp-tef from Stacklok (tests descriptions only). None of them gives you a composite score or integrates into CI/CD.
mcp-quality-gate runs 17 live tests against any MCP server and scores it across 4 dimensions:
One command:
npx mcp-quality-gate validate "your-server-command"
Output: a composite 0-100 score, a detailed breakdown, and specific recommendations. It also supports JSON output for CI/CD pipelines and a --threshold flag to gate deployments.
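The idea behind a composite score plus a threshold gate can be sketched in a few lines. This is not the tool's real scoring code or report format: the dimension names, the equal weighting, and the `gate` helper below are all placeholders assumed for illustration.

```typescript
// Per-dimension scores, each 0-100. The keys are placeholder names.
type DimensionScores = Record<string, number>;

// Assumed rule: the composite is the unweighted mean of the dimensions.
function compositeScore(scores: DimensionScores): number {
  const values = Object.values(scores);
  if (values.length === 0) return 0;
  const sum = values.reduce((a, b) => a + b, 0);
  return Math.round(sum / values.length);
}

// Pass or fail against a minimum composite, the way a threshold gate would.
function gate(
  scores: DimensionScores,
  threshold: number
): { composite: number; passed: boolean } {
  const composite = compositeScore(scores);
  return { composite, passed: composite >= threshold };
}

// Hypothetical report for one server.
const exampleScores: DimensionScores = {
  compliance: 90,
  quality: 60,
  security: 80,
  efficiency: 70,
};

const strict = gate(exampleScores, 80);
const lenient = gate(exampleScores, 70);
console.log(strict, lenient);
```

In a CI job, a failed gate would translate into a non-zero exit code so the pipeline stops before deployment.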
Stack: TypeScript, @modelcontextprotocol/sdk, Commander, Zod, Vitest, tsup. The whole thing is about 2,500 lines. I built it in roughly 2 weeks alongside my other projects.
The trickiest part was auto-generating tool arguments for live testing. You can't just send empty args to most tools. The tool needs a file path, a query, a resource URI. I built a generator that reads the schema, identifies types and constraints, and creates plausible arguments for each tool. It's not perfect (some tools will still fail if they need specific state), but it catches about 80% of issues.
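Here is a simplified sketch of that kind of generator, to show the shape of the problem. It is not the code in mcp-quality-gate: `generateArgs`, `sampleValue`, the name-based heuristics, and the placeholder values are assumptions, and it handles only a small subset of JSON Schema.

```typescript
// Small subset of JSON Schema, enough for typical tool input schemas.
type JsonSchema = {
  type?: string;
  enum?: unknown[];
  default?: unknown;
  format?: string;
  minimum?: number;
  minLength?: number;
  items?: JsonSchema;
  properties?: Record<string, JsonSchema>;
  required?: string[];
};

// Pick a plausible value for one parameter from its schema and name.
function sampleValue(name: string, schema: JsonSchema): unknown {
  // Explicit hints in the schema win over guessing.
  if (schema.default !== undefined) return schema.default;
  if (schema.enum && schema.enum.length > 0) return schema.enum[0];

  switch (schema.type) {
    case "string": {
      if (schema.format === "uri") return "file:///tmp/example.txt";
      // Heuristic: guess intent from the parameter name.
      const lower = name.toLowerCase();
      if (lower.includes("path") || lower.includes("file")) return "/tmp/example.txt";
      if (lower.includes("url") || lower.includes("uri")) return "https://example.com";
      // Pad the fallback string to satisfy minLength if one is set.
      return "test".padEnd(schema.minLength ?? 0, "x");
    }
    case "integer":
    case "number":
      return schema.minimum ?? 1;
    case "boolean":
      return false;
    case "array":
      return schema.items ? [sampleValue(name, schema.items)] : [];
    case "object":
      return generateArgs(schema);
    default:
      return null;
  }
}

// Build an arguments object containing only the required parameters.
function generateArgs(schema: JsonSchema): Record<string, unknown> {
  const args: Record<string, unknown> = {};
  for (const key of schema.required ?? []) {
    const prop = schema.properties?.[key];
    if (prop) args[key] = sampleValue(key, prop);
  }
  return args;
}

// Hypothetical input schema for a search-style tool.
const exampleArgs = generateArgs({
  type: "object",
  properties: {
    path: { type: "string" },
    limit: { type: "integer", minimum: 5 },
    mode: { type: "string", enum: ["fast", "slow"] },
    verbose: { type: "boolean" },
  },
  required: ["path", "limit", "mode"],
});
console.log(exampleArgs);
```

Filling only required parameters keeps the call minimal, which is also where the approach breaks down: a generated path like `/tmp/example.txt` is well-formed but may not exist, so tools that depend on specific state can still fail.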
This is the second open-source dev tool I've shipped (the first is aiqt, a code quality linter for AI-generated code). My bet: the MCP ecosystem is early enough that the developer who builds the quality standard gets to define what "good" means. That's worth more than any SaaS revenue at this stage.
If mcp-quality-gate becomes the default quality check for MCP servers, it creates a foundation for everything else I'm building.
npm install -g mcp-quality-gate
GitHub: https://github.com/bhvbhushan/mcp-quality-gate
If you build or maintain MCP servers, I'd love for you to run this and file issues. What's your current process for testing MCP integrations before deploying them?