Jörg Michno Shield Blog Audit Registry GitHub

← Back to Blog

We Scanned 11,529 MCP Servers for Security Vulnerabilities

By Joerg Michno · March 21, 2026 · 8 min read

We pulled every server listing from registry.modelcontextprotocol.io -- the official Model Context Protocol registry maintained by Anthropic -- and ran each one through ClawGuard's security analysis engine. 11,529 servers. 4.7 seconds. Zero LLM calls.

The result: 850 servers (7.4%) flagged with potential security risks. Zero contained malicious patterns in their registry metadata. The MCP ecosystem is healthier than many expected -- but 850 servers still expose capabilities that developers and companies need to understand before deploying them.

11,529

Servers Scanned

850

Flagged (7.4%)

93/100

Average Score

4.7s

Total Scan Time

Why Scan the Entire Registry?

Previous studies sampled small subsets. A Queen's University study analyzed 1,899 MCP servers and found 7.2% had potential security issues. Other published scans covered 1,000 or fewer servers. Sampling is useful for statistics, but when the full registry is public and your scanner runs in under 5 seconds, there is no reason to sample. Scan everything.

Our findings independently validate the Queen's University result on 6x the sample size: they found 7.2%, we found 7.4%. That convergence on a much larger dataset increases confidence in both results.

Methodology: Two-Layer Analysis

Every server was analyzed using two complementary approaches, both running locally with zero external API calls:

Layer 1: ClawGuard Pattern Engine

Each server's name, description, and metadata scanned against 182 compiled regex patterns covering prompt injection, data exfiltration, tool poisoning, privilege escalation, and 15 other attack categories.

Text preprocessed through 10 evasion-detection stages (leetspeak normalization, zero-width character stripping, homoglyph mapping, markdown splitting) before pattern matching.

Patterns mapped to OWASP LLM Top 10 and OWASP Agentic Security Top 10 categories for standardized classification.

Layer 2: Capability Risk Assessment

Server descriptions analyzed for 10 high-risk capability categories: file system access, database access, authentication handling, code execution, payment processing, and 5 more.

Each capability weighted by inherent risk (code execution = 10, email sending = 5) and aggregated into a 0-100 security score.

Servers scored below 90 flagged as medium risk. Servers with ClawGuard pattern matches flagged as high or critical depending on pattern severity.

This two-layer approach catches both intentionally malicious servers (Layer 1) and inherently risky servers that are legitimate but demand extra scrutiny before deployment (Layer 2).

Results: What 11,529 Servers Look Like

Critical / High Risk

Medium Risk

771

Low Risk

10,679

Clean

Key finding: Zero servers in the official MCP Registry contained malicious patterns (prompt injection, data exfiltration, tool poisoning) in their metadata. The registry's review process appears to be working.

All 850 flagged servers were flagged for capability risk -- meaning they declare access to sensitive resources like file systems, databases, or credentials. That does not make them malicious. It makes them servers that require informed deployment decisions.

Risk Breakdown by Capability

Of the 850 flagged servers, here is what they claim access to:

File System Access

160

Database Access

154

Auth / Credentials

128

Network / HTTP

120

Payment / Financial

110

Cloud Infrastructure

Code Execution

Browser Automation

PII / Sensitive Data

Email Sending

File system access tops the list because many MCP servers are built to help AI agents read, write, and manage files. That is their purpose. But an AI agent with unrestricted file system access connected to an untrusted MCP server is a path traversal vulnerability waiting to happen.

What This Means

The Registry Is Not Compromised

Zero malicious patterns is a strong signal. Anthropic's registry review process is filtering out overtly malicious submissions. This is good news for developers who stick to the official registry rather than installing servers from random GitHub repos.

7.4% Is Not a Small Number

850 servers handling file systems, databases, credentials, and payment processing deserve scrutiny before deployment. The question is not "is this server malicious?" but "what happens if this server is compromised?" A database MCP server with read/write access that gets poisoned through a supply chain attack has the same impact as a deliberately malicious one.

Capability Risk Is the Real Threat Model

The MCP ecosystem's biggest security challenge is not malware in the registry. It is the blast radius of legitimate servers. When 160 servers declare file system access and 128 handle authentication credentials, the attack surface for prompt injection, tool poisoning, and supply chain attacks is significant -- even if every server author has good intentions.

The EU AI Act Deadline You Are Ignoring

August 2, 2026: The EU AI Act's provisions on general-purpose AI models take full effect. If your AI agents connect to MCP servers that process personal data, handle financial transactions, or make automated decisions, you may need to demonstrate documented risk assessment.

Article 9 of the EU AI Act requires risk management systems for high-risk AI applications. MCP servers that handle credentials, PII, or financial data connected to autonomous AI agents are firmly in scope. "We didn't know what the server had access to" is not a compliance defense.

Our scan provides the kind of capability mapping that compliance teams need: for each server, what does it access, what is the risk score, and which OWASP categories apply. That is not a nice-to-have. After August 2, it is a requirement.

Comparison to Prior Research

Study	Servers Scanned	% Flagged	Method
Queen's University (2025)	1,899	7.2%	LLM-assisted analysis
Industry scan (2026)	1,000	Not disclosed	Proprietary
This scan (ClawGuard)	11,529	7.4%	Regex + capability, no LLM

The convergence between Queen's University's 7.2% and our 7.4% is notable given the difference in methodology (LLM-assisted vs. deterministic pattern matching) and sample size (1,899 vs. 11,529). Two independent methods reaching near-identical conclusions on overlapping but differently-sized datasets strengthens the finding.

Our scan also has a property that LLM-based approaches cannot guarantee: deterministic reproducibility. Run the same scan tomorrow and you get the same results. No temperature, no prompt variation, no model version drift. The same patterns either match or they don't.

What You Should Do

Audit before you deploy. Check what capabilities any MCP server claims before connecting it to your AI agent. Our interactive dashboard lets you search all 11,529 servers.
Scan server source code. Registry metadata is just the surface. Run ClawGuard against the actual server code to detect prompt injection, data exfiltration, and tool poisoning patterns.
Apply least privilege. If a server declares file system access but your use case only needs read access to one directory, sandbox accordingly.
Map to compliance frameworks. If you operate under the EU AI Act, NIST AI RMF, or ISO 42001, map each server's capabilities to your risk assessment documentation now -- not after August 2.
Monitor continuously. A server that is clean today can be compromised tomorrow. Build MCP security scanning into your CI/CD pipeline.

Explore the Full Results

Search, filter, and drill into all 11,529 servers. See risk scores, capability breakdowns, and OWASP mappings for every server in the official MCP Registry.

Interactive Dashboard ClawGuard on GitHub

Methodology Notes

Data source: registry.modelcontextprotocol.io API, fetched March 21, 2026
Scanner: ClawGuard v0.7.1 (225 patterns, 15 languages, 10 preprocessor stages)
Analysis scope: Registry metadata only (name, description, tags). Source code analysis requires individual repository access.
Scoring: Servers start at 100 and lose points based on weighted capability matches. Medium risk = score below 90. All scores are deterministic and reproducible.
Limitations: This scan analyzes what servers declare in their metadata. A server could understate its capabilities, or its actual code could differ from its description. Metadata scanning is a necessary first layer, not a complete security audit.
Raw data: Full scan results available in the interactive dashboard.