
Research · March 27, 2026 · Abcas Security Research

We Inspected 3,601 MCP Servers — Here Is What We Found

A full inspection of 3,601 MCP servers found that 70% of a detailed 10-server sample required review or blocking, and popularity did not predict lower observed risk.

Terminology

| Term | Meaning |
| --- | --- |
| MCP Server | An endpoint that provides tools and data to AI agents via the Model Context Protocol |
| Safety Verdict | Automated inspection result: PASS (no issues) / WARN (review needed) / BLOCK (critical risk) |
| Provenance Check | Verification of whether a package comes from a legitimate publisher through a trustworthy distribution path |
| Consistency Check | Verification of whether a server's declared behavior matches what is actually observed |
| Known Database | A database that accumulates past inspection results and maintains per-server safety assessments |

Lead

MCP (Model Context Protocol) has rapidly emerged as the standard way AI agents access external tools and data sources since late 2025. Through MCP servers, developers can grant AI agents capabilities ranging from file manipulation and database access to API integrations and command execution.

However, this rapid adoption has outpaced safety verification. Thousands of MCP servers are now publicly available through npm, GitHub, and dedicated registries — yet systematic safety inspections at scale have been virtually nonexistent. Being listed in a registry, having high download counts, or hosting source code on GitHub does not constitute evidence of safety.

This report presents findings from a full inspection of 3,601 MCP servers. To our knowledge, this is the first systematic safety inspection at this scale in the MCP ecosystem. The dataset spans widely adopted servers with hundreds of thousands of weekly downloads to niche tools published days before the scan. Our goal is to provide quantitative evidence of the safety landscape, giving developers and organizations a factual basis for deployment decisions.

Key Findings

  1. 3,601 servers were inspected with a 99.9% completion rate (3,597 completed). The inspection infrastructure itself is reliable.
  2. In a detailed 10-server sample, 70% triggered warnings or blocks.
  3. Known-database matching classified half of evaluated servers as "caution required."
  4. Even servers with 300,000+ weekly downloads received safety warnings.
  5. The primary causes were provenance concerns and behavioral inconsistencies.

Background: Why MCP Server Safety Matters Now

The Expanding Scope of AI Agent Permissions

The fundamental difference between traditional API integrations and MCP servers lies in the breadth of permissions granted to AI agents. MCP servers are not merely data-retrieval endpoints — they can authorize file creation and deletion, command execution, external service requests, and other operations that directly affect the host environment.

This design means that a malicious or unintentionally dangerous MCP server, once installed, can cause damage far beyond what a traditional API misconfiguration would permit. Operations that the user never explicitly approved may be executed by the AI agent through the MCP server's tool definitions. This is why MCP server safety must be understood not just as a package-security problem, but as an agent-security problem.

Parallels and Differences With Existing Package Ecosystems

Supply-chain attacks through malicious packages are a well-documented phenomenon in ecosystems like npm and PyPI. Attack vectors include typosquatting (publishing packages with names similar to popular ones), dependency hijacking, and maintainer credential theft.

The MCP ecosystem faces all of these risks, plus additional ones unique to its architecture. Unlike traditional libraries, MCP servers define runtime behavior that cannot always be inferred from package metadata alone. A server's declared tool definitions may not match its actual runtime behavior, and detecting this discrepancy requires more than static analysis of the package contents. Multi-perspective inspection — examining provenance, declarations, behavior, and accumulated history — becomes essential.

The "Published Means Safe" Fallacy

The open-source ecosystem operates under an implicit assumption: "if the code is public, someone is reviewing it" and "if many people use it, it must be safe." Both assumptions have been empirically disproven. The event-stream incident of 2018 demonstrated that a package with millions of monthly downloads could harbor a targeted backdoor for weeks before detection.

The MCP ecosystem is even more vulnerable to this fallacy. Its community is younger, review infrastructure is less mature, and the consequences of a compromised server — with access to an AI agent's full execution context — are potentially more severe than a traditional library compromise.

Methodology

| Item | Value |
| --- | --- |
| Scope | 3,601 MCP servers (full re-inspection) |
| Window | 2026-03-26 to 2026-03-27 (31.2 hours continuous monitoring) |
| Detailed analysis | 10 servers extracted for multi-perspective evaluation |
| Note | Server names are anonymized. Inspection implementation details are not disclosed |

Inspection Scope and Approach

The 3,601 servers represent the full population of MCP servers registered across major registries and package repositories at the time of inspection. All servers underwent automated inspection, and the results were used to assess the reliability of the inspection infrastructure and the distribution of safety verdicts.

For detailed analysis, 10 servers were selected to demonstrate what kinds of issues automated inspection can detect. The problem patterns and ratios reported in this article are based on this exploratory 10-server sample and are not intended as statistical estimates of the full 3,601-server population. The purpose of the exploratory analysis is to qualitatively demonstrate the types and severity of detectable issues.

Anonymization Policy

No individual server names, package names, or publisher identities are disclosed in this report. The purpose of this inspection is not to damage the reputation of specific servers, but to illuminate safety trends across the ecosystem as a whole.

Results

Full Inspection Overview

3,601 servers were processed in 31.2 hours, achieving a completion rate of 99.9%.

| State | Count | Ratio |
| --- | --- | --- |
| Completed successfully | 3,597 | 99.89% |
| Size-based routing diversion | 2 | 0.06% |
| Processing failure | 2 | 0.06% |
| Total | 3,601 | 100% |

The 2 failures were caused by timeout or source-fetch errors, managed as design-level exceptions within the inspection infrastructure. The 2 size-based diversions represent a deliberate safety measure for oversized packages that cannot be processed through the standard pipeline — these are distinct from failures. This completion rate demonstrates that full re-inspection of 3,601 servers on a periodic basis is operationally viable.
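The routing described above can be sketched as a simple dispatch. This is a minimal illustration of the three terminal states in the table, under assumed names and an invented size threshold; it is not the actual MCP Guard pipeline.

```python
# Sketch of pipeline routing into the three terminal states.
# MAX_STANDARD_SIZE is a hypothetical limit, not a disclosed value.
MAX_STANDARD_SIZE = 50 * 1024 * 1024  # assumed 50 MB cap for the standard path

def route(package_size: int, fetch_ok: bool) -> str:
    """Classify a package into one of the three terminal states."""
    if not fetch_ok:
        return "processing_failure"   # timeout or source-fetch error
    if package_size > MAX_STANDARD_SIZE:
        return "size_diversion"       # deliberate safety diversion, not a failure
    return "completed"

results = [route(1_000_000, True), route(90_000_000, True), route(0, False)]
```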

Detailed Safety Verdicts (10-Server Sample)

10 servers were selected and inspected across multiple verification perspectives:

| Verdict | Count | Ratio |
| --- | --- | --- |
| PASS (no issues) | 3 | 30% |
| WARN (review needed) | 6 | 60% |
| BLOCK (critical risk) | 1 | 10% |

70% of sampled servers required review or were blocked before deployment. The 3 servers that received PASS had verifiable publisher information, no contradictions between declared and observed behavior, and were recorded as lower-risk in the known database.

The 6 WARN-rated servers each triggered concerns in at least one verification perspective, but none warranted outright blocking. A WARN verdict is not a soft pass, however: organizations deploying WARN-rated servers should examine the specific findings and make an explicit accept-or-reject decision.

The single BLOCK-rated server exhibited serious issues across multiple verification perspectives and was judged unsuitable for deployment.
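One natural way to combine per-perspective findings into a single PASS/WARN/BLOCK verdict is worst-case aggregation. The report does not disclose the actual combining logic, so the ordering and perspective names below are assumptions for illustration only.

```python
# Sketch: the overall verdict is the worst verdict from any perspective.
# Severity ordering and perspective names are illustrative assumptions.
SEVERITY = {"PASS": 0, "WARN": 1, "BLOCK": 2}

def combine(perspective_verdicts: dict) -> str:
    """Return the highest-severity verdict across all perspectives."""
    return max(perspective_verdicts.values(), key=SEVERITY.__getitem__)

verdict = combine({"provenance": "WARN", "consistency": "PASS", "known_db": "PASS"})
```

Worst-case aggregation matches the observation above that a single serious finding can decide the final verdict.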

Known-Database Matching

| Classification | Count |
| --- | --- |
| Previously lower-risk | 5 |
| Known caution | 5 |

Half of the evaluated servers were classified as "caution" based on accumulated inspection history. This result highlights that MCP server risk is not a fixed property. Servers previously assessed as lower-risk can change status after package updates, dependency changes, or maintainer transitions. Conversely, previously flagged servers may improve over time.

Because the known database represents accumulated historical judgments, determining the current risk state of a server requires cross-referencing with the most recent inspection results. This is one of the key reasons periodic re-inspection is essential.
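The cross-referencing rule can be sketched as follows: the latest inspection is authoritative, and accumulated history can at most escalate scrutiny. This is a minimal reading of the paragraph above, with invented status labels, not the known database's actual schema.

```python
# Sketch of cross-referencing accumulated history with the latest
# inspection result. Labels ("caution", "review") are illustrative.
def current_status(history: list, latest: str) -> str:
    """The most recent verdict is authoritative; past 'caution' entries
    can escalate a PASS to 'review' but never downgrade a WARN/BLOCK."""
    if latest in ("WARN", "BLOCK"):
        return "caution"
    return "review" if "caution" in history else "lower-risk"
```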

Download Volume vs. Safety Verdict

| Metric | Value |
| --- | --- |
| Highest weekly downloads (in sample) | 309,704 |
| Lowest weekly downloads (in sample) | 24 |
| Gap | ~12,900x |

Weekly download counts in our 10-server sample ranged from 24 to over 309,000 — a gap of approximately 12,900x. Yet no correlation was observed between download volume and safety verdicts.

This finding may seem counterintuitive, but it is consistent with patterns observed across software security more broadly. Popular packages can become high-value targets for supply-chain attacks, while niche packages may persist without adequate review. The critical takeaway is that download count is a distribution metric, not a safety metric, and deployment decisions should not treat it as one.
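A correlation claim like the one above can be checked with a rank statistic such as Spearman's rho; with only 10 servers it is a screen, not a proof. The sketch below is stdlib-only and, for simplicity, does not average tied ranks, so it is only suitable as a rough check.

```python
# Rough Spearman rank correlation (no tie averaging) as a sketch of how
# download volume vs. verdict severity could be screened for association.
def ranks(xs):
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman(xs, ys):
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```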

What Kinds of Problems Are Detected?

Issues found in detailed inspections fall into three broad categories. Detection methodology details are not disclosed, but the nature of the issues is described below.

1. Provenance Concerns

The most frequently observed category. Problems include insufficient publisher information, inconsistencies between maintainer identity and distribution path, and abnormally short publication history. These are cases where it is not possible to confidently determine whether the server genuinely comes from a legitimate source.

Specific patterns include:

  • The publisher's npm account was recently created with no other published packages
  • The package name closely resembles an existing popular server but is published by a different account
  • The GitHub repository owner does not match the npm package publisher
  • The organization name in the README contradicts actual publisher metadata

These patterns do not necessarily indicate malice. However, unverifiable provenance cannot be treated as "lower-risk" — particularly when typosquatting patterns are present, which may indicate intentional deception.
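The four bullet patterns above lend themselves to simple metadata heuristics. The sketch below is a hedged illustration: field names, the 30-day threshold, and the 0.85 similarity cutoff are all assumptions, not MCP Guard's actual rules.

```python
# Hedged sketch of provenance heuristics over package metadata.
# All thresholds and field names are illustrative assumptions.
from datetime import date
from difflib import SequenceMatcher

def provenance_flags(pkg: dict, today: date, popular_names: list) -> list:
    flags = []
    # Pattern 1: recently created account with no other packages.
    if (today - pkg["account_created"]).days < 30 and pkg["other_packages"] == 0:
        flags.append("fresh-account-no-history")
    # Pattern 2: name closely resembles a popular server's name.
    if any(n != pkg["name"] and SequenceMatcher(None, pkg["name"], n).ratio() >= 0.85
           for n in popular_names):
        flags.append("possible-typosquat")
    # Pattern 3: repository owner does not match the npm publisher.
    if pkg["repo_owner"] != pkg["npm_publisher"]:
        flags.append("owner-publisher-mismatch")
    # Pattern 4: README organization contradicts publisher metadata.
    if pkg.get("readme_org") and pkg["readme_org"] != pkg["npm_publisher"]:
        flags.append("readme-metadata-contradiction")
    return flags

example = provenance_flags(
    {"name": "reqeusts-mcp", "account_created": date(2026, 3, 20),
     "other_packages": 0, "repo_owner": "alice", "npm_publisher": "bob",
     "readme_org": "acme"},
    date(2026, 3, 27),
    ["requests-mcp"],
)
```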

2. Declaration vs. Behavior Mismatch

Servers that declare certain operations but exhibit behavior contradicting those declarations. In our 10-server sample, 2 servers (20%) showed such contradictions, and both cases materially affected the final safety verdict.

For example, a server declared as a "read-only data retrieval tool" that is found to have capabilities for writing to the filesystem or sending POST requests to external APIs falls into this category. Such inconsistencies — whether the result of deliberate concealment or development oversight — must be treated as risk indicators.

While the contradiction detection rate (20% in this sample) is not high in absolute terms, the impact of detected contradictions is substantial. In both cases, the detected contradiction was the deciding factor that elevated the final verdict to WARN or above. Operationally, "how many verdicts were changed by contradictions" is a more meaningful metric than "what percentage had contradictions."
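At its core, this check is a set comparison between declared capabilities and observed operations. The capability taxonomy below ("read", "fs_write", "http_post") is an assumption for illustration; the actual observation method is not disclosed.

```python
# Sketch of a consistency check: any operation observed at runtime but
# never declared is a contradiction. Capability names are illustrative.
def consistency_findings(declared: set, observed: set) -> set:
    """Return operations that contradict the server's declarations."""
    return observed - declared

# A "read-only data retrieval tool" caught writing files and POSTing out:
findings = consistency_findings(
    declared={"read"},
    observed={"read", "fs_write", "http_post"},
)
```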

3. Dependency Risks

Libraries or packages used by the server contain known vulnerabilities. Even when the server itself is not malicious, compromised dependencies create indirect risk.

This problem is not unique to MCP — it is a shared challenge across the entire software supply chain. However, in the MCP context, dependency vulnerabilities combine with the broad permissions granted to AI agents, potentially amplifying the impact of exploitation beyond what would occur in typical library usage.
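Dependency screening amounts to matching a server's resolved dependency set against a vulnerability advisory feed. The advisory entries and package names below are invented for illustration; a real scan would consult a live advisory database.

```python
# Sketch of flagging known-vulnerable dependencies. The advisory set
# and package names here are invented, illustrative data.
ADVISORIES = {("leftpad-ish", "1.0.0"), ("oldcrypto", "2.3.1")}  # hypothetical

def vulnerable_deps(deps: dict) -> list:
    """Return the server's dependencies that match a known advisory."""
    return sorted(name for name, ver in deps.items() if (name, ver) in ADVISORIES)
```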

Discussion

"Publicly Available" Does Not Mean "Safe"

Anyone can publish an MCP server. Being listed in a registry, having high download counts, or hosting source code on GitHub does not constitute safety verification. While this is a known principle, it is particularly critical in the MCP ecosystem for two reasons.

First, MCP servers grant AI agents broad operational permissions, meaning the blast radius of a compromised server is significantly larger than a typical library vulnerability. Second, the ecosystem is growing faster than community review capacity can keep pace, meaning that the informal "many eyes" defense is weaker than in mature ecosystems.

Manual Review Cannot Catch These Issues at Scale

Most of the problems described above cannot be identified by reading a server's README or documentation. Publisher information consistency, declaration-behavior alignment, and dependency vulnerability status all require automated inspection to surface.

This does not mean human code review is unnecessary. However, performing human review on all 3,601 servers is not practical. Automated inspection serves as a first-pass screening that enables efficient allocation of human review resources to the servers that warrant closer examination.

Continuous Re-Inspection Is Required

Servers that pass inspection today may fail tomorrow due to package updates, dependency changes, or maintainer transitions. Completing a full re-inspection of 3,601 servers in 31.2 hours demonstrates that periodic re-inspection is operationally viable at scale.

In software supply-chain monitoring, continuous monitoring is becoming standard practice alongside point-in-time audits. The MCP ecosystem should adopt the same approach: not just a one-time inspection at deployment, but periodic re-inspection integrated into the organization's security process.

Recommended Actions for Organizations

Based on the findings in this report, we recommend the following:

  1. Mandate pre-deployment inspection: Before integrating any MCP server into organizational systems, conduct automated inspection and review the safety verdict.
  2. Do not rely on download counts as a safety signal: Base deployment decisions on publisher legitimacy and behavioral consistency, not popularity metrics.
  3. Implement periodic re-inspection: Re-inspect deployed servers on a regular schedule to detect changes in safety status.
  4. Define WARN-handling procedures: Establish a clear workflow (investigate → approve/reject → monitor or remove) for servers that receive WARN verdicts.
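The WARN-handling workflow in step 4 can be pinned down as a small state machine so that every WARN verdict reaches an explicit terminal decision. The states and actions below are one illustrative reading of the recommendation, not a prescribed implementation.

```python
# Sketch of the WARN workflow (investigate -> approve/reject ->
# monitor or remove) as an explicit state machine. States and action
# names are illustrative assumptions.
TRANSITIONS = {
    ("warn_received", "investigate"): "under_review",
    ("under_review", "approve"): "monitored",   # accepted, kept under watch
    ("under_review", "reject"): "removed",      # rejected and removed
}

def step(state: str, action: str) -> str:
    """Advance the workflow; reject any transition not explicitly allowed."""
    key = (state, action)
    if key not in TRANSITIONS:
        raise ValueError(f"invalid action {action!r} in state {state!r}")
    return TRANSITIONS[key]
```

Making the workflow explicit prevents WARN verdicts from silently accumulating without an accept-or-reject decision.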

Limitations

  1. Detailed analysis covers 10 servers and is exploratory — it confirms problem patterns, not population-level statistics. Full population-level verdict distribution will be published separately.
  2. Inspection methodology details are not disclosed, in order to avoid enabling inspection evasion.
  3. Individual server information is anonymized; only aggregated data is reported.
  4. This report is a point-in-time snapshot (March 26–27, 2026) and does not track changes over time.
  5. The 10-server sample was selected to demonstrate inspection capabilities and is not a stratified or random sample.

Conclusion

MCP server risk cannot be judged from availability or popularity alone. Through a full 3,601-server inspection and a detailed 10-server analysis, we have demonstrated that automated inspection reveals problems that are otherwise invisible to manual review.

In particular, provenance concerns and declaration-behavior mismatches cannot be detected by reviewing a README or checking download counts. Given that the MCP ecosystem grants AI agents broad operational permissions, pre-deployment inspection and periodic re-inspection are not recommended practices — they are essential ones.

Future work includes publishing the full 3,601-server verdict distribution, tracking changes over time through longitudinal analysis, and expanding the detailed analysis to a larger sample.


MCP Guard evaluates MCP server risk through automated multi-perspective inspection.