# Popularity Is Not a Proxy for Lower Risk: What a 10-Server MCP Sample Showed
Across a 10-server MCP sample, download counts and inspection verdicts did not move together. Popularity was useful as a distribution signal, but not as evidence of lower observed risk.
## Terminology
| Term | Meaning |
|---|---|
| Download count | Weekly npm download volume; a distribution metric |
| Inspection verdict | PASS / WARN / BLOCK based on inspection evidence |
| Risk proxy | An indirect signal used instead of direct security evidence |
| Provenance | Whether the publisher and distribution path can be verified |
## Lead
In real deployment decisions, teams often treat download count or GitHub stars as a shortcut for trust. The logic is familiar: if many people use a package, someone would have noticed serious problems by now.
This article examines a smaller but more useful question: when we looked at 10 MCP servers in detail, did popularity actually line up with inspection verdicts? In this sample, it did not. The important implication is not that popularity is meaningless, but that it cannot substitute for direct evidence about risk.
## Key Findings
- The 10-server sample ranged from 24 to 309,704 weekly downloads, a gap of about 12,900x.
- No consistent relationship was observed between download count and inspection verdict.
- A server with 300,000+ weekly downloads still received a WARN verdict.
- Some low-download servers received PASS verdicts.
- Popularity may expand distribution, but it can also increase attacker incentive.
## Dataset
| Item | Value |
|---|---|
| Sample size | 10 MCP servers |
| Observation window | 2026-03-26 to 2026-03-27 |
| Metrics compared | Weekly downloads and final inspection verdict |
| Scope | Exploratory sample, not a formal correlation study |
## What We Actually Observed
The sample showed wide variation in popularity:
| Metric | Value |
|---|---|
| Highest weekly downloads | 309,704 |
| Lowest weekly downloads | 24 |
| Gap (highest / lowest) | ~12,900x |
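The gap figure follows directly from the two endpoints of the sample:

```python
# Ratio between the most- and least-downloaded servers in the sample.
highest = 309_704  # weekly downloads, top server
lowest = 24        # weekly downloads, bottom server

gap = highest / lowest
print(f"{gap:,.0f}x")  # prints "12,904x", reported above as ~12,900x
```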
The corresponding inspection verdicts were:
| Verdict | Count | Share |
|---|---|---|
| PASS | 3 | 30% |
| WARN | 6 | 60% |
| BLOCK | 1 | 10% |
The practical point is simple. High downloads did not map cleanly to lower-risk verdicts, and low downloads did not automatically map to higher-risk ones.
Observed examples:
- A high-download server still landed at WARN.
- Some low-download servers landed at PASS.
- The ranking by downloads did not line up with the ranking by inspection outcome.
That is enough to reject the operational shortcut that "popular" can stand in for "already vetted."
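The mismatch between the two orderings can be made concrete with a small sketch. The server names and numbers below are invented for illustration (the actual sample is anonymized); the point is only that sorting by downloads and sorting by verdict severity need not agree:

```python
# Hypothetical data, not the actual sample: compare the ranking by weekly
# downloads with the ranking implied by inspection verdicts.
servers = {
    "server-a": (309_704, "WARN"),   # high distribution, still flagged
    "server-b": (41_000, "PASS"),
    "server-c": (1_200, "BLOCK"),
    "server-d": (24, "PASS"),        # tiny distribution, clean verdict
}

by_downloads = sorted(servers, key=lambda s: servers[s][0], reverse=True)

severity = {"PASS": 0, "WARN": 1, "BLOCK": 2}
by_risk = sorted(servers, key=lambda s: severity[servers[s][1]])

print(by_downloads)             # most-downloaded first
print(by_risk)                  # lowest-risk verdict first
print(by_downloads == by_risk)  # False: the orderings disagree
```

If popularity were a usable risk proxy, the two lists would largely coincide; here, as in the sample, they do not.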
## Why Popularity Fails as a Risk Proxy

### 1. Downloads measure reach, not review quality
A download count answers "how often was this package installed?" It does not answer:
- Was the package security-reviewed?
- Was its runtime behavior verified?
- Was the publisher identity checked?
- Did users inspect the dangerous parts or just install it in CI and move on?
Distribution and review are different things.
### 2. MCP servers widen the consequence of trust mistakes
For ordinary libraries, compromise usually lives inside an application's existing permission boundary. MCP servers are different because they expose tools and runtime operations to AI agents. That means a trust mistake can translate directly into file access, outbound requests, or command execution.
So even if popularity did offer weak reassurance in a traditional package ecosystem, it is an even weaker proxy in the MCP context.
### 3. Popularity can increase attacker interest
Widely used packages are attractive because compromise scales efficiently. A takeover of a high-distribution package can deliver more impact per unit of effort than compromising a niche tool.
From that angle, popularity is not only unhelpful as a safety label. In some cases, it is part of the attacker's targeting logic.
## What To Evaluate Instead
If download count is not enough, what should teams actually look at?
- Provenance: Is the publisher identity verifiable, and does the distribution path make sense?
- Declaration vs. behavior: Does the server do what it claims to do?
- Historical inspection evidence: Has the server repeatedly passed or required review?
- Dependency health: Are there known issues in the libraries it pulls in?
- Capability profile: Does it combine execution, outbound transmission, or filesystem access in risky ways?
These are not popularity signals. They are direct inputs into deployment risk.
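As a sketch, the signals above can be combined into a simple evidence-based gate. The field names, thresholds, and rules here are illustrative assumptions, not MCP Guard's actual policy; the structural point is that download count never appears among the inputs:

```python
# Illustrative evidence-based gate (names and rules are invented for this
# sketch, not an actual inspection policy).
from dataclasses import dataclass


@dataclass
class Evidence:
    publisher_verified: bool       # provenance
    behavior_matches_claims: bool  # declaration vs. runtime behavior
    prior_verdicts: list[str]      # e.g. ["PASS", "PASS", "WARN"]
    vulnerable_dependencies: int   # known issues in pulled-in libraries
    capabilities: set[str]         # e.g. {"exec", "network", "fs"}


def verdict(e: Evidence) -> str:
    # Execution combined with outbound network access is a risky pairing.
    risky_combo = {"exec", "network"} <= e.capabilities
    if not e.publisher_verified or not e.behavior_matches_claims:
        return "BLOCK"
    if e.vulnerable_dependencies > 0 or risky_combo or "WARN" in e.prior_verdicts:
        return "WARN"
    return "PASS"


print(verdict(Evidence(True, True, ["PASS"], 0, {"fs"})))  # prints "PASS"
```

Note that a server with millions of downloads and an unverifiable publisher would still land at BLOCK under a gate like this, while an obscure server with clean evidence can land at PASS.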
## What This Article Does Not Claim
This article does not prove that popularity and risk are never related in any larger population. The sample is too small for that.
It supports a narrower and more practical conclusion:
- In this 10-server sample, popularity was not a reliable guide to inspection outcome.
- Therefore, popularity should not be used as a deployment shortcut.
That is a strong enough operational conclusion on its own.
## Limitations
- The sample size is 10, so this is not a formal statistical correlation study.
- Only weekly download count was compared; other popularity signals such as stars or forks were not analyzed here.
- The sample is exploratory and not designed as a representative census of the full MCP ecosystem.
- Individual server names are omitted because the purpose is to explain the decision model, not to single out packages.
## Conclusion
Across this 10-server MCP sample, download count did not function as a reliable proxy for lower observed risk. Popularity explained distribution, not inspection outcome.
For real deployment decisions, teams should stop asking "How popular is it?" as the first security question and instead ask "What evidence do we actually have?" Provenance, observed behavior, historical inspection results, dependency health, and capability concentration are the signals that matter.
MCP Guard evaluates MCP servers from direct inspection evidence, not popularity metrics.
