*This is post 4 of 6 in Avertium's "The Trust Problem in Enterprise Security" blog series.*
By Sarah Clarke, Consultant - QSA and AI Architect
On October 20, 2025, AWS US-EAST-1 went down for roughly fifteen hours. The root cause was a DNS resolution failure in the DynamoDB service endpoint, which cascaded into IAM, EC2, Network Load Balancer, and dozens of other services. Netflix, Snapchat, and thousands of e-commerce sites went dark. Downdetector logged over seventeen million outage reports.
This wasn't only an availability event. For every organization whose authentication, data storage, fraud detection, and security telemetry all happen to run on the same hyperscaler, the outage exposed something a procurement matrix doesn't usually capture: The same vendor was supporting a dozen distinct security and compliance dependencies, and when that vendor went down, all of them went with it.
From the assessor side of the table, this is the concentration question I find security and compliance teams least prepared to answer. They can usually tell me how many third-party service providers they use, but they struggle when I ask which of those providers actually carries the weight of the scope.
Most vendor risk programs treat concentration as a procurement metric: How much of our spend goes to one vendor, how many alternatives exist in the market, how locked in we are to a particular technology stack. Those are real questions, just not the security ones.
The security and compliance question is different: For a single critical vendor, how many distinct trust-bearing roles do they hold in your environment? When your cloud provider runs your application hosting, your database, your message queue, your CI/CD, your container registry, your identity broker for service-to-service auth, your secrets manager, and your monitoring telemetry — that's one vendor holding eight roles. Each role represents an attack surface, a failure mode, and often a separate compliance dependency.
When that vendor has an incident, you experience eight failures simultaneously, across systems your scope diagram treats as separate.
PCI DSS 4.0 has two requirements that directly govern third-party service providers (TPSP), and most organizations interpret them more narrowly than the standard intends.
Requirement 12.8 says you must:
Requirement 12.9 places matching obligations on the TPSP side: they must acknowledge in writing their responsibility for cardholder data they store, process, or transmit.
Reading 12.8 strictly is where the gap shows up: It requires documenting which services each TPSP provides. A hyperscaler may be your IaaS provider, your IdP host, your DDoS protection, your DLP backend, your SIEM platform, and your AI service vendor; but the 12.8 list usually shows them once, even though the actual concentration is much higher.
HIPAA's analog is business associate (BA) concentration. When your business associate agreement (BAA) covers a hyperscaler that hosts your electronic health record (EHR), your patient communication app, your billing system, and your AI summarization service, you have a single BA holding four distinct ePHI-touching roles. That consolidation amplifies the compliance obligation: A single incident triggers multiple breach assessments simultaneously.
The AI angle to this is recent enough that most concentration analyses don't include it yet. It also compounds the AI agent scope problem covered in the previous post: Each agent pulls scope on its own, and the model vendors running them are clustering on the same handful of hyperscalers.
In most enterprises, AI capabilities flow through two or three model vendors. The diversity looks real on a procurement chart, but it often disappears at the infrastructure layer. OpenAI runs on Azure, Anthropic on AWS and GCP, Google's models on GCP. When an organization signs contracts with two model vendors while routing the inference through the same hyperscaler, both contracts rest on the same underlying point of failure.
There's a related issue at the model-vendor layer itself. Most AI applications inside an enterprise call out to one or two model endpoints. A model vendor outage, a policy change, or a price shock cascades into every product that depends on it. The blast radius scales with how broadly the model is adopted internally, and few organizations track this the way they would track an outage on a critical SaaS application.
For an AI architect designing for resilience, the practical question is whether the abstraction layer between your applications and your models is real or theatrical. If swapping vendors requires you to rewrite prompts, retrain evaluators, and reconfigure tool definitions across every product, the abstraction wasn't real; it was a single-vendor dependency with a fallback that fails under pressure.
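One way to make that test concrete is to keep application code behind a single narrow interface and exercise the fallback path on purpose. The sketch below is a minimal illustration under stated assumptions: `ChatModel`, `VendorAAdapter`, `VendorBAdapter`, `ModelUnavailable`, and `complete_with_fallback` are hypothetical names, and the adapters return canned strings rather than calling any real vendor SDK.

```python
from typing import Protocol


class ModelUnavailable(Exception):
    """Raised by an adapter when its vendor endpoint cannot serve a request."""


class ChatModel(Protocol):
    """The only surface application code is allowed to depend on."""

    def complete(self, prompt: str) -> str: ...


class VendorAAdapter:
    """Hypothetical adapter; a real one would wrap vendor A's SDK here."""

    def __init__(self, available: bool = True) -> None:
        self.available = available

    def complete(self, prompt: str) -> str:
        if not self.available:
            raise ModelUnavailable("vendor A endpoint is down")
        return f"[vendor-a] {prompt}"


class VendorBAdapter:
    """Hypothetical adapter for a second vendor on different infrastructure."""

    def complete(self, prompt: str) -> str:
        return f"[vendor-b] {prompt}"


def complete_with_fallback(prompt: str, primary: ChatModel, fallback: ChatModel) -> str:
    """Try the primary model; switch to the fallback only on an outage."""
    try:
        return primary.complete(prompt)
    except ModelUnavailable:
        return fallback.complete(prompt)


# Simulate a primary-vendor outage to confirm the fallback path actually runs.
answer = complete_with_fallback(
    "Summarize the incident.", VendorAAdapter(available=False), VendorBAdapter()
)
print(answer)
```

The routing is the easy part. The abstraction is only real if the same prompts, evaluators, and tool definitions produce acceptable results through both adapters, which is something to verify with a shared evaluation suite rather than assume.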
Board-level conversations about vendor risk tend to live in two registers, financial exposure and contractual diversification, and neither captures what I'm describing. A useful third register is concentration of scope.
A simple version of the metric: For each critical service in your environment, list the vendor that provides it. For each vendor, count the distinct critical services they support. The vendor with the highest count is your concentration risk; the number itself is your headline figure.
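As a rough sketch, the simple count fits in a few lines of Python. The service and vendor names below are invented for illustration, and the inventory assumes each critical service maps to exactly one vendor.

```python
from collections import Counter

# Hypothetical inventory: critical service -> vendor that provides it.
service_to_vendor = {
    "application hosting": "CloudVendorA",
    "database": "CloudVendorA",
    "secrets manager": "CloudVendorA",
    "identity broker": "CloudVendorA",
    "payment gateway": "PaymentVendorB",
    "SIEM": "SecurityVendorC",
}


def concentration_counts(inventory):
    """Count the distinct critical services each vendor supports."""
    return Counter(inventory.values())


def headline(inventory):
    """Return (vendor, count) for the vendor supporting the most services."""
    return concentration_counts(inventory).most_common(1)[0]


vendor, count = headline(service_to_vendor)
print(f"{vendor} supports {count} of {len(service_to_vendor)} critical services")
```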
A more rigorous version adds two dimensions. First, weight each service by the scope it sits in (CDE, ePHI environment, sensitive-data zones). A vendor running ten dev-tooling services is a different risk profile than a vendor running five CDE-adjacent services. Second, model blast radius: If the vendor fails for a defined duration, how many of your compliance obligations are temporarily unmet, and what's your time-to-recover for each?
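A hedged sketch of both dimensions follows. Everything specific in it is an assumption made for illustration: the weights in `SCOPE_WEIGHTS`, the `tolerance_hours` field (how long an obligation can go without the service before it counts as unmet), the obligation labels, and the sample services.

```python
from dataclasses import dataclass

# Illustrative scope weights; these values are assumptions, not from any standard.
SCOPE_WEIGHTS = {"CDE": 5, "ePHI": 5, "sensitive": 3, "dev-tooling": 1}


@dataclass(frozen=True)
class Service:
    name: str
    vendor: str
    scope: str                    # key into SCOPE_WEIGHTS
    obligations: tuple[str, ...]  # obligations that depend on this service
    tolerance_hours: float        # outage length the obligation can absorb
    recovery_hours: float         # estimated time-to-recover after failure


def weighted_concentration(services):
    """Sum scope weights per vendor instead of counting services equally."""
    totals = {}
    for s in services:
        totals[s.vendor] = totals.get(s.vendor, 0) + SCOPE_WEIGHTS[s.scope]
    return totals


def blast_radius(services, vendor, outage_hours):
    """Map each obligation left unmet by the outage to its worst recovery time."""
    unmet = {}
    for s in services:
        if s.vendor != vendor or outage_hours <= s.tolerance_hours:
            continue
        for obligation in s.obligations:
            unmet[obligation] = max(unmet.get(obligation, 0.0), s.recovery_hours)
    return unmet


services = [
    Service("ci runner", "VendorX", "dev-tooling", (), 24, 2),
    Service("artifact registry", "VendorX", "dev-tooling", (), 24, 2),
    Service("issue tracker", "VendorX", "dev-tooling", (), 24, 1),
    Service("card vault", "VendorY", "CDE", ("stored-data protection",), 0, 8),
    Service("log pipeline", "VendorY", "CDE", ("audit logging",), 4, 6),
]

print(weighted_concentration(services))
print(blast_radius(services, "VendorY", outage_hours=15))
```

In this made-up inventory VendorX holds more services by raw count, but VendorY carries the heavier weighted score because its services sit in the CDE.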
The output is a single artifact you can take into a board meeting: What percentage of regulated data flows depend on a single vendor, how much of your scope depends on one vendor's continued operation, and which three vendors have the highest cross-scope concentration. That's a conversation a board can act on.
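The board figures can be derived from a list of regulated data flows, sketched here under assumptions: the flow names, scopes, and vendors are hypothetical, and "cross-scope concentration" is interpreted as the number of distinct regulated scopes a vendor appears in.

```python
from collections import defaultdict

# Hypothetical regulated data flows and the vendors each one depends on.
flows = [
    {"name": "card authorization", "scope": "CDE", "vendors": {"CloudA", "ProcessorB"}},
    {"name": "settlement reporting", "scope": "CDE", "vendors": {"CloudA"}},
    {"name": "patient messaging", "scope": "ePHI", "vendors": {"CloudA", "MessagingC"}},
    {"name": "claims export", "scope": "ePHI", "vendors": {"ClearinghouseD"}},
]


def dependency_share(flows):
    """Percentage of regulated flows that depend on each vendor."""
    counts = defaultdict(int)
    for flow in flows:
        for vendor in flow["vendors"]:
            counts[vendor] += 1
    return {vendor: 100 * n / len(flows) for vendor, n in counts.items()}


def top_cross_scope(flows, n=3):
    """Rank vendors by how many distinct regulated scopes they appear in."""
    scopes = defaultdict(set)
    for flow in flows:
        for vendor in flow["vendors"]:
            scopes[vendor].add(flow["scope"])
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(scopes, key=lambda v: (-len(scopes[v]), v))
    return [(v, sorted(scopes[v])) for v in ranked[:n]]


share = dependency_share(flows)
print(f"CloudA supports {share['CloudA']:.0f}% of regulated flows")
print(top_cross_scope(flows))
```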
The architecture and contractual answers run in parallel, because the problem has both dimensions.
A few patterns worth considering:
A vendor concentration exercise. No tooling required.
1. Build a matrix: Down one axis, list the critical services in your environment that:
2. Across the top, list every vendor that supports any of those services
3. Fill in the cells where a vendor provides a service
Now look at the columns. The vendor with the most filled-in cells is your concentration link.
4. Beside their column, write three things:
For most organizations, the headline number is uncomfortable, the recourse documentation is sparse, and the failover hasn't been tested in years. That's the artifact a board should be seeing once a quarter, and the artifact an assessor will eventually ask about under PCI DSS 4.0's tightened third-party provisions.
Building it now is much cheaper than building it during the next outage.
Stay tuned for our fifth post in the series, “Where the Frameworks Fall Short,” which will publish on July 1, 2026 at 10:00 am EST.