03: When AI Agents Pull Themselves Into Scope

Written by Sarah Clarke | Jun 17, 2026 2:00:02 PM

This is post 3 of 6 in Avertium's "The Trust Problem in Enterprise Security" blog series.

By Sarah Clarke, Consultant - QSA and AI Architect

In February, Microsoft pushed a worldwide configuration update for Microsoft 365 Copilot after finding that the chat feature could summarize emails labeled confidential. The labels existed and the DLP policies were in place. Microsoft's statement said access controls and data protection policies remained intact, while acknowledging the behavior was "not what we intended." A separate January incident, tracked internally as CW1226324, involved Copilot's work tab returning content that DLP rules had been configured to block.

Both incidents have the same shape. The labels were the scope boundary an organization had drawn, and the agent didn't honor them.

Vulnerability scanners can't catch this failure mode, and scoping diagrams don't capture it either, because the agent isn't a system. It's an identity that operates on top of every other system it touches. From the assessor's side of the table, this is where the scope question gets interesting.

what "connecting" an AI agent actually means

When a user enables Copilot in their tenant, or when a team wires an AI agent into Slack, Salesforce, or a code repository, the act looks like a checkbox or an OAuth consent screen. Underneath, the system issues a credential (a service principal, an OAuth token, or an API key) that grants the agent some scope of access on behalf of a user or service. That credential is almost always issued by the IdP. The previous post named the IdP as the attack surface; everything in that post about IdP compromise applies here too.

The agent then operates with that access continuously, at search speed, across every file, message, record, and table the credential reaches.

That last detail is the one most permission models were not designed for. Human users access data one file at a time, with context and intent. Agents access it in bulk: they retrieve by similarity, recombine across sources, and summarize into outputs that no policy on the source data anticipated.

The January and February Copilot incidents illustrate this. The label said "confidential," and the policy said to exclude that content from automated processing. The agent's effective access didn't honor that boundary because it read the data through a credential issued for a different purpose, and the label was metadata the retrieval pipeline didn't consistently enforce.
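One way to read these incidents is that label enforcement has to live in the retrieval path, not only in a policy engine beside it. The sketch below illustrates that idea under stated assumptions; it is not a description of how Copilot works internally. It assumes each candidate document carries a sensitivity-label string and uses a hypothetical per-agent allowlist (`ALLOWED_LABELS`). Labels are checked before ranking and truncation, and anything unlabeled or unrecognized is denied by default.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    label: str  # sensitivity label; "" means unlabeled (illustrative values)


# Hypothetical allowlist: labels this agent is approved to pull into its context.
ALLOWED_LABELS = frozenset({"Public", "General"})


def retrieve(
    candidates: Iterable[Tuple[float, Document]],
    allowed_labels: frozenset = ALLOWED_LABELS,
    top_k: int = 3,
) -> Tuple[List[Document], List[str]]:
    """Filter search hits by label BEFORE ranking, so excluded content never
    reaches the model's context window.

    Returns (context documents, ids of blocked documents for the audit log).
    Unlabeled or unknown labels fail closed because they are not in the allowlist.
    """
    pairs = list(candidates)  # materialize once; the input may be a generator
    permitted = [(score, doc) for score, doc in pairs if doc.label in allowed_labels]
    blocked = [doc.doc_id for _, doc in pairs if doc.label not in allowed_labels]
    permitted.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [doc for _, doc in permitted[:top_k]], blocked
```

Returning the blocked IDs, rather than silently dropping them, gives a detection pipeline something to alert on when an agent keeps reaching for content its labels exclude.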

scope creep at the speed of OAuth

Every connector, data source, and tool you give an agent is a scope decision. Almost nobody is treating it that way.

Under PCI DSS 4.0, anything that stores, processes, or transmits cardholder data is in scope. That includes AI agents. If a Copilot tenant can read a SharePoint library containing cardholder data, the Copilot tenant is in scope. A custom-built agent with retrieval permissions over a database holding tokenized card data pulls the retrieval pipeline in too. The model provider, the orchestration layer, and any logging that captures prompts with scoped data all land in the assessment.

HIPAA reaches similar conclusions through different language. An agent that reads ePHI is processing ePHI by HIPAA's definition, and the infrastructure running it inherits the same status. Depending on the deployment, the model vendor may also be a business associate. None of this requires the agent to do anything exotic; access to data that's already regulated is enough.

A word of caution: The "I just enabled it for productivity" defense doesn't survive an audit. The scope is whatever data the credential can reach, regardless of what the user thought they were enabling.
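Because scope follows the data, an assessor effectively traces where regulated data can flow and pulls in everything along the path. A minimal sketch of that trace, assuming you can list which components receive data from which (every component name below is hypothetical), is a breadth-first search from the data stores that hold cardholder data. It models data flow only; PCI DSS scoping also covers systems that connect to, or can affect the security of, the cardholder data environment, which this sketch deliberately leaves out.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def in_scope(data_flows: Dict[str, List[str]], regulated_sources: Iterable[str]) -> Set[str]:
    """Return every component that regulated data can reach from the given sources.

    data_flows maps each component to the components that receive data from it
    (a reader, a model endpoint, a prompt log, ...). Cycles are handled.
    """
    scoped = set(regulated_sources)
    queue = deque(scoped)
    while queue:
        component = queue.popleft()
        for downstream in data_flows.get(component, []):
            if downstream not in scoped:
                scoped.add(downstream)
                queue.append(downstream)
    return scoped


# Hypothetical environment: one SharePoint library holds cardholder data.
flows = {
    "sharepoint:finance-lib": ["copilot"],
    "sharepoint:marketing-lib": ["copilot"],
    "copilot": ["orchestrator"],
    "orchestrator": ["model-provider", "prompt-log"],
    "model-provider": ["orchestrator"],
    "hr-db": ["hr-bot"],
}

if __name__ == "__main__":
    print(sorted(in_scope(flows, {"sharepoint:finance-lib"})))
    # -> ['copilot', 'model-provider', 'orchestrator', 'prompt-log', 'sharepoint:finance-lib']
```

The marketing library feeds the same Copilot tenant but stays out of scope, because data flows from it rather than into it. The model provider and the prompt log, by contrast, land in scope exactly as the PCI example above describes.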

what an assessor will actually ask

Here's a list of questions I find myself asking in engagements right now. Most environments can't answer them confidently:

Which AI agents exist in this environment, and who owns each one?
For each agent, what credential does it hold, and what data sources does that credential reach?
Of the data sources the agent can reach, which contain regulated data, and how was that determination made?
When the agent processes regulated data, where do its outputs go, and are those outputs themselves treated as containing regulated data?
Is there a log granular enough to reconstruct which agent accessed which records, at what time, on whose behalf, and with what prompt?
When access was granted, what was the documented business justification, and who approved it?

The last one matters more than it looks. PCI DSS 4.0 has steadily tightened its expectations for documented access justification and approval, and the same principle is showing up in proposed updates to the HIPAA Security Rule and in mature SOC 2 reports. Part of an assessor's job is to verify that every access grant has a documented justification. If nobody can produce one, that's the finding, even if the agent never did anything wrong.
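Of the questions above, the logging one maps most directly to a data structure. A minimal sketch, assuming your platform can emit one event per record an agent retrieves (the `AgentAccessEvent` fields are hypothetical and not any vendor's log schema), shows what "granular enough" means in practice: each event answers which agent, which record, when, on whose behalf, and with what prompt, and a simple check flags events that can't.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Sequence


@dataclass(frozen=True)
class AgentAccessEvent:
    agent_id: str         # the agent's own identity, not the user's
    record_id: str        # the specific record or file retrieved
    accessed_at: datetime
    on_behalf_of: str     # requesting user, or the agent_id for autonomous runs
    prompt: str           # the prompt that triggered the retrieval


REQUIRED_FIELDS = ("agent_id", "record_id", "on_behalf_of", "prompt")


def who_touched(events: Sequence[AgentAccessEvent], record_id: str,
                start: datetime, end: datetime) -> List[AgentAccessEvent]:
    """Reconstruct every agent access to one record in the window [start, end)."""
    return [e for e in events if e.record_id == record_id and start <= e.accessed_at < end]


def missing_fields(event: AgentAccessEvent) -> List[str]:
    """Names of required fields that are empty, i.e. questions this event can't answer."""
    return [name for name in REQUIRED_FIELDS if not getattr(event, name)]


if __name__ == "__main__":
    day = (datetime(2026, 1, 5, tzinfo=timezone.utc), datetime(2026, 1, 6, tzinfo=timezone.utc))
    events = [
        AgentAccessEvent("copilot-finance", "rec-42", datetime(2026, 1, 5, 9, tzinfo=timezone.utc),
                         "alice@example.com", "summarize Q4 refunds"),
        AgentAccessEvent("custom-rag", "rec-42", datetime(2026, 1, 5, 11, tzinfo=timezone.utc), "", ""),
    ]
    for e in who_touched(events, "rec-42", *day):
        print(e.agent_id, missing_fields(e))
```

An event with an empty `on_behalf_of` or `prompt` is the kind of gap that turns a reconstruction request into a finding.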

what compensating architecture looks like

The architecture problem here is that agents don't fit neatly into either the "user" or "system" buckets that most access models assume. They behave like users when reading and recombining data, but they hold the kind of continuous, scaled access that's usually granted only to systems. A few patterns are worth building toward:

Treat the agent as a third-party processor, not as a user. The user-like interface invites teams to grant user-like permissions, but the behavior is closer to a data processor running at machine speed. Permissions should be scoped to the processing job, with explicit data classifications attached.
Give each agent its own identity rather than letting it impersonate users. When an agent acts "on behalf of" a user, the audit trail collapses. An agent with its own identity and scoped permissions is much easier to govern and to retire.
Constrain retrieval, not just outputs. Output DLP catches problems after the data has already been read. The retrieval side (what the agent can index, search, and pull into context) is the actual scope boundary.
Verify that label and policy enforcement actually applies to agents. The Copilot incidents are useful as red-team material: assume the agent will sometimes read content the labels said it shouldn't, and design detection that would notice.
Document the data-access justification at agent provisioning time. This is the unglamorous control that audits care about most. A short artifact recording what the agent does, what it reads, what it produces, and who approved it answers a category of question that's otherwise unanswerable.

Most of this is the same identity, classification, and audit discipline that mature programs apply to service accounts and integration users. The AI layer is simply where that discipline is currently weakest, and where new identities are being created faster than anywhere else.
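The pattern of verifying that label enforcement actually applies to agents implies a detective control alongside any preventive one. The sketch below assumes access logs can be reduced to `(agent_id, record_id, label)` tuples and that each inventoried agent has an approved label set; both inputs are hypothetical. Agents missing from the inventory get an empty allowlist, so everything they read is flagged, which surfaces unregistered agents as well as label bypasses.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, List, Tuple

AccessTuple = Tuple[str, str, str]  # (agent_id, record_id, sensitivity label)


def label_violations(events: Iterable[AccessTuple],
                     approved: Dict[str, FrozenSet[str]]) -> List[AccessTuple]:
    """Return accesses where the record's label is outside the agent's approved set.

    Agents absent from `approved` get an empty set, so all of their reads are flagged.
    """
    return [(agent, record, label) for agent, record, label in events
            if label not in approved.get(agent, frozenset())]


def violations_by_agent(violations: List[AccessTuple]) -> Counter:
    """Count violations per agent to prioritize triage."""
    return Counter(agent for agent, _, _ in violations)


if __name__ == "__main__":
    approved = {"copilot-finance": frozenset({"General", "Finance-Internal"})}
    log = [
        ("copilot-finance", "rec-1", "General"),
        ("copilot-finance", "rec-2", "Confidential"),
        ("shadow-bot", "rec-3", "General"),
    ]
    print(violations_by_agent(label_violations(log, approved)))
```

Treating an unknown agent as having no approved labels is a design choice: it makes the inventory itself part of the detection logic, so an agent nobody registered shows up the first time it reads anything.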

an exercise for you: what to do this week

Here’s an AI-agent scope exercise. No tooling required.

1. Build a list of every AI agent or AI-enabled feature that has access to organizational data, for example:

Microsoft 365 Copilot
GitHub Copilot
Salesforce Einstein
Custom agents built on Copilot Studio or vendor platforms
Internal LLM apps with tool access
Third-party assistants connected via OAuth

2. Draw four columns

3. For each AI agent, write four things:

The identity it authenticates as
The data sources its credential reaches
Whether any of those sources contain regulated data
The date and approver of the access grant

Most organizations will discover that the fourth column is empty for almost every entry. The agents got enabled but nobody wrote down why, and the scope of regulated data they touch grew without anyone making a deliberate decision about it.

That's the artifact an assessor will eventually ask for. Building it before someone asks is much easier than building it under audit conditions.
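If the inventory outgrows a spreadsheet, the same four columns translate directly into a record type. The sketch below is one hypothetical way to hold them in Python (field names are illustrative), with a check that lists entries whose fourth column is incomplete and puts agents that reach regulated data at the top, since those are the gaps an assessor is likely to ask about first.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class AgentEntry:
    name: str
    identity: str                        # column 1: what it authenticates as
    data_sources: List[str]              # column 2: what its credential reaches
    regulated_sources: List[str]         # column 3: which of those hold regulated data
    grant_date: Optional[date] = None    # column 4: when access was granted...
    approver: Optional[str] = None       # ...and who approved it


def undocumented(inventory: List[AgentEntry]) -> List[AgentEntry]:
    """Entries missing a grant date or approver, regulated-data agents first."""
    gaps = [e for e in inventory if e.grant_date is None or not e.approver]
    # sorted() is stable: False (has regulated sources) sorts before True.
    return sorted(gaps, key=lambda e: not e.regulated_sources)


if __name__ == "__main__":
    inventory = [
        AgentEntry("GitHub Copilot", "app:github-copilot", ["repo:web"], []),
        AgentEntry("Microsoft 365 Copilot", "tenant:copilot", ["sp:finance", "mail"], ["sp:finance"]),
        AgentEntry("Salesforce Einstein", "int:einstein", ["sf:accounts"], [],
                   date(2026, 3, 2), "j.doe"),
    ]
    for entry in undocumented(inventory):
        print(entry.name)
```

With this sample data, Microsoft 365 Copilot prints before GitHub Copilot because it reaches a regulated source, and Salesforce Einstein does not print because its fourth column is complete.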

Stay tuned for our fourth post in the series, "The Vendor Concentration Risk No One Models," which will publish on June 24, 2026 at 10:00 am ET.
