Beyond prompt injection: data exfiltration risks in enterprise AI agents

Prompt injection is the entry point. The interesting question is what the agent does next. Four exfiltration patterns appear repeatedly in real enterprise AI agent deployments — each one has an architectural remediation, not a prompt-level one.

Admin · Founder & Engineering Lead · May 19, 2026 · 6 min read

Prompt injection has become the headline LLM risk because it’s the easiest to demonstrate. But in an enterprise AI agent deployment, prompt injection is the entry point, not the consequence. The interesting question is what the agent does next, after an adversary has steered it. Four data-exfiltration patterns appear repeatedly in real enterprise AI agent deployments, and each one has architectural remediations, not just prompt-level ones.

The agent attack surface

An AI agent isn’t a chatbot with extra steps. It’s a process that holds credentials, calls tools, and writes to systems. Every tool the agent can invoke is a potential exfiltration channel. Every system it reads from is a potential source. Every output it generates is a potential covert message. Threat-modeling agents means looking at the full graph of what the agent can do, not just what users ask it to do.

Pattern 1 — Markdown URL exfiltration

A common pattern: the agent reads sensitive data from one system, then produces a response containing a markdown link or image whose URL encodes that data. If the rendering surface fetches images or link previews automatically (most chat UIs do), an attacker’s domain receives the data the moment the response is rendered, with no click required. Variants: image tags, link previews, OG-image fetches, click-tracking pixels.

Remediation: strip or sandbox all outbound URLs in agent responses before rendering. Disable automatic media fetches in the rendering surface. Add an allowlist of domains the agent is permitted to reference.
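A minimal sketch of the allowlist step, assuming agent responses are markdown text. The names `ALLOWED_HOSTS`, `is_allowed`, and `sanitize_markdown` are hypothetical and not part of any framework, and the example hostnames are placeholders. It only covers inline links and images; a production filter would also need to handle raw HTML, reference-style links, and bare URLs.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; replace with the hosts your agent may reference.
ALLOWED_HOSTS = {"docs.example.com", "intranet.example.com"}

# Matches inline markdown links and images: [text](url) and ![alt](url).
MD_LINK = re.compile(r"(!?)\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")


def is_allowed(url: str) -> bool:
    """Allow only https URLs whose exact hostname is on the allowlist."""
    try:
        parsed = urlparse(url)
    except ValueError:
        # Malformed URLs are treated as not allowed.
        return False
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS


def sanitize_markdown(text: str) -> str:
    """Neutralize links and images that point outside the allowlist."""
    def replace(match):
        is_image, label, url = match.group(1), match.group(2), match.group(3)
        if is_allowed(url):
            return match.group(0)
        # Drop blocked images; keep the visible text of blocked links.
        return "[image removed]" if is_image else f"{label} [link removed]"

    return MD_LINK.sub(replace, text)
```

Running this before the response reaches the rendering surface means a blocked URL is never fetched, so any data encoded in it never leaves.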

Pattern 2 — Tool-call abuse

The agent has tools. One of them is send_email or post_to_slack or call_webhook. An attacker who lands a prompt injection asks the agent to compile a summary of customer X’s contract and email it to an external address. The agent does what it was told.

Remediation: every tool that emits data to a destination needs an allowlist. Restrict send_email to verified internal domains. Limit post_to_slack to channels the requesting user already has access to. Require human approval for sensitive tools. Record every parameter of every call in the agent’s audit log.
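One way to sketch such a gate, assuming it sits between the agent and the tool runtime. The `ToolCall` shape, the `check_tool_call` function, the `to` parameter name, and the policy values are all assumptions made for illustration. Only `send_email` and `call_webhook` are modeled here; any tool without an explicit rule is denied.

```python
from dataclasses import dataclass, field

# Hypothetical policy values for illustration.
INTERNAL_EMAIL_DOMAINS = {"example.com"}
APPROVAL_REQUIRED_TOOLS = {"call_webhook"}


@dataclass
class ToolCall:
    user: str                      # requesting user, not the agent's service account
    tool: str                      # e.g. "send_email"
    params: dict = field(default_factory=dict)


def check_tool_call(call: ToolCall) -> str:
    """Return "allow", "deny", or "needs_approval" for a proposed tool call."""
    if call.tool in APPROVAL_REQUIRED_TOOLS:
        return "needs_approval"
    if call.tool == "send_email":
        recipients = call.params.get("to", [])
        # Deny if there are no recipients or any recipient is external.
        if not recipients:
            return "deny"
        for address in recipients:
            domain = address.rsplit("@", 1)[-1].lower()
            if domain not in INTERNAL_EMAIL_DOMAINS:
                return "deny"
        return "allow"
    # Default-deny: tools without an explicit rule are not executed.
    return "deny"
```

The decision is made on the structured call, outside the model, so an injected instruction can change what the agent asks for but not what the gate permits.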

Pattern 3 — Vector-store poisoning

Agents that read from a shared knowledge base inherit whatever was indexed. An adversary with write access to any source — a wiki, a ticketing system, a public forum the agent crawls — can plant content designed to alter the agent’s behavior for future users. Unlike a prompt injected into a single conversation, this attack persists: the planted content stays in the index and is retrieved for every later query that matches it.

Remediation: treat write surfaces feeding the vector store as security-relevant. Require explicit approval before indexing content from low-trust sources. Scan the vector store periodically for adversarial-pattern content. Version the index so you can roll back when a poisoning incident surfaces.
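The approval and scanning steps could be sketched as an ingest gate like the one below. The source names, trust tiers, and the `ingest_decision` function are hypothetical, and the regular expression is a deliberately naive illustration: real adversarial content will often not match simple patterns, so treat the scan as one signal among several.

```python
import re
from dataclasses import dataclass

# Hypothetical trust tiers per source; unlisted sources count as low trust.
SOURCE_TRUST = {"hr-wiki": "high", "support-tickets": "low", "public-forum": "low"}

# Illustrative patterns only, not a complete detector.
SUSPICIOUS = re.compile(
    r"ignore (all |any )?(previous|prior) instructions|you are now|system prompt",
    re.IGNORECASE,
)


@dataclass
class Document:
    source: str
    text: str
    approved: bool = False   # set by a human reviewer


def ingest_decision(doc: Document) -> str:
    """Return "index", "quarantine", or "hold_for_approval"."""
    # Suspicious content is quarantined regardless of source trust.
    if SUSPICIOUS.search(doc.text):
        return "quarantine"
    trust = SOURCE_TRUST.get(doc.source, "low")
    if trust == "low" and not doc.approved:
        return "hold_for_approval"
    return "index"
```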

Pattern 4 — Confused deputy

The classic privilege escalation: a user without permission to read system X asks an agent (which does have that permission) to perform an action requiring a read against X. The agent executes on its own authority and returns the result to the user. From the system’s perspective, the agent was authorized. From the security model’s perspective, the user has just escalated.

Remediation: agents should never operate with privileges greater than the requesting user. Permission scoping has to be enforced at the agent-invocation layer, not inside the prompt. Pass the user’s identity through to every backend call. Use existing IAM primitives — group membership, OAuth scopes, RBAC — rather than building agent-level access control from scratch.
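A minimal sketch of enforcing the requesting user's permissions at the invocation layer. The `USER_SCOPES` table stands in for whatever IAM system is already in place, and the scope strings, user names, and `read_as_user` helper are assumptions made for illustration.

```python
# Hypothetical permission table standing in for your IAM system
# (group membership, OAuth scopes, or RBAC roles).
USER_SCOPES = {
    "alice": {"crm:read", "contracts:read"},
    "bob": {"crm:read"},
}


class PermissionDenied(Exception):
    """Raised when the requesting user lacks the scope a read requires."""


def read_as_user(user: str, required_scope: str, fetch):
    """Run a backend read only if the requesting user holds the scope.

    The agent's own credentials are never consulted here; the decision
    depends solely on the user on whose behalf the agent is acting.
    """
    if required_scope not in USER_SCOPES.get(user, set()):
        raise PermissionDenied(f"{user} lacks {required_scope}")
    return fetch()
```

For example, `read_as_user("bob", "contracts:read", load_contract)` raises `PermissionDenied` even if the agent's service account could read the contract, where `load_contract` is any zero-argument callable that performs the read.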

A control architecture, not a control list

The four remediations share a structure: don’t trust the agent’s output, don’t trust the agent’s tool invocations, don’t trust the agent’s authority. Build a control layer between the agent and the world, and put your existing security primitives — identity, audit, allowlists, human review — in that layer. The agent becomes a powerful but constrained capability, not an authority unto itself.

Designing this control layer is one of the higher-leverage things IDS AI Solutions does in an AI Audit. The full agent governance checklist and reference architecture are part of the deliverable. Talk to our team.

Frequently asked questions

What’s the single highest-leverage agent control?

Permission scoping at invocation time — the agent operates as the requesting user, never as a super-account. This single control kills the confused-deputy pattern entirely and dramatically narrows the blast radius of every other attack pattern. Build it first.

How do we audit agent behavior after deployment?

Log every tool invocation with: requesting user, tool name, full parameters, timestamp, and the trace of prompts that led to the call. Route logs to your existing SIEM. Build a small set of dashboards: tool calls per user per hour, percentage of calls hitting non-allowlisted destinations, percentage of calls that triggered human-approval queues. Anomalies become obvious.
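A sketch of what one such log line and one of the dashboard metrics might look like, assuming JSON lines shipped to the SIEM. The field names and both functions are hypothetical, and `trace_id` is assumed to point at the stored prompt trace rather than embedding it in the record.

```python
import json
from datetime import datetime, timezone


def audit_record(user, tool, params, trace_id, allowlisted, needed_approval):
    """Build one JSON log line for a tool invocation."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "tool": tool,
        "params": params,
        "trace_id": trace_id,                 # assumed pointer to the prompt trace
        "destination_allowlisted": allowlisted,
        "needed_approval": needed_approval,
    }, sort_keys=True)


def pct_non_allowlisted(log_lines):
    """Percentage of logged calls whose destination was not allowlisted."""
    records = [json.loads(line) for line in log_lines]
    if not records:
        return 0.0
    misses = sum(1 for r in records if not r["destination_allowlisted"])
    return 100.0 * misses / len(records)
```

In practice the percentage would more likely be computed by a SIEM query than in application code; the function is only there to pin down what the metric means.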

Should agents be allowed to call external (internet) services?

Only via an allowlist gateway. Default-deny outbound network access for the agent process. Allowlist the specific hostnames the agent legitimately needs. Block unknown URLs in agent responses before they reach the user’s rendering surface to defeat markdown-URL exfiltration.