APR 2026 · REPORT 0007
CRITICAL · Tool Hijack · Agent · MITRE ATLAS

Tool-Call Chain Privilege Escalation

How agent delegation undermines trust boundaries. We trace privilege accumulating across delegated tool-call chains in multi-agent stacks, and the isolation patterns that contained it.

LogicLeak Research · Published Apr 2026

// AT A GLANCE

01

Privilege escalation through delegated tool chains was reproducible in 11 of 16 multi-agent engagements (69%).

02

In 7 of those, a low-privilege worker agent reached a high-privilege tool it was never directly granted, via an intermediary that brokered the call.

03

The median chain depth at the point of escalation was 3 delegations; the deepest exploited chain was 6.

04

Confused-deputy patterns — an agent acting on another agent's behalf with its own credentials — were present in every reproduced case.

05

Per-step capability scoping, with tools selected by code rather than by model discretion, eliminated the escalation in every environment where the planner applied it.

Multi-agent systems decompose a task across specialized agents (a planner, a researcher, an executor), each holding a slice of the available tools. The intended security property is that each agent wields only the capabilities it was granted. What we kept finding instead is that capabilities accumulate along the delegation chain. A worker with read-only tools asks an intermediary to "complete the workflow," the intermediary holds a write tool, and the write happens on behalf of a caller who was never authorized to perform it. This is the classic confused-deputy problem, re-instantiated at the agent layer where the deputy is a language model with no native concept of the caller's authority.

How Privilege Accumulates

In a single-agent system, the agent's tool set is its privilege ceiling. In a delegated system, the effective ceiling for any request is the union of every tool reachable through the chain it triggers. Teams reason about privilege per-agent; attackers reason about it per-path. The gap is where escalation lives. A planner that can invoke a finance worker, which can invoke a payments tool, gives any input that reaches the planner a path to payments — even if the planner's own tool list contains nothing financial.

The escalation is rarely a single dramatic jump. It is incremental: each delegation looks locally reasonable, and no single agent does anything outside its remit. The privilege boundary is crossed by the composition, which no individual component is positioned to evaluate. Median chain depth at the escalation point was three delegations — shallow enough that the topology fit on a whiteboard, deep enough that no one had threat-modeled the full path.

# Reconstructed delegation trace (sanitized). The support worker holds
# only read tools. Escalation happens because the ops broker re-issues
# the request under its OWN credentials, which include a write tool.

[support_agent]  caps: lookup_account, read_tickets        (read-only)
  └─ delegate → "resolve this billing dispute for the customer"
[ops_broker]     caps: lookup_account, adjust_balance       (read + WRITE)
     └─ interprets "resolve" → calls adjust_balance(+$500)
        # the support agent could never call adjust_balance directly;
        # the broker performs it on the worker's behalf, no re-auth.

# Effective privilege of any input reaching support_agent
#   = union(support_agent.caps, ops_broker.caps) = includes adjust_balance
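
The union in the trace can be computed mechanically. The following is a minimal Python sketch, assuming the delegation topology is available as plain dictionaries; the agent and capability names come from the trace, while the data layout and the `effective_caps` helper are hypothetical and not part of any framework we tested.

```python
# Hypothetical model of the trace above. Agent names and capabilities are
# taken from the trace; the dictionary layout is an assumption.
AGENT_CAPS = {
    "support_agent": {"lookup_account", "read_tickets"},
    "ops_broker": {"lookup_account", "adjust_balance"},
}
DELEGATES_TO = {
    "support_agent": ["ops_broker"],
    "ops_broker": [],
}


def effective_caps(entry, caps=AGENT_CAPS, edges=DELEGATES_TO):
    """Return the union of capabilities of every agent reachable from `entry`."""
    seen, stack, reachable = set(), [entry], set()
    while stack:
        agent = stack.pop()
        if agent in seen:
            continue  # already visited; also guards against delegation cycles
        seen.add(agent)
        reachable |= caps.get(agent, set())
        stack.extend(edges.get(agent, []))
    return reachable


granted = AGENT_CAPS["support_agent"]
effective = effective_caps("support_agent")
escalated = effective - granted  # capabilities reachable but never granted
print(sorted(escalated))  # ['adjust_balance']
```

Any non-empty difference between the effective and granted sets marks a capability that an input to the entry agent can reach only through delegation.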

// BREACH

Incident reference ESC-2026-022: A customer-support stack delegated "dispute resolution" from a read-only front-line agent to an operations broker that held a balance-adjustment tool. A crafted dispute narrative steered the broker into issuing repeated credits. Because the broker acted under its own service credentials, the front-line agent's read-only scope was never violated and per-agent access reviews showed nothing wrong. Fourteen unauthorized credits cleared before reconciliation flagged the pattern.

Why Per-Agent Review Misses It

Every escalation we reproduced passed a per-agent access review. That is the core finding: the standard review unit is the wrong unit. Each agent's tool grant was defensible in isolation. The vulnerability is a property of the delegation graph, and nothing in the typical review process examines the transitive closure of what one agent can cause another to do. Confused-deputy conditions — an intermediary acting with its own authority on behalf of a less-privileged caller — were present in 100% of the reproduced cases.

// WARNING

Maps to OWASP LLM06 (excessive agency). The excess is not in any one agent's permissions — it is in the system's willingness to let authority flow across delegation boundaries without re-evaluation. An agent that can ask another agent to act has, in effect, the second agent's permissions, and your access model should say so explicitly.

Detection & Mitigation

First, propagate the caller's authority, not the deputy's. When an agent acts on behalf of another, the consequential tool call should be authorized against the originating caller's scope, not the broker's service credentials. This collapses the confused-deputy path: a read-only origin cannot trigger a write no matter how many intermediaries it routes through. Pass a scoped, attenuated capability token down the chain rather than letting each hop fall back to its own privileges.
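
One way to picture an attenuated token is as a scope that each hop can narrow but never widen. The sketch below is illustrative only: `CapabilityToken`, `attenuate`, and `call_tool` are hypothetical names, and a production design would also need signing and expiry, which are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityToken:
    """Hypothetical token carrying the ORIGINATING caller's scope."""
    origin: str
    allowed: frozenset

    def attenuate(self, requested):
        # Intersection only: a hop can drop capabilities, never add them.
        return CapabilityToken(self.origin, self.allowed & frozenset(requested))


def call_tool(tool, token):
    """Authorize against the token, not the deputy's own credentials."""
    if tool not in token.allowed:
        raise PermissionError(f"{token.origin} is not authorized for {tool}")
    return f"{tool} executed for {token.origin}"


origin_token = CapabilityToken(
    "support_agent", frozenset({"lookup_account", "read_tickets"})
)
# The broker asks for its usual tools, but only the overlap with the
# origin's scope survives the hop.
broker_token = origin_token.attenuate({"lookup_account", "adjust_balance"})

print(call_tool("lookup_account", broker_token))
try:
    call_tool("adjust_balance", broker_token)
except PermissionError as exc:
    print("denied:", exc)
```

Because the check runs against the token rather than the broker's service identity, adding more intermediaries can only shrink the reachable tool set.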

Second, select tools by code, not by model discretion, at each step. In every environment where the planner exposed only the tools relevant to the current sub-task — chosen programmatically — the escalation disappeared. An agent resolving a read query never has a write tool in its context to be steered toward.
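
Programmatic selection can be as simple as a static allowlist keyed by sub-task type. This is a sketch under two assumptions: the sub-task label is assigned by trusted routing code rather than by the model, and the table `TOOLS_BY_SUBTASK` and helper `tools_for_step` are hypothetical names chosen for illustration.

```python
# Hypothetical allowlist: which tools each sub-task type may see.
TOOLS_BY_SUBTASK = {
    "account_lookup": {"lookup_account"},
    "ticket_review": {"lookup_account", "read_tickets"},
    "balance_adjustment": {"lookup_account", "adjust_balance"},
}


def tools_for_step(subtask, agent_caps):
    """Return the tools to expose in the model's context for this step."""
    allowed = TOOLS_BY_SUBTASK.get(subtask)
    if allowed is None:
        # Fail closed: an unrecognized sub-task exposes no tools at all.
        raise ValueError(f"unknown sub-task: {subtask}")
    # Never expose more than the agent holds, even if the table says so.
    return sorted(allowed & set(agent_caps))


broker_caps = {"lookup_account", "adjust_balance"}
print(tools_for_step("account_lookup", broker_caps))      # ['lookup_account']
print(tools_for_step("balance_adjustment", broker_caps))  # ['adjust_balance', 'lookup_account']
```

The write tool is absent from the context during a lookup step, so there is nothing for a crafted input to steer the model toward.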

Third, threat-model the delegation graph, not the agents. Enumerate every reachable path and compute the transitive tool closure for each entry point. Treat any path where an untrusted-input entry point reaches a state-modifying tool as a finding, regardless of how many hops separate them.
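
A graph-level review can report the concrete path behind each finding, not just the fact that one exists. The sketch below assumes a hypothetical topology loosely modeled on the planner, finance worker, and payments example earlier in this report; every agent name, tool name, and the `find_escalation_paths` helper are illustrative assumptions.

```python
from collections import deque

# Hypothetical delegation graph and tool grants.
DELEGATES_TO = {
    "planner": ["finance_worker", "research_worker"],
    "finance_worker": ["payments_broker"],
    "research_worker": [],
    "payments_broker": [],
}
AGENT_TOOLS = {
    "planner": set(),
    "finance_worker": {"read_ledger"},
    "research_worker": {"web_search"},
    "payments_broker": {"send_payment"},
}
STATE_MODIFYING = {"send_payment"}


def find_escalation_paths(entry):
    """List (tool, path) for every simple path from `entry` to a state-modifying tool."""
    findings = []
    queue = deque([[entry]])
    while queue:
        path = queue.popleft()
        agent = path[-1]
        for tool in sorted(AGENT_TOOLS.get(agent, set()) & STATE_MODIFYING):
            findings.append((tool, path))
        for nxt in DELEGATES_TO.get(agent, []):
            if nxt not in path:  # skip cycles so each path stays simple
                queue.append(path + [nxt])
    return findings


for tool, path in find_escalation_paths("planner"):
    print(f"FINDING: {' -> '.join(path)} reaches {tool}")
```

Enumerating every simple path can grow quickly on dense graphs, so a large deployment might record only the shortest path per entry point and tool.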

Fourth, log the full delegation trace — caller, deputy, capability used, and triggering context — for every tool call. The support incident above was only reconstructable because credit adjustments were tied back to the originating dispute. Treat these traces with the retention and integrity controls you apply to financial transaction logs.
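
A trace record needs the originating caller, the deputy that executed the call, the capability used, and the triggering context. The sketch below adds a hash chain as one possible integrity control; that choice, the field names, and the `append_trace` and `verify` helpers are assumptions for illustration, and timestamps are omitted to keep the example deterministic.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record


def _digest(body):
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_trace(log, chain, capability, context):
    """Append one tool-call record, linked to the previous record's hash."""
    record = {
        "origin": chain[0],      # who started the request
        "deputy": chain[-1],     # who actually executed the tool
        "chain": list(chain),
        "capability": capability,
        "context": context,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    record["hash"] = _digest(record)  # computed before the hash field exists
    log.append(record)
    return record


def verify(log):
    """Return True if no record was altered, removed, or reordered."""
    prev_hash = GENESIS
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev_hash"] != prev_hash or _digest(body) != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True


trace_log = []
append_trace(trace_log, ["support_agent", "ops_broker"], "adjust_balance", "dispute-1")
append_trace(trace_log, ["support_agent", "ops_broker"], "adjust_balance", "dispute-2")
print(verify(trace_log))  # True
```

Altering any field of an earlier record changes its digest and breaks every later link, which makes silent edits detectable during review.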

// NOTE

Audit question for your stack today: for each tool that can modify state, move money, or communicate externally, can you list every entry point that can reach it through delegation? If you can only answer per-agent, you are reviewing the wrong unit and the escalation path is unmonitored.

// METHODOLOGY & DISCLOSURE

Findings are drawn from 16 engagements with multi-agent or agent-orchestration deployments between October 2025 and March 2026, mapped against MITRE ATLAS (AML.T0053, tool/plugin abuse) and OWASP LLM Top 10 entries LLM06 (excessive agency) and LLM08 (vector/embedding weaknesses where retrieval fed the chain). Examples are sanitized per our disclosure policy: framework names, credential formats, and client-identifying topology are removed; delegation patterns and escalation mechanics are preserved. Chain-depth and reproduction figures come from controlled retests in mirrored staging environments.
