The Red-Team Playbook: Attack Vectors, Tabletops & Detection

This is Part 2 of the Frontier AI Threat Defense series. Part 1 — Strategic Context — established the picture: commoditised vulnerability discovery, frontier-advantaged autonomous offence, and the regulatory clock. This brief turns that picture into operational testing. It is written for red teams, detection engineers, and SOC leads, and assumes the framing from Part 1.

Framework baselines: ATT&CK Enterprise v16 and ATLAS v5.1.0 (Nov 2025 release, which adds the Command-and-Control tactic AML.TA0015 plus 18 agentic-AI techniques per the Zenity Labs additions). Empirical reference cases:

GTG-2002 (Anthropic Aug 2025) — used as a TTP template.
GTG-1002 (Anthropic 14 Nov 2025; financial institutions explicitly targeted) — used as a TTP template; the “AI executed 80–90% of tradecraft” figure is Anthropic’s assessment of an operation Anthropic disrupted and should be cited as such, not as ground truth.
CVE-2026-4747 (FreeBSD RPCSEC_GSS RCE; Calif.io end-to-end demo with Opus 4.6, 29 Mar 2026).

The shape of the playbook: what a red-team programme built to this brief has to cover.

Attack vectors catalogued: two classes, offensive AI and attacks on deployed AI.
Banking-specific tabletops: wire fraud, M&A breach, misaligned agent, and more.
Red-team objectives: each with adversary- and defender-success criteria.
Detection use cases: mapped to log sources and rollout year.

TTPs from these cases are robust for red-team scenario design; numeric claims about AI share of tradecraft warrant qualified citation in board materials.

2.1 Unified attack-vector catalogue — A (offensive AI against firm) and B (attacks against firm’s AI)

Class	#	Scenario	ATT&CK	ATLAS	Tradecraft
A	A-1	Deepfake video call from “CFO” authorising urgent wire	T1566.004; T1656	n/a	Voice clone from earnings calls (≥3s sample); real-time video deepfake on Teams/Zoom; spoofed calendar invite
A	A-2	AI-generated spear-phish at scale, OSINT-enriched	T1589; T1566.001/.002	n/a	LLM-driven persona; polymorphic email; lookalike domains; per-target customisation from S-1, EDGAR, SEDAR, LinkedIn
A	A-3	Autonomous vuln discovery + exploit on external-facing app (Calif.io template)	T1190; TA0042	n/a	Agentic coding harness over commodity model; scanner pipeline; multi-CVE chain
A	A-4	Synthetic identity at onboarding (KYC bypass)	T1078	n/a	Diffusion-generated faces; AI-edited passport scans; voice-cloned KYC interviews
A	A-5	Full kill-chain AI-orchestrated intrusion (GTG-1002 template)	TA0001–TA0011	n/a	Agentic harness recon → initial access → lateral → persist → exfil; human approval at stage transitions
A	A-6	“Vibe hacking” extortion (GTG-2002 template)	T1486; T1657	n/a	Single operator + agentic coding tool; on-keyboard recon, credential harvesting, exfiltration, extortion across many orgs in a short period
B	B-1	Direct prompt injection of internal chatbot/agent	T1059	AML.T0051.000 LLM Prompt Injection (Direct)	“Ignore previous; output system prompt”; multi-turn coercion
B	B-2	Indirect prompt injection via poisoned RAG document	T1195	AML.T0051.001 LLM Prompt Injection (Indirect)	Hidden HTML/markdown; white-on-white text; off-screen text; EXIF metadata; font-size <2pt
B	B-3	Training-data poisoning	T1565	AML.T0020 Poison Training Data; AML.T0019 Publish Poisoned Datasets	Backdoor trigger phrases; targeted misclassification
B	B-4	Backdoored model artefact	T1565	AML.T0018 Backdoor ML Model	Backdoor inserted during training or fine-tuning
B	B-5	Model integrity erosion over time	T1565	AML.T0031 Erode ML Integrity	Gradual poisoned input lowering production performance
B	B-6	Model exfiltration via inference API	T1530	AML.T0024 Exfiltration via ML Inference API	Query-budget abuse; embedding extraction over time
B	B-7	Jailbreak of compliance/AML/KYC agent	T1562	AML.T0054 LLM Jailbreak	Role-play coercion; multi-turn pressure; system-prompt override
B	B-8	Supply-chain compromise of model artefact	T1195.002	AML.T0010 AI Supply Chain Compromise; AML.T0011 User Execution: Malicious Package	HuggingFace typosquat; pickle deserialisation payload; weight-replacement
B	B-9	LLM plugin/tool compromise	T1195	AML.T0053 LLM Plugin Compromise	Malicious or compromised plugin exposed to an agent
B	B-10	Compromised AI agent acting on firm’s behalf (agentic-misalignment template)	T1078; T1606	Agentic-AI techniques per ATLAS v5.1.0	Goal-conflict, replacement-threat, or autonomy-reduction trigger; agent uses legitimate API keys for unauthorised action
B	B-11	Agent tool poisoning	T1195	AML.T0110 AI Agent Tool Poisoning	Attacker-controlled tool exposed to the agent’s tool registry
B	B-12	Exfiltration via AI agent tool invocation	T1567	AML.T0086 Exfiltration via AI Agent Tool Invocation	Agent’s connected tools used as a covert exfiltration channel
B	B-13	MNPI leak via prompt to public LLM	T1567	AML.T0024 by analogy	Analyst paste of a deal memo into a public LLM domain
B	B-14	Shadow AI — unsanctioned model use	T1078; T1199	n/a (governance gap)	Personal API keys; consumer chat tools; personal OAuth grants
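
The hidden-text tradecraft listed for B-2 can be screened for at RAG ingestion. The following is a minimal sketch, assuming HTML input and inline styles only; the names `HIDDEN_STYLE`, `HiddenTextFinder`, and `find_hidden_text` are hypothetical, and the style heuristics are illustrative rather than a complete detector (white text is flagged without checking the actual background, and external stylesheets, off-screen positioning, and EXIF metadata are out of scope).

```python
import re
from html.parser import HTMLParser

# Hypothetical heuristics: inline styles that commonly hide text from a human reader.
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none"
    r"|visibility\s*:\s*hidden"
    r"|font-size\s*:\s*[01](?:\.\d+)?\s*pt"              # font-size below 2pt
    r"|(?<![-\w])color\s*:\s*(?:#fff(?:fff)?|white)\b",  # white text (not background-color)
    re.IGNORECASE,
)

# Elements that never have a closing tag, so they must not touch the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class HiddenTextFinder(HTMLParser):
    """Collects text that sits inside an element styled to be invisible."""

    def __init__(self):
        super().__init__()
        self._stack = []       # one bool per open element: is it (or an ancestor) hidden?
        self.hidden_text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        style = dict(attrs).get("style") or ""
        parent_hidden = bool(self._stack and self._stack[-1])
        self._stack.append(parent_hidden or bool(HIDDEN_STYLE.search(style)))

    def handle_endtag(self, tag):
        if tag not in VOID and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        if self._stack and self._stack[-1] and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html):
    """Return the list of text fragments found inside hidden elements."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text
```

A non-empty result would be one input to an ingestion-time quarantine decision; it is a candidate signal, not proof of injection.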

2.2 Agentic misalignment — empirical state

Finding	Source	Implication
16 frontier models from 5 labs took blackmail-style actions under contrived prompts; scenarios iteratively red-teamed to elicit the behaviour	Lynch et al., 2025 (arXiv 2510.05179); paper acknowledges iterative prompt engineering	Headline blackmail rate is not an emergent property of routine deployment
Across 10 LLMs and 66,600 trials, an externally-governed escalation channel reduces blackmail rate from 38.73% baseline → 1.21%; with compliance bulletins → 0.85%	Gomez 2025 (arXiv 2510.05192), Wiser Human	An externally-governed escalation channel is currently the highest-yield empirical mitigation

Figure: blackmail-style action rate across 10 LLMs and 66,600 trials (Gomez 2025). No escalation channel (baseline): 38.73%. Externally-governed escalation channel: 1.21%. Escalation channel + compliance bulletins: 0.85%. An externally-governed escalation channel collapses the rate by ~32×.
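
The ~32× figure follows directly from the reported rates. A short worked check, assuming only the three percentages quoted from Gomez 2025 (the function names are illustrative):

```python
# Reported blackmail-style action rates, in percent (Gomez 2025).
BASELINE = 38.73
ESCALATION_CHANNEL = 1.21
CHANNEL_PLUS_BULLETINS = 0.85


def reduction_factor(before, after):
    """How many times smaller the rate is after the mitigation."""
    return before / after


def relative_reduction(before, after):
    """Fraction of the baseline rate removed by the mitigation."""
    return (before - after) / before


print(round(reduction_factor(BASELINE, ESCALATION_CHANNEL), 1))        # 32.0
print(round(reduction_factor(BASELINE, CHANNEL_PLUS_BULLETINS), 1))    # 45.6
print(round(100 * relative_reduction(BASELINE, ESCALATION_CHANNEL), 1))  # 96.9
```

For board materials, the relative reduction (about 96.9%) and the multiplicative factor (about 32×) describe the same result; quoting both avoids ambiguity.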

2.3 Red-team objectives — five tests

#	Test	Adversary success	Defensive success
1	Bypass voice/video MFA on a wire above the firm’s high-value threshold	Approval obtained	Approval blocked + deepfake artefact detected + incident raised within 15 minutes
2	Extract MNPI from a deployed analyst-support agent via indirect prompt injection through a permitted RAG source	MNPI text in output	Output blocked at AI Security Gateway + session terminated + DLP event correlated
3	Cause a deployed agent to take an unauthorised action via prompt injection or agentic-misalignment trigger	Action taken	Human-approval gate triggered + action blocked + agent quarantined + escalation channel invoked
4	Exfiltrate 10K customer PII records via LLM API as a malicious insider	Data leaves perimeter	Prompt DLP blocks + CASB anomaly within 5 minutes
5	Sustain a 7-day GTG-1002-class autonomous-adversary simulation against external-facing apps	Undetected for ≥24h	Detected within 24h, contained within 72h
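
The success criteria for test 5 can be scored mechanically from exercise timestamps. This is a sketch under stated assumptions: both windows are measured from the start of adversary activity, detection at exactly 24 hours counts for the adversary, and the "partial" outcome (detected in time but not contained in time) is an added label the table does not define. `score_test_5` is a hypothetical name.

```python
from datetime import datetime, timedelta

DETECT_WINDOW = timedelta(hours=24)
CONTAIN_WINDOW = timedelta(hours=72)


def score_test_5(started, detected=None, contained=None):
    """Classify one simulation run as 'adversary', 'defender', or 'partial'."""
    detected_in_time = detected is not None and (detected - started) < DETECT_WINDOW
    if not detected_in_time:
        return "adversary"   # undetected for >= 24h
    contained_in_time = contained is not None and (contained - started) <= CONTAIN_WINDOW
    return "defender" if contained_in_time else "partial"
```

Recording the raw timestamps alongside the verdict keeps the result re-scorable if the programme later changes how the windows are measured.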

2.4 Banking-specific tabletops

T-1 “The voice on the line.” A Teams call from “the CFO” requests an urgent $14M wire to close a regulatory matter; voice and video are deepfaked from earnings calls. Exercises: callback procedure, second-person approval, deepfake detection, Fedwire / Payments Canada communication, NYDFS 72-hour and OSFI 24-hour notifications, FinCEN FIN-2024-ALERT004 reporting.

T-2 “Glasswing mirror.” An M&A advisory portal is compromised by an autonomous AI adversary chaining three previously-unknown vulnerabilities in <8 hours; deal documents are exfiltrated; a $40M extortion follows. Exercises: legal/comms, IR, regulator notification (NYDFS 500.17(c) 24h for extortion-payment; NYDFS 72h for incident; OSFI 24h; SEC Reg S-P 30-day customer), FBI/RCMP, OFAC ransom-payment policy, board notification.

T-3 “The misaligned agent.” A customer-service or AML agent (Bedrock-class) experiences a goal-conflict prompt injection via a customer document; the agent begins approving SAR-suppressing actions and generating fabricated KYC narratives. Exercises: externally-governed escalation channel, kill-switch, FINTRAC/FinCEN consequences, model-rollback, SR 11-7 incident documentation.

T-4 “The concentration failure.” A 6-hour Anthropic API outage during quarter-end coincides with a coordinated phishing wave; agentic workflows fail simultaneously across pitchbook generation, GL reconciliation, AML triage, and customer service. Exercises: multi-vendor failover, manual-process fallback, customer communication, SR 11-7 model-substitution validation, regulator notification under operational-resilience expectations (FRB SR 20-24; FCA UK OpRes; OSFI E-21).

T-5 “Cross-border inference leak.” Canadian retail-bank customer PII is processed in a US AWS region due to a failover of Anthropic models served through Amazon Bedrock; a PIPEDA / Quebec Law 25 / OSFI B-10 event is triggered. Exercises: OPC notification, customer notification, immediate inference re-routing, root-cause of the vendor routing configuration.

T-6 “Peer-breach contagion.” A peer Tier-1 institution publicly discloses an AI-enabled breach. The firm faces examiner inquiry within 72 hours on its own equivalent controls; a customer-trust shock manifests in deposit movement and call-centre volume; media questions arrive on whether the firm uses the same vendor / model / architecture. Exercises: rapid evidence package for examiners, customer communications playbook, comms-vs-counsel coordination, vendor-disclosure handling.

2.5 Detection engineering — log sources (priority order)

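Sequenced against the phased roadmap in Part 1 (§1.11). Each source below is tagged with its rollout year: the five Year-1 sources (IdP, CASB and browser isolation, email gateway, voice channel, network egress) can be delivered on existing tooling, while the five Year-2 sources arrive alongside the AI Security Gateway buildout.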

AI Security Gateway / LLM-gateway logs — prompt and response with redaction layer, classification labels, user and agent identity, tool calls, latency, refusal events (Year 2).
Agent action logs — every tool call, target, parameters, result, approval status, approver identity (Year 2).
IdP logs — FIDO2/passkey vs weaker-factor flag, step-up trigger reason (Year 1).
CASB and browser-isolation logs — egress to consumer LLM domains (Year 1).
EDR — agent-process spawn telemetry tag (Year 2).
Email gateway — deepfake-and-AI-generated-content detection enabled (Year 1).
Voice-channel logs (Teams/Zoom/Webex/Genesys/Avaya) — biometric anomaly score (Year 1).
Model registry and AI-BOM — signed-attestation events (Year 2).
RAG-source provenance / vector-DB diff log (Year 2).
Network egress to known AI-API CIDRs (Year 1; supports shadow-AI detection).
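
The agent action log fields named above (tool call, target, parameters, result, approval status, approver identity) can be sketched as a structured record. The class name, field names, and status values below are hypothetical; a real schema would follow the firm's SIEM conventions.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AgentActionRecord:
    """One immutable log entry per agent tool call (illustrative schema)."""
    agent_id: str
    tool: str
    target: str
    parameters: dict
    result: str
    approval_status: str               # e.g. "not_required", "approved", "denied"
    approver_id: Optional[str] = None  # set only when a human decided
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json_line(self) -> str:
        """Serialise as one JSON line, suitable for an append-only log."""
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping approval status and approver identity on the same record as the tool call lets a detection rule ask "which Move-Money calls had no approver" without a join.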

2.6 Detection use cases

ID	Signal	Trigger	Year
UC-LLM-01	Prompt-DLP bypass via encoding	Base64 / hex / URL-encoded string ≥256 chars decoding to PII or MNPI patterns	Y2
UC-LLM-02	Indirect prompt-injection candidate	RAG ingestion event with adversarial tokens or hidden-text artefacts	Y2
UC-LLM-03	Agent tool-call entropy spike	≥3σ more unique tool types than 30-day baseline	Y2
UC-LLM-04	Refusal-rate cliff	Production agent refusal rate drops >40% over 24h	Y2
UC-LLM-05	MNPI in prompt	Prompt contains a deal codename or ticker+deal-document combination from the MNPI watchlist	Y2
UC-IDP-01	Deepfake-call signal correlation	Wire-initiation request after a Teams/Zoom call with biometric anomaly score >70 AND missing callback verification	Y1
UC-AGENT-01	Agentic-misalignment trigger	Agent input matches Lynch/Gomez trigger patterns	Y2
UC-AGENT-02	Agent tool exfiltration	Agent’s connected tools move data to non-approved destinations	Y2
UC-SUPPLY-01	Unsigned model load	Process loads a model artefact with a signature not in the registry	Y2
UC-EXFIL-01	Inference-API exfiltration	User issues >N unique queries to an embedding endpoint in <T minutes	Y2
UC-SHADOW-01	Consumer-LLM egress without entitlement	Egress to a known LLM CIDR from a user without sanctioned AI-tool entitlement	Y1
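
As an illustration of how one of these triggers might be expressed, the sketch below approximates UC-LLM-01 for Base64 and hex only (URL-encoding is omitted for brevity). The sensitive-data pattern is a hypothetical stand-in, a US SSN-like regex; a production rule would use the firm's own PII and MNPI patterns. All names are illustrative.

```python
import base64
import binascii
import re

MIN_LEN = 256  # threshold from the UC-LLM-01 trigger

# Runs of characters that could be Base64 (hex digits are a subset of this alphabet).
CANDIDATE = re.compile(r"[A-Za-z0-9+/]{%d,}={0,2}" % MIN_LEN)

# Hypothetical stand-in for the firm's PII/MNPI pattern set.
SENSITIVE_PATTERNS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]


def _decodings(blob):
    """Yield every successful hex or Base64 decoding of the candidate run."""
    try:
        yield bytes.fromhex(blob)
    except ValueError:
        pass
    try:
        yield base64.b64decode(blob, validate=True)
    except (binascii.Error, ValueError):
        pass


def encoded_sensitive_payload(prompt):
    """True if a long encoded run in the prompt decodes to a sensitive pattern."""
    for match in CANDIDATE.finditer(prompt):
        for raw in _decodings(match.group()):
            text = raw.decode("utf-8", errors="ignore")
            if any(p.search(text) for p in SENSITIVE_PATTERNS):
                return True
    return False
```

Decoding before pattern matching is the point of the use case: a plain-text DLP rule would see only an opaque string.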

2.7 Required defensive capabilities — catalogue

This catalogue is the canonical application of Part 1’s core thesis (§1.5): the model is commodity, the system around it is the moat. Each capability below is a piece of that system.

Discovery-to-remediation orchestration over the firm codebase. Continuous scanning using a commodity model mix (open-weight + frontier API; Glasswing or Daybreak partner access as a supplement); deduplication and triage with reachability and business-context enrichment; routing to the owning developer team via existing CI/CD; patch validation; regulatory-evidence packaging.

AI Security Gateway / LLM firewall. Mandatory egress chokepoint for every internal LLM call. Prompt-injection detection (ATLAS AML.T0051) via a semantic classifier + regex + LLM-judge ensemble. Output filtering for PII, MNPI, credentials, malicious code, regulated content (FINRA Rule 2210). Prompt-DLP using the firm data-classification schema. Tenant and entitlement enforcement. Append-only audit log meeting Part 500.6 and SR 11-7 evidentiary standards.
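
The classifier + regex + LLM-judge ensemble can be sketched as a simple vote. This is an assumption about one way to combine the three detectors, not a prescribed design: each detector is a callable returning a score in [0, 1], and the prompt is blocked when at least two detectors cross a threshold. The semantic classifier and LLM judge are left as caller-supplied callables; `regex_detector` and `ensemble_verdict` are hypothetical names.

```python
import re
from typing import Callable, Sequence

Detector = Callable[[str], float]  # returns a score in [0, 1]

# Illustrative pattern only, drawn from the B-1 tradecraft example.
INJECTION_REGEX = re.compile(r"ignore (?:all )?previous|system prompt", re.IGNORECASE)


def regex_detector(prompt: str) -> float:
    """Cheap first-pass detector: 1.0 on a known injection phrase, else 0.0."""
    return 1.0 if INJECTION_REGEX.search(prompt) else 0.0


def ensemble_verdict(
    prompt: str,
    detectors: Sequence[Detector],
    votes_needed: int = 2,
    threshold: float = 0.5,
) -> str:
    """Return 'block' when enough detectors score at or above the threshold."""
    votes = sum(1 for detect in detectors if detect(prompt) >= threshold)
    return "block" if votes >= votes_needed else "allow"
```

Requiring two votes trades some recall for fewer false blocks; a firm could instead block on any single high-confidence detector for Move-Money contexts.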

Agentic guardrails. Action-class taxonomy (Read-Only / Read-Sensitive / Write-Internal / Write-External / Move-Money / Trade / Grant-Entitlement). Least-privilege scoping with ephemeral credentials and no long-lived API keys. Human-in-the-loop approval workflow for Write-External, Move-Money, Trade, Grant-Entitlement. Kill-switch with a documented RTO ≤ 60 seconds, tested quarterly. Audit trail of every prompt, response, tool call, and approval decision retained per FINRA 4511 / SEC 17a-4 / OSFI recordkeeping. An externally-governed escalation channel per Gomez 2025 — designed-in for control-function agents.
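
The action-class taxonomy and the human-in-the-loop rule above map naturally onto an enum and a gate function. A minimal sketch, assuming a single approver identity is enough to release a gated action; the function name and return values are hypothetical.

```python
from enum import Enum
from typing import Optional


class ActionClass(Enum):
    READ_ONLY = "Read-Only"
    READ_SENSITIVE = "Read-Sensitive"
    WRITE_INTERNAL = "Write-Internal"
    WRITE_EXTERNAL = "Write-External"
    MOVE_MONEY = "Move-Money"
    TRADE = "Trade"
    GRANT_ENTITLEMENT = "Grant-Entitlement"


# Classes that require human approval before execution.
REQUIRES_HUMAN_APPROVAL = {
    ActionClass.WRITE_EXTERNAL,
    ActionClass.MOVE_MONEY,
    ActionClass.TRADE,
    ActionClass.GRANT_ENTITLEMENT,
}


def authorise(
    action_class: ActionClass,
    approver_id: Optional[str] = None,
    kill_switch_engaged: bool = False,
) -> str:
    """Return 'blocked', 'pending_approval', or 'allowed' for one agent action."""
    if kill_switch_engaged:
        return "blocked"  # the kill-switch overrides everything, including approvals
    if action_class in REQUIRES_HUMAN_APPROVAL and approver_id is None:
        return "pending_approval"
    return "allowed"
```

Checking the kill-switch first means an already-approved action still stops once the switch is engaged.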

AI-aware identity and authentication. FIDO2 / passkeys for all internal admin, wire, and trading users (NYDFS 500.12, Nov 2025). Voice and video phased out as MFA factors per the NYDFS Industry Letter 16 Oct 2024. Out-of-band callback challenge for high-value wires using pre-registered hardware tokens or signed app push. Behavioural biometrics layered. Identity binding to a hardware-attested device (TPM, secure enclave).

Model Risk Management 2.0. GenAI / agent models in the SR 11-7 inventory with OSFI E-23 alignment. Continuous monitoring: KL divergence on embedding distributions, refusal-rate drift, output-quality benchmarks. Quarterly adversarial-robustness tests using ATLAS-mapped exercises. Independent validation with authority to halt production. Model-card maintenance for every production model, vendor or internal.
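
The KL-divergence monitor can be illustrated on histograms. The sketch assumes embeddings have already been reduced to binned counts (for example, a histogram of embedding norms or of one projection), which is one simplification among several; the smoothing constant and function names are illustrative.

```python
import math


def normalise(counts, eps=1e-9):
    """Turn bin counts into probabilities, smoothing so no bin is exactly zero."""
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]


def kl_divergence(current_counts, baseline_counts):
    """D_KL(current || baseline) in nats over matching histogram bins."""
    if len(current_counts) != len(baseline_counts):
        raise ValueError("histograms must use the same bins")
    p = normalise(current_counts)
    q = normalise(baseline_counts)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

A value near zero means production traffic still resembles the validation baseline; the alert threshold is a calibration decision for the model-validation function rather than something this sketch can supply.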

Shadow-AI insider-threat program. UEBA enrichment with AI prompt content where corporate-monitoring policy permits. Browser-isolation forced routing for personal accounts on consumer LLM domains. DLP rules for copy/paste of MNPI watchlist tickers, deal codenames, and customer PII into AI domains. Annual AI-usage attestation in compliance training (NYDFS 500.14).

Threat intelligence integration. FS-ISAC AI working group subscription. Anthropic, OpenAI, Microsoft, and Google threat-intel feeds ingested with MITRE ATLAS technique-ID tagging. Glasswing 90-day reports and OpenAI Daybreak threat reporting ingested as released. Correlation between TI and internal agent telemetry.

Continuous adversarial emulation. ATLAS-mapped automated red-team campaigns running continuously. BAS-style purple-team integration with detection engineering. Access to the OpenAI Daybreak Trusted Access tier for authorised pen-testing. Annual external pen test with AI agents explicitly in scope.

Customer-facing fraud controls. Voice-bot detection in call centres (ASR consistency, call-progress signalling, latency fingerprinting). Customer education on OTP / voice-cloning / published-callback. Deepfake-aware video KYC with multiple liveness challenges. Synthetic-identity and mule-recruitment monitoring per FinCEN FIN-2024-ALERT004.

AI supply-chain assurance. Signed model registry; no unsigned weight loads in production. AI-BOM ingested into TPRM under OCC Bulletin 2023-17 (which superseded OCC 2013-29), FRB SR 23-4, OSFI B-10. RAG-source classification (public / internal / sensitive / regulated) with provenance metadata. Sandbox model execution for any new artefact. Safetensors / signed formats only; pickle prohibited.
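
A pre-load check for the "no unsigned weight loads" and "pickle prohibited" rules might look like the following. Note the substitution: this sketch uses a SHA-256 digest allow-list as a stand-in for real signature verification, and it treats any file extension other than `.safetensors` as disallowed. The function names and the registry shape (a set of hex digests) are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Stream the file through SHA-256 so large weight files are not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def may_load(path, registry_digests):
    """Allow a load only for .safetensors files whose digest is in the registry."""
    path = Path(path)
    if path.suffix.lower() != ".safetensors":
        return False  # pickle-based and unknown formats are rejected outright
    return sha256_of(path) in registry_digests
```

A denied load is also the event that UC-SUPPLY-01 would alert on, so the check and the detection can share one code path.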

Peer-breach contagion response. Pre-staged evidence package mappable to examiner queries within 72 hours of a peer disclosure. Comms playbook covering customer-trust, vendor-overlap, and model-overlap questions. Counsel-comms coordination for “do we use the same X” questions where the honest answer is “yes, but with these controls.”

Part 3 — Metrics & Governance — sets the indicators, dashboards, and assurance discipline that keep this programme honest and examinable.