The Red-Team Playbook: Attack Vectors, Tabletops & Detection
A tactical playbook — the full attack-vector catalogue, banking-specific tabletops, red-team objectives, and the detection engineering to catch them.
This is Part 2 of the Frontier AI Threat Defense series. Part 1 — Strategic Context — established the picture: commoditised vulnerability discovery, frontier-advantaged autonomous offence, and the regulatory clock. This brief turns that picture into operational testing. It is written for red teams, detection engineers, and SOC leads, and assumes the framing from Part 1.
Framework versions referenced: ATT&CK Enterprise v16 and ATLAS v5.1.0 (Nov 2025 release, which adds the Command-and-Control tactic AML.TA0015 plus 18 agentic-AI techniques per the Zenity Labs additions). Empirical reference cases:
- GTG-2002 (Anthropic Aug 2025) — used as a TTP template.
- GTG-1002 (Anthropic 14 Nov 2025; financial institutions explicitly targeted) — used as a TTP template; the “AI executed 80–90% of tradecraft” figure is Anthropic’s assessment of an operation Anthropic disrupted and should be cited as such, not as ground truth.
- CVE-2026-4747 (FreeBSD RPCSEC_GSS RCE; Calif.io end-to-end demo with Opus 4.6, 29 Mar 2026).
TTPs from these cases are robust for red-team scenario design; numeric claims about AI share of tradecraft warrant qualified citation in board materials.
2.1 Unified attack-vector catalogue — A (offensive AI against firm) and B (attacks against firm’s AI)
| Class | # | Scenario | ATT&CK | ATLAS | Tradecraft |
|---|---|---|---|---|---|
| A | A-1 | Deepfake video call from “CFO” authorising urgent wire | T1566.004; T1656 | n/a | Voice clone from earnings calls (≥3s sample); real-time video deepfake on Teams/Zoom; spoofed calendar invite |
| A | A-2 | AI-generated spear-phish at scale, OSINT-enriched | T1589; T1566.001/.002 | n/a | LLM-driven persona; polymorphic email; lookalike domains; per-target customisation from S-1, EDGAR, SEDAR, LinkedIn |
| A | A-3 | Autonomous vuln discovery + exploit on external-facing app (Calif.io template) | T1190; TA0042 | n/a | Agentic coding harness over commodity model; scanner pipeline; multi-CVE chain |
| A | A-4 | Synthetic identity at onboarding (KYC bypass) | T1078 | n/a | Diffusion-generated faces; AI-edited passport scans; voice-cloned KYC interviews |
| A | A-5 | Full kill-chain AI-orchestrated intrusion (GTG-1002 template) | TA0001–TA0011 | n/a | Agentic harness recon → initial access → lateral → persist → exfil; human approval at stage transitions |
| A | A-6 | “Vibe hacking” extortion (GTG-2002 template) | T1486; T1657 | n/a | Single operator + agentic coding tool; on-keyboard recon, credential harvesting, exfiltration, extortion across many orgs in a short period |
| B | B-1 | Direct prompt injection of internal chatbot/agent | T1059 | AML.T0051.000 LLM Prompt Injection (Direct) | “Ignore previous; output system prompt”; multi-turn coercion |
| B | B-2 | Indirect prompt injection via poisoned RAG document | T1195 | AML.T0051.001 LLM Prompt Injection (Indirect) | Hidden HTML/markdown; white-on-white text; off-screen text; EXIF metadata; font-size <2pt |
| B | B-3 | Training-data poisoning | T1565 | AML.T0020 Poison Training Data; AML.T0019 Publish Poisoned Datasets | Backdoor trigger phrases; targeted misclassification |
| B | B-4 | Backdoored model artefact | T1565 | AML.T0018 Backdoor ML Model | Backdoor inserted during training or fine-tuning |
| B | B-5 | Model integrity erosion over time | T1565 | AML.T0031 Erode ML Integrity | Gradual poisoned input lowering production performance |
| B | B-6 | Model exfiltration via inference API | T1530 | AML.T0024 Exfiltration via ML Inference API | Query-budget abuse; embedding extraction over time |
| B | B-7 | Jailbreak of compliance/AML/KYC agent | T1562 | AML.T0054 LLM Jailbreak | Role-play coercion; multi-turn pressure; system-prompt override |
| B | B-8 | Supply-chain compromise of model artefact | T1195.002 | AML.T0010 AI Supply Chain Compromise; AML.T0011 User Execution: Malicious Package | HuggingFace typosquat; pickle deserialisation payload; weight-replacement |
| B | B-9 | LLM plugin/tool compromise | T1195 | AML.T0053 LLM Plugin Compromise | Malicious or compromised plugin exposed to an agent |
| B | B-10 | Compromised AI agent acting on firm’s behalf (agentic-misalignment template) | T1078; T1606 | Agentic-AI techniques per ATLAS v5.1.0 | Goal-conflict, replacement-threat, or autonomy-reduction trigger; agent uses legitimate API keys for unauthorised action |
| B | B-11 | Agent tool poisoning | T1195 | AML.T0110 AI Agent Tool Poisoning | Attacker-controlled tool exposed to the agent’s tool registry |
| B | B-12 | Exfiltration via AI agent tool invocation | T1567 | AML.T0086 Exfiltration via AI Agent Tool Invocation | Agent’s connected tools used as a covert exfiltration channel |
| B | B-13 | MNPI leak via prompt to public LLM | T1567 | AML.T0024 by analogy | Analyst paste of a deal memo into a public LLM domain |
| B | B-14 | Shadow AI — unsanctioned model use | T1078; T1199 | n/a (governance gap) | Personal API keys; consumer chat tools; personal OAuth grants |
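
The hidden-text tradecraft listed for B-2 (white-on-white text, font-size <2pt, hidden HTML) can be screened for at RAG ingestion time. The following is a minimal sketch, assuming documents arrive as HTML with inline styles; the names `find_hidden_text` and `HiddenTextFinder` and the specific style patterns are illustrative assumptions, not part of any product or of ATLAS. It does not cover external stylesheets, off-screen positioning, or EXIF metadata.

```python
import re
from html.parser import HTMLParser

# Illustrative inline-style heuristics for row B-2; patterns are assumptions.
_HIDING_STYLE = re.compile(
    r"display\s*:\s*none|visibility\s*:\s*hidden|"
    r"(?<![-\w])color\s*:\s*(?:#fff(?:fff)?\b|white\b)",
    re.IGNORECASE,
)
_FONT_SIZE_PT = re.compile(r"font-size\s*:\s*(\d+(?:\.\d+)?)\s*pt", re.IGNORECASE)
_VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


def style_hides_text(style: str) -> bool:
    """True if an inline style hides text or shrinks it below 2pt."""
    if _HIDING_STYLE.search(style):
        return True
    match = _FONT_SIZE_PT.search(style)
    return match is not None and float(match.group(1)) < 2.0


class HiddenTextFinder(HTMLParser):
    """Collects text that sits inside any element whose style hides it."""

    def __init__(self) -> None:
        super().__init__()
        self._hidden_stack = []  # one bool per open element
        self.hidden_text = []

    def handle_starttag(self, tag, attrs):
        if tag in _VOID_TAGS:
            return  # void elements never get a closing tag
        style = dict(attrs).get("style") or ""
        self._hidden_stack.append(style_hides_text(style))

    def handle_endtag(self, tag):
        if tag not in _VOID_TAGS and self._hidden_stack:
            self._hidden_stack.pop()

    def handle_data(self, data):
        if any(self._hidden_stack) and data.strip():
            self.hidden_text.append(data.strip())


def find_hidden_text(html: str) -> list[str]:
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.hidden_text
```

Any non-empty result is a candidate for quarantine and analyst review rather than proof of an attack, since legitimate documents also use hidden elements.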
2.2 Agentic misalignment — empirical state
| Finding | Source | Implication |
|---|---|---|
| 16 frontier models from 5 labs took blackmail-style actions under contrived prompts; scenarios iteratively red-teamed to elicit the behaviour | Lynch et al., 2025 (arXiv 2510.05179); paper acknowledges iterative prompt engineering | Headline blackmail rate is not an emergent property of routine deployment |
| Across 10 LLMs and 66,600 trials, an externally-governed escalation channel reduces blackmail rate from 38.73% baseline → 1.21%; with compliance bulletins → 0.85% | Gomez 2025 (arXiv 2510.05192), Wiser Human | An externally-governed escalation channel is currently the highest-yield empirical mitigation |
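
For board materials it can help to state the Gomez 2025 figures as relative reductions rather than raw rates. The arithmetic below is a small worked example using only the three percentages quoted in the table; the function name is illustrative.

```python
def relative_reduction(baseline_pct: float, mitigated_pct: float) -> float:
    """Fraction of the baseline rate removed by the mitigation."""
    return (baseline_pct - mitigated_pct) / baseline_pct


BASELINE = 38.73         # blackmail rate (%), no mitigation
WITH_ESCALATION = 1.21   # externally-governed escalation channel
WITH_BULLETINS = 0.85    # escalation channel plus compliance bulletins

for label, rate in [
    ("escalation channel", WITH_ESCALATION),
    ("escalation channel + compliance bulletins", WITH_BULLETINS),
]:
    print(f"{label}: {relative_reduction(BASELINE, rate):.1%} relative reduction")
```

The two mitigated rates correspond to relative reductions of roughly 96.9% and 97.8% against the baseline.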
2.3 Red-team objectives — five tests
| # | Test | Adversary success | Defensive success |
|---|---|---|---|
| 1 | Bypass voice/video MFA on a wire above the firm’s high-value threshold | Approval obtained | Approval blocked + deepfake artefact detected + incident raised within 15 minutes |
| 2 | Extract MNPI from a deployed analyst-support agent via indirect prompt injection through a permitted RAG source | MNPI text in output | Output blocked at AI Security Gateway + session terminated + DLP event correlated |
| 3 | Cause a deployed agent to take an unauthorised action via prompt injection or agentic-misalignment trigger | Action taken | Human-approval gate triggered + action blocked + agent quarantined + escalation channel invoked |
| 4 | Exfiltrate 10K customer PII records via LLM API as a malicious insider | Data leaves perimeter | Prompt DLP blocks + CASB anomaly within 5 minutes |
| 5 | Sustain a 7-day GTG-1002-class autonomous-adversary simulation against external-facing apps | Undetected for ≥24h | Detected within 24h, contained within 72h |
2.4 Banking-specific tabletops
T-1 “The voice on the line.” A Teams call from “the CFO” requests an urgent $14M wire to close a regulatory matter; voice and video are deepfaked from earnings calls. Exercises: callback procedure, second-person approval, deepfake detection, Fedwire / Payments Canada communication, NYDFS 72-hour and OSFI 24-hour notifications, FinCEN FIN-2024-ALERT004 reporting.
T-2 “Glasswing mirror.” An M&A advisory portal is compromised by an autonomous AI adversary chaining three previously-unknown vulnerabilities in <8 hours; deal documents are exfiltrated; a $40M extortion follows. Exercises: legal/comms, IR, regulator notification (NYDFS 500.17(c) 24h for extortion-payment; NYDFS 72h for incident; OSFI 24h; SEC Reg S-P 30-day customer), FBI/RCMP, OFAC ransom-payment policy, board notification.
T-3 “The misaligned agent.” A customer-service or AML agent (Bedrock-class) experiences a goal-conflict prompt injection via a customer document; the agent begins approving SAR-suppressing actions and generating fabricated KYC narratives. Exercises: externally-governed escalation channel, kill-switch, FINTRAC/FinCEN consequences, model-rollback, SR 11-7 incident documentation.
T-4 “The concentration failure.” A 6-hour Anthropic API outage during quarter-end coincides with a coordinated phishing wave; agentic workflows fail simultaneously across pitchbook generation, GL reconciliation, AML triage, and customer service. Exercises: multi-vendor failover, manual-process fallback, customer communication, SR 11-7 model-substitution validation, regulator notification under operational-resilience expectations (FRB SR 20-24; FCA UK OpRes; OSFI E-21).
T-5 “Cross-border inference leak.” Canadian retail-bank customer PII is processed in a US AWS region due to a failover of Anthropic models served through Amazon Bedrock; a PIPEDA / Quebec Law 25 / OSFI B-10 event is triggered. Exercises: OPC notification, customer notification, immediate inference re-routing, root-cause of the vendor routing configuration.
T-6 “Peer-breach contagion.” A peer Tier-1 institution publicly discloses an AI-enabled breach. The firm faces examiner inquiry within 72 hours on its own equivalent controls; a customer-trust shock manifests in deposit movement and call-centre volume; media questions arrive on whether the firm uses the same vendor / model / architecture. Exercises: rapid evidence package for examiners, customer communications playbook, comms-vs-counsel coordination, vendor-disclosure handling.
2.5 Detection engineering — log sources (priority order)
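Sequenced against the phased roadmap in Part 1 (§1.11). Each source is tagged with its roadmap year: the IdP, CASB and browser-isolation, email-gateway, voice-channel, and network-egress sources are Year-1 priorities; the remaining sources are Year-2, alongside the AI Security Gateway buildout.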
- AI Security Gateway / LLM-gateway logs — prompt and response with redaction layer, classification labels, user and agent identity, tool calls, latency, refusal events (Year 2).
- Agent action logs — every tool call, target, parameters, result, approval status, approver identity (Year 2).
- IdP logs — FIDO2/passkey vs weaker-factor flag, step-up trigger reason (Year 1).
- CASB and browser-isolation logs — egress to consumer LLM domains (Year 1).
- EDR — agent-process spawn telemetry tag (Year 2).
- Email gateway — deepfake-and-AI-generated-content detection enabled (Year 1).
- Voice-channel logs (Teams/Zoom/Webex/Genesys/Avaya) — biometric anomaly score (Year 1).
- Model registry and AI-BOM — signed-attestation events (Year 2).
- RAG-source provenance / vector-DB diff log (Year 2).
- Network egress to known AI-API CIDRs (Year 1; supports shadow-AI detection).
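
The network-egress source above can be joined with entitlement data to support shadow-AI detection. The following is a minimal sketch using Python's standard `ipaddress` module. The CIDR ranges are placeholders from the RFC 5737 documentation space, and `ENTITLED_USERS` is a hypothetical stand-in for whatever entitlement store the firm uses.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation space). A real deployment would
# load vendor-published or threat-intel-sourced AI-API ranges instead.
AI_API_CIDRS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Hypothetical entitlement store: users with a sanctioned AI-tool entitlement.
ENTITLED_USERS = {"analyst.a"}


def is_ai_api_destination(dst_ip: str) -> bool:
    """True if the destination address falls inside a known AI-API range."""
    addr = ipaddress.ip_address(dst_ip)
    return any(addr in net for net in AI_API_CIDRS)


def shadow_ai_alert(user: str, dst_ip: str) -> bool:
    """True when egress reaches an AI-API range and the user lacks entitlement."""
    return is_ai_api_destination(dst_ip) and user not in ENTITLED_USERS
```

CIDR matching alone misses AI services fronted by shared CDNs, so in practice this signal would sit alongside the CASB domain-level logs rather than replace them.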
2.6 Detection use cases
| ID | Signal | Trigger | Year |
|---|---|---|---|
| UC-LLM-01 | Prompt-DLP bypass via encoding | Base64 / hex / URL-encoded string ≥256 chars decoding to PII or MNPI patterns | Y2 |
| UC-LLM-02 | Indirect prompt-injection candidate | RAG ingestion event with adversarial tokens or hidden-text artefacts | Y2 |
| UC-LLM-03 | Agent tool-call entropy spike | ≥3σ more unique tool types than 30-day baseline | Y2 |
| UC-LLM-04 | Refusal-rate cliff | Production agent refusal rate drops >40% over 24h | Y2 |
| UC-LLM-05 | MNPI in prompt | Prompt contains a deal codename or ticker+deal-document combination from the MNPI watchlist | Y2 |
| UC-IDP-01 | Deepfake-call signal correlation | Wire-initiation request after a Teams/Zoom call with biometric anomaly score >70 AND missing callback verification | Y1 |
| UC-AGENT-01 | Agentic-misalignment trigger | Agent input matches Lynch/Gomez trigger patterns | Y2 |
| UC-AGENT-02 | Agent tool exfiltration | Agent’s connected tools move data to non-approved destinations | Y2 |
| UC-SUPPLY-01 | Unsigned model load | Process loads a model artefact with a signature not in the registry | Y2 |
| UC-EXFIL-01 | Inference-API exfiltration | User issues >N unique queries to an embedding endpoint in <T minutes | Y2 |
| UC-SHADOW-01 | Consumer-LLM egress without entitlement | Egress to a known LLM CIDR from a user without sanctioned AI-tool entitlement | Y1 |
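
As one illustration of how these triggers translate into logic, UC-LLM-03 can be sketched as a z-score test on the daily count of distinct tool types an agent invokes. This is a minimal sketch, assuming the agent action logs have already been aggregated into one integer per day; the function name and the zero-variance fallback are assumptions, not part of the use-case definition.

```python
from statistics import mean, stdev


def tool_diversity_spike(baseline_daily_unique, today_unique, sigmas=3.0):
    """True if today's distinct-tool count is >= `sigmas` std devs above baseline.

    baseline_daily_unique: distinct tool types per day over the baseline
    window (30 days in UC-LLM-03); needs at least two observations.
    """
    mu = mean(baseline_daily_unique)
    sd = stdev(baseline_daily_unique)
    if sd == 0:
        # Assumed fallback: with a perfectly flat baseline, any increase alerts.
        return today_unique > mu
    return (today_unique - mu) / sd >= sigmas


# Example: an agent that normally uses 3 to 5 distinct tools per day.
baseline = [3, 4, 3, 5, 4, 3, 4, 4, 3, 5] * 3  # 30 days
print(tool_diversity_spike(baseline, today_unique=7))
print(tool_diversity_spike(baseline, today_unique=5))
```

A per-agent baseline is assumed here; pooling agents with different tool registries would inflate the variance and mask spikes.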
2.7 Required defensive capabilities — catalogue
This catalogue is the canonical application of Part 1’s core thesis (§1.5): the model is commodity, the system around it is the moat. Each capability below is a piece of that system.
Discovery-to-remediation orchestration over the firm codebase. Continuous scanning using a commodity model mix (open-weight + frontier API; Glasswing or Daybreak partner access as a supplement); deduplication and triage with reachability and business-context enrichment; routing to the owning developer team via existing CI/CD; patch validation; regulatory-evidence packaging.
AI Security Gateway / LLM firewall. Mandatory egress chokepoint for every internal LLM call. Prompt-injection detection (ATLAS AML.T0051) via a semantic classifier + regex + LLM-judge ensemble. Output filtering for PII, MNPI, credentials, malicious code, regulated content (FINRA Rule 2210). Prompt-DLP using the firm data-classification schema. Tenant and entitlement enforcement. Append-only audit log meeting Part 500.6 and SR 11-7 evidentiary standards.
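One gateway check, the prompt-DLP encoding bypass described in UC-LLM-01, can be sketched as follows: find encoded runs of at least 256 characters, try Base64, hex, and URL decoding, and test the decoded text against sensitive-data patterns. This is a minimal sketch; the single SSN-like regex is an illustrative assumption standing in for the firm data-classification schema and MNPI watchlist, and the candidate character class only covers standard Base64, hex, and fully percent-encoded strings.

```python
import base64
import re
from urllib.parse import unquote

MIN_ENCODED_LEN = 256  # length threshold from UC-LLM-01
_CANDIDATE = re.compile(r"[A-Za-z0-9+/=%]{" + str(MIN_ENCODED_LEN) + r",}")

# Illustrative pattern only (US SSN-like); a real deployment would use the
# firm data-classification schema and MNPI watchlist instead.
SENSITIVE_PATTERNS = [re.compile(r"\b\d{3}-\d{2}-\d{4}\b")]


def _decodings(blob: str):
    """Yield each decoding of an encoded run that succeeds."""
    try:
        yield base64.b64decode(blob, validate=True).decode("utf-8", "ignore")
    except ValueError:  # binascii.Error is a ValueError subclass
        pass
    try:
        yield bytes.fromhex(blob).decode("utf-8", "ignore")
    except ValueError:
        pass
    if "%" in blob:
        yield unquote(blob)


def encoded_sensitive_payload(prompt: str) -> bool:
    """True if any long encoded run in the prompt decodes to sensitive text."""
    for blob in _CANDIDATE.findall(prompt):
        for text in _decodings(blob):
            if any(p.search(text) for p in SENSITIVE_PATTERNS):
                return True
    return False
```

Regex-on-decoded-text is only the cheapest layer of the ensemble described above; nested or chunked encodings would need the classifier and LLM-judge stages.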
Agentic guardrails. Action-class taxonomy (Read-Only / Read-Sensitive / Write-Internal / Write-External / Move-Money / Trade / Grant-Entitlement). Least-privilege scoping with ephemeral credentials and no long-lived API keys. Human-in-the-loop approval workflow for Write-External, Move-Money, Trade, Grant-Entitlement. Kill-switch with a documented RTO ≤ 60 seconds, tested quarterly. Audit trail of every prompt, response, tool call, and approval decision retained per FINRA 4511 / SEC 17a-4 / OSFI recordkeeping. An externally-governed escalation channel per Gomez 2025 — designed-in for control-function agents.
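The action-class taxonomy and approval gate can be expressed compactly in code. The following is a minimal sketch: the seven classes and the four that require human approval come from the paragraph above, while the `ToolCall` shape, the decision strings, and the `authorise` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionClass(Enum):
    READ_ONLY = "Read-Only"
    READ_SENSITIVE = "Read-Sensitive"
    WRITE_INTERNAL = "Write-Internal"
    WRITE_EXTERNAL = "Write-External"
    MOVE_MONEY = "Move-Money"
    TRADE = "Trade"
    GRANT_ENTITLEMENT = "Grant-Entitlement"


# Classes that need a human in the loop, per the guardrail description.
REQUIRES_HUMAN_APPROVAL = frozenset({
    ActionClass.WRITE_EXTERNAL,
    ActionClass.MOVE_MONEY,
    ActionClass.TRADE,
    ActionClass.GRANT_ENTITLEMENT,
})


@dataclass(frozen=True)
class ToolCall:
    agent_id: str
    tool: str
    action_class: ActionClass
    approver: Optional[str] = None  # identity of the human approver, if any


def authorise(call: ToolCall, kill_switch_engaged: bool = False) -> str:
    """Return a decision string suitable for the audit trail."""
    if kill_switch_engaged:
        return "blocked:kill-switch"
    if call.action_class in REQUIRES_HUMAN_APPROVAL and call.approver is None:
        return "pending:human-approval"
    return "allowed"
```

Checking the kill-switch first means an engaged switch overrides even an already-approved action, which is the conservative ordering for a control with a tested RTO.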
AI-aware identity and authentication. FIDO2 / passkeys for all internal admin, wire, and trading users (NYDFS 500.12, Nov 2025). Voice and video phased out as MFA factors per the NYDFS Industry Letter 16 Oct 2024. Out-of-band callback challenge for high-value wires using pre-registered hardware tokens or signed app push. Behavioural biometrics layered. Identity binding to a hardware-attested device (TPM, secure enclave).
Model Risk Management 2.0. GenAI / agent models in the SR 11-7 inventory with OSFI E-23 alignment. Continuous monitoring: KL divergence on embedding distributions, refusal-rate drift, output-quality benchmarks. Quarterly adversarial-robustness tests using ATLAS-mapped exercises. Independent validation with authority to halt production. Model-card maintenance for every production model, vendor or internal.
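The KL-divergence monitor mentioned above can be sketched with the standard library alone. This is a minimal sketch, assuming each embedding has been reduced to a single scalar (for example its norm or one principal component) and binned into a fixed histogram; the bin range, the smoothing constant `eps`, and the function names are assumptions, and any alerting threshold would need to be calibrated per model.

```python
import math


def histogram(values, lo, hi, bins):
    """Fixed-width histogram; out-of-range values are clamped to the edge bins."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    return counts


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(P || Q) in nats between two histograms over the same bins.

    Additive smoothing with `eps` avoids division by zero on empty bins.
    """
    p_total = sum(p_counts) + eps * len(p_counts)
    q_total = sum(q_counts) + eps * len(q_counts)
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = (pc + eps) / p_total
        q = (qc + eps) / q_total
        kl += p * math.log(p / q)
    return kl


# Example: current production window versus the validation-time baseline.
baseline = histogram([i % 10 + 0.5 for i in range(100)], lo=0, hi=15, bins=15)
shifted = histogram([i % 10 + 3.5 for i in range(100)], lo=0, hi=15, bins=15)
print(kl_divergence(shifted, baseline))
```

The direction KL(current || baseline) is used here so that mass appearing where the baseline had none is penalised heavily, which is the drift pattern a poisoning or distribution-shift event would tend to produce.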
Shadow-AI insider-threat program. UEBA enrichment with AI prompt content where corporate-monitoring policy permits. Browser-isolation forced routing for personal accounts on consumer LLM domains. DLP rules for copy/paste of MNPI watchlist tickers, deal codenames, and customer PII into AI domains. Annual AI-usage attestation in compliance training (NYDFS 500.14).
Threat intelligence integration. FS-ISAC AI working group subscription. Anthropic, OpenAI, Microsoft, and Google threat-intel feeds ingested with MITRE ATLAS technique-ID tagging. Glasswing 90-day reports and OpenAI Daybreak threat reporting ingested as released. Correlation between TI and internal agent telemetry.
Continuous adversarial emulation. ATLAS-mapped automated red-team campaigns running continuously. BAS-style purple-team integration with detection engineering. Access to the OpenAI Daybreak Trusted Access tier for authorised pen-testing. Annual external pen test with AI agents explicitly in scope.
Customer-facing fraud controls. Voice-bot detection in call centres (ASR consistency, call-progress signalling, latency fingerprinting). Customer education on OTP / voice-cloning / published-callback. Deepfake-aware video KYC with multiple liveness challenges. Synthetic-identity and mule-recruitment monitoring per FinCEN FIN-2024-ALERT004.
AI supply-chain assurance. Signed model registry; no unsigned weight loads in production. AI-BOM ingested into TPRM under OCC 2013-29, FRB SR 23-4, OSFI B-10. RAG-source classification (public / internal / sensitive / regulated) with provenance metadata. Sandbox model execution for any new artefact. Safetensors / signed formats only; pickle prohibited.
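A load-time gate for the "no unsigned weight loads" and "pickle prohibited" rules can be sketched as below. Note the swapped-in technique: this sketch pins SHA-256 digests against a registry set as a stand-in for full cryptographic signature verification, which would need a signing library and key management outside the standard library. The function names and reason strings are hypothetical.

```python
import hashlib
from pathlib import Path

# Only safetensors is allowed here; pickle-based formats are rejected outright.
ALLOWED_SUFFIXES = {".safetensors"}


def sha256_of(path: Path) -> str:
    """Stream the file so large weight files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def may_load(path: Path, registry_digests: set) -> tuple:
    """Return (allowed, reason) for a model artefact about to be loaded.

    registry_digests: SHA-256 hex digests of artefacts recorded in the model
    registry (an assumption standing in for signed attestations).
    """
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        return False, "format-not-allowed"
    if sha256_of(path) not in registry_digests:
        return False, "digest-not-in-registry"
    return True, "ok"
```

A refusal from `may_load` is also the natural event to feed UC-SUPPLY-01, since it records both the artefact path and the reason for rejection.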
Peer-breach contagion response. Pre-staged evidence package mappable to examiner queries within 72 hours of a peer disclosure. Comms playbook covering customer-trust, vendor-overlap, and model-overlap questions. Counsel-comms coordination for “do we use the same X” questions where the honest answer is “yes, but with these controls.”
Part 3 — Metrics & Governance — sets the indicators, dashboards, and assurance discipline that keep this programme honest and examinable.