AI Security Audit: a practical guide for LLM agents
Auditing an AI system is not a checklist exercise bolted onto an application audit. LLM-powered agents introduce a new attack surface — non-deterministic outputs, persistent memory, autonomous tool use — that traditional controls were never designed to contain. This guide outlines how Templebit audits AI agents for security and compliance, the controls we expect to see, and the failure modes that show up most often in production.
The four planes of an AI audit
Every AI security audit we run decomposes the system into four planes. Skip one and the audit is incomplete.
Data plane
Training data, RAG corpora, vector stores, and PII flowing through embeddings.
Model plane
Base model, fine-tunes, system prompts, and guardrail policies.
Agent plane
Tool use, autonomous loops, memory stores, and inter-agent messages.
Runtime plane
Secrets, identity, network egress, and human-in-the-loop checkpoints.
How to audit AI agent memory for security risk
Agent memory — short-term scratchpads, long-term vector stores, and shared knowledge bases — is the highest-leverage target in a modern AI system. A single poisoned document can persist into thousands of future sessions. Audit memory against these five failure modes:
- Cross-session leakage — one user's memory surfacing in another user's context window.
- Indirect prompt injection persisted into long-term memory from a single poisoned document.
- PII accumulation — embeddings retaining identifiers long after the source row was deleted.
- Tool-call replay — stored 'successful plans' that re-execute privileged actions on a new task.
- Memory-rank attacks — adversarial content engineered to score high in vector recall.
Concretely: enumerate every write path to the memory store, confirm tenant isolation at the index level (not just the query filter), require provenance metadata on every chunk, and run a recall-poisoning evaluation that injects a benign canary and verifies it can be detected, retrieved, and purged.
The 10-step audit checklist
- Inventory every model, agent, prompt, tool, and memory store. Unknown surfaces cannot be audited.
- Threat-model with STRIDE plus AI-specific categories: prompt injection, model extraction, data poisoning, jailbreaks, tool abuse.
- Classify data crossing the model boundary. Tag PII, regulated, and secret-bearing fields.
- Lock down system prompts and tool schemas in version control. Treat prompt changes as code changes.
- Isolate tool credentials per agent. No shared service accounts. Scope to least privilege at the API level, not in the prompt.
- Wrap every tool call in a policy engine (OPA, Cedar, or a typed allow-list). The model proposes; the policy decides.
- Quarantine untrusted text before it reaches the model: strip HTML, label provenance, and segregate from instructions.
- Stand up an evaluation harness with adversarial suites — Garak, PyRIT, promptfoo — and run on every prompt or model change.
- Log full traces (input, retrieved context, tool calls, outputs) to an append-only store. You cannot investigate what you did not record.
- Define memory TTL, deletion, and right-to-be-forgotten flows before launch — not after the first incident.
Mapping to compliance frameworks
A defensible AI audit traces every control back to a recognized framework. The four below cover the common ground for regulated industries operating LLM systems today.
NIST AI RMF 1.0
Govern, Map, Measure, Manage functions across the AI lifecycle.
ISO/IEC 42001
AI management system requirements — policy, risk treatment, and continual improvement.
OWASP LLM Top 10
Concrete vulnerability classes: LLM01 Prompt Injection through LLM10 Model Theft.
SOC 2 / ISO 27001
Underlying access, change management, and monitoring controls the AI system inherits.
Evaluation harnesses, not vibes
The single biggest gap we see is teams shipping AI features without a regression suite. An AI evaluation harness is the audit equivalent of a test pipeline: a versioned set of adversarial prompts, jailbreak payloads, indirect injection documents, and tool-abuse scenarios that run on every model upgrade, prompt change, or new tool. Pass/fail thresholds become part of your change-management evidence — the same way unit tests do for traditional software.
When to bring in an outside auditor
Internal teams are the right owners for continuous evaluation. An outside auditor is the right call before a regulated launch, after a material model or architecture change, or when board-level assurance is on the line. Templebit's AI and cybersecurity practices run these audits jointly so the findings are technically grounded and audit-defensible in the same report.
Need an AI security audit?
We run end-to-end AI security audits for financial, governmental, and industrial clients — covering data, model, agent, and runtime planes against NIST AI RMF, ISO 42001, and OWASP LLM Top 10.