Why Regex Beats LLMs for Prompt Injection Detection

March 5, 2026 · Joerg Michno · 8 min read

When we started building ClawGuard, the first question everyone asked was: "Why aren't you using an LLM to detect prompt injections?"

It's a fair question. LLMs understand language. They can reason about context. They should be the perfect tool for detecting malicious prompts. Right?

After months of testing, building, and running a production security scanner, our answer is clear: regex patterns are the better first line of defense. Here's why.

The LLM Detection Trap

The idea sounds elegant: use an LLM to analyze incoming prompts and flag anything suspicious. Several projects have tried this approach. The problem is threefold:

1. Speed: 100ms vs. 6ms

An LLM-based classifier needs to process the full input through a model. Even small models take 100-500ms per request. For a security layer that sits in front of every user interaction, that's a dealbreaker.

ClawGuard's regex patterns scan text in under 6ms on average. That's 15-80x faster. At scale, this difference compounds: 10,000 scans per day at 300ms each = 50 minutes of compute. At 6ms? Just 1 minute.

Metric	LLM-based Detection	Regex Patterns (ClawGuard)
Average Latency	100-500ms	~6ms
Cost per 10k Scans	$0.50-5.00 (API calls)	$0.00 (local compute)
Deterministic?	No (probabilistic)	Yes (100%)
Requires API Key?	Yes (model provider)	No (runs locally)
Vulnerable to Injection?	Yes (meta-injection)	No

2. The Meta-Injection Problem

Here's the irony that kills LLM-based detection: an LLM used to detect prompt injections is itself vulnerable to prompt injection.

An attacker can craft a payload that not only attacks the target system but also tricks the detection LLM into classifying it as safe. This has been demonstrated repeatedly in research. You're essentially asking the same technology that's being attacked to defend against attacks on itself.

"Using an LLM to detect prompt injection is like using a lock to protect another lock — if someone picks the first one, both are compromised."

Regex patterns don't have this problem. They don't "understand" the text. They match patterns. You can't trick r"ignore\s+(all\s+)?previous\s+instructions" into thinking it didn't match — it either does or it doesn't.

3. The Cost Spiral

Every LLM classification is an API call. At scale, this gets expensive fast. If you're scanning every user message in an agent workflow — and you should be — you're looking at thousands of API calls per day just for security screening.

Regex? Zero marginal cost. The patterns run locally, offline, with zero dependencies. You can scan a million texts and your infrastructure bill doesn't change by a cent.

Our Results: 42 Patterns, 83% Detection, 0% False Positives

We didn't just theorize. We tested ClawGuard against 18 real-world attack payloads collected from security research, CTF challenges, and production incidents.

The results:

83% detection rate across 18 diverse payloads
0% false positive rate on legitimate text samples
225 patterns across 13 categories
6ms average scan time

The 17% we miss? Those are the sophisticated attacks that require semantic understanding — context-dependent instructions, disguised commands buried in normal text, multi-step attacks. These are real, but they're also rare compared to the straightforward injections that regex catches.

The 80/20 Rule of AI Security

Security isn't about catching everything. It's about layers. A firewall doesn't make a WAF unnecessary. A WAF doesn't make input validation unnecessary. Each layer catches what the layer above missed.

Regex patterns are the fast, cheap, reliable first layer. They catch the 80% of attacks that follow known patterns — the low-hanging fruit that would otherwise sail right through to your LLM.

For the remaining 20%, you can add semantic analysis, behavioral monitoring, or LLM-based classification as additional layers. But putting that expensive, slow, vulnerable layer first? That's backwards.

The smart architecture: Regex first (fast, cheap, deterministic) → LLM second (slow, expensive, but catches semantic attacks) → Runtime monitoring (behavioral anomaly detection).

What We Detect (and How)

ClawGuard's 225 patterns are organized into 13 categories:

Category	Patterns	Examples
Prompt Injection	12	Instruction overrides, context manipulation, delimiter injection
Jailbreaks	8	DAN attacks, roleplay exploits, hypothetical abuse
Data Exfiltration	10	URL injection, email harvesting, system info extraction
Social Engineering	6	Authority claims, urgency manipulation, credential phishing
Encoding Tricks	6	Base64, hex, ROT13 encoded payloads

Each pattern is battle-tested against real payloads and tuned to minimize false positives. The full pattern library is open-source on GitHub.

Getting Started

ClawGuard is available in multiple flavors, all free and open-source:

Core Library (Local, Offline)

pip install clawguard

from clawguard import Scanner
scanner = Scanner()
result = scanner.scan("Ignore all previous instructions")
# result.detected = True, result.severity = "critical"

Hosted API (No Setup)

curl -X POST https://prompttools.co/api/v1/scan \
  -H "X-API-Key: YOUR_FREE_KEY" \
  -d '{"text": "Ignore all previous instructions"}'

MCP Server (Claude Desktop / Cursor)

pip install clawguard-mcp
# Add to your Claude Desktop config — scan prompts directly in your editor

GitHub Action (CI/CD)

# .github/workflows/security.yml
- uses: joergmichno/clawguard-scan-action@v1
  with:
    api-key: ${{ secrets.CLAWGUARD_API_KEY }}

Try ClawGuard — Free & Open Source

225 patterns. 6ms scans. Zero false positives. Zero cost.

GitHub Live Demo Free API

The Bottom Line

LLM-based prompt injection detection isn't wrong — it's just not the right first layer. It's too slow for real-time filtering, too expensive for high-volume scanning, and too vulnerable to the very attacks it's meant to detect.

Regex patterns won't catch everything. But they'll catch the obvious stuff — fast, for free, with zero uncertainty. And in security, catching the obvious stuff consistently beats occasionally catching everything.

Start with regex. Add LLMs later. Your users (and your budget) will thank you.