When we started building ClawGuard, the first question everyone asked was: "Why aren't you using an LLM to detect prompt injections?"
It's a fair question. LLMs understand language. They can reason about context. They should be the perfect tool for detecting malicious prompts. Right?
After months of testing, building, and running a production security scanner, our answer is clear: regex patterns are the better first line of defense. Here's why.
The idea sounds elegant: use an LLM to analyze incoming prompts and flag anything suspicious. Several projects have tried this approach. The problem is threefold:
An LLM-based classifier needs to process the full input through a model. Even small models take 100-500ms per request. For a security layer that sits in front of every user interaction, that's a dealbreaker.
ClawGuard's regex patterns scan text in under 6ms on average. That's 15-80x faster. At scale, this difference compounds: 10,000 scans per day at 300ms each = 50 minutes of compute. At 6ms? Just 1 minute.
| Metric | LLM-based Detection | Regex Patterns (ClawGuard) |
|---|---|---|
| Average Latency | 100-500ms | ~6ms |
| Cost per 10k Scans | $0.50-5.00 (API calls) | $0.00 (local compute) |
| Deterministic? | No (probabilistic) | Yes (100%) |
| Requires API Key? | Yes (model provider) | No (runs locally) |
| Vulnerable to Injection? | Yes (meta-injection) | No |
Here's the irony that kills LLM-based detection: an LLM used to detect prompt injections is itself vulnerable to prompt injection.
An attacker can craft a payload that not only attacks the target system but also tricks the detection LLM into classifying it as safe. This has been demonstrated repeatedly in research. You're essentially asking the same technology that's being attacked to defend against attacks on itself.
"Using an LLM to detect prompt injection is like using a lock to protect another lock — if someone picks the first one, both are compromised."
Regex patterns don't have this problem. They don't "understand" the text. They match patterns. You can't trick r"ignore\s+(all\s+)?previous\s+instructions" into thinking it didn't match — it either does or it doesn't.
Every LLM classification is an API call. At scale, this gets expensive fast. If you're scanning every user message in an agent workflow — and you should be — you're looking at thousands of API calls per day just for security screening.
Regex? Zero marginal cost. The patterns run locally, offline, with zero dependencies. You can scan a million texts and your infrastructure bill doesn't change by a cent.
We didn't just theorize. We tested ClawGuard against 18 real-world attack payloads collected from security research, CTF challenges, and production incidents.
The results:
The 17% we miss? Those are the sophisticated attacks that require semantic understanding — context-dependent instructions, disguised commands buried in normal text, multi-step attacks. These are real, but they're also rare compared to the straightforward injections that regex catches.
Security isn't about catching everything. It's about layers. A firewall doesn't make a WAF unnecessary. A WAF doesn't make input validation unnecessary. Each layer catches what the layer above missed.
Regex patterns are the fast, cheap, reliable first layer. They catch the 80% of attacks that follow known patterns — the low-hanging fruit that would otherwise sail right through to your LLM.
For the remaining 20%, you can add semantic analysis, behavioral monitoring, or LLM-based classification as additional layers. But putting that expensive, slow, vulnerable layer first? That's backwards.
The smart architecture: Regex first (fast, cheap, deterministic) → LLM second (slow, expensive, but catches semantic attacks) → Runtime monitoring (behavioral anomaly detection).
ClawGuard's 42 patterns are organized into 5 categories:
| Category | Patterns | Examples |
|---|---|---|
| Prompt Injection | 12 | Instruction overrides, context manipulation, delimiter injection |
| Jailbreaks | 8 | DAN attacks, roleplay exploits, hypothetical abuse |
| Data Exfiltration | 10 | URL injection, email harvesting, system info extraction |
| Social Engineering | 6 | Authority claims, urgency manipulation, credential phishing |
| Encoding Tricks | 6 | Base64, hex, ROT13 encoded payloads |
Each pattern is battle-tested against real payloads and tuned to minimize false positives. The full pattern library is open-source on GitHub.
ClawGuard is available in multiple flavors, all free and open-source:
pip install clawguard
from clawguard import Scanner
scanner = Scanner()
result = scanner.scan("Ignore all previous instructions")
# result.detected = True, result.severity = "critical"
curl -X POST https://prompttools.co/api/v1/scan \
-H "X-API-Key: YOUR_FREE_KEY" \
-d '{"text": "Ignore all previous instructions"}'
pip install clawguard-mcp
# Add to your Claude Desktop config — scan prompts directly in your editor
# .github/workflows/security.yml
- uses: joergmichno/clawguard-scan-action@v1
with:
api-key: ${{ secrets.CLAWGUARD_API_KEY }}
42 patterns. 6ms scans. Zero false positives. Zero cost.
GitHub Live Demo Free APILLM-based prompt injection detection isn't wrong — it's just not the right first layer. It's too slow for real-time filtering, too expensive for high-volume scanning, and too vulnerable to the very attacks it's meant to detect.
Regex patterns won't catch everything. But they'll catch the obvious stuff — fast, for free, with zero uncertainty. And in security, catching the obvious stuff consistently beats occasionally catching everything.
Start with regex. Add LLMs later. Your users (and your budget) will thank you.