Features — PromptRisk

12 attack patterns

Each detection comes with a severity, an origin (deterministic / llm / hybrid) and the exact evidence.

Typographic camouflage

White-on-white text, near-zero font sizes and hidden runs that humans never see but an LLM reads.

Metadata manipulation

Instructions smuggled into document author, title, comments and custom properties.

Explicit instruction override

“Ignore all previous instructions” and similar attempts to hijack the model.

Typoglycemia / obfuscation

Deliberately misspelled trigger words that still read as commands to a model.

Unicode smuggling

Zero-width and tag characters that hide a payload inside otherwise normal text.

Image steganography

Counts embedded images that could carry hidden instructions (deeper scan in v2).

Prompt leakage

Attempts to extract your system prompt or internal rules.

Role-play / persona switching

DAN-style jailbreaks that ask the model to drop its guardrails.

Multi-turn manipulation

Payloads designed to take effect across a conversation.

Delimiter confusion

Fake system/assistant delimiters that try to break out of the document context.

Encoding / technical obfuscation

Base64 and other encodings used to hide instructions from simple filters.

Data exfiltration / callback

Instructions that try to make the model leak data to an external destination.

Built for every workflow

Supported formats

PDF, Word (.docx) and Excel (.xlsx), up to 50 MB per file.

Audit trail

Every analysis is logged to your account so you can review what was scanned and when.

Document isolation

Content is passed to the LLM inside a nonce delimiter — it is analyzed, never executed.

A hybrid detection engine