This document lists the goals and non-goals of Bleach. My hope is that by focusing on these goals and explicitly listing the non-goals, the project will evolve in a stronger direction.
- Goals of Bleach
Bleach should always take an allowed-list-based approach to markup filtering. Specifying disallowed lists is error-prone and not future-proof.
For example, you should have to opt in to allowing a specific event-handler attribute, not opt out of all the other on* attributes. Future versions of HTML may add new event handlers, like ontouch, that an old disallow list would not prevent.
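The difference between the two approaches can be sketched with plain Python sets. This is an illustration only, not how Bleach is implemented; `ALLOWED_ATTRS`, `DISALLOWED_ATTRS`, and both filter functions are hypothetical names made up for this example.

```python
# Hypothetical policy sets, for illustration only (not part of Bleach).
ALLOWED_ATTRS = {"href", "title"}
DISALLOWED_ATTRS = {"onclick", "onload", "onerror"}


def filter_with_allow_list(attrs):
    """Keep only attributes that were explicitly opted in."""
    return {name: value for name, value in attrs.items() if name in ALLOWED_ATTRS}


def filter_with_disallow_list(attrs):
    """Drop only attributes that were known to be risky when the list was written."""
    return {name: value for name, value in attrs.items() if name not in DISALLOWED_ATTRS}


# "ontouch" stands in for an event handler added after the disallow list was written.
attrs = {"href": "/profile", "onclick": "evil()", "ontouch": "evil()"}

allow_result = filter_with_allow_list(attrs)        # only "href" survives
disallow_result = filter_with_disallow_list(attrs)  # "ontouch" slips through
```

The allow list needs no update when a new handler appears, whereas the disallow list silently lets it through until someone notices and extends it.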
The primary goal of Bleach is to sanitize user input that is allowed to contain some HTML as markup and is to be included in the content of a larger page. Examples might include:
- User comments on a blog.
- “Bio” sections of a user profile.
- Descriptions of a product or application.
These examples, and others, are traditionally prone to security issues like XSS or other script injection, or annoying issues like unclosed tags and invalid markup. Bleach will take a proactive, allowed-list-only approach to allowing HTML content, and will use the HTML5 parsing algorithm to handle invalid markup.
See the chapter on clean() for more info.
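As a minimal sketch of what this looks like in practice, assuming the `bleach` package is installed, the snippet below calls `bleach.clean()` once with its defaults and once with an explicit allow list. The sample inputs and the choice to allow only `<b>` are illustrative assumptions, not recommendations.

```python
import bleach

# With the default allow list, a disallowed tag is escaped, not passed through.
escaped = bleach.clean("an <script>evil()</script> example")

# With an explicit allow list, only opted-in tags and attributes survive.
# Allowing just <b> with no attributes is an illustrative choice.
cleaned = bleach.clean(
    '<b onclick="evil()">hi</b>',
    tags={"b"},
    attributes={},
)

print(escaped)
print(cleaned)
```

In the second call the `<b>` tag is kept because it was opted in, while the `onclick` attribute is dropped because nothing allowed it.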
Bleach is designed to work with fragments of HTML by untrusted users. Some non-goal use cases include:
Once you’re creating whole documents, you have to allow so many tags that a disallow-list approach (e.g. forbidding <object>) may be more appropriate.
There are much faster tools available if you want to remove or escape all HTML from a document.
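One such tool, offered here only as an example and not named in this document, is the Python standard library's `html.escape()`. It does no parsing or filtering; it simply escapes every markup-significant character, which is enough when no HTML should be allowed at all. The sample input is made up for illustration.

```python
import html

# Illustrative untrusted input; nothing in it should be rendered as markup.
untrusted = '<b onclick="evil()">hi</b> & bye'

# Escapes &, <, > and (by default) both quote characters.
escaped = html.escape(untrusted)

print(escaped)
```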
Bleach is powerful but it is not fast. If you trust your users, trust them and don’t rely on Bleach to clean up their mess.
Malicious content is designed to be malicious. Making it safe is a design goal of Bleach. Making it pretty or sane-looking is not.
If you want your malicious content to look pretty, you should pass it through Bleach to make it safe and then do your own transform afterwards.