This class can be used to remove unwanted tags and data from an HTML document.
It takes a string with the HTML document to clean and parses it assuming a given character set encoding.
The class can perform several types of clean-up operations, such as:
- Remove style definitions
- Remove tags or attributes based on whitelists or blacklists
- Optimize the code (merge inline tags, strip empty inline tags, trim excess new lines)
- Use the HTML tidy extension to clean the document, format the output as XHTML, and drop proprietary attributes from Microsoft Word HTML documents
- Drop empty paragraphs
- Remove needless white space
- Fill empty table cells
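
To make a few of the operations listed above concrete, here is a minimal sketch in Python. It is not the class's own code or API, which is not shown here: the function names are hypothetical, and the regular expressions are a simplification that assumes well-formed input, whereas the class itself parses the document.

```python
import re

# Illustrative only: every function name below is hypothetical and is not
# taken from the class described above.

def remove_style_definitions(html: str) -> str:
    """Drop <style> blocks and inline style attributes."""
    html = re.sub(r"<style\b[^>]*>.*?</style\s*>", "", html, flags=re.I | re.S)
    return re.sub(r"""\s+style\s*=\s*(?:"[^"]*"|'[^']*')""", "", html, flags=re.I)

def drop_empty_paragraphs(html: str) -> str:
    """Remove <p> elements that contain only whitespace or &nbsp;."""
    return re.sub(r"<p\b[^>]*>(?:\s|&nbsp;)*</p\s*>", "", html, flags=re.I)

def fill_empty_table_cells(html: str) -> str:
    """Put a non-breaking space into <td> cells that have no content."""
    return re.sub(r"(<td\b[^>]*>)\s*(</td\s*>)", r"\1&nbsp;\2", html, flags=re.I)

def trim_whitespace(html: str) -> str:
    """Collapse runs of spaces or tabs and remove blank lines."""
    html = re.sub(r"[ \t]+", " ", html)
    return re.sub(r"\n\s*\n+", "\n", html).strip()

def clean(html: str) -> str:
    """Apply the clean-up steps one after another."""
    steps = (
        remove_style_definitions,
        drop_empty_paragraphs,
        fill_empty_table_cells,
        trim_whitespace,
    )
    for step in steps:
        html = step(html)
    return html

if __name__ == "__main__":
    sample = '<p style="color:red">Hi</p><p> </p><table><tr><td></td></tr></table>'
    print(clean(sample))
```

Each step is a separate function so that individual operations can be switched on or off, which mirrors the way the list above presents them as independent options.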