Select which cleaning actions to apply. Base cleaning (attribute filtering + formatting) is always applied.
Remove inline styles: Strips all style="..." attributes
Convert <b> to <strong>: Replaces presentational <b> with semantic <strong>
Convert <div> to <p>: Converts inline-only divs to paragraphs; strips attributes from block divs
Remove <span> tags: Unwraps all <span> elements, keeping their text
Remove <br> tags: Strips all line break elements
Remove <i>, <em>, <u>: Unwraps italic, emphasis, and underline tags
Remove empty tags: Deletes elements with no text content
Remove <script> & <style>: Strips all embedded JavaScript and CSS blocks
Remove all images: Strips all <img> tags
Remove all links: Keeps link text but removes <a> tags
Remove HTML comments: Strips <!-- ... --> comments
Remove <font>, <center>: Unwraps legacy formatting tags
Remove table structure: Unwraps <table>, <tr>, and <td> but keeps their text
Flatten lists to paragraphs: Converts <li> items to <p> tags
Normalize whitespace: Removes excessive spaces, tabs, and blank lines
Extract main content: Removes headers, navigation menus, footers, sidebars, and ads, then tries to isolate the page's main content area
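
To make the difference between stripping, unwrapping, and converting concrete, here is a minimal Python sketch of a few of the options above, built on the standard library's html.parser. It is an illustration only, not this tool's implementation: the names Cleaner, clean, UNWRAP, RENAME, and DROP_WITH_CONTENT are hypothetical, and only five options are covered (inline styles, <b> to <strong>, <span>/<font>/<center> unwrapping, <script>/<style> removal, and comment removal).

```python
import html
from html.parser import HTMLParser

# Hypothetical groupings; the tool's real option identifiers are not given.
UNWRAP = {"span", "font", "center"}       # tag removed, inner text kept
RENAME = {"b": "strong"}                  # presentational -> semantic
DROP_WITH_CONTENT = {"script", "style"}   # tag removed together with its content


class Cleaner(HTMLParser):
    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag in UNWRAP:
            return
        # Drop style="..." and re-escape the remaining attribute values.
        kept = [(k, v) for k, v in attrs if k != "style"]
        attr_text = "".join(
            f" {k}" if v is None else f' {k}="{html.escape(v, quote=True)}"'
            for k, v in kept
        )
        self.out.append(f"<{RENAME.get(tag, tag)}{attr_text}>")

    def handle_startendtag(self, tag, attrs):
        # Emit self-closing tags such as <br/> as plain void tags.
        if tag not in DROP_WITH_CONTENT:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in UNWRAP:
            return
        self.out.append(f"</{RENAME.get(tag, tag)}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    # handle_comment is not overridden, so <!-- ... --> comments are dropped.


def clean(html_text):
    parser = Cleaner()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


sample = (
    '<div style="color:red"><b>Hi</b> <span class="x">there</span>'
    "<!-- note --><script>alert(1)</script></div>"
)
print(clean(sample))  # <div><strong>Hi</strong> there</div>
```

Unwrapped tags lose their markup but keep their text, while <script> and <style> lose both, which matches the wording of the descriptions above. A parser-based approach is used here rather than regular expressions because nested and malformed markup is easier to handle event by event.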