Clearing HTML code of styles. Automatic cleaning of HTML code from “garbage”. Supported formats for online conversion

Good day, dear readers! I hope you are doing as well as we are: the sun is shining, the birds are singing, it’s warm, and summer has come! I’m still working on my dissertation, so for the last month and a half I’ve been writing only once a week; I physically don’t have the time. But let's not talk about sad things, let's get down to business!

Some time ago I went digging around the Internet for a script that clears HTML code of the garbage left behind, in particular, by everyone's “beloved” Microsoft Word. Previously I had used the code cleanup feature in Adobe Dreamweaver, but it had two drawbacks:

    Sometimes it doesn’t clean everything we would like.

    If there is a very large amount of code, the cleanup script throws an error.

The second point became critical for me, since I had to work with large HTML tables that one site could not do without, and all the information for them was supplied in Word.

So, after wandering around the Internet for a long time, I found a script that handles all of this with flying colors and is fully customizable as well.

Excel/Word to HTML is an ideal tool for editing the source code of WordPress articles or any other content management system when their built-in composer does not provide all the functions we need. Compose content directly in your browser window without installing an extension or plugin to handle syntax highlighting and other text editing features.

How to use?

Paste the document you want to convert into the Word editor, and then go to the HTML viewer, using the large tabs at the top of the page to generate the code.

Clean up dirty markup with a large button that executes active (checked) options in a list. You can also apply these functions one by one using the CLEAN icon.

Conversion problems that are easily solved by our online HTML converter

The problem of converting Word to HTML has probably existed for as long as Microsoft Word itself. The huge number of styles assigned to text, such as mso-spacerun:yes, classes such as MsoNormal, and a mass of constructs like span style="font-size:10.0pt" badly clutter the code, and they often override the site's own styles. Plain text can still be handled by pasting it through the editor's "Insert text only" button, but that method does not work with tables. Our converter can easily clean any unnecessary comments and styles out of the future HTML file at the click of a button.
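
As a rough illustration of the kind of cleanup described above, the following JavaScript sketch strips Word's Mso classes and inline styles with regular expressions. The function name stripWordNoise is hypothetical and this is not the converter's actual code; regex-based cleanup is a simplification that assumes reasonably well-formed markup.

```javascript
// Hypothetical helper: removes Word-specific classes, inline styles and
// attribute-free <span> wrappers from an HTML string.
function stripWordNoise(html) {
  let out = html
    // Drop class attributes whose value starts with "Mso" (e.g. MsoNormal).
    .replace(/\s+class\s*=\s*(?:"Mso[^"]*"|'Mso[^']*'|Mso\w*)/gi, "")
    // Drop every quoted inline style attribute (mso-spacerun, font-size, ...).
    .replace(/\s+style\s*=\s*(?:"[^"]*"|'[^']*')/gi, "");

  // Unwrap spans that no longer carry attributes; repeat so that
  // nested spans are unwrapped as well.
  let previous;
  do {
    previous = out;
    out = out.replace(/<span\s*>([\s\S]*?)<\/span>/gi, "$1");
  } while (out !== previous);
  return out;
}

console.log(stripWordNoise('<p class="MsoNormal"><span style="font-size:10.0pt">Hello</span></p>'));
// → <p>Hello</p>
```

Classes that merely contain an Mso name after another class are left alone in this sketch; a DOM-based approach would handle such cases more reliably.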


Online cleaning of HTML from unnecessary CSS styles
  • Remove any unnecessary styles from all text or a selected fragment
  • Remove unnecessary indentation codes, symbols, and other Unicode codes
  • Clean the code of extra spaces and duplicate tags
  • If necessary, completely remove the HTML markup.
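
The operations in this list can be sketched in a few lines of JavaScript. The function names are hypothetical, and "duplicate tags" is interpreted here, as an assumption, as adjacent identical inline tags; the service itself may do something different.

```javascript
// Collapse runs of spaces, line breaks and non-breaking spaces into one space.
function collapseWhitespace(html) {
  return html.replace(/(?:&nbsp;|\s)+/g, " ").trim();
}

// Merge adjacent identical inline tags: <b>a</b> <b>b</b> becomes <b>a b</b>.
function mergeDuplicateTags(html) {
  return html.replace(/<\/(b|i|u|strong|em)>(\s*)<\1>/gi, "$2");
}

// Remove the markup entirely, keeping only the text between tags.
function stripAllTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

console.log(mergeDuplicateTags("<b>Hello</b> <b>world</b>")); // → <b>Hello world</b>
console.log(stripAllTags("<p>Hi <b>there</b></p>"));          // → Hi there
```

Note that stripAllTags leaves entities such as &amp;amp; untouched; decoding them would be a separate step.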

Convert Word, Excel, and TXT files into clean HTML source code, without unnecessary styles and comments, ready for direct, correct insertion into site pages.

Supported formats for online conversion:

  • 97–2004 and newer DOC to HTML, DOCX to HTML;
  • XLS to HTML, XLSX to HTML;
  • PPT to HTML, PPTX to HTML;
  • TXT to HTML and many other formats.

Another handy use of the service: instead of spending hours building a table in HTML, make it in 15 minutes in Excel or Word and convert it into clean, tidy HTML code for insertion into the site.

Hello!

While writing my own WYSIWYG editor, I ran into a problem with text pasted from Word. In fact, there are three problems:

  • Word inserts a lot of junk HTML code that needs to be cleaned out
  • For some reason, Word represents lists with paragraphs instead of UL and LI tags
  • The pasted text first has to be recognized as coming from Word at all.

To solve these problems I wrote a jQuery plugin; its full source code is available at the end of the article. Usage example:

    $('#editor').msword_html_filter();

The plugin binds to the keyup event and checks whether the markup inside the editor was pasted from Word; if so, the cleanup function is run. Everything possible is stripped from the resulting HTML: non-breaking spaces, the style and align attributes, span tags, all Mso classes, and empty paragraphs.

Implementation details follow.

Most of the regular expressions used were taken from TinyMCE.

How to determine whether a string contains HTML pasted from Word:

    if (/class="?Mso|style="[^"]*\bmso-|style='[^']*\bmso-|w:WordDocument/i.test(content)) { ... }
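
The same check can be tried outside the editor by wrapping it in a small standalone function. The name looksLikeWordHtml and the sample strings are made up for illustration; the pattern is the one quoted in the article, with the single-quoted style variant written out.

```javascript
// Markers that Word leaves in pasted HTML: Mso* classes, mso-* style
// properties, or a w:WordDocument block.
const WORD_MARKERS = /class="?Mso|style="[^"]*\bmso-|style='[^']*\bmso-|w:WordDocument/i;

// Hypothetical wrapper around the test used by the plugin.
function looksLikeWordHtml(html) {
  return WORD_MARKERS.test(html);
}

console.log(looksLikeWordHtml("<p class=MsoNormal>Text</p>"));  // → true
console.log(looksLikeWordHtml('<p class="intro">Text</p>'));    // → false
```

The pattern is a heuristic: hand-written markup that happens to use a class starting with "Mso" would also be flagged.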

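The cleanup function (it receives the editor as a jQuery object):

    function word_filter(editor) {
        var content = editor.html();

        // Word comments like conditional comments etc
        content = content.replace(/<!--[\s\S]+?-->/gi, "");
        // Remove comments, scripts, XML tag, VML content,
        // MS Office namespaced tags, and a few other tags
        content = content.replace(/<(!|script[^>]*>.*?<\/script(?=[>\s])|\/?(\?xml(:\w+)?|img|meta|link|style|\w:\w+)(?=[\s\/>]))[^>]*>/gi, "");
        // Convert <s> into <strike> for line-through
        content = content.replace(/<(\/?)s>/gi, "<$1strike>");
        // Replace nbsp entities with plain spaces
        content = content.replace(/&nbsp;/gi, " ");
        // Convert <span style="mso-spacerun:yes">___</span> to a string of
        // alternating breaking/non-breaking spaces
        content = content.replace(/<span\s+style\s*=\s*"\s*mso-spacerun\s*:\s*yes\s*;?\s*"\s*>([\s\u00a0]*)<\/span>/gi,
            function (str, spaces) {
                return (spaces.length > 0)
                    ? spaces.replace(/./, " ").slice(Math.floor(spaces.length / 2)).split("").join("\u00a0")
                    : "";
            });
        editor.html(content);

        // Word list paragraphs are converted to <ul>/<ol> lists at this point;
        // that part is included in the full plugin source below.

        // Final cleanup of attributes, spans and empty paragraphs
        $("[style]", editor).removeAttr("style");
        $("[align]", editor).removeAttr("align");
        $("span", editor).replaceWith(function () { return $(this).contents(); });
        $("span:empty", editor).remove();
        $("[class^='Mso']", editor).removeAttr("class");
        $("p:empty", editor).remove();
    }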

The full source of the plugin follows; save it to the file jquery.msword_html_filter.js

    (function ($) {
        $.fn.msword_html_filter = function (options) {
            var settings = $.extend({}, options);

            function word_filter(editor) {
                var content = editor.html();

                // Word comments like conditional comments etc
                content = content.replace(/<!--[\s\S]+?-->/gi, "");

                // Remove comments, scripts (e.g., msoShowComment), XML tag, VML content,
                // MS Office namespaced tags, and a few other tags
                content = content.replace(/<(!|script[^>]*>.*?<\/script(?=[>\s])|\/?(\?xml(:\w+)?|img|meta|link|style|\w:\w+)(?=[\s\/>]))[^>]*>/gi, "");

                // Convert <s> into <strike> for line-through
                content = content.replace(/<(\/?)s>/gi, "<$1strike>");

                // Replace nbsp entities with a char since it's easier to handle
                //content = content.replace(/&nbsp;/gi, "\u00a0");
                content = content.replace(/&nbsp;/gi, " ");

                // Convert <span style="mso-spacerun:yes">___</span> to a string of
                // alternating breaking/non-breaking spaces
                content = content.replace(/<span\s+style\s*=\s*"\s*mso-spacerun\s*:\s*yes\s*;?\s*"\s*>([\s\u00a0]*)<\/span>/gi,
                    function (str, spaces) {
                        return (spaces.length > 0)
                            ? spaces.replace(/./, " ").slice(Math.floor(spaces.length / 2)).split("").join("\u00a0")
                            : "";
                    });

                editor.html(content);

                // Parse out list indent level for lists
                $("p", editor).each(function () {
                    var str = $(this).attr("style");
                    var matches = /mso-list:\w+ \w+([0-9]+)/.exec(str);
                    if (matches) {
                        $(this).data("_listLevel", parseInt(matches[1], 10));
                    }
                });

                // Parse lists
                var last_level = 0;
                var pnt = null; // the list element that receives the next <li>
                $("p", editor).each(function () {
                    var cur_level = $(this).data("_listLevel");
                    if (cur_level !== undefined) {
                        var txt = $(this).text();
                        var list_tag = "<ul></ul>";
                        // A paragraph starting with "1." or "a." is treated as an ordered list item
                        if (/^\s*\w+\./.test(txt)) {
                            var matches = /([0-9]+)\./.exec(txt);
                            if (matches) {
                                var start = parseInt(matches[1], 10);
                                list_tag = start > 1 ? '<ol start="' + start + '"></ol>' : "<ol></ol>";
                            } else {
                                list_tag = "<ol></ol>";
                            }
                        }
                        if (cur_level > last_level) {
                            if (last_level == 0) {
                                $(this).before(list_tag);
                                pnt = $(this).prev();
                            } else {
                                pnt = $(list_tag).appendTo(pnt);
                            }
                        }
                        if (cur_level < last_level) {
                            // Step back up to the list of the shallower level
                            for (var i = 0; i < last_level - cur_level; i++) {
                                pnt = pnt.parent();
                            }
                        }
                        // The first span holds Word's bullet or number marker
                        $("span:first", this).remove();
                        pnt.append("<li>" + $(this).html() + "</li>");
                        $(this).remove();
                        last_level = cur_level;
                    } else {
                        last_level = 0;
                    }
                });

                $("[style]", editor).removeAttr("style");
                $("[align]", editor).removeAttr("align");
                $("span", editor).replaceWith(function () { return $(this).contents(); });
                $("span:empty", editor).remove();
                $("[class^='Mso']", editor).removeAttr("class");
                $("p:empty", editor).remove();
            }

            return this.each(function () {
                $(this).on("keyup", function () {
                    var content = $(this).html();
                    if (/class="?Mso|style="[^"]*\bmso-|style='[^']*\bmso-|w:WordDocument/i.test(content)) {
                        word_filter($(this));
                    }
                });
            });
        };
    })(jQuery);


The plugin was tested only in the latest Firefox.
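
To make the list handling easier to follow, here is a DOM-free sketch of the same idea: read the nesting level from each paragraph's mso-list style and rebuild nested lists from it. The input shape (objects with style and text fields) and the function names are hypothetical, the sketch emits only unordered lists, and it is not the plugin's own code.

```javascript
// Read the nesting level from a style such as "mso-list:l0 level2 lfo1".
// Returns 0 for paragraphs that are not list items.
function listLevel(style) {
  const match = /mso-list:\w+ \w+([0-9]+)/.exec(style || "");
  return match ? parseInt(match[1], 10) : 0;
}

// Turn a flat sequence of Word paragraphs into nested <ul> markup.
// Each open level holds one open <ul> and one open <li>.
function paragraphsToHtml(paragraphs) {
  let html = "";
  let depth = 0;
  for (const { style, text } of paragraphs) {
    const level = listLevel(style);
    if (level === 0) {
      // Ordinary paragraph: close any open lists first.
      html += "</li></ul>".repeat(depth) + `<p>${text}</p>`;
      depth = 0;
      continue;
    }
    if (level > depth) {
      // Going deeper: open one list per missing level.
      html += "<ul>" + "<li><ul>".repeat(level - depth - 1);
    } else {
      // Same level or shallower: close deeper lists, then the previous item.
      html += "</li></ul>".repeat(depth - level) + "</li>";
    }
    depth = level;
    html += `<li>${text}`;
  }
  return html + "</li></ul>".repeat(depth);
}

console.log(paragraphsToHtml([
  { style: "mso-list:l0 level1 lfo1", text: "One" },
  { style: "mso-list:l0 level2 lfo1", text: "Nested" },
  { style: "mso-list:l0 level1 lfo1", text: "Two" },
]));
// → <ul><li>One<ul><li>Nested</li></ul></li><li>Two</li></ul>
```

Building a string keeps the example runnable without a browser; the plugin itself moves DOM nodes instead, which preserves inline formatting inside each item.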

Cleaner is a service that cleans tags of the “garbage” left in a document after the page has been saved as HTML from MS Word.

A long time ago I wrote a similar plugin, but it was made in a hurry; the mechanism has now been completely rewritten.

Cleaning works by walking through the input string and building a new one from it that contains the “clean” HTML. The plugin removes absolutely everything from inside the tags, apart from the attributes listed further down. Unpaired tags get a closing slash (/). Empty tags are removed: a paired tag with nothing between its opening and closing parts is deleted because it contains nothing.
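
The three rules just described (strip attributes, add a slash to unpaired tags, drop empty tags) can be sketched as follows. This is only an assumption about how such a cleaner might look, not the service's code: the names are hypothetical, the keep-list is a short subset of the attributes the service says it preserves, and comments, namespaced tags and ">" inside attribute values are not handled.

```javascript
// Attributes to keep (a subset chosen for the example).
const KEPT_ATTRIBUTES = new Set(["colspan", "rowspan", "href", "src", "alt", "title", "target"]);
// Unpaired (void) tags that get a closing slash.
const VOID_TAGS = new Set(["br", "hr", "img", "input", "meta", "link"]);

function cleanHtml(html) {
  // 1. Rebuild every opening tag with only the kept attributes.
  let out = html.replace(/<([a-z][a-z0-9]*)\b([^>]*)>/gi, (whole, rawName, rawAttrs) => {
    const name = rawName.toLowerCase();
    const parts = [name];
    const attrPattern = /([a-z-]+)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)/gi;
    let attr;
    while ((attr = attrPattern.exec(rawAttrs)) !== null) {
      const attrName = attr[1].toLowerCase();
      if (KEPT_ATTRIBUTES.has(attrName)) {
        const value = attr[2].replace(/^["']|["']$/g, ""); // drop surrounding quotes
        parts.push(`${attrName}="${value}"`);
      }
    }
    const tag = parts.join(" ");
    return VOID_TAGS.has(name) ? `<${tag} />` : `<${tag}>`;
  });

  // 2. Remove empty paired tags; repeat so that nested empties collapse too.
  let previous;
  do {
    previous = out;
    out = out.replace(/<([a-z][a-z0-9]*)>\s*<\/\1>/gi, "");
  } while (out !== previous);
  return out;
}

console.log(cleanHtml('<p class="MsoNormal"><a href="http://example.com" style="color:red">link</a><br><span style="x"></span></p>'));
// → <p><a href="http://example.com">link</a><br /></p>
```

A production cleaner would normally parse the markup into a tree rather than rely on regular expressions, which is why the site's warning about entering valid HTML matters.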

How does html cleaner work?

There are two ways:

  1. In MS Word, select the data you want to clean (press Ctrl + A to select everything) and copy it. Paste the copied text into the field below (the “Paste MS Office Data” tab must be selected) and click the “Done” button.
  2. Before optimizing the code, choose “Save as...” in Word and select the file type “Web Page, Filtered”. Then open the saved file in a text editor, copy the code, paste it into the field below (the “Insert HTML” tab must be selected), and click the “Done” button.

As a result, you will receive pristine HTML code.
The following attributes remain untouched:

"colspan", "rowspan", "href", "src", "type", "value", "lang", "tabindex", "title", "code", "alt", "target", "dir", "span", "action", "method"

Get rid of your dirty markup with the free online HTML Cleaner. It’s very easy to compose, edit, format and minify web code with this online tool. Convert Word docs to tidy HTML, along with other visual documents such as Excel, PDF, Google Docs etc. Working with the paired visual and source editors, which respond instantly to your actions, is simple and efficient.

HTML Cleaner is equipped with many useful features to make HTML cleaning and editing as easy as possible. Just paste your code in the text area, set up the cleaning preferences and press the Clean HTML button. It can handle any document created with Microsoft Excel, PowerPoint, Google Docs or any other composer. It helps you easily get rid of all the inline styles and unnecessary code added by Microsoft Word or other WYSIWYG editors. This HTML editor tool is useful when you’re migrating content from one website to another and want to clean up all the alien classes and IDs the source site applies. Use the find and replace tool for your custom commands. The gibberish text generator lets you easily add dummy text to the editor.

At the top of the page you can see the visual editor and the source code editor next to each other. Whichever one you modify, the changes are reflected in the other in real time. The visual HTML editor allows beginners to compose their content just as they would in any other word processor, while on the right the source editor with highlighted markup helps advanced users adjust the code. This makes this online program a nice tool for learning HTML coding.

Convert Word Documents To Clean HTML

To publish online PDFs, Microsoft Word, Excel, PowerPoint or any other documents composed with different word editor programs or just to copy the content copied from another website, paste the formatted content in the visual editor. The HTML source of the document will be immediately visible in the source editor as well. The control bar above the WYSIWYG editor controls this field while all other source cleaning settings are for editing the source code. Click the Clean HTML button after setting up the cleaning preferences. Copy the cleaned code and publish it on your website.

There’s no guarantee that the program corrects all errors in your code exactly the way you want, so please try to enter syntactically valid HTML.

Convert HTML tables to structured div elements by activating the corresponding checkbox.

Cleaning HTML code from Microsoft Word tags (2000-2007)?

In the past, web designers used to build their websites using tables to organize page layout, but in the era of responsive web design tables are outdated and DIVs are taking their place. This online tool helps you turn your tables into structured div elements with a few simple clicks.
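
A table-to-div conversion of this kind can be approximated by renaming the table tags. The sketch below is an illustration under assumptions, not the tool's implementation: the function name and the class names are invented, and all original attributes (including colspan and rowspan) are dropped.

```javascript
// Hypothetical class names for the generated div elements.
const DIV_CLASSES = {
  table: "table", thead: "thead", tbody: "tbody", tfoot: "tfoot",
  tr: "row", th: "cell header", td: "cell",
};

// Replace every table-related tag with a <div> carrying a matching class.
function tablesToDivs(html) {
  return html.replace(/<(\/?)(table|thead|tbody|tfoot|tr|th|td)\b[^>]*>/gi, (whole, slash, name) => {
    if (slash) return "</div>";
    return `<div class="${DIV_CLASSES[name.toLowerCase()]}">`;
  });
}

console.log(tablesToDivs('<table border="1"><tr><td>A</td><td>B</td></tr></table>'));
// → <div class="table"><div class="row"><div class="cell">A</div><div class="cell">B</div></div></div>
```

The resulting divs still need CSS (for example display: table, flexbox or grid rules keyed to these classes) before they look like a table again.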

You can make your source code more readable by organizing the tabs hierarchy in a tree view.

Become A Member

This website is a fully functional tool for cleaning and composing HTML code, but you can also purchase an HTML G membership and access even more professional features. By using the free version of the HTML Cleaner, you consent to the inclusion of links in the edited documents. This cleanup tool might add a promotional third-party link to the end of the cleaned documents, and you need to leave this code unchanged as long as you use the free version.