Perfection Kills

by kangax

Exploring Javascript by example


HTML minifier revisited

Four years ago I wrote about and released HTMLMinifier. Back then, there were almost no tools for proper HTML minification, unless you considered things like “Absolute HTML Compressor” for Windows 95/98/XP/2000 or the Java-based HTMLCompressor.

I haven’t been working on it all that much, but occasionally would add a feature, fix a bug, add some tests, refactor, or pull someone’s generous contribution.

Fast forward to today, and HTMLMinifier is no longer a simple experimental piece of Javascript. With over 400 tests, running on Node.js and packaged on npm (with 120+ dependents), with a CLI, grunt/gulp modules, a benchmarking suite, and a number of improvements over the years, it has become a rather viable tool for someone looking to squeeze the most out of front-end performance.

Seeing how minifier has gained quite a few new additions over the years, I thought I’d give a quick rundown of what changed and what it’s now capable of.

Better HTML5 conformance

We still rely on John Resig’s HTML parser but it is now heavily tweaked to conform to HTML5 and to provide more flexible parsing.

A common problem was the inability to “properly” recognize block elements within inline ones, such as a <div> nested inside an <a>.

This was not allowed in HTML4 but is now OK in HTML5.

Another issue was with custom elements (e.g. <my-component>test</my-component>). While, technically, not part of HTML5, browsers do tolerate such cases and so does minifier.

Keeping closing slash and case sensitivity (XHTML, SVG, etc.)

Two other commonly requested features were keeping the trailing slash on self-closing tags and case sensitivity. Both of these are useful when minifying SVG (or XHTML) documents. Having an HTML4 parser at heart, and considering that in 99% of cases trailing slashes serve no purpose, minifier would always drop them from the output. It still does, but you can now turn this behavior off.

Ditto for case-sensitivity — there’s an option for those looking to have finer control.
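As a rough illustration, the options for an SVG fragment might look like the object below. The option names keepClosingSlash and caseSensitive come from the project's README. The sample markup and the dropClosingSlash helper are made up for this sketch: the helper only imitates the default slash-dropping behavior and is not the minifier's own code.

```javascript
// Sample SVG fragment (made up): camelCase attribute and a self-closing tag.
const input = '<svg viewBox="0 0 10 10"><path d="M0 0L10 10"/></svg>';

// Options intended to keep the markup valid as SVG/XHTML.
const options = {
  keepClosingSlash: true, // keep the trailing slash on self-closing tags
  caseSensitive: true     // treat tag and attribute names case-sensitively
};

// Hypothetical helper imitating the default behavior (trailing slash dropped).
function dropClosingSlash(html) {
  return html.replace(/\s*\/>/g, '>');
}

const withoutSlash = dropClosingSlash(input);
console.log(withoutSlash); // the <path> element is no longer self-closed

// With the real library installed, the call would be:
// require('html-minifier').minify(input, options);
```

The point of the toy helper is only to show what gets lost by default: without the slash, an XML parser would treat <path> as unclosed.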

Ignoring custom comments and <!--!

With the rise of client-side MVC frameworks, HTML comments became more than just comments. In Knockout, for example, there’s a thing called containerless control flow syntax, where a block of markup is delimited by comments such as <!-- ko if: someCondition --> and <!-- /ko -->.

It’s useful to be able to ignore such comments while removing “regular” ones, so minifier now allows for exactly that through the ignoreCustomComments option, which takes an array of regexes.
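A minimal sketch of how this could look for Knockout. The removeComments and ignoreCustomComments option names are the minifier's own. The sample markup and the stripComments helper are assumptions made for illustration: the helper only approximates the idea of testing each comment's text against the patterns and is not the library's implementation.

```javascript
// Markup using Knockout's containerless control flow syntax (made-up sample).
const input = [
  '<ul>',
  '  <!-- ko if: items().length -->',
  '  <li>Has items</li>',
  '  <!-- /ko -->',
  '  <!-- a regular comment -->',
  '</ul>'
].join('\n');

// Remove comments, except those whose text matches one of the patterns.
const options = {
  removeComments: true,
  ignoreCustomComments: [/^\s+ko/, /\/ko\s+$/]
};

// Hypothetical helper: keep a comment when any pattern matches its text.
function stripComments(html, opts) {
  return html.replace(/<!--([\s\S]*?)-->/g, (whole, text) => {
    const keep = opts.ignoreCustomComments.some((re) => re.test(text));
    return keep ? whole : '';
  });
}

const output = stripComments(input, options);
console.log(output);

// With the real library installed, the call would be:
// require('html-minifier').minify(input, options);
```

The patterns are tested against the text between <!-- and -->, which is why the first one allows leading whitespace before "ko" and the second one allows trailing whitespace after "/ko".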

Relatedly, we’ve also added support for generic ignored comments: those starting with <!--!. You might recognize this pattern from the de facto standard among Javascript libraries, where comments starting with /*! are preserved by minifiers and are often used for licenses.

If you’d like to ignore an entire chunk of markup from minification, you can now simply wrap it with <!-- htmlmin:ignore --> and it’ll stay untouched.
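To make the idea concrete, here is a toy sketch of marker-based ignoring. Everything in it (toyMinify, minifyWithIgnores, the sample input) is hypothetical and only stands in for real minification. It also assumes the markers come in pairs and drops them from the output.

```javascript
const MARKER = '<!-- htmlmin:ignore -->';

// Toy stand-in for real minification: collapse runs of whitespace.
function toyMinify(chunk) {
  return chunk.replace(/\s+/g, ' ');
}

// After splitting on the marker, chunks at odd positions sit between a pair
// of markers, so they are passed through untouched.
function minifyWithIgnores(html) {
  return html
    .split(MARKER)
    .map((chunk, i) => (i % 2 === 1 ? chunk : toyMinify(chunk)))
    .join('');
}

const input =
  '<p>one    two</p>' +
  MARKER + '<pre>keep   this</pre>' + MARKER +
  '<p>three    four</p>';

const output = minifyWithIgnores(input);
console.log(output);
```

Only the chunk between the two markers keeps its original spacing; the rest goes through the stand-in minifier.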

Finally, we now ignore anything surrounded by <%...%> and <?...?> which is often useful when working with server-side templates, etc.

Custom attributes

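Another twist on (or bastardization of) your regular HTML that we see in client-side MVC frameworks is non-standard attribute names, values, and everything in between.

One example is Handlebars’ dynamic attributes, where a block helper such as {{#if value}}...{{/if}} wraps an attribute inside a tag.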

Most of the HTML4/5 parsers will fail here, choking on { in {{#if as an invalid attribute name character.

We worked around this by adding support for a customAttrSurround option, in which you can specify an array of regexes to match anything surrounding attributes.
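A sketch of what such a configuration could look like for a Handlebars {{#if}} block. It assumes, as the project's README describes, that each customAttrSurround entry is an [opening, closing] pair of regexes. The sample markup, the simplified attribute pattern, and the combined regex are illustrative only and much cruder than the real parser.

```javascript
// Made-up sample: an attribute wrapped in a Handlebars block helper.
const input = '<input {{#if value}}checked="checked"{{/if}}>';

const options = {
  customAttrSurround: [
    [/\{\{#if\s+\w+\}\}/, /\{\{\/if\}\}/] // [opening, closing] pair
  ]
};

// Illustration: combine one pair with a simplified attribute pattern to show
// that the surrounded attribute can be picked out of the tag.
const [open, close] = options.customAttrSurround[0];
const attribute = String.raw`[^\s"'<>/=]+(?:="[^"]*")?`;
const surrounded = new RegExp(
  open.source + String.raw`\s*(` + attribute + String.raw`)\s*` + close.source
);

const match = surrounded.exec(input);
console.log(match[1]); // the attribute found between the two expressions

// With the real library installed, the call would be:
// require('html-minifier').minify(input, options);
```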

But wait, there’s more! Attribute names are not the only offenders.

Polymer is another example: its conditional attributes, such as hidden?="{{isHidden}}", use ?= as the attribute assignment characters.

Only a few days ago we added support for a customAttrAssign option, similar to customAttrSurround (thanks Duncan Beevers!), which takes an array of regexes matching the custom assignment characters.
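For Polymer-style conditional attributes, the option could be set up roughly as follows. The customAttrAssign name is the minifier's own. The attribute string and the simplified attrRe pattern are assumptions made to show the idea of accepting extra assignment tokens next to the plain "=".

```javascript
// Made-up sample attribute using Polymer's conditional assignment.
const attr = 'hidden?="{{isHidden}}"';

const options = {
  customAttrAssign: [/\?=/]
};

// Illustration: build an alternation of assignment tokens, with the custom
// ones first and the plain "=" last, then split the attribute with it.
const assign = options.customAttrAssign
  .map((re) => re.source)
  .concat('=')
  .join('|');
const attrRe = new RegExp('^([\\w-]+)(' + assign + ')"([^"]*)"$');

const [, name, operator, value] = attrRe.exec(attr);
console.log(name, operator, value);

// With the real library installed, the call would be:
// require('html-minifier').minify(input, options);
```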

Scripts as templates

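Continuing on the topic of MVC frameworks, we’ve also added support for the often-used pattern of scripts-as-templates. AngularJS, for example, uses <script type="text/ng-template"> and Ember.js uses <script type="text/x-handlebars">.

There’s no reason not to minify the contents of such scripts, and you can now do this via the processScripts option, which lists the script types to process.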
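A small sketch of such a configuration. The processScripts option name is the minifier's own. The isTemplateScript helper is hypothetical and only shows the kind of type check involved, not how the library actually does it.

```javascript
const options = {
  processScripts: ['text/ng-template', 'text/x-handlebars']
};

// Hypothetical helper: should the body of this <script> be minified as HTML?
function isTemplateScript(scriptTag, opts) {
  const m = /\stype\s*=\s*["']?([^"'\s>]+)/i.exec(scriptTag);
  return m !== null && opts.processScripts.includes(m[1].toLowerCase());
}

const angular = isTemplateScript('<script type="text/ng-template" id="row.html">', options);
const ember = isTemplateScript('<script type="text/x-handlebars">', options);
const plain = isTemplateScript('<script type="text/javascript">', options);
const untyped = isTemplateScript('<script>', options);

console.log(angular, ember, plain, untyped);

// With the real library installed, the call would be:
// require('html-minifier').minify(input, options);
```

Scripts whose type is not listed are left alone here; regular scripts go through the separate JS minification path.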

JS/CSS minification

Now, what about “regular” scripts?

We decided to go a step further, providing a way to minify the contents of <script> elements and event handler attributes (“onclick”, “onload”, etc.). This is delegated to the excellent UglifyJS2.

CSS isn’t left behind either; we can now pass contents of style elements and style attributes through clean-css, which happens to be the best CSS compressor at the moment.

Both of these features are optional.
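Since both features are opt-in, enabling them might look like the options object below. The minifyJS, minifyCSS and collapseWhitespace option names are taken from the project's README; the sample markup is made up, and the actual call is left as a comment because it needs the library installed.

```javascript
// Made-up sample with an inline event handler and an inline style.
const input = '<button onclick="return  false;" style="color: red;">OK</button>';

const options = {
  collapseWhitespace: true,
  minifyJS: true,  // <script> contents and event handler attributes
  minifyCSS: true  // <style> contents and style attributes
};

console.log(Object.keys(options).filter((key) => options[key]));

// With the real library installed, the call would be:
// require('html-minifier').minify(input, options);
```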

Conservative whitespace collapse

If you’d like to play it safe and make minifier always leave at least one whitespace character where it would otherwise remove whitespace completely, there’s now an option for that: conservativeCollapse.

This could come in useful if your page layout/rendering depends on whitespace, for example when an element styled as inline-block is followed by whitespace and then a checkbox.

Minifier doesn’t know that the element preceding the input is rendered as inline-block; it doesn’t know that the whitespace around it is significant. Removing the whitespace would render the checkbox too close (squished) to its “label”.

This is when “conservativeCollapse” (and that extra space) comes in useful.
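The difference can be shown with a toy model. The collapseBetweenTags function and the sample markup are hypothetical and far simpler than what the minifier does; only the conservativeCollapse and collapseWhitespace option names are real.

```javascript
// Made-up sample: an inline-block "label" followed by whitespace and a checkbox.
const input = '<span class="label">Subscribe</span>\n   <input type="checkbox">';

// Toy model of whitespace collapsing between tags: either remove the
// whitespace entirely or always leave a single space behind.
function collapseBetweenTags(html, opts) {
  return html.replace(/>\s+</g, opts.conservativeCollapse ? '> <' : '><');
}

const aggressive = collapseBetweenTags(input, { conservativeCollapse: false });
const conservative = collapseBetweenTags(input, { conservativeCollapse: true });

console.log(aggressive);
console.log(conservative);

// With the real library installed, the call would be:
// require('html-minifier').minify(input, {
//   collapseWhitespace: true,
//   conservativeCollapse: true
// });
```

In the conservative result a single space survives between the two elements, which is exactly the gap the layout relies on.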

Max line length

Another recently-introduced customization is maximum line length. An interesting use case is that some email servers automatically add a new line after 1000 characters, which breaks (minified) HTML. You can now specify line length to add newlines at valid breakpoints.
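As a simplified illustration of breaking only at safe points, the sketch below splits the output after tag ends and starts a new line whenever the next piece would exceed the limit. The wrapAtTagEnds function is hypothetical, not the minifier's implementation: it does not handle ">" inside attribute values, and a single piece longer than the limit still ends up on an overlong line.

```javascript
// Hypothetical helper: wrap minified HTML so that line breaks only ever
// appear right after a ">" and never inside a tag.
function wrapAtTagEnds(html, maxLineLength) {
  // Each token is either "anything up to and including >" or trailing text.
  const tokens = html.match(/[^>]*>|[^>]+$/g) || [];
  const lines = [];
  let line = '';
  for (const token of tokens) {
    if (line && line.length + token.length > maxLineLength) {
      lines.push(line);
      line = '';
    }
    line += token;
  }
  if (line) lines.push(line);
  return lines.join('\n');
}

const minified = '<p>one</p><p>two</p><p>three</p>';
const wrapped = wrapAtTagEnds(minified, 12);
console.log(wrapped);

// With the real library installed, the option is passed like this:
// require('html-minifier').minify(input, { maxLineLength: 1000 });
```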

Benchmarks

We also have a benchmark suite now that goes over a number of “source” files (front pages of popular websites), minifies them, then reports size comparison and time spent on minification.

How does HTMLMinifier compare [1] to the other solutions out there (Will Peavy’s online minifier and a Java-based HTMLCompressor)?

| Site | Original size (KB) | HTMLMinifier (KB) | Will Peavy (KB) | htmlcompressor.com (KB) |
|---|---|---|---|---|
| HTMLMinifier page | 48.8 | 37.3 | 43.3 | 41.9 |
| ES6 table | 117.9 | 79.9 | 92 | 91.9 |
| MSN | 156.6 | 133 | 145 | 138.3 |
| Stackoverflow | 200.4 | 159.5 | 168.3 | 163.3 |
| Amazon | 245.9 | 206.3 | 225 | 218.5 |
| Wikipedia | 401.4 | 380.6 | 396.3 | n/a |
| Eloquent Javascript | 869.5 | 830 | 872 | n/a |

Not too bad!

Notice the remarkable savings (~40KB) on large static files such as the one-page Eloquent Javascript.
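For those who prefer to see the savings spelled out, here is a small script that derives them from the HTMLMinifier column of the table. The numbers are copied from the table; the variable names are arbitrary.

```javascript
// [site, original size in KB, HTMLMinifier size in KB], copied from the table.
const results = [
  ['HTMLMinifier page', 48.8, 37.3],
  ['ES6 table', 117.9, 79.9],
  ['MSN', 156.6, 133],
  ['Stackoverflow', 200.4, 159.5],
  ['Amazon', 245.9, 206.3],
  ['Wikipedia', 401.4, 380.6],
  ['Eloquent Javascript', 869.5, 830]
];

// Absolute and relative savings, rounded to one decimal place.
const savings = results.map(([site, original, minified]) => ({
  site,
  savedKB: Number((original - minified).toFixed(1)),
  savedPercent: Number(((100 * (original - minified)) / original).toFixed(1))
}));

for (const row of savings) {
  console.log(`${row.site}: ${row.savedKB} KB saved (${row.savedPercent}%)`);
}
```

For Eloquent Javascript this works out to 39.5 KB, which is where the ~40KB figure comes from, even though the relative saving on that file is small.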

Future plans

Minifier has come a long way, but there’s always room for improvement.

There are a few more bugs to squash and a few features to add. I also believe there are more optimizations we could perform to get the best savings, whether it’s reordering attributes to aid gzip compression or more aggressive content removal (spaces, attributes, values, etc.).

One concern I have is how long it takes to minify large (500KB+) files. While it’s unlikely that someone uses minifier in real time (rather than as a one-time compilation step), it’s still unacceptable for minification to take more than 1-2 minutes. This is something we could try fixing in the future.

We can also monitor performance stats — both size (as well as gzipped?) and time taken — on each commit, to get a good picture of whether things change for the better or worse.

As always, I welcome you to try minifier in your projects, report any bugs/suggestions, and help with whatever you can. Huge thanks go to all the contributors without whom we wouldn’t have come this far!

[1] Benchmarks performed on OS X 10.9.4 (2.3GHz Core i7).
