Optimizing HTML
- Why clean markup?
- Markup smells
- Additional optimizations
- Aggressive optimizations
- When things go wrong
- Antipatterns
- Tools
- Future considerations
Why clean markup?
Client-side optimization is getting a lot of attention lately, but some of its basic aspects seem to go unnoticed. If you look carefully at pages on the web (even those that are supposed to be highly optimized), it’s easy to spot plenty of redundant, inefficient, or archaic structures in their markup. All this baggage adds extra weight to pages that are supposed to be as light as possible.
The reason to keep documents clean is not so much faster load times as having a solid and robust foundation to build upon. Clean markup means better accessibility, easier maintenance, and good search engine visibility. Smaller size is just a property of clean documents, and another reason to keep them that way.
In this post, we’ll take a look at HTML optimization: removing some of the common markup smells, reducing document size by getting rid of redundant structures, and employing minification techniques. We’ll look at currently available minification tools and analyze what they do right and wrong. We’ll also talk about what can be done in the future.
Markup smells
So what are the most common offenders?
1. HTML comments in scripts
One of the gross redundancies nowadays is the inclusion of HTML comments (<!-- -->) in script blocks. There’s not much to say here, except that the browsers that actually needed this error-prevention measure (such as Netscape 1.0, from ’95) are pretty much extinct. Comments in scripts are just unnecessary baggage and should be removed ferociously.
2. <![CDATA[ … ]]> sections
Another often needless error-prevention measure is the inclusion of CDATA blocks in SCRIPT elements:
<script type="text/javascript">
//<![CDATA[
...
//]]>
</script>
It’s a noble goal that falls short in reality. CDATA blocks are a perfectly good way to prevent an XML processor from treating < and & as the start of markup, but that only matters in true XHTML documents: those served with the “application/xhtml+xml” content type. The majority of the web is still served as “text/html” (since, for example, IE doesn’t understand XHTML to this day), and so is parsed by browsers as HTML, not as XML.
Unless you’re serving documents as “application/xhtml+xml”, there’s little reason to have CDATA sections hanging around. Even if you’re planning to use XHTML in the future, it might make sense to remove the unnecessary weight from the document now, and only add it back later, when it’s actually needed.
And, of course, the ultimate solution here is to avoid inline scripts altogether (and take advantage of external script caching).
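When inline scripts can’t be avoided, the wrappers from the last two items can at least be stripped mechanically. Below is a minimal JavaScript sketch, not taken from any existing tool; the function name is hypothetical, and it assumes each wrapper sits on a line of its own (as in the CDATA example above) and that it receives the script’s contents, not the surrounding SCRIPT tags.

```javascript
// Hypothetical helper: removes legacy "<!-- ... //-->" and
// "//<![CDATA[ ... //]]>" wrapper lines from the contents of an inline
// SCRIPT element. Assumes each wrapper occupies a line of its own; this is
// a line-based sketch, not a JavaScript or HTML parser.
const OPENERS = [/^\s*<!--\s*$/, /^\s*(\/\/)?\s*<!\[CDATA\[\s*$/];
const CLOSERS = [/^\s*(\/\/)?\s*-->\s*$/, /^\s*(\/\/)?\s*\]\]>\s*$/];

function matchesAny(line, patterns) {
  return patterns.some((re) => re.test(line));
}

function stripLegacyScriptWrappers(code) {
  const lines = code.split("\n");

  // Drop blank lines at both ends so the wrappers become the first/last lines.
  while (lines.length && lines[0].trim() === "") lines.shift();
  while (lines.length && lines[lines.length - 1].trim() === "") lines.pop();

  // Remove wrapper lines (possibly nested, e.g. a comment around CDATA).
  while (lines.length && matchesAny(lines[0], OPENERS)) lines.shift();
  while (lines.length && matchesAny(lines[lines.length - 1], CLOSERS)) lines.pop();

  return lines.join("\n");
}

const legacy = ["//<![CDATA[", "document.title = 'a & b';", "//]]>"].join("\n");
console.log(stripLegacyScriptWrappers(legacy)); // document.title = 'a & b';
```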
3. onclick="…", onmouseover="…", etc.
There are some valid use cases for intrinsic event attributes, such as performance or targeting ancient browsers (although I’m not aware of any environment that understands event attributes, like onclick="...", but not property-based assignments, like element.onclick = ...). Besides the well-known reasons to avoid them, such as separation of concerns and reusability, there’s the matter of markup pollution. By moving event logic to an external script, we can take advantage of that script’s caching: event handler logic no longer needs to be transferred to the client every time the document is requested.
4. onclick="javascript:…"
An interesting confusion of the javascript: pseudo-protocol with intrinsic event handlers results in this redundant mix (with 106,000 (!) occurrences). The truth is that the entire contents of an event handler attribute become the body of a function. That function then serves as the event handler (usually after having its scope augmented to include the element itself and some or all of its ancestors). The “javascript:” prefix is therefore parsed as a statement label, which is unnecessary and rarely serves any purpose.
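The label effect is easy to confirm in plain JavaScript. In this sketch the Function constructor stands in for what the browser does with the attribute text; that is a simplification (browsers also augment the scope chain, as described above), and the variable names are illustrative only.

```javascript
// Simulate how a browser turns onclick="..." attribute text into a handler:
// the attribute's contents become the body of a function taking `event`.
const withPrefix = new Function("event", "javascript: return event.type;");
const withoutPrefix = new Function("event", "return event.type;");

// "javascript:" is parsed as a statement label that nothing ever refers to,
// so both handlers behave identically.
console.log(withPrefix({ type: "click" }));    // click
console.log(withoutPrefix({ type: "click" })); // click
```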
5. href="javascript:void(0)"
Continuing with the “javascript:” pseudo-protocol, there’s the infamous href="javascript:void(0)" snippet, used as a way to prevent default anchor behavior. This terrible practice, of course, makes the anchor completely inaccessible when Javascript is disabled, not available, or errors out. It should go without saying that the ideal solution is to put a proper URL in href and stop the default anchor behavior in the event handler. If, on the other hand, the anchor element is created dynamically and then inserted into the document (or is hidden initially and then shown via Javascript), a plain href="#" is a leaner and faster alternative to the “javascript:” version.
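A minimal sketch of the recommended approach, assuming the anchor already carries a real URL (say, href="/details"); enhanceLink and showDetails are hypothetical names used for illustration. Navigation is cancelled only from inside the handler, so clients without Javascript simply follow the link.

```javascript
// Attach script behavior to a link that already works without script.
// `link` is an anchor element; `showDetails` is a hypothetical callback that
// performs the script-driven version of the action (e.g. loading content inline).
function enhanceLink(link, showDetails) {
  link.addEventListener("click", function (event) {
    event.preventDefault();                 // cancel navigation only when script runs
    showDetails(link.getAttribute("href")); // reuse the same URL for the enhanced path
  });
}

// Browser usage, for markup like <a id="more" href="/details">More</a>:
// enhanceLink(document.getElementById("more"), function (url) { /* fetch url */ });
```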
6. style="…"
There’s nothing inherently wrong with the style attribute, except that by moving its contents to an external stylesheet, we can take advantage of resource caching. This is similar to avoiding event attributes, mentioned earlier. Even if you only need to style one particular element and are not planning to reuse its styles, remember that inline style information has to be transferred every time the document is requested. Moving styles to an external resource prevents this, as the stylesheet is transferred once and then cached on the client.
7. <script language="Javascript" … >
Probably one of the most misunderstood attributes is SCRIPT’s “language”. This attribute is so archaic that it was already deprecated in 1999 (!), 10 years ago, when HTML 4.01 became an official recommendation. There’s absolutely no reason to use it, except for the rare cases when a language version needs to be specified (and even that is somewhat unreliable and should probably be avoided if possible).
8. <script charset="…" … >
Another misunderstood SCRIPT attribute is charset. Sometimes I see documents that include this kind of markup:
<script type="text/javascript" charset="UTF-8">
...
</script>
The thing is that the charset attribute only really makes sense on “external” SCRIPT elements, those that have a “src” attribute. HTML 4.01 even says:
Note that the charset attribute refers to the character encoding of the script designated by the src attribute; it does not concern the content of the SCRIPT element.
Testing shows that actual browser behavior matches the spec in this regard.
Searching for this pattern reveals about 2000 occurrences. Not surprising, given that even popular apps like TextMate include the wrong usage of charset.
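Both SCRIPT smells are easy to catch automatically. The following lint-style sketch is hypothetical (no existing tool is implied) and assumes the attributes of each SCRIPT element have already been parsed into a plain object by whatever HTML parser you use.

```javascript
// Flags the two SCRIPT smells described above, given an attribute map such
// as { type: "text/javascript", charset: "UTF-8" }. Attribute names are
// compared case-insensitively, as HTML does.
function lintScriptAttributes(attrs) {
  const names = Object.keys(attrs).map((name) => name.toLowerCase());
  const has = (name) => names.includes(name);
  const warnings = [];

  if (has("language")) {
    warnings.push('"language" has been deprecated since HTML 4.01');
  }
  // charset only describes the external resource named by src.
  if (has("charset") && !has("src")) {
    warnings.push('"charset" has no effect on an inline script');
  }
  return warnings;
}

console.log(lintScriptAttributes({ type: "text/javascript", charset: "UTF-8" }).length); // 1
console.log(lintScriptAttributes({ src: "app.js", charset: "UTF-8" }).length);            // 0
```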
Additional optimizations
We’ve covered some of the bad practices that should almost always be avoided. But there’s more, and that “more” is removing redundant parts. The optimizations explained below are often questionable, as they trade clarity for size. Therefore, I include them here not as a recommendation, but merely as an option. Employ with careful consideration.
1. <style media="all" …>
HTML 4.01 defines the media attribute on STYLE elements as a way of targeting a specific medium: screen, print, handheld, and so on. One of the possible values for media is “all”, which also happens to be the de facto default among modern (and not so modern) browsers. If you find yourself using media="all", it should be safe to just omit it and let the browser apply that value implicitly.
Interestingly, HTML 4.01 states that the default value for media is “screen”. However, none of the browsers I tested [1] implement it as per spec, and they default to “all” instead. This is probably why the HTML 5 draft specifies the default value as “all”: to match actual browser behavior.
2. <form method="get" …>
The FORM element’s “method” attribute defaults to GET, yet it is often specified explicitly. There’s no harm in dropping it, except for reduced clarity. Note that the HTML 5 draft leaves this behavior untouched.
3. <input type="text" …>
The INPUT element’s “type” defaults to “text” in both HTML 4.01 and the HTML 5 draft. Dropping this attribute can result in substantial size savings on pages with lots of text fields.
4. <meta http-equiv="Content-type" …>
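The three defaults above can be captured in a small lookup table. This is a sketch under the assumptions stated in the text: the media default follows browser behavior and the HTML 5 draft rather than HTML 4.01, and the table (along with the isRedundantDefault name) is hypothetical and deliberately limited to these three cases.

```javascript
// Attribute values that match the default and could therefore be omitted.
// Limited to the three cases discussed above.
const DEFAULT_ATTRIBUTES = {
  style: { media: "all" }, // browser behavior / HTML 5 draft, not HTML 4.01
  form: { method: "get" },
  input: { type: "text" },
};

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

function isRedundantDefault(tagName, attrName, value) {
  const tag = tagName.toLowerCase();
  const attr = attrName.toLowerCase();
  if (!hasOwn(DEFAULT_ATTRIBUTES, tag) || !hasOwn(DEFAULT_ATTRIBUTES[tag], attr)) {
    return false;
  }
  // Values are compared case-insensitively (method="GET" equals "get").
  return String(value).trim().toLowerCase() === DEFAULT_ATTRIBUTES[tag][attr];
}

console.log(isRedundantDefault("FORM", "method", "GET"));     // true
console.log(isRedundantDefault("input", "type", "password")); // false
```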
Specifying a document’s character encoding has always been a source of great confusion. Contrary to common belief, a META element that specifies Content-type does not take priority over the “Content-type” HTTP header the document is served with. When both the header and the META element are specified, the header takes precedence.
If you control the server response and can set up the Content-type header properly, it’s safe to omit the META element. The only reason to keep it is to specify the encoding when the document is viewed offline (e.g. saved to disk and opened without any HTTP header).
5. <a id="…" name="…" …>
The main reason the “name” attribute is still used together with “id” is compatibility with ancient browsers (e.g. Netscape 4). Those couldn’t link to anchors by “id”, so “name” had to be used. If you have elements with matching name/id pairs and don’t care about ancient browsers, feel free to get rid of this archaic pattern.
Watch out for side effects, though. If you’re referencing elements by name in scripts (document.getElementsByName, document.evaluate, document.querySelectorAll, etc.), replacing names with ids might break things. Also remember that document.anchors only returns elements with name attributes.
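The precedence rule can be expressed as a tiny function. This is a simplified sketch, assuming you already have the raw Content-Type header and the charset declared by the META element; the function names are hypothetical, and other signals a browser may consider (such as a byte order mark) are deliberately left out.

```javascript
// Extract the charset parameter from a Content-Type value, e.g.
// "text/html; charset=ISO-8859-1" -> "iso-8859-1". Returns null if absent.
function charsetFromContentType(contentType) {
  const match = /;\s*charset\s*=\s*["']?([^;"'\s]+)/i.exec(contentType || "");
  return match ? match[1].toLowerCase() : null;
}

// The HTTP header wins; the META declaration is only a fallback
// (e.g. for a document opened from disk, where there is no header).
function effectiveCharset(headerContentType, metaCharset) {
  return charsetFromContentType(headerContentType) ||
    (metaCharset ? metaCharset.toLowerCase() : null);
}

console.log(effectiveCharset("text/html; charset=ISO-8859-1", "UTF-8")); // iso-8859-1
console.log(effectiveCharset(undefined, "UTF-8"));                     // utf-8
```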
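Mechanically, the cleanup is simple; the caveats above are the hard part. Here is a hypothetical sketch that operates on an already-parsed attribute map, only touches A elements (form controls need their names for submission), and drops name only when it exactly duplicates id. It does not check scripts for getElementsByName, document.anchors and the like; that remains a manual step.

```javascript
// Returns a copy of an A element's attributes without a "name" that merely
// duplicates "id". Other elements (e.g. form controls) are returned as-is.
function dropDuplicateName(tagName, attrs) {
  if (tagName.toLowerCase() !== "a") return attrs;
  if (attrs.name === undefined || attrs.name !== attrs.id) return attrs;
  const { name: _duplicate, ...rest } = attrs; // copy everything except "name"
  return rest;
}

console.log(dropDuplicateName("a", { id: "intro", name: "intro", href: "#top" }));
// { id: 'intro', href: '#top' }
```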
6. <!doctype html>
A little more than a year ago, Dustin Diaz proposed using the HTML 5 doctype as a way to cut down on document size. This is not a major optimization, but if you don’t care about validation and need to squeeze every single byte out of the page, using <!doctype html> is a viable option. Tests revealed that this fancy doctype triggers standards mode in a large variety of browsers.
Aggressive optimizations
If you’re still craving more, here are a few extreme ideas. Some of these (e.g. omitting optional tags) have been circulating for a while. Others I haven’t heard mentioned. Even though these might seem way too obtrusive, note that none of them actually invalidates a document, as long as the document is HTML, not XHTML. But you’re serving documents as HTML anyway, aren’t you? ;)
- Remove HTML comments
- Remove/collapse whitespace
- Remove optional closing tags (<p>foo</p> → <p>foo)
- Remove quotes around attribute values, when allowed (<p class="foo"> → <p class=foo>)
- Remove optional values from boolean attributes (<option selected="selected"> → <option selected>)
- Munge inline styles, inline scripts and event attributes (if it’s not possible to remove them)
- Munge classes and ids (needs to be kept in sync with scripts and style declarations)
- Strip scheme names off of URLs (http://example.com → //example.com)
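Two of these tweaks can be sketched in a few lines of JavaScript; the function names are hypothetical. canUnquote follows the HTML 4.01 rule, under which an attribute value may be left unquoted only if it contains nothing but letters, digits, hyphens, periods, underscores and colons (HTML 5 is more permissive, so this errs on the safe side). stripScheme only handles absolute http and https URLs; keep in mind that a protocol-relative URL is fetched with the scheme of the page that references it.

```javascript
// HTML 4.01: unquoted values may contain only letters, digits, "-", ".", "_" and ":".
// An empty value must stay quoted, hence "+" rather than "*".
function canUnquote(value) {
  return /^[A-Za-z0-9\-._:]+$/.test(value);
}

// Serialize an attribute, dropping the quotes when that is allowed.
// Only double quotes are escaped here; a real serializer would also handle "&".
function formatAttribute(name, value) {
  return canUnquote(value)
    ? `${name}=${value}`
    : `${name}="${value.replace(/"/g, "&quot;")}"`;
}

// "http://example.com" -> "//example.com"; other URLs are left untouched.
function stripScheme(url) {
  return url.replace(/^https?:(?=\/\/)/i, "");
}

console.log(formatAttribute("class", "foo"));     // class=foo
console.log(formatAttribute("class", "foo bar")); // class="foo bar"
console.log(stripScheme("http://example.com"));   // //example.com
```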
But we have compression!
Do all of these optimizations even matter when the document is compressed? Doesn’t gzip eliminate most of the markup overhead? After all, it’s a textual format we’re talking about!
It still matters.
First of all, it’s good to remember that not everyone is getting gzip. This is very sad, but it also means that for those users HTML optimization plays an even more significant role.
Second, even if the document is served compressed, there are still savings of 5-10KB after compression (on an average document), and the savings are even bigger with large documents. This might not seem like a lot, but in reality every byte counts.
As an example of compressing a large document, I munged the unofficial HTML version of the ECMA-262 3rd edition specification, which originally weighed about 750KB (131KB gzipped), down to 606KB (115KB gzipped). That’s a saving of 16KB after gzipping, simply from removing whitespace, comments, attribute quotes and optional tags. The optimized version looks the same as the original one.
Finally, optimizations like stripping whitespace and comments actually make the resulting document tree lighter (fewer whitespace-only text nodes and comment nodes), potentially improving page rendering performance.
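To check the gzip question against your own markup, compare compressed sizes before and after optimizing. This sketch uses Node’s built-in zlib module; the two sample strings are made up for illustration (a list with quoted attributes and closing </li> tags versus the same list with quotes and optional end tags removed), so substitute real pages to get meaningful numbers.

```javascript
const zlib = require("zlib");

// Made-up sample markup: 500 list items, before and after the "aggressive" tweaks.
const items = Array.from({ length: 500 }, (_, i) => i);
const original = "<ul>\n" +
  items.map((i) => `  <li class="item" id="item-${i}">Item ${i}</li>\n`).join("") +
  "</ul>";
const optimized = "<ul>" +
  items.map((i) => `<li class=item id=item-${i}>Item ${i}`).join("") +
  "</ul>";

// Print and return raw and gzipped sizes in bytes.
function report(label, html) {
  const raw = Buffer.byteLength(html, "utf8");
  const gzipped = zlib.gzipSync(html).length;
  console.log(`${label}: ${raw} bytes raw, ${gzipped} bytes gzipped`);
  return { raw, gzipped };
}

const before = report("original", original);
const after = report("optimized", optimized);
console.log(`saved after gzip: ${before.gzipped - after.gzipped} bytes`);
```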
When things go wrong
As with any optimization, it’s very easy to get carried away. HTML Compact is a good example of HTML compression taken too far. This wonderful Windows app takes a “unique” approach to compressing HTML… by writing it into the document via Javascript.
Turning this perfectly clean document:
<html>
<head>
<title></title>
</head>
<body>
<div>
<ul>
<li>foo</li><li>bar</li><li>baz</li>
<!-- few more dozens of list elements ... -->
</ul>
</div>
</body>
</html>
into this mess:
<!--hcpage status="compressed"-->
<html>
<head>
<SCRIPT LANGUAGE="JavaScript" SRC="hc_decoder.js"></SCRIPT>
<title></title>
</head>
<BODY>
<NOSCRIPT>To display this page you need a browser with JavaScript support.</NOSCRIPT>
<SCRIPT LANGUAGE="JavaScript">
<!--
hc_d0("Mv#d|\x3C:,&c@w4YFAtD1 [... and so on, another couple hundreds of characters ...]");
//-->
</SCRIPT>
</BODY>
</html>
Needless to say, this kind of “optimization” should never be performed on the public web, unless the intention is to make documents inaccessible to users and search engines. And it hurts me to see those NOSCRIPT elements, which fail for clients behind Javascript-blocking firewalls: such clients have scripting enabled, so NOSCRIPT content is never shown, yet the scripts themselves are stripped. Bad idea, bad execution.
Antipatterns
The previous snippet was a good example of an optimization anti-pattern. There are, however, a few more you should be aware of:
1. Removing doctype
HTML Compressor has an option, on by default, to strip the doctype. I can’t think of a case where stripping it would be beneficial. On the contrary, a missing doctype triggers quirks mode and, as a result, wreaks havoc on page layout and behavior. Doctypes should be left alone or, instead, replaced with the shorter HTML 5 version.
2. Replacing STRONG with B and EM with I
Another harmful option in the same HTML Compressor is to replace elements with their shorter “alternatives”. The problem is that B is not really an alternative to STRONG, and neither is I a replacement for EM. STRONG and EM elements carry semantic meaning (strong emphasis and emphasis, respectively), whereas B and I are simply font-style elements: they affect text rendering, but carry no semantic meaning.
Even though browsers usually display these elements identically, screen readers and search engines very much understand the difference.
3. Removing title, alt attributes, and LABEL elements.
A good rule of thumb is to never optimize at the expense of accessibility. You might be tempted to remove the “alt” attribute on IMG elements (which HTML 4.01 actually requires), or the “title” on anchors, but saving a few dozen bytes is really not worth the often-critical loss of accessibility.
Tools
It’s more or less trivial to automate most of the tweaks from the “additional optimizations” section. There already exist tools that strip comments and whitespace, and remove quotes around attribute values. But these are still in their infancy and perform a very limited set of optimizations. We can definitely do better.
A couple of months ago, hakunin and I started working on a similar, Ruby-based compressor, but never had a chance to finish it.
So what do we have so far?
- Absolute HTML Compressor (desktop, Windows): Does a great job, but only after turning off options like stripping the doctype and replacing STRONG with B.
- HTML Compact (desktop, Windows): Makes documents inaccessible. Avoid.
- HTML Compressor (desktop, Windows): Only removes whitespace, and does so even inside whitespace-sensitive elements such as PRE. Not very useful.
- Pretty Diff (web-based): Doesn’t have an option to completely remove whitespace (it only collapses it). Doesn’t perform any optimizations except collapsing whitespace and removing newlines, and doesn’t respect whitespace-sensitive elements. Not very useful.
- htmlcompressor (Java-based): Performs most of the optimizations described here (but doesn’t remove optional tags or shorten boolean attributes). Respects whitespace-sensitive elements. It is more or less the best option at the moment.
As you can see, the current state of affairs is pretty disappointing. Apart from the Java-based and web-based options, there seem to be no compression tools for Mac/Linux, and those for Windows are hardly useful.
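One thing several of these tools get wrong is whitespace inside whitespace-sensitive elements. Here is a minimal, regex-based sketch of whitespace collapsing that leaves PRE, TEXTAREA, SCRIPT and STYLE contents untouched; it assumes those elements are well-formed and not nested, and the collapseWhitespace name is hypothetical. A production tool should rely on a real HTML parser instead.

```javascript
// Blocks whose contents must be copied verbatim.
const PRESERVE = /<(pre|textarea|script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi;

function collapseWhitespace(html) {
  let result = "";
  let lastIndex = 0;
  for (const match of html.matchAll(PRESERVE)) {
    // Collapse runs of whitespace in the markup before the preserved block...
    result += html.slice(lastIndex, match.index).replace(/\s+/g, " ");
    // ...then copy the preserved block as-is.
    result += match[0];
    lastIndex = match.index + match[0].length;
  }
  // Collapse whatever follows the last preserved block.
  result += html.slice(lastIndex).replace(/\s+/g, " ");
  return result.trim();
}

console.log(collapseWhitespace("<p>  a \n  b </p>\n<pre>  x\n   y</pre>"));
```

Runs of whitespace are collapsed to a single space rather than removed entirely, since removing them can change rendering between inline elements.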
Future considerations
Whereas munging and stripping can (and should) be applied to production code, markup smells are something that should never happen in the first place, neither in production nor in development, unless, for whatever reason, they are absolutely necessary.
Unsurprisingly, the best optimization is often a manual one: changing document structure to avoid repeating classes on multiple elements (and moving them to a parent element instead), or eliminating chunks that are not immediately needed and loading them dynamically instead. Replacing myriads of <br>s or &nbsp;s used inefficiently for presentational purposes, or that old table-based layout, are other good examples of manual cleaning.
As for all the other little tweaks, I expect more compression tools to appear in the near future, pushing size-reduction boundaries even further.
If you know more ways to optimize HTML, please share. I’d be glad to hear any questions, suggestions or corrections.
[1] Tested browsers were:
Firefox 1, 1.5, 2, 3, 3.5;
Opera 7.54, 8.54, 9.27, 9.64, 10.10;
Safari 2.0.4, 3.0.4, 4;
Chrome 4, on Mac OS X 10.6.2;
Internet Explorer 6, 7, 8 on Windows XP Pro SP2; and
Konqueror 4.3.2 on Ubuntu 9.10.
Zach Leatherman said on Dec 29, 2009 @ 9:23
#1 Do you have any examples of JavaScript-blocking firewalls? I wasn’t aware of this practice.
Matt Kruse said on Dec 29, 2009 @ 10:37
#2 onclick="javascript:…" is my pet peeve. I still see it so often, and it’s a quick indicator of someone who doesn’t really understand how HTML and JavaScript fit together.
I’ve tried to explain it before, including the explanation that “javascript:” is merely functioning as a label. The response? Several times I’ve gotten, “Okay, but it works, so how is it broken? I don’t want to risk something breaking in some browsers if I take it out.”
At that point, it’s time to walk away.
Arnout said on Dec 29, 2009 @ 12:13
#3 If you really want to go nuts on optimizing, you can start using custom elements as replacements for <div>s and other elements:
<!doctype html>
<html>
<head>
<title>custom elements</title>
</head>
<body>
<y>
<x>custom elements</x>
</y>
</body>
</html>
For IE compatibility you can use the createElement hack. Of course, it’s not recommended to do so. But it is an option.
kangax (article author) said on Dec 29, 2009 @ 12:50
#4 @Zach
I’ve been hearing about some firewalls blocking Javascript for a long time in both the comp.lang.javascript and c.i.w.a.html newsgroups. A quick search on the web revealed this post on 456bereastreet, where the issue is described in full detail. From what I know, this happens with some proxies too (if they are configured to strip javascript, flash, etc.)
Diego Perini said on Dec 29, 2009 @ 12:53
#5 @kangax,
good writing as usual, and very good points. It was especially good that you split up “suggested”, “additional” and “aggressive” optimizations, since that avoids starting the wrong discussions about never-ending stories.
I take this as another BIG reason to not let humans write HTML pages. ;-)
The HTML output of our WEB assembler tool seems to avoid most of the above extra fat.
@zachleat
Normally Javascript is blocked by proxy servers at corporate sites. Squid is the best-known and most widely used proxy server in the world; you can read about a specific module to filter Javascript, VBScript and ActiveX from HTML pages here: http://sites.inka.de/bigred/devel/squid-filter.html
Diego Perini said on Dec 29, 2009 @ 13:01
#6 @Arnout,
those tricks do not work, worst of all in IE, where custom tags cannot be nested and cause quite a few problems with getElementsByTagName and other native IE methods. HTML 4.01 has a perfectly defined DTD; only the elements described in that DTD are valid markup.
Alberto Gragera said on Dec 29, 2009 @ 15:33
#7 Umm, removing the name attribute from everywhere will break form submission, won’t it?
kangax (article author) said on Dec 29, 2009 @ 17:00
#8 @Alberto Gragera
The goal is to remove names that serve as anchors, such as those on A elements (<a id="..." name="..." ...>). You can see that the HTML 4.01 specs, for example, use names for anchoring, not ids. And the already mentioned ECMA-262 specs use both names and ids for anchoring, something that could be easily avoided.
Names of form controls, which carry actual information, should obviously be left alone ;)
@Diego
Thanks for kind words and the link to Squid filtering. This proves once again that there are indeed cases of Javascript being stripped off of web pages.
Alberto Gragera said on Dec 29, 2009 @ 17:07
#9 I see, I just misunderstood the article’s point because I’ve never used name for anchoring :)
Mariusz Nowak said on Dec 30, 2009 @ 1:50
#10 Wouldn’t it be better to omit the href attribute in <a> entirely when we don’t use it as a link? It’s valid; of course, the drawback is that we need to add ‘cursor: pointer’ in CSS, but I still prefer that to dealing later with weird “../#” URLs in the address bar.
Another thing: wouldn’t it probably be more (semantically) correct to use <button> in place of an <a> that is not a link? According to the spec, <button> can live without a form. Of course we get some styling hell then, but if we speak strictly semantically? What’s your opinion on that?
Andrea Giammarchi said on Dec 30, 2009 @ 9:56
#13 Good reading as usual. I have a couple of points as well.
As somebody already pointed out, a link without an href is basically not a link anymore.
Moreover, using the hash rather than the Jurassic javascript:void(0) won’t change anything for either search engines or people with JavaScript disabled.
Last but not least, the hash as a link href is the reason a lot of websites flicker every time we click something.
About links, I am absolutely in favor of progressive enhancement: a link should point somewhere, or it’s not a link. If it points somewhere, all we need to do is this, once the DOM has been loaded (I use a jQuery snippet to save some bytes):
$("a").click(function (e) {
  // do magic stuff with this.href
  // or e.target.href ...
  // location.href is allowed as well
  // avoid bloody page flickering ... (to top)
  return false;
});
With the above snippet, links will be useful while the page is still downloading, search engines will be able to follow links, users won’t be stressed if JS is disabled, and users with JS won’t be stressed by flickering or broken links without understanding what they are calling.
Moreover, we can add an Ajax header or an ?ajax=true query string so that the server can handle different requests/responses.
About ids and names … try the classic object + embed Flash inclusion and discover that Safari, Google Chrome, and Opera all return an *array* via document[the_name_or_the_id], so it is highly advisable to choose a unique id or a unique name for both anchors and selections.
Broken tags are the technique Google uses for its home page, where the body is truncated and the browser is in charge of fixing the problem … but we should never forget that progressive rendering, the thing that makes tableless layouts “faster” than table-based ones, simply means that the renderer knows we are already inside the BODY tag: it does not need to find the end of BODY to render the page, but it will eventually close it when, and if, encountered.
Once a node has been opened, it is rarely necessary to close it, except where we could get unexpected nesting behavior. Still, in my opinion, a couple of closed tags cannot make any difference, while the layout will be clearer and less ambiguous.
Finally, I have been with the W3C from the beginning, and I would not suggest omitting quotes for inline events or anything else that cannot be validated.
We all need to understand that these optimizations are ridiculous if we use 3 nested divs just to emulate an inline button with images or other CSS decorations … those, plus other good suggestions here, are a waste of bytes much more important than anything else.
The point about external JavaScript and CSS? Easy: a layout can hardly be reused, while external files, as long as they are not constantly changing, can speed up websites by up to 80% (where 20% is the layout and nothing else, and the rest is handled where it belongs: CSS and JS)
Regards (and Happy New Year)
André R. said on Jan 2, 2010 @ 13:41
#15 As for the issue with firewalls blocking javascript, I do not fully agree that this is something I as a developer need to take into account in my coding. Of course, make sure your login functionality and other essentials work without needing the noscript tag. But when well-known libraries get blocked (in my case it was TinyMCE being blocked once in a while, but not all the time), then I would say fix/update your firewall.
kangax (article author) said on Jan 2, 2010 @ 14:52
#16 @Mariusz Nowak
I think it’s OK to use A without a URL for some script-driven functionality, as long as you handle javascript-less clients gracefully. If an anchor only works with Javascript, make sure to display it only when Javascript is available. Even better, make sure that whatever the anchor is responsible for (e.g. toggling, an XHR request, etc.) works, and only then present it to the user. In other words, don’t be confusing (as N. Zakas says). Make sure users don’t end up with a non-functional UI element.
@Andrea Giammarchi
Thanks for your comments :) I definitely want to test rendering performance of unclosed elements (preliminary tests show that unclosed ones are actually parsed slightly faster than closed ones!). And you have a very good point about the insignificance of these kinds of optimizations compared to things like double/triple nested elements for rounded corners.
@André R.
The point about NOSCRIPT I was trying to make is that it is essentially an inferior alternative to other ways of graceful degradation / progressive enhancement. By designing pages/apps in such a way that the main, basic functionality is present from the start and is then enhanced with scripting, you don’t need to care about firewalls, proxies or whatever else.
Using NOSCRIPT elements is simply not reliable, and employing good old progressive enhancement usually takes just as much time/effort as using NOSCRIPT. Why prefer something less reliable if a better alternative costs the same? ;)
Diego Perini said on Jan 3, 2010 @ 9:17
#17 @kangax,
another suggestion to avoid problems with proxies and similar products: don’t use META tags to do URL redirects:
<meta http-equiv="refresh" content="0;url=http://webdesigners.com">
This shouldn’t be anywhere in new pages, yet many old sites (and bad SEOs) still use it for so-called “door pages” :)
Old Intranet applications also use that trick extensively, but those are not a concern (just used in LANs).
Zach said on Jan 3, 2010 @ 16:38
#18 I’d like to see some numbers on these optimizations: first, to see whether time would be better spent on other optimizations, and second, whether they actually speed things up. Are you optimizing for size or speed? Some of these tips seem like they will slow down the time it takes the browser to parse the markup. This depends on the browser engine, of course, but the more guessing it has to do, the longer it takes.
Tim said on Jan 3, 2010 @ 19:55
#19 I’m normally an XHTML guy, so it pains me to see unclosed <p> or <li> tags. However, your initial comments on a simple and solid HTML foundation, robust against future revisions, are a very valuable lesson that I wish more people understood. Sadly, I think there isn’t much overlap between programmers and designers. Programmers tend to get the simple foundation correct, but can’t do anything visually appealing. Meanwhile, the designers make a document in Photoshop and then just slice it into tables and images until the cows come home, making sure that image rollovers are done entirely without CSS.
I came to this article by a Twitter post from @paul_irish. Very glad I found it :)
Gabriel said on Jan 4, 2010 @ 4:05
#20 Hi. Nice tips.
About point 5: “id” and “name” are required when you try to get a Flash movie object for JS/AS3 communication.
gervais.b said on Jan 4, 2010 @ 12:05
#21 Useful article. I like many of your tips, but I’m a developer and I use many frameworks that generate tags for me. The framework writers will also have to take care of these tips.
@Andrea Giammarchi, you said: “Moreover, we can add an Ajax header or a ?ajax=true as query string so that the server can handle different requests/responses.”
I don’t know anything about an Ajax header. Can you tell me more about that?
Thanks
Rob said on Jan 5, 2010 @ 0:27
#22 Remember that shifting scripts and styles into externally-referenced files so that they can be better cached is probably the wrong thing to do. If caching is your priority, you really want to focus on making sure that your HTML page is getting cached, rather than making it a few bytes smaller.
WebDesignExpert.Me said on Jan 5, 2010 @ 1:27
#23 As you have said, removing title and alt attributes for images is not a good idea. In fact, we need these and similar attributes for SEO reasons as well (not only for accessibility). If they are not present, we should in fact add them to the IMG tags. Further, LABEL tags in forms and title attributes on anchor tags are also used to provide more descriptive content to screen readers and search engines.
jive said on Jan 5, 2010 @ 7:28
#25 I’ve seen some sites use XML and style everything with XSL, so they don’t send a bunch of repetitive tags over. A few pages only had about 3 nodes on the page. However, I’m not sure how that plays out with SEO.
SMiGL said on Jan 5, 2010 @ 7:59
#26 Good collection. Thanks!
Shawn Medero said on Jan 5, 2010 @ 14:06
#27 @Zach Leatherman
I know in the early 2000ish era employees surfing the web from inside Kodak’s corporate network went through a proxy that stripped all external Javascript references and inline Javascript. I’ve heard similar tales from other corporate firewalls and proxies. It is hard to imagine this is widespread, but it was practiced for a time.
kangax (article author) said on Jan 5, 2010 @ 17:33
#28 @Zach
I was actually wondering about the same thing, but haven’t had a chance to perform any extensive benchmarking. Based on a few cursory tests, unclosed elements are actually parsed faster than closed ones in at least Firefox 3.6. Safari 4 shows similar results. Opera 10.10 seems to be consistently slower with unclosed elements, but 10.50 alpha already shows identical numbers. Even though results vary by browser, the difference in parsing ~3500 elements is only a few dozen milliseconds (~15ms in FF3.6, ~1-2ms in Safari 4, ~30ms in Opera 10) on my machine.
Similarly, unquoted attributes seem to be parsed ~3 times faster than quoted ones in Firefox 3.6 (using this test page). Safari 4 and Opera 10.10 give more or less identical results. I also tried experimenting with different attribute values and noticed that Firefox seems to take a noticeable hit with values containing characters like “-” (e.g. <a ... name="some-value-with-dashes">...</a> → <a ... name=some-value-with-dashes>...</a>). This needs more investigation.
As for items from the “additional” and “aggressive” sections, those are definitely micro-optimizations. They don’t even come close to other well-known techniques such as turning on compression, setting up Expires headers, and reducing the overall number of requests. Tweaking HTML should definitely be one of the least important optimizations.
Mathias Bynens said on Jan 6, 2010 @ 5:08
#29 @kangax, in these performance tests, have you tried putting the section with unclosed elements before the one with closed elements, instead of after it? The results are different if you do, so I’m afraid these tests mean nothing.
Mathias Bynens said on Jan 6, 2010 @ 6:18
#30 Those tests have to be run locally, not online, to prevent network load from interfering with the results. But even then, it seems to matter which elements come first in the document.
Zach Leatherman said on Jan 6, 2010 @ 7:11
#31 Thanks for the links, guys; learned a lot there.
I have personally witnessed the Juniper SSL VPN run my JavaScript through its own security filter (changing it in the process), so it isn’t just Proxy servers that perform this practice.
Given that the amount of bad code out there will always outweigh the good, I would guess that the number of people configuring their VPN or Proxy to mangle/disable JavaScript will decrease over time. Of course, this is still a very important problem to address, but Proxies/VPNs are not the primary driver here.
RazorX said on Jan 6, 2010 @ 13:20
#32 Uh yeah… it’s not so black and white. If you remove name= from the fields in your FORM, their name/value pairs aren’t sent on submission… but I see above that someone already noticed this too. Dreamweaver is very name=-happy, so it would be nice to be able to control when Dreamweaver adds name= automatically. Anyone know how that could be done? I’m fine with removing name= when it’s not needed, but I don’t have the time to go through each page and cherry-pick them out. :)
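The distinction here is that name on form controls is what the browser submits, while name on <a> elements can usually be replaced by id. As a rough illustration, and not a Dreamweaver feature, a post-processing step could make that change on anchors only. anchorNameToId is a hypothetical name. Because it uses regular expressions rather than an HTML parser, it assumes simple, well-formed markup.

```javascript
// Replace `name` with `id` on <a> start tags only. Form controls
// (input, select, textarea, ...) keep `name`, because that is what
// gets submitted with the form.
function anchorNameToId(html) {
  return html.replace(/<a\b[^>]*>/gi, (tag) => {
    // If the anchor already has an id, leave it alone rather than
    // producing two id attributes.
    if (/\sid\s*=/i.test(tag)) return tag;
    // Only a whitespace-preceded `name` attribute is renamed, so
    // attributes like data-name are not affected.
    return tag.replace(/(\s)name(\s*=)/i, '$1id$2');
  });
}

console.log(anchorNameToId('<a name="top"></a><input name="q">'));
// <a id="top"></a><input name="q">
```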
kangax (article author) said on Jan 6, 2010 @ 20:48
#33 @Mathias Bynens
Thanks for looking into this and debunking my poor testing methodology :) Ironically, I thought about switching the places of the two versions, but then completely overlooked it in the end. Trying out your version locally, I do in fact see that the first section is consistently parsed faster. I tried these variations separately (leaving only one section, closed or unclosed) and simply took the average of 10 runs:
Based on these results, unclosed elements seem to be parsed slightly more slowly, although the abnormally high value at the end (239) somewhat spoils the result. If we take the median of the first set, it is 200.5. Compared with the second set’s median of 199, the difference is no longer that significant.
It looks like we are only scratching the surface here, and more tests are needed before drawing any solid conclusions. I would be curious to see any other findings on parsing performance. As of now, the difference seems to be insignificant (and besides, how often do we really have documents with as many elements as there are in the test case? :))
Please, let me know if I missed or overlooked something.
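The point about the outlier (239) is why the median is often a better summary of benchmark runs than the average: one unusually slow run drags the mean but barely moves the median. A minimal sketch of both statistics follows. The helper names and the sample numbers are illustrative only and are not the measurements from this thread.

```javascript
// Arithmetic mean of a list of timings.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Median: middle value of the sorted list, or the average of the two
// middle values when the list has an even length.
function median(values) {
  const sorted = [...values].sort((x, y) => x - y); // numeric sort on a copy
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
}

// Illustrative numbers with one outlier, not real benchmark data.
const runs = [10, 11, 12, 13, 50];
console.log(mean(runs));   // 19.2
console.log(median(runs)); // 12
```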
RazorX said on Jan 7, 2010 @ 6:21
#34 Kangax, you advised: “Remove optional values from boolean attributes (option selected="selected" → option selected).” OK, sounds good, but Dreamweaver CS4 automatically inserts (option selected="selected") when you highlight a list value to be initially selected. I’m all for clean code, but since I am self-employed as a full-time web developer, I don’t have the extra time to go back and manually modify every occurrence. If you can tell me how to hack Dreamweaver CS4 and force it to output only (option selected), then I will do it.
To me this is all a question of speed, time, and money. If it all validates as XHTML, then the client is happy and I’m happy, but anyone who owns their own business understands with Windex-like clarity that more code maintenance time means more project time consumed. I’m fine with clean code, but not at the nit-picky expense of chewing up an extra hour each day just to satisfy the invisible rules of perfectionism and cherry-delete Dreamweaver’s automated code that still validates.
So, bottom line: unless someone can tell me how to hack Dreamweaver CS4 to produce this, I won’t waste valuable project time manually modifying every occurrence of (option selected="selected"), because it still validates.
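This is not a way to change what Dreamweaver generates, but the cleanup does not have to be manual. It could run as a post-processing step before deployment. Keep in mind that the minimized form is not valid XHTML, because XML requires every attribute to have a value, so this only applies to documents that are validated and served as HTML. The following is a sketch under those assumptions. minimizeBooleanAttributes is a hypothetical name, the attribute list is HTML 4.01's set of boolean attributes, and the regex approach assumes simple markup.

```javascript
// HTML 4.01 boolean attributes.
const BOOLEAN_ATTRIBUTES = [
  'checked', 'compact', 'declare', 'defer', 'disabled', 'ismap',
  'multiple', 'nohref', 'noresize', 'noshade', 'nowrap', 'readonly',
  'selected',
];

// Matches whitespace, a boolean attribute name, "=", and then the same
// name in quotes. \2 repeats the attribute name; \3 repeats the opening quote.
const BOOLEAN_ATTR_PATTERN = new RegExp(
  '(\\s)(' + BOOLEAN_ATTRIBUTES.join('|') + ')\\s*=\\s*(["\'])\\2\\3',
  'gi'
);

// selected="selected" -> selected (HTML only; not valid XHTML).
function minimizeBooleanAttributes(html) {
  return html.replace(BOOLEAN_ATTR_PATTERN, '$1$2');
}

console.log(minimizeBooleanAttributes('<option selected="selected">A</option>'));
// <option selected>A</option>
```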
Mathias Bynens said on Jan 7, 2010 @ 6:24
#35 @kangax:
I’m not sure if you got this via Twitter, but anyway, here goes: a performance test of closed vs. unclosed elements using .innerHTML. This test calculates the number of parses per second, which is probably a better methodology.
Zach Leatherman said on Jan 7, 2010 @ 6:43
#36 @RazorX,
The tip was to remove name from <a> elements only, and to use id instead; no mention was made of form elements. Just remember, your code and your tools are your responsibility.
Michael Pehl said on Jan 8, 2010 @ 1:33
#37 Nice tips.
@RazorX
As far as I know, there is no way to “hack” DW, although I think this “tool” is not meant for real (X)HTML coders anyway.
Personally, I use Notepad++ or UltraEdit to code, so I always have full “control” ;)
IMO these tips are nice, but they will only save a few bytes, and sometimes they may cause problems in older browsers.
YSlow, for example, gives you good tips on what you can gain just by minifying JS/CSS and compressing your graphics.
In my case, it is nearly impossible to check every page for tags I could remove, because we have over 4,500 HTML/PHP/ASPX files.
But anyway, maybe when Google tells developers how code has to be written to get a better ranking, …
Dave Artz said on Jan 12, 2010 @ 13:50
#40 Great collection of tips, save for one point.
Do you have any evidence that search engines and screen readers interpret <b> any differently than <strong>? Matt Cutts has said they are weighted the same in Google. On the internets it’s rumoured that MSN and Yahoo weigh <b> more. As for accessibility, screen readers have long understood what bold is.
Put me in the camp that thinks <strong></strong> is 10 characters too many and holds no more semantic value than what <b> *means* :)
Ahmada said on Jan 12, 2010 @ 15:36
#41 I don’t agree with items 3, 4, and 5 of the aggressive optimizations,
even if they validate as HTML 4.0.
It’s considered bad practice!
BTW, I like the term “Front End Optimization”!
haRacz said on Jan 15, 2010 @ 11:47
#43 Performance Implications of charset
It explains why advice #4 in Additional optimizations (removing meta http-equiv="Content-Type") is bad :)
dougwig said on Jan 16, 2010 @ 19:27
#44 Onclick is back, mostly for performance reasons. Attaching 20 or 30 onclick behaviors on page load takes its toll if your site is complex. See the following video on High Performance JavaScript: http://video.yahoo.com/watch/1041101/3881103
Matt said on Jan 21, 2010 @ 18:58
#46 This still feels like severely premature optimization to me. Has anybody EVER said, “well, the response time of the server is great, but the 25 extra bytes of HTML are f**king up the whole thing!” It’s almost as bad as the idiots who squish the daylights out of their JS files, then send them with no-cache set in the headers…
Jan said on Jan 22, 2010 @ 6:54
#47 Thanks for the great write-up. Your first recommendation, removing the (//<!--) comments from JavaScript, will make page validation fail on the W3C validator, as well as on another validator I tried. Although I don’t like keeping ancient stuff like that on my page, being able to validate pages seems more important in this case.
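If you do drop the wrappers, you can automate the cleanup. Here is a minimal sketch. It assumes the common layout, with <!-- and //--> on their own lines at the start and end of the script text. stripScriptCommentWrappers is a hypothetical name.

```javascript
// Remove a leading "<!--" line and a trailing "-->" / "//-->" line from
// inline script text. Browsers already treat the rest of a "<!--" line as
// a comment in classic scripts, so dropping that whole line keeps behavior.
function stripScriptCommentWrappers(scriptText) {
  return scriptText
    .replace(/^\s*<!--[^\n]*\n/, '')              // leading "<!--" line
    .replace(/\n[ \t]*(?:\/\/)?[ \t]*-->\s*$/, '\n'); // trailing "//-->" line only
}

console.log(JSON.stringify(stripScriptCommentWrappers('\n<!--\nalert(1);\n//-->\n')));
// "alert(1);\n"
```

The trailing pattern only matches a line that consists of an optional // followed by -->, so a line such as foo(); //--> is left intact rather than silently deleted.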
Evan said on Feb 10, 2010 @ 3:35
#50 Very interesting post, thanks!
There’s a typo under item 5 of Markup smells: “Contituting” instead of “Continuing”. Just FYI.
Jan said on Feb 10, 2010 @ 4:07
#51 Actually, I need to qualify that last comment: it turns out that a lot of JavaScript will pass validation as long as there are no HTML tags inside the script.
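The usual reason markup inside a script trips the validator is that, in HTML 4, a script element's content ends at the first </ followed by a letter, so a string such as "</div>" is read as an end tag. Escaping the slash keeps the JavaScript string value identical while hiding that sequence. This is a minimal sketch. It assumes </ only occurs inside string literals. escapeEndTagsInScript is a hypothetical helper name.

```javascript
// Turn every "</" into "<\/". Inside a JavaScript string literal, "\/" is
// just "/", so the runtime value is unchanged, but the HTML parser and
// validators no longer see an end-tag-open sequence.
function escapeEndTagsInScript(source) {
  return source.replace(/<\//g, '<\\/');
}

console.log(escapeEndTagsInScript('document.write("</p>");'));
// document.write("<\/p>");
```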
Gale said on Feb 23, 2010 @ 8:00
#52 You can’t send variables to a server-side script without the name attribute.
Garrett said on Jun 4, 2010 @ 20:47
#62 Stripping the scheme from URLs is fine advice, but for CSS files it can result in a duplicate request in MSIE.
http://stevesouders.com/tests/schema-double.php
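A cautious way to apply both pieces of advice is to strip the scheme from most absolute URLs but keep it on stylesheet URLs. The following sketch works under that assumption. stripScheme and its isStylesheet flag are hypothetical. The IE behavior is as reported in the comment above; the sketch does not check it.

```javascript
// Make an absolute http(s) URL protocol-relative ("//host/path"), except
// for stylesheets, which keep their scheme to avoid the reported
// double download in Internet Explorer.
function stripScheme(url, isStylesheet) {
  if (isStylesheet) return url;
  return url.replace(/^https?:(?=\/\/)/i, '');
}

console.log(stripScheme('http://example.com/app.js', false));  // //example.com/app.js
console.log(stripScheme('http://example.com/site.css', true)); // http://example.com/site.css
```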
Jason said on Oct 11, 2010 @ 8:09
#63 Lots of interesting ideas. I’m guilty of inlining event handlers and (especially) styles when the focus is more on speed of development than on speed of execution, and I never knew script language="…" was deprecated.
I would shy away from cutting out form method="get" and input type="text", because I consider them slightly lesser forms of the semantic difference you mention between strong and b.
For the aggressive optimizations, I would certainly do 1, 2 and 6, maybe 7 if I had an appropriately reliable tool to do so, and I would consider the others except for 3. I understand why optional tags are optional, but being explicit has always felt far safer to me.
Also, you might want to consider adding to the article, in big bold letters, “REMOVE NAME ATTRIBUTES FROM A TAGS ONLY”, and “ONLY REMOVE THE CHARSET META TAG IF YOU SPECIFY CHARSET IN THE HTTP HEADERS”. :)
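The second rule can be checked mechanically before the meta element is removed. The following is a minimal sketch that reads the charset parameter from a Content-Type header value. charsetFromContentType and canDropCharsetMeta are hypothetical names. The parser is deliberately simple: it does not handle semicolons inside quoted parameter values.

```javascript
// Extract the charset parameter from a Content-Type value such as
// "text/html; charset=UTF-8". Returns it lowercased, or null if absent.
function charsetFromContentType(headerValue) {
  if (!headerValue) return null;
  // Parameters follow the media type, separated by ";".
  for (const part of headerValue.split(';').slice(1)) {
    const [key, ...rest] = part.split('=');
    if (key.trim().toLowerCase() === 'charset') {
      const value = rest.join('=').trim().replace(/^"|"$/g, ''); // drop optional quotes
      return value ? value.toLowerCase() : null;
    }
  }
  return null;
}

// Only consider removing <meta http-equiv="Content-Type"> when the
// HTTP header already declares the charset.
function canDropCharsetMeta(headerValue) {
  return charsetFromContentType(headerValue) !== null;
}

console.log(charsetFromContentType('text/html; charset=UTF-8')); // utf-8
console.log(canDropCharsetMeta('text/html'));                    // false
```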
Pekka @ RX Confidential said on Dec 9, 2010 @ 15:57
#65 Any news on the Ruby-based compressor? I went over to GitHub, but it seems not much has been done for a long time. LOL, don’t get me wrong, I totally understand how stuff gets in the way of development, but after trying a few tools I just haven’t found one I like, and I’m very curious about one based on Ruby and developed by someone who, as I can see for myself, knows what they’re talking about.
Simaho said on Mar 11, 2011 @ 13:17
#67 I checked your minifier against my site, which uses the HTML5 doctype, and the minifier reported that small is only presentational. According to the spec, small now has a semantic meaning.
Small Element HTML5
Pretty good job anyway.
Simaho
Lars said on Apr 27, 2011 @ 0:55
#71 The G-WAN/Linux web server reduces HTML, CSS, and JavaScript on the fly, never touching files on disk and serving an optimized copy stored in memory (great for keeping one single human-readable version of the files).
G-WAN also translates small CSS images into CSS Data URIs to save HTTP requests.
Besides content reduction, G-WAN is also faster than Nginx, Lighttpd (or even HTTP accelerators like Varnish and Apache Traffic Server, see Nicolas Bonvin’s blog) and offers native ANSI C scripts that scale better than C#, Java or PHP.
Life can be made easier!