Perfection kills

Exploring Javascript by example

JScript and DOM changes in IE9 preview 3

June 24th, 2010 by kangax

The third preview of IE9 was released yesterday, with some amazing additions, like the canvas element and extensive ES5 support. I’ve been digging through it a little to see what has changed and what hasn’t, mainly looking at JScript and DOM. I posted some of the findings on Twitter, but also want to list them here, as it’s not very convenient to share code snippets in 140 characters. Referencing it all in one place will hopefully make it easier for the IE team to find and fix these deficiencies.

ECMAScript 5 and JScript

The big news is that IE9pre3 has (almost) full support for ES5. By “full support”, I mean that it implements the majority of the new API, such as Object.create, Object.defineProperty, String.prototype.trim, Array.isArray, Date.now, and many other additions. As of now, IE9 implements the largest number of new methods; even more than the latest Chrome, Safari and Firefox. Unbelievable, isn’t it? :)

[Screenshot: ES5 compatibility table]

You can see the results in this compatibility table (note that it lists results of mere “existence” testing, not any kind of conformance).

What’s missing is strict mode, which actually isn’t implemented in any of the browsers yet.

Some of the things I noticed:

ES5 Object.getPrototypeOf on host objects seems to lie, always returning null instead of the proper value of [[Prototype]]:

  Object.getPrototypeOf(document.body); // null
  Object.getPrototypeOf(document); // null
  Object.getPrototypeOf(alert); // null
  Object.getPrototypeOf(document.childNodes); // null

This doesn’t happen in other browsers that implement Object.getPrototypeOf at the moment, such as the latest Chrome, WebKit or Firefox. In Chrome, for example:

  Object.getPrototypeOf(document.body) === HTMLBodyElement.prototype; // true
  Object.getPrototypeOf(document) === HTMLDocument.prototype; // true
  Object.getPrototypeOf(alert) === Function.prototype; // true
  Object.getPrototypeOf(document.childNodes) === NodeList.prototype; // true

… and so on.

Interestingly, bound functions in IE9pre3 are represented as “function(){ [native code] }”, similar to host objects:

  var bound = (function f(x, y){ return this; }).bind({ x: 1 });
  bound + ''; // "function(){ [native code] }"
 
  // compare to
 
  alert + ''; // "function alert(){ [native code] }"

Note how the function representation includes neither the identifier (f), nor the parameters (x and y), nor the function body (return this;). This of course proves once again that relying on function decompilation is NOT a good idea.
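
To make the risk concrete, here is a minimal sketch of what goes wrong. The getParamNames helper and its regular expression are hypothetical, written only for this illustration. It recovers parameter names from a plain function, but gets nothing from a bound one, because there is no source text to decompile:

```javascript
// Hypothetical helper: naive parameter extraction via function decompilation.
// Shown as an example of what NOT to rely on.
function getParamNames(fn) {
  var source = Function.prototype.toString.call(fn);
  var match = /^[^(]*\(([^)]*)\)/.exec(source);
  if (!match) return [];
  var params = [];
  var parts = match[1].split(',');
  for (var i = 0; i < parts.length; i++) {
    var name = parts[i].replace(/^\s+|\s+$/g, '');
    if (name) params.push(name);
  }
  return params;
}

function f(x, y) { return this; }
var boundF = f.bind({ x: 1 });

var plainParams = getParamNames(f);      // parameter names found in the source text
var boundParams = getParamNames(boundF); // empty: "[native code]" has no parameter list
```

Any code built on such a helper would silently misbehave as soon as someone passes it a bound function.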

The whitespace character class (as in /\s/) still doesn’t match the majority of whitespace characters (as defined by the specs). These include “U+00A0”, “U+2000” to “U+200A”, “U+3000”, etc. The test is available here. Curiously, ES5 String.prototype.trim seems to “understand” those characters as whitespace very well, producing an empty string, as expected, for something like '\u00A0'.trim().
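
A check along these lines can be sketched as follows. The list of code points is my own assumption of what counts as whitespace or a line terminator (it deliberately leaves out U+180E, whose classification has changed across Unicode versions), and the function names are made up for this example:

```javascript
// Assumed list of code points that /\s/ and trim() are expected to treat as whitespace.
var WHITESPACE_CODE_POINTS = [
  0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x00A0, 0x1680,
  0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006, 0x2007,
  0x2008, 0x2009, 0x200A, 0x2028, 0x2029, 0x202F, 0x205F, 0x3000, 0xFEFF
];

// Returns the code points for which `predicate(character)` is false.
function findFailures(predicate) {
  var failures = [];
  for (var i = 0; i < WHITESPACE_CODE_POINTS.length; i++) {
    var ch = String.fromCharCode(WHITESPACE_CODE_POINTS[i]);
    if (!predicate(ch)) failures.push(WHITESPACE_CODE_POINTS[i]);
  }
  return failures;
}

// Code points that the \s character class fails to match.
function findUnmatchedByRegExp() {
  return findFailures(function (ch) { return /^\s$/.test(ch); });
}

// Code points that String.prototype.trim fails to strip.
function findUntrimmed() {
  return findFailures(function (ch) { return ch.trim() === ''; });
}
```

In a conforming implementation both functions should return an empty array; a mismatch between the two, as in IE9pre3, shows up as a non-empty result from findUnmatchedByRegExp only.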

It was nice to see that ES5 Array.isArray is about 20 times faster than a custom implementation, such as this one:

  function isArray(o) {
    return Object.prototype.toString.call(o) === "[object Array]";
  }

The difference in speed is similar to other browsers that implement this method.
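
For anyone who wants to reproduce the comparison, a rough micro-benchmark could look like the sketch below. The timeCalls helper and the iteration count are arbitrary choices of mine, and the numbers will vary a lot between engines and runs, so treat the output as a ballpark only:

```javascript
function isArrayCustom(o) {
  return Object.prototype.toString.call(o) === '[object Array]';
}

// Calls `fn(value)` `iterations` times; returns elapsed milliseconds and
// the number of truthy results (so the loop body is not trivially dead code).
function timeCalls(fn, value, iterations) {
  var start = Date.now();
  var hits = 0;
  for (var i = 0; i < iterations; i++) {
    if (fn(value)) hits++;
  }
  return { elapsed: Date.now() - start, hits: hits };
}

var sample = [1, 2, 3];
var nativeResult = timeCalls(Array.isArray, sample, 1000000);
var customResult = timeCalls(isArrayCustom, sample, 1000000);

console.log('native: ' + nativeResult.elapsed + 'ms, custom: ' + customResult.elapsed + 'ms');
```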

An infamous, 10+ year-old JScript NFE bug, which I described at length before, is finally fixed:

  var f = function g() { return f === g; };
  typeof g; // "undefined"
 
  f(); // true
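
The same observation can be turned into a small feature test. This is only a sketch; the function and identifier names are made up, and it detects just the identifier leak, not the other symptoms of the bug:

```javascript
// Returns true when the identifier of a named function expression
// leaks into the enclosing scope (the old JScript behavior).
function hasNamedFunctionExpressionBug() {
  var f = function nfeProbe() {};
  return typeof nfeProbe !== 'undefined';
}

var nfeBugPresent = hasNamedFunctionExpressionBug();
```

In a conforming engine, nfeProbe is visible only inside the function body, so the test reports false.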

arguments’ [[Class]] is now “Arguments”, just as ES5 specifies:

  var args = (function(){ return arguments; })();
  Object.prototype.toString.call(args); // "[object Arguments]"
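
With a proper [[Class]] value, detecting an arguments object becomes a one-liner. A minimal sketch (isArguments is a name I picked for the example, and it only works where [[Class]] is reported as ES5 specifies):

```javascript
// Relies on ES5-conforming [[Class]] for arguments objects.
function isArguments(o) {
  return Object.prototype.toString.call(o) === '[object Arguments]';
}

var sampleArgs = (function () { return arguments; })(1, 2);

var argsDetected = isArguments(sampleArgs);            // an actual arguments object
var arrayDetected = isArguments([1, 2]);               // a real array is not one
var arrayLikeDetected = isArguments({ length: 2 });    // neither is an array-like object
```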

DOM

Unfortunately, the entire host objects infrastructure still looks very similar to the one from IE8. Host objects don’t inherit from Object.prototype, don’t report proper typeof, and don’t even have basic properties like “length” or “prototype”, which all function objects must have:

  alert instanceof Object; // false
  typeof alert; // "object"
  alert.length; // undefined

Because they don’t inherit from Object.prototype, none of the Object.prototype methods are available, naturally:

  alert.toString; // undefined
  alert.constructor; // undefined
  alert.hasOwnProperty; // undefined

Object.prototype is not the only object that host methods fail to inherit from. In the majority of modern browsers, host objects also inherit from Function.prototype and so have Function.prototype methods like call and apply. This doesn’t happen in IE9pre3:

  alert instanceof Function; // false
  document.createElement instanceof Function; // false
 
  alert.call; // undefined

Curiously, call and apply are present on some host objects, but they are still not inherited from Function.prototype:

  typeof document.createElement.call; // "function"
  document.createElement.call === Function.prototype.call; // false
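
Since neither typeof nor instanceof can be trusted for host methods here, feature tests usually have to be more forgiving. The sketch below follows a widely used detection pattern; the helper name and exact form are my own, and it is a heuristic, not a guarantee that the property is callable:

```javascript
// Loose check for whether object[property] looks like a callable host method.
// Accepts typeof "function", a non-null "object" (as IE reports for host
// methods), and "unknown" (reported by some ActiveX-backed properties).
function isHostMethod(object, property) {
  var type = typeof object[property];
  return type === 'function' ||
         (type === 'object' && !!object[property]) ||
         type === 'unknown';
}

// Usage sketch (in a browser):
// if (isHostMethod(document, 'createElement')) { /* safe to try calling it */ }
```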

Host objects’ [[Class]] is far from ideal as well. IE9pre3 actually violates ES5, which says that objects implementing [[Call]] (in other words, callable objects) should have a [[Class]] of “Function”, even if they are host objects. In IE9pre3, alert is a callable host object, yet it reports its [[Class]] as “Object”, not “Function”. Not good.

  Object.prototype.toString.call(alert); // "[object Object]"
  Object.prototype.toString.call(document.createElement); // "[object Object]"

IE9pre3 still messes up DOM objects’ attributes and properties, although not as badly as earlier versions:

  var el = document.createElement('p');
  el.setAttribute('x', 'y');
  el.x; // 'y'
 
  el.foobarbaz = 'moo';
  el.hasAttribute('foobarbaz'); // true
  el.getAttribute('foobarbaz'); // 'moo'

Some old, humorous bugs can still be seen in IE9pre3, such as typeof returning “string” for certain methods:

  typeof Option.create; // "string"
  typeof Image.create; // "string"
  typeof document.childNodes.item; // "string"

Undeclared assignments still throw an error when elements with the same id are present in the DOM, but no longer with elements of the same name (as was the case in previous versions):

  <div id="foo"></div>
  <a name="bar"></a>
  ...
  <script>
    foo = function(){ /* ... */ }; // Error
    bar = function(){ /* ... */ }; // no Error
  </script>

Similarly to IE8, only Element and specific element type interfaces (HTMLDivElement, HTMLScriptElement, HTMLSpanElement, etc.) are exposed as same-named global properties. Node and HTMLElement are still missing, and an element’s prototype chain most likely still looks like this:

  document.createElement('div');
    |
    | [[Prototype]]
    v
  HTMLDivElement.prototype
    |
    | [[Prototype]]
    v
  Element.prototype
    |
    | [[Prototype]]
    v
  null

…rather than what can be seen in almost all other modern browsers:

  document.createElement('div');
    |
    | [[Prototype]]
    v
  HTMLDivElement.prototype
    |
    | [[Prototype]]
    v
  HTMLElement.prototype
    |
    | [[Prototype]]
    v
  Element.prototype
    |
    | [[Prototype]]
    v
  Node.prototype
    |
    | [[Prototype]]
    v
  Object.prototype
    |
    | [[Prototype]]
    v
  null
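
Where Object.getPrototypeOf reports the real [[Prototype]], a chain like the one above can be inspected with a few lines of code. This is a minimal sketch with a made-up function name; in IE9pre3 it would not be very useful for host objects, since getPrototypeOf returns null for them, as noted earlier:

```javascript
// Collects every object on the prototype chain of `object`, nearest first.
function getPrototypeChain(object) {
  var chain = [];
  var proto = Object.getPrototypeOf(object);
  while (proto !== null) {
    chain.push(proto);
    proto = Object.getPrototypeOf(proto);
  }
  return chain;
}

var arrayChain = getPrototypeChain([]); // Array.prototype, then Object.prototype

// Usage sketch (in a browser):
// getPrototypeChain(document.createElement('div'));
```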

getComputedStyle from DOM Level 2 is still missing; however, its value is mysteriously null, not undefined. The property actually exists on the object, but has a value of null. Hopefully, this is just a placeholder and a proper method will be added before the final release.

  document.defaultView.getComputedStyle; // null
  'getComputedStyle' in document.defaultView; // true
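
A practical consequence is that an “in” check alone is not a safe feature test here. A slightly more defensive sketch (the function name is mine, and accepting a non-null typeof “object” is an assumption based on how IE reports host methods) could be:

```javascript
// Returns true only if view.getComputedStyle looks usable, not merely present.
function hasComputedStyle(view) {
  if (!view) return false;
  var method = view.getComputedStyle;
  var type = typeof method;
  return type === 'function' || (type === 'object' && method !== null);
}

// Usage sketch (in a browser):
// if (hasComputedStyle(document.defaultView)) { /* use getComputedStyle */ }
```

With the null placeholder described above, this reports false even though the property exists.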

Array.prototype.slice can now convert certain host objects (e.g. NodeLists) to arrays, something that the majority of modern browsers have been able to do for quite a while:

  Array.prototype.slice.call(document.childNodes) instanceof Array; // true
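
Code that still has to run in environments where slicing a host object throws can fall back to a manual copy. A minimal sketch, with toArray being a name chosen for this example:

```javascript
// Converts an array-like object to a real array.
// Tries Array.prototype.slice first and falls back to a manual loop
// if slice throws (as it may for host objects in older environments).
function toArray(arrayLike) {
  try {
    return Array.prototype.slice.call(arrayLike);
  } catch (e) {
    var result = [];
    for (var i = 0, len = arrayLike.length; i < len; i++) {
      result[i] = arrayLike[i];
    }
    return result;
  }
}

// Usage sketch (in a browser):
// var nodes = toArray(document.childNodes);
```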

That’s it for now.

Unfortunately, I don’t have much time to look into these things extensively, at the moment. There might be more updates on twitter.

As always, any corrections, suggestions, and additions are much appreciated.

Tag is not an element. Or is it?

June 1st, 2010 by kangax

It’s interesting how widely some misconceptions spread. The one I noticed recently is the “issue” of elements vs. tags. The problem is that people say tags when they mean elements, and do it so often that it’s not clear whether the distinction is still relevant.

Or if anyone even cares anymore.

Elements vs. tags

If you look at section 3 of HTML 4.01, “on SGML and HTML”, there’s an explicit note about elements not being tags. In HTML 4.01, <p>foo bar</p> is an element, not a tag. An element consists of a start tag, content, and an end tag. In the case of <p>foo bar</p>, <p> is a start tag, foo bar is content, and </p> is an end tag.

In other words, elements consist of tags.
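
To illustrate the difference, here is a deliberately naive sketch that splits simple markup into start tags, end tags and text. The tokenize function and its regular expression are made up for this example and are nowhere near a real HTML parser (no comments, no attribute values containing “>”, no optional tags):

```javascript
// Rough sketch: splits simple markup into start tags, end tags, and text.
function tokenize(markup) {
  var tokens = [];
  var pattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|[^<]+/g;
  var match;
  while ((match = pattern.exec(markup)) !== null) {
    if (match[2]) {
      tokens.push({
        type: match[1] ? 'end tag' : 'start tag',
        name: match[2].toLowerCase(),
        value: match[0]
      });
    } else {
      tokens.push({ type: 'text', value: match[0] });
    }
  }
  return tokens;
}

// One element in the markup, but two tags and a run of content in the token list.
var tokens = tokenize('<p>foo bar</p>');
```

The single <p> element shows up as three separate tokens: a start tag, text content, and an end tag.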

Optional tags

The distinction between tags and elements becomes slightly less clear once we start dealing with elements that have optional tags, as defined by HTML 4.01. For example, <p> or <td> elements don’t have to have end tags. They could very well exist without them. When a parser finds <p>foo bar in markup, it still creates an element. There’s no </p> end tag, but the parser doesn’t really need it; the <p> start tag already denotes what kind of element it is.

  <p>foo bar
 
  <tr>
    <td>baz
    <td>qux
  </tr>

But that’s not all.

Some elements, besides having optional end tags, have an empty content model, which means that they can’t have any content at all. And when an element is not allowed to have any content and has an optional end tag, it’s called an empty element. Not only are end tags optional in such elements, they must be omitted completely. These, unfortunately, are not some obscure elements, but very useful ones like <br>, <link>, <img>, <input>, <meta> and a few others.

What’s interesting is that <br> is still an element, just one that consists of a start tag only. Its content and end tag must never be present. The fact that <br>, <img> and other empty elements consist of start tags only makes things rather confusing.

And we’re not even talking about elements with both tags optional — <html>, <head>, <body>. Those could exist without any visible traces at all, and are only created based on the context.

  <html>
    <!-- 
            There's no HEAD start tag, no HEAD end tag, and no HEAD content here. 
            Yet, HEAD element is still created implicitly.
            This happens because content model of HTML element is defined as `head, body`, 
            which means that both elements should be present in HTML element in that order. 
            As soon as BODY start tag is found, even if HEAD tags are not present, 
            HEAD element is created automatically.  -->
    <body>
    ...
    </body>
  </html>

Which confusion?

So which practical implications does this confusion actually have?

For one, saying something like “insert an image after a <p> tag” ranges from “wrong” to “ambiguous”, since we can’t insert anything but a chunk of text after a <p> tag, and a <p> tag can be either a start tag (<p>) or an end tag (</p>). In this case, a better way would be to say “insert an <img> tag after a start <p> tag”:

  <p>
    <img ...> <!-- IMG tag is inserted after a start P tag -->
    ...
  </p>

in which case <img> element would become a child of <p> element. Or we could say — “insert an <img> tag after an end <p> tag”:

  <p>
    ...
  </p>
  <img ...> <!-- IMG tag is inserted after an end P tag -->

in which case <img> element would be a sibling following <p> one.

Of course, most of the time, what people really mean by “insert an image after a <p> tag” is the second version. It’s just that “element” is accidentally replaced with “tag”. An even better way, and one that avoids mentioning tags in the first place, is to say “insert an <img> element after a <p> element”. This version leaves no room for incorrect interpretation.
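
In DOM terms, the two readings correspond to two different operations. A minimal sketch using only the standard insertBefore method (the helper names are made up for this example):

```javascript
// "After the start tag": the new node becomes the first child of `parent`.
function insertAsFirstChild(parent, node) {
  return parent.insertBefore(node, parent.firstChild);
}

// "After the element": the new node becomes the next sibling of `reference`.
// When `reference` is the last child, nextSibling is null and the node is appended.
function insertAfterElement(reference, node) {
  return reference.parentNode.insertBefore(node, reference.nextSibling);
}

// Usage sketch (in a browser):
// var p = document.getElementsByTagName('p')[0];
// insertAfterElement(p, document.createElement('img'));
```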

Global confusion

What’s interesting about all this is not so much the finer points of difference between tags and elements, but just how widely this misconception prevails. Google search returns 480,000 results for “div tag”, but only 137,000 for “div element”. For an empty element, such as img, the difference is even scarier — “img tag” returns 959,000 results, while “img element” only 48,200. An element is confused for a tag everywhere, from blogs, articles, and mailing lists to books, references, and frameworks.

Pedantry or an important distinction?

Once you start thinking about the distinction, edges become somewhat blurry. Are all of the examples above really wrong?

When describing “image_tag”, Ruby on Rails documentation says “Returns an html image tag …”. The returned string — “<img …>” — can actually very well be considered an image (start) tag. Yes, the string represents an element, but since an element is empty, it’s also a string that consists of <img> tag only, and so can probably be called an “image” tag.

At the same time, “javascript_include_tag” already crosses the line of correctness. Its documentation still says “Returns an html script tag”, but the method returns a string that can only be considered an element, “<script type="text/javascript" src="…"></script>”, since there’s now a start tag, content (empty), and an end tag.

w3schools is just plain wrong [1], saying things like “The <div> tag defines a division or a section in an HTML document.” or “The <div> tag is often used to group block-elements to format them with styles.”. Tags do not define division, they represent elements, and it is elements that have certain semantic meaning; in this case — division.

In some of the popular articles, we can find phrases like “… the nearer ancestor of our <footer> tag is the <body> tag …”, in which case it’s pretty clear that “tag” is not the right word at all; tags cannot be ancestors, but elements can.

However, saying that a “browser supports <video> tag” is technically not wrong, since browsers supporting the <video> element most definitely can parse and understand <video> tags as well (it is by recognizing video tags that they are able to create video elements in DOM).

Speaking of DOM…

What about DOM?

Before I knew the difference between tags and elements, I would always think in terms of tags when talking about HTML, and in terms of elements when talking about DOM. It just made sense that HTML, being a markup language, consists of tags, while HTML DOM (or rather, the document available for scripting) is a tree-like structure consisting of elements and other kinds of nodes. I knew that the browser parses HTML markup (and so tags), and then creates a tree-like structure to represent a document, in which case tags essentially become elements. The fact that elements are not just kinds of nodes, but are also chunks of text in markup, seemed very strange when I first found out about it.

It seems that this is exactly how most people think about tags vs. elements. Tags exist in HTML (text), and elements in the document (DOM). This would explain why tags prevail in discussions about HTML, or markup in general, and why elements are mostly mentioned in the context of scripting, rendering, etc.

Nevertheless, I believe that keeping terminology straight is important. Things should be called what they really are, to avoid the ambiguity that we’ve seen in the previous example. A method named something like forEachTag should not iterate over each element, and vice versa; technical discussions, articles, and documentation should really strive to use proper terms.

What now?

Attempts at demystification have already been made in the past, yet the effect is barely visible. So I wonder: why? Is it too unintuitive to speak in terms of elements in the context of HTML, or is this a lack of explanation and exposure of the subject? Does the distinction even matter? Or does it matter in technical discussions only? Does it make sense to distinguish these two entities, or should we just try to infer the exact meaning based on the context, as seems to be done right now? Are we all simply used to the word “tag”, and don’t care about the difference most of the time?

What do you think?

[1] …which is not surprising, considering the amount of other misconceptions on that site, such as classifying HTML comments as tags.
