Perfection kills

Exploring Javascript by example

How ECMAScript 5 still does not allow to subclass an array

July 15th, 2010 by kangax

Subclassing an array in Javascript has never been a trivial task, at least for a certain meaning of “subclassing an array”. Curiously, the new edition of the language, ECMAScript 5, still does not make it possible to fully subclass an array.

Not everything is lost though, and there are a few ways in which ECMAScript 5 brings this task closer to the ideal. However, there are a few fundamental issues which prevent true array subclassing from happening.

Let’s talk about that.

Today we’ll take a look at what it means to subclass an array, what some of the existing implementations and workarounds are, and which drawbacks those implementations have. We’ll see what ECMAScript 5 brings to the table, and what the fundamental issues with subclassing are. We’ll also talk about alternative approaches to subclassing an array, such as using wrappers, and get to know their limitations.

But first, what does it mean to subclass an array? And why do we even need it?

Why subclass an array?

We can define “subclassing an array” as the process of creating an object which inherits from native Array object (has Array.prototype in its prototype chain), and follows behavior similar (or identical) to native array.

The last point about behavior similar to native array is actually very important, as we’ll see later on. Having “subclass” of array could be thought of as being able to create an array object, but an object which would inherit not directly from Array, but from another object, and only then from Array.

In other words, we want behavior similar to this:

var sub = new SubArray(1, 2, 3);
sub; // [1, 2, 3]
 
sub.length; // 3
sub[1]; // 2
 
sub.push(4);
sub; // [1, 2, 3, 4]
 
// etc.
 
sub instanceof SubArray; // true
sub instanceof Array; // true

Note how SubArray constructor creates a sub object identical in its behavior to array (object has “length” property, numeric “0”, “1”, “2” properties, and inherits Array.prototype.* methods). At the same time, it is SubArray that a sub object directly inherits from, not Array.

So what exactly is the purpose of doing all this? Why subclass an array in such way?

There are usually two reasons:

1. Avoid pollution of global Array

Javascript’s prototypal nature makes it easy to extend all array objects with custom methods. Instead of assigning to direct properties of array objects, it’s much easier and more efficient to assign to the array’s “prototype” object (the one that’s usually accessed via Array.prototype).

Array.prototype.last = function () {
  return this[this.length-1];
};
// ...
[1, 2, 3].last(); // 3

However, extending Array.prototype comes at a price, and that price is the chance of collisions. When scripts coexist with other scripts in an application, it’s important for those scripts not to conflict with each other. Extending Array.prototype, while tempting and seemingly useful, unfortunately isn’t very safe in a diverse environment. Different scripts can end up defining same-named methods with different behavior. Such a scenario often leads to inconsistent behavior and hard-to-track errors.

Collisions can happen not only with user-defined code, but also with proprietary methods implemented by environment itself (e.g. Array.prototype.indexOf from JavaScript 1.6, before it was standardized by ES5) or from future standards (e.g. Array.prototype.map, Array.prototype.reduce, etc. — now all part of ES5).
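To make the collision scenario concrete, here is a minimal sketch. The two “scripts” and their differing ideas of what last should do are hypothetical, invented purely for illustration:

```javascript
// Script A (hypothetical): `last` returns the last element
Array.prototype.last = function () {
  return this[this.length - 1];
};

// Script B (hypothetical), loaded later: `last(n)` returns the last n elements as an array
Array.prototype.last = function (n) {
  return this.slice(-(n || 1));
};

// Code written against script A now silently receives an array instead of a value
var result = [1, 2, 3].last();
result;        // [3], not 3
typeof result; // 'object'

// remove the demo method so the rest of the page is unaffected
delete Array.prototype.last;
```

Nothing throws here, which is exactly what makes this kind of error hard to track: the second definition simply wins.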

Using a constructor function other than native Array, but with the same behavior, would make it possible to avoid such collisions. Instead of extending Array.prototype, another object would be extended (say, SubArray.prototype) and then used to initialize (sub)array objects. Any third party code which depends on methods from Array.prototype would still be able to safely use them.

2. Create data structures naturally inheriting from array

Another reason to subclass an array is to be able to create data structures, which naturally inherit from array; such as Stack, List, Queue, Set, etc. While there are certainly valid use cases for these structures, in this article I will instead focus on the first aspect — reducing chance of collisions. It is somewhat more relevant in context of cross-browser scripting.

Naive approach

Creating objects that inherit from other objects is more or less straightforward in Javascript. We can use well-known clone method:

function clone(obj) {
  function F() { }
  F.prototype = obj;
  return new F();
}

and then set-up inheritance like this:

function Child() { }
Child.prototype = clone(Parent.prototype);

clone might look confusing, but all it does is create an object with another object as nearest ancestor in its prototype chain. It uses intermediate function to avoid executing “parent” constructor. In this example, new Child creates an object with Child.prototype as first object in the prototype chain, Parent.prototype — second, and so on. To visualize, the prototype chain here looks like this:

new Child()
    |
    | [[Prototype]]
    |
    v
Child.prototype
    |
    | [[Prototype]]
    |
    v
Parent.prototype
    |
    | [[Prototype]]
    |
    v
Object.prototype
    |
    | [[Prototype]]
    |
    v
   null
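As a quick sanity check, the chain in the diagram can be verified with isPrototypeOf, which is available in ES3. This is only a sketch: Parent is a hypothetical constructor (the snippet above never defines it), and clone is repeated so that the example runs on its own.

```javascript
// `clone` as defined above, repeated so this snippet is self-contained
function clone(obj) {
  function F() { }
  F.prototype = obj;
  return new F();
}

// `Parent` is a hypothetical constructor, used only for illustration
function Parent() { }
Parent.prototype.greet = function () { return 'hello'; };

function Child() { }
Child.prototype = clone(Parent.prototype);

var child = new Child();

// each object from the diagram is somewhere in the prototype chain of `child`
var chainChecks = [
  Child.prototype.isPrototypeOf(child),
  Parent.prototype.isPrototypeOf(child),
  Object.prototype.isPrototypeOf(child)
];

chainChecks;   // [true, true, true]
child.greet(); // 'hello' (found on Parent.prototype)
```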

Using the clone method is exactly what most people attempt when trying to subclass an array for the first time:

function SubArray() {
  // Take any arguments passed to constructor and add them to an instance
  this.push.apply(this, arguments);
}
SubArray.prototype = clone(Array.prototype);
 
var sub = new SubArray(1, 2, 3);

The approach seems reasonable. After all, the goal is to create an object that inherits from Array, so there’s no reason tried-and-true clone wouldn’t work. Or is there? As with few other things in Javascript, it’s not as trivial as it seems.

Problems with naive approach

So what exactly is wrong with subclassing array using clone method? Let’s take a look at how previously declared SubArray function behaves. We’ll be using native array object alongside, for comparison.

var arr = new Array(1, 2, 3);
var sub = new SubArray(1, 2, 3);
 
arr.length; // 3
sub.length; // 3 (but 0 in IE<8)
 
arr.length = 2;
sub.length = 2;
 
arr; // [1, 2]
sub; // [1, 2, 3]
 
arr[10] = 'foo';
sub[10] = 'foo';
 
arr.length; // 11
sub.length; // 2

There’s clearly some kind of inconsistency here, even without counting the bug in IE<8. But what is this strange relation between length and numeric properties in an array? And why doesn’t the subclassed array behave identically? To understand this, we need to look into what array objects in Javascript really are.

Special nature of arrays

It turns out that arrays in Javascript are almost like plain Object objects, except for one little difference in behavior. The crux of this difference is summarized concisely in one paragraph of specification (15.4):

Array objects give special treatment to a certain class of property names. A property name P (in the form of a string value) is an array index if and only if ToString(ToUint32(P)) is equal to P and ToUint32(P) is not equal to 2^32 – 1. Every Array object has a length property whose value is always a nonnegative integer less than 2^32. The value of the length property is numerically greater than the name of every property whose name is an array index; whenever a property of an Array object is created or changed, other properties are adjusted as necessary to maintain this invariant. Specifically, whenever a property is added whose name is an array index, the length property is changed, if necessary, to be one more than the numeric value of that array index; and whenever the length property is changed, every property whose name is an array index whose value is not smaller than the new length is automatically deleted. This constraint applies only to properties of the Array object itself and is unaffected by length or array index properties that may be inherited from its prototype.

For those allergic to the condensed language of ECMA-262, here’s a short summary.

Array objects treat “numeric” properties in a special way. Whenever such a property changes, the value of the array’s “length” property is adjusted as well; it’s adjusted in such a way as to make sure that it is always at least one more than the greatest numeric (own) property of the array. Similarly, when the “length” property is changed, numeric properties are adjusted accordingly: those whose names are greater than or equal to the new value of “length” are deleted.
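The spec’s definition of an “array index” is easier to digest as code. The helper below is a sketch that follows the quoted wording; the name isArrayIndex is my own, not something defined by the specification.

```javascript
// Hypothetical helper mirroring the spec's definition of an array index:
// ToString(ToUint32(P)) is equal to P, and ToUint32(P) is not 2^32 - 1
function isArrayIndex(name) {
  var key = String(name);  // property names are always strings
  var uint32 = key >>> 0;  // `>>> 0` performs the ToUint32 conversion
  return String(uint32) === key && uint32 !== 4294967295;
}

isArrayIndex('0');          // true
isArrayIndex('42');         // true
isArrayIndex('01');         // false (ToUint32 gives 1, and "1" is not "01")
isArrayIndex('-1');         // false
isArrayIndex('1.5');        // false
isArrayIndex('4294967294'); // true  (2^32 - 2, the largest array index)
isArrayIndex('4294967295'); // false (2^32 - 1 is explicitly excluded)

// Only array indices affect length
var arr = [];
arr['01'] = 'x'; // an ordinary property
arr.length;      // 0
arr['1'] = 'y';  // an array index
arr.length;      // 2
```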

We have already seen relation between numeric properties and length in the previous example, but let’s take a look at it again, step by step:

1) When array object is created, its “length” property is set to a value one more than the largest index of an array.

  var arr = ['x', 'y', 'z'];
  arr.length; // 3 (1 greater than largest index of an array — 2 in this case)
 
  arr = ['foo'];
  arr.length; // 1 (1 greater than largest index of an array — 0 in this case)

2) When numeric properties change, so does “length” change — to maintain the relationship of being 1 greater than the largest index.

var arr = ['x', 'y'];
arr.length; // 2, as expected
 
arr[2] = 'z'; // add another numeric property (2) larger than the largest existing one (1)
arr.length; // 3 — length is changed to be 1 greater than (new) largest index (2)

3) When the “length” property changes, numeric properties are adjusted so that the greatest index is at most 1 smaller than the value of “length”.

var arr = ['x', 'y', 'z'];
arr.length = 2;
 
arr; // ['x', 'y'] — note how last element (z) is deleted, because being at 2nd index, 
     //              it doesn't satisfy criteria of largest index being 1 less than length
 
arr.length = 4;
 
arr; // ['x', 'y'] — "increasing" length doesn't affect numeric properties...
 
arr.join(); // "x,y,," ...but has consequences visible in other cases, such as when using `Array.prototype.join`
 
arr.push('z');
arr; // ['x', 'y', undefined, undefined, 'z'] — ...or when using `Array.prototype.push`

Now you know the “special” nature of Array objects in Javascript, which is in the relationship between “length” and numeric properties. One little detail we haven’t looked at is that array’s “length” property MUST always have a value of non-negative integer less than 2^32. Whenever this condition is violated, a RangeError is thrown:

var arr = [];
arr.length = Math.pow(2, 32); // RangeError
 
arr.length; // 0 (length is still 0, as it initially was)
 
arr.length = Math.pow(2, 32) - 1; // set length to maximum allowed value
 
arr.length++; // RangeError (when setting length explicitly)
arr.push(1); // RangeError (or when setting length implicitly)

Function objects and [[Construct]]

It should start to make sense why there are discrepancies in behavior of objects created via SubArray and Array functions. Even though SubArray creates an object that inherits from Array.prototype, that object completely lacks array’s special behavior. The SubArray instance is nothing more than a plain Object object (as if it was created via an object literal — { }).

But why does SubArray create an Object object and not an Array object? The core of this issue is in the way functions work in ECMAScript.

When new operator is applied to an object — as in new SubArray — that object’s internal [[Construct]] method is called. In our case, it is [[Construct]] of SubArray function. SubArray — being a native function — has [[Construct]] that’s specified to create a plain Object object, and invoke corresponding function providing newly created object as this value. Any native function, including SubArray, should create an Object object and return it as a result.

Now it’s worth mentioning that it’s possible to sort of supersede return value of [[Construct]] by explicitly returning non-primitive value from constructor function:

function SubArray() {
  this.push.apply(this, arguments);
  return []; // return array object explicitly
}

— but in that case, returned object does NOT inherit from constructor’s “prototype” (SubArray.prototype in this case); neither is constructor function invoked with that object as this value:

var sub = new SubArray(1, 2, 3);
 
// Object doesn't have 1, 2, 3, as constructor was never called with `this` value referencing returned object
sub; // []
 
// SubArray is not in the prototype chain of returned object
sub instanceof SubArray; // false
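Both behaviors, the default creation of a plain object and the override by an explicit return value, can be approximated in a few lines of Javascript. The construct helper below is hypothetical and only a rough model of [[Construct]] for ordinary functions; it also assumes ES5’s Object.create is available.

```javascript
// `construct` is a hypothetical helper, roughly modelling `new F(...)`
function construct(F, args) {
  // 1. Create a plain Object object that inherits from F.prototype
  //    (or from Object.prototype when F.prototype is not an object)
  var proto = F.prototype;
  var protoIsObject = (typeof proto === 'object' && proto !== null) ||
                      typeof proto === 'function';
  var obj = Object.create(protoIsObject ? proto : Object.prototype);

  // 2. Invoke F with the newly created object as `this` value
  var result = F.apply(obj, args);

  // 3. A non-primitive return value supersedes the created object
  var resultIsObject = (typeof result === 'object' && result !== null) ||
                       typeof result === 'function';
  return resultIsObject ? result : obj;
}

function Plain(x) { this.x = x; }
function ReturnsArray() { this.ignored = true; return []; }

var plain = construct(Plain, [42]);
plain instanceof Plain; // true
plain.x;                // 42

var arr = construct(ReturnsArray, []);
arr instanceof ReturnsArray; // false
arr instanceof Array;        // true
arr.ignored;                 // undefined (set on the discarded object)
```

Note that nothing in step 1 ever produces an Array object; the only way to get one out is step 3, and by then the link to the constructor’s prototype is lost.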

As you can see, creating an object that inherits from Array.prototype is only part of the story. The biggest issue is to preserve the special relation of length and numeric properties. This is why using regular clone approach is not quite up to the task.

The importance of array special behavior

A reasonable question at this point is — “Why does array special behavior matter”? Why would we want to preserve relationship between length and numeric properties when subclassing an array? It turns out that consequences of proper length are not only visible when working with length directly, but also indirectly, when performing other tasks via Array.prototype.* methods.

Take for example Array.prototype.push — a method to append items to the end of array. To determine from which position to start inserting elements into, push retrieves a value of array’s “length”. If length is not preserved properly, elements are inserted at the wrong location:

var arr = ['x', 'y'];
arr.length = 5;
arr.push('z'); // 'z' is inserted at 5th index, since that is what the value of "length" is
arr; // ['x', 'y', undefined, undefined, undefined, 'z']

Take another method — Array.prototype.join. Used to return a representation of an array by concatenating all elements with a separator, Array.prototype.join also uses length property to determine when to stop concatenating values:

var arr = ['x', 'y'];
arr.join(); // "x,y"
arr.length = 5;
arr.join(); // "x,y,,,"

Same goes for Array.prototype.concat — method used to produce a new array by concatenating values passed to concat:

var arr = ['x'];
arr.length = 3;
arr.concat('y'); // ['x', undefined, undefined, 'y']

Finally, the special behavior is often cleverly exploited in other situations, such as to “clear” an array (i.e. delete all of its numeric properties):

var arr = [1, 2, 3];
arr.length = 0;
arr; // [] — setting length to 0 effectively removes all numeric properties (elements) of an array

Existing solutions

Now that we’re familiar with the theory, let’s see what the situation is with subclassing arrays in practice. There have been a few attempts in the past, with various levels of “success”. Here are a couple of the most popular ones:

Andrea Giammarchi solution

One of the recent implementations is Stack, by Andrea Giammarchi, which looks like this:

var Stack = (function(){ // (C) Andrea Giammarchi - Mit Style License
 
  function Stack(length) {
    if (arguments.length === 1 && typeof length === "number") {
      this.length = -1 < length && length === length << 1 >> 1 ? length : this.push(length);
    }
    else if (arguments.length) {
      this.push.apply(this, arguments);
    }
  };
 
  function Array() { };
  Array.prototype = [];
 
  Stack.prototype = new Array;
  Stack.prototype.length = 0;
  Stack.prototype.toString = function () {
    return this.slice(0).toString();
  };  
 
  Stack.prototype.constructor = Stack;
  return Stack;
})();

It’s an interesting solution, which mainly works around IE<8 bug with Array.prototype.push and length property. However, as should be obvious by now, it doesn’t really solve the problem of maintaining relation between length and numeric properties:

var stack = new Stack('x', 'y');
stack.length;           // 2
 
// so far so good
 
stack.push('z');
stack.length;           // 3
 
// still good
 
stack[3] = 'foo';
stack.length;           // 3
 
// not good anymore (length should have been changed to 4)
 
stack.length = 2;
stack[2];               // 'z'
 
// still not good (element at 2nd index should have been deleted)

Dean Edwards solution

Another popular solution is by Dean Edwards. This one takes a completely different approach — instead of creating an object that inherits from Array.prototype, an actual Array constructor is “borrowed” from the context of another iframe.

// create an <iframe>
var iframe = document.createElement("iframe");
iframe.style.display = "none";
document.body.appendChild(iframe);
 
// write a script into the <iframe> and steal its Array object
frames[frames.length - 1].document.write(
  "<script>parent.Array2 = Array;<\/script>"
);

The reason this “works” is due to browsers creating separate execution environments for each frame in a document. Each such environment has a separate set of both — built-in and host objects. Built-in objects include global Array constructor, among others. Array object of one iframe is different from Array object of another iframe. They also don’t have any kind of hierarchical relation:

// assuming that SubArray was borrowed from another iframe
 
var sub = new SubArray(1, 2, 3);
 
sub instanceof SubArray; // true
sub instanceof Array; // false
sub instanceof Object; // false

Notice how sub is reported as NOT an instance of Array, and NOT an instance of Object. This is because neither Array, nor Object are anywhere in the prototype chain of sub object. Instead, prototype chain consists of SubArray.prototype, followed by <Object from another iframe>.prototype:

new SubArray()
    |
    | [[Prototype]]
    |
    v
<another iframe>.Array.prototype
    |
    | [[Prototype]]
    |
    v
<another iframe>.Object.prototype
    |
    | [[Prototype]]
    |
    v
   null

This brings us to one “consideration” with this approach — difficulties determining the nature of an object derived from such iframe. It’s no longer possible to determine that an object is an array using instanceof or constructor checks [1]:

  // is this object an array?
 
  sub instanceof Array; // false
  sub.constructor === Array; // false

It is, however, still possible to use [[Class]] check (we’ll talk about [[Class]] later on):

  Object.prototype.toString.call(sub) === '[object Array]'; // true
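The same cross-context effect can be reproduced outside of a browser. The sketch below assumes a Node.js environment and its built-in vm module, neither of which is part of the iframe technique itself; runInNewContext evaluates code against a fresh set of built-ins, much like a separate frame does.

```javascript
// Assumption: running under Node.js, whose `vm` module can evaluate code
// in a separate context with its own set of built-in objects
var vm = require('vm');

// "borrow" the Array constructor from the other context
var ForeignArray = vm.runInNewContext('Array');

var foreign = new ForeignArray(1, 2, 3);

foreign instanceof ForeignArray; // true
foreign instanceof Array;        // false
foreign instanceof Object;       // false
foreign.constructor === Array;   // false

// the [[Class]] check still identifies it as an array
Object.prototype.toString.call(foreign) === '[object Array]'; // true

// and the length/indices relation is intact
foreign[5] = 'x';
foreign.length; // 6
```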

Another, more inherent, downside of this approach is that it doesn’t work in non-browser environments (or, more precisely, in any environment without support for iframes). This problem is likely to become even bigger, given that server-side Javascript implementations are rising quite fast.

Finally, it was reported that Array borrowing can cause a mixed content warning in IE6, among a few other minor issues.

Other than that, iframe-based array “subclassing” is free of downsides of solutions like Stack, since we’re dealing with real array objects, and so proper length/indices relation.

ECMAScript 5 accessors to the rescue

But let’s talk about ECMAScript 5, which as I mentioned in the beginning, brings something that helps with subclassing arrays. This “something” is actually nothing but property accessors. These useful language constructs have been present in some popular implementations (SpiderMonkey, JavaScriptCore, and others) as a non-standard extension for quite a while now. They are now standardized by the new edition of the language.

Using accessors, it’s rather trivial to create an Object object with special length/indices relation — relation that’s identical to that of Array objects! And since we already know how to create an object with Array.prototype in its prototype chain, combining these two aspects would allow for a complete emulation of arrays.

There’s one little detail about implementation. Since ECMAScript (including last, 5th version) doesn’t provide any catch-all (aka __noSuchMethod__) mechanism, it’s not possible to change value of length property of an object when numeric property is modified; in other words, we can’t intercept the moment when ‘0’, ‘1’, ‘2’, ‘15’, etc. properties are being set. However, accessors allow us to intercept any read access of length property and return proper value, depending on which numeric properties object has at that moment. This is all we really need.

Here’s an implementation of it, at about 45 lines of code:

var makeSubArray = (function(){
 
  var MAX_UINT32_VALUE = Math.pow(2, 32) - 1, // 2^32 - 1 is never a valid array index
      hasOwnProperty = Object.prototype.hasOwnProperty;
 
  function ToUint32(value) {
    return value >>> 0;
  }
 
  // Returns the largest own array-index property as a number, or -1 if there is none
  function getMaxIndexProperty(object) {
    var maxIndex = -1, index, isValidProperty;
 
    for (var prop in object) {
      index = ToUint32(prop);
 
      isValidProperty = (
        String(index) === prop && 
        index !== MAX_UINT32_VALUE && 
        hasOwnProperty.call(object, prop));
 
      // compare numerically; comparing the string names would rank "9" above "10"
      if (isValidProperty && index > maxIndex) {
        maxIndex = index;
      }
    }
    return maxIndex;
  }
 
  return function(methods) {
    var length = 0;
    methods = methods || { };
 
    methods.length = {
      get: function() {
        var maxIndexProperty = +getMaxIndexProperty(this);
        return Math.max(length, maxIndexProperty + 1);
      },
      set: function(value) {
        var constrainedValue = ToUint32(value);
        if (constrainedValue !== +value) {
          throw new RangeError();
        }
        for (var i = constrainedValue, len = this.length; i < len; i++) {
          delete this[i];
        }
        length = constrainedValue;
      }
    };
    methods.toString = {
      value: Array.prototype.join
    };
    return Object.create(Array.prototype, methods);
  };
})();

We can now create “sub arrays” via the makeSubArray function. It accepts one argument: an object of property descriptors (the format Object.create expects as its second argument) describing the methods to define on the returned “sub array”.

var subMethods = {
  last: {
    value: function() {
      return this[this.length - 1];
    }
  }
};
var sub = makeSubArray(subMethods);
var sub2 = makeSubArray(subMethods);
// etc.

We can also hide this factory method behind a constructor, to make it similar to Array’s one:

var SubArray = (function() {
  var methods = { 
    last: { 
      value: function() {
        return this[this.length - 1];
      } 
    }
  };
  return function() {
    var arr = makeSubArray(methods);
    if (arguments.length === 1) {
      arr.length = arguments[0];
    }
    else {
      arr.push.apply(arr, arguments);
    }
    return arr;
  };
})();

And then use it as you would use regular Array constructor:

var sub = new SubArray(1, 2, 3);
 
sub.length; // 3
sub; // [1, 2, 3]
 
sub.length = 1;
sub; // [1]
 
sub[10] = 'x'; // sub.length is now 11
sub.push(1); // 1 is stored at index 11; sub.length is now 12

You can find this version of SubArray together with unit tests in the GitHub repository. For brevity, I made this implementation mainly take care of the length/indices relation; certain methods (e.g. concat) do not behave identically to Array’s and need to be implemented accordingly.

[[Class]] limitations

The implementation we have just seen — the one utilizing property accessors — is great. It doesn’t require any host objects (such as iframes); it preserves relation between length and numeric properties; it even disallows out-of-range values for length or indices. All it requires is support for ES5 (or even just Object.create method).

But the dramatic title of this post is not there just for fun. There’s one little detail we’re missing in this otherwise complete implementation. And that detail is proper [[Class]] value — something that ECMAScript still doesn’t give full control over.

I wrote about [[Class]] before, when explaining how to detect arrays. In a nutshell, [[Class]] is an internal property of objects in ECMAScript. Its value is never exposed directly, but can still be inspected using certain methods (e.g. Object.prototype.toString). The usefulness of [[Class]] is that it allows to detect type of objects without relying on instanceof operator or checking object’s constructor — both of which fall short to detect objects from other contexts (e.g. iframes), as we’ve seen earlier.
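A small helper makes [[Class]] inspection less verbose. This is a sketch; getClass is a hypothetical name, and it relies only on Object.prototype.toString producing a string of the form “[object Class]”.

```javascript
// Hypothetical helper: extracts the [[Class]] name from the
// "[object <Class>]" string produced by Object.prototype.toString
function getClass(value) {
  return Object.prototype.toString.call(value).slice(8, -1);
}

getClass([]);             // 'Array'
getClass({});             // 'Object'
getClass(new Date());     // 'Date'
getClass(/x/);            // 'RegExp'
getClass(function () {}); // 'Function'

// An object that merely inherits from Array.prototype is still an "Object"
function F() { }
F.prototype = Array.prototype;
var inheritsOnly = new F();

inheritsOnly instanceof Array; // true
getClass(inheritsOnly);        // 'Object'
```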

Now, since objects created by makeSubArray are nothing but plain Object objects (only with special length getters/setters), their [[Class]] is also that of “Object” not an “Array”! We’ve taken care of length/indices relation, we’ve set up Array.prototype inheritance, but there’s no way to change object’s [[Class]] value. And so this solution can not claim to be complete.

Does [[Class]] matter?

You might be wondering what the actual implications are of these pseudo-array objects having [[Class]] of “Object” rather than “Array”. Do we even care? Well, for one, there’s an issue with object detection. Ironically, the solution I proposed to detect arrays relies on [[Class]], and so would fall short with objects like these.

// assuming that `sub` is a pseudo-array
Object.prototype.toString.call(sub) === '[object Array]'; // false

Another, probably more important, implication is that some of the methods in ECMAScript actually rely on [[Class]] value. For example, the well-known Function.prototype.apply accepts an array as its second argument (as well as an arguments object). Section 15.3.4.3 of ES3 says: “if argArray is neither an array nor an arguments object (see 10.1.8), a TypeError exception is thrown”. What this means is that if we pass a pseudo-array object as the second argument to apply, an implementation following this rule will throw a TypeError. (ES5 relaxes the requirement to any array-like object, but engines that still enforce the ES3 wording are affected.) apply doesn’t know or care if an object inherits from Array.prototype; neither does it care about the object implementing special length/indices behavior. All it cares about is that the object is of the proper type, a type that we, unfortunately, can not emulate.

// assuming that `sub` is a pseudo-array
someFunction.apply(this, sub); // TypeError
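Where the call site is under our control, one workaround is to copy the pseudo-array into a real array before handing it to apply. Array.prototype.slice is intentionally generic, so it works on any object with a length and numeric properties. This is only a sketch: pseudo below is a stand-in plain object, and the trick does nothing for third-party code that calls apply on our behalf.

```javascript
// a stand-in for a pseudo-array: numeric properties plus `length`
var pseudo = { '0': 3, '1': 9, '2': 4, length: 3 };

// slice is generic, and always returns a true Array object
var real = Array.prototype.slice.call(pseudo);

Object.prototype.toString.call(real); // '[object Array]'
real.length;                          // 3

// a real array is a valid second argument to apply under any edition
Math.max.apply(Math, real); // 9
```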

There’s some vagueness in specification on this matter. For example, in Date.prototype.setTime spec says “If the this value is not a Date object, throw a TypeError exception.”, but in Date.prototype.getTime, it uses [[Class]] rather than just “not a Date object” — “If the this value is not an object whose [[Class]] property is “Date”, throw a TypeError exception”.

It’s probably safe to assume that these 2 phrases — “Date object” and “object with [[Class]] of ‘Date’” — have identical meaning. Ditto for “Array object” and “object with [[Class]] of ‘Array’”, as well as others.

Function.prototype.apply is not the only method sensitive to [[Class]] of an object. Array.prototype.concat, for example, follows different algorithm based on whether an object is an array or not (in other words — whether it has [[Class]] of “Array” or not).

// array ([[Class]] == "Array")
var arr = ['x', 'y'];
 
// object with numeric properties ([[Class]] == "Object")
var obj = { '0': 'x', '1': 'y' };
 
[1,2,3].concat(arr); // [1, 2, 3, 'x', 'y']
[1,2,3].concat(obj); // [1, 2, 3, { '0': 'x', '1': 'y' }]

As you can see, array values are “flattened”, whereas non-array ones are left as is. It is certainly possible to give these pseudo-arrays custom implementation of concat (and “fix” any other of Array.prototype.* methods), but the problem with Function.prototype.apply can not be solved.
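For illustration, here is roughly what such a custom concat could look like. It is a sketch under my own assumptions: concatFlattening is a hypothetical name, it treats anything inheriting from Array.prototype as “flattenable”, and it returns a plain array rather than another pseudo-array.

```javascript
// Hypothetical replacement for concat that flattens anything
// inheriting from Array.prototype, not only true arrays
function concatFlattening() {
  var result = [], n = 0;
  for (var i = 0; i < arguments.length; i++) {
    var item = arguments[i];
    if (Array.prototype.isPrototypeOf(item)) {
      // copy elements one by one, skipping (but counting) missing indices
      for (var j = 0, len = item.length >>> 0; j < len; j++, n++) {
        if (j in item) {
          result[n] = item[j];
        }
      }
    }
    else {
      result[n++] = item;
    }
  }
  result.length = n; // account for trailing missing indices
  return result;
}

// a pseudo-array: a plain object that inherits from Array.prototype
var pseudo = Object.create(Array.prototype);
pseudo.push('x', 'y');

[1, 2, 3].concat(pseudo).length;          // 4 (pseudo is appended as a single item)
concatFlattening([1, 2, 3], pseudo);      // [1, 2, 3, 'x', 'y']
concatFlattening([1, 2, 3], pseudo, 'z'); // [1, 2, 3, 'x', 'y', 'z']
```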

It’s worth mentioning that another downside of accessor-based pseudo-array approach is performance. I haven’t done any tests, but it’s pretty clear that an implementation which has to enumerate over all numeric properties on every access of length property is not going to perform well. This is why I can’t recommend this solution for anything other than educational purposes.

Wrappers. Direct property injection.

Realizing a somewhat futile nature of subclassing arrays in Javascript often makes alternative solutions look very attractive. One of such solutions is using wrappers. Wrapper approach avoids setting up inheritance or emulating length/indices relation. Instead, a factory-like function can create a plain Array object, and then augment it directly with any custom methods. Since returned object is an Array one, it maintains proper length/indices relation, as well as [[Class]] of “Array”. It also inherits from Array.prototype, naturally.

function makeSubArray() {
  var arr = [ ];
  arr.push.apply(arr, arguments);
  arr.last = function() { 
    return this[this.length - 1];
  };
  return arr;
}
 
var sub = makeSubArray(1, 2, 3);
sub instanceof Array; // true
 
sub.length; // 3
sub.last(); // 3

While direct extension of an array object is a beautifully simple solution, it’s not without downsides. The main disadvantage is that on each invocation of the constructor, the array needs to be extended with N methods. The time it takes to create an array is no longer constant (as it would be if the methods lived on SubArray.prototype), but is directly proportional to the number of methods that need to be added.
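The cost becomes more visible once there is more than one method. In the sketch below, the subArrayMethods table and the makeExtendedArray name are hypothetical; the point is the loop that runs on every single call.

```javascript
// Hypothetical table of methods shared by all "sub arrays"
var subArrayMethods = {
  last: function () { return this[this.length - 1]; },
  first: function () { return this[0]; },
  isEmpty: function () { return this.length === 0; }
};

function makeExtendedArray() {
  var arr = [ ];
  arr.push.apply(arr, arguments);
  // one assignment per method, repeated for every array created
  for (var name in subArrayMethods) {
    if (Object.prototype.hasOwnProperty.call(subArrayMethods, name)) {
      arr[name] = subArrayMethods[name];
    }
  }
  return arr;
}

var a = makeExtendedArray(1, 2, 3);
var b = makeExtendedArray();

a.last();    // 3
b.isEmpty(); // true

// every instance carries its own set of properties...
a.hasOwnProperty('last'); // true
b.hasOwnProperty('last'); // true
// ...although the function objects themselves are shared
a.last === b.last; // true
```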

Wrappers. Prototype chain injection.

To overcome the problem of “N methods”, another variation of wrappers can be used — the one in which object’s prototype chain is augmented, rather than object itself. Let’s see how this could be done:

function SubArray() { }
SubArray.prototype = new Array;
SubArray.prototype.last = function() {
  return this[this.length - 1];
};
 
function makeSubArray() {
  var arr = [ ];
  arr.push.apply(arr, arguments);
  arr.__proto__ = SubArray.prototype;
  return arr;
}

The idea is simple. When makeSubArray function is executed, two things happen: 1) an array object is created and is populated with any passed arguments; 2) object’s prototype chain is augmented in such way so that next object is SubArray.prototype, not original Array.prototype. The augmentation of prototype chain is done via non-standard __proto__ property.

But what happens in the makeSubArray function is of course only half of the story. To make sure that the object has Array.prototype in its prototype chain, we need to make SubArray.prototype inherit from it. This is exactly what’s being done on the second line of this snippet (SubArray.prototype = new Array). The prototype chain of an object returned from makeSubArray now looks like this:

new SubArray()
    |
    | [[Prototype]]
    |
    v
SubArray.prototype
    |
    | [[Prototype]]
    |
    v
Array.prototype
    |
    | [[Prototype]]
    |
    v
Object.prototype
    |
    | [[Prototype]]
    |
    v
   null

And because returned object is actually an Array, not an Object one, we also get length/indices relation as well as proper [[Class]] value. In fact, we can go even further and move initialization logic into SubArray constructor itself:

function SubArray() {
  var arr = [ ];
  arr.push.apply(arr, arguments);
  arr.__proto__ = SubArray.prototype;
  return arr;
}
SubArray.prototype = new Array;
SubArray.prototype.last = function() {
  return this[this.length - 1];
};
 
var sub = new SubArray(1, 2, 3);
 
sub instanceof SubArray; // true
sub instanceof Array; // true

Even though augmenting the prototype chain is a more performant solution, there’s a clear downside: it relies on the non-standard __proto__ property. ECMAScript, unfortunately, does not provide a way to set [[Prototype]] of an existing object (the internal property referencing the immediate ancestor in its prototype chain). Not even in the 5th edition. Even though __proto__ is supported by a rather large number of implementations, it is far from universally available.
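One way to live with this is to feature-test __proto__ and fall back to direct extension where it is missing. The following is a sketch under that assumption; supportsProto and the methods table are hypothetical names, and the fallback branch gives up instanceof SubArray in exchange for working everywhere.

```javascript
// Feature test: does assigning to `__proto__` actually change [[Prototype]]?
var supportsProto = (function () {
  var probe = { };
  probe.__proto__ = Array.prototype;
  return probe instanceof Array;
})();

// Hypothetical table of shared methods
var methods = {
  last: function () { return this[this.length - 1]; }
};

function SubArray() { }
SubArray.prototype = new Array;
SubArray.prototype.last = methods.last;

function makeSubArray() {
  var arr = [ ];
  arr.push.apply(arr, arguments);
  if (supportsProto) {
    // cheap: a single assignment regardless of the number of methods
    arr.__proto__ = SubArray.prototype;
  }
  else {
    // fallback: copy methods directly onto the array
    for (var name in methods) {
      if (methods.hasOwnProperty(name)) {
        arr[name] = methods[name];
      }
    }
  }
  return arr;
}

var sub = makeSubArray(1, 2, 3);

sub.last();           // 3
sub.length;           // 3
sub instanceof Array; // true
```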

Summary

So here it is; all the fun intricacies of subclassing arrays in Javascript.

We’ve seen that, contrary to what it might seem, actual inheritance is by far not the only aspect of subclassing arrays in Javascript; that arrays differ from regular objects by having a special length/indices relation; that this length/indices relation is important and has nothing to do with the prototype chain of an object; that arrays have a special [[Class]] value of “Array”, which is also rather important and isn’t inherited either; and that it’s not possible to change the [[Class]] value of an object, not even in ECMAScript 5. We looked at different ways to “subclass” an array, starting from borrowing Array constructors from other contexts, and ending with augmentation of the prototype chain. We examined benefits and downsides of each of those solutions.

What we haven’t touched upon is the performance metrics of each of the implementations — perhaps a good topic for another discussion.

On this note, I leave you with a table summarizing pros/cons of the above mentioned techniques.

Technique                         Proper [[Class]]   length/indices   Uses native objects only   Requires ES3 only
Stack (Andrea Giammarchi)         No                 No               Yes                        Yes
IFrame borrowing (Dean Edwards)   Yes                Yes              No                         Yes
Accessors                         No                 Yes              Yes                        No
Direct extension                  Yes                Yes              Yes                        Yes
Prototype extension               Yes                Yes              Yes                        No
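
The first two columns of the table can be probed programmatically. The following is a rough sketch, with hypothetical helper names (hasArrayClass, hasLengthIndicesRelation), of how one might test any candidate “subclass” for these two traits. It takes a factory function so that each check runs against a fresh object.

```javascript
// True if the object's internal [[Class]] is "Array".
function hasArrayClass(o) {
  return Object.prototype.toString.call(o) === '[object Array]';
}

// True if objects produced by `create` keep length and indices in sync:
// adding an index bumps length, and shrinking length removes the index.
function hasLengthIndicesRelation(create) {
  var o = create();
  var start = o.length;
  o[start] = 'x';
  if (o.length !== start + 1) {
    return false;
  }
  o.length = start;
  return !(start in o);
}

// A deliberately simplified stand-in for prototype-only inheritance:
// an ordinary object that merely has an array in its prototype chain.
function Fake() {}
Fake.prototype = [];

var report = {
  nativeArray: [
    hasArrayClass([]),
    hasLengthIndicesRelation(function () { return []; })
  ],
  prototypeOnly: [
    hasArrayClass(new Fake()),
    hasLengthIndicesRelation(function () { return new Fake(); })
  ]
};
```

Here report.nativeArray comes out as [true, true], while report.prototypeOnly is [false, false], which is the gap that inheritance alone leaves open.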

[1] Whether this endeavor is something worth pursuing is a topic for another discussion

P.S. Big thanks to John David Dalton for reviewing an article and giving useful suggestions.

Categories: ECMA-262, ES5, [[Class]], [[Construct]], [[Prototype]], iframes, isArray

JScript and DOM changes in IE9 preview 3

June 24th, 2010 by kangax

3rd preview of IE9 was released yesterday, with some amazing additions, like canvas element and an extensive ES5 support. I’ve been digging through it a little, to see what has changed and what hasn’t — mainly looking at JScript and DOM. I posted some of the findings on twitter, but want to also list them here, as it’s not very convenient to share code snippets in 140 characters. Referencing it all in one place will hopefully make it easier for IE team to find and fix these deficiencies.

ECMAScript 5 and JScript

The big news is that IE9pre3 has (almost) full support for ES5. By “full support”, I mean that it implements the majority of the new API, such as Object.create, Object.defineProperty, String.prototype.trim, Array.isArray, Date.now, and many other additions. As of now, IE9 implements the largest number of new methods; even more than the latest Chrome, Safari and Firefox. Unbelievable, isn’t it? :)

[Screenshot: ES5 compatibility table]

You can see the results in this compatibility table (note that it lists results of mere “existence” testing, not any kind of conformance).

What’s missing is strict mode, which actually isn’t implemented in any of the browsers yet.

Some of the things I noticed:

ES5 Object.getPrototypeOf on host objects seems to lie, always returning null instead of proper value of [[Prototype]]:

  Object.getPrototypeOf(document.body); // null
  Object.getPrototypeOf(document); // null
  Object.getPrototypeOf(alert); // null
  Object.getPrototypeOf(document.childNodes); // null

This doesn’t happen in other browsers that implement Object.create at the moment, such as latest Chrome, WebKit or Firefox. In Chrome, for example:

  Object.getPrototypeOf(document.body) === HTMLBodyElement.prototype;
  Object.getPrototypeOf(document) === HTMLDocument.prototype;
  Object.getPrototypeOf(alert) === Function.prototype;
  Object.getPrototypeOf(document.childNodes) === NodeList.prototype;

… and so on.

Interestingly, bound functions in IE9pre3 are represented as “function(){ [native code] }”, similar to host objects:

  var bound = (function f(x, y){ return this; }).bind({ x: 1 });
  bound + ''; // "function(){ [native code] }"
 
  // compare to
 
  alert + ''; // "function alert(){ [native code] }"

Note how function representation does not include identifier (f), parameters (x and y), nor representation of function body (return this;). This of course proves once again that relying on function decompilation is NOT a good idea.
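
To make the point concrete, here is a small sketch of the kind of check that decompilation-based code tends to perform, and how it gets fooled by a bound function. The helper name looksLikeItReturnsThis is made up for this example, and the exact string a given engine produces for either function is deliberately not assumed.

```javascript
function original(x, y) { return this; }
var bound = original.bind({ x: 1 });

// Hypothetical "sniffing" helper of the kind that relies on decompilation.
function looksLikeItReturnsThis(fn) {
  return /return\s+this/.test(Function.prototype.toString.call(fn));
}

var results = {
  // Engines that expose the source text report true here.
  original: looksLikeItReturnsThis(original),
  // The bound function hides its body behind "[native code]".
  bound: looksLikeItReturnsThis(bound),
  // Yet calling it still returns the object it was bound to.
  boundReturnsBoundObject: bound().x === 1
};
```

The two functions behave the same way when called, yet the check gives different answers for them.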

Whitespace character class (as in /\s/) still doesn’t match majority of whitespace characters (as defined by specs). These include “U+00A0”, “U+2000” to “U+200A”, “U+3000”, etc. The test is available here. Curiously, ES5 String.prototype.trim seems to “understand” those characters as whitespace very well, producing empty string — as expected — for something like '\u00A0'.trim().
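
A quick way to see which characters an engine misses is to run both /\s/ and trim over a list of characters that ES5 treats as whitespace or line terminators. The snippet below is only a sketch (it is not the test mentioned above); the list is a partial one, and the names whitespaceChars, toHex and findUnmatched are invented here.

```javascript
// A subset of ES5 WhiteSpace and LineTerminator characters.
var whitespaceChars = [
  '\u0009', '\u000A', '\u000B', '\u000C', '\u000D', '\u0020',
  '\u00A0', '\u2028', '\u2029', '\u3000', '\uFEFF'
];
// U+2000 through U+200A
for (var code = 0x2000; code <= 0x200A; code++) {
  whitespaceChars.push(String.fromCharCode(code));
}

// Formats a single character as "U+XXXX".
function toHex(ch) {
  var hex = ch.charCodeAt(0).toString(16).toUpperCase();
  return 'U+' + ('0000' + hex).slice(-4);
}

// Returns the code points that /\s/ fails to match.
function findUnmatched(chars) {
  var missed = [];
  for (var i = 0; i < chars.length; i++) {
    if (!/^\s$/.test(chars[i])) {
      missed.push(toHex(chars[i]));
    }
  }
  return missed;
}

var missedByRegex = findUnmatched(whitespaceChars);
var trimmedAway = whitespaceChars.join('').trim() === '';
```

In a conforming implementation missedByRegex is empty and trimmedAway is true; an engine with the discrepancy described above would list the missed code points in missedByRegex while still reporting trimmedAway as true.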

It was nice to see that ES5 Array.isArray is about 20 times faster than custom implementation, such as this one:

  function isArray(o) {
    return Object.prototype.toString.call(o) === "[object Array]";
  }

The difference in speed is similar to other browsers that implement this method.

An infamous, 10+ year-old JScript NFE bug, which I described at length before, is finally fixed:

  var f = function g() { return f === g; };
  typeof g; // "undefined"
 
  f(); // true

arguments’ [[Class]] is now “Arguments”, just as ES5 specifies:

  var args = (function(){ return arguments; })();
  Object.prototype.toString.call(args); // "[object Arguments]"

DOM

Unfortunately, the entire host objects infrastructure still looks very similar to the one from IE8. Host objects don’t inherit from Object.prototype, don’t report proper typeof, and don’t even have basic properties like “length” or “prototype”, which all function objects must have:

  alert instanceof Object; // false
  typeof alert; // "object"
  alert.length; // undefined

Because they don’t inherit from Object.prototype, we don’t have any of Object.prototype methods, naturally:

  alert.toString; // undefined
  alert.constructor; // undefined
  alert.hasOwnProperty; // undefined

Object.prototype is not the only object host methods fail to inherit from. In majority of modern browsers, host objects also inherit from Function.prototype and so have Function.prototype methods like call and apply. This doesn’t happen in IE9pre3.

  alert instanceof Function; // false
  document.createElement instanceof Function; // false
 
  alert.call; // undefined

Curiously, call and apply are present on some host objects, but they are still not inherited from Function.prototype:

  typeof document.createElement.call; // "function"
  document.createElement.call === Function.prototype.call; // false

Host objects’ [[Class]] is far from ideal as well. IE9pre3 actually violates ES5, which says that objects implementing [[Call]] (in other words, callable objects) should have [[Class]] of “Function”, even if they are host objects. In IE9pre3, alert is a callable host object, yet it reports its [[Class]] as “Object”, not “Function”. Not good.

  Object.prototype.toString.call(alert); // "[object Object]"
  Object.prototype.toString.call(document.createElement); // "[object Object]"

IE9pre3 still messes up DOM objects’ attributes and properties, although not as badly as earlier versions:

  var el = document.createElement('p');
  el.setAttribute('x', 'y');
  el.x; // 'y'
 
  el.foobarbaz = 'moo';
  el.hasAttribute('foobarbaz'); // true
  el.getAttribute('foobarbaz'); // 'moo'

Some old, humorous bugs can still be seen in IE9pre3, such as typeof returning “string” for certain methods:

  typeof Option.create; // "string"
  typeof Image.create; // "string"
  typeof document.childNodes.item; // "string"

Undeclared assignments still throw an error when elements with the same id are present in the DOM, but no longer when elements with the same name are (as was the case in previous versions):

  <div id="foo"></div>
  <a name="bar"></a>
  ...
  <script>
    foo = function(){ /* ... */ }; // Error
    bar = function(){ /* ... */ }; // no Error
  </script>

Similarly to IE8, only Element and specific element type interfaces (HTMLDivElement, HTMLScriptElement, HTMLSpanElement, etc.) are exposed as same-named global properties. Node and HTMLElement are still missing, and element’s prototype chain most likely still looks like this:

  document.createElement('div');
    |
    | [[Prototype]]
    v
  HTMLDivElement.prototype
    |
    | [[Prototype]]
    v
  Element.prototype
    |
    | [[Prototype]]
    v
  null

…rather than what can be seen in almost all other modern browsers:

  document.createElement('div');
    |
    | [[Prototype]]
    v
  HTMLDivElement.prototype
    |
    | [[Prototype]]
    v
  HTMLElement.prototype
    |
    | [[Prototype]]
    v
  Element.prototype
    |
    | [[Prototype]]
    v
  Node.prototype
    |
    | [[Prototype]]
    v
  Object.prototype
    |
    | [[Prototype]]
    v
  null

getComputedStyle from DOM Level 2 is still missing, however its value is mysteriously null rather than undefined. The property actually exists on the object, but has a value of null. Hopefully, this is just a placeholder and a proper method will be added before the final release.

  document.defaultView.getComputedStyle; // null
  'getComputedStyle' in document.defaultView; // true
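
One practical consequence is that an existence test with the in operator is not enough to decide whether the method can be called. Here is a hedged sketch of a stricter check; isHostMethod is a name chosen for this example, and accepting a non-null typeof of “object” is an assumption based on how host methods such as alert report their type above. The fakeWindow object is only a stand-in so that the snippet can run outside of a browser.

```javascript
// True only if object[name] looks like something callable.
function isHostMethod(object, name) {
  var type = typeof object[name];
  return type === 'function' ||
    // Assumption: some host methods report "object"; rule out null
    // placeholders. This is loose, as any non-null object passes.
    (type === 'object' && object[name] !== null);
}

// Stand-in for document.defaultView with a null placeholder.
var fakeWindow = {
  getComputedStyle: null,
  alert: function () {}
};

var checks = {
  existsViaIn: 'getComputedStyle' in fakeWindow,
  usable: isHostMethod(fakeWindow, 'getComputedStyle'),
  alertUsable: isHostMethod(fakeWindow, 'alert')
};
```

With a null placeholder, existsViaIn is true while usable is false, so only the stricter check avoids calling a method that is not really there.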

Array.prototype.slice can now convert certain host objects (e.g. NodeList’s) to arrays — something that majority of modern browsers have been able to do for quite a while:

  Array.prototype.slice.call(document.childNodes) instanceof Array; // true
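
For code that also has to run where this conversion fails, a common workaround is to try slice first and fall back to copying by index. This is only a sketch; toArray and copyByIndex are hypothetical names, and it assumes that an environment unable to slice a host object throws an error rather than returning a wrong result.

```javascript
// Copies an array-like object element by element.
function copyByIndex(arrayLike) {
  var result = [];
  for (var i = 0, len = arrayLike.length; i < len; i++) {
    result[i] = arrayLike[i];
  }
  return result;
}

// Tries the native conversion first, then falls back to a manual copy.
function toArray(arrayLike) {
  try {
    return Array.prototype.slice.call(arrayLike);
  } catch (err) {
    // Assumption: unsupported host objects make slice throw.
    return copyByIndex(arrayLike);
  }
}

// Works with any array-like object, e.g. a plain one:
var converted = toArray({ 0: 'a', 1: 'b', length: 2 });
```

In a browser the same call would be toArray(document.childNodes).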

That’s it for now.

Unfortunately, I don’t have much time to look into these things extensively, at the moment. There might be more updates on twitter.

As always, any corrections, suggestions, and additions are much appreciated.

Categories: ECMA-262, [[Class]], isArray, review, strict-mode

Tag is not an element. Or is it?

June 1st, 2010 by kangax

It’s interesting how widely some misconceptions spread around. The one I noticed recently is the “issue” of elements vs. tags. The problem is that people say tags when they mean elements, and do it so often that it’s not clear if the distinction is still relevant.

Or if anyone even cares anymore.

Elements vs. tags

If you look at section 3 of HTML 4.01 (“On SGML and HTML”), there’s an explicit note about elements not being tags. In HTML 4.01, <p>foo bar</p> is an element, not a tag. An element consists of a start tag, content, and an end tag. In case of <p>foo bar</p>, <p> is a start tag, foo bar is content, and </p> is an end tag.

In other words, elements consist of tags.
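
The distinction is easy to see if we pull the tags out of a piece of markup. The function below is a deliberately naive sketch (a regular expression, not an HTML parser) with an invented name, listTags; it assumes simple markup without comments or “>” characters inside attribute values.

```javascript
// Lists the tags found in a string of simple markup.
function listTags(markup) {
  var tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
  var tags = [];
  var match;
  while ((match = tagPattern.exec(markup)) !== null) {
    tags.push({
      kind: match[1] ? 'end' : 'start',
      name: match[2].toLowerCase(),
      text: match[0]
    });
  }
  return tags;
}

var tags = listTags('<p>foo bar</p>');
// tags[0] is { kind: 'start', name: 'p', text: '<p>' }
// tags[1] is { kind: 'end', name: 'p', text: '</p>' }
```

One P element, two tags: the function reports the tags, and says nothing about how many elements a parser would create from them.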

Optional tags

The distinction between tags and elements becomes slightly less clear once we start dealing with elements that have optional tags, as defined by HTML 4.01. For example, <p> or <td> elements don’t have to have end tags. They could very well exist without them. When parser finds <p>foo bar in markup, it still creates an element. There’s no end </p> tag, but parser doesn’t really need it; start <p> tag already denotes what kind of element it is.

  <p>foo bar
 
  <tr>
    <td>baz
    <td>qux
  </tr>

But that’s not all.

Some elements have an empty content model, which means that they can’t have any content at all. An element that is not allowed to have any content is called an empty element. In such elements, end tags are not merely optional; they must be omitted entirely. These, unfortunately, are not some obscure elements, but very useful ones like <br>, <link>, <img>, <input>, <meta> and a few others.

What’s interesting is that <br> is still an element, only an element that consists of start tag only. It’s just that its content and end tag must never be present. The fact that <br>, <img> or other empty elements consist of start tags only, makes things rather confusing.

And we’re not even talking about elements with both tags optional — <html>, <head>, <body>. Those could exist without any visible traces at all, and are only created based on the context.

  <html>
    <!-- 
            There's no HEAD start tag, no HEAD end tag, and no HEAD content here. 
            Yet, HEAD element is still created implicitly.
            This happens because content model of HTML element is defined as `head, body`, 
            which means that both elements should be present in HTML element in that order. 
            As soon as BODY start tag is found, even if HEAD tags are not present, 
            HEAD element is created automatically.  -->
    <body>
    ...
    </body>
  </html>

Which confusion?

So which practical implications does this confusion actually have?

For one, saying something like “insert an image after a <p> tag” ranges from “wrong” to “ambiguous”, since we can’t insert anything but a chunk of text after a <p> tag, and a <p> tag can be either a start one (<p>) or an end one (</p>). In this case, a better way would be to say “insert an <img> tag after a start <p> tag”:

  <p>
    <img ...> <!-- IMG tag is inserted after a start P tag -->
    ...
  </p>

in which case <img> element would become a child of <p> element. Or we could say — “insert an <img> tag after an end <p> tag”:

  <p>
    ...
  </p>
  <img ...> <!-- IMG tag is inserted after an end P tag -->

in which case <img> element would be a sibling following <p> one.

Of course, most of the time, what people really mean by “insert an image after a <p> tag” is the second version. It’s just that “element” is accidentally replaced with “tag”. An even better way, and one that avoids mentioning tags in the first place, is to say “insert an <img> element after a <p> element”. This version leaves no room for incorrect interpretation.

Global confusion

What’s interesting about all this is not so much the finer points of difference between tags and elements, but just how widely this misconception prevails. Google search returns 480,000 results for “div tag”, but only 137,000 for “div element”. For an empty element, such as img, the difference is even scarier — “img tag” returns 959,000 results, while “img element” only 48,200. An element is confused for a tag everywhere, from blogs, articles, and mailing lists to books, references, and frameworks.

Pedantry or an important distinction?

Once you start thinking about the distinction, edges become somewhat blurry. Are all of the examples above really wrong?

When describing “image_tag”, Ruby on Rails documentation says “Returns an html image tag …”. The returned string — “<img …>” — can actually very well be considered an image (start) tag. Yes, the string represents an element, but since an element is empty, it’s also a string that consists of <img> tag only, and so can probably be called an “image” tag.

At the same time, “javascript_include_tag” already crosses the line of correctness. Its description still says “Returns an html script tag”, but it returns a string that can only be considered an element, <script type="text/javascript" src="…"></script>, since there’s now a start tag, content (empty), and an end tag.

w3schools is just plain wrong [1], saying things like “The <div> tag defines a division or a section in an HTML document.” or “The <div> tag is often used to group block-elements to format them with styles.”. Tags do not define division, they represent elements, and it is elements that have certain semantic meaning; in this case — division.

In some of the popular articles, we can find phrases like “… the nearer ancestor of our <footer> tag is the <body> tag …”, in which case it’s pretty clear that “tag” is not the right word at all; tags cannot be ancestors, but elements can.

However, saying that “browser supports <video> tag” is technically not wrong, since browsers supporting <video> element, most definitely can parse and understand <video> tags as well (it is by recognizing video tags that they are able to create video elements in DOM).

Speaking of DOM…

What about DOM?

Before I knew the difference between tags and elements, I would always think in terms of tags when talking about HTML, and in terms of elements when talking about DOM. It just made sense that HTML, being markup language, consists of tags, while HTML DOM — or rather, the document available for scripting — is a tree-like structure consisting of elements, and other kinds of nodes. I knew that browser parses HTML markup (and so tags), and then creates a tree-like structure to represent a document, in which case tags essentially become elements. The fact that elements are not just kinds of nodes, but are also chunks of text in markup seemed very strange when I first found out about it.

It seems that this is exactly how most of the people think about tags vs. elements. Tags exist in HTML (text), and elements – in document (DOM). This would explain why tags prevail in discussions about HTML, or markup in general; and why elements are mostly mentioned in context of scripting, rendering, etc.

Nevertheless, I believe that keeping terminology straight is important. Things should be called as they really are, to avoid the ambiguity that we’ve seen in the previous example. A method named something like forEachTag should not iterate over each element, and vice-versa; technical discussions, articles, and documentation should really strive to use proper terms.

What now?

The attempts at demystification were already made in the past, yet the effect is barely visible. So I wonder — why? Is it too unintuitive to speak in terms of elements in context of HTML, or is this a lack of explanation and exposure of the subject? Does the distinction even matter? Or does it matter in technical discussions only? Does it make sense to distinguish these two entities, or should we just try to infer the exact meaning based on the context, as it seems to be done right now? Are we all simply used to the word “tag”, and don’t care about the difference most of the time?

What do you think?

[1] …which is not surprising, considering the amount of other misconceptions on that site, such as classifying HTML comments as tags.

Categories: don'ts, html
