Perfection kills

Exploring Javascript by example


How ECMAScript 5 still does not allow to subclass an array

July 15th, 2010 by kangax

Subclassing an array in Javascript has never been a trivial task, at least for a certain meaning of “subclassing an array”. Curiously, the new edition of the language, ECMAScript 5, still does not make it possible to fully subclass an array.

Not everything is lost though: there are a few ways in which ECMAScript 5 brings this task closer to the ideal. However, a few fundamental issues still prevent true array subclassing.

Let’s talk about that.

Today we’ll take a look at what it means to subclass an array, what some of the existing implementations/workarounds are, and which drawbacks those implementations have. We’ll see what ECMAScript 5 brings to the table, and what those fundamental issues with subclassing are. We’ll also talk about alternative approaches to subclassing an array, such as using wrappers, and get to know their limitations.

But first, what does it mean to subclass an array? And why do we even need it?

Why subclass an array?

We can define “subclassing an array” as the process of creating an object which inherits from native Array object (has Array.prototype in its prototype chain), and follows behavior similar (or identical) to native array.

The last point about behavior similar to native array is actually very important, as we’ll see later on. Having “subclass” of array could be thought of as being able to create an array object, but an object which would inherit not directly from Array, but from another object, and only then from Array.

In other words, we want behavior similar to this:

var sub = new SubArray(1, 2, 3);
sub; // [1, 2, 3]
 
sub.length; // 3
sub[1]; // 2
 
sub.push(4);
sub; // [1, 2, 3, 4]
 
// etc.
 
sub instanceof SubArray; // true
sub instanceof Array; // true

Note how SubArray constructor creates a sub object identical in its behavior to array (object has “length” property, numeric “0”, “1”, “2” properties, and inherits Array.prototype.* methods). At the same time, it is SubArray that a sub object directly inherits from, not Array.

So what exactly is the purpose of doing all this? Why subclass an array in such way?

There are usually two reasons:

1. Avoid pollution of global Array

Javascript prototypal nature makes it easy to extend all array objects with custom methods. Instead of assigning to direct properties of array objects, it’s much easier and more efficient to assign to array’s “prototype” object (the one that’s usually accessed via Array.prototype).

Array.prototype.last = function () {
  return this[this.length-1];
};
// ...
[1, 2, 3].last(); // 3

However, extending Array.prototype comes at a price, and that price is the chance of collisions. When scripts coexist with other scripts in an application, it’s important for those scripts not to conflict with each other. Extending Array.prototype, while tempting and seemingly useful, unfortunately isn’t very safe in a diverse environment. Different scripts can end up defining same-named methods, but with different behavior. Such a scenario often leads to inconsistent behavior and hard-to-track errors.

Collisions can happen not only with user-defined code, but also with proprietary methods implemented by environment itself (e.g. Array.prototype.indexOf from JavaScript 1.6, before it was standardized by ES5) or from future standards (e.g. Array.prototype.map, Array.prototype.reduce, etc. — now all part of ES5).

Using a constructor function other than native Array, but with the same behavior, would avoid such collisions. Instead of extending Array.prototype, another object would be extended (say, SubArray.prototype) and then used to initialize (sub)array objects. Any third party code which depends on methods from Array.prototype would still be able to safely use them.

2. Create data structures naturally inheriting from array

Another reason to subclass an array is to be able to create data structures, which naturally inherit from array; such as Stack, List, Queue, Set, etc. While there are certainly valid use cases for these structures, in this article I will instead focus on the first aspect — reducing chance of collisions. It is somewhat more relevant in context of cross-browser scripting.

Naive approach

Creating objects that inherit from other objects is more or less straightforward in Javascript. We can use well-known clone method:

function clone(obj) {
  function F() { }
  F.prototype = obj;
  return new F();
}

and then set-up inheritance like this:

function Child() { }
Child.prototype = clone(Parent.prototype);

clone might look confusing, but all it does is create an object with another object as nearest ancestor in its prototype chain. It uses intermediate function to avoid executing “parent” constructor. In this example, new Child creates an object with Child.prototype as first object in the prototype chain, Parent.prototype — second, and so on. To visualize, the prototype chain here looks like this:

new Child()
    |
    | [[Prototype]]
    |
    v
Child.prototype
    |
    | [[Prototype]]
    |
    v
Parent.prototype
    |
    | [[Prototype]]
    |
    v
Object.prototype
    |
    | [[Prototype]]
    |
    v
   null
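The chain in the diagram can be checked link by link. This is a minimal sketch, assuming an ES5 environment (for Object.getPrototypeOf) and a hypothetical empty Parent constructor that is not part of the original example:

```javascript
// `clone` as defined above; `Parent` is a hypothetical constructor for illustration.
function clone(obj) {
  function F() { }
  F.prototype = obj;
  return new F();
}

function Parent() { }
function Child() { }
Child.prototype = clone(Parent.prototype);

var child = new Child();

// Walk the prototype chain one link at a time.
var first = Object.getPrototypeOf(child);   // Child.prototype
var second = Object.getPrototypeOf(first);  // Parent.prototype
var third = Object.getPrototypeOf(second);  // Object.prototype
var end = Object.getPrototypeOf(third);     // null
```

Note that the Parent constructor itself is never executed while setting up the chain, which is the whole point of the intermediate function F.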

Using the clone method is exactly what a person attempts when trying to subclass an array for the first time:

function SubArray() {
  // Take any arguments passed to constructor and add them to an instance
  this.push.apply(this, arguments);
}
SubArray.prototype = clone(Array.prototype);
 
var sub = new SubArray(1, 2, 3);

The approach seems reasonable. After all, the goal is to create an object that inherits from Array, so there’s no reason the tried-and-true clone wouldn’t work. Or is there? As with a few other things in Javascript, it’s not as trivial as it seems.

Problems with naive approach

So what exactly is wrong with subclassing array using clone method? Let’s take a look at how previously declared SubArray function behaves. We’ll be using native array object alongside, for comparison.

var arr = new Array(1, 2, 3);
var sub = new SubArray(1, 2, 3);
 
arr.length; // 3
sub.length; // 0 (in IE<8)
 
arr.length = 2;
sub.length = 2;
 
arr; // [1, 2]
sub; // [1, 2, 3]
 
arr[10] = 'foo';
sub[10] = 'foo';
 
arr.length; // 11
sub.length; // 2

There’s clearly some kind of inconsistency here, even not counting the bug in IE<8. But what is this strange relation between length and numeric properties in an array? And why doesn’t the subclassed array behave identically? To understand this, we need to look into what array objects in Javascript really are.

Special nature of arrays

It turns out that arrays in Javascript are almost like plain Object objects, except for one little difference in behavior. The crux of this difference is summarized concisely in one paragraph of specification (15.4):

Array objects give special treatment to a certain class of property names. A property name P (in the form of a string value) is an array index if and only if ToString(ToUint32(P)) is equal to P and ToUint32(P) is not equal to 2^32 – 1. Every Array object has a length property whose value is always a nonnegative integer less than 2^32. The value of the length property is numerically greater than the name of every property whose name is an array index; whenever a property of an Array object is created or changed, other properties are adjusted as necessary to maintain this invariant. Specifically, whenever a property is added whose name is an array index, the length property is changed, if necessary, to be one more than the numeric value of that array index; and whenever the length property is changed, every property whose name is an array index whose value is not smaller than the new length is automatically deleted. This constraint applies only to properties of the Array object itself and is unaffected by length or array index properties that may be inherited from its prototype.

For those allergic to the condensed language of ECMA-262, here’s a short summary.

Array objects treat “numeric” properties in a special way. Whenever such a property is added or changed, the value of the array’s “length” property is adjusted as well, so that it is always at least one more than the greatest numeric (own) property of the array. Similarly, when the “length” property is changed, numeric properties are adjusted accordingly (only those that are not smaller than the new value of “length” are deleted).
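The “array index” test from the quoted paragraph translates almost word for word into code. This is a sketch only; isArrayIndex is a hypothetical helper name, and it expects the property name as a string, the way the specification does:

```javascript
// Hypothetical helper: returns true when property name P (a string) is an array index.
function isArrayIndex(P) {
  var uint32 = P >>> 0;            // ToUint32(P)
  return String(uint32) === P &&   // ToString(ToUint32(P)) is equal to P
         uint32 !== 4294967295;    // and ToUint32(P) is not equal to 2^32 - 1
}

isArrayIndex('0');          // true
isArrayIndex('15');         // true
isArrayIndex('015');        // false (does not round-trip to the same string)
isArrayIndex('1.5');        // false
isArrayIndex('-1');         // false
isArrayIndex('length');     // false
isArrayIndex('4294967294'); // true  (largest possible index)
isArrayIndex('4294967295'); // false (2^32 - 1 is excluded)
```

Only properties whose names pass this test take part in the length relation; everything else behaves like a property of a plain object.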

We have already seen relation between numeric properties and length in the previous example, but let’s take a look at it again, step by step:

1) When array object is created, its “length” property is set to a value one more than the largest index of an array.

  var arr = ['x', 'y', 'z'];
  arr.length; // 3 (1 greater than largest index of an array — 2 in this case)
 
  arr = ['foo'];
  arr.length; // 1 (1 greater than largest index of an array — 0 in this case)

2) When numeric properties change, so does “length” change — to maintain the relationship of being 1 greater than the largest index.

var arr = ['x', 'y'];
arr.length; // 2, as expected
 
arr[2] = 'z'; // add another numeric property (2) larger than the largest existing one (1)
arr.length; // 3 — length is changed to be 1 greater than (new) largest index (2)

3) When the “length” property changes, numeric properties are adjusted so that the greatest index is never larger than the value of “length” minus 1.

var arr = ['x', 'y', 'z'];
arr.length = 2;
 
arr; // ['x', 'y'] — note how last element (z) is deleted, because being at 2nd index, 
     //              it doesn't satisfy criteria of largest index being 1 less than length
 
arr.length = 4;
 
arr; // ['x', 'y'] — "increasing" length doesn't affect numeric properties...
 
arr.join(); // "x,y,," ...but has consequences visible in other cases, such as when using `Array.prototype.join`
 
arr.push('z');
arr; // ['x', 'y', undefined, undefined, 'z'] — ...or when using `Array.prototype.push`

Now you know the “special” nature of Array objects in Javascript, which is in the relationship between “length” and numeric properties. One little detail we haven’t looked at is that array’s “length” property MUST always have a value of non-negative integer less than 2^32. Whenever this condition is violated, a RangeError is thrown:

var arr = [];
arr.length = Math.pow(2, 32); // RangeError
 
arr.length; // 0 (length is still 0, as it initially was)
 
arr.length = Math.pow(2, 32) - 1; // set length to maximum allowed value
 
arr.length++; // RangeError (when setting length explicitly)
arr.push(1); // RangeError (or when setting length implicitly)
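The validity rule for “length” can be written as a one-line check: a value is acceptable only if converting it to an unsigned 32-bit integer does not change its numeric value. The sketch below uses two hypothetical helpers (isValidArrayLength and throwsRangeError, neither from the original text) to compare that check against what a real array does:

```javascript
// Hypothetical helper: true when `value` is acceptable as an array "length".
function isValidArrayLength(value) {
  return (value >>> 0) === Number(value); // ToUint32(value) equals ToNumber(value)
}

// Hypothetical helper: reports whether assigning `value` to a real array's
// length throws a RangeError.
function throwsRangeError(value) {
  try {
    [].length = value;
    return false;
  } catch (e) {
    return e instanceof RangeError;
  }
}

isValidArrayLength(0);               // true
isValidArrayLength(4294967295);      // true  (2^32 - 1, the maximum)
isValidArrayLength(Math.pow(2, 32)); // false
isValidArrayLength(-1);              // false
isValidArrayLength(1.5);             // false

throwsRangeError(Math.pow(2, 32));   // true
throwsRangeError(-1);                // true
throwsRangeError(1.5);               // true
```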

Function objects and [[Construct]]

It should start to make sense why there are discrepancies in behavior of objects created via SubArray and Array functions. Even though SubArray creates an object that inherits from Array.prototype, that object completely lacks array’s special behavior. The SubArray instance is nothing more than a plain Object object (as if it was created via an object literal — { }).

But why does SubArray create an Object object and not an Array object? The core of this issue is in the way functions work in ECMAScript.

When new operator is applied to an object — as in new SubArray — that object’s internal [[Construct]] method is called. In our case, it is [[Construct]] of SubArray function. SubArray — being a native function — has [[Construct]] that’s specified to create a plain Object object, and invoke corresponding function providing newly created object as this value. Any native function, including SubArray, should create an Object object and return it as a result.

Now it’s worth mentioning that it’s possible to sort of supersede return value of [[Construct]] by explicitly returning non-primitive value from constructor function:

function SubArray() {
  this.push.apply(this, arguments);
  return []; // return array object explicitly
}

— but in that case, returned object does NOT inherit from constructor’s “prototype” (SubArray.prototype in this case); neither is constructor function invoked with that object as this value:

var sub = new SubArray(1, 2, 3);
 
// Object doesn't have 1, 2, 3, as constructor was never called with `this` value referencing returned object
sub; // []
 
// SubArray is not in the prototype chain of returned object
sub instanceof SubArray; // false
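What [[Construct]] does for an ordinary function can be approximated in user code, which makes both observations above easier to see. This is a rough sketch, assuming ES5 (Object.create, Array.isArray); construct, Plain and ReturnsArray are hypothetical names, and the sketch ignores the case where F.prototype is not an object:

```javascript
// Hypothetical helper approximating `new F(...)` for an ordinary function F.
function construct(F, args) {
  // 1. Create a plain Object object whose [[Prototype]] is F.prototype.
  var obj = Object.create(F.prototype);
  // 2. Call F with the newly created object as `this`.
  var result = F.apply(obj, args);
  // 3. A non-primitive return value supersedes the newly created object.
  var isObject = (typeof result === 'object' && result !== null) ||
                 typeof result === 'function';
  return isObject ? result : obj;
}

function Plain() { this.x = 1; }
function ReturnsArray() { this.x = 1; return []; }

var a = construct(Plain, []);
a instanceof Plain;        // true
a.x;                       // 1

var b = construct(ReturnsArray, []);
b instanceof ReturnsArray; // false (the returned array never saw ReturnsArray.prototype)
Array.isArray(b);          // true
b.x;                       // undefined (`this` inside the constructor was a different object)
```

Step 1 is where the plain Object object comes from: nothing in this algorithm ever creates an Array object on the function’s behalf.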

As you can see, creating an object that inherits from Array.prototype is only part of the story. The biggest issue is to preserve the special relation of length and numeric properties. This is why using regular clone approach is not quite up to the task.

The importance of array special behavior

A reasonable question at this point is — “Why does array special behavior matter”? Why would we want to preserve relationship between length and numeric properties when subclassing an array? It turns out that consequences of proper length are not only visible when working with length directly, but also indirectly, when performing other tasks via Array.prototype.* methods.

Take for example Array.prototype.push — a method to append items to the end of array. To determine from which position to start inserting elements into, push retrieves a value of array’s “length”. If length is not preserved properly, elements are inserted at the wrong location:

var arr = ['x', 'y'];
arr.length = 5;
arr.push('z'); // 'z' is inserted at 5th index, since that is what the value of "length" is
arr; // ['x', 'y', undefined, undefined, undefined, 'z']

Take another method — Array.prototype.join. Used to return a representation of an array by concatenating all elements with a separator, Array.prototype.join also uses length property to determine when to stop concatenating values:

var arr = ['x', 'y'];
arr.join(); // "x,y"
arr.length = 5;
arr.join(); // "x,y,,,"

Same goes for Array.prototype.concat — method used to produce a new array by concatenating values passed to concat:

var arr = ['x'];
arr.length = 3;
arr.concat('y'); // ['x', undefined, undefined, 'y']

Finally, the special behavior is often cleverly exploited in other situations, such as to “clear” an array (i.e. delete all of its numeric properties):

var arr = [1, 2, 3];
arr.length = 0;
arr; // [] — setting length to 0 effectively removes all numeric properties (elements) of an array

Existing solutions

Now that we’re familiar with the theory, let’s see what the situation is with subclassing arrays in practice. There have been few attempts in the past, with various levels of “success”. Here are a couple of most popular ones:

Andrea Giammarchi solution

One of the recent implementations is Stack, by Andrea Giammarchi, which looks like this:

var Stack = (function(){ // (C) Andrea Giammarchi - Mit Style License
 
  function Stack(length) {
    if (arguments.length === 1 && typeof length === "number") {
      this.length = -1 < length && length === length << 1 >> 1 ? length : this.push(length);
    }
    else if (arguments.length) {
      this.push.apply(this, arguments);
    }
  };
 
  function Array() { };
  Array.prototype = [];
 
  Stack.prototype = new Array;
  Stack.prototype.length = 0;
  Stack.prototype.toString = function () {
    return this.slice(0).toString();
  };  
 
  Stack.prototype.constructor = Stack;
  return Stack;
})();

It’s an interesting solution, which mainly works around IE<8 bug with Array.prototype.push and length property. However, as should be obvious by now, it doesn’t really solve the problem of maintaining relation between length and numeric properties:

var stack = new Stack('x', 'y');
stack.length;           // 2
 
// so far so good
 
stack.push('z');
stack.length;           // 3
 
// still good
 
stack[3] = 'foo';
stack.length;           // 3
 
// not good anymore (length should have been changed to 4)
 
stack.length = 2;
stack[2];               // 'z'
 
// still not good (element at 2nd index should have been deleted)

Dean Edwards solution

Another popular solution is by Dean Edwards. This one takes a completely different approach — instead of creating an object that inherits from Array.prototype, an actual Array constructor is “borrowed” from the context of another iframe.

// create an <iframe>
var iframe = document.createElement("iframe");
iframe.style.display = "none";
document.body.appendChild(iframe);
 
// write a script into the <iframe> and steal its Array object
frames[frames.length - 1].document.write(
  "<script>parent.Array2 = Array;<\/script>"
);

The reason this “works” is due to browsers creating separate execution environments for each frame in a document. Each such environment has a separate set of both — built-in and host objects. Built-in objects include global Array constructor, among others. Array object of one iframe is different from Array object of another iframe. They also don’t have any kind of hierarchical relation:

// assuming that SubArray was borrowed from another iframe
 
var sub = new SubArray(1, 2, 3);
 
sub instanceof SubArray; // true
sub instanceof Array; // false
sub instanceof Object; // false

Notice how sub is reported as NOT an instance of Array, and NOT an instance of Object. This is because neither Array, nor Object are anywhere in the prototype chain of sub object. Instead, prototype chain consists of SubArray.prototype, followed by <Object from another iframe>.prototype:

new SubArray()
    |
    | [[Prototype]]
    |
    v
<another iframe>.Array.prototype
    |
    | [[Prototype]]
    |
    v
<another iframe>.Object.prototype
    |
    | [[Prototype]]
    |
    v
   null

This brings us to one “consideration” with this approach — difficulties determining the nature of an object derived from such iframe. It’s no longer possible to determine that an object is an array using instanceof or constructor checks [1]:

  // is this object an array?
 
  sub instanceof Array; // false
  sub.constructor === Array; // false

It is, however, still possible to use [[Class]] check (we’ll talk about [[Class]] later on):

  Object.prototype.toString.call(sub) === '[object Array]'; // true
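The same cross-context behavior can be observed without a browser. The sketch below assumes Node.js, whose built-in vm module evaluates code in a separate global environment; it is only an analogue of the iframe case, not part of the original technique:

```javascript
// Assumes Node.js: `vm.runInNewContext` runs code against a fresh set of built-ins.
var vm = require('vm');

// An array created by the *other* environment's Array constructor.
var foreign = vm.runInNewContext('[1, 2, 3]');

var viaInstanceof = foreign instanceof Array;        // false
var viaConstructor = foreign.constructor === Array;  // false
var viaClass =
  Object.prototype.toString.call(foreign) === '[object Array]'; // true
var viaIsArray = Array.isArray(foreign);             // true (ES5)
```

ES5’s Array.isArray is specified in terms of the same [[Class]] check, so it also recognizes arrays from other contexts.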

Another, more inherent, downside of this approach is that it doesn’t work in non-browser environments (or, more precisely, in any environment without support for iframes). This problem is likely to become even bigger, given that server-side Javascript implementations are rising quite fast.

Finally, it was reported that Array borrowing can cause mixed content warning in IE6, among few other minor issues.

Other than that, iframe-based array “subclassing” is free of downsides of solutions like Stack, since we’re dealing with real array objects, and so proper length/indices relation.

ECMAScript 5 accessors to the rescue

But let’s talk about ECMAScript 5, which as I mentioned in the beginning, brings something that helps with subclassing arrays. This “something” is actually nothing but property accessors. These useful language constructs have been present in some popular implementations (SpiderMonkey, JavaScriptCore, and others) as a non-standard extension for quite a while now. They are now standardized by the new edition of the language.

Using accessors, it’s rather trivial to create an Object object with special length/indices relation — relation that’s identical to that of Array objects! And since we already know how to create an object with Array.prototype in its prototype chain, combining these two aspects would allow for a complete emulation of arrays.

There’s one little detail about implementation. Since ECMAScript (including last, 5th version) doesn’t provide any catch-all (aka __noSuchMethod__) mechanism, it’s not possible to change value of length property of an object when numeric property is modified; in other words, we can’t intercept the moment when ‘0’, ‘1’, ‘2’, ‘15’, etc. properties are being set. However, accessors allow us to intercept any read access of length property and return proper value, depending on which numeric properties object has at that moment. This is all we really need.

Here’s an implementation of it, at about 45 lines of code:

var makeSubArray = (function(){
 
  var MAX_SIGNED_INT_VALUE = Math.pow(2, 32) - 1,
      hasOwnProperty = Object.prototype.hasOwnProperty;
 
  function ToUint32(value) {
    return value >>> 0;
  }
 
  function getMaxIndexProperty(object) {
    var maxIndex = -1, isValidProperty;
 
    for (var prop in object) {
 
      isValidProperty = (
        String(ToUint32(prop)) === prop && 
        ToUint32(prop) !== MAX_SIGNED_INT_VALUE && 
        hasOwnProperty.call(object, prop));
 
      // Compare numerically: comparing the string names would rank '9' above '10'
      if (isValidProperty && ToUint32(prop) > maxIndex) {
        maxIndex = ToUint32(prop);
      }
    }
    return maxIndex;
  }
 
  return function(methods) {
    var length = 0;
    methods = methods || { };
 
    methods.length = {
      get: function() {
        var maxIndexProperty = +getMaxIndexProperty(this);
        return Math.max(length, maxIndexProperty + 1);
      },
      set: function(value) {
        var constrainedValue = ToUint32(value);
        if (constrainedValue !== +value) {
          throw new RangeError();
        }
        for (var i = constrainedValue, len = this.length; i < len; i++) {
          delete this[i];
        }
        length = constrainedValue;
      }
    };
    methods.toString = {
      value: Array.prototype.join
    };
    return Object.create(Array.prototype, methods);
  };
})();

We can now create “sub arrays” via makeSubArray function. It accepts one argument — an object with methods to add to [[Prototype]] of returned “sub array”.

var subMethods = {
  last: {
    value: function() {
      return this[this.length - 1];
    }
  }
};
var sub = makeSubArray(subMethods);
var sub2 = makeSubArray(subMethods);
// etc.

We can also hide this factory method behind a constructor, to make it similar to Array’s one:

var SubArray = (function() {
  var methods = { 
    last: { 
      value: function() {
        return this[this.length - 1];
      } 
    }
  };
  return function() {
    var arr = makeSubArray(methods);
    if (arguments.length === 1) {
      arr.length = arguments[0];
    }
    else {
      arr.push.apply(arr, arguments);
    }
    return arr;
  };
})();

And then use it as you would use regular Array constructor:

var sub = new SubArray(1, 2, 3);
 
sub.length; // 3
sub; // [1, 2, 3]
 
sub.length = 1;
sub; // [1]
 
sub[10] = 'x';
sub.push(1);

You can find this version of SubArray together with unit tests in the GitHub repository. For brevity, I made this implementation mainly take care of length/indices relation; certain methods (e.g. concat) do not behave identically to Array and need to be implemented accordingly.

[[Class]] limitations

The implementation we have just seen — the one utilizing property accessors — is great. It doesn’t require any host objects (such as iframes); it preserves relation between length and numeric properties; it even disallows out-of-range values for length or indices. All it requires is support for ES5 (or even just Object.create method).

But the dramatic title of this post is not there just for fun. There’s one little detail we’re missing in this otherwise complete implementation. And that detail is proper [[Class]] value — something that ECMAScript still doesn’t give full control over.

I wrote about [[Class]] before, when explaining how to detect arrays. In a nutshell, [[Class]] is an internal property of objects in ECMAScript. Its value is never exposed directly, but can still be inspected using certain methods (e.g. Object.prototype.toString). The usefulness of [[Class]] is that it allows to detect type of objects without relying on instanceof operator or checking object’s constructor — both of which fall short to detect objects from other contexts (e.g. iframes), as we’ve seen earlier.

Now, since objects created by makeSubArray are nothing but plain Object objects (only with special length getters/setters), their [[Class]] is also that of “Object” not an “Array”! We’ve taken care of length/indices relation, we’ve set up Array.prototype inheritance, but there’s no way to change object’s [[Class]] value. And so this solution can not claim to be complete.

Does [[Class]] matter?

You might be wondering what the actual implications are of these pseudo-array objects having [[Class]] of “Object” rather than “Array”. Do we even care? Well, for one, there’s an issue with object detection. Ironically, the solution I proposed to detect arrays relies on [[Class]], and so would fall short with objects like these.

// assuming that `sub` is a pseudo-array
Object.prototype.toString.call(sub) === '[object Array]'; // false

Another, probably more important, implication is that some of the methods in ECMAScript actually rely on [[Class]] value. For example, a well-known Function.prototype.apply accepts an array as its second argument (as well as an arguments object). Section 15.3.4.3 of ES3 says — “if argArray is neither an array nor an arguments object (see 10.1.8), a TypeError exception is thrown”. What this means is that if we pass pseudo-array object as a second argument to apply it will throw TypeError. apply doesn’t know or care if an object inherits from Array.prototype; neither does it care about object implementing special length/indices behavior. All it cares is that object is of proper type — type that we, unfortunately, can not emulate.

// assuming that `sub` is a pseudo-array
someFunction.apply(this, sub); // TypeError

There’s some vagueness in specification on this matter. For example, in Date.prototype.setTime spec says “If the this value is not a Date object, throw a TypeError exception.”, but in Date.prototype.getTime, it uses [[Class]] rather than just “not a Date object” — “If the this value is not an object whose [[Class]] property is “Date”, throw a TypeError exception”.

It’s probably safe to assume that these 2 phrases — “Date object” and “object with [[Class]] of ‘Date’” — have identical meaning. Ditto for “Array object” and “object with [[Class]] of ‘Array’”, as well as others.

Function.prototype.apply is not the only method sensitive to [[Class]] of an object. Array.prototype.concat, for example, follows different algorithm based on whether an object is an array or not (in other words — whether it has [[Class]] of “Array” or not).

// array ([[Class]] == "Array")
var arr = ['x', 'y'];
 
// object with numeric properties ([[Class]] == "Object")
var obj = { '0': 'x', '1': 'y' };
 
[1,2,3].concat(arr); // [1, 2, 3, 'x', 'y']
[1,2,3].concat(obj); // [1, 2, 3, { '0': 'x', '1': 'y' }]

As you can see, array values are “flattened”, whereas non-array ones are left as is. It is certainly possible to give these pseudo-arrays custom implementation of concat (and “fix” any other of Array.prototype.* methods), but the problem with Function.prototype.apply can not be solved.
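The same effect shows up with any object that merely inherits from Array.prototype. This is a minimal sketch, assuming ES5; concatFlattening is a hypothetical helper showing one possible “fix” (copying into a real array first), not the approach taken by any implementation discussed here:

```javascript
// A plain Object object that inherits from Array.prototype.
var pseudo = Object.create(Array.prototype);
pseudo.push('x', 'y');

var tag = Object.prototype.toString.call(pseudo); // '[object Object]'
var isArray = Array.isArray(pseudo);              // false

// concat treats it as a single value instead of flattening it.
var joined = [1, 2, 3].concat(pseudo);
joined.length;         // 4
joined[3] === pseudo;  // true

// Hypothetical workaround: convert to a real array, then concatenate.
function concatFlattening(target, arrayLike) {
  return target.concat(Array.prototype.slice.call(arrayLike));
}

var flattened = concatFlattening([1, 2, 3], pseudo); // [1, 2, 3, 'x', 'y']
```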

It’s worth mentioning that another downside of accessor-based pseudo-array approach is performance. I haven’t done any tests, but it’s pretty clear that an implementation which has to enumerate over all numeric properties on every access of length property is not going to perform well. This is why I can’t recommend this solution for anything other than educational purposes.

Wrappers. Direct property injection.

Realizing a somewhat futile nature of subclassing arrays in Javascript often makes alternative solutions look very attractive. One of such solutions is using wrappers. Wrapper approach avoids setting up inheritance or emulating length/indices relation. Instead, a factory-like function can create a plain Array object, and then augment it directly with any custom methods. Since returned object is an Array one, it maintains proper length/indices relation, as well as [[Class]] of “Array”. It also inherits from Array.prototype, naturally.

function makeSubArray() {
  var arr = [ ];
  arr.push.apply(arr, arguments);
  arr.last = function() { 
    return this[this.length - 1];
  };
  return arr;
}
 
var sub = makeSubArray(1, 2, 3);
sub instanceof Array; // true
 
sub.length; // 3
sub.last(); // 3

While direct extension of an array object is a beautifully simple solution, it’s not without downsides. The main disadvantage is that on each invocation of the constructor, an array needs to be extended with N methods. The time it takes to create an array is no longer constant (as it would be if the methods were on SubArray.prototype), but is directly proportional to the number of methods that need to be added.
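The per-array cost can at least be kept to copying references rather than creating new function objects each time. The following sketch assumes a hypothetical shared method table (subArrayMethods) and, unlike the ES3-only version above, uses ES5’s Object.defineProperty so that the injected methods stay out of for-in loops:

```javascript
// Hypothetical method table, created once and shared by every array.
var subArrayMethods = {
  first: function () { return this[0]; },
  last: function () { return this[this.length - 1]; }
};

function makeSubArray() {
  var arr = [];
  arr.push.apply(arr, arguments);
  for (var name in subArrayMethods) {
    if (Object.prototype.hasOwnProperty.call(subArrayMethods, name)) {
      // ES5 only: define each method as a non-enumerable own property.
      Object.defineProperty(arr, name, {
        value: subArrayMethods[name],
        writable: true,
        configurable: true,
        enumerable: false
      });
    }
  }
  return arr;
}

var a = makeSubArray(1, 2, 3);
var b = makeSubArray('x');

a.last();          // 3
b.first();         // 'x'
a.last === b.last; // true (same function object, only the reference is copied)
```

Creation time is still proportional to the number of methods; only the constant factor improves.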

Wrappers. Prototype chain injection.

To overcome the problem of “N methods”, another variation of wrappers can be used — the one in which object’s prototype chain is augmented, rather than object itself. Let’s see how this could be done:

function SubArray() { }
SubArray.prototype = new Array;
SubArray.prototype.last = function() {
  return this[this.length - 1];
};
 
function makeSubArray() {
  var arr = [ ];
  arr.push.apply(arr, arguments);
  arr.__proto__ = SubArray.prototype;
  return arr;
}

The idea is simple. When makeSubArray function is executed, two things happen: 1) an array object is created and is populated with any passed arguments; 2) object’s prototype chain is augmented in such way so that next object is SubArray.prototype, not original Array.prototype. The augmentation of prototype chain is done via non-standard __proto__ property.

But what happens in makeSubArray function is of course only half of the story. To make sure that object has Array.prototype in its prototype chain, we need to make SubArray.prototype inherit from it. This is exactly what’s being done on a second line of this snippet (SubArray.prototype = new Array). Prototype chain of an object returned from makeSubArray now looks like this:

new SubArray()
    |
    | [[Prototype]]
    |
    v
SubArray.prototype
    |
    | [[Prototype]]
    |
    v
Array.prototype
    |
    | [[Prototype]]
    |
    v
Object.prototype
    |
    | [[Prototype]]
    |
    v
   null

And because returned object is actually an Array, not an Object one, we also get length/indices relation as well as proper [[Class]] value. In fact, we can go even further and move initialization logic into SubArray constructor itself:

function SubArray() {
  var arr = [ ];
  arr.push.apply(arr, arguments);
  arr.__proto__ = SubArray.prototype;
  return arr;
}
SubArray.prototype = new Array;
SubArray.prototype.last = function() {
  return this[this.length - 1];
};
 
var sub = new SubArray(1, 2, 3);
 
sub instanceof SubArray; // true
sub instanceof Array; // true

Even though augmenting the prototype chain is a more performant solution, there’s a clear downside: it relies on the non-standard __proto__ property. ECMAScript, unfortunately, does not provide a way to set the [[Prototype]] of an existing object (the internal property referencing the immediate ancestor in its prototype chain), not even in the 5th edition. Even though __proto__ is supported by a rather large number of implementations, it is far from being universally available.
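One way to live with that limitation is to feature-test __proto__ and fall back to direct property injection where it is missing. This is a sketch under that assumption; supportsProto is a hypothetical name, and in the fallback branch the result is no longer an instance of SubArray:

```javascript
// Detect whether assigning to __proto__ actually changes the prototype chain.
var supportsProto = (function () {
  var probe = {};
  probe.__proto__ = Array.prototype;
  return probe instanceof Array;
})();

function SubArray() {
  var arr = [];
  arr.push.apply(arr, arguments);
  if (supportsProto) {
    // Prototype chain injection: constant time per array.
    arr.__proto__ = SubArray.prototype;
  } else {
    // Fallback: direct property injection, proportional to the number of methods.
    for (var name in SubArray.prototype) {
      if (Object.prototype.hasOwnProperty.call(SubArray.prototype, name)) {
        arr[name] = SubArray.prototype[name];
      }
    }
  }
  return arr;
}
SubArray.prototype = new Array;
SubArray.prototype.last = function () {
  return this[this.length - 1];
};

var sub = new SubArray(1, 2, 3);

sub.last();            // 3
sub instanceof Array;  // true in both branches
sub.length;            // 3
```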

Summary

So here it is; all the fun intricacies of subclassing arrays in Javascript.

We’ve seen that contrary to what might seem, actual inheritance is by far not the only aspect of subclassing arrays in Javascript; that arrays are different from regular objects by having special length/indices relation; how this length/indices relation is important and has nothing to do with prototype chain of an object; how arrays have special [[Class]] value of “Array” which is also rather important, and isn’t inherited either; how it’s not possible to change [[Class]] value of an object — not even in ECMAScript 5. We looked at different ways to “subclass” an array, starting from borrowing Array constructors from other contexts, and ending with augmentation of prototype chain. We examined benefits and downsides of each one of those solutions.

What we haven’t touched upon is the performance metrics of each of the implementations — perhaps a good topic for another discussion.

On this note, I leave you with a table summarizing pros/cons of the above mentioned techniques.

Technique                        Proper [[Class]]  length/indices  Uses native objects only  Requires ES3 only
Stack (Andrea Giammarchi)        No                No              Yes                       Yes
IFrame borrowing (Dean Edwards)  Yes               Yes             No                        Yes
Accessors                        No                Yes             Yes                       No
Direct extension                 Yes               Yes             Yes                       Yes
Prototype extension              Yes               Yes             Yes                       No

[1] Whether this endeavor is something worth pursuing is a topic for another discussion

P.S. Big thanks to John David Dalton for reviewing an article and giving useful suggestions.


JScript and DOM changes in IE9 preview 3

June 24th, 2010 by kangax

3rd preview of IE9 was released yesterday, with some amazing additions, like canvas element and an extensive ES5 support. I’ve been digging through it a little, to see what has changed and what hasn’t — mainly looking at JScript and DOM. I posted some of the findings on twitter, but want to also list them here, as it’s not very convenient to share code snippets in 140 characters. Referencing it all in one place will hopefully make it easier for IE team to find and fix these deficiencies.

ECMAScript 5 and JScript

The big news is that IE9pre3 has (almost) full support for ES5. By “full support”, I mean that it implements majority of new API, such as Object.create, Object.defineProperty, String.prototype.trim, Array.isArray, Date.now, and many other additions. As of now, IE9 implements the largest number of new methods; even more than latest Chrome, Safari and Firefox. Unbelievable, isn’t it? :)

screenshot of es5 compatibility table

You can see the results in this compatibility table (note that it lists results of mere “existence” testing, not any kind of conformance).

What’s missing is strict mode, which actually isn’t implemented in any of the browsers yet.

Some of the things I noticed:

ES5 Object.getPrototypeOf on host objects seems to lie, always returning null instead of proper value of [[Prototype]]:

  Object.getPrototypeOf(document.body); // null
  Object.getPrototypeOf(document); // null
  Object.getPrototypeOf(alert); // null
  Object.getPrototypeOf(document.childNodes); // null

This doesn’t happen in other browsers that implement Object.create at the moment, such as latest Chrome, WebKit or Firefox. In Chrome, for example:

  Object.getPrototypeOf(document.body) === HTMLBodyElement.prototype;
  Object.getPrototypeOf(document) === HTMLDocument.prototype;
  Object.getPrototypeOf(alert) === Function.prototype;
  Object.getPrototypeOf(document.childNodes) === NodeList.prototype;

… and so on.

Interestingly, bound functions in IE9pre3 are represented as “function(){ [native code] }”, similar to host objects:

  var bound = (function f(x, y){ return this; }).bind({ x: 1 });
  bound + ''; // "function(){ [native code] }"
 
  // compare to
 
  alert + ''; // "function alert(){ [native code] }"

Note how function representation does not include identifier (f), parameters (x and y), nor representation of function body (return this;). This of course proves once again that relying on function decompilation is NOT a good idea.
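
To see why decompilation is fragile, here is a small sketch that compares the string form of a function with that of its bound counterpart. The exact strings are engine-dependent, so the snippet only checks whether the body text survives; the variable names are made up for illustration.

```javascript
function f(x, y) { return this; }
var bound = f.bind({ x: 1 });

// String form of the original function: usually its source text,
// although older engines are free to return something else
var plainSource = String(f);

// String form of the bound function: engine-dependent,
// typically a "[native code]" placeholder
var boundSource = String(bound);

// The body of the original function is not recoverable from the bound one
var bodyVisibleInPlain = plainSource.indexOf('return this') !== -1;
var bodyVisibleInBound = boundSource.indexOf('return this') !== -1;
```

Any code that parses plainSource to learn about parameters or body silently breaks as soon as it is handed bound instead.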

Whitespace character class (as in /\s/) still doesn’t match majority of whitespace characters (as defined by specs). These include “U+00A0”, “U+2000” to “U+200A”, “U+3000”, etc. The test is available here. Curiously, ES5 String.prototype.trim seems to “understand” those characters as whitespace very well, producing empty string — as expected — for something like '\u00A0'.trim().
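
A rough way to compare the two behaviors is to run the same list of characters through both /\s/ and trim. This is only a sketch: the list of code points is a hand-written subset, and findUnmatched is a hypothetical helper, not the linked test.

```javascript
// Hand-written subset of code points that should count as whitespace
// (white space and line terminator characters)
var WHITESPACE_CODE_POINTS = [
  0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x00A0, 0x1680,
  0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006, 0x2007,
  0x2008, 0x2009, 0x200A, 0x2028, 0x2029, 0x202F, 0x205F, 0x3000, 0xFEFF
];

// Hypothetical helper: returns the code points (as "U+XXXX" strings)
// for which the given predicate returns a falsy value
function findUnmatched(test) {
  var failures = [];
  for (var i = 0; i < WHITESPACE_CODE_POINTS.length; i++) {
    var ch = String.fromCharCode(WHITESPACE_CODE_POINTS[i]);
    if (!test(ch)) {
      var hex = WHITESPACE_CODE_POINTS[i].toString(16).toUpperCase();
      failures.push('U+' + ('0000' + hex).slice(-4));
    }
  }
  return failures;
}

// Characters the regex character class fails to recognize
var missedByRegex = findUnmatched(function(ch) { return /^\s$/.test(ch); });

// Characters that trim fails to strip
var missedByTrim = findUnmatched(function(ch) { return ch.trim() === ''; });
```

In an implementation with the deviation described above, missedByRegex would list entries such as "U+00A0" while missedByTrim stays empty.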

It was nice to see that ES5 Array.isArray is about 20 times faster than custom implementation, such as this one:

  function isArray(o) {
    return Object.prototype.toString.call(o) === "[object Array]";
  }

The difference in speed is similar to other browsers that implement this method.
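
Given that speed difference, it makes sense to prefer the native method when it exists. A minimal sketch, where isArrayCompat is a hypothetical name:

```javascript
// Prefer native Array.isArray; fall back to the [[Class]] check otherwise
var isArrayCompat = typeof Array.isArray === 'function'
  ? Array.isArray
  : function(o) {
      return Object.prototype.toString.call(o) === '[object Array]';
    };

isArrayCompat([1, 2, 3]);     // true
isArrayCompat({ length: 0 }); // false

// An arguments object looks like an array, but is not one
var argumentsIsArray = (function() { return isArrayCompat(arguments); })(1, 2);
```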

An infamous, 10+ year-old JScript NFE bug, which I described at length before, is finally fixed:

  var f = function g() { return f === g; };
  typeof g; // "undefined"
 
  f(); // true

arguments’ [[Class]] is now an “Arguments”, just like ES5 specifies it:

  var args = (function(){ return arguments; })();
  Object.prototype.toString.call(args); // "[object Arguments]"

DOM

Unfortunately, the entire host objects infrastructure still looks very similar to the one from IE8. Host objects don’t inherit from Object.prototype, don’t report proper typeof, and don’t even have basic properties like “length” or “prototype”, which all function objects must have:

  alert instanceof Object; // false
  typeof alert; // "object"
  alert.length; // undefined

Because they don’t inherit from Object.prototype, we don’t have any of Object.prototype methods, naturally:

  alert.toString; // undefined
  alert.constructor; // undefined
  alert.hasOwnProperty; // undefined

Object.prototype is not the only object host methods fail to inherit from. In majority of modern browsers, host objects also inherit from Function.prototype and so have Function.prototype methods like call and apply. This doesn’t happen in IE9pre3.

  alert instanceof Function; // false
  document.createElement instanceof Function; // false
 
  alert.call; // undefined

Curiously, call and apply are present on some host objects, but they are still not inherited from Function.prototype:

  typeof document.createElement.call; // "function"
  document.createElement.call === Function.prototype.call; // false

Host objects’ [[Class]] is far from ideal as well. IE9pre3 actually violates ES5, which says that objects implementing [[Call]] (or in other words — are callable) should have [[Class]] of “Function” — even if they are host objects. In IE9pre3, alert is a callable host object, yet it reports its [[Class]] as “Object” not “Function”. Not good.

  Object.prototype.toString.call(alert); // "[object Object]"
  Object.prototype.toString.call(document.createElement); // "[object Object]"

IE9pre3 still messes up DOM objects’ attributes and properties, although not as badly as earlier versions:

  var el = document.createElement('p');
  el.setAttribute('x', 'y');
  el.x; // 'y'
 
  el.foobarbaz = 'moo';
  el.hasAttribute('foobarbaz'); // true
  el.getAttribute('foobarbaz'); // 'moo'

Some old, humorous bugs can still be seen in IE9pre3, such as typeof returning “string” for certain host methods:

  typeof Option.create; // "string"
  typeof Image.create; // "string"
  typeof document.childNodes.item; // "string"

Undeclared assignments still throw error when same-id’ed elements are present in DOM, however not with same-name’ed elements (as it was in previous versions):

  <div id="foo"></div>
  <a name="bar"></a>
  ...
  <script>
    foo = function(){ /* ... */ }; // Error
    bar = function(){ /* ... */ }; // no Error
  </script>

Similarly to IE8, only Element and specific element type interfaces (HTMLDivElement, HTMLScriptElement, HTMLSpanElement, etc.) are exposed as same-named global properties. Node and HTMLElement are still missing, and element’s prototype chain most likely still looks like this:

  document.createElement('div');
    |
    | [[Prototype]]
    v
  HTMLDivElement.prototype
    |
    | [[Prototype]]
    v
  Element.prototype
    |
    | [[Prototype]]
    v
  null

…rather than what can be seen in almost all other modern browsers:

  document.createElement('div');
    |
    | [[Prototype]]
    v
  HTMLDivElement.prototype
    |
    | [[Prototype]]
    v
  HTMLElement.prototype
    |
    | [[Prototype]]
    v
  Element.prototype
    |
    | [[Prototype]]
    v
  Node.prototype
    |
    | [[Prototype]]
    v
  Object.prototype
    |
    | [[Prototype]]
    v
  null

getComputedStyle from DOM Level 2 is still missing, however its value is mysteriously a null, not undefined. The property actually exists on an object, but has a value of null. Hopefully, this is just a placeholder and proper method will be added before final release.

  document.defaultView.getComputedStyle; // null
  'getComputedStyle' in document.defaultView; // true

Array.prototype.slice can now convert certain host objects (e.g. NodeList’s) to arrays — something that majority of modern browsers have been able to do for quite a while:

  Array.prototype.slice.call(document.childNodes) instanceof Array; // true

That’s it for now.

Unfortunately, I don’t have much time to look into these things extensively, at the moment. There might be more updates on twitter.

As always, any corrections, suggestions, and additions are much appreciated.


Sputniktests web runner

November 9th, 2009 by kangax

Intro

Sputniktests is an ECMA-262 conformance test suite made by Google. For those who don’t know, ECMA-262 is a standard behind well-known implementations like JScript, JavaScript and others. It’s what describes ECMAScript language.

Ever since Sputniktests release few months ago, I wanted to see how various browsers conform to the standard. Unfortunately it wasn’t very easy to do so. The way test suite could be executed is by running a python script, passing it an executable file of implementation such as V8 or Rhino. It wasn’t possible to just check conformance of any browser, especially browser with implementation that can’t be run separately.

I realized that a “web runner” for Sputniktests would be a useful thing to have and made one. In the end, it was a fun little exercise that made me understand ECMAScript language just a bit better.


Sputniktests screenshot

Web runner is merely a wrapper around original test suite, made fit to run in a browser environment. Its job is to execute tests sequentially and log any errors/failures in the process. When done, it reports elapsed time and number of errors.

Why it doesn’t always matter

Unlike something like the Acid test, Sputniktests is not immediately useful. Passing it fully does not necessarily make a browser more capable than one that passes fewer tests. Many failures in modern browsers are rather insignificant from a practical point of view and might not even affect any real-world applications.

But there’s still a huge value in a conformance test suite like this. By testing every single detail of ECMAScript implementation, Sputniktests could help minimize regressions, both — functional and performance ones. It could serve as an excellent foundation for creating a new ECMAScript implementation. And last, but not least, it could help browser implementors find actual valid bugs in browser engines.

There’s an important point to understand regarding test suite failures: not all of them can — or even should — be fixed, and here’s why:

Proprietary extensions

It is a well-known fact that the specification allows implementations to introduce proprietary extensions. JScript and JavaScript ™ have been doing this for years. JScript’s conditional comments and JavaScript’s getters/setters demonstrate it very well. Another famous example is the way function declarations are treated in statements.

The point here is simple. Failure in Sputniktests can be the result of proprietary extension and might not even be considered a bug.

ECMAScript 5th edition

Another cause of “valid failures” might be the next edition of ECMAScript, currently draft. Some browsers have already started implementing parts of it and might fail to comply with 3rd edition that Sputniktests checks against. For-in handling is a good example of such “misunderstanding”.

Backwards compatibility

Finally, there’s always a beloved backwards compatibility to keep in mind. It might not be possible to fix otherwise valid bug/deviation due to this wonderful constraint.

How runner works

The runner works very simply. First, a queue of tests is initialized and populated with all of the 5000+ tests. Then, a table of tests to ignore is initialized; it is later used to skip certain conflicting or complex tests. Finally, the runner starts picking tests from the queue, with a certain interval in between, to keep the UI functional during this rather intensive process. Note that the interval can be changed on the main screen before starting the test suite, but defaults to 50ms.
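
The scheduling scheme described above could be sketched roughly like this. It is not the actual runner code: createRunner, runTest, ignored and schedule are all hypothetical names, and the real runner executes each test inside an iframe rather than through a callback.

```javascript
// Hypothetical sketch of the scheduling logic only
function createRunner(tests, ignored, runTest, interval, schedule) {
  var queue = tests.slice(); // copy, so the caller's list stays untouched
  var results = { run: 0, skipped: 0, errors: 0 };
  schedule = schedule || setTimeout;
  interval = interval == null ? 50 : interval; // 50ms default

  function next(done) {
    if (queue.length === 0) {
      if (done) done(results);
      return;
    }
    var name = queue.shift();
    if (Object.prototype.hasOwnProperty.call(ignored, name)) {
      results.skipped++;
    } else {
      results.run++;
      try {
        runTest(name);
      } catch (e) {
        results.errors++;
      }
    }
    // Yield between tests so that the UI stays responsive
    schedule(function() { next(done); }, interval);
  }

  return { start: next, results: results };
}

// Usage with a synchronous scheduler, which is handy for trying the sketch out
var runner = createRunner(
  ['S1', 'S2', 'S3'],
  { S2: true },
  function(name) { if (name === 'S3') throw new Error('failed'); },
  0,
  function(fn) { fn(); }
);
var finalResults = null;
runner.start(function(r) { finalResults = r; });
// finalResults: { run: 2, skipped: 1, errors: 1 }
```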

For every test, runner creates a new iframe, inserts it into a current document and writes a script element into it. This is done to keep tests isolated from each other, so that one test wouldn’t affect environment of the next one. Once script is executed, a meta data is printed to the screen: name of current test, total number of errors/failures, elapsed time, etc. Iframe is then deleted.

Before adding actual test script to an iframe, runner first injects a complementary script into it. That script defines global $ERROR, $FAIL and $PRINT and simply proxies them to same-named functions of main (parent) document. When these methods are called, they write an output to main document log area.
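
The proxying idea can be illustrated without any DOM at all. In this sketch, createChildHarness and mainDocument are made-up names; in the real runner the same-named functions would be reached through the iframe’s parent window instead of a plain object.

```javascript
// mainHarness stands in for the main (parent) document
function createChildHarness(mainHarness) {
  var names = ['$ERROR', '$FAIL', '$PRINT'];
  var child = {};
  for (var i = 0; i < names.length; i++) {
    // Extra function scope so that each proxy keeps its own name
    (function(name) {
      child[name] = function(message) {
        return mainHarness[name](message);
      };
    })(names[i]);
  }
  return child;
}

var log = [];
var mainDocument = {
  $ERROR: function(message) { log.push('ERROR: ' + message); },
  $FAIL:  function(message) { log.push('FAIL: ' + message); },
  $PRINT: function(message) { log.push('PRINT: ' + message); }
};

var harness = createChildHarness(mainDocument);
harness.$PRINT('starting');
harness.$ERROR('#1: unexpected value');
// log now holds "PRINT: starting" and "ERROR: #1: unexpected value"
```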

Browser comparison

So how do modern and not so modern browsers stand against the standard? Here’s a comparison table (note that a lower score is better, and that the score represents the total number of errors and failures):

Sputniktests results chart

We can see few interesting things here:

  • Surprisingly, Opera 9.64 is a winner. Even more strange is that Opera 10 has some serious regressions and falls far far behind, joining ancient Safari 2.x
  • I was expecting Safari 4 to beat Firefox 3.5 (or 3.7), but it doesn’t even compare with Firefox 2.x
  • Firefox 3.7 (currently alpha) performs 1 point worse than Firefox 3.5
  • It’s amusing to see Internet Explorer results. The latest and greatest 8th version is practically identical to IE 5.5 (!!!). This hints at how fast bugs are being fixed in JScript.
  • Chrome 4 gets surprisingly low number (in between Firefox 3 and Firefox 2). I thought it would beat everyone else, considering that Sputniktests was originally developed to aid Chrome conform to the standard.
  • Out of all latest browsers (not considering regressed Opera 10), Konqueror gets the poorest score and probably needs to work on its compliance in the near future.

Notable deviations

Here are some of the bugs and quirks I noticed in few browsers. Each is accompanied with a short explanation.

1) for (var prop in null) { }
for (var prop in undefined) { }

These statements should actually result in a TypeError, and the explanation is pretty simple. During evaluation, the internal ToObject method is applied to the expression on the right-hand side of in. This internal method is the one that throws TypeError when given a null or undefined value.

You might be wondering if ToObject is used anywhere else and has similar consequences? It does. Roughly, in 3 cases:

  • foo[bar]
  • with (foo) …
  • for (bar in foo) …

When foo evaluates to null or undefined, in any of these cases, TypeError is inevitable. Most browsers, however, throw error with first two statements, but not the last one. This is, arguably, a more useful behavior, even though technically, not ECMA-compliant.

Note that 5th edition of ECMAScript actually changes “for-in” to do exactly what most of the browsers currently do — not throw TypeError, but instead proceed as if foo was an empty object.
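
The three cases can be compared side by side with a tiny helper. This is a sketch; thrownBy is a hypothetical function, and with is built through the Function constructor only so that the snippet also loads in strict-mode code.

```javascript
// Hypothetical helper: returns the name of the error thrown by fn, or null
function thrownBy(fn) {
  try {
    fn();
  } catch (e) {
    return e.name;
  }
  return null;
}

var foo = null;

// foo[bar]
var propertyAccess = thrownBy(function() { return foo['bar']; });

// with (foo) ...
var withStatement = thrownBy(function() {
  new Function('foo', 'with (foo) { }')(foo);
});

// for (bar in foo) ...
var forIn = thrownBy(function() { for (var bar in foo) { } });

// Strict reading of the 3rd edition: "TypeError" in all three cases.
// Engines following the 5th edition: "TypeError", "TypeError", null.
```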

2) Number('\u00A0') === 0

When Number is called as a function, it performs type conversion. String to number type conversion is expressed in rather involved algorithm, but one of the simplest rules there is that when string consists of a whitespace character (or is empty), the result is 0. This means that both — Number('') and Number(' ') should evaluate to 0.

Some browsers, however, fail to comply in regards to the notion of whitespace character. Passing plain U+0020 does the job, but U+00A0 (and a whole slew of other ones) often doesn’t. Instead, NaN is returned for what should really be a 0.
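
A quick way to probe this in any console (the results object is just an illustrative container):

```javascript
var results = {
  empty: Number(''),                 // 0
  space: Number(' '),                // 0
  nbsp:  Number('\u00A0'),           // 0 when conforming, NaN in failing engines
  mixed: Number('\u00A0 42 \u3000')  // 42 when all of the whitespace is recognized
};
```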

3) parseFloat("\u205F -1.1")

Similar bug exists with handling of white space characters by parseFloat. Spec explains that any leading whitespace is ignored in input string. Something like parseFloat(' 2.5 ') should result in 2.5. And again, some implementations fail with rarer whitespace characters, such as U+205F or U+1680. Interestingly, only Opera is fully conforming here. Firefox and Webkit both fail one way or another.
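
The same kind of probe works for parseFloat; the variable names below are only for illustration:

```javascript
var plain = parseFloat(' 2.5 ');                  // 2.5
var mediumMathSpace = parseFloat('\u205F -1.1');  // -1.1 when U+205F is treated as whitespace
var oghamSpace = parseFloat('\u1680 3');          // 3 when U+1680 is treated as whitespace

// In a failing implementation the last two come out as NaN
var conforming = mediumMathSpace === -1.1 && oghamSpace === 3;
```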

4) Error.prototype.message

This one looks like a real bug in WebKit. WebKit throws “Unknown error” when merely attempting to access Error.prototype.message. Sputniktests actually managed to mess up here as well: test suite asserts that the property is an empty string, whereas specs say that Error.prototype.message is an implementation-dependent string (which means that it could as well be “foo-bar_BaZ”). Sputniktests need to check type of a property — typeof Error.prototype.message == 'string', and WebKit needs to stop throwing error.

5) EvalError and other xxxError ones are non-enumerable global properties

This one seems like a rather insignificant compliance issue. All properties of the global object are specified to be non-enumerable (that is, to have the {DontEnum} internal attribute set). However, at least WebKit enumerates over all of the global EvalError, RangeError, SyntaxError, etc.
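
One way to check for this deviation is to enumerate the global object and see whether any of the error constructors show up. A sketch, with names chosen only for illustration:

```javascript
// Grab the global object in a way that works in any environment
var globalObject = Function('return this')();

var errorNames = ['EvalError', 'RangeError', 'ReferenceError',
                  'SyntaxError', 'TypeError', 'URIError'];

// Collect the error constructors that for-in (wrongly) visits
var enumerated = [];
for (var key in globalObject) {
  if (errorNames.indexOf(key) !== -1) {
    enumerated.push(key);
  }
}

// An empty list means the engine does not enumerate any of them
var conformsHere = enumerated.length === 0;
```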

6) [[Construct]] and .prototype of built-in objects.

There’s a whole slew of failures in Firefox due to built-in objects having what they shouldn’t have — prototype property and [[Construct]] method. To remind you, [[Construct]] is an internal method that’s called when applying new operator to an object — usually a function. It is basically what makes certain objects “constructable”, and what every native function object has intrinsically. The failing built-ins are global methods like parseInt, isNaN, encodeURI, as well as properties of Object.prototype, Array.prototype, and so on. To quote specs:

“None of the built-in functions described in this section shall initially have a prototype property unless otherwise specified in the description of a particular function”

and:

“None of the built-in functions described in this section shall implement the internal [[Construct]] method unless otherwise specified in the description of a particular function.”
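
Both requirements are easy to probe. The helpers below are hypothetical, and isConstructable is deliberately crude: it treats any exception thrown by new as a sign that [[Construct]] is missing.

```javascript
// Does the function have its own "prototype" property?
function hasOwnPrototype(fn) {
  return Object.prototype.hasOwnProperty.call(fn, 'prototype');
}

// Crude check: any exception from `new` counts as "not constructable"
function isConstructable(fn) {
  try {
    new fn();
    return true;
  } catch (e) {
    return false;
  }
}

var builtins = [parseInt, isNaN, encodeURI, Array.prototype.push];

// Built-ins that have something they shouldn't have
var offenders = [];
for (var i = 0; i < builtins.length; i++) {
  if (hasOwnPrototype(builtins[i]) || isConstructable(builtins[i])) {
    offenders.push(builtins[i]);
  }
}

// For comparison, an ordinary function has both
function Plain() {}
hasOwnPrototype(Plain); // true
isConstructable(Plain); // true
```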

7) typeof new RegExp() === 'function'

This is probably one of the most famous WebKit deviations. As you might know, a large number of browsers make regex objects callable. Callable regular expressions allow to replace /(a|b)/.exec('a') with simply /(a|b)/('a'). I’m not sure where this non-standard behavior originates from, but it’s probably still kept around for backwards compatibility.

Interestingly, regex objects in WebKit seem to actually implement internal [[Call]] method. As per specs, any native object that implements [[Call]] should return “function” when applied typeof to, so WebKit merely follows the standard here. However, this little addition results in a side effect: regex objects are being reported as functions — typeof /x/ == 'function'.

Older Firefox (e.g. Firefox 2), by the way, behaves just like WebKit here.

8) new RegExp(undefined)

Another bug in Firefox is the way RegExp constructor treats pattern of undefined value. Specs mandate that when undefined, pattern should simply become an empty string (i.e. functionally identical to new RegExp('')). WebKit and Opera do just that, but Firefox converts undefined into its string representation — “undefined”, making regex behave as if it was created literally via /undefined/.

9) "".search() and "--undefined--".search()

This one is related to a previous bug. The purpose of String.prototype.search is to find offset within the string where a given pattern matches. As usual, all is nice and well, until we start dealing with non-trivial input values.

When given a non-regex object as a first argument, String.prototype.search should apply new RegExp() to it. This means that "".search() is functionally identical to "".search(new RegExp()), with new RegExp being applied to an undefined value. This expression essentially matches an empty regex against an empty string. The result of "".search(), quite obviously, should be 0, since an empty regex (i.e. nothingness) matches at the very first position of the empty string it’s being applied to.

Firefox, however, erroneously makes /undefined/ out of new RegExp(), and fails to match empty string at 0th position. For the very same reasons, it returns 2 in "--undefined--".search(), instead of correct 0.
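
Both this quirk and the previous one (new RegExp(undefined)) can be probed with a few lines. The variable names are illustrative only:

```javascript
var re = new RegExp(undefined);

// An empty pattern matches any string; /undefined/ would not match "abc"
var treatedAsEmpty = re.test('abc');

var emptySearch = ''.search();                    // 0 when conforming
var undefinedSearch = '--undefined--'.search();   // 0 when conforming, 2 with the bug
```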

10) "foo".substring(0, undefined);

Another weird quirk in Firefox is the way it handles the second argument (the ending position) of String.prototype.substring. The spec clearly states that when it is undefined, the position is considered to be the end of the string. For example, "foobar".substring(0, 2) should return "fo", but "foobar".substring(0) should return "foobar", since the end position is considered to be at the end of the string.

Firefox does this partially right, producing proper result when argument is missing — "foobar".substring(0) === "foobar", but somehow fails to do the same, when passing undefined value explicitly — "foobar".substring(0, undefined) === "".
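
Side by side, with illustrative variable names:

```javascript
var s = 'foobar';

var explicitEnd = s.substring(0, 2);           // "fo"
var missingEnd = s.substring(0);               // "foobar"
var undefinedEnd = s.substring(0, undefined);  // "foobar" per spec, "" with the bug

// A missing and an explicitly undefined end position should be indistinguishable
var consistent = missingEnd === undefinedEnd;
```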

11) Line terminators in regex literals

An interesting quirk present in both — Firefox and Opera, but not in WebKit is related to regular expression literals. Spec makes it clear that regex literals are not allowed to have line terminators in their bodies. Not even when escaped with backslash. Firefox and Opera, however, seem to be perfectly fine with line terminators as long as those are escaped: eval("/\\\u000A/") results in an invalid regex literal that looks like:

/\
/
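
A conforming engine should refuse to parse such a literal. The check below is a sketch built around a hypothetical parsesAsRegexLiteral helper:

```javascript
// Returns true if the source text evaluates to a regex, false if it fails to parse
function parsesAsRegexLiteral(source) {
  try {
    return eval(source) instanceof RegExp;
  } catch (e) {
    return false;
  }
}

// Backslash followed by an actual line feed inside the literal
var escapedLineFeed = parsesAsRegexLiteral('/\\\u000A/'); // false when conforming

// Backslash followed by the letter n: an ordinary, valid literal
var ordinaryEscape = parsesAsRegexLiteral('/\\n/');       // true
```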

Test suite errors and oversights

Sputniktests is a truly outstanding effort. I’m amazed at the amount of work that was put into it. However, the project is still in its infancy, and there are clearly some things that could be done better.

What struck me as inconsistent and harmful is the way Sputniktests declares variables: sometimes using proper declarations (var foo = 'bar'), other times using undeclared assignments (foo = 'bar'). Undeclared assignments are a very bad practice, and there’s no reason to rely on them here or anywhere. It would be nice to see this changed in future versions.

Other inconsistencies are with usage of $PRINT function. Sometimes it’s used to log additional information about tests, but not always.

There are cases when tests rely on compliance of other components and, as a result, give false positives. For example, a test for function expression in for-in statement assumes that prototype property of a function is enumerable:

for (x in function __func(){return 0;}){
  if (x == "prototype") 
    var __reached = 1;
}
if (__reached !== 1) {
    $ERROR('#2: function expession inside of for-in expression is allowed');
}

Per specification, prototype property of function object is in fact enumerable (it only has {DontDelete} attribute set on). But Firefox, for example, makes prototype non-enumerable and so fails this test. It fails it erroneously because function expression in for-in statements — what this test is actually supposed to ensure — is allowed in Firefox just fine.
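
One possible way to decouple the two concerns is to test the syntax on its own and report enumerability separately. This is only a sketch of the idea, not a proposed patch to the test suite; the variable names are made up.

```javascript
// 1. Is a function expression allowed inside a for-in statement at all?
//    eval keeps a parse failure from taking the whole script down.
var allowed;
try {
  eval('for (var x in function __func() { return 0; }) { }');
  allowed = true;
} catch (e) {
  allowed = false;
}

// 2. Separate question, reported on its own:
//    is the "prototype" property of a function enumerable?
var prototypeEnumerable = (function() {}).propertyIsEnumerable('prototype');
```

With this split, an engine that makes prototype non-enumerable fails only the second check.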

A similar case of false results happens when testing for Array.prototype compliance. Array.prototype should itself be an array object; its internal [[Class]] should be that of all array objects — “Array”. The test, unfortunately, checks this compliance by deleting Array.prototype.toString, then calling toString on Array.prototype, letting Object.prototype.toString propagate through and ensuring that [[Class]] of Array.prototype is “Array”.

delete Array.prototype.toString;
if (Array.prototype.toString() !== "[object " + "Array" + "]") {
  $ERROR(/* ... */);
}

Clients that have non-deletable Array.prototype.toString fail this test even with fully conforming Array.prototype.

It might be safer to use call here, but then clients with non-conforming call could result in false positives as well:

// Is Array.prototype's [[Class]] an "Array"?
if (Object.prototype.toString.call(Array.prototype) !== "[object Array]") {
  $ERROR(/* ... */);
}

It is, of course, very hard to avoid these false positives. We can only guess which things are more likely to be compliant. We can also ignore these errors: if certain environment fails one test due to non-conformance of unrelated component, that component should simply be fixed as well.

Test suite has some minor inconsistencies — missing semicolons here and there, or extra ones (after statements). There are superfluous !(... == ...) used instead of (... != ...), as well as if (... == true) instead of if (...). I also noticed few missing conformance checks.

I have no doubt all these annoyances will be gone in the future.

Future work

Having an extensive compliance test suite can really help modern browsers achieve even better conformance. I hope we’ll see some of the bugs revealed through Sputniktests fixed in the near future. I hope we’ll also see fewer regressions, if browser implementors integrate it into existing test suites. I also hope Sputniktests can help people learn and understand ECMAScript better.

Web runner is published on github, so that anyone can contribute easily. There are many more things we can improve. I can think of additional features like running separate sections of tests or even individual ones; being able to see test contents right in a browser, or make it possible to pause/resume test suite execution.

Any comments, corrections, and suggestions are, as always, very much welcome.

And finally, I would like to, once again, thank the Sputniktests team for their outstanding efforts to help move the web forward.
