11 December 2014

Know thy reference

Abusing leaky abstractions for a better understanding of “this”

It was a sunny Monday morning that I woke up to an article on HackerNews, simply named “This in Javascript”. Curious to see what all the attention is about, I started skimming through. As expected, there were mentions of this in global scope, this in function calls, this in constructor instantiation, and so on. It was a long article. And the more I looked through, the more I realized just how overwhelming this topic might seem to folks unfamiliar with intricacies of this, especially when thrown into a myriad of various examples with seemingly random behavior.

It made me remember a moment from a few years ago when I first read Crockford’s Good Parts. In it, Douglas succinctly laid out a piece of information that immediately made everything much clearer in my head:

The `this` parameter is very important in object oriented programming, and its value is determined by the invocation pattern. There are four patterns of invocation in JavaScript: the method invocation pattern, the function invocation pattern, the constructor invocation pattern, and the apply invocation pattern. The patterns differ in how the bonus parameter this is initialized.

Determined by invocation and only 4 cases? Well, that’s certainly pretty simple.

With this thought in mind, I went back to HackerNews, wondering if anyone else thought the subject was presented as something way too complicated. I wasn’t the only one. Lots of folks chimed in with the explanation similar to that from Good Parts, like this one:

Even more simply, I'd just say:
1) The keyword "this" refers to whatever is left of the dot at call-time.
2) If there's nothing to the left of the dot, then "this" is the root scope (e.g. Window).
3) A few functions change the behavior of "this"—bind, call and apply
4) The keyword "new" binds this to the object just created

Great and simple breakdown. But one point caught my attention: “whatever is left of the dot at call-time”. Seems pretty self-explanatory. For foo.bar(), this would refer to foo; for foo.bar.baz(), this would refer to foo.bar, and so on. But what about something like (f = foo.bar)()? After all, it seems that “whatever is left of the dot at call time” is foo. Would that make this refer to foo?

Eager to save the world from unusual results in obscure cases, I rushed to leave a prompt comment on how the concept of “left of the dot” could be hairy, and that for best results one should understand the concept of references and their base values.

It is then that I shockingly realized that this concept of references actually hasn’t been covered all that much! In fact, searching for “javascript reference” yielded anything from cheatsheets to “pass-by-reference vs. pass-by-value” discussions, and not at all what I wanted. It had to be fixed.

And so this brings me here.

I’ll try to explain what these mysterious References are in Javascript (by which, of course, I mean ECMAScript) and how fun it is to learn this behavior through them. Once you understand References, you’ll also notice that reading ECMAScript spec is much easier.

But before we continue, quick disclaimer on the excerpt from Good Parts.

Good Parts 2.0

The book was written in the times when ES3 roamed the prairies, and now we’re in a full state of ES5.

What changed? Not much.

There are 2 additions, or rather sub-points, to the list of 4:

1) method invocation
2) function invocation
   - “use strict” mode (new in ES5)
3) constructor invocation
4) apply invocation
   - Function.prototype.bind (new in ES5)

Function invocation that happens in strict mode now has its this value set to undefined. Actually, it would be more correct to say that it does NOT have its this “coerced” to global object. That’s what was happening in ES3 and what happens in ES5-non-strict. Strict mode simply avoids that extra step, letting undefined propagate through.

And then there’s good old Function.prototype.bind which is hard to even call an addition. It’s essentially call/apply wrapped in a function, permanently binding this value to whatever was passed to bind(). It’s in the same bracket as call and apply, except for its “static” nature.
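Both sub-points are easy to observe. Here is a minimal sketch; the names sloppy, strict, target, and bound are made up for illustration, and the non-strict function is built with new Function so that it stays non-strict no matter how the surrounding code is loaded:

```javascript
// A non-strict function: an undefined `this` is coerced to the global object.
var sloppy = new Function('return this;');

// A strict function: `this` is left exactly as the call provided it.
function strict() {
  'use strict';
  return this;
}

var target = { name: 'target' };
var bound = strict.bind(target); // permanently binds `this` to target

var sloppyThis = sloppy();                      // the global object
var strictThis = strict();                      // undefined
var boundThis = bound();                        // target
var reboundThis = bound.call({ other: true });  // still target
```

Note that call() cannot override what bind() has already fixed, which is the “static” nature mentioned above.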

Alright, on to the References.

Reference Specification Type

To be honest, I wasn’t that surprised to find very little information on References in Javascript. After all, it’s not part of the language per se. References are only a mechanism, used to describe certain behaviors in ECMAScript. They’re not really “visible” to the outside world. They are vital for engine implementors, and users of the language don’t need to know about them.

Except when understanding them brings a whole new level of clarity.

Coming back to my original “obscure” example, compare foo.bar() with (f = foo.bar)().
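A minimal sketch of the two calls, assuming a hypothetical foo object whose bar method simply returns its this (bar is strict here so that an undefined this is not coerced to the global object):

```javascript
var foo = {
  bar: function () {
    'use strict'; // keep `this` uncoerced so the difference is visible
    return this;
  }
};
var f;

var viaMethod = foo.bar();            // 1st call: this is foo
var viaAssignment = (f = foo.bar)();  // 2nd call: this is undefined
```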

How do we know that 1st one’s this references foo, but 2nd one — global object (or undefined)?

Astute readers will rightfully notice — “well, the expression to the left of () evaluates to f, right after assignment; and so it’s the same as calling f(), making this function invocation rather than method invocation.”

Alright, and what about (1, foo.bar)()?

“Oh, that’s just a comma operator! It evaluates from left to right so it must be the same as foo.bar(), making this reference foo.”

Except that it isn’t: this turns out to be the global object (or undefined).

“Strange.”

And how about (foo.bar)()?

“Well… considering the last example, it must be undefined as well then? There must be something about those parentheses.”

Not this time: here this is foo.

“Ok, I’m confused.”
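Both calls from this exchange can be tried directly. A minimal sketch, again assuming a hypothetical foo object whose strict bar method returns its this:

```javascript
var foo = {
  bar: function () {
    'use strict'; // keep `this` uncoerced
    return this;
  }
};

var viaComma = (1, foo.bar)();  // this is undefined
var viaGrouping = (foo.bar)();  // this is foo
```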

Theory

ECMAScript defines Reference as a “resolved name binding”. It’s an abstract entity that consists of three components — base, name, and strict flag. The first 2 are what’s important for us at the moment.

There are 2 cases when Reference is created: in the process of Identifier resolution and during property access. In other words, foo creates a Reference and foo.bar (or foo['bar']) creates a Reference. Neither literals — 1, "foo", /x/, { }, [ 1,2,3 ], etc., nor function expressions — (function(){}) — create references.

Here’s a simple cheat sheet:

Cheat sheet

Example	Reference?	Notes
"foo"	No
123	No
/x/	No
({})	No
(function(){})	No
foo	Yes	Could be unresolved reference if `foo` is not defined
foo.bar	Yes	Property reference
(123).toString	Yes	Property reference
(function(){}).toString	Yes	Property reference
(1,foo.bar)	No	Already evaluated, BUT see grouping operator exception
(f = foo.bar)	No	Already evaluated, BUT see grouping operator exception
(foo)	Yes	Grouping operator does not evaluate reference
(foo.bar)	Yes	Ditto with property reference

Don’t worry about the last 4 for now; we’ll take a look at those shortly.

Every time a Reference is created, its components — “base”, “name”, “strict” — are set to some values. The strict flag is easy — it’s there to denote if code is in strict mode or not. The “name” component is set to identifier or property name that’s being resolved, and the base is set to either property object or environment record.

It might help to think of References as plain JS objects with a null [[Prototype]] (i.e. with no “prototype chain”), containing only “base”, “name”, and “strict” properties; this is how we can illustrate them below:

When Identifier foo is resolved, a Reference is created like so:
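Sketched as a plain object (a model only, since References are not observable from code; environmentRecord is a hypothetical stand-in for the scope in which foo was found):

```javascript
// Hypothetical stand-in for the Environment Record that holds `foo`.
var environmentRecord = { foo: 42 };

// Model of the Reference produced by resolving the identifier `foo`.
var identifierReference = Object.create(null); // null [[Prototype]]
identifierReference.base = environmentRecord;
identifierReference.name = 'foo';
identifierReference.strict = false;
```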

and this is what’s created for property accessor foo.bar:
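Using the same plain-object model (the foo object here is made up for illustration):

```javascript
var foo = { bar: function () {} };

// Model of the Reference produced by the property accessor `foo.bar`:
// the base is the object itself, the name is the property name.
var propertyReference = Object.create(null);
propertyReference.base = foo;
propertyReference.name = 'bar';
propertyReference.strict = false;
```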

This is a so-called “Property Reference”.

There’s also a 3rd scenario — Unresolvable Reference. When an Identifier can’t be found anywhere in the scope chain, a Reference is returned with base value set to undefined:
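In the same illustrative model, that would look roughly like this:

```javascript
// Model of an Unresolvable Reference: the name is known,
// but there is no base to look it up on.
var unresolvableReference = Object.create(null);
unresolvableReference.base = undefined;
unresolvableReference.name = 'foo';
unresolvableReference.strict = false;
```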

As you probably know, Unresolvable References could blow up if not “properly used”, resulting in an infamous ReferenceError (“foo is not defined”).

Essentially, References are a simple mechanism of representing name bindings; it’s a way to abstract both object-property resolution and variable resolution into a unified data structure — base + name — whether that base is a regular JS object (as in property access) or an Environment Record (a link in a “scope chain”, as in identifier resolution).

So what’s the use of all this? Now that we know what ECMAScript does under the hood, how does this apply to this behavior, foo() vs. foo.bar() vs. (f = foo.bar)() and all that?

Function call

What do foo(), foo.bar(), and (f = foo.bar)() all have in common? They’re function calls.

If we take a look at what happens when Function Call takes place, we’ll see something very interesting:

Notice highlighted step 6, which basically explains both #1 and #2 from Crockford’s list of 4.

We take expression before (). Is it a property reference? (foo.bar()) Then use its base value as this. And what’s a base value of foo.bar? We already know that it’s foo. Hence foo.bar() is called with this=foo.

Is it NOT a property reference? Ok, then it must be a regular reference with Environment Record as its base, as in foo(). In that case, use ImplicitThisValue as this (for the Environment Record of an ordinary scope, ImplicitThisValue is always undefined; the object Environment Record created by a with statement is the one exception). Hence foo() is called with this=undefined.

Finally, if it’s NOT a reference at all — (function(){})() — use undefined as this value again.
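The three branches above can be modelled in a few lines. This is only a sketch of that one step: Reference, EnvironmentRecord, and thisValueForCall are hypothetical names, the with-statement case is ignored, and real engines don’t expose any of this to user code.

```javascript
// Model of a Reference: base + name + strict flag.
function Reference(base, name, strict) {
  this.base = base;
  this.name = name;
  this.strict = Boolean(strict);
}

// Model of a (declarative) Environment Record, i.e. a link in the scope chain.
function EnvironmentRecord(bindings) {
  this.bindings = bindings;
}

function isPropertyReference(ref) {
  return ref.base !== undefined && !(ref.base instanceof EnvironmentRecord);
}

// Picks the `this` value for a call, given whatever the expression
// before `()` evaluated to.
function thisValueForCall(ref) {
  if (ref instanceof Reference) {
    if (isPropertyReference(ref)) {
      return ref.base;  // foo.bar(): this = foo
    }
    return undefined;   // foo(): ImplicitThisValue of the record
  }
  return undefined;     // (function(){})(): not a Reference at all
}

var fooObject = { bar: function () {} };
var scope = new EnvironmentRecord({ foo: fooObject });

var methodThis = thisValueForCall(new Reference(fooObject, 'bar'));
var functionThis = thisValueForCall(new Reference(scope, 'foo'));
var valueThis = thisValueForCall(fooObject.bar);
```

Here methodThis is fooObject, while functionThis and valueThis are both undefined.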


Assignment, comma, and grouping operators

Armed with this knowledge, let’s see if we can explain this behavior of (f = foo.bar)(), (1,foo.bar)(), and (foo.bar)() in terms more robust than “whatever is left of the dot”.

Let’s start with the first one. The expression in question is known as Simple Assignment (=). foo = 1, g = function(){}, and so on. If we look at the steps taken to evaluate Simple Assignment, we’ll see one important detail:

Notice that the expression on the right is passed through internal GetValue() before assignment. GetValue(), in turn, transforms the foo.bar Reference into an actual function object. And of course then we proceed to the usual Function Call with NOT a reference, which results in this=undefined. As you can see, (f = foo.bar)() only looks similar to foo.bar() but is actually “closer” to (function(){})() in the sense that it’s an (evaluated) expression rather than an (untouched) Reference.

The same story happens with comma operator:

(1,foo.bar)() is evaluated as a function object and Function Call with NOT a reference results in this=undefined.

Finally, what about grouping operator? Does it also evaluate its expression?

And here we’re in for surprise!

Even though it’s so similar to (1,foo.bar)() and (f = foo.bar)(), grouping operator does NOT evaluate its expression. It even says so plain and simple — it may return a reference; no evaluation happens. This is why foo.bar() and (foo.bar)() are absolutely identical, having this set to foo since a Reference is created and passed to a Function call.
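The difference between the three operators can be modelled roughly as follows. This is a sketch under simplifying assumptions: Reference, getValue, and the evaluate* functions are hypothetical names, only property references are handled, and PutValue() is reduced to a plain property write.

```javascript
// Model of a property Reference: base + name.
function Reference(base, name) {
  this.base = base;
  this.name = name;
}

// GetValue(): dereference a Reference, pass anything else through.
function getValue(v) {
  return v instanceof Reference ? v.base[v.name] : v;
}

// Simple assignment: the right side goes through GetValue(),
// and that value (not the Reference) is the result.
function evaluateAssignment(target, key, rref) {
  var rval = getValue(rref);
  target[key] = rval; // stands in for PutValue(lref, rval)
  return rval;
}

// Comma: both operands go through GetValue(); the right value is the result.
function evaluateComma(lref, rref) {
  getValue(lref);
  return getValue(rref);
}

// Grouping: whatever the inner expression produced is returned untouched.
function evaluateGrouping(ref) {
  return ref;
}

var foo = { bar: function () {} };
var variables = {};
var ref = new Reference(foo, 'bar');

var assigned = evaluateAssignment(variables, 'f', ref); // function object
var viaComma = evaluateComma(1, ref);                   // function object
var grouped = evaluateGrouping(ref);                    // still a Reference
```

Only grouped is still a Reference by the time the call happens, which is why only (foo.bar)() keeps foo as its this.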

Returning References

It’s worth mentioning that ES5 spec technically allows function calls to return a reference. However, this is only reserved for host objects, and none of the built-in (or user-defined) functions do that.

An example of this (non-existent, but permitted) behavior is something like this:

Of course, the current behavior is that non-Reference is passed to a Function call, resulting in this=undefined/global object (unless bar was already bound to foo earlier).
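The current behavior is easy to check. A minimal sketch with hypothetical getBar() and getBoundBar() helpers that return foo.bar from a function call:

```javascript
var foo = {
  bar: function () {
    'use strict'; // keep `this` uncoerced
    return this;
  }
};

// A function call returns a value, never a Reference.
function getBar() {
  return foo.bar;
}

var boundBar = foo.bar.bind(foo);
function getBoundBar() {
  return boundBar;
}

var plainResult = getBar()();       // this is undefined
var boundResult = getBoundBar()();  // this is foo, thanks to bind()
```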

typeof operator

Now that we understand References, we can take a look at a few other places for a better understanding. Take, for example, typeof operator:

Here is that “secret” for why we can pass unresolvable reference to typeof and not have it blow up.

On the other hand, if we were to use unresolvable reference without typeof, as a plain statement somewhere in code:

Notice how Reference is passed to GetValue() which is then responsible for stopping execution if Reference is an unresolvable one. It all starts to make sense.
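Both behaviors can be observed directly. A small sketch, assuming that no variable named iDontExist is defined anywhere in scope:

```javascript
// typeof checks for an unresolvable Reference before calling GetValue(),
// so nothing blows up.
var typeOfMissing = typeof iDontExist;

// A bare identifier goes straight to GetValue(), which throws.
var thrown = null;
try {
  iDontExist;
} catch (e) {
  thrown = e;
}
```

After this runs, typeOfMissing is the string 'undefined' and thrown is a ReferenceError.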

delete operator

Finally, what about good old delete operator?

What might have looked like mumbo jumbo is now pretty nice and clear:

If it’s not a reference, return true (delete 1, delete /x/)
If it’s unresolvable reference (delete iDontExist)
- if in strict mode, throw SyntaxError
- if not in strict mode, return true
If it’s a property reference, actually try to delete a property (delete foo.bar)
If it’s a reference with Environment Record as base (delete foo)
- if in strict mode, throw SyntaxError
- if not in strict mode, attempt to delete it (further algorithm follows)
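Most of these branches can be exercised from plain code. In the sketch below, foo and iDontExist are made-up names, and the identifier cases are compiled with new Function so that strictness is controlled explicitly rather than inherited from the surrounding code:

```javascript
var foo = { bar: 1 };

// Not a reference: returns true.
var deleteLiteral = delete 1;

// Unresolvable reference, non-strict code: returns true.
var deleteUnresolvable = new Function('return delete iDontExist;')();

// Property reference: the property is actually removed.
var deleteProperty = delete foo.bar;

// Identifier in strict code: SyntaxError, raised as soon as the code is parsed.
var strictError = null;
try {
  new Function('"use strict"; return delete iDontExist;');
} catch (e) {
  strictError = e;
}
```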

Summary

And that’s a wrap!

Hopefully you now understand the underlying mechanism of References in Javascript; how they’re used in various places and how we can “utilize” them to explain this behavior even in non-trivial constructs.

Note that everything I mentioned in this post was based on ES5, being current standard and the most implemented one at the moment. ES6 might have some changes, but that’s a story for another day.

If you’re curious to know more — check out section 8.7 of ES5 spec, including internal methods GetValue(), PutValue(), and more.

P.S. Big thanks to Rick Waldron for review and suggestions!
