Perfection kills

Exploring Javascript by example


Whitespace deviations

August 23rd, 2009 by kangax

I am reading the Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan. It’s a truly excellent book on the subject, with an incredible level of attention to detail. I am only half-way through the book, but have already learned a few things about regular expressions, both general and Javascript-related ones.

One thing I noticed missing in the book was a mention of whitespace character class (\s) discrepancies in current ECMAScript implementations. The Cookbook rightfully explains that \s in Javascript matches any character defined as whitespace by the Unicode standard. What it fails to mention is how horribly this rule is actually implemented in modern browsers. While most of the implementations correctly handle ASCII whitespace characters, such as U+0020 (Space), U+000B (Vertical Tab) and U+000A (Line Feed), there’s much more chaos among the non-ASCII ones, starting at U+00A0 (No-Break Space) and continuing from U+2000 (EN QUAD) upward.

In practice, such non-conformance can lead to surprising results when implementing something like a trim function. If trim were to rely on \s, then it could miss quite common characters like U+00A0 (No-Break Space). In fact, trim as used in jQuery or Prototype does exactly that, relying on the standard whitespace character class (\s), and so fails with any of these troublesome characters in the affected browsers. One of the solutions, of course, is to replace \s with a custom character class, e.g.: [\u0009\u000A\u000B\u000C\u000D\u0020\u00A0\u1680\u180E\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200A\u202F\u205F\u3000\u2028\u2029]
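A trim built on such a custom class might look like the sketch below. This is only an illustration, not the code used by jQuery or Prototype; the names `WHITESPACE_CLASS` and `trimUnicode` are made up for this example.

```javascript
// Explicit whitespace class, copied from the list above, so the result
// does not depend on how a given engine implements \s.
var WHITESPACE_CLASS =
  '[\\u0009\\u000A\\u000B\\u000C\\u000D\\u0020\\u00A0\\u1680\\u180E' +
  '\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2007\\u2008' +
  '\\u2009\\u200A\\u202F\\u205F\\u3000\\u2028\\u2029]';

// Matches a run of whitespace at the start or at the end of a string.
var TRIM_PATTERN = new RegExp(
  '^' + WHITESPACE_CLASS + '+|' + WHITESPACE_CLASS + '+$', 'g'
);

// Hypothetical helper: strips leading and trailing whitespace only.
function trimUnicode(str) {
  return String(str).replace(TRIM_PATTERN, '');
}
```

Calling `trimUnicode('\u00A0 foo bar \u3000')` should leave just `foo bar`, with the inner space untouched.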

This topic comes up once in a while on comp.lang.javascript and there have been some efforts to document these discrepancies. I wanted to make a simple table of modern browsers’ compliance and used a test once provided by Richard Cornford (also available online for anyone to try out).
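The idea behind such a test is simple enough to sketch. The snippet below is not Richard Cornford’s test, just a minimal approximation of it; `testWhitespace` and `toLabel` are hypothetical names, and the code point list is the one from the character class above.

```javascript
var WHITESPACE_CODE_POINTS = [
  0x0009, 0x000A, 0x000B, 0x000C, 0x000D, 0x0020, 0x00A0, 0x1680, 0x180E,
  0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, 0x2006, 0x2007, 0x2008,
  0x2009, 0x200A, 0x2028, 0x2029, 0x202F, 0x205F, 0x3000
];

// Formats a code point as "U+XXXX".
function toLabel(codePoint) {
  var hex = codePoint.toString(16).toUpperCase();
  return 'U+' + ('0000' + hex).slice(-4);
}

// Checks each code point against \s and records whether it matched.
function testWhitespace(codePoints) {
  var results = [];
  for (var i = 0; i < codePoints.length; i++) {
    var character = String.fromCharCode(codePoints[i]);
    results.push({
      label: toLabel(codePoints[i]),
      pass: /\s/.test(character)
    });
  }
  return results;
}
```

Running `testWhitespace(WHITESPACE_CODE_POINTS)` in a given browser yields one PASS/FAIL entry per code point, which is essentially one column of a compliance table.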

Here’s a table demonstrating the above-mentioned deviations. It’s good to see Safari 4+ and Chrome 2+ conforming to the spec fully. Hopefully, upcoming versions of Firefox will also take care of the remaining “failures”.

| Code point / Browser | Firefox 2-3.5 | Safari 2.0-3.2.1 | Safari 4 | Opera 9.25, 9.64 | Opera 10 | IE 6-8 | Chrome 2-3 | Konqueror 4.2.2 |
|---|---|---|---|---|---|---|---|---|
| (0x0009) [ASCII Tab] | PASS | PASS | PASS | PASS | PASS | PASS | PASS | PASS |
| (0x000A) [ASCII Line Feed] | PASS | PASS | PASS | PASS | PASS | PASS | PASS | PASS |
| (0x000B) [ASCII Vertical Tab] | PASS | PASS | PASS | PASS | PASS | PASS | PASS | FAIL |
| (0x000C) [ASCII Form Feed] | PASS | PASS | PASS | PASS | PASS | PASS | PASS | PASS |
| (0x000D) [ASCII Carriage Return] | PASS | PASS | PASS | PASS | PASS | PASS | PASS | PASS |
| (0x0020) SPACE | PASS | PASS | PASS | PASS | PASS | PASS | PASS | PASS |
| (0x00A0) NO-BREAK SPACE | PASS | FAIL | PASS | PASS | PASS | FAIL | PASS | FAIL |
| (0x1680) OGHAM SPACE MARK | FAIL | FAIL | PASS | PASS | FAIL | FAIL | PASS | FAIL |
| (0x180E) MONGOLIAN VOWEL SEPARATOR | FAIL | FAIL | PASS | FAIL | FAIL | FAIL | PASS | FAIL |
| (0x2000) EN QUAD | PASS | FAIL | PASS | PASS | PASS | FAIL | PASS | FAIL |
| (0x2001) EM QUAD | PASS | FAIL | PASS | PASS | PASS | FAIL | PASS | FAIL |
| (0x2002) EN SPACE | PASS | FAIL | PASS | PASS | PASS | FAIL | PASS | FAIL |
| (0x2003) EM SPACE | PASS | FAIL | PASS | PASS | PASS | FAIL | PASS | FAIL |
| (0x2004) THREE-PER-EM SPACE | PASS | FAIL | PASS | PASS | PASS | FAIL | PASS | FAIL |
| (0x2005) FOUR-PER-EM SPACE | PASS | FAIL | PASS | PASS | PASS | FAIL | PASS | FAIL |
| (0x2006) SIX-PER-EM SPACE | PASS | FAIL | PASS | PASS | PASS | FAIL | PASS | FAIL |
| (0x2007) FIGURE SPACE | PASS | FAIL | PASS | PASS | PASS | FAIL | PASS | FAIL |
| (0x2008) PUNCTUATION SPACE | PASS | FAIL | PASS | PASS | PASS | FAIL | PASS | FAIL |
| (0x2009) THIN SPACE | PASS | FAIL | PASS | PASS | PASS | FAIL | PASS | FAIL |
| (0x200A) HAIR SPACE | PASS | FAIL | PASS | PASS | PASS | FAIL | PASS | FAIL |
| (0x2028) LINE SEPARATOR | PASS | FAIL | PASS | PASS | PASS | FAIL | PASS | FAIL |
| (0x2029) PARAGRAPH SEPARATOR | PASS | FAIL | PASS | PASS | PASS | FAIL | PASS | FAIL |
| (0x202F) NARROW NO-BREAK SPACE | FAIL | FAIL | PASS | PASS | PASS | FAIL | PASS | FAIL |
| (0x205F) MEDIUM MATHEMATICAL SPACE | FAIL | FAIL | PASS | FAIL | FAIL | FAIL | PASS | FAIL |
| (0x3000) IDEOGRAPHIC SPACE | PASS | FAIL | PASS | PASS | PASS | FAIL | PASS | FAIL |

Tests for Firefox, Safari and Opera were performed on Mac OS X (10.5.8); IE and Chrome on Windows XP Pro SP2 (via VMware); and Konqueror on Ubuntu 9.04 (via VMware).

Edit (28/09/2009)

Clarified operating systems (and their versions) used for testing; aligned characters in the table by code point; updated Opera to 10RC, added Chrome 3 to results, combined FF columns into one, since they are identical; sorted table by code point. Thanks to Dr J R Stockton and Luke Smith for suggestions.

Edit (04/09/2009)

Updated Opera 10RC to Opera 10 (thanks to Garrett Smith for the test); tested and updated the table with results of Safari 2.x and older 3.x versions; fixed a bug in the testcase where the `char` identifier (one of the future reserved words as per ES3) would prevent script parsing in Safari 2.x.



Detecting global variable leaks

August 8th, 2009 by kangax


I have recently stumbled upon a blog post by Remy Sharp on detecting global variable leaks. As you probably know, Javascript is notorious for making such leaks way too easy. The problem is mainly with undeclared assignments, which result in global variable declarations when variables are not found in the scope chain.

  (function(){
    var x = 1; // <== accidentally changed "," to ";"
        y = 2; // <== `y` is now a global variable
  })();

To be more precise, an undeclared assignment actually results in a global property assignment, not a global variable declaration. The difference between the two is rather subtle: a variable declaration creates a non-deletable property of the global object, whereas an explicit or implicit property assignment creates a deletable one. Another peculiarity can be observed in IE, where global property assignment is disallowed if there’s an element in the document with the same ID or NAME value. A global variable declaration, on the other hand, quietly overwrites the existing property in cases like this.
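The deletable side of that distinction can be checked with a small sketch. It assumes an engine with ES5’s `Object.getOwnPropertyDescriptor` (not available in older browsers), and `inspectUndeclaredAssignment` is a hypothetical helper written only for this illustration.

```javascript
// Hypothetical helper: performs an undeclared assignment to `name` and
// reports how the resulting global property behaves. The Function
// constructor is used because its body runs in non-strict mode in the
// global scope, regardless of where this helper itself is defined.
// Assumes `name` is a valid identifier that is not already defined.
function inspectUndeclaredAssignment(name) {
  var root = new Function('return this')();   // the global object
  new Function(name + ' = 2;')();             // undeclared assignment
  var descriptor = Object.getOwnPropertyDescriptor(root, name);
  var configurable = descriptor.configurable;
  var deleted = delete root[name];            // attempt to remove it
  return {
    configurable: configurable,
    deleted: deleted,
    stillDefined: name in root
  };
}

// By contrast, a top-level `var someName = 2;` in a classic script
// creates a non-configurable property, so deleting it would fail.
```

For a fresh name such as `leakedExample`, the helper should report a configurable property that `delete` removes successfully.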

Remy solves the problem with a bookmarklet that creates a blank context (essentially a window in an empty iframe), then uses that clean context to get the difference with the main one. The list of found variables is dumped into the console.
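The technique can be sketched roughly as follows. This is neither Remy’s code nor the bookmarklet’s actual source; `diffGlobals` and `detectGlobals` are hypothetical names, and `detectGlobals` assumes a browser environment with a loaded document.

```javascript
// Returns the names of enumerable properties found on `current`
// but missing from `clean`.
function diffGlobals(current, clean) {
  var leaks = [];
  for (var name in current) {
    if (!(name in clean)) {
      leaks.push(name);
    }
  }
  return leaks.sort();
}

// Browser-only wiring: an empty iframe supplies the clean window.
// If these functions are themselves defined globally they will show up
// in the result, so a bookmarklet would wrap them in a function scope.
function detectGlobals() {
  var iframe = document.createElement('iframe');
  iframe.style.display = 'none';
  iframe.src = 'about:blank';
  document.body.appendChild(iframe);
  var leaks = diffGlobals(window, iframe.contentWindow);
  document.body.removeChild(iframe);
  return leaks;
}
```

The comparison step is kept separate from the iframe handling so that it can be tried on plain objects, e.g. `diffGlobals({ a: 1, b: 2 }, { a: 1 })` gives `['b']`.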

It’s worth mentioning that JSLint already allows detecting undeclared assignments, but JSLint can hurt feelings so we won’t use it. Well, actually JSLint performs so many validations that it’s not always practical to single out undeclared assignments in huge scripts of legacy applications (like the one I wanted to examine). A runtime test like the one in this bookmarklet, on the other hand, can be applied to any script.

The bookmarklet worked like a charm, but as soon as I plugged it into one of our applications, I was greeted with dozens of Prototype- and Scriptaculous-related methods. On top of those, there were a few Google Analytics- and Mozilla-specific ones. Unfortunately, the original code was obfuscated and almost unreadable, so I reproduced it from scratch, this time making it possible to toggle certain property sets on and off. These property sets are the Prototype, Scriptaculous, Mozilla, Google Analytics and Firebug ones. The code is structured in such a way that it should be easy to augment it with additional sets.
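One way to structure such toggleable sets is sketched below. It is not the bookmarklet’s actual source: `PROPERTY_SETS` and `filterKnown` are hypothetical names, and the property lists are short, illustrative samples rather than complete inventories of what each library defines.

```javascript
// Illustrative, incomplete samples of globals defined by each library.
var PROPERTY_SETS = {
  'Prototype': ['$', '$$', '$A', '$F', '$H', '$R', '$w', 'Prototype', 'Class', 'Hash', 'Ajax'],
  'Scriptaculous': ['Effect', 'Draggable', 'Droppables', 'Sortable', 'Builder'],
  'Google Analytics': ['_gat', 'gaJsHost', 'pageTracker']
};

// Removes from `names` every property belonging to a set that is
// switched on in `enabled`, e.g. { 'Prototype': true }.
function filterKnown(names, sets, enabled) {
  var ignored = {};
  var hasOwn = Object.prototype.hasOwnProperty;
  for (var setName in sets) {
    if (hasOwn.call(sets, setName) && enabled[setName]) {
      for (var i = 0; i < sets[setName].length; i++) {
        ignored[sets[setName][i]] = true;
      }
    }
  }
  var remaining = [];
  for (var j = 0; j < names.length; j++) {
    if (!hasOwn.call(ignored, names[j])) {
      remaining.push(names[j]);
    }
  }
  return remaining;
}
```

Adding another set then only means adding another key to `PROPERTY_SETS`; for instance, `filterKnown(['$', 'Effect', 'myLeak'], PROPERTY_SETS, { 'Prototype': true })` keeps `Effect` and `myLeak`.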

In the end, I found a few leaks in one of our applications and even one in Firebug (now fixed).

As usual, the bookmarklet and its source are on github.

Feel free to fork it.

Edit [9/5/2009]

Clarified global variable declaration vs. global property assignment (thanks to Garrett Smith)
