Type Checking: A Necessary Evil

JavaScript's type system is a constant source of programmer confusion and application bugs. Numerous talks and blog posts have been dedicated to its quirks. As an example, review the output of the following computations, taken from Node's REPL:

> 1 + 1
2                   // 1 + 1 = 2. seems right  
> 1 + '1'
'11'                // classic JavaScript  
> {} + []
0                   // object plus array is 0. wat?  
> [] + {}
'[object Object]'   // but array plus object is a string  
> {} - []
-0                  // sigh
> [] - {}
NaN                 // this is something I could actually expect  
> ({} + []) === 0
false               // but... I thought this was 0?  

Some of these results are more surprising than others. Some are completely nonsensical. For the most part, JavaScript's type quirks are minor annoyances. In other cases, they can lead to serious bugs. With the increasing adoption of Node.js and server-side JavaScript, the potential severity of these bugs is at an all-time high.

On January 4th, 2016, a remote memory disclosure vulnerability related to Node's Buffer was disclosed. The vulnerability, nicknamed nodebleed, was not a bug in Buffer, but rather a lack of type checking in the ws module. The offending code intended to create a Buffer containing a specified string. However, if a number was passed instead of a string, the Buffer constructor treated it as a size and created a Buffer of that length containing uninitialized memory (this is documented behavior). This allowed a remote user to retrieve potentially sensitive data. The bug was ultimately fixed by checking if the input was a number, and converting it to a string.
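
The kind of guard described above might look roughly like the following. This is a minimal sketch and not the actual ws patch: toPayloadBuffer() is a hypothetical name, and it assumes a Node version that provides Buffer.from().

```javascript
'use strict';

// Hypothetical helper (not taken from ws): build a Buffer from a payload
// that is expected to be a string.
function toPayloadBuffer(data) {
  // A number would be treated as a size by the legacy `new Buffer(number)`
  // constructor, so convert it to a string before going any further.
  if (typeof data === 'number') {
    data = String(data);
  }

  // Anything that still is not a string is a programmer error.
  if (typeof data !== 'string') {
    throw new TypeError('data must be a string or a number');
  }

  // Buffer.from(string) copies the string's bytes, so the result never
  // contains uninitialized memory.
  return Buffer.from(data, 'utf8');
}
```

With this guard in place, toPayloadBuffer(1000) yields the four bytes of the string '1000' instead of 1000 bytes of whatever happened to be in memory.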

Performing Type Checks

Clearly, type checking is important in a language like JavaScript. Unfortunately, this isn't always straightforward. Some example pitfalls include:

  • Ambiguity in typeof. null and all objects, except functions, return 'object'. Boxed primitives can be confusing - typeof true and typeof Boolean(true) evaluate to 'boolean', while typeof new Boolean(true) evaluates to 'object'.
  • instanceof is context specific, meaning that it does not behave as you would expect when data crosses context boundaries. For example, [] instanceof Array evaluates to true, as does vm.runInNewContext('[] instanceof Array'). However, vm.runInNewContext('foo instanceof Array', {foo: []}) evaluates to false because foo is from one context, while the Array constructor in question is from another.
  • Object.prototype.toString.call() isn't future proof. A common way to determine a variable's data type is to use Object.prototype.toString.call(someVariable), which exposes the variable's [[Class]] internal slot. The @@toStringTag well-known symbol (Symbol.toStringTag in ES2015) allows an object to spoof the value displayed using this technique.
  • Node has access to detailed type information, but requires dropping down to the C++ layer. Node is at the mercy of V8's C++ API, which changes relatively frequently. Additionally, this technique cannot be used in the browser.
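
The first three pitfalls can be reproduced in a few lines. The sketch below assumes a Node version that supports Symbol.toStringTag; the variable names are purely illustrative. The Array.isArray() check is an addition that the list above does not mention, included only to contrast with instanceof.

```javascript
'use strict';
const vm = require('vm');

// typeof: null and every non-function object report 'object'.
const typeofResults = {
  nullValue: typeof null,
  array: typeof [],
  fn: typeof function () {},
  primitive: typeof Boolean(true),
  boxed: typeof new Boolean(true)
};

// instanceof: foo is created in this context, but the Array constructor
// seen by the script belongs to the new context.
const crossContext = vm.runInNewContext('foo instanceof Array', { foo: [] });

// Array.isArray() does not depend on which context created the array.
const isArrayCrossContext = vm.runInNewContext('Array.isArray(foo)', { foo: [] });

// Object.prototype.toString: an ordinary object can pick its own tag.
const impostor = { [Symbol.toStringTag]: 'Array' };
const spoofedTag = Object.prototype.toString.call(impostor);

console.log(typeofResults, crossContext, isArrayCrossContext, spoofedTag);
```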

In the large majority of cases, using a combination of typeof, instanceof, and Object.prototype.toString.call() is more than adequate. However, if you find yourself in an awkward situation, you're using Node, and C++ compilation is an option for you, then you may want to consider querying V8 for type information directly. I recommend the v8is module, which exposes V8's type checking functions to JavaScript. By going straight to the source of truth (V8), you can retrieve information that isn't available in JavaScript natively. For example, at the C++ level V8 has Int32 and Uint32 data types to represent signed and unsigned 32-bit integers. In JavaScript, number is the only numeric type.

Once you've established a variable's type, you typically either:

  1. Process the variable, as it is an acceptable type. This is the happy path.
  2. Throw a TypeError if the data type is unacceptable. This is an excellent way to identify, and bring attention to, programmer errors.
  3. Coerce the data to an acceptable type. This option can get complicated.
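
As a rough illustration of these three outcomes, consider a function that accepts a port number. Both normalizePort() and checkRange() are hypothetical helpers written for this example, and the accepted range is an assumption.

```javascript
'use strict';

// Hypothetical helper: reject anything that is not a valid port number.
function checkRange(port) {
  if (!Number.isInteger(port) || port < 0 || port > 65535) {
    throw new RangeError('port must be an integer between 0 and 65535');
  }
  return port;
}

// Hypothetical helper showing the three outcomes of a type check.
function normalizePort(value) {
  // 1. Happy path: the value already has an acceptable type.
  if (typeof value === 'number') {
    return checkRange(value);
  }

  // 3. Coerce: non-empty strings are converted to numbers explicitly.
  if (typeof value === 'string' && value.trim() !== '') {
    return checkRange(Number(value));
  }

  // 2. Throw: any other type is treated as a programmer error.
  throw new TypeError('port must be a number or a numeric string');
}
```

Here normalizePort(8080) and normalizePort('8080') both return the number 8080, while normalizePort(null) throws a TypeError. A string that does not parse, such as 'abc', coerces to NaN and is rejected by the range check.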

Coercing Types

JavaScript does a lot of implicit type coercion. For example, the expression 1 + '1' coerces the number 1 to the string '1', before performing a string concatenation resulting in '11'. This type of implicit, dynamic behavior can lead to subtle bugs. However, in many cases explicit coercion can be extremely useful. And, like all aspects of JavaScript, there are many nuances that you must be aware of.

Numeric coercions are a great example of nuance. For example, bitwise operators convert their operands to 32-bit signed integers. The result of the operation is also a 32-bit signed integer. The exception is the logical right shift (>>>), which yields a 32-bit unsigned integer. Therefore, you can use value | 0 and value >>> 0 to convert value to a 32-bit signed or unsigned integer, respectively. Be aware that data loss is possible, as typical JavaScript numbers are 64-bit IEEE floating point numbers. The following example shows some of these subtleties in action.

> let value = Math.pow(2, 32) - 1; // largest unsigned 32-bit value
undefined  
> value;
4294967295                         // value as 64-bit IEEE  
> value | 0;
-1                                 // value as 32-bit signed integer
> value >>> 0;
4294967295                         // value as 32-bit unsigned integer  
> value = value + 1;               // value no longer fits in 32 bits
4294967296                         // value does fit in 64 bits  
> value | 0;                       // lower 32 bits are all zero
0  
> value >>> 0;
0  
> value = '1';                     // value is a numeric string
'1'  
> value === (value >>> 0)          // is value an unsigned integer?
false  
> +value === (value >>> 0)         // does value coerce to an unsigned int?
true  

Some other coercions which are useful and much less confusing are:

  • value + '' and String(value) to convert to a string.
  • !!value to convert to a Boolean. Falsy values become false. Truthy values become true.
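  • +value to convert to a number. While not as readable as parseInt() and parseFloat(), it recognizes hexadecimal strings such as '0xf' (which parseFloat() parses as 0), and it yields NaN rather than a partial result when a string has trailing non-numeric characters.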
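
To see how these coercions differ from parseInt() and parseFloat(), it can help to run the same inputs through each of them. The sketch below is only an illustration; the sample inputs and the property names are arbitrary choices.

```javascript
'use strict';

// Arbitrary sample inputs: hex string, trailing garbage, empty, padded.
const inputs = ['0xf', '12px', '', '  42  '];

// Apply each coercion to every input and collect the results.
const results = inputs.map((input) => ({
  input: input,
  unaryPlus: +input,                 // whole string must be numeric
  parsedInt: parseInt(input, 10),    // stops at the first invalid character
  parsedFloat: parseFloat(input),    // stops at the first invalid character
  asBoolean: !!input,                // only the empty string is falsy here
  asString: input + ''               // strings are returned unchanged
}));

console.log(results);
```

Note that +'' is 0 while parseInt('', 10) is NaN, so the right choice depends on whether an empty string should count as a number in your application.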

Conclusion

JavaScript is a dynamic language, where type checking is both tricky and necessary. By cautiously and thoroughly verifying and coercing data types, you can produce production-ready code. I suggest browsing the Node core source code, as type checking and explicit coercion are extremely common there. This is what allows Node to provide function overloading (see net.Server.prototype.listen() variants) and optional parameters (see the child_process functions).
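
The optional parameter pattern usually comes down to a few typeof checks at the top of a function. The following is a minimal sketch of the idea, not code from Node core: run() and its timeout option are hypothetical, and the callback is invoked synchronously only to keep the example short.

```javascript
'use strict';

// Hypothetical function that can be called as run(command, callback)
// or as run(command, options, callback).
function run(command, options, callback) {
  // If the second argument is a function, options was omitted.
  if (typeof options === 'function') {
    callback = options;
    options = {};
  }

  if (typeof command !== 'string') {
    throw new TypeError('command must be a string');
  }

  // Treat a missing options argument as an empty object.
  if (options === undefined || options === null) {
    options = {};
  }

  if (typeof options !== 'object') {
    throw new TypeError('options must be an object');
  }

  if (typeof callback !== 'function') {
    throw new TypeError('callback must be a function');
  }

  // Explicitly coerce the option to an unsigned 32-bit integer.
  const timeout = options.timeout >>> 0;

  callback(null, { command: command, timeout: timeout });
}
```

Calling run('ls', cb) reports a timeout of 0, while run('ls', { timeout: '500' }, cb) reports the number 500.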

One last note: if you find yourself doing complex validation, you should probably look into joi. Joi is the battle-tested, production-hardened module responsible for data validation in hapi, but it can be used in any Node application.