Node.js Profiling Using the V8 Tick Profiler

A key aspect of understanding a program's runtime behavior is identifying where it is spending its time. Generally, an application will either be waiting on I/O or performing calculations using the CPU. Node.js is excellent at handling large amounts of I/O, but must be cautious about CPU utilization because of its single threaded nature. Luckily, developers can identify, and even visualize CPU hotspots in their code using a profiler.

A sampling profiler, like the one that ships with Node.js, periodically inspects an application as it runs. Each sample includes the program's call stack, with the item at the top of the stack representing the function that was executing when the sample was collected. Once a profile is collected, it can be analyzed in a variety of ways to determine where any CPU hotspots might exist.

Setup

The examples in this post will be based on the simple HTTP server shown in Listing 1, unless otherwise stated. This server listens on port 3000, and responds to all requests with the string Hello World.

'use strict';  
const http = require('http');

http.createServer(function onRequest (request, response) {  
  response.writeHead(200, { 'Content-Type': 'text/plain' });
  response.end('Hello World\n');
}).listen(3000);

Listing 1. A simple Node.js HTTP server.

To create a meaningful CPU profile, the server from the previous listing must be placed under load. To create artificial HTTP loads, the examples in this post use the autocannon module. Install autocannon globally using the command npm i -g autocannon. autocannon is highly configurable, but for the purposes of this post, the command shown in Listing 2 should be adequate for creating a load on the server.

$ autocannon http://localhost:3000

Listing 2. Providing a load on the HTTP server using autocannon.

The V8 Tick Profiler

Node exposes V8's built-in CPU tick profiler via the --prof command line flag. Assuming that the server from Listing 1 is saved as server.js, it can be profiled using the command shown in Listing 3. Enabling the profiler will not cause anything to be displayed. However, the profiler's output will be written to a file in the current directory. The file's name will be isolate-0xnnnnnnnnn-v8.log, where nnnnnnnnn will be a hexadecimal string. Each profiler run will create a new file with a different hex string.

node --prof server.js  

Listing 3. Enabling V8 tick profiling of the example server.

Start the server with profiling enabled, and then apply load using autocannon. By default autocannon will run for ten seconds. Once autocannon has finished running, kill the server using Control+C.

At this point, the profiler's output file should be written. The file contains a lot of data and is not meant to be read by humans. Instead, the raw data can be converted to a more human readable format using Node's --prof-process flag, as shown in Listing 4 (substituting your profiler log file's name for isolate-0xnnnnnnnnn-v8.log). The formatted output is written to a new file, profile.txt. This is done because processing the profiling output can take some time, and still result in a large amount of output.

node --prof-process isolate-0xnnnnnnnnn-v8.log > profile.txt  

Listing 4. Enabling V8 tick profiling of the example server.

Interpreting the Profiler Output

Open profile.txt using a text editor. The resulting output should resemble Listing 5. Your output will differ, at least slightly. Listing 5 also indicates where output has been truncated to save space.

Listing 5 begins with a high-level summary line, followed by several sections denoted by square braces (for example, [Shared libraries]:). The summary line displays the name of the original profiler output file, as well as a breakdown of the ticks that were profiled.

Although it is the fourth section listed, the Summary section is a good place to begin breaking down the formatted profiler output. The Summary section breaks down CPU ticks by JavaScript, C++, GC (garbage collection), and operating system specific shared libraries. Occasionally, the profiler cannot accurately determine what is executing and marks these ticks as unaccounted. In this example, 4,935 ticks, or 52.6% of the ticks, occurred in shared libraries. The summary also shows that JavaScript code only consumed 12.1% of the total execution time. The nonlib column indicates the percentage of total time spent when not taking the shared libraries into account. In this example, 25.4% of the non-library time was spent in JavaScript code. This is roughly double the total time (12.1%), which makes sense because approximately half of the total time was spent in library code.

The Shared libraries, JavaScript, and C++ sections provide a more fine-grained breakdown of the fields in the Summary section. The ticks, total, and nonlib fields hold the same meaning as they do in the Summary section. In the shared libraries section, the name field denotes the path of the library. The name field in the C++ section specifies the name of the function. In this example, most of the top C++ functions are located in the V8 internals, as indicated by the v8::internal namespace.

Statistical profiling result from isolate-0x103800000-v8.log, (9385 ticks, 1162 unaccounted, 0 excluded).

 [Shared libraries]:
   ticks  total  nonlib   name
   4769   50.8%          /usr/lib/system/libsystem_kernel.dylib
     63    0.7%          /usr/lib/system/libsystem_malloc.dylib
... output truncated ...

 [JavaScript]:
   ticks  total  nonlib   name
    109    1.2%    2.4%  LazyCompile: *EventEmitter events.js:11:22
     80    0.9%    1.8%  LazyCompile: *ReadableState _stream_readable.js:35:23
     75    0.8%    1.7%  LazyCompile: *IncomingMessage _http_incoming.js:20:25
... output truncated ...

 [C++]:
   ticks  total  nonlib   name
     84    0.9%    1.9%  v8::internal::(anonymous namespace)::Invoke(v8::internal::Isolate*, bool, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, v8::internal::Handle<v8::internal::Object>)
     79    0.8%    1.8%  v8::internal::JSObject::MigrateToMap(v8::internal::Handle<v8::internal::JSObject>, v8::internal::Handle<v8::internal::Map>, int)
     76    0.8%    1.7%  v8::internal::LookupIterator::State v8::internal::LookupIterator::LookupInRegularHolder<false>(v8::internal::Map*, v8::internal::JSReceiver*)
     50    0.5%    1.1%  _http_parser_execute
... output truncated ...

 [Summary]:
   ticks  total  nonlib   name
   1131   12.1%   25.4%  JavaScript
   2157   23.0%   48.5%  C++
      6    0.1%    0.1%  GC
   4935   52.6%          Shared libraries
   1162   12.4%          Unaccounted

 [C++ entry points]:
   ticks    cpp   total   name
    218   25.9%    2.3%  v8::internal::Runtime_InstanceOf(int, v8::internal::Object**, v8::internal::Isolate*)
    103   12.2%    1.1%  v8::internal::Runtime_KeyedGetProperty(int, v8::internal::Object**, v8::internal::Isolate*)
     78    9.3%    0.8%  v8::internal::Runtime_SetProperty(int, v8::internal::Object**, v8::internal::Isolate*)
     72    8.5%    0.8%  v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*)
     63    7.5%    0.7%  v8::internal::Runtime_GetProperty(int, v8::internal::Object**, v8::internal::Isolate*)

 [Bottom up (heavy) profile]:
  Note: percentage shows a share of a particular caller in the total
  amount of its parent calls.
  Callers occupying less than 2.0% are not shown.

   ticks parent  name
   4769   50.8%  /usr/lib/system/libsystem_kernel.dylib

   1162   12.4%  UNKNOWN
    224   19.3%    LazyCompile: *Socket._writeGeneric net.js:665:42
    224  100.0%      LazyCompile: *clearBuffer _stream_writable.js:403:21
    224  100.0%        LazyCompile: *end _http_outgoing.js:550:45
    224  100.0%          LazyCompile: *onRequest /private/tmp/server.js:4:38
    224  100.0%            LazyCompile: *emit events.js:136:44

Listing 5. Example processed profiler output.

The name field in the JavaScript section specifies the function name, as well as its location in the source code. An example is the EventEmitter constructor, which is displayed as EventEmitter events.js:11:22. In this example, the EventEmitter constructor entry is prefixed with LazyCompile: *. LazyCompile means that the function was lazily compiled at runtime. The asterisk indicates that the function was optimized. Unoptimized LazyCompile functions are marked with a tilde (~). A non-exhaustive list of function types in the JavaScript section is shown below.

  • LazyCompile - A function that is lazily compiled during execution.
  • RegExp - A function that is part of the regular expression engine.
  • Builtin - A JavaScript function that is part of the runtime.
  • Stub - A C function that is part of the JavaScript runtime.

The C++ Entry Points Section

The C++ entry points section describes the places where time is spent transitioning from JavaScript code to C++ code. Crossing the C++/JavaScript boundary can become expensive quickly, so it's helpful to understand where any potential bottlenecks exist. For example, in Listing 5, v8::internal::Runtime_InstanceOf() represents 218 ticks, or 2.3% of the total ticks. As the name implies, this is an internal V8 C++ function related to the JavaScript instanceof operator. The remainder of the C++ entry points in Listing 5 also correspond to V8 runtime internals.

The Bottom Up (Heavy) Profile Section

The Bottom up (heavy) profile section is useful for understanding which code paths, not individual functions, execute during the largest number of ticks. The example in Listing 5 only contains two code paths. The first path consists of only libsystem_kernel.dylib, and occurs in 4769 ticks. This is a shared library on macOS, and represents system calls and idle time.

The second, more interesting, stack starts when an event is emitted. In Listing 5, this is the final line of output, containing LazyCompile: *emit events.js:136:44. The parent column for this function is 100%, indicating that every time emit() runs, it calls the function on the line above it. The line above emit() is the request handler, onRequest() defined in server.js of Listing 1. It makes sense that emit() calls onRequest() 100% of the time in this code path, because 100% of the HTTP connections trigger a request event. Similarly, onRequest() calls response.end() 100% of the time.

This code path continues unconditionally from response.end() to the HTTP response stream's clearBuffer() method to the TCP socket's _writeGeneric() method. Then, 19.3% of the time, _writeGeneric() calls a function that the profiler cannot identify. This mystery function is labeled as UNKNOWN. At 1,162 tick occurrences, this function matches up perfectly to the number of unaccounted ticks. The C++ binding layer can sometimes be the source of unaccounted ticks. Because _writeGeneric() calls into the binding layer, this is the likely culprit.

Conclusion

This post has introduced command line CPU profiling using nothing but out of the box Node.js. As of Node 6, the tick profiler is not the only CPU profiler that ships with Node. Be on the lookout for a future post on CPU profiling using the V8 Inspector.