Benchmarking
The best place to start with improving performance is to collect data. Knowing a system’s current performance makes it possible to tell whether a new change is an improvement.
Data also guides where effort should go. If performance is satisfactory, there is no need to improve the scalability of a system. If performance is unsatisfactory, then data can direct attention towards the most critical parts of the design, where changes will have the most significant impact.
There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, [and] will be wise to look carefully at the critical code; but only after that code has been identified.
— Donald Knuth, ‘Structured programming with go to statements’, Computing Surveys, vol. 6, iss. 4.
Gathering data
The fastest way to begin benchmarking is to use the developer console in your preferred web browser (press Control+Shift+I or Command+Option+I). The networking tab will show precise timing information about each request. The performance tab will also identify slow client-side code. In Google Chrome, the Lighthouse tab provides in-depth performance, accessibility and search engine optimization (SEO) reports.
Going beyond the developer console, benchmarking tools such as ab (Apache Bench), loadtest, artillery.io, Siege and httperf can produce performance reports about your server and API.
The following example shows statistics for 4000 requests to /api/slowrequest (using four simultaneous connections):
$ npm install loadtest
...
$ npx loadtest -c 4 -n 4000 http://localhost:3000/api/slowrequest
INFO Requests: 0 (0%), requests per second: 0, mean latency: 0 ms
INFO Requests: 696 (17%), requests per second: 139, mean latency: 28.6 ms
INFO Requests: 1399 (35%), requests per second: 141, mean latency: 28.4 ms
INFO Requests: 2118 (53%), requests per second: 144, mean latency: 27.8 ms
INFO Requests: 2841 (71%), requests per second: 145, mean latency: 27.6 ms
INFO Requests: 3554 (89%), requests per second: 143, mean latency: 28 ms
INFO
INFO Target URL: http://localhost:3000/api/slowrequest
INFO Max requests: 4000
INFO Concurrency level: 4
INFO Agent: none
INFO
INFO Completed requests: 4000
INFO Total errors: 0
INFO Total time: 28.284452056 s
INFO Requests per second: 141
INFO Mean latency: 28.2 ms
INFO
INFO Percentage of the requests served within a certain time
INFO 50% 28 ms
INFO 90% 30 ms
INFO 95% 31 ms
INFO 99% 33 ms
INFO 100% 69 ms (longest request)
$
This report suggests there is a potential problem worthy of further investigation: the server can only handle 141 requests per second. That throughput may be too low, for example, on an API that needs to handle 1000 requests per second.
Warning
You can use benchmarking tools on your computer, but be careful about interpreting the results. The tools produce artificial and predictable traffic. Benchmarking can predict performance and identify issues before they arise in production. However, the only way to fully understand a production website’s performance is to collect statistics and data from the running system.
Warning
Running the server and the benchmarking software on the same computer excludes network latency and overheads. It is also less realistic because the same CPU handles both the requests and the responses.
The JavaScript Performance API also provides high-resolution timers suitable for small benchmarks without any external tools.
The following code demonstrates how to obtain a high-resolution timer in Node.js:
// Get the JavaScript Performance API
const { performance } = require('perf_hooks');
// Record the start time in milliseconds
const start = performance.now();
// ... perform a slow and complex task ...
// Print the number of milliseconds elapsed
const elapsed = performance.now() - start;
console.log(elapsed);
What contributes to low performance?
The following table, from Brendan Gregg [1], lists the time to complete common operations on a computer. It also includes a scaled version of the same numbers to give an intuition for the differences involved.
Event | Latency | Scaled
---|---|---
1 CPU cycle | 0.3 ns | 1 s
Level 1 cache access | 0.9 ns | 3 s
Level 2 cache access | 2.8 ns | 9 s
Level 3 cache access | 12.9 ns | 43 s
Main memory access (DRAM, from CPU) | 120 ns | 6 min
Solid-state disk I/O (flash memory) | 50–150 μs | 2–6 days
Rotational disk I/O | 1–10 ms | 1–12 months
Internet: San Francisco to New York | 40 ms | 4 years
Internet: San Francisco to United Kingdom | 81 ms | 8 years
Internet: San Francisco to Australia | 183 ms | 19 years
TCP packet retransmit | 1–3 s | 105–317 years
OS virtualization system reboot | 4 s | 423 years
SCSI command time-out | 30 s | 3 millennia
Hardware (HW) virtualization system reboot | 40 s | 4 millennia
Physical system reboot | 5 min | 32 millennia
These numbers continually change as technology advances, so they are only approximations. As a general guide, however, they help programmers design better systems.
This table has important implications. If performance is critical, it is easy to justify enormous amounts of computation to avoid reading from a disk drive. [2]