Caching is pervasive in modern computing. It is necessary because of the trade-off between price and performance in storage technology.
For example, the AMD64 instruction set used on 64-bit AMD and Intel processors has just sixteen 64-bit general-purpose registers. These registers contain a total of 1024 bits (or 128 bytes) of ultra-high-speed memory operating at the full speed of the CPU. Gigabytes of such ultra-high-speed memory would cost millions of dollars.
Instead, there is a storage hierarchy. We can buy terabytes of slow disk storage for hundreds of dollars, or gigabytes of fast RAM for a similar price, while CPUs include kilobytes of high-speed on-board cache memory, again costing hundreds of dollars.
A cache is a copy of data held in high-speed or low-latency memory, to avoid the retrieval or recomputation of the same data stored in low-speed or high-latency devices.
Caching is possible because most data is rarely accessed. A tiny proportion of data is regularly accessed. Furthermore, once data has been accessed, it is often accessed again within a short period. For example, consider an email server. Emails received five years ago are seldom read, emails received today will be read once or twice, and the ‘Subject’, ‘Sender’ and ‘Date’ for the most recent twenty or so emails will be needed every time the user refreshes their screen.
Because caching is pervasive, the systems you build are already benefiting from caches. As a developer, it helps to understand what these caches do, how best to take advantage of them, and when to add additional caching.
What to cache
Caching extends beyond the HTTP caches already provided by your browser. You might decide to cache the following:
- JavaScript objects in memory (or retrieved from a database) when serving requests
- Complete HTTP responses when handling a request
- Media such as images, CSS and HTML files stored on disk
- The results of slow function calls [1]
How to cache
Recall that layering is a design technique that hides implementation details of lower levels from higher levels.
Layering can also guide caching. There is an opportunity to introduce caching whenever the output of a lower-level layer is the same across repeated requests.
At its simplest, a cache is just a mapping from inputs to saved outputs. The following example code demonstrates how a slow layer, slow(…), can be made faster by using a dictionary to cache results:
function slow(input) {
  ... // Complex calculations go here
  return result;
}

let cache = {};

function faster(input) {
  if (input in cache) {
    // Use the cached value
    return cache[input];
  } else {
    // Otherwise, call the lower layer
    let result = slow(input);
    // And save the result for future use
    cache[input] = result;
    return result;
  }
}
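The sketch above elides the slow computation. The same pattern can be shown end-to-end with a concrete stand-in; slowSquare and the call counter below are illustrative assumptions, used only to demonstrate that the slow layer runs once per distinct input:

```javascript
// A runnable version of the caching pattern above. slowSquare and the
// call counter are invented for illustration, not part of any real API.
let calls = 0;

function slowSquare(n) {
  calls++; // count how often the slow layer actually runs
  return n * n; // stand-in for a complex calculation
}

const memo = {};

function fasterSquare(n) {
  if (n in memo) {
    return memo[n]; // use the cached value
  }
  const result = slowSquare(n); // otherwise, call the lower layer
  memo[n] = result; // and save the result for future use
  return result;
}
```

Calling fasterSquare(7) twice returns the same result both times, but slowSquare runs only on the first call.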
Most of the time, development involves working with existing caches (which are also far more sophisticated than the example above), rather than implementing caches from scratch. Frequently encountered caches are listed below:
- In the rendering engine (client-side JavaScript)
Data and other responses can be stored in ordinary JavaScript variables or, more permanently, in the browser's window.localStorage and window.sessionStorage objects (Web Storage API).
- By the browser engine
The web browser automatically caches HTTP responses. The Cache-Control header in an HTTP response provides expiration and caching information to indicate an appropriate lifetime for the cached value. Other headers can control validation of the cache.
- In the network
Rather than clients performing direct requests to a server, content distribution networks (CDNs) provide caching services worldwide. End-users connect to a nearby CDN server that attempts to cache as many responses as possible, only forwarding uncachable requests to your server.
Commercial CDNs include Cloudflare, AWS CloudFront, Google Cloud CDN and Azure Content Delivery Network.
A related idea is to deploy servers around the world, rather than in a single data center. For example, you might deploy the same code to servers on each continent (Australia, Europe, Africa, Asia, North America and South America) to ensure every user experiences low latencies.
- In the server
On a server that you manage, there is a range of options for caching:
- Installing a caching reverse proxy on the server. The proxy handles incoming requests and uses a cache wherever possible, but forwards requests to your Express server when not possible (popular options include Nginx, Varnish and Squid)
- Saving rendered pages, results or objects in the Node.js process, using JavaScript dictionaries
- Saving rendered pages, results or objects in the Node.js process, using a specialized caching/memoization library (e.g., lru-cache, cacache, fast-memoize and memoizee)
- Saving rendered pages, results or objects in an external in-memory caching database (e.g., Redis or Memcached)
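In-process caching libraries such as lru-cache keep memory bounded by evicting the least recently used entries once the cache is full. The following is a minimal sketch of that idea (SimpleLRU is invented for illustration and is not the actual API of any of the libraries named above):

```javascript
// A minimal least-recently-used (LRU) cache sketch. A JavaScript Map
// preserves insertion order, so the first key is always the least
// recently used entry once we re-insert keys on every access.
class SimpleLRU {
  constructor(max) {
    this.max = max;
    this.map = new Map();
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert to mark this entry as most recently used
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      // Evict the least recently used entry (first in insertion order)
      const oldest = this.map.keys().next().value;
      this.map.delete(oldest);
    }
  }
}
```

Production libraries add features on top of this core idea, such as per-entry time-to-live, size-based accounting and eviction callbacks.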
- In the database
Databases automatically cache records in memory. In a large database, repeat queries for a record will typically return faster than the first query of that record.
In addition to the automatic caching included in a database, many databases (including PostgreSQL and MongoDB) include support for creating materialized views. A materialized view is a table that stores the results of a query: it is a cache that stores the precomputed results of a complex query.
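The essence of a materialized view can be sketched without a database: run the expensive query once, store its result, and serve reads from the stored copy until it is explicitly refreshed. The sales "table" and region totals below are invented for illustration:

```javascript
// An in-memory sketch of a materialized view: a stored, precomputed
// query result that can go stale until it is refreshed.
const sales = [
  { region: 'EU', amount: 100 },
  { region: 'EU', amount: 50 },
  { region: 'US', amount: 70 },
];

let totalsByRegion = null; // the "materialized view"

function refreshView() {
  // The expensive "query": aggregate every row in the table
  totalsByRegion = {};
  for (const row of sales) {
    totalsByRegion[row.region] = (totalsByRegion[row.region] || 0) + row.amount;
  }
}

function totalFor(region) {
  if (totalsByRegion === null) refreshView(); // build on first use
  return totalsByRegion[region] || 0; // served from the stored result
}
```

Note the trade-off this makes explicit: after new rows are inserted, reads keep returning the stale precomputed totals until refreshView() is called again, just as a database materialized view must be refreshed.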
Cache invalidation
Cache invalidation is the problem of deciding when and how to delete values stored in a cache.
There is a well-known saying in computer science, attributed to Phil Karlton:
There are only two hard things in Computer Science: cache invalidation and naming things.
Naming is difficult because a good name needs to be concise, clear, easily understood, yet also unique and timeless. It seems deceptively easy, yet causes a great deal of trouble when done poorly.
Cache invalidation is also very difficult.
Consider the home page of a university's website. It is a good candidate for caching because it is heavily used and only updates every few days (with different news stories). It might be reasonable to use Cache-Control headers that invalidate the cache once per day. However, should a sudden emergency (fire, bomb, terrorism) require an emergency change to the home page, caching may mean that users don't see the changes for hours.
Some strategies to improve the responsiveness of caching, without altogether eliminating caching, include the following:
- Reducing TTL
Lower settings for the expiration or time-to-live (TTL) will ensure a more current cache. If a cached page expires in ten minutes, then updates will be seen by end-users within ten minutes (or five minutes, on average). The design trade-off in reducing TTLs is lower efficiency and performance because the cache can service fewer requests.
- Create immutable resources
Rather than attempting to invalidate cached values, each version may have a separate name. For example, http://www.example.com/api/press_releases_today must be invalidated daily. In contrast, http://www.example.com/api/press_releases/2020_1_1 never needs to be updated, because the news from January 1 should never change (it is immutable).
- Fragment resources and use different TTLs
The expiration for cached data can be set on a per-resource basis (as opposed to a single setting across the entire cache). Long-lived content can have a prolonged expiration. Short-lived content can have a short expiration. For files that contain both long-lived and short-lived content, it may make sense to break the file up into parts that can be cached separately.
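Per-resource expiration can be as simple as computing a different Cache-Control header for each kind of content. The resource categories and lifetimes below are invented for illustration; real values depend on how quickly each kind of content changes:

```javascript
// Per-resource TTLs: long-lived content gets a long max-age, volatile
// content a short one. The categories and durations are assumptions.
const TTL_SECONDS = {
  image: 7 * 24 * 60 * 60, // static media: one week
  article: 24 * 60 * 60,   // published articles: one day
  homepage: 10 * 60,       // frequently updated home page: ten minutes
};

function cacheControlFor(resourceType) {
  const ttl = TTL_SECONDS[resourceType];
  if (ttl === undefined) {
    return 'no-store'; // unknown resources: do not cache at all
  }
  return `public, max-age=${ttl}`;
}
```

In an Express handler, the computed value would be sent with the response, for example via res.set('Cache-Control', cacheControlFor('article')).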
In caches held on servers under your control, there is also the possibility of directly invalidating cached data. You could use custom logic to delete cache entries (so that they need to be fetched again) or preemptively override data in the cache.
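Both of these direct-invalidation options can be sketched against the dictionary cache introduced earlier. The getPage, invalidate and refresh functions below are illustrative names, not part of any library:

```javascript
// Direct invalidation of an in-process cache: delete an entry so the
// next request recomputes it, or preemptively overwrite the stored value.
const cache = new Map();

function getPage(key, render) {
  if (!cache.has(key)) {
    cache.set(key, render()); // compute and store on first request
  }
  return cache.get(key);
}

// Custom logic: delete a cache entry, forcing the next request to re-render
function invalidate(key) {
  cache.delete(key);
}

// Or preemptively override the cached data with a fresh value
function refresh(key, render) {
  cache.set(key, render());
}
```

Preemptive refreshing keeps the cache warm (no user pays the cost of the first slow request after invalidation), at the price of recomputing values that may never be requested.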