Scalability refers to the ability of a system to handle increased demand by adding additional resources.
Ideally, I should be able to increase a website’s capacity simply by adding servers: if one server can handle 10,000 users, then two servers should handle 20,000 users and ten servers should handle 100,000 users.
Unfortunately, applications do not always handle increased demand so easily.
A good example is Twitter, which struggled with scaling in its early years. Twitter’s difficulty was that each message can be seen by many users. As the service grew in popularity, each new user not only posted new messages but also increased the number of potential message recipients. A doubling of users could therefore quadruple the required system resources. [1] By this measure, growth from 10,000 users to 100,000 users (a tenfold increase) could require one hundred times the number of servers!
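A back-of-the-envelope sketch in Python makes the quadratic growth visible. The model below is a deliberate worst case (every message can potentially reach every other user); the function name and figures are illustrative, not real Twitter data:

```python
# Worst-case fan-out: every message may be delivered to every other user,
# so total delivery work grows roughly with the square of the user count.
def potential_deliveries(users, messages_per_user=10):
    return users * messages_per_user * (users - 1)

for n in (10_000, 20_000, 100_000):
    print(f"{n:>7} users -> {potential_deliveries(n):>18,} potential deliveries")
```

Doubling the users from 10,000 to 20,000 roughly quadruples the delivery work, exactly as the growth argument above suggests.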
An intuitive sense of the challenge of scaling comes from thinking about teamwork in a kitchen: it may take 60 minutes for one person to make a cake, or 50 minutes for two people. However, if you tried to get 100 people to make that one cake, it would likely take hours. Most of the time would be spent giving directions, allocating tasks and coordinating the group (let alone the problem of squeezing 100 people into a kitchen) rather than actually preparing the cake.
There can be other difficulties in addressing increased demand: more complex workloads do not always divide evenly across multiple servers. For example, some computational problems (such as simulations where each step depends on the previous one) require sequential processing on a single computer.
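The effect of such a sequential portion can be stated numerically. The sketch below is a simplified model (it assumes the parallel portion of a job divides perfectly across servers, with no coordination overhead) showing how even a small sequential fraction caps the achievable speedup:

```python
# Simplified model: overall speedup when a fixed fraction of the work
# must run sequentially and the rest divides evenly across servers.
def speedup(servers, sequential_fraction):
    parallel_fraction = 1 - sequential_fraction
    return 1 / (sequential_fraction + parallel_fraction / servers)

# With 10% sequential work, no number of servers yields even a 10x speedup.
for n in (2, 10, 100, 1_000):
    print(f"{n:>5} servers -> {speedup(n, 0.10):.2f}x speedup")
```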
Furthermore, even if scalable algorithms are available, it does not mean that the developers have implemented them.
Measuring scalability
Mark Hill [2] proposed a mathematical formalization of scalability in terms of processors, problem size and computation time. These ideas can be generalized for modern web development. In particular, scalability is a relationship between:
- Resources (e.g., servers, RAM, disk space, staff, money spent)
- Demand (e.g., users, requests per second, orders processed)
- Characteristics of performance (e.g., response time, throughput, latency, availability)
Suppose we have resources R and demand D, and we denote performance characteristics as P(R, D). Scalability is the relationship between the performance actually achieved when resources are multiplied by some factor k for a given demand, P(k × R, D), and the theoretical performance that would be achieved if multiplying the resources by k also multiplied the performance by k, namely k × P(R, D).
scalability = P(k × R, D) / ( k × P(R, D) )
Informally, this is the question: does doubling the resources result in an equivalent doubling in the ability to handle potential demand?
If a doubling of the resources results in either an ability to double the demand or double the performance, the system is scalable (scalability = 1).
When a system can scale perfectly, this is referred to as linear scaling (because the performance/demand increases in a linear relationship with resources). Linear scaling is sometimes also informally known as being ‘embarrassingly parallel’: i.e., the parallel computation is almost ‘embarrassingly easy’.
If a doubling of the resources results in no improvement to the system’s performance or ability to handle demand, then it is not scalable.
In the more common middle ground, a doubling of resources is likely to increase performance or potential demand by a smaller amount. Whether this is a 10% increase, a 50% increase or a 90% increase gives a sense of the system’s scalability.
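As a concrete example, the ratio above can be computed directly. The benchmark figures below are invented for illustration:

```python
# Scalability ratio: achieved performance P(k*R, D) divided by the
# ideal performance k * P(R, D). A value of 1.0 means linear scaling.
def scalability(perf_base, perf_scaled, k):
    return perf_scaled / (k * perf_base)

# Hypothetical measurements: 1 server handles 1,000 requests/second,
# while 4 servers together handle 3,200 requests/second.
print(scalability(perf_base=1_000, perf_scaled=3_200, k=4))  # 0.8
```

A result of 0.8 says the system delivers 80% of the performance that perfect linear scaling would predict.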
Vertical vs horizontal scaling
Scalability does not always mean adding additional servers (horizontal scaling or scale-out). Resources can be added directly to individual servers (vertical scaling or scale-up). For example, doubling the amount of RAM or disk space on a server can often increase performance.
Like all design decisions, there is a trade-off between these two approaches. Adding additional servers (horizontal scaling) may increase complexity and communication overheads. Upgrading an existing server (vertical scaling) may be cheaper or easier in the short term, but it perpetuates a single point of failure, and there is a limit to how far a single machine can be upgraded.
Building for scalability
It is not easy to build systems that scale: doing so requires careful planning and design, and this effort may slow down the implementation of end-user functionality.
However, the effort involved in scaling may never be needed: the system might never become popular enough to require more than a single server.
AKF Partners [3] have a D-I-D principle that provides a useful guideline for thinking about scaling:
- Design: design your system for 20× the demand you currently need to serve.
- Implement: implement your system for 3× the demand you currently need to serve.
- Deploy: deploy your system on equipment that can handle 1.5× the current demand.
This guideline offers a balance between delivering important functionality and investing in scalability for future growth. On one hand, it is important to consider scalability during design and implementation; on the other, that investment should not be excessive.
If you’re expecting to have 100 users in the short term, then design for 2000 users. Do not go overboard: on a small site there is no need to copy the architecture that Facebook, Google and Twitter use to support billions of users globally.
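To make the arithmetic concrete, here is a trivial sketch of the D-I-D targets. The multipliers come from the guideline above; the demand figure is the hypothetical 100-user example:

```python
# D-I-D targets: design for 20x, implement for 3x, deploy at 1.5x demand.
def did_targets(current_demand):
    return {
        "design": 20 * current_demand,
        "implement": 3 * current_demand,
        "deploy": 1.5 * current_demand,
    }

print(did_targets(100))  # {'design': 2000, 'implement': 300, 'deploy': 150.0}
```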
Scaling a system
In the remainder of this chapter, I will discuss how to achieve scalability and high performance in distributed systems.
Scaling is an iterative process that alternates between measurement and implementation.
Benchmarking is used to identify performance issues and to confirm that any changes improve scalability.
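As a minimal illustration of the measurement side, Python’s standard timeit module can estimate throughput. Here, handle_request is a hypothetical stand-in for real request handling:

```python
import timeit

def handle_request():
    # Placeholder for real work (database queries, rendering, etc.).
    sum(range(10_000))

runs = 1_000
seconds = timeit.timeit(handle_request, number=runs)
print(f"{runs / seconds:,.0f} requests/second")
```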
Implementation techniques to improve scalability fall into three main categories:
- Cache: reusing the results of earlier processing (see the sketch after this list)
- Replicate: duplicating data and services
- Partition: dividing or specializing data and services
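As a taste of the first category, here is a minimal caching sketch using Python’s built-in functools.lru_cache decorator; render_page is a hypothetical stand-in for an expensive operation:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def render_page(page_id):
    # Expensive work (database queries, templating) would happen here.
    return f"<html>page {page_id}</html>"

render_page(7)  # computed on the first request
render_page(7)  # repeat requests reuse the earlier result
```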