🕸️Understanding practical use cases
Applications built today are expected to be scalable by default, so every developer should understand why scalability matters in software.
Scalability matters even more from a general user's perspective; users just don't describe it in technical terms. Usually, you hear them say:
The screen has been showing a loading symbol for a long time!!!
Within a blink of an eye, the wicket was gone and we lost the match. Looks like the match was frozen for a moment and then it was already over…
I wanted to buy the new phone released in last month's flash sale. But I had to wait a long time only to know that my app had crashed 😂
As engineers, we can read the customer's expectations in these three sentiments: they want your app to work whenever they open it. Neat and simple.
Handling huge user traffic, serving streaming content without interruption, and giving users a seamless purchase experience during peak sales periods all point to one very important concept: scalability.
The number of users is increasing day by day in the online world. We as developers should accept the ground reality and try to design the system in a highly available and scalable manner.
🙋🏽♂️ What is Scalability?
Scalability, in simple words, can be defined as the ability of a system to handle an increasing workload without affecting its performance.
Part one of the definition means that the system you are building should be able to handle a dynamic workload. It should not get stuck when the workload increases.
The second part of the definition concerns the performance of the system. When the workload becomes heavy, the system should perform as efficiently as it did when the load was light.
Software scalability is needed in most applications. Some common examples are listed below:
Increase in the number of users for your web app (in general terms)
Increase in the number of API requests when a flash sale is happening
Increase in the number of financial transactions per second
Increase in the amount of data transferred
In all the above-mentioned situations, your system should be able to handle the increase in workload without affecting the performance of the system.
🎯How to achieve Scalability?
Let’s imagine we are building an app that sells Ice Cream. I mean why not?
An exclusive app to sell just Ice Cream!!!
Let’s try to understand the requirements first and try to follow the scalable design notion.
Scalability in software can be achieved in several ways. The main ones are as follows:
🏗️ Distributed Architecture
Logically separate the responsibilities and split the system into distributed parts that cooperate to do the work. For example, develop separate microservices for user management, order processing, and payment handling, so that each service can scale independently based on demand.
Let's say the number of users signing up is increasing at the moment. In that case, we only have to scale the user management service to handle the traffic. If a flash sale is happening, the order management service has to be scaled on demand. This dynamic handling of workload is achievable by setting up scaling policies in your organization.
Apart from this, a distributed architecture has other advantages, such as failure isolation and easier development across large teams of developers. Splitting the system into multiple services (microservices) also introduces a lot of complexity, such as sharding, partitioning, data replication, handling message queues, and failure management. These are the usual pitfalls that software architects have to take into consideration while designing large-scale distributed systems.
⚖️ Load balancing
Your system should be able to balance the workload (set of tasks) among the available resources efficiently. Distribute the traffic evenly among the server instances and make sure no single instance becomes a bottleneck.
In practice, this means implementing a load balancer that evenly distributes incoming requests across the multiple server instances that run the different microservices in your distributed system.
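To make the idea of even distribution concrete, here is a minimal sketch of round-robin selection, one of the simplest balancing strategies. The class name and the instance names (`app-1` and so on) are made up for illustration; real load balancers also track instance health and current load.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend servers in a fixed rotating order."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._rotation = cycle(self._servers)

    def next_server(self):
        # Each call returns the next server in the rotation.
        return next(self._rotation)


# Hypothetical instance names for the ice cream app.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.next_server() for _ in range(6)]
print(assignments)
# ['app-1', 'app-2', 'app-3', 'app-1', 'app-2', 'app-3']
```

Six requests land twice on each of the three instances, which is the "evenly distributed" behavior described above.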
🔝🔛 Scaling Methods
The goal is to satisfy user demand at any given point in time. This can be done by vertical or horizontal scaling. Vertical scaling means growing the size of a server in the system, and there is always a hardware limit on how big a single server can grow. Horizontal scaling means adding more servers to the system.
Apps decide which scaling approach to follow on a case-by-case basis. Let's say our ice cream app has users around the globe (since we don't want to count the minions 😂), so we go ahead with horizontal scaling.
🤖 Autoscaling
Configure autoscaling for server instances or containers based on CPU and memory usage metrics, and ensure that resources scale up during peak ordering times. Most cloud providers today offer autoscaling as a fully managed service. The configuration options range from system-level metrics such as CPU and memory usage to business metrics such as click rates, time spent on checkout, and so on.
Scaling your web server instances is one side of the coin. Scaling DB instances is the other. Database scaling can be achieved by adding read replicas alongside the write instances. Indexing the required data and optimizing the DB queries also helps improve the performance of the system at a larger scale.
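As a rough illustration of a CPU-based policy, the sketch below uses a proportional rule: scale the instance count by the ratio of observed CPU to target CPU, then clamp it to a configured range. The Kubernetes Horizontal Pod Autoscaler documents a rule of the same shape; the function name, parameters, and limits here are assumptions for the example, not any provider's API.

```python
import math


def desired_instances(current, cpu_percent, target_cpu_percent,
                      min_instances=1, max_instances=10):
    """Return the instance count that should bring average CPU near the target."""
    if current < 1 or target_cpu_percent <= 0:
        raise ValueError("current must be >= 1 and target must be positive")
    # Proportional rule: more load per instance means more instances.
    raw = math.ceil(current * cpu_percent / target_cpu_percent)
    # Never go below the floor or above the ceiling.
    return max(min_instances, min(max_instances, raw))


print(desired_instances(4, cpu_percent=90, target_cpu_percent=60))  # 6
print(desired_instances(4, cpu_percent=30, target_cpu_percent=60))  # 2
print(desired_instances(8, cpu_percent=95, target_cpu_percent=50))  # 10 (capped)
```

The cap matters during a flash sale: it keeps a sudden spike from launching an unbounded number of instances.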
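One way to picture read replicas is a small router that sends writes to the primary and spreads reads across the replicas. This is only a sketch under simple assumptions: the host names are hypothetical, and classifying a query by its first keyword is far cruder than what a real database driver or proxy does.

```python
from itertools import cycle


class ReplicaRouter:
    """Sends writes to the primary and spreads reads across replicas."""

    def __init__(self, primary, replicas):
        self._primary = primary
        # Fall back to the primary if no replicas are configured.
        self._readers = cycle(replicas or [primary])

    def route(self, sql):
        # Crude classification: only plain SELECT statements count as reads.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._readers) if is_read else self._primary


# Hypothetical host names.
router = ReplicaRouter("db-primary", ["db-replica-1", "db-replica-2"])
print(router.route("SELECT * FROM menu"))                # db-replica-1
print(router.route("INSERT INTO orders VALUES (1)"))     # db-primary
print(router.route("select * from users WHERE id = 7"))  # db-replica-2
```

Keep in mind that replicas are often updated asynchronously, so a read issued right after a write may briefly return stale data.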
🌵 Caching
Caching is an important part of the system because it reduces the load on the server for frequently accessed data. For example, implement Redis caching for frequently accessed data such as user sessions and ice cream menu items.
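The usual pattern here is cache-aside: look in the cache first, and only on a miss read from the database and store the result with an expiry. The sketch below uses a small in-memory class as a stand-in for Redis so that it runs without any server; `TTLCache`, `MenuRepository`, and `get_menu` are illustrative names, not part of any library.

```python
import time


class TTLCache:
    """In-memory stand-in for a cache such as Redis, with per-key expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Entry is too old: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, self._clock() + self._ttl)


class MenuRepository:
    """Pretend database that counts how often it is queried."""

    def __init__(self):
        self.reads = 0

    def load_menu(self):
        self.reads += 1
        return ["vanilla", "chocolate", "mango"]


def get_menu(cache, repository):
    menu = cache.get("menu")
    if menu is None:
        # Cache miss: read from the database and remember the result.
        menu = repository.load_menu()
        cache.set("menu", menu)
    return menu


cache = TTLCache(ttl_seconds=60)
repository = MenuRepository()
get_menu(cache, repository)
get_menu(cache, repository)
print(repository.reads)  # 1, the second call was served from the cache
```

The expiry is the trade-off knob: a longer TTL saves more database reads but lets menu changes take longer to show up.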
🏋🏽♂️ Load Testing
Larger organizations usually put the application through load testing. They conduct periodic load testing trial runs by simulating the high traffic they expect to receive during peak hours. All the metrics should be captured and monitored properly.
Based on these tests, we can identify the bottlenecks in our systems, such as slow API endpoints, inefficient DB queries, cache misses, and so on. This helps us prepare for future growth.
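A load test boils down to sending many concurrent requests and looking at the latency distribution rather than the average. The sketch below does this against a fake endpoint so that it runs anywhere; `fake_order_endpoint` is a stand-in that sleeps instead of making an HTTP call, and the request counts are arbitrary. Dedicated load testing tools do much more, but the measurements have the same shape.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def fake_order_endpoint(order_id):
    """Stand-in for a real HTTP call; every 10th request is slow."""
    time.sleep(0.02 if order_id % 10 == 0 else 0.002)
    return 200


def timed_call(order_id):
    start = time.perf_counter()
    status = fake_order_endpoint(order_id)
    return status, time.perf_counter() - start


def run_load_test(total_requests, concurrency):
    # Fire the requests from a pool of worker threads to simulate
    # many users hitting the endpoint at the same time.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    latencies = [latency for _, latency in results]
    cuts = quantiles(latencies, n=100)  # 99 percentile cut points
    return {
        "requests": len(results),
        "errors": sum(1 for status, _ in results if status != 200),
        "p50_ms": cuts[49] * 1000,
        "p95_ms": cuts[94] * 1000,
    }


report = run_load_test(total_requests=200, concurrency=20)
print(report)
```

A large gap between `p50_ms` and `p95_ms` is the kind of signal that points at a slow endpoint or query worth investigating.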
🎰 Continuous Improvement
Finally, there should be continuous improvement. Monitor the app closely with logging and metrics tools and adjust the setup as needs change. Regularly review system performance and scalability requirements as the user base of your application expands.
These monitoring tools help you track server response times, API latency, server resource consumption, DB performance, and error rates in your system. Apart from this, keep the technology stack up to date and consider incorporating new practices and effective tools that can improve the scalability of your application.
🍦A flow through the ice cream ordering app
Let’s try to keep the design as simple as possible and try to imagine a design flow for the ice cream ordering application.
User Interface: The place where your customers can place orders. It can be a website or mobile app.
Backend Server: Handles order processing, user authentication, and communication with the database.
Database: Stores user profiles, orders, inventory, and so on.
Message Queues: Handles async heavy tasks in both pre and post-ordering processes.
External Payment Gateway: Integrates with third-party payment services such as Stripe, PayPal, or another payment gateway.
Now that we have a basic understanding of the components in the system, let's try to understand the step-by-step flow of this app.
The Ice Cream Ordering Client Application is at the top. This is the UI part which is available to the customer where they can place their orders.
A Content Delivery Network (CDN) serves static assets like images of the ice cream available on our menu.
The next part is the Load Balancer. It distributes incoming traffic among the multiple available application servers. It also helps us to route the traffic and direct the requests to the respective service in your system.
For example, if a request comes in to create a new order, it is directed to the order processing microservice.
If another request comes in for payment, it is handed over to the payment service.
The most important thing to understand at this time is scaling. Here each service can be independently scaled depending on the load at the current moment.
The application servers interact with a database cluster that includes a distributed database like Cassandra for storing the order data and a simple relational database with read replicas for storing and retrieving the user profiles.
The message queues in the system handle the asynchronous tasks that run both before and after an order is placed.
These include tasks such as updating inventory, initiating the delivery process, sending notification emails, and so on.
Apart from this, if there are any heavy, time-consuming tasks to be performed in your system, event-based message queues are a good way to handle them.
There are also cache servers built into the system to help reduce the load on the application servers. Frequently accessed data is returned directly from the cache servers.
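The message queue step in the flow above can be sketched with Python's standard `queue` module. This is an in-process stand-in for a real message broker, and the task names and `place_order` function are hypothetical; the point is only that the request path enqueues work and returns, while a separate worker does the slow part.

```python
import queue
import threading

tasks = queue.Queue()
completed = []


def worker():
    """Background worker that processes tasks one at a time, in order."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel value: time to stop
            tasks.task_done()
            break
        kind, order_id = task
        completed.append(f"{kind} done for order {order_id}")
        tasks.task_done()


def place_order(order_id):
    # The request path only enqueues the slow follow-up work
    # and responds to the customer right away.
    tasks.put(("update_inventory", order_id))
    tasks.put(("send_email", order_id))
    return {"order_id": order_id, "status": "accepted"}


thread = threading.Thread(target=worker, daemon=True)
thread.start()

response = place_order(42)
tasks.join()     # wait until the queued tasks are processed
tasks.put(None)  # ask the worker to stop
thread.join()

print(response)   # {'order_id': 42, 'status': 'accepted'}
print(completed)  # ['update_inventory done for order 42', 'send_email done for order 42']
```

Because the customer gets a response before the inventory update and email are finished, a burst of orders lengthens the queue instead of slowing down the ordering screen.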
Please note that this is a high-level representation, and in a real-world scenario, there may be additional components, security systems, logging and monitoring tools, and more detailed interactions among different components of your system.
🤔What happens if there is no scalability?
If a system lacks scalability, it can lead to several significant issues and limitations, including:
Performance Degradation: Without scalability, as the user load or data volume increases, the system's performance may degrade significantly. Users may experience slow response times, timeouts, and a poor overall user experience.
Limited Growth Potential: Without scalability, the system's capacity is limited by its initial design and infrastructure. It becomes challenging to accommodate a growing user base, add new features, or adapt to changing business needs.
Resource Constraints: A non-scalable system may struggle to handle spikes in traffic or increased demand. This can lead to resource constraints, such as overutilized CPU and memory, causing system crashes or downtime.
Competitiveness: In a competitive environment, businesses that cannot scale their systems may struggle to meet customer expectations, innovate quickly, or expand their market share, ultimately putting them at a disadvantage.
In summary, a lack of scalability can lead to poor performance, resource limitations, stunted business growth, increased operating costs, and eventually going out of business.