Efficient Scaling

Abhilash Ranjan
5 min read · Sep 7, 2020

Scaling is one of the most common and most fascinating problems in web applications. Most successful applications have to deal with it: the load grows too high, the application bogs down and stops responding, and a load balancer or monitoring tools do not help, because they alert you about the load without solving the problem. My dear friend, it is time to look at proper scaling options.

What Is Application Scalability?

Pages load slowly, network connections time out frequently, the client browser hangs or takes very long to respond, and the server crawls while handling requests and responses. These are a few of the problems we face when a web app is under load. Application scalability is the ability to efficiently handle more and more requests per minute (RPM). It cannot be achieved just by tweaking the vertical and horizontal scaling of the application server or cloud.

If you respond to a problem by increasing the memory or CPU of a machine, you get higher throughput, not a more scalable application. To understand scalability, you first have to understand your own application, and only then look for tools.

Why Is Application Scalability Required?

Now that we understand what scalability is, we can think about why it matters for an application. We understand the problem, but how does it affect the developer? Here are a few of the problems:

  • Adding new features takes longer to develop
  • Code can be harder to test
  • Finding and fixing bugs is more frustrating
  • Reproducing production environment issues is more difficult.

All of the above problems make your life complex, to the point that you are afraid to touch your own code. So the basic principle is: don't introduce complexity or over-engineer the solution.

Efficient Scaling

At a very high level, we have to remove complexity from the code and its behavior, and split up some of the application's request-response handling.

Clean Code

Write correct, clean code, and use design patterns for their practical usefulness, not for the sake of learning them. A very common example is choosing between the strategy pattern and the factory pattern, or between the facade pattern and the factory pattern.

Microservice 12 Factor

Leverage the twelve-factor methodology, which lets you follow most of the practices defined for microservices and cloud migration.


As of now, most of us think of caching as the silver bullet for scalability and higher throughput. In my view it is not: you cannot cache all frequently accessed data, but you can cache some of it, such as frequently used keywords. If you are using AWS, try CloudFront or ElastiCache.


By choosing a proper DB engine and designing the most robust schema possible, you are able to handle increasing transactions per second efficiently.

Database Index

If your table has tens of thousands of rows, indexing a column could shave a noticeable amount of time off any query that uses that column.

As a very simple example, if your application has profile pages that look up a user by their handle or username, an un-indexed query would examine every single row in the users table, looking for the ones where the “handle” column matched the handle in the URL.

By simply adding an index to that table for the “handle” column, the database could pull out that row immediately without requiring a full table scan.
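The effect of the index can be observed with a small experiment. The following is a minimal sketch using Python's built-in sqlite3 module. The table layout, the index name idx_users_handle, and the sample data are assumptions made for illustration and are not taken from any particular application.

```python
import sqlite3

# In-memory database with a hypothetical "users" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, handle TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (handle, name) VALUES (?, ?)",
    [(f"user{i}", f"User {i}") for i in range(50_000)],
)


def query_plan(connection, handle):
    """Return SQLite's description of how it would run the lookup."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM users WHERE handle = ?", (handle,)
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in rows)


# Without an index, the plan is a scan over the whole table.
plan_before = query_plan(conn, "user4242")

# Add an index on the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_users_handle ON users (handle)")

# With the index, the plan becomes a search that uses idx_users_handle.
plan_after = query_plan(conn, "user4242")

print(plan_before)
print(plan_after)
```

The exact wording of the plan differs between SQLite versions, but the first plan should describe a scan and the second should mention the index. Other database engines offer a similar EXPLAIN facility.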

Cache DB Query

There are usually a few common queries that make up the majority of the load on your database.

Most databases support query logging, and there are many tools that will ingest those logs and run some analysis to tell you what queries are run most frequently, and what queries tend to take the longest to complete.

Simply cache the responses to frequent or slow queries so they live in memory on the web server and don’t require a round-trip over the network or any extra load on the database.

In this case you can refresh the cache in the background so that the application serves less stale data.
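As a rough illustration of caching query results in the web server's memory, here is a minimal sketch of a cache with a time-to-live. The class name QueryCache, the load_top_posts function, and the 60-second lifetime are all hypothetical. A real application would more likely rely on an existing caching library or an external cache.

```python
import time


class QueryCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        """Return the cached value, calling loader() only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop an entry, for example after the underlying rows change."""
        self._entries.pop(key, None)


calls = {"count": 0}


def load_top_posts():
    # Stand-in for a frequent or slow database query (hypothetical).
    calls["count"] += 1
    return ["post-1", "post-2"]


cache = QueryCache(ttl_seconds=60)
first = cache.get_or_load("top_posts", load_top_posts)
second = cache.get_or_load("top_posts", load_top_posts)  # served from memory
```

The second call never reaches load_top_posts, so the database is spared one round trip. The trade-off is staleness: a shorter lifetime or an explicit invalidate call keeps the data fresher at the cost of more queries.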

Run Computations or Long-Running Queries Offline

If you have some long-running queries or complex business logic that takes several seconds to run, you probably shouldn’t be running it in the request-response cycle during a page load.

Instead, make it “offline” and have a pool of workers that can chug away at it and put the results in a database or in-memory cache.

Then when the page loads, your web server can simply and quickly pull the precomputed data out of the cache and show it to the user.

A drawback here is that the data you’re showing the user is no longer “real time,” but having data that’s a few minutes old is often good enough for many use-cases.
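The idea can be sketched in a few lines with a thread pool from Python's standard library. The names expensive_report, refresh_reports, and handle_page_request are hypothetical, and the in-process dictionary only stands in for a database or in-memory cache. A real deployment would typically run the workers as separate processes fed by a job queue.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a shared cache or database table holding precomputed results.
precomputed = {}


def expensive_report(user_id):
    """Placeholder for business logic that would take seconds to run."""
    return {"user_id": user_id, "total": sum(range(user_id * 1000))}


def refresh_reports(user_ids, max_workers=4):
    """Run outside the request-response cycle, e.g. on a schedule."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as user_ids.
        for user_id, report in zip(user_ids, pool.map(expensive_report, user_ids)):
            precomputed[user_id] = report


def handle_page_request(user_id):
    """Fast path during page load: only a lookup, no computation."""
    return precomputed.get(user_id)


refresh_reports([1, 2, 3])
print(handle_page_request(2))
```

A request for a user whose report has not been computed yet returns None in this sketch, so the page needs a fallback such as a placeholder or a "still processing" message.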

HTTP Caching

Just like you want to cache database queries to avoid regenerating answers you already know, you should avoid having the browser ask for content that it has already downloaded.

You should use HTTP caching headers for all of your static files: CSS, JavaScript, and images.
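As a small sketch of what such headers look like, the functions below build Cache-Control and ETag headers for a static file and answer a revalidation request with 304 Not Modified. The function names and the one-day max-age are assumptions chosen for illustration. In practice the web server, framework, or CDN usually sets these headers for you.

```python
import hashlib


def static_file_headers(body: bytes, max_age_seconds: int = 86400) -> dict:
    """Build caching headers for a static asset."""
    # Derive the ETag from the content, so it changes whenever the file changes.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}",
        "ETag": etag,
    }


def respond(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a request for a static file."""
    headers = static_file_headers(body)
    if if_none_match == headers["ETag"]:
        # The browser already holds this exact version: send no body.
        return 304, headers, b""
    return 200, headers, body


css = b"body { margin: 0; }"
status, headers, payload = respond(css)
# A later request sends the ETag back in the If-None-Match header.
status_again, _, payload_again = respond(css, if_none_match=headers["ETag"])
```

While the max-age has not elapsed, the browser does not contact the server at all. After that, the ETag lets it revalidate cheaply instead of downloading the file again.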

Load Balancing

Try moving to a cloud provider such as AWS, which offers multi-host deployment and load balancing features such as Route 53 and ELB, for better management of application load in cloud infrastructure.


Even if our codebase is clean and perfectly maintainable, we need tools to monitor it and identify problems as soon as possible. Try using a log aggregator such as Splunk or, on AWS, CloudWatch, which can alert you on application heartbeat and latency.

To conclude, I would say: don't introduce unwanted designs and modules just for the sake of learning something complex. Try enhancing a simple design with proper scaling and throughput.