Scalable & Highly Available Web & Mobile App Architecture

Ahmed Mahouachi · Published in The Startup · Nov 6, 2020 · 6 min read

This is a quick overview of how to architect a web + mobile application so that it is scalable and highly available. We will use AWS cloud technologies to implement an architecture that achieves these targets.

What is High Availability?

A highly available application is one that continues to function properly when one or more of its components fail. It has no single point of failure: when one component fails, the application can still deliver correct results.

What is scalability?

Scalability is the ability of an application to keep fulfilling its functions properly as its load increases. For an HTTP API, it concretely means the ability of the API to respond correctly, and in reasonable time, to all requests as the number of requests per second grows.

Typical Application Architecture

Modern applications typically consist of one or more front-end clients used by customers (one or more native mobile apps plus a JavaScript web app) talking to a single back-end through an HTTP API. The back-end stores data in a database and responds to the requests coming from the front-end clients.

Design for High Availability

Single points of failure

The first step is to identify where a failure can compromise the availability of the application. All of the following are single points of failure:

  1. The locations where the front-end clients are stored for distribution. If a location becomes unavailable, customers cannot download or access the corresponding client, and thus cannot use the application.
  2. The HTTP API back-end component. If this component fails, requests sent by front-end clients will not be fulfilled.
  3. The database. If the database fails, the back-end will not be able to read stored data or write data in response to API requests sent by clients.

HA for front-end clients distribution locations

The locations where the front-end clients are stored for distribution depend on the target platform. For Android and iOS clients, these locations are typically Google Play and the Apple App Store, or mobile app stores in general. The high availability of these stores is handled by Google, Apple and the other store owners, and we can’t do much about it.
For the web client, we can store it in AWS S3 and distribute it with AWS CloudFront, which makes it not only highly available but also scalable, as we will see later. This setup is very common today. Here is a step-by-step tutorial on how to achieve it.
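As a minimal sketch, deploying the built web client could look like this with the AWS SDK for JavaScript (the bucket name and file path are hypothetical); CloudFront is then configured to serve the bucket’s content from its edge locations:

    const AWS = require('aws-sdk');
    const fs = require('fs');

    const s3 = new AWS.S3();

    // Upload one built file of the web client to the distribution bucket.
    async function deployIndexHtml() {
      await s3.putObject({
        Bucket: 'my-app-web-client',            // hypothetical bucket behind CloudFront
        Key: 'index.html',
        Body: fs.readFileSync('build/index.html'),
        ContentType: 'text/html',
      }).promise();
    }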

HA for Back-end

The back-end API component needs to be up and running to respond to any request sent by front-end clients. The basic setup consists of running one instance of a Node.js Express server that fulfills HTTP requests. But if that instance goes down for whatever reason, the application is no longer available. One approach is to launch multiple EC2 instances hosting your servers across multiple Availability Zones / Regions, then use Amazon Elastic Load Balancer to distribute the incoming requests to the healthy instances. Amazon ELB performs health checks automatically, so if it finds that an instance is not responding, it stops forwarding requests to it. Using Amazon ELB has another advantage: it can handle the SSL/HTTPS connection management for us, so that our servers receive plain HTTP traffic. This is a big advantage given the computational cost of managing SSL connections, and since SSL usage is increasingly enforced by web browsers and platforms, it comes in really handy.
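As a rough sketch, the Express server behind the load balancer could look like the following. The /health path and the port are assumptions for illustration: the server listens on plain HTTP because the ELB terminates SSL, and the ELB probes the health endpoint to decide whether the instance should keep receiving traffic:

    const express = require('express');
    const app = express();

    // Health check target for the load balancer: return 200 only when
    // this instance is able to serve traffic.
    app.get('/health', (req, res) => res.status(200).send('OK'));

    // A regular API endpoint.
    app.get('/api/items', (req, res) => {
      res.json([{ id: 1, name: 'example' }]);
    });

    // Plain HTTP: HTTPS termination is handled by the load balancer.
    app.listen(3000, () => console.log('API listening on port 3000'));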

HA for Database

The typical way to ensure high availability of a database is to keep replicas in different Availability Zones that mirror the master database. When the master becomes unavailable, one of the replicas takes over the master role. Replication can be done the old way, by setting up multiple EC2 instances, each hosting a database replica, and managing replication and failover yourself. Or you can use Amazon RDS, which manages the database server for you and takes care of maintenance, upgrades, replication and failover. Note that Amazon RDS is for relational database servers; there are also AWS offerings for NoSQL databases.
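From the application’s point of view, a nice property of this RDS setup is that the endpoint is a stable DNS name that keeps pointing at the current master, so a failover requires no code change. A minimal sketch with the mysql2 module (endpoint and credentials are hypothetical):

    const mysql = require('mysql2/promise');

    // The RDS endpoint keeps resolving to the current master, even after a failover.
    const pool = mysql.createPool({
      host: 'mydb.xxxxxxxx.eu-west-1.rds.amazonaws.com', // hypothetical endpoint
      user: 'app',
      password: process.env.DB_PASSWORD,
      database: 'appdb',
      waitForConnections: true,
      connectionLimit: 10,
    });

    async function getUser(id) {
      const [rows] = await pool.query('SELECT * FROM users WHERE id = ?', [id]);
      return rows[0];
    }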

Design for Scalability

Now we know how to make our application highly available. But what about scalability? How do we make sure our application can cope with traffic peaks and still function properly under heavy load?

Front-end clients distribution

For front-end client distribution, mobile app stores and Amazon CloudFront are designed to be highly scalable, so there is no need to worry about that point.

Backend API

For the back-end part, Amazon EC2 Auto Scaling can be leveraged to automatically scale your Node.js servers when required, and Amazon Elastic Load Balancer works well with EC2 Auto Scaling. Here is a tutorial on how to achieve it.
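As one hedged sketch, a target tracking policy can be attached to an existing Auto Scaling group with the AWS SDK for JavaScript (the group and policy names are hypothetical); EC2 Auto Scaling then adds or removes instances to keep the average CPU around the target:

    const AWS = require('aws-sdk');
    const autoscaling = new AWS.AutoScaling();

    // Keep the group's average CPU utilization around 50%.
    autoscaling.putScalingPolicy({
      AutoScalingGroupName: 'api-servers',   // hypothetical group name
      PolicyName: 'keep-cpu-at-50',
      PolicyType: 'TargetTrackingScaling',
      TargetTrackingConfiguration: {
        PredefinedMetricSpecification: { PredefinedMetricType: 'ASGAverageCPUUtilization' },
        TargetValue: 50,
      },
    }).promise();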

Another way to achieve scalability is to use Amazon API Gateway in combination with AWS Lambda. The first lets you define endpoints for your API; the second lets you execute functions without managing any server. This is called serverless computing. Express application servers can easily be adapted to run as a Lambda function using the serverless-http npm module, and the function is triggered whenever front-end clients fire HTTPS requests to your defined API endpoints. Amazon API Gateway is highly available and scalable, and you can use your own domains and subdomains to expose it. Amazon API Gateway + AWS Lambda can be used as a replacement for Amazon Elastic Load Balancer + Amazon EC2 Auto Scaling + EC2, with less administration overhead. It should also cost less: it costs nothing when you have no or low traffic.
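Here is a minimal sketch of wrapping an Express app with serverless-http so API Gateway can invoke it as a Lambda function:

    const serverless = require('serverless-http');
    const express = require('express');

    const app = express();
    app.get('/hello', (req, res) => res.json({ message: 'Hello from Lambda' }));

    // API Gateway invokes this handler; serverless-http translates the event
    // into a regular HTTP request that Express understands.
    module.exports.handler = serverless(app);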

Database

Coming to the database, it is not as easy to scale as computing, since most databases support a limited number of open connections, depending on the database server and the memory available on the underlying machine.

The first step in scaling the database layer is to use a pooling mechanism that can recycle connections and manage them efficiently. Amazon RDS Proxy provides this pooling mechanism for serverless applications that use AWS Lambda. But even though it improves and optimizes connection management to your RDS instance, the proxy is not sufficient under heavy load: once the pool is saturated by a high number of concurrent requests, the remaining requests will be delayed and will probably time out.
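In practice, the Lambda code connects to the proxy endpoint just as it would connect to the database itself. A sketch (the proxy endpoint is hypothetical), with the connection created outside the handler so that warm invocations reuse it while the proxy multiplexes the many Lambda connections onto a small pool against the actual database:

    const mysql = require('mysql2/promise');

    // Created once per container: warm Lambda invocations reuse this connection.
    const connectionPromise = mysql.createConnection({
      host: 'myproxy.proxy-xxxxxxxx.eu-west-1.rds.amazonaws.com', // hypothetical proxy endpoint
      user: 'app',
      password: process.env.DB_PASSWORD,
      database: 'appdb',
    });

    exports.handler = async () => {
      const conn = await connectionPromise;
      const [rows] = await conn.query('SELECT COUNT(*) AS n FROM orders');
      return { statusCode: 200, body: JSON.stringify(rows[0]) };
    };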

The second step is to use a memory cache like Redis or the equivalent AWS offering, Amazon ElastiCache. Memory caches are incredibly fast and have very low latency. They also have much lighter connection management and support a much higher number of simultaneous connections. You will need to implement the data access methods in your application so that they look for data in the cache first, and only retrieve it from the database if it is not available or is outdated. Obviously, this mainly applies to read operations; write operations should go to the database for consistency. One way to scale write operations is to handle them asynchronously. An example implementation is to send write commands to an Amazon SQS queue and have them executed by another Lambda function. This way, write operations and database connections happen in a predictable manner.
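A sketch of both paths, assuming ioredis for the cache, mysql2 for the database and an SQS queue whose URL comes from the environment (all names are hypothetical):

    const Redis = require('ioredis');
    const AWS = require('aws-sdk');
    const mysql = require('mysql2/promise');

    // Hypothetical endpoints; replace with your own cache, queue and database.
    const redis = new Redis({ host: process.env.CACHE_HOST });
    const sqs = new AWS.SQS();
    const pool = mysql.createPool({
      host: process.env.DB_HOST,
      user: 'app',
      password: process.env.DB_PASSWORD,
      database: 'appdb',
    });

    // Read path: try the cache first, fall back to the database, then populate the cache.
    async function getProduct(id) {
      const cached = await redis.get(`product:${id}`);
      if (cached) return JSON.parse(cached);

      const [rows] = await pool.query('SELECT * FROM products WHERE id = ?', [id]);
      await redis.set(`product:${id}`, JSON.stringify(rows[0]), 'EX', 300); // 5-minute TTL
      return rows[0];
    }

    // Write path: enqueue the command; a separate Lambda consumes the queue
    // and performs the actual database write at a predictable rate.
    async function updateProduct(id, fields) {
      await sqs.sendMessage({
        QueueUrl: process.env.WRITE_QUEUE_URL,
        MessageBody: JSON.stringify({ op: 'updateProduct', id, fields }),
      }).promise();
    }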

Please note as well that to have a highly available memory cache setup, you need to either use multiple instances of Redis with sentinels or use Amazon ElastiCache with replicas.
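With ioredis, for example, pointing the client at the sentinels rather than at a single Redis node lets it follow the master after a failover (sentinel addresses and the master group name are hypothetical):

    const Redis = require('ioredis');

    const redis = new Redis({
      sentinels: [
        { host: 'sentinel-1.internal', port: 26379 },
        { host: 'sentinel-2.internal', port: 26379 },
      ],
      name: 'mymaster', // the master group the sentinels monitor
    });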

Client Side Caching

Implementing caching on the front-end client side can be beneficial in two ways:

  1. It reduces the load on the back-end by serving data that does not change frequently from a local cache.
  2. It provides a better overall user experience by allowing users to still use some parts of your application when the back-end is not reachable. This typically happens when a mobile user has no internet connection, and users appreciate being able to use applications offline.

A simple cache can be implemented as an array of (key, value, ttl) entries with three simple methods, sketched right after the list:

  • set(key, value[, ttl]): store or update an object in the cache, indexed by key, with an optional time to live
  • get(key): get the object indexed by key if it exists and has not expired
  • getCacheEntry(key): get the object indexed by key even if it has expired. The result could be something like { object: …, expired: true|false }
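A minimal in-memory JavaScript sketch of such a cache (TTL given in milliseconds):

    class SimpleCache {
      constructor() {
        this.entries = new Map(); // key -> { object, expiresAt }
      }

      // Store or update an object, with an optional time to live in milliseconds.
      set(key, value, ttl) {
        const expiresAt = ttl ? Date.now() + ttl : Infinity;
        this.entries.set(key, { object: value, expiresAt });
      }

      // Return the object only if it exists and has not expired.
      get(key) {
        const entry = this.entries.get(key);
        if (!entry || entry.expiresAt < Date.now()) return undefined;
        return entry.object;
      }

      // Return the object even if expired, flagging its freshness.
      getCacheEntry(key) {
        const entry = this.entries.get(key);
        if (!entry) return undefined;
        return { object: entry.object, expired: entry.expiresAt < Date.now() };
      }
    }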

On Android, SharedPreferences can be used to store cache data if it is relatively small. On the web application side, localStorage can be leveraged to achieve the same. A more convenient option is to use localForage, which abstracts the underlying storage APIs and uses the optimal one available.
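For persistence on the web side, the same kind of entries can be stored through localForage, whose setItem/getItem calls return promises and pick the best available backend (IndexedDB, WebSQL or localStorage). A small sketch (here with a required TTL in milliseconds):

    const localforage = require('localforage');

    async function setPersistent(key, value, ttl) {
      // TTL is required here so the entry survives serialization in any backend.
      await localforage.setItem(key, { object: value, expiresAt: Date.now() + ttl });
    }

    async function getPersistent(key) {
      const entry = await localforage.getItem(key);
      if (!entry || entry.expiresAt < Date.now()) return undefined;
      return entry.object;
    }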

Conclusion

I hope this article was a helpful overview. There is no step-by-step tutorial here; this is intended as a quick overview, with a few illustrative sketches, of making your application scalable and highly available. I used most of these concepts to architect couponfog, the coupons app.
