The Anatomy of a Web Application

2022-05-15

Web-Server Layer

This is the “front door” of the application. Its function is to process and respond to HTTP requests as they arrive, handling them synchronously (meaning each request blocks until its response has been completely sent back to the client).

This layer is usually very elastic, since each web-server instance is typically designed to be stateless. This is possible because the REST pattern requires the client to send the full context with every request, which usually means sending the client's identification and authentication token.

This also means that this layer is responsible for authentication (making sure the other party is who they say they are) and authorization (making sure the other party, once authenticated, can only access what they are supposed to access and can only perform the actions they are allowed to perform).

Asynchronous Job Queue

Not everything can be done at request time, or needs to be. Perceived responsiveness on the client side can be improved by offloading some work to run asynchronously with respect to the HTTP request, that is, after the response has been sent and the client has moved on.

Some candidates for job queues: logging and telemetry; long-running processes such as payment processing, image processing, user-submitted code execution, and long-running third-party calls. In short, anything that doesn't contribute to the contents of the response, or whose results can be polled for afterwards.
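The enqueue-then-poll flow can be sketched with an in-process queue and a worker thread. This is only an illustration under stated assumptions: a production system would typically use a separate queuing service and worker processes, and the names `handle_request`, `poll`, `worker` and `make_thumbnail` are hypothetical.

```python
import queue
import threading
from typing import Dict

job_queue: "queue.Queue" = queue.Queue()
results: Dict[str, dict] = {}  # stand-in for wherever job status is stored


def make_thumbnail(filename: str) -> str:
    """Placeholder for slow work such as image processing."""
    return "thumbnail-of-" + filename


def handle_request(job_id: str, filename: str) -> dict:
    """Simulated request handler: enqueue the slow work and respond right away."""
    results[job_id] = {"status": "pending"}
    job_queue.put((job_id, filename))
    return {"status": 202, "job_id": job_id}


def poll(job_id: str) -> dict:
    """Simulated status endpoint the client can call afterwards."""
    return results.get(job_id, {"status": "unknown"})


def worker() -> None:
    """Process jobs until a None sentinel is received."""
    while True:
        item = job_queue.get()
        if item is None:
            job_queue.task_done()
            return
        job_id, filename = item
        try:
            results[job_id] = {"status": "done", "value": make_thumbnail(filename)}
        except Exception as exc:  # record the failure so the client can see it
            results[job_id] = {"status": "failed", "error": str(exc)}
        finally:
            job_queue.task_done()
```

To try it, start `worker` in a `threading.Thread`, call `handle_request`, and then call `poll` with the returned job id. Putting `None` on the queue stops the worker.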

Scheduled Jobs

This is code which is supposed to run either 1) at regular intervals (e.g. every day, every hour, and so on) or 2) a given amount of time after some other event (e.g. a resource that has to expire after a given time).

The first kind is usually enqueued in the same job queue described above, but by a continuously running scheduling process, instead of, say, HTTP requests. The second kind can be enqueued by HTTP requests, if the queuing service supports delayed messages.
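Both kinds of scheduled job can be sketched with a single priority queue ordered by due time. The `Scheduler` class below is a hypothetical toy, not the API of any particular queuing service: it only decides when a job is due and then hands the job name to an `enqueue` callback, which stands in for the job queue described above. Time is passed in explicitly so the behaviour is easy to follow.

```python
import heapq
import itertools
from typing import Callable, List, Optional, Tuple


class Scheduler:
    """Toy scheduler: tracks due times and enqueues jobs when they are due."""

    def __init__(self, enqueue: Callable[[str], None]) -> None:
        self._enqueue = enqueue
        # Entries are (due_time, sequence, job_name, interval_or_None).
        self._heap: List[Tuple[float, int, str, Optional[float]]] = []
        self._sequence = itertools.count()  # tie-breaker for equal due times

    def every(self, interval: float, job_name: str, start: float) -> None:
        """First kind: run at regular intervals, starting one interval after `start`."""
        heapq.heappush(
            self._heap, (start + interval, next(self._sequence), job_name, interval)
        )

    def after(self, delay: float, job_name: str, now: float) -> None:
        """Second kind: run once, a given amount of time after some event."""
        heapq.heappush(
            self._heap, (now + delay, next(self._sequence), job_name, None)
        )

    def tick(self, now: float) -> None:
        """Enqueue every job that is due at `now`; reschedule recurring ones."""
        while self._heap and self._heap[0][0] <= now:
            due, _, job_name, interval = heapq.heappop(self._heap)
            self._enqueue(job_name)
            if interval is not None:
                heapq.heappush(
                    self._heap,
                    (due + interval, next(self._sequence), job_name, interval),
                )
```

A continuously running scheduling process would call `tick` in a loop with the current time. A request handler would call `after` when, say, a resource with an expiry is created.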

Business-Data Persistence Layer

This is usually a relational database (SQL database) or a document database (such as MongoDB). The goal is to store the information that the system must “remember” and to allow its efficient retrieval at request time. This is where all the business data goes.

For a production-ready application, all of this data must be backed up regularly and stored in a redundant manner.

Caching Layer

When some data is accessed very frequently and has low-latency requirements, it might be worth introducing a caching layer, usually powered by Redis. This can store the results of long-running computations, results from recent database queries which might be reused soon, or other data that does not need to be persisted for the long term (e.g. which users are online right now?).
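The usual way to use such a layer is to check the cache first and fall back to the slow source on a miss. The sketch below uses an in-process dictionary with expiry times as a stand-in for a cache server such as Redis. The `TTLCache` class and its `get_or_compute` method are hypothetical names chosen for this example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for a cache server, with a time-to-live per entry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be demonstrated
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        """Return the cached value, or compute, store and return it on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit, still fresh
        value = compute()  # cache miss or expired: go to the slow source
        self._entries[key] = (now + self._ttl, value)
        return value
```

For example, `cache.get_or_compute("user:1", load_user)` would call a hypothetical `load_user` function (say, a database query) only when the entry is missing or older than the configured time-to-live.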

File Storage Layer

While entries in a database are typically at most a few kilobytes, there might be a requirement to let users upload or download files, such as PDFs or images. These are much larger than a typical database entry and usually never change after they are created.

This different access pattern can be accommodated more easily with object storage or blob storage solutions, which offer virtually unlimited storage while keeping the files accessible over the internet.

Putting large files into the database is not advisable, since it makes backups unnecessarily heavy. (Conversely, storing relational or business data in file storage is not advisable, since it doesn't support efficient query patterns or low latency.)
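One common way to combine the two layers is to keep the file bytes in object storage and only a small metadata row, including the object key, in the database. The sketch below assumes exactly that split. `InMemoryObjectStore` is a hypothetical stand-in for a real object storage service, SQLite stands in for the business database, and the `files` table and the function names are invented for this example.

```python
import hashlib
import sqlite3
from typing import Dict, Optional


class InMemoryObjectStore:
    """Stand-in for an object/blob storage service (hypothetical interface)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def create_schema(db: sqlite3.Connection) -> None:
    """Create the metadata table; note that it holds no file contents."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " owner_id TEXT NOT NULL,"
        " filename TEXT NOT NULL,"
        " object_key TEXT NOT NULL,"
        " size_bytes INTEGER NOT NULL,"
        " PRIMARY KEY (owner_id, filename))"
    )


def save_upload(db: sqlite3.Connection, store: InMemoryObjectStore,
                owner_id: str, filename: str, content: bytes) -> str:
    """Write the bytes to object storage and a small row to the database."""
    object_key = hashlib.sha256(content).hexdigest()  # content-derived key
    store.put(object_key, content)
    db.execute(
        "INSERT OR REPLACE INTO files (owner_id, filename, object_key, size_bytes)"
        " VALUES (?, ?, ?, ?)",
        (owner_id, filename, object_key, len(content)),
    )
    db.commit()
    return object_key


def load_upload(db: sqlite3.Connection, store: InMemoryObjectStore,
                owner_id: str, filename: str) -> Optional[bytes]:
    """Look up the object key in the database, then fetch the bytes."""
    row = db.execute(
        "SELECT object_key FROM files WHERE owner_id = ? AND filename = ?",
        (owner_id, filename),
    ).fetchone()
    return None if row is None else store.get(row[0])
```

With this split, the database stays small and queryable (who owns which file, how large it is), while backups of the database do not have to carry the file contents.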

Logging, Telemetry and Alerts

A production-ready web application needs somewhere to send, store and view application logs. These logs need to be correlated with some user-facing identifier, so that system maintainers can look up the relevant logs afterwards without revealing the internals of a system failure to third parties.
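A minimal sketch of that correlation, under stated assumptions: on failure, the handler generates a random reference id, writes the internal details to the log together with that id, and returns only the id to the client. The logger name `webapp`, the `handle_failure` function and the shape of the log entry and response are all hypothetical.

```python
import json
import logging
import uuid
from typing import Callable, Dict

logger = logging.getLogger("webapp")  # hypothetical logger name


def handle_failure(exc: Exception,
                   emit: Callable[[str], None] = logger.error) -> Dict[str, object]:
    """Log the internal details and return a response that exposes none of them."""
    reference_id = uuid.uuid4().hex
    # Internal view: a structured log line that maintainers can query by id.
    emit(json.dumps({
        "reference_id": reference_id,
        "error_type": type(exc).__name__,
        "detail": str(exc),
    }))
    # External view: only a generic message and the reference id.
    return {
        "status": 500,
        "message": "Something went wrong on our side.",
        "reference_id": reference_id,
    }
```

When a user reports the reference id, maintainers can search the logs for it and find the full error, while the response itself reveals nothing about what failed.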

Other structured measurements critical for a healthy system, such as latency, number of requests, and application errors, should also be logged and aggregated, ideally triggering automated alerts so that system maintainers can start mitigating an issue before customers reach out.
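As an illustration of aggregation plus alerting, the sketch below keeps a sliding window of recent requests and fires a callback when the error rate in a full window crosses a threshold. It is a simplified, hypothetical example: the `RequestMonitor` class, its parameters and the alert message are assumptions, and a real deployment would usually delegate this to a dedicated monitoring system.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class RequestMonitor:
    """Aggregates recent requests and alerts when the error rate is too high."""

    def __init__(self, window_size: int, max_error_rate: float,
                 on_alert: Callable[[str], None]) -> None:
        self._window: Deque[Tuple[float, bool]] = deque(maxlen=window_size)
        self._max_error_rate = max_error_rate
        self._on_alert = on_alert
        self._alerting = False  # avoids repeating the alert on every request

    def error_rate(self) -> float:
        if not self._window:
            return 0.0
        errors = sum(1 for _, is_error in self._window if is_error)
        return errors / len(self._window)

    def average_latency_ms(self) -> float:
        if not self._window:
            return 0.0
        return sum(latency for latency, _ in self._window) / len(self._window)

    def record(self, latency_ms: float, is_error: bool) -> None:
        """Record one request; alert once when the threshold is first crossed."""
        self._window.append((latency_ms, is_error))
        window_full = len(self._window) == self._window.maxlen
        breached = window_full and self.error_rate() > self._max_error_rate
        if breached and not self._alerting:
            self._on_alert(
                f"error rate {self.error_rate():.0%} "
                f"over the last {len(self._window)} requests"
            )
        self._alerting = breached
```

The `on_alert` callback is where a notification to the maintainers would be sent. Here it can be any function that accepts a message string.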