Vertical and horizontal scaling, scaling for the web. The impact of architecture on availability. Vertical scaling issues

Every programmer wants to become the best, to take on ever more interesting and complex tasks and to solve them in ever more effective ways. In the world of Internet development, such tasks include those faced by the developers of highly loaded systems.

Most of the information published about high loads on the Internet amounts to descriptions of the technical characteristics of large systems. We will try instead to outline the principles on which the architectures of the most advanced and most visited Internet projects of our time are built.

  • Functional separation
  • Classic horizontal scaling
    • Shared Nothing and Stateless Concepts
    • Criticism of the Shared Nothing and Stateless concepts
    • Code layering
    • Coupling between code and data
  • Caching
    • Cache invalidation problem
    • The problem of starting with a cold cache

Let's begin our third lesson, dedicated to the business logic of the project. This is the most important component in processing any request. Such calculations require backends: heavy servers with large computing power. If the frontend cannot answer the client on its own (and, as we found out in the last issue, it can serve things like images without any problems), it makes a request to the backend. The backend processes the business logic: it generates and processes data, while the data itself is stored in another layer, be it network storage, a database or a file system. Data storage is the topic of the next lesson; today we will focus on scaling the backend.

Let us warn you right away: scaling computing backends is one of the most difficult topics, and it is surrounded by many myths. Many people are sure that cloud computing solves the performance problem. However, this is not entirely true: for cloud services to really help you, you must prepare your program code correctly. You can spin up as many servers as you want in, say, Amazon EC2, but what good are they if your code cannot use the power of each of them? So how do you scale the backend?

Functional separation

The very first and simplest method that everyone comes across is functional partitioning, in which different parts of the system, each strictly solving its own problem, are moved to separate physical servers. For example, the forum you visit is moved to one server, and everything else runs on another.

Despite its simplicity, many people forget about this approach. For example, we very often come across web projects where a single MySQL database is used for completely different types of data. The same database holds articles, banners and statistics, although by rights they should live in different MySQL instances. If you have functionally unrelated data (as in this example), it makes sense to spread it across different database instances or even different physical servers. Let's look at this from the other side. If one project contains both an integrated banner rotator and a service that displays user posts, the reasonable decision is to recognize at once that these data are in no way related to each other and therefore, in the simplest version, should live in two separately running MySQL instances. The same applies to computing backends: they, too, can differ, with completely different settings, different technologies and different programming languages. Returning to the example: to display posts you can use plain PHP as a backend, while the banner system can run as an nginx module. Accordingly, the post server can be given a large amount of memory (it is PHP, after all), whereas for the banner system memory may matter less than processor power.
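To make the idea concrete, here is a minimal sketch of what functional separation looks like at the code level: two independent PDO connections for two unrelated kinds of data. The host names, credentials and table names are hypothetical placeholders, not part of the original example.

<?php
// Functional separation in practice: posts and banners live in two
// independent MySQL instances and never share a connection.
$posts = new PDO(
    'mysql:host=db-posts.internal;dbname=posts;charset=utf8',
    'posts_user', 'secret',
    array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION)
);
$banners = new PDO(
    'mysql:host=db-banners.internal;dbname=banners;charset=utf8',
    'banners_user', 'secret',
    array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION)
);

// Either database can now be moved, tuned or scaled without touching the other.
$latestPosts = $posts->query('SELECT id, title FROM post ORDER BY id DESC LIMIT 20')->fetchAll(PDO::FETCH_ASSOC);
$banner = $banners->query('SELECT id, url FROM banner ORDER BY RAND() LIMIT 1')->fetch(PDO::FETCH_ASSOC);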

Let's draw conclusions: it is advisable to use functional partitioning of the backend as the simplest scaling method. Group similar functions and run their handlers on different physical servers. Let's look at the next approach.

From the authors

The main activity of our company is solving problems related to high load: consulting, designing scalable architectures, load testing and website optimization. Our clients include investors from Russia and around the world, as well as projects such as VKontakte, Eldorado, Imhonet, Photosight.ru and others. During consultations we often find that many people do not know the very basics: what scaling is and what kinds there are, what tools are used and what for. This publication continues the series of articles "Tutorial on high loads". In these articles we will try to talk consistently about all the tools that are used to build the architecture of high-load systems.

Classic horizontal scaling

In principle, we already know what horizontal scaling is: if your system lacks power, you simply add ten more servers and everything keeps working. But not every project allows this. There are several classic paradigms that need to be considered early in the design process so that the code can scale as the workload grows.

Shared Nothing and Stateless Concepts

Let's consider two concepts, Shared Nothing and Stateless, which make horizontal scaling possible.

The Shared Nothing approach means that each node is independent and self-sufficient, and there is no single point of failure. This, of course, is not always achievable, but in any case the number of such points stays under the architect's strict control. By a point of failure we mean data or a computation shared by all the backends: for example, some kind of state or identifier manager. Another example is the use of network file systems. This is a direct route to a bottleneck at a certain stage of the project's growth. If each node is independent, we can easily add more nodes as the load increases.

The Stateless concept means that a program process does not store its own state. A user comes and lands on a particular server, and it makes no difference whether it is this server or another one. Once the request is processed, the server completely forgets about that user. The user is not at all obliged to send all his subsequent requests to the same server and does not have to return to the same server a second time. Thus we can change the number of servers dynamically without worrying about routing the user to the "right" one. This is probably one of the big reasons the web has grown so quickly: it is much easier to build applications for it than to write classic offline programs. The request-response model, and the fact that your program lives for 200 milliseconds or at most a second (after which it is destroyed entirely), is why common web languages such as PHP could get by for so long without a full-fledged garbage collector.

The approach described is classic: it is simple and rock-solid. Lately, however, we have had to give it up more and more often.

Criticism of the Shared Nothing and Stateless concepts

Today the web is facing new challenges. When we talk about Stateless, it means that for every request we pull every piece of the user's data from storage anew, and this can sometimes be very expensive. A reasonable desire arises to keep some data in memory, to make things not quite Stateless. This is because the web is becoming more and more interactive. If yesterday a person opened his webmail and clicked the "Reload" button to check for new messages, today the server does it for him. It tells him: "Hey, while you were sitting on this page, new messages arrived for you."

These new challenges mean that the Shared Nothing and Stateless approach does not always fit. We have repeatedly run into situations with our clients where we say "Give this up, keep the data in memory" and, conversely, "Route people to the same server." For example, when there is an open chat room, it makes sense to route its participants to the same server so that everything works faster.

Let's talk about another case we encountered. A friend of ours was developing an "Arena"-style game (online duels and fights) in Ruby on Rails. Shortly after launch he ran into a classic problem: if several people are in the same battle, each user constantly pulls from the database the data generated during that battle. As a result, the whole construction survived only up to 30 thousand registered users, and then it simply stopped working.

The opposite situation developed at Wooga, which makes games for Facebook. When they faced a similar problem, it was at a different scale: several billion SELECTs from PostgreSQL per day on one system. They switched entirely to the Memory State approach: data began to be stored and served directly from RAM. As a result, they practically abandoned the database, and a couple of hundred servers turned out to be superfluous. Those were simply switched off: they were no longer needed.

In principle, any scaling (including horizontal) is achievable with many technologies. These days the point is very often that, when building a service, you should not overpay for hardware. For that, it is important to know which technology suits a given load profile at minimal hardware cost. At the same time, when people start thinking about scaling, they very often forget its financial side. Some believe horizontal scaling is a panacea: we separated the data, scattered everything across separate servers, and everything became fine. These people forget about the overhead, however, both financial (buying new servers) and operational. When everything is broken into components, the software components incur the overhead of communicating with each other. Roughly speaking, there are more hops. Remember an example already familiar to you: when we open a Facebook page, the powerful JavaScript goes off to the server, which thinks for a long, long time and only after a while starts giving you your data. Everyone has seen this picture: you want to take a look and move on, but it just loads and loads and loads. The data ought to be stored a little "closer", but Facebook no longer has that option.

Code layering

A couple more tips to make horizontal scaling easier. The first recommendation: program so that your code consists of layers, each layer responsible for a specific step in the data-processing chain. Say, if you work with a database, it should be done in one place rather than scattered across all the scripts. For example, we are building a user page. It all starts with the kernel running the business-logic module that builds the user page. This module queries the underlying data storage layer for information about that particular user. The business-logic layer knows nothing about where the data is: whether it is cached, sharded (sharding is the distribution of data across different storage servers, which we will talk about in future lessons), or whether something else unpleasant has been done to it. The module simply requests the information by calling the appropriate function. The function for reading user information lives in the data storage layer. The data storage layer, in turn, determines from the request type which storage holds the user. In the cache? In the database? On the file system? And then it calls the corresponding function of the layer below.

What does such a layered scheme give you? It makes it possible to rewrite, throw out or add entire layers. For example, say you decide to add caching for users. Doing this in a layered scheme is very simple: you touch only one place, the data storage layer. Or you add sharding, and users can now live in different databases. In the usual scheme you would have to dig through the entire site and insert the appropriate checks everywhere. In a layered scheme you only need to adjust the logic of one layer, one specific module.
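As an illustration, here is a minimal PHP sketch of such layering. The class, function, key and table names are invented for the example and do not come from any particular framework.

<?php
// Data storage layer: the only place that knows where users live.
class UserStorage
{
    private $cache;
    private $db;

    public function __construct(Memcached $cache, PDO $db)
    {
        $this->cache = $cache;
        $this->db = $db;
    }

    public function getUser($id)
    {
        // Adding caching or sharding later changes only this layer.
        $user = $this->cache->get('user:' . $id);
        if ($user !== false) {
            return $user;
        }
        $stmt = $this->db->prepare('SELECT * FROM users WHERE id = ?');
        $stmt->execute(array($id));
        $user = $stmt->fetch(PDO::FETCH_ASSOC);
        if ($user) {
            $this->cache->set('user:' . $id, $user, 300);
        }
        return $user ? $user : null;
    }
}

// Business-logic layer: knows nothing about caches, shards or files.
function buildUserPage(UserStorage $storage, $userId)
{
    $user = $storage->getUser($userId);
    return $user ? 'Hello, ' . $user['name'] : 'No such user';
}

If sharding is introduced later, only UserStorage changes; buildUserPage() stays exactly as it is.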

Coupling between code and data

The next important task to address in order to avoid problems with horizontal scaling is to minimize the coupling of both code and data. For example, if you use JOINs in SQL queries, you already have a potential problem. A JOIN can be done within one database, but across two databases located on different servers it is no longer possible. General recommendation: try to talk to the storage with the simplest possible queries, iterations and steps.

What if you cannot do without a JOIN? Do it yourself: make two queries and merge the results in PHP; there is nothing wrong with that. Consider, for example, the classic task of building a friend feed. You need to fetch all the user's friends, then request their latest posts and, for all those posts, collect the number of comments; this is where the temptation to do it all in one query (with a fair number of nested JOINs) is especially strong. Just one query, and you get all the information you need. But what will you do when there are too many users and records and the database can no longer cope? Properly, you would have to shard the users (spread them evenly across different database servers). Clearly, the JOIN operation is then no longer possible: the data is split across different databases. So you have to do everything manually. The conclusion is obvious: do it manually from the very beginning. First ask the database for all of the user's friends (first query). Then fetch the latest posts of those users (second query or group of queries). Then sort through them in memory and select what you need. In effect you are performing the JOIN operation manually. Yes, perhaps not as efficiently as the database would. But you are in no way limited by one database's storage capacity. You can split and spread your data across different servers or even different DBMSs! None of this is as scary as it might seem. In a properly built layered system most of these queries will be cached; they are simple and easy to cache, unlike the results of a JOIN. Another disadvantage of the JOIN option: when a user adds a new post, you have to reset the feed caches of all of his friends! And in that situation it is anyone's guess what will actually work faster.
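A sketch of this manual JOIN in PHP follows; the friends and posts table layout is an assumption for illustration.

<?php
// A manual "JOIN" for the friend feed: two simple queries plus a merge
// in PHP instead of one nested JOIN.
function friendFeed(PDO $db, $userId, $limit = 20)
{
    // Query 1: all the user's friends.
    $stmt = $db->prepare('SELECT friend_id FROM friends WHERE user_id = ?');
    $stmt->execute(array($userId));
    $friendIds = $stmt->fetchAll(PDO::FETCH_COLUMN);
    if (!$friendIds) {
        return array();
    }

    // Query 2: the latest posts of those friends. With sharded storage
    // this becomes one query per shard instead of one big JOIN.
    $limit = (int)$limit;
    $in = implode(',', array_fill(0, count($friendIds), '?'));
    $stmt = $db->prepare(
        "SELECT author_id, title, created_at FROM posts
         WHERE author_id IN ($in)
         ORDER BY created_at DESC LIMIT $limit"
    );
    $stmt->execute($friendIds);

    // The join itself happens here, in application memory.
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}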

Caching

The next important tool we'll get to know today is caching. What is a cache? A cache is a place where you can put data that takes a long time to compute, under some key. Remember one of the key points: the cache must return data by that key faster than recomputing it would take. We have repeatedly seen situations where this was not the case and people wasted their time pointlessly: sometimes the database is fast enough that it is simpler to go straight to it. A second key point: the cache must be shared by all the backends.

Another important point: a cache is more a way of papering over a performance problem than of solving it. But of course there are situations where solving the problem properly is very expensive. So you say: "Okay, I'll plaster over this crack in the wall, and we'll pretend it isn't there." Sometimes that works; in fact, it works very often, especially when you hit the cache and the data you wanted to show is already there. A classic example is the friend counter. It is a count kept in the database, and instead of walking the whole database looking for your friends, it is much easier to cache that number (and not recompute it every time).

A cache has a criterion of how effectively it is used, an indicator that it is working: the hit ratio. This is the ratio of the number of requests answered from the cache to the total number of requests. If it is low (50-60%), you have extra overhead from the cache. It means that on roughly every second page the user, instead of getting data from the database, first goes to the cache, discovers the data is not there, and then goes to the database anyway. And that is an extra two, five, ten, forty milliseconds.

How do you ensure a good hit ratio? In the places where your database is slow and where the data takes a long time to recompute, you plug in Memcache, Redis or a similar tool to serve as a fast cache, and it starts to save you. At least temporarily.
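The pattern this describes is commonly called cache-aside. A minimal sketch with the PECL memcached extension, using the friend counter from above (key name and schema are assumptions):

<?php
// Cache-aside for the friend counter: the fast path hits Memcached,
// the slow path goes to the database and refills the cache.
function cachedFriendCount(Memcached $cache, PDO $db, $userId)
{
    $key = 'friend_count:' . $userId;
    $count = $cache->get($key);
    if ($count !== false) {
        return (int)$count;                 // cache hit
    }

    // Cache miss: the expensive query.
    $stmt = $db->prepare('SELECT COUNT(*) FROM friends WHERE user_id = ?');
    $stmt->execute(array($userId));
    $count = (int)$stmt->fetchColumn();

    $cache->set($key, $count, 600);         // keep for 10 minutes
    return $count;
}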

Oleg Bunin

A well-known specialist in highload projects. His company, Oleg Bunin Laboratory, specializes in consulting, development and testing of high-load web projects. He is the organizer of the HighLoad++ conference (www.highload.ru), a conference dedicated to high loads that annually brings together the world's best specialists in the development of major projects. Thanks to this conference, he knows all the leading experts in the world of high-load systems.

Konstantin Osipov

A database specialist who worked for a long time at MySQL, where he was responsible for the high-load area. The speed of MySQL owes much to Kostya Osipov; in his time there he worked on the scalability of MySQL 5.5. He is currently responsible at Mail.Ru for Tarantool, a clustered NoSQL database that serves 500-600 thousand queries per second. The project is open source, and anyone can use it.

Maxim Lapshin

The video-broadcasting solutions that exist in the world at the moment can be counted on one hand. Max developed one of them, Erlyvideo (erlyvideo.org), a server application for video streaming. Creating such tools means solving a whole pile of the hardest performance problems. Maxim also has experience scaling medium-sized sites (not as large as Mail.Ru): by medium-sized we mean sites that receive about 60 million hits per day.

Konstantin Mashukov

A business analyst at Oleg Bunin's company. Konstantin came from the world of supercomputers, where he worked for a long time on various scientific number-crunching applications. As a business analyst he takes part in all of the company's consulting projects, be they social networks, large online stores or electronic payment systems.

Cache invalidation problem

But with a cache you get the cache invalidation problem as a bonus. What is the point? You put data into the cache and read it from the cache, but by then the original data has already changed. For example, Mashenka changed the caption under her picture, while for some reason you cached that line instead of pulling it from the database every time. As a result you show stale data: this is the cache invalidation problem. In the general case it has no solution, because it depends on how your business application uses the data. The main question is: when should the cache be updated? Answering it is sometimes hard. For example, a user publishes a new post on a social network; suppose at that moment we try to get rid of all the invalid data. It turns out you need to reset and update every cache related to this post. In the worst case, when a person makes a post, you clear the cache of his post feed, flush the caches of all his friends' feeds, flush the caches of the feeds of everyone who has those people as friends, and so on. You end up flushing half the caches in the system. When Zuckerberg publishes a post for his eleven and a half million subscribers, should we reset the feed caches of all eleven and a half million of them? How do we deal with this situation? No, we will go the other way and update the cache when the friend feed containing this new post is requested. The system sees there is no cache, goes off and recomputes. The approach is simple and rock-solid. There are drawbacks, however: if the cache of a popular page is reset, you risk a so-called race condition, a situation where that same cache is computed simultaneously by several processes (several users decided to access the new data). As a result your system does fairly empty work: computing the same data n times in parallel.

One way out is to use several approaches at once. You do not simply erase the obsolete value from the cache; you only mark it as obsolete and at the same time queue a task to recompute the new value. While the queued job is being processed, the user is served the stale value. This is called degradation of functionality: you deliberately accept that some users will not receive the very latest data. Most systems with well-thought-out business logic have this approach in their arsenal.
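A sketch of this "serve stale, recompute in the background" idea, assuming Memcached for the cache and some job queue behind the hypothetical enqueueRebuild() stub:

<?php
// Instead of deleting the key we flag it stale and queue a rebuild;
// readers keep getting the old value in the meantime.

function enqueueRebuild($key, $userId)
{
    // Stub for a real job queue (Gearman, a Redis list, RabbitMQ, ...).
}

function invalidateFeed(Memcached $cache, $userId)
{
    $key = 'feed:' . $userId;
    $entry = $cache->get($key);
    if ($entry !== false && empty($entry['stale'])) {
        $entry['stale'] = true;
        $cache->set($key, $entry);      // keep the old value, flagged stale
        enqueueRebuild($key, $userId);  // a worker will store a fresh value
    }
}

function readFeed(Memcached $cache, $userId)
{
    $entry = $cache->get('feed:' . $userId);
    // Stale or fresh, we serve whatever is there: deliberate degradation.
    return $entry !== false ? $entry['value'] : null;
}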

The problem of starting with a cold cache

Another problem is starting with a cold (that is, unfilled) cache. This situation clearly illustrates the point that a cache cannot solve the problem of a slow database. Suppose you want to show users the top 20 best posts for some period. That information was in your cache, but by the time the system started, the cache had been cleared. As a result, every user goes to the database, which takes, say, 500 milliseconds to build the index. Everything becomes slow, and you have caused a DoS (denial of service) on yourself. The site is down. Hence the conclusion: do not take up caching until you have solved your other problems. Make the database work quickly and you will not need to bother with caching at all. Nevertheless, even the problem of starting with a cold cache has solutions:

  1. Use cache storage that persists to disk (we lose speed);
  2. Manually fill the cache before starting, as in the sketch below (users wait and are indignant);
  3. Let users onto the site in batches (users are still waiting and still indignant).
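For example, option 2 might look roughly like this: a warm-up script run at deploy time, before traffic is let in. Key name, query and host names are illustrative.

<?php
// Warm the cache once at deploy time, before users are let in.
$db = new PDO('mysql:host=db1.internal;dbname=site', 'site_user', 'secret');
$cache = new Memcached();
$cache->addServer('cache1.internal', 11211);

$top = $db->query(
    'SELECT id, title, rating FROM posts ORDER BY rating DESC LIMIT 20'
)->fetchAll(PDO::FETCH_ASSOC);

// The expensive query runs once here instead of once per visitor.
$cache->set('top_posts', $top, 3600);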

As you can see, any method is bad, so we’ll just repeat: try to make your system work without caching.

Scalability: the ability of a device to increase its capabilities by increasing the number of functional blocks that perform one and the same tasks.
Glossary.ru

Usually people start thinking about scaling when one server can no longer cope with the work assigned to it. What exactly can it not cope with? The work of any web server, by and large, comes down to the basic occupation of computers: data processing. Responding to an HTTP (or any other) request involves performing some operations on some data. Accordingly, we have two main entities: data (characterized by its volume) and computation (characterized by its complexity). A server may fail to cope with its work because of a large amount of data (it may not physically fit on the server) or because of a heavy computational load. The point here, of course, is the total load: the complexity of processing a single request may be small, but a large number of such requests can overwhelm the server.

We will mainly discuss scaling using the example of a typical growing web project, but the principles described here also suit other application areas. First we will look at the project architecture and the simple distribution of its components across several servers, and then we will talk about scaling computation and data.

Typical site architecture

The life of a typical website begins with a very simple architecture: a single web server (usually Apache plays this role) that handles all the work of servicing HTTP requests from visitors. It gives clients the so-called "statics": files lying on the server's disk that need no processing, such as images (gif, jpg, png), style sheets (css) and client scripts (js, swf). The same server answers requests that require computation, usually the building of HTML pages, although sometimes images and other documents are generated on the fly. Most often, responses to such requests are produced by scripts written in PHP, Perl or other languages.

The disadvantage of such a simple scheme is that requests of a different nature (serving files from disk and computational work by scripts) are processed by the same web server. Computational requests require keeping a lot of information in server memory (the script-language interpreter, the scripts themselves, the data they work with) and can consume a lot of computing resources. Serving static files, by contrast, needs little processor time but can take a long time if the client has a slow connection. The internal organization of the Apache server assumes that each connection is handled by a separate process. This is convenient for running scripts but not optimal for processing simple requests. It turns out that heavy Apache processes (full of scripts and other data) spend a lot of time waiting (first while receiving a request, then while sending the response), wasting server memory.

The solution to this problem is to split request processing between two different programs, that is, to divide the system into a frontend and a backend. A lightweight frontend server serves static content, and the remaining requests are redirected (proxied) to the backend, where the pages are built. Waiting for slow clients is also handled by the frontend, and if it uses multiplexing (where one process serves several clients, as nginx or lighttpd do, for example), this waiting costs practically nothing.

Among the other components of the site, the database deserves mention: it usually stores the main data of the system, and the most popular choices here are the free DBMSs MySQL and PostgreSQL. Storage for binary files is often separated out as well; it holds pictures (for example, illustrations for the site's articles, avatars and user photos) or other files.

Thus we arrive at an architecture consisting of several components.

Typically, at the beginning of a site's life, all the components of the architecture live on a single server. If it stops coping with the load, there is a simple solution: move the parts that separate most easily to another server. The easiest place to start is the database: move it to a separate server and change the access details in the scripts. Incidentally, this is the moment we run into the importance of properly structured program code. If work with the database is confined to a separate module shared by the whole site, then fixing the connection parameters will be easy.
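In the simplest case such a module is just one file included everywhere. A sketch, with a hypothetical db.php and placeholder credentials:

<?php
// db.php: the single place where connection parameters live. Moving the
// database to another server means editing only this file.
function db()
{
    static $pdo = null;
    if ($pdo === null) {
        $pdo = new PDO(
            'mysql:host=db1.internal;dbname=site;charset=utf8',
            'site_user',
            'secret',
            array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION)
        );
    }
    return $pdo;
}

// Any script on the site then simply calls:
// $rows = db()->query('SELECT ...')->fetchAll();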

The ways to separate the components further are also clear: for example, the frontend can be moved to a separate server. But the frontend usually needs few system resources, and at this stage moving it will not give a significant performance gain. Most often the site is limited by the performance of its scripts: generating a response (an HTML page) takes too long. Therefore the next step is usually to scale the backend server.

Computation distribution

A typical situation for a growing site: the database has already been moved to a separate machine, the split into frontend and backend is done, yet traffic keeps increasing and the backend cannot keep up with the requests. This means the computation needs to be spread over several servers. Doing so is easy: simply buy a second server and install on it the programs and scripts the backend needs to work. After that you have to make sure user requests are distributed (balanced) between the servers. The different methods of balancing are discussed below; for now let us note that this is usually done by the frontend, which is configured to distribute requests evenly between the servers.

It is important that all the backend servers can respond to requests correctly. This usually requires each of them to work with the same up-to-date data set. If we store all information in a single database, the DBMS itself will provide shared access and data consistency. If some data is stored locally on a server (for example, clients' PHP sessions), you should think about transferring it to shared storage, or about a more elaborate request-distribution algorithm.
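For PHP sessions specifically, moving to shared storage can be as small as a handler change. A sketch assuming the PECL memcached extension, which registers a 'memcached' session handler (host names are placeholders):

<?php
// Keep sessions in shared Memcached instead of the local disk, so any
// backend can serve any request.
ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', 'sessions1.internal:11211,sessions2.internal:11211');

session_start();
if (!isset($_SESSION['user_id'])) {
    $_SESSION['user_id'] = 0;   // now visible from every backend server
}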

Not only the work of scripts can be distributed across several servers, but also the computation performed by the database. If the DBMS executes many complex queries that eat up server CPU time, you can create several copies of the database on different servers. This raises the question of synchronizing the data on changes, and several approaches apply here.

  • Synchronization at the application level. In this case our scripts write changes to all copies of the database themselves (and themselves carry responsibility for the correctness of the data). This is not the best option: it requires care in implementation and is highly error-prone.
  • Replication, that is, automatic propagation of changes made on one server to all the other servers. With replication, changes are usually always written to one and the same server, called the master, while the remaining copies are called slaves. Most DBMSs have built-in or external means of organizing replication. A distinction is made between synchronous replication, where a data-changing request waits until the data has been copied to all servers and only then completes successfully, and asynchronous replication, where changes are copied to the slave servers with a delay but the write request completes faster.
  • Multi-master replication. This approach is similar to the previous one, but here we can change data without addressing one specific server: any copy of the database will do. The changes are then propagated to the other copies synchronously or asynchronously. This scheme is sometimes called a "database cluster".

Different ways of distributing the system across servers are possible. For example, there may be one database server and several backends (a very typical scheme), or vice versa: one backend and several databases. And if we scale both the backend servers and the database, we can combine a backend and a copy of the database on one machine. In any case, as soon as we have several copies of any server, the question arises of how to distribute the load between them correctly.

Balancing methods

Let us say we have created several servers (for whatever purpose: HTTP, database, and so on), each of which can process requests. We face the task of distributing the work between them: how do we know which server to send a request to? There are two main ways of distributing requests.

  • Balancing node. In this case the client sends a request to one fixed server known to it, and that server redirects the request to one of the working servers. A typical example is a site with one frontend and several backend servers to which requests are proxied. The "client", however, may be inside our system: for example, a script may send a request to a database proxy server, which passes the request on to one of the DBMS servers. The balancing node itself can run either on a separate server or on one of the working servers.

    The advantage of this approach is that the client need not know anything about the internal structure of the system (the number of servers, their addresses and peculiarities); it knows only the balancer. The disadvantage, however, is that the balancing node is a single point of failure of the system: if it fails, the whole system becomes inoperative. Besides, under heavy load the balancer may simply stop coping with its work, so this approach is not always applicable.

  • Client-side balancing. If we want to avoid a single point of failure, there is an alternative: entrust the choice of server to the client itself. In this case the client must know about the internal structure of our system in order to choose correctly which server to contact. An undoubted advantage is the absence of a point of failure: if one of the servers fails, the client can contact the others. The price for this, however, is more complex client logic and less flexible balancing.


Of course, combinations of these approaches also exist. For example, the well-known load-balancing method of DNS balancing rests on the fact that, when the IP address of a site is resolved, the client is given the address of one of several identical servers. Thus DNS plays the role of the balancing node from which the client receives its "assignment". However, the very structure of DNS servers, with its built-in duplication, implies the absence of a point of failure: the advantages of the two approaches are combined. Of course, this balancing method has its downsides too; for example, such a system is hard to rebuild dynamically.

Working with a site is usually not limited to a single request. It is therefore important to understand at design time whether a client's successive requests can be processed correctly by different servers, or whether the client must be tied to one server for the duration of its work with the site. This is especially important if the site stores temporary information about the user's session (in that case free distribution is still possible, but only with session storage common to all servers). A visitor can be "bound" to a specific server by IP address (which, however, can change), by a cookie (in which a server identifier is recorded in advance), or even simply by redirecting him to the right domain.

On the other hand, the computing servers need not be equal. In some cases it is advantageous to do the opposite and dedicate a separate server to processing one type of request, obtaining a vertical division of functions. Then the client or the balancing node chooses the server according to the type of the incoming request. This approach lets us separate important (or, conversely, non-critical but heavy) requests from the rest.

Data distribution

We have learned to distribute computation, so high traffic is no longer a problem for us. The volume of data, however, keeps growing, and storing and processing it becomes ever harder, which means it is time to build distributed data storage. In this case we will no longer have one or several servers holding a complete copy of the database. Instead, the data will be spread across different servers. What distribution schemes are possible?

  • Vertical distribution (vertical partitioning): in the simplest case it means moving individual database tables to another server. We will then need to change the scripts to address different servers for different data. In the limit we can store every table on a separate server (although in practice this is unlikely to pay off). Obviously, with such a distribution we lose the ability to run SQL queries that combine data from two tables located on different servers. If needed, the merging logic can be implemented in the application, but it will not be as efficient as in the DBMS. Therefore, when partitioning a database, you need to analyze the relationships between tables so as to distribute tables that are as independent as possible.

    A more complex case of vertical distribution is the decomposition of a single table, where some of its columns end up on one server and some on another. This technique is rarer but can be used, for example, to separate small, frequently updated data from a large volume of rarely used data.

  • Horizontal distribution (horizontal partitioning): distributing the data of a single table across several servers. In fact, a table of the same structure is created on each server, and each stores a certain chunk of the data. The data can be distributed across servers by various criteria: by range (records with id < 100000 go to server A, the rest to server B), by a list of values (records of type "ZAO" and "OAO" are saved to server A, the rest to server B) or by the hash value of a certain field of the record (see the sketch after this list). Horizontal partitioning lets you store an unlimited number of records, but it complicates selection. It is most efficient to select records only when you know which server they are stored on.
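A sketch of the hash criterion in PHP: crc32 of the record id picks one of N shard servers. The DSNs and credentials are placeholders.

<?php
// Pick a shard by the hash of the record id.
function shardFor($recordId, array $shards)
{
    // crc32 gives a stable hash; the modulo picks one of the N servers.
    $index = abs(crc32((string)$recordId)) % count($shards);
    return $shards[$index];
}

$shards = array(
    'mysql:host=shard0.internal;dbname=site',
    'mysql:host=shard1.internal;dbname=site',
    'mysql:host=shard2.internal;dbname=site',
);

// Selection stays efficient because we always know where a record lives.
$db = new PDO(shardFor(100123, $shards), 'site_user', 'secret');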

To choose the right data distribution scheme, the structure of the database must be analyzed carefully. The existing tables (and perhaps individual fields) can be classified by how often the records are accessed, how often they are updated, and by their relationships (the need to select from several tables at once).

As mentioned above, besides a database a site often needs storage for binary files. Distributed file-storage systems (in effect, file systems) can be divided into two classes.

  • Those operating at the operating-system level. For an application, working with files in such a system is no different from ordinary file work; the exchange of information between servers is handled by the operating system. Examples of such file systems include the long-established NFS family or the less known but more modern Lustre.
  • Distributed storage implemented at the application level, where the application itself does the work of exchanging information. Usually the storage functions are placed in a separate library. One striking example of such storage is MogileFS, developed by the creators of LiveJournal. Another common example is using the WebDAV protocol and storage that supports it.

It should be noted that data distribution solves not only the question of storage but partly also the question of load distribution: each server holds fewer records and therefore processes them faster. The combination of methods for distributing computation and data makes it possible to build a potentially infinitely scalable architecture, capable of working with any amount of data and any load.

Conclusions

To summarize what has been said, let us formulate conclusions in the form of brief theses.

  • The two main (and related) scaling challenges are the distribution of computation and the distribution of data
  • A typical site architecture involves a separation of roles and includes a frontend, a backend, a database and sometimes file storage
  • With small data volumes and heavy loads, database mirroring applies: synchronous or asynchronous replication
  • For large amounts of data, the database must be distributed: partitioned vertically or horizontally
  • Binary files are stored in distributed file systems (implemented at the OS level or in the application)
  • Balancing (the distribution of requests) can be uniform or divided by functionality, and done by a balancing node or on the client side
  • The right combination of methods will let you withstand any load ;)

Links

You can continue studying this topic on interesting English-language sites and blogs.

ALEXANDER KALENDAREV, RBC Media, programmer, [email protected]


Problems and solutions

Sooner or later a popular web or mobile project with a server side will run into a performance problem. One solution is horizontal scaling of the database. We talk about the pitfalls and about possible ways around them.

Every growing project faces the challenge of performance. Therefore, if you believe your project is ambitious and will soon conquer the whole world, it is wise to build the possibility of scaling into the initial architecture design.

Let's clarify the terminology:

  • Performance: the ability of the application to meet requirements such as maximum response time and throughput.
  • Throughput (capacity): the maximum ability of an application to pass a certain number of requests per unit of time or to hold a certain number of user sessions.
  • Scalability: a characteristic of an application that shows its ability to maintain performance as throughput grows. Scaling, in turn, is the process of providing for system growth. Scaling can be vertical or horizontal.
  • Vertical scaling: increasing performance by increasing the power of the hardware, the amount of RAM, etc. Sooner or later vertical scaling runs into its upper limit.
  • Horizontal scaling: increasing performance by dividing data across multiple servers.

Functional data separation

There are several options for horizontal scaling. For example, data is very often separated by functional use: data for photo albums sits on one group of servers, user profile data on another, and user correspondence on a third. Fig. 1 shows a diagram of horizontal scaling by functional distribution.

Scaling using replication

The easiest way to scale, often used for small and medium-sized projects, is replication. Replication is a mechanism for synchronizing several copies of database objects and tables (see Fig. 2). Master-slave replication is the synchronization of data from the main (master) server to subordinate (slave) servers.

Since most web and mobile projects have an order of magnitude more read operations than write operations, we can perform writes on one master server and read data from many slave servers. Replication must be configured between the master and the slave servers.

Many databases have replication built in or, as they say, as an "out-of-the-box solution". For example, PostgreSQL replication can be performed by the following tools:

  • Slony-I: asynchronous (master to multiple slaves) replication;
  • pgpool-I/II: synchronous multi-master replication;
  • PGCluster: synchronous multi-master replication;
  • Bucardo;
  • Londiste;
  • RubyRep;
  • and, starting from version 9.0, built-in streaming replication.

When scaling with replication you must use different connections: one to the master server, for writes and updates only, and a second one, to a slave server, for reads only. Moreover, if several slave servers are used, the selection strategy may be random, or a specific database server may be assigned to a specific web server.
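In code, the two connections might be set up roughly like this (PostgreSQL DSNs, host names and credentials are placeholders):

<?php
// Writes go to the master, reads to a randomly chosen slave.
$master = new PDO('pgsql:host=master.internal;dbname=app', 'app', 'secret');

$slaveHosts = array('slave1.internal', 'slave2.internal', 'slave3.internal');
$slave = new PDO(
    'pgsql:host=' . $slaveHosts[array_rand($slaveHosts)] . ';dbname=app',
    'app', 'secret'
);

// Write or update: master only.
$stmt = $master->prepare('UPDATE users SET last_seen = NOW() WHERE id = ?');
$stmt->execute(array(42));

// Read: slave only (it may lag slightly with asynchronous replication).
$name = $slave->query('SELECT name FROM users WHERE id = 42')->fetchColumn();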

The full article can be read in System Administrator magazine, No. 10, 2014, pp. 54-62.




Vertical scaling, or scaling up, is increasing the resources available to the software by increasing the power of the servers used.

Horizontal scaling, or scaling out, is increasing the number of nodes combined into a server cluster when CPU, memory or disk space runs short.

Both are infrastructure solutions that are required in different situations as a web project grows.

Vertical and horizontal scaling, scaling for the web

For example, consider database servers. For large applications this is always the most heavily loaded component of the system.

The scaling options for database servers are determined by the software used: most often these are relational databases (MySQL, PostgreSQL) or NoSQL stores (MongoDB, Cassandra, etc.).

Horizontal scaling of database servers under heavy load is much cheaper.

A web project usually starts on a single server, whose resources run out as the project grows. In such a situation there are two options:

  • move the site to a more powerful server
  • add another low-power server and combine the machines into a cluster

MySQL is the most popular RDBMS and, like any of them, requires a lot of server resources under load. Scaling is possible mainly upwards. Sharding is available, but it requires code changes to set up and can be difficult to support.

Vertical scaling

NoSQL scales easily, and the second option with, for example, MongoDB will be far more financially advantageous and will not require labor-intensive configuration and support of the resulting solution. Sharding is carried out automatically.

Thus, with MySQL you will need a server with a large amount of CPU and RAM; such servers have a significant cost.

Horizontal scaling
With MongoDB you can add one more average server, and the resulting solution will work stably, providing additional fault tolerance.


Scale-out, or horizontal scaling, is a natural stage of infrastructure development. Any server has its limits, and when they are reached, or when the cost of a more powerful server becomes unreasonably high, new machines are added. The load is distributed between them. This also provides fault tolerance.

You should start adding medium-sized servers and setting up clusters when the possibilities for increasing the resources of one machine have been exhausted or when buying a more powerful server turns out to be unprofitable.

The example given with relational databases and NoSQL is a situation that occurs most often. The frontend and backend servers are also scalable.

Read about the balancer


A constantly growing number of website visitors is always a great achievement for developers and administrators, except, of course, in those situations when the traffic grows so much that it brings down the web server or other software. Constant website outages are always very costly for a company.

However, this can be fixed. And if you are now thinking about scaling, you are on the right track.

In a nutshell, scalability is the ability of a system to handle a large volume of traffic and adapt to its growth while maintaining the desired UX. There are two scaling methods:

  • Vertical (also called scaling up): increasing system resources, such as adding memory and computing power. This method allows you to quickly resolve problems with traffic processing, but its resources can quickly exhaust themselves.
  • Horizontal (or scaling out): adding servers to the cluster. Let's take a closer look at this method.

What is horizontal scaling?

Simply put, a cluster is a group of servers. A load balancer is a server that distributes the workload among servers in a cluster. At any time, you can add a web server to an existing cluster to handle more traffic. This is the essence of horizontal scaling.

The load balancer is only responsible for deciding which server in the cluster will process the received request. Basically, it works as a reverse proxy.

Horizontal scaling is undoubtedly the more reliable method of increasing application performance, but it is harder to configure than vertical scaling. The main and most difficult task here is to keep all the application nodes constantly updated and synchronized. Suppose user A sends a request to mydomain.com, and the balancer passes the request to server 1. Then user B's request will be processed by server 2.

What happens if user A makes changes to the application (for example, uploads a file or updates the contents of the database)? How can this change be transmitted to the rest of the servers in the cluster?

The answer to these and other questions can be found in this article.

Server separation

Preparing the system for scaling requires separating the servers; it is very important that servers with fewer resources carry fewer responsibilities than larger ones. In addition, dividing the application into such "parts" will let you quickly identify its critical elements.

Let's say you have a PHP application that lets users log in and upload photos, built on the LAMP stack. Photos are saved on disk, and links to them are stored in the database. The challenge here is to keep the data shared by multiple application servers (the uploaded files and the user sessions) synchronized.

To scale this application, you need to separate the web server and the database server. Thus, nodes will appear in the cluster that share the database server. This will increase application performance by reducing the load on the web server.

Later on you can configure load balancing; this is covered in a separate manual.

Session consistency

With the web server and database separated, you need to focus on handling user sessions.

Relational databases and network file systems

Session data is often stored in relational databases (such as MySQL) because such databases are easy to configure.

However, this solution is not the most reliable, because it increases the load: the server must go to the database with a read or write for every single request, and in the event of a sudden traffic spike the database tends to fail before the other components do.

Network file systems are another simple way to store session data: no changes to the source code are required. However, network file systems handle I/O operations very slowly, and this can hurt application performance.

Sticky sessions

Sticky sessions are implemented on the load balancer and require no changes to the application nodes. This is the most convenient method of handling user sessions. The load balancer always directs the user to the same server, which removes the need to share session data with the remaining nodes of the cluster.

However, this solution also has one serious drawback. The balancer now not only distributes the load but also carries an additional task, which can affect its performance and cause it to fail.

Memcached and Redis servers

You can also configure one or more additional servers dedicated to session handling. This is the most reliable way to solve session-related problems.
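With the phpredis extension, which registers a 'redis' session handler, pointing all nodes at such a dedicated session server takes just a few lines. A sketch (the host name is a placeholder):

<?php
// A dedicated Redis server for sessions: every node in the cluster now
// reads and writes the same session data, so the balancer may send each
// request to any server.
ini_set('session.save_handler', 'redis');
ini_set('session.save_path', 'tcp://sessions.internal:6379');

session_start();
$_SESSION['visits'] = isset($_SESSION['visits']) ? $_SESSION['visits'] + 1 : 1;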

Final actions

Horizontally scaling an application seems a very complex and confusing undertaking at first, but it helps eliminate serious traffic problems. The main thing is to learn how to work with a load balancer, in order to understand which components require additional configuration.

Scaling and application performance are very closely related. Of course, not all applications and sites need scaling. However, it is better to think about this in advance, preferably at the application development stage.
