Scaling web applications under load. Vertical and horizontal scaling. Stateless components

A lot has already been said on this topic, both on my blog and beyond. It seems to me the moment has come to move from the specific to the general and try to look at the topic apart from any particular successful implementation of it.

Shall we get started?

To begin with, it makes sense to decide what we are going to talk about. In this context, the web application has three main goals:

  • scalability: the ability to respond in a timely manner to continuous load growth and unexpected influxes of users;
  • availability: providing access to the application even in case of emergency;
  • performance: even the slightest delay in loading a page can leave a negative impression on the user.

The main topic of conversation will be, as you might guess, scalability, but I don't think the other goals will be left aside. I would like to say a few words about availability right away, so as not to return to it later, treating it as something that "goes without saying": any site one way or another strives to function as stably as possible, that is, to be accessible to absolutely all of its potential visitors at absolutely every moment in time, but sometimes all sorts of unforeseen situations happen that can cause temporary unavailability. To minimize the potential impact on the availability of the application, you need to avoid having components in the system whose failure would lead to the unavailability of any functionality or data (or, at the very least, of the site as a whole). Thus, each server or any other component of the system must have at least one backup (it does not matter in what mode they work: in parallel, or with one "backing up" the other while remaining in passive mode), and data must be replicated in at least two copies (and preferably not at the RAID level, but on different physical machines). Storing several backup copies of the data somewhere separate from the main system (for example, on special services or on a separate cluster) will also help avoid many problems if something goes wrong. Don't forget about the financial side of the issue: insurance against failures requires significant additional investment in equipment, which it makes sense to try to minimize.

Scalability is usually divided into two areas:

  • Vertical scaling: increasing the performance of each component of the system in order to improve overall performance.
  • Horizontal scaling: breaking the system down into smaller structural components and spreading them across separate physical machines (or groups of them), and/or increasing the number of servers that perform the same function in parallel.

One way or another, when developing a system growth strategy you have to look for a compromise between price, development time, final performance, stability and a host of other criteria. From a financial point of view, vertical scaling is far from the most attractive solution, because prices for servers with a large number of processors grow almost exponentially with the number of processors. That is why the horizontal approach is the most interesting one, and it is the one used in most cases. But vertical scaling sometimes has a right to exist, especially in situations where the main role is played by time and the speed of solving the problem rather than by money: buying a BIG server is significantly faster than practically redeveloping an application from scratch and adapting it to run on a large number of servers working in parallel.

Having finished with the general points, let's move on to a review of potential problems and options for solving them when scaling horizontally. Please don't criticize too harshly: I make no claim to absolute correctness and reliability, I'm just "thinking out loud", and I certainly won't manage to even mention every point on this topic.

Application servers

In the process of scaling the applications themselves, problems rarely arise if, during development, you always keep in mind that each instance of the application must not be directly connected with its "colleagues" in any way and must be able to process absolutely any user request, regardless of where that user's previous requests were processed and what exactly he wants from the application as a whole at the moment.

Further, once the independence of each individual running application instance is ensured, more and more requests per unit of time can be processed simply by increasing the number of application servers working in parallel in the system. Everything is (relatively) simple.

Load Balancing

The next task is to evenly distribute requests among the available application servers. There are many approaches to solving this problem and even more products that offer their specific implementation.

Hardware. Network hardware that can distribute the load between several servers usually costs quite a significant amount, but among the other options it is usually this approach that offers the highest performance and stability (mainly thanks to its quality, plus such equipment sometimes comes in pairs working on the HeartBeat principle). There are quite a few serious brands in this industry offering their solutions, so there is plenty to choose from: Cisco, Foundry, NetScaler and many others.

Software. In this area there is even more diversity of possible options. Obtaining software performance comparable to hardware solutions is not so easy, and HeartBeat will have to be provided in software, but the equipment needed to run such a solution is an ordinary server (possibly more than one). There are quite a lot of such software products; they are usually just HTTP servers that forward requests to their colleagues on other servers instead of sending them straight to the programming language interpreter for processing. As an example, one can mention, say, mod_proxy.

DNS. In addition, there are more exotic options based on DNS: in the process of a client resolving the IP address of the server holding the Internet resource it needs, the address is issued taking into account the load on the available servers, as well as some geographical considerations.

Each option has its own set of pros and cons, which is why there is no single unambiguous solution to this problem: each option is good in its own specific situation. Don't forget that no one limits you to using only one of them; if necessary, an almost arbitrary combination of them can easily be implemented.

Resource-intensive computing

Many applications involve some heavyweight mechanisms: this could be converting video, images or sound, or simply performing some resource-intensive calculation. Such tasks deserve special attention when we are talking about the Web, since a user of an Internet resource is unlikely to be happy watching a page load for several minutes while waiting only to see a message like: "The operation was completed successfully!"

To avoid such situations, you should try to avoid performing resource-intensive operations synchronously with the generation of pages. If a particular operation does not affect the next page sent to the user, you can simply organize a queue of tasks to be completed. In this case, at the moment the user has completed all the actions needed to start the operation, the application server simply adds a new task to the queue and immediately begins generating the next page without waiting for the results. If the task is actually very labor-intensive, such a queue and its job handlers can live on a separate server or cluster.
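As an illustration, here is a minimal sketch of such a queue in PHP, assuming a hypothetical jobs table (the table layout, DSN and the video-conversion task are illustrative, not something from the original text):

    <?php
    // Sketch: enqueue a resource-intensive task instead of running it
    // during page generation. Assumes a hypothetical table:
    //   jobs(id, type, payload, created_at)
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    function enqueue(PDO $pdo, string $type, array $payload): void
    {
        $stmt = $pdo->prepare(
            'INSERT INTO jobs (type, payload, created_at) VALUES (?, ?, NOW())'
        );
        $stmt->execute([$type, json_encode($payload)]);
    }

    // The page handler only enqueues and returns immediately:
    enqueue($pdo, 'convert_video', ['video_id' => 42]);
    // ...now render the next page without waiting for the result.
    // A worker process on a separate server/cluster polls the jobs
    // table, performs the conversion and deletes the finished row.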

If the result of an operation is involved in the next page sent to the user, then when it is executed asynchronously you will have to cheat a little and somehow distract the user while it runs. For example, if we are talking about converting video to FLV, you can quickly generate a screenshot with the first frame while the page is being assembled, substitute it in place of the video, and dynamically add the playback option to the page once the conversion is complete.

Another good way to handle such situations is simply to ask the user to "come back later". For example, if a service generates screenshots of websites in different browsers in order to demonstrate to the owners (or the merely curious) that they render correctly, generating such a page can take not seconds but minutes. The most convenient option for the user in this situation is an offer to visit the page at the specified address in so many minutes, rather than waiting an indefinite amount of time for something to happen.

Sessions

Almost all web applications interact with their visitors in some way, and in the vast majority of cases there is a need to track user movements across the site's pages. To solve this problem, a session mechanism is usually used, which consists of assigning each visitor a unique identifier, which is handed to him for storage in cookies or, in the absence of cookies, for constant "dragging along" via GET. Having received a certain ID from the user along with the next HTTP request, the server can look through the list of already issued identifiers and unambiguously determine who sent it. Each ID can be associated with a certain set of data that the web application can use at its discretion; by default this data is usually stored in a file in a temporary directory on the server.

It would all seem simple, but... requests from visitors to the same site can be processed by several servers at once; how then can one determine that the received ID was issued on another server, and where is its data actually stored?

The most common solutions are centralization or decentralization of session data. A somewhat absurd phrase, but hopefully a couple of examples can clear things up:

Centralized session storage. The idea is simple: create a common "piggy bank" for all servers, where they can put the sessions they issue and learn about the sessions of visitors to other servers. In theory, the role of such a "piggy bank" could simply be played by a network-mounted file system, but for a number of reasons it looks more promising to use some kind of DBMS, since this removes a lot of the problems involved in storing session data in files. But with the shared-database option, don't forget that the load on it will steadily grow with the number of visitors, and it is also worth providing in advance for ways of dealing with problem situations related to potential failures of the server running this DBMS.

Decentralized session storage. A good example is storing sessions in memcached: originally designed for distributed data storage in RAM, it will give all servers fast access to any session data, but at the same time (unlike the previous method) there will be no single central storage for them. This avoids bottlenecks in performance and stability during busy periods.
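As a rough illustration of the decentralized variant, a sketch using the PHP Memcached extension (server addresses and key names are made up for the example):

    <?php
    // Sketch: a memcached pool spanning several machines serves as
    // decentralized session storage; any application server can read
    // any visitor's session, and there is no single central store.
    $mc = new Memcached();
    $mc->addServer('10.0.0.1', 11211);  // illustrative addresses
    $mc->addServer('10.0.0.2', 11211);
    $mc->addServer('10.0.0.3', 11211);

    $sessionId = $_COOKIE['sid'] ?? bin2hex(random_bytes(16));

    // The client library itself decides which server of the pool a
    // given key lives on; we just set a TTL and store the data.
    $mc->set('session:' . $sessionId, ['user_id' => 42], 3600);

    // Any other application server loads it exactly the same way:
    $data = $mc->get('session:' . $sessionId);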

As an alternative to sessions, mechanisms similar in purpose and built on cookies are sometimes used; that is, all the user data the application requires is stored on the client side (probably in encrypted form) and requested as needed. But besides the obvious advantage of not having to store extra data on the server, a number of security problems arise. Data stored on the client side, even in encrypted form, poses a potential threat to the functioning of many applications, since anyone can try to modify it for his own benefit or in order to harm the application. This approach is only good if there is confidence that absolutely any manipulation of the data kept by users is safe. But is it possible to be 100% sure of that?
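If you do store data on the client, the least you can do is sign it so that tampering is detectable. A minimal sketch using PHP's built-in hash_hmac (the key handling is deliberately simplified; this makes the data tamper-evident, not secret):

    <?php
    // Sketch: tamper-evident client-side storage. The payload is still
    // readable by the user (add encryption if that matters); the HMAC
    // only guarantees that modified data will be rejected.
    const SECRET = 'replace-with-a-long-random-server-side-secret';

    function signPayload(array $data): string
    {
        $json = json_encode($data);
        return base64_encode($json) . '.' . hash_hmac('sha256', $json, SECRET);
    }

    function verifyPayload(string $cookie): ?array
    {
        [$body, $mac] = explode('.', $cookie, 2) + [null, null];
        $json = base64_decode((string)$body, true);
        if ($json === false
            || !hash_equals(hash_hmac('sha256', $json, SECRET), (string)$mac)) {
            return null; // signature mismatch: the data was modified
        }
        return json_decode($json, true);
    }

    setcookie('profile', signPayload(['user_id' => 42]));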

Static content

While the volume of static data is small, nothing stops you from storing it in the local file system and providing access to it simply through a separate lightweight web server (I mean mainly various forms of media data). But sooner or later the server's limits on disk space, or the file system's limit on the number of files in one directory, will be reached, and you will have to think about redistributing the content. A temporary solution could be to distribute the data by type across different servers, or perhaps to use a hierarchical directory structure.
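A sketch of such a hierarchical layout: the directory is derived from a hash of the file name, so no single directory ever accumulates millions of entries (the two-level, 256-buckets-per-level scheme is just one common choice; paths are illustrative):

    <?php
    // Sketch: spread files over nested directories derived from a hash
    // of the file name.
    function storagePath(string $fileName, string $root = '/var/www/static'): string
    {
        $hash = md5($fileName);
        // e.g. ".../d4/1d/avatar_42.png" -> at most 256*256 buckets
        return sprintf('%s/%s/%s/%s',
            $root, substr($hash, 0, 2), substr($hash, 2, 2), $fileName);
    }

    $path = storagePath('avatar_42.png');
    @mkdir(dirname($path), 0755, true);  // create the bucket on first use
    // move_uploaded_file($_FILES['f']['tmp_name'], $path);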

If static content plays one of the main roles in the operation of the application, it is worth thinking about using a distributed file system to store it. This is perhaps one of the few ways to scale disk space horizontally by adding additional servers without making any fundamental changes to the operation of the application itself. I don't want to give any advice right now on which cluster file system to choose; I have already published more than one review of specific implementations - try reading them all and comparing; if that is not enough, the rest of the Web is at your disposal.

Perhaps this option will not be feasible for some reason; then you will have to "reinvent the wheel" and implement, at the application level, principles similar to the data sharding used with a DBMS, which I will mention later. This option is also quite effective, but it requires modifying the application logic, and hence extra work for the developers.

An alternative to these approaches is the use of a so-called Content Delivery Network: an external service that ensures the availability of your content to users in exchange for a certain material reward. The advantage is obvious: there is no need to organize your own infrastructure to solve this problem, but another additional cost item appears. I won't give a list of such services, but if someone needs one, it won't be hard to find.

Caching

Caching makes sense at all stages of data processing, but for different types of applications only a few caching methods prove to be the most effective.

DBMS. Almost all modern DBMSs provide built-in mechanisms for caching the results of certain queries. This method is quite effective if your system regularly makes identical data selections, but it also has a number of drawbacks, the main ones being the invalidation of the cache for the entire table at the slightest change to it, and the local placement of the cache, which is ineffective if there are several data-storage servers in the system.

Application. At the application level, objects of the programming language are usually cached. This method lets you avoid a significant share of the queries to the DBMS entirely, greatly reducing the load on it. Like the applications themselves, such a cache must be independent of the specific request and of the server it runs on; that is, it must be available to all application servers simultaneously, and even better, it should be distributed across several machines for more efficient use of RAM. The leader in this aspect of caching can rightfully be called memcached, which I have already mentioned at one time.

HTTP server. Many web servers have modules for caching both static content and the results of scripts. If a page is rarely updated, using this method allows you to skip generating the page in response to a fairly large share of requests, with no change visible to the user.

Reverse proxy. By placing a transparent proxy server between the user and the web server, you can serve the user data from the proxy's cache (which can live either in RAM or on disk) without the requests even reaching the HTTP servers. In most cases this approach is only relevant for static content, mainly various forms of media data: images, video and the like. This lets the web servers focus on working with the pages themselves.
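For the application-level variant, here is a sketch of the classic "cache-aside" pattern with memcached (the key name, TTL and table are illustrative):

    <?php
    // Sketch: check the distributed cache first, fall back to the DBMS
    // on a miss, and store the result so that the next request (on any
    // application server) skips the database entirely.
    function getUser(Memcached $mc, PDO $pdo, int $id): ?array
    {
        $key = 'user:' . $id;
        $user = $mc->get($key);
        if ($user !== false) {
            return $user;               // cache hit: no DB query at all
        }
        $stmt = $pdo->prepare('SELECT * FROM users WHERE id = ?');
        $stmt->execute([$id]);
        $user = $stmt->fetch(PDO::FETCH_ASSOC);
        if ($user !== false) {
            $mc->set($key, $user, 300); // keep for 5 minutes
        }
        return $user === false ? null : $user;
    }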

Caching essentially requires almost no additional hardware outlay, especially if you carefully watch how the other server components use RAM and put all the available "surplus" into the forms of caching best suited to the particular application.

Cache invalidation can in some cases become a non-trivial task, but one way or another it is impossible to write a universal solution for every possible problem connected with it (at least, I personally can't), so let's leave this question until better times. In the general case, solving it falls to the web application itself, which usually implements some kind of invalidation mechanism: deleting a cache object a certain period of time after its creation or last use, or "manually" when certain events arrive from the user or other components of the system.

Database

I saved the most interesting part for last, because this integral component of any web application causes more problems as the load grows than all the others combined. Sometimes it may even seem that it is worth abandoning horizontal scaling of the data storage system altogether in favor of vertical scaling: just buy that same BIG server for a six- or seven-figure sum of non-rubles and not bother yourself with unnecessary problems.

But for many projects such a drastic solution (and a temporary one, by and large) is not suitable, which means there is only one road left for them: horizontal scaling. Let's talk about it.

From the database's point of view, the path of almost any web project began with one simple server on which the entire project ran. Then, at one fine moment, the need arises to move the DBMS onto a separate server, but in time that one also stops coping with the load. There is no particular point in dwelling on these two stages: everything is relatively trivial.

The next step is usually master-slave with asynchronous data replication; how this scheme works has already been mentioned several times on the blog, but perhaps I'll repeat it: with this approach all write operations are performed on one server only (the master), while the remaining servers (the slaves) receive data directly from the master and handle only read requests. As is well known, the read and write operations of any web project grow in proportion to the load, while the ratio between the two types of requests remains almost fixed: for each request to update data there are usually on average about a dozen read requests. Over time the load grows, which means the number of write operations per unit of time grows too, but only one server processes them, and that same server also has to ensure that a certain number of copies are created on the other servers. Sooner or later the costs of data replication become so high that this process starts to eat up a very large share of each server's processor time; each slave can then handle only a relatively small number of read operations and, as a result, each additional slave server adds only a little to the total performance, spending most of its effort merely keeping its data in line with the master's.

A temporary solution to this problem may be to replace the master server with a more powerful one, but one way or another you will not be able to put off the transition to the next "level" of data storage system development forever: "sharding", to which I recently dedicated a separate post. So let me dwell on it only briefly: the idea is to split all the data into parts according to some attribute and to store each part on a separate server or cluster; such a part of the data, together with the data storage system it lives on, is called a segment or shard. This approach lets you avoid the costs associated with data replication (or reduce them many times over), and hence significantly increase the overall performance of the storage system. But, unfortunately, moving to this data organization scheme entails heavy costs of a different kind. Since there is no ready-made solution for implementing it, you have to modify the application logic or add an extra "layer" between the application and the DBMS, and all this is most often implemented by the project's own developers. Off-the-shelf products can only make their work easier by providing a framework for building the basic architecture of the data storage system and its interaction with the rest of the application's components.
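The simplest illustration of the idea is routing by the key itself, for example by user ID (the DSNs are made up; a modulo-based shard function is the most primitive choice and complicates re-balancing when shards are added):

    <?php
    // Sketch: route each user's data to one of N independent database
    // servers; all queries for that user go to "his" shard.
    $shards = [
        'mysql:host=db-shard-0;dbname=app',
        'mysql:host=db-shard-1;dbname=app',
        'mysql:host=db-shard-2;dbname=app',
    ];

    function shardFor(int $userId, array $shards): PDO
    {
        $dsn = $shards[$userId % count($shards)];
        return new PDO($dsn, 'user', 'pass');
    }

    $db = shardFor(42, $shards);
    $stmt = $db->prepare('SELECT * FROM orders WHERE user_id = ?');
    $stmt->execute([42]);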

At this stage the chain usually ends, since sharded databases can scale horizontally far enough to fully satisfy the needs of even the most heavily loaded Internet resources. It would be appropriate to say a few words here about the actual structure of the data inside the databases and how access to it is organized, but any decisions depend strongly on the specific application and implementation, so let me just give a couple of general recommendations:

Denormalization. Queries that combine data from several tables usually, other things being equal, require more CPU time to execute than a query touching only one table. And performance, as mentioned at the beginning of the story, is extremely important on the Web (see the sketch below).

Logical data partitioning. If some part of the data is always used separately from the bulk of it, it sometimes makes sense to split it off into a separate, independent data storage system.

Low-level query optimization. By keeping and analyzing query logs you can determine the slowest of them. Replacing the queries found with more efficient ones of the same functionality can help use computing power more efficiently.
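To make the first recommendation concrete, a hedged example (the schema is invented for illustration): instead of joining for the author's name on every page view, a copy of it can be kept right in the posts table.

    <?php
    // Sketch: denormalization trades storage and update complexity for
    // cheaper reads. Normalized read - a JOIN on every page view:
    $q1 = 'SELECT p.title, u.name
             FROM posts p JOIN users u ON u.id = p.user_id
            WHERE p.id = ?';

    // Denormalized read - posts carries a redundant author_name column,
    // so the hot query touches a single table:
    $q2 = 'SELECT title, author_name FROM posts WHERE id = ?';

    // The price: when a user is renamed, the copies must be updated too
    // (a good candidate for the asynchronous job queue described above).
    $q3 = 'UPDATE posts SET author_name = ? WHERE user_id = ?';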

In this section it is worth mentioning another, more specific type of Internet project. Such projects operate on data that has no clearly formalized structure; in such situations using a relational DBMS as the data store is, to put it mildly, impractical. In these cases less strict databases are usually used, more primitive in terms of data-processing functionality but able to handle huge volumes of information without quibbling over its quality and conformance to a format. A clustered file system can serve as the basis for such data storage, and in that case a mechanism called MapReduce is used to analyze the data; I will describe the principle of its operation only briefly, since in its full scale it somewhat exceeds the bounds of this story.

So, at the input we have some arbitrary data in a format that is not necessarily strictly observed. As a result, we need to obtain some final value or piece of information. According to this mechanism, almost any data analysis can be carried out in the following two stages:

Map. The main goal of this stage is to represent the arbitrary input data as intermediate key-value pairs that carry a certain meaning and are formalized. The results are sorted and grouped by key and then handed to the next stage.

Reduce. The values received from map are used for the final calculation of the required totals.

Each stage of each specific computation is implemented as an independent mini-application. This approach allows almost unlimited parallelization of the computation across a huge number of machines, which lets you process volumes of almost arbitrary data in moments. To do this, you just need to run these applications on every available server simultaneously and then gather all the results together.
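A toy illustration of the two stages in PHP - counting word frequencies across arbitrary text fragments. In a real framework the map and reduce programs run on many machines over chunks of the input; here everything is local:

    <?php
    // Map: turn an arbitrary text fragment into formalized (word, 1) pairs.
    function mapStage(string $text): array
    {
        $pairs = [];
        $words = preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            $pairs[] = [$word, 1];
        }
        return $pairs;
    }

    // Reduce: group the intermediate pairs by key and compute the totals
    // (the grouping/shuffling is normally done by the framework itself).
    function reduceStage(array $pairs): array
    {
        $totals = [];
        foreach ($pairs as [$word, $count]) {
            $totals[$word] = ($totals[$word] ?? 0) + $count;
        }
        return $totals;
    }

    // Each mapStage() call could run on a different machine over its own
    // chunk of input; the results are then merged by reduceStage().
    $pairs = array_merge(mapStage('to be or not to be'), mapStage('to scale or not'));
    print_r(reduceStage($pairs));  // ['to' => 3, 'be' => 2, 'or' => 2, 'not' => 2, ...]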

An example of a ready-made framework for implementing data processing on this principle is the Apache Foundation's open-source project Hadoop, which I have already talked about several times before and even wrote about at one time.

Instead of a conclusion

To be honest, I find it hard to believe that I managed to write such a comprehensive post and had almost no energy left to sum it up. I just want to say that in the development of large projects every detail matters, and an unaccounted-for detail can cause failure. That is why in this business you should learn not from your own mistakes but from other people's.

Although this text may look like a kind of summary of the whole series of posts, it is unlikely to be the final point. I hope I will find something to say on this topic in the future; maybe one day it will even be based on personal experience rather than simply being the result of processing a mass of information I have read. Who knows?...

So, you've made a website. It is always interesting and exciting to watch the visit counter slowly but surely creep upward, showing better results every day. But one day, when you least expect it, someone will post a link to your resource on some Reddit or Hacker News (or on Habr; translator's note), and your server will go down.

Instead of gaining new regular users, you will be left with a blank page. At that moment nothing will help you restore the server's operation, and the traffic will be lost forever. How can you avoid such problems? In this article we will talk about optimization and scaling.

A little about optimization

Everyone knows the basic tips: update to the latest version of PHP (5.5 has OpCache built in), sort out the indexes in the database, cache static content (rarely changed pages such as "About Us", "FAQ", and so on).

One particular aspect of optimization is also worth mentioning: serving static content with a non-Apache server such as, for example, Nginx. Configure Nginx to handle all the static content (*.jpg, *.png, *.mp4, *.html...) and send files that require server-side processing on to the heavy Apache. This is called a reverse proxy.

Scaling

There are two types of scaling: vertical and horizontal.
In my understanding, a site is scalable if it can cope with growing traffic without changes to the software.

Vertical scaling.

Imagine a server serving a web application. It has 4 GB of RAM, an i5 processor and a 1 TB HDD. It does its job well, but to cope better with higher traffic you decide to increase the RAM to 16 GB, install an i7 processor and shell out for an SSD. The server is now much more powerful and can handle higher loads. That is vertical scaling.

Horizontal scaling.

Horizontal scaling is the creation of a cluster of interconnected (often not very powerful) servers that serve the site together. In this case a load balancer is used: a machine or program whose main function is to decide which server to send each request to. The servers in the cluster share the job of serving the application among themselves, knowing nothing about one another, thereby significantly increasing the throughput and fault tolerance of your site.

There are two types of balancers: hardware and software. A software balancer is installed on an ordinary server and receives all the traffic, passing it on to the appropriate handlers. Such a balancer could be, for example, Nginx. In the "Optimization" section it intercepted all requests for static files and served those requests itself, without burdening Apache. Another popular piece of software for implementing load balancing is Squid. Personally, I always use it, because it provides a great, user-friendly interface for controlling the deepest aspects of balancing.

A hardware balancer is a dedicated machine whose sole purpose is to distribute the load. Typically no software runs on this machine other than what the manufacturer developed. You can read about hardware load balancers separately.

Note that these two methods are not mutually exclusive. You can scale any machine (aka node) of your system vertically.
In this article we discuss horizontal scaling, because it is cheaper and more effective, though harder to implement.

Permanent connection

When scaling PHP applications several difficult problems arise. One of them is working with user session data. After all, if you logged in to the site and the balancer sends your next request to another machine, the new machine will not know that you are already logged in. In this case you can use a persistent ("sticky") connection: the balancer remembers which node it sent a given user's previous request to and sends the next one there as well. However, it turns out that the balancer is then overloaded with functions; besides processing hundreds of thousands of requests, it also has to remember exactly how it processed them, and the balancer itself becomes the bottleneck of the system.

Local data exchange.

Sharing user session data among all the nodes in the cluster seems like a good idea. And although this approach requires some changes to the architecture of your application, it is worth it: the load balancer is relieved, and the whole cluster becomes more fault-tolerant. The death of one of the servers does not affect the operation of the system at all.
As we know, session data is stored in the superglobal array $_SESSION, which writes and reads data to and from a file on disk. If that disk is on one server, obviously the other servers cannot access it. How can we make the data available on several machines?
First, note that the session handler in PHP can be overridden: you can implement your own session handler class.

Using a database to store sessions

Using our own session handler, we can store sessions in the database. The database can live on a separate server (or even a cluster). Usually this method works great, but under really high traffic the database becomes a bottleneck (and if the database is lost, we lose functionality completely), because it has to serve all the servers, each of which is trying to write or read session data.
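A minimal sketch of such a handler (PHP 8 syntax) built on SessionHandlerInterface and PDO, assuming a hypothetical sessions(id, data, updated_at) table and using MySQL's REPLACE for brevity:

    <?php
    // Sketch: store PHP sessions in a shared database so that every node
    // in the cluster sees them.
    class DbSessionHandler implements SessionHandlerInterface
    {
        public function __construct(private PDO $pdo) {}

        public function open(string $path, string $name): bool { return true; }
        public function close(): bool { return true; }

        public function read(string $id): string|false
        {
            $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ?');
            $stmt->execute([$id]);
            return (string)$stmt->fetchColumn(); // '' for an unknown session
        }

        public function write(string $id, string $data): bool
        {
            return $this->pdo
                ->prepare('REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, NOW())')
                ->execute([$id, $data]);
        }

        public function destroy(string $id): bool
        {
            return $this->pdo->prepare('DELETE FROM sessions WHERE id = ?')->execute([$id]);
        }

        public function gc(int $max_lifetime): int|false
        {
            $stmt = $this->pdo->prepare(
                'DELETE FROM sessions WHERE updated_at < NOW() - INTERVAL ? SECOND');
            $stmt->execute([$max_lifetime]);
            return $stmt->rowCount();
        }
    }

    session_set_save_handler(
        new DbSessionHandler(new PDO('mysql:host=db;dbname=app', 'user', 'pass')), true);
    session_start();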

Distributed File System

You may be thinking it would be a good idea to set up a network file system where all the servers could write session data. Do not do that! It is a very slow approach that leads to corruption or even loss of data. If, for some reason, you still decide to use this method, I recommend GlusterFS.

Memcached

You can also use memcached to store session data in RAM. However, this is not safe, because data in memcached is overwritten when it runs out of free space. You are probably wondering: RAM isn't shared across machines, is it? How does this apply to the whole cluster? Memcached is able to combine the RAM available on different machines into one pool.

The more machines you have, the more memory you can allocate to this pool. You don't have to pool all of a machine's memory, and you can donate an arbitrary amount from each machine to the pool. So there is an opportunity to leave the bulk of the memory for normal use and allocate a piece of it for the cache, which will let you cache not only sessions but other suitable information. Memcached is an excellent and widespread solution.

To use this approach, you need to slightly edit php.ini

    session.save_handler = memcache
    session.save_path = "tcp://path.to.memcached.server:port"

Redis cluster

Redis is a NoSQL data store that keeps the database in RAM. Unlike memcached, it supports persistent data storage and more complex data types. Redis does not (yet) support clustering, so using it for horizontal scaling is somewhat difficult; however, this is temporary, and an alpha version of the cluster solution has already been released.

Summary

As you can see, horizontal scaling of PHP applications is not so simple. There are many difficulties, and most of the solutions are not interchangeable, so you have to choose one and stick with it to the end, because once the traffic goes off the scale there is no longer any opportunity to switch smoothly to something else.

I hope this little guide will help you choose a scaling approach for your project.

In the second part of the article we will talk about scaling the database.

Hello! I'm Alexander Makarov, and you may know me from the Yii framework; I'm one of its developers. I also have a full-time job (and this is no longer a startup): Stay.com, which deals with travel.

Today I will talk about horizontal scaling, but in very, very general terms.

What is scaling, anyway? It is the ability to increase a project's performance in minimal time by adding resources.

Typically, scaling does not involve rewriting code, but either adding servers or increasing the resources of an existing one. It comes in two types: vertical and horizontal.

Vertical scaling is when more RAM, disks, and so on are added to an existing server; horizontal is when more servers are installed in data centers and the servers there interact with each other in some way.

The classic question people ask is: why is it needed if everything works fine for me on one server? In fact, we need to check what will happen. That is, it works now, but what about later? There are two wonderful utilities, ab and siege, which unleash a simulated crowd of users who begin to hammer the server, try to request pages, and send various requests. You tell them what to do, and the utilities generate a report.

The two main parameters: n is the number of requests to make, c is the number of simultaneous requests. That is how concurrency is tested.
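For example, a run of 1000 requests with 100 of them in flight at a time might look like this (the URL is illustrative):

    ab -n 1000 -c 100 http://example.com/
    siege -c 100 -r 10 http://example.com/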

At the output we get RPS, the number of requests per second the server is capable of processing, from which it becomes clear how many users it can handle. Everything, of course, depends on the project; it varies, but it usually demands attention.

There is one more parameter, response time: the time in which, on average, the server served a page. It varies, but it is known that around 300 ms is the norm, and anything higher is not so good, because those 300 ms are the server's work, to which another 300-600 ms of client-side work are added, i.e. while everything loads - styles, images and the rest - time passes as well.

It happens that there is actually no need to worry about scaling yet: we go to the server, update PHP, get a 40% increase in performance, and everything is cool. Next, we configure Opcache and tune it. Opcache, by the way, is tuned the same way as APC, with a script that can be found in Rasmus Lerdorf's repository and that shows hits and misses, where hits are how many times PHP went to the cache and misses are how many times it went to the file system to fetch files. If we crawl the whole site, run some kind of crawler over the links, or poke around manually, we will get statistics on these hits and misses. If hits are 100% and misses are 0%, everything is fine, but if there are misses, we need to allocate more memory so that all our code fits into Opcache. This is a common mistake people make: Opcache seems to be there, but something isn't working...
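For reference, the relevant php.ini knobs might look like this (the values are illustrative starting points, not recommendations from the talk):

    opcache.enable=1
    ; raise this until the hit/miss script stops reporting misses
    opcache.memory_consumption=128
    opcache.max_accelerated_files=10000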

People also often start to scale without looking into the database at all, which is why everything works slowly. Most often we go into the database, take a look - there are no indexes; we add the indexes - everything immediately works, enough for another two years, beauty!

Well, you also need to enable the cache and replace Apache with nginx and php-fpm to save memory. Everything will be great.

All of the above is quite simple and buys you time. Time is needed because someday this will no longer be enough, and we must prepare for that now.

How, in general, can you tell what the problem is? Either you already have high load - and that is not necessarily some crazy number of requests; it is when your project cannot cope with the load and trivial means no longer solve it. You need to grow either wider or higher. Something needs to be done and, most likely, there is little time for it; something has to be invented.

The first rule: never do anything blindly; we need excellent monitoring. First we buy time with some obvious optimization such as enabling a cache or caching the home page, and so on. Then we set up monitoring, and it shows us what is missing. And all of this repeats many times: you can never stop monitoring and improving.

What can monitoring show? We can hit a bottleneck in the disk, i.e. the file system, in memory, in the processor, or in the network... And it may be that everything seems more or less fine, yet some errors appear. All of this is resolved in different ways. You can solve a problem with, say, the disk by adding a new disk to the same server, or you can set up a second server that will deal only with files.

What should you pay attention to right now when monitoring? This:

  1. availability, i.e. whether the server is alive or not;
  2. shortage of disk, processor and other resources;
  3. errors.
How to monitor all this?

There are a number of great tools that let you monitor resources and display the results in a very convenient form.

This report is a transcript of one of the best presentations at the training conference for developers of high-load systems in 2015.

"Old stuff!" you say.
"Eternal values!" we answer.

Scalability is a property of a computing system that ensures predictable growth of system characteristics (for example, the number of supported users, responsiveness, or overall performance) as computing resources are added to it. In the case of a DBMS server, two scaling methods can be considered: vertical and horizontal (Fig. 2).

With horizontal scaling, the number of DBMS servers is increased, possibly interacting with one another in a transparent mode, thus sharing the total system load among themselves. This solution is likely to become more and more popular as support for loosely coupled architectures and distributed databases grows, but it is usually characterized by complex administration.

Vertical scaling means increasing the power of an individual DBMS server and is achieved by replacing hardware (processor, disks) with faster hardware or by adding additional nodes. A good example is increasing the number of processors in symmetric multiprocessor (SMP) platforms. In this case the server software should not have to change (in particular, it must not require the purchase of additional modules), since this would increase the complexity of administration and reduce the predictability of the system's behavior. Regardless of which scaling method is used, the gain is determined by how fully the server programs use the available computing resources. In the assessments that follow we will consider vertical scaling, which, according to analysts, is experiencing the greatest growth in today's computer market.

The scalability property is relevant for two main reasons. First of all, modern business conditions change so quickly as to make long-term planning, which requires a comprehensive and lengthy analysis of already outdated data, impossible even for organizations that can afford it. In its place comes a strategy of gradually, step by step, increasing the power of information systems. On the other hand, changes in technology lead to ever newer solutions and falling hardware prices, which potentially makes the architecture of information systems more flexible. At the same time, the interoperability and openness of software and hardware products from different manufacturers are expanding, although so far their standardization efforts have been aligned only in narrow market sectors. Without taking these factors into account, the consumer will not be able to take advantage of new technologies without freezing the funds invested in technologies that are insufficiently open or have proven unpromising. In the field of data storage and processing this requires that both the DBMS and the server be scalable. Today the key scalability parameters are:

  • support for multiprocessing;
  • architectural flexibility.

Multiprocessor systems

For vertical scaling, symmetric multiprocessor (SMP) systems are used more and more often, since in this case no change of platform is required: neither the operating system, nor the hardware, nor the administration skills. For this purpose it is also possible to use massively parallel (MPP) systems, but so far their use is limited to specialized tasks, computational ones for example. When evaluating a DBMS server with a parallel architecture, it is advisable to pay attention to two main characteristics of the architecture's extensibility: adequacy and transparency.

The adequacy property requires that the server architecture support one or ten processors equally well, without reinstallation, significant configuration changes, or additional software modules. Such an architecture will be equally useful and effective in a single-processor system and, as the complexity of the tasks grows, on several or even many (MPP) processors. In general, the consumer should not have to buy or learn new software options.

Transparency of the server architecture, in turn, makes it possible to hide hardware configuration changes from applications, i.e. it guarantees the portability of application software systems. In particular, in tightly coupled multiprocessor architectures the application can communicate with the server through a shared memory segment, while in loosely coupled multiserver systems (clusters) a message-passing mechanism can be used for this purpose. The application should not have to take the implementation of the hardware architecture into account: the methods of data manipulation and the software interface for accessing the database must remain the same and equally effective.

High-quality support for multiprocessing requires the database server to be able to schedule the execution of the many queries it serves on its own, ensuring the fullest possible division of the available computing resources among the server's tasks. Queries can be processed sequentially by several tasks or split into subtasks, which in turn can be executed in parallel (Fig. 3). The latter is more optimal, since a proper implementation of this mechanism yields benefits that do not depend on the types of queries or the applications. Processing efficiency is strongly affected by the level of granularity of the operations the scheduler deals with. With coarse granularity, for example at the level of individual SQL queries, the division of the computing system's resources (processors, memory, disks) will not be optimal: a task will sit idle, waiting for the I/O operations needed to complete its SQL query to finish, even if other queries requiring significant computational work are waiting in the queue. With finer granularity, resources are divided even within a single SQL query, which shows even more clearly when several queries are processed in parallel. Using a scheduler ensures that the bulk of the system's resources go to the actual work of servicing the database and minimizes downtime.

Architectural flexibility

Regardless of the degree of portability, support for standards, parallelism and other useful qualities, the performance of a DBMS with significant built-in architectural limitations cannot be increased freely. The presence of documented or practical limits on the number and sizes of database objects and memory buffers, the number of simultaneous connections, or the depth of recursion in calling procedures and subqueries or in firing database triggers is just as much a limitation on the applicability of a DBMS as, for example, the impossibility of porting it to multiple computing platforms. Parameters that limit the complexity of database queries, especially the sizes of dynamic buffers and the stack size for recursive calls, should be configurable dynamically and should not require stopping the system for reconfiguration. There is no point in buying a powerful new server if expectations cannot be met because of the internal limitations of the DBMS.

Typically the bottleneck is the inability to dynamically adjust the characteristics of the database server programs. The ability to determine on the fly such parameters as the amount of memory consumed, the number of processors in use, the number of parallel job threads (whether real threads, operating system processes or virtual processors) and the number of fragments of database tables and indexes, as well as their distribution across physical disks, without stopping and restarting the system, is a requirement that follows from the essence of modern applications. Ideally, each of these parameters could be changed dynamically within limits specific to each user.

As the popularity of a web application grows, supporting it inevitably begins to require more and more resources. At first the load can (and undoubtedly should) be dealt with by optimizing the algorithms and/or architecture of the application itself. However, what do you do when everything that could be optimized already has been, and the application still cannot cope with the load?

Optimization

The first thing you should do is sit down and think whether you have already managed to optimize everything:
  • Are database queries optimal (EXPLAIN analysis, use of indexes)?
  • Is the data stored correctly (SQL vs NoSQL)?
  • Is caching used?
  • Are there any unnecessary queries to the file system or the database?
  • Are the data-processing algorithms optimal?
  • Are the environment settings optimal: Apache/Nginx, MySQL/PostgreSQL, PHP/Python?
A separate article could be written about each of these points, so a detailed examination of them here is clearly redundant. It is only important to understand that before you start scaling an application, it is highly advisable to optimize its operation as much as possible; after all, perhaps then no scaling will be needed at all.
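To make the first checklist item tangible, a small sketch (the table, column and index names are invented): ask the database how it executes a hot query, then add the index it is missing and compare.

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

    // Let the database explain its plan for a hot query.
    $plan = $pdo->query('EXPLAIN SELECT * FROM orders WHERE user_id = 42')
                ->fetchAll(PDO::FETCH_ASSOC);
    print_r($plan);  // look for "type: ALL" (full scan) and "key: NULL"

    // If no index is used, create one and re-check the plan:
    $pdo->exec('CREATE INDEX idx_orders_user_id ON orders (user_id)');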

Scaling

So, let's say that optimization has already been carried out, but the application still cannot cope with the load. In this case the solution can obviously be to spread it across several hosts in order to increase the overall performance of the application by increasing the available resources. This approach has an official name: "scaling" the application. More precisely, "scalability" is the ability of a system to increase its performance as the amount of resources allocated to it increases. There are two methods of scaling: vertical and horizontal. Vertical scaling means increasing the application's performance by adding resources (processor, memory, disk) within a single node (host). Horizontal scaling is characteristic of distributed applications and means increasing the application's performance by adding another node (host).

Clearly, the simplest option is a plain hardware upgrade (processor, memory, disk), that is, vertical scaling. In addition, this approach requires no modifications to the application. However, vertical scaling reaches its limit very quickly, after which the developer and administrator have no choice but to switch to scaling the application horizontally.

Application architecture

Most web applications are distributed a priori, since at least three layers can be distinguished in their architecture: the web server, the business logic (application), and the data (database, static files).

Each of these layers can be scaled. Therefore, if in your system the application and the database live on the same host, the first step, of course, should be to distribute them across different hosts.

Bottleneck

When starting to scale a system, the first step is to find out which layer is the "bottleneck", that is, which works slower than the rest of the system. To begin with, you can use banal utilities like top (htop) to estimate processor/memory consumption, and df and iostat to estimate disk usage. However, it is advisable to set aside a separate host with production-like load emulation (for example, using JMeter), on which you can profile the application with utilities such as xdebug and so on. To identify slow database queries you can use utilities like pgFouine (it is clear that it is better to do this based on logs from the production server).

It usually depends on the architecture of the application, but in general the most likely candidates for the bottleneck are the database and the code. If your application works with a large amount of user data, the bottleneck will most likely be the storage of static files.

Database scaling

As mentioned above, the bottleneck in modern applications is most often the database. Its problems fall, as a rule, into two classes: performance, and the need to store large amounts of data.

You can reduce the load on the database by spreading it across several hosts. In doing so, the problem of synchronization between them becomes acute; it can be solved by implementing a master/slave scheme with synchronous or asynchronous replication. In the case of PostgreSQL, synchronous replication can be implemented with PgPool-II, and asynchronous replication with Slony-I or the built-in WAL-based mechanism (since 9.0). The problem of separating read and write requests, as well as balancing the load among the existing slaves, can be solved by setting up a special database access layer (PgPool-II).

The problem of storing large amounts of data in relational DBMSs can be solved with the partitioning mechanism ("partitioning" in PostgreSQL) or by deploying the database on distributed file systems such as Hadoop DFS.

However, for storing large amounts of data the best solution is "sharding" the data, which is a built-in advantage of most NoSQL databases (for example, MongoDB).

In addition, NoSQL databases generally work faster than their SQL brethren thanks to the absence of overhead for query parsing/optimization, checks of data structure integrity, and so on. The topic of comparing relational and NoSQL databases is also quite extensive and deserves a separate article.

Separately worth noting is the experience of Facebook, which uses MySQL without JOIN selects. This strategy makes it much easier for them to scale the database, while moving the load from the database to the code, which, as will be described below, scales more easily than the database.

Code scaling

The difficulty of scaling the code depends on how many shared resources the hosts need in order to run your application. Will it be sessions only, or will a shared cache and files be required? In any case, the first step is to run copies of the application on several hosts with an identical environment.

Next, you need to set up load/request balancing between these hosts. This can be done at the TCP level (haproxy), at the HTTP level (nginx), or via DNS.

The next step is to make sure that static files, the cache and the web application's sessions are available on each host. For sessions you can use a server working over the network (for example, memcached). It is quite reasonable to use the same memcached as the cache server, but, of course, on a different host.

Static files can be mounted from some common file storage via NFS/CIFS or use a distributed file system (HDFS, GlusterFS, Ceph).

You can also store files in a database (for example, Mongo GridFS), thereby solving the problems of availability and scalability (taking into account the fact that for NoSQL databases the scalability problem is solved due to sharding).

Separately, it is worth noting the problem of deploying to several hosts. How do you make sure the user does not see different versions of the application while clicking "refresh"? The simplest solution, in my opinion, is to exclude the not-yet-updated hosts from the load balancer (web server) config and turn them back on one by one as they are updated. You can also pin users to specific hosts by cookie or IP. If the update requires significant changes to the database, the easiest way is to close the project entirely for a while.

FS scaling

If a large amount of static data needs to be stored, two problems can be identified: lack of space and the speed of access to the data. As written above, the lack of space can be solved in at least three ways: a distributed file system, storing the data in a database with sharding support, or organizing the sharding "manually" at the code level.

At the same time, it is worth understanding that serving static files is not the simplest task either when it comes to high load. Therefore it is quite reasonable to dedicate many servers to serving static files. Moreover, if we have shared data storage (a distributed file system or a database), then when saving a file we can store its name without the host, and substitute a host name at random when building the page (thereby randomly balancing the load between the web servers serving static content). In the case where sharding is implemented manually (that is, logic in the code is responsible for choosing the host the data will be uploaded to), the information about the upload host must either be computed from the file itself or derived from other data (information about the user, the amount of free space on the storage disks) and stored together with the file name in the database.
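A sketch of both routing variants just mentioned (the host names are illustrative):

    <?php
    $staticHosts = ['st1.example.com', 'st2.example.com', 'st3.example.com'];

    // Variant 1: shared storage underneath, so any host can serve any
    // file; pick a host at random purely to spread the load.
    function randomStaticUrl(array $hosts, string $file): string
    {
        return 'http://' . $hosts[array_rand($hosts)] . '/' . $file;
    }

    // Variant 2: manual sharding; the host is computed from the file
    // itself, so a file always lives on (and is served from) one host.
    function shardedStaticUrl(array $hosts, string $file): string
    {
        $i = abs(crc32($file)) % count($hosts);
        return 'http://' . $hosts[$i] . '/' . $file;
    }

    echo shardedStaticUrl($staticHosts, 'img/avatar_42.png');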

Monitoring

It is clear that a large and complex system requires constant monitoring. The solution, in my opinion, is standard here: zabbix, which monitors the load/operation of the system's nodes, and monit, which looks after the daemons, as a safety net.

Conclusion

The above briefly covers many options for solving the problems of scaling a web application. Each of them has its own advantages and disadvantages. There is no recipe for how to do everything well at once: for each task there are many solutions, each with its own pros and cons. Which one to choose is up to you.