Setting up a 1C:Enterprise 8.3 server cluster. Forced termination of problematic processes

The 8.3 server has substantially redesigned internals, although from the outside it may look like a slightly modified 8.2.

The server has become more self-configuring: some parameters, such as the number of worker processes, are no longer set manually but are calculated from the declared fault tolerance and reliability requirements.

A load balancing mechanism has been added. It can be used either to increase overall system performance or to run in a new memory-saving mode, which lets the server work with limited memory when the configuration in use is memory-hungry.

Stability when using large amounts of memory is determined by the new parameters of the working server.


The “Safe memory consumption per call” parameter is especially interesting. If you are not sure what it does, do not experiment on a production database. The “Maximum memory size of worker processes” parameter makes it possible, on overflow, to terminate only the one offending session rather than crash the entire worker process. “The amount of worker process memory up to which the server is considered productive” lets you block new connections as soon as this memory threshold is exceeded.

I recommend isolating worker processes by infobase, for example by setting “Number of infobases per process” to 1. With several heavily loaded databases, this reduces their mutual influence on both reliability and performance.

License and key consumption also affects system stability. Version 8.3 adds a software license manager, reminiscent of the Aladdin manager. Its purpose is to let you place the key on a separate machine.

It is implemented as one more service of the cluster manager. You can use, for example, a spare laptop: add it to the 1C 8.3 cluster and create a separate manager on it with the “licensing service”. You can then plug a hardware HASP key into the laptop or activate software licenses on it.

The “functionality assignment requirements” should be of greatest interest to programmers.

For example, to keep users off the laptop that holds the security key, add a requirement on that server for the requirement object “Client connection to infobase” with the type “Do not assign”, i.e. prevent worker processes on this server from processing client connections.

Even more interesting is the ability to run only background jobs on a working server of the cluster, with no user sessions. This way you can move heavy tasks (code) to a separate machine. Moreover, you can run the “month-end closing” background job on one computer and the “full-text index update” background job on another. The refinement is made through the “Value of an additional parameter” field. For example, if you specify BackgroundJob.CommonModule as the value, the working server is limited to background jobs with any content. A value of the form BackgroundJob.CommonModule.<module name>.<method name> designates specific code.
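The way the additional parameter narrows a requirement can be pictured as a prefix comparison. The sketch below is only an illustration: it assumes that a background job is identified by a string of the form BackgroundJob.CommonModule.<module name>.<method name> and that a requirement applies when its additional parameter value is a prefix of that string. The module and method names are hypothetical, and the platform's real matching rules may differ.

```python
def requirement_matches(additional_parameter: str, job_method: str) -> bool:
    """Return True if a background job falls under a requirement.

    Assumption: the job identifier is
    'BackgroundJob.CommonModule.<module name>.<method name>' and the
    requirement applies when its additional parameter value is a prefix
    of that identifier.
    """
    job_id = f"BackgroundJob.CommonModule.{job_method}"
    return job_id.startswith(additional_parameter)


# Hypothetical module and method names, used only for illustration.
month_closing = "MonthClosing.Run"
index_update = "FullTextSearch.UpdateIndex"

# A server limited to "any background job" accepts both jobs.
print(requirement_matches("BackgroundJob.CommonModule", month_closing))
print(requirement_matches("BackgroundJob.CommonModule", index_update))

# A server limited to one specific method accepts only that job.
only_closing = "BackgroundJob.CommonModule.MonthClosing.Run"
print(requirement_matches(only_closing, month_closing))
print(requirement_matches(only_closing, index_update))
```

With this picture, the short value selects all background jobs and the long value selects exactly one method.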

It is clear that there is no point in retelling the documentation. But if someone gives some useful advice, I’ll expand the article.

Note that the cluster settings apply to all servers belonging to the cluster being configured. A cluster consists of several physical or virtual servers working with the same infobases.

Restart interval – sets how often the cluster's worker processes are restarted. This parameter must be set when the server runs around the clock. It is recommended to tie the restart frequency to the technological cycle of the cluster infobases; typically this is every 24 hours (86400 seconds). As you know, 1C server worker processes process and store working data.

Automatic restart was built into the platform “to minimize the negative effects of fragmentation and memory leaks in worker processes.” The ITS even has information on how to restart worker processes based on other criteria (memory size, occupied resources, etc.).

Allowed memory size – protects the 1C servers from excessive memory use. If a process stays above this amount for the whole “Interval for exceeding the permissible amount of memory”, the process is restarted. The value can be taken as the maximum memory occupied by the rphost processes during peak server load. It is also worth setting a short interval for exceeding the permissible amount.

Permissible deviation of the number of server errors. The platform calculates the average number of server errors relative to the number of server calls over 5 minutes. If this ratio exceeds the permissible value, the worker process is considered “problematic” and can be terminated by the system if the “Force termination of problematic processes” flag is set.

Stop disabled processes after. When the allowed memory size is exceeded, the worker process does not terminate immediately but is marked “disabled”, so that there is time to transfer the working data without loss to a newly started worker process. If this parameter is specified, the disabled process terminates in any case once this time has elapsed. If you observe hung worker processes on the 1C server, you can set this parameter to 2-5 minutes.
The following settings are set for each 1C server individually.

Maximum memory size of worker processes – the total amount of memory that the worker processes (rphost) can occupy on the current server. If the parameter is set to “0”, the limit is taken to be 80% of the server's RAM. “-1” means no restrictions. When the DBMS and the 1C server run on the same machine, they have to share RAM. If during operation the DBMS turns out to be short of memory, you can limit the memory available to the 1C server with this parameter. If the DBMS and 1C run on separate servers, it makes sense to calculate this parameter using the formula:

“Max volume” = “Total RAM” – “OS RAM”;

“OS RAM” is estimated at 1 GB for every 16 GB of server RAM.
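As a worked example of this formula, the following sketch computes the limit for a few server sizes. It assumes the rule is applied proportionally (a 24 GB server reserves 1.5 GB rather than a rounded figure). The function names are made up for the illustration.

```python
def os_reserve_gb(total_ram_gb: float) -> float:
    """OS reserve: 1 GB for every 16 GB of server RAM (applied proportionally)."""
    return total_ram_gb / 16


def max_worker_memory_gb(total_ram_gb: float) -> float:
    """'Max volume' = 'Total RAM' - 'OS RAM'."""
    return total_ram_gb - os_reserve_gb(total_ram_gb)


for ram_gb in (16, 64, 128):
    limit = max_worker_memory_gb(ram_gb)
    print(f"{ram_gb} GB RAM: reserve {os_reserve_gb(ram_gb)} GB, limit {limit} GB")
```

For a dedicated 64 GB application server this gives a 4 GB reserve for the operating system and a 60 GB limit for the worker processes.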

Safe memory consumption per call. In general, individual calls should not take up all the RAM allocated to a worker process. If the parameter is set to “0”, the safe amount equals 5% of “Maximum memory size of worker processes”. “-1” means no limit, which is strongly discouraged. In most cases it is best to leave this parameter at “0”.

Using the parameters “Number of infobases per process” and “Number of connections per process”, you can control how the 1C server's work is distributed among worker processes. For example, you can run a separate rphost for each infobase, so that if a process crashes only the users of one database are disconnected. These parameters should be chosen individually for each server configuration.

Limiting RAM use by the DBMS server – MS SQL Server has one notable habit: it tries to load the databases it is actively working with entirely into RAM. If you do not limit it, it will take all the RAM it can.

  • If the 1C:Enterprise server is installed together with Microsoft SQL Server, the upper memory threshold must be reduced by an amount sufficient for the 1C server to operate.
  • If only the DBMS runs on the server, calculate the DBMS memory limit by the formula:

“DBMS memory” = “Total RAM” – “OS RAM”;

Shared memory – this parameter is well known, yet people still forget about it. Set it to “1” if the 1C server and the DBMS run on the same physical or virtual server. It works starting from platform 8.2.17.

Max degree of parallelism – determines how many processors are used to execute a single query. The DBMS parallelizes data retrieval across multiple threads when executing complex queries. For 1C it is recommended to set it to “1”, that is, execution in a single thread.

Auto-extension of database files – sets the step, in MB, by which the database file is expanded. If the step is small, then with active database growth frequent expansions put additional load on the disk subsystem. It is better to set it to 500–1000 MB.

Reindexing and defragmenting indexes – it is recommended to defragment or reindex at least once a week. Reindexing locks tables, so it is best run outside working hours or during periods of minimal load. There is no point in defragmenting after an index has been rebuilt (reindexed). According to Microsoft's recommendations, defragmentation is used when index fragmentation does not exceed 30%; above that, reindexing is recommended.
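The 30% rule can be written down as a small decision helper. This is a sketch of the threshold described above, not a maintenance script: the index names and fragmentation figures are invented, and in practice the percentages would come from the DBMS itself.

```python
REBUILD_THRESHOLD_PERCENT = 30  # threshold quoted in the text


def index_action(fragmentation_percent: float) -> str:
    """Choose the maintenance action for one index by its fragmentation."""
    if fragmentation_percent > REBUILD_THRESHOLD_PERCENT:
        return "reindex"
    return "defragment"


# Hypothetical index names and fragmentation values, for illustration only.
sample = {"idx_documents": 12.5, "idx_registers": 47.0, "idx_catalogs": 30.0}
for name, fragmentation in sample.items():
    print(name, "->", index_action(fragmentation))
```

An index that has just been reindexed needs no separate defragmentation pass, which is why the helper returns exactly one action per index.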

Power plan– set the power settings of the operating system to high performance.

Several worker processes on one server make it possible to use RAM and processor resources effectively when executing requests, and to reconnect a client session to another worker process if the current one crashes.
The server agent (ragent) is responsible for knowing what is running on a specific server. Stopping the server agent makes the server unavailable to the cluster. The agent stores its information in the file srvribrg.lst.

Information about the working databases and the worker processes involved is held by the server manager (rmngr). It stores this information in the file 1CV8Reg.lst. Stopping the server manager leads either to a restart of client applications, if the manager restarts successfully, or to a complete stop of the working servers of the entire cluster.

1C:Enterprise allows several independent clusters to be created on one server. Each of them is identified on the network by a unique IP port and by a unique number in the service files. The first cluster gets port 1541 by default.

The Enterprise Servers snap-in is designed to manage the cluster.
You can connect to servers by server name or IP address.

Server agent

The server agent “knows” about all the clusters running on the server. This information is stored in the file srvribrg.lst, which holds the list of clusters and the list of administrators. The agent's main port is 1540. Only one agent can run on each working server, and it serves all clusters on that server.

Let's take a closer look at the cluster properties

Restart interval

This parameter restarts the 1C server worker processes at the specified interval, in seconds. It is typically used with 32-bit application servers, where a worker process is limited to about 3.7 GB of memory if the operating system is 64-bit, and to about 1.7 GB if the OS itself is 32-bit. In that case users may often receive an error such as “Insufficient memory on the 1C Enterprise server.” The easiest way to avoid this error is to restart the worker processes periodically, for example every 86400 seconds (1 day). When the parameter is changed, the time is counted from the start of the 1C application server service.

Allowed memory size

Worker processes are restarted when the memory occupied by a worker process reaches this threshold, specified in kilobytes.

Interval for exceeding the permissible amount of memory

If the amount set in “Allowed memory size” stays exceeded for the specified number of seconds, the 1C server decides to restart the worker process.

Permissible deviation of the number of server errors

It is calculated as follows. Server calls appear in the technological log as “CALL” events, and exceptions appear as “EXCP” events. The platform calculates the ratio of these events. It is assumed that this ratio should be approximately the same in all worker processes. If in some worker process the ratio exceeds that of the other worker processes by a significant amount, that worker process is considered problematic. This parameter sets exactly that amount. The recommended value is 50.
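The comparison can be sketched as follows. This is an interpretation rather than the platform's actual algorithm: it assumes the parameter is a percentage, that each process's EXCP/CALL ratio is compared with the average ratio of the other processes, and that event counts are already available per process. The PIDs and counts are invented.

```python
def problematic_processes(stats, allowed_deviation_percent=50):
    """Return PIDs whose EXCP/CALL ratio stands out from the other processes.

    stats maps a worker process PID to a (call_count, excp_count) pair.
    Assumption: a process is problematic when its ratio exceeds the average
    ratio of the other processes by more than the allowed percentage.
    """
    ratios = {
        pid: (excp / calls if calls else 0.0)
        for pid, (calls, excp) in stats.items()
    }
    flagged = []
    for pid, ratio in ratios.items():
        others = [r for other, r in ratios.items() if other != pid]
        if not others:
            continue  # a single process has nothing to be compared with
        baseline = sum(others) / len(others)
        if ratio > baseline * (1 + allowed_deviation_percent / 100):
            flagged.append(pid)
    return flagged


# Hypothetical counts collected over one observation window.
window = {101: (1000, 10), 102: (1200, 12), 103: (900, 90)}
print(problematic_processes(window))
```

Here process 103 raises exceptions on 10% of its calls while the others stay at 1%, so it is the only one flagged.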

Force termination of problematic processes

If this parameter is enabled, processes found to be problematic under “Permissible deviation of the number of server errors” are terminated. If it is disabled, the platform writes an “ATTN” event to the technological log, which identifies the problematic process.

Stop disabled processes after

If “Restart interval” or “Allowed memory size” triggers, client connections may drop while the worker process is being restarted. If a client does not access the server during the restart (is inactive), then on its next call it switches smoothly to the new worker process. If a client calls the server at the moment the worker process is restarting, it receives an error message and its session ends. To prevent this, set this parameter, in seconds; 120 seconds is usually enough. During this time the old worker process has time to finish the current client requests and hand the clients over to the new worker process. Calls that the process does not manage to finish in that time are terminated, and those clients may receive an error.

Fault tolerance level

This setting is independent of the number of central servers. The fault tolerance level can take any value. For example, with fault tolerance level = 1 each user session is duplicated; with fault tolerance level = 2 each session is kept in three copies. The load on the servers grows accordingly. When the fault tolerance level changes, the “cluster registry” and the “cluster locking service” are replicated to every central server. Other services are also replicated to other servers: “session data service”, “operational time stamp service”, “object locking service”, “licensing service”, “numbering service”. The heaviest of these is the “session data service”.

Load sharing mode

Priority by performance. A new client connection is directed to the server whose worker process has more available performance. The available performance is shown in the worker process properties.


The available performance is calculated at the 1C level as follows: once every 10 minutes a reference server call is made to every worker process and its duration is measured. The resulting number is divided by 10,000 (ten thousand), and the application server uses it to calculate the reference time. If the performance of a worker process becomes 25% lower than that of the others, its connections start to be moved to other worker processes until none remain.

Priority by memory. User connections are directed to the working server that has more available memory.
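The 25% rule can be pictured with the sketch below. It takes the available performance figures as given (as they appear in the worker process properties) and assumes that “the others” means the average of the remaining processes; the real balancing logic is internal to the platform. The PIDs and figures are invented.

```python
def processes_to_drain(performance, threshold_percent=25):
    """Return PIDs whose available performance lags the other processes.

    performance maps a worker process PID to its available performance.
    Assumption: a process is drained when its figure is more than
    threshold_percent below the average of the remaining processes.
    """
    lagging = []
    for pid, value in performance.items():
        others = [v for other, v in performance.items() if other != pid]
        if not others:
            continue
        average = sum(others) / len(others)
        if value < average * (1 - threshold_percent / 100):
            lagging.append(pid)
    return lagging


# Hypothetical available-performance values for three worker processes.
print(processes_to_drain({1: 1000, 2: 980, 3: 600}))
```

In this example only process 3 falls more than 25% below its peers, so new connections would stop going to it.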

Cluster Manager

The cluster manager is responsible for the operation of the cluster. Each cluster has its own manager. The manager stores information about the cluster in the file 1CV8Reg.lst (the cluster registry). Each cluster manager also has its own port on the working server. For the first cluster, the default manager port is 1541. This is the port shown in the 1C:Enterprise Servers snap-in under the Clusters branch, identifying the cluster.
The manager receives requests from the 1C:Enterprise client and decides which worker process will handle each request.

The manager uses a service port to interact with worker processes.

Worker process

The worker process is responsible for serving clients. A 1C:Enterprise 8 cluster can have several worker processes. Their number is not set manually but is calculated from the declared fault tolerance and reliability requirements. The server manager decides which worker process will serve a client connection. For client connections, worker processes are by default allocated the IP port range 1560–1591. In addition, each worker process is assigned a service port for communication with the cluster manager.

According to the 1C documentation, the working server settings can only be changed in the CORP version of the 1C application server. In practice, the settings work in both the CORP and PROF versions, but using them in the PROF version violates the license agreement.
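The default ports mentioned so far (1540 for the agent, 1541 for the manager of the first cluster, 1560–1591 for worker processes) can be collected in one lookup, which is handy when reading firewall rules or netstat output. The sketch covers only these defaults; additional clusters and service ports use other numbers that are not listed here.

```python
AGENT_PORT = 1540                  # server agent (ragent)
FIRST_CLUSTER_MANAGER_PORT = 1541  # manager (rmngr) of the first cluster
WORKER_PORTS = range(1560, 1592)   # 1560-1591 inclusive, worker processes (rphost)


def describe_port(port: int) -> str:
    """Name the default 1C server component that listens on the given port."""
    if port == AGENT_PORT:
        return "server agent (ragent)"
    if port == FIRST_CLUSTER_MANAGER_PORT:
        return "cluster manager (rmngr) of the first cluster"
    if port in WORKER_PORTS:
        return "worker process (rphost)"
    return "not a default 1C port"


for port in (1540, 1541, 1560, 1591, 1592):
    print(port, "->", describe_port(port))
```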

Maximum memory size of worker processes

This parameter does not limit anything by itself; it works together with “Safe memory consumption per call”. Suppose the worker processes together have reached roughly the memory consumption specified in this parameter, and a user now makes a server call that needs a large amount of memory. As soon as that call pushes consumption past the value of this parameter by more than the amount set in “Safe memory consumption per call”, this particular user receives an error of the form “safe memory consumption for one client-server call has been exceeded.” This keeps a single user from overwhelming the working server. A value of 0 means 80% of the memory installed on the 1C server.

Safe memory consumption per call

A value of 0 (default) means 5% of the “Maximum memory size of worker processes” value. The value may also be -1. This means that any client-server call that exceeds the value of the “Maximum memory size of worker processes” parameter is treated as unsafe and is interrupted.
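How the two zero defaults combine can be sketched like this. The code only resolves the 0 values (80% of RAM, then 5% of that) and applies one simplified reading of the check: a call is interrupted when the worker processes are already over the maximum and the call itself has used more than the safe amount. The -1 settings are deliberately left out, and the function names are made up for the illustration.

```python
GIB = 1024 ** 3


def effective_limits(total_ram, max_worker_memory=0, safe_per_call=0):
    """Resolve the 0 defaults; all values are in bytes."""
    if max_worker_memory == 0:
        max_worker_memory = total_ram * 80 // 100    # 80% of server RAM
    if safe_per_call == 0:
        safe_per_call = max_worker_memory * 5 // 100  # 5% of the maximum
    return max_worker_memory, safe_per_call


def call_is_interrupted(total_used, call_used, max_worker_memory, safe_per_call):
    """Simplified reading of the rule, not the platform's exact algorithm."""
    return total_used > max_worker_memory and call_used > safe_per_call


limit, safe = effective_limits(100 * GIB)
print(limit // GIB, "GiB maximum,", safe // GIB, "GiB safe per call")
print(call_is_interrupted(86 * GIB, 5 * GIB, limit, safe))
print(call_is_interrupted(86 * GIB, 1 * GIB, limit, safe))
```

On a server with 100 GiB of RAM the defaults resolve to an 80 GiB maximum and a 4 GiB safe amount per call, so only the call that has grabbed 5 GiB is interrupted.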

The amount of work process memory up to which the server is considered productive

If this parameter is set and the worker processes have taken up the specified amount of memory, the server continues to run but does not accept new connections until memory is freed.

Number of infobases per process

Performance may drop when many infobases are served by one worker process. This parameter lets you reduce the number of infobases per process. If you set it to 1 (in most cases this works well), a separate worker process (rphost) is created for each infobase.

Number of connections per process

Same as the parameter above, but it limits the number of connections per process. A value of 0 means there will be only one worker process on each working server.

Manager for each service

Each central server has a main cluster manager that hosts a set of services.

They are all executed by a single rmngr process. Suppose this process starts consuming a lot of memory or CPU. There are usually a few typical suspects, but if you are at a dead end and cannot tell which service is causing the load, check the “Manager for each service” checkbox: the manager is then split into 21 processes (the number of services in the main cluster manager). Using the process PID, you can then determine which service is loading the system.

Central server

This is the server that stores the cluster registry in the 1CV8Clst.lst file. The file holds the list of infobases, the list of cluster administrators, the list of functionality assignment requirements, the list of security profiles, and in general all cluster settings. The file exists only on servers where the “Central server” checkbox is checked. There can be several central servers. Central servers also host services such as the “cluster locking service” and the “cluster configuration service”. As long as at least one central server is operational, the cluster functions. Once the last central server fails, the cluster becomes unusable regardless of the fault tolerance settings.

Functionality assignment requirement

The 1C:Enterprise 8.3 server cluster provides a set of functionality (called requirement objects) whose distribution between the working servers of the cluster can be controlled. For example, you can specify that all background jobs in the cluster run on a selected working server. To place a connection or a cluster service on a particular working server, you create a functionality assignment requirement for that server. The requirement determines whether or not the server may perform a particular job. Let's take a closer look at what a functionality assignment requirement is.

Migrating User Connections

Let's say we want user connections to work on working server No. 1, but to fail over to working server No. 2 if server No. 1 goes down.

To do this, create a functionality assignment requirement on server No. 1.

On server No. 2, set the same settings, but change the priority.

Priorities are ordered in reverse: priority 1 is higher than priority 2.
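The effect of the reversed priority order can be shown with a small selection routine. This is only a model of the behaviour described above: it assumes each server carries one requirement with a numeric priority and that connections go to the available server with the lowest number. The server names are placeholders.

```python
def pick_server(priorities, available):
    """Return the available server with the highest priority (lowest number).

    priorities maps a server name to the priority of its requirement;
    available is the set of servers that are currently up.
    """
    candidates = [(priority, name) for name, priority in priorities.items()
                  if name in available]
    return min(candidates)[1] if candidates else None


# Placeholder server names; priority 1 outranks priority 2.
priorities = {"server-1": 1, "server-2": 2}

print(pick_server(priorities, {"server-1", "server-2"}))
print(pick_server(priorities, {"server-2"}))
```

While both servers are up, connections land on server-1; once it fails, server-2 takes over.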

Removing a working server from the cluster

We can simply remove the working server from the cluster by deleting it from the list, but in that case all its users are disconnected from the system. To make the removal less painful, you can do the following:

Create a functionality assignment requirement on that server that forbids new client connections.

This setting means that no new connections are made to this working server. Users who were already working continue to work, but gradually move to other working servers.

Licensing service

Move the licensing service to a separate server. This is useful because software licenses can be tied to a specific computer. To do so, create a functionality assignment requirement for the licensing service on that server.


Background jobs

With the release of platform 8.3.7, background jobs were divided into 2 groups:

1. Background jobs called from configuration code

2. Scheduled jobs

Therefore, several functionality assignment requirements are needed.

To make background jobs run quickly, also add a requirement for the session data of background and scheduled jobs.



After creating the necessary functionality assignment requirements, you need to apply them. There are two ways to apply them:

Partial – application that does not disrupt users' work.

Full – application that may disrupt users' work.

In practice, I have never seen a full application disrupt users' work, but anything is possible, so keep it in mind. After the requirements are applied, restarting the 1C application server service is not necessary.

After installing the 1C cluster, the first step used to be creating worker processes. As it turned out, the cluster now creates worker processes automatically, depending on the database load.

A test run of the main database's background jobs caused the 1C cluster to overload rphost.exe endlessly, while an additional rphost.exe was never created. After digging through the settings, everything became clear.

Maximum memory size of worker processes is the amount of memory that the worker processes can use in total. Be very careful when setting this parameter, which is measured in bytes. If you set a value that is too low for normal user work, users will receive the error “Not enough free memory on the 1C server.” The same error also appears when the memory quota on the 1C server has run out.

Safe memory consumption per call – lets you control memory consumption during a server call; it is measured in bytes. If a call uses more memory than allowed, the call is terminated within the 1C cluster without restarting the worker process (rphost.exe). The unlucky user who made the server call loses the session with the 1C database, without affecting the work of other users.

For reference: 1 GB is 1073741824 bytes, so 2 GB is 2147483648 bytes.
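Because the working server parameters are entered in bytes while the cluster-level “Allowed memory size” is entered in kilobytes, a small converter helps avoid an off-by-1024 mistake. The helper names below are made up for the illustration.

```python
BYTES_PER_GB = 1024 ** 3
KB_PER_GB = 1024 ** 2


def gb_to_bytes(gb: float) -> int:
    """Value for parameters measured in bytes (working server settings)."""
    return int(gb * BYTES_PER_GB)


def gb_to_kb(gb: float) -> int:
    """Value for parameters measured in kilobytes (cluster 'Allowed memory size')."""
    return int(gb * KB_PER_GB)


print(gb_to_bytes(1), gb_to_bytes(2))
print(gb_to_kb(4), gb_to_kb(20))
```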

The amount of worker process memory up to which the server is considered productive – if this value is exceeded, the server stops accepting new connections in the 1C cluster.

Number of infobases per process – lets you isolate infobases in separate worker processes. By default it was set to “8” on this 1C cluster, but within several hours of operation the server behaved very unstably and user sessions froze. After isolating each infobase (value “1”), the problems disappeared.

Number of connections per process – the default value is “128”. Since this database carries a very heavy load of background jobs (logistics calculations, price list analysis, competitor analysis, etc.), it was decided to reduce the value to “25”.

The settings of the 1C cluster itself have changed slightly:

Fault tolerance level – the number of working servers that can fail simultaneously without causing abnormal termination of user sessions. Backup services are started automatically in the number needed to provide the specified fault tolerance. The active service is replicated to the backup ones in real time.

Load sharing mode – the parameter has two options: “Priority by performance” uses more server memory and gives higher performance; “Priority by memory” makes the 1C cluster save server memory.

Because the 8.3 server calculates parameters such as the number of worker processes automatically, the likelihood of server misconfiguration is reduced and the qualification requirements for administrators are lower.


Often, other services run on the same machine as the 1C:Enterprise server: a terminal server, SQL Server, etc. At some point the 1C:Enterprise server, or rather the rphost worker process, consumes more memory than planned, or all of it. This slows down the other services and can hang the server. To avoid such situations, configure automatic restart of the 1C:Enterprise server worker processes.

Solution

1. Open the administration console of 1C Enterprise servers;
2. Expand the central server tree to clusters and select the cluster that interests us. In the example there is only one cluster;
3. Open the properties of the selected cluster and see the following form

Properties of the 1C:Enterprise 8.3 server cluster

Let's look at the example shown in the image:

Restart interval – the time after which the rphost process is forcibly restarted. Before the process terminates, a new rphost process is started and all connections are transferred to it; only then does the old process terminate. Users' work is not affected in any way. The interval is specified in seconds; in the example it is 24 hours.

Allowed memory size – the amount of memory within which a worker process can operate without problems. It is specified in kilobytes; in the example the value is 20 gigabytes (in fact this figure is too large and you should start from the specific system, but a typical figure is 4 GB). As soon as the memory occupied by a worker process exceeds this value, a countdown begins.

Interval for exceeding the permissible amount of memory – once the timer started on exceeding the allowed memory size has counted down this time, a new worker process is started, all connections are transferred to it, and the old process is marked as disabled. The interval is specified in seconds; in the example it is 30 seconds.

Stop disabled processes after – the time after which a worker process marked as disabled is stopped; if the value is 0, the process is not forcibly stopped. The interval is specified in seconds; in the example it is 60 seconds.
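The interplay of these three timers can be modelled as a small state machine. The sketch below is a simplified model, not the server's code: it assumes the countdown resets if memory drops back under the limit, and it tracks a single process through the states active, disabled and stopped. The class and field names are invented; the figures are the 4 GB, 30 second and 60 second values mentioned above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryWatchdog:
    allowed_kb: int             # "Allowed memory size", kilobytes
    exceed_interval_s: int      # "Interval for exceeding the permissible amount of memory"
    stop_disabled_after_s: int  # "Stop disabled processes after" (0 = never force-stop)
    exceeded_since: Optional[int] = None
    disabled_since: Optional[int] = None
    state: str = "active"

    def observe(self, now_s: int, memory_kb: int) -> str:
        """Feed one memory measurement and return the resulting state."""
        if self.state == "active":
            if memory_kb > self.allowed_kb:
                if self.exceeded_since is None:
                    self.exceeded_since = now_s  # the countdown starts
                if now_s - self.exceeded_since >= self.exceed_interval_s:
                    self.state = "disabled"      # connections move to a new process
                    self.disabled_since = now_s
            else:
                self.exceeded_since = None       # assumption: countdown resets
        elif self.state == "disabled":
            if (self.stop_disabled_after_s
                    and now_s - self.disabled_since >= self.stop_disabled_after_s):
                self.state = "stopped"
        return self.state


watchdog = MemoryWatchdog(allowed_kb=4 * 1024 * 1024,
                          exceed_interval_s=30,
                          stop_disabled_after_s=60)
for second in (0, 30, 60, 90):
    print(second, watchdog.observe(second, memory_kb=5_000_000))
```

With these values a process that stays above 4 GB is disabled 30 seconds after the first excess and force-stopped 60 seconds after that.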

After applying the settings, you do not have to restart the server service; they are applied dynamically.

Summary

This is how we set up automatic restart of the 1C:Enterprise server worker processes and get a more stable system; if a memory leak occurs, only the work of a specific session is interrupted.

Also, in some situations you can tune these settings to prevent a possible server crash when errors occur.