Dynamic programming: optimal control examples. Applied problems of dynamic programming

Hello, Habr. I am currently working on a teaching aid on olympiad programming, one section of which is devoted to dynamic programming. Below is an excerpt from that section. Trying to explain this topic as simply as possible, I have accompanied the difficult moments with illustrations. I am interested in your opinion on how clear this material is. I would also be glad of advice on what other problems should be included in this section.

In many olympiad programming problems, solving by recursion or brute force requires a very large number of operations. An attempt to solve such problems by exhaustive search, for example, leads to exceeding the time limit.

However, among search problems and some others, one can single out a class of problems with one good property: given solutions to some subproblems (for example, for a smaller n), a solution to the original problem can be found almost without search.

Such problems are solved by the method of dynamic programming, and dynamic programming itself means reducing a problem to subproblems.

Sequences

The classic sequence problem is the following.

The Fibonacci sequence F(n) is given by the formulas: F(1) = 1, F(2) = 1,
F(n) = F(n-1) + F(n-2) for n > 2. Given n, find F(n).

One solution that may seem logical and efficient is to solve using recursion:

int F(int n) {
    if (n <= 2) return 1;
    return F(n - 1) + F(n - 2);
}
Using such a function, we solve the problem "from the end": we reduce n step by step until we reach known values.

But, as you can see, even at n = 40 this seemingly simple program already runs noticeably long. This is because the same intermediate values are computed many times: the number of operations grows at the same rate as the Fibonacci numbers themselves, that is, exponentially.

One way out of this situation is to save the intermediate results that have already been found, so that they can be reused:

int A[MAXN];  /* filled with -1 before the first call */

int F(int n) {
    if (n <= 2) return 1;
    if (A[n] != -1) return A[n];
    A[n] = F(n - 1) + F(n - 2);
    return A[n];
}
The given solution is correct and efficient. But for this problem a simpler solution is also possible:

F[1] = 1; F[2] = 1;
for (i = 3; i <= n; i++)
    F[i] = F[i - 1] + F[i - 2];
This solution can be called a solution "from the beginning": we first fill in the known values, then find the first unknown value (F(3)), then the next one, and so on, until we reach the desired one.

This is exactly the solution that is classic for dynamic programming: we first solved all the subproblems (found F(i) for all i < n), then, knowing the solutions to the subproblems, found the answer: F(n) = F(n-1) + F(n-2), where F(n-1) and F(n-2) have already been found.

One-Dimensional Dynamic Programming

To better understand the essence of dynamic programming, let us first define the concepts of problem and subproblem more formally.

Let the initial problem be to find a certain value T from the initial data n_1, n_2, ..., n_k. That is, we can speak of a function T(n_1, n_2, ..., n_k) whose value is the answer we need. Then its subproblems are the problems
T(i_1, i_2, ..., i_k) with i_1 < n_1, i_2 < n_2, ..., i_k < n_k.

The following one-dimensional dynamic programming problem occurs in various variations.

Task 1. Count the number K(n) of sequences of zeros and ones of length n in which no two ones stand next to each other.

For n < 32 exhaustive search would take a few seconds, and at n = 64 it is infeasible in principle. To solve the problem by dynamic programming, let us reduce the original problem to subproblems.

For n = 1 and n = 2 the answer is obvious. Suppose we have already found K(n-1) and K(n-2), the numbers of such sequences of length n-1 and n-2.

Let us see what a sequence of length n can look like. If its last character is 0, then its first n-1 characters form an arbitrary valid sequence of length n-1 (it does not matter whether it ends in a zero or a one, since a 0 follows); there are K(n-1) such sequences. If the last character is 1, then the second-to-last character must be 0 (otherwise there would be two ones in a row), and the first n-2 characters form an arbitrary valid sequence of length n-2; there are K(n-2) such sequences.

Thus K(1) = 2, K(2) = 3, and K(n) = K(n-1) + K(n-2) for n > 2. That is, this problem in fact reduces to computing Fibonacci numbers.

2D Dynamic Programming

A classic problem in two-dimensional dynamic programming is the problem of routes on a rectangular field.
In different formulations, you need to count the number of routes or find a route that is the best in some sense.

Here are a couple of formulations of such tasks:

Task 2. Given a rectangular field of size n*m cells. You can move one cell at a time to the right or down. Count the number of ways to get from the top left cell to the bottom right cell.

Task 3. Given a rectangular field of size n*m cells. You can take one-cell steps to the right, down, or diagonally right and down. Each cell contains some natural number. You need to get from the top left cell to the bottom right cell. The weight of the route is calculated as the sum of numbers from all visited cells. It is necessary to find a route with minimal weight.

A characteristic feature of all such problems is that each individual route cannot go through the same cell two or more times.

Let us consider Task 2 in more detail. A cell with coordinates (i, j) can be reached only from above or from the left, that is, from the cells with coordinates (i-1, j) and (i, j-1):

Thus, for the cell (i, j) the number of routes A[i][j] equals
A[i-1][j] + A[i][j-1]; the problem is thereby reduced to two subproblems. The computation uses two parameters, i and j, which is why, with respect to this problem, we speak of two-dimensional dynamic programming.

Now we can traverse array A row by row (or column by column), finding the number of routes for the current cell by the formula above. First, the number 1 must be placed in A[1][1].

In Task 3, the cell with coordinates (i, j) can be reached from the cells with coordinates
(i-1, j), (i, j-1) and (i-1, j-1). Suppose that for each of these three cells we have already found the route of minimum weight and stored the weights themselves in W[i-1][j], W[i][j-1] and W[i-1][j-1]. To find the minimum weight for (i, j), we select the minimum of the weights W[i-1][j], W[i][j-1], W[i-1][j-1] and add to it the number written in the current cell:

W[i][j] = min(W[i-1][j], W[i][j-1], W[i-1][j-1]) + A[i][j];

This task is complicated by the fact that it is necessary to find not only the minimum weight, but also the route itself. Therefore, in another array we will additionally record for each cell from which side we need to enter it.

The following figure shows an example of the initial data and one of the steps of the algorithm.

Exactly one arrow leads to each of the already passed cells. This arrow shows from which side you need to come to this cell in order to get the minimum weight recorded in the cell.

After passing through the entire array, you will need to trace the route itself from the last cell, following the arrows in the opposite direction.

Subsequence problems

Consider the increasing subsequence problem.

Task 4. Given a sequence of integers. It is necessary to find its longest strictly increasing subsequence.

Let us solve the problem "from the beginning": we will look for the answer starting from the first terms of the sequence. For each position i we will look for the longest increasing subsequence ending with the element at position i. Let the original sequence be stored in array A, and let array L record the lengths of the maximal subsequences ending at each element. Suppose we have found all L[i] for 1 <= i <= k-1. Then L[k] can be found as follows. Look through all elements A[i] for 1 <= i < k. If
A[i] < A[k], then the k-th element can become a continuation of a subsequence ending with A[i]; the length of the resulting subsequence will be one greater than L[i]. So, to find L[k], we go through all i from 1 to k-1:
L[k] = max(L[i]) + 1, where the maximum is taken over all i such that A[i] < A[k] and
1 <= i < k.

Here the maximum over the empty set is taken to be 0; in that case the current element becomes the only one in the chosen subsequence rather than a continuation of a previous one. After array L is filled, the length of the longest increasing subsequence equals the maximum element of L.

To restore the subsequence itself, you can also store the number of the previous selected element for each element, for example, in an array N.

Let us trace the solution on the example of the sequence 2, 8, 5, 9, 12, 6. There is no element before 2, so the maximal subsequence contains only this one element: L[1] = 1, and it has no predecessor: N[1] = 0. Next,
2 < 8, so 8 can continue the subsequence containing the previous element: L[2] = 2, N[2] = 1.

The only element smaller than A[3] = 5 is A[1] = 2, so 5 can continue only one subsequence, the one containing 2. Then
L[3] = L[1] + 1 = 2 and N[3] = 1, since 2 stands at position 1. Performing three more steps of the algorithm in the same way, we obtain the final result.

Now we select the maximum element of array L and, using array N, reconstruct the subsequence itself: 2, 5, 9, 12.

Another classic dynamic programming problem is the palindrome problem.

Task 5. Given a string of capital letters of the Latin alphabet. You need to find the length of the largest palindrome that can be obtained by crossing out some letters from a given string.

Let us denote the string by S and its characters by S[i], 1 <= i <= n. We will consider substrings of this string from the i-th to the j-th character, denoted S(i, j). The lengths of the maximal palindromes for substrings are written into a square array L: L[i][j] is the length of the maximal palindrome that can be obtained from the substring S(i, j).

Let us start with the simplest substrings. For a one-character string (that is, a substring of the form S(i, i)) the answer is obvious: nothing needs to be crossed out, such a string is already a palindrome. For a two-character string S(i, i+1) there are two cases: if the characters are equal, we have a palindrome and nothing needs to be crossed out; if they are not equal, we cross out either one.

Now let a substring S(i, j) be given. If its first (S[i]) and last (S[j]) characters differ, then one of them must be crossed out, leaving the substring S(i, j-1) or S(i+1, j); that is, we reduce the problem to subproblems: L[i][j] = max(L[i][j-1], L[i+1][j]). If the first and last characters are equal, we can keep both, but we need the solution for S(i+1, j-1):
L[i][j] = L[i+1][j-1] + 2.

Let us follow the solution on the string ABACCBA. First we fill the diagonal of the array with ones; these correspond to the one-character substrings S(i, i). Then we examine the substrings of length two. In all of them except S(4, 5) the characters differ, so we write 1 into the corresponding cells, and 2 into L[4][5].

Thus we fill the array diagonal by diagonal, starting with the main diagonal that runs from the upper left corner to the lower right. For substrings of length 3 the following values are obtained: in the substring ABA the first and last letters are equal, so
L[1][3] = L[2][2] + 2 = 3. In the remaining substrings the first and last letters differ:

BAC: L[2][4] = max(L[2][3], L[3][4]) = 1.
ACC: L[3][5] = max(L[3][4], L[4][5]) = 2.
CCB: L[4][6] = max(L[4][5], L[5][6]) = 2.
CBA: L[5][7] = max(L[5][6], L[6][7]) = 1.

If the problem requires outputting not the length but the palindrome itself, then in addition to the array of lengths we must build an array of transitions: for each cell, remember which of the cases applied (for clarity, in the figure the transitions are drawn as arrows instead of numeric codes).

The Dynamic Programming section is represented by the following calculators:

  1. The investment allocation problem. For the reconstruction and modernization of production at four enterprises, funds C = 80 monetary units were allocated. For each enterprise, the possible increase f_i(x) (i = 1, ..., 4) in output is known, depending on the amount x allocated to it.

In dynamic programming problems, the economic process depends on time (or on several periods of time), so a number of optimal solutions are found (sequentially for each stage) that ensure optimal development of the entire process as a whole. Dynamic programming is a mathematical apparatus that allows for optimal planning of controlled processes and time-dependent processes. Carrying out optimization step by step is called a multi-step decision-making process. An economic process is called controlled if it is possible to influence the course of its development.

The dynamic programming (DP) method is based on the principle of sequential optimization: the solution of the original high-dimensional optimization problem is replaced by the solution of a sequence of low-dimensional optimization problems. The main condition for the applicability of the DP method is the possibility of dividing the decision-making process into a number of similar steps or stages, each of which is planned separately but with the results obtained at the other steps taken into account: for example, the activity of an industry over a number of business years, or a sequence of tests used in equipment inspection. Some processes (operations) divide into steps naturally, but there are operations that have to be divided into stages artificially, such as the process of guiding a missile to a target.
This principle ensures that the control chosen at any step is not locally the best, but the best from the point of view of the process as a whole, since this control is chosen taking into account the consequences in the upcoming steps.

Let us consider a general description of the dynamic programming problem.
Let the multi-step decision-making process be broken into n steps. Denote by ε_0 the initial state of the system and by ε_1, ε_2, ..., ε_n the states of the system after the first, second, ..., n-th steps. In general, the state ε_k is a vector (ε_k,1, ..., ε_k,s).
A control in a multi-step process is a set of decisions (control variables) u_k = (u_k,1, ..., u_k,r) taken at each step k and transferring the system from the state ε_{k-1} = (ε_{k-1,1}, ..., ε_{k-1,s}) to the state ε_k = (ε_k,1, ..., ε_k,s).
In economic processes, management consists of the distribution and redistribution of funds at each stage. For example, the production of products by any enterprise is a controlled process, since it is determined by changes in the composition of equipment, the volume of supplies of raw materials, the amount of financing, etc. A set of decisions made at the beginning of the year, the planning period, to provide the enterprise with raw materials, replacement of equipment, and the amount of financing etc. is management. It would seem that in order to obtain the maximum volume of output, the easiest way is to invest the maximum possible amount of funds and use the equipment at full capacity. But this would lead to rapid wear of equipment and, as a consequence, to a decrease in production output. Therefore, product release must be planned in such a way as to avoid undesirable effects. It is necessary to take measures to ensure that equipment is replenished as it wears out, i.e. over periods of time. The latter, although it leads to a decrease in the initial volume of output, provides the possibility of expanding production in the future. Thus, the economic process of production can be considered to consist of several stages (steps), each of which influences its development.
The beginning of a stage (step) of a controlled process is considered to be the moment when a decision is made (on the amount of capital investments, on the replacement of equipment of a certain type, etc.). A stage is usually understood as a business year.
Usually the control at each step u_k is subject to certain restrictions; controls satisfying these restrictions are called admissible.
Assuming that the performance indicator of the k-th step of the process depends on the state ε_{k-1} at the start of that step and on the control u_k at that step, we obtain the objective function of the entire multi-step process in the form

F = Σ_k f_k(ε_{k-1}, u_k), k = 1, 2, ..., n.

Let us now formulate the dynamic programming problem: determine a set of admissible controls (u_1, ..., u_n) transferring the system from the initial state ε_0 to the final state ε_n and maximizing or minimizing the performance indicator F.
A control achieving the maximum (minimum) of the function F is called an optimal control u* = (u_1*, ..., u_n*).
If the control variables u_k take discrete values, the DP model is called discrete; if they change continuously, the DP model is called continuous.
Depending on the number of state parameters s and the number of control variables r, one distinguishes one-dimensional and multidimensional DP problems.
The number of steps in a problem may be finite or infinite.

Applied problems of dynamic programming

  1. The problem of planning the construction of facilities.

Dynamic programming.

When modeling network structures, in addition to problems related to the existence of flows in transport, electrical, telephone, computer and other types of networks, a whole class of problems arises that can be reduced to the shortest path problem. For example, the problem of the shortest path is solved every time by a router program when a site is found by its name on the Internet.

The shortest path problem in a directed network is a typical dynamic programming problem. Although dynamic programming, like network planning, is associated with processes developing in time (the modeling of which is discussed in more detail in the next section), we will illustrate the dynamic programming method in this section with the example of finding the shortest path.

The concept of dynamic programming is closely related to multi-step decision-making processes. A multi-step decision-making process can be defined as a process of making successive decisions aimed at achieving a given goal. Multi-step decision-making processes are constantly encountered in the most varied situations, from repairing a car in a repair shop to controlling a spacecraft.

Dynamic programming can be loosely defined as a set of mathematical procedures used in the analysis of multi-step decision-making processes. Every multi-step decision process can be seen as an outgrowth of the following problem: finding the shortest path in a directed acyclic network.

Dynamic programming can be considered as a unified theory due to a single set of ideas and techniques that are used in the mathematical analysis of various problems. These ideas and techniques are the essence of dynamic programming. Bellman was one of the first to understand the essence of the optimality principle and began to apply it to many optimization problems arising in mathematics, engineering, operations research and other areas.

Thus, the concept of dynamic programming is associated with a multi-step decision-making process aimed at a specific goal. For example, transferring a spacecraft from one orbit to another is a typical dynamic programming problem, provided that the orbit correction is performed by applying an impulse at discrete moments of time and the goal is to save fuel.

Characterizing dynamic programming as a set of mathematical procedures for the optimal control of a discrete system, the optimal control problem in general can be formulated as follows. At discrete moments of time t = 1, 2, ..., N the system is in one of a set of states characterized by the state vector x^(t). The transition between successive states is carried out by means of the control vector u^(t) according to the law

x^(t) = g^(t)(x^(t-1), u^(t)), t = 1, 2, ..., N.

The controls u^(t) are selected from the set of admissible controls and form a sequence of admissible controls u^(1), u^(2), ..., u^(N). For a given initial state x^(0), a sequence of admissible controls determines the trajectory of the system x^(0), x^(1), x^(2), ..., x^(N).

Each trajectory has its own value of the optimality criterion F (the objective function of the control), composed of the individual contributions at each stage of control:

F = Σ_t f^(t)(x^(t), u^(t)), t = 1, 2, ..., N.
The optimal control problem is to find among a set of control sequences one that achieves the minimum value of F. Such a sequence is called an optimal control sequence and determines the optimal trajectory.

Dynamic programming is based on the Bellman optimality principle, which can be formulated as follows: an optimal strategy has the property that, whatever the initial state and the decision at the initial moment, the subsequent decisions must constitute an optimal strategy with respect to the state resulting from the initial decision.

The meaning of the principle of optimality becomes clearer if we consider that for an optimal trajectory, each section between the end point and any intermediate point is also an optimal trajectory. The optimality principle, or otherwise the dynamic programming method, allows you to find the optimal multi-step strategy by solving a set of simpler one-step optimization problems.

The dynamic programming method is well illustrated by the example of finding the shortest path between the extreme nodes of an oriented network. Consider some directed network with 12 nodes, which needs to be traversed from the start node (1) to the end node (12) in four steps, moving from node to node with each step.

Fig. 6.4.1. Traversing a directed network along the shortest path.

The numbers on the arcs (i, j) are the lengths l_ij of the arcs between nodes i and j (in conventional units). The possible states of the system s_i correspond to being in the i-th node, and the control u(t) corresponds to the choice of direction at each control step. Four control steps u(1), ..., u(4) successively transfer the system from the initial state s_1 to the final state s_12 and thus form the trajectory that has to be found. The optimality criterion F in this case is the length L of the trajectory, composed of the lengths of the individual arcs:

F = L = Σ l_ij, where the sum is taken over the arcs of the trajectory.
If the search for the shortest path, i.e., the optimal trajectory, starts not from the beginning, but from the end of the network and moves in the opposite direction to the beginning, then in this case we have a “backward sweep” algorithm. In this case, when implementing the backward sweep algorithm, the movement is carried out from the final state s 12 to the initial state s 1.

At the beginning of the search for the shortest path, a transition table is compiled. The number of table rows is equal to the number of control steps, the number of columns is equal to the number of states minus one. This table will store control steps and the corresponding values ​​of the optimality criterion L t for all possible states of the system after each step.

Table 6.4.1 (each filled cell: control u(t) in the upper part / contribution L_t in the lower part)

t \ s_i |  s_1 |  s_2 |  s_3 |  s_4 |  s_5  |  s_6  |  s_7  |  s_8  |  s_9  | s_10 | s_11
   4    |      |      |      |      |       |       |       |       | 12/12 | 12/6 | 12/7
   3    |      |      |      |      | 11/11 | 10/12 | 11/10 | 10/16 |       |      |
   2    |      | 5/22 | 5/18 | 6/15 |       |       |       |       |       |      |
   1    | 3/24 |      |      |      |       |       |       |       |       |      |


The filled cells of the table are divided in half. The control u(t) is entered into the upper part of the filled cell, i.e., in this case, the number of the node to which the transition is made. The value of the contribution Lt to the total value of the optimality criterion L, which was obtained during the transition from the node corresponding to this cell to the final node, is entered into the lower part of the filled cell.

Filling the table begins with the first row, which stores information about the last step of the path. The last (here, the fourth) step of the path is uniquely determined from any penultimate state, which can be any of the three states s_9, s_10, s_11, so the optimal control at the last step is obvious. Depending on the penultimate state, the contribution to the optimality criterion is L_4(s_9) = 12, L_4(s_10) = 6, or L_4(s_11) = 7. These contributions to L are written in the lower parts of the cells in the first row of Table 6.4.1.

Before the penultimate - in this case, the third - step, the set of possible states of the system is (s 5, s 6, s 7, s 8). Let us now apply Bellman's principle to determine the trajectory at the third and fourth steps. It lies in the fact that, regardless of the first two steps of control, the trajectory segment in the last two steps is itself an optimal trajectory, i.e. gives the minimum contribution of L 3 to the optimality criterion.

If the state of the system before the penultimate step is s_5, then at the last steps the contribution to L is determined by the relation

L_3(s_5) = min{ l_5,9 + L_4(s_9), l_5,11 + L_4(s_11) }.

Since from s_5 transitions to s_9 and s_11 are possible:

g(s_5, 9) = s_9; l_5,9 = 6; L_4(s_9) = 12,

g(s_5, 11) = s_11; l_5,11 = 4; L_4(s_11) = 7,

so L_3(s_5) = min(6+12, 4+7) = 11 and u(3) = 11.

This means that if the system is in state s_5, then the optimal control is first to move to state s_11 and then to state s_12. The length of the path from s_5 to s_12 turns out to equal 11 units.

Calculating the contribution to L similarly for transitions from states s 6, s 7, s 8, we obtain the following contributions:

L_3(s_6) = min(7+12, 6+6) = 12, u(3) = 10;

L_3(s_7) = min(5+6, 3+7) = 10, u(3) = 11;

L_3(s_8) = min(10+6, 12+7) = 16, u(3) = 10.

The resulting four pairs of numbers are written in the second row of Table 6.4.1.

At the second control step, the contribution to the optimality criterion depending on the initial state is

L_2(s_2) = min{ l_2,5 + L_3(s_5), l_2,7 + L_3(s_7) } = min(11+11, 14+10) = 22, u(2) = 5;

L_2(s_3) = min{ l_3,5 + L_3(s_5), l_3,6 + L_3(s_6) } = min(7+11, 9+12) = 18, u(2) = 5;

L_2(s_4) = min{ l_4,8 + L_3(s_8), l_4,6 + L_3(s_6), l_4,7 + L_3(s_7) } = min(2+16, 3+12, 6+10) = 15, u(2) = 6.

The resulting three pairs of numbers are written in the third line of Table 6.4.1.

The initial state s_1 is uniquely determined, so in the last row of the table only one cell is filled, with the values 3 and 24, because

L_1(s_1) = min{ l_1,2 + L_2(s_2), l_1,3 + L_2(s_3), l_1,4 + L_2(s_4) } = min(5+22, 6+18, 11+15) = 24, u(1) = 3.

Now we can finally determine the sequence of optimal multi-step controls. At the first step u(1) = 3, i.e. from node 1 we go to node 3; at the second step u(2) = 5, i.e. we go to node 5; then after control u(3) = 11 we go to node 11 and, finally, to node 12. We conclude that the shortest path through the network shown in Fig. 6.4.1 passes through the sequence of states s_1 → s_3 → s_5 → s_11 → s_12, and its length is 24 conventional units.

The search for the shortest path can also be carried out from the beginning of the network, while implementing a forward sweep algorithm, which performs essentially the same addition and comparison operations, but in a slightly different sequence.

The forward and backward sweep algorithms, although essentially different, perform one addition and one comparison per arc, so both have the same performance. There is, however, an important difference: the forward sweep algorithm examines arcs outgoing from nodes whose shortest paths l_i are already known, while the backward sweep algorithm examines arcs incoming to nodes whose shortest paths l_j are still unknown. Because of the latter circumstance, preference is often given to the forward sweep algorithm, which can be applied to any structure of the set of shortest paths.

Solving a simple shortest path problem illustrates a number of the following characteristic features that are inherent in much more complex multi-step decision-making processes:

1. The original problem is embedded in a family of optimization problems; each node has its own problem.

2. The set of solutions to the optimization problems is described by a functional equation, i.e. a system of equations connecting several optimization problems. In such a system each equation corresponds to one node and usually contains operators such as min, max or minimax to the right of the equals sign, and variables such as g_i and g_j on both sides of it.

3. The solution to many optimization problems can be found using a backward sweep algorithm, which is equivalent to an ordered procedure for solving a sequence of functional equations.

Dynamic programming is well suited for solving problems associated with modeling network systems that do not have a special structure. Thus, forward and backward sweep algorithms are suitable for performing calculations in acyclic networks. The backward sweep algorithm can be generalized and used to solve problems in which there is an element of randomness. This cannot be done for the forward sweep algorithm.

The concept of "state" plays a central role in dynamic programming, and by a state the following is meant: the process moves from state to state, and a state summarizes the history of the process, described with a degree of detail that allows the evaluation (comparison) of the current alternative decisions.

For the network model, the states are the nodes, and the arcs coming out of a certain node represent the various decisions that can be made at that node (state). With this interpretation, we can say that the transition occurs from state to state, and states represent points at which decisions are made. The above statement means that the arcs leaving a node have no relation to the path by which this or that node was reached, i.e. they do not depend on the incoming arcs.

State elements are often called state variables. In dynamic programming models, states are sometimes grouped into stages, and transition is made from one stage to another. For example, in the shortest path problem there are states, but there are no stages, since it is impossible to group states into sets in such a way that there is a transition from one set to another.

Embedding the problem in a family of optimization problems is equivalent to introducing the concept of a state space, i.e. the set of states. In the functional equation the optimal return is considered as a function of the initial state, and the optimality principle establishes the relationship between the optimal returns for different initial states.

The set S of possible (or observable) states is called the state space, and an element s of S defines a specific state. With each state s a set D(s) is associated; an element d of D(s) is called a decision. The rule that assigns a feasible decision to each state is called a strategy d.

In fact, a strategy d assigns to each state s some element d(s) of the set D(s). The set of all such d forms the strategy space D. This means that the choice of a decision in one state does not constrain the choice in the other states; in essence, D is the Cartesian product of the sets D(s) over s.

One of the ideas of dynamic programming is that each strategy d has an associated profit function V_d(s): the profit that can be obtained starting from state s and using strategy d. The concept of a profit (or income) function generalizes the contribution L_t to the overall optimality criterion L considered in the shortest path problem.

The expression "using strategy d" means that in state s the decision d(s) is selected; the process is then assumed to enter some state s', in which the decision d(s') is selected, and so on. The profit function has a rather complex structure, since it depends on the sequence of states and decisions, on the rewards associated with these states and decisions, and on the way the rewards are aggregated.

A state is a description of the history of the process with a level of detail that allows the evaluation of the current alternative decisions. The main property of a state is that it is a concise record of the process history, detailed enough that the local income function can be determined. In other words, the local income function may depend only on s, d and v.

The next chapter will look in more detail at Markov chains, which are of great importance for modelling the time evolution of production and technical systems. There are also Markov decision models in which the state s is given by a pair of numbers (n, i), the decision is a function k depending on them, and the local income function is given by an expression of the form h[(n, i), k, v] = R_i^k(n) + Σ_j p_ij^k(n) · v(n + 1, j).
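To make this notation concrete, here is a minimal Java sketch (our own illustration, not part of the chapter) of evaluating one stage of such a local income function; the arrays R, p, vNext and the toy numbers in main are hypothetical.

```java
public class MarkovStage {
    // h[(n,i),k,v] = R_i^k(n) + sum_j p_ij^k(n) * v(n+1, j), for a fixed stage n:
    // R[i][k]    - immediate reward in state i under decision k
    // p[i][k][j] - transition probability from state i to state j under decision k
    // vNext[j]   - value of state j at the next stage (n + 1)
    static double localIncome(double[][] R, double[][][] p, double[] vNext,
                              int i, int k) {
        double h = R[i][k];
        for (int j = 0; j < vNext.length; j++) {
            h += p[i][k][j] * vNext[j];
        }
        return h;
    }

    public static void main(String[] args) {
        // A toy two-state, one-decision example (all numbers are made up)
        double[][] R = {{1.0}, {2.0}};
        double[][][] p = {{{0.5, 0.5}}, {{0.0, 1.0}}};
        double[] vNext = {10.0, 20.0};
        System.out.println(localIncome(R, p, vNext, 0, 0)); // 1 + 0.5*10 + 0.5*20 = 16.0
        System.out.println(localIncome(R, p, vNext, 1, 0)); // 2 + 1.0*20 = 22.0
    }
}
```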

Markov decision models are generalized in different directions, in particular to the case of Markov renewal problems. The most useful generalization is obtained when unequal or variable transition times are considered. In simple models it is assumed that transitions from state to state and observations of the state are instantaneous, although the length of time between transitions may be of variable or random length.

Whenever a state is observed, a decision is chosen and cannot be changed until the process moves to a new state, where a new decision is chosen, and so on. This model is a combination of the theory of Markov chains and renewal theory; it is usually called a Markov renewal problem.

Test questions for Chapter 6.

1. What components does a directed network consist of?

2. How is the network capacity matrix constructed?

3. How is the network flow matrix formed?

4. Why are the capacity and flow matrices subtracted?

5. What is a network diagram and what is it used for?

6. How are the early start and early finish times of work determined?

7. What is the total float for some event on the network diagram?

8. How is the critical path determined?

9. What is called the state vector of a certain system?

10. What is the trajectory of a system in state space?

11. What is the problem of optimal control?

12. How is the optimality criterion formulated?

13. What is dynamic programming?

14. Formulate the Bellman optimality principle.

15. What is the essence of the forward and backward sweep algorithms when searching for the shortest path?

Options for assignments for Chapter 6.

For the networks in each of the variants:

1) Find the maximum flow from the source (1) to the final node of the network - the sink - assuming that one of the numbers in brackets for each arc (i, j) specifies the capacity of the arc;

2) Assuming that the arcs (1)→(2), (1)→(3), etc. define certain jobs, whose minimum and maximum durations are given by the numbers indicated under the corresponding arcs, find the critical path from the initial event (1) to the final one;

3) Find the shortest path from the start node to the end node of the network, considering the distances between nodes i, j to be given by one of the numbers in brackets.






Dynamic programming is a topic to which rather few articles in RuNet are devoted, which is why we decided to tackle it. This article analyzes classical problems on sequences and on one-dimensional and two-dimensional dynamics, gives the reasoning behind the solutions, and describes different approaches to implementing them. All code in the article is written in Java.

What are we even talking about? What is dynamic programming?

It is a method of solving a problem by dividing it into several identical subproblems linked by a recurrence. The simplest example is the Fibonacci numbers: to calculate a certain number in this sequence, we first calculate the third number by adding the first two, then the fourth in the same way from the second and third, and so on (yes, we have heard of the closed-form formula).
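The bottom-up idea can be sketched in Java as follows (a minimal illustration of ours; the method name fib is arbitrary):

```java
public class Fib {
    // Computes F(n) with F(1) = F(2) = 1 by filling values from the bottom up:
    // each number is obtained from the two already-computed previous ones.
    static long fib(int n) {
        if (n <= 2) return 1;
        long prev = 1, cur = 1; // F(1) and F(2)
        for (int i = 3; i <= n; i++) {
            long next = prev + cur; // F(i) = F(i-2) + F(i-1)
            prev = cur;
            cur = next;
        }
        return cur;
    }

    public static void main(String[] args) {
        System.out.println(fib(10)); // 55
        System.out.println(fib(40)); // 102334155 - instant, unlike naive recursion
    }
}
```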

Okay, how to use this?

The solution of a problem by dynamic programming should contain the following:

  • the dependence of the elements of the dynamics on one another (a recurrence relation);
  • the values of the initial states.

So, do I need to write a recursive method to solve this problem? I heard they are slow.

Of course not - there are other approaches to implementing dynamics. Let's look at them using the following problem as an example:

Calculate the nth term of the sequence given by the formulas:
a(2n) = a(n) + a(n-1),
a(2n+1) = a(n) - a(n-1),
a(0) = a(1) = 1.

Solution idea

Here we are given both the initial states (a(0) = a(1) = 1) and the dependencies. The only difficulty that may arise is understanding that 2n means an even index and 2n+1 an odd one. In other words, we need to check whether the index is even and compute the term by the corresponding formula.

Recursive solution

The obvious implementation is to write the following method:

private static int f(int n) {
    if (n == 0 || n == 1) return 1; // Check for an initial value
    if (n % 2 == 0) { // Check for parity
        // Calculate using the formula for even indices,
        // referring to previous values
        return f(n / 2) + f(n / 2 - 1);
    } else {
        // Calculate using the formula for odd indices,
        // referring to previous values
        return f((n - 1) / 2) - f((n - 1) / 2 - 1);
    }
}

And it works great, but there are some nuances. If we want to calculate f(12) , then the method will calculate the sum f(6)+f(5) . At the same time, f(6)=f(3)+f(2) and f(5)=f(2)-f(1), i.e. we will calculate the value of f(2) twice. There is a solution to this - memoization (caching values).

Recursive solution with value caching

The idea of memoization is very simple: once we have computed a value, we store it in some data structure. Before each computation we check whether the value is already in this structure, and if so, we use it. An array filled with a flag value can be used as the data structure: if the element at index n equals the flag, we have not computed it yet. This creates certain difficulties, because the flag value must not belong to the set of values of the function, which is not always easy to guarantee. Personally, I prefer to use a hash table - all its operations are performed in O(1) on average, which is very convenient. However, with a large number of values two keys may get the same hash, which naturally causes problems; in that case one should use, for example, a red-black tree (in Java, a TreeMap).
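For completeness, here is a minimal sketch of the array-with-flag variant just described, applied to the same sequence; using Long.MIN_VALUE as the flag is our assumption that this value never occurs among the values of the function:

```java
public class MemoArray {
    // cache[i] == FLAG means a(i) has not been computed yet.
    // The flag must not be a possible value of the function - here we
    // assume Long.MIN_VALUE never occurs among the a(i).
    static final long FLAG = Long.MIN_VALUE;
    static long[] cache;

    static long f(int n) {
        if (n == 0 || n == 1) return 1;        // initial states
        if (cache[n] != FLAG) return cache[n]; // already computed
        long result;
        if (n % 2 == 0) result = f(n / 2) + f(n / 2 - 1);             // even index
        else            result = f((n - 1) / 2) - f((n - 1) / 2 - 1); // odd index
        cache[n] = result;
        return result;
    }

    public static void main(String[] args) {
        int n = 100;
        cache = new long[n + 1];
        java.util.Arrays.fill(cache, FLAG);
        System.out.println(f(n));
    }
}
```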

For the already written function f(int), caching of values will look like this:

private static HashMap<Integer, Integer> cache = new HashMap<Integer, Integer>();

private static int fcache(int n) {
    if (!cache.containsKey(n)) { // Check whether we have already found this value
        cache.put(n, f(n));      // If not, find it and write it to the table
    }
    // (for the cache to take full effect, the recursive calls inside f
    //  should themselves go through fcache)
    return cache.get(n);
}

Not too difficult, would you agree? But this eliminates a huge number of operations. You pay for this with extra memory consumption.

Sequential calculation

Now let's go back to where we started - recursion is slow. Not so slow as to cause real trouble in everyday life, but in a competitive programming contest every millisecond counts.

The method of sequential calculation is suitable only if the function refers exclusively to elements before it - this is its main, though not its only, drawback. Our problem satisfies this condition.

The essence of the method is as follows: we create an array of N elements and fill it with values sequentially. You have probably already guessed that this way we also compute values that are not needed for the answer. In a significant share of dynamics problems this fact can be ignored, since all the values are often needed for the answer anyway. For example, when searching for the shortest path we cannot avoid computing the path to some intermediate point; we need to consider all the options. But in our problem only about log2(N) values are actually required (in practice somewhat more): for the 922337203685477580th element (Long.MAX_VALUE / 10) only 172 computations are needed.

private static int f(int n) {
    if (n < 2) return 1;       // Maybe there is nothing to compute at all?
    int[] fs = new int[n + 1]; // Create an array for the values
    fs[0] = fs[1] = 1;         // Set the initial states
    for (int i = 2; i <= n; i++) {
        if (i % 2 == 0) fs[i] = fs[i / 2] + fs[i / 2 - 1];
        else            fs[i] = fs[(i - 1) / 2] - fs[(i - 1) / 2 - 1];
    }
    return fs[n];
}

Another disadvantage of this approach is the relatively large memory consumption.

Creating an Index Stack

Now we essentially have to write our own recursion. The idea is as follows: first we go "down" from N to the initial states, remembering which arguments we will need to compute the function for. Then we go back "up", sequentially computing the values for these arguments in the order we wrote down.

Dependencies are calculated as follows:

LinkedList<Long> stack = new LinkedList<Long>(); // Indices in the order they must be computed
stack.add(n);
LinkedList<Long> queue = new LinkedList<Long>(); // Stores indices whose dependencies
queue.add(n);                                    // have not yet been calculated
long dum;
while (queue.size() > 0) { // As long as there is something to calculate
    dum = queue.removeFirst();
    if (dum % 2 == 0) { // Check parity
        if (dum / 2 > 1) { // If the dependency does not belong to the initial states
            stack.addLast(dum / 2); // Add it to the stack
            queue.add(dum / 2);     // Save it to calculate its own dependencies later
        }
        if (dum / 2 - 1 > 1) { // Check for belonging to the initial states
            stack.addLast(dum / 2 - 1);
            queue.add(dum / 2 - 1);
        }
    } else {
        if ((dum - 1) / 2 > 1) { // Check for belonging to the initial states
            stack.addLast((dum - 1) / 2);
            queue.add((dum - 1) / 2);
        }
        if ((dum - 1) / 2 - 1 > 1) { // Check for belonging to the initial states
            stack.addLast((dum - 1) / 2 - 1);
            queue.add((dum - 1) / 2 - 1);
        }
    }
    /* For this particular task there is a more elegant way to find all
       dependencies, but a fairly universal one is shown here. */
}

The size of the resulting stack is how many computations we need to do. This is how I got the number 172 mentioned above.

Now we extract the indices one by one and compute the values for them using the formulas - it is guaranteed that all the necessary values will already have been computed. We will store them as before - in a hash table.

HashMap<Long, Integer> values = new HashMap<Long, Integer>();
values.put(0L, 1); // It is important to add the initial states
values.put(1L, 1); // to the value table
while (stack.size() > 0) {
    long num = stack.removeLast();
    if (!values.containsKey(num)) { // You should remember this construction
                                    // from the paragraph about caching
        if (num % 2 == 0) { // Check parity
            int value = values.get(num / 2) + values.get(num / 2 - 1);
            values.put(num, value); // Place the value into the table
        } else {
            int value = values.get((num - 1) / 2) - values.get((num - 1) / 2 - 1);
            values.put(num, value); // Place the value into the table
        }
    }
}

All the necessary values have been computed; all that remains is to write

return values.get(n);

Of course, this solution is much more laborious to write, but it is worth it.

Okay, math is beautiful. What about tasks in which not everything is given?

For greater clarity, let us consider the following problem for one-dimensional dynamics:

At the top of a ladder of N steps there is a ball, which starts jumping down to the ground. The ball can jump to the next step, or skip one step, or skip two. (That is, if the ball is on the 8th step, it can move to the 5th, 6th or 7th.) Determine the number of possible "routes" of the ball from the top to the ground.

Solution idea

You can get to the first step in only one way - by making a jump of length one. You can get to the second step with a jump of length two, or from the first step - only 2 options. You can get to the third step with a jump of length three, from the first step or from the second, i.e. 4 options in total (0->3; 0->1->3; 0->2->3; 0->1->2->3). Now consider the fourth step. You can get to it from the first, from the second or from the third step - one route for each route leading to those steps. In other words, the number of routes to the 4th step is the sum of the routes to the 1st, 2nd and 3rd steps. Mathematically, F(N) = F(N-1) + F(N-2) + F(N-3). The first three steps will be considered the initial states.

Implementation via recursion

private static int f(int n) {
    if (n == 1) return 1;
    if (n == 2) return 2;
    if (n == 3) return 4;
    return f(n - 1) + f(n - 2) + f(n - 3);
}

There is nothing tricky here.

Since a straightforward solution on an array of N elements is obvious enough, I will demonstrate here a solution on an array of only three elements.

int[] vars = new int[3];
vars[0] = 1; vars[1] = 2; vars[2] = 4; // Initial states: routes to steps 1, 2 and 3
for (int i = 4; i <= n; i++)
    // vars[(i - 1) % 3] held the value for step i-3, which is no longer needed
    vars[(i - 1) % 3] += vars[(i - 2) % 3] + vars[(i - 3) % 3];
System.out.println(vars[(n - 1) % 3]);

Since each subsequent value depends only on the three previous ones, no value at an index less than i-3 is of any use to us. In the code above we write each new value in place of the oldest one, which is no longer needed. Cycling through the remainders of division by 3 saves us from a bunch of conditional statements. Simple, compact, elegant.

Wasn't something written at the top about some kind of two-dimensional dynamics?..

Two-dimensional dynamics has no special features compared to one-dimensional, but, just in case, I will consider one problem on it here.

In a rectangular N×M table, the player starts in the upper left cell. In one move he may step to an adjacent cell either to the right or down (moving left or up is forbidden). Count how many ways the player has to reach the bottom right cell.

Solution idea

The logic of the solution is completely identical to that of the ball-and-ladder problem - only now you can get to cell (x,y) from cells (x-1,y) or (x,y-1). In total, F(x,y) = F(x-1,y) + F(x,y-1). Additionally, one can see that all cells of the form (1,y) and (x,1) have only one route - straight to the right or straight down, respectively.

Implementation via recursion

For heaven's sake, don't do two-dimensional dynamics through recursion. It has already been mentioned that recursion is less efficient than a loop in terms of performance, and two-dimensional recursion is also terrible to read. Only in such a simple example does it look easy and harmless.

private static int f(int i, int j) {
    if (i == 1 || j == 1) return 1;
    return f(i - 1, j) + f(i, j - 1);
}

Implementation via an array of values

int[][] dp = new int[n][m];
for (int i = 0; i < n; i++)
    for (int j = 0; j < m; j++)
        dp[i][j] = (i == 0 || j == 0) ? 1 : dp[i - 1][j] + dp[i][j - 1];

A classic solution using dynamics, nothing unusual - we check whether a cell is on the edge and set its value based on the neighbouring cells.

Great, I understand everything. How should I test my skills?

In conclusion, here are a number of typical problems on one-dimensional and two-dimensional dynamics, with solutions attached.

Explosion hazard

When processing radioactive materials, two types of waste are generated - highly hazardous (type A) and non-hazardous (type B). The same containers are used to store them. After placing the waste in containers, the latter are stacked vertically. A stack is considered explosive if it contains more than one Type A container in a row. A stack is considered safe if it is not explosive. For a given number of containers N, determine the number of possible types of safe stacks.

Solution

The answer is the (N+1)-th Fibonacci number. One could guess this by simply calculating the first 2-3 values. One could prove it rigorously by constructing a tree of possible configurations.


Each main element splits into two - a main one (ending in B) and a side one (ending in A). Side elements turn into main ones in one iteration (only B can be appended to a sequence ending in A). This is characteristic of the Fibonacci numbers.

Implementation

For example, like this:

// The number N is entered from the keyboard; the answer is the (N+1)-th Fibonacci number
BigInteger[] fib = new BigInteger[N + 2];
fib[1] = fib[2] = BigInteger.ONE;
for (int i = 3; i <= N + 1; i++)
    fib[i] = fib[i - 1].add(fib[i - 2]);
System.out.println(fib[N + 1]);

Climbing stairs

The boy approached the toll stairs. To step on any step, you need to pay the amount indicated on it. The boy knows how to step to the next step, or jump over a step. You need to find out what is the smallest amount the boy will need to get to the top step.

Solution

Obviously, the amount the boy will have paid by the Nth step is the amount he paid before plus the cost of the step itself. "The amount he paid before" depends on which step he steps onto the Nth from - the (N-1)th or the (N-2)th. We need to choose the smaller one.

Implementation

For example, like this:

// The number of steps N and the costs cost[1..N] are entered from the keyboard
int[] dp = new int[N + 1];
dp[0] = 0;         // The boy starts on the ground, paying nothing
dp[1] = cost[1];
for (int i = 2; i <= N; i++)
    dp[i] = Math.min(dp[i - 1], dp[i - 2]) + cost[i];
System.out.println(dp[N]);

Calculator

There is a calculator that performs three operations:

  • Add one to the number X;
  • Multiply the number X by 2;
  • Multiply the number X by 3.

Determine the minimum number of operations required to obtain the given number N from number 1. Print this number and, on the next line, a set of executed operations of the form “111231”.

Solution

The naive solution is to divide the number by 3 while possible, otherwise by 2 if possible, otherwise subtract one, and so on until it becomes one. This solution is wrong, because it rules out, for example, the possibility of first subtracting one from a number and then dividing it by three, which causes errors on large numbers (such as 32718).

The correct solution is to find, for each number from 2 to N, the minimum number of actions based on the previous elements; in other words, F(N) = min(F(N-1), F(N/2), F(N/3)) + 1. It should be remembered that all indices must be integers, so F(N/2) and F(N/3) are only considered when N is divisible by 2 or 3, respectively.

To reconstruct the list of actions, we go in the opposite direction and look for an index i such that F(i) = F(N) - 1, where N is the number of the current element. If i = N-1, we write a one at the beginning of the line; if i = N/2, a two; otherwise, a three.

Implementation
int N; // Entered from the keyboard
int[] a = new int[N + 1];
a[1] = 0;
for (int i = 2; i <= N; i++) {
    int min = a[i - 1];                            // Operation 1: add one
    if (i % 2 == 0) min = Math.min(min, a[i / 2]); // Operation 2: multiply by two
    if (i % 3 == 0) min = Math.min(min, a[i / 3]); // Operation 3: multiply by three
    a[i] = min + 1;
}
StringBuilder ret = new StringBuilder();
int i = N;
while (i > 1) { // Reconstruct the operations, going backwards from N
    if (a[i] == a[i - 1] + 1) { ret.insert(0, 1); i--; continue; }
    if (i % 2 == 0 && a[i] == a[i / 2] + 1) { ret.insert(0, 2); i /= 2; continue; }
    ret.insert(0, 3); i /= 3;
}
System.out.println(a[N]);
System.out.println(ret);

The cheapest way

Each cell of a rectangular N×M table contains a number. Initially the player is in the upper left cell. In one move he may step to an adjacent cell either to the right or down (moving left or up is forbidden). When passing through a cell, the player pays as many kilograms of food as the number written in that cell (food is charged for the first and last cells of the path as well).

You need to find the minimum weight of food, in kilograms, with which the player can reach the lower right corner.
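Following the same pattern as the route-counting problem, the recurrence here is F(x,y) = min(F(x-1,y), F(x,y-1)) + cost(x,y); one possible minimal sketch (the sample table in main is made up):

```java
public class CheapestWay {
    // dp[i][j] - minimum food paid to reach cell (i, j);
    // a cell can only be entered from the left or from above, so
    // dp[i][j] = min(dp[i-1][j], dp[i][j-1]) + cost[i][j].
    static int minFood(int[][] cost) {
        int n = cost.length, m = cost[0].length;
        int[][] dp = new int[n][m];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++) {
                if (i == 0 && j == 0) dp[i][j] = cost[i][j]; // starting cell
                else if (i == 0)      dp[i][j] = dp[i][j - 1] + cost[i][j];
                else if (j == 0)      dp[i][j] = dp[i - 1][j] + cost[i][j];
                else dp[i][j] = Math.min(dp[i - 1][j], dp[i][j - 1]) + cost[i][j];
            }
        return dp[n - 1][m - 1];
    }

    public static void main(String[] args) {
        int[][] cost = {
            {1, 3, 1},
            {1, 5, 1},
            {4, 2, 1}
        };
        System.out.println(minFood(cost)); // 7 (path 1 -> 3 -> 1 -> 1 -> 1)
    }
}
```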