Dynamic programming: the essence of the method, applied problems, and basic concepts

Dynamic programming.

When modeling network structures, in addition to problems about the existence of flows in transport, electrical, telephone, computer, and other networks, a whole class of problems arises that reduce to the shortest path problem. For example, a shortest path problem is solved by a router program every time a site is located by its name on the Internet.

The shortest path problem in a directed network is a typical dynamic programming problem. Although dynamic programming, like network planning, is associated with the development of processes over time (the modeling of which is discussed in more detail in the next section), in this section we consider the dynamic programming method using the example of finding the shortest path.

The concept of dynamic programming is closely related to multi-step decision-making processes. A multi-step decision-making process can be defined as a process of making sequential decisions aimed at achieving a given goal. Multi-step decision-making processes are constantly encountered in the most varied situations, from car repair in a service center to the control of a spacecraft.

Dynamic programming can be loosely defined as a set of mathematical procedures used in the analysis of multi-step decision-making processes. Each multi-step decision process can be viewed as a development of the following problem: finding the shortest path in a directed acyclic network.

Dynamic programming can be considered a unified theory thanks to a single set of ideas and techniques that are used in the mathematical analysis of various problems. These ideas and techniques constitute the essence of dynamic programming. Bellman was one of the first to understand the essence of the optimality principle and to apply it to many optimization problems arising in mathematics, engineering, operations research, and other areas.

Thus, the concept of dynamic programming is associated with a multi-step decision-making process aimed at a specific goal. For example, transferring a spacecraft from one orbit to another is a typical dynamic programming problem, provided that the orbit correction is carried out by applying an impulse at discrete moments of time and the goal is to economize fuel.

Characterizing dynamic programming as a set of mathematical procedures for the optimal control of a discrete system, the optimal control problem can be formulated in general form as follows. At discrete moments of time t = 1, 2, ..., N the system is in one of the states s_i, characterized by the state vector x(t). The transition between successive states is carried out using the control vector u(t) according to the law:

x(t) = g(t)(x(t - 1), u(t)); t = 1, 2, ..., N

The controls u(t) are selected from the set of admissible controls and form a sequence of admissible controls u(1), u(2), ..., u(N). A sequence of admissible controls, for a given initial state x(0), determines the trajectory of the system x(0), x(1), x(2), ..., x(N).

Each trajectory has its own value of the optimality criterion F (the objective function of the control), composed of the individual contributions at each control stage:

F = f1(x(1), u(1)) + f2(x(2), u(2)) + ... + fN(x(N), u(N)).

The optimal control problem is to find, among the set of admissible control sequences, the one that achieves the minimum value of F. This sequence is called the optimal control sequence and determines the optimal trajectory.

Dynamic programming is based on the Bellman optimality principle, which can be formulated as follows: an optimal strategy has the property that, whatever the initial state and the decision at the initial moment, the subsequent decisions must constitute an optimal strategy with respect to the state arising after the initial decision.

The meaning of the optimality principle becomes clearer if we note that, for an optimal trajectory, each section between any intermediate point and the end point is itself an optimal trajectory. The optimality principle, or in other words the dynamic programming method, allows one to find the optimal multi-step strategy by solving a set of simpler one-step optimization problems.

The dynamic programming method is well illustrated by the example of finding the shortest path between the end nodes of a directed network. Consider a directed network with 12 nodes, which must be traversed from the start node (1) to the end node (12) in four steps, moving from node to node at each step.

Fig. 6.4.1. Traversing a directed network along the shortest path.

The numbers at the arcs (i, j) are the lengths l(i, j) of the arcs between nodes i and j (in conventional units). The possible states of the system s_i in this case correspond to being at node i, and the control u(t) corresponds to the choice of path direction at each control step. Four control steps u(1), ..., u(4) sequentially transfer the system from the initial state s1 to the final state s12 and thus form the trajectory that has to be found. The optimality criterion F in this case is the length L of the trajectory, composed of the lengths of the individual arcs:

L = l(1, i1) + l(i1, i2) + l(i2, i3) + l(i3, 12),

where i1, i2, i3 are the intermediate nodes of the trajectory.

If the search for the shortest path, i.e., the optimal trajectory, starts not from the beginning but from the end of the network and moves in the reverse direction toward the beginning, we obtain a "backward sweep" algorithm: the movement is carried out from the final state s12 to the initial state s1.

At the start of the search for the shortest path, a transition table is compiled. The number of rows of the table equals the number of control steps, and the number of columns equals the number of states minus one. This table stores the control steps and the corresponding values of the optimality criterion L_t for all possible states of the system after each step.

Table 6.4.1 (each filled cell shows u(t) / L_t: the control above, the contribution below in the original layout)

t | s1   | s2   | s3   | s4   | s5    | s6    | s7    | s8    | s9    | s10  | s11
4 |      |      |      |      |       |       |       |       | 12/12 | 12/6 | 12/7
3 |      |      |      |      | 11/11 | 10/12 | 11/10 | 10/16 |       |      |
2 |      | 5/22 | 5/18 | 6/15 |       |       |       |       |       |      |
1 | 3/24 |      |      |      |       |       |       |       |       |      |


Each filled cell of the table is divided in half: the upper part holds the control u(t), i.e., in this case the number of the node to which the transition is made; the lower part holds the contribution L_t to the total value of the optimality criterion L obtained on the path from the node of this cell to the final node.

Filling the table begins with the first row, which stores information about the last step of the path. The last (in this case, the fourth) step of the path is uniquely determined from any penultimate state, which can be any of the three possible states s9, s10, s11; therefore, the optimal control at the last step is obvious. Depending on the penultimate state, the contribution to the optimality criterion is L4(9) = 12, L4(10) = 6, or L4(11) = 7. These contributions to L are written in the lower parts of the cells of the first row of Table 6.4.1.

Before the penultimate (in this case, the third) step, the set of possible states of the system is {s5, s6, s7, s8}. Let us apply Bellman's principle to determine the trajectory at the third and fourth steps. It states that, regardless of the first two control steps, the trajectory segment over the last two steps must itself be an optimal trajectory, i.e., must give the minimum contribution L3 to the optimality criterion.

If the state of the system before the penultimate step is s5, then on the last steps the contribution to L is determined by the relation

L3(s5) = min{ l(5, 9) + L4(s9), l(5, 11) + L4(s11) }.

From s5, transitions to s9 and s11 are possible, with

g(s5, 9) = s9; l(5, 9) = 6; L4(s9) = 12,

g(s5, 11) = s11; l(5, 11) = 4; L4(s11) = 7,

so L3(s5) = min(6 + 12, 4 + 7) = 11 and u(3) = 11.

This means that if the system is in state s5, the optimal control consists of first moving to state s11 and then to state s12. The length of this path from s5 to s12 equals 11 units.

Calculating the contributions to L similarly for transitions from states s6, s7, s8, we obtain:

L3(s6) = min(7 + 12, 6 + 6) = 12, u(3) = 10;

L3(s7) = min(5 + 6, 3 + 7) = 10, u(3) = 11;

L3(s8) = min(10 + 6, 12 + 7) = 16, u(3) = 10.

The resulting four pairs of numbers are written in the second row of Table 6.4.1.

At the second control step, the contribution to the optimality criterion, depending on the initial state, is

L2(s2) = min(11 + 11, 14 + 10) = 22, u(2) = 5;

L2(s3) = min(7 + 11, 9 + 12) = 18, u(2) = 5;

L2(s4) = min(2 + 16, 3 + 12, 6 + 10) = 15, u(2) = 6.

The resulting three pairs of numbers are written in the third row of Table 6.4.1.

The initial state s1 is uniquely determined, so in the last row of the table only one cell is filled, with the values 3 and 24, because

L1(s1) = min(5 + 22, 6 + 18, 11 + 15) = 24, u(1) = 3.

Now we can finally determine the optimal multi-step control sequence. At the first step u(1) = 3, i.e., from node 1 we go to node 3; at the second step u(2) = 5, i.e., we go to node 5; then, after control u(3) = 11, to node 11 and, finally, to node 12. We conclude that the shortest path through the network shown in Fig. 6.4.1 passes through the sequence of states s1 → s3 → s5 → s11 → s12, and its length is 24 conventional units.
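For the reader who wants to experiment, here is a minimal Python sketch of the backward sweep just described. The network of Fig. 6.4.1 is not reproduced in the text, so the arc list below is an assumption reconstructed from the calculations above (every length appears in one of the min(...) expressions).

    # Backward sweep on the network of Fig. 6.4.1.
    # Arc lengths are read off from the calculations in the text.
    arcs = {
        (1, 2): 5,  (1, 3): 6,  (1, 4): 11,
        (2, 5): 11, (2, 7): 14,
        (3, 5): 7,  (3, 6): 9,
        (4, 6): 3,  (4, 7): 6,  (4, 8): 2,
        (5, 9): 6,  (5, 11): 4,
        (6, 9): 7,  (6, 10): 6,
        (7, 10): 5, (7, 11): 3,
        (8, 10): 10, (8, 11): 12,
        (9, 12): 12, (10, 12): 6, (11, 12): 7,
    }

    L = {12: 0}   # L[i]: length of the shortest path from node i to node 12
    u = {}        # u[i]: optimal control, i.e. the next node to move to from i

    for i in range(11, 0, -1):   # sweep from the end of the network to node 1
        candidates = [(w + L[j], j) for (a, j), w in arcs.items() if a == i]
        L[i], u[i] = min(candidates)

    path = [1]
    while path[-1] != 12:
        path.append(u[path[-1]])
    print(L[1], path)   # prints: 24 [1, 3, 5, 11, 12]

The run reproduces Table 6.4.1: L[5] = 11, L[6] = 12, L[7] = 10, L[8] = 16, and so on.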

The search for the shortest path can also be carried out from the beginning of the network; this gives the forward sweep algorithm, which performs essentially the same addition and comparison operations, but in a slightly different order.

The forward and backward sweep algorithms, although essentially different, require one addition and one comparison per arc, so both algorithms have the same performance. However, there is an important difference: the forward sweep algorithm considers arcs outgoing from those nodes whose shortest paths l_i are already known.

The backward sweep algorithm considers arcs incoming to those nodes the shortest paths l_j to which are still unknown. For this reason, preference is often given to the forward sweep algorithm, which can be applied to any structure of the shortest path set.

The solution of this simple shortest path problem illustrates the following characteristic features inherent in much more complex multi-step decision-making processes:

1. The original problem is embedded in a family of optimization problems; in this case, a separate problem is solved for each node.

2. The set of solutions of the optimization problems is described by a functional equation, i.e., a system of equations connecting several optimization problems. In such a system, each equation corresponds to one node and usually contains operators like min, max, or minimax to the right of the equality sign, and variables such as l_i and l_j on both sides of it.

3. The solution of the family of optimization problems can be found with the backward sweep algorithm, which is equivalent to an ordered procedure for solving the sequence of functional equations.

Dynamic programming is well suited to modeling network systems that do not have a special structure. Thus, the forward and backward sweep algorithms are suitable for performing calculations in acyclic networks. The backward sweep algorithm can be generalized and used to solve problems containing an element of randomness; this cannot be done for the forward sweep algorithm.

The concept of "state" plays a central role in dynamic programming. A transition is carried out from state to state, and the state summarizes the entire prehistory of the process: it is described with a degree of detail that allows the evaluation of the current alternative decisions.

For a network model, the states are the nodes, and the arcs emanating from a node represent the various decisions that can be made at that node (state). Under this interpretation, the transition occurs from state to state, and the states are the points at which decisions are made. The statement above means that the arcs leaving a node bear no relation to the path by which the node was reached, i.e., they do not depend on the incoming arcs.

The elements of a state are often called state variables. In dynamic programming models, states are sometimes grouped into stages, and the transition is made from one stage to another. For example, in the shortest path problem there are states but no stages, since the states cannot be grouped into sets in such a way that the transition always goes from one set to the next.

Embedding the problem into a family of optimization problems amounts to introducing the concept of a state space, i.e., a set of states. In the functional equation, the optimal response is considered as a function of the starting state, and the optimality principle establishes the relationship between the optimal responses for different starting states.

The set S of possible (or observable) states is called the state space, and an element s of S defines a specific state. With each state s a set D(s) is associated. An element d of the set D(s) is called a decision. The rule by which a feasible decision is chosen for each state is called a strategy d.

In fact, a strategy d assigns to each state s some element d(s) of the set D(s). The set of all such d forms the strategy space D. This means that the choice of a decision in one state does not constrain the choice in the other states. In essence, D is the Cartesian product of the sets D(s) over s.

One of the ideas of dynamic programming is that with each strategy d one can associate a so-called profit function V_d(s): the profit obtained starting from state s and using strategy d. The concept of a profit (or income) function generalizes the concept of the contribution L_t to the overall value of the optimality criterion L considered in the shortest path problem.

The phrase "using strategy d" means that in state s the decision d(s) is chosen; the process then enters some state s', in which the decision d(s') is chosen, and so on. The profit function has a rather complex structure, since it depends on the sequence of states and decisions, on the rewards associated with these states and decisions, and on the way the rewards are aggregated.

A state is a description of the prehistory of the process with a degree of detail that permits the evaluation of current alternative decisions. The main property of a state is that it is a short record of the process history, detailed just enough to determine the local income function. In other words, the local income function may depend only on s, d, and v.

The next chapter looks in more detail at Markov chains, which are of great importance for modeling the time evolution of production and technical systems. There are also Markov decision-making models in which the state s is given by a pair of numbers (n, i), the decision is a function k of them, and the local income function is given by an expression of the form

h[(n, i), k, v] = R_i^k(n) + Σ_j P_ij^k(n) · v(n + 1, j).

Markov decision-making models are generalized in various directions, in particular to the case of Markov renewal problems. The most useful generalization is obtained when unequal or variable transition times are considered. In simple models, the transition from state to state and the observation of the state are assumed instantaneous, while the length of time between transitions may be variable or random.

Whenever a state is observed, a decision is selected and cannot be changed until the process moves to a new state, where a new decision is selected, and so on. This model, a combination of Markov chain theory and renewal theory, is usually called the Markov renewal problem.

Test questions for Chapter 6.

1. What components does a directed network consist of?

2. How is the network capacity matrix constructed?

3. How is the network flow matrix formed?

4. Why are the capacity and flow matrices subtracted?

5. What is a network diagram and what is it used for?

6. How are the early start and early finish times of jobs determined?

7. What is the total float for an event on the network diagram?

8. How is the critical path determined?

9. What is called the state vector of a system?

10. What is the trajectory of a system in state space?

11. What is the optimal control problem?

12. How is the optimality criterion formulated?

13. What is dynamic programming?

14. Formulate the Bellman optimality principle.

15. What is the essence of the forward and backward sweep algorithms in searching for the shortest path?

Assignment options for Chapter 6.

For networks in each of the options:

1) Find the maximum flow from the source (1) to the final node of the network (the sink), assuming that one of the numbers in brackets at each arc (i, j) determines the capacity of the arc;

2) Assuming that the arcs (1)→(2), (1)→(3), etc. define jobs whose minimum and maximum durations are given by the numbers under the corresponding arcs, find the critical path from the initial event (1) to the final one;

3) Find the shortest path from the start node to the end node of the network, considering the distances between nodes i, j to be given by one of the numbers in brackets.






Let's take a problem we have already solved by dynamic programming, for example, the eternal Fibonacci numbers.

Let us reformulate it a little. Suppose we have a vector (F(n - 1), F(n)) from which we want to obtain the vector (F(n), F(n + 1)). Expanding the formulas a little: F(n) = F(n), F(n + 1) = F(n - 1) + F(n). You can see that the second vector is obtained from the first by multiplying by some matrix, because the final vector contains only sums of the variables of the first one. This matrix is easy to derive:

    M = | 0 1 |
        | 1 1 |

Let's call it the transition matrix.

This means that if we take the vector (F(1), F(2)) and multiply it by the transition matrix n - 1 times, we get a vector containing F(n) — the answer to the problem.

Now, why is all this necessary? Matrix multiplication is associative, that is, (A·B)·C = A·(B·C) (but at the same time it is not commutative, which in my opinion is surprising). This property gives us the right to compute the matrix power M^(n - 1) first, and multiply the vector by it once.

This is good because now we can use fast exponentiation, which works in O(log n) multiplications. In total, we are able to compute the N-th Fibonacci number in a logarithmic number of arithmetic operations.
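As a sketch (assuming the transition matrix M = [[0, 1], [1, 1]] derived above and the row-vector convention), fast matrix exponentiation looks like this:

    def mat_mul(A, B):
        # multiply two 2x2 matrices
        return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
                for i in range(2)]

    def mat_pow(M, p):
        # binary (fast) exponentiation: O(log p) multiplications
        R = [[1, 0], [0, 1]]        # identity matrix
        while p:
            if p & 1:
                R = mat_mul(R, M)
            M = mat_mul(M, M)
            p >>= 1
        return R

    def fib(n):
        # (F(1), F(2)) * M^(n-1) = (F(n), F(n+1)), so F(n) is the dot
        # product of (1, 1) with the first column of M^(n-1)
        R = mat_pow([[0, 1], [1, 1]], n - 1)
        return R[0][0] + R[1][0]

    print([fib(n) for n in range(1, 10)])   # [1, 1, 2, 3, 5, 8, 13, 21, 34]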

And now a more serious example:

Example #3: Sawtooth sequence
Let us call a sequence of length N sawtooth if every non-extreme element of it is either less than both of its neighbors or greater than both of them. You need to count the number of sawtooth sequences of digits of length N. For example, 1 3 2 5 4 is sawtooth, while 1 2 3 is not.

Solution

First, a solution without a transition matrix:

1) Dynamics state: dp[n][last][less] — the number of sawtooth sequences of length n ending with the digit last; if less == 0, the last digit is less than the penultimate one, and if less == 1, it is greater.

2) Initial values (for length 2):

    for last in range(10):
        dp[2][last][0] = 9 - last
        dp[2][last][1] = last

3) Recalculation of the dynamics:

    for prev in range(10):
        if prev > last:
            dp[n][last][0] += dp[n - 1][prev][1]
        if prev < last:
            dp[n][last][1] += dp[n - 1][prev][0]

4) Recalculation order: we always refer to the previous length, so a couple of nested for loops suffice.

5) The answer is the sum of dp[N][last][less] over all last and less.
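Assembled into a runnable sketch (rolling the table along n so only the previous length is stored; treating N = 1 as 10 single-digit sequences is a boundary convention assumed here):

    def count_sawtooth(N):
        # dp[last][less]: sawtooth sequences of the current length ending
        # in digit `last`; less == 0 - last digit is smaller than the
        # previous one, less == 1 - larger
        if N == 1:
            return 10
        dp = [[9 - last, last] for last in range(10)]   # values for length 2
        for _ in range(3, N + 1):
            new = [[0, 0] for _ in range(10)]
            for last in range(10):
                for prev in range(10):
                    if prev > last:
                        new[last][0] += dp[prev][1]   # went down after going up
                    if prev < last:
                        new[last][1] += dp[prev][0]   # went up after going down
            dp = new
        return sum(dp[last][less] for last in range(10) for less in range(2))

    print(count_sawtooth(2))   # 90: any two distinct digits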

Now we need to come up with an initial vector and a transition matrix for it. The vector comes to mind quickly: all dynamics states for a fixed length n (10 digits × 2 values of less). The transition matrix is then derived by looking at the recalculation formulas.

Transition vector and matrix

Dynamics by subsegments

This is a class of dynamics in which the state is the pair of boundaries of a subsegment of some array. The point is to compute answers to subproblems for all possible subsegments of the array. Usually they are processed in order of increasing length, and the recalculation relies, accordingly, on shorter segments.
Example #4: Packing a string
Here is the expanded problem statement, summarized briefly:

Let's define a compressed string:
1) A string consisting only of letters is a compressed string. It decompresses into itself.
2) A string that is the concatenation of two compressed strings A and B. It decompresses into the concatenation of the strings decompressed from A and B.
3) A string D(X), where D is an integer greater than 1 and X is a compressed string. It decompresses into the concatenation of D copies of the string decompressed from X.
Example: “3(2(A)2(B))C” expands to “AABBAABBAABBC” .

Given a string s, you must find the length of the shortest compressed string that decompresses into it.

Solution

This problem is solved, as you probably already guessed, by dynamics in subsegments.

1) Dynamics state: d[l][r] — the minimum length of a compressed string that decompresses into the substring of s from position l to position r.
2) Initial states: all substrings of length one can be compressed only into themselves.
3) Recalculation of dynamics:
The best answer contains some last compression operation: either the string is left as a plain string of letters, or it is a concatenation of two compressed strings, or it is a repetition compression D(X). So let's try all the options and choose the best one.

    dp_len = r - l
    dp[l][r] = dp_len                    # option 1: leave it as a plain string
    for i in range(l + 1, r):            # option 2: split into two compressed parts
        dp[l][r] = min(dp[l][r], dp[l][i] + dp[i][r])
    for cnt in range(2, dp_len + 1):     # option 3: cnt repetitions of a period
        if dp_len % cnt == 0:            # if it does not divide, no point in trying
            period = dp_len // cnt
            good = True
            for j in range(1, cnt):      # check that all cnt period substrings match
                good &= s[l:l + period] == s[l + j * period : l + (j + 1) * period]
            if good:                     # compress into cnt identical substrings
                dp[l][r] = min(dp[l][r], len(str(cnt)) + 1 + dp[l][l + period] + 1)

4) Recalculation order: straightforward, in increasing order of substring length, or lazy dynamics.
5) The answer is in d[0][n], where n is the length of s.
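A complete lazy-dynamics version of this solution might look as follows (a sketch; dp(l, r) returns only the length, as in the statement):

    from functools import lru_cache

    def shortest_packing_length(s):
        @lru_cache(maxsize=None)
        def dp(l, r):                     # shortest compressed form of s[l:r]
            n = r - l
            if n == 1:
                return 1
            best = n                      # option 1: plain string of letters
            for i in range(l + 1, r):     # option 2: concatenation of two parts
                best = min(best, dp(l, i) + dp(i, r))
            for cnt in range(2, n + 1):   # option 3: cnt repetitions of a period
                if n % cnt == 0:
                    period = n // cnt
                    if s[l:r] == s[l:l + period] * cnt:
                        best = min(best, len(str(cnt)) + 1 + dp(l, l + period) + 1)
            return best
        return dp(0, len(s))

    print(shortest_packing_length("AABBAABBAABBC"))   # 8, e.g. "3(AABB)C"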


Dynamics by subtrees

The state parameter of dynamics over subtrees is usually a vertex, denoting the subtree rooted at that vertex. To obtain the value of the current state, you usually need to know the results for all of the vertex's children. Most often it is implemented lazily: one simply writes a depth-first search from the root of the tree.
Example #6: Logical tree
Given a rooted tree whose leaves contain single-bit numbers, 0 or 1. All internal vertices also contain numbers, computed by the following rule: for each vertex one of the logical operations "AND" or "OR" is chosen. If it is "AND", the value of the vertex is the logical AND of the values of all its children; if "OR", the logical OR.

You must find the minimum number of changes of logical operations at internal vertices needed to change the value at the root, or report that it is impossible.

Solution

1) Dynamics state: d[v][x] — the number of operation changes required to obtain the value x at vertex v. If this is impossible, the state value is +inf.
2) Initial values: for a leaf, its own value is obviously obtained with zero changes, while changing its value is impossible — that is, possible only at a cost of +inf.
3) Conversion formula:
If the vertex already yields the value x, the cost is zero. If not, there are two options: change the operation at the current vertex or not. For both, compute the cost and choose the better of the two.

If the operation is "AND" and you need to obtain 0, the answer is the minimum of the values d[i][0], where i ranges over the children of v.
If the operation is "AND" and you need to obtain 1, the answer is the sum of the values d[i][1] over all children i of v.
If the operation is "OR" and you need to obtain 0, the answer is the sum of the values d[i][0] over all children i of v.
If the operation is "OR" and you need to obtain 1, the answer is the minimum of the values d[i][1], where i ranges over the children of v.

4) Recalculation order: the easiest way to implement it is lazily - in the form of a depth-first search from the root.
5) The answer is d[root][x0 xor 1], where x0 is the value currently computed at the root.
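A sketch of this dynamics in Python; the tree representation (the children, op, val dictionaries) is an assumption chosen for illustration:

    INF = float("inf")

    def solve(children, op, val, v):
        # returns d[v] = [changes to make v evaluate to 0, changes to make it 1]
        if not children[v]:              # a leaf: its value is fixed
            d = [INF, INF]
            d[val[v]] = 0
            return d
        kids = [solve(children, op, val, u) for u in children[v]]
        cost = {}
        for operation in ("AND", "OR"):
            if operation == "AND":
                zero = min(k[0] for k in kids)   # one child forced to 0 suffices
                one = sum(k[1] for k in kids)    # all children must be 1
            else:
                zero = sum(k[0] for k in kids)   # all children must be 0
                one = min(k[1] for k in kids)    # one child forced to 1 suffices
            cost[operation] = (zero, one)
        keep = cost[op[v]]
        flip = cost["OR" if op[v] == "AND" else "AND"]   # +1 change to flip
        return [min(keep[0], 1 + flip[0]), min(keep[1], 1 + flip[1])]

    # root 0 is "AND" over an "OR" vertex (leaves with 1 and 0) and a leaf 1
    children = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
    op = {0: "AND", 1: "OR"}
    val = {2: 1, 3: 1, 4: 0}
    print(solve(children, op, val, 0))   # [1, 0]: root is 1; one change makes it 0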

Dynamics by subsets

In dynamics over subsets, the state usually includes a bitmask of a given set. The states are most often processed in order of increasing number of ones in the mask and recalculated, accordingly, from states that are smaller by inclusion. Lazy dynamics is usually used so as not to think specifically about the traversal order, which is sometimes not entirely trivial.
Example #7: Hamiltonian cycle of minimum weight, or the traveling salesman problem
Given a weighted graph G of size N (edge weights are non-negative). Find a Hamiltonian cycle (a cycle passing through all vertices without self-intersections) of minimum weight.

Solution

Since we are looking for a cycle passing through all vertices, we can choose any vertex as the “initial” vertex. Let this be vertex number 0.

1) Dynamics state: dp[mask][v] — the minimum weight of a path from vertex 0 to vertex v passing through all the vertices in mask and only through them.
2) Initial values: dp[1][0] = 0 (the mask containing only vertex 0); all other states are initially +inf.
3) Recalculation formula: if the i-th bit of mask is 1, the v-th bit is 0, and there is an edge from i to v, then:
dp[mask | (1 << v)][v] = min(dp[mask | (1 << v)][v], dp[mask][i] + w[i][v]), where w[i][v] is the weight of the edge from i to v.
4) Recalculation order: the simplest and most convenient way is to write lazy dynamics, but one can get creative and enumerate the masks in order of increasing number of set bits.
5) The answer is obtained from the states d[(1 << N) - 1][v] by closing the cycle: it is the minimum over v of d[(1 << N) - 1][v] + w[v][0].
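A direct (non-lazy) sketch of this dynamics:

    INF = float("inf")

    def tsp(w):
        # w[i][j]: weight of the edge i -> j, INF if there is no edge
        n = len(w)
        dp = [[INF] * n for _ in range(1 << n)]
        dp[1][0] = 0                     # start at vertex 0; only it is visited
        for mask in range(1 << n):
            for v in range(n):
                if dp[mask][v] == INF:
                    continue
                for u in range(n):
                    if not mask & (1 << u) and w[v][u] < INF:
                        new = mask | (1 << u)
                        dp[new][u] = min(dp[new][u], dp[mask][v] + w[v][u])
        full = (1 << n) - 1
        return min(dp[full][v] + w[v][0] for v in range(1, n))   # close the cycle

    w = [[INF, 1, 3, 2],
         [1, INF, 1, 4],
         [3, 1, INF, 1],
         [2, 4, 1, INF]]
    print(tsp(w))   # 5: the cycle 0 -> 1 -> 2 -> 3 -> 0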

Dynamics by profile

The classical problems solved by profile dynamics are problems of tiling a field with figures. Different questions can be asked: for example, the number of tilings, or a tiling with the minimum number of figures.

These problems can be solved by exhaustive search in roughly O(a^(N·M)) time, where a is the number of tiling options for one cell. Profile dynamics makes the time along one of the dimensions linear, leaving the exponential only along the other; the result is on the order of O(M · a^(2N)) (for dominoes, O(M · 4^N) — see below).

A profile is k (often one) columns forming the boundary between the part that has already been tiled and the part that has not. The boundary is only partially filled. Very often the profile is part of the dynamics state.

Almost always the state is a profile plus the position of that profile, and a transition increases the position by one. Whether one profile can be transformed into another can be determined in time linear in the profile size. This can be checked each time during the recalculation, or it can be precomputed: we precompute a two-dimensional array can — whether it is possible to move from one mask to another by placing several figures, advancing the profile position by one. Precomputing takes less time but more memory.

Example #8: Domino tiling
Find the number of ways to tile an N x M table using 1 x 2 and 2 x 1 dominoes.

Solution

Here the profile is one column. It is convenient to store it as a binary mask: 0 for an untiled cell of the column, 1 for a tiled cell. That gives 2^N profiles in total.

0) Precalculation (optional): go through all pairs of profiles and check whether it is possible to go from one to the other. In this problem the check is as follows.

If the first profile has a 1 in the current position, the second must have a 0 there, since a cell that is already tiled cannot start a horizontal domino into the next column.

If the first profile has a 0 in the current position, there are two options. If the second profile has a 1 there, we place a horizontal domino sticking into the next column and move on to the next position. If it has a 0, we must place a vertical domino, so the next cell of the first profile must also be free, and it can then be considered tiled.

(The original article illustrates with a figure which profiles can follow a given one.)

After that, we store everything in the can array: 1 if the transition is possible, 0 if it is not.
1) Dynamics state: dp[pos][mask] — the number of complete tilings of the first pos - 1 columns such that profile mask marks the cells of column pos already covered by dominoes sticking out of column pos - 1.
2) Initial state: dp[1][0] = 1 — the left border of the field is a straight wall, nothing sticks out of it.
3) Recalculation formula:
dp[pos + 1][next] += dp[pos][mask] * can[mask][next]
4) Traversal order: in increasing order of pos.
5) The answer is in dp[M + 1][0].

The resulting asymptotics is O(M · 4^N): M positions of the profile, 2^N values of the mask, and up to 2^N transitions from each.
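A runnable sketch of example #8's profile dynamics (the compatibility check follows the rules from step 0; indices are 0-based here):

    def count_tilings(N, M):
        def can(cur, nxt):
            # cur: cells of the current column already covered from the left;
            # nxt: cells of the next column covered by horizontal dominoes
            j = 0
            while j < N:
                if cur & (1 << j):          # already covered: nothing starts here
                    if nxt & (1 << j):
                        return False
                    j += 1
                elif nxt & (1 << j):        # horizontal domino into the next column
                    j += 1
                else:                       # vertical domino over rows j and j + 1
                    if j + 1 >= N or cur & (1 << (j + 1)) or nxt & (1 << (j + 1)):
                        return False
                    j += 2
            return True

        dp = [0] * (1 << N)
        dp[0] = 1                           # left border: a straight wall
        for _ in range(M):
            new = [0] * (1 << N)
            for mask in range(1 << N):
                if dp[mask]:
                    for nxt in range(1 << N):
                        if can(mask, nxt):
                            new[nxt] += dp[mask]
            dp = new
        return dp[0]                        # nothing sticks past the right border

    print(count_tilings(2, 2), count_tilings(2, 3), count_tilings(3, 4))   # 2 3 11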

Dynamics along a broken profile

This is a very strong optimization of profile dynamics. Here the profile is not only a mask but also a break point: the boundary advances cell by cell rather than a whole column at a time.

Now, after adding the break point to the profile, one can move to the next state by adding just one figure covering the left cell at the break. That is, by increasing the number of states by a factor of N (we must remember where the break point is), we reduce the number of transitions from one state to another from 2^N to a constant. The asymptotics improves from O(M · 4^N) to O(M · N · 2^N).

Transitions in dynamics along a broken profile are illustrated in the original article using the domino tiling problem (example #8).

Answer recovery

Sometimes it happens that knowing some characteristic of the best answer is not enough. For example, in the "string packing" problem (example #4) we end up with only the length of the shortest compressed string, whereas most likely we need the string itself rather than its length. In this case, the answer has to be recovered.

Each problem has its own way of recovering the answer, but the most common are:

  • Store the complete answer to the subproblem alongside the dynamics state value. If the answer is something large, this may require too much memory, so if another method can be used, it usually is.
  • Reconstruct the answer knowing the predecessor(s) of the given state. It is often possible to reconstruct the answer knowing only how it was obtained. In the same "string packing" problem, it is enough to store only the type of the last operation and the states from which it was obtained.
  • There is a method that uses no additional memory at all: after recalculating the dynamics, walk from the end along the best path, composing the answer along the way.

Small optimizations

Memory
Often in dynamics one meets a problem in which a state needs only a small number of other states for its recalculation. For example, when computing Fibonacci numbers we use only the last two values and never return to earlier ones. This means they can be forgotten, i.e., not stored in memory. Sometimes this improves the memory asymptotics. This technique can be used in examples #1, #2, #3 (in the solution without a transition matrix), #7, and #8. It cannot be used, however, if the traversal order is lazy dynamics.
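For Fibonacci numbers this optimization amounts to two variables instead of an array — a minimal sketch:

    def fib(n):
        if n <= 2:
            return 1
        a, b = 1, 1              # F(1), F(2)
        for _ in range(n - 2):   # only the last two values are ever kept
            a, b = b, a + b
        return b

    print(fib(10))   # 55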
Time
Sometimes the time asymptotics can be improved using some data structure. For example, in Dijkstra's algorithm a priority queue improves the asymptotics from O(N^2) to O(M log N).

State replacement

In a dynamic programming solution there is always a state — the parameters that uniquely define the subproblem — but this state is not necessarily unique. Sometimes one can come up with other parameters and benefit from this through reduced time or memory asymptotics.
Example #9: Number expansion
You need to find the number of decompositions of the number N into distinct terms. For example, if N = 7, there are 5 such decompositions:
  • 7
  • 1 + 6
  • 2 + 5
  • 3 + 4
  • 1 + 2 + 4
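The solution itself is not worked out in the text; as one standard approach (an assumption, not necessarily the state replacement the author had in mind), the count can be obtained with a subset-sum dynamics over the parts 1..N:

    def count_distinct_decompositions(N):
        dp = [0] * (N + 1)      # dp[s]: decompositions of s into distinct terms
        dp[0] = 1
        for part in range(1, N + 1):
            for s in range(N, part - 1, -1):   # descending: each term at most once
                dp[s] += dp[s - part]
        return dp[N]

    print(count_distinct_decompositions(7))   # 5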

Among the problems solved using mathematical programming, one can distinguish a separate class that requires the optimization of multi-step (multi-stage) processes. Such problems are distinguished by the possibility of dividing the solution into several interconnected stages. To solve them, dynamic programming, also called multi-stage programming, is used. Its methods are tailored to finding the optimal solution of multi-step problems that can be divided into stages or steps.

Origin of the term

The word "dynamic" in the name originally implied that the division into subproblems occurs primarily in time. When dynamic methods are used to solve production, economic, and other problems in which the time factor appears, breaking them into separate stages is not difficult. But dynamic programming techniques can also be used in problems whose individual stages are not related in time: in a multi-step problem one can always select a parameter or property by which it is divided into separate steps.

Algorithm (method) for solving multi-stage problems

The dynamic programming algorithm, or method, is based on the principle of sequential optimization: the solution of the general problem is divided into solutions of a number of individual subproblems, which are then combined into a single solution. Very often individual subproblems turn out to be the same, and a single shared solution significantly reduces the calculation time.

A feature of the method is the autonomy of the solution at each individual stage: regardless of how the process was optimized and solved at the previous stage, only the parameters characterizing the process at the current moment are used in the current calculation. For example, a driver on the road decides about the current turn regardless of how and for how long he has driven before.

Top-down and bottom-up methods

Although each separate stage uses the process parameters of the current moment, the result of optimization at the previous stage affects the calculations of subsequent stages, so that the best result is achieved as a whole. Dynamic programming calls this the principle of optimality: the optimal strategy for solving a problem, whatever the initial decisions and conditions, must be followed by decisions that at every subsequent stage form an optimal strategy relative to the current state. The problem-solving process is thus a continuous optimization of the result at each individual stage from the first to the last. This is called the top-down method.

But there is a class of multi-step problems in which the maximum effect at the last stage is already known; for example, we have already arrived from point A at point B and now want to find out whether we drove correctly at each previous stage, or whether something could have been done more optimally. A recursive sequence of stages arises: we go, as it were, "from the opposite end". This approach is called the bottom-up method.

Practical use

Dynamic programming can be used in any field of activity where there are processes that can be divided into a number of identical small stages by some parameter (time, amount, temperature, etc.). Dynamic solution methods are most widely used in control theory and in the design of computer systems.

Finding the optimal path

Dynamic optimization solves a wide class of shortest path and related search problems in which the "classical" enumeration of all possible solutions leads to unacceptable growth of the calculation time. The classic dynamic programming example is the knapsack problem: given a number of objects with specified weights and values, select a set of objects of maximum total value whose total weight does not exceed the capacity of the knapsack. A full enumeration of all options would take considerable time, while dynamic methods solve the problem in acceptable time. Shortest path problems are basic for transport logistics, and dynamic methods are well suited to them; the simplest example is building the shortest route with a car GPS navigator.
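A minimal sketch of the knapsack dynamics mentioned above (integer weights assumed; dp[c] is the best value achievable with capacity c):

    def knapsack(weights, values, W):
        dp = [0] * (W + 1)
        for w, v in zip(weights, values):
            for c in range(W, w - 1, -1):   # descending: each object at most once
                dp[c] = max(dp[c], dp[c - w] + v)
        return dp[W]

    print(knapsack([3, 4, 5], [30, 50, 60], 8))   # 90: objects of weight 3 and 5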

Production

Dynamic programming is widely used for a variety of production problems: inventory management to maintain the required number of components at any moment, production scheduling, routine and major equipment repairs, even distribution of personnel workload, the most efficient allocation of investment funds, and so on. For such problems, special software packages have been developed and integrated into popular enterprise management systems such as SAP.

Scientific field

Dynamic programming methods are widely used in scientific research: for example, in speech and image recognition algorithms, and in processing large amounts of data in sociology and other fields.

Hello, Habrahabr. I am currently working on a textbook on olympiad programming, one of whose sections is devoted to dynamic programming. Below is an excerpt from this section. Trying to explain the topic as simply as possible, I accompanied the complex moments with illustrations. I am interested in your opinion on how clear this material is, and I would also be glad to receive advice on which other problems to include in this section.

In many olympiad programming problems, a solution by recursion or exhaustive search requires a very large number of operations, and an attempt to solve them that way leads to exceeding the time limit.

However, among these and some other problems one can distinguish a class with one pleasant property: having solutions to some subproblems (for example, for a smaller n), one can find a solution to the original problem almost without searching.

Such problems are solved by the dynamic programming method, and by dynamic programming itself we mean reducing a problem to subproblems.

Sequences

The classic sequence problem is the following.

The Fibonacci sequence F(n) is given by the formulas F(1) = 1, F(2) = 1,
F(n) = F(n - 1) + F(n - 2) for n > 2. You need to find F(n) for a given n.

One solution that may seem logical and efficient is to solve using recursion:

int F(int n) { if (n <= 2) return 1; else return F(n - 1) + F(n - 2); }
Using such a function, we solve the problem "from the end": we step by step decrease n until we reach the known values.

But as you can see, this seemingly simple program already takes noticeably long at n = 40. This is because the same intermediate values are computed many times — the number of operations grows at the same rate as the Fibonacci numbers themselves: exponentially.

One way out of this situation is to save already found intermediate results for the purpose of their reuse:

int F(int n) { if (A[n] != -1) return A[n]; if (n <= 2) return A[n] = 1; return A[n] = F(n - 1) + F(n - 2); } /* the array A must be filled with -1 beforehand */
The given solution is correct and efficient. But for this problem an even simpler solution is applicable:

F[1] = 1; F[2] = 1; for (i = 3; i <= n; i++) F[i] = F[i - 1] + F[i - 2];
This solution can be called a solution "from the beginning": we first fill in the known values, then find the first unknown value (F(3)), then the next one, and so on, until we reach the required one.

This is exactly the classical dynamic programming solution: we first solve all the subproblems (find all F(i) for i < n), and then, knowing the solutions of the subproblems, obtain the answer (F(n) = F(n - 1) + F(n - 2), where F(n - 1) and F(n - 2) are already found).

One-Dimensional Dynamic Programming

To better understand the essence of dynamic programming, let us first define the concepts of problem and subproblem more formally.

Let the initial problem be to find a certain number T given the initial data n1, n2, ..., nk. That is, we can speak of a function T(n1, n2, ..., nk) whose value is the answer we need. Its subproblems are then the problems
T(i1, i2, ..., ik) with i1 < n1, i2 < n2, ..., ik < nk.

The following one-dimensional dynamic programming problem occurs in various variations.

Task 1. Count the number of sequences of zeros and ones of length n in which no two ones stand next to each other.

For n < 32 an exhaustive search would take a few seconds, and at n = 64 it is infeasible in principle. To solve the problem by the dynamic programming method, let us reduce the original problem to subproblems.

For n = 1 and n = 2 the answer is obvious. Suppose we have already found K(n - 1) and K(n - 2) — the numbers of such sequences of length n - 1 and n - 2.

Let us see what a sequence of length n can look like. If its last character is 0, then the first n - 1 characters form an arbitrary valid sequence of length n - 1 (it does not matter whether it ends with zero or one, since 0 follows). There are K(n - 1) such sequences. If the last character is 1, then the penultimate character must be 0 (otherwise there would be two ones in a row), and the first n - 2 characters form an arbitrary valid sequence of length n - 2; there are K(n - 2) of them.

Thus, K(1) = 2, K(2) = 3, K(n) = K(n - 1) + K(n - 2) for n > 2. That is, this problem actually reduces to finding the Fibonacci numbers.
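A minimal sketch of this count:

    def count_sequences(n):
        # K(1) = 2, K(2) = 3, K(n) = K(n - 1) + K(n - 2)
        if n == 1:
            return 2
        a, b = 2, 3
        for _ in range(n - 2):
            a, b = b, a + b
        return b

    print(count_sequences(4))   # 8: all 4-character 0/1 strings without "11"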

Two-Dimensional Dynamic Programming

A classical problem of two-dimensional dynamic programming is the problem of routes on a rectangular field. In different formulations, one needs to count the number of routes or find the route that is best in some sense.

Here are a couple of formulations of such tasks:

Task 2. Given a rectangular field of size n*m cells. You can make single-cell steps to the right or down. Count the number of ways to get from the top-left cell to the bottom-right one.

Task 3. Given a rectangular field of size n*m cells. You can make single-cell steps to the right, down, or diagonally right-down. Each cell contains a natural number. You need to get from the top-left cell to the bottom-right one. The weight of a route is the sum of the numbers in all visited cells. Find the route of minimum weight.

A characteristic feature of all such problems is that no individual route can pass through the same cell twice.

Let us consider task 2 in more detail. One can arrive at a cell with coordinates (i, j) only from above or from the left, that is, from the cells with coordinates (i - 1, j) and (i, j - 1):

Thus, for cell (i, j) the number of routes A[i][j] equals A[i - 1][j] + A[i][j - 1], i.e., the problem reduces to two subproblems. This implementation uses two parameters, i and j, so with respect to this problem we speak of two-dimensional dynamic programming.

Now we can walk sequentially through the rows (or columns) of the array A, finding the number of routes for the current cell by the above formula. First, 1 must be placed in the starting cell A[1][1].
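A sketch of the solution to task 2 (0-based indices, so the starting cell is A[0][0]):

    def count_routes(n, m):
        A = [[0] * m for _ in range(n)]
        A[0][0] = 1                          # the starting cell
        for i in range(n):
            for j in range(m):
                if i > 0:
                    A[i][j] += A[i - 1][j]   # arrived from above
                if j > 0:
                    A[i][j] += A[i][j - 1]   # arrived from the left
        return A[n - 1][m - 1]

    print(count_routes(3, 3))   # 6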

In task 3, a cell with coordinates (i, j) can be reached from the cells with coordinates (i - 1, j), (i, j - 1), and (i - 1, j - 1). Suppose that for each of these three cells we have already found the route of minimum weight and placed the weights themselves in W[i - 1][j], W[i][j - 1], and W[i - 1][j - 1]. To find the minimum weight for (i, j), select the minimum of these three weights and add to it the number written in the current cell:

W[i][j] = min(W[i - 1][j], W[i][j - 1], W[i - 1][j - 1]) + A[i][j];

The task is complicated by the fact that we must find not only the minimum weight but also the route itself. Therefore, in another array, we additionally record for each cell from which side we entered it.

The following figure shows an example of the initial data and one of the steps of the algorithm.

Exactly one arrow leads to each cell that has already been processed. It shows from which side one must arrive at the cell in order to obtain the minimum weight recorded in it.

After the whole array has been processed, the route itself is traced from the last cell by following the arrows in the reverse direction.
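A sketch of task 3 with route recovery (the test field below is made up for illustration):

    def min_route(A):
        n, m = len(A), len(A[0])
        INF = float("inf")
        W = [[INF] * m for _ in range(n)]
        prev = [[None] * m for _ in range(n)]   # the cell we came from
        W[0][0] = A[0][0]
        for i in range(n):
            for j in range(m):
                for di, dj in ((1, 0), (0, 1), (1, 1)):   # down, right, diagonal
                    ni, nj = i + di, j + dj
                    if ni < n and nj < m and W[i][j] + A[ni][nj] < W[ni][nj]:
                        W[ni][nj] = W[i][j] + A[ni][nj]
                        prev[ni][nj] = (i, j)
        path, cell = [], (n - 1, m - 1)          # trace the arrows backwards
        while cell is not None:
            path.append(cell)
            cell = prev[cell[0]][cell[1]]
        return W[n - 1][m - 1], path[::-1]

    A = [[1, 3, 1],
         [2, 8, 2],
         [4, 2, 1]]
    print(min_route(A))   # (6, [(0, 0), (1, 0), (2, 1), (2, 2)])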

Subsequence problems

Consider the increasing subsequence problem.

Task 4. Given a sequence of integers. It is necessary to find its longest strictly increasing subsequence.

Let us solve the problem from the beginning: we will look for the answer starting from the first terms of the sequence. For each position i we look for the longest increasing subsequence ending with the element at position i. Let the original sequence be stored in array A, and let array L record the lengths of the longest such subsequences ending at each element. Suppose we have found all L[i] for 1 <= i <= k - 1. Now L[k] can be found as follows: we look through all elements A[i] for 1 <= i < k. If A[i] < A[k], then the k-th element can continue a subsequence ending with A[i], and the length of the resulting subsequence is 1 greater than L[i]. So, to find L[k], we go through all i from 1 to k - 1:

L[k] = max(L[i]) + 1, where the maximum is taken over all i such that A[i] < A[k] and 1 <= i < k.

Here the maximum over the empty set is taken to be 0; in that case the current element becomes the only one in the chosen subsequence rather than a continuation of a previous one. After the array L is filled, the length of the longest increasing subsequence equals the maximum element of L.

To restore the subsequence itself, one can also store, for each element, the index of the previous chosen element, for example in an array N.

Let us walk through the solution using the sequence 2, 8, 5, 9, 12, 6. Since there is no element before 2, the maximal subsequence contains only this one element: L[1] = 1, and there is nothing before it: N[1] = 0. Further, 2 < 8, so 8 can continue the subsequence ending with the previous element: L[2] = 2, N[2] = 1.

The only element smaller than A[3] = 5 is A[1] = 2, so 5 can continue only the subsequence containing 2. Then L[3] = L[1] + 1 = 2, N[3] = 1, since 2 stands at position 1. Performing three more steps of the algorithm in the same way, we obtain the final result.

Now we select the maximum element of the array L and, using the array N, reconstruct the subsequence itself: 2, 5, 9, 12.
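A sketch of task 4's dynamics with recovery via the array N (0-based here, with N[i] = -1 playing the role of "no predecessor"):

    def lis(A):
        k = len(A)
        L = [1] * k                 # L[i]: best length ending at position i
        N = [-1] * k                # N[i]: index of the previous chosen element
        for i in range(k):
            for j in range(i):
                if A[j] < A[i] and L[j] + 1 > L[i]:
                    L[i] = L[j] + 1
                    N[i] = j
        i = max(range(k), key=lambda t: L[t])   # end of an optimal subsequence
        result = []
        while i != -1:
            result.append(A[i])
            i = N[i]
        return result[::-1]

    print(lis([2, 8, 5, 9, 12, 6]))   # [2, 8, 9, 12]: length 4, like 2, 5, 9, 12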

Another classic dynamic programming problem is the palindrome problem.

Task 5. Given a string of capital letters of the Latin alphabet. You need to find the length of the largest palindrome that can be obtained by crossing out some letters from a given string.

Let us denote the string by S and its symbols by S[i], 1 <= i <= n. We will consider substrings from the i-th to the j-th symbol, denoted S(i, j). The lengths of the maximal palindromes for substrings are written in a square array L: L[i][j] is the length of the maximal palindrome obtainable from the substring S(i, j).

Let us begin with the simplest substrings. For a single-character string (a substring of the form S(i, i)) the answer is obvious: nothing needs to be crossed out, such a string is already a palindrome. For a two-character string S(i, i + 1) there are two options: if the characters are equal, it is a palindrome and nothing is crossed out; if not, we cross out either one.

Now suppose we are given a substring S(i, j). If its first (S[i]) and last (S[j]) characters differ, one of them must be crossed out. Then we are left with the substring S(i, j - 1) or S(i + 1, j), i.e., we reduce the problem to subproblems: L[i][j] = max(L[i][j - 1], L[i + 1][j]). If the first and last characters are equal, we can keep both, but we need the solution for S(i + 1, j - 1):
L[i][j] = L[i + 1][j - 1] + 2.

Let us trace the solution on the string ABACCBA. First, we fill the diagonal of the array with ones; they correspond to single-character substrings S(i, i). Then we examine substrings of length two. In all of them except S(4, 5) the characters differ, so we write 1 in the corresponding cells, and 2 in L[4][5].

It turns out we fill the array diagonal by diagonal, starting with the main diagonal running from the top-left to the bottom-right corner. For substrings of length 3 we obtain the following values: in the substring ABA the first and last letters are equal, so
L[1][3] = L[2][2] + 2 = 3. In the remaining substrings the first and last letters differ.

BAC: L[2][4] = max(L[2][3], L[3][4]) = 1.
ACC: L[3][5] = max(L[3][4], L[4][5]) = 2.
CCB: L[4][6] = max(L[4][5], L[5][6]) = 2.
CBA: L[5][7] = max(L[5][6], L[6][7]) = 1.

If the problem requires outputting not the length but the palindrome itself, then in addition to the array of lengths we must build an array of transitions: for each cell we remember which of the cases was realized (for clarity, the figure draws the corresponding arrows instead of numeric codes for the transitions).
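A sketch of the diagonal fill for task 5 (0-based indices):

    def longest_palindrome_length(S):
        n = len(S)
        L = [[0] * n for _ in range(n)]
        for i in range(n):
            L[i][i] = 1                              # single characters
        for length in range(2, n + 1):               # increasing substring length
            for i in range(n - length + 1):
                j = i + length - 1
                if S[i] == S[j]:
                    L[i][j] = (L[i + 1][j - 1] if length > 2 else 0) + 2
                else:
                    L[i][j] = max(L[i + 1][j], L[i][j - 1])
        return L[0][n - 1]

    print(longest_palindrome_length("ABACCBA"))   # 6: "ABCCBA"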


1. Dynamic programming. Basic concepts.

Dynamic programming (DP) is, in the theory of computing systems, a way of solving complex problems by breaking them into simpler subproblems. It is applicable to problems with optimal substructure, which look like a set of overlapping subproblems slightly less complex than the original one. In this case the computation time, compared to "naive" methods, can be reduced significantly.

The key idea of dynamic programming is quite simple. As a rule, to solve a given problem one must solve individual parts of it (subproblems) and then combine the subproblem solutions into one overall solution. Often many of these subproblems coincide. The dynamic programming approach is to solve each subproblem only once, thereby reducing the amount of computation. This is especially useful when the number of repeating subproblems is exponentially large.

Dynamic programming is a mathematical technique that approaches the solution of a certain class of problems by decomposing them into smaller and less complex subproblems. Its distinctive feature is the solution of problems in stages, at fixed intervals or periods of time, which gave rise to the term "dynamic programming". It should be noted that dynamic programming methods are also successfully applied to problems in which the time factor plays no role. As a whole, the mathematical apparatus can be described as step-by-step, or stage-by-stage, programming. The solution of problems by dynamic programming methods rests on the principle of optimality formulated by R. E. Bellman: optimal behavior has the property that, whatever the initial state of the system and the initial decision, the subsequent decisions must constitute optimal behavior relative to the state resulting from the initial decision. It follows that the planning of each step must take into account the overall benefit obtained at the completion of the whole process, which allows the final result to be optimized according to the chosen criterion.

Thus, dynamic programming in the broad sense is the optimal control of a process by changing the controllable parameters at each step, thereby influencing the course of the process and changing the state of the system at each step. On the whole, dynamic programming is a theory harmonious enough to understand and simple enough to use in commercial activity, for both linear and nonlinear problems.

Dynamic programming is one of the branches of optimal programming. It is characterized by specific methods and techniques applied to operations in which the decision-making process is divided into stages (steps). Dynamic programming methods are used to solve variant optimization problems with given optimality criteria and with given connections between the variables and the objective function, expressed by a system of equations or inequalities. As in problems solved by linear programming, the restrictions can be given as equalities or inequalities. However, whereas in linear programming problems the dependencies between the criterion function and the variables must be linear, in dynamic programming these dependencies may also be nonlinear. Dynamic programming can be used both for problems associated with the dynamics of a process or system and for static problems, such as resource allocation. This significantly widens the range of control problems to which dynamic programming applies, and the possibility of simplifying the solution process — achieved by limiting the range and number of options examined at each transition to the next stage — adds to the advantages of this set of methods.

However, dynamic programming also has drawbacks. First of all, there is no single universal solution method: almost every problem solved by this method has its own peculiarities and requires a search for the most suitable set of techniques. In addition, the large volumes and complexity of solving multi-step problems with many states lead to the need to select problems of low dimension or to use compressed information. The latter is achieved by methods of analyzing options and processing the list of states.

For continuous-time processes, dynamic programming is treated as a limiting variant of the discrete solution scheme. The results obtained in this case practically coincide with those obtained by Pontryagin's maximum principle or by the Hamilton-Jacobi-Bellman methods.

Dynamic programming is used to solve problems in which the search for the optimum is possible with a step-by-step approach: the distribution of scarce capital investments among new areas of their use; the development of demand or inventory management rules that set the moment of replenishment and the size of the replenishing order; the development of production scheduling principles and the equalization of employment under fluctuating demand; the drawing up of calendar plans for current and major equipment repairs and replacement; the search for shortest distances on a transport network; the sequencing of the stages of a commercial operation, and so on.