Formulation and proof of the Kuhn-Tucker theorem. The concept of a saddle point.

The Kuhn-Tucker theorem generalizes the methods for solving optimization problems in two directions:

from linear programming to the nonlinear case, which by analogy received the not entirely apt name "nonlinear programming";

from equality constraints to inequality constraints.

The Karush-Kuhn-Tucker conditions (KKT conditions) are formally necessary conditions for optimality in a nonlinear programming problem. In addition, the necessary conditions include the so-called regularity conditions for stationary points: non-degeneracy (linear independence) of the set of constraint gradients. The method is a generalization of the Lagrange multiplier method to the case of inequality constraints.

Kuhn and Tucker generalized the Lagrange multiplier method (used to construct optimality criteria for problems with equality constraints) to the general nonlinear programming problem with both equality and inequality constraints.

The main tool in nonlinear programming remains the Lagrange function, obtained by transferring the constraints into the objective function. The Kuhn-Tucker conditions are also derived from this principle.

William Karush found the necessary conditions for the general case of equality and inequality constraints in his master's thesis in 1939. Harold Kuhn and Albert Tucker independently arrived at the same conclusions later, in 1951, and their results were somewhat more general.

If x* is a solution of the problem of minimizing f(x) under the imposed inequality constraints g_i(x) ≤ 0, i = 1, ..., m, then there is a vector of Lagrange multipliers λ* such that for the Lagrange function L(x, λ) = f(x) + Σ_i λ_i g_i(x) the following conditions are met:

- stationarity: ∂L/∂x_j = 0 at (x*, λ*), j = 1, ..., n;

- complementary slackness: λ_i* g_i(x*) = 0, i = 1, ..., m;

- non-negativity: λ_i* ≥ 0, i = 1, ..., m.
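As an illustration, the following sketch checks these three conditions numerically at a candidate point of a small hypothetical problem (the objective, constraints, candidate point and multipliers below are chosen only for illustration and are not taken from the text):

```python
# A minimal numerical check of the KKT conditions at a candidate point.
# Hypothetical problem: minimize f(x) = (x1 - 2)^2 + (x2 - 1)^2
# subject to g1(x) = x1 + x2 - 2 <= 0 and g2(x) = -x1 <= 0.
import numpy as np

def f(x):
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2

g = [lambda x: x[0] + x[1] - 2.0,   # g1(x) <= 0
     lambda x: -x[0]]               # g2(x) <= 0

def num_grad(func, x, h=1e-6):
    """Central-difference gradient."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        grad[j] = (func(x + e) - func(x - e)) / (2 * h)
    return grad

x_star = np.array([1.5, 0.5])       # candidate solution
lam = np.array([1.0, 0.0])          # candidate multipliers

# Stationarity: grad f + sum_i lam_i * grad g_i = 0
grad_L = num_grad(f, x_star) + sum(li * num_grad(gi, x_star) for li, gi in zip(lam, g))
print("stationarity ||grad L|| =", np.linalg.norm(grad_L))            # ~0
# Complementary slackness: lam_i * g_i(x*) = 0
print("complementary slackness:", [li * gi(x_star) for li, gi in zip(lam, g)])
# Non-negativity of the multipliers and feasibility of the point
print("lam >= 0:", all(lam >= 0), " feasible:", all(gi(x_star) <= 1e-9 for gi in g))
```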

The listed necessary conditions for a minimum of the function are not, in general, sufficient. There are several variants of additional conditions that make them sufficient:

- the simplest condition: 1) the point is stationary, 2) the complementary slackness condition is satisfied, 3) the Lagrange multipliers are strictly positive;

- Slater's condition (weaker): 1) the point is stationary, 2) the complementary slackness condition is satisfied, 3) there exists a point at which all inequality constraints are satisfied strictly (Slater's regularity condition).

Let us proceed directly to the proof of the Kuhn-Tucker theorem.

If a continuous function F(x) of n variables x = (x1, ..., xn) has a maximum at the point x_opt, then there exists ε > 0 such that for all x from the ε-neighborhood of the point x_opt

F(x) ≤ F(x_opt), i.e.

F(x) − F(x_opt) ≤ 0.

Let us take increments of x_j of both signs along the j-th coordinate:

Δx_j = x_j − x_j,opt > 0,

Δx_j = x_j − x_j,opt < 0.

Dividing the inequality F(x) − F(x_opt) ≤ 0 by these increments, we obtain

[F(x) − F(x_opt)]/Δx_j ≤ 0 for Δx_j > 0 and [F(x) − F(x_opt)]/Δx_j ≥ 0 for Δx_j < 0.

Passing to the limit in these relations as Δx_j → 0, we get ∂F/∂x_j ≤ 0 and ∂F/∂x_j ≥ 0 at the point x_opt. From these relations it follows that

∂F/∂x_j = 0 at x = x_opt, j = 1, ..., n. (3.6.1)

A similar relation can be obtained for the case of a minimum of the function. Thus, conditions (3.6.1) are necessary for the function F(x) to attain a maximum or a minimum at the point x_opt, i.e. if there is an extremum, then conditions (3.6.1) are satisfied. But the vanishing of all derivatives at the point x_opt does not yet ensure the existence of an extremum there, i.e. conditions (3.6.1) are not sufficient. Geometrically this means that in the case of a zero derivative of a function of one variable there may be an inflection point rather than a maximum (or minimum), and in the case of a function of two variables a saddle point rather than an extremum, etc. Therefore the points x_opt at which relations (3.6.1) are satisfied are called stationary.

Note that condition (3.6.1) was obtained thanks to the possibility of giving the variable x_j increments of both signs, which is where the two inequalities ∂F/∂x_j ≤ 0 and ∂F/∂x_j ≥ 0 came from. If the admissible region is limited to non-negative values x ≥ 0, then inside the region, where x_j > 0, condition (3.6.1) remains valid, since increments of both signs are allowed there. On the boundary of the region x ≥ 0, where x_j = 0, only a positive increment Δx_j > 0 is allowed, so one can speak only of a one-sided derivative, and instead of (3.6.1) the following necessary condition for a maximum follows:

∂F/∂x_j ≤ 0 at x_j = 0.

Accordingly, the necessary condition for a minimum at the boundary of the region, x_j = 0, is written as

∂F/∂x_j ≥ 0 at x_j = 0.
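A tiny numerical illustration of this boundary condition, for a hypothetical function F(x) = −(x + 1)² whose maximum over x ≥ 0 is attained at the boundary point x = 0:

```python
# Hypothetical illustration: F(x) = -(x + 1)^2 on the region x >= 0.
# The maximum over x >= 0 is attained at the boundary point x = 0,
# where dF/dx = -2 <= 0, in agreement with the boundary condition above.
def F(x):
    return -(x + 1.0) ** 2

h = 1e-6
x_b = 0.0
one_sided = (F(x_b + h) - F(x_b)) / h   # only a positive increment is allowed
print("one-sided dF/dx at x = 0:", one_sided)   # about -2, i.e. <= 0
```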

b) Conditional extremum problem

In the conditional extremum problem one needs to determine the maximum (or minimum) of a function F(x) under the constraints

g_i(x) = b_i, i = 1, ..., m,

i.e. to solve the problem

F(x) → max,

g_i(x) = b_i, i = 1, ..., m.

Here, as in the classical calculus of variations, the method of Lagrange multipliers is used, which consists in introducing the Lagrange function

Φ(x, λ) = F(x) − Σ_i λ_i [g_i(x) − b_i], (3.6.2)

where λ_i are the undetermined Lagrange multipliers.



Assuming that the function is a special case of a functional, we obtain that the necessary conditions for the extremum are found by direct differentiation of relation (3.6.2) and are written in the form

∂Φ/∂x_j = ∂F/∂x_j − Σ_i λ_i ∂g_i/∂x_j = 0, j = 1, ..., n; (3.6.3)

∂Φ/∂λ_i = −[g_i(x) − b_i] = 0, i = 1, ..., m. (3.6.4)
If we introduce the vectors

grad F = (∂F/∂x_1, ..., ∂F/∂x_n), grad g_i = (∂g_i/∂x_1, ..., ∂g_i/∂x_n), (3.6.5)

relations (3.6.3) and (3.6.4) will be rewritten as

grad Φ = grad F − Σ_i λ_i grad g_i = 0; g_i(x) = b_i, i = 1, ..., m, (3.6.6)

where the equality of vectors to zero is understood componentwise.



Figure 3.6.1 - Explanation of the conditional extremum problem

When n = 2 and m = 1, the geometric problem of finding a conditional extremum reduces (Fig. 3.6.1) to finding the point A at which the curve g(x) = b touches one of the level curves F(x) = const.
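A short symbolic sketch of this construction for n = 2 and m = 1 (the functions below are hypothetical and serve only to illustrate the tangency condition grad F = λ grad g):

```python
# A sketch of the Lagrange multiplier method for n = 2, m = 1, using sympy.
# Hypothetical example: maximize F(x, y) = x*y subject to g(x, y) = x + y = 4.
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
F = x * y
g = x + y
b = 4

Phi = F - lam * (g - b)                       # Lagrange function, cf. (3.6.2)
eqs = [sp.diff(Phi, v) for v in (x, y, lam)]  # grad Phi = 0 componentwise
print(sp.solve(eqs, (x, y, lam), dict=True))  # [{x: 2, y: 2, lam: 2}]
```

At the solution the gradients grad F = (2, 2) and grad g = (1, 1) are proportional, i.e. the constraint line touches the level curve F = const at the tangency point.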

Theorem 7.2. For the convex programming problem (7.4)–(7.6), the set of feasible solutions of which has the regularity property, a plan X0 is an optimal plan if and only if there exists a vector Y0 ≥ 0 such that (X0, Y0) is a saddle point of the Lagrange function.

If we assume that the objective function f(X) and the constraint functions g_i(X) are continuous and differentiable, the Kuhn-Tucker theorem can be supplemented with analytical expressions that give a necessary and sufficient condition for the point (X0, Y0) to be a saddle point of the Lagrange function, i.e. a solution of the convex programming problem. These expressions have the following form:

∂L/∂x_j ≤ 0, x_j0 · ∂L/∂x_j = 0, x_j0 ≥ 0, j = 1, ..., n;

∂L/∂y_i ≥ 0, y_i0 · ∂L/∂y_i = 0, y_i0 ≥ 0, i = 1, ..., m,

where ∂L/∂x_j and ∂L/∂y_i are the values of the corresponding partial derivatives of the Lagrange function calculated at the saddle point (X0, Y0).

The Kuhn-Tucker theorem occupies a central place in the theory of nonlinear programming. It is also valid for so-called quadratic programming problems, i.e. problems with an objective function of the form

F(X) = Σ_j c_j x_j + Σ_j Σ_k d_jk x_j x_k

and linear restrictions

Σ_j a_ij x_j ≤ b_i, i = 1, ..., m; x_j ≥ 0, j = 1, ..., n.

In the general case there is no efficient computational method for finding the global extremum of a nonlinear programming problem unless it is known that every local extremum is also global. In convex and quadratic programming problems the set of feasible solutions is convex, so if the objective function is concave, any local maximum is also global.

Example 7.3. Using the Kuhn-Tucker theorem, let us find the maximum of the objective function under the given restrictions.

We solve the problem graphically (Fig. 7.2) and find the optimum point (see the solution below for more details). Let us show that there exists a vector Y0 ≥ 0 at which the Kuhn-Tucker conditions are satisfied at the optimum point. Let us compose the Lagrange function:

Solution. The coordinates of the optimum point are found from the system ∂L/∂x_j = 0: from the second equation we express one of the variables, substitute it into the first equation of the system and differentiate. Substituting the optimal values of the variables into the partial derivatives ∂L/∂y_i, we obtain values greater than zero, whereas by the condition the products y_i·∂L/∂y_i must equal zero; hence the corresponding multipliers are y_i = 0. On the other hand, the multiplier of the constraint that is active at the optimum point may take a nonzero value, because its partial derivative ∂L/∂y_i vanishes there; from this condition its value is found. Thus all the Kuhn-Tucker conditions are satisfied, and therefore this point is an extremum point.

Fig. 7.2. Graphical solution of the nonlinear programming problem of Example 7.3

The necessary condition for optimality for a nonlinear programming problem can be formulated as follows. Let M0 be an optimal solution of problem (7.4)–(7.6), and let the functions F(M), g_i(M), i = 1, ..., m, be differentiable at this point. If M0 is a nonsingular point of the feasible set of problem (7.4)–(7.6), then it is a Kuhn-Tucker point of this problem.

Thus, if the feasible set of problem (7.4)–(7.6) has no singular points and the functions F(M), g_i(M) (i = 1, ..., m) are differentiable at all of its points, then the optimal solution of this problem should be sought among the Kuhn-Tucker points.

As is known from mathematical analysis, a point of the set of feasible solutions of a system of equations and inequalities (i.e. of the feasible set of the problem) is called singular if the gradients of those constraint functions that turn into equalities at this point (the active constraints) are linearly dependent there. For example, a point of the feasible set at which the gradient of one active constraint is proportional to the gradient of another is a singular point of this set: indeed, these gradients form a linearly dependent system.
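In practice, a feasible point can be tested for singularity by stacking the gradients of the constraints active at it into a matrix and checking them for linear dependence, for example through the rank of this matrix. A minimal sketch with hypothetical constraints:

```python
# Checking whether a feasible point is singular: the gradients of the active
# constraints are stacked into a matrix and tested for linear dependence.
# Hypothetical constraints: g1(x) = x2 - x1**2 >= 0 and g2(x) = -x2 >= 0,
# both active at the point (0, 0).
import numpy as np

def grad_g1(x):
    return np.array([-2.0 * x[0], 1.0])

def grad_g2(x):
    return np.array([0.0, -1.0])

x0 = np.array([0.0, 0.0])
A = np.vstack([grad_g1(x0), grad_g2(x0)])
print("rank =", np.linalg.matrix_rank(A), "for", A.shape[0], "gradients")
# rank 1 < 2: the gradients are linearly dependent, so (0, 0) is a singular point
```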

The gradient of a function F(M) at a point M0 is the vector whose coordinates are equal to the values of the partial derivatives of the function at the point M0. This vector shows the direction of the fastest growth of the function.

The necessary optimality condition is of little use for solving most nonlinear programming problems, because only in rare cases is it possible to find all the Kuhn-Tucker points of a nonlinear programming problem.

The sufficient condition for optimality for a nonlinear programming problem is the following: if (X0, Y0) is a saddle point of the Lagrange function for problem (7.4)–(7.6), then X0 is an optimal solution of this problem.

This condition is not necessary in the general case: the Lagrange function may not have saddle points, while at the same time the original nonlinear programming problem has an optimal solution.

Various methods are known that allow one to find an approximate optimal solution of a nonlinear programming problem. However, these methods provide a fairly good approximation only for convex programming problems, in which every local extremum is also global. In general, gradient methods are understood as methods in which the movement toward the optimum point of a function follows the direction of the gradient of this function: starting from some point X(0), a sequential transition is carried out to other points until an acceptable solution of the original problem is found. Gradient methods can be divided into two groups.

The first group includes methods in which the points under study do not leave the region of feasible solutions of the problem. The most common of these is the Frank-Wolfe method. This group also includes Rosen's projected gradient method and Zoutendijk's method of feasible directions.

The second group includes methods in which the points under study need not belong to the region of feasible solutions. However, as a result of the iterative process a point of the region of feasible solutions is found, which determines an acceptable solution.

Of these methods, the most commonly used are the penalty function method and the Arrow-Hurwicz method.

When solving problems by gradient methods, including those mentioned above, the iterative process continues until the gradient of the function f(X) at the next point X(k) becomes equal to zero, or until |grad f(X(k))| < ε, where ε is a sufficiently small positive number characterizing the accuracy of the resulting solution.

Solving complex nonlinear programming problems by gradient methods involves a large amount of computation and is advisable only with the use of a computer.

Using an example, we will show the general meaning of gradient methods for solving nonlinear programming problems.

Let there be a function of two variables f(x1, x2). Let the initial values of the variables be x1(0), x2(0), and the corresponding value of the function f(x1(0), x2(0)). If this point is not an extremum, then new values of the variables x1(1) and x2(1) are determined so that the next value of the function turns out to be closer to the optimum than the previous one.

How are the sizes of the increments Δx1 and Δx2 determined? At the current point, the direction of change of the function toward the extremum is determined using the gradient. The larger the value of a partial derivative, the faster the function changes toward the extremum. Therefore, the increments Δx1 and Δx2 are chosen proportional to the values of the partial derivatives ∂f/∂x1 and ∂f/∂x2 at the current point:

x1(1) = x1(0) + t·∂f/∂x1, x2(1) = x2(0) + t·∂f/∂x2,

where t is a value called the step, which sets the scale of change of Δx1 and Δx2.

Example 7.4. Let it be necessary to maximize a function f(x1, x2) whose maximum is attained at the point (3; 2).

Starting from an initial point (x1(0), x2(0)) and using a chosen step t, we compute the partial derivatives at this point and obtain the next approximation (x1(1), x2(1)) together with the corresponding value of the function.

Let us perform one more iteration. At the point (x1(1), x2(1)), with the same step t, we again compute the partial derivatives and obtain the following approximation (x1(2), x2(2)), which is closer to the maximum point (3; 2).

Fig. 7.3. Geometric interpretation of two steps of the gradient method for Example 7.4

Fig. 7.3 shows the movement along the gradient that was carried out in this example. The ratio of the partial derivatives indicates the direction of change of the function toward the maximum. If we take the gradient with a minus sign, then we move toward the minimum.

A specific problem of gradient methods is the choice of the step size t. If the steps are too small, the search for the optimum takes very long; if the step is chosen too large, the optimum can be "overshot". The intermediate problem of choosing the optimal step size is solved by appropriate methods. If the step t is determined only approximately, then, as a rule, the process lands not exactly at the optimum but in its neighborhood. Gradient methods find local optima. When a global optimum is sought with a gradient method, there is often doubt whether the optimum found is global; this is a disadvantage of the method when solving non-convex programming problems.
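A minimal sketch of the fixed-step gradient scheme described above, for a hypothetical concave function with its maximum at the point (3; 2) (the actual function of Example 7.4 is not reproduced in the text):

```python
# Fixed-step gradient ascent, following the scheme x_new = x + t * grad f(x).
# Hypothetical objective with maximum at (3, 2): f(x) = -(x1 - 3)^2 - (x2 - 2)^2.
import numpy as np

def grad_f(x):
    return np.array([-2.0 * (x[0] - 3.0), -2.0 * (x[1] - 2.0)])

x = np.array([0.0, 0.0])   # starting point
t = 0.1                    # step size
eps = 1e-6                 # required accuracy of the solution
for k in range(1000):
    g = grad_f(x)
    if np.linalg.norm(g) < eps:   # stopping rule |grad f| < eps
        break
    x = x + t * g                 # move along the gradient (ascent)
print(k, x)                       # converges to approximately (3, 2)
```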

Self-test questions

    Statement of the nonlinear programming problem.

    The process of finding a solution to a nonlinear programming problem using its geometric interpretation.

    Algorithm of the Lagrange method for solving a nonlinear programming problem.

    Application of the Lagrange method to solving a nonlinear programming problem in the case where the connection conditions are inequalities.

    Auxiliary definitions and theorems necessary for considering the central theorem of nonlinear programming - the Kuhn-Tucker theorem.

    Kuhn-Tucker theorem.

    Necessary and sufficient optimality condition for a nonlinear programming problem.

    The general meaning of gradient methods for approximate solution of nonlinear programming problems.

    Groups of gradient methods for approximate solution of nonlinear programming problems.

Let us write the Lagrange function: (4)
where λ_i, i = 1, ..., m, are the undetermined Lagrange multipliers, and z(X) and q_i(X) are concave (upward convex) functions.

Kuhn-Tucker theorem. For the plan X0 to be a solution of problem (1)–(4), it is necessary and sufficient that there exist a vector λ0 ≥ 0 such that for all X ≥ 0 and λ ≥ 0
L(X, λ0) ≤ L(X0, λ0) ≤ L(X0, λ).
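The saddle-point inequality can be illustrated numerically on a small hypothetical concave problem (maximize z(x) = −(x − 2)² subject to q(x) = 3 − x ≥ 0 and x ≥ 0; the optimum is X0 = 2 with λ0 = 0, since the constraint is inactive):

```python
# Illustration of the saddle-point inequality L(X, l0) <= L(X0, l0) <= L(X0, l)
# for the hypothetical problem: maximize z(x) = -(x - 2)^2 s.t. 3 - x >= 0, x >= 0.
# The optimum is X0 = 2 (the constraint is inactive), so lambda0 = 0.
import numpy as np

L = lambda x, lam: -(x - 2.0) ** 2 + lam * (3.0 - x)
x0, lam0 = 2.0, 0.0

xs = np.linspace(0.0, 5.0, 501)      # grid of X >= 0
lams = np.linspace(0.0, 5.0, 501)    # grid of lambda >= 0
left_ok = np.all(L(xs, lam0) <= L(x0, lam0) + 1e-12)
right_ok = np.all(L(x0, lam0) <= L(x0, lams) + 1e-12)
print(left_ok, right_ok)             # True True: (X0, lambda0) is a saddle point
```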


In order for the function of two vector variables (4) to have a saddle point, it is necessary and sufficient that the following Kuhn-Tucker conditions be fulfilled:
(5)
(6)
(7)
(8)

If a minimization problem is being solved, then (X0, Y0) is a saddle point if the relations
L(X, Y0) ≥ L(X0, Y0) ≥ L(X0, Y)
are satisfied for all X ≥ 0 and Y ≥ 0; in the Kuhn-Tucker conditions for the existence of a saddle point of the Lagrange function, the signs in (5) and (7) then change to the opposite.

Example. Check the fulfillment of the Kuhn-Tucker conditions and find the optimum point of the nonlinear programming problem with constraints:
f(x) = x1² − x2 → min
g1(x) = x1 − 1 ≥ 0
g2(x) = 26 − x1² − x2² ≥ 0
h1(x) = x1 + x2 − 6 = 0

Solution:
The Lagrange function: L(X, λ) = x1² − x2 − λ1(1 − x1) + λ2(26 − (x1² + x2²)) + λ3(6 − (x1 + x2)).
A necessary condition for an extremum of the Lagrange function is that its partial derivatives with respect to the variables x_i and the undetermined multipliers λ equal zero.
Let us check the fulfillment of the Kuhn-Tucker conditions. Let us calculate the matrices of second derivatives of the objective function and of the constraint functions.

The Hessian matrix H_f is positive semidefinite for all values of x; therefore f(x) is a convex function.
The constraint g1(x) is a linear function, which is both convex and concave.
The Hessian matrix of the function g2(x) has the form:

Δ1 = −2, Δ2 = 4.
Since the matrix H_g2 is negative definite, g2(x) is concave.
The function h1(x) enters the problem as a linear equality constraint. Therefore, conditions (a), (b) and (c) of Theorem 1 are satisfied. Let us now find the optimum point x*.
We have:


The equation takes the following form:

For j = 1 and j = 2, respectively, we obtain the following expressions:
j = 1: 2x1 − μ1 + 2x1μ2 + λ1 = 0
j = 2: −1 + 2x2μ2 + λ1 = 0
Thus, the Kuhn-Tucker conditions for our example are as follows:
2x1 − μ1 + 2x1μ2 + λ1 = 0
−1 + 2x2μ2 + λ1 = 0
x1 + x2 − 6 = 0
μ1(x1 − 1) = 0
μ2(26 − x1² − x2²) = 0
μ1, μ2 ≥ 0

From the third equation we find x1 = 6 − x2. Substituting this value of x1 into the remaining equations, we get:
2(6 − x2) − μ1 + 2μ2(6 − x2) + λ1 = 0
−1 + 2x2μ2 + λ1 = 0
μ1(6 − x2 − 1) = 0 → x2 = 5, x1 = 1
μ2(26 − (6 − x2)² − x2²) = 0
Let us substitute x2 = 5 into the first and second equations:

From the second equation we express λ1 and substitute it into the first equation:
3 − μ1 − 8μ2 = 0
Let μ2 = 0.1 ≥ 0; then μ1 = 2.2 ≥ 0. Thus, the point x* = (1; 5) is the minimum point.
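The same system of Kuhn-Tucker conditions can be solved symbolically; below is a sketch with sympy in which, as in the text, the free multiplier μ2 is fixed at 0.1:

```python
# Solving the Kuhn-Tucker system of this example symbolically with sympy.
# Both inequality constraints turn out to be active at the optimum
# (x1 = 1 and x1^2 + x2^2 = 26); as in the text, mu2 is fixed at 0.1.
import sympy as sp

x1, x2, mu1, mu2, lam1 = sp.symbols('x1 x2 mu1 mu2 lam1', real=True)

eqs = [
    sp.Eq(2*x1 - mu1 + 2*x1*mu2 + lam1, 0),   # stationarity in x1
    sp.Eq(-1 + 2*x2*mu2 + lam1, 0),           # stationarity in x2
    sp.Eq(x1 + x2 - 6, 0),                    # equality constraint h1(x) = 0
    sp.Eq(mu1*(x1 - 1), 0),                   # complementary slackness for g1
    sp.Eq(mu2*(26 - x1**2 - x2**2), 0),       # complementary slackness for g2
    sp.Eq(mu2, sp.Rational(1, 10)),           # the choice made in the text
]
sol = sp.solve(eqs, (x1, x2, mu1, mu2, lam1), dict=True)
print(sol)   # x1 = 1, x2 = 5, mu1 = 2.2 >= 0, mu2 = 0.1 >= 0
```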

The central place in the theory of nonlinear programming is occupied by the Kuhn-Tucker theorem. Let a nonlinear programming problem be given:

find the maximum value of the function Z = f(x1, x2, ..., xn) subject to the constraints

g_i(x1, x2, ..., xn) ≥ 0, i = 1, ..., m. (4.1)

Let us compose the Lagrange function for this problem:

F(X, Λ) = f(X) + Σ_i λ_i g_i(X). (4.2)

If the regularity condition is satisfied (there is at least one point X for which g_i(X) > 0 for all i), then the following theorem holds.

Theorem. The vector X(0) ≥ 0 is an optimal solution of problem (4.1) if and only if there exists a vector Λ(0) ≥ 0 such that for all X ≥ 0 and Λ ≥ 0

F(X, Λ(0)) ≤ F(X(0), Λ(0)) ≤ F(X(0), Λ). (4.3)

The point (X(0), Λ(0)) is called a saddle point of the function F(X, Λ), and the theorem is called the saddle point theorem. Let us prove the sufficiency of conditions (4.3).

Proof. Let X(0) ≥ 0 and Λ(0) ≥ 0 form a saddle point of the function F(X, Λ). If we substitute expression (4.2) for F(X, Λ) into (4.3), we get

f(X) + Σ_i λ_i(0) g_i(X) ≤ f(X(0)) + Σ_i λ_i(0) g_i(X(0)) ≤ f(X(0)) + Σ_i λ_i g_i(X(0))

for all X ≥ 0, Λ ≥ 0.

The right inequality is valid for any Λ ≥ 0; therefore g_i(X(0)) ≥ 0 for all i (otherwise the right-hand side could be made arbitrarily small), and, taking Λ = 0, we get Σ_i λ_i(0) g_i(X(0)) ≤ 0, whence Σ_i λ_i(0) g_i(X(0)) = 0.

Then from the left inequality we get

f(X) + Σ_i λ_i(0) g_i(X) ≤ f(X(0)).

Since for every X satisfying constraints (4.1) we have in this case

Σ_i λ_i(0) g_i(X) ≥ 0,

it follows that f(X(0)) ≥ f(X).

So the point X(0) satisfies (4.1), and at all other points satisfying (4.1) the function takes values no greater than at X(0).

This statement reduces the solution of the NLP problem to finding the saddle points of the Lagrange function F(X, Λ).

The proof of the necessity of conditions (4.3) is not considered here because of its complexity.



If f(X) and g_i(X) are differentiable functions, then conditions (4.3) are equivalent to the following local Kuhn-Tucker conditions:

∂F/∂x_j ≤ 0, x_j(0)·∂F/∂x_j = 0, x_j(0) ≥ 0, j = 1, ..., n; (4.4)

∂F/∂λ_i ≥ 0, λ_i(0)·∂F/∂λ_i = 0, λ_i(0) ≥ 0, i = 1, ..., m, (4.5)

where the value of each partial derivative of the Lagrange function is taken at the point (X(0), Λ(0)), with

X(0) = (x1(0), x2(0), ..., xn(0)), Λ(0) = (λ1(0), λ2(0), ..., λm(0)).

These conditions can be written in vector form:

Example. Find max Z = −x1² − x2² subject to the constraints

2x1 + x2 ≥ 2, 2x1 + x2 ≤ 8, x1 + x2 ≤ 6, x1, x2 ≥ 0.

Let us show that there exists Λ(0) ≥ 0 for which the Kuhn-Tucker conditions (4.4), (4.5) are satisfied at the optimum point of the function F(X, Λ):

F(X, Λ) = −x1² − x2² + λ1(2x1 + x2 − 2) + λ2(8 − 2x1 − x2) + λ3(6 − x1 − x2).

According to conditions (4.5), λ2 and λ3 must take zero values, since substituting x1 = 0.8 and x2 = 0.4 into the expressions 8 − 2x1 − x2 and 6 − x1 − x2 gives values greater than zero, whereas by the condition the products λ_i·∂F/∂λ_i must equal zero.

According to the same condition, λ1 can take a nonzero value, since 2x1 + x2 − 2 = 0 at this point.

In accordance with (4.4), the derivatives ∂F/∂x_j must take zero values, since the coordinates of the vector X(0) are different from zero. We find λ1 = 0.8. Therefore, at the point (X(0), Λ(0)) the Kuhn-Tucker conditions are satisfied, and it is indeed an extremum point.
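This result can also be cross-checked numerically, for example with scipy; the constraint set below is the one written out above:

```python
# Numerical cross-check of the example: maximize Z = -x1^2 - x2^2
# (i.e. minimize x1^2 + x2^2) under the constraints written out above.
import numpy as np
from scipy.optimize import minimize

objective = lambda x: x[0] ** 2 + x[1] ** 2        # minimize -Z
constraints = [
    {'type': 'ineq', 'fun': lambda x: 2 * x[0] + x[1] - 2},   # 2x1 + x2 >= 2
    {'type': 'ineq', 'fun': lambda x: 8 - 2 * x[0] - x[1]},   # 2x1 + x2 <= 8
    {'type': 'ineq', 'fun': lambda x: 6 - x[0] - x[1]},       # x1 + x2 <= 6
]
res = minimize(objective, x0=np.array([1.0, 1.0]),
               method='SLSQP', constraints=constraints)
print(res.x)          # approximately [0.8, 0.4]
print(-res.fun)       # Z_max = -0.8
```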

Let's consider the Kuhn-Tucker conditions in a slightly different form.

Suppose we have an optimization problem with equality constraints:

z = f(x1, x2, …, xn) → min

under the conditions:

g1(x1, x2, ..., xn) = 0,

g2(x1, x2, ..., xn) = 0,

...

gm(x1, x2, ..., xn) = 0.

The conditional minimum point of the function f coincides with a saddle point of the Lagrange function

L(x, λ) = f(x) + Σ_j λ_j g_j(x).

In this case, the saddle point must provide a minimum with respect to the variables x_i and a maximum with respect to the variables λ_j.

The necessary conditions for a stationary point are that the first-order partial derivatives with respect to all variables equal zero:

∂L/∂x_i = 0, i = 1, ..., n; ∂L/∂λ_j = g_j(x) = 0, j = 1, ..., m.

Note that the second group of equations implies that only feasible points can satisfy the necessary conditions.

To obtain a sufficient condition for the existence of a minimum, one must add the requirement that the Hessian of the objective function be positive definite.

Let us consider the general mathematical programming problem:

Z = f(X) → min,

under the conditions:

g_i(X) ≤ b_i, i = 1, ..., m.

The inequality constraints can be converted into equality constraints by adding a slack variable u_i to each of them:

g_i(X) + u_i² = b_i, i = 1, ..., m.

Let us form the Lagrange function:

L(X, λ, u) = f(X) + Σ_i λ_i (g_i(X) + u_i² − b_i).

Then the necessary minimum conditions take the form:

∂L/∂x_j = ∂f/∂x_j + Σ_i λ_i ∂g_i/∂x_j = 0, j = 1, ..., n;
∂L/∂λ_i = g_i(X) + u_i² − b_i = 0, i = 1, ..., m;
∂L/∂u_i = 2λ_i u_i = 0, i = 1, ..., m.

The second group of equations can be transformed by discarding the slack variables and returning to the inequality constraints; this gives the constraints of the original problem. The third group of equations can be multiplied by u_i/2, after which the slack variables are replaced by expressing them from the second group: λ_i(b_i − g_i(X)) = 0.

There is one more condition that must be satisfied at the minimum point. This condition, λ_i ≥ 0, follows from an analysis of the physical meaning of the coefficients of the Lagrange function.
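A brief symbolic sketch of this slack-variable derivation for a hypothetical one-dimensional problem:

```python
# Sketch of the slack-variable derivation for a one-dimensional problem:
# minimize f(x) = (x - 3)^2 subject to g(x) = x <= 1 (so b = 1).
# The inequality is turned into the equality x + u^2 = 1 with a slack variable u.
import sympy as sp

x, u, lam = sp.symbols('x u lam', real=True)
L = (x - 3) ** 2 + lam * (x + u ** 2 - 1)      # Lagrange function with slack variable

eqs = [sp.diff(L, x), sp.diff(L, lam), sp.diff(L, u)]
print(sp.solve(eqs, (x, u, lam), dict=True))
# real solution: x = 1, u = 0, lam = 4 >= 0 (the constraint is active at the minimum)
```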

It can be shown that

∂f*/∂b_i = −λ_i*,

where λ_i* is the Lagrange coefficient at the minimum point and f* is the optimal value of the objective function.

Obviously, as b_i increases, the admissible region expands, which means that the minimum value can only decrease; that is, the derivative ∂f*/∂b_i must be non-positive. Therefore, at the conditional minimum point λ_i ≥ 0.

We finally obtain the necessary conditions for a conditional minimum:

∂f/∂x_j + Σ_i λ_i ∂g_i/∂x_j = 0, j = 1, ..., n;
g_i(X) ≤ b_i, i = 1, ..., m;
λ_i(b_i − g_i(X)) = 0, λ_i ≥ 0, i = 1, ..., m.

The expressions in the second line ensure that the optimal point is feasible.

The third line contains the following information. If a constraint is active (i.e. the expression in parentheses equals zero), then the corresponding Lagrange coefficient is strictly positive; a positive Lagrange coefficient means that the corresponding constraint is active, i.e. the resource it describes is scarce, and it is precisely this constraint that stops further improvement of the objective function. If a constraint is not active (i.e. the expression in parentheses is not equal to zero), then the corresponding Lagrange coefficient must equal zero: such a constraint is not scarce and does not affect further improvement of the objective function.

For a conditional maximum point, the Lagrange coefficients change sign to the opposite (since the optimal value of the objective function at the conditional maximum point should increase as b_i grows).

The above conditions are equivalent to the Kuhn-Tucker theorem and are often called by the same name.

A sufficient condition for a minimum (maximum) is the convexity (concavity) of the objective function at the stationary point. This means that the Hessian matrix must be positive (negative) definite there.
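In practice, the definiteness of the Hessian can be checked through its eigenvalues; a brief sketch for a hypothetical quadratic objective function:

```python
# Checking the sufficient condition via the eigenvalues of the Hessian.
# Hypothetical objective: f(x) = 2*x1^2 + x2^2 + x1*x2 (a convex quadratic).
import numpy as np

H = np.array([[4.0, 1.0],    # Hessian of f: [[d2f/dx1dx1, d2f/dx1dx2],
              [1.0, 2.0]])   #               [d2f/dx2dx1, d2f/dx2dx2]]
eig = np.linalg.eigvalsh(H)
print(eig)                                      # all eigenvalues > 0
print("positive definite:", np.all(eig > 0))    # True: a stationary point is a minimum
```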

A summary of the material in this chapter can be viewed in two presentations:

file “Nonlinear programming”;

file "Kuhn-Tucker Theorem".

Slides 10-14 of the presentation “Kuhn-Tucker Theorem” show an example of solving the Kuhn-Tucker problem.

4.5. Control questions

(Developed by Afanasyev M.Yu. and Suvorov B.P.)

Question 1. A real function f(x) is given, defined on a segment of real numbers S. Let x1 and x2 be points of this segment and 0 ≤ λ ≤ 1.

Which of the following inequalities is the condition for convexity of the function?

Possible answers:

Question 2. A real function f(x) is given, defined on a segment of real numbers S. Let x1 and x2 be points of this segment and 0 ≤ λ ≤ 1.

Which of the following inequalities is the condition for strict concavity of the function?

Possible answers:

Question 3. Function

1) convex;

2) strictly convex;

3) concave;

4) strictly concave;

5) convex and concave.

Question 4. Function

3) concave; 4) strictly concave;

5) convex and concave.

Question 5. Function

1) convex; 2) neither convex nor concave;

3) strictly convex; 4) concave:

5) convex and concave.

Question 6. A new model of high-speed motorcycle, the "Snail", is sold by a company at a price of (30 – 2x) thousand dollars apiece, where x is the number of motorcycles sold. Variable manufacturing costs are $6,000 per unit, and fixed costs are $30,000. Maximize the firm's profit for the week.

Suppose that a change in the sales tax rate resulted in an additional $4,000 of tax for each motorcycle sold.

How will the optimal production of motorcycles change compared to the initial situation?

(Solve using Lagrange function.)

Possible answers:

1) will increase by 2 ; 2) will decrease by 2 ;

3) will not change; 4) will increase by 1 ;

5) will decrease by 1 .

Question 7. Suppose you have 2 weeks (14 days) of holiday to spend in the Canary Islands and in Nice. Let your utility function have the form 2KN – 3K² – 4N², where K and N are the numbers of days you spend in the Canary Islands and in Nice, respectively.

How many days should you spend in Nice to maximize your utility function?

(To solve, use the Lagrange function. Round the result to the nearest integer. Check whether the Kuhn-Tucker optimality conditions are met.)

Possible answers:

1) 3 ; 2) 4 ; 3) 5 ; 4) 6 ; 5) 7 .

Question 8. For the problem in Question 7, find the value of the dual estimate of the constraint.

(Round the result to the nearest integer.)

Possible answers:

1) 41 ; 2) 34; 3) 29 ; 4) 39 ; 5) 44 .

Question 9. A monopolist plans a production and sales program for the next period. Prices: p1 = 14 – 0.25x1 (for product 1); p2 = 14 – 0.5x2 (for product 2), where x1 and x2 are the sales volumes of the products. Assume that all produced output is sold. The maximum total sales volume is 57.

What is the optimal release of product 2?

Possible answers:

1) 36,4 ; 2) 30,7 ; 3) 26,3 ; 4) 20,6 ; 5) 41,8 .

Question 10. The owner of a small enterprise has 100 thousand rubles for the next month, which he can spend on increasing fixed capital K (purchase of equipment) at a price of 1 thousand rubles per unit or on purchasing additional labor L at a price of 50 rubles per hour. The increase in finished output, which can be sold for 10 thousand rubles per unit, is determined by the production function F(K, L) = L^(2/7) K^(2/5).

How much money should be spent on increasing fixed assets?

Possible answers:

1) 74.36 thousand rubles; 2) 58.33 thousand rubles; 3) 63.44 thousand rubles;

4) 45.66 thousand rubles; 5) 39.77 thousand rubles.

Answers to the questions:

1 - 4; 2 - 1; 3 - 4; 4 - 5; 5 - 2; 6 - 5; 7 - 4; 8 - 2; 9 - 4; 10 - 2.