Tanmoy Som1, Pankaj Gautam1, Avinash Dixit1, and D. R. Sahu2
1Department of Mathematical Sciences, Indian Institute of Technology (Banaras Hindu University), Varanasi, Uttar Pradesh, 221005, India
2Department of Mathematics, Banaras Hindu University, Varanasi, Uttar Pradesh, 221005, India
Consider a monotone operator $T$. A point $x$ is a zero of $T$ if $0 \in T(x)$. The set of all zeros of $T$ is denoted by $\operatorname{zer}(T)$. A wide class of problems can be reduced to a fundamental problem of nonlinear analysis: find a zero of a maximal monotone operator $T$ in a real Hilbert space $H$:
A more general problem is to find $x \in H$ such that $y \in T(x)$, for some given $y \in H$. This problem finds many important applications in scientific fields such as signal and image processing [1], inverse problems [2,3], convex optimization [4], and machine learning [5].
Suppose $T$ is the gradient of a differentiable convex function $f$, i.e. $T = \nabla f$. The simplest approach to solve Problem 18.1 is the gradient projection method, given by
$x_{k+1} = x_k - \lambda_k \nabla f(x_k),$
where $\lambda_k > 0$ are step sizes. The gradient step $I - \lambda_k \nabla f$ is the forward operator of forward–backward schemes, and the above scheme is nothing other than the classical method of steepest descent.
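As a concrete illustration, the gradient iteration above can be sketched in a few lines; the objective $f(x) = x^2$ and the constant step size chosen below are my own illustrative choices, not taken from the text.

```python
# Minimal sketch of the steepest-descent iteration x_{k+1} = x_k - lam * grad_f(x_k)
# for the smooth convex function f(x) = x**2 (an illustrative choice).

def grad_f(x):
    # gradient of f(x) = x**2
    return 2.0 * x

def gradient_method(x0, lam=0.25, iters=50):
    x = x0
    for _ in range(iters):
        x = x - lam * grad_f(x)   # forward (gradient) step
    return x

print(gradient_method(4.0))  # approaches the minimizer x* = 0
```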
In case $f$ is a nondifferentiable function, the gradient method generalizes to the subgradient method, given by
$x_{k+1} = x_k - \lambda_k g_k, \quad g_k \in \partial f(x_k).$
For a general monotone operator $T$, the classical approach to solve Problem 18.1 is the proximal point method, which converts the maximal monotone operator inclusion problem into a fixed point problem of a firmly nonexpansive mapping via the resolvent operator. The proximal point algorithm [6,7] is one of the most influential methods to solve Problem 18.1 and has been studied extensively both in theory and practice [8–10]. The proximal point algorithm is given by
$x_{k+1} = J_{\lambda_k T}(x_k) = (I + \lambda_k T)^{-1}(x_k),$
where $\lambda_k > 0$ is a regularization parameter.
The operator $J_{\lambda T} = (I + \lambda T)^{-1}$ is called the resolvent operator, introduced by Moreau [11]; it is also known as a backward operator. The method was further extended to general monotone operators by Rockafellar [10], who also studied the weak convergence behavior of the proximal point algorithm under some mild assumptions. The aim of this chapter is to study the convergence behavior of the proximal point algorithm and its different modified forms.
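The backward (resolvent) step can likewise be sketched numerically. The operator $T(x) = x$ below is my own illustrative maximal monotone choice, with $\operatorname{zer}(T) = \{0\}$, for which the resolvent has the closed form $(I + \lambda T)^{-1}(x) = x/(1+\lambda)$.

```python
# Sketch of the proximal point iteration x_{k+1} = (I + lam*T)^{-1}(x_k)
# for the maximal monotone operator T(x) = x (illustrative choice, zer(T) = {0}).

def resolvent(x, lam):
    # (I + lam*T)^{-1}(x) solves y + lam*y = x, hence y = x / (1 + lam)
    return x / (1.0 + lam)

def proximal_point(x0, lam=1.0, iters=40):
    x = x0
    for _ in range(iters):
        x = resolvent(x, lam)     # backward (resolvent) step
    return x

print(proximal_point(8.0))  # converges to the zero x* = 0
```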
In Section 18.2, we discuss some definitions and results used in the convergence analysis of the algorithms. In Section 18.3, we discuss the convergence behavior of the proximal point algorithm. In Section 18.4, we discuss splitting algorithms for solving monotone inclusion problems. Section 18.5 deals with inertial methods for monotone inclusion problems. A numerical example is also given to compare the convergence speed of different algorithms based on the inertial technique.
Throughout this chapter, we will denote the real Hilbert space by $H$.
An operator $T : H \to 2^{H}$ is said to be uniformly monotone on a subset $C \subseteq H$ if there exists an increasing function $\phi : [0, +\infty) \to [0, +\infty]$ vanishing only at 0 such that
$\langle x - y, u - v \rangle \ge \phi(\|x - y\|) \quad \text{for all } x, y \in C,\ u \in T(x),\ v \in T(y).$
The set of fixed points of a mapping $T$ is denoted by $\operatorname{Fix}(T)$.
In many cases, it is difficult to evaluate the resolvent of the operator $T$; sometimes, it is as difficult as the original problem itself. To overcome this difficulty, the operator $T$ is split into the sum of two maximal monotone operators $A$ and $B$ such that the resolvents of $A$ and $B$ are easier to compute than the full resolvent $J_{\lambda T}$. Based on splitting techniques, many iterative methods have been designed to solve Problem 18.1. Some popular methods are the Peaceman–Rachford splitting algorithm [15], the Douglas–Rachford splitting algorithm [16], and the forward–backward algorithm [17].
The Douglas–Rachford splitting algorithm was proposed to solve a general class of monotone inclusion problems in which $A$ and $B$ are maximally monotone. Let $x_0 \in H$, $\gamma > 0$, and let $(\lambda_k)$ be a sequence in $[0,2]$. Then the algorithm is given as follows:
$y_k = J_{\gamma A}(x_k), \quad z_k = J_{\gamma B}(2 y_k - x_k), \quad x_{k+1} = x_k + \lambda_k (z_k - y_k).$
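A minimal numerical sketch of the Douglas–Rachford iteration follows; the operators $A(x) = x$ and $B(x) = x - 2$, whose sum has the unique zero $x^* = 1$, are my own illustrative choices with closed-form resolvents.

```python
# Sketch of the Douglas-Rachford iteration
#   y_k = J_{gA}(x_k),  z_k = J_{gB}(2*y_k - x_k),  x_{k+1} = x_k + lam*(z_k - y_k),
# for the illustrative maximal monotone operators A(x) = x and B(x) = x - 2,
# so that zer(A + B) = {1}.

def J_A(x, g):
    # resolvent of A(x) = x: solve y + g*y = x
    return x / (1.0 + g)

def J_B(x, g):
    # resolvent of B(x) = x - 2: solve y + g*(y - 2) = x
    return (x + 2.0 * g) / (1.0 + g)

def douglas_rachford(x0, g=1.0, lam=1.0, iters=60):
    x = x0
    for _ in range(iters):
        y = J_A(x, g)
        z = J_B(2.0 * y - x, g)
        x = x + lam * (z - y)
    return J_A(x, g)          # the "shadow" sequence converges to zer(A+B)

print(douglas_rachford(5.0))  # approaches the zero x* = 1
```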
The convergence behavior of the Douglas–Rachford splitting algorithm can be summarized as follows:
The splitting method discussed above is complicated; the simplest one is the forward–backward method. The forward–backward algorithm, the composition of a forward step with respect to $A$ followed by a backward step with respect to $B$, is given by
$x_{k+1} = J_{\lambda_k B}(x_k - \lambda_k A(x_k)).$
The forward–backward splitting method has been studied extensively in [18–20]. For general monotone operators $A$ and $B$, the weak convergence of Algorithm 18.6 has been studied under some restrictions on the step size.
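The forward–backward step can be sketched on a one-dimensional instance; the choices $A(x) = x - 2$ (single-valued, cocoercive) and $B = \partial|\cdot|$, for which $\operatorname{zer}(A+B) = \{1\}$ and the resolvent of $\lambda B$ is the soft-thresholding map, are my own illustrative choices.

```python
# Sketch of the forward-backward iteration x_{k+1} = J_{lam B}(x_k - lam * A(x_k))
# with A(x) = x - 2 and B the subdifferential of |.|, so that zer(A + B) = {1}.

def soft_threshold(x, t):
    # resolvent of t * (subdifferential of |.|), i.e. the proximal map of t*|x|
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def forward_backward(x0, lam=0.5, iters=100):
    x = x0
    for _ in range(iters):
        # forward (explicit) step with respect to A, then backward step with respect to B
        x = soft_threshold(x - lam * (x - 2.0), lam)
    return x

print(forward_backward(0.0))  # approaches the zero x* = 1
```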
Passty studied the convergence of the forward–backward method, which can be summarized as follows.
The classical proximal point algorithm converts the maximal monotone operator inclusion problem into a fixed point problem of a firmly nonexpansive mapping via resolvent operators. Suppose $x(t)$ and $f$ are differentiable functions. The proximal point algorithm can be interpreted as a one-step discretization method for the ordinary differential equation
$\dot{x}(t) + \nabla f(x(t)) = 0,$
where $\dot{x}$ represents the derivative of $x$ and $\nabla f$ is the gradient of $f$. The iterative methods mentioned above are one-step methods, i.e. each new iterate depends only on the previous iterate. Multistep methods have been proposed to accelerate the convergence speed of the algorithm; they are discretizations of the second-order ordinary differential equation
$\ddot{x}(t) + \gamma \dot{x}(t) + \nabla f(x(t)) = 0,$
where $\gamma > 0$ is a friction parameter. System 18.9 roughly describes the motion of a heavy ball rolling under its own inertia over the graph of $f$ until friction stops it at a stationary point of $f$. The inertial force, the friction force, and the gravity force are the three components in System 18.9, which is why the system is named the heavy-ball with friction (HBF) system. The energy function is given by
$E(t) = \tfrac{1}{2}\|\dot{x}(t)\|^2 + f(x(t)),$
which is always decreasing in $t$ unless $\dot{x}$ vanishes. Convexity of $f$ ensures that the trajectory attains a minimum point. The convergence speed of the solution trajectories of System 18.9 is greater than that of System 18.8. Polyak [21] first introduced a multistep method using the property of the heavy-ball System 18.9 to minimize a convex smooth function $f$. Polyak's multistep algorithm was given by
$x_{k+1} = x_k - \alpha \nabla f(x_k) + \beta (x_k - x_{k-1}),$
where $\beta$ is a momentum parameter and $\alpha$ is a step-size parameter. The momentum term significantly improves the performance of the scheme. Nesterov [22] improved Polyak's idea by evaluating the gradient at the inertial (extrapolated) point rather than at the previous iterate, with the extrapolation parameter $\beta_k$ chosen to obtain the optimal convergence rate of the scheme. The scheme is given by
$x_{k+1} = y_k - \alpha \nabla f(y_k),$
where
$y_k = x_k + \beta_k (x_k - x_{k-1}).$
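The two momentum schemes differ only in where the gradient is evaluated, which the following sketch makes explicit; the objective $f(x) = x^2$ and the constant parameters are my own illustrative choices (Nesterov's scheme uses a specific varying $\beta_k$, fixed here for simplicity).

```python
# Sketch contrasting Polyak's heavy-ball update with Nesterov's variant, which
# evaluates the gradient at the extrapolated point; f(x) = x**2 is illustrative.

def grad_f(x):
    return 2.0 * x  # gradient of f(x) = x**2

def heavy_ball(x0, alpha=0.1, beta=0.5, iters=200):
    x_prev, x = x0, x0
    for _ in range(iters):
        # gradient taken at the current iterate x_k
        x, x_prev = x - alpha * grad_f(x) + beta * (x - x_prev), x
    return x

def nesterov(x0, alpha=0.1, beta=0.5, iters=200):
    x_prev, x = x0, x0
    for _ in range(iters):
        y = x + beta * (x - x_prev)           # inertial (extrapolated) point
        x, x_prev = y - alpha * grad_f(y), x  # gradient taken at y, not x
    return x

print(heavy_ball(3.0), nesterov(3.0))  # both approach the minimizer 0
```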
In 2001, by combining the ideas of the heavy-ball method and the proximal point method, Alvarez and Attouch proposed the inertial proximal point algorithm, which can be written as
$x_{k+1} = J_{\lambda_k T}\big(x_k + \alpha_k (x_k - x_{k-1})\big),$
where $T$ is a set-valued maximal monotone operator and $(\lambda_k)$ and $(\alpha_k)$ are sequences satisfying some conditions.
In 2003, Moudafi and Oliny [23] modified the inertial proximal point algorithm by splitting $T$ into maximal monotone operators $A$ and $B$, where $A$ is a single-valued, Lipschitz continuous, and cocoercive operator. The proposed algorithm can be given as follows:
$x_{k+1} = J_{\lambda_k B}\big(x_k + \alpha_k (x_k - x_{k-1}) - \lambda_k A(x_k)\big).$
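An inertial forward–backward step of this kind can be sketched as follows; the operators $A(x) = x - 2$ and $B = \partial|\cdot|$ (so $\operatorname{zer}(A+B) = \{1\}$) and the constant parameters are my own illustrative choices.

```python
# Sketch of an inertial forward-backward step in the spirit of Moudafi-Oliny:
# an extrapolation x_k + a*(x_k - x_{k-1}) is added before the update, while
# A is still evaluated at x_k. Here A(x) = x - 2 and B = subdifferential of |.|,
# so zer(A + B) = {1} (illustrative choices).

def soft_threshold(x, t):
    # proximal map of t*|x|, i.e. the resolvent of t * (subdifferential of |.|)
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def inertial_fb(x0, lam=0.5, a=0.3, iters=200):
    x_prev, x = x0, x0
    for _ in range(iters):
        y = x + a * (x - x_prev)                              # inertial extrapolation
        x, x_prev = soft_threshold(y - lam * (x - 2.0), lam), x
    return x

print(inertial_fb(0.0))  # approaches the zero x* = 1
```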
This section is dedicated to the formulation of an inertial Douglas–Rachford splitting algorithm, which approaches the set of zeros of the sum of two maximally monotone operators $A$ and $B$, and to the investigation of its convergence properties.
Recently, with the inspiration of Nesterov's accelerated gradient method, Lorenz and Pock [24] proposed a modification of the forward–backward splitting to solve monotone inclusion problems given by the sum of a monotone operator $B$ whose resolvent is easy to compute and another monotone operator $A$ that is cocoercive. They introduced a symmetric, positive definite map $M$, which is considered as a preconditioner or variable metric. The algorithm can be given as follows:
$y_k = x_k + \alpha_k (x_k - x_{k-1}), \quad x_{k+1} = (I + \lambda_k M^{-1} B)^{-1}\big(y_k - \lambda_k M^{-1} A(y_k)\big),$
where $\alpha_k$ is an extrapolation factor and $\lambda_k$ is a step-size parameter.
Recently, some inertial iterative algorithms have been proposed that replace condition (iii) of Theorem 18.5 with milder conditions, which makes the algorithms easier to use for real-world problems. In 2015, Bot et al. [26] proposed the inertial Mann algorithm for nonexpansive operators and studied its weak convergence in the real Hilbert space framework. The inertial Mann algorithm, for initial points $x_0, x_1 \in H$, reads
$y_k = x_k + \alpha_k (x_k - x_{k-1}), \quad x_{k+1} = (1 - \lambda_k) y_k + \lambda_k T(y_k),$
where $(\alpha_k)$ and $(\lambda_k)$ are suitable real sequences and the relevant constraint sets are nonempty, closed, and affine subsets of $H$.
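The inertial Mann iteration can be sketched numerically; the nonexpansive mapping $T = \cos$ on the real line (nonexpansive since $|\cos u - \cos v| \le |u - v|$) and the constant parameters are my own illustrative choices.

```python
# Sketch of the inertial Mann iteration
#   y_k = x_k + a*(x_k - x_{k-1}),  x_{k+1} = (1 - l)*y_k + l*T(y_k),
# for the nonexpansive mapping T = cos on the real line (illustrative choice),
# whose unique fixed point is the Dottie number, about 0.739085.

import math

def inertial_mann(x0, a=0.3, l=0.5, iters=100):
    x_prev, x = x0, x0
    for _ in range(iters):
        y = x + a * (x - x_prev)                        # inertial extrapolation
        x, x_prev = (1.0 - l) * y + l * math.cos(y), x  # Mann (averaged) step
    return x

x = inertial_mann(0.0)
print(x)  # approaches the fixed point of cos
```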
In 2018, Dong et al. [27] proposed a generalized form of the inertial Mann algorithm by including an extra inertial term. The algorithm is given as follows:
$y_k = x_k + \alpha_k (x_k - x_{k-1}), \quad z_k = x_k + \beta_k (x_k - x_{k-1}), \quad x_{k+1} = (1 - \lambda_k) y_k + \lambda_k T(z_k),$
where $(\alpha_k)$, $(\beta_k)$, and $(\lambda_k)$ satisfy suitable conditions.
The convergence analysis can be summarized as follows:
In 2019, Dixit et al. [28] proposed the inertial normal algorithm and studied the weak convergence of the proposed algorithm. The algorithm is given as follows:
In order to compare the convergence speeds of Algorithms 18.27, 18.43, and 18.44 in a real Hilbert space with the Euclidean norm, we consider the nonexpansive mapping
To show that $T$ is a nonexpansive mapping, we have, for all $x$ and $y$,
$\|T(x) - T(y)\| \le \|x - y\|;$
thus $T$ is nonexpansive. For comparison, we choose parameter sequences satisfying the convergence conditions of the respective algorithms. The convergence behaviors of the algorithms are shown in Figure 18.1.
Figure 18.1 shows that Algorithms 18.27 and 18.43 have nearly the same convergence speed, which is greater than that of the Mann algorithm, while the convergence speed of Algorithm 18.44 is the highest. This figure shows the usefulness of Algorithm 18.44 for computing the fixed points of a nonexpansive mapping.
Further, to compare the convergence speed and accuracy of the iterative Algorithms 18.27, 18.43, and 18.44 on real-world problems, we conducted numerical experiments for a regression problem on publicly available high-dimensional datasets. Consider the convex minimization problem given by
$\min_{x}\ \tfrac{1}{2}\|Ax - b\|_2^2 + \mu \|x\|_1,$
where $A$ is a linear map and $\mu > 0$ is a sparsity-controlling parameter. According to the Karush–Kuhn–Tucker (KKT) conditions, a point $x$ solves 18.46 if and only if
$0 \in A^{T}(Ax - b) + \mu\, \partial \|x\|_1.$
For any $\lambda > 0$, Eq. 18.47 becomes
$x = \operatorname{prox}_{\lambda \mu \|\cdot\|_1}\big(x - \lambda A^{T}(Ax - b)\big).$
Thus, $x$ minimizes 18.46 if and only if $x$ is a fixed point of the operator $T = \operatorname{prox}_{\lambda \mu \|\cdot\|_1} \circ \big(I - \lambda A^{T}(A(\cdot) - b)\big)$.
Note that the operator $T$ here is nonexpansive, as the $\ell_1$-norm is a proper lower semicontinuous convex function. Thus, we can apply Algorithms 18.27, 18.43, and 18.44 to solve the convex minimization problem 18.46.
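The fixed-point iteration induced by this operator (the basic, non-inertial version) can be sketched on a tiny instance of problem 18.46; the data $A$, $b$ and the parameters $\lambda$, $\mu$ below are my own illustrative choices, not taken from the experiments in the text.

```python
# Sketch of the fixed-point iteration
#   x_{k+1} = prox_{lam*mu*||.||_1}(x_k - lam * A^T (A x_k - b))
# on a tiny 2x2 instance (illustrative data; with A = I the solution is soft(b, mu)).

def soft(v, t):
    # componentwise soft-thresholding: prox of t*||.||_1
    return [max(abs(vi) - t, 0.0) * (1.0 if vi >= 0 else -1.0) for vi in v]

def ista(A, b, mu, lam, iters=500):
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]  # A x - b
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]         # A^T (A x - b)
        x = soft([x[j] - lam * g[j] for j in range(n)], lam * mu)             # prox step
    return x

A = [[1.0, 0.0], [0.0, 1.0]]   # identity map, so the minimizer is soft(b, mu)
b = [2.0, 0.05]
x = ista(A, b, mu=0.1, lam=0.5)
print(x)  # about [1.9, 0.0]: the small coefficient is shrunk exactly to zero
```

The second coefficient illustrates the sparsity-promoting effect of the $\ell_1$ term: responses smaller than the threshold are set exactly to zero.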
For the numerical experiments, we consider the data matrix $A$ having $n$ features and $m$ samples, where $b$ is an $m$-dimensional vector containing the responses. We have taken datasets from two types of cancer patients: the Colon-cancer dataset and the Carcinom dataset.
(i) Colon‐cancer dataset: Colon cancer is a cancer of the large intestine (colon). This dataset was collected from 62 patients, with 2000 gene expressions each. It contains 40 biopsies from tumors and 22 normal biopsies from healthy parts of the large intestine of the patients.
(ii) Carcinom dataset: Carcinoma occurs because of the uncontrolled growth of cells. This dataset contains 174 samples with 9182 features.
For the numerical experiments, we selected the parameters so that the algorithms satisfy their respective convergence criteria. The sparsity-controlling parameter $\mu$ is tuned over its range in multiples of 0.1. The maximum number of iterations is set to 1000, and the procedure stops when the difference between consecutive iterates is less than 0.001.
In the first experiment, we compare the convergence speeds of Algorithms 18.27, 18.43, and 18.44 by plotting, for each algorithm, the function value against the number of iterations.
From Figure 18.2, we observe that, for both datasets, the convergence speed of Algorithm 18.44 is the highest, while the convergence speeds of Algorithms 18.27 and 18.43 are nearly the same.
In the second experiment, we compare Algorithms 18.27, 18.43, and 18.44 on the basis of their accuracy on the Colon-cancer and Carcinom datasets, plotting the root mean square error (RMSE) against the number of iterations.
From Figure 18.3, we observe that the RMSE for Algorithm 18.44 is the lowest, while the RMSE values for Algorithms 18.27 and 18.43 are nearly the same.
Thus, we conclude from the observations of Figures 18.2 and 18.3 that Algorithm 18.44 outperforms Algorithms 18.27 and 18.43.