2 Adaptive Dynamic Programming Neural Networks for Control

2.1 Introduction

With the development of mechatronics, automatic systems consisting of sensors for perception and actuators for action are increasingly widely used [17]. Besides the proper choice of sensors and actuators and the elaborate fabrication of mechanical structures, control law design also plays a crucial role in the implementation of automatic systems, especially for those with complicated dynamics. Most mechanical sensor–actuator systems can be modeled by Euler–Lagrange equations [17]. In this chapter, we are concerned with sensor–actuator systems modeled by Euler–Lagrange equations.

Due to the importance of Euler–Lagrange equations in modeling many real sensor–actuator systems, much attention has been paid to the control of such systems. According to the type of constraints, Euler–Lagrange systems can be categorized as those without nonholonomic constraints (e.g. the fully actuated manipulator [18] and the omni‐directional mobile robot [19]) and those subject to nonholonomic constraints (e.g. the under‐actuated multiple‐body system). For an Euler–Lagrange system without nonholonomic constraints, the input dimension is often equal to the output dimension, and the system can often be transformed into a double‐integrator system by feedback linearization [20]. Other methods, such as the control Lyapunov function method, the passivity‐based method, and the optimal control method, have also been successfully applied to Euler–Lagrange systems without nonholonomic constraints. In contrast, since the input dimension is lower than the output dimension, it is often impossible to directly transform an Euler–Lagrange system subject to nonholonomic constraints into a linear system, and thus feedback linearization fails to stabilize the system. To tackle this problem, various methods (the variable structure control‐based method [21], backstepping‐based control [22], the optimal control‐based method, and the discontinuous control method) have been widely investigated, and some useful design procedures have been proposed. However, due to the inherent nonlinearity and nonholonomic constraints, most existing methods [21, 22] are strongly model dependent, and their performance is very sensitive to model errors. Inspired by the success of human operators in controlling Euler–Lagrange systems, various intelligent control strategies have been proposed to solve the control problem of Euler–Lagrange systems subject to nonholonomic constraints. As demonstrated by extensive simulations, these types of strategies are indeed effective for such systems. However, a rigorous proof of stability is difficult for this type of method, and there may exist initializations of the state from which the system cannot be stabilized.

In this chapter, we propose a self‐learning control method applicable to Euler–Lagrange systems. In contrast to existing work on intelligent control of Euler–Lagrange systems, the stability of the closed‐loop system with the proposed method is proven in theory. On the other hand, different from model‐based design strategies, such as the backstepping‐based design [22] and the variable structure‐based design [21], the proposed method does not require information on the model parameters and is therefore a model‐independent method. We formulate the problem from an optimal control perspective. In this framework, the goal is to find the input sequence that minimizes a cost function defined on an infinite horizon under the constraint of the system dynamics. The solution can be found by solving a Bellman equation according to the principle of optimality [23]. Then an adaptive dynamic programming strategy is utilized to numerically solve for the input sequence in real time.

2.2 Preliminaries on Variable Structure Control of the Sensor–Actuator System

In this chapter, we are concerned with the following sensor–actuator system in the Euler–Lagrange form,

(2.1) M(q)q̈ + C(q, q̇)q̇ + g(q) = τ

where q ∈ Rⁿ, M(q) ∈ Rⁿˣⁿ is the inertial matrix, C(q, q̇) ∈ Rⁿˣⁿ, g(q) ∈ Rⁿ, and τ ∈ Rⁿ. Note that the inertial matrix M(q) is symmetric and positive definite. There are three terms on the left‐hand side of Equation (2.1). The first term involves the inertial force in the generalized coordinates, the second one models the Coriolis force and friction, the values of which depend on q̇, and the third one is the conservative force, which corresponds to the potential energy. The control force τ applied on the system drives the variation of the coordinate q. It is also noteworthy that we assume the dimension of τ is equal to that of q here. This definition also admits the case for τ with lower dimension than that of q by imposing constraints on τ, e.g. the constraint Aτ = 0 with A ∈ Rᵐˣⁿ restricts τ to an (n − m)‐dimensional space. Defining the state variables x₁ = q and x₂ = q̇, the Euler–Lagrange equation (2.1) can be put into the following state‐space form:

(2.2) ẋ₁ = x₂,  ẋ₂ = M⁻¹(x₁)(τ − C(x₁, x₂)x₂ − g(x₁))

Note that the matrix M(x₁) is invertible as it is positive definite. The control objective is to asymptotically stabilize the Euler–Lagrange system (2.2), i.e. to design a feedback mapping τ(x₁, x₂) such that x₁ → 0 and x₂ → 0 as time elapses.
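As a minimal sketch of this state‐space conversion, consider a single‐link pendulum, assuming the standard Euler–Lagrange structure with an inertial term, a damping term, and a gravity term (all parameter values below are assumed for illustration, not taken from the chapter):

```python
import numpy as np

# Illustrative single-link pendulum as a one-dimensional Euler-Lagrange
# system M(q)q'' + C(q,q')q' + g(q) = tau, in state-space form x1 = q, x2 = q'.
m, l, d, grav = 1.0, 0.5, 0.1, 9.8  # assumed mass, length, damping, gravity

def el_state_space(x1, x2, tau):
    M = m * l**2                     # inertial term (positive, hence invertible)
    C = d                            # viscous friction in place of Coriolis
    g = m * grav * l * np.sin(x1)    # conservative (gravity) term
    x1_dot = x2
    x2_dot = (tau - C * x2 - g) / M  # M^{-1}(tau - C x2 - g)
    return x1_dot, x2_dot
```

With zero input and the pendulum hanging at the origin, both derivatives vanish, matching the intended equilibrium of the stabilization objective.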

As an effective design strategy, variable structure control finds applications in many different types of control systems, including the Euler–Lagrange system. The method stabilizes the dynamics of a nonlinear system by steering the state to an elaborately designed sliding surface, on which the state inherently evolves towards the zero state. Particularly for the system (2.2), we define s as follows:

(2.3) s = x₂ + cx₁

where c > 0 is a constant. Note that s = 0, together with the dynamics of x₁ in Equation (2.2), gives the dynamics of x₁ as ẋ₁ = −cx₁ for s = 0. Clearly, x₁ asymptotically converges to zero. Also, we know x₂ → 0 when x₁ → 0 according to x₂ = −cx₁. Therefore, we conclude that the states x₁, x₂ on the sliding surface s = 0, with s defined in Equation (2.3), converge to zero with time. With this property of the sliding surface, a control law driving the states to s = 0 guarantees the ultimate convergence to the zero states. Accordingly, the stabilization of the system can be realized by controlling s to zero. To reach this goal, a positive definite control Lyapunov function V(s), e.g. V = sᵀs/2, is often used to design the control law. For stability, the time derivative of V is required to be negative definite. In order to guarantee the negative definiteness of the time derivative of V, exact information about the system dynamics (2.2) is often necessary, which results in model‐based design strategies.
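The on‐surface behavior can be checked numerically. Assuming the sliding variable takes the common linear form s = x₂ + c·x₁ (with an illustrative value of c), the reduced dynamics ẋ₁ = −c·x₁ on s = 0 decay exponentially, dragging x₂ = −c·x₁ to zero with them:

```python
# Quick numerical check (assumed values): simulate the on-surface dynamics
# x1' = -c*x1 by forward Euler and confirm both states decay to zero.
c, dt = 2.0, 0.001
x1 = 1.0
for _ in range(5000):          # 5 seconds of Euler integration
    x1 += dt * (-c * x1)       # x1' = -c*x1 holds on the surface s = 0
x2 = -c * x1                   # the state stays on s = 0, so x2 = -c*x1
print(x1, x2)                  # both near zero: x1 ≈ exp(-c*5) ≈ 4.5e-5
```

This mirrors the argument above: convergence on the surface is a property of the surface alone, independent of the plant model.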

We have the following remark about the Euler–Lagrange equation (2.1) for modeling sensor–actuator systems.

2.3 Problem Formulation

Without loss of generality, we stabilize the system (2.1) by steering it to the sliding surface s = 0, with s defined in Equation (2.3). Different from existing model‐based design procedures, we design a self‐learning controller that does not require accurate knowledge of M(q), C(q, q̇), and g(q) in Equation (2.1). In this section, we formulate such a control problem from the optimal control perspective.

In this chapter, we set the origin as the desired operating point, i.e. we consider the problem of controlling the state of the system (2.1) to the origin. For the case with other desired operating points, the problem can be equivalently transformed into one with the origin as the operating point by shifting the coordinates. At each sampling period, the norm of s, which measures the distance from the desired sliding surface s = 0, can be used to evaluate the one‐step performance. Therefore, we define the following utility function associated with the one‐step cost at the ith sampling period,

(2.4) U(i) = r(s(i))

with

(2.5) r(s) = w₁|s₁| + w₂|s₂| + … + wₙ|sₙ|

where s is defined in Equation (2.3) and s = [s₁, s₂, …, sₙ]ᵀ, |sⱼ| denotes the absolute value of the jth component of the vector s, and the parameter wⱼ > 0 for j = 1, 2, …, n. At each step, there is a value U(i), and the total cost starting from the kth step along the infinite time horizon can be expressed as follows:

(2.6) J(x(k), ū(k)) = Σᵢ₌ₖ^∞ γ^(i−k) U(i)

where x(k) is the state vector of system (2.1) sampled at the kth step with x = [x₁ᵀ, x₂ᵀ]ᵀ, γ is the discount factor with 0 < γ < 1, and ū(k) = {u(k), u(k+1), …} is the control sequence starting from the kth step. Note that for the deterministic system (2.1), the succeeding states after the kth step are determined by x(k) and the control sequence ū(k). Accordingly, J is a function of x(k) and ū(k), written J = J(x(k), ū(k)). Also note that both the cost function J and the utility function U are defined based on the discrete samplings of the continuous system (2.1). Now, we can define the problem of controlling the sensor–actuator system (2.1) in this framework as follows:

(2.7a) minimize over ū(k):  J(x(k), ū(k))
(2.7b) subject to:  ẋ₁ = x₂,  ẋ₂ = M⁻¹(x₁)(u − C(x₁, x₂)x₂ − g(x₁))
(2.7c) u(t) = u(i) ∈ Ω for iT ≤ t < (i + 1)T, i = k, k + 1, …

where U(i) is defined by Equations (2.4) and (2.5), T is the sampling period, the set Ω defines the feasible control actions, and J(x(k), ū(k)) is the cost function defined in Equation (2.6). It is worth noting that J is a function of x(k) and ū(k) according to Equation (2.6). The optimization in Equation (2.7) is relative to ū(k) with a given initial state x(k). Also note that in the optimization problem in Equation (2.7), the decision variables u(i) are defined at every sampling period. The control action keeps its value during the interval between two consecutive sampling steps. This formulation is consistent with real implementations of digital controllers.
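The discounted cost of Equation (2.6) is straightforward to evaluate on a sampled trajectory. The sketch below truncates the infinite horizon to a finite sequence of one‐step costs (the utility values are hypothetical, chosen only to exercise the sum):

```python
# Sketch: evaluating the discounted cost J(k) = sum_i gamma^(i-k) U(i)
# of Equation (2.6) for a finite, sampled utility sequence.
gamma = 0.95  # assumed discount factor

def discounted_cost(utilities, gamma):
    # utilities[0] is U(k), utilities[1] is U(k+1), and so on
    return sum(gamma**i * u for i, u in enumerate(utilities))

U_seq = [1.0, 0.5, 0.25, 0.125]       # hypothetical one-step costs
print(discounted_cost(U_seq, gamma))  # ≈ 1.8078
```

Because 0 < γ < 1, later terms are geometrically down‐weighted, which is what makes the infinite‐horizon sum in Equation (2.6) finite for bounded utilities.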

2.4 Model‐Free Control of the Euler–Lagrange System

In this section, we present the strategy to solve the constrained optimization problem efficiently without knowing the model information of the sensor–actuator system. We first investigate the optimality condition of Equation (2.7) and present an iterative procedure to approach the analytical solution. Then, we analyze the convergence of the iterative procedure and the stability with the derived control strategy.

2.4.1 Optimality Condition

Denote by J*(x(k)) the optimal value of the optimization problem in Equation (2.7), i.e.

(2.8) J*(x(k)) = min over ū(k) of J(x(k), ū(k))

According to the principle of optimality [23], the solution of Equation (2.7) satisfies the following Bellman equation:

(2.9) J*(x(k)) = min over u(k) ∈ Ω of (U(k) + γJ*(x(k+1)))

where x(k+1) is the solution of Equation (2.7b) at t = (k+1)T with the initial value x(k) and the control action u(t) = u(k) for kT ≤ t < (k+1)T. Without introducing confusion, we simply write Equation (2.9) as follows:

(2.10) J*(k) = min over u(k) ∈ Ω of (U(k) + γJ*(k+1))

Define the Bellman operator B relative to a function J as follows:

(2.11) BJ(k) = min over u(k) ∈ Ω of (U(k) + γJ(k+1))

Then, the optimality condition in Equation (2.10) can be simplified into the following with the Bellman operator,

(2.12) J*(k) = BJ*(k)

Note that the function J* is implicitly included in the Bellman operator. Equation (2.12) constitutes the optimality condition for the problem in Equation (2.7). It is difficult to solve for the explicit form of J* analytically from Equation (2.9). However, it is possible to approach the solution by iteration. We use the following iteration to solve for J*,

(2.13) J_{j+1}(k) = BJ_j(k) = min over u(k) ∈ Ω of (U(k) + γJ_j(k+1))

The control action keeps constant in the duration between the kth and the (k+1)th steps, i.e. u(t) = u(k) for kT ≤ t < (k+1)T. The control action u(k) can be obtained from Equation (2.9) based on Equation (2.13),

(2.14) u(k) = argmin over u(k) ∈ Ω of (U(k) + γJ(k+1))
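The iteration (2.13) and the greedy action selection of (2.14) can be sketched on a toy problem. The sketch below is not the chapter's system: it assumes a hypothetical discretized plant with integer states, two actions, and utility |state|, purely to show the mechanics of repeatedly applying the Bellman operator:

```python
# Value-iteration sketch of Equation (2.13) on an assumed toy system:
# states are integers -3..3, each action moves the state by -1 or +1
# (saturated at the boundary), and the one-step utility is |state|.
gamma = 0.95
states = range(-3, 4)
actions = (-1, 1)
step = lambda x, u: max(-3, min(3, x + u))   # deterministic transition

J = {x: 0.0 for x in states}                 # initialize J_0 = 0
for _ in range(200):                         # iterate J_{j+1} = B J_j
    J = {x: min(abs(x) + gamma * J[step(x, u)] for u in actions)
         for x in states}

# greedy action of Equation (2.14) under the converged cost
u_star = {x: min(actions, key=lambda u: abs(x) + gamma * J[step(x, u)])
          for x in states}
print(J[0], u_star[2])   # from x = 2 the greedy action moves toward 0
```

Because B is a γ‐contraction, the iterates converge to the fixed point of (2.12) regardless of the zero initialization.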

2.4.2 Approximating the Action Mapping and the Critic Mapping

In the previous sections, the iteration (2.13) was derived to calculate J* and the optimization (2.14) was obtained to calculate the control law. The iteration to approach J* and the optimization to derive u(k) have to be run at every time step in order to obtain the most up‐to‐date values. Inspired by the learning strategies widely studied in artificial intelligence, a learning‐based strategy is used in this section to facilitate the processing. After a sufficiently long time, the system is able to memorize the mapping to the optimal cost J* and the mapping to the optimal control action. After this learning period, there is no need to repeat any iterations or optimal searching, which makes the strategy more practical.

Note that the optimal cost J* is a function of the initial state. Counting the cost from the current time step, J* can also be regarded as a function of both the current state and the optimal action at the current time step according to Equation (2.10). Therefore, Ĵ, the approximation of J*, can also be regarded as a function of the current state and the current optimal input. As to the optimal control action u*, it is a function of the current state. Our goal in this section is to obtain the mapping from the current state and the current input to Ĵ and the mapping from the current state to the optimal control action u* using parameterized models, denoted the critic model and the action model, respectively. Therefore, we can write the critic model and the action model as Ĵ(x, u, w_c) and û(x, w_a), respectively, where w_c and w_a are the parameters of the critic model and the action model, respectively.

In order to train the critic model with the desired input–output correspondence, we define the following error at time step k to evaluate the learning performance,

(2.15) E_c(k) = ½ e_c²(k),  e_c(k) = Ĵ(k) − (U(k) + γĴ(k+1))

Note that U(k) + γĴ(k+1) is the desired value of Ĵ(k) according to Equation (2.13). Using the back‐propagation rule, we get the following rule for updating the weight w_c of the critic model,

(2.16) w_c(k+1) = w_c(k) − η_c(k) ∂E_c(k)/∂w_c(k)

where η_c(k) is the step size for the critic model at time step k.
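For a linearly parameterized critic, the update of Equations (2.15) and (2.16) reduces to a one‐line gradient step. The sketch below assumes a critic of the form Ĵ = w_c·z, where z is a hypothetical feature vector stacking the state and input; all numbers are illustrative:

```python
import numpy as np

# Sketch of the critic update (2.15)-(2.16) for a linearly
# parameterized critic J_hat(z, w_c) = w_c . z (assumed form).
gamma, eta_c = 0.95, 0.03
w_c = np.zeros(3)

def critic(z, w):
    return w @ z                  # linear critic model

def critic_update(w, z_k, z_k1, U_k):
    # error between J_hat(k) and its target U(k) + gamma * J_hat(k+1)
    e_c = critic(z_k, w) - (U_k + gamma * critic(z_k1, w))
    # gradient step on E_c = 0.5*e_c^2, treating the target as fixed
    return w - eta_c * e_c * z_k

z_k  = np.array([1.0, 0.5, -1.0])   # hypothetical features at step k
z_k1 = np.array([0.8, 0.3, -1.0])   # hypothetical features at step k+1
w_c = critic_update(w_c, z_k, z_k1, U_k=0.6)
```

Treating the bootstrapped target as a constant during differentiation is the usual semi‐gradient simplification; it keeps the update rule in the simple form of Equation (2.16).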

As to the action model, the optimal control u(k) in Equation (2.14) is the one that minimizes the cost function. Note that the possible minimum cost is zero, which corresponds to the scenario with the state staying inside the desired bounded area. In this regard, we define the action error as follows:

(2.17) E_a(k) = ½ e_a²(k),  e_a(k) = Ĵ(k) − 0

Then, similar to the update rule of w_c for the critic model, we get the following update rule of w_a for the action model,

(2.18) w_a(k+1) = w_a(k) − η_a(k) ∂E_a(k)/∂w_a(k)

where η_a(k) is the step size for the action model at time step k.

Equations (2.16) and (2.18) update the critic model and the action model progressively. After w_c and w_a have learnt the model information through a sufficiently long period of learning, their values can be fixed at those obtained at the final step and no further learning is required, which is in contrast to Equation (2.14), which requires an optimization problem to be solved at every step even after a long time.

2.5 Simulation Experiment

In this section, we consider the simulation implementation of the proposed control strategy. The dynamics given in Equation (2.1) model a wide class of sensor–actuator systems. In particular, to demonstrate the effectiveness of the proposed self‐learning variable structure method, we apply it to the stabilization of a typical benchmark system: the cart–pole system.

The cart–pole system, as sketched in Figure 2.1, is a widely used testbed for the effectiveness of control strategies. The system is composed of a pendulum and a cart. The pendulum has its mass above its pivot point, which is mounted on a cart moving horizontally. In this section, we apply the proposed control method to the cart–pole system to test the effectiveness of our method.


Figure 2.1 The cart–pole system.

2.5.1 The Model

The cart–pole model used in this work is the same as that in [24], which can be described as follows:

(2.19) θ̈ = (g sin θ + φ cos θ − μ_p θ̇/(m_p l)) / (l(4/3 − m_p cos²θ/(m_c + m_p)))

(2.20) ẍ = (F + m_p l(θ̇² sin θ − θ̈ cos θ) − μ_c sgn(ẋ)) / (m_c + m_p)

where

(2.21) φ = (−F − m_p l θ̇² sin θ + μ_c sgn(ẋ)) / (m_c + m_p)

with the following values for the parameters:

  • g = 9.8 m/s², acceleration due to gravity;
  • m_c = 1.0 kg, mass of the cart;
  • m_p = 0.1 kg, mass of the pole;
  • l = 0.5 m, half‐pole length;
  • μ_c = 0.0005, coefficient of friction of the cart on the track;
  • μ_p = 0.000002, coefficient of friction of the pole on the cart;
  • F = ±10 N, force applied to the cart's center of mass.

This system has four state variables: x is the position of the cart on the track, θ is the angle of the pole with respect to the vertical position, and ẋ and θ̇ are the cart velocity and the angular velocity of the pole, respectively.

Defining the generalized coordinate q = [x, θ]ᵀ and collecting the inertial, Coriolis/friction, and gravity terms of Equation (2.19) accordingly, Equation (2.19) can be rewritten as:

(2.22) M(q)q̈ + C(q, q̇)q̇ + g(q) = τ

By choosing M(q), C(q, q̇), g(q), and τ in this way, the system of Equation (2.19) coincides with the model of Equation (2.1). Note that the input τ in this situation is constrained to the set Ω = {−10 N, 10 N}.
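A simulation of the model is compact. The sketch below assumes [24] is the classic cart–pole benchmark with the parameter values listed above, and computes the accelerations plus one Euler step with a 0.02 s sampling period:

```python
import math

# Classic cart-pole dynamics (assumed to match [24], with the listed
# parameters): accelerations of the pole angle and cart position.
G, MC, MP, L = 9.8, 1.0, 0.1, 0.5
MU_C, MU_P = 0.0005, 0.000002

def cart_pole_deriv(x, x_dot, th, th_dot, F):
    sin_t, cos_t = math.sin(th), math.cos(th)
    sgn = 1.0 if x_dot >= 0 else -1.0
    # auxiliary term combining the force and centrifugal/friction effects
    tmp = (-F - MP * L * th_dot**2 * sin_t + MU_C * sgn) / (MC + MP)
    th_acc = (G * sin_t + cos_t * tmp - MU_P * th_dot / (MP * L)) \
             / (L * (4.0 / 3.0 - MP * cos_t**2 / (MC + MP)))
    x_acc = (F + MP * L * (th_dot**2 * sin_t - th_acc * cos_t)
             - MU_C * sgn) / (MC + MP)
    return x_acc, th_acc

# one Euler step with sampling period T = 0.02 s, no control force:
T = 0.02
x, x_dot, th, th_dot = 0.0, 0.0, 0.05, 0.0   # slightly tilted pole
x_acc, th_acc = cart_pole_deriv(x, x_dot, th, th_dot, F=0.0)
th += T * th_dot
th_dot += T * th_acc     # without control the tilt grows: the pole falls
```

Starting from a small positive tilt with no force, the angular acceleration is positive, which reproduces the open‐loop instability the controller must overcome.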

2.5.2 Experiment Setup and Results

In the simulation experiment, the discount factor γ, the sliding surface parameter c, and the utility weights w₁ and w₂ are set to fixed values. The feasible control action set Ω in Equation (2.7) is defined as Ω = {−10 N, 10 N}. This definition corresponds to the bang‐bang control widely used in industry. To make the output of the action model fall within the feasible set, the output of the action network is clamped to 10 N if it is greater than or equal to zero and clamped to −10 N if it is less than zero. The sampling period T is set to 0.02 s. Both the critic model and the action model are linearly parameterized. The step size of the critic model, η_c, and that of the action model, η_a, are both set to 0.03. Both the update of the critic model weight w_c in Equation (2.16) and the update of the action model weight w_a in Equation (2.18) last for 30 s. For the uncontrolled cart–pole system with F = 0 in Equation (2.19), the pendulum falls down. The control objective is to stabilize the pendulum in the inverted direction (θ = 0). The time history of the state variables is plotted in Figure 2.2 for the system with the proposed self‐learning variable structure control strategy. From Figure 2.2, it can be observed that θ is stabilized in a small vicinity around zero (with a small error of ±0.1 rad), which corresponds to the inverted direction.
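The clamping described above amounts to a one‐line bang‐bang quantizer on the action network's scalar output (the threshold‐at‐zero convention follows the text):

```python
# Bang-bang action selection: the action network's raw scalar output is
# clamped onto the feasible set {-10 N, +10 N}, with outputs >= 0
# mapped to +10 N and outputs < 0 mapped to -10 N.
def bang_bang(raw_output):
    return 10.0 if raw_output >= 0.0 else -10.0

print(bang_bang(0.37), bang_bang(-2.1))  # → 10.0 -10.0
```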


Figure 2.2 State profiles of the cart–pole system with the proposed control strategy.

2.6 Summary

In this chapter, self‐learning variable structure control is considered to solve the control problem of a class of sensor–actuator systems. The control problem is formulated from the optimal control perspective and solved via iterative methods. In contrast to existing model‐based methods, this method does not need prior knowledge of an accurate mathematical model. The critic model and the action model are introduced to make the method more practical. Simulations show that the control law obtained by the proposed method indeed achieves the control objective.
