onthe track. It is the nearest potential obstacle in front of the train, such as the tail of
the front train. In some scenarios, when ATP calculates an emergency braking prole
and ATO calculates an operating speed/distance prole based on MA, MA is often
taken as the distance from the front end of the current train to the tail of the front train.
In CBTC systems, the current train needs the information of the front train to control acceleration/deceleration at every communication cycle. If ZC can send accurate information to the current train, so that the current train has sufficient information, the current train can make correct decisions. In CBTC systems, ZC transmits an MA to the current train according to the information sent from the front train. An MA is generally defined as a physical point on the track. It is the nearest potential obstacle in front of the train, such as the tail of the front train. In some scenarios, when ATP calculates an emergency braking profile and ATO calculates an operating speed/distance profile based on the MA, the MA is often taken as the distance from the front end of the current train to the tail of the front train.
As mentioned earlier, due to unreliable wireless communications and handoffs, the information included in the MA received by the current train may not exactly describe the state of the front train. As a result, we can see that the information gap in CBTC systems is the difference between the state of the front train derived from the received MA sent by ZC and the actual state of the front train.
In this chapter, we take a cognitive control approach to CBTC systems considering both train–ground communication and train control, and the information gap is used to quantitatively describe the effects of train–ground communication on train control performance.
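As a rough illustration, the information gap can be thought of as the mismatch between the front-train state implied by the (possibly delayed) MA and the front train's actual state. The following Python sketch is a minimal, hypothetical computation assuming the gap is measured as a distance error; the function and variable names are illustrative only and are not taken from the chapter.

def information_gap(ma_tail_position, actual_tail_position):
    """Hypothetical information gap: distance between the front-train tail
    position implied by the received MA and the actual tail position.

    ma_tail_position     -- tail position of the front train derived from the MA sent by ZC
    actual_tail_position -- true tail position of the front train at the same instant
    """
    return abs(actual_tail_position - ma_tail_position)

# Example: the MA still reports the tail at 1200.0 m, but the front train
# has already moved on to 1235.5 m, so the gap is 35.5 m.
gap = information_gap(ma_tail_position=1200.0, actual_tail_position=1235.5)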
10.3 Cognitive Control
In this section, we describe cognitive control in detail. The cost function is defined. Then, we present RL to derive the optimal policy in cognitive control.
For a cognitive control system shown in Figure 10.1, the perceptual part contains the estimator and the perceptual memory, where the estimator obtains the available information from the sensory measurement results and the perceptual memory processes the information to extract the relevant part. The cognitive controller of the executive part makes corresponding decisions based on the knowledge in the executive memory and the feedback information from the perceptual part. Based on the knowledge in the executive memory, the cognitive controller selects the optimal action, which has an influence on the system itself or the environment. When it acts on the system, the sensors or the actuators may be reconfigured. When it acts on the environment, the perception process can be indirectly affected. In fact, the key to cognitive control is that the cognitive actions might be a part of the physical actions (state-control actions). In other words, a physical action is applied and the goal is to decrease the information gap. For example, for a quadratic optimal controller, the cost function is
J = (x - \bar{x})^{T} Q (x - \bar{x}) + u^{T} R u    (10.1)
where:
\bar{x} is the desired state of the system
x is the actual state of the system
The matrices Q and R are the weights applied to the system's state and control, respectively
The objective is to minimize the cost function.
Moreover, cognitive control adds another term about the information gap to Equation 10.1. The resulting cost function can be formulated as [14]
J = (x - \bar{x})^{T} Q (x - \bar{x}) + u^{T} R u + \beta G    (10.2)
where:
G is the information gap
β is a scalar
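A minimal numerical sketch of Equations 10.1 and 10.2 follows, assuming the state, desired state, and control input are NumPy vectors and that Q, R, and β are given; all values shown are purely illustrative and not from the chapter.

import numpy as np

def quadratic_cost(x, x_bar, u, Q, R):
    """Equation 10.1: J = (x - x_bar)^T Q (x - x_bar) + u^T R u."""
    e = x - x_bar
    return float(e @ Q @ e + u @ R @ u)

def cognitive_cost(x, x_bar, u, Q, R, beta, G):
    """Equation 10.2: the quadratic cost plus the weighted information gap beta * G."""
    return quadratic_cost(x, x_bar, u, Q, R) + beta * G

# Illustrative values: state = [position, velocity], scalar control input.
x     = np.array([100.0, 20.0])   # actual state
x_bar = np.array([105.0, 18.0])   # desired state
u     = np.array([0.5])           # control input (acceleration)
Q     = np.diag([1.0, 2.0])
R     = np.array([[0.1]])
J = cognitive_cost(x, x_bar, u, Q, R, beta=0.5, G=35.5)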
In cognitive control, the cognitive actions are of greatest concern. The actions are determined through the implementation of RL. RL is the process by which the agent learns an approximately optimal policy through trial-and-error interactions with the environment. At each communication cycle, RL can determine the cognitive action to decrease the information gap according to the reward. As a result, the objective of RL is to find a policy that is updated by rewards provided by the environment, which means minimizing the cumulative cost over the long run [19].
In the RL model depicted in Figure 10.3, a learning agent selects an action acting on the system or environment according to the current system state and the current environment state. When the new state comes, the agent gathers information about the new state and calculates the immediate reward and the time that the state transition costs. Then, based on this information and an algorithm, the agent can update the knowledge base and select the next action. The agent continues to improve its performance as the process is repeated.
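The interaction just described can be summarized by the following schematic loop. It is only a sketch of the agent–environment cycle in Figure 10.3: env and agent are placeholder objects, and the methods observe, select_action, step, and update stand in for whatever estimator, policy, and learning rule are actually used.

def run_episode(env, agent, num_cycles):
    """Generic RL interaction loop: observe the state, act, receive the reward, learn."""
    state = env.observe()                      # current system/environment state
    for _ in range(num_cycles):                # one iteration per communication cycle
        action = agent.select_action(state)    # policy chooses a (cognitive) action
        next_state, reward = env.step(action)  # state transition and immediate reward
        agent.update(state, action, reward, next_state)  # update the knowledge base
        state = next_state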
Let S = {s_1, s_2, ..., s_n} be the set of system states and A = {a_1, a_2, ..., a_m} be the set of actions. According to the current state s_k ∈ S, the RL agent interacts with the environment and chooses the action a_k ∈ A. Then there is a state transition, and the new state is s_{k+1} ∈ S based on the state transition probability. The immediate reward is given and the process is repeated.

Figure 10.3 RL model. (Labels: agent, policy, action a, reward r, state s, system and environment.)
As Q-learning is one of the most popular RL algorithms, it is adopted in this chapter to find an optimal policy for the learning agent. The Q-learning algorithm has a Q-function that calculates the quality of a state–action combination, Q: S × A → ℝ. In fact, the Q-function is an evaluation function Q(s, a) whose value is the maximum discounted cumulative reward that can be achieved starting from state s and applying action a as the first action. In other words, the value of Q is the reward received immediately upon executing action a from state s, plus the value of following the optimal policy (discounted by η) thereafter [20].
Q(s, a) \equiv r(s, a) + \eta V^{*}(\delta(s, a))    (10.3)
where:
r(s, a) is the immediate reward
δ(s, a) denotes the state resulting from applying action a to state s
V*(s) gives the maximum discounted cumulative reward that the agent can obtain starting from state s, and it is defined as V^{*}(s_k) = \sum_{i=0}^{\infty} \eta^{i} r_{k+i}, where 0 ≤ η < 1
As there are state transition probabilities, Equation 10.3 can be rewritten as follows:

Q^{\pi}(s, a) = r(s, a) + \eta \sum_{s' \in S} P_{ss'}(a) V^{\pi}(s')    (10.4)
where:
s' = δ(s, a)
P_{ss'}(a) is the transition probability from state s to state s' when applying action a
Now, we dene a policy
π
()
sA
, and the optimal policy is denoted as π
*
()
s . en,
we get
Q^{*}(s, a) = Q^{\pi^{*}}(s, a) = r(s, a) + \eta \sum_{s' \in S} P_{ss'}(a) V^{\pi^{*}}(s')    (10.5)
where:
V^{\pi^{*}}(s') = V^{*}(s')
Notice the close relationship between Q* and V*: V^{*}(s) = \max_{a \in A} Q^{*}(s, a). Then, Equation 10.5 can be written as
Q^{*}(s, a) = r(s, a) + \eta \sum_{s' \in S} P_{ss'}(a) \max_{a' \in A} Q^{*}(s', a')    (10.6)
As a result, the optimal policy can be dened as
π
**
() (,
)sQsa
aA
=
argmax , which
means the Q-learning rules can be determined.
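A tabular Q-learning sketch corresponding to Equation 10.6 and the argmax policy is given below. It assumes discrete, enumerable states and actions and uses the standard one-step update with a learning rate α and ε-greedy exploration; these additions (α, ε, and the state/action encodings), as well as the reward signal, are assumptions for illustration rather than specifics from the chapter. Only the discount factor η matches the chapter's notation.

import random
from collections import defaultdict

class QLearningAgent:
    def __init__(self, actions, eta=0.9, alpha=0.1, epsilon=0.1):
        self.actions = actions          # action set A
        self.eta = eta                  # discount factor (0 <= eta < 1)
        self.alpha = alpha              # learning rate (assumption, not from the chapter)
        self.epsilon = epsilon          # exploration probability (assumption)
        self.Q = defaultdict(float)     # Q(s, a), initialized to 0

    def select_action(self, s):
        """Epsilon-greedy version of pi*(s) = argmax_a Q*(s, a)."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(s, a)])

    def update(self, s, a, r, s_next):
        """One-step Q-learning update toward r + eta * max_a' Q(s', a') (cf. Equation 10.6)."""
        best_next = max(self.Q[(s_next, a_next)] for a_next in self.actions)
        target = r + self.eta * best_next
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])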
10.4 Formulation of Cognitive Control Approach to CBTC Systems
In this section, we illustrate each part of the cognitive control model of CBTC systems, including the control model, the wireless channel model, and the Q-learning parameters.
ATP is the safety guard of CBTC systems. For each communication cycle, ATP calculates the permitted maximum speed for the train based on the MA. According to the permitted speed, the speed limit of the line, and the state of the train, ATO determines whether to accelerate or decelerate. When the MA is delayed due to communication latency, the ATO system may find that the speed is beyond the permitted speed calculated by ATP, and ATO will issue a deceleration command to force the train to brake or stop in order to keep it safe. Therefore, ATP has the higher safety level and forms the basis for the ATO decision. When cognitive control is applied in CBTC systems, ATP should be considered as a constraint in the control procedure.
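As a rough illustration of ATP acting as a constraint on ATO, the sketch below derives a permitted speed from the MA distance using a simple constant-deceleration braking model (v = sqrt(2 b d)) and overrides the ATO request with a brake command whenever the current speed exceeds it. The braking model, the function names, and the default numbers are assumptions chosen for illustration, not the chapter's actual ATP algorithm.

import math

def atp_permitted_speed(ma_distance, emergency_deceleration):
    """Maximum speed from which the train can still stop within the MA distance,
    assuming a constant-deceleration braking profile: v = sqrt(2 * b * d)."""
    return math.sqrt(2.0 * emergency_deceleration * ma_distance)

def ato_command(current_speed, desired_acceleration, ma_distance,
                emergency_deceleration=1.0, service_brake=-0.8):
    """Apply the ATP constraint: if the current speed exceeds the permitted speed,
    override the ATO request with a braking command; otherwise follow ATO."""
    v_permitted = atp_permitted_speed(ma_distance, emergency_deceleration)
    if current_speed > v_permitted:
        return service_brake            # force the train to brake
    return desired_acceleration         # otherwise follow the ATO decision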
The general structure of cognitive control consists of three parts: the executive part including a cognitive controller, the perceptual part, and the practical environment including the system, as shown in Figure 10.1. We adopt cognitive control to model a CBTC system, where ZC can be taken as the perceptual part and sends the MA to the current train. According to the state of the system, the environmental conditions, and the output of ATO, the cognitive controller can control the mobile station (MS) in the current train to perform cognitive actions to decrease the information gap, such as the trade-off between MIMO diversity gain and multiplexing gain, and the handoff decisions. These actions have impacts on the performance of wireless communications, such as the signal-to-noise ratio (SNR) and data rate. Similarly, the wireless channel model plays an important role in the perceptual memory, while Q-learning produces the knowledge stored in the executive memory. Hence, the structure of the cognitive control approach is illustrated in Figure 10.4. Next, we need to determine the train control model, which generates the inputs of the cognitive controller.
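For concreteness, the cognitive action set mentioned above (the MIMO diversity/multiplexing trade-off and the handoff decision) could be encoded as a small discrete action space for the learning agent, for example as below; the exact encoding is an assumption for illustration and is not specified in the chapter.

from itertools import product

# Hypothetical cognitive action space: each action fixes the MIMO mode of the MS
# and whether to initiate a handoff in the next communication cycle.
MIMO_MODES = ("diversity", "multiplexing")
HANDOFF_CHOICES = (False, True)

# Each action is a (mimo_mode, handoff) tuple, giving 4 actions in total,
# e.g. ("diversity", False) or ("multiplexing", True).
ACTIONS = list(product(MIMO_MODES, HANDOFF_CHOICES))

Encoding the actions as tuples keeps them hashable, so ACTIONS could serve directly as the action set A in a tabular Q-learning agent like the sketch given in Section 10.3.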
10.4.1 Train Control Model
Generally, ATO receives the MA from ZC, calculates the operation speed curve, and determines the acceleration (deceleration) of the next communication cycle for the train. According to Newton's second law, the state of the train can be updated. In this chapter, we assume that the train controller is linear time-invariant in discrete time. Then, the dynamic model of the train can be written as follows:
\begin{aligned}
x_{k+1} &= A x_k + B u_k + C w_k \\
z_k &= C_1 x_k + D_1 u_k \\
y_k &= C_2 x_k
\end{aligned}    (10.7)
where:
x_k is the state of the train in the kth communication cycle, including the position and the velocity
u_k is the input of the train in the kth communication cycle, which is determined by the ATO model
w_k is the resistance related to the train in the kth communication cycle
z_k is the regulated output in the kth communication cycle
y_k is the measured output in the kth communication cycle
Hence, the ATO model has the following state-space model:

\begin{aligned}
x^{c}_{k+1} &= A^{c} x^{c}_k + B^{c} y_k \\
u_k &= C^{c} f(x^{c}_k)
\end{aligned}    (10.8)
where:
y_k is the input of ATO, which includes the states of the two trains
The function f(·) converts the information of the two trains into the parameters that can be used to directly calculate the acceleration (deceleration)
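A minimal NumPy sketch of one communication cycle under Equations 10.7 and 10.8 is given below. The matrix dimensions and the form of f(·) are left to the caller and are assumptions needed only to make the example self-contained; the chapter does not specify them here.

import numpy as np

def train_step(A, B, C, x_k, u_k, w_k):
    """Equation 10.7 (state update): x_{k+1} = A x_k + B u_k + C w_k."""
    return A @ x_k + B @ u_k + C @ w_k

def ato_step(A_c, B_c, C_c, x_c_k, y_k, f):
    """Equation 10.8: ATO controller state update and control output.
    x^c_{k+1} = A^c x^c_k + B^c y_k,  u_k = C^c f(x^c_k)."""
    x_c_next = A_c @ x_c_k + B_c @ y_k
    u_k = C_c @ f(x_c_k)
    return x_c_next, u_k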
The linear quadratic cost is taken as our performance measure of the train control in this chapter. The general expression is
Figure 10.4 Schematic structure of the cognitive control approach to CBTC systems. (Blocks shown: ATO, cognitive controller, train dynamic model, MS with multiplexing/diversity/handoff actions, wireless channel model, ZC, front train, back train, information, MA.)