onthe track. It is the nearest potential obstacle in front of the train, such as the tail of
the front train. In some scenarios, when ATP calculates an emergency braking prole
and ATO calculates an operating speed/distance prole based on MA, MA is often
taken as the distance from the front end of the current train to the tail of the front train.
In CBTC systems, the current train needs the information of the front train to control acceleration/deceleration at every communication cycle. If ZC can send accurate information to the current train, so that the current train has sufficient information, the current train can make correct decisions. In CBTC systems, ZC transmits an MA to the current train according to the information sent from the front train. An MA is generally defined as a physical point on the track. It is the nearest potential obstacle in front of the train, such as the tail of the front train. In some scenarios, when ATP calculates an emergency braking profile and ATO calculates an operating speed/distance profile based on the MA, the MA is often taken as the distance from the front end of the current train to the tail of the front train.
As mentioned earlier, due to unreliable wireless communications and handoffs, the information included in the MA received by the current train may not exactly describe the state of the front train. As a result, we can see that the information gap in CBTC systems is the difference between the state of the front train derived from the received MA sent by ZC and the actual state of the front train.
In this chapter, we take a cognitive control approach to CBTC systems considering both train–ground communication and train control, and the information gap is used to quantitatively describe the effects of train–ground communication on train control performance.
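As a rough illustration, the information gap can be thought of as the mismatch between the front-train state implied by the (possibly delayed) MA and the front train's actual state. The following Python sketch is a minimal, hypothetical computation assuming the gap is measured as a distance error; the function and variable names are illustrative only and are not taken from the chapter.

def information_gap(ma_tail_position, actual_tail_position):
    """Hypothetical information gap: distance between the front-train tail
    position implied by the received MA and the actual tail position.

    ma_tail_position     -- tail position of the front train derived from the MA sent by ZC
    actual_tail_position -- true tail position of the front train at the same instant
    """
    return abs(actual_tail_position - ma_tail_position)

# Example: the MA still reports the tail at 1200.0 m, but the front train
# has already moved on to 1235.5 m, so the gap is 35.5 m.
gap = information_gap(ma_tail_position=1200.0, actual_tail_position=1235.5)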
10.3 Cognitive Control
In this section, we describe cognitive control in detail. The cost function is defined. Then, we present RL to derive the optimal policy in cognitive control.
For a cognitive control system shown in Figure 10.1, the perceptual part contains the estimator and the perceptual memory, where the estimator obtains the available information from the sensory measurement results and the perceptual memory processes the information to extract the relevant part. The cognitive controller of the executive part makes corresponding decisions based on the knowledge in the executive memory and the feedback information from the perceptual part. Based on the knowledge in the executive memory, the cognitive controller selects the optimal action, which has an influence on the system itself or the environment. When it acts on the system, the sensors or the actuators may be reconfigured. When it acts on the environment, the perception process can be indirectly affected. In fact, the key to cognitive control is that the cognitive actions might be a part of the physical actions (state-control actions). In other words, a physical action is applied and the goal is to decrease the information gap. For example, for a quadratic optimal controller, the cost function is
J = (x - \bar{x})^{T} Q (x - \bar{x}) + u^{T} R u    (10.1)
where:
\bar{x} is the desired state of the system
x is the actual state of the system
The matrices Q and R are the weights applied to the system's state and control, respectively
The objective is to minimize the cost function.
Moreover, cognitive control adds another term about the information gap to Equation 10.1. The resulting cost function can be formulated as [14]
J = (x - \bar{x})^{T} Q (x - \bar{x}) + u^{T} R u + \beta G    (10.2)
where:
G is the information gap
β is a scalar
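A minimal numerical sketch of Equations 10.1 and 10.2 follows, assuming the state, desired state, and control input are NumPy vectors and that Q, R, and β are given; all values shown are purely illustrative and not from the chapter.

import numpy as np

def quadratic_cost(x, x_bar, u, Q, R):
    """Equation 10.1: J = (x - x_bar)^T Q (x - x_bar) + u^T R u."""
    e = x - x_bar
    return float(e @ Q @ e + u @ R @ u)

def cognitive_cost(x, x_bar, u, Q, R, beta, G):
    """Equation 10.2: the quadratic cost plus the weighted information gap beta * G."""
    return quadratic_cost(x, x_bar, u, Q, R) + beta * G

# Illustrative values: state = [position, velocity], scalar control input.
x     = np.array([100.0, 20.0])   # actual state
x_bar = np.array([105.0, 18.0])   # desired state
u     = np.array([0.5])           # control input (acceleration)
Q     = np.diag([1.0, 2.0])
R     = np.array([[0.1]])
J = cognitive_cost(x, x_bar, u, Q, R, beta=0.5, G=35.5)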
In cognitive control, the cognitive actions are of greatest concern. The actions are determined through the implementation of RL. RL is the process by which the agent learns an approximately optimal policy through trial-and-error interactions with the environment. At each communication cycle, RL can determine the cognitive action to decrease the information gap according to the reward. As a result, the objective of RL is to find a policy that is updated by rewards provided by the environment, which means minimizing the cumulative cost over the long run [19].
In the RL model depicted in Figure 10.3, a learning agent selects an action acting on the system or environment according to the current system state and the current environment state. When the new state comes, the agent gathers information about the new state and calculates the immediate reward and the time that the state transition costs. Then, based on this information and an algorithm, the agent can update the knowledge base and select the next action. The agent continues to improve its performance as the process is repeated.
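The interaction just described can be summarized by the following schematic loop. It is only a sketch of the agent–environment cycle in Figure 10.3: env and agent are placeholder objects, and the methods observe, select_action, step, and update stand in for whatever estimator, policy, and learning rule are actually used.

def run_episode(env, agent, num_cycles):
    """Generic RL interaction loop: observe the state, act, receive the reward, learn."""
    state = env.observe()                      # current system/environment state
    for _ in range(num_cycles):                # one iteration per communication cycle
        action = agent.select_action(state)    # policy chooses a (cognitive) action
        next_state, reward = env.step(action)  # state transition and immediate reward
        agent.update(state, action, reward, next_state)  # update the knowledge base
        state = next_state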
Let S = {s_1, s_2, ..., s_n} be the set of system states and A = {a_1, a_2, ..., a_m} be the set of actions. According to the current state s_k ∈ S, the RL agent interacts with the environment and chooses the action a_k ∈ A. Then there is a state transition, and the new state is s_{k+1} ∈ S based on the state transition probability. The immediate reward is given and the process is repeated.

Figure 10.3 RL model. (Labels: agent, policy, action a, reward r, state s, system and environment.)
As Q-learning is one of the most popular RL algorithms, it is adopted in this chapter to find an optimal policy for the learning agent. The Q-learning algorithm has a Q-function that calculates the quality of a state–action combination, Q: S × A → ℝ. In fact, the Q-function is an evaluation function Q(s, a) whose value is the maximum discounted cumulative reward that can be achieved starting from state s and applying action a as the first action. In other words, the value of Q is the reward received immediately upon executing action a from state s, plus the value of following the optimal policy (discounted by η) thereafter [20].
Q(s, a) \equiv r(s, a) + \eta V^{*}(\delta(s, a))    (10.3)
where:
r(s, a) is the immediate reward
δ(s, a) denotes the state resulting from applying action a to state s
V*(s) gives the maximum discounted cumulative reward that the agent can obtain starting from state s, and it is defined as V^{*}(s_k) = \sum_{i=0}^{\infty} \eta^{i} r_{k+i}, where 0 ≤ η < 1
As there are state transition probabilities, Equation 10.3 can be rewritten as follows:

Q^{\pi}(s, a) = r(s, a) + \eta \sum_{s' \in S} P_{ss'}(a) V^{\pi}(s')    (10.4)
where:
s' = δ(s, a)
P_{ss'}(a) is the transition probability from state s to state s' when applying action a
Now, we dene a policy
π
()
sA
, and the optimal policy is denoted as π
*
()
s . en,
we get
Q^{*}(s, a) = Q^{\pi^{*}}(s, a) = r(s, a) + \eta \sum_{s' \in S} P_{ss'}(a) V^{\pi^{*}}(s')    (10.5)
where:
V^{\pi^{*}}(s') = V^{*}(s')
Notice the close relationship between Q* and V*: V^{*}(s) = \max_{a \in A} Q^{*}(s, a). Then, Equation 10.5 can be written as
Q^{*}(s, a) = r(s, a) + \eta \sum_{s' \in S} P_{ss'}(a) \max_{a' \in A} Q^{*}(s', a')    (10.6)
As a result, the optimal policy can be dened as
π
**
() (,
)sQsa
aA
=
argmax , which
means the Q-learning rules can be determined.
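A tabular Q-learning sketch corresponding to Equation 10.6 and the argmax policy is given below. It assumes discrete, enumerable states and actions and uses the standard one-step update with a learning rate α and ε-greedy exploration; these additions (α, ε, and the state/action encodings), as well as the reward signal, are assumptions for illustration rather than specifics from the chapter. Only the discount factor η matches the chapter's notation.

import random
from collections import defaultdict

class QLearningAgent:
    def __init__(self, actions, eta=0.9, alpha=0.1, epsilon=0.1):
        self.actions = actions          # action set A
        self.eta = eta                  # discount factor (0 <= eta < 1)
        self.alpha = alpha              # learning rate (assumption, not from the chapter)
        self.epsilon = epsilon          # exploration probability (assumption)
        self.Q = defaultdict(float)     # Q(s, a), initialized to 0

    def select_action(self, s):
        """Epsilon-greedy version of pi*(s) = argmax_a Q*(s, a)."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.Q[(s, a)])

    def update(self, s, a, r, s_next):
        """One-step Q-learning update toward r + eta * max_a' Q(s', a') (cf. Equation 10.6)."""
        best_next = max(self.Q[(s_next, a_next)] for a_next in self.actions)
        target = r + self.eta * best_next
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])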
10.4 Formulation of Cognitive Control Approach to CBTC Systems
In this section, we illustrate each part of the cognitive control model of CBTC systems, including the control model, the wireless channel model, and the Q-learning parameters.
ATP is the safety guard of CBTC systems. For each communication cycle, ATP calculates the permitted maximum speed for the train based on the MA. According to the permitted speed, the speed limit of the line, and the state of the train, ATO determines whether to accelerate or decelerate. When the MA is delayed due to communication latency, the ATO system may find that the speed is beyond the permitted speed calculated by ATP, and ATO will issue a deceleration command to force the train to brake or stop in order to keep it safe. Therefore, ATP has the higher safety level and forms the basis for the ATO decision. When cognitive control is applied in CBTC systems, ATP should be considered as a constraint in the control procedure.
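As a rough illustration of ATP acting as a constraint on ATO, the sketch below derives a permitted speed from the MA distance using a simple constant-deceleration braking model (v = sqrt(2 b d)) and overrides the ATO request with a brake command whenever the current speed exceeds it. The braking model, the function names, and the default numbers are assumptions chosen for illustration, not the chapter's actual ATP algorithm.

import math

def atp_permitted_speed(ma_distance, emergency_deceleration):
    """Maximum speed from which the train can still stop within the MA distance,
    assuming a constant-deceleration braking profile: v = sqrt(2 * b * d)."""
    return math.sqrt(2.0 * emergency_deceleration * ma_distance)

def ato_command(current_speed, desired_acceleration, ma_distance,
                emergency_deceleration=1.0, service_brake=-0.8):
    """Apply the ATP constraint: if the current speed exceeds the permitted speed,
    override the ATO request with a braking command; otherwise follow ATO."""
    v_permitted = atp_permitted_speed(ma_distance, emergency_deceleration)
    if current_speed > v_permitted:
        return service_brake            # force the train to brake
    return desired_acceleration         # otherwise follow the ATO decision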
The general structure of cognitive control consists of three parts: the executive part including a cognitive controller, the perceptual part, and the practical environment including the system, as shown in Figure 10.1. We adopt cognitive control to model a CBTC system, where ZC can be taken as the perceptual part and sends the MA to the current train. According to the state of the system, the environmental conditions, and the output of ATO, the cognitive controller can control the mobile station (MS) in the current train to perform cognitive actions to decrease the information gap, such as the trade-off between MIMO diversity gain and multiplexing gain, and the handoff decisions. These actions have impacts on the performance of wireless communications, such as the signal-to-noise ratio (SNR) and data rate. Similarly, the wireless channel model plays an important role in the perceptual memory, while Q-learning produces the knowledge stored in the executive memory. Hence, the structure of the cognitive control approach is illustrated in Figure 10.4. Next, we need to determine the train control model, which generates the inputs of the cognitive controller.
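For concreteness, the cognitive action set mentioned above (the MIMO diversity/multiplexing trade-off and the handoff decision) could be encoded as a small discrete action space for the learning agent, for example as below; the exact encoding is an assumption for illustration and is not specified in the chapter.

from itertools import product

# Hypothetical cognitive action space: each action fixes the MIMO mode of the MS
# and whether to initiate a handoff in the next communication cycle.
MIMO_MODES = ("diversity", "multiplexing")
HANDOFF_CHOICES = (False, True)

# Each action is a (mimo_mode, handoff) tuple, giving 4 actions in total,
# e.g. ("diversity", False) or ("multiplexing", True).
ACTIONS = list(product(MIMO_MODES, HANDOFF_CHOICES))

Encoding the actions as tuples keeps them hashable, so ACTIONS could serve directly as the action set A in a tabular Q-learning agent like the sketch given in Section 10.3.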
10.4.1 Train Control Model
Generally, ATO receives the MA from ZC, calculates the operation speed curve, and determines the acceleration (deceleration) of the next communication cycle for the train. According to Newton's second law, the state of the train can be updated. In this chapter, we assume that the train controller is linear time-invariant in discrete time. Then, the dynamic model of the train can be written as follows:
\begin{aligned}
x_{k+1} &= A x_k + B u_k + C w_k \\
z_k &= C_1 x_k + D_1 u_k \\
y_k &= C_2 x_k
\end{aligned}    (10.7)
where:
x_k is the state of the train in the kth communication cycle, including the position and the velocity
u_k is the input of the train in the kth communication cycle, which is determined by the ATO model
w_k is the resistance related to the train in the kth communication cycle
z_k is the regulated output in the kth communication cycle
y_k is the measured output in the kth communication cycle
Hence, the ATO model has the following state-space model:

\begin{aligned}
x^{c}_{k+1} &= A^{c} x^{c}_k + B^{c} y_k \\
u_k &= C^{c} f(x^{c}_k)
\end{aligned}    (10.8)
where:
y_k is the input of ATO, which includes the states of the two trains
The function f(·) converts the information of the two trains into the parameters that can be used to directly calculate the acceleration (deceleration)
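A minimal NumPy sketch of one communication cycle under Equations 10.7 and 10.8 is given below. The matrix dimensions and the form of f(·) are left to the caller and are assumptions needed only to make the example self-contained; the chapter does not specify them here.

import numpy as np

def train_step(A, B, C, x_k, u_k, w_k):
    """Equation 10.7 (state update): x_{k+1} = A x_k + B u_k + C w_k."""
    return A @ x_k + B @ u_k + C @ w_k

def ato_step(A_c, B_c, C_c, x_c_k, y_k, f):
    """Equation 10.8: ATO controller state update and control output.
    x^c_{k+1} = A^c x^c_k + B^c y_k,  u_k = C^c f(x^c_k)."""
    x_c_next = A_c @ x_c_k + B_c @ y_k
    u_k = C_c @ f(x_c_k)
    return x_c_next, u_k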
The linear quadratic cost is taken as our performance measure of the train control in this chapter. The general expression is
Figure 10.4 Schematic structure of the cognitive control approach to CBTC systems. (Blocks shown: ATO, cognitive controller, train dynamic model, MS with multiplexing/diversity/handoff actions, wireless channel model, ZC, front train, back train, information, MA.)