Cognitive Control for CBTC Systems ◾ 235
There are three policies in our simulations for comparison: the proposed cognitive control policy, the semi-Markov decision process (SMDP) policy, and the greedy policy. Based on the Markov property of the state transition process, it is possible to model the problem considered in this chapter as an SMDP [29] and derive the SMDP policy. Under the greedy policy, if there is an AP whose signal strength is higher than that of the currently associated AP, the MS switches to the AP with the highest signal strength. In other words, the greedy policy always makes decisions based on the immediate reward, not the long-term reward.
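The greedy rule described above can be sketched in a few lines. This is an illustrative sketch only; the function name and the RSSI representation are assumptions, not part of the chapter's simulation code.

```python
def greedy_handoff(current_ap: str, rssi: dict) -> str:
    """Return the AP the mobile station (MS) should associate with.

    rssi maps AP identifiers to their measured signal strength (dBm).
    The greedy policy switches whenever some AP is strictly stronger
    than the currently associated AP, ignoring long-term reward.
    """
    best_ap = max(rssi, key=rssi.get)
    if rssi[best_ap] > rssi[current_ap]:
        return best_ap
    return current_ap

# Example: the MS is on AP1, but AP2 is momentarily stronger,
# so the greedy policy triggers an immediate handoff.
print(greedy_handoff("AP1", {"AP1": -70.0, "AP2": -65.0}))  # -> AP2
```

Because the decision uses only the instantaneous signal strength, a brief fluctuation is enough to trigger a handoff, which is exactly why the greedy policy suffers from the cost spikes discussed next.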
First, we compare the cost function under different policies. As shown in Figure 10.8, the x-axis represents the index of the communication cycle and the y-axis is the cost in each communication cycle. Under the greedy and SMDP policies, the cost increases sharply in some communication cycles, which means that the information gap becomes larger due to the long handoff latency. The SMDP policy clearly achieves better performance and lower cost, with fewer peaks, than the greedy policy. Under the proposed cognitive control policy, however, the cost follows a smooth curve, which means that no long handoff latency occurs. Figure 10.8 indicates that cognitive control can help the MS make the optimal handoff decision by minimizing the information gap, which decreases the cost of train control, including tracking errors and energy consumption.
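To make the cost spikes concrete, here is a minimal sketch of a per-cycle cost that combines tracking error and energy consumption, as the text describes. The quadratic form and the weights are assumptions for illustration, not the chapter's exact cost function.

```python
def cycle_cost(tracking_error: float, energy: float,
               w_err: float = 1.0, w_energy: float = 0.1) -> float:
    """Illustrative weighted cost for one communication cycle:
    penalize the tracking error quadratically plus a linear
    energy-consumption term."""
    return w_err * tracking_error ** 2 + w_energy * energy

# A long handoff latency enlarges the information gap, which shows up
# as a larger tracking error and hence a cost spike in that cycle.
print(cycle_cost(tracking_error=0.5, energy=2.0))  # normal cycle
print(cycle_cost(tracking_error=5.0, energy=8.0))  # cost spike
```

Under this form, a cycle with a tenfold larger tracking error costs roughly two orders of magnitude more, which matches the sharp peaks visible for the greedy and SMDP policies in Figure 10.8.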
The travel trajectories of the two trains under different policies are shown in Figures 10.9 through 10.11, where the x-axis is the position of the trains and the y-axis is the corresponding velocity. Under different policies, the current (back) train follows different travel curves. When the greedy policy or the SMDP policy is used, the train sometimes deviates from the preset running profile due to the large information gap. The handoff latency enlarges the information gap, and the current train has to slow down to keep a safe distance. Then, when the latest MA is received and the information gap is eliminated, the current train has to speed up to return to the optimal running profile. These frequent accelerations and decelerations cause much higher energy consumption. By contrast, as shown in Figure 10.11, under the cognitive control policy the current train stays very close to the optimal running profile, which means improved passenger comfort and energy savings in the proposed scheme.
We also consider the case in which the headway is 90 s, which is the standard headway used on the Beijing Yizhuang Line. From Figures 10.12 through 10.14,
Table 10.1 Availability under Different Policies

Policy               Availability (A_av)    Unavailability (1 − A_av)
Cognitive control    0.9978                 2.2 × 10^−3
SMDP                 0.9413                 5.87 × 10^−2
Greedy               0.8833                 1.167 × 10^−1