7.5.2.1 Reduced-State Bellman Equation
Instead of working on the global state space $s(k) = \{h_{B_1}(k), h_{B_2}(k), h_{B_3}(k), h_{B_4}(k), \xi(k), \varepsilon(k)\}$, we shall derive a reduced-state Bellman equation from Equation 7.24 using a partitioning of the control policy π that is based on the partial system state ε only. Specifically, we partition a unichain policy π into a collection of actions as follows:
Denition (conditional actions): Given a policy π, we dene
πε π
() {(
):=
s
sh
hhhhhhh
BB BB BB BB
=∀
(, ,,,,) ,,
,,
}
1234 1234
ξε ξ
as the collection of actions under a
given ε for all possible
hh
hh
BB BB12
34
,,
,,ξ
. e policy π is therefore equal to the
union of all the conditional actions, at is,
ππε
ε
=
()
.
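As a minimal illustration of this partitioning (assuming the full-state policy is stored as a lookup table keyed by the complete state tuple, which is a representational assumption rather than part of the chapter), the conditional action sets $\pi(\varepsilon)$ can be obtained by grouping the entries that share the same velocity error state:

```python
from collections import defaultdict

# Hypothetical full-state policy: maps a complete state tuple
# (epsilon, xi, h_B1, h_B2, h_B3, h_B4) to an action (CoMP cluster, handoff decision).
full_policy = {
    (0, 0, 1, 2, 0, 3): ("cluster_A", "stay"),
    (0, 1, 2, 2, 1, 0): ("cluster_A", "handoff"),
    (1, 0, 0, 1, 3, 2): ("cluster_B", "stay"),
}

def partition_policy(policy):
    """Group actions by the velocity error state epsilon (first tuple component),
    yielding the conditional action sets pi(epsilon)."""
    conditional = defaultdict(set)
    for state, action in policy.items():
        epsilon = state[0]
        conditional[epsilon].add(action)
    return dict(conditional)

pi_eps = partition_policy(full_policy)
# The union of all conditional action sets recovers the actions of the original policy.
print(pi_eps)
```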
Based on this partitioning, the Bellman equation in Equation 7.24 can be reduced to a Bellman equation defined on the velocity error state ε only:
$$\rho + V(\varepsilon) = \min_{\delta(\varepsilon)} \left\{ r\big(\varepsilon, \delta(\varepsilon)\big) + \sum_{\varepsilon'} \Pr\big[\varepsilon' \mid \varepsilon, \delta(\varepsilon)\big] \, V(\varepsilon') \right\} \qquad (7.25)$$
where:
$V(\varepsilon) = E[V(s) \mid \varepsilon]$ is the conditional potential function
$\delta(\varepsilon) = \pi(\varepsilon)$ is the collection of actions under a given $\varepsilon$
$r\big(\varepsilon, \delta(\varepsilon)\big) = E\big\{ r[s, \delta(s)] \mid \varepsilon \big\}$ is the conditional per-stage reward
$\Pr\big[\varepsilon' \mid \varepsilon, \delta(\varepsilon)\big] = E\big\{ \Pr[\varepsilon' \mid s, \delta(s)] \mid \varepsilon \big\}$ is the conditional average transition kernel
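As a rough illustration of how these conditional quantities could be obtained from full-state ones, one can average over the hidden part of the state. In the sketch below, the full-state cost array, the full-state kernel, and the conditional weights of the hidden configurations given ε are hypothetical placeholders, used only to make the averaging step concrete:

```python
import numpy as np

K = 5          # number of velocity error states
H = 8          # hypothetical number of joint "hidden" configurations (xi and the four CSI states)

rng = np.random.default_rng(4)
r_full = rng.uniform(0.0, 1.0, size=(K, H))      # full-state per-stage cost r(s, delta(s))
P_full = rng.dirichlet(np.ones(K), size=(K, H))  # full-state kernel Pr[eps' | (eps, hidden), delta]
w = rng.dirichlet(np.ones(H), size=K)            # conditional weights Pr[hidden | eps]

# Conditional per-stage cost r(eps, delta(eps)) = E[ r(s, delta(s)) | eps ]
r_cond = (w * r_full).sum(axis=1)                # shape (K,)

# Conditional average transition kernel Pr[eps' | eps, delta(eps)] = E[ Pr(eps' | s, delta(s)) | eps ]
P_cond = np.einsum("kh,khj->kj", w, P_full)      # shape (K, K)
```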
A solution to this equation is still very complex due to the huge dimensionality of the state space, and brute-force value iteration or policy iteration [11] has an exponential memory requirement. As a result, it is desirable to obtain an online, low-complexity solution to the problem.
7.5.2.2 Online Value Iteration Algorithm via Stochastic Approximation
In this section, we derive a low-complexity (but optimal) solution to the reduced-state Bellman equation in Equation 7.25 by proposing an online, sample-path-based iterative learning algorithm that estimates the performance potential and the optimal policy.
Dene a vector mapping
Π
: with the
i
th component mapping (1 ≤ i K) as
Π
i
i
ii
j
ji
ij
Vr Pr V() ,( )|,( )()
()
=
+
δε
ε
εδεεεδ
εε
min
(7.26)
where:
$K$ is the total number of velocity tracking error states
Because the potential is unique up to an additive constant, we could set $\Pi_i(\tilde{V}) - \tilde{V}(\varepsilon_i) = \chi$ for a reference state $\varepsilon_i$ ($1 \le i \le K$).
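To make the mapping concrete, the following minimal sketch evaluates $\Pi_i$ of Equation 7.26 for all error states at once; the reduced-state cost array `r`, the kernel `P`, and the number of candidate actions `A` are hypothetical placeholders rather than quantities given in the chapter:

```python
import numpy as np

K = 5          # number of velocity tracking error states (Table 7.2)
A = 3          # hypothetical number of candidate actions per error state

rng = np.random.default_rng(0)
r = rng.uniform(0.0, 1.0, size=(K, A))          # r(eps_i, delta): conditional per-stage cost
P = rng.dirichlet(np.ones(K), size=(K, A))      # P[i, a, j] = Pr[eps_j | eps_i, delta = a]

def Pi_mapping(V):
    """Apply Pi_i(V) = min_a [ r(eps_i, a) + sum_j P[i, a, j] V(eps_j) ] for every
    error state i, returning the mapped vector and the minimizing actions."""
    Q = r + P @ V                 # Q[i, a] = r(eps_i, a) + E[V | eps_i, a]
    return Q.min(axis=1), Q.argmin(axis=1)

V = np.zeros(K)
PiV, greedy_actions = Pi_mapping(V)
```

The greedy minimizers returned alongside the mapped vector are the conditional actions used later in the policy improvement step of Equation 7.30.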
Let $\{\varepsilon(0), \ldots, \varepsilon(k), \ldots\}$ be the sample path, that is, the corresponding realizations of the system states. To perform the online value iteration,
we divide the sample path into regenerative periods. A regenerative period is defined as the minimum interval within which each ε state is visited at least once. Let $l^d(i)$ and $\tilde{V}^d$ be the number of times that $\varepsilon_i$ is visited and the estimated performance potential in the $d$th regenerative period, respectively. Let $n_0 = 0$ and $n_{d+1} = \min\{k > n_d : \min_i l^d(i) \ge 1\}$ for $d \ge 0$. Then the sample path in the $d$th regenerative period is $\{\varepsilon(n_d), \ldots, \varepsilon(n_{d+1} - 1)\}$. At the beginning of the $d$th regenerative period ($n_d \le t \le n_{d+1} - 1$), initialize the dummy variables as $S_g^d(i) = 0$, $S_V^d(i) = 0$, and $l^d(i) = 0$. Within the $d$th regenerative period, we adopt policy $\pi_d$. After observing the velocity error state $\varepsilon(k+1)$ at the end of the $k$th slot, update the following metrics of the visited state $\varepsilon(k)$ according to
$$\left.\begin{aligned}
S_g^d(i) &= S_g^d(i) + r\big[\varepsilon(k), \pi_d(\varepsilon(k))\big] \\
S_V^d(i) &= S_V^d(i) + \tilde{V}^d\big(\varepsilon(k+1)\big) \\
l^d(i) &= l^d(i) + 1
\end{aligned}\right\} \quad \text{if } \varepsilon(k) = \varepsilon_i \qquad (7.27)$$
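A minimal sketch of this within-period bookkeeping is given below; the cost function, the current policy, and the random next-state draws are stand-ins for the real CBTC dynamics and are used only to make the fragment executable. The loop also illustrates the regenerative-period boundary: the period ends once every error state has been visited at least once.

```python
import numpy as np

K = 5                                    # number of velocity error states
rng = np.random.default_rng(1)

def per_stage_cost(eps, action):
    # Hypothetical per-stage cost r(eps, delta(eps)); the chapter's linear
    # quadratic cost would be plugged in here.
    return float(eps) + 0.1 * action

policy_d = rng.integers(0, 3, size=K)    # current policy pi_d (action index per error state)
V_d = np.zeros(K)                        # current potential estimate V~^d

S_g = np.zeros(K)                        # accumulated per-stage costs, S_g^d(i)
S_V = np.zeros(K)                        # accumulated next-state potentials, S_V^d(i)
visits = np.zeros(K, dtype=int)          # visit counts, l^d(i)

eps = rng.integers(0, K)                 # epsilon(n_d), start of the d-th period
while visits.min() < 1:                  # the period ends once every error state is visited
    nxt = rng.integers(0, K)             # observed epsilon(k+1) (stand-in for the real dynamics)
    S_g[eps] += per_stage_cost(eps, policy_d[eps])
    S_V[eps] += V_d[nxt]
    visits[eps] += 1
    eps = nxt
```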
At the end of the $d$th regenerative period, using the stochastic approximation algorithm [29], we update the estimated potential for the $(d+1)$th regenerative period as
$$\tilde{V}^{d+1}(\varepsilon_i) = \tilde{V}^d(\varepsilon_i) + \varsigma_d \, Y^d(\varepsilon_i) \qquad (7.28)$$
where:
$$Y^d(\varepsilon_i) = \frac{S_g^d(i)}{l^d(i)} - \frac{S_g^d(K)}{l^d(K)} + \frac{S_V^d(i)}{l^d(i)} - \frac{S_V^d(K)}{l^d(K)} - \tilde{V}^d(\varepsilon_i) \qquad (7.29)$$
$\varsigma_d$ is the step size of the stochastic approximation algorithm
$\varepsilon_K$ is the reference state
Without loss of generality, we take the velocity tracking error state with the highest value as the reference state. Accordingly, we update the policy for the $(d+1)$th regenerative period as
$$\pi_{d+1} = \arg\min \Pi\big(\tilde{V}^{d+1}\big) \qquad (7.30)$$
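Assuming the accumulators of Equation 7.27 are available at the end of the period, the updates of Equations 7.28 and 7.29 reduce to a few vector operations, as sketched below with randomly generated placeholder values; the policy improvement of Equation 7.30 is the greedy minimization already shown in the sketch of the mapping $\Pi$.

```python
import numpy as np

K = 5
rng = np.random.default_rng(2)

# Hypothetical accumulators collected during the d-th regenerative period via Equation 7.27.
S_g = rng.uniform(1.0, 5.0, size=K)      # accumulated per-stage costs S_g^d(i)
S_V = rng.uniform(1.0, 5.0, size=K)      # accumulated next-state potentials S_V^d(i)
visits = rng.integers(1, 10, size=K)     # visit counts l^d(i), each >= 1 within a period

V_d = rng.uniform(0.0, 1.0, size=K)      # current potential estimate V~^d
step = 0.5                               # step size varsigma_d (illustrative value)
ref = K - 1                              # reference state eps_K (largest velocity error)

target = S_g / visits + S_V / visits     # sample estimate of r(eps_i) + E[V~^d | eps_i]
Y = (target - target[ref]) - V_d         # increment Y^d(eps_i) of Equation 7.29
V_next = V_d + step * Y                  # update of Equation 7.28
```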
Therefore, we can construct an online value iteration algorithm as follows:
Step 1 (initialization): Set $k = 0$ and start the system at an initial state $\varepsilon(0)$. Set $d = 0$ and initialize the potential $\tilde{V}^0$ and the policy $\pi_0 = \arg\min \Pi(\tilde{V}^0)$ in the 0th regenerative period.
Step 2 (online potential estimation): At the beginning of the $d$th regenerative period, set $S_g^d(i) = 0$, $S_V^d(i) = 0$, and $l^d(i) = 0$, $\forall i$. Run the system with policy $\pi_d$ until slot $n_{d+1} - 1$ and accumulate the information of the visited ε from slot to slot according to Equation 7.27. At slot $n_{d+1} - 1$, update the estimated potential $\tilde{V}^{d+1}$ for the $(d+1)$th regenerative period according to Equation 7.28.
Step 3 (online policy improvement): Update the policy $\pi_{d+1}$ for the $(d+1)$th regenerative period according to Equation 7.30.
Step 4 (termination): If $\|\tilde{V}^{d+1} - \tilde{V}^d\| < \sigma_v$, stop; otherwise, set $d := d + 1$ and go to step 2.
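Putting the four steps together, a toy end-to-end version of the online value iteration might look as follows; the reduced-state cost array, transition kernel, step-size schedule, and tolerance are all illustrative assumptions, with a simulated error-state process standing in for the real system.

```python
import numpy as np

K, A = 5, 3
rng = np.random.default_rng(3)

r = rng.uniform(0.0, 1.0, size=(K, A))           # toy reduced-state cost r(eps, a)
P = rng.dirichlet(np.ones(K), size=(K, A))       # toy kernel Pr[eps' | eps, a]

def run_period(policy, V, eps):
    """Simulate one regenerative period under `policy`, returning the accumulators
    of Equation 7.27 and the state at which the next period starts."""
    S_g, S_V, visits = np.zeros(K), np.zeros(K), np.zeros(K)
    while visits.min() < 1:
        a = policy[eps]
        nxt = rng.choice(K, p=P[eps, a])
        S_g[eps] += r[eps, a]
        S_V[eps] += V[nxt]
        visits[eps] += 1
        eps = nxt
    return S_g, S_V, visits, eps

V = np.zeros(K)                                   # Step 1: initialization
policy = (r + P @ V).argmin(axis=1)
eps, tol, ref = 0, 1e-3, K - 1

for d in range(1000):
    S_g, S_V, visits, eps = run_period(policy, V, eps)   # Step 2: online potential estimation
    target = S_g / visits + S_V / visits
    Y = (target - target[ref]) - V
    V_next = V + (1.0 / (d + 1)) * Y                     # decreasing step size varsigma_d
    policy = (r + P @ V_next).argmin(axis=1)             # Step 3: online policy improvement
    if np.linalg.norm(V_next - V) < tol:                 # Step 4: termination check
        V = V_next
        break
    V = V_next
```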
7.5.3 Optimal Guidance Trajectory Calculation
An optimal CoMP cluster selection and handoff decision policy, which minimizes the linear quadratic cost function, can be derived from the algorithm presented above. Recall that the linear quadratic cost function is defined in Equation 7.4 as the sum of the tracking error and the control magnitude. Due to the handoff latency in the system, a tracking error always exists while the train travels along the guidance trajectory. To further reduce the tracking error, the second part of our scheme recalculates the guidance trajectory. The newly calculated guidance trajectory should take full account of the tracking error caused by handoff latency. The calculation approach is presented in the following.
In traditional fixed-block, track circuit-based train control systems, optimal guidance trajectory calculation has been studied for many years. The earliest and most noticeable work is from the Scheduling and Control Group in South Australia [30]. They conducted theoretical research and practical projects on optimal train travel trajectories, considering energy saving and trip time. The results show that the optimal guidance trajectory can be divided into four phases: traction, speed holding, coasting, and braking, as illustrated in Figure 7.5. The shift and velocity of the four phases are $S_1, S_2, S_3, S_4$ and $v_1, v_2, v_3, v_4$, respectively. Based on that, many researchers have investigated the optimal travel trajectory with the aim of finding the switch points [30]. However, most existing works focus on traditional fixed-block, track circuit-based train control systems, where the train travel trajectory is not affected by the trains in front.
Figure 7.5 Optimal train guidance trajectory (velocity versus shift, with the traction, speed holding, coasting, and braking phases delimited by $S_1, S_2, S_3, S_4$ and $v_1, v_2, v_3$).
In this chapter, we propose an optimal guidance trajectory calculation for CBTC systems with CoMP. The scheme takes full account of the impact of the preceding train and is described as follows:
Step 1: Initialize the basic parameters: train mass $M$, train traction force $F_{\mathrm{trac}}$, train braking force $F_{\mathrm{brak}}$, trip time $T_{\mathrm{trip}}$, and trip distance $S_{\mathrm{trip}}$. Set the current train velocity to $v_1 = 0$, the current train position to $S_{\mathrm{cur}} = 0$, and the current traveled time to $T_{\mathrm{cur}} = 0$.
Step 2: Compute the remaining trip time, $T_{\mathrm{remaintrip}} = T_{\mathrm{trip}} - T_{\mathrm{cur}}$, and the remaining trip distance, $S_{\mathrm{remaintrip}} = S_{\mathrm{trip}} - S_{\mathrm{cur}}$. With $T_{\mathrm{remaintrip}}$, $v_1$, $S_{\mathrm{remaintrip}}$, and the basic parameters, the optimal guidance trajectory can be calculated as follows:
As shown in Figure 7.5, the optimal guidance trajectory can be divided into four phases: traction, velocity holding, coasting, and braking. Therefore, given the known traction force and braking force, the optimal guidance trajectory is determined once the holding velocity $v_2$ and the velocity $v_3$ that marks the end of the coasting phase are obtained. With this objective, the optimization problem can be formulated as
$$\begin{aligned}
\min_{v_2, v_3} \quad & f = \frac{1}{2} M \left( v_2^2 - v_1^2 \right) + F_{\mathrm{fric}} \cdot S_2 \\
\text{s.t.} \quad & v_1 \le v_2, \quad v_3 \le v_2 \\
& \frac{v_2^2/2 - v_1^2/2}{(F_{\mathrm{trac}} - F_{\mathrm{fric}})/M} + S_2 + \frac{v_2^2/2 - v_3^2/2}{F_{\mathrm{fric}}/M} + \frac{v_3^2/2}{F_{\mathrm{brak}}/M} = S_{\mathrm{remaintrip}} \\
& \frac{v_2 - v_1}{(F_{\mathrm{trac}} - F_{\mathrm{fric}})/M} + \frac{S_2}{v_2} + \frac{v_2 - v_3}{F_{\mathrm{fric}}/M} + \frac{v_3}{F_{\mathrm{brak}}/M} = T_{\mathrm{remaintrip}}
\end{aligned} \qquad (7.31)$$
In this formulation, $f$ is the energy consumed to accelerate the train from $v_1$ to $v_2$ and to keep it traveling at speed $v_2$ for a distance of $S_2$. The second constraint is the trip distance constraint: its four parts represent the distances traveled in the traction, velocity holding, coasting, and braking phases, respectively, and their sum must equal the remaining trip distance. The third constraint is the trip time constraint: its four parts represent the times spent in the traction, velocity holding, coasting, and braking phases, respectively, and their sum must equal the remaining trip time.
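Because only two decision variables ($v_2$ and $v_3$) remain once $S_2$ is eliminated through the distance constraint, one simple way to solve Equation 7.31 numerically is a grid search: for each candidate pair, recover $S_2$ from the distance constraint, discard pairs that violate the time constraint or give a negative $S_2$, and keep the pair with the lowest energy $f$. The sketch below follows this idea; all numerical parameter values and the 1 s time tolerance are illustrative assumptions, not values from the chapter.

```python
import numpy as np

# Illustrative parameters (not taken from the chapter's simulation setup).
M = 50000.0                                          # train mass [kg]
F_trac, F_brak, F_fric = 40000.0, 45000.0, 3000.0    # traction, braking, resistance forces [N]
v1 = 5.0                                             # current velocity [m/s]
S_rem, T_rem = 4000.0, 220.0                         # remaining distance [m] and time [s]

a_trac = (F_trac - F_fric) / M     # net acceleration while applying traction
a_coast = F_fric / M               # deceleration while coasting
a_brak = F_brak / M                # deceleration while braking

best = None
for v2 in np.arange(v1 + 0.1, 40.0, 0.1):            # candidate holding velocities
    for v3 in np.arange(0.1, v2, 0.1):               # candidate end-of-coasting velocities
        d_trac = (v2**2 - v1**2) / (2 * a_trac)
        d_coast = (v2**2 - v3**2) / (2 * a_coast)
        d_brak = v3**2 / (2 * a_brak)
        S2 = S_rem - d_trac - d_coast - d_brak       # holding distance from the distance constraint
        if S2 < 0:
            continue
        t_total = ((v2 - v1) / a_trac + S2 / v2
                   + (v2 - v3) / a_coast + v3 / a_brak)
        if abs(t_total - T_rem) > 1.0:               # enforce the trip-time constraint (1 s tolerance)
            continue
        f = 0.5 * M * (v2**2 - v1**2) + F_fric * S2  # energy objective of Equation 7.31
        if best is None or f < best[0]:
            best = (f, v2, v3, S2)

# `best` now holds the (approximately) optimal holding velocity v2 and coasting end velocity v3.
```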
Step 3: If $S_{\mathrm{remaintrip}} = 0$, go to step 4. Otherwise, compare the current train velocity $v_1$ with the reference train velocity $v_{\mathrm{ref}}$ on the calculated guidance trajectory. If $|v_1 - v_{\mathrm{ref}}| \ge \Delta v$, where $\Delta v$ is the preset velocity error threshold, go to step 2.
Otherwise,update the train velocity
v
1
and train position
S
cu
r
according to
the train control model described in Section 7.3.1.
Step 4: e train stops at the destination. Wait until the train starts, and then
go to Step 1.
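A structural sketch of this recalculation loop is shown below; `solve_guidance` and `advance_train` are hypothetical placeholders for the optimization of Equation 7.31 and for the train control model of Section 7.3.1, and the numerical values are illustrative only.

```python
# Structural sketch of the online guidance-trajectory recalculation loop (steps 1 to 3).

def solve_guidance(v1, s_remaining, t_remaining):
    # Placeholder: would return the holding/coasting velocities and a reference profile
    # by solving Equation 7.31 for the remaining distance and time.
    return {"v_ref": max(v1, 10.0)}

def advance_train(v1, s_cur, t_cur):
    # Placeholder: one control period of the train dynamics (Section 7.3.1).
    return v1 + 0.2, s_cur + v1 * 0.1 + 0.01, t_cur + 0.1

S_trip, T_trip = 4000.0, 220.0      # illustrative trip distance [m] and trip time [s]
dv_threshold = 1.0                  # preset velocity error threshold

v1, s_cur, t_cur = 0.0, 0.0, 0.0                                # step 1: initialization
guidance = solve_guidance(v1, S_trip - s_cur, T_trip - t_cur)   # step 2: initial trajectory

while S_trip - s_cur > 0.0:                          # step 3: run until the destination
    if abs(v1 - guidance["v_ref"]) >= dv_threshold:  # tracking error too large:
        # back to step 2: recalculate the guidance trajectory
        guidance = solve_guidance(v1, S_trip - s_cur, T_trip - t_cur)
    v1, s_cur, t_cur = advance_train(v1, s_cur, t_cur)
# step 4: the train has reached the destination; wait for the next departure.
```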
7.6 Simulation Results and Discussions
In this section, simulation results are presented to illustrate the performance of our proposed system (Table 7.2). The system optimization process can be divided into an off-line part and an online part. For the off-line part, the mathematical models described earlier are used to derive the optimal policy for CoMP cluster selection and handoff decisions. The calculation of the optimal policy using the value iteration algorithm is performed off-line in the C language. Once the optimal policy is obtained, it is stored in a table format. Each entry of the table specifies the optimal action (CoMP cluster selection and handoff) for a given current state (i.e., channel state and currently used cluster).
For the online part, computer simulation based on NS2.29 is used. Specifically, at each decision epoch, each MT on the train looks up the policy table to find the optimal action corresponding to its current state and then executes that action. Such online table lookups can be performed with little computational complexity in practice.
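As a small illustration of this online part (with a purely hypothetical table and state encoding), the per-epoch work reduces to a constant-time dictionary lookup:

```python
# Hypothetical policy table produced off-line: maps a quantized state
# (channel state index, currently used CoMP cluster) to the optimal action
# (CoMP cluster to use, handoff decision).
policy_table = {
    (0, "cluster_A"): ("cluster_A", "stay"),
    (1, "cluster_A"): ("cluster_B", "handoff"),
    (2, "cluster_B"): ("cluster_B", "stay"),
}

def decide(channel_state, current_cluster):
    """One decision epoch: constant-time lookup of the pre-computed optimal action."""
    return policy_table[(channel_state, current_cluster)]

cluster, handoff = decide(1, "cluster_A")
```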
In the simulations using NS2, we consider a scenario with three moving nodes (MTs on the trains) traveling between two stations, A and B. The wayside nodes (base stations) are deployed along the railway line with an average spacing of 600 m between two successive base stations. These three moving nodes depart from station A successively with a constant interval and stop at station B. In the moving
Table 7.2 Simulation Parameters

Notation    Definition                         Value
M           Train mass                         50,000 kg
T_ex        Extra delay caused by handoff      9 μs
P_o         Transmission power                 100 mW
L_ack       ACK packet size                    20 bytes
L_packet    Data packet size                   200 bytes
L           Number of channel states           4
K           Number of tracking error states    5
T           Communication period               100 ms