an overview of SMDP modeling. Then, the states, actions, reward functions, state
transition probabilities, and constraints are presented.
A Markov decision process (MDP) provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. Beyond these basic features, an SMDP generalizes an MDP by allowing the decision maker to choose actions whenever the system state changes and by allowing the time spent in a particular state to follow an arbitrary probability distribution.
In our proposed CBTC system with CoMP, the MT on the train makes handoff and CoMP cluster selection decisions at specific time instants according to the current state $s(t)$, and the system moves into a new state based on the current state $s(t)$ as well as the chosen decision $a(t)$. Given $s(t)$ and $a(t)$, the next state is conditionally independent of all previous states and actions. This Markov property of the state transition process makes it possible to model the CoMP cluster selection and handoff problem as an SMDP.
An SMDP model consists of the following six elements: (1) decision epochs, (2) actions, (3) states, (4) reward functions, (5) state transition probabilities, and (6) constraints, which will be described in Sections 7.5.1.1 through 7.5.1.6.
7.5.1.1 Decision Epochs
The MT on a train has to make a decision whenever a certain time period has elapsed. These time instants are called decision epochs, and we denote them by $k$.
7.5.1.2 Actions
In our SMDP model, at each decision epoch, the MT on the train first has to decide whether or not to perform a handoff action. Afterward, the CoMP cluster should be determined. As shown in Figure 7.3, we assume the MT on the train will not be in the coverage of more than four successive base stations, and we denote them as $B_1$, $B_2$, $B_3$, and $B_4$. As a result, there are five potential clusters in the system. They are $(B_1, B_2)$, $(B_2)$, $(B_2, B_3)$, $(B_3)$, and $(B_3, B_4)$, and we denote them as $C_1$, $C_2$, $C_3$, $C_4$, and $C_5$, respectively.
In order to achieve the optimal performance, all of the base stations in the network should cooperate with each other in each transmission or reception process. However, the complexity this introduces may not be acceptable in real systems. Therefore, standards have specified the maximal number of base stations that may cooperate with each other [24]. In CBTC systems, the base stations are linearly deployed. For ease of presentation, we include at most two base stations in a cluster, because a third base station would be too far away from the mobile station, and the performance gain brought by adding a base station to the cluster may not be worth the complexity cost in real systems. We also need to point out that our model is not limited to a cluster of only one or two base stations. It can easily be extended to incorporate more than two base stations in a cluster.
ese ve clusters correspond to ve dierent actions
ak()
that can be performed
at each decision epoch. erefore, the action space can be described as
ACCCCC
=
{1,2,3,4
,5
} (7.16)
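As a concrete illustration of Equation 7.16, the cluster-indexed action space can be encoded directly from the base-station pairs defined above. The following Python sketch is ours; the identifiers (`CLUSTERS`, `ACTIONS`) are illustrative names, not part of the original model.

```python
# Minimal sketch of the action space in Equation 7.16.
# Each action selects one CoMP cluster; cluster membership follows the
# five clusters (B1,B2), (B2), (B2,B3), (B3), (B3,B4) defined in the text.
CLUSTERS = {
    "C1": ("B1", "B2"),
    "C2": ("B2",),
    "C3": ("B2", "B3"),
    "C4": ("B3",),
    "C5": ("B3", "B4"),
}

# Action space A = {C1, ..., C5}: choosing action Ci means using cluster Ci.
ACTIONS = list(CLUSTERS.keys())

if __name__ == "__main__":
    print("Action space A =", ACTIONS)
    for name, members in CLUSTERS.items():
        print(f"{name}: cooperating base stations {members}")
```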
7.5.1.3 States
The composite state $s(k) \in S$ is given as

$$
s(k) = \{h_{B_1}(k), h_{B_2}(k), h_{B_3}(k), h_{B_4}(k), \xi(k), \varepsilon(k)\} \quad (7.17)
$$
where:
$h_{B_1}(k)$, $h_{B_2}(k)$, $h_{B_3}(k)$, and $h_{B_4}(k)$ are the channel gains from the four successive base stations to the train MT, respectively
$\xi(k)$ is the currently used cluster
$\varepsilon(k)$ is the velocity tracking error, which is the error between the current train velocity and the reference velocity on the guidance trajectory
$h_{B_1}(k), h_{B_2}(k), h_{B_3}(k), h_{B_4}(k) \in \{\gamma_1, \gamma_2, \ldots, \gamma_L\}$, where $L$ is the number of states in the channel model. $\xi(k) \in \{C_1, C_2, C_3, C_4, C_5\}$, because the currently used cluster is completely determined by the current action.
The velocity tracking error $\varepsilon$ is obtained by comparing the current train velocity with the destination velocity, and the destination velocity is calculated based on the distance between the two trains. In order to make the SMDP solvable, $\varepsilon(k)$ is discretized, and $\varepsilon(k) \in \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K\}$, where $K$ is the total number of velocity tracking error states.
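To make the composite state of Equation 7.17 concrete, the following sketch enumerates a discretized state space. The small values ($L = 2$ channel states, $K = 3$ error levels) and all identifiers are placeholders of ours, not values from the chapter.

```python
from dataclasses import dataclass
from itertools import product

# Illustrative discretization (placeholder values, not from the chapter):
CHANNEL_STATES = ["gamma1", "gamma2"]          # L = 2 channel states per link
CLUSTERS = ["C1", "C2", "C3", "C4", "C5"]      # currently used cluster xi(k)
ERROR_STATES = ["eps1", "eps2", "eps3"]        # K = 3 velocity tracking errors

@dataclass(frozen=True)
class State:
    """Composite SMDP state s(k) of Equation 7.17."""
    h_b1: str      # channel gain state, base station B1 -> MT
    h_b2: str      # channel gain state, base station B2 -> MT
    h_b3: str      # channel gain state, base station B3 -> MT
    h_b4: str      # channel gain state, base station B4 -> MT
    cluster: str   # xi(k), currently used CoMP cluster
    error: str     # eps(k), discretized velocity tracking error

def enumerate_states():
    """Enumerate the full discretized state space S."""
    return [State(*combo) for combo in
            product(CHANNEL_STATES, CHANNEL_STATES, CHANNEL_STATES,
                    CHANNEL_STATES, CLUSTERS, ERROR_STATES)]

if __name__ == "__main__":
    states = enumerate_states()
    print("Number of states |S| =", len(states))   # L^4 * 5 * K = 240 here
```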
7.5.1.4 Reward Function
The reward function reflects the reward that can be obtained under a certain state and action. With the objective of minimizing the linear quadratic cost function shown in Equation 7.4, we define the reward function as the reciprocal of the sum of the tracking error and the control magnitude. The reward function is closely related to the communication latency in the CBTC system. We need to point out that our proposed CBTC system with CoMP is not limited to a certain type of wireless technology. Once the communication latency is obtained, it can also be used in CBTC systems with dedicated short-range communications [28], LTE, or WiMAX.
When $a(k) = \xi(k)$, which means that the currently used cluster does not change and no handoff happens, the reward under this circumstance is defined as
$$
r\big[h_{B_1}(k), h_{B_2}(k), h_{B_3}(k), h_{B_4}(k), \xi(k), \varepsilon(k), a(k)\big] =
\begin{cases}
1\Big/\Big\{Q\big[\varepsilon(k) + \tfrac{u}{M}T + \alpha T_{afl}\big]^2 + R u^2\Big\}, & \text{if } T_l < T \\[6pt]
1\Big/\Big\{Q\big[\varepsilon(k) + \alpha T + \alpha T_{afl}\big]^2 + R u^2\Big\}, & \text{if } T_l > T
\end{cases}
\quad (7.18)
$$
where:
$u$ is the control command from the ATO subsystem to bring the train to the optimized guidance trajectory
$T$ is the communication period
$M$ is the train mass
$\alpha$ is the deceleration that defines the ATP service braking curve
$T_l$ is the current train–ground communication latency, which can be calculated with the current channel states $h_{B_1}(k), h_{B_2}(k), h_{B_3}(k), h_{B_4}(k)$ according to the communication latency model
When $T_l > T$, the current communication latency is greater than the communication period and the MA cannot be updated in this decision epoch. Therefore, the velocity tracking error is increased by $\alpha \cdot T$ due to a communication interruption of $T$. For any train that is traveling behind, when the front train is in a long-latency/interruption state, it cannot receive the updated MA. This is because the train in front cannot report its new position to the ZC. Given the average communication latency from the front train to the ground, $T_{afl}$, the velocity tracking error increased by the front-train latency is $\alpha \cdot T_{afl}$.
When $a(k) \neq \xi(k)$, which means that a base station joins or leaves the currently used cluster and a handoff happens, the communication latency between the train and the ground will be increased by $T_l + T_{ex}$, where $T_{ex}$ is due to the extra signal exchange. The reward under this circumstance is defined as
$$
r\big[h_{B_1}(k), h_{B_2}(k), h_{B_3}(k), h_{B_4}(k), \xi(k), \varepsilon(k), a(k)\big] =
1\Big/\Big\{Q\big[\varepsilon(k) + \alpha \,(T_{ex} + T_l + T_{afl})\big]^2 + R u^2\Big\}
\quad (7.19)
$$
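The branching logic of Equations 7.18 and 7.19 can be sketched as a single function. The weights $Q$ and $R$, the latency values, and the helper name `reward` are our own illustrative choices; the error terms follow the reconstruction of Equation 7.18 given above and should be checked against Equation 7.4 of the chapter.

```python
def reward(eps, u, T_l, handoff, *, Q=1.0, R=0.1, T=0.1, M=3.0e5,
           alpha=1.0, T_afl=0.05, T_ex=0.02):
    """Per-stage reward (sketch of Equations 7.18 and 7.19).

    eps     : current velocity tracking error eps(k)
    u       : ATO control command
    T_l     : current train-ground communication latency [s]
    handoff : True if a(k) != xi(k) (a base station joins/leaves the cluster)
    The keyword arguments are illustrative constants, not chapter values.
    """
    if handoff:
        # Handoff case, Equation 7.19: latency grows by the extra exchange T_ex.
        err = eps + alpha * (T_ex + T_l + T_afl)
    elif T_l < T:
        # MA updated in time: error driven by the applied control and the
        # front-train latency (Equation 7.18, first branch).
        err = eps + (u / M) * T + alpha * T_afl
    else:
        # Latency exceeds the communication period: MA not updated, error grows
        # by alpha*T plus the front-train term (Equation 7.18, second branch).
        err = eps + alpha * T + alpha * T_afl
    return 1.0 / (Q * err ** 2 + R * u ** 2)

if __name__ == "__main__":
    print("no handoff, low latency :", reward(0.5, 2.0e4, 0.05, handoff=False))
    print("no handoff, high latency:", reward(0.5, 2.0e4, 0.30, handoff=False))
    print("handoff                 :", reward(0.5, 2.0e4, 0.05, handoff=True))
```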
7.5.1.5 State Transition Probability
Given the current state, $s(k) = \{h_{B_1}(k), h_{B_2}(k), h_{B_3}(k), h_{B_4}(k), \xi(k), \varepsilon(k)\}$, and the chosen action, $a(k)$, the probability function of the next state, $s(k+1) = \{h_{B_1}(k+1), h_{B_2}(k+1), h_{B_3}(k+1), h_{B_4}(k+1), \xi(k+1), \varepsilon(k+1)\}$, is given by $P[s(k+1) \mid s(k), a(k)]$. Here, for simplicity of formulation and presentation, we assume that the wireless channels, the currently used cluster, and the velocity tracking error are independent. This assumption is reasonable in practice, because the wireless channels for different links are independent, and the currently used cluster is solely determined by the last action. Moreover, the velocity tracking error is dependent on the train dynamic model, which makes it reasonable to assume that it is independent of the other components in the state. Therefore, we have
$$
\begin{aligned}
P[s(k+1) \mid s(k), a(k)] ={} & P[h_{B_1}(k+1) \mid h_{B_1}(k)] \cdot P[h_{B_2}(k+1) \mid h_{B_2}(k)] \\
& \cdot P[h_{B_3}(k+1) \mid h_{B_3}(k)] \cdot P[h_{B_4}(k+1) \mid h_{B_4}(k)] \\
& \cdot P[\xi(k+1) \mid \xi(k), a(k)] \cdot P[\varepsilon(k+1) \mid \varepsilon(k), a(k)]
\end{aligned}
\quad (7.20)
$$
where:
$P[h_{B_1}(k+1) \mid h_{B_1}(k)]$, $P[h_{B_2}(k+1) \mid h_{B_2}(k)]$, $P[h_{B_3}(k+1) \mid h_{B_3}(k)]$, and $P[h_{B_4}(k+1) \mid h_{B_4}(k)]$ are the channel state transition probabilities for the different wireless links, respectively
$P[\xi(k+1) \mid \xi(k), a(k)]$ is the currently used cluster transition probability
$P[\varepsilon(k+1) \mid \varepsilon(k), a(k)]$ is the velocity tracking error transition probability
The channel state transition probabilities can be obtained from real field test data, as described in Section 7.3.2. The other state transition probabilities are derived in the following.
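As an illustration of how such channel probabilities can be extracted from measured data, the following sketch counts state-to-state transitions in a recorded sequence of discretized channel states and normalizes each row. The trace and the function name are illustrative and not drawn from Section 7.3.2.

```python
import numpy as np

def estimate_channel_transitions(state_sequence, num_states):
    """Estimate a channel state transition matrix P[h(k+1) | h(k)]
    by counting transitions in a measured sequence of discretized
    channel-state indices (0 .. num_states-1) and normalizing rows."""
    counts = np.zeros((num_states, num_states))
    for cur, nxt in zip(state_sequence[:-1], state_sequence[1:]):
        counts[cur, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for states never visited in the trace.
    row_sums[row_sums == 0] = 1.0
    return counts / row_sums

if __name__ == "__main__":
    # Toy measurement trace with L = 3 channel states (illustrative only).
    trace = [0, 0, 1, 2, 1, 1, 0, 2, 2, 1, 0]
    print(estimate_channel_transitions(trace, num_states=3))
```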
First, we derive the transition probability for the currently used CoMP cluster. Because the next used cluster is determined by the chosen action, the currently used cluster transition probability can be simply derived as
$$
P[\xi(k+1) \mid \xi(k), a(k)] =
\begin{cases}
0, & \text{if } a(k) \neq \xi(k+1) \\
1, & \text{if } a(k) = \xi(k+1)
\end{cases}
\quad (7.21)
$$
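Equation 7.21 is simply an indicator that the next cluster equals the chosen action; a minimal sketch (identifiers ours):

```python
def cluster_transition_prob(next_cluster, action):
    """P[xi(k+1) | xi(k), a(k)] from Equation 7.21: the next cluster is the
    chosen action with probability 1, and any other cluster with probability 0.
    Note that the probability does not depend on the current cluster xi(k)."""
    return 1.0 if next_cluster == action else 0.0

if __name__ == "__main__":
    print(cluster_transition_prob("C3", "C3"))  # 1.0
    print(cluster_transition_prob("C2", "C3"))  # 0.0
```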
Second, we derive the transition probability for the velocity tracking error. The velocity tracking error depends on the control command at every decision epoch and on the handoff action. Given a control command from the ATO subsystem to bring the train to the optimized guidance trajectory, the velocity tracking error transition probability is derived as
$$
P[\varepsilon(k+1) \mid \varepsilon(k), a(k)] =
\begin{cases}
1, & \text{if } a(k) = \xi(k),\; T_l > T,\; \text{and } \varepsilon(k+1) = \varepsilon(k) + \alpha T + \alpha T_{afl} \\[4pt]
1, & \text{if } a(k) = \xi(k),\; T_l < T,\; \text{and } \varepsilon(k+1) = \varepsilon(k) + \tfrac{u}{M} T + \alpha T_{afl} \\[4pt]
1, & \text{if } a(k) \neq \xi(k)\; \text{and } \varepsilon(k+1) = \varepsilon(k) + \alpha (T_{ex} + T_l + T_{afl}) \\[4pt]
0, & \text{otherwise}
\end{cases}
\quad (7.22)
$$
where:
$u$ is the control command
$T$ is the communication period
$M$ is the train mass
$T_l$ is the current train–ground communication latency
$T_{afl}$ is the average communication latency from the front train to the ground
$\alpha$ is the deceleration that defines the ATP service braking curve
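Putting Equation 7.22 into code form, the sketch below computes the deterministic successor error and the corresponding transition probability. The quantization to the nearest discretized level and all numerical constants are our own assumptions for illustration only.

```python
def next_error(eps, u, T_l, handoff, *, T=0.1, M=3.0e5, alpha=1.0,
               T_afl=0.05, T_ex=0.02):
    """Deterministic next velocity tracking error implied by Equation 7.22.
    The transition probability is 1 for this successor and 0 otherwise."""
    if handoff:                       # a(k) != xi(k): handoff latency T_l + T_ex
        return eps + alpha * (T_ex + T_l + T_afl)
    if T_l > T:                       # MA not updated within the period
        return eps + alpha * T + alpha * T_afl
    return eps + (u / M) * T + alpha * T_afl   # MA updated, control applied

def error_transition_prob(eps_next, eps, u, T_l, handoff, levels, **kwargs):
    """P[eps(k+1) | eps(k), a(k)]: probability mass on the discretized level
    closest to the deterministic successor (quantization is our assumption)."""
    target = next_error(eps, u, T_l, handoff, **kwargs)
    closest = min(levels, key=lambda lvl: abs(lvl - target))
    return 1.0 if eps_next == closest else 0.0

if __name__ == "__main__":
    levels = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]    # discretized error states (toy)
    print(next_error(0.2, 2.0e4, 0.3, handoff=False))
    print(error_transition_prob(0.3, 0.2, 2.0e4, 0.3, False, levels))
```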
7.5.1.6 Constraints
As described in Section 7.5.1.2, the handoff action in the system is performed by switching between different clusters. In order to avoid the "break-before-make" situation and realize soft handoff, the action $a(k)$ is constrained in certain states: it can only switch between neighboring clusters. Mathematically, the constraints are defined as
$$
a(k) \neq C_j,
\begin{cases}
\text{if } \xi(k) = C_i \text{ and } j < i - 1 \\
\text{if } \xi(k) = C_i \text{ and } j > i + 1
\end{cases}
\quad (7.23)
$$
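Under the adjacency constraint of Equation 7.23, the feasible actions in a given state are the current cluster and its immediate neighbors. A small sketch (identifiers ours):

```python
CLUSTER_NAMES = ["C1", "C2", "C3", "C4", "C5"]

def feasible_actions(current_cluster):
    """Actions allowed by Equation 7.23: from cluster C_i, the MT may only
    keep C_i or switch to the neighboring clusters C_{i-1} and C_{i+1},
    which avoids break-before-make and realizes soft handoff."""
    i = CLUSTER_NAMES.index(current_cluster)
    return [CLUSTER_NAMES[j]
            for j in range(max(0, i - 1), min(len(CLUSTER_NAMES), i + 2))]

if __name__ == "__main__":
    print(feasible_actions("C1"))  # ['C1', 'C2']
    print(feasible_actions("C3"))  # ['C2', 'C3', 'C4']
```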
7.5.2 Solutions to SMDP-Based CoMP Cluster Selection and Handoff Decision Scheme
A decision rule prescribes a procedure for action selection in each state at a specified decision epoch. Markov decision rules are functions $\delta(k): S \to A$, which specify the action choice $a(k)$ when the system occupies state $s(k)$ at decision epoch $k$. A policy $\pi = (\delta(1), \delta(2), \ldots, \delta(k))$ is a sequence of decision rules to be used at all decision epochs.
A stationary control policy $\pi$ induces a joint distribution for the random process $\{s(k)\}$. The optimal policy for the SMDP can be obtained by solving the Bellman equation [11] recursively as
$$
\rho + V(s) = \min_{\delta(s)} \left\{ r\big(s, \delta(s)\big) + \sum_{s'} P\big[s' \mid s, \delta(s)\big] V(s') \right\}
\quad (7.24)
$$
where:
$\delta(s)$ is the cluster selection action taken in state $s$
$r(s, \delta(s))$, given by Equations 7.18 and 7.19, is the per-stage reward when the current state is $s$ and action $\delta(s)$ is taken
If there is a pair $(\rho, V(s))$ satisfying Equation 7.24, then $\rho$ is the optimal average reward per stage, and the optimizing policy is given by $\pi^*(s) = \delta^*(s)$, where $\delta^*(s)$ is the optimizing action of Equation 7.24 at state $s$.
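The chapter does not prescribe a particular numerical procedure at this point; one standard option for solving an average-reward Bellman equation of the form of Equation 7.24 is relative value iteration. The sketch below applies it to a toy problem and keeps the min-oriented form written in Equation 7.24; all names and numbers are illustrative, not the authors' implementation.

```python
import numpy as np

def relative_value_iteration(P, r, tol=1e-9, max_iter=10_000):
    """Generic relative value iteration for a Bellman equation of the form
    rho + V(s) = min_a { r(s, a) + sum_s' P[s' | s, a] V(s') } (Equation 7.24).

    P : array of shape (num_actions, num_states, num_states),
        P[a, s, s'] = transition probability to s' given state s and action a
    r : array of shape (num_actions, num_states), per-stage reward r(s, a)
    Returns (rho, V, policy), where rho approaches the optimal average
    reward per stage as the iteration converges.
    """
    num_actions, num_states, _ = P.shape
    V = np.zeros(num_states)
    for _ in range(max_iter):
        # Q[a, s] = r(s, a) + sum_{s'} P[s' | s, a] * V(s')
        Q = r + P @ V
        V_new = Q.min(axis=0)
        # Normalize by a reference state to keep the relative values bounded.
        rho = V_new[0]
        V_new = V_new - rho
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = (r + P @ V).argmin(axis=0)
    return rho, V, policy

if __name__ == "__main__":
    # Toy 2-state, 2-action problem (illustrative numbers only).
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.6, 0.4]]])
    r = np.array([[1.0, 2.0],
                  [1.5, 0.5]])
    rho, V, policy = relative_value_iteration(P, r)
    print("average reward per stage rho =", rho)
    print("relative values V =", V, "policy =", policy)
```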