130 ◾ Advances in Communications-Based Train Control Systems
an overview of SMDP modeling. Then, the states, actions, reward functions, state
transition probabilities, and constraints are presented.
A Markov decision process (MDP) provides a mathematical framework for modeling
decision making in situations where outcomes are partly random and partly
under the control of a decision maker. Beyond these basic features, an SMDP
generalizes an MDP by allowing the decision maker to choose actions whenever the
system state changes and by allowing the time spent in a particular state to follow
an arbitrary probability distribution.
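The distinction can be illustrated with a toy simulation. The sketch below is not taken from the CBTC model: the two states, the two actions, the transition probabilities, and the sojourn-time distributions are all hypothetical values chosen for illustration. It only shows the defining SMDP feature, namely that the time spent in a state may follow any distribution while the next state still depends only on the current state and action.

```python
import random

# Toy two-state example; every name and number here is an illustrative assumption.
TRANSITIONS = {
    ("good", "stay"): {"good": 0.9, "bad": 0.1},
    ("good", "switch"): {"good": 0.6, "bad": 0.4},
    ("bad", "stay"): {"good": 0.2, "bad": 0.8},
    ("bad", "switch"): {"good": 0.7, "bad": 0.3},
}


def sojourn_time(state, rng):
    """Time spent in a state; an SMDP permits an arbitrary distribution here."""
    if state == "good":
        return rng.uniform(0.5, 1.5)
    return rng.weibullvariate(1.0, 2.0)


def policy(state):
    """A fixed, hand-written policy (not an optimized one)."""
    return "stay" if state == "good" else "switch"


def simulate(start, steps, seed=0):
    """Return the visited (epoch time, state, action) triples and the total elapsed time."""
    rng = random.Random(seed)
    state, clock, trajectory = start, 0.0, []
    for _ in range(steps):
        action = policy(state)
        trajectory.append((clock, state, action))
        clock += sojourn_time(state, rng)
        distribution = TRANSITIONS[(state, action)]
        # The next state depends only on the current state and action.
        state = rng.choices(list(distribution), weights=list(distribution.values()))[0]
    return trajectory, clock
```

Replacing `sojourn_time` with a constant would reduce the sketch to an ordinary discrete-time MDP simulation.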
In our proposed CBTC systems with CoMP, the MT on the train makes handoff
and CoMP cluster selection decisions at specific time instances according to the
current state, and the system moves into a new state based on the current state
as well as the chosen decision. Given the current state and the chosen decision,
the next state is conditionally independent of all previous states and actions.
This Markov property of the state transition process makes it possible to model
the CoMP cluster selection and handoff problem as an SMDP.
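The Markov property can be written compactly. The notation here is an assumption introduced for illustration and is not taken from the source: $s_k$ denotes the state and $a_k$ the chosen decision at the $k$-th decision epoch.

```latex
\Pr\left(s_{k+1} \mid s_k, a_k, s_{k-1}, a_{k-1}, \ldots, s_0, a_0\right)
  = \Pr\left(s_{k+1} \mid s_k, a_k\right)
```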
An SMDP model consists of the following six elements: (1) decision epochs,
(2) actions, (3) states, (4) reward functions, (5) state transition probabilities, and
(6) constraints, which will be described in Sections 7.5.1.1 through 7.5.1.6.
7.5.1.1 Decision Epochs
The MT on a train has to make a decision whenever a certain time period has
elapsed. These time instants are called decision epochs.
7.5.1.2 Actions
In our SMDP model, at each decision epoch, the MT on the train first has to decide
whether or not to perform a handoff action. Afterward, the CoMP cluster should be
determined. As shown in Figure 7.3, we assume the MT on the train will not be
in the coverage of more than four successive base stations, and we denote them as
$B_1$, $B_2$, $B_3$, and $B_4$. As a result, there are five potential clusters in the
system. They are $(B_1, B_2)$, $(B_2)$, $(B_2, B_3)$, $(B_3)$, and $(B_3, B_4)$, and we
denote them as 1, 2, 3, 4, and 5, respectively.
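The cluster list and the resulting action space can be enumerated with a short sketch. It assumes that the five clusters are the adjacent pairs of the four base stations interleaved with the two interior single stations, which is consistent with the legible part of the list above, and that an action pairs a handoff decision with a cluster index. The names `BASE_STATIONS`, `candidate_clusters`, `CLUSTERS`, and `ACTIONS` are hypothetical and do not come from the source.

```python
from itertools import product

# Hypothetical integer labels for the four successive base stations.
BASE_STATIONS = [1, 2, 3, 4]


def candidate_clusters(stations):
    """List clusters in track order: each adjacent pair, followed by the
    interior station it shares with the next pair (assumed structure)."""
    clusters = []
    for left, right in zip(stations, stations[1:]):
        clusters.append((left, right))
        if right != stations[-1]:
            clusters.append((right,))
    return clusters


# Cluster index (1..5) mapped to the base stations in that cluster.
CLUSTERS = dict(enumerate(candidate_clusters(BASE_STATIONS), start=1))

# Assumed action encoding: (perform handoff?, cluster index).
ACTIONS = list(product((False, True), CLUSTERS))
```

Under these assumptions the action space has ten elements: two handoff choices for each of the five clusters. Whether every combination is feasible in a given state is a question for the constraints of the model, not for this enumeration.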
In order to achieve the optimal performance, all of the base stations in the network
should cooperate with each other in each transmission or reception process. However,
the introduced complexity may not be acceptable in real systems. Therefore, standards
have specified the maximal number of base stations that may cooperate with each other
[24]. In CBTC systems, the base stations are linearly deployed. For ease of presentation,
we choose only two base stations in the cluster, because a third base station is too far
away from the mobile station, and the performance gain brought by adding a base
station into the cluster may not be worth the complexity cost in real systems. We also need
to point out that our model is not limited to a cluster of only one or two base stations.
It can be easily extended to incorporate more than two base stations in a cluster.