an overview of SMDP modeling. Then, the states, actions, reward functions, state
transition probabilities, and constraints are presented.
A Markov decision process (MDP) provides a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. Beyond these basic features, an SMDP generalizes an MDP by allowing the decision maker to choose actions whenever the system state changes and by allowing the time spent in a particular state to follow an arbitrary probability distribution.
In our proposed CBTC system with CoMP, the MT on the train makes handoff and CoMP cluster selection decisions at specific time instants according to the current state $s(t)$, and the system moves into a new state based on the current state $s(t)$ as well as the chosen decision $a(t)$. Given $s(t)$ and $a(t)$, the next state is conditionally independent of all previous states and actions. This Markov property of the state transition process makes it possible to model the CoMP cluster selection and handoff problem as an SMDP.
An SMDP model consists of the following six elements: (1) decision epochs, (2) actions, (3) states, (4) reward functions, (5) state transition probabilities, and (6) constraints, which will be described in Sections 7.5.1.1 through 7.5.1.6.
7.5.1.1 Decision Epochs
The MT on a train has to make a decision whenever a certain time period has elapsed. These time instants are called decision epochs, and we denote them by $k$.
7.5.1.2 Actions
In our SMDP model, at each decision epoch, the MT on the train first has to decide whether or not to perform a handoff action. Afterward, the CoMP cluster should be determined. As shown in Figure 7.3, we assume the MT on the train will not be in the coverage of more than four successive base stations, and we denote them as $B_1$, $B_2$, $B_3$, and $B_4$. As a result, there are five potential clusters in the system. They are $(B_1, B_2)$, $(B_2)$, $(B_2, B_3)$, $(B_3)$, and $(B_3, B_4)$, and we denote them as $C_1$, $C_2$, $C_3$, $C_4$, and $C_5$, respectively.
In order to achieve the optimal performance, all of the base stations in the network should cooperate with each other in each transmission or reception process. However, the complexity this introduces may not be acceptable in real systems. Therefore, standards have specified the maximal number of base stations that may cooperate with each other [24]. In CBTC systems, the base stations are linearly deployed. For ease of presentation, we include at most two base stations in a cluster, because a third base station would be too far away from the mobile station, and the performance gain brought by adding a base station to the cluster may not be worth the complexity cost in real systems. We also need to point out that our model is not limited to a cluster of only one or two base stations. It can easily be extended to incorporate more than two base stations in a cluster.
ese ve clusters correspond to ve dierent actions
ak()
that can be performed
at each decision epoch. erefore, the action space can be described as
ACCCCC
=
{1,2,3,4
,5
} (7.16)
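As a concrete illustration of Equation 7.16, the cluster-indexed action space can be encoded directly from the base-station pairs defined above. The following Python sketch is ours; the identifiers (`CLUSTERS`, `ACTIONS`) are illustrative names, not part of the original model.

```python
# Minimal sketch of the action space in Equation 7.16.
# Each action selects one CoMP cluster; cluster membership follows the
# five clusters (B1,B2), (B2), (B2,B3), (B3), (B3,B4) defined in the text.
CLUSTERS = {
    "C1": ("B1", "B2"),
    "C2": ("B2",),
    "C3": ("B2", "B3"),
    "C4": ("B3",),
    "C5": ("B3", "B4"),
}

# Action space A = {C1, ..., C5}: choosing action Ci means using cluster Ci.
ACTIONS = list(CLUSTERS.keys())

if __name__ == "__main__":
    print("Action space A =", ACTIONS)
    for name, members in CLUSTERS.items():
        print(f"{name}: cooperating base stations {members}")
```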
7.5.1.3 States
The composite state $s(k) \in S$ is given as

$$
s(k) = \{h_{B_1}(k), h_{B_2}(k), h_{B_3}(k), h_{B_4}(k), \xi(k), \varepsilon(k)\} \quad (7.17)
$$
where:
$h_{B_1}(k)$, $h_{B_2}(k)$, $h_{B_3}(k)$, and $h_{B_4}(k)$ are the channel gains from the four successive base stations to the train MT, respectively
$\xi(k)$ is the currently used cluster
$\varepsilon(k)$ is the velocity tracking error, which is the error between the current train velocity and the reference velocity on the guidance trajectory
$h_{B_1}(k), h_{B_2}(k), h_{B_3}(k), h_{B_4}(k) \in \{\gamma_1, \gamma_2, \ldots, \gamma_L\}$, where $L$ is the number of states in the channel model. $\xi(k) \in \{C_1, C_2, C_3, C_4, C_5\}$, because the currently used cluster is completely determined by the current action.
The velocity tracking error $\varepsilon$ is obtained by comparing the current train velocity with the destination velocity, and the destination velocity is calculated based on the distance between the two trains. In order to make the SMDP solvable, $\varepsilon(k)$ is discretized, and $\varepsilon(k) \in \{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_K\}$, where $K$ is the total number of velocity tracking error states.
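To make the composite state of Equation 7.17 concrete, the following sketch enumerates a discretized state space. The small values ($L = 2$ channel states, $K = 3$ error levels) and all identifiers are placeholders of ours, not values from the chapter.

```python
from dataclasses import dataclass
from itertools import product

# Illustrative discretization (placeholder values, not from the chapter):
CHANNEL_STATES = ["gamma1", "gamma2"]          # L = 2 channel states per link
CLUSTERS = ["C1", "C2", "C3", "C4", "C5"]      # currently used cluster xi(k)
ERROR_STATES = ["eps1", "eps2", "eps3"]        # K = 3 velocity tracking errors

@dataclass(frozen=True)
class State:
    """Composite SMDP state s(k) of Equation 7.17."""
    h_b1: str      # channel gain state, base station B1 -> MT
    h_b2: str      # channel gain state, base station B2 -> MT
    h_b3: str      # channel gain state, base station B3 -> MT
    h_b4: str      # channel gain state, base station B4 -> MT
    cluster: str   # xi(k), currently used CoMP cluster
    error: str     # eps(k), discretized velocity tracking error

def enumerate_states():
    """Enumerate the full discretized state space S."""
    return [State(*combo) for combo in
            product(CHANNEL_STATES, CHANNEL_STATES, CHANNEL_STATES,
                    CHANNEL_STATES, CLUSTERS, ERROR_STATES)]

if __name__ == "__main__":
    states = enumerate_states()
    print("Number of states |S| =", len(states))   # L^4 * 5 * K = 240 here
```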
7.5.1.4 Reward Function
The reward function reflects the reward that can be obtained under a certain state and action. With the objective of minimizing the linear quadratic cost function shown in Equation 7.4, we define the reward function as the reciprocal of the sum of the tracking error and the control magnitude. The reward function is closely related to the communication latency in the CBTC system. We need to point out that our proposed CBTC system with CoMP is not limited to a certain type of wireless technology. Once the communication latency is obtained, it can also be used in CBTC systems with dedicated short-range communications [28], LTE, or WiMAX.
When $a(k) = \xi(k)$, which means that the currently used cluster does not change and no handoff happens, the reward under this circumstance is defined as
$$
r\big[h_{B_1}(k), h_{B_2}(k), h_{B_3}(k), h_{B_4}(k), \xi(k), \varepsilon(k), a(k)\big] =
\begin{cases}
1\Big/\Big\{Q\big[\varepsilon(k) + \tfrac{u}{M}T + \alpha T_{afl}\big]^2 + R u^2\Big\}, & \text{if } T_l < T \\[6pt]
1\Big/\Big\{Q\big[\varepsilon(k) + \alpha T + \alpha T_{afl}\big]^2 + R u^2\Big\}, & \text{if } T_l > T
\end{cases}
\quad (7.18)
$$
where:
$u$ is the control command from the ATO subsystem to bring the train to the optimized guidance trajectory
$T$ is the communication period
$M$ is the train mass
$\alpha$ is the deceleration that defines the ATP service braking curve
$T_l$ is the current train–ground communication latency, which can be calculated with the current channel states $h_{B_1}(k), h_{B_2}(k), h_{B_3}(k), h_{B_4}(k)$ according to the communication latency model
When $T_l > T$, the current communication latency is greater than the communication period and the MA cannot be updated in this decision epoch. Therefore, the velocity tracking error is increased by $\alpha \cdot T$ due to a communication interruption of $T$. For any train that is traveling behind, when the front train is in a long-latency/interruption state, it cannot receive the updated MA. This is because the train in front cannot report its new position to the ZC. Given the average communication latency from the front train to the ground, $T_{afl}$, the velocity tracking error increased by the front-train latency is $\alpha \cdot T_{afl}$.
When $a(k) \neq \xi(k)$, which means that a base station joins or leaves the currently used cluster and a handoff happens, the communication latency between the train and the ground will be increased by $T_l + T_{ex}$, where $T_{ex}$ is due to the extra signal exchange. The reward under this circumstance is defined as
$$
r\big[h_{B_1}(k), h_{B_2}(k), h_{B_3}(k), h_{B_4}(k), \xi(k), \varepsilon(k), a(k)\big] =
1\Big/\Big\{Q\big[\varepsilon(k) + \alpha \,(T_{ex} + T_l + T_{afl})\big]^2 + R u^2\Big\}
\quad (7.19)
$$
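The branching logic of Equations 7.18 and 7.19 can be sketched as a single function. The weights $Q$ and $R$, the latency values, and the helper name `reward` are our own illustrative choices; the error terms follow the reconstruction of Equation 7.18 given above and should be checked against Equation 7.4 of the chapter.

```python
def reward(eps, u, T_l, handoff, *, Q=1.0, R=0.1, T=0.1, M=3.0e5,
           alpha=1.0, T_afl=0.05, T_ex=0.02):
    """Per-stage reward (sketch of Equations 7.18 and 7.19).

    eps     : current velocity tracking error eps(k)
    u       : ATO control command
    T_l     : current train-ground communication latency [s]
    handoff : True if a(k) != xi(k) (a base station joins/leaves the cluster)
    The keyword arguments are illustrative constants, not chapter values.
    """
    if handoff:
        # Handoff case, Equation 7.19: latency grows by the extra exchange T_ex.
        err = eps + alpha * (T_ex + T_l + T_afl)
    elif T_l < T:
        # MA updated in time: error driven by the applied control and the
        # front-train latency (Equation 7.18, first branch).
        err = eps + (u / M) * T + alpha * T_afl
    else:
        # Latency exceeds the communication period: MA not updated, error grows
        # by alpha*T plus the front-train term (Equation 7.18, second branch).
        err = eps + alpha * T + alpha * T_afl
    return 1.0 / (Q * err ** 2 + R * u ** 2)

if __name__ == "__main__":
    print("no handoff, low latency :", reward(0.5, 2.0e4, 0.05, handoff=False))
    print("no handoff, high latency:", reward(0.5, 2.0e4, 0.30, handoff=False))
    print("handoff                 :", reward(0.5, 2.0e4, 0.05, handoff=True))
```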
7.5.1.5 State Transition Probability
Given the current state, $s(k) = \{h_{B_1}(k), h_{B_2}(k), h_{B_3}(k), h_{B_4}(k), \xi(k), \varepsilon(k)\}$, and the chosen action, $a(k)$, the probability function of the next state, $s(k+1) = \{h_{B_1}(k+1), h_{B_2}(k+1), h_{B_3}(k+1), h_{B_4}(k+1), \xi(k+1), \varepsilon(k+1)\}$, is given by $P[s(k+1) \mid s(k), a(k)]$. Here, for simplicity of formulation and presentation, we assume that the wireless channels, the currently used cluster, and the velocity tracking error are independent. This assumption is reasonable in practice, because the wireless channels for different links are independent, and the currently used cluster is solely determined by the last action. Moreover, the velocity tracking error is dependent on the train dynamic model, which makes it reasonable to assume that it is independent of the other components in the state. Therefore, we have
$$
\begin{aligned}
P[s(k+1) \mid s(k), a(k)] ={} & P[h_{B_1}(k+1) \mid h_{B_1}(k)] \cdot P[h_{B_2}(k+1) \mid h_{B_2}(k)] \\
& \cdot P[h_{B_3}(k+1) \mid h_{B_3}(k)] \cdot P[h_{B_4}(k+1) \mid h_{B_4}(k)] \\
& \cdot P[\xi(k+1) \mid \xi(k), a(k)] \cdot P[\varepsilon(k+1) \mid \varepsilon(k), a(k)]
\end{aligned}
\quad (7.20)
$$
where:
$P[h_{B_1}(k+1) \mid h_{B_1}(k)]$, $P[h_{B_2}(k+1) \mid h_{B_2}(k)]$, $P[h_{B_3}(k+1) \mid h_{B_3}(k)]$, and $P[h_{B_4}(k+1) \mid h_{B_4}(k)]$ are the channel state transition probabilities for the different wireless links, respectively
$P[\xi(k+1) \mid \xi(k), a(k)]$ is the currently used cluster transition probability
$P[\varepsilon(k+1) \mid \varepsilon(k), a(k)]$ is the velocity tracking error transition probability
The channel state transition probabilities can be obtained from real field test data, as described in Section 7.3.2. The other state transition probabilities are derived in the following.
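As an illustration of how such channel probabilities can be extracted from measured data, the following sketch counts state-to-state transitions in a recorded sequence of discretized channel states and normalizes each row. The trace and the function name are illustrative and not drawn from Section 7.3.2.

```python
import numpy as np

def estimate_channel_transitions(state_sequence, num_states):
    """Estimate a channel state transition matrix P[h(k+1) | h(k)]
    by counting transitions in a measured sequence of discretized
    channel-state indices (0 .. num_states-1) and normalizing rows."""
    counts = np.zeros((num_states, num_states))
    for cur, nxt in zip(state_sequence[:-1], state_sequence[1:]):
        counts[cur, nxt] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Avoid division by zero for states never visited in the trace.
    row_sums[row_sums == 0] = 1.0
    return counts / row_sums

if __name__ == "__main__":
    # Toy measurement trace with L = 3 channel states (illustrative only).
    trace = [0, 0, 1, 2, 1, 1, 0, 2, 2, 1, 0]
    print(estimate_channel_transitions(trace, num_states=3))
```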
First, we derive the transition probability for the currently used CoMP cluster. Because the next used cluster is determined by the chosen action, the currently used cluster transition probability can be simply derived as
$$
P[\xi(k+1) \mid \xi(k), a(k)] =
\begin{cases}
0, & \text{if } a(k) \neq \xi(k+1) \\
1, & \text{if } a(k) = \xi(k+1)
\end{cases}
\quad (7.21)
$$
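Equation 7.21 is simply an indicator that the next cluster equals the chosen action; a minimal sketch (identifiers ours):

```python
def cluster_transition_prob(next_cluster, action):
    """P[xi(k+1) | xi(k), a(k)] from Equation 7.21: the next cluster is the
    chosen action with probability 1, and any other cluster with probability 0.
    Note that the probability does not depend on the current cluster xi(k)."""
    return 1.0 if next_cluster == action else 0.0

if __name__ == "__main__":
    print(cluster_transition_prob("C3", "C3"))  # 1.0
    print(cluster_transition_prob("C2", "C3"))  # 0.0
```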
Second, we derive the transition probability for the velocity tracking error. The velocity tracking error depends on the control command at every decision epoch and on the handoff action. Given a control command from the ATO subsystem to bring the train to the optimized guidance trajectory, the velocity tracking error transition probability is derived as
$$
P[\varepsilon(k+1) \mid \varepsilon(k), a(k)] =
\begin{cases}
1, & \text{if } a(k) = \xi(k),\; T_l > T,\; \text{and } \varepsilon(k+1) = \varepsilon(k) + \alpha T + \alpha T_{afl} \\[4pt]
1, & \text{if } a(k) = \xi(k),\; T_l < T,\; \text{and } \varepsilon(k+1) = \varepsilon(k) + \tfrac{u}{M} T + \alpha T_{afl} \\[4pt]
1, & \text{if } a(k) \neq \xi(k)\; \text{and } \varepsilon(k+1) = \varepsilon(k) + \alpha (T_{ex} + T_l + T_{afl}) \\[4pt]
0, & \text{otherwise}
\end{cases}
\quad (7.22)
$$
where:
$u$ is the control command
$T$ is the communication period
$M$ is the train mass
$T_l$ is the current train–ground communication latency
$T_{afl}$ is the average communication latency from the front train to the ground
$\alpha$ is the deceleration that defines the ATP service braking curve
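Putting Equation 7.22 into code form, the sketch below computes the deterministic successor error and the corresponding transition probability. The quantization to the nearest discretized level and all numerical constants are our own assumptions for illustration only.

```python
def next_error(eps, u, T_l, handoff, *, T=0.1, M=3.0e5, alpha=1.0,
               T_afl=0.05, T_ex=0.02):
    """Deterministic next velocity tracking error implied by Equation 7.22.
    The transition probability is 1 for this successor and 0 otherwise."""
    if handoff:                       # a(k) != xi(k): handoff latency T_l + T_ex
        return eps + alpha * (T_ex + T_l + T_afl)
    if T_l > T:                       # MA not updated within the period
        return eps + alpha * T + alpha * T_afl
    return eps + (u / M) * T + alpha * T_afl   # MA updated, control applied

def error_transition_prob(eps_next, eps, u, T_l, handoff, levels, **kwargs):
    """P[eps(k+1) | eps(k), a(k)]: probability mass on the discretized level
    closest to the deterministic successor (quantization is our assumption)."""
    target = next_error(eps, u, T_l, handoff, **kwargs)
    closest = min(levels, key=lambda lvl: abs(lvl - target))
    return 1.0 if eps_next == closest else 0.0

if __name__ == "__main__":
    levels = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]    # discretized error states (toy)
    print(next_error(0.2, 2.0e4, 0.3, handoff=False))
    print(error_transition_prob(0.3, 0.2, 2.0e4, 0.3, False, levels))
```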
7.5.1.6 Constraints
As described in Section 7.5.1.2, the handoff action in the system is performed by switching between different clusters. In order to avoid the "break-before-make" situation and realize soft handoff, the action $a(k)$ is constrained in certain states: it can only switch between neighboring clusters. Mathematically, the constraints are defined as
$$
a(k) \neq C_j,
\begin{cases}
\text{if } \xi(k) = C_i \text{ and } j < i - 1 \\
\text{if } \xi(k) = C_i \text{ and } j > i + 1
\end{cases}
\quad (7.23)
$$
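Under the adjacency constraint of Equation 7.23, the feasible actions in a given state are the current cluster and its immediate neighbors. A small sketch (identifiers ours):

```python
CLUSTER_NAMES = ["C1", "C2", "C3", "C4", "C5"]

def feasible_actions(current_cluster):
    """Actions allowed by Equation 7.23: from cluster C_i, the MT may only
    keep C_i or switch to the neighboring clusters C_{i-1} and C_{i+1},
    which avoids break-before-make and realizes soft handoff."""
    i = CLUSTER_NAMES.index(current_cluster)
    return [CLUSTER_NAMES[j]
            for j in range(max(0, i - 1), min(len(CLUSTER_NAMES), i + 2))]

if __name__ == "__main__":
    print(feasible_actions("C1"))  # ['C1', 'C2']
    print(feasible_actions("C3"))  # ['C2', 'C3', 'C4']
```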
7.5.2 Solutions to SMDP-Based CoMP Cluster Selection and Handoff Decision Scheme
A decision rule prescribes a procedure for action selection in each state at a specified decision epoch. Markov decision rules are functions $\delta(k): S \to A$, which specify the action choice $a(k)$ when the system occupies state $s(k)$ at decision epoch $k$. A policy $\pi = (\delta(1), \delta(2), \ldots, \delta(k))$ is a sequence of decision rules to be used at all decision epochs.
A stationary control policy $\pi$ induces a joint distribution for the random process $\{s(k)\}$. The optimal policy for the SMDP can be obtained by solving the Bellman equation [11] recursively as
$$
\rho + V(s) = \min_{\delta(s)} \left\{ r\big(s, \delta(s)\big) + \sum_{s'} P\big[s' \mid s, \delta(s)\big] V(s') \right\}
\quad (7.24)
$$
where:
$\delta(s)$ is the cluster selection action taken in state $s$
$r(s, \delta(s))$, given by Equations 7.18 and 7.19, is the per-stage reward when the current state is $s$ and action $\delta(s)$ is taken
If there is a pair $(\rho, V(s))$ satisfying Equation 7.24, then $\rho$ is the optimal average reward per stage, and the optimizing policy is given by $\pi^*(s) = \delta^*(s)$, where $\delta^*(s)$ is the optimizing action of Equation 7.24 at state $s$.
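The chapter does not prescribe a particular numerical procedure at this point; one standard option for solving an average-reward Bellman equation of the form of Equation 7.24 is relative value iteration. The sketch below applies it to a toy problem and keeps the min-oriented form written in Equation 7.24; all names and numbers are illustrative, not the authors' implementation.

```python
import numpy as np

def relative_value_iteration(P, r, tol=1e-9, max_iter=10_000):
    """Generic relative value iteration for a Bellman equation of the form
    rho + V(s) = min_a { r(s, a) + sum_s' P[s' | s, a] V(s') } (Equation 7.24).

    P : array of shape (num_actions, num_states, num_states),
        P[a, s, s'] = transition probability to s' given state s and action a
    r : array of shape (num_actions, num_states), per-stage reward r(s, a)
    Returns (rho, V, policy), where rho approaches the optimal average
    reward per stage as the iteration converges.
    """
    num_actions, num_states, _ = P.shape
    V = np.zeros(num_states)
    for _ in range(max_iter):
        # Q[a, s] = r(s, a) + sum_{s'} P[s' | s, a] * V(s')
        Q = r + P @ V
        V_new = Q.min(axis=0)
        # Normalize by a reference state to keep the relative values bounded.
        rho = V_new[0]
        V_new = V_new - rho
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = (r + P @ V).argmin(axis=0)
    return rho, V, policy

if __name__ == "__main__":
    # Toy 2-state, 2-action problem (illustrative numbers only).
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.6, 0.4]]])
    r = np.array([[1.0, 2.0],
                  [1.5, 0.5]])
    rho, V, policy = relative_value_iteration(P, r)
    print("average reward per stage rho =", rho)
    print("relative values V =", V, "policy =", policy)
```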