Metaheuristics for String Problems in Bio-informatics
by Paola Festa, Christian Blum
Table of Contents
Cover
Dedication
Title
Copyright
Preface
Acknowledgments
List of Acronyms
1 Introduction
1.1. Complete methods for combinatorial optimization
1.2. Approximate methods: metaheuristics
1.3. Outline of the book
2 Minimum Common String Partition Problem
2.1. The MCSP problem
2.2. An ILP model for the UMCSP problem
2.3. Greedy approach
2.4. Construct, merge, solve and adapt
2.5. Experimental evaluation
2.6. Future work
3 Longest Common Subsequence Problems
3.1. Introduction
3.2. Algorithms for the LCS problem
3.3. Algorithms for the RFLCS problem
3.4. Future work
4 The Most Strings With Few Bad Columns Problem
4.1. The MSFBC problem
4.2. An ILP model for the MSFBC problem
4.3. Heuristic approaches
4.4. ILP-based large neighborhood search
4.5. Experimental evaluation
4.6. Future work
5 Consensus String Problems
5.1. Introduction
5.2. Organization of this chapter
5.3. The closest string problem and the close to most string problem
5.4. The farthest string problem and the far from most string problem
5.5. An ILP-based heuristic
5.6. Future work
6 Alignment Problems
6.1. Introduction
6.2. The pairwise alignment problem
6.3. The multiple alignment problem
6.4. Conclusion and future work
7 Conclusions
7.1. DNA sequencing
7.2. Founder sequence reconstruction
7.3. Final remarks
Bibliography
Index
End User License Agreement
List of Tables
2 Minimum Common String Partition Problem
Table 2.1.
Results of tuning CMSA for the UMCSP problem using irace
Table 2.2.
Results for instances where |Σ| = 4
Table 2.3.
Results for instances where |Σ| = 12
3 Longest Common Subsequence Problems
Table 3.1.
Setting of κ_pbs, κ_rb, κ_bs, and ρ depending on the convergence factor cf and the Boolean control variable bs_update
Table 3.2.
Results of tuning Beam–ACO with irace
Table 3.3.
Results for instances where |Σ| = 4
Table 3.4.
Results for instances where |Σ| = 12
Table 3.5.
Results for instances where |Σ| = 20
Table 3.6.
Results of Beam–ACO tuning for the RFLCS problem with irace
Table 3.7.
Results of CMSA tuning for the RFLCS problem with irace
Table 3.8.
Experimental results concerning the RFLCS problem
4 The Most Strings With Few Bad Columns Problem
Table 4.1.
Parameter settings produced by irace for LNS where |Σ| = 4
Table 4.2.
Parameter settings produced by irace for LNS where |Σ| = 12
Table 4.3.
Parameter settings produced by irace for LNS where |Σ| = 20
Table 4.4.
Experimental results for instances where |Σ| = 4
Table 4.5.
Experimental results for instances where |Σ| = 12
Table 4.6.
Experimental results for instances where |Σ| = 20
5 Consensus String Problems
Table 5.1.
Setting of κ_ib and κ_rb depending on the convergence factor cf
Table 5.2.
Numerical results
Table 5.3.
Numerical results for the CSP
Table 5.4.
Numerical results for the FSP
Table 5.5.
Numerical results for the CTMS problem
Table 5.6.
Results for the FFMS problem
7 Conclusions
Table 7.1.
Heuristic and metaheuristic approaches for the layout phase in DNA fragment assembly
Table 7.2.
Heuristic and metaheuristic approaches to the SBH problem
List of Illustrations
1 Introduction
Figure 1.1.
DNA double helix (image courtesy of Wikipedia)
Figure 1.2.
Graphical representation of the feasible region X of a generic ILP problem
Figure 1.3.
Feasible set of the problem described in Example 1.1. Furthermore, either x_I = (0, 2) or x_I = (1, 2)
Figure 1.4.
Feasible set of the problem described in point (6) below Example 1.1: X = Ø
Figure 1.5.
Branch and bound branching tree
Figure 1.6.
B&B scenario
Figure 1.7.
B&B partitioning (branching)
Figure 1.8.
Branching
Algorithm 1.1.
General Cutting Plane Technique
Algorithm 1.2.
Local search
Algorithm 1.3.
Ant colony optimization (ACO)
Algorithm 1.4.
Evolutionary algorithm (EA)
Algorithm 1.5.
Greedy randomized adaptive search procedure (GRASP)
Algorithm 1.6.
Iterated local search (ILS)
Algorithm 1.7.
Simulated annealing (SA)
Algorithm 1.8.
Large neighborhood search (LNS)
Algorithm 1.9.
Construct, merge, solve & adapt (CMSA)
2 Minimum Common String Partition Problem
Figure 2.1.
Improvement of CMSA over CPLEX (in percent). Note that when boxes are missing, CPLEX was not able to provide feasible solutions within the computation time allowed
Figure 2.2.
Improvement of CMSA over GREEDY (in percent)
Figure 2.3.
Graphical presentation of the sizes of the sub-instances as a percentage of the size of the original problems
Algorithm 2.1.
GREEDY
Algorithm 2.2.
CMSA for the UMCSP problem
Algorithm 2.3.
ProbabilisticSolutionGeneration(B) function
3 Longest Common Subsequence Problems
Figure 3.1.
Consider example I := (S = {s_1, s_2, s_3}, Σ = {a, b, c, d}), where s_1 = dbcadcc, s_2 = acabadd, and s_3 = bacdcdd. Let t = ba be the current partial solution. Figures a), b), and c) show the separation of s_i into x_i and y_i. In addition, position pointers p_i and the next positions of the four letters in y_i are indicated. If, for some i, y_i does not contain a specific letter, the corresponding pointer is set to ∞. This happens, for example, in the case of letters a and b
Figure 3.2.
Improvement of Beam–ACO over Best-Next. Each box shows the differences between the objective function value of the solution produced by Beam–ACO and that of the solution produced by Best-Next for the 10 instances of the same type
Figure 3.3.
Improvement of Beam–ACO over BS-L. Each box shows the differences between the objective function value of the solution produced by Beam–ACO and that of the solution produced by BS-L for the 10 instances of the same type
Figure 3.4.
Improvement of Beam–ACO over BS-H. Each box shows the differences between the objective function value of the solution produced by Beam–ACO and that of the solution produced by BS-H for the 10 instances of the same type
Figure 3.5.
Improvement of CMSA over Beam–ACO. Each box shows the differences between the objective function value of the solution produced by CMSA and that of the solution produced by Beam–ACO for 10 instances of the same type
Figure 3.6.
Improvement of CMSA over CPLEX. Each box shows the differences between the objective function value of the solution produced by CMSA and that of the solution produced by CPLEX for 10 instances of the same type. Missing boxes indicate that CPLEX was not able to produce any valid solution
Algorithm 3.1.
BEST-NEXT heuristic for the LCS problem
Algorithm 3.2.
BS for the LCS problem
Algorithm 3.3.
Beam–ACO for the LCS problem
Algorithm 3.4.
CMSA for the RFLCS problem
4 The Most Strings With Few Bad Columns Problem
Figure 4.1.
Improvement of LNS over PILOT-TR (in absolute terms). Each box shows these differences for the corresponding 10 instances. Note that negative values indicate that PILOT-TR obtained a better result than LNS
Figure 4.2.
Improvement of LNS over CPLEX (in absolute terms). Each box shows these differences for the corresponding 10 instances. Note that negative values indicate that CPLEX obtained a better result than LNS
Figure 4.3.
Average time (in seconds) used by CPLEX within LNS for each application at each iteration. Two examples considering destruction between 10 and 90 percent
Algorithm 4.1.
MakeSolutionComplete(S_p) procedure
Algorithm 4.2.
Truncated pilot method
Algorithm 4.3.
ILP-based LNS for the MSFBC problem
5 Consensus String Problems
Figure 5.1.
Time-to-target distributions (in seconds) comparing grasp and grasp-h-ev for a random instance where n = 100, m = 600, t = 480, with a target value of 0.68 × n
Figure 5.2.
Time-to-target distributions (in seconds) comparing grasp and grasp-h-ev for a random instance where n = 200, m = 300, t = 255, and a target value of 0.025 × n
Figure 5.3.
Time-to-target distributions (in seconds) comparing grasp and grasp-h-ev for a real instance where n = 300, m = 1200, t = 1020, and a target value of 0.22 × n
Figure 5.4.
Time-to-target distributions (in seconds) comparing grasp-h-ev, grasp-h-ev_b and grasp-h-ev_ev_pr on the random instance where n = 100, m = 300, t = 240, and a target value of 0.73 × n
Figure 5.5.
Time-to-target distributions (in seconds) comparing grasp-h-ev, grasp-h-ev_b and grasp-h-ev_ev_pr on the random instance where n = 200, m = 300, t = 240, and a target value of 0.445 × n
Figure 5.6.
Justification for the choice of parameter t for the CTMSP and the FFMSP
Figure 5.7.
Graphical representation of the comparison between HEURISTIC-3600 and the current FFMSP state-of-the-art methods (GRASP+PR, ACO, and ACO+CPLEX) for t = 0.8m and t = 0.85m
Algorithm 5.1.
GRASP for the FFMSP
Algorithm 5.2.
Function GRRAND of Algorithm 5.1
Algorithm 5.3.
Path-relinking for the FFMSP
Algorithm 5.4.
Mixed-Path-relinking for the FFMSP [RIB 07]
Algorithm 5.5.
(GRAPR) Greedy Randomized Adaptive Path-relinking for FFMSP [FAR 05]
Algorithm 5.6.
Evolutionary path-relinking for FFMSP [RES 10b, RES 04]
Algorithm 5.7.
GRASP+PR for the FFMSP
Algorithm 5.8.
CHOOSE-PR-STRATEGY – Function for selection of path-relinking strategy for FFMSP
Algorithm 5.9.
(Hybrid) ACO for the FFMSP
Algorithm 5.10.
RunACO(s_bs) function of Algorithm 5.9
6 Alignment Problems
Figure 6.1.
An example of alignment of two sequences
Figure 6.2.
All possible alignments of two sequences without gaps
Figure 6.3.
Two possible alignments of two sequences: a) not inserting gaps with 6 pairings; b) inserting gaps with 7 pairings
Figure 6.4.
Dot matrix representing the alignment in Figure 6.1
Figure 6.5.
Matrix H generated by the application of Smith and Waterman's algorithm to strings s = A A U G C C A U U G A C G G and t = C A G C C U C G C U U A G
Figure 6.6.
The alignment obtained for the simple example in Figure 6.5
7 Conclusions
Figure 7.1.
a) The completely connected directed graph G with the spectrum S = {ACT, TGA, GAC, CTC, TAA} as vertex set. The edge weights (i.e. overlaps) are not indicated for readability reasons. For example, the weight on the edge from TGA to GAC is 2, because GA is the longest DNA strand that is a suffix of TGA and a prefix of GAC. An optimal solution is p* = (ACT, TGA, GAC, CTC). b) How to retrieve the DNA sequence that is encoded by p*. Note that c(p*) = 8, which is equal to the length of the encoded DNA sequence
Figure 7.2.
a) A set of six recombinants in matrix form. Assuming that the number of founders (k_f) is fixed to three, b) shows a valid solution, that is, a matrix of three founders. Denoting the first founder by a, the second founder by b, and the third one by c, c) shows a decomposition of the recombinant matrix into fragments taken from the founders. Note that breakpoints are marked by vertical lines. This is a decomposition with 8 breakpoints, which is the minimum value for this instance