Chapter 12
Strings are useful for more than representing sentences and words. For example, genetic information is usually represented by character strings.
DNA is a double helix of two chains of nucleotides. Each nucleotide base can be represented by a single letter, and so a chain of nucleotides can be thought of as a string. Even though DNA has two chains (also called strands), the two are closely related: given one, it is easy to calculate the other. Thus, DNA is generally described by one string of characters representing the nucleotide bases of one of its strands.
The four bases that make up DNA are adenine (A), cytosine (C), guanine (G), and thymine (T).
The second strand of DNA is always the reverse complement of the first. The complement of a strand of DNA swaps each base with its complementary base: A ↔ T and C ↔ G, and the reverse complement just reverses the order of the complementary sequence. For example, the complement of AGGTC is TCCAG, and the reverse complement is GACCT.
Listing 12.1: DNA Sequences
 1  # dna.py
 2
 3  from random import choice
 4
 5  def complement(dna):
 6      result = ""
 7      for c in dna:                 # visit each base in turn
 8          if c == "A":
 9              result += "T"
10          elif c == "T":
11              result += "A"
12          elif c == "C":
13              result += "G"
14          elif c == "G":
15              result += "C"
16      return result
17
18  def reversecomp(dna):
19      return complement(dna)[::-1]  # slice with step -1 reverses
20
21  def random_dna(length=30):
22      fragment = ""
23      for j in range(length):
24          fragment += choice("ACGT")
25      return fragment
26
27  def main():
28      for i in range(10):
29          dna = random_dna()
30          print(dna + " " + reversecomp(dna))
31          print(complement(dna) + "\n")
32
33  main()
This program generates random strings of DNA and displays each with its reverse complement. On the left side of the output, you will see the random strand with its complement below it. This is how the two strands are tied together in the double helix: each base binds with its complement across the helix. However, while the top strand is read left-to-right, the second strand is read from right to left, and so it is the reverse complement, printed on the right, that is more useful than the complement.
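As a quick check, the worked example from earlier in the chapter (AGGTC) can be run through the same two operations. The sketch below is a compact stand-in for the functions in Listing 12.1: it assumes a dictionary named PAIRS (a hypothetical name, not part of the listing) in place of the if/elif chain, purely to keep the example short.

```python
# PAIRS is a hypothetical lookup table used here for brevity;
# Listing 12.1 uses an if/elif chain instead.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(dna):
    result = ""
    for c in dna:
        result += PAIRS[c]        # accumulate on the right
    return result

def reversecomp(dna):
    return complement(dna)[::-1]  # reverse the complement

strand = "AGGTC"
print(complement(strand))                          # TCCAG
print(reversecomp(strand))                         # GACCT
print(reversecomp(reversecomp(strand)) == strand)  # True
```

The last line illustrates why one strand is enough to describe the molecule: taking the reverse complement twice gives back the original strand.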
Both the complement() and random_dna() functions use string accumulation loops to build their return values. These follow the same pattern as numeric accumulation loops (see Chapter 4):
<accumulator> = <starting value>
loop:
    <accumulator> += <string to add>    # adds on the right
The reason these are so similar is that concatenation is analogous to addition for strings, and Python uses “+” to represent both.
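To make the parallel concrete, here is a small sketch that places a numeric accumulator next to a string accumulator. The variable names total and text are chosen only for illustration.

```python
# Numeric accumulation: start at 0 and add on each pass.
total = 0
for i in range(3):
    total += 5

# String accumulation: start at "" and concatenate on each pass.
text = ""
for i in range(3):
    text += "AT"

print(total)  # 15
print(text)   # ATATAT
```

The two loops have the same shape; only the starting value and the meaning of "+" differ.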
There are two main differences between numeric and string accumulators:
The starting value for string accumulators is usually the empty string "", denoted in Python by two quotation marks with nothing between them.
Concatenation is not commutative. Adding on the right is usually different from adding on the left (see Exercise 12.2):
s = s + t    Add t to s on the right.
s = t + s    Add t to s on the left.
Most of the time (as in Listing 12.1), you will want to add on the right, and you can use the shorthand “+=” to do that. However, if you need to add on the left, you will have to write out the full statement instead of using the shorthand.
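The difference between the two directions is easiest to see by running both on the same input. This is a minimal sketch; the names right and left are illustrative only.

```python
word = "ACGT"

right = ""
left = ""
for c in word:
    right = right + c   # same as: right += c
    left = c + left     # no shorthand for adding on the left

print(right)  # ACGT
print(left)   # TGCA
```

Adding on the right rebuilds the string in its original order, while adding on the left builds it in reverse.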
There is another new feature in the complement() function of Listing 12.1: the for loop in line 7. In fact, the syntax is not new:
for <variable> in <string>:    # loop over each character
    <body>
Compare this with the syntax given in Chapter 4: they are identical except that the earlier version wrote <sequence> instead of <string>. Python treats strings as sequences of characters, and so when a string is used in a for loop, the variable takes on the value of each character in the string, one at a time and in order.
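For example, a for loop over a string can be combined with a numeric accumulator to count how often one base occurs. This sketch uses a made-up sequence and variable names chosen for illustration.

```python
dna = "GATTACA"

count = 0
for c in dna:        # c is "G", then "A", then "T", and so on
    if c == "A":
        count += 1

print(count)  # 3
```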
The main() function in Listing 12.1 uses one other new feature: an escape sequence inside of a string. Escape sequences begin with a backslash “\” and are used to insert characters that cannot easily be typed directly into a string:
\n    Newline.
\t    Tab.
\"    To get " inside a double-quoted string.
\'    To get ' inside a single-quoted string.
\\    If you need a backslash itself.
Escape sequences and concatenation give us more control over printing than we have had to this point.
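The short sketch below tries out each common escape sequence. The strings and variable names are invented for illustration; note that each escape sequence counts as a single character, even though it takes two keystrokes to type.

```python
header = "Strand\tLength"        # \t inserts a tab
quote = "The base is \"A\"."     # \" inside a double-quoted string
owner = 'It\'s a strand.'        # \' inside a single-quoted string
path = "C:\\data"                # \\ gives one backslash
two_lines = "first\nsecond"      # \n starts a new line

print(header)
print(quote)
print(owner)
print(path)
print(two_lines)
print(len("\n"))  # 1
```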
repeat while n is positive:
    concatenate the bit n % 2 to the left end of the result
    integer-divide n by 2
You may need a type conversion from Chapter 3. Write a main() that tests your function on all integers between 0 and 100.