Table of Contents for Mastering Python for Bioinformatics
Preface
Who Should Read This?
Programming Style: Why I Avoid OOP and Exceptions
Structure
Test-Driven Development
Using the Command Line and Installing Python
Getting the Code and Tests
Installing Modules
Installing the new.py Program
Why Did I Write This Book?
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
Acknowledgments
I. The Rosalind.info Challenges
1. Tetranucleotide Frequency: Counting Things
Getting Started
Creating the Program Using new.py
Using argparse
Tools for Finding Errors in the Code
Introducing Named Tuples
Adding Types to Named Tuples
Representing the Arguments with a NamedTuple
Reading Input from the Command Line or a File
Testing Your Program
Running the Program to Test the Output
Solution 1: Iterating and Counting the Characters in a String
Counting the Nucleotides
Writing and Verifying a Solution
Additional Solutions
Solution 2: Creating a count() Function and Adding a Unit Test
Solution 3: Using str.count()
Solution 4: Using a Dictionary to Count All the Characters
Solution 5: Counting Only the Desired Bases
Solution 6: Using collections.defaultdict()
Solution 7: Using collections.Counter()
Going Further
Review
2. Transcribing DNA into mRNA: Mutating Strings, Reading and Writing Files
Getting Started
Defining the Program’s Parameters
Defining an Optional Parameter
Defining One or More Required Positional Parameters
Using nargs to Define the Number of Arguments
Using argparse.FileType() to Validate File Arguments
Defining the Args Class
Outlining the Program Using Pseudocode
Iterating the Input Files
Creating the Output Filenames
Opening the Output Files
Writing the Output Sequences
Printing the Status Report
Using the Test Suite
Solutions
Solution 1: Using str.replace()
Solution 2: Using re.sub()
Benchmarking
Going Further
Review
3. Reverse Complement of DNA: String Manipulation
Getting Started
Iterating Over a Reversed String
Creating a Decision Tree
Refactoring
Solutions
Solution 1: Using a for Loop and Decision Tree
Solution 2: Using a Dictionary Lookup
Solution 3: Using a List Comprehension
Solution 4: Using str.translate()
Solution 5: Using Bio.Seq
Review
4. Creating the Fibonacci Sequence: Writing, Testing, and Benchmarking Algorithms
Getting Started
An Imperative Approach
Solutions
Solution 1: An Imperative Solution Using a List as a Stack
Solution 2: Creating a Generator Function
Solution 3: Using Recursion and Memoization
Benchmarking the Solutions
Testing the Good, the Bad, and the Ugly
Running the Test Suite on All the Solutions
Going Further
Review
5. Computing GC Content: Parsing FASTA and Analyzing Sequences
Getting Started
Get Parsing FASTA Using Biopython
Iterating the Sequences Using a for Loop
Solutions
Solution 1: Using a List
Solution 2: Type Annotations and Unit Tests
Solution 3: Keeping a Running Max Variable
Solution 4: Using a List Comprehension with a Guard
Solution 5: Using the filter() Function
Solution 6: Using the map() Function and Summing Booleans
Solution 7: Using Regular Expressions to Find Patterns
Solution 8: A More Complex find_gc() Function
Benchmarking
Going Further
Review
6. Finding the Hamming Distance: Counting Point Mutations
Getting Started
Iterating the Characters of Two Strings
Solutions
Solution 1: Iterating and Counting
Solution 2: Creating a Unit Test
Solution 3: Using the zip() Function
Solution 4: Using the zip_longest() Function
Solution 5: Using a List Comprehension
Solution 6: Using the filter() Function
Solution 7: Using the map() Function with zip_longest()
Solution 8: Using the starmap() and operator.ne() Functions
Going Further
Review
7. Translating mRNA into Protein: More Functional Programming
Getting Started
K-mers and Codons
Translating Codons
Solutions
Solution 1: Using a for Loop
Solution 2: Adding Unit Tests
Solution 3: Another Function and a List Comprehension
Solution 4: Functional Programming with the map(), partial(), and takewhile() Functions
Solution 5: Using Bio.Seq.translate()
Benchmarking
Going Further
Review
8. Find a Motif in DNA: Exploring Sequence Similarity
Getting Started
Finding Subsequences
Solutions
Solution 1: Using the str.find() Method
Solution 2: Using the str.index() Method
Solution 3: A Purely Functional Approach
Solution 4: Using K-mers
Solution 5: Finding Overlapping Patterns Using Regular Expressions
Benchmarking
Going Further
Review
9. Overlap Graphs: Sequence Assembly Using Shared K-mers
Getting Started
Managing Runtime Messages with STDOUT, STDERR, and Logging
Finding Overlaps
Grouping Sequences by the Overlap
Solutions
Solution 1: Using Set Intersections to Find Overlaps
Solution 2: Using a Graph to Find All Paths
Going Further
Review
10. Finding the Longest Shared Subsequence: Finding K-mers, Writing Functions, and Using Binary Search
Getting Started
Finding the Shortest Sequence in a FASTA File
Extracting K-mers from a Sequence
Solutions
Solution 1: Counting Frequencies of K-mers
Solution 2: Speeding Things Up with a Binary Search
Going Further
Review