How it works...

The first part of this recipe is pretty familiar. We load in the libraries and use Biostrings to load in our protein sequences. Note that our sequences in the seqs variable are an instance of the XStringSet class.

In Step 2, we can create a basic dot plot using the dotplotg() function. The arguments are the sequences we want to plot. Note that we can't pass the XStringSet objects directly; we need to pass character vectors, so we coerce our sequences into that format with the as.character() function. Running this code gives us the following dot plot:

In Step 3, we elaborate on the basic dot plot by first changing how a match is defined. The wsize=7 option compares seven residues at a time (instead of the default of one), the wstep=5 option moves the window five residues at each step (again, instead of one), and the nmatch=4 option marks a window as matching if at least four of its residues are identical. We then customize the plot by adding a ggplot2 theme in the usual ggplot manner and add axis names with the labs() function. From this, we get the following dot plot. Note how it differs from the first one:
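To make the three window options concrete, here is a minimal base R sketch of one plausible reading of them. It is not the plotting function's actual implementation: window_matches() is a hypothetical helper written for illustration, and the two toy sequences are made up.

```r
# Hypothetical helper (not part of any package): returns the start positions
# (x in seq_a, y in seq_b) of every pair of windows that share at least
# `nmatch` identical residues at the same offset within the window.
window_matches <- function(seq_a, seq_b, wsize = 1, wstep = 1, nmatch = 1) {
  a <- strsplit(seq_a, "")[[1]]
  b <- strsplit(seq_b, "")[[1]]
  if (length(a) < wsize || length(b) < wsize) {
    return(data.frame(x = integer(0), y = integer(0)))
  }
  # Window start positions, advancing wstep residues at a time
  starts_a <- seq(1, length(a) - wsize + 1, by = wstep)
  starts_b <- seq(1, length(b) - wsize + 1, by = wstep)
  hits <- expand.grid(x = starts_a, y = starts_b)
  offsets <- seq_len(wsize) - 1
  # Keep a window pair when enough residues are identical
  keep <- mapply(function(x, y) {
    sum(a[x + offsets] == b[y + offsets]) >= nmatch
  }, hits$x, hits$y)
  hits[keep, , drop = FALSE]
}

# Two made-up peptides that differ at a single position
toy <- window_matches("MKTAYIAKQR", "MKTAYLAKQR",
                      wsize = 7, wstep = 1, nmatch = 6)
print(toy)
```

Under this reading, a larger wsize with a lenient nmatch tolerates isolated mismatches, while a larger wstep evaluates fewer windows and gives a coarser plot.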

The custom function, make_dot_plot(), defined in Step 4, takes two numbers in the i and j arguments and an XStringSet object in the seqs argument. It converts the i-th and j-th sequences in the seqs object to characters and stores them in the seqi and seqj variables. It also extracts the names of those sequences into namei and namej. Finally, it creates and returns a dot plot using these variables.

To use the function, we need two things: the combinations of sequences to be plotted and a list to hold the results. In Step 5, the expand.grid() function is used to create a data frame of all possible combinations of sequences by number, which we store in the combinations variable. The plots variable, created with the vector() function, contains a list object with the right number of slots to hold the resulting dot plots.
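As a rough illustration of these two objects, the following base R sketch builds them for three sequences. The n_seqs value and the i and j column names are assumptions made for the example; the recipe derives the count from the seqs object.

```r
# Assumed number of sequences; the recipe would use length(seqs) instead
n_seqs <- 3

# One row per (i, j) pair, including self-comparisons and both orderings
combinations <- expand.grid(i = seq_len(n_seqs), j = seq_len(n_seqs))

# A pre-allocated list with one empty slot per combination
plots <- vector("list", nrow(combinations))

print(combinations)
print(length(plots))
```

Pre-allocating the list avoids growing it inside the loop, and because expand.grid() varies its first argument fastest, the rows are ordered so that i cycles within each value of j.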

Step 6 is a loop that iterates over each row of the combinations data frame, extracting the sequence numbers we wish to work with and storing them in the i and j variables. The make_dot_plot() function is then called with i, j, and seqs, and its result is stored in the plots list we created.
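The shape of that loop can be sketched in base R as follows. To keep the example free of package dependencies, describe_pair() is a hypothetical stand-in for make_dot_plot() that returns a text label rather than a plot, and seq_names is placeholder data.

```r
# Placeholder sequence names standing in for names(seqs)
seq_names <- c("seqA", "seqB", "seqC")

# Hypothetical stand-in for make_dot_plot(): returns a label, not a plot
describe_pair <- function(i, j, nms) {
  paste(nms[i], "vs", nms[j])
}

combinations <- expand.grid(i = seq_along(seq_names), j = seq_along(seq_names))
results <- vector("list", nrow(combinations))

# Visit each row, pull out the two indices, and store the result in its slot
for (r in seq_len(nrow(combinations))) {
  i <- combinations[r, "i"]
  j <- combinations[r, "j"]
  results[[r]] <- describe_pair(i, j, seq_names)
}

print(unlist(results))
```

Indexing the results list with the row number keeps each result aligned with the combination that produced it, which matters when the list is later laid out as a grid.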

Finally, in Step 7, we use the cowplot library function, plot_grid(), with our list of plots to make a master plot of all possible combinations that looks like this:
