How it works...

Step 1 is our standard library loading step.

Step 2 is the data loading step, which is a little unusual. Instead of just calling a file-reading function, we must first create an empty MSnID object and then load the data into it. We create msnid with the MSnID() function and then pass it to the read_mzIDs() function, which populates it with the data from the mzid file.

Step 3 extracts the information we need from the msnid object. We want only the rows that correspond to real hits, not decoys, so we access the msnid@psms slot directly, which holds the peptide-spectrum match data, and subset it, keeping only the rows whose isDecoy value is FALSE. We save the result in the real_hits variable. Next, we use real_hits to select a few useful columns from the many in the original object.
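A minimal base-R sketch of this filtering follows. So that it runs without MSnID or an mzid file, it uses a small hypothetical data frame standing in for msnid@psms. The column names other than isDecoy (spectrumID, pepSeq, accession, start, end) and the XXX_ decoy prefix are assumptions for illustration, not necessarily what your file contains.

```r
# Hypothetical stand-in for msnid@psms; the real slot holds many more columns
psms <- data.frame(
  spectrumID = c("s1", "s2", "s3", "s4"),
  pepSeq     = c("PEPTIDEK", "DECOYSEQR", "ANOTHERK", "FAKEPEPK"),
  accession  = c("sp|P04637|P53_HUMAN", "XXX_sp|Q9Y6K9|NEMO_HUMAN",
                 "sp|P38398|BRCA1_HUMAN", "XXX_sp|P00533|EGFR_HUMAN"),
  isDecoy    = c(FALSE, TRUE, FALSE, TRUE),
  start      = c(10L, 5L, 120L, 33L),
  end        = c(17L, 13L, 127L, 40L),
  stringsAsFactors = FALSE
)

# Keep only target (non-decoy) rows: !isDecoy is TRUE where isDecoy is FALSE
real_hits <- psms[!psms$isDecoy, ]

# Retain just the columns needed downstream (column choice is an assumption)
required_info <- real_hits[, c("spectrumID", "pepSeq", "accession", "start", "end")]
print(required_info)
```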

Step 4 extracts the Uniprot IDs embedded in the accession column. Note that these values come from the sequence names used in the search engine's database, so this step will vary with the precise formatting of that database, although the general pattern applies. The densely nested set of functions breaks down like this: the inner anonymous function, function(x){x[2]}, returns the second element of any vector it is passed. strsplit() splits each entry of the accession column into a vector of fields, and we use lapply() to apply the anonymous function to every element of the list that strsplit() returns. Because lapply() returns a list, we use unlist() to flatten the result into the vector we need. Some accessions contain no Uniprot ID, so the anonymous function returns NA for them; we remove these from the vector by subsetting with !is.na().
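The extraction can be tried on a few hypothetical accessions. This sketch assumes UniProt-style sp|ID|ENTRY_NAME names with fields separated by a pipe; your database headers may use a different delimiter or field order, in which case the split pattern and the index in x[2] need adjusting.

```r
# Hypothetical accessions; the last has no pipe-delimited ID (e.g. a contaminant)
accession <- c("sp|P04637|P53_HUMAN", "sp|P38398|BRCA1_HUMAN", "CONTAMINANT_0001")

# Split on a literal pipe; it is escaped because strsplit() treats the pattern as a regex
parts <- strsplit(accession, "\\|")

# Take the second field of each split vector; x[2] is NA when there is only one field
uniprot_ids <- unlist(lapply(parts, function(x) { x[2] }))

# Drop the entries that yielded no Uniprot ID
uniprot_ids <- uniprot_ids[!is.na(uniprot_ids)]
print(uniprot_ids)  # [1] "P04637" "P38398"
```

An equivalent, type-stable alternative to unlist(lapply(...)) is vapply(parts, function(x) x[2], character(1)), which guarantees a character vector with one entry per accession.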

In Step 5, we connect to the Ensembl database package and use the genes() function to retrieve the Ensembl genes that match our Uniprot IDs. We pass the vector of Uniprot IDs to the UniprotFilter() function and, with the columns argument, select the data we want back from the database. The result is a GRanges object that contains all the information we need to build a browser track.

In Step 6, we use the helper function GRangesForUCSCGenome(), passing it the version of the genome we want to view (hg38), followed by the chromosome names, coordinates, and strand information that a GRanges object needs. We pull these out of the genes_for_prots object we created previously with the seqnames(), ranges(), and strand() accessor functions. Because UCSC sequence names are prefixed with chr, we use paste() to add that prefix to our seqnames data. We also create columns for the gene name and gene ID, so that this information is preserved in the eventual view. We save the resulting object in the track variable.
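The chr-prefixing step can be sketched in base R on a hypothetical vector of Ensembl-style chromosome names. One caveat worth checking in your own data: Ensembl names the mitochondrial chromosome MT, whereas UCSC hg38 uses chrM, so a plain paste() produces chrMT, which UCSC will not recognise.

```r
# Hypothetical Ensembl-style names, as seqnames(genes_for_prots) might return them
ens_chroms <- c("17", "X", "MT", "1")

# UCSC names carry a "chr" prefix; paste0() is paste() with sep = ""
ucsc_chroms <- paste0("chr", ens_chroms)

# Ensembl "MT" corresponds to UCSC "chrM", so patch that one name by hand
ucsc_chroms[ucsc_chroms == "chrMT"] <- "chrM"
print(ucsc_chroms)  # chr17, chrX, chrM, chr1
```

If GenomeInfoDb is available, seqlevelsStyle(gr) <- "UCSC" is another way to rename the sequence levels of a GRanges object, and may be more robust than hand-written string rules.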

Finally, in Step 7, we render the track we created. First, we create a session object that represents a session on UCSC with the browserSession() function and add our track to it with the track() function. We choose which of the many peptides to focus on by passing only the first peptide to the browserView() function, which opens a new web browser window showing the requested data. The second argument to browserView() is the region to display; by writing it as first_peptide * -5, we zoom out so that the view spans five times the width of the feature, centred on it.
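To make the zoom arithmetic concrete, here is a small hypothetical helper, zoom_out(), that mimics what multiplying a range by a negative number does: it keeps the centre fixed and widens the range by the given factor. It is an illustration only, not IRanges' own implementation, and its rounding for odd widths may differ.

```r
# Hypothetical helper: widen the closed range [start, end] by `factor`, keeping its centre
zoom_out <- function(start, end, factor) {
  width     <- end - start + 1          # closed interval, as in IRanges
  centre    <- (start + end) / 2
  new_width <- width * factor
  new_start <- ceiling(centre - new_width / 2)
  c(start = new_start, end = new_start + new_width - 1)
}

# A 100 bp feature at 1001-1100, zoomed out five-fold
zoom_out(1001, 1100, 5)  # start = 801, end = 1300 (width 500, centre still 1050.5)
```

With IRanges installed, width(IRanges(1001, 1100) * -5) should likewise be 500.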

At the time of writing, this recipe generated the following view. Note that the very top track is our my_peptides track:
