How it works...

In Step 1, we load the libraries and then load the data. As we mentioned previously, data loading with MSnID is a little unusual. Instead of just calling a file-reading function, we must first create an empty MSnID object and then load the data into it. We create msnid with the MSnID() function and then pass it to the read_mzIDs() function to actually put data into it. Next, we use the as() function to convert msnid into a data.table object, a data frame-like object that is optimized for large datasets.

In Step 2, we prepare the data for plotting using the tidyverse packages dplyr and ggplot2. The tidyverse packages all work really well in concert as they're centered on working with data frames. The usual way of working is to use the piping operator, %>%, to pass data from one function to another without having to save the interim object. By convention, the result of the upstream function is passed as the first argument of the downstream function, so we don't need to specify it. This results in the construction we have here. We take the peptide_info object and pass it through the %>% operator to the dplyr filter() function, which does its work and passes its result on to the group_by() function, and so on. Each function does its work and passes the data on.

So, in this pipeline, we use filter() to keep all the rows that are not decoys, and then use group_by(pepSeq) to group the long data.table into subtables according to the value of the pepSeq column, effectively getting one table per peptide sequence. The next step uses summarise(), which generates a summary table containing a column called count. This column holds the result of the n() function, which counts the rows in each group, giving us a table with one row per peptide that tells us how many times the peptide appears in the original table. It's a good idea to step through the code one function at a time if it isn't clear how these objects are building up. Finally, we use mutate() to add a new column called sample, which is the same length as the current table and is filled with the word peptide_counts. The table is saved in a variable called per_peptide_counts.
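To make the pipeline concrete, here is a minimal sketch run on a tiny made-up table rather than real MSnID output. The pepSeq column name comes from the recipe; the isDecoy column name and all of the values are assumptions made purely for illustration.

```r
library(dplyr)

# Toy stand-in for peptide_info. The isDecoy column name and every value
# here are assumptions for illustration, not real search results.
peptide_info <- data.frame(
  pepSeq  = c("AAK", "AAK", "AAK", "LLR", "LLR", "GGK", "DECOYK"),
  isDecoy = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

per_peptide_counts <- peptide_info %>%
  filter(!isDecoy) %>%                 # keep only rows that are not decoys
  group_by(pepSeq) %>%                 # one group per peptide sequence
  summarise(count = n()) %>%           # one row per peptide with its row count
  mutate(sample = "peptide_counts")    # constant label, recycled to every row

per_peptide_counts
```

Running the pipeline one function at a time (for example, stopping after group_by()) is an easy way to see what each stage hands to the next.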

In Step 3, we pipe the per_peptide_counts data to the ggplot() function, which sets up a ggplot object. These objects are built up in layers, so we use the + operator to add an aesthetic mapping using the aes() function. This usually contains the variables to plot on the x and y axes; here, these are sample and count. Then, we use + again to add a geom, a layer that defines what the plot should look like. First, we add geom_jitter(), which plots the points, adding a bit of random x and y noise to spread them out a little. We then add another geom, geom_violin(), which gives a violin density plot. Finally, we add a scale layer, converting the y-axis into a log base 10 scale. The resulting plot looks like this:
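For readers without the figure to hand, the layering described for Step 3 can be sketched roughly as follows. The counts are invented toy values, and the object name p is only an illustrative choice.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for per_peptide_counts; the counts are invented for illustration.
per_peptide_counts <- data.frame(
  sample = "peptide_counts",
  count  = c(1, 3, 8, 20, 150, 900, 4000)
)

p <- per_peptide_counts %>%
  ggplot() +
  aes(x = sample, y = count) +  # map columns to the x and y axes
  geom_jitter() +               # points with a little random noise
  geom_violin() +               # violin density layer
  scale_y_log10()               # log base 10 scale on the y-axis
```

Typing p at the console (or calling print(p)) draws the plot. Layers are drawn in the order they are added, so the violin here is drawn over the jittered points.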

In Step 4, we create a cumulative hits plot by piping the per_peptide_counts data to the arrange() function, which sorts a data frame in ascending order by the variable specified (in this case, count). The result is piped to mutate() to add a new column called cumulative_hits, which holds the result of the cumsum() function on the count column. We also add a column called peptide, which holds the row number of the table and gives us a convenient variable with which to order the peptides in the plot. We generate the plot by piping the sorted data directly to ggplot() and adding the aes() function so that peptide is on the x-axis and cumulative_hits is on the y-axis. Then, we add geom_line(). The resulting plot appears as follows:
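The sort-then-accumulate logic of Step 4 can be sketched on toy data as shown here. The values are invented, row_number() is one of several ways to number the rows, and the sorted table is saved in a hypothetical variable called cumulative only so that it can be inspected; the recipe itself pipes straight into ggplot().

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for per_peptide_counts; the values are invented for illustration.
per_peptide_counts <- data.frame(
  pepSeq = c("AAK", "GGK", "LLR"),
  count  = c(3, 1, 2)
)

cumulative <- per_peptide_counts %>%
  arrange(count) %>%                        # ascending order of count
  mutate(
    cumulative_hits = cumsum(count),        # running total of hits
    peptide         = row_number()          # rank used as the x-axis position
  )

p <- ggplot(cumulative) +
  aes(x = peptide, y = cumulative_hits) +
  geom_line()
```

With these toy counts, the sorted values 1, 2, 3 accumulate to 1, 3, 6.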

From the two plots, we can see the spread of hits and assess which thresholds we wish to apply.

In Step 5, we use the filter() function again to retain only the rows with a value of count over 5 and below 2500, and put that new data into the same plot recipe we made in Step 3. This gives us the following plot, showing the removal of points outside the thresholds:
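A hedged sketch of the thresholding step, again on invented toy counts, might look like this. The strict inequalities are an assumption based on the wording "over 5 and below 2500", and the variable name filtered_counts is hypothetical.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for per_peptide_counts; the counts are invented for illustration.
per_peptide_counts <- data.frame(
  sample = "peptide_counts",
  count  = c(1, 3, 8, 20, 150, 900, 4000)
)

# Keep only peptides seen more than 5 and fewer than 2500 times
# (strict bounds are an assumption).
filtered_counts <- per_peptide_counts %>%
  filter(count > 5, count < 2500)

# Reuse the same plot recipe as in Step 3 on the filtered data.
p <- ggplot(filtered_counts) +
  aes(x = sample, y = count) +
  geom_jitter() +
  geom_violin() +
  scale_y_log10()
```

Passing several conditions to filter() separated by commas combines them with a logical AND, so a row must satisfy both bounds to be kept.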
