How to do it...

To visualize the distribution of peptide hit counts and use it to choose filtering thresholds, perform the following steps:

  1. Load the libraries and data:
library(MSnID)
library(data.table)
library(dplyr)
library(ggplot2)
msnid <- MSnID()
msnid <- read_mzIDs(msnid, file.path(getwd(), "datasets", "ch6", "HeLa_180123_m43_r2_CAM.mzid.gz"))
peptide_info <- as(msnid, "data.table")
  2. Filter out decoy rows and count how many times each peptide appears:
per_peptide_counts <- peptide_info %>%
  filter(isDecoy == FALSE) %>%          # keep target (non-decoy) hits only
  group_by(pepSeq) %>%
  summarise(count = n()) %>%            # number of hits per peptide sequence
  mutate(sample = "peptide_counts")     # constant x-axis category for plotting
  3. Create a violin and jitter plot of the hit counts:
per_peptide_counts %>%
  ggplot() + aes(sample, count) + geom_jitter() + geom_violin() + scale_y_log10()
  4. Create a plot of cumulative hit counts for peptides sorted by hit count:
per_peptide_counts %>%
  arrange(count) %>%
  mutate(cumulative_hits = cumsum(count), peptide = row_number()) %>%
  ggplot() + aes(peptide, cumulative_hits) + geom_line()
  5. Filter out peptides with very low or very high hit counts and replot:
filtered_per_peptide_counts <- per_peptide_counts %>%
  filter(count >= 5, count <= 2500)

filtered_per_peptide_counts %>%
  ggplot() + aes(sample, count) + geom_jitter() + geom_violin() + scale_y_log10()
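You can check the counting and cumulative-sum logic from the decoy-filtering and cumulative-plot steps on a tiny example before running it on the full MSnID output. The sketch below makes a few assumptions:

- `toy_psms` is a hypothetical toy table that mimics only the `pepSeq` and `isDecoy` columns of `peptide_info`.
- The sketch uses dplyr's `count()`, which is shorthand for the `group_by()` plus `summarise(n())` pattern used in the recipe.

```r
library(dplyr)

# Hypothetical toy PSM table standing in for `peptide_info`;
# only the two columns the recipe uses (pepSeq, isDecoy) are included.
toy_psms <- data.frame(
  pepSeq  = c("PEPA", "PEPA", "PEPA", "PEPB", "PEPB", "PEPC", "DECOY1"),
  isDecoy = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Drop decoy rows and count hits per peptide sequence
toy_counts <- toy_psms %>%
  filter(!isDecoy) %>%
  count(pepSeq, name = "count")

# Sort by hit count (ascending) and accumulate, as in the cumulative plot
toy_cumulative <- toy_counts %>%
  arrange(count) %>%
  mutate(cumulative_hits = cumsum(count),
         peptide = row_number())

print(toy_cumulative)
```

The last value of `cumulative_hits` equals the total number of non-decoy hits, so the height at the right-hand end of the cumulative curve is the total number of target hits.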
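To put a number on what the cutoffs in the final filtering step remove, one option is to compare peptide and hit totals before and after filtering. This is a sketch, not part of the original recipe, and rests on these assumptions:

- `toy_counts` is hypothetical data shaped like `per_peptide_counts`.
- `min_hits` and `max_hits` are illustrative variable names holding the same 5 and 2500 cutoffs.

```r
library(dplyr)

# Hypothetical per-peptide hit counts standing in for `per_peptide_counts`
toy_counts <- data.frame(
  pepSeq = paste0("PEP", 1:6),
  count  = c(1L, 3L, 5L, 40L, 900L, 3000L)
)

# Same inclusive window as the recipe: keep peptides with 5 to 2500 hits
min_hits <- 5
max_hits <- 2500

kept <- toy_counts %>%
  filter(count >= min_hits, count <= max_hits)

# How much survives the filter, counted by peptide and by hit
filter_summary <- data.frame(
  peptides_kept     = nrow(kept),
  peptides_total    = nrow(toy_counts),
  hit_fraction_kept = sum(kept$count) / sum(toy_counts$count)
)
print(filter_summary)
```

Hit counts are usually heavily skewed, so a few high-count peptides can account for a large share of all hits. As a result, the fraction of hits kept can drop much faster than the fraction of peptides kept as the upper cutoff is lowered.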