How it works...

After loading the library in Step 1, we set up the URL of the file we want to pull over the internet from http://www.proteomexchange.org/. We're after just one file in accession PXD006247, and we save its URL in the online_file variable. We also create an mzmxl_file variable that points to a file that does not yet exist on our local filesystem, PXD006247_mz.xml.gz; this will be the saved name of the downloaded file. The download.file() function does the downloading: the first argument is the online source, and the second is the place to put the file on the local machine. The final argument, "internal", sets the download method. This setting is intended to be a system-agnostic downloader, but you can change it to a faster or more system-specific method if you like. The documentation for download.file() explains the available options.
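The download step can be sketched roughly as follows. This is a minimal illustration rather than the recipe's exact code: the URL is a placeholder and not the real address of the PXD006247 file, the local path is an assumption, and fetch_file() is a hypothetical helper added here so that a failed download produces a message instead of stopping the script. The mode = "wb" setting is also an addition, intended to keep binary files intact on Windows. Newer R releases may restrict the "internal" method to file:// URLs, in which case "auto" or "libcurl" is a likely substitute.

```r
# Placeholder URL (NOT a real location): substitute the address of the
# file you want from accession PXD006247.
online_file <- "ftp://example.org/path/to/PXD006247/some_run.mzXML"

# Local name for the downloaded file; it does not exist until the download runs.
# Saving into the working directory is an assumption.
mzmxl_file <- file.path(getwd(), "PXD006247_mz.xml.gz")

# Hypothetical helper: returns TRUE on success and FALSE on failure.
fetch_file <- function(url, destfile, method = "internal") {
  tryCatch({
    # download.file() returns 0 (invisibly) when the download succeeds
    status <- download.file(url, destfile, method = method, mode = "wb")
    status == 0
  }, error = function(e) {
    message("Download failed: ", conditionMessage(e))
    FALSE
  })
}

# Uncomment once online_file holds a real URL:
# fetch_file(online_file, mzmxl_file)
```

If the "internal" method is rejected for your URL, passing method = "auto" to fetch_file() lets R pick a downloader for you.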

In Step 2, we create a design file that describes the experiment. Our small demo has only one file, but you can specify many more here. First, we create a dataframe with the columns file, sample, bioRep, techRep, and fraction. With a single file, the table has just one row and looks like this:

file	sample	bioRep	techRep	fraction
`PXD006247_mz.xml.gz`	1	1	1	1

If you had a more complicated experiment, you'd have many more rows describing the sample and bioRep, for example, for each file. We then save this file to disk for use in the next step using write.table() along with the appropriate options. Note that although, for the sake of demonstration, we've created this file programmatically, the file would be equally valid if we'd created it by hand in a spreadsheet program or text editor.
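Building and saving the design file might look like the sketch below. The column names and the single row come from the table above; the output location (a temporary directory), the design_df and design_file names, and the write.table() options shown are assumptions about what "appropriate options" means here. In particular, the default whitespace separator is used, so check what separator the pipeline expects before relying on it.

```r
# One row per spectrum file; the demo has a single file.
design_df <- data.frame(
  file     = "PXD006247_mz.xml.gz",
  sample   = 1,
  bioRep   = 1,
  techRep  = 1,
  fraction = 1,
  stringsAsFactors = FALSE
)

# Hypothetical output path; any writable location works.
design_file <- file.path(tempdir(), "design_file.txt")

# quote = FALSE and row.names = FALSE give a plain table with a header row
# and no quoting or row labels (assumed options, see the note above).
write.table(design_df, file = design_file, quote = FALSE, row.names = FALSE)
```

For a larger experiment, you would pass vectors instead of single values, for example file = c("run1.mzXML", "run2.mzXML") with matching sample and bioRep values (hypothetical file names).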

Finally, we set up and run the QC pipeline in Step 3. The main function, msQCpipe(), is the workhorse and needs a few option settings. The spectralList option needs the path to the design file we created so that it knows which files to open and how to treat them. The fasta option requires a file of the target organism's protein sequences in FASTA format. This allows the QC pipeline to carry out spectral peptide identification using XTandem from the rTANDEM package. The outdir argument gets the path to a new folder that will hold the numerous report files that will be created. Here, our folder will be called qc_result, and it will be a subdirectory of the current working directory. The arguments enzyme, varmod, and fixmod describe the enzyme used for digestion (1 = trypsin), the variable modifications that may be present, and the fixed modifications that will be present on all residues. The arguments tol and itol specify the error windows on peptide mass values and on MS/MS fragment ion mass values, respectively. The cpu argument specifies the number of compute cores to use on the machine, and mode specifies the sort of run to do.
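A call along these lines could be sketched as below. The argument names follow the description above, with the design-file argument spelled spectralList, which is an assumption about the package's exact signature. All numeric values other than enzyme = 1, the mode string, and the run_qc() wrapper itself are illustrative assumptions, not values confirmed by the text. The wrapper only exists so that the sketch does nothing harmful when proteoQC is not installed.

```r
# Hypothetical wrapper around the pipeline call described in the text.
run_qc <- function(design_file, fasta_file,
                   outdir = file.path(getwd(), "qc_result")) {
  # Skip politely if the package is unavailable.
  if (!requireNamespace("proteoQC", quietly = TRUE)) {
    message("proteoQC is not installed; skipping the QC run.")
    return(invisible(NULL))
  }
  proteoQC::msQCpipe(
    spectralList = design_file,  # path to the design file
    fasta        = fasta_file,   # target organism protein sequences
    outdir       = outdir,       # folder that will hold the reports
    enzyme       = 1,            # 1 = trypsin, per the text
    varmod       = 2,            # assumed value: variable modifications
    fixmod       = 1,            # assumed value: fixed modifications
    tol          = 10,           # assumed value: peptide mass error window
    itol         = 0.6,          # assumed value: fragment ion error window
    cpu          = 2,            # assumed value: number of cores
    mode         = "identification"  # assumed run mode
  )
}

# Example usage (paths are placeholders):
# qc <- run_qc("design_file.txt", "my_organism.fasta")
```

Consult the msQCpipe() help page for the codes accepted by enzyme, varmod, and fixmod before changing them.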

When the QC pipeline completes, we get a series of reports in the qc_result folder. The qc_report.html file contains the browsable results of QC. The many pages describing the results should allow you to see the extent to which the experiment was a success.
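Locating the report afterwards can be sketched as follows. The folder and file names come from the text; the outdir and report variable names, and opening the report in a browser only in interactive sessions, are choices made for this illustration.

```r
# Folder and file names as described in the text.
outdir <- file.path(getwd(), "qc_result")
report <- file.path(outdir, "qc_report.html")

if (file.exists(report)) {
  # Open the report in the default browser when running interactively.
  if (interactive()) browseURL(report)
} else {
  message("No report found at ", report, "; has the pipeline finished?")
}
```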
