How it works...

Power analysis in powsimR requires us to do some pre-analysis so that we have estimates for some important parameters. To perform a simulation-based power analysis, we need to estimate the distribution of log fold changes between treatments and the proportion of features that are differentially expressed.

In step 1, we'll get the mean counts for each feature in the two treatments. After loading the expression data using the readRDS() function, we use the rowMeans() function on certain columns to get the mean expression counts of each gene in both the mock and hrcc1 treatments. We can then get the log2 ratio of those (by simply dividing the two vectors and, in the last line, use standard arithmetical operators to work out those that have a log2 fold change greater than 2). Inspecting the final prop_de variable gives the following output:

prop_de
## [1] 0.2001754

So, a proportion of about 0.2 of the features have counts changing by log2 twofold.

Step 2 looks at the distribution of the gene expression ratios. We first remove the non-finite ratios from the log2fc variable. We must do this because, when calculating ratios, we generate Inf values in R; this occurs when the denominator (the mock sample) has zero mean counts. We can remove them using indexing on the vector with the binary vector that comes from is.finite() function. With the Inf values removed, we can plot. First, we do a normal density plot using the density() function, which shows the distribution of ratios. Then, we use the qqnorm() function in the extRemes package, which plots the data against data sampled from an idealized normal distribution with the same mean. A strong, linear correlation indicates a normal distribution in the original data. We can see the output in the following screenshot:

They look pretty log-normally distributed, so we can assume a log-normal distribution. 

The longest step here, step 3, is actually only four lines. We are basically setting up the parameters for the simulation, which requires us to specify a lot of values. The first set, params, which we create with the estimateParam() function needs the data source (countData), the distribution to use (we set Distribution = "NB", which selects the negative binomial); the type of RNAseq experiment—ours is a bulk RNAseq experiment (RNAseq = "bulk"), and normalization strategy—we use the edgeR style TMM (normalization = "TMM"). The second set, desetup, is created with the DESetup() function; in this, we choose the parameters relating to the number of genes for which to simulate differential expression. We set up 1,000 total gene simulations (ngenes) and 25 simulation runs (nsims). We set the proportion to be differentially expressed to that estimated in step 1 in prop_de. We use the vector of fold changes, finite_log2fc, as input for the pLFC parameter. Setting sim.seed is not necessary but will ensure reproducibility between runs. The third line uses the SimSetup() function to combine params and desetup into a single object, sim_opts. Finally, we create a num_replicates vector specifying the number of biological replicates (RNA samples) to simulate.

Step 4 is relatively straightforward: we run the differential expression simulation using the sim_opts parameters created in the previous steps, choosing "edgeR-LRT" as the differential expression method and "TMM" as the normalization. The simulation data is stored in the simDE variable. 

In Step 5, we create an evaluation of the simulation—this analyzes and extracts various statistics. We pass the simDE simulation data to the evaluateDE() function along with values for things pertaining to grouping, filtering, and significance.

Finally, in Step 6, we can plot the evalDE object from Step 5 and see the results of the simulation. We get the following plot in which we can see the different powers at different replicate numbers. Note the x-axis indicates the number of replicate RNA samples used, and the metrics include FDR, False Negative/Positive Rate (FNR/FPR), and TNR/TPR (True Negative/Positive Rate):

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.28.93