GVPPSR Kumar, A Kumar and AP Sahoo
Animal Biotechnology Division, IVRI, UP, India
Several software tools have been developed to detect miRNAs from NGS data. These include miRTRAP, DSAP, miRExpress, mirTools, miRDeep, miRNAkey, mireap, miRanalyzer and Mirena, among others. Of these tools, miRDeep and mireap are considered the best for predicting novel miRNAs from mammalian data sets (Li et al., 2012). Here, we discuss miRDeep (Friedlander et al., 2008, 2012). The commands and associated annotations have been taken from guidance available online. In several cases, the explanations follow the source verbatim. The source URLs are cited in this chapter.
miRDeep is a tool that helps to identify miRNAs in the large pool of sequenced transcripts from a deep sequencing run. It uses a probabilistic model of miRNA biogenesis to score how well the position and frequency of the sequenced RNA fit the secondary structure of the miRNA precursor. miRDeep2 is an overhauled version of the original miRDeep algorithm, with extensive new functionality added. The accuracy and sensitivity of miRDeep2 are estimated through its internal statistical controls.
Both canonical and non-canonical miRNAs in deep sequencing data can be identified with miRDeep2. The tool can also be used for miRNA expression profiling across samples. The workflow comprises three steps: preprocessing of raw Illumina reads with the mapper.pl script; quantification and expression profiling with the quantifier.pl script; and miRNA identification with the miRDeep2.pl script.
The reads are processed and mapped to the reference genome using the mapper.pl script. This mapper module can process deep sequencing reads, map them to the reference sequence, or do both.
The module can process or map data that are in FASTA format, and can also handle sequence space data. Several of its functions are designed specifically for Illumina data. This entire chapter is explained using the datasets available in the miRDeep2 tutorial (https://www.mdc-berlin.de/36105849/en/research/research_teams/systems_biology_of_gene_regulatory_elements/projects/miRDeep/documentation).
The default input file can be in FASTA, seq.txt or qseq.txt format. For more options, please refer to https://www.mdc-berlin.de/36105849/en/research/research_teams/systems_biology_of_gene_regulatory_elements/projects/miRDeep/documentation
The output depends on the options used. A *.fasta file containing the processed reads, an *.arf file with the mapped reads, or both can be generated as output. As an example, suppose the user wishes to analyze deep sequencing data mapping to a ≈6 kb region on C. elegans chromosome II for known and novel miRNA genes (this follows the miRDeep2 tutorial at the address given previously).
The steps below follow the miRDeep2 tutorial. First, build a Bowtie index of the reference sequence:
./bowtie-build cel_cluster.fa cel_cluster
This command generates six files in the bowtie folder. Copy all the index files to the miRDeep2 folder.
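Before copying the index, it can help to confirm that all six files are present. The sketch below is a minimal check, assuming the index was built with Bowtie 1 (whose small indexes use the .ebwt suffixes listed); the function name missing_index_files is hypothetical and is not part of miRDeep2 or Bowtie.

```python
from pathlib import Path

# Suffixes that Bowtie 1 writes for a small index (assumption: Bowtie 1, not Bowtie 2).
BOWTIE1_SUFFIXES = (
    ".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt",
)


def missing_index_files(prefix, directory="."):
    """Return the expected index file names for `prefix` that are absent from `directory`."""
    folder = Path(directory)
    expected = [prefix + suffix for suffix in BOWTIE1_SUFFIXES]
    return [name for name in expected if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_index_files("cel_cluster")
    print("index complete" if not missing else "missing: " + ", ".join(missing))
```

An empty list means that every expected index file was found in the chosen folder.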
Go to the miRDeep2 directory and type the following command:
mapper.pl reads.fa -c -j -k TCGTATGCCGTCTTCTGCTTGT -l 18 -m -p cel_cluster -s reads_collapsed.fa -t reads_collapsed_vs_genome.arf -v
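The options in this command can be read one by one. The sketch below simply rebuilds the same command from an annotated table; the flag descriptions paraphrase the mapper.pl usage text, and the helper name build_mapper_command is hypothetical. It only prints the command and does not run miRDeep2.

```python
# Each entry: (flag, value or None, meaning). Values are those used in the tutorial command.
MAPPER_ARGS = [
    ("-c", None, "input reads are in FASTA format"),
    ("-j", None, "remove reads containing letters other than a, c, g, t, u, n"),
    ("-k", "TCGTATGCCGTCTTCTGCTTGT", "clip this 3' adapter sequence"),
    ("-l", "18", "discard reads shorter than 18 nt"),
    ("-m", None, "collapse identical reads"),
    ("-p", "cel_cluster", "map to the genome using this Bowtie index prefix"),
    ("-s", "reads_collapsed.fa", "write the processed reads to this file"),
    ("-t", "reads_collapsed_vs_genome.arf", "write the read mappings to this file"),
    ("-v", None, "print a progress report"),
]


def build_mapper_command(reads="reads.fa"):
    """Assemble the mapper.pl argument list from the annotated table above."""
    command = ["mapper.pl", reads]
    for flag, value, _meaning in MAPPER_ARGS:
        command.append(flag)
        if value is not None:
            command.append(value)
    return command


if __name__ == "__main__":
    print(" ".join(build_mapper_command()))
    for flag, _value, meaning in MAPPER_ARGS:
        print(flag, ":", meaning)
```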
The collapsed reads are the reads that remain after the adapter sequence has been clipped and identical sequences have been merged into a single entry. The mappings of the collapsed reads to the genome are given in the .arf file.
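The idea of clipping and collapsing can be illustrated with a short sketch. It is a simplification, not the mapper module's own implementation: it assumes the adapter appears as an exact, full-length match, and it assumes identifiers of the form prefix_number_xcount for the collapsed reads. The function names are hypothetical.

```python
from collections import Counter


def clip_adapter(read, adapter):
    """Cut the read at the first exact occurrence of the adapter (simplifying assumption)."""
    position = read.find(adapter)
    return read if position == -1 else read[:position]


def collapse_reads(reads, adapter, min_length=18, prefix="seq"):
    """Clip each read, drop short ones, and merge identical sequences with a count."""
    counts = Counter()
    for read in reads:
        clipped = clip_adapter(read.upper(), adapter.upper())
        if len(clipped) >= min_length:
            counts[clipped] += 1
    collapsed = []
    # most_common() lists the most frequent sequences first.
    for index, (sequence, count) in enumerate(counts.most_common()):
        collapsed.append(("{}_{}_x{}".format(prefix, index, count), sequence))
    return collapsed


if __name__ == "__main__":
    adapter = "TCGTATGCCGTCTTCTGCTTGT"
    raw = [
        "TCACCGGGTGGAAACTAGCAGT" + adapter,
        "TCACCGGGTGGAAACTAGCAGT" + adapter,
        "ACGTACGTAC" + adapter,  # only 10 nt after clipping, so it is discarded
    ]
    for identifier, sequence in collapse_reads(raw, adapter):
        print(">" + identifier)
        print(sequence)
```

Here the two identical reads end up as a single entry whose identifier records a count of two, while the short read is dropped by the length filter.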
Figure 45.5 shows how the reads in reads.fa were collapsed into reads_collapsed.fa. For example, the first read in reads_collapsed.fa is obtained after clipping the adapter sequence from sequence 4 (>nematiode_4) in the reads.fa file. The .arf file is the aligned reads file, which shows where the reads match the genome. For example, the first collapsed read matches the reference genome exactly at positions 3060–3081.
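A line of the .arf file can be unpacked programmatically. The sketch below assumes a tab-separated layout with thirteen columns; the field names are labels chosen here for illustration, and the example line is made up (only the positions 3060 and 3081 echo the text above).

```python
from typing import NamedTuple


class ArfRecord(NamedTuple):
    read_id: str
    read_length: int
    read_start: int
    read_end: int
    read_sequence: str
    ref_id: str
    ref_length: int
    ref_start: int
    ref_end: int
    ref_sequence: str
    strand: str
    mismatches: int
    edit_string: str


# Columns that hold integers in the assumed layout.
INT_FIELDS = {"read_length", "read_start", "read_end",
              "ref_length", "ref_start", "ref_end", "mismatches"}

# Made-up example line; "ref_region" is a hypothetical reference name.
EXAMPLE_LINE = "\t".join([
    "seq_0_x7", "22", "1", "22", "acgtacgtacgtacgtacgtac", "ref_region",
    "22", "3060", "3081", "acgtacgtacgtacgtacgtac", "+", "0", "m" * 22,
])


def parse_arf_line(line):
    """Split one tab-separated line into an ArfRecord, converting numeric columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(ArfRecord._fields):
        raise ValueError("expected 13 columns, found {}".format(len(fields)))
    values = [int(value) if name in INT_FIELDS else value
              for name, value in zip(ArfRecord._fields, fields)]
    return ArfRecord(*values)


if __name__ == "__main__":
    record = parse_arf_line(EXAMPLE_LINE)
    print(record.read_id, record.ref_start, record.ref_end, record.mismatches)
```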
Quantification of reads against known miRBase precursors is done using the quantifier.pl script. The quantifier module maps the deep sequencing reads to the predefined miRNA precursors to determine the expression of the corresponding miRNAs. First, the predefined mature miRNA sequences are mapped to the predefined precursors; the deep sequencing reads are then mapped to the same precursors.
The command is:
quantifier.pl -p precursors_ref_this_species.fa -m mature_ref_this_species.fa -r reads_collapsed.fa -t cel -y 16_19
The -p option specifies the miRNA precursor sequences from the miRBase database, the -m option the mature miRNA sequences from miRBase, the -r option the collapsed reads, the -t option the species being analyzed, and the -y option a timestamp that appears in the names of the output files.
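The two mapping steps that the quantifier module performs can be pictured with a toy sketch. This is not how quantifier.pl is implemented: it uses exact string matching only, counts a read only when it lies entirely inside the mature miRNA span, and assumes read identifiers that end in _x followed by a count. All names and sequences below are hypothetical.

```python
import re


def normalise(sequence):
    """Upper-case and convert RNA 'U' to DNA 'T' so reads and references compare equal."""
    return sequence.upper().replace("U", "T")


def read_count(read_id):
    """Extract the count from an identifier such as 'seq_0_x5'; default to 1 if absent."""
    match = re.search(r"_x(\d+)$", read_id)
    return int(match.group(1)) if match else 1


def quantify(precursors, matures, reads):
    """Toy quantification: locate matures on precursors, then count reads inside them."""
    # Step 1: find where each mature sequence sits on each precursor.
    spans = {}
    for mature_name, mature_seq in matures.items():
        for precursor_name, precursor_seq in precursors.items():
            start = normalise(precursor_seq).find(normalise(mature_seq))
            if start != -1:
                spans[(mature_name, precursor_name)] = (start, start + len(mature_seq))
    # Step 2: place each read on the precursor and add its count if it falls in the span.
    counts = {key: 0 for key in spans}
    for read_id, read_seq in reads:
        for (mature_name, precursor_name), (m_start, m_end) in spans.items():
            r_start = normalise(precursors[precursor_name]).find(normalise(read_seq))
            if r_start == -1:
                continue
            if r_start >= m_start and r_start + len(read_seq) <= m_end:
                counts[(mature_name, precursor_name)] += read_count(read_id)
    return counts


if __name__ == "__main__":
    precursors = {"toy-mir-1": "AAACTGAGGTAGTAGGTTGTATAGTTCCCGGG"}
    matures = {"toy-miR-1": "UGAGGUAGUAGGUUGUAUAGUU"}
    reads = [("seq_0_x5", "TGAGGTAGTAGGTTGTATAGTT"), ("seq_1_x2", "GAGGTAGTAGGTTGTATAGT")]
    print(quantify(precursors, matures, reads))
```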
The output is generated in the form of:
miRNAs_expressed_all_samples_16_19.csv, which gives the read counts of the reference miRNAs in the data in tabular format
pdfs_16_19 – a folder of PDF files giving details of the miRNAs that were identified.
expression_16_19.html – presents all the results in html format. This file is present in the expression analyses folder in the miRDeep2 directory
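The tabular read counts can be inspected with a few lines of code. The sketch below assumes that the table is tab-delimited, that its header line starts with "#", and that it has a column named read_count; these details should be checked against the actual file. The miRNA names in the sample text are hypothetical.

```python
import csv
import io


def read_expression_table(handle):
    """Read a tab-delimited table whose header line may start with '#'."""
    reader = csv.reader(handle, delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    return [dict(zip(header, row)) for row in reader if row]


def top_expressed(rows, n=5, count_column="read_count"):
    """Return the n rows with the highest value in the count column."""
    return sorted(rows, key=lambda row: float(row[count_column]), reverse=True)[:n]


if __name__ == "__main__":
    # Made-up sample standing in for miRNAs_expressed_all_samples_16_19.csv.
    sample = (
        "#miRNA\tread_count\tprecursor\n"
        "toy-miR-A\t10.00\ttoy-mir-A\n"
        "toy-miR-B\t250.00\ttoy-mir-B\n"
    )
    rows = read_expression_table(io.StringIO(sample))
    for row in top_expressed(rows, n=1):
        print(row["miRNA"], row["read_count"])
```

To read the real file, open it with open(path, newline="") and pass the handle to read_expression_table.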
Novel and known miRNAs can be detected using the miRDeep2.pl script, which takes the output of the mapper module as its input.
Go to the miRDeep2 directory and type the following command:
miRDeep2.pl reads_collapsed.fa cel_cluster.fa reads_collapsed_vs_genome.arf mature_ref_this_species.fa mature_ref_other_species.fa precursors_ref_this_species.fa -t C.elegans 2> report.log
The file “mature_ref_this_species.fa” contains all mature miRNAs of C. elegans, while the “mature_ref_other_species.fa” file contains all mature miRNAs of C. briggsae and D. melanogaster. By using “2>”, all progress output is redirected to the report.log file.
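The effect of “2>” can be reproduced when a command is launched from a script. The sketch below is a generic illustration using a stand-in child process rather than miRDeep2.pl itself; the helper name run_with_stderr_log is hypothetical. It sends the standard error stream to a log file while leaving standard output separate, which is what the shell redirection does.

```python
import os
import subprocess
import sys
import tempfile


def run_with_stderr_log(command, log_path):
    """Run `command`, writing its standard error to `log_path` (like shell `2> log_path`)."""
    with open(log_path, "w") as log:
        completed = subprocess.run(command, stdout=subprocess.PIPE, stderr=log, text=True)
    return completed.returncode, completed.stdout


if __name__ == "__main__":
    # Stand-in child process: prints a result to stdout and a progress note to stderr.
    child = [sys.executable, "-c",
             "import sys; print('result'); print('progress', file=sys.stderr)"]
    with tempfile.TemporaryDirectory() as folder:
        log_file = os.path.join(folder, "report.log")
        code, output = run_with_stderr_log(child, log_file)
        with open(log_file) as log:
            print("exit code:", code)
            print("stdout:", output.strip())
            print("log:", log.read().strip())
```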
The results.html file generated by the above command contains all the results from miRDeep2.pl. In addition, the command generates a directory of PDF files showing the read signatures, structures, and score breakdowns of the novel and known miRNAs in the data.