Appendix B: Important Web Resources for Bioinformatics Databases and Tools

CS Mukhopadhyay and RK Choudhary

School of Animal Biotechnology, GADVASU, Ludhiana

INTRODUCTION

This appendix is a daily companion for bench workers who need web‐based tools and databases for various bioinformatics tasks. The major databases and tools required for basic bioinformatics work are outlined, along with the Uniform Resource Locator (URL) of each. An appendix cannot provide an exhaustive list of links to bioinformatics tools and databases, so only some of the best‐known and most frequently used sites are covered here. Users are advised to bookmark the URLs and to keep abreast of any changes to them.

Site name | Uniform Resource Locator (URL) | Description
NCBI, EMBL, DDBJ
NCBI home page | http://www.ncbi.nlm.nih.gov/ | A vast public repository of nucleic acid sequences, literature and genome‐specific resources. It also provides several biocomputational tools for sequence analysis and FTP sites for sequence retrieval.
NCBI‐dbVar | http://www.ncbi.nlm.nih.gov/dbvar/ | A database maintained by NCBI for structural variations at the genomic level.
GenBank | http://www.ncbi.nlm.nih.gov/genbank/ | A public repository of nucleotide sequences provided by NCBI.
NCBI Human Genome Browser | http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi?chr=hum_chr.inf&query | This map viewer displays the human reference genome assembly. It depicts several components of the genome (genes, sequence tagged sites, expressed sequence tags, contigs, etc.) and the experimentally derived maps (BAC component map, cytogenetic and physical map, radiation hybrid map, etc.).
NIH Human Microbiome Project | http://www.hmpdacc.org/resources/data_browser.php/ | The Human Microbiome Project (HMP) is an NIH initiative that aims to explore the microbes in different organs and in the contents of those organs. The identified microbiome is characterized and associated with healthy and diseased states in humans.
Entrez | http://www.ncbi.nlm.nih.gov/sites/entrez?db=pubmed | A search engine (a global query system across databases) used to look for literature (journal articles of the review or research category), books and documents in various sections of NCBI (e.g., OMIM, Genome, Structure, etc.). Users can also save search results in their NCBI account for later reference.
NCBI GenBank Taxonomy Database | http://www.ncbi.nlm.nih.gov/taxonomy/taxonomyhome.html/ | Provides taxonomical information on an organism (used in molecular biology research work).
EMBL‐EBI | http://www.ebi.ac.uk | The EBI, a part of EMBL, is an academic research institute located on the Wellcome Trust Genome Campus in Cambridge (UK). It serves as a public repository of molecular data. It also provides free online bioinformatics software and tools.
ENA‐EMBL | http://www.ebi.ac.uk/embl/ | The European Nucleotide Archive, which houses EMBL‐Bank, is a repository of nucleotide sequences of various types, similar to NCBI GenBank. The latest release at the time of writing (ENA release 125: http://www.ebi.ac.uk/about/news/service-news/ena-release-125) contains the annotated sequences of ENA.
DDBJ | http://www.ddbj.nig.ac.jp/ | This nucleotide databank (DNA Databank of Japan) is similar to ENA‐EMBL and GenBank‐Nucleotide of NCBI. The International Nucleotide Sequence Database Collaboration (INSDC: http://insdc.org) keeps these three databanks synchronized with each other.
Protein databases
PDB | http://www.rcsb.org/pdb/home/home.do | The RCSB‐PDB is a repository of structural information and curated annotation for different types of experimentally determined structures, such as proteins, nucleic acids and other complex assemblies.
Pfam | http://pfam.sanger.ac.uk/ | Information on protein families, each characterized by alignments (built with an HMM‐based algorithm) of amino acid sequences of the family, together with annotation, multiple domain architecture analysis, etc. Links to protein structures are also provided.
PIR | http://pir.georgetown.edu/ | A centralized resource for protein sequence and function information, with resources for protein annotation (PIRSF, iProClass, iProLINK). In 2002, a single database called “UniProt” was created by merging the PIR, Swiss‐Prot and TrEMBL databases.
PROSITE | http://www.expasy.ch/prosite/ | This database maintains information on domains, families and functional sites of proteins, and the profiles for identifying them (based on a collection of rules called ProRule). PROSITE also provides tools for protein sequence analysis and motif detection.
Swiss‐Prot/TrEMBL | http://www.expasy.ch/sprot/ | A non‐redundant database of manually curated protein sequences with high‐level annotation. Information on protein structure, post‐translational modifications, etc. is available.
RCSB | http://home.rcsb.org/ | The Research Collaboratory for Structural Bioinformatics (RCSB) undertakes research to decipher the relationship between 3D‐structural features of macromolecules and their functional aspects. RCSB is responsible for citation and annotation of PDB data.
NDB | http://ndbserver.rutgers.edu/ | NDB maintains information about the three‐dimensional structure of nucleic acids.
RNA Databases
The RNAdb | http://research.imb.uq.edu.au/rnadb/ | A popular database of mammalian non‐coding RNAs that holds sequences and annotations for several classes of noncoding RNA, including microRNAs, snRNAs and lncRNAs.
Comparative RNA database | http://www.rna.ccbb.utexas.edu/ | This database maintains structural and evolutionary information on RNAs, obtained through comparative analysis of RNA sequences.
European rRNA database | (no URL given) | The related sequences (complete or partial) of the small and large subunits of ribosomal RNAs (rRNAs) are aligned and displayed in this database, along with secondary structure information.
miRNA Database | http://www.mirbase.org/ | miRBase is one of the most popular microRNA databases. It archives the published miRNA sequences, the position of each mature miRNA in the respective pre‐miRNA sequence, and annotations. miRNAs are named according to a fixed set of conventions.
Genome databases
Genomes online database (GOLD) | (no URL given) | Maintains information about the genome and metagenome sequencing projects operated around the world, plus the associated metadata.
A quick guide to sequenced genomes | http://www.genomenewsnetwork.org/resources/sequenced_genomes/genome_guide_p1.shtml | Describes the sequenced organisms, links to the published abstracts and provides hyperlinks to the sequencing centers/institutes.
Completed genomes: Eukaryotes | http://www.bioinfbook.org/chapt16.htm | The web resources for completed eukaryotic genomes.
KEGG | http://www.genome.jp/kegg/ | The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a comprehensive collection of databases that assembles systems biology information, viz. pathways (mapping cellular/organismal functions), complete genomes, chemical substances, drugs and diseases.
Metagenomics
MEGAN4‐MEtaGenome Analyzer | http://ab.inf.uni-tuebingen.de/software/megan/ | A standalone tool for metagenomic analyses of short‐read data.
MG‐RAST | http://metagenomics.anl.gov/ | An automated analysis platform for metagenomes. Use the Firefox web browser to access this server. The results quantitatively report the microbial populations found in the metagenomic data.
Terragenome | http://www.terragenome.org/ | An international soil metagenome sequencing consortium.
R and PERL programming resources
The Comprehensive R Archive Network | http://cran.r-project.org/bin/windows/base/ | Download site for R for the specific operating system.
R and Data Mining | http://www.rdatamining.com/ | This website is dedicated to R programming, and gives many examples of R code usage.
Bioconductor login page | https://stat.ethz.ch/mailman/options/bioconductor | An open source software project to enable the development and sharing of R code and packages for analysis of genomic data.
R Function Index | http://www.math.montana.edu/rweb/rhelp/00index.html | A hyperlinked list of R function names. Each function is briefly described, with its usage and an example.
R Tutorial | http://heather.cs.ucdavis.edu/~matloff/r.old.html | Tutorial for R programming, package usage, etc.
Comprehensive Perl Archive Network | http://www.cpan.org/ | A hub of Perl modules, Perl ports and source.
Perl‐Meme | http://perlmeme.org/start_hereindex.html | Standard Perl code, examples and Perl‐meme can be found on this page.
Perl‐Learning site | (no URL given) | This site contains relevant information about the Perl programming language, including books, basics, modules, etc.
List of Perl Functions | http://perldoc.perl.org/5.12.4/index-functions.html | Lists Perl functions alphabetically or by category.
NGS data analysis related
FASTX Tool Kit | http://hannonlab.cshl.edu/fastx_toolkit/ | A collection of command‐line tools for preprocessing short‐read FASTA/FASTQ files.
The Genome Analysis Toolkit (GATK) | http://bioops.info/2011/05/gatk-the-genome-analysis-toolkit/ | A toolkit (a set of bioinformatics tools) that enables next‐generation sequence data analysis, data quality checking, variant discovery, etc. It is designed for fast analysis of NGS data of genomes from a variety of organisms.
Genome‐wide Complex Trait Analysis (GCTA) | http://www.complextraitgenomics.com/software/gcta/ | A tool that uses genome‐wide SNP data to estimate breeding‐related parameters, such as the inbreeding coefficient and chromosome‐specific genetic variance, and to estimate the proportion of phenotypic variance of complex traits explained by genome‐ or chromosome‐wide SNPs.
Burrows–Wheeler Aligner (BWA) Download | http://sourceforge.net/projects/bio-bwa/files/ | The NGS‐derived sequence reads (short and long reads, separately) are aligned to the reference genome using BWA.
SAM Tools | http://samtools.sourceforge.net/ | Utilities for manipulating alignments of reads to a reference sequence in SAM format, including sorting, merging, indexing, etc.
Genome2Seq | http://agbase.msstate.edu/cgi-bin/tools/genome2seq.cgi | Using the genome coordinates of transcripts from RNA‐seq data, the transcript sequences are retrieved in a FASTA file.
Primer designing
FastPCR | http://www.biocentr.helsinki.fi/bi/programs/fastpcr.html | Used for designing PCR primers or probes, for oligonucleotide assembly and for repeat searching. This program can be downloaded and run on PCs.
Primer3 (version 0.4.0) | http://frodo.wi.mit.edu/ | Freely available online software for designing primers and probes from a DNA sequence. It is very popular because many parameters are available for designing primers with high specificity and accuracy.
OligoAnalyzer 3.1 | http://eu.idtdna.com/analyzer/applications/oligoanalyzer/ | This online tool is provided by IDT for analyzing the properties of oligos, as well as for predicting the likelihood of self‐ and heterodimer formation by oligos.
IDT Antisense Design | http://www.idtdna.com/scitools/applications/antisense/antisense.aspx | To design antisense oligos for a specific target sequence of interest.
Oligonucleotide Properties Calculator | http://www.basic.northwestern.edu/biotools/oligocalc.html | A very useful oligonucleotide properties calculator. It displays the reverse complementary sequence, physical properties (length, molecular weight, GC%), Tm, thermodynamic constants, and hairpin and self‐dimer formation for a given primer/sequence.
UnaFold | http://www.idtdna.com/scitools/applications/unafold/ | This freely available online software from IDT checks the likelihood of secondary structure formation by a single‐stranded target.
Restriction digestion
RestrictionMapper | http://www.restrictionmapper.org/ | Online, freely available tool for mapping restriction endonuclease sites on a DNA sequence.
Webcutter 2.0 | http://rna.lundberg.gu.se/cutter2/ | Another RE site detection program (online, free) for linear and circular DNA.
NEB Cutter | http://tools.neb.com/nebcutter2/index | An RE site mapper, hosted by New England Biolabs.
Sequence alignment
Dotlet | http://myhits.isb-sib.ch/util/dotlet/doc/dotlet_about.html | Free online software for diagonal plotting (dot plots) of sequences.
Dotplot(+) | http://www.hku.hk/bruhk/gcgdoc/dotplot.html | Used to identify the overlapping portions of two sequences and to identify the repeats and inverted repeats in a sequence.
Dotter | http://sonnhammer.sbc.su.se/dotter.html | A graphical dotplot program for detailed comparison of two sequences. It runs on Mac, Linux, Sun Solaris and Windows OS.
Clustal Omega | http://www.ebi.ac.uk/tools/msa/clustalo/ | The latest version of the Clustal alignment program, available online and as a command‐line tool. Its distinguishing feature is scalability: several thousand medium‐ to large‐sized sequences can be aligned simultaneously. It also makes use of multiple processors where present. In addition, the quality of alignments is superior to that of previous versions. The algorithm uses seeded guide trees and HMM profile‐profile progressive alignments.
ClustalW | http://www.ebi.ac.uk/tools/clustalw2/index.html | A very popular site for pairwise and multiple sequence alignment. ClustalW runs on Windows, Linux/Unix and Mac operating systems.
ClustalX | http://bips.u-strasbg.fr/en/documentation/clustalx/ | The latest version (v.2.0) is provided by “Plate‐Forme Bio‐Informatique de Strasbourg”, along with detailed instructions (help) for operating ClustalX. This site also provides online tools (Actin Related Proteins Annotation server, EMBOSS, Gene Ontology Annotation, SAGE experiment parameters, GPAT, etc.), databases (SRS, BAliBase, InPACT) and documentation (tutorials to elucidate the parameters of Clustal, GCG, EMBOSS, Bioinformatics protocols, etc.).
LALIGN | http://www.ch.embnet.org/software/lalign_form.html | Free online tool for finding local alignments between two sequences (provided in a stipulated input format, i.e., plain text without header line, Swiss‐Prot ID, TrEMBL ID, EMBL ID, EST ID, etc.).
FASTA | http://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml | This server is hosted by the University of Virginia, USA. It is a repository of online software for sequence (nucleic acid and amino acid) comparison, local and global alignment, hydropathy plotting and protein secondary structure prediction.
MAFFT version 6 | http://align.bmr.kyushu-u.ac.jp/mafft/software/ | Another useful tool for MSA (online or offline), with fine control over the alignment parameters. Other facilities include Jalview display of the whole alignment, construction of an NJ tree and download of the Newick file (*.NWK).
T‐Coffee | http://www.es.embnet.org/services/molbio/t-coffee/ | Tree‐based Consistency Objective Function For alignment Evaluation (T‐Coffee) is another popular multiple sequence alignment program, developed by Cedric Notredame, CRG Centro de Regulacio Genomica (Barcelona). It allows the results obtained from several alignment methods to be combined. It is also available at http://www.ebi.ac.uk/Tools/msa/tcoffee/. The default output format is Clustal; input sequences are accepted in PIR and FASTA format.
Online books
Online Biology Book | http://www.estrellamountain.edu/faculty/farabee/biobk/biobooktoc.html | A free online biology book covering the basic topics, including plant and animal cells, molecular genetics, the muscular, reproductive and other essential systems of animals, biological diversity, human evolution, etc.
MendelWeb | http://www.mendelweb.org/ | A website containing resources for genetics studies. The site contains Mendel’s papers, secondary sources on them, essays and commentary, etc.
The Biology Project | http://www.biology.arizona.edu/ | Useful for learning the basic aspects of genetics and for participating in discussions between teachers and students.
Molecular Biology Web Book | http://www.web-books.com/mobio/ | A web book that covers molecular biology topics, including cell biology, structural and functional genetics, biotechnology and bioinformatics.
NCBI Bookshelf | http://www.ncbi.nlm.nih.gov/books | Online repository of life science books maintained by NCBI.
Bioinformatics and Functional Genomics | http://www.bioinfbook.org/index.html | A very useful site that “features a complete bioinformatics teaching curriculum: PowerPoints for an entire course taught at the Johns Hopkins School of Medicine, and 1100 website links organized by chapter in the new textbook, Bioinformatics and Functional Genomics (John Wiley, 2003)”.
Tutorials
Notes on Population Genetics | http://darwin.eeb.uconn.edu/eeb348/lecture-notes/notes.html | Lecture notes on population genetics.
Genetics Education Centre | http://www.kumc.edu/gec/ | This website is hosted by the University of Kansas Medical Center. It provides links to the Human Genome Project, Genetic Resources (books, videos, curricula), Lesson Plans, Networking, Genetic Conditions, Careers, Glossaries, etc.
Complete PCR Solution | http://www.pcrlinks.com/ | A web guide to PCR, with several links to PCR‐related topics, books and PCR variants.
Biology Related Internet Sites | http://lib.berkeley.edu/bios/selected_sites.html | A list of selected biology‐related sites.
Protocols
The Electronic Protocol Book | http://www.changbioscience.com/protocols/ | An online collection of protocols for molecular biology and bioinformatics work.
Protocol Online | http://www.protocol-online.org/ | Protocols for molecular biology work.
Protocols in Cytogenetics | http://www.biologia.uniba.it/rmc/0-1a_pagina/2_2_protocols.html | The website contains protocols for cytogenetics and molecular genetics.
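
Several of the primer‐design tools listed above (for example, the Oligonucleotide Properties Calculator) report the reverse complement, length and GC% of an oligo. For readers who want to check such values offline, the following is a minimal Python sketch of those three quantities. It is not the method used by any of the listed servers: the function names and the example primer are hypothetical, only unambiguous bases (A, C, G, T) are handled, and the melting temperature uses the simple Wallace rule, 2(A+T) + 4(G+C), which is only a rough approximation for short oligos and may differ from the Tm reported online.

```python
# Illustrative sketch only; names and the example primer are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _clean(seq: str) -> str:
    """Upper-case the sequence and reject empty or ambiguous input."""
    seq = seq.strip().upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    return seq


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return _clean(seq).translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of bases that are G or C."""
    seq = _clean(seq)
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule (assumption)."""
    seq = _clean(seq)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    primer = "ATGCGTACGTTAGC"  # hypothetical example primer
    print("length:", len(primer))
    print("reverse complement:", reverse_complement(primer))
    print("GC%:", gc_percent(primer))
    print("Wallace Tm:", wallace_tm(primer))
```

The input check is deliberately strict, so that degenerate bases (N, R, Y, etc.) raise an error rather than silently distorting the GC% or Tm.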

For more details, the user is requested to visit http://www.bioinformaticssoftwareandtools.co.in/
