Appendix D: EMBL Databases and Tools: An Overview

S Jain¹, S Panwar² and A Kumar³

¹ Department of Applied Sciences & Humanities, Jai Parkash Mukand Lal Innovative Engineering and Technology Institute, Haryana, India

² Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Uttar Pradesh, India

³ Department of Nutrition Biology, Central University of Haryana, Haryana, India

INTRODUCTION

The European Bioinformatics Institute (EBI) is part of the European Molecular Biology Laboratory (EMBL) and is located on the Wellcome Trust Genome Campus, Cambridge (UK). It provides molecular data, bioinformatics databases, software and tools free of charge, covering a broad range of life sciences information in support of basic and advanced research. The information on the databases and tools described in this chapter is drawn from the EMBL guide and related sites; in several instances it is therefore reproduced verbatim.

THE EMBL DATABASES

Information on each of the databases has been collected from EMBL. Table 1 lists the databases available via dbfetch, with a short description of each database and a link to it.

TABLE 1 Features and links of various EMBL databases.

S.N.	Databases	Features	Links
1.	EDAM	EMBRACE Data and Methods (EDAM) Ontology.	http://edamontology.sourceforge.net/
2.	ENA Coding	European Nucleotide Archive (ENA) Coding is a database of nucleotide sequences of the CDS (coding sequence) features, as annotated in the ENA Sequence database. ENA Coding records contain the nucleotide sequence of the CDS, along with annotation from the parent nucleotide sequence and additional automatically generated annotation.	http://www.ebi.ac.uk/ena/
3.	ENA Geospatial	A database of nucleotide sequences of the ENA Geospatial Sequence.	http://www.ebi.ac.uk/ena/
4.	ENA Non‐coding	A database of nucleotide sequences of the non‐coding RNA features, as annotated in the ENA Sequence database. ENA Non‐coding records contain the nucleotide sequence of the RNA feature, along with annotation from the parent nucleotide sequence and additional automatically generated annotation.	http://www.ebi.ac.uk/ena/
5.	ENA Sequence	ENA Sequence (formerly known as EMBL‐Bank) is Europe’s primary nucleotide sequence resource. The main sources of the DNA and RNA sequences in the database are submissions from individual researchers, genome sequencing projects, and patent applications.	http://www.ebi.ac.uk/ena/
6.	ENA Sequence Constructed	The ENA Sequence Constructed database division represents complete genomes and other long sequences constructed from segment entries. Instead of containing the sequence, these entries detail how to assemble the sequence from other ENA Sequence entries.	http://www.ebi.ac.uk/ena/
7.	ENA Sequence Constructed Expanded	Expanded entries include the complete nucleotide sequence of the constructed entry.	http://www.ebi.ac.uk/ena/
8.	ENA/SVA	The ENA Sequence Version Archive (SVA) is a repository of all entries which have ever appeared in the EMBL Nucleotide Sequence Databank (EMBL‐Bank) or ENA Sequence databases.	http://www.ebi.ac.uk/cgi‐bin/sva/sva.pl
9.	Ensembl Gene	Ensembl genome databases for vertebrate species and model organisms. For other species, see below.	http://www.ensembl.org/
10.	Ensembl Genomes Gene	Genome databases for metazoa, plants, fungi, protists and bacteria.	http://www.ensemblgenomes.org/
11.	Ensembl Genomes Transcript	Genome databases for metazoa, plants, fungi, protists and bacteria.	http://www.ensemblgenomes.org/
12.	Ensembl Transcript	Ensembl genome databases for vertebrate species and model organisms. For other species, see Ensembl Genomes instead.	http://www.ensembl.org/
13.	European Patent Office (EPO) Proteins	Protein sequences appearing in patents from the European Patent Office (EPO).	http://www.ebi.ac.uk/patentdata/proteins/
14.	HGNC	HUGO Gene Nomenclature Committee (HGNC) approved gene name and symbol (short‐form abbreviation) for each human gene.	http://genenames.org/
15.	IMGT/HLA	The International ImMunoGeneTics (IMGT) database provides a specialist database for the sequences of the human major histocompatibility complex (HLA), including the official sequences for the WHO Nomenclature Committee For Factors of the HLA System.	http://www.ebi.ac.uk/imgt/hla/
16.	IMGT/LIGM‐DB	A comprehensive database of immunoglobulins and T cell receptors (LIGM) from human and other vertebrates.	http://imgt.cines.fr/cgi‐bin/IMGTlect.jv
17.	InterPro	The InterPro database (Integrated Resource of Protein Domains and Functional Sites) is an integrated documentation resource for protein families, domains, and functional sites. It was originally used to rationalize the complementary efforts of the PROSITE, PRINTS, Pfam and ProDom database projects, but now it also includes the SMART, TIGRFAMs, PIR SuperFamilies and most recently SUPERFAMILY databases.	http://www.ebi.ac.uk/interpro/
18.	IPD‐KIR	A centralized repository for human Killer‐cell Immunoglobulin‐like Receptor (KIR) sequences.	http://www.ebi.ac.uk/ipd/kir/
19.	IPD‐MHC	Sequences of the major histocompatibility complex (MHC) in a number of species.	http://www.ebi.ac.uk/ipd/mhc/
20.	IPRMC	InterPro Matches Complete (IPRMC) for UniProtKB proteins.	http://www.ebi.ac.uk/interpro/
21.	IPRMC UniParc	InterPro Matches Complete (IPRMC) for UniParc proteins.	http://www.ebi.ac.uk/interpro/
22.	JPO Proteins	Protein sequences appearing in patents from the Japanese Patent Office (JPO).	http://www.ebi.ac.uk/patentdata/proteins/
23.	KIPO Proteins	Protein sequences appearing in patents from the Korean Intellectual Property Office (KIPO).	http://www.ebi.ac.uk/patentdata/proteins/
24.	MEDLINE	Comprises citations and abstracts records of more than 5000 medically related journals published in the United States and 70 other countries. The files contain over 19 million citations, dating back to the mid‐1940s, and are updated weekly.	http://www.nlm.nih.gov/pubs/factsheets/medline.html
25.	Patent DNA NRL1	Non‐redundant patent nucleotides level 1 (NRL‐1). Nucleotide sequences from patents clustered by 100% sequence identity over the whole length.	http://www.ebi.ac.uk/patentdata/nr/
26.	Patent DNA NRL2	Non‐redundant patent nucleotides level 2 (NRL‐2). Nucleotide sequences from patents clustered by patent family, and then by 100% sequence identity over the whole length.	http://www.ebi.ac.uk/patentdata/nr/
27.	Patent Protein NRL1	Non‐redundant patent proteins level 1. Protein sequences from patents clustered by 100% sequence identity over the whole length.	http://www.ebi.ac.uk/patentdata/nr/
28.	Patent Protein NRL2	Non‐redundant patent proteins level 2. Protein sequences from patents clustered by patent family and then by 100% sequence identity over the whole length.	http://www.ebi.ac.uk/patentdata/nr/
29.	Patent Equivalents	Patent number equivalents (families) and patent classifications for patents containing sequence data. The patent equivalents are obtained from the patent numbers cited in the major sequence databases (e.g., EMBL‐Bank and Patent Proteins), which are then expanded into a set of patent equivalents forming a WIPO Simple Patent Family.	http://www.ebi.ac.uk/patentdata/
30.	PDB	Comprises structure and sequence information of proteins and nucleotides.	http://www.ebi.ac.uk/pdbe/
31.	Reference Sequence project (RefSeq)	Curated, non‐redundant reference nucleotide sequences from the NCBI Reference Sequence project.	http://www.ncbi.nlm.nih.gov/refseq/
32.	RefSeq (protein)	Curated, non‐redundant reference protein sequences from the NCBI Reference Sequence project.	http://www.ncbi.nlm.nih.gov/refseq/
33.	SGT	Structural Genomics Targets (SGT) is a protein target registration database, providing information on the experimental progress and status of target amino acid sequences selected for structural determination.	http://targetdb.pdb.org/
34.	Taxonomy	Taxonomic classification of organisms for which there are sequences in the INSDC databases (i.e., DDBJ, EMBL‐Bank, and GenBank) and many other biological databases.	http://www.ncbi.nlm.nih.gov/Taxonomy/
35.	Trace Archive	An archive of capillary electrophoresis trace data.	http://www.ebi.ac.uk/ena/
36.	UniParc	UniProt Archive: a non‐redundant archive of protein sequences.	http://www.uniprot.org/
37.	UniProtKB	UniProt Knowledgebase: curated protein sequence and annotation data.	http://www.uniprot.org/
38.	The UniProt Reference Clusters UniRef100/UniRef90/UniRef50	Clusters of similar protein sequences. In UniRef100, UniRef90 and UniRef50, no pair of sequences in the representative set has mutual sequence identity above 100%, 90% or 50%, respectively.	http://www.uniprot.org/
39.	UniProtKB Sequence/Annotation Version Archive (UniSave)	Access point for archived versions of UniProtKB/Swiss‐Prot and UniProtKB/TrEMBL entries.	http://www.ebi.ac.uk/uniprot/unisave/
40.	United States Patent and Trademark Office (USPTO) Proteins	Protein sequences appearing in patents from the USPTO.	http://www.ebi.ac.uk/patentdata/proteins/
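
Retrieval from the databases in Table 1 through dbfetch can be sketched as a simple URL construction. The snippet below is a minimal illustration, not the official client: the base URL, the parameter names (`db`, `id`, `format`, `style`), the database name `uniprotkb`, and the helper name `dbfetch_url` are assumptions that are not stated in this appendix and should be checked against the current dbfetch documentation.

```python
from urllib.parse import urlencode

# Assumed dbfetch endpoint; verify against the current EMBL-EBI documentation.
DBFETCH_BASE = "https://www.ebi.ac.uk/Tools/dbfetch/dbfetch"


def dbfetch_url(db, ids, fmt="default", style="raw"):
    """Build a dbfetch URL for one or more entry identifiers.

    db    -- database name as listed by dbfetch (e.g. "uniprotkb"; assumed)
    ids   -- a single identifier or a list of identifiers
    fmt   -- data format (e.g. "fasta"; assumed parameter value)
    style -- result style (e.g. "raw"; assumed parameter value)
    """
    if isinstance(ids, str):
        ids = [ids]
    if not ids:
        raise ValueError("at least one identifier is required")
    # Keep commas unescaped so that multiple identifiers stay readable.
    query = urlencode(
        {"db": db, "id": ",".join(ids), "format": fmt, "style": style},
        safe=",",
    )
    return f"{DBFETCH_BASE}?{query}"


if __name__ == "__main__":
    # Print the URL only; fetching it requires network access.
    print(dbfetch_url("uniprotkb", ["P12345", "P67890"], fmt="fasta"))
```

The function only builds the request URL, so it can be checked offline; the resulting URL could then be opened with any HTTP client.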

THE EMBL TOOLS

The EMBL tools provide access to, and analysis of, numerous data resources through Web Services technologies (Li et al., 2015; Lopez et al., 2014). The services are designed for integration and interoperation, and are built on Representational State Transfer (REST), Simple Object Access Protocol (SOAP) and Web Services Description Language (WSDL).

The details and description of EMBL services are given in Table 2.

TABLE 2 Description of various EMBL tools.

General Services Data retrieval from, and access to, various sequence and structural databases.
S.N.	Service	Description
1.	ArrayExpress	Microarray data searching with ArrayExpress.
2.	ChEBI Web Services	Entry retrieval from the ChEBI database.
3.	ChEMBL Web Services	Data retrieval from the ChEMBL database.
4.	EB‐eye (SOAP)/(REST)	EBI search engine (EB‐eye).
5.	ENA Browser	Access point for sequence retrieval.
6.	Gene Expression Atlas API	Access point for statistics data over a curated subset of ArrayExpress Archive.
7.	MartService	Searching and retrieving the data through BioMart.
8.	PDBe (REST)	Retrieves data from PDB and EMDB.
9.	PSICQUIC	Information retrieval system for molecular interactions, covering providers such as ChEMBL, Reactome, and IntAct.
10.	Rhea	Access point for manually annotated chemical reaction information.
11.	Universal Protein Resource UniProt.org	Protein sequence and annotation information.
12.	WSDbfetch (REST)/(SOAP)	Identifier entry retrieval system.
Protein Functional Analysis (PFA) Identifying protein‐related information, i.e., sequences, motifs, conserved regions, etc.
	REST/SOAP Service	Description
13.	FingerPRINTScan	Identifies the closest matching PRINTS fingerprint motifs for a protein sequence.
14.	InterProScan 5	Combines different protein signature recognition methods into a single resource.
15.	HMMER hmmscan	Searches a protein sequence against a database of profile Hidden Markov Models (HMMs).
16.	PfamScan	Searches a query FASTA sequence against a library of Pfam HMMs.
17.	Phobius	Prediction of transmembrane topology and signal peptides from the amino acid sequences of protein.
18.	Pratt	Identifying conserved patterns in unaligned protein sequences.
19.	PROSITE Scan	Comparing a protein sequence against the signatures in PROSITE (both patterns and profiles).
20.	RADAR	Repeat identification and alignment system in protein sequences.
Sequence Similarity Search (SSS) Provides the identification of homologous sequences.
	REST/SOAP Service	Description
21.	FASTA	Fast protein or nucleotide sequence comparison.
22.	FASTM	Comparison of peptide fragments against a sequence database, using FASTM from the FASTA suite.
23.	NCBI BLAST	Nucleotide and protein sequence comparison system.
24.	PSI‐BLAST	Position Specific Iterative BLAST (PSI‐BLAST), guided mode.
25.	PSI‐Search	Iterative Smith–Waterman search using a PSI‐BLAST strategy.
Multiple Sequence Alignment (MSA) Alignment of a set of three or more protein or nucleotide sequences.
	REST/SOAP Service	Description
26.	Clustal Omega	Multiple sequence alignment using Clustal Omega.
27.	ClustalW2	Global multiple sequence alignment of DNA and protein sequences using ClustalW2.
28.	DbClustal	Global multiple sequence alignment of DNA or protein sequences using anchor regions from BLAST results.
29.	Kalign	Sequence alignment using the Kalign method, suited to large sequences.
30.	MAFFT	Sequence alignment using the MAFFT method. Fast, and capable of handling large sequences.
31.	Multiple Sequence Comparison by Log‐Expectation (MUSCLE)	Sequence alignment tool.
32.	MView	Reformat a multiple sequence alignment or create a multiple sequence alignment from a sequence similarity search result (e.g., BLAST or FASTA).
33.	PRANK	Sequence alignment using the PRANK method.
34.	T‐Coffee	Sequence alignment using the T‐Coffee method.
Phylogeny Phylogenetic analysis
	REST/SOAP Service	Description
35.	ClustalW2 Phylogeny	Generates neighbor‐joining or UPGMA phylogenetic trees.
Pairwise Sequence Alignment (PSA) Alignment of two sequences
	REST/SOAP Service	Description
36.	EMBOSS matcher	Waterman–Eggert local alignment using EMBOSS matcher.
37.	EMBOSS needle	Needleman–Wunsch global alignment using EMBOSS needle.
38.	EMBOSS stretcher	Myers and Miller global alignment using EMBOSS stretcher.
39.	EMBOSS water	Smith–Waterman local alignment using EMBOSS water.
40.	GeneWise	Provides comparison of protein and genomic DNA sequence.
41.	lalign	Huang and Miller sim local alignment using lalign.
42.	PromoterWise	Comparison of two DNA sequences, allowing for inversions and translocations.
43.	Wise2DBA	The Wise2 DNA Block Aligner (DBA) aligns two DNA sequences.
RNA RNA Analysis
	REST/SOAP Service	Description
44.	Infernal cmscan	Searches sequences against the Rfam database of covariance models (CMs).
45.	MapMi	Mapping and analysis of miRNA sequences.
Sequence Format Conversion Convert between sequence formats or confirm the formatting of a sequence.
	REST/SOAP Service	Description
46.	EMBOSS seqret	Reads and reformats sequence entries.
47.	MView	Reformatting of multiple sequence alignment data.
48.	Readseq	Convert biosequences between a selection of common biological sequence formats.
Sequence Statistics Analyze a sequence to determine its properties and use statistics to assign significance.
	REST/SOAP Service	Description
49.	EMBOSS cpgplot	European Molecular Biology Open Software Suite (EMBOSS) cpgplot identifies and plots CpG islands in a nucleotide sequence.
50.	EMBOSS isochore	Plots isochores in DNA sequences.
51.	EMBOSS pepinfo	Plots amino acid properties.
52.	EMBOSS pepstats	Provides calculation of protein properties.
53.	EMBOSS pepwindow	Generates a hydropathy plot for protein.
54.	SAPS	Statistical Analysis of Protein Sequences.
Sequence Translation Translate a coding nucleotide sequence into a protein sequence and vice versa.
	REST/SOAP Service	Description
55.	EMBOSS transeq	Translates nucleic acid sequences.
56.	EMBOSS sixpack	Displays DNA sequences with six‐frame translation and ORFs.
57.	EMBOSS backtranseq	Back‐translates the protein sequences.
58.	EMBOSS backtranambig	Back‐translates protein sequences to ambiguous nucleotide sequences.
Structural Analysis Analysis of macromolecular structures.
	REST/SOAP Service	Description
59.	DaliLite	Pairwise structure comparison.
60.	MaxSprout	Fast database algorithm for generating protein backbone and side‐chain coordinates from a C‐alpha trace.
Literature and Ontologies Look‐up ontology terms and navigate ontology relationships.
	Service	Description
61.	BioModels	Access point for mathematical models of biological interest.
62.	PICR	Protein Identifier Cross‐Reference Service.
63.	QuickGO	Gene Ontology (GO) and Gene Ontology Annotation (GOA) databases.
64.	Europe PMC Web Service	Provides searching access from Europe PubMed Central.
65.	WSMIRIAM	Web Services for the Minimal Information Requested In the Annotation of biochemical Models (MIRIAM).
66.	WSOntology Lookup	Search multiple ontologies from a single location.
67.	WSSBO	Web Services for the Systems Biology Ontology (SBO).
68.	WSWhatizit	Permits text mining tasks.
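
Many of the REST services in Table 2 follow a submit, poll, and fetch pattern, which can be sketched as follows. This is a hedged illustration under stated assumptions, not the official client: the base URL, the `run`/`status`/`result` path layout, the tool name `clustalo`, the status strings, and all helper names are assumptions that do not come from this appendix and should be verified against the current EMBL-EBI web services documentation.

```python
import time

# Assumed base URL and path layout for the REST analysis tools.
REST_BASE = "https://www.ebi.ac.uk/Tools/services/rest"

# Assumed status strings meaning "job not finished yet".
PENDING_STATUSES = {"QUEUED", "RUNNING"}


def run_url(tool):
    """URL to which a job submission would be POSTed (assumed layout)."""
    return f"{REST_BASE}/{tool}/run"


def status_url(tool, job_id):
    """URL that would report the status of a submitted job (assumed layout)."""
    return f"{REST_BASE}/{tool}/status/{job_id}"


def result_url(tool, job_id, result_type):
    """URL from which one result type of a finished job would be read."""
    return f"{REST_BASE}/{tool}/result/{job_id}/{result_type}"


def wait_for_job(get_status, max_polls=20, delay=3.0):
    """Poll get_status() until it returns a non-pending status.

    get_status is any zero-argument callable returning a status string,
    for example a function that performs an HTTP GET on status_url(...).
    Raises TimeoutError if the job is still pending after max_polls polls.
    """
    for _ in range(max_polls):
        status = get_status()
        if status not in PENDING_STATUSES:
            return status
        time.sleep(delay)
    raise TimeoutError("job still pending after polling limit")


if __name__ == "__main__":
    # Offline demonstration with a simulated sequence of status replies.
    replies = iter(["RUNNING", "RUNNING", "FINISHED"])
    print(wait_for_job(lambda: next(replies), delay=0))
    print(result_url("clustalo", "job-123", "out"))  # hypothetical job id and result type
```

Passing the status lookup in as a callable keeps the polling logic independent of any particular HTTP library, so the sketch can be exercised without network access.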
