CS Mukhopadhyay and RK Choudhary
School of Animal Biotechnology, GADVASU, Ludhiana
The information on each of the databases listed below has been collected from the NCBI guide (https://www.ncbi.nlm.nih.gov/guide/) and related NCBI sites. In several cases, the description of a database is reproduced verbatim from the NCBI pages.
SN | Databases | Features |
1 | Assembly | This database maintains and periodically updates organism‐wise information on assembled genomes (WGS) or complete chromosome sequence of prokaryotic and eukaryotic organisms. |
2 | BioProject | This database holds data and information related to a single project or a consortium, so that users can obtain the voluminous data belonging to a project in one place. The types of records maintained in BioProject are: genome sequencing and assembly; metagenomes; genetic or RH maps; targeted locus sequencing; epigenetics; phenotype or genotype; variation detection; and transcriptome sequencing and expression. |
3 | BioSystems | A data repository of information (list, sequence, structure) regarding biological molecules (genes, proteins, small molecules) and pathways involved in biological systems. This includes data from BioCyc (including its Tier 1 EcoCyc and MetaCyc databases and its Tier 2 databases), KEGG, Reactome, the National Cancer Institute’s Pathway Interaction Database, WikiPathways and Gene Ontology (GO). |
4 | Bookshelf | A database of freely accessible electronic books and documents in life science and healthcare. It is integrated with other NCBI resources such as PubMed, Gene, OMIM and PubChem. |
5 | ClinVar | ClinVar is a public repository of sequence variations and information about their relationship to human health. It maintains records on medical conditions caused by genetic aberrations, drawing on a number of distinct sources, including SNOMED CT, MeSH and OMIM. |
6 | Clone DB | A public database that maintains information (sequence data, map positions and distributor information) for clones associated with genomic, cDNA and cell‐based libraries belonging to different eukaryotic organisms. |
7 | BioSample | A central repository of descriptions of biological source materials (including tissues, cell lines and experimental organisms) used in different assays. |
8 | Computational resources from NCBI’s Structure Group | Provides access and links to resources (databases and tools) developed by NCBI’s Structure Group for determining macromolecular structures and identifying conserved domains. It also maintains tools for protein classification, for determining the biological activity of small molecules, for pathway analysis, etc. |
9 | Consensus CDS (CCDS) | A collaborative effort among NCBI, EBI, the University of California at Santa Cruz (UCSC) and the Wellcome Trust Sanger Institute (WTSI) to identify and annotate a core set of protein‐coding regions. |
10 | Conserved Domains Database (CDD) | CDD, a protein annotation resource, holds well‐annotated multiple sequence alignment models for ancient domains as well as full‐length proteins. |
11 | Database of Expressed Sequence Tags (dbEST) | This is the EST database that contains short single‐read transcript sequences obtained from GenBank. |
12 | Database of Genome Survey Sequences (dbGSS) | This NCBI database contains comprehensively annotated short, single‐pass reads of genomic sequences (which could be cDNA or non‐coding DNA) obtained from sources such as random survey sequences, clone‐end sequences, artificial chromosomes (BAC/YAC) or cosmids, and exon‐ and gene‐trapped sequences. |
13 | Database of Genomic Structural Variation (dbVar) | Maintains information regarding large‐scale genomic variation, namely sizeable insertions and deletions (indels), translocations and inversions, together with the association of these variations with phenotypes. |
14 | Database of Genotypes and Phenotypes (dbGaP) | This database archives and distributes the results of studies on the interaction of genotype and phenotype. The information pertains to molecular diagnostics, genome‐wide association studies (GWAS) and the association of genotype with non‐clinical traits. dbGaP also offers cloud computing services. |
15 | Database of Major Histocompatibility Complex (dbMHC) | Information on genes of the human Major Histocompatibility Complex (MHC) and related clinical data is maintained here. The tool dbMHCms searches for descriptions of reported short tandem repeats (STRs) belonging to the MHC. The database has a “Reagent Database” section (reagent data needed for DNA typing) and a “Clinical” section (clinical data from anonymous individuals who shared their data with the project). |
16 | Database of Short Genetic Variations (dbSNP) | A public database for obtaining information regarding genetic variation within and across different species. SNP data obtained from a range of experiments, from physical mapping, association studies and pharmacogenomics to evolutionary studies, can be submitted to dbSNP. |
17 | Epigenomics | This database holds epigenomic data on a biological sample, and also serves as a tool (as genome browser) for selecting, downloading and viewing multiple sets of epigenomic data. |
18 | GenBank | A public repository of annotated DNA sequences. It is part of the International Nucleotide Sequence Database Collaboration, which coordinates the DNA data of NCBI, EMBL and DDBJ. The FTP release is updated every two months. |
19 | Gene | This database integrates information on nomenclature, variations, reference sequences (RefSeqs), gene maps, molecular pathways and phenotypes. This information is linked to genome‐, phenotype‐ and locus‐specific resources across a wide range of species. “Gene” can be accessed by querying on any word, restricting the query term to a certain field, or applying filters or properties. |
20 | Gene Expression Omnibus (GEO) Database | A public repository of experimental data generated from microarray experiments and other high‐throughput genomic methods such as next generation sequencing (NGS). |
21 | Gene Expression Omnibus (GEO) Datasets | Stores compiled gene expression DataSets, and original series, samples and platform records in the Gene Expression Omnibus (GEO) repository. The differential expression pattern is collated and displayed along with clustered heatmaps for easy comprehension. |
22 | Gene Expression Omnibus (GEO) Profiles | Maintains the curated gene expression profiles belonging to the Gene Expression Omnibus (GEO) archive. |
23 | GeneReviews | This database, being a part of the GeneTests website, archives peer‐reviewed descriptions (diagnosis, counseling, etc.) of inherited diseases. |
24 | GeneTests | A knowledge base on the diagnosis and management of inherited diseases and on genetic testing. |
25 | Genes and Disease | This database contains articles on genetic diseases and their causative genes. |
26 | Genetic Testing Registry (GTR) | A repository of information on genetic tests for inherited diseases, including their purpose, methodology, validity, utility and challenges, submitted voluntarily by the test providers. |
27 | Genome | This database archives the sequences and related map data from the whole genomes of different organisms (bacteria, archaea, and eukaryota), including the genomes of completely sequenced organisms and not yet complete ones. |
28 | Genome Reference Consortium (GRC) | This international consortium includes the eminent research institutes working on unraveling the genomic information in terms of genome mapping, association studies, genome‐informatics, etc. with an aim to improve the human and mouse genome reference assemblies. |
29 | HIV‐1, Human Protein Interaction Database | This database holds links to PubMed records on interactions between HIV‐1 proteins and human proteins, along with links to the relevant sequences. |
30 | HomoloGene | A tool to identify the possible orthologs by comparing the homologous nucleotide sequences from different species. |
31 | Influenza Virus | Holds data from the National Institute of Allergy and Infectious Diseases (NIAID) Influenza Genome Sequencing Project and from GenBank, and maintains the NCBI Influenza Virus Sequence Database. Another important use of this resource is the analysis of flu sequences, which are submitted to GenBank after annotation. |
32 | Journals in NCBI Databases | A subset of the NLM Catalog database that maintains information on journals cataloged in PubMed and other NCBI database records. |
33 | Medical Subject Headings (MeSH) db | A comprehensive catalog of medical vocabulary used for indexing journal papers and books in the life sciences. The database is used to search for MeSH terms, to get their definitions and pertinent information, and to build search strategies for PubMed. |
34 | NCBI C++ Toolkit Manual | A public domain library containing system‐independent (mostly) useful libraries, development framework, demos, release notes, etc. |
35 | NCBI Glossary | Contains definitions and descriptions of the tools available at NCBI, explanations of bioinformatic terms and acronyms, etc. |
36 | NCBI Handbook | Includes exhaustive explanatory notes on NCBI databases and software, which can be accessed through NCBI Bookshelf. |
37 | NCBI Help Manual | A collection of Help documents (downloadable) on tools like BLAST, Entrez (search engine), GenBank (databank), PubMed and NLM, etc. |
38 | NCBI Website Search | A search tool provided by NCBI to search documents, newsletters, sample codes and other resources at NCBI. |
39 | National Library of Medicine (NLM) Catalog | An electronic library catalog that enables searching the bibliographic data for around 1.5 million journals, books, software, audiovisual documents, etc. at the National Library of Medicine, the largest online library of medical science. |
40 | Nucleotide Database | This maintains a vast repository of nucleotide sequences (gene/transcript/genome data) obtained from sources like GenBank, RefSeq, TPA and PDB. |
41 | Online Mendelian Inheritance in Animals (OMIA) | Textual information and references related to inherited disorders and associated genes in about 200 animal species are cataloged in this database. However, humans and mice are not covered. The genetic disorders are linked to genes and to the relevant literature (PubMed). |
42 | Online Mendelian Inheritance in Man (OMIM) | This database was developed to supply comprehensive information and references on Mendelian disorders in humans. The related genes and the relationship between genotype and disease phenotype are also detailed here. Each entry is linked to multiple genetic databases (gene and protein sequences), literature, genetic tests, mutation databases, etc. |
43 | PopSet | A repository of DNA sequences obtained from the members of a population (composed of individuals from a single species or from multiple species) to study their evolutionary relationship. One can submit DNA sequences to PopSet via Sequin of NCBI. |
44 | Probe | A public database that maintains detailed information on reagents used in nucleic acid experiments (RNAi, microarray, genotyping, gene expression, etc.) conducted for a vast array of biomedical research. It helps researchers around the globe to access information about useful biochemicals, molecular probes, distributors, etc. |
45 | Protein Clusters | The protclustdb (protein cluster database) maintains clusters of RefSeq proteins from a variety of sources, including prokaryotic genomes and plasmids, viruses, organelles, protozoa and plants. The database consists of uncurated and manually curated cluster data, and is updated every three months. Cross‐references to related external links (NCBI‐COG, KEGG, InterPro, etc.) are provided for proteins and protein clusters. |
46 | Protein Database | This database maintains in silico translated amino acid sequences from annotated coding sequences obtained from NCBI RefSeq, GenBank, etc., along with records from external sources of protein sequences, including SwissProt, PDB, PIR, etc. The GenPept sequence provides cross‐references to the CDS (if applicable), PubMed, etc. |
47 | PubChem BioAssay | PubChem BioAssay is one of the three components of NCBI PubChem (a search tool to determine chemical similarity). It links to PubChem compounds and describes their bioactivity, including the bioassays, screening conditions, etc. |
48 | PubChem Compound | This database depicts the structure of the validated substances of the PubChem substance page of NCBI. This page maintains pre‐clustered compounds based on similarity and links to related databases and information (structure information, references). |
49 | PubChem Substance | Describes the contents of PubChem (structure, cross‐references, etc.) and provides links to biological screening results. |
50 | PubMed | This is one of the most popular databases and repositories of NCBI‐NLM. It maintains biomedical books, as well as a wide range (including bioengineering and chemical sciences) of literature from different sources, including biological journals and MEDLINE. Each record is given a unique PMID. |
51 | PubMed Central (PMC) | Freely available biomedical literature is maintained by PMC. |
52 | PubMed Health | This is an archive of clinical reviews with an aim to cater to the clinicians and end users, so that they have access to research works directed towards biomedical and clinical issues. |
53 | RefSeqGene | A subset of the RefSeq database, where the reference genomic sequences pertaining to human genes are maintained. The curations obtained from locus‐specific data, as well as information available from the genetic testing community, are included. |
54 | Reference Sequence (RefSeq) | This curated, non‐redundant database maintains naturally occurring nucleotide (DNA, RNA) and protein sequences from a large number of species, with linked records spanning genomes, transcripts and translation products. |
55 | Retrovirus Resources | A public resource for retrovirus research that provides online tools (a genotyping tool using the BLAST algorithm; an alignment tool for global alignment; annotated maps, etc.). |
56 | SARS‐CoV | Data (sequences and genome sequence alignments of various isolates) and information (publications) on the SARS coronavirus are maintained in this database. |
57 | Sequence Read Archive (SRA) | This database archives short sequences (<1000 bases) produced by high‐throughput, massively parallel sequencing platforms, including Roche, Illumina, ABI SOLiD System, etc. |
58 | Structure (Molecular Modeling Database) | Macromolecular structures (from PDB) and visualization tools are available here. The Molecular Modeling DataBase (MMDB) or Entrez Structure DataBase (ESDB) stores the experimentally determined 3D structures of biomolecules. |
59 | Taxonomy | Holds standard nomenclature and scientific classifications of taxa of prokaryotic and eukaryotic origin. The species names are manually compiled for each of the organisms linked to the entries of INSDC (International Nucleotide Sequence Database Collaboration: GenBank + EMBL + DDBJ). |
60 | Third Party Annotation Database | The TPA database aims at maintaining and providing experimental (peer‐annotated from evidence of wet‐lab experiments) or inferential (not from direct wet‐lab experimentation) results. It derives the TPA sequence from already‐available GenBank sequence data, and also annotates the sequences. |
61 | Trace Archives | This public repository has three sections: Sequence Read Archive, which stores NGS data from a variety of NGS platforms; Trace Archive, which stores sequencing data from gel or capillary sequencers; and Trace Assembly Archive, which assembles sequencing reads by pairwise or multiple sequence alignment. |
62 | UniGene | A repository of transcriptome sequencing reads obtained from expressed genes or pseudogenes. Each entry links to all the encoded transcripts from the same locus, and provides information about gene expression and genomic location, complementary DNA, and protein similarity. |
63 | UniGene Library browser | A database that enables users to browse expressed sequence tags by organism, tissue type and stage of biological development. |
64 | UniSTS | Experimentally derived sequence tagged sites (STS) are archived in this comprehensive database. |
65 | Viral Genomes | Curated virus genome sequences are maintained in this database. |
66 | Virus Variation | An organized collection of viral genome sequences with an aim to extend facilities for easy search, retrieval, display, and analysis of virus genomes. It provides pipelines for analysis of viral genomes to assist discovery using the available sequence data. |
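
The Gene entry in the table notes that a query can be any word, a term restricted to a certain field, or a filtered search. The following is a minimal sketch of how such a field-restricted query string could be assembled for the `esearch` endpoint of NCBI's Entrez E-utilities. The choice of E-utilities, the helper names (`build_term`, `esearch_url`), and the example gene and organism are assumptions made for illustration; the table itself does not prescribe any programmatic interface. The sketch only builds the URL and sends no request.

```python
from urllib.parse import urlencode

# Base URL of the Entrez E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_term(clauses):
    """Join (text, field) pairs into one Entrez query string.

    A field of None leaves the text unrestricted (searched in all fields);
    otherwise the text is tagged as text[field]. Clauses are combined with AND.
    (Hypothetical helper, not part of any NCBI library.)
    """
    parts = []
    for text, field in clauses:
        parts.append(f"{text}[{field}]" if field else text)
    return " AND ".join(parts)


def esearch_url(db, clauses, retmax=20):
    """Build an ESearch URL for the given Entrez database (hypothetical helper)."""
    params = {
        "db": db,                      # Entrez database name, e.g. "gene"
        "term": build_term(clauses),   # the assembled query
        "retmax": retmax,              # maximum number of IDs to return
        "retmode": "json",             # ask for a JSON response
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Illustrative query: a gene symbol restricted to the gene-name field,
# combined with an organism restriction.
example_clauses = [("BRCA1", "Gene Name"), ("Bos taurus", "Organism")]
term = build_term(example_clauses)
url = esearch_url("gene", example_clauses)
print(term)
print(url)
```

Passing a different `db` value (for example `"pubmed"` or `"nucleotide"`) would target another database from the table, although the field names valid for each database differ and should be checked against that database's own help page.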