Chapter 23

Microarray

Grier P. Page and Xiangqin Cui

In this review, we highlight some issues an investigator must consider when thinking about using expression arrays in a clinical trial. We present the material in a series of sections. Section 23.1 highlights previous uses of microarrays in clinical trials. Section 23.2 discusses expression microarrays, whereas other uses for microarrays are discussed in Section 23.3. Generating high-quality microarray data for these applications requires careful consideration of experimental design and conduct. The next sections highlight the steps in the planning and analysis of a microarray study. Section 23.4 discusses defining the objectives of the study, followed by experimental design, data extraction from images, microarray informatics, single-gene analysis, data annotation, analyses beyond a single gene, microarray result validation, and finally conclusions.

23.1 Introduction

Microarrays have been described as “the hottest thing in biology and medicine since the advent of the polymerase chain reaction a decade ago” (Anonymous, 2001). This technology emerged circa 1996 [1,2] and had its first high-profile uses in 1998 and 1999 [3–5]. What is “hot” is that microarrays allow the simultaneous measurement of mRNA expression levels of thousands of genes in a sample. DNA microarray studies have already been published for a variety of species, including human [6], mouse [7], and rat [8], and have made major contributions to clinical trials [9,10], basic science, cancer, diabetes, and nutrition, to name just a few of the uses. In the future, we envision the impact of microarrays on research increasing as they are increasingly applied in comparative genomic hybridization [11], genotyping [12], phage display [13], tissue arrays [14], tiling arrays [15], and especially clinical trials.

Microarrays have been a part of clinical trials essentially since microarrays were first developed. They have been used to differentiate lymphatic cancers, breast cancer, and small blue cell cancers, to name a few. They have been used to identify new disease mechanisms for diseases such as diabetes [16], to predict response to drugs [17,18], and to diagnose diseases [19]. However, many of these studies, especially the early ones, have not yielded reproducible or generalizable results [20], largely because the need for careful experimental design and conduct of microarray studies was not well understood.

Clinical trials and microarray studies on clinical samples have been a driving factor in the development and maturity of microarray technology. Whereas most microarray studies have been small(ish) basic science experiments, microarray studies on clinical trial samples, although less numerous, have tended to have many samples, which has provided sufficient power and information to identify the issues with microarrays that we discuss in the rest of this article. The identification of these issues has allowed them to be controlled for and reduced or eliminated. In addition, regulatory and U.S. Food and Drug Administration (FDA) requirements drove the development of the MicroArray Quality Control (MAQC) consortium, which established the validity and reproducibility of array technology. Now that researchers understand the methods and techniques needed to run a microarray study that generates valid results, microarrays can provide great insights in clinical trials. Some potential uses include determining why certain people respond or do not respond to a treatment, understanding the mechanisms of action of a drug or treatment, and identifying new drug targets and, ultimately, enabling personalized medicine.

However, microarrays are only successful with careful design, conduct, analysis, and replication. This success is best exemplified by the recent approval by the FDA of the first microarray-based device, the MammaPrint (Molecular Profile Institute, Phoenix, AZ) array.

23.1.1 MammaPrint

MammaPrint is the first commercial product approved (February 2007) by the FDA for disease prognosis. This product uses an Agilent-based microarray that contains a 70-gene breast cancer signature to classify unfixed tumors from node-negative women under age 61 as low or high risk for disease recurrence.

The 70-gene breast cancer signature was initially developed by van ’t Veer et al. in a 2002 paper in Nature. The initial study of 78 people identified a signature from a genome-wide expression array [21]. That signature was established in a study of 151 people [22]. A diagnostic array was then developed from the genome-wide array. This array contains 232 probes that query the 70 signature genes, 289 probes for hybridization and printing quality control, and 915 normalization probes [23]. This diagnostic array was validated in a study of 302 patients [24].

The 70-gene profile was developed in the initial study by identifying the genes most associated with good outcome, called the “good prognosis template,” and was then validated with leave-one-out cross-validation. For a new sample, the expression values of the 70 profile genes are compared with the “good prognosis template” using a cosine correlation. A value higher than 0.4 results in the patient being assigned to the good-profile group; otherwise, the patient is assigned to the poor-profile group.
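The classification rule just described can be illustrated with a minimal sketch. The function names, the five-gene toy vectors, and the expression values are hypothetical and chosen only for illustration; only the cosine correlation and the 0.4 cutoff come from the text, and the real signature uses 70 genes.

```python
import math


def cosine_correlation(x, y):
    """Cosine of the angle between two equal-length expression vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine correlation is undefined for an all-zero vector")
    return dot / (norm_x * norm_y)


def classify_sample(sample, template, threshold=0.4):
    """Assign 'good' if the correlation with the template exceeds the threshold."""
    if cosine_correlation(sample, template) > threshold:
        return "good"
    return "poor"


# Toy 5-gene example with made-up values (the real template has 70 genes).
template = [1.0, -0.5, 0.8, -1.2, 0.3]
similar = [0.9, -0.4, 0.7, -1.0, 0.2]
opposite = [-1.0, 0.5, -0.8, 1.2, -0.3]

print(classify_sample(similar, template))
print(classify_sample(opposite, template))
```

A sample whose profile points in roughly the same direction as the template is assigned to the good-profile group, whereas a sample with the reversed pattern falls below the cutoff.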

MammaPrint was approved by the FDA because of several important steps the group took, which are explored in greater detail below. These steps included the use of large sample size, replication samples, a good statistical analysis plan, careful quality control, and the results of the MAQC.

23.2 What is a Microarray?

A microarray is typically a solid substrate (usually glass) on which many different sense or antisense (depending on the technology) cRNAs or cDNAs have been spotted in specific locations; some technologies instead attach the probes to beads. Normally, many different cDNAs are spotted onto a single array in a grid pattern, and the cDNAs are taken from genes known to code for poly A+ RNA or from EST sequences. First, RNA from a tissue sample is extracted and labeled with fluorescent dyes or radioactive nucleotides. This labeled RNA is then hybridized to the cRNA or cDNA immobilized on the array. The labeled RNA binds to its complementary sequence in approximate proportion to the amount of each mRNA transcript in the sample. The amount of radioactivity or fluorescence can then be measured, which allows estimation of the amount of RNA for each transcript in the sample. Alternative uses of microarrays are described in Section 23.3.

23.2.1 Types of Expression Microarrays

A high-quality microarray study requires high-quality microarrays. The manufacture of high-quality spotted arrays (both long oligos and cDNA) is very time- and resource-intensive. Failure to generate high-quality arrays can leave the arrays as little more than an expensive random number generator. The particulars of how to manufacture high-quality arrays are beyond the scope of this review [25].

A variety of different microarray technologies may be available for use, depending on the species being studied. In general, the technologies can be broken down into three groups: short oligonucleotides (Affymetrix, NimbleGen), long oligonucleotides (Agilent, Illumina, Amersham), and “spotted” cDNA amplicons (NIA, Stanford, etc.). Each technology has its advantages and disadvantages.

The short oligos have certain advantages in that they are manufactured to (usually) high standards. Because the arrays use the sum of several probes, the quantification of RNA levels seems to be robust. In addition, the probe sequences are known. Short oligos also have disadvantages; specifically, they are not particularly robust to polymorphisms. This factor can be a problem if one is working in an atypical species for which an array is not available. Short oligo arrays tend to be relatively expensive, with a limited number of species available. However, NimbleGen seems to offer relatively inexpensive custom arrays for any species for which NimbleGen can be provided with sequence information.

Long oligo arrays (50–80 mers) offer advantages in that they are robust to some polymorphisms and can often be used across related species, the sequence is known, the spots can be applied at a known concentration, and the probes are single stranded. However, long oligos are costly to synthesize and are nonrenewable, although they are often less expensive than short oligo platforms. The complete genome sequence is required to generate robust (unique) oligos with minimal cross-hybridization. Agilent has a very flexible format that allows for the development of highly customized arrays.

cDNA amplicons or spotted arrays [4] use sets of plasmids of specific cDNAs, which may be full- or partial-length sequences, to generate the sequence to be spotted on an array. The cDNA microarrays allow investigators to print any genes from any species for which a clone can be obtained. cDNA and some long oligo microarrays allow two samples (usually labeled with Cy3 and Cy5) to be hybridized to a single array at a time. The array’s design can be changed rapidly, and the cost per array is low once the libraries have been established. Also, the cDNAs are a renewable supply. Disadvantages of this method include a variable amount of DNA spotted in each spot, a high (10–20%) drop-out rate caused by failed PCR and failed spots, and possible contamination and cross-hybridization with homologous sequences. As a result, many cDNA arrays are of poor quality.

Although microarray technology was originally developed to measure gene expression [26], it has been modified for various other types of applications described in Section 23.3.

23.2.2 Microarrays Can Generate Reproducible Results

After a series of high-profile papers [27] indicating that different microarray technologies did not generate reproducible results, the FDA led the MAQC project. The purpose of the MAQC project was to provide an assessment of the reproducibility, precision, and accuracy of a variety of human microarrays.

The MAQC is a study of seven microarray platforms (Applied Biosystems, Affymetrix, Agilent, Eppendorf, GE Healthcare, Illumina, and the NCI Operon array), with each being tested at three to six sites with at least three types of validation (TaqMan, StaRT-PCR, and QuantiGene). Approximately 150 investigators in seven organizations have been involved. In a series of papers in the September 2006 issue of Nature Biotechnology [28–31], the authors reported that although a range of reproducibility was observed across platform types, microarrays, when conducted with high-quality experimental design, generate results that are highly reproducible both across platforms and relative to alternative technologies such as reverse transcription polymerase chain reaction (RT-PCR). The most reproducible technologies in both intrasite and intersite studies were those from Affymetrix, Illumina, and Agilent. These reports greatly increased the confidence of the scientific community in the reliability of microarrays and also encouraged the FDA to approve microarray-based prognostic tools.

23.3 Other Array Technologies

Although microarrays were initially used for expression quantification, they are now applied in the analysis of other types of data. All of these types of arrays could potentially be included in a clinical trial.

23.3.1 Genotyping Using Expression Microarrays

Instead of using mRNA as the hybridization target, DNA can be hybridized to arrays designed to measure gene expression (expression arrays), such as the Affymetrix expression chips that have redundant probes for each gene. Because most genes are present as only one copy in the genome, the hybridization difference between two DNA samples will, for most genes, just reflect the amount of DNA used in each sample. Therefore, redundant probes from the same gene should give similar fold changes across samples unless there is a polymorphism between one probe and the target in a sample. The fold change computed from the probe that contains the polymorphism will be different from those obtained from other probes in the same gene. This mechanism can be used to identify polymorphisms between samples, and a polymorphism revealed by this method is called a single-feature polymorphism (SFP) [32–35].

Hybridizations with mRNA as target can also be used to identify DNA polymorphisms and genotype samples. Similarly, the presence of polymorphism between the two compared samples will cause the probe to show different fold change across samples compared with the nearby probes in the same gene [36]. However, this strategy can be affected by some RNA expression properties, such as alternative splicing and RNA degradation.
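The probe-discordance idea behind SFP detection can be sketched as follows. This is only an illustration under stated assumptions: the function name, the cutoff of one log2 unit, and the toy intensities are hypothetical, and published SFP methods use more formal statistical models than a simple deviation from the probe-set median.

```python
import math
from statistics import median


def flag_discordant_probes(intensities_a, intensities_b, min_deviation=1.0):
    """Return indices of probes whose log2 ratio differs from the probe-set median.

    intensities_a, intensities_b: probe intensities for one gene in two samples.
    min_deviation: hypothetical cutoff, in log2 units, for calling a probe discordant.
    """
    if len(intensities_a) != len(intensities_b):
        raise ValueError("both samples must have the same number of probes")
    log_ratios = [math.log2(a / b) for a, b in zip(intensities_a, intensities_b)]
    center = median(log_ratios)
    return [i for i, r in enumerate(log_ratios) if abs(r - center) > min_deviation]


# Toy probe set of six probes for one gene (made-up intensities).
sample_a = [800, 950, 1020, 870, 910, 990]
sample_b = [790, 940, 1000, 200, 900, 1010]

print(flag_discordant_probes(sample_a, sample_b))
```

In this toy probe set, the fourth probe hybridizes much more weakly in the second sample than its neighbors do, which is the pattern a polymorphism under that probe would be expected to produce.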

23.3.2 Splicing Arrays

When introns are removed (splicing) during transcription, different splicing products can be generated from the same gene because of the alternative usage of introns and exons (alternative splicing). Alternative splicing is emerging as one of the most important mechanisms in the control of gene expression because it generates structurally and functionally distinct proteins. In addition, alternatively spliced transcripts may regulate the stability and function of corresponding transcripts. It is believed that >50% of all human genes are subject to alternative splicing [37]. Although the machinery of splicing is well known, how splice sites are selected to generate alternative transcripts remains poorly understood. For most alternative splicing events, the functional significance remains to be determined.

In recent years, many researchers have characterized alternative splicing using microarrays with probes from exons, introns, and intron–exon junctions. The basic idea is that discordance among multiple probes from the same gene indicates a difference in the transcripts themselves across samples. For example, skipping of an exon in one sample will cause the probes that hybridize to the skipped exon or the related exon–intron junctions to show a different fold change compared with other probes from the same gene. Hu et al. [38] used a custom oligo array of 1600 rat genes with 20 pairs of probes for each gene to survey various rat tissues. A total of 268 genes were detected to have alternative splicing, and 50% of them were confirmed by RT-PCR. Clark et al. [39] monitored global RNA splicing in mutants of yeast RNA processing factors using exon, intron, and exon-junction probes. Two of 18 mutants tested were confirmed by RT-PCR. In addition, they could cluster the mutants based on the indexes for intron exclusion and junction formation to infer the function of the associated mutant factors. In humans, Johnson et al. [40] analyzed the global alternative splicing of 10,000 multiexon genes in 52 normal tissues, using probes from every exon–exon junction, and estimated that at least 74% of human multiexon genes are alternatively spliced. They validated a random sample of 153 positives using RT-PCR and successfully confirmed 73. Other similar experiments were also conducted in humans to investigate alternative splicing in a tumor cell line [41] and various tissues [42] but on a smaller scale.

23.3.3 Exon Array

Most current designs of short oligo arrays have probes concentrated at the 3′ end of genes because of the better EST support and the commonly used reverse transcription methods that start from the 3′ end. For sequenced genomes in which the gene structure is annotated, every exon, instead of only the 3′ end, can be represented on the array, which is then called an exon array. The exon array has a better representation of the genes. It provides not only information on the gene expression level but also an opportunity to detect alternative splicing by examining the relative expression of the exons across samples. If the relative expression levels of different exons of the same gene do not agree, then alternative splicing is indicated. The exon array can detect some types of alternative splicing, such as exon skipping, but it is not sensitive to other types, such as intron retention. In addition to detecting gene expression and alternative splicing, exon arrays can also be used to detect gene copy number differences through comparative genomic hybridization [43].

23.3.4 Tiling Array—Including Methylation Arrays

Current knowledge of genes and gene expression is mainly based on the study of expressed mRNA and computational analysis of the EST collections. The full genome sequences of several organisms and the advance of microarray technology provide a means, called tiling arrays, to survey the whole genome for gene expression [44]. Tiling arrays contain probes that cover the whole genome (or a few chromosomes), arranged so that they overlap, lie head-to-tail, or are separated by small gaps. In theory, the expression of any region in the genome can be read out from the intensity of the probes. Tiling arrays have been developed for several organisms, including human [45,46], Drosophila [47], Arabidopsis [48,49], and rice [50]. The results from the tiling experiments showed that tiling arrays can detect many more expressed regions in the genome than were previously known; however, because of the lack of reproducibility, many false positives may be observed [51,52]. In addition, tiling arrays can be used to survey the genome for copy number using DNA instead of mRNA as the hybridization target [53]. A similar idea has also been used to detect epigenetic modifications of the genome, namely DNA methylation [54,55] and histone acetylation [56]. For detecting DNA methylation, arrays contain probes that correspond to each fragment generated by the methylation-sensitive restriction enzymes. After digestion with these enzymes, the different fragments from hypermethylated or hypomethylated regions are enriched by specific adapters and PCR schemes. The enriched samples are then hybridized to tiling arrays or to arrays that cover just the CpG islands in the genome [57].

23.3.5 SNP Chip

A major task in dissecting the genetic factors that affect complex human disease is genotyping each individual in the study population. The most common polymorphism across individuals is the single nucleotide polymorphism (SNP). Microarrays have been specially designed for genotyping SNPs, such as the Affymetrix 10k, 100k, and 500k SNP chips and the Illumina SNP chips. These chips have probes centered on the SNP site and have perfect match probes for each allele as well as corresponding mismatch probes [58–60]. Depending on the platform, an array can genotype from 10,000 to 500,000 SNPs in a single hybridization. Although each hybridization is relatively expensive, the cost per genotype is low [61,62].

23.3.6 ChIP-on-Chip

Microarrays can also be used to characterize the binding of proteins to DNA in order to define gene regulation elements in the genome. The DNA sequences with proteins bound are enriched by chromatin immunoprecipitation (ChIP) and are then compared with regular DNA samples to detect the enrichment of some DNA elements. These DNA elements are associated with protein binding and tend to be involved in regulation of gene expression and other features related to the binding protein. The microarrays used in ChIP-chip studies are genomic arrays such as tiling arrays. However, because of the large number of arrays that it takes to cover the genome, some special arrays are designed to cover only certain regions of the genome, such as the promoter regions, and other arrays use DNA from large probes (several kb or Mb BAC clones). The resolution of the protein binding position, however, decreases as the probe size increases. For reviews, see References 63–65.

23.3.7 Protein Arrays

So far, most applications of microarrays in biological research have been DNA arrays that analyze mRNA expression and DNA genotypes. However, the functional units for most genes are proteins rather than mRNA, which is just a messenger. Researchers have put considerable effort into increasing the throughput of protein characterization, for example through proteomic technologies. Considerable effort has also been spent on developing protein arrays; however, because of the challenging nature of proteins, there has been less success in producing and applying protein arrays. Nonetheless, the research and development of protein arrays is still an active field, with the hope of success similar to that of DNA microarrays in the near future [66,67].

Based on the specific use, two major types of protein arrays are available. One type is the analytical array, in which antibodies are arrayed on a solid surface to detect the corresponding proteins in the hybridization solution [68–70]. The other type is more function oriented and is used to detect protein–protein, protein–DNA, and protein–small molecule interactions. For the protein–protein interaction arrays, proteins are arrayed on the surface, and they can interact with and bind to their interaction partners in the hybridization solution [71,72]. Protein–DNA interaction arrays are used for detecting the binding sites of some proteins in the genome [73,74]. The protein–small molecule interaction arrays are used to identify substrates for proteins [75], drug targets [76], and immune responses to proteins [77–79].

23.4 Define Objectives of the Study

In this section and Sections 23.5–23.11, we will discuss the steps of a microarray study and what an investigator needs to be aware of before, during, and after starting a clinical trial with microarrays.

The first step of any experiment, microarray or otherwise, is to define a hypothesis. Although it may seem perfectly obvious, the objectives of a study are likely to be met only if that study is designed in a way that is consistent with meeting those objectives, and a study can only be designed to meet the objectives if the objectives can be articulated clearly prior to initiating the study. It is not uncommon for investigators who have conducted a microarray study to be unable to state the objective(s) of their study clearly (e.g., to identify genes that are differentially expressed in response to a treatment or genes that may be responsible for differential response to treatments). Often, investigators have inadvertently allowed themselves to go down the slippery slope from hypothesis-generating studies to studies that might be termed “objective generating.” By objective generating, we mean studies in which the investigator has initiated a microarray study without a clear conception of the exact objectives of the study, in the sole hope that, by some mysterious process, the mass of data from the microarray study will make all things clear. We do not believe that such outcomes are likely.

However, that is not to say that the experimental objectives may not include the generation of new objectives/hypotheses on interesting pathways/genes that may be highlighted as a result of the microarray study or be very broadly defined. Thus, we urge investigators to articulate clearly what they hope to obtain from microarray studies so that the studies can be designed to meet those objectives from the beginning; in other words, researchers should make a hypothesis.

23.5 Experimental Design for Microarray

After stating a hypothesis, the most important step of an experiment is the experimental design. If an experiment is designed well, then the choice of analytical methods will be defined and the analysis plan will follow. Several essential principles must be kept in mind when designing a microarray experiment. These include avoidance of experimental artifacts; elimination of bias via use of a simultaneous control group, randomization, and (potentially) blinding; and reduction of sampling error via use of replication, balanced design, and (where appropriate) blocking.

23.5.1 Avoidance of Experimental Artifacts

Microarrays and most laboratory techniques are liable to nonbiological sources of variation, including day, chip lot, reagent lot, day of extraction, where clinical samples came from, and personnel (post doc effects). In many cases, these sources of variation are larger than the biological variation [80–84]. If nonbiological and biological differences are confounded, then the experiment can be essentially meaningless. Thus, careful consideration and identification of all such factors must be undertaken before starting a study. These factors must then be eliminated by making the experimental conduct more homogeneous or controlled by randomization and blocking.

23.5.2 Randomization, Blocking, and Blinding

Although blocking can be used to control for measured or known confounding factors, such as the number of samples that can be run in a day, randomization of varieties/samples/groups and random sampling of populations are very useful for reducing the effect of unmeasured confounding factors [81,85,86], such as differences in weather and interruptions in sample delivery. Microarray experiments can require multiple levels of randomization and blocking to help minimize biases from both known and unanticipated factors. For example, if only four samples can be processed in a day and there are two experimental groups, then two samples from each treatment group can be run each day (blocking), with the two samples randomly selected from all samples in the experimental group. Proper randomization and blocking can greatly reduce the bias of studies.
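The four-samples-per-day example can be written as a short sketch of blocked randomization. The function name, the sample labels, and the fixed seed are hypothetical choices made for illustration; the seed is used only so that the schedule can be reproduced and documented in the study protocol.

```python
import random


def assign_to_days(group_a, group_b, per_group_per_day=2, seed=None):
    """Build a processing schedule with equal numbers from each group per day.

    Blocking: every day contains per_group_per_day samples from each group.
    Randomization: which samples land on which day, and the processing
    order within a day, are chosen at random.
    """
    if len(group_a) != len(group_b) or len(group_a) % per_group_per_day != 0:
        raise ValueError("groups must be equal in size and fill whole days")
    rng = random.Random(seed)
    a, b = list(group_a), list(group_b)
    rng.shuffle(a)
    rng.shuffle(b)
    schedule = []
    for start in range(0, len(a), per_group_per_day):
        day = a[start:start + per_group_per_day] + b[start:start + per_group_per_day]
        rng.shuffle(day)  # randomize processing order within the day
        schedule.append(day)
    return schedule


# Hypothetical sample labels: eight treated (T) and eight control (C) samples.
treated = [f"T{i}" for i in range(1, 9)]
control = [f"C{i}" for i in range(1, 9)]

for day_number, day in enumerate(assign_to_days(treated, control, seed=1), start=1):
    print(day_number, day)
```

Because each day holds two samples from each group, a day effect cannot be confounded with the treatment effect, and the random choice of samples guards against unmeasured factors.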

Blinding should, of course, be a part of the conduct of any clinical trial, but blinding may also be appropriate in array analysis at the sample collection and processing steps. Unintentional biases can be introduced by individuals collecting samples; for example, margins may be cleaned more carefully in one group compared to another. In addition, more care may be paid to the processing of one treatment group over another. All of these may cause bias, and if possible blinding should be used in microarray experiments.

23.5.3 Replication

Individuals within a population vary, and all measurement tools such as microarrays measure with some error; thus, a single sample cannot be used to make generalizable inferences about a group or population. Replication of microarrays is needed at several levels.

Types of Replication

Replication in the context of microarrays can be incorporated at several levels: (R1) gene-to-gene: genes can be spotted multiple times per array; (R2) array-to-array: mRNA samples can be used on multiple arrays and each array is hybridized separately; and (R3) subject-to-subject: mRNA samples can be taken from multiple individuals to account for inherent biological variability. The first two types of replication are referred to as technical replication. The first type measures within-array variation, whereas the second type measures between-array variation. These types of replication are important for assessing the measurement error and reproducibility of microarray experiments and are extremely useful for improving the precision of measurements. On the other hand, the third type of replication allows us to assess the biological variation within populations and thereby to make biologically interesting observations. R1 technical replicates cannot substitute for biological replicates (R3). Although R2 technical replicates have a specific role when the cost of samples is far larger than the cost of arrays, biologically generalizable results cannot be obtained from an experiment run with only R2 replicates [87].

Replication, Power, and Sample Size

Sample size has a major impact on how confidently genes can be declared either differentially (sensitivity and power) or not differentially (specificity) expressed [80,88]. Sample sizes can be determined in a variety of ways. One way is to use traditional statistical power analysis programs such as PS, which require the following inputs: power (1 − β), significance level (α), a measure of variation (standard deviation), and a detectable difference (δ). As an example, detecting a 1/2 standard deviation (SD) reduction with 80% power at a Bonferroni-corrected significance level of α = 0.05 requires a sample size of over 250 per group, which is not normally achievable in microarray experiments for budgetary reasons. Another approach, which we believe is more appropriate, is to choose the sample size based on control of the false discovery rate (FDR) [89] and the expected discovery rate (EDR). The FDR is an estimate of the expected proportion of genes declared significant that are in fact not differentially expressed (i.e., that are “false discoveries”) [90,91]. The EDR is the expected proportion of genes in which true differences between conditions exist that are found to be significantly different. This approach has been developed and applied in the PowerAtlas [92–94] (www.poweratlas.org). In addition, the PowerAtlas allows an investigator either to upload his or her own pilot data or to choose from among over 1000 public microarray experiments to use as pilot data for estimating sample size.
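The traditional power calculation can be approximated with a few lines of code. This is a rough sketch rather than the method used by PS or the PowerAtlas: it uses the normal approximation for a two-sided, two-group comparison, and the number of genes tested (20,000 here) is an assumption, because the text does not state how many tests the Bonferroni correction covers.

```python
import math
from statistics import NormalDist


def samples_per_group(effect_size_sd, alpha, power, n_tests=1):
    """Approximate per-group sample size for a two-group comparison.

    effect_size_sd: detectable difference in units of standard deviation.
    alpha: family-wise significance level before Bonferroni correction.
    power: desired power (1 - beta).
    n_tests: number of genes tested (assumed; divides alpha for Bonferroni).
    """
    alpha_per_test = alpha / n_tests
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha_per_test / 2)  # two-sided test
    z_beta = standard_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size_sd) ** 2)


# A single test versus a Bonferroni correction over an assumed 20,000 genes.
n_single = samples_per_group(0.5, 0.05, 0.80)
n_array = samples_per_group(0.5, 0.05, 0.80, n_tests=20000)
print(n_single, n_array)
```

Under these assumptions, the correction for multiple testing raises the requirement from tens of samples per group to a few hundred, which is the budgetary problem that motivates the FDR- and EDR-based approach.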

23.5.4 Practice, Practice, Practice

No experimenter runs every step of a microarray experiment perfectly the first time. A learning curve is observed for all steps, and the learning process is a confounding factor. Training at all steps is necessary from sample collection to RNA processing, hybridization, and analysis. Thus, all the individual steps should be practiced before running an experiment, and new people who will handle samples need to be trained to sufficient standards before they run “real” samples. Resources spent on training are not wasted, and training samples should not be included in a “real” experiment.

23.5.5 Strict Experimental Practices

Because microarray experiments are liable to many nonbiological sources of error, it is critical to conduct microarray studies that follow very strictly defined protocols. For example, know exactly what types of samples are and are not acceptable, what cut of the samples is needed, what protocol will be used to extract samples, what analyses will be used, what is good-quality RNA, and so on before a study is started. Consider a microarray study like a clinical trial in which the researcher must perform a full disclosure of all steps before starting a clinical trial. Deviations from these protocols are strongly discouraged for fear of introducing biases.

23.6 Data Extraction

Once a microarray experiment has been conducted and the image of an array is obtained, several steps must occur to convert the image to analyzable data, and the methods are specific to each technology.

23.6.1 Image Processing from cDNA and Long Oligo Arrays

Image processing consists of three steps: addressing, segmentation, and information extraction.

Gridding/Addressing

Typically, a microarray is an ordered array of spots with constant separation between the rows and columns; grids or spots must be the same throughout the microarray. Addressing is the process of finding the location of spots on the microarray or assigning coordinates to each spot on the microarray. However, the spotting is rarely perfect, and variations must be dealt with. Although software usually does a good job, manual review and occasional intervention result in better data, but this is a very time-consuming process.

Segmentation

Segmentation is the most important and also the most difficult part of the image analysis. In this process, each image pixel is classified as either signal or background noise. The popular methods of segmentation use fixed circles, adaptive circles, adaptive shapes, or the histogram method. The first two methods provide the means to separate the circular spots from the background by clearly defining the boundaries of the spots. A variety of comparisons of the methods has been published, with no clear winner [95,96].

Information Extraction

In the final step of image analysis, the mean and median values of the spot intensities and the background intensities are calculated. Usually, correlations between spot intensities, the percentage of spots without any signal, their distribution, the signal-to-noise ratio (SNR) for each spot, and the variation of the pixel intensities are also calculated. The spot intensities are then measured as the sum of the intensities of all the pixels inside the spot.
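Fixed-circle segmentation followed by information extraction can be illustrated on a toy image patch. This is a minimal sketch: the function name, the 5 × 5 patch, and the pixel values are hypothetical, the spot center and radius are assumed to be known from the addressing step, and real image analysis software handles many complications (irregular spots, local background estimation, saturated pixels) that are ignored here.

```python
from statistics import mean, median


def extract_spot(image, center_row, center_col, radius):
    """Classify pixels with a fixed circle and summarize signal and background.

    image: list of rows of pixel intensities for the patch around one spot.
    Pixels inside the circle are treated as signal, all others as background.
    """
    signal, background = [], []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            inside = (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2
            if inside:
                signal.append(value)
            else:
                background.append(value)
    return {
        "signal_mean": mean(signal),
        "signal_median": median(signal),
        "signal_sum": sum(signal),
        "background_mean": mean(background),
        "background_median": median(background),
    }


# Toy 5 x 5 patch with a bright spot in the middle (made-up pixel values).
patch = [
    [10, 11, 10, 12, 10],
    [11, 10, 90, 11, 10],
    [10, 95, 100, 92, 11],
    [12, 10, 88, 10, 10],
    [10, 11, 10, 11, 12],
]

stats = extract_spot(patch, center_row=2, center_col=2, radius=1)
print(stats)
```

An adaptive-circle method would differ only in estimating the radius (and possibly the center) separately for each spot before the same summaries are computed.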

23.6.2 Image Analysis of Affymetrix GeneChip Microarrays

Affymetrix GeneChips are the most widely used oligonucleotide arrays. Unlike the other systems, the Affymetrix system represents each gene as 11–20 probe pairs. Each probe pair is composed of a 25-base perfect match (PM) probe that represents the gene’s known sequence and a mismatch (MM) probe that differs from the PM by the middle base. The expression level is some function of the differences in intensity between PM and MM, averaged over the probe pairs. Several algorithms have been developed for averaging the probe pairs to yield a final quantification. These include dChip [97], GCRMA-EB and GCRMA-MLE [98], MAS5 [99], PDNN [100], and RMA [101,102], all of which have different measurement properties, and it is not yet clear which is best [103].
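As a simple illustration of summarizing probe pairs, the following sketch computes the plain average of the PM minus MM differences for one probe set. This is deliberately the most basic summary and is not an implementation of any of the algorithms named above, each of which adds its own background correction, weighting, or modeling. The function name and the intensity values are hypothetical.

```python
from statistics import mean


def average_difference(pm, mm):
    """Mean of (PM - MM) over the probe pairs of a single probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    return mean(p - m for p, m in zip(pm, mm))


# Hypothetical intensities for a probe set with four probe pairs.
pm_intensities = [1200, 1500, 900, 1100]
mm_intensities = [300, 400, 350, 250]
expression = average_difference(pm_intensities, mm_intensities)
```

A plain average is sensitive to outlying probe pairs and can be negative when MM exceeds PM, which is one reason more robust summaries were developed.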

Other technologies such as Illumina and NimbleGen have their own image analysis steps as well.

23.6.3 Normalization of DNA Data

One of the early and near-universal steps in a microarray study is the use of a technique called either normalization or transformation. Normalization has at least two purposes: to adjust microarray data for effects that develop from variation in the technology rather than from biological differences between the RNA samples or between the printed probes [104] and “aiding in the analysis of the data by bending the data nearer the Procrustean bed of the assumptions underlying conventional analyses” [105], which will allow for reliable statistical and biological analyses. The former is really more an adjustment for measured covariates such as dye biases, whereas the latter is the true meaning of normalization. Either way, a wide variety of methods has been developed for all meanings of normalization, including several varieties of linear models [106], loess [104], quantile-quantile [107], log2, and others [108,109]. Normalization is usually required in cDNA microarray experiments to reduce dye biases. This area still requires active research, and it is not clear which methods are appropriate for each chip and experimental design.
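Two of the methods listed above, the log2 transformation and quantile normalization, can be sketched in a few lines. The sketch assumes that every array measures the same set of probes and breaks ties by position rather than averaging them; the function name and the raw intensities are hypothetical. It is an illustration of the idea, not a substitute for a tested normalization package.

```python
from math import log2


def quantile_normalize(arrays):
    """Force every array to share the same empirical distribution.

    `arrays` is a list of equal-length lists, one list per array. The k-th
    smallest value of each array is replaced by the mean of the k-th smallest
    values across all arrays. Ties are broken by position (an assumption).
    """
    n = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays) for k in range(n)]
    normalized = []
    for a in arrays:
        order = sorted(range(n), key=lambda i: a[i])  # probe indices by rank
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Hypothetical raw intensities: two arrays, three probes each.
raw = [[8, 2, 4], [16, 64, 4]]
log_intensities = [[log2(v) for v in array] for array in raw]
normalized = quantile_normalize(log_intensities)
```

After normalization, both arrays contain the same set of values, and only the assignment of those values to probes differs between arrays.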

23.7 Microarray Informatics

For many investigators, microarrays will involve the highest data storage, analysis, and informatics hurdle they will face in their research careers. Microarrays generate huge amounts of data, which can make data storage and handling difficult. In addition, data reporting standards are being enforced by many journals for publication [110,111], and the NIH has started to make data sharing mandatory for certain grants.

Microarray experiments generate volumes of data that many biological researchers may not be accustomed to. A single Affymetrix chip will generate about 50 MB of data. After initial processing, each chip will provide thousands to tens of thousands of numbers per array for analysis. After analysis, summary statistics, such as changes in expression and associated significance probabilities, will be available for all genes on the chips. Sorting through significance tests for tens of thousands of genes “manually” and trying to deduce a biological meaning is a Sisyphean task because of the dimensionality of the data and the speed at which new information is generated. These data can be overwhelming. Before an experiment has begun, consideration should be paid to how data are stored, viewed, and interpreted [112].

23.8 Statistical Analysis

Three types of single-gene analyses are typically conducted on microarray data. Class prediction involves building models to predict the group to which samples should be assigned. This method is often used in clinical trials, for example, to develop profiles that predict poor prognosis of cancer [21] or to differentiate among pathologically similar samples [113]. The second type is class discovery, which involves the unsupervised analysis of data to identify previously unknown relationships between genes or samples. The final type is class differentiation, which usually involves inferential statistical analysis.

23.8.1 Class Prediction Analysis

We use the term “prediction” to define the construction of a model that uses gene expression experiments to classify objects into preexisting known classes, to develop indexes that can serve as a biomarker, or to predict to which class a sample should be assigned [114,115].

Many methods can be used to construct such scores [116–118], and it is not clear which technique is best, but the goal of all methods is to find the best compromise between complexity and simplicity in the model. As the prediction model becomes more and more complex by using more and more sample information, its predictive ability in the sample in hand will increase; however, the sample data contain not only information about the true structure of the data but also “noise” because of sample variation. Thus, great care must be taken in the model building [119–121]. To build models that will predict new samples well, one must build cross-validated models. Cross-validation requires that one have sufficient data to hold some data back from the estimation process so that one can subsequently check how well the model predicts using the held-back data. For cross-validation to be accurate, the held-back data must not have been used in selecting the structure of the prediction model or in estimating the parameters that go into that model [122,123]. This requirement has often been violated in microarray research.
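The requirement that held-back data play no part in model selection can be made concrete with a small sketch of leave-one-out cross-validation in which gene selection is repeated inside every fold. The classifier (nearest centroid), the gene-ranking score (absolute difference in class means), and the toy data are all assumptions made for illustration; the text above does not prescribe any of them.

```python
def select_genes(samples, labels, k):
    """Rank genes by absolute difference in class means; return top-k indices."""
    def class_mean(g, cls):
        vals = [s[g] for s, y in zip(samples, labels) if y == cls]
        return sum(vals) / len(vals)
    n_genes = len(samples[0])
    scores = [abs(class_mean(g, 0) - class_mean(g, 1)) for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def nearest_centroid_predict(train, labels, genes, sample):
    """Assign `sample` to the class whose centroid is closest on `genes`."""
    def centroid(cls):
        rows = [s for s, y in zip(train, labels) if y == cls]
        return [sum(r[g] for r in rows) / len(rows) for g in genes]
    def sq_dist(c):
        return sum((sample[g] - cg) ** 2 for g, cg in zip(genes, c))
    return 0 if sq_dist(centroid(0)) <= sq_dist(centroid(1)) else 1


def loocv_accuracy(samples, labels, k):
    """Leave-one-out cross-validation with gene selection inside each fold."""
    correct = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        # The held-out sample is never seen by the gene-selection step.
        genes = select_genes(train, train_labels, k)
        pred = nearest_centroid_predict(train, train_labels, genes, samples[i])
        correct += pred == labels[i]
    return correct / len(samples)


# Hypothetical data: 6 samples x 3 genes; only gene 0 separates the classes.
samples = [
    [1.0, 5.0, 3.0], [1.2, 4.0, 3.5], [0.9, 5.5, 2.5],
    [3.0, 5.2, 3.1], [3.1, 4.1, 2.9], [2.9, 5.0, 3.3],
]
labels = [0, 0, 0, 1, 1, 1]
accuracy = loocv_accuracy(samples, labels, k=1)
```

Selecting genes once on the full data set and only then cross-validating the classifier would let the held-out samples influence the model structure, which is the violation described above.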

23.8.2 Class Discovery Analysis

Since Eisen et al. [4] first applied hierarchical clustering to microarray data analysis in 1998, cluster analysis has emerged as a prevalent tool for the exploration and visualization of microarray data. A variety of cluster methods are available, which include hierarchical and nonhierarchical methods. Among hierarchical methods are agglomerative and divisive methods. For the hierarchical methods, different ways can be used to measure distance between genes, including Pearson’s correlation, Euclidean distance, and Kendall’s tau, as well as a variety of methods for linking genes based on their distance, including average, single, complete, and median linkage. Several nonhierarchical methods exist, including K-nearest neighbors, self-organizing maps, and related techniques such as support vector machines and singular value decomposition. Each method and approach has its own positive and negative aspects that should be evaluated.
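One combination of the choices described above, agglomerative clustering with a Pearson correlation distance and average linkage, can be sketched as follows. The sketch recomputes all pairwise linkages at every step, which is adequate for a handful of genes but far too slow for a full array; the function names and the four expression profiles are hypothetical.

```python
from math import sqrt


def pearson_distance(x, y):
    """1 - Pearson correlation; 0 for perfectly correlated profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage_clusters(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(profiles))]

    def linkage(a, b):
        # Average of all pairwise distances between members of a and b.
        total = sum(pearson_distance(profiles[i], profiles[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = [
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical profiles: genes 0 and 1 rise across samples, genes 2 and 3 fall.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.5, 2.0],
]
clusters = average_linkage_clusters(profiles, n_clusters=2)
```

Swapping in Euclidean distance or single linkage requires changing only the distance function or the linkage rule, and the resulting clusters can differ, which is why the choice should be made deliberately.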

Clustering is a commonly used tool for microarray data analysis, but, unlike other statistical methods, no theoretical foundations provide the correct answer. This problem leads directly to several related criticisms of cluster analysis. First, the cluster algorithms are guaranteed to produce clusters from data, no matter what kind of data have been used. Second, different methods can produce drastically different results, and the search for the best choice among them has just begun [124,125]. Third, no valid method is available to establish the number of clusters in nonhierarchical cluster analysis. Therefore, caution is required when performing such analysis, and one should avoid overinterpreting the results; however, cluster analysis is good for providing exploratory descriptive analysis and concise displays of complex data.

23.8.3 Class Differentiation Analysis

One of the main tasks in analyzing microarray data is to determine which genes are differentially expressed between two or more groups of samples. This type of analysis is a conventional hypothesis test. For making inference, virtually any statistical method can be used including t-test, analysis of variance, and linear models. A variety of Bayesian methods and information borrowing approaches such as Cyber-t [126] and SAM [90] have been developed. Because of the small sample sizes, it is often useful to employ variance shrinkage-based methods for more robust estimation [127,128].
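As an illustration of a single-gene hypothesis test with small samples, the sketch below computes a Welch t statistic and an exact two-sided permutation P-value by enumerating every relabeling of the samples. The permutation approach is an assumption chosen because it needs no distributional tables; it is not the shrinkage-based or Bayesian methods cited above. Function names and expression values are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch t statistic for two groups with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def permutation_p_value(a, b):
    """Exact two-sided permutation P-value based on |t| over all relabelings."""
    pooled = a + b
    observed = abs(welch_t(a, b))
    count = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(welch_t(group_a, group_b)) >= observed - 1e-12:
            count += 1
    return count / total


# Hypothetical log expression of one gene in two groups of four samples.
control = [5.1, 5.3, 5.2, 5.4]
treated = [6.8, 7.0, 6.9, 7.1]
p_value = permutation_p_value(control, treated)
```

With four samples per group there are only 70 relabelings, so even perfectly separated groups cannot produce a P-value below 2/70. This floor is one practical consequence of the small sample sizes mentioned above.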

Adjusting for Multiple Testing Because each microarray can contain thousands of genes, some adjustment for multiple testing is required to avoid many false positive results. One way is to control the family-wise error rate (FWE), which is the probability of wrongly declaring at least one gene as differentially expressed. A Bonferroni correction is a method to adjust P-values from independent tests. Permutation methods can be used to control FWE in the presence of nonindependent tests [129]. Another approach to address the problem of multiple testing is the false discovery rate (FDR), which is the expected proportion of false positives among all genes initially identified as being differentially expressed [89,130]. In addition, a variety of Bayesian and alternative FDR methods have been developed [91].
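The two families of adjustment can be compared on the same set of P-values with a short sketch. The Bonferroni adjustment controls the FWE; for the FDR, the sketch uses the Benjamini-Hochberg step-up procedure, chosen here as one widely used FDR method and not necessarily the one described in the cited references. The four P-values are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted P-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values (step-up FDR procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P-values for four genes.
p_values = [0.01, 0.04, 0.03, 0.5]
fwe_adjusted = bonferroni(p_values)
fdr_adjusted = benjamini_hochberg(p_values)
```

For every gene the FDR-adjusted value is no larger than the Bonferroni-adjusted value, which reflects the less stringent error criterion that the FDR controls.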

23.9 Annotation

The gene-by-gene statistical analysis of microarray data is not the end of a study by any stretch of the imagination. The next step is the annotation of the expression data. The amount of information about the functions of genes is beyond what any one person can know. Consequently, it is useful to pull in information on what others have discovered about genes to interpret an expression study fully and correctly. A variety of tools are available for this purpose, such as array manufacturers’ web sites, KEGG (Kyoto Encyclopedia of Genes and Genomes) [131,132], Gene Index [133,134], Entrez Gene, MedMiner [135], DAVID (Database for Annotation, Visualization and Integrated Discovery), and Gene Ontology [136]. Each database and tool has slightly different data, and one should use multiple databases when annotating. Also be aware that databases can differ with respect to the same information.

23.10 Pathway, GO, and Class-Level Analysis Tools

Analysis of microarray experiments should not stop at a single gene, but rather several approaches can be used to get a picture beyond a single gene. These tools are called by a variety of names including pathway analysis, gene class testing, global testing, entrez testing, or GO (Gene Ontology) analysis. The goal of all these tools is to relate the expression data to other attributes such as cellular localization, biological process, molecular function, or a pathway for individual genes or groups of related genes. The most common way to analyze a gene list functionally is to gather information from the literature or from databases that cover the whole genome. In recent years, many tools have been developed to assess the statistical significance of association of a list of genes with GO annotation terms, and new ones are released regularly [137]. Extensive discussion has occurred as to the most appropriate methods for the class-level analysis of microarray data [138–140]. The methods and tools are based on different methodological assumptions. Two key points must be considered: (1) whether the method uses gene sampling or subject sampling and (2) whether the method uses competitive or self-contained procedures. The subject sampling methods are preferred, and the competitive versus self-contained debate continues. Gene sampling methods base their calculation of the P-value for the gene set on a distribution in which the gene is the unit of sampling, whereas the subject sampling methods take the subject as the sampling unit. The latter are preferred because the subject, not the gene, is typically the unit of randomization in a study [141–143].

Competitive tests, which encompass most existing tools, test whether a gene class, defined by a specific GO term or pathway or similar, is overrepresented in the list of genes differentially expressed compared to a reference set of genes. A self-contained test compares the gene set with a fixed standard that does not depend on the measurements of genes outside the gene set. Goeman et al. [144,145], Mansmann and Meister [141], and Tomfohr et al. [143] applied self-contained methods. These methods are also implemented in SAFE and Globaltest.

Another important aspect of ontological analysis regardless of the tool or statistical method is the choice of the reference gene list against which the list of differentially regulated genes is compared. Inappropriate choice of reference genes may lead to false functional characterization of the differentiated gene list. Khatri and Draghici [146] pointed out that only the genes represented on the array should be used as reference list instead of the whole genome as is a common practice. In addition, correct, up-to-date, and complete annotation of genes with GO terms is critical; the competitive and gene sample-based procedures tend to have better and more complete databases. GO allows annotation of genes at different levels of abstraction because of the directed acyclic graph structure of the GO. In this hierarchical structure, each term can have one or more child terms as well as one or more parent terms. For instance, the same gene list is annotated with a more general GO term such as “cell communication” at a higher level of abstraction, whereas the lowest level provides a more specific ontology term such as “intracellular signaling cascade.” It is important to integrate the hierarchical structure of the GO in the analysis because various levels of abstraction usually give different P-values. The large number (hundreds or thousands) of tests performed during ontological analysis may lead to spurious associations just by chance. Correction for multiple testing is a necessary step to take.
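A competitive, gene-sampling test of the kind described above is often carried out as a one-sided hypergeometric test, which the following sketch illustrates. The hypergeometric form is an assumption chosen for illustration; individual tools may use other statistics. Following the recommendation above, the reference set is taken to be the genes on the array. The function name and all counts are hypothetical.

```python
from math import comb


def overrepresentation_p(n_on_array, n_in_class, n_selected, n_overlap):
    """One-sided hypergeometric P-value for overrepresentation of a gene class.

    n_on_array : genes on the array (the reference set)
    n_in_class : genes on the array annotated with the GO term or pathway
    n_selected : genes declared differentially expressed
    n_overlap  : differentially expressed genes that carry the annotation
    Returns P(X >= n_overlap) when n_selected genes are drawn at random.
    """
    upper = min(n_in_class, n_selected)
    tail = sum(
        comb(n_in_class, k) * comb(n_on_array - n_in_class, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_on_array, n_selected)


# Hypothetical counts: 20 genes on the array, 5 in the class,
# 4 differentially expressed, 3 of which are in the class.
p_enrichment = overrepresentation_p(20, 5, 4, 3)
```

Because one such test is run for every GO term or pathway considered, the resulting P-values still require correction for multiple testing, as noted above.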

Other analyses look beyond single genes, such as coexpression [147], network analysis [148,149], and promoter and transcriptional regulations [150,151].

23.11 Validation of Microarray Experiments

A plethora of factors, which include biological and technical factors, inherent characteristics of different array platforms, and processing and analytical steps, can affect results of a typical microarray experiment [152]. Thus, several journals now require some sort of validation for a paper to be published. Sometimes, it is possible to confirm the outcome without doing any additional laboratory-based analysis. For example, array results can be compared with information available in the literature or in expression databases such as GEO [153]. However, such in silico validation is not always possible or appropriate. Thus, other techniques such as RT-PCR, SAGE [154], and proteomics are used. However, many studies merely conduct technical validation of microarray results. This approach may have been appropriate before the results of the MAQC project established the validity of expression studies. Thus, in our opinion, if microarray studies are well planned, then technical validation of the array results is not needed; rather, the verification that investigators pursue should advance their hypotheses instead of arbitrarily confirming the measurements of certain genes.

23.12 Conclusions

When coupled with good experimental design, a high-quality analysis, and thorough interpretation, microarray technology has matured to the point where it can generate incredibly valuable information. Microarrays can be used to provide greater understanding of the disease being studied, to develop profiles to predict response to compounds, and to predict side effects or poor outcomes. In the near future, microarrays may be used to determine which treatments a person may best respond to. We hope this article will help investigators in their use of microarrays.

Acknowledgments. The contents here were developed over many years in discussion with many investigators at UAB and around the world including David Allison, Steve Barnes, Lang Chen, Jode Edwards, Gary Gadbury, Issa Coulibaly, Kyoungmi Kim, Tapan Mehta, Prinal Patal, Mahyar Sabripour, Jelai Wang, Hairong Wei, Richard Weindruch, Stanislav Zakharkin, and Kui Zhang. The work could not have been conducted without their thoughts and input. GPP and XQ were supported by NIH Grants AT-100949, AG-020681, and ES-012933 and GPP by NSF Grant 0501890.

References

[1] M. Chee, R. Yang, E. Hubbell, A. Berno, Z. C. Huang, D. Stern, J. Winkler, D. J. Lockhart, M. S. Morris, and S. P. A. Fodor, Accessing genetic information with high-density DNA arrays. Science 1996; 274: 610–614.

[2] D. J. Lockhart, H. Ding, M. Byrne, M. T. Follettie, M. V. Gallo, M. A. S. Chee, M. Mittmann, C. Wang, M. Kobayashi, H. Horton, and E. L. Brown, Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotech. 1996; 14: 1675–1680.

[3] C.-K. Lee, R. G. Klopp, R. Weindruch, and T. A. Prolla, Gene expression profile of aging and its retardation by caloric restriction. Science 1999; 285: 1390–1393.

[4] M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein, Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 1998; 95: 14863–14868.

[5] C. M. Perou, S. S. Jeffrey, R. M. van de, C. A. Rees, M. B. Eisen, D. T. Ross, A. Pergamenschikov, C. F. Williams, S. X. Zhu, J. C. Lee, D. Lashkari, D. Shalon, P. O. Brown, and D. Botstein, Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc. Natl. Acad. Sci. USA 1999; 96: 9212–9217.

[6] M. A. Ginos, G. P. Page, B. S. Michalowicz, K. J. Patel, S. E. Volker, S. E. Pambuccian, F. G. Ondrey, G. L. Adams, and P. M. Gaffney, Identification of a gene expression signature associated with recurrent disease in squamous cell carcinoma of the head and neck. Cancer Res. 2004; 64: 55–63.

[7] Y. Higami, T. D. Pugh, G. P. Page, D. B. Allison, T. A. Prolla, and R. Weindruch, Adipose tissue energy metabolism: altered gene expression profile of mice subjected to long-term caloric restriction. FASEB J. 2003; 8: 415–417.

[8] S. O. Zakharkin, K. Kim, T. Mehta, L. Chen, S. Barnes, K. E. Scheirer, R. S. Parrish, D. B. Allison, and G. P. Page, Sources of variation in Affymetrix microarray experiments. BMC Bioinformat. 2005; 6: 214.

[9] J. C. Lacal, How molecular biology can improve clinical management: the MammaPrint experience. Clin. Transl. Oncol. 2007; 9: 203.

[10] S. Mook, L. J. van’t Veer, E. J. Rutgers, M. J. Piccart-Gebhart, and F. Cardoso, Individualization of therapy using MammaPrint: from development to the MINDACT Trial. Cancer Genom. Proteom. 2007; 4: 147–155.

[11] J. Zhao, J. Roth, B. Bode-Lesniewska, M. Pfaltz, P. U. Heitz, and P. Komminoth, Combined comparative genomic hybridization and genomic microarray for detection of gene amplifications in pulmonary artery intimal sarcomas and adrenocortical tumors. Genes Chromos. Cancer 2002; 34: 48–57.

[12] K. L. Gunderson, F. J. Steemers, G. Lee, L. G. Mendoza, and M. S. Chee, A genome-wide scalable SNP genotyping assay using microarray technology. Nat. Genet. 2005; 37: 549–554.

[13] L. Cekaite, O. Haug, O. Myklebost, M. Aldrin, B. Ostenstad, M. Holden, A. Frigessi, E. Hovig, and M. Sioud, Analysis of the humoral immune response to immunoselected phage-displayed peptides by a microarray-based method. Proteomics 2004; 4: 2572–2582.

[14] C. Gulmann, D. Butler, E. Kay, A. Grace, and M. Leader, Biopsy of a biopsy: validation of immunoprofiling in gastric cancer biopsy tissue microarrays. Histopathology 2003; 42: 70–76.

[15] T. C. Mockler, S. Chan, A. Sundaresan, H. Chen, S. E. Jacobsen, and J. R. Ecker, Applications of DNA tiling arrays for whole-genome analysis. Genomics 2005; 85: 1–15.

[16] V. K. Mootha, C. M. Lindgren, K. F. Eriksson, A. Subramanian, S. Sihag, J. Lehar, P. Puigserver, E. Carlsson, M. Ridderstrale, E. Laurila, N. Houstis, M. J. Daly, N. Patterson, J. P. Mesirov, T. R. Golub, P. Tamayo, B. Spiegelman, E. S. Lander, J. N. Hirschhorn, D. Altshuler, and L. C. Groop, PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 2003; 34: 267–273.

[17] L. Cabusora, E. Sutton, A. Fulmer, and C. V. Forst, Differential network expression during drug and stress response. Bioinformatics 2005; 21: 2898–2905.

[18] J. M. Naciff, M. L. Jump, S. M. Torontali, G. J. Carr, J. P. Tiesman, G. J. Overmann, and G. P. Daston, Gene expression profile induced by 17alpha-ethynyl estradiol, bisphenol A, and genistein in the developing female reproductive system of the rat. Toxicol. Sci. 2002; 68: 184–199.

[19] Y. Tang, D. L. Gilbert, T. A. Glauser, A. D. Hershey, and F. R. Sharp, Blood gene expression profiling of neurologic diseases: a pilot microarray study. Arch. Neurol. 2005; 62: 210–215.

[20] J. P. Ioannidis, Microarrays and molecular research: noise discovery? Lancet 2005; 365: 454–455.

[21] L. J. van’t Veer, H. Dai, M. J. van de Vijver, Y. D. He, A. A. Hart, M. Mao, H. L. Peterse, K. K. van der, M. J. Marton, A. T. Witteveen, G. J. Schreiber, R. M. Kerkhoven, C. Roberts, P. S. Linsley, R. Bernards, and S. H. Friend, Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002; 415: 530–536.

[22] M. J. van de Vijver, Y. D. He, L. J. van’t Veer, H. Dai, A. A. Hart, D. W. Voskuil, G. J. Schreiber, J. L. Peterse, C. Roberts, M. J. Marton, M. Parrish, D. Atsma, A. Witteveen, A. Glas, L. Delahaye, T. van der Velde, H. Bartelink, S. Rodenhuis, E. T. Rutgers, S. H. Friend, and R. Bernards, A gene-expression signature as a predictor of survival in breast cancer. N. Engl. J. Med. 2002; 347: 1999–2009.

[23] A. M. Glas, A. Floore, L. J. Delahaye, A. T. Witteveen, R. C. Pover, N. Bakx, J. S. Lahti-Domenici, T. J. Bruinsma, M. O. Warmoes, R. Bernards, L. F. Wessels, and L. J. van’t Veer, Converting a breast cancer microarray signature into a high-throughput diagnostic test. BMC Genom. 2006; 7: 278.

[24] M. Buyse, S. Loi, L. van’t Veer, G. Viale, M. Delorenzi, A. M. Glas, M. S. d’Assignies, J. Bergh, R. Lidereau, P. Ellis, A. Harris, J. Bogaerts, P. Therasse, A. Floore, M. Amakrane, F. Piette, E. Rutgers, C. Sotiriou, F. Cardoso, and M. J. Piccart, Validation and clinical utility of a 70-gene prognostic signature for women with node-negative breast cancer. J. Natl. Cancer Inst. 2006; 98: 1183–1192.

[25] W. Zhang, I. Shmulevich, and J. Astola, Microarray Quality Control. 2004. John Wiley & Sons, Inc., Hoboken, NJ.

[26] M. Schena, D. Shalon, R. W. Davis, and P. O. Brown, Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995; 270: 467–470.

[27] P. K. Tan, T. J. Downey, E. L. Spitznagel Jr., P. Xu, D. Fu, D. S. Dimitrov, R. A. Lempicki, B. M. Raaka, and M. C. Cam, Evaluation of gene expression measurements from commercial microarray platforms. Nucleic Acids Res. 2003; 31: 5676–5684.

[28] T. A. Patterson, E. K. Lobenhofer, S. B. Fulmer-Smentek, P. J. Collins, T. M. Chu, W. Bao, H. Fang, E. S. Kawasaki, J. Hager, I. R. Tikhonova, S. J. Walker, L. Zhang, P. Hurban, F. de Longueville, J. C. Fuscoe, W. Tong, L. Shi, and R. D. Wolfinger, Performance comparison of one-color and two-color platforms within the Microarray Quality Control (MAQC) project. Nat. Biotechnol. 2006; 24: 1140–1150.

[29] L. Shi, L. H. Reid, W. D. Jones, R. Shippy, J. A. Warrington, S. C. Baker, P. J. Collins, F. de Longueville, E. S. Kawasaki, and K. Y. Lee, The Microarray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat. Biotechnol. 2006; 24: 1151–1161.

[30] R. Shippy, S. Fulmer-Smentek, R. V. Jensen, W. D. Jones, P. K. Wolber, C. D. Johnson, P. S. Pine, C. Boysen, X. Guo, E. Chudin, et al., Using RNA sample titrations to assess microarray platform performance and normalization techniques. Nat. Biotechnol. 2006; 24: 1123–1131.

[31] W. Tong, A. B. Lucas, R. Shippy, X. Fan, H. Fang, H. Hong, M. S. Orr, T. M. Chu, X. Guo, P. J. Collins, Y. A. Sun, S. J. Wang, W. Bao, R. D. Wolfinger, S. Shchegrova, L. Guo, J. A. Warrington, and L. Shi, Evaluation of external RNA controls for the assessment of microarray performance. Nat. Biotechnol. 2006; 24: 1132–1139.

[32] J. O. Borevitz, D. Liang, D. Plouffe, H. S. Chang, T. Zhu, D. Weigel, C. C. Berry, E. Winzeler, and J. Chory, Large-scale identification of single-feature polymorphisms in complex genomes. Genome Res. 2003; 13: 513–523.

[33] X. Cui, J. Xu, R. Asghar, P. Condamine, J. T. Svensson, S. Wanamaker, N. Stein, M. Roose, and T. J. Close, Detecting single-feature polymorphisms using oligonucleotide arrays and robustified projection pursuit. Bioinformatics 2005; 21: 3852–3858.

[34] N. Rostoks, J. Borevitz, P. Hedley, J. Russell, S. Mudie, J. Morris, L. Cardle, D. Marshall, and R. Waugh, Single-feature polymorphism discovery in the barley transcriptome. Genome Biol. 2005; 6: R54.

[35] E. A. Winzeler, C. I. Castillo-Davis, G. Oshiro, D. Liang, D. R. Richards, Y. Zhou, and D. L. Hartl, Genetic diversity in yeast assessed with whole-genome oligonucleotide arrays. Genetics 2003; 163: 79–89.

[36] J. Ronald, J. M. Akey, J. Whittle, E. N. Smith, G. Yvert, and L. Kruglyak, Simultaneous genotyping, gene-expression measurement, and detection of allele-specific expression with oligonucleotide arrays. Genome Res. 2005; 15: 284–291.

[37] P. A. Sharp, The discovery of split genes and RNA splicing. Trends Biochem. Sci. 2005; 30: 279–281.

[38] G. K. Hu, S. J. Madore, B. Moldover, T. Jatkoe, D. Balaban, J. Thomas, and Y. Wang, Predicting splice variant from DNA chip expression data. Genome Res. 2001; 11: 1237–1245.

[39] T. A. Clark, C. W. Sugnet, and M. Ares Jr., Genomewide analysis of mRNA processing in yeast using splicing-specific microarrays. Science 2002; 296: 907–910.

[40] J. M. Johnson, J. Castle, P. Garrett-Engele, Z. Kan, P. M. Loerch, C. D. Armour, R. Santos, E. E. Schadt, R. Stoughton, and D. D. Shoemaker, Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science 2003; 302: 2141–2144.

[41] A. Relogio, C. Ben Dov, M. Baum, M. Ruggiu, C. Gemund, V. Benes, R. B. Darnell, and J. Valcarcel, Alternative splicing microarrays reveal functional expression of neuron-specific regulators in Hodgkin lymphoma cells. J. Biol. Chem. 2005; 280: 4779–4784.

[42] K. Le, K. Mitsouras, M. Roy, Q. Wang, Q. Xu, S. F. Nelson, and C. Lee, Detecting tissue-specific regulation of alternative splicing as a qualitative change in microarray data. Nucleic Acids Res. 2004; 32: e180.

[43] P. Dhami, A. J. Coffey, S. Abbs, J. R. Vermeesch, J. P. Dumanski, K. J. Woodward, R. M. Andrews, C. Langford, and D. Vetrie, Exon array CGH: detection of copy-number changes at the resolution of individual exons in the human genome. Am. J. Hum. Genet. 2005; 76: 750–762.

[44] T. C. Mockler, S. Chan, A. Sundaresan, H. Chen, S. E. Jacobsen, and J. R. Ecker, Applications of DNA tiling arrays for whole-genome analysis. Genomics 2005; 85: 1–15.

[45] P. Kapranov, S. E. Cawley, J. Drenkow, S. Bekiranov, R. L. Strausberg, S. P. Fodor, and T. R. Gingeras, Large-scale transcriptional activity in chromosomes 21 and 22. Science 2002; 296: 916–919.

[46] D. Kampa, J. Cheng, P. Kapranov, M. Yamanaka, S. Brubaker, S. Cawley, J. Drenkow, A. Piccolboni, S. Bekiranov, G. Helt, H. Tammana, and T. R. Gingeras, Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. Genome Res. 2004; 14: 331–342.

[47] M. Hild, B. Beckmann, S. A. Haas, B. Koch, V. Solovyev, C. Busold, K. Fellenberg, M. Boutros, M. Vingron, F. Sauer, J. D. Hoheisel, and R. Paro, An integrated gene annotation and transcriptional profiling approach towards the full gene content of the Drosophila genome. Genome Biol. 2003; 5: R3.

[48] K. Yamada, J. Lim, J. M. Dale, H. Chen, P. Shinn, C. J. Palm, A. M. Southwick, H. C. Wu, C. Kim, M. Nguyen, et al., Empirical analysis of transcriptional activity in the arabidopsis genome. Science 2003; 302: 842–846.

[49] V. Stolc, M. P. Samanta, W. Tongprasit, H. Sethi, S. Liang, D. C. Nelson, A. Hegeman, C. Nelson, D. Rancour, S. Bednarek, E. L. Ulrich, Q. Zhao, R. L. Wrobel, C. S. Newman, B. G. Fox, G. N. Phillips Jr., J. L. Markley, and M. R. Sussman, Identification of transcribed sequences in Arabidopsis thaliana by using high-resolution genome tiling arrays. Proc. Natl. Acad. Sci. USA 2005; 102: 4453–4458.

[50] L. Li, X. Wang, V. Stolc, X. Li, D. Zhang, N. Su, W. Tongprasit, S. Li, Z. Cheng, J. Wang, and X. W. Deng, Genome-wide transcription analyses in rice using tiling microarrays. Nat. Genet. 2006; 38: 124–129.

[51] J. M. Johnson, S. Edwards, D. Shoemaker, and E. E. Schadt, Dark matter in the genome: evidence of widespread transcription detected by microarray tiling experiments. Trends Genet. 2005; 21: 93–102.

[52] T. E. Royce, J. S. Rozowsky, P. Bertone, M. Samanta, V. Stolc, S. Weissman, M. Snyder, and M. Gerstein, Issues in the analysis of oligonucleotide tiling microarrays for transcript mapping. Trends Genet. 2005; 21: 466–475.

[53] A. E. Urban, J. O. Korbel, R. Selzer, T. Richmond, A. Hacker, G. V. Popescu, J. F. Cubells, R. Green, B. S. Emanuel, M. B. Gerstein, S. M. Weissman, and M. Snyder, High-resolution mapping of DNA copy alterations in human chromosome 22 using high-density tiling oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 2006; 103: 4534–4539.

[54] A. Schumacher, P. Kapranov, Z. Kaminsky, J. Flanagan, A. Assadzadeh, P. Yau, C. Virtanen, N. Winegarden, J. Cheng, T. Gingeras, and A. Petronis, Microarray-based DNA methylation profiling: technology and applications. Nucleic Acids Res. 2006; 34: 528–542.

[55] X. Zhang, J. Yazaki, A. Sundaresan, S. Cokus, S. W. L. Chan, H. Chen, I. R. Henderson, P. Shinn, M. Pellegrini, S. E. Jacobsen, and J. R. Ecker, Genome-wide high-resolution mapping and functional analysis of DNA methylation in arabidopsis. Cell 2006; 126: 1189–1201.

[56] C. L. Liu, T. Kaplan, M. Kim, S. Buratowski, S. L. Schreiber, N. Friedman, and O. J. Rando, Single-nucleosome mapping of histone modifications in S. cerevisiae. PLoS Biol. 2005; 3: e328.

[57] A. Schumacher, P. Kapranov, Z. Kaminsky, J. Flanagan, A. Assadzadeh, P. Yau, C. Virtanen, N. Winegarden, J. Cheng, T. Gingeras, and A. Petronis, Microarray-based DNA methylation profiling: technology and applications. Nucleic Acids Res. 2006; 34: 528–542.

[58] G. C. Kennedy, H. Matsuzaki, S. Dong, W. Liu, J. Huang, G. Liu, X. Su, M. Cao, W. Chen, J. Zhang, et al., Large-scale genotyping of complex DNA. Nat. Biotech. 2003; 21: 1233–1237.

[59] G. C. Kennedy, H. Matsuzaki, S. Dong, W. Liu, J. Huang, G. Liu, X. Su, M. Cao, W. Chen, J. Zhang, et al., Large-scale genotyping of complex DNA. Nat. Biotech. 2003; 21: 1233–1237.

[60] H. Matsuzaki, S. Dong, H. Loi, X. Di, G. Liu, E. Hubbell, J. Law, T. Berntsen, M. Chadha, H. Hui, et al., Genotyping over 100,000 SNPs on a pair of oligonucleotide arrays. Nat. Methods 2004; 1: 109–111.

[61] S. John, N. Shephard, G. Liu, E. Zeggini, M. Cao, W. Chen, N. Vasavda, T. Mills, A. Barton, A. Hinks, S. Eyre, et al., Whole-genome scan, in a complex disease, using 11,245 single-nucleotide polymorphisms: comparison with microsatellites. Am. J. Hum. Genet. 2004; 75: 54–64.

[62] C. I. Amos, W. V. Chen, A. Lee, W. Li, M. Kern, R. Lundsten, F. Batliwalla, M. Wener, E. Remmers, D. A. Kastner, L. A. Criswell, M. F. Seldin, and P. K. Gregersen, High-density SNP analysis of 642 Caucasian families with rheumatoid arthritis identifies two new linkage regions on 11p12 and 2q33. Genes Immun. 2006; 7: 277–286.

[63] M. J. Buck and J. D. Lieb, ChIP-chip: considerations for the design, analysis, and application of genome-wide chromatin immunoprecipitation experiments. Genomics 2004; 83: 349–360.

[64] J. Wu, L. T. Smith, C. Plass, and T. H. M. Huang, ChIP-chip comes of age for genome-wide functional analysis. Cancer Res. 2006; 66: 6899–6902.

[65] M. L. Bulyk, DNA microarray technologies for measuring protein-DNA interactions. Curr. Opin. Biotechnol. 2006; 17: 422–430.

[66] C. S. Chen and H. Zhu, Protein microarrays. Biotechniques 2006; 40: 423, 425, 427.

[67] P. Bertone and M. Snyder, Advances in functional protein microarray technology. FEBS J. 2005; 272: 5400–5411.

[68] B. B. Haab, M. J. Dunham, and P. O. Brown, Protein microarrays for highly parallel detection and quantitation of specific proteins and antibodies in complex solutions. Genome Biol. 2001; 2: RESEARCH0004.

[69] A. Sreekumar, M. K. Nyati, S. Varambally, T. R. Barrette, D. Ghosh, T. S. Lawrence, and A. M. Chinnaiyan, Profiling of cancer cells using protein microarrays: discovery of novel radiation-regulated proteins. Cancer Res. 2001; 61: 7585–7593.

[70] B. Schweitzer, S. Roberts, B. Grimwade, W. Shao, M. Wang, Q. Fu, Q. Shu, I. Laroche, Z. Zhou, V. T. Tchernev, J. Christiansen, M. Velleca, and S. F. Kingsmore, Multiplexed protein profiling on microarrays by rolling-circle amplification. Nat. Biotechnol. 2002; 20: 359–365.

[71] H. Zhu, M. Bilgin, R. Bangham, D. Hall, A. Casamayor, P. Bertone, N. Lan, R. Jansen, S. Bidlingmaier, T. Houfek, et al., Global analysis of protein activities using proteome chips. Science 2001; 293: 2101–2105.

[72] M. Arifuzzaman, M. Maeda, A. Itoh, K. Nishikata, C. Takita, R. Saito, T. Ara, K. Nakahigashi, H. C. Huang, A. Hirai, et al., Large-scale identification of protein-protein interaction of Escherichia coli K-12. Genome Res. 2006; 16: 686–691.

[73] S. W. Ho, G. Jona, C. T. L. Chen, M. Johnston, and M. Snyder, Linking DNA-binding proteins to their recognition sequences by using protein microarrays. Proc. Natl. Acad. Sci. USA 2006; 103: 9940–9945.

[74] D. A. Hall, H. Zhu, X. Zhu, T. Royce, M. Gerstein, and M. Snyder, Regulation of gene expression by a metabolic enzyme. Science 2004; 306: 482–484.

[75] T. Feilner, C. Hultschig, J. Lee, S. Meyer, R. G. H. Immink, A. Koenig, A. Possling, H. Seitz, A. Beveridge, D. Scheel, et al., High throughput identification of potential arabidopsis mitogen-activated protein kinase substrates. Molec. Cell. Proteom. 2005; 4: 1558–1568.

[76] H. Du, M. Wu, W. Yang, G. Yuan, Y. Sun, Y. Lu, S. Zhao, Q. Du, J. Wang, S. Yang, et al., Development of miniaturized competitive immunoassays on a protein chip as a screening tool for drugs. Clin. Chem. 2005; 51: 368–375.

[77] A. Lueking, O. Huber, C. Wirths, K. Schulte, K. M. Stieler, U. Blume-Peytavi, A. Kowald, K. Hensel-Wiegel, R. Tauber, H. Lehrach, et al., Profiling of alopecia areata autoantigens based on protein microarray technology. Molec. Cell. Proteom. 2005; 4: 1382–1390.

[78] W. H. Robinson, C. DiGennaro, W. Hueber, B. B. Haab, M. Kamachi, E. J. Dean, S. Fournel, D. Fong, M. C. Genovese, H. E. de Vegvar, et al., Autoantigen microarrays for multiplex characterization of autoantibody responses. Nat. Med. 2002; 8: 295–301.

[79] A. Lueking, A. Possling, O. Huber, A. Beveridge, M. Horn, H. Eickhoff, J. Schuchardt, H. Lehrach, and D. J. Cahill, A nonredundant human protein chip for antibody screening and serum profiling. Molec. Cell. Proteom. 2003; 2: 1342–1349.

[80] M. T. Lee, F. C. Kuo, G. A. Whitmore, and J. Sklar, Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc. Natl. Acad. Sci. USA 2000; 97: 9834–9839.

[81] M. K. Kerr and G. A. Churchill, Statistical design and the analysis of gene expression microarray data. Genet. Res. 2001; 77: 123–128.

[82] K. Mirnics, Microarrays in brain research: the good, the bad and the ugly. Nat. Rev. Neurosci. 2001; 2: 444–447.

[83] K. R. Coombes, W. E. Highsmith, T. A. Krogmann, K. A. Baggerly, D. N. Stivers, and L. V. Abruzzo, Identifying and quantifying sources of variation in microarray data using high-density cDNA membrane arrays. J. Comput. Biol. 2002; 9: 655–669.

[84] Y. Woo, J. Affourtit, S. Daigle, A. Viale, K. Johnson, J. Naggert, and G. Churchill, A comparison of cDNA, oligonucleotide, and Affymetrix GeneChip gene expression microarray platforms. J. Biomol. Tech. 2004; 15: 276–284.

[85] D. Rubin, Practical implications of modes of statistical inference for causal effects and the critical role of the assignment mechanism. Biometrics 1991; 47: 1213–1234.

[86] M. K. Kerr and G. A. Churchill, Experimental design for gene expression microarrays. Biostatistics 2001; 2: 183–201.

[87] M. F. Oleksiak, G. A. Churchill, and D. L. Crawford, Variation in gene expression within and among natural populations. Nat. Genet. 2002; 32: 261–266.

[88] D. B. Allison and C. S. Coffey, Two-stage testing in microarray analysis: what is gained? J. Gerontol. A Biol. Sci. Med. Sci. 2002; 57:B189–B192.

[89] Y. Benjamini and Y. Hochberg, Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Royal Stat. Soc. Series B 1995; 57: 289–300.

[90] V. G. Tusher, R. Tibshirani, and G. Chu, Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA 2001; 98: 5116–5121.

[91] D. Allison, G. Gadbury, M. Heo, J. R. Fernandez, C. K. Lee, T. A. Prolla, and R. Weindruch, A mixture model approach for the analysis of microarray gene expression data. Computat. Stat. Data Anal. 2002; 39: 1–20.

[92] G. L. Gadbury, G. Xiang, J. Edwards, G. Page, and D. B. Allison, The role of sample size on measures of uncertainty and power. In D. B. Allison, J. W. Edwards, T. M. Beasley, and G. Page (eds.), DNA Microarrays and Related Genomics Techniques. Boca Raton, FL: CRC Press, 2005, pp. 51–60.

[93] G. Gadbury, G. Page, J. Edwards, T. Kayo, R. Weindruch, P. A. Permana, J. Mountz, and D. B. Allison, Power analysis and sample size estimation in the age of high dimensional biology. Stat. Meth. Med. Res. 2004; 13: 325–338.

[94] G. P. Page, J. W. Edwards, G. L. Gadbury, P. Yelisetti, J. Wang, P. Trivedi, and D. B. Allison, The PowerAtlas: a power and sample size atlas for microarray experimental design and research. BMC Bioinformat. 2006; 7: 84.

[95] R. Nagarajan, Intensity-based segmentation of microarray images. IEEE Trans. Med. Imag. 2003; 22: 882–889.

[96] Q. Li, C. Fraley, R. E. Bumgarner, K. Y. Yeung, and A. E. Raftery, Donuts, scratches and blanks: robust model-based segmentation of microarray images. Bioinformatics 2005; 21: 2875–2882.

[97] C. Li and W. H. Wong, Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol. 2001; 2: 32–35.

[98] Z. Wu and R. A. Irizarry, Stochastic models inspired by hybridization theory for short oligonucleotide arrays. J. Comput. Biol. 2005; 12: 882–893.

[99] E. Hubbell, W. M. Liu, and R. Mei, Robust estimators for expression analysis. Bioinformatics 2002; 18: 1585–1592.

[100] L. Zhang, L. Wang, A. Ravindranathan, and M. F. Miles, A new algorithm for analysis of oligonucleotide arrays: application to expression profiling in mouse brain regions. J. Mol. Biol. 2002; 317: 225–235.

[101] R. A. Irizarry, B. M. Bolstad, F. Collin, L. M. Cope, B. Hobbs, and T. P. Speed, Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 2003; 31:e15.

[102] R. A. Irizarry, B. Hobbs, F. Collin, Y. D. Beazer-Barclay, K. J. Antonellis, U. Scherf, and T. P. Speed, Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 2003; 4: 249–264.

[103] K. Shedden, W. Chen, R. Kuick, D. Ghosh, J. Macdonald, K. R. Cho, T. J. Giordano, S. B. Gruber, E. R. Fearon, J. M. Taylor, and S. Hanash, Comparison of seven methods for producing Affymetrix expression scores based on False Discovery Rates in disease profiling data. BMC Bioinformat. 2005; 6:26.

[104] G. K. Smyth and T. Speed, Normalization of cDNA microarray data. Methods 2003; 31: 265–273.

[105] J. Tukey, On the comparative anatomy of transformations. Ann. Mathemat. Statist. 1957; 28: 602–632.

[106] R. D. Wolfinger, G. Gibson, E. D. Wolfinger, L. Bennett, H. Hamadeh, P. Bushel, C. Afshari, and R. S. Paules, Assessing gene significance from cDNA microarray expression data via mixed models. J. Comput. Biol. 2001; 8: 625–637.

[107] B. M. Bolstad, R. A. Irizarry, M. Astrand, and T. P. Speed, A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003; 19: 185–193.

[108] B. P. Durbin, J. S. Hardin, D. M. Hawkins, and D. M. Rocke, A variance-stabilizing transformation for gene-expression microarray data. Bioinformatics 2002; 18(suppl 1): S105–S110.

[109] B. P. Durbin and D. M. Rocke, Variance-stabilizing transformations for two-color microarrays. Bioinformatics 2004; 20: 660–667.

[110] C. A. Ball, G. Sherlock, H. Parkinson, P. Rocca-Serra, C. Brooksbank, H. C. Causton, D. Cavalieri, T. Gaasterland, P. Hingamp, F. Holstege, et al., Standards for microarray data. Science 2002; 298: 539.

[111] C. A. Ball, G. Sherlock, H. Parkinson, P. Rocca-Serra, C. Brooksbank, H. C. Causton, D. Cavalieri, T. Gaasterland, P. Hingamp, F. Holstege, et al., An open letter to the scientific journals. Bioinformatics 2002; 18: 1409.

[112] K. H. Cheung, K. White, J. Hager, M. Gerstein, V. Reinke, K. Nelson, P. Masiar, R. Srivastava, Y. Li, J. Li, J. Li, et al., YMD: a microarray database for large-scale gene expression analysis. Proc. AMIA Symp. 2002; 140–144.

[113] C. Baer, M. Nees, S. Breit, B. Selle, A. E. Kulozik, K. L. Schaefer, Y. Braun, D. Wai, and C. Poremba, Profiling and functional annotation of mRNA gene expression in pediatric rhabdomyosarcoma and Ewing’s sarcoma. Int. J. Cancer 2004; 110: 687–694.

[114] R. L. Somorjai, B. Dolenko, and R. Baumgartner, Class prediction and discovery using gene microarray and proteomics mass spectroscopy data: curses, caveats, cautions. Bioinformatics 2003; 19: 1484–1491.

[115] M. D. Radmacher, L. M. McShane, and R. Simon, A paradigm for class prediction using gene expression profiles. J. Comput. Biol. 2002; 9: 505–511.

[116] M. Ringner and C. Peterson, Microarray-based cancer diagnosis with artificial neural networks. Biotechniques 2003(suppl): 30–35.

[117] U. M. Braga-Neto and E. R. Dougherty, Is cross-validation valid for small-sample microarray classification? Bioinformatics 2004; 20: 374–380.

[118] C. Romualdi, S. Campanaro, D. Campagna, B. Celegato, N. Cannata, S. Toppo, G. Valle, and G. Lanfranchi, Pattern recognition in gene expression profiling using DNA array: a comparative study of different statistical methods applied to cancer classification. Hum. Mol. Genet. 2003; 12: 823–836.

[119] R. Simon, M. D. Radmacher, and K. Dobbin, Design of studies using DNA microarrays. Genet. Epidemiol. 2002; 23: 21–36.

[120] R. Simon, M. D. Radmacher, K. Dobbin, and L. M. McShane, Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J. Natl. Cancer Inst. 2003; 95: 14–18.

[121] R. Simon, Diagnostic and prognostic prediction using gene expression profiles in high-dimensional microarray data. Br. J. Cancer 2003; 89: 1599–1604.

[122] C. Ambroise and G. J. McLachlan, Selection bias in gene extraction on the basis of microarray gene-expression data. Proc. Natl. Acad. Sci. USA 2002; 99: 6562–6566.

[123] R. Simon, M. D. Radmacher, K. Dobbin, and L. M. McShane, Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J. Natl. Cancer Inst. 2003; 95: 14–18.

[124] N. R. Garge, G. P. Page, A. P. Sprague, B. S. Gorman, and D. B. Allison, Reproducible clusters from microarray research: whither? BMC Bioinformat. 2005; 6(suppl 2):S10.

[125] S. Datta and S. Datta, Comparisons and validation of statistical clustering techniques for microarray gene expression data. Bioinformatics 2003; 19: 459–466.

[126] P. Baldi and A. D. Long, A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes. Bioinformatics 2001; 17: 509–519.

[127] D. B. Allison, X. Cui, G. P. Page, and M. Sabripour, Microarray data analysis: from disarray to consolidation and consensus. Nat. Rev. Genet. 2006; 7: 55–65.

[128] X. Cui, J. T. Hwang, J. Qiu, N. J. Blades, and G. A. Churchill, Improved statistical tests for differential gene expression by shrinking variance components estimates. Biostatistics 2005; 6: 59–75.

[129] P. H. Westfall, D. V. Zaykin, and S. S. Young, Multiple tests for genetic effects in association studies. Methods Molec. Biol. 2002; 184: 143–168.

[130] Y. Benjamini, D. Drai, G. Elmer, N. Kafkafi, and I. Golani, Controlling the false discovery rate in behavior genetics research. Behav. Brain Res. 2001; 125: 279–284.

[131] M. Kanehisa, S. Goto, M. Hattori, K. F. Aoki-Kinoshita, M. Itoh, S. Kawashima, T. Katayama, M. Araki, and M. Hirakawa, From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. 2006; 34: D354–D357.

[132] X. Mao, T. Cai, J. G. Olyarchuk, and L. Wei, Automated genome annotation and pathway identification using the KEGG Orthology (KO) as a controlled vocabulary. Bioinformatics 2005; 21: 3787–3793.

[133] Y. Lee, R. Sultana, G. Pertea, J. Cho, S. Karamycheva, J. Tsai, B. Parvizi, F. Cheung, V. Antonescu, J. White, et al., Cross-referencing eukaryotic genomes: TIGR Orthologous Gene Alignments (TOGA). Genome Res. 2002; 12: 493–502.

[134] Y. Lee, J. Tsai, S. Sunkara, S. Karamycheva, G. Pertea, R. Sultana, V. Antonescu, A. Chan, F. Cheung, and J. Quackenbush, The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes. Nucleic Acids Res. 2005; 33: D71–D74.

[135] L. Tanabe, U. Scherf, L. H. Smith, J. K. Lee, L. Hunter, and J. N. Weinstein, MedMiner: an Internet text-mining tool for biomedical information, with application to gene expression profiling. Biotechniques 1999; 27: 1210–1217.

[136] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K. Dolinski, S. S. Dwight, J. T. Eppig, et al., Gene Ontology: tool for the unification of biology. Nat. Genet. 2000; 25: 25–29.

[137] P. Khatri and S. Draghici, Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 2005; 21: 3587–3595.

[138] J. J. Goeman and P. Buhlmann, Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics 2007; 23: 980–987.

[139] I. Rivals, L. Personnaz, L. Taing, and M. C. Potier, Enrichment or depletion of a GO category within a class of genes: which test? Bioinformatics 2007; 23: 401–407.

[140] D. B. Allison, X. Cui, G. P. Page, and M. Sabripour, Microarray data analysis: from disarray to consolidation and consensus. Nat. Rev. Genet. 2006; 7: 55–65.

[141] U. Mansmann and R. Meister, Testing differential gene expression in functional groups. Goeman’s global test versus an ANCOVA approach. Methods Inf. Med. 2005; 44: 449–453.

[142] V. K. Mootha, C. M. Lindgren, K. F. Eriksson, A. Subramanian, S. Sihag, J. Lehar, P. Puigserver, E. Carlsson, M. Ridderstrale, E. Laurila, et al., PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat. Genet. 2003; 34: 267–273.

[143] J. Tomfohr, J. Lu, and T. B. Kepler, Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformat. 2005; 6: 225.

[144] J. J. Goeman, S. A. van de Geer, F. de Kort, and H. C. van Houwelingen, A global test for groups of genes: testing association with a clinical outcome. Bioinformatics 2004; 20: 93–99.

[145] J. J. Goeman, J. Oosting, A. M. Cleton-Jansen, J. K. Anninga, and H. C. van Houwelingen, Testing association of a pathway with survival using gene expression data. Bioinformatics 2005; 21: 1950–1957.

[146] P. Khatri and S. Draghici, Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics 2005; 21: 3587–3595.

[147] P. Zimmermann, M. Hirsch-Hoffmann, L. Hennig, and W. Gruissem, GENEVESTIGATOR. Arabidopsis microarray database and analysis toolbox. Plant Physiol. 2004; 136: 2621–2632.

[148] A. de la Fuente, P. Brazhnik, and P. Mendes, Linking the genes: inferring quantitative gene networks from microarray data. Trends Genet. 2002; 18: 395–398.

[149] S. Imoto, T. Higuchi, T. Goto, K. Tashiro, S. Kuhara, and S. Miyano, Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks. J. Bioinform. Comput. Biol. 2004; 2: 77–98.

[150] Z. S. Qin, L. A. McCue, W. Thompson, L. Mayerhofer, C. E. Lawrence, and J. S. Liu, Identification of co-regulated genes through Bayesian clustering of predicted regulatory binding sites. Nat. Biotechnol. 2003.

[151] B. Xing and M. J. van der Laan, A statistical method for constructing transcriptional regulatory networks using gene expression and sequence data. J. Comput. Biol. 2005; 12: 229–246.

[152] D. Murphy, Gene expression studies using microarrays: principles, problems, and prospects. Adv. Physiol. Educ. 2002; 26: 256–270.

[153] R. Edgar, M. Domrachev, and A. E. Lash, Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 2002; 30: 207–210.

[154] R. Tuteja and N. Tuteja, Serial analysis of gene expression (SAGE): unraveling the bioinformatics tools. BioEssays 2004; 26: 916–922.
