Chapter 10

Protein Contact Order Prediction: Update

YI SHI, JIANJUN ZHOU, DAVID S. WISHART and GUOHUI LIN

10.1 Introduction

Contact order (CO) is the most widely adopted measure of the topological complexity of a protein structure. More specifically, contact order quantifies how close residues that are not adjacent in sequence come to each other within a folded protein. A contact between two distinct amino acid residues in a protein is formed when there is a pair of heavy atoms (C, O, S, or N), one from each residue, whose physical (Euclidean) distance is within a defined threshold [22, 36]. The absolute contact order (denoted Abs_CO in this chapter) of a protein is defined as the average sequence separation between contacting residues, counting a separation of one for two sequentially adjacent residues. The relative contact order, or simply the contact order (denoted CO), is Abs_CO normalized by the protein length.

Mathematically, given a protein with a primary sequence of length L, we use r_i to denote its ith amino acid residue. For two distinct residues r_i and r_j, if there are two heavy atoms (C, O, S, or N), one from each residue, within 6 Å, then r_i and r_j form a contact. Let ΔS_ij denote the number of residues, |i − j|, separating this contact. Assuming that there are a total of N contacts in the protein, the Abs_CO of this protein is defined as

$$\mathrm{Abs\_CO} = \frac{1}{N} \sum_{(r_i,\, r_j)} \Delta S_{ij} \tag{10.1}$$

where the summation goes over all contacting residue pairs (r_i, r_j) in the protein [22, 36]. The CO is defined as

$$\mathrm{CO} = \frac{\mathrm{Abs\_CO}}{L} = \frac{1}{L \cdot N} \sum_{(r_i,\, r_j)} \Delta S_{ij} \tag{10.2}$$

Essentially, Abs_CO measures the average sequence separation between contacting residues in the native state of a protein, and CO is its length-normalized variant. Abs_CO (and likewise CO) increases with the proportion of interacting atoms that are far apart in the protein sequence.
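The following minimal Python sketch makes Equations (10.1) and (10.2) concrete. It assumes that each residue is supplied, in sequence order, as a list of heavy-atom (x, y, z) coordinates. The function names are hypothetical, and the 6.0 Å default cutoff is an assumed parameter. Contacts are counted over unordered pairs of distinct residues, and the sequence separation of a pair is |i − j|.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def _min_distance(atoms_a: Sequence[Coord], atoms_b: Sequence[Coord]) -> float:
    """Smallest Euclidean distance between any atom of one residue and any atom of another."""
    return min(math.dist(p, q) for p in atoms_a for q in atoms_b)


def contact_separations(residues: List[Sequence[Coord]], cutoff: float = 6.0) -> List[int]:
    """Sequence separation |i - j| of every contacting residue pair.

    residues[i] holds the heavy-atom coordinates of residue i (sequence order); two distinct
    residues are in contact if some heavy-atom pair lies within `cutoff` angstroms.
    """
    return [
        j - i
        for i, j in combinations(range(len(residues)), 2)
        if _min_distance(residues[i], residues[j]) <= cutoff
    ]


def abs_contact_order(residues: List[Sequence[Coord]], cutoff: float = 6.0) -> float:
    """Abs_CO (Eq. 10.1): mean sequence separation over all N contacts."""
    seps = contact_separations(residues, cutoff)
    if not seps:
        raise ValueError("no contacts found")
    return sum(seps) / len(seps)


def relative_contact_order(residues: List[Sequence[Coord]], cutoff: float = 6.0) -> float:
    """CO (Eq. 10.2): Abs_CO divided by the protein length L."""
    return abs_contact_order(residues, cutoff) / len(residues)
```

Consider a straight chain of four single-atom residues spaced 4 Å apart. Only sequential neighbors are in contact, so the sketch returns Abs_CO = 1.0 and CO = 0.25. For a six-residue toy "hairpin" that folds back on itself, long-range contacts appear and Abs_CO rises to 25/11.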

10.2 Correlated protein properties

Since the early 1980s, considerable computational and experimental efforts have been devoted to understanding and predicting how proteins fold. Bulk properties such as protein folding rates [26, 32], free energies of folding [43], and hydrogen exchange rates [13] can be measured experimentally to provide insights into protein folding mechanisms. These folding mechanisms are correlated with molecular properties such as secondary structure [23], molecular topology [36], and solvent accessibility [33]. Over the past few decades, it has been observed that protein folding rates vary over many orders of magnitude, from microseconds [32] to hours [26]. In combination with theoretical studies, these experimental observations have led to general agreement that protein folding mechanisms and folding landscapes are determined largely by the topology of the protein's native state, and are relatively insensitive both to the details of interatomic interactions [1, 7, 16, 22, 36] and to protein length [36].

To quantify the topological complexity and the stability of protein native states, various measures based on the contacts between amino acid residues in protein 3D structures have been proposed [12, 15, 20, 22, 27, 29, 36–39, 48]. Among them, the relative contact order CO is the most robust. Both positive and negative correlations have been found between CO and several bulk protein properties, such as protein folding rate and transition state placement [5, 17, 18, 21, 22, 31, 36].

Using a test dataset containing only c10-math-0015 proteins, Plaxco et al. showed that there is a statistically significant relationship between protein folding kinetics and native-state topological complexity [36]. The logarithm of the intrinsic refolding rate, which reflects the height of the transition state barrier, was found to be well correlated with CO, with a correlation coefficient of c10-math-0016 and an associated P value of c10-math-0018. The correlation coefficient between estimates of the folding transition state placement β_T and CO was c10-math-0020, with an associated P value of c10-math-0022 [36]. Here β_T is computed as the ratio of the denaturant dependences of the free energies of the folding transition state and of the native state. It is thought to reflect the fraction of the solvent-accessible surface that is buried in the native state and also buried in the transition state. Plaxco et al. also found a high correlation coefficient between CO and the helical content of the protein [36]. This observation is not surprising, because helices have numerous close contacts characterized by a three-residue periodicity. However, helical content itself correlated much less significantly with either the folding rate or the transition state placement. Likewise, relationships between the size or stability of the proteins in the dataset and their refolding kinetics were found to be weak or nonexistent.

In addition to its use in predicting protein folding kinetics, contact order has been shown to be useful in ab initio protein structure prediction [7]. Bonneau et al. observed that protein "decoy" (i.e., candidate) structures with higher topological complexity tend to be undersampled during the candidate-generation stage of ab initio structure prediction programs, especially for larger proteins [7]. This bias can be alleviated by normalizing the CO distribution of the candidate structures, which generally leads to better protein structure predictions [7]. CO filtering is now an integral part of the Rosetta protein structure prediction package [10].

10.3 Other contact measurements

As shown in early studies, Abs_CO exhibits a weaker correlation with two-state protein folding kinetics than CO does [16, 36]. More recently, however, Ivankov et al. showed that Abs_CO is a more appropriate parameter for predicting protein folding rates, because its correlation with folding rate holds across a wider range of folding kinetics (two-state proteins, multistate proteins, and short peptides) [22]. Ivankov et al. combined CO and protein length L into a general parameter called the size-modified contact order (SMCO):

$$\mathrm{SMCO} = \mathrm{CO} \cdot L^{\alpha} \tag{10.3}$$

From Eq. (10.3), SMCO reduces to CO when α = 0, and to Abs_CO when α = 1. It was observed that any c10-math-0028 gives approximately the same correlation for all of the proteins and peptides collected [22], with the best correlation achieved at c10-math-0029, that is, when SMCO c10-math-0030 Abs_CO. This suggests that the most promising applications of CO prediction or calculation lie in predicting protein folding rates, folding transition state placements, and other folding properties.
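The sketch below shows how the single exponent interpolates between CO and Abs_CO. It assumes that Eq. (10.3) has the form SMCO = CO · L^α, which is consistent with SMCO reducing to CO at α = 0 and to Abs_CO = CO · L at α = 1. The function name is hypothetical.

```python
def size_modified_contact_order(co: float, length: int, alpha: float) -> float:
    """Size-modified contact order under the assumed form SMCO = CO * L**alpha.

    alpha = 0 returns CO unchanged; alpha = 1 returns CO * L, i.e., Abs_CO.
    """
    if length <= 0:
        raise ValueError("protein length must be positive")
    return co * length ** alpha
```

For an illustrative protein with CO = 0.12 and L = 100, α = 0 gives 0.12 (the CO) and α = 1 gives 12.0 (the Abs_CO). Intermediate values of α weight protein size more gently.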

The literature describes several other well-studied measures of residue contacts. These include residuewise contact order (RWCO) [25, 28, 30, 42], effective contact order (ECO) [39], contact number [CN; also known as residue contact number or residue coordination number (RCN)] [12, 15, 20, 27, 29, 37, 38, 48], and Kendall's tau nearest-neighbor topology (τ-NN) [39]. These measures are used largely to characterize the topology or topological complexity of protein native structures. Unlike CO, however, they are not directly correlated with certain global protein properties such as protein folding rate and folding transition state placement.

The residuewise contact order (RWCO) describes the sequence separations between a residue of interest and its contacting residues in a protein sequence [42]. RWCO provides important information for reconstructing a protein 3D structure from a set of one-dimensional structural properties. It can also assist in protein 3D structure prediction and protein folding rate prediction, and it provides insights into protein sequence–structure relationships. The discrete RWCO value of the ith residue in a protein sequence of length L is defined by

$$\mathrm{RWCO}_i = \frac{1}{L} \sum_{j:\,|i-j| \ge 2} |i - j|\, C_{ij}$$

where C_ij = 1 if d_ij < d_c and C_ij = 0 otherwise. Here d_ij is the Euclidean distance between the c10-math-0039 atoms of the ith and the jth residues in the protein sequence, and d_c is a distance cutoff. Note that a sequential separation of at least two residues is required. Replacing the step function C_ij with a sigmoid function gives the continuous RWCO, or simply RWCO [42]. Song and Burrage developed a method that predicts (continuous) RWCO from protein primary sequences using PSI-BLAST and support vector regression. On a well-curated dataset of c10-math-0045 protein sequences, the method achieved a correlation coefficient of c10-math-0043 and a root-mean-square error of c10-math-0044 [42].
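The sketch below computes both the discrete and the sigmoid-smoothed RWCO. It assumes the discrete form RWCO_i = (1/L) Σ_{j: |i−j| ≥ 2} |i − j| C_ij, where C_ij = 1 when the distance between the residues' representative atoms is below a cutoff d_c. The representative atom, the cutoff value, and the logistic form 1/(1 + exp(w(d − d_c))) with steepness w are hypothetical choices for illustration, not the published parameters.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def step_contact(d: float, d_c: float) -> float:
    """Discrete contact indicator: 1 if the distance d is below the cutoff d_c, else 0."""
    return 1.0 if d < d_c else 0.0


def sigmoid_contact(d: float, d_c: float, w: float) -> float:
    """Logistic stand-in for the step function; w (steepness) is a hypothetical parameter."""
    x = w * (d - d_c)
    if x > 0:  # this branch avoids overflow in math.exp for large distances
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def rwco(coords: Sequence[Coord], d_c: float, min_sep: int = 2,
         w: Optional[float] = None) -> List[float]:
    """Residue-wise contact order of every residue.

    coords[i] is the representative atom of residue i (sequence order). Pairs with
    |i - j| < min_sep are skipped. w=None gives the discrete RWCO; a number gives the
    sigmoid-smoothed (continuous) version. Each sum is divided by the length L.
    """
    n = len(coords)
    values = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            sep = abs(i - j)
            if sep < min_sep:
                continue
            d = math.dist(coords[i], coords[j])
            c = step_contact(d, d_c) if w is None else sigmoid_contact(d, d_c, w)
            total += sep * c
        values.append(total / n)
    return values
```

Take five residues spaced 4 Å apart on a line with an illustrative d_c = 10 Å. Only the i ± 2 neighbors fall inside the cutoff, so the discrete values are [0.4, 0.4, 0.8, 0.4, 0.4]. A very steep sigmoid (large w) reproduces these values almost exactly.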

The effective contact order (ECO), an alternative to contact order, was proposed by Dill and coworkers [11, 14]. They replaced the number of residues separating a contact with the number of residues on the shortest path between the two contacting residues. For example, suppose residues r_i and r_j (i < j) form a contact with ΔS_ij = j − i, and between them residues r_k and r_l (i < k < l < j) form an inner contact with ΔS_kl = l − k. In ECO, ΔS_ij is then adjusted to (k − i) + 1 + (j − l), instead of the j − i used in Equation (10.1), because the shortest path between residues r_i and r_j in this case is r_i → ⋯ → r_k → r_l → ⋯ → r_j. Residues are already joined by covalent bonds along the chain and by previously formed contacts, so ECO measures contacts and their scope by the shortest path lengths between residues through these links. ECO is expected to relate to protein folding rate because it attempts to capture loop size. Loop size governs the size of the conformational search space that must be explored to form a contact, given the links that already exist.
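The shortest-path idea behind ECO can be illustrated with a breadth-first search over a residue graph. In this graph, chain bonds (k, k+1) and the other known contacts are edges, and each edge counts as one step. This is a minimal sketch: it assumes that all other listed contacts are already formed, it ignores the order in which contacts form, and the function name is hypothetical.

```python
from collections import deque
from typing import Dict, Sequence, Set, Tuple


def effective_separation(n: int, contacts: Sequence[Tuple[int, int]], i: int, j: int) -> int:
    """Shortest path length between residues i and j (0-based) in a graph whose edges are
    the chain bonds (k, k + 1) plus the given contacts, excluding the contact (i, j) itself.
    Each bond or contact edge counts as one step.
    """
    adj: Dict[int, Set[int]] = {k: set() for k in range(n)}
    for k in range(n - 1):
        adj[k].add(k + 1)
        adj[k + 1].add(k)
    for a, b in contacts:
        if {a, b} == {i, j}:
            continue  # the contact being measured must not short-circuit itself
        adj[a].add(b)
        adj[b].add(a)
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if u == j:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError("residues are not connected")
```

Suppose residues 0 and 20 of a 21-residue chain are in contact. With an inner contact between residues 5 and 15, the path 0 → 5 → 15 → 20 has length 5 + 1 + 5 = 11 instead of 20. Without the inner contact, the separation stays at 20.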

The contact number (CN) measures how amino acid residues are spatially arranged. It is defined as the number of c10-math-0060 atoms in other residues that lie within a sphere centered at the c10-math-0061 atom of interest [48], so CN is a residuewise measure. As with RWCO, Yuan defined two kinds of contact number, discrete and continuous. The discrete CN of the ith residue r_i in a protein of length L is defined as

$$\mathrm{CN}_i = \sum_{j:\,|i-j| \ge 2} C_{ij}$$

where C_ij = 1 if the distance d_ij between the two atoms is below the sphere radius d_c and C_ij = 0 otherwise. This is the same step function used in the definition of RWCO. Note again that a sequential separation of at least two residues is required. Replacing the step function C_ij with a sigmoid function gives the continuous CN of the ith residue. CN has several uses: assisting protein fold recognition [24], describing conserved solvent exposure of similar folds without a common evolutionary origin [19], defining energy functions for molecular dynamics simulations of protein structures [27], and partly characterizing protein 3D structure [48]. Yuan proposed a CN prediction method using PSI-BLAST and support vector regression, which achieved a correlation coefficient of c10-math-0072. Yuan also showed that when residues are classified as either "contacted" or "non-contacted," the correlation coefficient can reach c10-math-0073.
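A contact number sketch follows the same pattern as RWCO but counts neighbors instead of weighting them by sequence separation. The sketch assumes one representative atom per residue, a cutoff radius d_c, and a minimum separation of two. The optional logistic smoothing with steepness w is a hypothetical choice, not the published continuous form.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_numbers(coords: Sequence[Coord], d_c: float, min_sep: int = 2,
                    w: Optional[float] = None) -> List[float]:
    """Contact number of every residue: how many other residues' representative atoms lie
    within d_c of this residue's atom, skipping pairs with |i - j| < min_sep.
    w=None gives the discrete count; a number gives a sigmoid-smoothed count.
    """
    n = len(coords)
    result = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if abs(i - j) < min_sep:
                continue
            d = math.dist(coords[i], coords[j])
            if w is None:
                total += 1.0 if d < d_c else 0.0
            else:
                x = w * (d - d_c)
                # numerically stable logistic 1 / (1 + exp(x)) for either sign of x
                total += math.exp(-x) / (1.0 + math.exp(-x)) if x > 0 else 1.0 / (1.0 + math.exp(x))
        result.append(total)
    return result
```

For five residues spaced 4 Å apart on a line, with an illustrative d_c = 10 Å, the discrete contact numbers are [1, 1, 2, 1, 1]. The middle residue sees both of its i ± 2 neighbors.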

Most recently, Segal proposed a new topological measure that operationalizes 3D proximity with respect to the underlying chain [39]. The method proceeds as follows:

1. The Euclidean distances between all pairs of residues are computed from their 3D coordinates and recorded in a distance matrix.
2. For each residue, its distances to the other residues are converted into a nearest-neighbor ranking. With this ranking, cycle structure can be used to capture topology with respect to the underlying chain. Because the approach depends only on rankings, it is insensitive to noise, a known concern in structure determination experiments.
3. For each residue, a reference ranking is generated to represent the denatured random coil. Kendall's tau nearest-neighbor (τ-NN) approach then measures the difference between the residue's ranking in the folded structure and its reference ranking.
4. To measure the topology of the whole folded structure, Segal took the average of the τ-NN values over all residues.

Compared to CO, the τ-NN measurement needs no tuning parameters during the computation. Moreover, it captures chain deformation/structural information between the reference and contacting residues that is ignored in computing CO. When tested on a set of two-state proteins under standardized conditions, this measure showed an improved correlation with folding and unfolding rates. On a selected dataset of c10-math-0078 proteins, Segal showed that the correlation coefficients between the τ-NN values and the folding and unfolding rates are c10-math-0080 and c10-math-0081, respectively. The corresponding correlation coefficients for CO are only c10-math-0082 and c10-math-0083.
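This chapter does not give Segal's exact reference ranking or any transformation applied to tau, so the sketch below rests on stated assumptions. The random-coil reference ranking is approximated by sequence separation |i − j|, and τ-NN for a whole structure is taken as the mean per-residue Kendall's tau-b between the folded-state distance ordering and that reference. Tau depends only on order, so comparing raw distances with raw separations is equivalent to comparing their rankings. All function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def _dist(p: Coord, q: Coord) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between two paired score lists (tie-corrected, O(n^2))."""
    concordant = discordant = ties_x = ties_y = 0
    for a, b in combinations(range(len(x)), 2):
        dx, dy = x[a] - x[b], y[a] - y[b]
        if dx == 0 and dy == 0:
            ties_x += 1
            ties_y += 1
        elif dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    n0 = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else 0.0


def tau_nn(coords: List[Coord]) -> float:
    """Mean over residues of Kendall's tau-b between the folded-state distance ordering and
    an assumed random-coil reference ordering by sequence separation."""
    n = len(coords)
    taus = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        folded = [_dist(coords[i], coords[j]) for j in others]
        reference = [abs(i - j) for j in others]
        taus.append(kendall_tau_b(folded, reference))
    return sum(taus) / n
```

An extended straight chain orders every residue's neighbors exactly by sequence separation, so the sketch returns 1.0. A chain that folds back on itself brings sequence-distant residues close together, and the value drops below 1.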

10.4 Contact order calculation

Here and in Sections 10.5 and 10.6, we introduce our work on contact order calculation and prediction. CO is obtained from Abs_CO simply by dividing by the protein length, so in what follows we often refer to Abs_CO and CO interchangeably unless explicitly stated otherwise.

Many contact order algorithms have been described, but their limited accessibility (i.e., the lack of downloadable programs or web servers) has prevented their widespread use by protein chemists and structural biologists. To make contact order calculations more useful and accessible to experimentalists, we have developed a web server that both calculates Abs_CO (and CO) from 3D coordinate data and predicts Abs_CO (and CO) from protein sequence data. This public web server (http://www.copredictor.ca) calculates Abs_CO using Equation (10.1). Two distinct residues form a contact if there are two heavy atoms (C, O, S, or N), one from each residue, within 6 Å. Several other distance thresholds have been tested in the literature, with no significant difference found [48]. Some authors have also suggested constraints on the sequence separation of contacting residues (such as at least three residues apart) and/or on the atoms considered (such as Cβ only) [7]. Again, no essential difference results, because the underlying idea of using CO to quantify the topology of a protein's native 3D structure remains the same.

The input to our Abs_CO calculator is a three-dimensional structure, supplied either as an uploaded PDB [6] coordinate file or as a PDB ID. The calculator typically returns the Abs_CO value within a few seconds. An earlier CO calculation server has been published [36], and the two calculators return nearly identical Abs_CO values, with a correlation coefficient of c10-math-0086. However, when tested on May 2, 2007, the earlier server failed to recognize 61% of the c10-math-0087 monomeric PDB files that our server processed successfully.
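The structure-file side of such a calculator can be illustrated with a small parser for fixed-column PDB ATOM records. This is not the server's actual code. The parser is a minimal sketch with these assumptions: only the first model is read, HETATM records (ligands, water) are skipped, only the blank or "A" alternate location is kept, and a residue's sequence index is its order of appearance in the file (gaps in residue numbering are ignored). The 6.0 Å default cutoff and the function names are also assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float, float]
HEAVY_ELEMENTS = {"C", "N", "O", "S"}


def parse_heavy_atoms(pdb_text: str, chain: Optional[str] = None) -> List[List[Coord]]:
    """Heavy-atom coordinates grouped by residue, in file order, from PDB ATOM records."""
    residues: Dict[Tuple[str, str, str], List[Coord]] = {}
    for line in pdb_text.splitlines():
        record = line[0:6].strip()
        if record == "ENDMDL":
            break  # ignore any further models (e.g., NMR ensembles)
        if record != "ATOM":
            continue
        if chain is not None and line[21] != chain:
            continue
        if line[16] not in (" ", "A"):
            continue  # keep only the first alternate location
        # Element symbol from columns 77-78, falling back to the atom name's first letter.
        element = line[76:78].strip() or line[12:16].strip()[:1]
        if element.upper() not in HEAVY_ELEMENTS:
            continue
        key = (line[21], line[22:26], line[26])  # chain ID, residue number, insertion code
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.setdefault(key, []).append(xyz)
    return list(residues.values())


def abs_co_from_pdb(pdb_text: str, cutoff: float = 6.0, chain: Optional[str] = None) -> float:
    """Abs_CO (Eq. 10.1), with sequence separations taken from residue order in the file."""
    residues = parse_heavy_atoms(pdb_text, chain)
    seps = [
        j - i
        for i, j in combinations(range(len(residues)), 2)
        if min(math.dist(p, q) for p in residues[i] for q in residues[j]) <= cutoff
    ]
    if not seps:
        raise ValueError("no contacts found")
    return sum(seps) / len(seps)
```

Keying residues on chain, residue number, and insertion code keeps residues with insertion codes (e.g., 52 and 52A) distinct. Python dictionaries preserve insertion order, so the file order becomes the sequence order.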

10.5 Contact order prediction by homology

Many protein properties, including tertiary structure, secondary structure, and solvent accessibility, can be predicted via homology [35]. That is, the properties of a query sequence can be predicted by directly transferring the properties or features of a homologous protein to the query protein. CO is a function of structure, and homologous proteins tend to share similar structures. We therefore hypothesized that the calculated CO of known 3D structures could be used to predict the CO of homologous proteins.

This approach has been implemented in our public CO web server http://www.copredictor.ca.

More specifically, we calculated the Abs_CO of 16,499 nonredundant proteins obtained from the PDB. These proteins were selected using the PDB culling/filtering service PISCES [45]. Structures were initially selected using a 95% sequence identity redundancy cutoff and, for X-ray structures, a requirement of better than 3 Å resolution. We then removed disordered structures (i.e., those with total secondary structure content <10%) and all membrane proteins (i.e., membrane β-barrel and transmembrane helix proteins). These 16,499 sequences and their Abs_CO values form the CO database that the server searches for homologs, using a local copy of BLAST [2].

When the input to our web server is the primary sequence of a query protein, a BLAST search looks for a homolog in the server's CO database of 16,499 sequences. Suppose the homolog found is not an exact match to the query sequence, has more than 20% sequence identity (computed as the number of identical residues divided by the query sequence length), and the query sequence is c10-math-0089 of the length of the homolog. In that case, the precomputed Abs_CO of the homolog is used as the predicted Abs_CO of the query sequence [40].
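The transfer rule can be sketched as a small decision function. Identity is computed, as stated, as identical aligned residues divided by the full query length. The exact form of the length criterion is elided in the text, which mentions only a 40% length threshold. The check `len(query) >= 0.40 * len(hit)` below is therefore a hypothetical reading. BLAST is not called here: the aligned strings are assumed to come from its output, and all names are illustrative.

```python
from typing import Optional


def percent_identity(aligned_query: str, aligned_hit: str, query_length: int) -> float:
    """Identical aligned positions divided by the full query length (x100); gaps never match."""
    identical = sum(1 for q, h in zip(aligned_query, aligned_hit) if q == h and q != "-")
    return 100.0 * identical / query_length


def transfer_abs_co(query_seq: str, hit_seq: str, aligned_query: str, aligned_hit: str,
                    hit_abs_co: float, min_identity: float = 20.0,
                    min_length_ratio: float = 0.40) -> Optional[float]:
    """Return the homolog's precomputed Abs_CO if the (assumed) transfer criteria hold, else None."""
    if query_seq == hit_seq:
        return None  # exact matches are handled by direct calculation, not by transfer
    identity = percent_identity(aligned_query, aligned_hit, len(query_seq))
    length_ok = len(query_seq) >= min_length_ratio * len(hit_seq)  # hypothetical length rule
    return hit_abs_co if identity > min_identity and length_ok else None
```

Returning `None` instead of a number lets the caller fall through to the next prediction route when no acceptable homolog exists.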

We tested this approach with a modified five-fold cross-validation on a random sample of 1,000 sequences drawn from the CO database [40]. A variety of sequence identity cutoffs and sequence length thresholds were tried to assess their influence on both accuracy and coverage, where coverage is the percentage of query sequences whose CO could be predicted by this homology-based method. Among the settings tested, the 20% sequence identity cutoff and the 40% length threshold gave the best overall trade-off between accuracy and coverage. Under this setting, on average c10-math-0090 (c10-math-0091 standard deviation) of the sample sequences found homologs in the CO database. CO prediction by sequence homology turned out to be surprisingly accurate, with a correlation coefficient of c10-math-0092 between the c10-math-0093 pairs of true and predicted absolute contact order values. These Abs_CO predictions are on average c10-math-0094 correct (c10-math-0095 standard deviation) [40].
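The chapter does not spell out how the cross-validation statistics were computed. As an assumption, the sketch below evaluates a set of homology-based predictions in the simplest way. Coverage is the percentage of queries that received a prediction (queries with no acceptable homolog are marked `None`), and accuracy is the Pearson correlation coefficient over the covered subset. The function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coverage_and_correlation(true_values: Sequence[float],
                             predictions: Sequence[Optional[float]]) -> Tuple[float, float]:
    """Coverage (% of queries with a prediction) and Pearson r over the predicted subset."""
    pairs = [(t, p) for t, p in zip(true_values, predictions) if p is not None]
    coverage = 100.0 * len(pairs) / len(true_values)
    r = pearson_r([t for t, _ in pairs], [p for _, p in pairs])
    return coverage, r
```

This separates the two quantities that the identity and length thresholds trade off against each other. Stricter thresholds typically lower coverage while raising the correlation on the queries that remain.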

10.6 Contact order prediction from sequence

Not every protein can have its CO calculated from coordinate data or predicted via sequence homology. In the above experiment with 1,000 sequences, c10-math-0096 of them failed to find a homolog. To handle cases in which no homolog can be found, we have implemented a regression-based prediction method in the http://www.copredictor.ca CO web server [40]. Abs_CO is observed to correlate well with a linear combination of the percentage of residues in α-helices (P_α), the percentage of residues in β-strands (P_β), and the protein length (L). We therefore developed a linear regression model relating Abs_CO to these primary and secondary structure features:

$$\mathrm{Abs\_CO} = a\,P_{\alpha} + b\,P_{\beta} + c\,L + d \tag{10.4}$$

where a, b, and c are the coefficients of the three factors P_α, P_β, and L, and d is a constant. During prediction, the secondary structure of a protein with unknown three-dimensional structure can be predicted by Proteus [35] or any similar program, such as PSIPRED [34]. Proteus is a secondary structure predictor based on VADAR [46] and the PPT-Database [47] that achieves highly accurate predictions (Q3 accuracy score of c10-math-0110). To train the regressor, that is, to determine the values of a, b, c, and d, we extracted a set of c10-math-0114 monomeric proteins with X-ray resolution <1.5 Å from the PDB [6]. Readers may refer to the study by Shi et al. [40] for more detailed statistics on these proteins, such as their SCOP classification [4] and length distribution. Using five-fold cross-validation on these c10-math-0115 high-resolution three-dimensional protein structures, the optimal parameters in Equation (10.4) were found to be c10-math-0116, c10-math-0117, c10-math-0118, and c10-math-0119.
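The following sketch shows how a three-factor regressor of this kind can be trained. It assumes Eq. (10.4) has the form Abs_CO = a·P_α + b·P_β + c·L + d and fits [a, b, c, d] by ordinary least squares via the normal equations. For a linear model with an intercept, least squares also maximizes the correlation between fitted and observed values. The secondary structure string is assumed to use a three-state H/E/other alphabet, and the feature helper and function names are hypothetical. The fitted coefficients reported in the text are not reproduced here.

```python
from typing import List, Sequence, Tuple

Features = Tuple[float, float, float]  # (percent helix, percent strand, length)


def ss_features(ss: str) -> Features:
    """P_alpha, P_beta (in percent) and length L from a 3-state string (assumed H/E/other)."""
    n = len(ss)
    return 100.0 * ss.count("H") / n, 100.0 * ss.count("E") / n, float(n)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense linear system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_f3_lr(features: Sequence[Features], abs_co: Sequence[float]) -> List[float]:
    """Least-squares fit of Abs_CO = a*P_alpha + b*P_beta + c*L + d; returns [a, b, c, d]."""
    rows = [list(f) + [1.0] for f in features]  # trailing 1.0 carries the constant d
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, abs_co)) for i in range(k)]
    return _solve(xtx, xty)


def predict_f3_lr(params: Sequence[float], feature: Features) -> float:
    """Apply fitted [a, b, c, d] to one protein's (P_alpha, P_beta, L)."""
    a, b, c, d = params
    p_alpha, p_beta, length = feature
    return a * p_alpha + b * p_beta + c * length + d
```

In a cross-validation setting, `fit_f3_lr` would be called on four folds and `predict_f3_lr` applied to the held-out fold. The predicted secondary structure (e.g., from Proteus) would pass through `ss_features` first.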

In addition to this three-factor linear CO predictor, denoted F3-LR, we developed several other linear regressors that include more factors that might be strongly correlated with Abs_CO. These factors are:

- the number of β-hairpins (two adjacent β-strand segments form a hairpin if they are separated by two to five residues);
- the number of distant β-strands (two adjacent β-strand segments are considered "distant" if they are separated by at least five residues);
- the amino acid frequencies; and
- the hydrophobicity frequencies [40].

Besides linear regression, support vector regression (SVR) [41] and neural network (NN) [3] methods have also been implemented on our web server.
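The two strand-based factors can be counted from a secondary structure string, following the definitions exactly as stated. Adjacent strand segments separated by two to five residues count as a hairpin, and those separated by at least five residues count as distant, so a gap of exactly five counts toward both. The sketch assumes strands are marked "E" in a three-state string. The function names and default thresholds are illustrative parameters.

```python
import re
from typing import List, Tuple


def strand_segments(ss: str) -> List[Tuple[int, int]]:
    """(start, end) index pairs, end exclusive, of maximal runs of 'E' (strand) residues."""
    return [m.span() for m in re.finditer(r"E+", ss)]


def count_hairpins_and_distant(ss: str, hairpin_gap: Tuple[int, int] = (2, 5),
                               distant_min_gap: int = 5) -> Tuple[int, int]:
    """Count beta-hairpins and distant strand pairs among adjacent strand segments."""
    segs = strand_segments(ss)
    hairpins = distant = 0
    for (_, end1), (start2, _) in zip(segs, segs[1:]):
        gap = start2 - end1  # number of residues between the two adjacent strands
        if hairpin_gap[0] <= gap <= hairpin_gap[1]:
            hairpins += 1
        if gap >= distant_min_gap:
            distant += 1
    return hairpins, distant
```

In the string C EEEE CCC EEEE CCCCCCCC EEEE C (spaces added for readability), the first two strands are three residues apart and the last two are eight apart. The sketch therefore reports one hairpin and one distant pair.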

On the training dataset of c10-math-0123 proteins, the five-fold cross-validation reported by Shi et al. [40] shows that the simplest regression model (F3-LR) performed remarkably well, achieving a correlation coefficient of c10-math-0124. Using more factors improved the correlation coefficient only slightly. For instance, with four more factors, the F7-LR model improved the correlation coefficient by c10-math-0125, and the F27-LR model achieved a correlation coefficient of c10-math-0126.

10.7 The public contact order web server

A contact order calculator, the homology-based contact order predictor, and the linear regression–based contact order predictors are implemented in a public web server, http://www.copredictor.ca. The input to the server can be either a three-dimensional structure (a PDB coordinate file or a PDB ID) or the primary sequence of the query protein. When the input is a sequence, the server first uses BLAST to identify the sequence in our CO database that is either identical or most homologous to the query. There are three possible scenarios:

1. If the input is a 3D structure, or the query sequence exactly matches a known structure in our database of 16,499 proteins, the server calculates its Abs_CO.
2. If the input is a sequence and the BLAST search finds a homolog that is not an exact match but satisfies the criteria described in Section 10.5, the precomputed Abs_CO of the homolog is used as the predicted Abs_CO of the query sequence.
3. If the input is a sequence with no BLAST match that falls into scenario 2, the server calls Proteus to predict the secondary structure content of the query protein and then reports the Abs_CO predicted by Equation (10.4).
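The three scenarios amount to a fall-through dispatch. The sketch below expresses that control flow with hypothetical callables standing in for the server's components: direct calculation, exact database lookup, homology transfer, and secondary-structure-based regression. None of these names are the server's actual functions. The sketch only illustrates the order in which the routes are tried.

```python
from typing import Any, Callable, Optional, Tuple


def abs_co_service(
    structure: Optional[Any],
    sequence: Optional[str],
    calculate: Callable[[Any], float],                    # scenario 1: Abs_CO from 3D coordinates
    lookup_exact: Callable[[str], Optional[Any]],         # exact-match structure in the CO database
    homolog_transfer: Callable[[str], Optional[float]],   # scenario 2: None if no acceptable homolog
    regression: Callable[[str], float],                   # scenario 3: predicted SS + Eq. (10.4)
) -> Tuple[str, float]:
    """Return (route, Abs_CO), trying calculation, then homology, then regression."""
    if structure is not None:
        return "calculated", calculate(structure)
    if sequence is None:
        raise ValueError("provide a structure or a sequence")
    known = lookup_exact(sequence)
    if known is not None:
        return "calculated", calculate(known)
    transferred = homolog_transfer(sequence)
    if transferred is not None:
        return "homology", transferred
    return "regression", regression(sequence)
```

Returning the route name alongside the value lets a client report which of the three methods produced each Abs_CO, since their expected accuracies differ.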

We have used our program to predict the Abs_COs and the derived protein folding rates [22] for all the proteins collected in TrEMBL (http://www.uniprot.org) [44], as of July 21, 2011. The result is available as a downloadable file from the server website.

10.8 Conclusions

Contact order (CO) is the most widely used approach to quantitatively measure the topological complexity of protein structures. CO can be used to accurately predict protein folding rates and to assist in de novo protein structure prediction/generation. However, the utility of CO, especially for experimentalists, has been limited by the lack of programs or web servers that support CO calculation (from coordinate data) or CO prediction (from sequence data). For proteins with solved three-dimensional structures, we have developed a public web server (http://www.copredictor.ca) that accurately calculates CO, overcoming the limited functionality of an earlier web server. The server also offers a very effective method for predicting protein contact order from primary sequence data. This latter function is particularly important because 3D structures are known for only a tiny fraction of known proteins. Several factors, in particular the percentage of residues in α-helices, the percentage of residues in β-strands, and sequence length, are known to be strongly correlated with absolute contact order. Tests on a large dataset of high-resolution monomeric proteins showed that our method achieved a correlation coefficient of 0.857–0.870. We have also shown that sequence homology can be used to accurately predict the contact order of proteins for which no 3D structure exists, with a high correlation coefficient of c10-math-0127. This web server has been cited or used in a number of studies [8, 9, 39, 49], including a demonstration of the effectiveness of τ-NN [39].

References

1. Alm E, Baker D, Matching theory and experiment in protein folding, Curr. Opin. Struct. Biol. 9:189–196 (1999).

2. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ, Basic local alignment search tool, J. Mol. Biol. 215:403–410 (1990).

3. Anderson JA, An Introduction to Neural Networks, MIT Press, 1995.

4. Andreeva A, Howorth D, Brenner SE, Hubbard TJP, Chothia C, Murzin AG, SCOP database in 2004: Refinements integrate structure and sequence family data, Nucleic Acids Res. 32:D226–D229 (2004).

5. Baker D, A surprising simplicity to protein folding, Nature 405:39–42 (2000).

6. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE, The Protein Data Bank, Nucleic Acids Res. 28:235–242 (2000).

7. Bonneau R, Ruczinski I, Tsai J, Baker D, Contact order and ab initio protein structure prediction, Protein Sci. 11:1937–1944 (2002).

8. Campbell K, Kurgan L, Sequence-only based prediction of beta-turn location and type using collocation of amino acid pairs, Open Bioinformatics J. 2:37–49 (2008).

9. Chen K, Stach W, Homaeian L, Kurgan L, iFC2: An integrated web-server for improved prediction of protein structural class, fold type, and secondary structure content, Amino Acids 40:963–973 (2011).

10. Chivian D, Kim DE, Malmstrom L, Schonbrun J, Rohl CA, Baker D, Prediction of CASP6 structures using automated Robetta protocols. (Available at http://robetta.bakerlab.org/pub/dylan/).

11. Dill KA, Fiebig KM, Chan HS, Cooperativity in protein-folding kinetics, Proc. Natl. Acad. Sci. USA 90:1942–1946 (1993).

12. Fariselli P, Casadio R, RCNPRED: prediction of the residue co-ordination numbers in proteins, Bioinformatics 17:202–204 (2001).

13. Fezoui Y, Braswell EH, Xian W, Osterhout JJ, Dissection of the de novo designed peptide alpha-t-alpha: Stability and properties of the intact molecule and its constituent helices, Biochemistry 38:2796–2804 (1999).

14. Fiebig KM, Dill KA, Protein core assembly process, J. Chem. Phys. 98:3475–3487 (1993).

15. Flöckner H, Braxenthaler M, Lackner P, Jaritz M, Ortner M, Sippl MJ, Progress in fold recognition, Proteins 23:376–386 (1995).

16. Grantcharova V, Alm EJ, Baker D, Horwich AL, Mechanisms of protein folding, Curr. Opin. Struct. Biol. 11:70–82 (2001).

17. Gromiha MM, Selvaraj S, Comparison between long-range interactions and contact order in determining the folding rate of two-state proteins: Application of long-range order to folding rate prediction, J. Mol. Biol. 310:27–32 (2001).

18. Gromiha MM, Thangakani AM, Selvaraj S, Fold-rate: Prediction of protein folding rates from amino acid sequence, Nucleic Acids Res. 34:W70–W74 (2006).

19. Hamelryck T, An amino acid has two sides: A new 2D measure provides a different view of solvent exposure, Proteins 59:38–48 (2005).

20. Ishida T, Nakamura S, Shimizu K, Potential for assessing quality of protein structure based on contact number prediction, Proteins 64:940–947 (2006).

21. Ivankov DN, Finkelstein AV, Prediction of protein folding rates from the amino acid sequence-predicted secondary structure, Proc. Natl. Acad. Sci. USA 101:8942–8944 (2004).

22. Ivankov DN, Garbuzynskiy SO, Alm E, Plaxco KW, Baker D, Finkelstein AV, Contact order revisited: Influence of protein size on the folding rate, Protein Sci. 12:2057–2062 (2003).

23. Kabsch W, Sander C, Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features, Biopolymers 22:2577–2637 (1983).

24. Karchin R, Cline M, Karplus K, Evaluation of local structure alphabets based on residue burial, Proteins 55:508–518 (2004).

25. Kihara D, On the effect of long range interactions on secondary structure formation in proteins, Protein Sci. 14:1955–1963 (2005).

26. Kim PS, Baldwin RL, Intermediates in the folding reactions of small proteins, Annu. Rev. Biochem. 59:631–660 (1990).

27. Kinjo AR, Horimoto K, Nishikawa K, Predicting absolute contact numbers of native protein structure from amino acid sequence, Proteins 58:158–165 (2005).

28. Kinjo AR, Nishikawa K, Predicting secondary structures, contact numbers, and residue-wise contact orders of native protein structure from amino acid sequence using critical random networks, Biophysics 1:67–74 (2005).

29. Kinjo AR, Nishikawa K, Recoverable one-dimensional encoding of three-dimensional protein structures, Bioinformatics 21:2167–2170 (2005).

30. Kinjo AR, Nishikawa K, CRNPRED: Highly accurate prediction of one-dimensional protein structures by large-scale critical random networks, BMC Bioinformatics 7:401 (2006).

31. Koga N, Takada S, Roles of native topology and chain-length scaling in protein folding: A simulation study with a Go-like model, J. Mol. Biol. 313:171–180 (2001).

32. Kubelka J, Hofrichter J, Eaton WA, The protein folding "speed limit," Curr. Opin. Struct. Biol. 14:76–88 (2004).

33. Lee B, Richards FM, The interpretation of protein structures: Estimation of static accessibility, J. Mol. Biol. 55:379–400 (1971).

34. McGuffin LJ, Bryson K, Jones DT, The PSIPRED protein structure prediction server, Bioinformatics 16:404–405 (2000).

35. Montgomerie S, Sundararaj S, Gallin W, Wishart DS, Improving the accuracy of protein secondary structure prediction using structural alignment, BMC Bioinformatics 7:301 (2006).

36. Plaxco KW, Simons KT, Baker D, Contact order, transition state placement and the refolding rates of single domain proteins, J. Mol. Biol. 277:985–994 (1998). (Available at http://depts.washington.edu/bakerpg/contact_order/).

37. Pollastri G, Baldi P, Fariselli P, Casadio R, Improved prediction of the number of residue contacts in proteins by recurrent neural networks, Bioinformatics 17:S234–S242 (2001).

38. Pollastri G, Baldi P, Fariselli P, Casadio R, Prediction of coordination number and relative solvent accessibility in proteins, Proteins 47:142–153 (2002).

39. Segal MR, A novel topology for representing protein folds, Protein Sci. 18:686–693 (2009).

40. Shi Y, Zhou J, Arndt D, Wishart DS, Lin G, Protein contact order prediction from primary sequences, BMC Bioinformatics 9:255 (2008). (Available at http://www.copredictor.ca/).

41. Smola AJ, Schölkopf B, A tutorial on support vector regression, Stat. Comput. 14:199–222 (2004).

42. Song J, Burrage K, Predicting residue-wise contact orders in proteins by support vector regression, BMC Bioinformatics 7:425 (2006).

43. Tanaka S, Scheraga HA, Model of protein folding: Inclusion of short-, medium-, and long-range interactions, Proc. Natl. Acad. Sci. USA 72:3802–3806 (1975).

44. The UniProt Consortium, Ongoing and future developments at the Universal Protein Resource, Nucleic Acids Res. 39:D214–D219 (2011). (Available at http://www.uniprot.org/).

45. Wang G, Dunbrack RL Jr, PISCES: A protein sequence culling server, Bioinformatics 19:1589–1591 (2003).

46. Willard L, Ranjan A, Zhang H, Monzavi H, Boyko RF, Sykes BD, Wishart DS, VADAR: A web server for quantitative evaluation of protein structure quality, Nucleic Acids Res. 31:3316–3319 (2003).

47. Wishart DS, Arndt D, Berjanskii M, Guo AC, Shi Y, Shrivastava S, Zhou J, Zhu Y, Lin G, PPT-DB: The protein property prediction and testing database, Nucleic Acids Res. 36:D222–D229 (2008).

48. Yuan Z, Better prediction of protein contact number using a support vector regression analysis of amino acid sequence, BMC Bioinformatics 6:248 (2005).

49. Zhang H, Zhang T, Chen K, Kedarisetti KD, Mizianty MJ, Bao Q, Stach W, Kurgan L, Critical assessment of high-throughput standalone methods for secondary structure prediction, Brief. Bioinformatics 12:672–688 (2011).
