Preface

Proteins are complex organic macromolecules that contain carbon, hydrogen, oxygen, nitrogen, and usually sulfur and are composed of one or more chains of amino acids. They are fundamental components of all living cells and include many substances, such as enzymes, hormones, and antibodies, that are essential for the proper functioning of an organism. Protein bioinformatics is a newer name for an already existing discipline. It encompasses the techniques and methodologies of bioinformatics that relate to proteins.

A protein can be described as a sequence, a two-dimensional (2D) structure, or a three-dimensional (3D) structure. In addition, interactions among proteins can be described as a network or a graph. Hence, many traditional algorithmic techniques, such as graph algorithms, heuristic algorithms, approximation algorithms, parameterized algorithms, and linear programming, can be applied to analyze protein interaction networks. On the other hand, because of the large amount of data available from wet labs and experiments with proteins, traditional algorithmic methods alone may not be powerful enough. Hence, we can use many mature machine learning or artificial intelligence (AI) methods to analyze protein data, for example, to predict protein structures based on existing databases or datasets. These AI techniques include support vector machines (SVMs), hidden Markov models (HMMs), neural networks, decision trees, reinforcement learning, genetic algorithms, pattern recognition, clustering, and random forests. Combinations of traditional algorithms (such as graph algorithms), statistical methods, and AI techniques (such as SVMs) have been used in protein structure prediction and in the analysis of protein interaction networks, and many good results have been achieved.

The objective of this book is to promote collaboration between computer scientists working on algorithms and AI and biologists working on proteins by presenting cutting-edge research topics and methodologies in the area of protein bioinformatics.

This book comprises chapters written by experts on a wide range of topics that are associated with novel algorithmic and AI methods for analysis of protein data. This book includes chapters on analysis of protein sequences, structures, and their interaction networks using both traditional algorithms and AI methods. It comprehensively summarizes the most recent developments in this exciting research area. Protein bioinformatics plays a key role in life science, including protein engineering via designing tailor-made proteins, drug design based on finding docking molecules to kill disease cells, and improvement of protein effectiveness through modifying biocatalysts. Because of the many advantages of protein bioinformatics compared to traditional wet lab experiments, applications of protein bioinformatics are also described in this book. The important work of some representative researchers in protein bioinformatics is brought together for the first time in one volume. The topic is treated in depth and is related to, where applicable, other emerging technologies such as data mining and visualization. The goal of the book is to introduce readers to the most recent work and results in protein bioinformatics in the hope that they will build on them to make new discoveries of their own. It also arms the readers with the analysis tools and methods used in protein bioinformatics to enable them to tackle these problems in the future. The key elements of each chapter are briefly summarized below.

This is the first edited book dealing with the topic of protein bioinformatics and its applications in such a comprehensive manner. The material included in this book was carefully chosen for quality, coverage, and relevance. This book also provides a mixture of algorithms, AI methods, data preparation, simulation, experiments, evaluation methods, and applications, which provide both qualitative and quantitative insights into the rich field of protein bioinformatics.

This book is intended to be a repository of case studies that deal with a variety of protein bioinformatics problems and to show how algorithms and AI methods are used (sometimes together) to study protein biological data and to achieve a better understanding of them. It is hoped that this book will generate more interest in developing more efficient and accurate methodologies and solutions for protein bioinformatics problems and applications. Once researchers understand the theories behind the algorithms and AI methods described in this book and how to apply them, they should be able to handle larger and more complicated protein data. Although the material contained in this book spans a number of protein bioinformatics topics and applications, the chapters are presented in a way that makes the book self-contained, so that the reader does not have to consult external sources. This book offers, in a single volume, comprehensive coverage of a range of protein bioinformatics applications and shows how algorithms and AI methods can be used to analyze them and to interpret protein data more accurately and efficiently.

The goal of this edited book is to provide an excellent reference for students, faculty, researchers, and people in the industry in the fields of bioinformatics, computer science, statistics, and biology who are interested in applying algorithms and AI methods to solve biological problems. This book is divided into five parts: (I) From Protein Sequence to Structure, (II) Protein Analysis and Prediction, (III) Protein Structure Alignment and Assessment, (IV) Protein–Protein Analysis of Biological Networks, and (V) Application of Protein Bioinformatics. The chapters are briefly summarized as follows:

  • Chapter 1 discusses the scaling of similarity sensitivity in remote homology modeling on yeast species and how candidate genes are searched; these studies are important for different stages of embryogenesis of the model plant species Arabidopsis thaliana in light of the concept of dynamical patterning modules.
  • Understanding sequence motifs is an important task in modern bioinformatics research, and these motif patterns may help predict the structural or functional regions of other proteins. Protein sequence motif discovery is discussed in Chapter 2.
  • Chapter 3 introduces methods for identifying calcium binding sites in proteins. Three methodologies for predicting calcium binding sites in proteins are reviewed and compared using different algorithms and AI methods.
  • Chapter 4 proposes an imbalance learning method for protein methylation prediction using ensemble SVMs. It focuses on computational prediction of a particular posttranslational modification (PTM), protein arginine methylation.
  • Chapter 5 studies the prediction of protein posttranslational modification sites. By taking advantage of the large number of experimentally verified PTM sites and utilizing a comprehensive machine learning method, the chapter provides a useful bioinformatics software system for PTM site prediction.
  • Chapter 6 describes an effective and reliable tool that uses data mining and machine learning techniques to predict local protein structure.
  • In Chapter 7, a novel and effective SVM-based approach is proposed for predicting the boundaries of protein structure elements instead of the structures of individual residues.
  • The state of the art in machine learning-based RNA binding site prediction methods is reviewed in Chapter 8.
  • In Chapter 9, many sequence-based and mass spectrometry data-based frameworks for determining disulfide bonds are presented.
  • Chapter 10 gives the most recent update on protein contact order prediction. A new contact order web server is described that, unlike existing servers, predicts the contact order using structure and sequence homology.
  • Chapter 11 surveys about 15 computational methods for cysteine oxidation state prediction developed since the early 1990s.
  • Chapter 12 addresses the computational methods in cryoelectron microscopy 3D structure reconstruction and its multilevel parallel strategy on the GPU platform.
  • Chapter 13 gives a brief introduction to the biological, mathematical, and computational aspects of making pairwise comparisons between protein structures.
  • Chapter 14 explains how to index 3D protein structures and how to use the vector space model and suffix trees for efficient string matching and querying, with the aim of discovering protein structures for optimal structure alignment. A protein similarity algorithm is also explained in detail.
  • Chapter 15 discusses several issues of structural alignment and methods that are implemented for sequence-order-independent structural alignment at both the global and local surface levels.
  • Chapter 16 describes methods for predicting protein structure classes and functions, together with the measures they rely on, such as physicochemical features of amino acids, Z-curve representation, and the chaos game representation of proteins.
  • Chapter 17 describes a new machine learning algorithm that uses a support vector machine (SVM) technique that learns from structures in the Protein Data Bank (PDB) and, when given a new model, predicts whether it belongs to the same class as the PDB structures.
  • The characteristics, strengths, and shortcomings of many network algorithms for clustering biological networks are discussed in Chapter 18. The chapter covers various algorithms for clustering protein–protein interaction networks (PPINs) based on the features of PPINs.
  • Chapter 19 describes different algorithms applied to identify protein complexes, including methods based solely on PPIN data, methods combined with multiple information sources, and new trends in prediction of protein complexes on dynamic networks.
  • Chapter 20 proposes and discusses an ant colony optimization (ACO)-based algorithm that uses the topology of the network to detect functional modules in protein–protein interaction networks.
  • Chapter 21 gives a brief overview of the current state of the art in metabolic pathway/network alignment and how it can be used in automatic data curation.
  • Chapter 22 starts by providing some background information on how PPI networks can be modeled on different PPI network alignments, and then focuses on local PPI network alignment algorithms and global PPI network alignment algorithms. Coarse-grain comparison is also addressed in that chapter.
  • Among many machine learning techniques proposed for quantitative structure–activity relationship (QSAR) analysis and drug activity comparison, Chapter 23 focuses on the design and results of SVMs used for protein-related drug activity comparison.
  • The main goal of Chapter 24 is to analyze how the general problem of finding repetitions in biological data evolved from sequences to network data, by focusing on the open challenges and specific applications in biological networks.
  • Chapter 25 gives a brief overview of an online resource and prediction server named MeTaDoR that provides comprehensive structural and functional information on membrane targeting domains.
  • Chapter 26 gives a brief review of network-based identification and integration of gene signatures of complex diseases. In particular, it focuses on breast cancer gene signatures in protein interaction networks using graph centrality.

We would like to express our sincere thanks to all authors for their effort and important contributions. We highly appreciate the reviews and corrections done by Ms. Tammie Dudley, which have improved the manuscript tremendously. We would also like to extend our deepest gratitude to Simone Taylor (senior editor) and Diana Gialo (editorial assistant) from Wiley for their guidance and help in finalizing this book. Finally, we would like to thank our families for their support, patience, and love. Without the collective effort of all of the above-mentioned individuals, this book might still be in preparation. We hope that our readers will enjoy reading this book and give us feedback for future improvements.

Yi Pan

Department of Computer Science, Georgia State University, Atlanta, Georgia, USA
Email: [email protected]

Jianxin Wang

School of Information Science and Engineering, Central South University, Changsha, China
Email: [email protected]

Min Li

School of Information Science and Engineering, State Key Laboratory of Medical Genetics, Central South University, Changsha, China
Email: [email protected]
