Foreword by Jack Dongarra

It is apparent that in the era of Big Data, when every major field of science, engineering, business, and finance is producing and needs to (repeatedly) process truly extraordinary amounts of data, the many unsolved problems of when, where, and how all those data are to be produced, transformed, and analyzed have taken center stage. This book, Big Data: Algorithms, Analytics, and Applications, addresses and examines important areas such as management, processing, data stream techniques, privacy, and applications.

The collection presented in the book covers fundamental and realistic issues about Big Data, including efficient algorithmic methods to process data, better analytical strategies to digest data, and representative applications in diverse fields such as medicine, science, and engineering. It seeks to bridge the gap between huge amounts of data and appropriate computational methods for scientific and social discovery, and to bring together technologies for media/data communication, elastic media/data storage, cross-network media/data fusion, SaaS, and others. It also presents interesting applications related to Big Data.

This timely book, edited by Kuan-Ching Li, Hai Jiang, Laurence T. Yang, and Alfredo Cuzzocrea, gives a quick introduction to the understanding and use of Big Data and provides examples of and insights into the process of handling and analyzing Big Data problems. It presents introductory concepts as well as complex issues. The book has five major sections: management, processing, streaming techniques and algorithms, privacy, and applications.

Throughout the book, examples and practical approaches for dealing with Big Data are provided to help reinforce the concepts that have been presented in the chapters.

This book is required reading for anyone working in a major field of science, engineering, business, or finance. It explores Big Data in depth and offers different perspectives on how to approach Big Data and how they are being handled today.

I have enjoyed and learned from this book, and I feel confident that you will as well.

Jack Dongarra

University of Tennessee, Knoxville

Foreword by Dr. Yi Pan

In 1945, mathematician and physicist John von Neumann and his colleagues wrote the article “First Draft of a Report on the EDVAC” to describe their new machine, the EDVAC, based on ideas from J. Presper Eckert and John Mauchly. The proposed concept of the stored-program computer, known as the von Neumann machine, has been widely adopted in modern computer architectures. Both the program and the data for an application are kept in computer memory for execution flexibility, so computers were no longer dedicated to single jobs.

In 1965, Intel’s cofounder Gordon Moore proposed the so-called Moore’s law, predicting that the number of transistors doubles every 18 months. Computer hardware has roughly followed Moore’s law over the years. However, to ensure that computing capability doubles every 18 months, computer hardware has to work closely with system software and application software. For each computing job, both the computation and the data aspects must be handled properly to achieve good performance.

To scale up computing systems, multiple levels of parallelism, including instruction-level, thread-level, and task-level parallelism, have been exploited. Multicore and many-core processors as well as symmetric multiprocessing (SMP) have been used in hardware. Data parallelism is exploited through byte-level parallelization, vectorization, and SIMD architectures. To tolerate the latency of data access in memory, a memory hierarchy with multiple levels of cache is used to help overcome the memory wall.

To scale out, multicomputer systems such as computer clusters, peer-to-peer systems, grids, and clouds are adopted. Although the aggregated processing and data storage capabilities have increased, communication overhead and data locality remain the major issues. Still, scaled-up and scaled-out systems have been deployed to support computer applications over the years.

Recently, Big Data has become a buzzword in the computer world, as the size of data increases exponentially and has reached the terabyte and petabyte ranges. Representative Big Data sets come from fields in science, engineering, government, the private sector, and daily life. Data digestion has overtaken data generation as the leading challenge for the current computer world. Several leading countries and their governments have noticed this trend and released new policies for further investigation and research activities. As distributed systems and clouds become popular, and compared to the traditional data in common applications, Big Data exhibit distinguishing characteristics such as volume, variety, velocity, variability, veracity, value, and complexity. The manner in which data are collected, stored, represented, fused, processed, and visualized has to change. Since modern computers still follow the von Neumann architecture, and potential alternatives such as biocomputers and quantum computers are still in their infancy, algorithmic and software adjustments remain the main consideration for Big Data.

Li, Jiang, Yang, and Cuzzocrea’s timely book addresses the Big Data issue. The editors are active researchers who have done extensive work in the area of Big Data, and they have assembled a group of outstanding authors. The book’s content focuses mainly on the algorithmic, analytic, and application aspects of Big Data in five separate sections: Big Data management, Big Data processing, Big Data stream techniques and algorithms, Big Data privacy, and Big Data applications. Each section contains several case studies that demonstrate how the related issues are addressed.

Several Big Data management strategies, such as indexing and clustering, are introduced to illustrate how to organize data sets. Some processing and scheduling schemes as well as representative frameworks such as MapReduce and Hadoop are also included to show how to speed up Big Data applications. In particular, streaming techniques are explained in detail because of their importance in Big Data processing. Privacy remains a concern in Big Data and deserves proper attention. Finally, the text includes several actual Big Data applications in finance, media, biometrics, geoscience, and the social sector, which demonstrate how the Big Data issue has been addressed in various fields.
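To give a concrete feel for the map/reduce programming model that frameworks such as Hadoop implement at scale, here is a minimal, self-contained word-count sketch in Python. It only imitates the model sequentially on a single machine; the function names and sample documents are illustrative and are not taken from any chapter of the book.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return counts

if __name__ == "__main__":
    documents = ["big data needs big ideas", "data streams and data stores"]
    # In a real MapReduce framework the map and reduce phases run in
    # parallel across many machines; here they run sequentially.
    all_pairs = (pair for doc in documents for pair in map_phase(doc))
    print(dict(reduce_phase(all_pairs)))
```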

I hope that the publication of this book will help university students, researchers, and professionals understand the major issues in Big Data applications as well as the corresponding strategies to tackle them. This book might also stimulate research activities in all aspects of Big Data. As the size of data still increases dramatically every day, the Big Data issue may become even more challenging than people expect. However, this may also be a good opportunity for new discoveries in this exciting field. I highly recommend this timely and valuable book. I believe that it will benefit many readers and contribute to the further development of Big Data research.

Dr. Yi Pan

Distinguished University Professor of Computer Science

Interim Associate Dean of Arts and Science

Georgia State University, Atlanta

Foreword by D. Frank Hsu

Due to instrumentation and interconnection in the past decades, the Internet and the World Wide Web have transformed our society into a complex cyber–physical–social ecosystem where everyone is connected to everything (the Internet of Everything [IOE]) and everybody is an information user as well as an information provider. More recently, Big Data with high volume and wide variety have been generated at a rapid velocity. Everyone agrees that Big Data have great potential to transform the way human beings work, live, and behave. However, their true value to the individual, organization, society, planet Earth, and intergalactic universe hinges on understanding and solving fundamental and realistic issues about Big Data. These include efficient algorithms to process data, intelligent informatics to digest and analyze data, wide applications to academic disciplines in the arts and sciences, and operational practices in the public and private sectors. The current book, Big Data: Algorithms, Analytics, and Applications (BDA3), comes at the right time with the right purpose.

In his presentation to the Information and Telecommunication Board of the National Research Council (http://www.TheFourthParadigm.com), Jim Gray describes the long history of the scientific inquiry process and knowledge discovery method as having gone through three paradigms: from empirical (thousands of years ago), to theoretical (in the last few hundred years, since the 1600s), and then to modeling (in the last few decades). More recently, the complexity of Big Data (structured vs. unstructured, spatial vs. temporal, logical vs. cognitive, data-driven vs. hypothesis-driven, etc.) poses great challenges not only in application domains but also in analytic methods. These domain applications include climate change, environmental issues, health, legal affairs, and other critical infrastructures such as banking, education, finance, energy, healthcare, information and communication technology, manufacturing, and transportation. In the scientific discovery process, efficient and effective methods for analyzing data are needed in a variety of disciplines, including STEM (astronomy, biomedicine, chemistry, ecology, engineering, geology, computer science, information technology, and mathematics) and other professional studies (architecture, business, design, education, journalism, law, medicine, etc.). The current book, BDA3, covers many of the application areas and academic disciplines.

The Big Data phenomenon has altered the way we analyze data using statistics and computing. Traditional statistical problems tend to have many observations but few parameters and a small number of hypotheses. More recently, problems such as analyzing fMRI data sets and social networks have diverted our attention to situations with a small number of observations, a large number of variables (such as cues or features), and a relatively larger number of hypotheses (see also D.J. Spiegelhalter, Science, 345: 264–265, 2014). In the Big Data environment, the use of statistical significance (the P value) may not always be appropriate. In analytic terms, correlation is not equivalent to causality, and the normal distribution may not be that normal. An ensemble of multiple models is often used to improve forecasting, prediction, or decision making. Traditional computing problems use static databases, take input from logical and structured data, and run deterministic algorithms. In the Big Data era, relational databases have to be supplemented with other structures due to scalability issues. Moreover, input data are often unstructured and illogical (because they are acquired through cognition, speech, or perception). Due to the rapid streaming of incoming data, it is necessary to bring computing to the data acquisition point. Intelligent informatics will use data mining and semisupervised machine learning techniques to deal with the uncertainty of the complex Big Data environment. The current book, BDA3, includes many of these data-centric methods and analysis techniques.
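As a toy illustration of the ensemble idea mentioned above (averaging several models rather than trusting a single fit), the following sketch uses only the Python standard library; the synthetic data, the bootstrap resampling, and the slope-only model are invented for illustration and do not come from the book.

```python
import random
import statistics

random.seed(0)

# Hypothetical noisy observations of a linear signal y = 2x + noise.
xs = [x / 10 for x in range(50)]
ys = [2 * x + random.gauss(0, 0.5) for x in xs]
data = list(zip(xs, ys))

def fit_slope(sample):
    """Least-squares slope through the origin for (x, y) pairs."""
    num = sum(x * y for x, y in sample)
    den = sum(x * x for x, _ in sample)
    return num / den

# Fit an "ensemble" of slopes on bootstrap resamples of the data,
# then average them instead of relying on one model.
slopes = [fit_slope([random.choice(data) for _ in data]) for _ in range(20)]
ensemble_slope = statistics.mean(slopes)

print(f"single-fit slope:   {fit_slope(data):.3f}")
print(f"ensemble avg slope: {ensemble_slope:.3f}")
```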

The editors have assembled an impressive book consisting of 22 chapters, written by 57 authors from 12 countries across the Americas, Europe, and Asia. The chapters are properly divided into five sections on Big Data: management, processing, stream technologies and algorithms, privacy, and applications. Although the authors come from different disciplines and subfields, their journey is the same: to discover and analyze Big Data and to create value for themselves, for their organizations and society, and for the whole world. The chapters are well written by authors who are active researchers or practicing experts in areas related to Big Data. BDA3 will contribute tremendously to the emerging new paradigm (the fourth paradigm) of the scientific discovery process and will help generate many new research fields and disciplines such as those in computational x and x-informatics (x can be biology, neuroscience, social science, or history), as Jim Gray envisioned. On the other hand, it will stimulate technology innovation and possibly inspire entrepreneurship. In addition, it will have a great impact on cyber security, cloud computing, and mobility management in the public and private sectors.

I would like to thank and congratulate the four editors of BDA3—Kuan-Ching Li, Hai Jiang, Laurence T. Yang, and Alfredo Cuzzocrea—for their energy and dedication in putting together this significant volume. In the Big Data era, many institutions and enterprises in the public and private sectors have launched their own Big Data strategies and platforms. The current book, BDA3, differs from those strategies and platforms in that it focuses on essential Big Data issues, such as management, processing, streaming technologies, privacy, and applications. This book has great potential to provide fundamental insight and privacy to individuals, long-lasting value to organizations, and security and sustainability to the cyber–physical–social ecosystem of our planet.

D. Frank Hsu

Fordham University, New York
