Preface
The purpose of this book is to describe some modern approaches to analyzing and
understanding the structure of datasets whose size outpaces the computing resources needed
to analyze them using traditional approaches. This problem domain has become
increasingly common with the proliferation of electronic devices that collect increasing amounts
of data along with inexpensive means to store information. As a result, researchers have
had to reimagine how we make sense of data; to identify fundamental concepts in statis-
tics and machine learning and translate them to the new problem domain; and, in some
cases, to come up with entirely new approaches. The result has been an explosion in new
methodological and technological tools for what is colloquially known as Big Data. We have
carefully selected representative methods and techniques to provide the reader not only
with a wide-ranging view of this landscape but also the fundamental concepts needed to
tackle new challenges. Our intent is that this book will serve to quickly orient the reader
and provide a working understanding of key statistical and computing ideas that can be
readily applied in applications and research.
A tremendous amount of research and progress in the broadly understood area of Big
Data has been made over the past few decades. The data management community in par-
ticular has made important contributions to the issue of how to effectively store and access
data. This book acknowledges these new computational capabilities and attempts to address
the question, “given that we can access and store a vast, potentially complex dataset, how
can we understand the statistical relationships within it?”
With this in mind, we have set out to accomplish three distinct tasks. The first is to identify
modern, scalable approaches to analyzing increasingly large datasets. By representing the
state of the art in this centralized resource, we hope to help researchers understand the
landscape of available tools and techniques as well as the fundamental concepts
that make them work. The second task is to help identify areas of research that need further
development. The practice is still evolving, and rigorous approaches to understanding the
statistics of very large datasets while integrating computational, methodological, and theo-
retical developments are still being formalized. This book helps to identify gaps and explore
new avenues of research. The third task is to integrate current techniques across disciplines.
We have already begun to see Big Data sub-specialties, such as genomics, computational
biology, search, and even finance, ignoring inter-community advances in both computational
statistics and machine learning. We hope that this book will encourage greater
communication and collaboration between these sub-specialties and will result in a more integrated
community.
In designing this book, we have tried to strike the right balance not only between sta-
tistical methodology, theory, and applications in computer science but also between the
breadth of topics and the depth to which each topic is explored. Each chapter is designed to
be self-contained and easily digestible. We hope it serves as a useful resource for seasoned
practitioners and enthusiastic neophytes alike.
This project could not have been completed without the encouragement, guidance, and
patience of John Kimmel of Chapman & Hall/CRC Press. We express here our gratitude
and thanks.
Peter Bühlmann
ETH Zürich
Petros Drineas
Rensselaer Polytechnic Institute
Michael Kane
Yale University
Mark van der Laan
University of California, Berkeley
MATLAB® is a registered trademark of The MathWorks, Inc. For product information,
please contact:
The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098 USA
Tel: 508-647-7000
Fax: 508-647-7001
E-mail: info@mathworks.com
Web: www.mathworks.com