Preface

This book is about a strategic and tactical approach to data analysis in which the main goal of an empirical study is to add value by turning numbers into insights. In our long experience as applied statisticians and data mining researchers (“data scientists”), we focused on developing methods for data analysis and applying them to real problems. Our experience has been, however, that data analysis is part of a bigger process: it begins with problem elicitation, which consists of defining unstructured problems, and ends with decisions on action items and interventions that reflect the true impact of a study.

In 2006, the first author published a paper on the statistical education bias where, typically, in courses on statistics and data analytics, only statistical methods are taught, without reference to the statistical analysis process (Kenett and Thyregod, 2006).

In 2010, the second author published a paper showing the differences between statistical modeling aimed at prediction goals versus modeling designed to explain causal effects (Shmueli, 2010), the implication being that the goal of a study should affect the way a study is performed, from data collection to data pre‐processing, exploration, modeling, validation, and deployment. A related paper (Shmueli and Koppius, 2011) focused on the role of predictive analytics in theory building and scientific development in the explanatory‐dominated social sciences and management research fields.

In 2014, we published “On Information Quality” (Kenett and Shmueli, 2014), a paper designed to lay out the foundation for a holistic approach to data analysis (using statistical modeling, data mining approaches, or any other data analysis methods) by structuring the main ingredients of what turns numbers into information. We called the approach information quality (InfoQ) and identified four InfoQ components and eight InfoQ dimensions.

Our main thesis is that data analysis, and especially the fields of statistics and data science, need to adapt to modern challenges and technologies by developing structured methods that provide a broad life cycle view, that is, from numbers to insights. This life cycle view needs to be focused on generating InfoQ as a key objective (for more on this see Kenett, 2015).

This book, Information Quality: The Potential of Data and Analytics to Generate Knowledge, offers an extensive treatment of InfoQ and the InfoQ framework. It is aimed at researchers, whom it seeks to motivate to further develop InfoQ elements, and at students in programs that teach them how to ensure that their analytic or statistical work generates information of high quality.

Addressing this mixed community has been a challenge. On the one hand, we wanted to provide academic considerations, and on the other hand, we wanted to present examples and cases that motivate students and practitioners and give them guidance in their own specific projects.

We try to achieve this mix of objectives by combining Part I, which is mostly methodological, with Part II, which is based on examples and case studies.

In Part III, we treat additional topics relevant to InfoQ, such as reproducible research, the review of scientific and applied research publications, the incorporation of InfoQ in academic and professional development programs, and how three leading software platforms, R, MINITAB, and JMP, support InfoQ implementations.

Researchers interested in applied statistics methods and strategies will most likely start in Part I and then move to Part II to see illustrations of the InfoQ framework applied in different domains. Practitioners and students learning how to turn numbers into information can start in a relevant chapter of Part II and move back to Part I.

A teacher or designer of a course on data analytics, applied statistics, or data science can build on examples in Part II and consolidate the approach by covering Chapter 13 and the chapters in Part I. Chapter 13 on “Integrating InfoQ into data science analytics programs, research methods courses and more” was specially prepared for this audience. We also developed five case studies that can be used by teachers as a rating‐based InfoQ assessment exercise (available at http://infoq.galitshmueli.com/class‐assignment).

In developing InfoQ, we received generous inputs from many people. In particular, we would like to acknowledge insightful comments by Sir David Cox, Shelley Zacks, Benny Kedem, Shirley Coleman, David Banks, Bill Woodall, Ron Snee, Peter Bruce, Shawndra Hill, Christine Anderson Cook, Ray Chambers, Fritz Sheuren, Ernest Foreman, Philip Stark, and David Steinberg. The motivation to apply InfoQ to the review of papers (Chapter 12) came from a comment by Ross Sparks who wrote to us: “I really like your framework for evaluating information quality and I have started to use it to assess papers that I am asked to review. Particularly applied papers.” In preparing the material, we benefited from comprehensive editorial inputs by Raquelle Azran and Noa Shmueli, who generously provided us with their invaluable expertise. We would like to thank them and recognize their help in improving the language and style of the text.

The last three chapters were contributed by colleagues. They create a bridge between theory and practice showing how InfoQ is supported by R, MINITAB, and JMP. We thank the authors of these chapters, Silvia Salini, Federica Cugnata, Elena Siletti, Ian Cox, Pere Grima, Lluis Marco‐Almagro, and Xavier Tort‐Martorell, for their effort, which helped make this work both theoretical and practical.

We are especially thankful to Professor David J. Hand for preparing the foreword of the book. David has been a source of inspiration to us for many years and his contribution highlights the key parts of our work.

In the course of writing this book and developing the InfoQ framework, the first author benefited from numerous discussions with colleagues at the University of Turin, in particular with a great visionary of the role of applied statistics in modern business and industry, the late Professor Roberto Corradetti. Roberto was a close friend and greatly influenced this work by continuously emphasizing the need for statistical work to be appreciated by its customers in business and industry. In addition, the financial support of the Diego de Castro Foundation that he managed provided the time to work in a stimulating academic environment at both the Faculty of Economics and the “Giuseppe Peano” Department of Mathematics of UNITO, the University of Turin. The contributions of Roberto Corradetti cannot be overstated and are humbly acknowledged. Roberto passed away in June 2015 and left behind a great void. The second author thanks participants of the 2015 Statistical Challenges in eCommerce Research Symposium, where she presented the keynote address on InfoQ, for their feedback and enthusiasm regarding the importance of the InfoQ framework to current social science and management research.

Finally, we acknowledge with pleasure the professional help of the Wiley personnel, including Heather Kay, Alison Oliver, and Adalfin Jayasingh, and thank them for their encouragement, comments, and input, which were instrumental in improving the form and content of the book.

Ron S. Kenett and Galit Shmueli

References

  1. Kenett, R.S. (2015) Statistics: a life cycle view (with discussion). Quality Engineering, 27(1), pp. 111–129.
  2. Kenett, R.S. and Shmueli, G. (2014) On information quality (with discussion). Journal of the Royal Statistical Society, Series A, 177(1), pp. 3–38.
  3. Kenett, R.S. and Thyregod, P. (2006) Aspects of statistical consulting not taught by academia. Statistica Neerlandica, 60(3), pp. 396–412.
  4. Shmueli, G. (2010) To explain or to predict? Statistical Science, 25, pp. 289–310.
  5. Shmueli, G. and Koppius, O.R. (2011) Predictive analytics in information systems research. MIS Quarterly, 35(3), pp. 553–572.