Preface

Graphical Data Analysis is useful for data cleaning, exploring data structure, detecting outliers and unusual groups, identifying trends and clusters, spotting local patterns, evaluating modelling output, and presenting results. It is essential for exploratory data analysis and data mining. There are several fine books on graphics using R, such as “ggplot2” [Wickham, 2009], “Lattice” [Sarkar, 2008], and “R Graphics” [Murrell, 2011]. These books concentrate on how you draw graphics in R. This book concentrates on why you draw graphics and which graphics to draw (and uses R to do so).

The target readership includes anyone carrying out data analyses who wants to understand their data using graphics. The book can be used as the primary textbook for a course in Graphical Data Analysis or as an accompanying text for a statistics course. Prerequisites for the book are an interest in data analysis and some basic knowledge of R.

The main aim of the book is to show, using real datasets, what information graphical displays can reveal in data. Seeing graphics in action is the best way to learn Graphical Data Analysis. Gaining experience in interpreting graphics and drawing your own data displays is the most effective way forward.

The graphics shown in the book are a starting point. Sometimes more graphics could have been drawn, and alternative graphics could always have been drawn. Readers may have their own ideas of how best to present certain features of the datasets. Although each graphic reveals information contained in its dataset, it is likely that in every case there is more to be discovered. It is certainly one of the aims of each analysis to find out as much as possible about the data. The graphics are not drawn for their own sake; they are drawn to reveal and convey information.

A central idea underlying this book is that many graphics should be drawn. The aim should not be to draw a single graphic that summarises everything that can be said about the data. That is too difficult, if not impossible. The aim is to find a number of graphics, maybe even a large number of them, where each contributes something to the overall picture. Just as many photographs of the same object taken from different angles in different lights make it easier for us to grasp a whole object, datasets should be visualised in many different ways.

The emphasis is on exploring datasets first and on presenting results second. Graphical Data Analysis is about using graphics to find results. One way to think about this is to imagine you are looking at a new package in R and it uses a dataset you are not familiar with for the examples in the help. What does the dataset look like? How would you go about finding out what features it has, and how that might affect the use of the methods in the package? What information can you find graphically in the data that a modelling approach should also find? What graphical displays are there that help you understand the results of other people’s models, such as the examples given on the help page? This presupposes an active interest on the part of the reader. Roland Barthes, the French structuralist, referred to readerly texts and writerly texts. In a writerly text the reader takes an active role in the construction of meaning. I hope the readers of this book will take an active role in thinking about what graphics show, what information can be gleaned from them, and why they were chosen.

As every dataset used is available in R or one of its packages, information about them can usually be found on the relevant help page, including which variables of what types are involved and how big the dataset is. Ideally there should be a description of why and how it was collected, with references to original sources. Context is important for interpreting results and you have to know your dataset and its provenance. A well-developed sense of curiosity is very helpful in data analysis.

Graphical Data Analysis is an attractive way of working with data. It encourages you to look at many different aspects and to investigate in many different directions. You can be surprised by what you uncover and even by which graphic turns out to be most effective in revealing information. Your results are easy to show to others and are easy to discuss with others.

For any result found graphically, we should try to check what statistical support there is for it, just as we use graphics to review the results of our statistical modelling. Graphical Data Analysis and more traditional statistical approaches complement each other very well and we should take advantage of this.

Acknowledgements

No book on R should omit thanking Robert Gentleman, Ross Ihaka, and all the many R contributors. They have made analysis of data much easier for the rest of us. Thanks also to Hadley Wickham for all his R packages (sometimes referred to as the Hadleyverse), especially for ggplot2, and to Yihui Xie for knitr, a major help in keeping this book in order. Particular thanks are due to Bill Venables for his words of wisdom and for R advice and code. If any of the book’s code looks elegant, then it must be Bill’s, and if it looks clumsy, it is certainly mine.

Dennis Freuer, Urs Freund, Katrin Grimm, Harold Henderson, Ross Ihaka, Kary Myers, Alexander Pilhöfer, Maryann Pirie, Friedrich Pukelsheim, Christina Sanchez, Günther Sawitzki, Rolf Turner, Chris Wild, and Aisen Yang read one or more chapters and made many helpful suggestions for improvement, some of which I have been able to adopt. John Kimmel was an encouraging and efficient publisher, who organised several constructively critical reviewers, including Di Cook, Michael Friendly, and Ramnath Vaidyanathan. I would also like to thank the Statistics Department at the University of Auckland for a stimulating and sociable environment in which to work on this book during my sabbatical.

Finally I would like to thank my family for never asking me when the book would be finished and for many other kindnesses.

Augsburg, December 2014

Antony Unwin
