Data Analysis with R


"I believe in the power of shared data and technology to help build a better future."

– Paul Allen

Most of this book deals with data analysis in R. This chapter gives an overview of what data analysis involves and which methods suit it. In other words, it shows how to understand the characteristics of a dataset and how to visualize the information at a glance before pursuing more in-depth analytical methods.

When you first receive a dataset for analysis, it is helpful to get a sense of the high-level characteristics of the data. This generally means performing basic summary operations and thereafter visualizing the information to build an overall notion of important variables, their distribution, cardinality, and various other aspects. While there are tools that claim to automatically provide insight from data, a certain level of domain expertise is needed in order to derive meaningful and useful information from the data.
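As a minimal sketch of such a first pass, the base R snippet below inspects a small data frame. The `sales` data frame and its column names are made up for illustration and are not taken from the book; only standard base R functions are used.

```r
# Hypothetical example data; the column names are illustrative only
sales <- data.frame(
  region = c("East", "West", "East", "North", "West", "East"),
  units  = c(12, 7, NA, 15, 9, 11),
  price  = c(19.99, 24.50, 19.99, 5.25, 24.50, 19.99),
  stringsAsFactors = FALSE
)

# Structure: column types and the first few values of each column
str(sales)

# Basic summary: quartiles and mean for numeric columns, NA counts where present
summary(sales)

# Cardinality: number of distinct values per column
# (note that unique() counts NA as a distinct value)
cardinality <- sapply(sales, function(col) length(unique(col)))
print(cardinality)

# Number of missing values per column
missing <- colSums(is.na(sales))
print(missing)

# Frequency of each value in a categorical column
print(table(sales$region))
```

Checks like these do not replace domain knowledge, but they quickly show which columns are numeric, which have few distinct values, and where values are missing.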

For instance, time series datasets can be very difficult to analyze in a purely automated manner. The dataset could have missing or incorrect values. There could also be embedded characters, such as a US dollar sign, that prevent a column from being interpreted as numeric.
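The embedded-character problem can be illustrated with a short base R sketch. The `raw_amount` vector is a made-up example, and stripping the characters with `gsub()` is only one possible approach, shown here under the assumption that the values use a leading dollar sign and commas as thousands separators.

```r
# Hypothetical column as it might arrive from a CSV file:
# the dollar signs and commas force R to read it as character
raw_amount <- c("$1,200.50", "$85", "n/a", "$3,400")
class(raw_amount)

# Remove dollar signs and thousands separators
# (inside a character class, "$" is matched literally)
cleaned <- gsub("[$,]", "", raw_amount)

# Convert to numeric; entries that still cannot be parsed become NA
# (the coercion warning is suppressed because the NA is expected here)
amount <- suppressWarnings(as.numeric(cleaned))
print(amount)

# Flag the entries that need manual review
print(which(is.na(amount)))
```

The entry that becomes `NA` still needs a decision, such as dropping, imputing, or correcting it at the source, which is where domain expertise comes in.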

In this chapter, we will explore ways to evaluate the quality and characteristics of various datasets in order to find insights in the data. The topics covered are:

  • Preparing data for analysis
  • Data summary and distribution
  • Finding relationships in data
  • Selecting the right chart types and visualizations
  • Saving analysis for future work