Preface

Data has become increasingly important almost everywhere. It's been said that software is eating the world, but that seems even truer of data. Sometimes, it seems that the focus has shifted: companies no longer seem to want more users in order to show them advertisements. Now they want more users in order to gather data on them. Having more data is seen as a tremendous business advantage.

However, data by itself isn't really useful. It has to be analyzed, interrogated, and interpreted. Data scientists are settling on a number of great tools to do this, from R and Python to Hadoop and the web browser.

This book looks at 10 data analysis tasks. Unlike Clojure Data Analysis Cookbook (Packt Publishing), this book examines fewer problems and tries to go into more depth on each. It takes more of a case-study approach.

Why use Clojure? Clojure was first released in 2007 by Rich Hickey. It's a member of the Lisp family of languages, and it has the strengths and flexibility that they provide. It's also functional, so Clojure programs are easy to reason about. It also has strong features for working concurrently and in parallel. All of these can help us as we analyze data, while keeping things simple and fast.

Moreover, Clojure runs on the Java Virtual Machine (JVM), so any libraries written for Java are available as well. Throughout this book, we'll see many examples of leveraging Java libraries for machine learning and other tasks. This gives Clojure an incredible amount of breadth and power.

I hope that this book will help you analyze your data more deeply and more effectively, and that it will also make the process more enjoyable.

What this book covers

Chapter 1, Network Analysis – The Six Degrees of Kevin Bacon, will discuss how people are socially organized into networks. These networks are reified in interesting ways in online social networks. We'll take the opportunity to get a small dataset from an online social network and analyze how the people in it are related.

Chapter 2, GIS Analysis – Mapping Climate Change, will explore how we can work with geographical data. It will also walk us through getting weather data and tying it to a geographical location. It will then analyze nearby points together to generate a graphic of a simplified and somewhat naive notion of how climate has changed over the period the weather has been tracked.

Chapter 3, Topic Modeling – Changing Concerns in the State of the Union Addresses, will address how we can scrape free text information off the Internet. It will then use topic modeling to look at the problems that presidents have faced and the themes that they've addressed over the years.

Chapter 4, Classifying UFO Sightings, will take a look at UFO sightings and talk about different ways to explore and get a grasp of what's in the dataset. It will then classify the UFO sightings based on various attributes related to the sightings as well as their descriptions.

Chapter 5, Benford's Law – Detecting Natural Progressions of Numbers, will take a look at the world population data from the World Bank data site. It will discuss Benford's Law and how it can be used to determine whether a set of numbers is naturally generated or artificially or randomly constructed.

Chapter 6, Sentiment Analysis – Categorizing Hotel Reviews, will take a look at the problems and possibilities related to sentiment analysis tasks. These are typically difficult and fraught categorizations of documents based on a notion of positive or negative. In this chapter, we'll also take a look at categorizing, both manually and automatically, a dataset of hotel reviews.

Chapter 7, Null Hypothesis Tests – Analyzing Crime Data, will take a look at planning, constructing, and performing null-hypothesis tests for statistical significance. It will use international crime data to look at the relationship between economic indicators and some types of crime.

Chapter 8, A/B Testing – Statistical Experiments for the Web, will take a look at how to determine which version of a website engages users better. Although conceptually simple, this task does have a few pitfalls and danger points to be aware of.

Chapter 9, Analyzing Social Data Participation, will take a look at how people participate in online social networks. We will discuss and demonstrate some ways to analyze this data with an eye toward encouraging more interaction, contributions, and participation.

Chapter 10, Modeling Stock Data, will take a look at how to work with time-series data, stock data, natural language, and neural networks in order to find relationships between news articles and fluctuations in stock prices.
