Getting started with EDA

As mentioned earlier, we are going to use Python as the main tool for data analysis. Why Python? It has consistently ranked among the top 10 programming languages, and data science practitioners widely use it for data analysis and data mining. This book assumes you have a working knowledge of Python; if you are not yet familiar with the language, you will probably want to learn it before starting on data analysis. In particular, I assume you are familiar with the following Python tools and packages:

| Tool/package | What you should be familiar with |
|---|---|
| Python programming | Fundamental concepts of variables, strings, and data types; conditionals and functions; sequences, collections, and iterations; working with files; object-oriented programming |
| NumPy | Creating arrays with NumPy, copying arrays, and dividing arrays; performing different operations on NumPy arrays; array selection, advanced indexing, and expanding; working with multi-dimensional arrays; linear algebra functions and built-in NumPy functions |
| pandas | Understanding and creating `DataFrame` objects; subsetting and indexing data; arithmetic functions and mapping with pandas; managing indexes; styling for visual analysis |
| Matplotlib | Loading linear datasets; adjusting axes, grids, labels, titles, and legends; saving plots |
| SciPy | Importing the package; using SciPy's statistical functions (`scipy.stats`); performing descriptive statistics; inference and data analysis |
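
If you want a quick self-check on the NumPy items in the table, the following sketch touches several of them: creating, copying, and splitting arrays; boolean and advanced indexing; expanding dimensions; and a linear algebra call. It is only an illustration: the array values and the small linear system are arbitrary choices, and using `np.split` is one reasonable reading of "dividing arrays".

```python
import numpy as np

# Create a 1-D array of 0..11 and reshape it into a 3x4 matrix
data = np.arange(12).reshape(3, 4)

# copy() makes an independent array; a plain slice would be a view sharing memory
backup = data.copy()
backup[0, 0] = 99  # changing the copy leaves `data` untouched

# Divide the matrix column-wise into two 3x2 halves
left, right = np.split(data, 2, axis=1)

# Boolean selection and advanced (integer-array) indexing
evens = data[data % 2 == 0]       # 1-D array of the even elements
corners = data[[0, -1], [0, -1]]  # elements at (0, 0) and (2, 3)

# Expand a 1-D array into a column vector by adding a new axis
column = np.expand_dims(np.array([1, 2, 3]), axis=1)

# Linear algebra: solve the system A @ x = b
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)

print(data[0, 0], backup[0, 0])  # 0 99
print(left.shape, right.shape)   # (3, 2) (3, 2)
print(evens)                     # [ 0  2  4  6  8 10]
print(corners)                   # [ 0 11]
print(column.shape)              # (3, 1)
print(x)                         # [2. 3.]
```

If any of these lines are unfamiliar, it is worth revisiting the corresponding NumPy topic before moving on.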

Before diving into the details of analysis, let's make sure we are on the same page. Go through the following checklist and verify that you meet all of the prerequisites for getting the most out of this book:

Setting up a virtual environment:

```shell
pip install virtualenv
virtualenv Local_Version_Directory -p Python_System_Directory
```

Reading/writing to files:

```python
filename = "datamining.txt"

# The with statement closes the file automatically, even if an error occurs
with open(filename, mode="r", encoding="utf-8") as file:
    lines = file.readlines()  # list of all lines, newline characters kept

print(lines)
```

Error handling:

```python
try:
    value = int(input("Type a number between 47 and 100: "))
except ValueError:
    # int() raises ValueError when the input is not a whole number
    print("You must type a number between 47 and 100!")
else:
    # Runs only if no exception was raised; both bounds are inclusive
    if 47 <= value <= 100:
        print("You typed:", value)
    else:
        print("The value you typed is incorrect!")
```

Object-oriented concept:

```python
class Disease:
    def __init__(self, disease="Depression"):
        # Store the disease name on the instance
        self.type = disease

    def getName(self):
        print("Mental Health Diseases: {0}".format(self.type))


d1 = Disease("Social Anxiety Disorder")
d1.getName()  # prints: Mental Health Diseases: Social Anxiety Disorder
```
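
In practice, the file-handling and error-handling items often meet: you read values from a file and must reject the ones that do not parse or fall outside an expected range. The sketch below combines both in a function that can be exercised without typing input. The helper name `read_valid_numbers`, the sample file contents, and the reuse of the 47 to 100 range from the error-handling item are assumptions for illustration, and a temporary file stands in for `datamining.txt`.

```python
import os
import tempfile


def read_valid_numbers(path, low=47, high=100):
    """Hypothetical helper: read one value per line and return
    (valid, rejected), where valid values are integers in [low, high]."""
    valid, rejected = [], []
    with open(path, mode="r", encoding="utf-8") as file:
        for line in file:
            text = line.strip()
            if not text:
                continue  # skip blank lines
            try:
                value = int(text)
            except ValueError:
                rejected.append(text)  # not an integer at all
            else:
                if low <= value <= high:
                    valid.append(value)
                else:
                    rejected.append(text)  # an integer, but out of range
    return valid, rejected


# Write a small sample file (mode="w" creates or overwrites it)
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with open(path, mode="w", encoding="utf-8") as file:
    file.write("50\nabc\n47\n101\n\n100\n")

valid, rejected = read_valid_numbers(path)
os.remove(path)  # clean up the temporary file

print(valid)     # [50, 47, 100]
print(rejected)  # ['abc', '101']
```

Keeping the `try`/`except`/`else` structure inside a function, rather than around `input()`, makes the validation logic easy to reuse and test on real data files.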

Next, let's look at the basic operations of EDA using the NumPy library.
