Getting started with EDA

As mentioned earlier, we are going to use Python as the main tool for data analysis. Why Python? It has been consistently ranked among the top 10 programming languages and is widely adopted by data science experts for data analysis and data mining. In this book, we assume you have a working knowledge of Python. If you are not familiar with Python, it's probably too early to get started with data analysis. We assume you are familiar with the following Python topics and packages:

Python programming
    Fundamental concepts of variables, strings, and data types
    Conditionals and functions
    Sequences, collections, and iterations
    Working with files
    Object-oriented programming

NumPy
    Create arrays with NumPy, copy arrays, and divide arrays
    Perform different operations on NumPy arrays
    Understand array selections, advanced indexing, and expanding
    Working with multi-dimensional arrays
    Linear algebraic functions and built-in NumPy functions

pandas
    Understand and create DataFrame objects
    Subsetting data and indexing data
    Arithmetic functions, and mapping with pandas
    Managing index
    Building style for visual analysis

Matplotlib
    Loading linear datasets
    Adjusting axes, grids, labels, titles, and legends
    Saving plots

SciPy
    Importing the package
    Using statistical packages from SciPy
    Performing descriptive statistics
    Inference and data analysis
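One quick way to see which of the packages listed above are available in your environment is sketched below. It uses only the standard library and does not import the packages; it only asks whether Python can locate them. The helper name missing_packages and the list REQUIRED are hypothetical names chosen for this illustration, and the import names are assumed to be the usual ones.

```python
from importlib.util import find_spec

# Import names of the packages listed above (assumed to be the usual ones).
REQUIRED = ["numpy", "pandas", "matplotlib", "scipy"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed packages are available.")
```

Any package reported as missing can typically be installed with pip inside your virtual environment.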

 

Before diving into details about analysis, we need to make sure we are on the same page. Let's go through the checklist and verify that you meet all of the prerequisites to get the best out of this book:

Setting up a virtual environment

> pip install virtualenv
> virtualenv Local_Version_Directory -p Python_System_Directory
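
If you prefer not to install a separate tool, Python 3 also ships with a venv module in the standard library that serves a similar purpose. The sketch below is an alternative to the virtualenv commands above, not the approach this book prescribes. The function name create_environment is hypothetical, and a temporary directory is used so that nothing is left behind on disk.

```python
import os
import tempfile
import venv


def create_environment(env_dir):
    """Create a virtual environment in env_dir and return its config path."""
    # with_pip=False keeps creation fast; pass with_pip=True to bootstrap pip.
    venv.create(env_dir, with_pip=False)
    return os.path.join(env_dir, "pyvenv.cfg")


with tempfile.TemporaryDirectory() as tmp:
    # "Local_Version_Directory" is a placeholder name, as in the commands above.
    config_path = create_environment(os.path.join(tmp, "Local_Version_Directory"))
    created = os.path.exists(config_path)
    print("Environment created:", created)
```

From a shell, the equivalent is python -m venv Local_Version_Directory.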

Reading/writing to files

filename = "datamining.txt"
# The with block closes the file automatically, even if an error occurs.
with open(filename, mode="r", encoding="utf-8") as file:
    lines = file.readlines()
print(lines)
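
The listing above only covers reading. A minimal sketch of the writing side, followed by reading the same file back, could look like the following. The helper names write_lines and read_lines are hypothetical, and the file is created in a temporary directory so the example does not depend on an existing datamining.txt.

```python
import os
import tempfile


def write_lines(path, lines):
    """Write each string in lines to path, one per line."""
    with open(path, mode="w", encoding="utf-8") as file:
        for line in lines:
            file.write(line + "\n")


def read_lines(path):
    """Return the lines of path without their trailing newlines."""
    with open(path, mode="r", encoding="utf-8") as file:
        return [line.rstrip("\n") for line in file]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "datamining.txt")
    write_lines(path, ["first line", "second line"])
    result = read_lines(path)

print(result)
```

Opening the file with mode="w" overwrites any existing content; mode="a" appends instead.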

Error handling

try:
    value = int(input("Type a number between 47 and 100: "))
except ValueError:
    print("You must type a number between 47 and 100!")
else:
    # Both bounds are treated as inclusive, matching the prompt text.
    if 47 <= value <= 100:
        print("You typed:", value)
    else:
        print("The value you typed is incorrect!")
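
The same check can be wrapped in a function so that it can be reused and tested without keyboard input. This is a sketch under stated assumptions: the name parse_in_range is hypothetical, and both bounds are treated as inclusive.

```python
def parse_in_range(text, low=47, high=100):
    """Convert text to an int and check that low <= value <= high.

    Raises ValueError if text is not an integer or is out of range.
    """
    value = int(text)  # raises ValueError for non-numeric text
    if not low <= value <= high:
        raise ValueError(f"{value} is not between {low} and {high}")
    return value


for raw in ["50", "abc", "200"]:
    try:
        print("Accepted:", parse_in_range(raw))
    except ValueError as error:
        print("Rejected:", error)
```

Raising ValueError for both failure modes lets the caller handle bad input with a single except clause.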

Object-oriented concepts

class Disease:
    def __init__(self, disease='Depression'):
        self.type = disease

    def getName(self):
        print("Mental Health Diseases: {0}".format(self.type))

d1 = Disease('Social Anxiety Disorder')
d1.getName()
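
Inheritance is the other object-oriented idea you will want to be comfortable with. The sketch below restates the class with a method that returns its text instead of printing it, then extends it. The describe method, the ChronicDisease subclass, and its years attribute are hypothetical and exist only for illustration.

```python
class Disease:
    def __init__(self, disease="Depression"):
        self.type = disease

    def describe(self):
        """Return the description instead of printing it."""
        return "Mental Health Diseases: {0}".format(self.type)


class ChronicDisease(Disease):
    """Hypothetical subclass that adds a duration attribute."""

    def __init__(self, disease, years):
        super().__init__(disease)  # reuse the parent initializer
        self.years = years

    def describe(self):
        # Extend the parent description rather than replacing it.
        return "{0} ({1} years)".format(super().describe(), self.years)


d2 = ChronicDisease("Social Anxiety Disorder", 3)
print(d2.describe())
```

Returning a string rather than printing makes the method easier to reuse, for example when building labels for a plot.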

 

Next, let's look at the basic operations of EDA using the NumPy library.
