© K. Mohaideen Abdul Kadhar and G. Anand 2021
K. M. Abdul Kadhar, G. Anand, Data Science with Raspberry Pi, https://doi.org/10.1007/978-1-4842-6825-4_2

2. Basics of Python Programming

K. Mohaideen Abdul Kadhar (1) and G. Anand (1)
(1) Pollachi, Tamil Nadu, India

Python is a general-purpose dynamic programming language created by Dutch programmer Guido van Rossum, who began developing it in 1989 and first released it in 1991. It is one of the most commonly used programming languages in the field of data science. Since it is easier to learn and write code in Python than in many other languages, it is a good choice for beginners. The widespread use of Python is also attributed to the fact that it is free and open source. The large number of scientific libraries and packages developed by the Python community allows data scientists to work with data-intensive real-time applications. Leading organizations such as Google, Dropbox, and Netflix use Python at various levels to enhance their software. In this chapter, we will discuss Python installation on the Windows operating system, different Python IDEs, the fundamental data types available with Python, control flow statements, Python functions, and different Python libraries for data science.

Why Python?

Python is the most preferred programming language for data scientists because of the following reasons:
  • It is an open source programming language with a strong and growing community of contributors and users.

  • It has a simpler syntax than other programming languages such as C, C++, and Java.

  • It allows users to perform object-oriented programming.

  • It has a large set of libraries that can be used to perform a variety of tasks such as developing a website, building machine learning applications, etc.

  • It can be used on small embedded hardware devices like the Raspberry Pi, which allows for real-time implementation of various applications.

Python Installation

Most distributions of the Linux operating system come with the preloaded Python package, but it has to be installed separately in the case of Windows operating system. The procedure to install Python on the Windows operating system is as follows:
  1. Open a browser and go to Python.org, the official site for Python.

  2. On that page, click the Downloads tab and download the latest version of the software on the resulting page.

  3. Once the download is complete, open the installer package. In the installation wizard, shown in Figure 2-1, select Add Python to PATH, which will ensure that Python is added automatically to your system variable path; otherwise, this path must be added manually in the Environment Variables settings in your system.

  4. Click Install Now to install the package.
     
Figure 2-1

Installation wizard for Python

After the installation is completed, you can verify the installation by typing python --version at the command prompt, which will display the version of Python installed on the system. If it does not show the version, then there could be a problem either with the installation or with the system path variable.
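The installed version can also be checked from inside Python itself. The following is a minimal sketch using the standard sys module; the check for major version 3 is only an illustrative assumption and is not part of the installation procedure.

```python
import sys

# sys.version_info holds the interpreter version as a named tuple
major = sys.version_info.major
minor = sys.version_info.minor
print("Running Python", f"{major}.{minor}")

# Illustrative check (assumption): the examples in this chapter use Python 3
is_python3 = major >= 3
print("Python 3 interpreter:", is_python3)
```

If this script runs and reports a version, both the installation and the path setting are working.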

Refer to the Python documentation available on the official site to understand the procedure for downloading additional modules and packages for the software. Either you can start working with Python at the command prompt itself or you can install one among the various IDEs that are discussed in the next section.

Python IDEs

An integrated development environment (IDE) is a software suite that combines developer tools into a graphical user interface (GUI), which includes options for editing code and building, executing, and debugging programs. A number of IDEs are available for Python, each of which comes with its own advantages. Some of the commonly used IDEs are discussed here.

PyCharm

The PyCharm IDE was developed by the Czech company JetBrains. It is a cross-platform IDE that can be used on Windows, macOS, and Linux. It provides code analysis and a graphical debugger. It also supports web development with Django as well as data science with Anaconda. Some of the attractive features of PyCharm are the intelligent code completion, a simple package management interface, and the refactoring option, which provides the ability to make changes across multiple lines in a code.

Spyder

Spyder is a cross-platform IDE for scientific programming in the Python language. Spyder integrates with a number of scientific packages including NumPy, SciPy, Matplotlib, Pandas, IPython, and other open source software. It was released under the MIT license.

Jupyter Notebook

Jupyter Notebook is a web-based interactive computational environment. This notebook integrates code and its output in a single document that combines visualizations, text, mathematical equations, and other media thereby making it suitable for data science applications.

Python Programming with IDLE

IDLE is a simple cross-platform IDE suitable for beginners in an educational environment. It comes with features such as a multiwindow text editor, a Python shell with syntax highlighting, and an integrated debugger. Since this is a default editor that comes with Python, let’s see how to execute Python code using IDLE.

There are two ways of executing the Python code in this IDLE. The first way is the interactive mode in which you can directly type the code next to the symbol >>> in the Python shell, as illustrated in Figure 2-2. Each line of code will be executed once you press Enter. The disadvantage of using the interactive mode is that when you save the code, it is saved along with the results, and this implies that you cannot use the saved code for execution later.
Figure 2-2

Running Python code in interactive mode

The second way is to run the code in script mode where you can open a script window and type the entire code there, which can then be saved with a .py extension to be used later. To open a script file window, go to the File menu at the top and click New File. In the script window, type the same two lines of code, shown in Figure 2-2. Figure 2-3 shows the script file window with the code. Then go to the File menu, click Save, and then save the program by specifying a proper filename. Ensure that the filename does not start with a number or have the same name as existing Python keywords.
Figure 2-3

Script file window

Once the file is saved, the script can be executed by going to the Run menu at the top and clicking Run Module. This will execute the script and print the output in the Python shell, as shown in Figure 2-4.
Figure 2-4

Output of the script file

Python Comments

Before we start to discuss the Python data types, it is essential to know about comment lines in Python as we will be using them often in our code. There are two ways to write comment lines based on the purpose of your comment.

If you intend to write a short comment about a particular line of code, mainly for yourself, then single-line comments are the best choice. A single-line comment is created by beginning it with a hash (#) character, and it is terminated automatically by the end of the line. While executing the code, the Python interpreter will ignore everything after the hash symbol up to the end of the line.

Multiple-line comments are intended to explain a particular aspect of your code to others and can be created by adding three single quotes (''') at the beginning and end of the comment. Strictly speaking, text enclosed in triple quotes is a string literal rather than a true comment, so the Python interpreter does not ignore it: in the interactive shell its value is echoed back, whereas in a script a string that is neither assigned nor printed has no visible effect. These two kinds of comments are illustrated using the IDLE Python shell format, as shown here:
>>> # This is a comment
>>> '''This is a comment'''
'This is a comment'

Python Data Types

A data type, in a programming language, is defined by the type of value that a variable can take. Python data types can be primarily classified into numeric and sequence data types. The data types that fall under these two categories are discussed in this section with relevant illustrations for each.

Numeric Data Types

Numeric data types are scalar variables that can take numeric values. The categories of numeric data types are int, float, and complex. In addition, we will discuss the bool data type that uses Boolean variables.

int

The int data type represents integers that are signed whole numbers without a decimal point. The code in Listing 2-1 displays the data type of an integer.
a=5
'''print the data type of variable a using type() function'''
print("a is of type",type(a))
Output:
a is of type <class 'int'>
Listing 2-1

Integer Data Type

float

The float data type represents floating-point numbers with a decimal point separating the integer and fractional parts. The code in Listing 2-2 prints the data type of a float value.

a = 5.0
print('a is of type',type(a))
Output:
a is of type <class 'float'>
Listing 2-2

float Data Type

complex

The complex data type represents complex numbers of the form a+bj where a and b are the real part and imaginary part, respectively. The numbers a and b may be either integers or floating-point numbers. The code in Listing 2-3 prints the data type of a complex number.

a=3.5+4j
print('a is of type',type(a))
Output:
a is of type <class 'complex'>
Listing 2-3

complex Data Type

bool

In Python, Boolean variables are defined by True and False keywords. As Python is case sensitive, the keywords True and False must have an uppercase first letter. Listing 2-4 illustrates the bool data type.

a= 8>9
print('a is of type',type(a))
print(a)
Output:
a is of type <class 'bool'>
False
Listing 2-4

bool Data Type

Boolean values can be manipulated with Boolean operators, which include and, or, and not, as illustrated in Listing 2-5.

a = True
b = False
print(a or b)
Output:
True
Listing 2-5

Manipulation of boolean Data Type
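Listing 2-5 uses only the or operator. As a brief sketch, the three Boolean operators can be compared side by side; the variable names and_result, or_result, and not_result are chosen here purely for illustration.

```python
a = True
b = False

# "and" is True only when both operands are True
and_result = a and b
# "or" is True when at least one operand is True
or_result = a or b
# "not" inverts a Boolean value
not_result = not a

print(and_result, or_result, not_result)  # False True False
```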

Numeric Operators

Table 2-1 summarizes the numeric operations available in Python that can be applied to the numeric data types.
Table 2-1

Numeric Operators in Python

Operator    Operation
( )         Parentheses
**          Exponentiation
*           Multiplication
/           Division
+           Addition
-           Subtraction
%           Modulo operation

The operators in Table 2-1 are listed broadly in their order of precedence, from highest to lowest, with two caveats: multiplication (*), division (/), and the modulo operation (%) share the same precedence level, as do addition (+) and subtraction (-). Operators on the same level are evaluated from left to right, except for exponentiation (**), which is evaluated from right to left. When more than one operation is performed in a particular line of your code, the order of execution follows this precedence. Consider the example 2*3+5 where both multiplication and addition are involved. Since multiplication has higher precedence than addition, the multiplication operator (*) will be executed first giving 2*3=6, followed by the addition operator (+), which would give the final result of 6+5=11.
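The effect of precedence can be checked with a few short expressions. This is a small sketch; the variable names r1 to r4 are illustrative only.

```python
# Multiplication binds tighter than addition
r1 = 2 * 3 + 5        # evaluated as (2 * 3) + 5
# Parentheses override the default precedence
r2 = 2 * (3 + 5)
# Exponentiation binds tighter than multiplication
r3 = 2 * 3 ** 2       # evaluated as 2 * (3 ** 2)
# % is on the same level as * and /, evaluated left to right
r4 = 10 - 7 % 4 * 2   # evaluated as 10 - ((7 % 4) * 2)

print(r1, r2, r3, r4)  # 11 16 18 4
```

When in doubt, adding parentheses makes the intended order explicit and costs nothing at run time.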

Sequence Data Types

Sequence data types allow multiple values to be stored in a variable. The five categories of sequence data types are list, tuple, str, set, and dict.

list

Lists are the data type most commonly used in Python by data scientists. A list is an ordered sequence of elements. The elements in the list need not be of the same data type. A list can be declared as items separated by commas enclosed within square brackets, []. Lists are mutable; i.e., the value of the elements in the list can be changed. The elements in the list are indexed starting from zero, and hence any element in the list can be accessed by its corresponding index, as illustrated in Listing 2-6. Indices must be integers; using any other data type as an index will result in a TypeError. Similarly, trying to access an index outside the range of the list will result in an IndexError.
a = [1, 2.5, 5, 3+4j, 3, -2]
print("a is of type",type(a))
'''print the first value in the list'''
print("a[0]=",a[0])
'''print the third value in the list'''
print("a[2]=",a[2])
'''print the values from index 0 to 2'''
print("a[0:3]=",a[0:3])
'''print the values from index 4 till the end of the list'''
print("a[4:]=",a[4:])
'''Change the value at the index 3 to 4'''
a[3]=4
print("a=",a)
'''fractional index leads to TypeError'''
print(a[1.5])
'''out of range index leads to IndexError'''
print(a[8])
Output of line 2: a is of type <class 'list'>
Output of line 4: a[0]= 1
Output of line 6: a[2]= 5
Output of line 8: a[0:3]= [1, 2.5, 5]
Output of line 10: a[4:]= [3, -2]
Output of line 13: a= [1, 2.5, 5, 4, 3, -2]
Output of line 15: TypeError: list indices must be integers or slices, not float
Output of line 17: IndexError: list index out of range
Listing 2-6

Operations in a List

Consider two lists stored in the variables a and b, respectively. Table 2-2 shows some additional operations provided by Python that can be performed on the lists a and b. Some of these functions apply to tuples, strings, and sets as well.
Table 2-2

List Operations in Python

Function       Description
a+b            Concatenates the two lists a and b
a*n            Repeats the list a by n times where n is an integer
len(a)         Computes the number of elements in list a
a.append()     Adds an element to the end of list a
a.remove()     Removes an item from list a
a.pop()        Removes and returns an element at the given index in list a
a.index()      Returns the index of the first matched item in list a
a.count()      Returns the count of number of items passed as an argument in list a
a.sort()       Sorts items in list a in ascending order
a.reverse()    Reverses the order of items in list a
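The operations in Table 2-2 can be tried out on two small lists. The values below are arbitrary sample data chosen for illustration, and the comments trace the state of the list after each step.

```python
a = [3, 1, 2]
b = [5, 4]

c = a + b               # concatenation: [3, 1, 2, 5, 4]
d = b * 2               # repetition: [5, 4, 5, 4]
n = len(c)              # number of elements: 5

c.append(1)             # c is now [3, 1, 2, 5, 4, 1]
ones = c.count(1)       # the value 1 occurs twice
first_one = c.index(1)  # index of the first 1 is 1
c.remove(1)             # removes the first 1: [3, 2, 5, 4, 1]
last = c.pop()          # with no index, removes and returns the last item: 1
c.sort()                # ascending order: [2, 3, 4, 5]
c.reverse()             # reversed in place: [5, 4, 3, 2]

print(c, d, n, ones, first_one, last)
```

Note that append(), remove(), sort(), and reverse() modify the list in place and return None, whereas a+b and a*n create new lists.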

tuple

A tuple is also an ordered sequence of elements like a list, but the difference is that the tuples are immutable; i.e., the values in a tuple cannot be changed. Trying to change the value of an element in a tuple will result in TypeError. By storing data that doesn’t change as tuples, it can be ensured that they remain write-protected. Tuples can be declared as items separated by commas enclosed within parentheses, (). Tuples can also be indexed in the same way as lists, as described in Listing 2-7.
a = (1, 3, -2, 4, 6)
print("a is of type",type(a))
print("a[3]=",a[3])
a[2] = 5
Output of line 2: a is of type <class 'tuple'>
Output of line 3: a[3]= 4
Output of line 4: TypeError: 'tuple' object does not support item assignment
Listing 2-7

Operations in a Tuple

str

The str data type represents a string of characters. A string can be declared as characters enclosed within double quotes (" "). Single quotes (' ') can also be used, but since they appear as apostrophes in some words, using double quotes can avoid confusion. The characters in a string are indexed in the same way as lists and tuples. The space between two words in a string is also treated as a character. Like tuples, strings are immutable, as illustrated in Listing 2-8.
a = "Hello World!"
print("a is of type",type(a))
print("a[3:7]=",a[3:7])
a[2] = "r"
Output of line 2: a is of type <class 'str'>
Output of line 3: a[3:7]= lo W
Output of line 4: TypeError: 'str' object does not support item assignment
Listing 2-8

Operations in a String

set

A set is an unordered collection of unique items and hence does not support indexing. A set is defined by values separated by commas inside curly braces, {}. Because duplicate values are discarded, a set can be used for removing duplicates from a sequence. Listing 2-9 shows the operations in a set.
a = {1, 2, 3, 2, 4, 1, 3}
print("a is of type",type(a))
print("a=",a)
Output of line 2: a is of type <class 'set'>
Output of line 3: a= {1, 2, 3, 4}
Listing 2-9

Operations in a Set

Consider two sets stored in variables a and b, respectively. Table 2-3 illustrates the various set operations supported by Python that can be applied on these two sets.
Table 2-3

Set Operations in Python

Function                     Description
a.union(b)                   Returns the union of the two sets a and b in a new set
a.difference(b)              Returns the difference of two sets a and b as a new set
a.intersection(b)            Returns the intersection of the two sets a and b as a new set
a.isdisjoint(b)              Returns True if the two sets a and b have a null intersection
a.issubset(b)                Returns True if a is a subset of b; i.e., all elements of set a are present in set b
a.symmetric_difference(b)    Returns the symmetric difference between the two sets a and b as a new set
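As a quick sketch of the operations in Table 2-3, consider two small sets with arbitrary sample values; the result names are illustrative only.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

u = a.union(b)                   # every element of a or b
diff = a.difference(b)           # elements of a that are not in b
inter = a.intersection(b)        # elements common to a and b
sym = a.symmetric_difference(b)  # elements in exactly one of the sets
disjoint = a.isdisjoint(b)       # False, since 3 and 4 are shared
subset = {3, 4}.issubset(a)      # True, since 3 and 4 are both in a

print(u, diff, inter, sym, disjoint, subset)
```

None of these methods modifies a or b; each returns a new set or a Boolean value.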

dict

A dict represents the dictionary data type, which is a collection of data represented as key-value pairs. A dictionary does not support positional indexing, although from Python 3.7 onward it preserves the order in which items are inserted. Dictionaries can be defined within curly braces, {}, with each item being a pair in the form key:value. Dictionaries are optimized for retrieving data where a particular value in the dictionary can be retrieved by using its corresponding key. In other words, the key acts as the index for that value. The values can be of any data type, whereas the keys must be of an immutable (hashable) type such as a number, string, or tuple. Keys cannot be duplicated in a dictionary, whereas the values may have duplicate entries. Trying to access a key that is not present in the dictionary will result in a KeyError, as described in Listing 2-10.
a = {1: 'Hello', 4: 3.6}
print("a is of type", type(a))
print(a[4])
print(a[2])
Output of line 2: a is of type <class 'dict'>
Output of line 3: 3.6
Output of line 4: KeyError: 2
Listing 2-10

Operations in a Dictionary
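A KeyError can be avoided by testing for the key first or by using the get() method, which returns a default value instead of raising an exception. The following sketch uses a hypothetical grades dictionary as sample data.

```python
# Hypothetical sample data for illustration
grades = {"science": 70, "maths": 76}

# A membership test on the keys avoids a KeyError
has_english = "english" in grades

# get() returns the given default when the key is missing
english = grades.get("english", 0)

# Assigning to a new key adds it; assigning to an existing key overwrites it
grades["english"] = 85
grades["maths"] = 80

keys = list(grades.keys())
print(has_english, english, grades, keys)
```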

Type Conversion

Type conversion is the process of converting the value of any data type to another data type. The functions provided by Python for type conversion are listed here:
  • int(): Converts a compatible value, such as a float or a numeric string, to the int data type

  • float(): Converts a compatible value, such as an int or a numeric string, to the float data type

  • tuple(): Converts any iterable, such as a list or a string, to a tuple

  • list(): Converts any iterable to a list

  • set(): Converts any iterable to a set

  • dict(): Converts a collection of key-value pairs to a dictionary

Listing 2-11 illustrates some of these functions.
a = 2
print(float(a))
b = [2 , 3, -1, 2, 4, 3]
print(tuple(b))
print(set(b))
Output of line 2: 2.0
Output of line 4: (2, 3, -1, 2, 4, 3)
Output of line 5: {2, 3, 4, -1}
Listing 2-11

Type Conversion Operations
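Type conversion succeeds only when the value is compatible with the target type. The sketch below shows a few further conversions along with a conversion that fails; the variable names are illustrative.

```python
i = int("42")                 # numeric string to int
f = float("3.5")              # numeric string to float
t = int(3.9)                  # float to int truncates toward zero
chars = list("abc")           # a string is split into its characters
pairs = dict([(1, "one"), (2, "two")])  # key-value pairs to a dictionary

# A value that cannot be interpreted as an integer raises ValueError
failed = False
try:
    int("abc")
except ValueError:
    failed = True

print(i, f, t, chars, pairs, failed)
```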

Control Flow Statements

Control flow statements allow for the execution of a statement or a group of statements based on the value of an expression. The control flow statements can be classified into three categories: sequential control flow statements that execute the statements in the program in the order they appear, decision control flow statements that either execute or skip a block of statements based on whether a condition is True or False, and loop control flow statements that allow the execution of a block of statements multiple times until a terminate condition is met.

if Statement

The if control statement in the decision control flow statement category starts with the if keyword, followed by a conditional statement, and ends with a colon. The conditional statement evaluates a Boolean expression, and the body of statements in the if statement is executed only if the Boolean expression evaluates to True. The statements in the if block are indented, and the first statement without indentation marks the end of the block. The syntax for the if statement is as follows, and Listing 2-12 shows how it works:
if <expression>:
    <statement(s)>

x = 12
y = 8
if x > y:
    out = "x is greater than y"
    print(out)
Output: x is greater than y
Listing 2-12

if Statement Operations

if-else Statement
The if statement can be followed up by an optional else statement. If the Boolean expression corresponding to the conditional statement in the if statement is True, then the statements in the if block are executed, and the statements in the else block are executed if the Boolean expression is False. In other words, the if-else statement provides a two-way decision process. The syntax for the if-else statement is as follows:
if <expression>:
     <statement(s)>
else:
     <statement(s)>

Listing 2-13 shows the example code for the if-else statement.

x = 7
y=9
if x > y:
   out = "x is greater than y"
else:
   out = "x is less than y"
print(out)
Output:
x is less than y
Listing 2-13

if-else Statement Operations

if...elif...else Statement
The if...elif...else statement can provide a multiway decision process. The keyword elif is the short form of else-if. The elif statement can be used along with the if statement if there is a need to select from several possible alternatives. The else statement will come last, acting as the default action. The following is the syntax for the if...elif...else statement, and Listing 2-14 shows the example code:
if <expression>:
     <statement(s)>
elif <expression>:
     <statement(s)>
elif <expression>:
     <statement(s)>
...
else:
     <statement(s)>
x = 4
y=4
if x > y:
   out = "x is greater than y"
elif x<y:
   out = "x is less than y"
else:
   out = "x is equal to y"
print(out)
Output:
x is equal to y
Listing 2-14

if...elif...else Statement Operations

while Loop

The while and for loops are loop control flow statements. In a while loop, the Boolean expression in the conditional statement is evaluated. The block of statements in the while loop is executed only when the Boolean expression is True. Each repetition of the loop block is called an iteration of the loop. The Boolean expression in the while statement is checked after each iteration. The execution of the loop is continued until the expression becomes False, and the while loop exits at this point. The syntax for the while loop is as follows, and Listing 2-15 shows how it works:
while <expression>:
    <statement(s)>

x=0
while x < 4:
    print("Hello World!")
    x=x+1
Output:
Hello World!
Hello World!
Hello World!
Hello World!
Listing 2-15

while Loop Operations

for Loop

The for loop runs with an iteration variable that takes the value of each item in a sequence in turn, and this goes on until the end of the sequence on which the loop is operating is reached. In each iteration, the next item in the sequence is assigned to the iteration variable, and the statements in the loop are executed with that item. The syntax for the for loop is as follows:
for <iteration_variable> in <sequence>:
    <statement(s)>

The range() function is useful in the for loop as it can generate a sequence of numbers that can be iterated using the for loop. The syntax for the range() function is range([start,] stop [,step]) where start indicates the beginning of the sequence (starting from zero if not specified), stop indicates the value up to which the numbers must be generated (not including the number itself), and step indicates the difference between every two consecutive numbers in the generated sequence. The start and step values are optional. The arguments passed to the range() function must always be integers. Listing 2-16 shows a for loop used to print the elements in a string one by one.

x = "Hello"
for i in x:
    print(i)
Output:
H
e
l
l
o
Listing 2-16

for Loop Operations

Listing 2-17 shows how to use the range() function to print a sequence of integers.

for i in range(4):
    print(i)
Output:
0
1
2
3
Listing 2-17

for Loop Operations with range Function
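The optional start and step arguments of range() can be sketched as follows. The sequences are converted to lists only so that they can be printed, and the variable names are illustrative.

```python
# start=2, stop=10, step=2: the stop value itself is excluded
evens = list(range(2, 10, 2))

# A negative step counts downward
countdown = list(range(5, 0, -1))

# Summing the integers 1 to 4 with a for loop
total = 0
for i in range(1, 5):
    total = total + i

print(evens, countdown, total)  # [2, 4, 6, 8] [5, 4, 3, 2, 1] 10
```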

Exception Handling

Exceptions are errors detected during execution. When an exception occurs in a program and is not handled, the execution is terminated, which interrupts the normal flow of the program. By means of exception handling, meaningful information about the error, rather than the system-generated message, can be provided to the user. Exceptions can be built-in or user-defined. User-defined exceptions are custom exception classes created by the user, typically by deriving from the built-in Exception class. Both kinds are handled using try...except statements, as shown in Listing 2-18 for the built-in ValueError exception.
while True:
    try:
        n = int(input("Enter a number: "))
        print("The number you entered is", n)
        break
    except ValueError:
        print("The number you entered is not the correct data type")
        print("Enter a different number")
Output for the input 5:
Enter a number: 5
The number you entered is 5
Output for the input 3.6:
Enter a number: 3.6
The number you entered is not the correct data type
Enter a different number
Listing 2-18

Exception Handling

In Listing 2-18, a ValueError exception occurs when int() receives input that cannot be converted to an integer. If no exception occurs, i.e., the number entered as input is an integer, then the except block is skipped, only the try block is executed, and the break statement ends the loop. If an exception occurs because the input is of a different data type, then the rest of the statements in the try block are skipped, the except block is executed, and the while loop returns the program to the try block.
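A user-defined exception is usually written as a class derived from the built-in Exception class and raised with the raise statement. The following is a minimal sketch: the class NegativeNumberError and the function parse_positive are hypothetical names introduced only for this illustration, and fixed strings are used in place of input() so that the example runs unattended.

```python
class NegativeNumberError(Exception):
    """Hypothetical custom exception raised for negative input."""


def parse_positive(text):
    n = int(text)  # may raise the built-in ValueError
    if n < 0:
        raise NegativeNumberError(f"{n} is negative")
    return n


results = []
for text in ["5", "3.6", "-2"]:
    try:
        results.append(parse_positive(text))
    except ValueError:
        # Built-in exception: the text is not a valid integer
        results.append("not an integer")
    except NegativeNumberError as err:
        # User-defined exception: keep its message
        results.append(str(err))

print(results)  # [5, 'not an integer', '-2 is negative']
```

Separate except clauses let the program respond differently to each kind of error.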

Functions

Functions are fundamental blocks in the Python programming that can be used when a block of statements needs to be executed multiple times within a program. Functions can be created by grouping this block of statements and giving it a name so that the statements can be invoked at any part of the program simply by this name rather than repeating the entire block. Thus, functions can be used to reduce the size of the program by eliminating redundant code. The functions can be either built-in or user-defined.

The Python interpreter has a number of built-in functions some of which we have seen already such as print(), range(), len(), etc. On the other hand, Python enables users to define their own functions and use them as needed. The syntax for function definition is as follows:
def function_name(parameter_1, ..., parameter_n):
    statement(s)
The function name can have letters, numbers, or an underscore, but it cannot start with a number and should not have the same name as a keyword. Let’s consider a simple function that takes a single parameter as input and computes its square; see Listing 2-19.
def sq(a):
    b = a * a
    print(b)
sq(36)
Output: 1296
Listing 2-19

Square Functions

Let’s see a slightly complicated function that computes the binary representation of a given decimal number.

As shown in Listing 2-20, the five lines of code required to compute the binary representation of a decimal number can be replaced by a single line using the user-defined function.
import math as mt
def dec2bin(a):
    b=''
    while a!=0:
        b=b+str(a%2) # concatenation operation
        a=mt.floor(a/2)
    return b[::-1] # reverse the string b
print(int(dec2bin(19)))
Output: 10011
Listing 2-20

Decimal-to-Binary Function
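A function can also hand its result back to the caller with return instead of printing it, which makes the result easy to verify. The sketch below is a variant of the decimal-to-binary idea, not the listing above: it uses integer division (//), handles zero explicitly, and checks its result against Python's built-in bin() function.

```python
def dec2bin(n):
    """Return the binary digits of a non-negative integer as a string."""
    if n == 0:
        return "0"
    digits = ""
    while n != 0:
        digits = str(n % 2) + digits  # prepend the remainder
        n = n // 2                    # integer division by 2
    return digits


for value in (0, 5, 19):
    # bin() returns a string such as '0b10011'; [2:] strips the '0b' prefix
    assert dec2bin(value) == bin(value)[2:]

print(dec2bin(19))  # 10011
```

Prepending each remainder builds the digits in the correct order, so no final reversal is needed.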

Python Libraries for Data Science

The Python community is actively involved in the development of a number of toolboxes intended for various applications. Some of the toolboxes that are used mostly in data science applications are NumPy, SciPy, Pandas, and Scikit-Learn.

NumPy and SciPy for Scientific Computation

NumPy is a scientific computation package available with Python. NumPy provides support for multidimensional arrays, linear algebra functions, and matrices. NumPy array representations provide an effective data structure for data scientists. A NumPy array is called an ndarray, and it can be created using the array() function. Listing 2-21 illustrates how to create 1D and 2D arrays and how to index their elements.
'''import the NumPy library'''
import numpy as np
'''creates a 1D array'''
a=np.array([1,2,3,4])
'''print the data type of variable a'''
print(type(a))
'''creates a 2D array'''
a=np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(a)
'''print the dimension of the array'''
print(a.ndim)
'''print the number of rows and columns in the array'''
print(a.shape)
'''print the third element in the first row'''
print(a[0,2])
'''print the sliced matrix as per given index'''
print(a[0:2,1:3])
a=np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
'''reshape the 1 x 9 array into a 3 x 3 array'''
b=a.reshape(3,3)
print(b)
Output of line 6: <class 'numpy.ndarray'>
Output of line 9:
[[1 2 3 4]
 [5 6 7 8]]
Output of line 11: 2
Output of line 13: (2, 4)
Output of line 15: 3
Output of line 17:
[[2 3]
 [6 7]]
Output of line 21:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Listing 2-21

Array Using NumPy

The sum of elements in an array of any dimension can be computed using sum(). The sum can be computed either for the entire elements in the array or along one of the dimensions as illustrated in Listing 2-22 for the array b created earlier.
'''print the sum of elements in array b'''
print(b.sum())
'''print the sum of elements along each column'''
print(b.sum(axis=0))
'''print the sum of elements along each row'''
print(b.sum(axis=1))
Output of line 2: 45
Output of line 4: [12 15 18]
Output of line 6: [ 6 15 24]
Listing 2-22

Array Using NumPy

Another important operation with respect to arrays is the flattening of multidimensional arrays. This process is common in many machine learning–based applications, and it can be done by using the flatten() function, as illustrated here:
b.flatten()
Output:
       array([1, 2, 3, 4, 5, 6, 7, 8, 9])
The flatten() function converts an array of any dimension into a single-dimensional array. This can be achieved using reshape() as well, but unlike the flatten() function, the size of the single-dimensional array has to be specified in that case (or given as -1 so that NumPy infers it). Table 2-4 describes some other array operations that may come in handy while working with data analysis applications.
Table 2-4

NumPy Functions for Data Analysis

Syntax                        Description
np.ones()                     Creates an array of ones with the shape specified within the parentheses.
np.zeros()                    Creates an array of zeros with the shape specified within the parentheses.
np.flip(a,axis)               Reverses the array a along the given axis. If axis is not specified, the array is reversed along all of its axes.
np.concatenate((a,b),axis)    Concatenates two arrays a and b, passed together as a tuple, along the specified axis (=0 or 1 corresponding to vertical and horizontal direction).
np.split(a,n)                 Splits the array a into n equal-sized smaller arrays. Here n must be a positive integer that divides the array evenly.
np.where(a==n)                Gives the index values of the number n present in an array a.
np.sort(a,axis)               Sorts the numbers in an array a along the given axis.
np.random.randint(n,size)     Generates an array of the given size using random integers from 0 up to, but not including, the number n.
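The functions in Table 2-4 can be exercised on a few small arrays. This is a sketch with arbitrary sample values; it assumes NumPy is installed, and the output of np.random.randint() differs from run to run.

```python
import numpy as np

ones = np.ones((2, 3))     # 2 x 3 array filled with 1.0
zeros = np.zeros(4)        # 1D array of four 0.0 values

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# The arrays to join are passed together as a tuple
stacked = np.concatenate((a, b), axis=0)

# Reverse the order of the columns
flipped = np.flip(a, axis=1)

# Split a 6-element array into 3 equal parts
parts = np.split(np.arange(6), 3)

# Indices at which the value 4 occurs
idx = np.where(np.array([4, 7, 4]) == 4)[0]

sorted_a = np.sort(np.array([3, 1, 2]))

# Random integers from 0 up to, but not including, 10
rand = np.random.randint(10, size=(2, 2))

# reshape(-1) lets NumPy infer the length, flattening the array
flat = a.reshape(-1)

print(stacked, flipped, parts, idx, sorted_a, flat, sep="\n")
```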

The SciPy ecosystem is a collection of open source software for scientific computation built on the NumPy extension of Python. It provides high-level commands for manipulating and visualizing data. Two major components of this ecosystem are the SciPy library, which is a collection of numerical algorithms and domain-specific toolboxes, and Matplotlib, which is a plotting package that provides 2D and 3D plotting. The following syntax can be used to import and use any function from a SciPy module in your code:
from scipy import some_module
some_module.some_function()
As per the official SciPy documentation, the library is organized into subpackages covering different domains, as summarized in Table 2-5.
Table 2-5

Subpackages in SciPy

Subpackage     Description
cluster        Clustering algorithms
constants      Physical and mathematical constants
fftpack        Fast Fourier Transform routines
integrate      Integration and ordinary differential equation solvers
interpolate    Interpolation and smoothing splines
io             Input and output
linalg         Linear algebra
ndimage        N-dimensional image processing
odr            Orthogonal distance regression
optimize       Optimization and root-finding routines
signal         Signal processing
sparse         Sparse matrices and associated routines
spatial        Spatial data structures and algorithms
special        Special functions
stats          Statistical distributions and functions

Scikit-Learn for Machine Learning

Scikit-Learn is an open source machine learning library for Python programming that features various classification, regression, and clustering algorithms. It is designed to interoperate with other Python libraries like NumPy and SciPy.

Pandas for Data Analysis

Pandas is a fast and powerful open source library for data analysis and manipulation written for Python programming. It has a fast and efficient DataFrame object for data manipulation with integrated indexing. It has tools for reading and writing data between in-memory data structures and different file formats such as CSV, Microsoft Excel, etc. Consider a CSV file called data.csv containing the grades of three students in three subjects, as shown in Figure 2-5. Listing 2-23 shows the procedure for reading and accessing this data using Pandas.
Figure 2-5

CSV file with grade data of students

import pandas as pd
'''reads the file data.csv with the read_csv() function; the header=None option allows pandas to assign default names to the columns. The data shown above is assumed to be typed in an Excel sheet and saved as a CSV file in the following path C:Python_bookdata.csv'''
d = pd.read_csv("C:Python_bookdata.csv",header=None)
print(type(d))
print(d)
'''print the element common to row 1 and column 2'''
print(d.loc[1,2])
'''print the elements common to rows 1,2 and columns 1,2'''
print(d.loc[1:2, 1:2])
Output of line 4:
<class 'pandas.core.frame.DataFrame'>
Output of line 5:
     0           1       2           3
0  Roll No    Science  Maths   English
1    RN001       70        76          85
2    RN002       86        98          88
3    RN003       76        65          74
Output of line 7: 76
Output of line 9:
    1      2
1   70   76
2   86   98
Listing 2-23

Data Modification Using Pandas Functions

Similarly, there are other read functions such as read_excel, read_sql, read_html, etc., to read data in other formats, and each of these read functions has a corresponding write function such as to_csv, to_excel, to_sql, or to_html that allows you to write a Pandas DataFrame to the respective format.
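The pairing of write and read functions can be sketched with to_csv and read_csv. To keep the example self-contained, it writes to an in-memory text buffer (io.StringIO) instead of a file on disk, and the DataFrame contents are hypothetical sample data modeled on the grade table.

```python
import io

import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({
    "Roll No": ["RN001", "RN002"],
    "Science": [70, 86],
    "Maths": [76, 98],
})

# Write the DataFrame as CSV text; index=False omits the row labels
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read the same data back
buffer.seek(0)
restored = pd.read_csv(buffer)

print(restored)
```

Passing a file path instead of the buffer to both functions would store the data on disk in the same way.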

Most of the real-time data gathered from sensors is in the form of time-series data, which is a series of data indexed in time order. Let’s consider a dataset that consists of the minimum daily temperatures in degrees Celsius over 10 years (1981 to 1990) in Melbourne, Australia. The source of the data is the Australian Bureau of Meteorology. Even though this is also a CSV file, it is time-series data unlike the DataFrame in the previous illustration. Listing 2-24 shows the different ways to explore the time-series data.
series=pd.read_csv('daily-min-temperatures.csv',header=0, index_col=0)
'''prints first 5 data from the top of the series'''
print(series.head(5))
'''prints the number of entries in the series'''
print(series.size)
print(series.describe())
'''describe() function creates 8 descriptive statistics of the time series data including count, mean, standard deviation, minimum, quartiles (the 50% value is the median), and maximum of the observations'''
Output of line 3:
   Date                  Temp
1981-01-01            20.7
1981-01-02            17.9
1981-01-03            18.8
1981-01-04            14.6
1981-01-05            15.8
Output of line 5: 3650
Output of line 6:
            Temp
count  3650.000000
mean     11.177753
std       4.071837
min       0.000000
25%       8.300000
50%      11.000000
75%      14.000000
max      26.300000
Listing 2-24

Data Modification in Pandas
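When the index column holds dates, passing parse_dates=True to read_csv converts the index to datetime values, which allows rows to be selected by date. The sketch below is an assumption-laden illustration: it reads from an in-memory string that repeats only the first three rows shown above rather than from the actual dataset file.

```python
import io

import pandas as pd

# First three rows of the temperature data, typed in for illustration
csv_text = (
    "Date,Temp\n"
    "1981-01-01,20.7\n"
    "1981-01-02,17.9\n"
    "1981-01-03,18.8\n"
)

# parse_dates=True converts the index column to datetime values
series = pd.read_csv(io.StringIO(csv_text), header=0, index_col=0,
                     parse_dates=True)

# Select one observation by its date
second_day = series.loc[pd.Timestamp("1981-01-02"), "Temp"]

# Mean of the three observations
mean_temp = series["Temp"].mean()

print(type(series.index))
print(second_day, round(mean_temp, 2))
```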

TensorFlow for Machine Learning

TensorFlow is an end-to-end open source platform for machine learning created by the Google Brain team. TensorFlow has a slew of machine learning models and algorithms. It uses Python to provide a front-end API for building applications with the framework. Keras is a high-level neural network API that runs on top of TensorFlow. Keras allows for easy and fast prototyping and supports both convolutional networks and recurrent neural networks.
