Knowing how to import a file into Python is essential for managing datasets. In this chapter we examine the basics. We can import many types of data into Python: the most common format (.csv), Excel formats, text formats for text mining, and binary files such as images, video, and audio. First, let’s look at some basic ways to import files. The process can seem a bit tricky at times; the pandas package, which we examine in Chapter 8, makes importing datasets for analysis much easier.
To open a file, we pass its name and a mode to Python’s built-in open() function. The four basic modes are:
Read only (“r”)
Write (“w”)
Append text to the end of the file (“a”)
Read and write (“r+”)
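The four basic modes can be tried on a scratch file. This is a minimal sketch; the file name notes.txt is hypothetical, and the file lives in a temporary directory that is deleted when the example ends.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")  # hypothetical file name

    # "w": create the file, or erase it if it already exists
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\n")

    # "a": add text at the end without erasing the existing content
    with open(path, "a", encoding="utf-8") as f:
        f.write("second line\n")

    # "r" (the default mode): read only
    with open(path, "r", encoding="utf-8") as f:
        before = f.read()

    # "r+": read and write an existing file; writing starts at the current position
    with open(path, "r+", encoding="utf-8") as f:
        f.write("FIRST")  # overwrites the first five characters in place
        f.seek(0)         # go back to the start before reading
        after = f.read()

print(repr(before))  # 'first line\nsecond line\n'
print(repr(after))   # 'FIRST line\nsecond line\n'
```

Using with blocks means each file is closed automatically, even if an error occurs while reading or writing.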
Modes for Opening a File
Mode | Description |
---|---|
‘r’ | Read only (the default mode). The file must already exist. |
‘rb’ | Read only in binary format |
‘r+’ | Read and write. The file must already exist. |
‘rb+’ | Read and write in binary format. The file must already exist. |
‘w’ | Write only. Overwrites an existing file. If the file does not exist, a new one is created. |
‘wb’ | Write only, in binary format. Overwrites an existing file. If the file does not exist, a new one is created. |
‘w+’ | Read and write. Overwrites an existing file. If the file does not exist, a new one is created. |
‘wb+’ | Read and write in binary format. Overwrites an existing file. If the file does not exist, a new one is created. |
‘a’ | Appends to the end of an existing file without overwriting it. If the file does not exist, a new one is created. |
‘ab’ | Appends in binary format to an existing file, or creates a new one if it does not exist |
‘a+’ | Reads and appends. Existing content is kept and new text is written at the end; if the file does not exist, a new one is created. |
‘ab+’ | Reads and appends in binary format. Existing content is kept; if the file does not exist, a new one is created. |
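The binary modes and the combined “a+” mode from the table can be sketched the same way. The file names data.bin and log.txt are hypothetical, and both files are created in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    bin_path = os.path.join(tmp, "data.bin")  # hypothetical file name

    # "wb": write raw bytes (not str); creates or overwrites the file
    with open(bin_path, "wb") as f:
        f.write(bytes([0, 1, 2, 255]))

    # "rb": read the file back as a bytes object
    with open(bin_path, "rb") as f:
        raw = f.read()

    log_path = os.path.join(tmp, "log.txt")  # hypothetical file name
    # "a+": append and read; the file is created because it does not exist yet
    with open(log_path, "a+", encoding="utf-8") as f:
        f.write("entry 1\n")
        f.seek(0)  # after writing, the position is at the end, so rewind to read
        log_text = f.read()

print(raw)             # b'\x00\x01\x02\xff'
print(repr(log_text))  # 'entry 1\n'
```

Binary mode is what images, audio, and other non-text files require: the data comes back as bytes, with no decoding or newline translation.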
.csv Format
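Without pandas, the standard-library csv module can read a .csv file. The following is a minimal sketch, assuming a small comma-separated file with a header row; the file name sales.csv, its contents, and the helper read_csv_rows are hypothetical, and the example writes the file first so that it runs on its own.

```python
import csv
import os
import tempfile

def read_csv_rows(path):
    """Return the rows of a CSV file as a list of dicts keyed by the header row."""
    # newline="" is how the csv module documentation recommends opening CSV files
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sales.csv")  # hypothetical example file
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["city", "units"])
        writer.writerow(["Rome", "12"])
        writer.writerow(["Milan", "7"])
    rows = read_csv_rows(path)

print(rows[0]["city"], rows[0]["units"])  # Rome 12
```

Every value comes back as a string, so numeric columns must be converted explicitly (for example, int(row["units"])). This is one of the chores pandas handles automatically.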
From the Web
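The standard-library urllib.request module can download a file directly from the web. This sketch assumes the remote file is UTF-8 encoded CSV text. fetch_csv is a hypothetical helper name, and the URL in the commented usage line is a placeholder.

```python
import csv
import io
import urllib.request

def fetch_csv(url):
    """Download a CSV file from a URL and return its rows as lists of strings."""
    with urllib.request.urlopen(url) as response:
        # the response yields bytes; decode them, assuming UTF-8 text
        text = response.read().decode("utf-8")
    return list(csv.reader(io.StringIO(text)))

# Hypothetical address; replace it with the URL of a real CSV file:
# rows = fetch_csv("https://example.com/data.csv")
```

For a quick offline check, urlopen also accepts data: URLs, so fetch_csv("data:text/csv,city%2Cunits%0ARome%2C12") returns [['city', 'units'], ['Rome', '12']] without touching the network.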
In JSON
In Chapter 8, we learn how to use pandas to create and export data frames in JSON.
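Without pandas, the standard-library json module converts between JSON text and Python objects. Here is a minimal sketch; the record and the file name person.json are hypothetical.

```python
import json
import os
import tempfile

record = {"name": "Anna", "age": 31, "languages": ["Python", "R"]}  # hypothetical sample data

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "person.json")  # hypothetical file name
    # json.dump serializes a Python object to a file
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f)
    # json.load parses the file back into dicts, lists, strings, and numbers
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded["languages"])  # ['Python', 'R']

# json.loads does the same for a JSON string that is already in memory
parsed = json.loads('{"city": "Rome", "units": 12}')
print(parsed["units"] + 1)  # 13
```

Unlike csv, json preserves basic types: numbers come back as int or float, and JSON arrays come back as lists.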
Other Formats
lxml (particularly its objectify module) allows you to import XML files.
sqlite3, part of Python’s standard library, allows you to read and query SQLite databases.
PyMongo allows you to manage MongoDB databases.
feedparser allows you to process feeds in many formats, including RSS.
xlrd allows you to import Excel files (note, however, that pandas is much easier to use, and recent versions of xlrd read only the older .xls format).
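The following is a minimal sketch of the sqlite3 module. It builds a throwaway in-memory database so that it runs on its own, and the sales table and its rows are hypothetical. To import an existing database, pass its file path to sqlite3.connect instead of ":memory:".

```python
import sqlite3

# ":memory:" creates a temporary database that lives only in RAM;
# a file path instead would open (or create) a database file on disk
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, units INTEGER)")  # hypothetical table
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("Rome", 12), ("Milan", 7)])
conn.commit()

# fetchall returns a list of tuples, one per row
rows = conn.execute("SELECT city, units FROM sales ORDER BY units DESC").fetchall()
conn.close()

print(rows)  # [('Rome', 12), ('Milan', 7)]
```

The ? placeholders let sqlite3 insert the values safely, instead of pasting them into the SQL string by hand.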
Summary
Importing files and datasets in multiple formats is one of the most important tasks in data analysis. The procedures described in this chapter are valuable because many of them need no external library: Python’s built-in functions and standard-library modules are enough to import many kinds of files and data.