1 Setting Up Your Computer

To write code that works with data, you will need several (free) software programs for writing, executing, and managing that code. This chapter details which software you will need and explains how to install it. While there are a variety of options for each task, we discuss programs that are widely supported within the data science community and whose popularity continues to grow.

It is an unfortunate reality that one of the most frustrating and confusing barriers to working with code is getting your machine properly set up. This chapter aims to provide sufficient information for setting up your machine and troubleshooting the installation process.

In short, you will need to install the programs listed here, each of which is described in detail later in this chapter.

For Writing Code

There are two different programs that we suggest you use for writing code:

  • RStudio: An integrated development environment (IDE) for writing and executing R code. This will be your primary work environment for doing data science. You will also need to install the R software so that RStudio can execute your code (discussed later in this chapter).

  • Atom: A lightweight text editor that supports programming in lots of different languages. (Other text editors will also work effectively; some further suggestions are included in this chapter.)

For Managing Code

To manage your code, you will need to install and set up the following programs:

  • git: An application used to track changes to your files (namely, your code). This is crucial for maintaining an organized project, and can help facilitate collaboration with other developers. This program is already installed on Macs.

  • GitHub: A web service for hosting code online. You don’t actually need to install anything (GitHub uses git), but you will need to create a free account on the GitHub website. The corresponding exercises for this book are hosted on GitHub.

For Executing Code

To provide instructions to your machine (i.e., run code), you will need an environment in which to enter those instructions, and your machine must be able to understand the language in which your code is written.

  • Bash shell: A command line interface for controlling your computer. This will provide you with a text-based interface you can use to work with your machine. Macs already have a Bash shell program called Terminal, which you can use “out of the box.” On Windows, installing git will also install an application called Git Bash, which you can use as your Bash shell.

  • R: A programming language commonly used for data science. This is the primary programming language used throughout this book. “Installing R” actually means downloading and installing tools that will let your computer understand and run R code.

The remainder of this chapter has additional information about the purpose of each software system, how to install it, and alternative configurations or options. The programs are described in the order they are introduced in the book (though in many cases, the software programs are used in tandem).

1.1 Setting up Command Line Tools

The command line provides a text-based interface for giving instructions to your computer (much more on this in Chapter 2). As you are getting started with data science, you will largely use the command line for navigating your computer’s file structure and executing commands that allow you to keep track of changes to the code you write (i.e., version control with git).

To use the command line, you will need to use a command shell (also called a command prompt or terminal). This computer program provides the interface in which you type commands. In particular, this book discusses the Bash shell, which provides a particular set of commands common to Mac and Linux machines.

1.1.1 Command Line on a Mac

On a Mac, you will want to use the built-in app called Terminal as your command shell. This application is part of the Mac operating system, so you don’t need to install anything. You can open Terminal by searching via Spotlight (press cmd+spacebar, type in “terminal”, then select the app to open it), or by finding it in the Applications > Utilities folder. This will open your Terminal window, as described in Chapter 2. (Since macOS Catalina, Terminal opens the zsh shell by default rather than Bash; the basic commands for navigating files and using git are typed the same way in both.)

1.1.2 Command Line on Windows

On Windows, we recommend using Git Bash as your Bash shell, which is installed along with git. Opening this program gives you a command shell. This setup works well, since you will primarily be using the command line for version control.

Alternatively, the 64-bit Windows 10 Anniversary Update (August 2016) includes an integrated Bash shell. You can access this by enabling the subsystem for Linux[1] and then running the bash command in Command Prompt.

[1] Install the Windows subsystem for Linux: https://msdn.microsoft.com/en-us/commandline/wsl/install_guide

Caution

Windows includes its own command shell, called Command Prompt (previously DOS Prompt), but it has a different set of commands and features. If you try to use the commands described in Chapter 2 with Command Prompt, they will not work. For a more advanced Windows command shell (part of the Windows Management Framework), you can look into using PowerShell.[a] Because Bash is more common in open source programming of the kind done in this book, we will focus on its set of commands.

[a] https://docs.microsoft.com/en-us/powershell/scripting/getting-started/getting-started-with-windows-powershell

1.1.3 Command Line on Linux

Most Linux flavors come with a command shell pre-installed; for example, in Ubuntu you can use the Terminal application (use ctrl+alt+t to open it, or search for it from the Ubuntu dashboard).
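
Whichever operating system you use, it can help to confirm which shell is actually interpreting what you type. The following is a minimal sketch rather than a required step; it assumes a Bash-compatible shell such as those described above, and describe_shell is a hypothetical helper name chosen for illustration.

```shell
# Report which shell is interpreting your commands.
# describe_shell is a hypothetical helper name used for illustration.
describe_shell() {
  if [ -n "${BASH_VERSION:-}" ]; then
    # BASH_VERSION is set only when Bash itself is running the command
    echo "Running Bash version $BASH_VERSION"
  else
    # SHELL holds the path of your default login shell, if it is set
    echo "Not running Bash; default shell is ${SHELL:-unknown}"
  fi
}

describe_shell
```

If the message says you are not running Bash, the basic commands covered in Chapter 2 may still work, but the shell's more advanced features can differ.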

1.2 Installing git

One of the most important aspects of doing data science is keeping track of the changes that you (and others) make to your code. git is a version control system that provides a set of commands that you can use to manage changes to written code, particularly when collaborating with other programmers (version control is described in more detail in Chapter 3).

git comes pre-installed on Macs, though it is possible that the first time you try to use the tool you will be prompted to install the Xcode command line developer tools via a dialog box. You can choose to install these tools, or download the latest version of git online.

On Windows, you will need to download[2] the git software. Once you have downloaded the installer, double-click on your downloaded file, and follow the instructions to complete installation. This will also install a program called Git Bash, which provides a command line (text-based) interface for executing commands on your computer. See Section 1.1.2 for alternative and additional Windows command line tools.

On Linux, you can install git using apt-get or a similar command. For more information, see the download page for Linux.[3]

[2] git downloads: https://git-scm.com/downloads

[3] git download for Linux and Unix: https://git-scm.com/download/linux
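
Once installation finishes, you can confirm from your command shell that git is available. The sketch below assumes a Bash shell; report_git is a hypothetical helper name, and the apt-get lines apply only to Debian-based Linux distributions such as Ubuntu. They are left as comments so that nothing is installed by accident.

```shell
# Report whether the git command can be found on your PATH.
# report_git is a hypothetical helper name used for illustration.
report_git() {
  if command -v git >/dev/null 2>&1; then
    # Prints a line beginning with "git version" (the number varies)
    git --version
  else
    echo "git was not found; install it using the instructions above"
  fi
}

report_git

# On Debian-based Linux systems such as Ubuntu, git can be installed with:
#   sudo apt-get update
#   sudo apt-get install git
```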

1.3 Creating a GitHub Account

GitHub[4] is a website that is used to store copies of computer code that are being managed with git. To use GitHub, you will need to create a free GitHub account.[5] When you register, remember that your profile is public, and future collaborators or employers may review your GitHub account to assess your background and ongoing projects. Because GitHub leverages the git software package, you don’t need to install anything else on your machine to use GitHub.

[4] GitHub: https://github.com

[5] Join GitHub: https://github.com/join

1.4 Selecting a Text Editor

While you will be using RStudio to write R code, you will sometimes want to use another text editor that is more lightweight (e.g., runs faster), more robust, or supports a different programming language than R. A coding-focused text editor provides features such as automatic formatting and coloring for easier interpretation of the code, auto-completion, and integration with version control (features that are also available in RStudio).

Many different text editors are available, all of which have slightly different appearances and features. You only need to download and use one of the following programs (we recommend Atom as a default), but feel free to try out different ones until you find something you like (and then evangelize about it to your friends!).

Tip

Programming involves working with many different file types, each of which is indicated by its extension (the letters after the . in the file name, such as .pdf). It is useful to specify that your computer should show these extensions in File Explorer or Finder; see instructions for Windows[a] or for Mac[b] to enable this.

[a] https://helpx.adobe.com/x-productkb/global/show-hidden-files-folders-extensions.html

[b] https://support.apple.com/kb/PH25381?locale=en_US
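
To make the idea of an extension concrete, the following sketch pulls the extension out of a file name in a Bash shell. It is an illustration only: file_extension is a hypothetical helper name, and the file names are made-up examples.

```shell
# Print the extension of a file name: the letters after the final "."
# file_extension is a hypothetical helper name used for illustration.
file_extension() {
  case "$1" in
    *.*) echo "${1##*.}" ;;  # strip everything up to and including the last "."
    *)   echo "" ;;          # no "." in the name, so there is no extension
  esac
}

file_extension "report.pdf"   # prints: pdf
file_extension "analysis.R"   # prints: R
file_extension "README"       # prints an empty line
```

Only the text after the last period counts, so a name such as archive.tar.gz is reported as having the extension gz.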

1.4.1 Atom

Atom[6] is a text editor built by the folks at GitHub. As it is an open source project, people are continually building (and making available) interesting and useful extensions to Atom. Atom’s built-in spell-check is a great feature, especially for documents that require lots of written text. It also has excellent support for Markdown, a markup language used regularly in this book (see Chapter 4). In fact, much of this text was written using Atom!

[6] Atom: https://atom.io

To download Atom, visit the application’s webpage and click the “Download” button to download the program. On Windows, you will download the installer AtomSetup.exe file; double-click on that icon to install the application. On a Mac, you will download a zip file; open that file and drag the Atom.app file to your “Applications” folder.

Once you’ve installed Atom, you can open the program and create a new text file (just like you would create a new file with a word processor such as Microsoft Word). When you save a document with a particular file type (e.g., FILE_NAME.R or FILE_NAME.md), Atom (or any other modern text editor) will apply a language-specific color scheme to your text, making it easier to read.

The trick to using Atom more efficiently is to get comfortable with the Command Palette.[7] If you press cmd+shift+p (Mac) or ctrl+shift+p (Windows), Atom will open a small window where you can search for whatever you want the editor to do. For example, if you type in markdown, you can get a list of commands related to Markdown files (including the ability to open up a preview right in Atom).

[7] Atom Command Palette: http://flight-manual.atom.io/getting-started/sections/atom-basics/#command-palette

For more information about using Atom, see the manual.[8]

[8] Atom Flight Manual: http://flight-manual.atom.io

1.4.2 Visual Studio Code

Visual Studio Code[9] (or VS Code; not to be confused with Visual Studio) is a free, open source editor developed by Microsoft (yes, really). While it focuses on web programming and JavaScript, it readily supports lots of languages, including Markdown and R, and provides a number of extensions for adding even more features. It has a command palette similar to Atom’s, but isn’t quite as nice for editing Markdown specifically. Although fairly new, it is updated regularly and has become one of the authors’ main editors for programming.

[9] Visual Studio Code: https://code.visualstudio.com

1.4.3 Sublime Text

Sublime Text[10] is a very popular text editor with excellent defaults and a variety of available extensions (though you will need to manage and install extensions to achieve the functionality offered by other editors out of the box). While the software can be used for free, every 20 or so saves it will prompt you to purchase the full version (an offer that you can decline without loss of functionality).

[10] Sublime Text: https://www.sublimetext.com/3

1.5 Downloading the R Language

The primary programming language used throughout this book is called R.[11] It is a very powerful statistical programming language that is built to work well with large and diverse data sets. Chapter 5 provides a more in-depth introduction to the language.

[11] The R Project for Statistical Computing: https://www.r-project.org

To program with R, you will need to install the R interpreter on your machine. This software is able to “read” code written in R and use that code to control your computer, thereby “programming” it.

The easiest way to install R is to download it from the Comprehensive R Archive Network (CRAN).[12] Click on the appropriate link for your operating system to find a link to the installer. On a Mac, click the link to download the .pkg file for the latest version supported by your computer. Double-click on the .pkg file and follow the prompts to install the software. On Windows, follow the Windows link to “install R for the first time,” then click the link to download the latest version of R for Windows. You will need to double-click on the .exe file and follow the prompts to install the software.

[12] The Comprehensive R Archive Network (CRAN): https://cran.rstudio.com
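
After the installer finishes, one way to confirm that the interpreter works is to ask it to run a line of R code from your command shell. The sketch below assumes that the Rscript program (which ships with R) is on your PATH; on Windows the installer may not arrange this automatically, in which case opening R directly is the simpler check. check_r is a hypothetical helper name.

```shell
# Ask the R interpreter to run one line of R code, if it can be found.
# check_r is a hypothetical helper name used for illustration.
check_r() {
  if command -v Rscript >/dev/null 2>&1; then
    # -e tells Rscript to evaluate the R expression given as text
    Rscript -e 'cat("R is working:", R.version.string, "\n")'
  else
    echo "Rscript was not found on your PATH"
  fi
}

check_r
```

Seeing the version string printed means your computer can “read” and execute R code, which is all that RStudio needs in order to run it for you.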

1.6 Downloading RStudio

While you are able to execute R scripts without a dedicated application, the RStudio program provides a wonderful way to engage with the R language by providing a single interface to write and execute code, search documentation, and view results such as charts and maps. RStudio is described in more detail in Chapter 5. This book assumes you are using RStudio to write R code.

To install the RStudio program, visit the download page,[13] click “Download” for the free version of RStudio Desktop, and then select the installer for your operating system to download it.

[13] Download RStudio: https://www.rstudio.com/products/rstudio/download/

After the download is complete, double-click on the .exe or .dmg file to run the installer. Follow the steps of the installer, and you should be prepared to use RStudio.

This chapter has walked you through setting up the necessary software for basic data science, including the following programs:

  • Bash for controlling your computer

  • R for programmatically analyzing and working with data

  • RStudio as an IDE for writing and executing R code

  • git for version control

  • Atom as a general text editor for creating and editing documents

With this software installed, you are ready to get started programming for data science!
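
As a final check, the command line programs from this chapter can be looked up in one pass. This is a sketch under assumptions: check_tools is a hypothetical helper name, and it only tests whether each command can be found on your PATH. RStudio and Atom are desktop applications that you normally open from your applications menu, so they are not included.

```shell
# Report whether each named command can be found on your PATH.
# check_tools is a hypothetical helper name used for illustration.
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: not found"
    fi
  done
}

check_tools bash git R
```

A "not found" result does not always mean the program is missing; it may simply be installed somewhere your shell does not search.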
