F

Working Directories

Building on Appendix C, Appendix D, and Appendix E, this appendix covers working directories, especially when you are working with project templates (Appendix D).

A working directory tells the program where its base, or reference, location is; relative file paths are resolved against it. It’s common to place your code, data, output, figures, and other project files in the same folder, because that makes the working directory easy to figure out. However, this practice can easily lead to a messy folder, as mentioned in Appendix D.

We like fully documented project templates that tell us where and how to run our scripts. With this approach, all our scripts have a predictable and consistent working directory.

There are a few ways to figure out what your current working directory is. If you are using IPython, then you can type pwd into the IPython prompt, and it will return the folder path of your current working directory. This method also works if you are using the Jupyter notebook.

If you are executing your Python code as scripts directly from the command line, then the working directory is the output you get when you run cd on Windows (note there is nothing else after the command), or pwd on macOS and Linux.
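You can also ask Python itself. The following is a minimal sketch using the standard library's os.getcwd() and pathlib.Path.cwd(); the variable names are ours, chosen only for illustration.

```python
import os
from pathlib import Path

# Both calls report the current working directory of the running
# Python process; they should match what pwd (or cd on Windows) shows
# in the shell you launched Python from.
cwd_str = os.getcwd()    # returned as a plain string
cwd_path = Path.cwd()    # returned as a pathlib.Path object

print(cwd_str)
print(cwd_path)
```

Either form works; the pathlib version is convenient if you go on to build other paths from it.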

Here is an example of how working directories affect your code. Suppose you have the following project structure, where the current working directory is denoted by a star (*).

my_project/
  |
  |- data/
  |    |
  |    + data.csv
  |
  |- src/ *
  |    |
  |    + script.py
  |
  +- output/

If your script.py wants to read in a data set from the data folder, it would have to do something like data = pd.read_csv('../data/data.csv'). Note that because the current working directory is the src folder, to navigate to data.csv you need to go up one level (..) to the my_project folder and then down into the data folder to get to your data set. The benefit of this is that you can run your code by typing python script.py, though this can lead to some issues discussed later in this appendix.
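To see why the same relative path depends on where you run it from, here is a small sketch that only does the path arithmetic and touches no files. The resolve helper and the /home/user/my_project location are hypothetical, and POSIX-style paths are assumed so the result is the same on any system.

```python
import posixpath

def resolve(working_dir, relative_path):
    """Return the absolute location a relative path refers to from working_dir."""
    return posixpath.normpath(posixpath.join(working_dir, relative_path))

# Hypothetical absolute location of the project, for illustration only.
project = "/home/user/my_project"

from_src = resolve(project + "/src", "../data/data.csv")
from_top = resolve(project, "../data/data.csv")

print(from_src)  # /home/user/my_project/data/data.csv
print(from_top)  # /home/user/data/data.csv
```

From src, the path lands on the project's data file. From my_project, the same string points one level above the project, which is not where the data lives.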

Let’s use a different working directory:

my_project/ *
  |
  |- data/
  |    |
  |    + data.csv
  |
  |- src/
  |    |
  |    + script.py
  |
  +- output/

Now that the working directory is at the top level, script.py can reference the data set with the command data = pd.read_csv('data/data.csv'). Note that you no longer need to go up a level to reference your data. However, if you now want to run your code, you have to reference the script through its folder: python src/script.py. This may be annoying, but it allows you to create any number of subfolders, and data and output will always be referenced the same way across all the files.

It also means you as a user have one and only one working directory to execute any script in this project.
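The sketch below tries this out in a throwaway copy of the layout, so it is safe to run anywhere. It uses os.chdir to stand in for launching your commands from my_project, and plain file reads instead of pd.read_csv to avoid extra dependencies; the header.txt output file is a hypothetical name used only for this example.

```python
import os
import tempfile
from pathlib import Path

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    # Recreate the project layout inside a temporary folder.
    project = Path(tmp) / "my_project"
    for folder in ("data", "src", "output"):
        (project / folder).mkdir(parents=True)
    (project / "data" / "data.csv").write_text("a,b\n1,2\n")

    try:
        os.chdir(project)  # stands in for running commands from my_project/
        # Both paths are written relative to my_project/, no ".." needed.
        header = Path("data/data.csv").read_text().splitlines()[0]
        Path("output/header.txt").write_text(header)  # hypothetical output file
        written = (project / "output" / "header.txt").read_text()
    finally:
        os.chdir(original_cwd)  # restore before the temporary folder is removed

print(header, written)
```

Because every path starts from the top of the project, the same 'data/...' and 'output/...' strings can be used by a script stored in src or in any subfolder beneath it.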
