Building on Appendix C, Appendix D, and Appendix E, this appendix covers working directories, especially when you are working with project templates (Appendix D).
A working directory tells the program where its base, or reference, location is; relative file paths are resolved from it. It’s common to place your code, data, output, figures, and other project files in the same folder, because that makes the working directory easy to figure out. However, this practice can easily lead to a messy folder, as mentioned in Appendix D.
We like fully documented project templates that tell us where and how to run our scripts. With this approach, all our scripts have a predictable and consistent working directory.
There are a few ways to figure out what your current working directory is. If you are using IPython, you can type pwd into the IPython prompt, and it will return the folder path of your current working directory. This method also works if you are using the Jupyter notebook.
If you are executing your Python code as scripts directly from the command line, the working directory is the output you get after running cd on Windows (note that there is nothing else after the command) or pwd on macOS and Linux.
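You can also ask Python itself, which is handy inside a script where no prompt is available. The following is a minimal sketch using only the standard library; the variable names are ours and purely illustrative.

```python
import os
from pathlib import Path

# Both calls report the directory that relative paths are resolved against.
cwd_str = os.getcwd()    # returned as a plain string
cwd_path = Path.cwd()    # returned as a pathlib.Path object

print(cwd_str)
print(cwd_path)
```

Either form works; the Path object is convenient if you plan to build other paths from it.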
Here is an example of how working directories affect your code. Suppose you have the following project structure, where the current working directory is denoted by a star (*).
my_project/
|
|- data/
| |
| + data.csv
|
|- src/ *
| |
| + script.py
|
+- output/
If your script.py wants to read in a data set from the data folder, it would have to do something like data = pd.read_csv('../data/data.csv'). Note that because the current working directory is the src folder, to navigate to data.csv you need to go up one level (..) to the my_project folder and then down into the data folder to get to your data set. The benefit of this is that you can run your code by typing python script.py, though this can lead to some issues discussed later in this appendix.
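To see how the relative path in this layout is resolved, here is a rough sketch that rebuilds the folder structure in a temporary directory and changes into src before looking up the file. The temporary location and the contents of data.csv are made up for illustration, and we check the path with pathlib rather than reading it with pandas so the example needs nothing beyond the standard library.

```python
import os
import tempfile
from pathlib import Path

# Build a throwaway copy of the layout shown above (file contents are made up).
root = Path(tempfile.mkdtemp()).resolve() / "my_project"
for folder in ("data", "src", "output"):
    (root / folder).mkdir(parents=True)
(root / "data" / "data.csv").write_text("a,b\n1,2\n")

original_cwd = Path.cwd()
os.chdir(root / "src")  # pretend Python was launched from src/
try:
    relative = Path("../data/data.csv")
    resolved_from_src = relative.resolve()  # the absolute path it points to
    found_from_src = relative.exists()
finally:
    os.chdir(original_cwd)  # restore the original working directory

print(resolved_from_src)
print(found_from_src)
```

The resolved path ends in my_project/data/data.csv, which is the "up one level, then down into data" route described above.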
Let’s use a different working directory:
my_project/ *
|
|- data/
| |
| + data.csv
|
|- src/
| |
| + script.py
|
+- output/
Now that the working directory is at the top level, script.py can reference the data set with the command data = pd.read_csv('data/data.csv'). Note that you no longer need to go up a level to reference your data. However, if you now want to run your code, you have to reference the file like this: python src/script.py. This may be annoying, but it allows you to create any number of subfolders, and data and output will always be referenced the same way across all the files.
It also means you as a user have one and only one working directory from which to execute any script in this project.
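The sketch below illustrates why a single, agreed-upon working directory matters: the same relative path is found from the top level of the project but not from src. The helper exists_from is a hypothetical name introduced only for this example, and the project is again rebuilt in a temporary directory with made-up file contents.

```python
import os
import tempfile
from pathlib import Path


def exists_from(working_dir, relative_path):
    """Report whether relative_path exists when resolved from working_dir."""
    original_cwd = Path.cwd()
    os.chdir(working_dir)
    try:
        return Path(relative_path).exists()
    finally:
        os.chdir(original_cwd)  # always restore the original working directory


# Throwaway copy of the project layout (file contents are made up).
root = Path(tempfile.mkdtemp()) / "my_project"
for folder in ("data", "src", "output"):
    (root / folder).mkdir(parents=True)
(root / "data" / "data.csv").write_text("a,b\n1,2\n")

from_root = exists_from(root, "data/data.csv")         # top-level working directory
from_src = exists_from(root / "src", "data/data.csv")  # src/ as working directory
print(from_root, from_src)
```

If every script in the project assumes the top-level folder as its working directory, a path such as data/data.csv means the same thing no matter which subfolder the script itself lives in.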