Add the following modules to your requirements.txt file and install them into your virtual environment:
beautifulsoup4==4.6.0
Pillow==5.1.0
PyPDF2==1.26.0
python-docx==0.8.6
Check that the directory to search contains the following files (all are available on GitHub in the Chapter04/documents directory). For simplicity, file5.pdf and file6.pdf are copies of document-1.pdf, and file1.txt to file4.txt are empty files:
├── dir
│   ├── file1.txt
│   ├── file2.txt
│   ├── file6.pdf
│   └── subdir
│       ├── file3.txt
│       ├── file4.txt
│       └── file5.pdf
├── document-1.docx
├── document-1.pdf
├── document-2-1.pdf
├── document-2.pdf
├── example_iso.txt
├── example_output_iso.txt
├── example_utf8.txt
├── top_films.csv
└── zen_of_python.txt
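A quick way to confirm the layout before running anything is sketched below. It is not part of the book's code: the missing_files helper and the EXPECTED_FILES name are hypothetical, and the paths assume that file3.txt, file4.txt, and file5.pdf live inside dir/subdir, as the listing suggests.

```python
from pathlib import Path

# Relative paths copied from the listing; the nesting of subdir under dir
# is an assumption based on that listing.
EXPECTED_FILES = [
    'dir/file1.txt',
    'dir/file2.txt',
    'dir/file6.pdf',
    'dir/subdir/file3.txt',
    'dir/subdir/file4.txt',
    'dir/subdir/file5.pdf',
    'document-1.docx',
    'document-1.pdf',
    'document-2-1.pdf',
    'document-2.pdf',
    'example_iso.txt',
    'example_output_iso.txt',
    'example_utf8.txt',
    'top_films.csv',
    'zen_of_python.txt',
]


def missing_files(root):
    """Return the expected relative paths that are not files under root."""
    root = Path(root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling missing_files('documents') from the Chapter04 directory (the location is an assumption; adjust it to your checkout) should return an empty list when every file is in place.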
We've prepared a script, scan.py, that will search for a word in all the .txt, .csv, .pdf, and .docx files. The script is available in the Chapter04 directory of the GitHub repository.
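To give an idea of the kind of search involved, here is a minimal sketch that uses only the standard library. It is not the scan.py script from the repository: the find_word function and the TEXT_EXTENSIONS name are made up for illustration, the match is a simple case-insensitive substring test, and only .txt and .csv files are handled (reading .pdf and .docx files would need PyPDF2 and python-docx).

```python
import os

# Assumption: only plain-text formats are covered in this sketch.
TEXT_EXTENSIONS = ('.txt', '.csv')


def find_word(root, word):
    """Yield the paths of text files under root that contain word."""
    word = word.lower()
    # os.walk visits root and every subdirectory below it.
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.lower().endswith(TEXT_EXTENSIONS):
                continue
            path = os.path.join(dirpath, filename)
            # errors='replace' keeps the scan going when a file is not
            # valid UTF-8, at the cost of garbling the undecodable bytes.
            with open(path, encoding='utf-8', errors='replace') as handle:
                if any(word in line.lower() for line in handle):
                    yield path
```

For example, list(find_word('documents', 'python')) would collect every matching .txt or .csv file below an assumed documents directory, including those in nested subdirectories.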