Getting ready

Add the following modules to your requirements.txt file and install them into your virtual environment:

beautifulsoup4==4.6.0
Pillow==5.1.0
PyPDF2==1.26.0
python-docx==0.8.6
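
Once the modules are installed, a quick check that they can be imported can save time later. The following is a minimal sketch, not part of the recipe's code: the helper name missing_packages and the REQUIRED mapping are illustrative. It relies on the fact that the import name often differs from the package name (beautifulsoup4 is imported as bs4, Pillow as PIL, and python-docx as docx).

```python
import importlib.util

# Maps each package listed in requirements.txt to the module name used to import it.
REQUIRED = {
    'beautifulsoup4': 'bs4',
    'Pillow': 'PIL',
    'PyPDF2': 'PyPDF2',
    'python-docx': 'docx',
}


def missing_packages(required):
    """Return the package names whose import module cannot be found."""
    return [package for package, module in required.items()
            if importlib.util.find_spec(module) is None]
```

Calling missing_packages(REQUIRED) inside the virtual environment should return an empty list if everything is installed. Note that it only checks that each module is present, not that the pinned version matches.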

Check that the directory to search contains the following files (all are available on GitHub in the Chapter04/documents directory). For simplicity, file5.pdf and file6.pdf are copies of document-1.pdf, and file1.txt to file4.txt are empty files:

├── dir
│   ├── file1.txt
│   ├── file2.txt
│   ├── file6.pdf
│   └── subdir
│       ├── file3.txt
│       ├── file4.txt
│       └── file5.pdf
├── document-1.docx
├── document-1.pdf
├── document-2-1.pdf
├── document-2.pdf
├── example_iso.txt
├── example_output_iso.txt
├── example_utf8.txt
├── top_films.csv
└── zen_of_python.txt

We've prepared a script, scan.py, that will search for a word in all the .txt, .csv, .pdf, and .docx files. The script is available in the Chapter04 directory of the GitHub repository.
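
To give an idea of what searching a directory tree involves, here is a minimal sketch. It is not the scan.py script from the repository: the function name find_word_in_text_files and the TEXT_EXTENSIONS constant are illustrative, and it only handles the plain-text formats (.txt and .csv). Extracting text from .pdf and .docx files would additionally need PyPDF2 and python-docx.

```python
import os

# Extensions read as plain text in this sketch; .pdf and .docx are skipped
# because their text has to be extracted with PyPDF2 and python-docx.
TEXT_EXTENSIONS = ('.txt', '.csv')


def find_word_in_text_files(root, word):
    """Return the sorted paths of text files under root that contain word.

    The comparison is case-insensitive.
    """
    matches = []
    # os.walk visits root and every subdirectory below it
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.lower().endswith(TEXT_EXTENSIONS):
                continue
            full_path = os.path.join(dirpath, filename)
            # errors='replace' stops a file in another encoding from aborting the scan
            with open(full_path, encoding='utf-8', errors='replace') as handle:
                if word.lower() in handle.read().lower():
                    matches.append(full_path)
    return sorted(matches)
```

For example, find_word_in_text_files('documents', 'python') would list every .txt and .csv file under the documents directory, including those in dir and dir/subdir, that contains the word. The empty files are visited but never match.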
