How to do it...

  1. Import the module:
>>> from PyPDF2 import PdfFileReader
  1. Open the document-1.pdf file and create a PDF document object. Note that the file must remain open for as long as the document is being read:
>>> file = open('document-1.pdf', 'rb')
>>> document = PdfFileReader(file)
  1. Get the number of pages in the document, and check that it is not encrypted:
>>> document.numPages
3
>>> document.isEncrypted
False
  1. Get the creation date from the document info (2018-Jun-24 11:15:18), and see from the producer field that it was created with Mac Quartz PDFContext:
>>> document.documentInfo['/CreationDate']
"D:20180624111518Z00'00'"
>>> document.documentInfo['/Producer']
'Mac OS X 10.13.5 Quartz PDFContext'
  1. Get the first page, and read the text on it:
>>> document.pages[0].extractText()
'!A VERY IMPORTANT DOCUMENT By James McCormac CEO Loose Seal Inc '
  1. Do the same operation for the second page (the output is abridged here):
>>> document.pages[1].extractText()
'"!This is an example of a test document that is stored in PDF format. It contains some sentences to describe what it is and the it has lore ipsum text. !" Lorem ipsum dolor sit amet, consectetur adipiscing elit. ...$'
  1. Close the file and open document-2.pdf:
>>> file.close()
>>> file = open('document-2.pdf', 'rb')
>>> document = PdfFileReader(file)
  1. Check that the document is encrypted (it requires a password) and that it raises an error when you try to access its content:
>>> document.isEncrypted
True
>>> document.numPages
...
PyPDF2.utils.PdfReadError: File has not been decrypted
  1. Decrypt the file and access its content:
>>> document.decrypt('automate')
1
>>> document.numPages
3
>>> document.pages[0].extractText()
'!A VERY IMPORTANT DOCUMENT By James McCormac CEO Loose Seal Inc '
  1. Close the file to clean up:
>>> file.close()
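
The creation date read in the steps above comes back as a raw PDF date string rather than a Python date. The following is a minimal sketch of how such a string could be converted, assuming the common D:YYYYMMDDHHmmSS layout followed by an optional Z, + or - offset. The parse_pdf_date helper and the PDF_DATE pattern are hypothetical names introduced for illustration; they are not part of PyPDF2, and the sketch uses only the standard library.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumed layout: D:YYYYMMDDHHmmSS plus an optional offset such as Z00'00' or -05'00'.
# Every field after the year is treated as optional.
PDF_DATE = re.compile(
    r"^D:(?P<year>\d{4})(?P<month>\d{2})?(?P<day>\d{2})?"
    r"(?P<hour>\d{2})?(?P<minute>\d{2})?(?P<second>\d{2})?"
    r"(?P<sign>[Z+-])?(?:(?P<off_h>\d{2})'?)?(?:(?P<off_m>\d{2})'?)?$"
)


def parse_pdf_date(raw):
    """Hypothetical helper: turn a PDF date string into an aware datetime."""
    match = PDF_DATE.match(raw)
    if match is None:
        raise ValueError(f"Not a PDF date string: {raw!r}")
    parts = match.groupdict()

    # Build the UTC offset; missing offset fields are treated as zero.
    offset = timedelta(
        hours=int(parts["off_h"] or 0),
        minutes=int(parts["off_m"] or 0),
    )
    if parts["sign"] == "-":
        offset = -offset

    # Missing date fields fall back to the first month/day and midnight.
    return datetime(
        int(parts["year"]),
        int(parts["month"] or 1),
        int(parts["day"] or 1),
        int(parts["hour"] or 0),
        int(parts["minute"] or 0),
        int(parts["second"] or 0),
        tzinfo=timezone(offset),
    )


# The string returned by document.documentInfo['/CreationDate'] in the recipe
created = parse_pdf_date("D:20180624111518Z00'00'")
print(created.isoformat())
```

The result is a timezone-aware datetime, so it can be compared or sorted against other dates. Strings that do not follow the assumed layout raise a ValueError instead of being guessed at.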