Preface

The volume of documents is growing exponentially in the digital era, and processing this data quickly and accurately has become paramount to getting value out of it. Most often, the data arrives as raw documents, and fast processing of these documents is critical to meeting growing business needs, but legacy document processing can't keep up with this demand.

This book is a comprehensive guide that takes you through the fundamentals of Artificial Intelligence (AI) and Machine Learning (ML) and the core concepts required to process any type of document. You will also obtain hands-on experience with popular Python libraries for automating document processing. This book not only starts with the basics but also takes you through real industry use cases: document processing in the healthcare industry to deliver value-based care, claims processing in the insurance industry, and accelerated loan application processing in the financial industry. That way, you learn how to apply your skills to practical problems.

By the end of this book, you will have mastered the fundamentals of document processing with ML using practical implementations.

Who this book is for

This book is for technical professionals and thought leaders who want to understand and solve business problems by leveraging insights from their documents. If you want to learn about ML and AI and apply them to real-world use cases such as document processing, this book is for you. To learn from this book, you should have a basic knowledge of AI, ML, and Python programming concepts. The book is also excellent for developers who want to explore AI/ML with industry use cases.

What this book covers

Chapter 1, Intelligent Document Processing with AWS AI and ML, will explain how AWS wants to make ML accessible to everyone. For that reason, it has defined a three-layer AWS ML stack. AWS AI services can be leveraged by calling an API. First, the reader will learn about the AWS AI/ML stack. Then, we will define document processing, the challenges in document processing, and how AWS can help. We will also discuss common IDP use cases across industries. Finally, we will show the reader the stages of the IDP pipeline.

Chapter 2, Document Capture and Categorization, will detail how to collect data in a scalable, highly available data store. We will look into some of the security features for our data capture stage. Then, we will look into the accurate classification of documents. Readers will learn about the document splitter and how to use it on a code sample. Readers will learn to train their custom classifiers to accurately classify their document types.

Chapter 3, Accurate Document Extraction with Amazon Textract, will dive into key use cases for extracting data accurately from structured, unstructured, and semi-structured types of documents. Readers will learn about specialized documents, such as invoices, receipts, driver’s licenses, and passports, and how we can leverage the AWS AI service Amazon Textract for accurate extraction.

Chapter 4, Accurate Extraction with Amazon Comprehend, will explain document extraction with Amazon Comprehend. Here, we will learn about the extraction features for Entities and Custom Entities in Amazon Comprehend. Readers will learn how to train their own custom model with Amazon Comprehend. Finally, the reader will learn how to extract key phrases for accurate document tagging and categorization.

Chapter 5, Document Enrichment in Intelligent Document Processing, will explore the document enrichment stage of IDP. Readers will learn about document enrichment and the redaction of sensitive information with PII detection in Amazon Comprehend. They will learn about extracting health insights with Amazon Comprehend Medical and how we can augment document processing with health insights and ontology linking.

Chapter 6, Review and Verification of Intelligent Document Processing, will elaborate on the post-processing stage, with completeness checks and access control. Readers will learn about the document completeness check during the post-processing of a document. They will also learn about PII detection in Comprehend and PHI detection in Comprehend Medical, with APIs for sensitive data redaction, and about setting policies for the right access control. Finally, the reader will learn about accuracy checks with human review.

Chapter 7, Accurate Extraction and Health Insights with Amazon HealthLake, will start with a brief introduction to healthcare interoperability with FHIR and explain the requirement to store documents in a healthcare datastore, which can be done with Amazon HealthLake. Readers will learn about the features of Amazon HealthLake and how to extend IDP to process and store documents in the health datastore.

Chapter 8, IDP Healthcare Industry Use Cases, will explore healthcare prior authorization and healthcare claims processing as IDP use cases. Readers will learn about the prior authorization process and how to build an IDP pipeline for prior authorization to accelerate the pre-certification process. Finally, the reader will learn about the claims adjudication process and build an end-to-end IDP pipeline for it.

Chapter 9, Intelligent Document Processing – Insurance Industry, will look into two use cases in the insurance industry – processing benefit registration and claims adjudication – as IDP solutions. Readers will learn how to use the various stages in the IDP pipeline to build and automate these use cases. Finally, we will accurately extract data from multiple document types and layouts for the verification of the claims form.

Chapter 10, Intelligent Document Processing – Mortgage Processing, will analyze lending document processing as an IDP solution. Readers will learn about mortgage and lending document processing with the IDP pipeline. Finally, we will accurately extract data from multiple document types and layouts for the verification of mortgage documents.

To get the most out of this book

You will need access to an AWS account, so before getting started, we recommend that you create one.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book’s GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Intelligent-Document-Processing-with-AWS-AI-ML-. If there’s an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://packt.link/2mHlD.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: “Comprehend can take time to train your model. You can use Amazon Comprehend’s describe_document_classifier() command or check on the AWS Management Console for the completion status.”

A block of code is set as follows:

chapter2_syncdensedoc = "syncdensetext.png"
display(Image(url=s3.generate_presigned_url('get_object',
    Params={'Bucket': s3BucketName, 'Key': chapter2_syncdensedoc})))

Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: “The installer will present the following License Agreement screen. Click I Agree.”

Tips or Important Notes

Appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share your thoughts

Once you’ve read Intelligent Document Processing with AWS AI/ML, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
