Screen Scraping

Screen scraping is a method of extracting text from documents, websites, and PDFs. In UiPath, we extract text using the Screen Scraper wizard, which offers three scraping methods:

  • Full Text
  • Native
  • OCR

We shall look at each of these methods in turn. A clear understanding of their strengths and limitations is needed to choose the right method for a given situation:

  • Full text: The Full text method is used to extract information from various types of documents and websites. It has a 100% accuracy rate and is the fastest of the three methods. It works in the background and can also extract hidden text. However, it cannot extract the text's position and is not suitable for Citrix environments.
  • Native: This is similar to the Full text method, with some differences. It is slower than the Full text method but has the same 100% accuracy rate. Its advantage over the Full text method is that it can also extract the text's position. However, it does not work in the background, cannot extract hidden text, and does not work in Citrix environments.
  • OCR: This method is used when the previous two methods fail to extract information. It uses one of two OCR engines: Microsoft OCR or Google OCR. It also has a Scale property; adjusting the scale level to suit the source can improve the results. The following table compares the three methods:
  Method      Speed   Accuracy   Background execution   Extract text position   Extract hidden text   Support for Citrix
  Full Text   10/10   100%       Yes                    No                      Yes                   No
  Native      8/10    100%       No                     Yes                     No                    No
  OCR         3/10    98%        No                     Yes                     No                    Yes
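
The comparison above can also be read as a simple selection rule: pick the fastest method that supports every capability you need. The following Python sketch illustrates that rule only; it is not part of UiPath, and the names ScrapingMethod, METHODS, and choose_method are hypothetical. The values are copied from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScrapingMethod:
    """One row of the comparison table (illustrative, not a UiPath type)."""
    name: str
    speed: int           # relative speed out of 10
    accuracy: int        # accuracy in percent
    background: bool     # works in the background
    text_position: bool  # can extract the text's position
    hidden_text: bool    # can extract hidden text
    citrix: bool         # works in Citrix environments


METHODS = [
    ScrapingMethod("Full Text", 10, 100, True, False, True, False),
    ScrapingMethod("Native", 8, 100, False, True, False, False),
    ScrapingMethod("OCR", 3, 98, False, True, False, True),
]


def choose_method(*, background: bool = False, text_position: bool = False,
                  hidden_text: bool = False,
                  citrix: bool = False) -> Optional[ScrapingMethod]:
    """Return the fastest method that has every requested capability, or None."""
    candidates = [
        m for m in METHODS
        if (not background or m.background)
        and (not text_position or m.text_position)
        and (not hidden_text or m.hidden_text)
        and (not citrix or m.citrix)
    ]
    return max(candidates, key=lambda m: m.speed, default=None)
```

For instance, choose_method(text_position=True) selects Native, while choose_method(citrix=True) selects OCR. A combination that no method supports, such as background execution in a Citrix environment, returns None.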

Let us consider an example of extracting text from the UiPath website's main page:

  1. Create a Blank project and give it a meaningful name.
  2. Open the UiPath website at https://www.uipath.com/ in your browser.
  3. Drag and drop a Flowchart activity onto the Designer panel. Click on the Screen Scraping icon and select an area of the UiPath website from which you want to extract information. A window will pop up stating that the AUTOMATIC method failed to scrape this UI Element.

By default, the Screen Scraper Wizard chooses the best scraping method to extract data, but it failed to do so in our case:

  4. Try choosing another method. We shall choose the Full text method. This too will fail. Next, choose the Native method. This will also fail.
  5. This time, choose the OCR scraping method. The text is now extracted successfully.
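
The trial-and-error sequence in this example (Full text, then Native, then OCR) amounts to a fallback chain. The Python sketch below shows that pattern under stated assumptions: it does not call UiPath, and scrape_with_fallback and the three stub functions are hypothetical stand-ins that simply imitate the outcome described in the steps above.

```python
from typing import Callable, Optional, Sequence, Tuple

# A scraper takes a target description and returns text, or None on failure.
Scraper = Callable[[str], Optional[str]]


def scrape_with_fallback(target: str,
                         scrapers: Sequence[Tuple[str, Scraper]]) -> Tuple[str, str]:
    """Try each scraper in order; return (method name, text) for the first success."""
    for name, scraper in scrapers:
        try:
            text = scraper(target)
        except Exception:
            text = None  # treat an error the same as "nothing extracted"
        if text:
            return name, text
    raise RuntimeError(f"no scraping method could extract text from {target!r}")


# Hypothetical stubs that mimic the walkthrough: the first two methods fail.
def full_text_stub(target: str) -> Optional[str]:
    return None


def native_stub(target: str) -> Optional[str]:
    raise ValueError("element not accessible")


def ocr_stub(target: str) -> Optional[str]:
    return f"text recognized in {target}"


ORDER = [("Full Text", full_text_stub), ("Native", native_stub), ("OCR", ocr_stub)]
method, text = scrape_with_fallback("UiPath main page", ORDER)
print(method, "->", text)
```

Ordering the chain from fastest to slowest follows the speed ranking given earlier, so the slower OCR method is only used when the other two return nothing.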