Chapter 1: Getting started with Puppeteer

I remember the first time I heard about browser automation. A friend told me that their QA team was testing using "automation." That sounded magical to me. People testing websites using "automation." After a few years, I learned that automation wasn't a magic potion, but instead a powerful tool not only for QA but also for developers, because we developers love to automate stuff, right?

That's why in the first part of this chapter, I want to show you how browser automation works and what makes Puppeteer unique. In the latter part of this chapter, we are going to review some asynchronous techniques that are going to be useful throughout the rest of the book, and throughout your automation journey.

This chapter will cover the following topics:

  • What is browser automation?
  • Introducing headless browsers
  • Puppeteer use cases
  • Setting up the environment
  • Our first Puppeteer code
  • Asynchronous programming in JavaScript

What is browser automation?

If you go and look for the word "automation" in Wikipedia, it will tell you that it is "a process or procedure performed with minimal human assistance." If you are a developer, or just a geek, I bet you love to create scripts to automate tasks. You might also create environment variables, so you don't have to type long paths, or even create cool Git commands, so you don't need to remember all the steps required to create a new branch upstream.

When I got my first Mac, I discovered an app called Automator. I fell in love with it. You can automate tasks and connect applications just using drag and drop. If you use macOS and you've never played with Automator, please give it a try! But Automator isn't the only app. There are many workflow apps in the market, such as Hazel or Alfred.

Automation is even in the cloud and is available to the general public. Apps such as IFTTT and Zapier allow users to automate everyday tasks. You can create automations such as "When I post on Instagram, share the same image on Twitter," all from your phone. Regular people doing automation, that's great!

We also have mail rules. Most mail clients, even web clients, let you create rules, so you can mark emails as read, label them, or even remove them based on conditions. That's also automation.

Maybe you've taken it to the next level and coded an application for some of your daily tasks. You have that report that you need to send to your boss every month. That report is built from many CSV files. So you wrote a tiny app, using your favorite language, to generate that report for you.

In a few words, automation means using an app to do a repetitive task for us. And as we have seen, it doesn't necessarily involve coding that app. So now, we can say that browser automation is telling an app to do a repetitive task in the browser for us.

Ok, that's a simple statement. But how's that possible? When you automate an app, you accomplish this using some kind of application programming interface (API). For example, when you write a bat/bash file, you use the command-line arguments as an interface. If you use IFTTT, it employs Twitter's and Instagram's HTTP APIs to fetch images and create tweets. You need some kind of API, some way to interact with the app you are trying to automate. How are we supposed to interact with the browser? Good question.

To make things a little bit more complicated, we also need to consider that we have two apps to automate: the browser itself and the website. We don't want just to open a browser, create a new tab, and navigate to a page. We also want to go to that page and perform some actions. We want to click on a button, or enter some text in an input element.

Automating a browser sounds challenging. But, luckily for us, we have some brilliant people who did an excellent job for us and created tools such as Selenium and Puppeteer.

Selenium and Puppeteer

A quick search on Google will show that Selenium is one of the top, if not the top, UI testing tools on the market. I think a question many people would ask is: why should I choose Puppeteer over Selenium? Which one is better?

The first thing you need to know is that Puppeteer was not created to compete with Selenium. Selenium is a cross-language, cross-browser testing tool, whereas Puppeteer was created as a multi-purpose automation tool to exploit all the power of Chromium. I think both are great automation tools, but they tackle browser automation in two different ways. They are different in two important aspects that define the target audience of a browser automation library:

  • The interface between the tool and the browser
  • The interface between the tool and the user

Let's first unpack how Selenium works.

Selenium's approach

In order to automate most browsers on the market, the Selenium team wrote a spec (an API) called WebDriver, which the W3C later accepted as a standard (https://www.hardkoded.com/ui-testing-with-puppeteer/webdriver), and asked browser vendors to implement that interface. Selenium uses this WebDriver API to interact with the browser. If you take a look at the spec at the preceding URL, you will find two words showing up over and over: testing and simplicity. In other words, the API was defined with a clear focus on testing and simplicity. Cross-browser testing is, in my opinion, the main feature of Selenium.

What is an API?

An API is the set of classes, functions, properties, and events that a library allows us to use. An API is critical for a library's success because it will determine how much you can do with it and how easy (or not) the interaction will be with the library.

The API that Selenium exposes to users is also considered a part of the WebDriver spec, and it follows the same philosophy: it's focused on testing and simplicity. This API provides a layer of abstraction between the user and all the different browsers and provides an interface that will easily help the developer write tests.
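
To make that difference concrete, here is a quick, hedged sketch of what Selenium's user-facing API looks like in JavaScript. It is not part of this book's examples; it assumes the selenium-webdriver package is installed and a chromedriver is available on your PATH:

const { Builder, By } = require('selenium-webdriver');

(async () => {
  // Selenium hides every browser behind the same WebDriver abstraction.
  const driver = await new Builder().forBrowser('chrome').build();
  try {
    await driver.get('https://example.com');
    // Simple, test-oriented primitives: find an element, interact with it, read from it.
    const heading = await driver.findElement(By.css('h1'));
    console.log(await heading.getText());
  } finally {
    await driver.quit();
  }
})();

Notice how nothing in that code is Chrome-specific; swapping 'chrome' for 'firefox' is, in principle, all you need to target another browser.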

Puppeteer's approach

Puppeteer doesn't need to think in terms of cross-browser support. Although there are some efforts to run Puppeteer on Firefox, the focus is on grabbing all the developer tools that Chromium has and making them available to the user. With this goal in mind, Puppeteer can access way more tools than those exposed by the WebDriver API that Selenium uses.

The difference in how they communicate with the browser is also reflected in the APIs. Puppeteer provides an API that will help us take advantage of all the power of Chromium. I think it's important to highlight that Puppeteer was created in JavaScript, so the API will feel more natural than Selenium's, which comes from a cross-language philosophy.

Puppeteer doesn't need to ask anybody to implement the API because it takes advantage of the headless capability of Chromium. Let's now see what headless browsers are.

Introducing headless browsers

What is a headless browser? No, it's not something from a horror movie. A headless browser is a browser that you can launch and interact with using a particular protocol over a particular communication transport, with no UI involved. This means that you will have one active process (or many processes, as we know how browsers are these days), but there will be no "window" for you to interact with the browser. I think that "windowless browser" would have been a more accurate name.

Available headless browsers

Both Chromium and Firefox support headless browser mode. It's important to mention that, at the time of writing this book, automating Firefox in headless mode was still experimental. That might sound limited compared with the six browsers Selenium offers (https://www.hardkoded.com/ui-testing-with-puppeteer/selenium-browsers), but, as you might have noticed, I didn't say Chrome, I said Chromium. Chromium is the engine Chrome uses under the hood. But Chrome is not the only browser using Chromium; in the past few years, many browsers have started to use the Chromium engine. These are a few examples of Chromium-based browsers:

  • Google Chrome
  • Microsoft Edge, a.k.a. Edgium, to avoid confusion with the previous version of Microsoft Edge based on EdgeHTML
  • Opera
  • Brave

That's much better. We can automate at least five browsers. But there are two major browsers with no headless support: Microsoft Internet Explorer and Safari. The case of Safari is interesting. In the same way that Chromium is the engine behind Chrome, WebKit is the engine behind Safari and, although Safari doesn't support headless mode, there are a few WebKit builds created for testing purposes with headless support. Microsoft Playwright has its own WebKit build to support cross-browser automation.

Do you want to see a headless browser for the very first time?

Let's try this out:

If you have Chrome installed, grab the full path of the executable and pass it the command-line arguments --headless --remote-debugging-port=9222 --crash-dumps-dir=/tmp:

~ % "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --headless --remote-debugging-port=9222 --crash-dumps-dir=/tmp

Tip

If you are a macOS user, the Chrome executable lives inside the "Google Chrome.app" application bundle, at "Google Chrome.app/Contents/MacOS/Google Chrome".

After executing that command, you should get something like this in the console:

DevTools listening on ws://127.0.0.1:9222/devtools/browser/e7e52f93-8f1e-491c-b718-94ae7a8e81b7

Now we have a headless Chrome browser waiting for commands through a WebSocket on ws://127.0.0.1:9222.

Firefox also provides a headless mode:

~ % /Applications/Firefox.app/Contents/MacOS/firefox --headless

*** You are running in headless mode.

It doesn't say much, but trust me, now we have a Firefox browser running in headless mode.

As I mentioned before, a headless browser doesn't have a UI. The only way to interact with the browser is to use the transport the browser created, in this case, a WebSocket, and to send messages using some kind of protocol. In the case of Chromium and Firefox, it's the Chromium DevTools Protocol.

The Chromium DevTools Protocol

If you are a web developer, I'm 100% sure you have used Chrome DevTools. If you don't know what I'm talking about, you can open DevTools by clicking on the three-dots button in the top-right corner of Chrome and then going to More Tools > Developer Tools. You will get something like this:

(Figure: Chrome DevTools)

It's impressive how many things you can accomplish using this fantastic tool:

  • Inspect the DOM.
  • Evaluate CSS styles.
  • Run JavaScript code.
  • Debug JavaScript code.
  • See network calls.
  • Measure performance.

And the good news is that it's the Chromium DevTools Protocol (which we'll call CDP from now on) that drives most of the DevTools' features. And that same CDP is the protocol that headless browsers use to interact with the outside world.

CDP sounds perfect. We can interact with the browser and do all the things I have mentioned. You could create a Node.js app that launches a browser and starts sending CDP messages through a WebSocket, but that would be quite complex and hard to maintain. That's where Puppeteer comes to the rescue, offering a human-friendly interface to interact with the browser.
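
Just to show what that raw plumbing looks like, here is a minimal, hedged sketch that sends a single CDP message by hand. It assumes the headless Chromium from the previous section is still listening on port 9222 and that you have installed the ws package (npm install ws):

const http = require('http');
const WebSocket = require('ws');

// Ask the browser for its WebSocket endpoint (the ws://127.0.0.1:9222/... URL we saw earlier).
http.get('http://127.0.0.1:9222/json/version', (res) => {
  let body = '';
  res.on('data', (chunk) => (body += chunk));
  res.on('end', () => {
    const { webSocketDebuggerUrl } = JSON.parse(body);
    const socket = new WebSocket(webSocketDebuggerUrl);
    socket.on('open', () => {
      // Every CDP message is a JSON object with an id and a method name.
      socket.send(JSON.stringify({ id: 1, method: 'Browser.getVersion' }));
    });
    socket.on('message', (message) => {
      console.log(message.toString()); // The response carrying the same id: 1
      socket.close();
    });
  });
});

Puppeteer takes care of all of this for us: launching the process, finding the WebSocket endpoint, matching responses to requests, and exposing events as plain JavaScript objects.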

Introducing Puppeteer

Puppeteer is nothing more, and nothing less, than a Node.js package that knows how to open a browser, send commands, and react to messages coming from that browser. At the time of writing this book, Puppeteer supports Chromium and Firefox, but Firefox support is still considered experimental. I think it's a good time for you to go to the Puppeteer repository (https://www.hardkoded.com/ui-testing-with-puppeteer/puppeteer-repo) and check whether things have changed since then.

There are also some community projects that implement Puppeteer in other languages. You will find Puppeteer-Sharp (https://www.hardkoded.com/ui-testing-with-puppeteer/puppeteer-sharp) for .NET or Pyppeteer (https://www.hardkoded.com/ui-testing-with-puppeteer/pypeteer) for Python.

When you use Puppeteer, you are, in fact, using more than just a JavaScript library. Many people call this the "Puppeteer pyramid":

(Figure: The Puppeteer pyramid)

The Puppeteer pyramid consists of three components:

  • The headless browser is the engine that will run the pages we want to automate.
  • The Chromium DevTools Protocol allows any external user to interact with the browser.
  • Puppeteer provides a JavaScript API to interact with the browser using the CDP.

What I find valuable about Puppeteer is that its model clearly represents the browser structure:

(Figure: The Puppeteer object model)

Let's see what these objects represent inside the browser.

Browser

The browser is the main class. It's the object created when Puppeteer connects to a browser. The keyword here is connect. The browser that Puppeteer will use can be launched by Puppeteer itself. But it could also be a browser that is already running on your local machine, or it could even be a browser running in the cloud, like Browserless.io (https://www.hardkoded.com/ui-testing-with-puppeteer/browserless).
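
As a small sketch of that difference, you can launch a browser with Puppeteer and then connect to it as if it were an external one. Here we reuse the endpoint of the browser we just launched only to keep the example self-contained; in real life, that endpoint could come from a browser already running on your machine or from a cloud service:

const puppeteer = require('puppeteer');

(async () => {
  // Puppeteer can launch its bundled Chromium...
  const launched = await puppeteer.launch();

  // ...or connect to a browser that is already running, given its WebSocket endpoint.
  const connected = await puppeteer.connect({
    browserWSEndpoint: launched.wsEndpoint(),
  });
  console.log(await connected.version());

  connected.disconnect();
  await launched.close();
})();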

Browser context

A browser can contain more than one context. A context is a browser session (not to be confused with a browser window). The best example is the Incognito Mode or private mode, depending on the browser, which creates an isolated session inside the same browser process.
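
A minimal sketch, using the API as it looks in Puppeteer 7, of creating an isolated session:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();

  // Pages created with browser.newPage() live in the default context.
  const defaultPage = await browser.newPage();

  // An incognito context keeps cookies, cache, and storage isolated from the default context.
  const incognito = await browser.createIncognitoBrowserContext();
  const privatePage = await incognito.newPage();

  console.log(browser.browserContexts().length); // 2: the default context plus the incognito one

  await browser.close();
})();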

Page

A page is a tab in a browser or even a pop-up page.

Frame

The frame object is more important than it looks. Every page has at least one frame, called the main frame. Most of the page actions we will learn about across this book are, in fact, calls to the main frame; for example, page.click ends up calling click on the main frame.

Frames form a tree: a page has only one main frame, but a frame can contain many child frames.
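
A short sketch of how frames surface in the API (the page I navigate to is arbitrary):

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://en.wikipedia.org/wiki/Main_Page');

  // Every page has a main frame, and page-level actions are forwarded to it.
  const mainFrame = page.mainFrame();
  console.log('Main frame URL:', mainFrame.url());

  // page.frames() returns the whole tree flattened: the main frame plus any child frames.
  for (const frame of page.frames()) {
    console.log(frame === mainFrame ? 'main frame:' : 'child frame:', frame.url());
  }

  await browser.close();
})();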

Worker

The worker class gives you access to Web Workers. This is not a feature we will cover in this book.

Execution context

The execution context is a mechanism Chromium uses to isolate the page from the browser extensions. Each frame will have its own execution context. Internally, all the frame functions that involve executing JavaScript code will use an execution context to run the code inside the browser.
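
You won't usually touch execution contexts directly; you will reach them through evaluate calls, as in this small sketch:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://en.wikipedia.org/wiki/%22Hello,_World!%22_program');

  // The function below runs inside the main frame's execution context,
  // so document here refers to the page's document, not to anything in Node.js.
  const title = await page.evaluate(() => document.title);
  console.log(title);

  await browser.close();
})();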

There are other objects involved, such as ElementHandles and JSHandles, but we are going to talk about them later in the book.

Now that we know some of the differences between Selenium and Puppeteer, it's a perfect moment to review many possible use cases for Puppeteer.

Puppeteer use cases

Remember, the main difference between Puppeteer and Selenium is that Selenium is designed for end-to-end testing. In contrast, Puppeteer is designed as an API to exploit all the power of the DevTools, which means that besides end-to-end tests, there are also other use cases where you can use Puppeteer, as we will see now.

Task automation

There are many things we do on the web that you can automate. For example, you can download a report, fill in a form, or check flight prices. You might also want to monitor your website's health and performance, or verify that it is working correctly. In Chapter 6, Executing and Injecting JavaScript, we will see how to use Checkly to monitor your website in production.

Web scraping

Most library authors won't like to say that you can use their library to do web scraping. Web scraping has a reputation for being illegal. But in Chapter 9, Scraping tools, we will see how to do web scraping the right way, without getting banned or sued.

Content generation

Content generation is probably not the first use case that comes to mind. But Puppeteer is a great tool for generating two kinds of content:

  • Screenshots: Why would you need to take screenshots using an app? Think about thumbnails or previews. Imagine you want to create a paywall, showing part of your website content but as a blurred image. You could use Puppeteer to take a screenshot of your site, blur it, and use that image.
  • PDF files: Invoices are a great example of PDF generation. Imagine you have an e-commerce site. When the user makes a purchase, you show them a nice, well-designed invoice, but you need to send them that exact invoice by email. You could use Puppeteer to navigate to that invoice page and print it to PDF. You could also use your landing page to generate a PDF and use it as a brochure.

In Chapter 7, Generating Content with Puppeteer, we will talk about this use case and how to use screenshots to write UI regression tests.
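
As a hedged preview of what those two generators look like in code (we will go through the available options in Chapter 7), both are one-liners on the page object:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://en.wikipedia.org/wiki/%22Hello,_World!%22_program');

  // A full-page screenshot, ready to be resized or blurred for a thumbnail or paywall preview.
  await page.screenshot({ path: 'preview.png', fullPage: true });

  // The same page printed to a PDF file, as you would do with an invoice page.
  await page.pdf({ path: 'page.pdf', format: 'A4' });

  await browser.close();
})();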

End-to-end testing

I think Puppeteer is great for testing modern web apps because it's close to the browser. The API feels modern and is designed for the JavaScript developer. It lets you execute JavaScript code easily and gives you access to all the power of Chromium. But I also have to say that Selenium's tooling for end-to-end testing is impressive. Puppeteer is not even close to what Selenium offers with its Selenium Grid. It's up to you to decide which is the right tool for you.

Enough with the theory. It's time to get started and set up our environment.

Setting up the environment

What's good about Node.js and Puppeteer is that they are cross-platform. My local environment is macOS Catalina 10.15.6. But you won't see much difference if you use a Windows or a Linux environment.

Time is a tech book's worst enemy. At the time of writing this book, I was using Node.js 12.18.3 and Puppeteer 7. I'm pretty sure that by the time you read this book, new versions will have come to light. But don't feel discouraged about that; we expect that to happen. That's why I encourage you to go now and take a look at the GitHub repository of this book (https://github.com/PacktPublishing/ui-testing-with-Puppeteer). If you see that something is not working or has changed, please create an issue on that repository. We will try to keep it updated.

We only need two things to run our first Puppeteer code: Node.js and Puppeteer. Let's begin with Node.js.

Node.js

For the purposes of this book, the only thing you need to know about Node.js is that it's a runtime that allows us to run JavaScript code outside the browser.

It's important to highlight that the website we want to automate doesn't necessarily need to run on Node.js. You don't need to know the language the website was written in, nor the platform it runs on, but knowing those details can give you ideas for writing better automation code. For instance, if you know that the site is an ASP.NET Web Forms project, you will know that it uses hidden inputs to perform postbacks. The same applies to client-side frameworks such as Vue or React.

As I mentioned before, we will install Node.js v12.18.3 (or higher). The process is quite simple:

  1. Go to the official site: https://nodejs.org/.
  2. Download the LTS version. LTS stands for Long-Term Support.
  3. Run the installer as you would typically do on your platform:
(Figure: Node.js setup)

If you want to see whether the installation was successful, you can open a terminal and execute node --version:

~ % node --version

v12.18.3

Visual Studio Code

You don't need any special code editor to write a Node.js app. But Visual Studio Code is a great editor. It's free, cross-platform, and you can use it to code not only in JavaScript but also in many other languages.

You can download it at https://code.visualstudio.com/. It doesn't even require running a setup on macOS. It's just an app you copy to your Applications folder:

(Figure: Visual Studio Code)

Now that we have Node.js installed along with a code editor, we can create our first app.

Our first Puppeteer code

We first need to create a folder for our project, which will be called hello-puppeteer. I'm going to use a terminal, but you can use whatever tool you feel more comfortable with:

> mkdir hello-puppeteer

> cd hello-puppeteer

We now need to initialize this brand-new Node.js application. We initialize new Node.js applications using the npm init command. In this case, we will pass the -y argument, so it creates our app using default values:

> npm init -y

Wrote to /Users/neo/Documents/Coding/hello-puppeteer/package.json:

{
  "name": "hello-puppeteer",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC"
}

This output doesn't say much. It shows us that it has created a package.json file with some default values. Now, I will create an index.js file using the touch command. Again, you can perform this action in the way you feel most comfortable:

> touch index.js

The touch command should have created index.js, the entry point of our app. But before coding our app, we need to install Puppeteer.

Installing Puppeteer

Most frameworks, if not all of them, have a way to publish and reuse components from different authors. The most popular package manager in the Node.js ecosystem is npm (https://www.npmjs.com/). Does that sound familiar? We used npm init to create our app. As Puppeteer is a package published on npm, we can download and install it using the npm install command.

If you don't want to jump between apps, you can open a terminal inside Visual Studio Code. If you are still in the terminal, you can open Visual Studio Code using the following command:

> code .

That will open Visual Studio Code. Once there, you will be able to launch a new terminal from the Terminal menu, as shown in the following screenshot:

(Figure: Terminal inside Visual Studio Code)

After opening a terminal, we can install Puppeteer using npm install:

> npm install puppeteer@">=7.0.0 <8.0.0"

Downloading Chromium r848005 - 128 Mb [=========           ] 44% 5.3s

I would like to highlight two things here. The first is that, as this book is based on Puppeteer 7, we are specifying the version as @">=7.0.0 <8.0.0", which means that we want the latest Puppeteer version greater than or equal to 7.0.0 and less than 8.0.0. By pinning this version range, you will be able to follow the examples in this chapter using the same version I used.

Puppeteer versioning

Puppeteer follows the Semantic Versioning specification (SemVer) to version its releases, which means that the three numbers in the version follow a rule. A change in the major number (the first number) means that there was a breaking change in the API. When a package changes the major number, it tells you that the new version might break your code. A change in the minor number (the second number) means that they added new functionality, maintaining backward compatibility. Lastly, a change in the patch number means that they fixed a bug, maintaining backward compatibility.

If you see that Puppeteer is at version 8, 9, or 10, it doesn't mean that this book is now obsolete. It only means that something changed in a way that could break someone else's code. For instance, the change from version 6 to version 7 was just a change in the way screenshots are taken.

In real life, you can use the latest version available. The second thing to highlight is that the package downloaded a specific version of Chromium, in this case, r848005. That doesn't mean that your code won't work with any other version of Chromium you download from the internet. But remember, Puppeteer interacts with the browser using the Chromium DevTools Protocol, so it needs a version of Chromium that reacts the way Puppeteer expects. In the case of Puppeteer v7.0.1, that is Chromium 90.0.4403.0, and there is no guarantee that any other version of Chromium (newer or older) will work with your current Puppeteer version. It doesn't mean that it won't work; it means that it's not guaranteed, and you will need to experiment and see. You can check which Chromium version you should use for every version of Puppeteer on the API page (https://www.hardkoded.com/ui-testing-with-puppeteer/puppeteer-api).
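
If you ever do need to point Puppeteer at a browser you installed yourself, you can pass the executablePath launch option. This is a hedged sketch; the path below assumes a default Chrome install on macOS and will differ on your system:

const puppeteer = require('puppeteer');

(async () => {
  // executablePath tells Puppeteer to use an existing browser instead of the bundled Chromium.
  const browser = await puppeteer.launch({
    executablePath: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
  });
  console.log(await browser.version());
  await browser.close();
})();

Keep in mind what we just discussed: a browser that doesn't match the Chromium version Puppeteer expects is not guaranteed to work.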

Hello world in Puppeteer

Every language has its own hello world program. Puppeteer's hello world program would be navigating to https://en.wikipedia.org/wiki/%22Hello,_World!%22_program and taking a screenshot of the page. Let's see what it would look like:

const puppeteer = require('puppeteer');

(async function() {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://en.wikipedia.org/wiki/%22Hello,_World!%22_program');
    await page.screenshot({ path: './screenshot.png'});
    browser.close();
})();

This is what we are doing in this small script:

  1. We import the Puppeteer library using require.
  2. Launch a new browser.
  3. Open a new page (tab) inside that browser.
  4. Navigate to the Wikipedia page.
  5. Take a screenshot.
  6. Close the browser.

I love how simple and easy it is to get started with Puppeteer. It's now time to run it. Using the same terminal you used to run npm install, now run node index.js:

> node index.js

A Chromium browser opened, navigated to Wikipedia, and closed by itself. You didn't see it because it was a headless browser, but it happened. Now, if you check your working directory, you should have a new file called screenshot.png:

(Figure: Screenshot)

Our code worked as expected. We got our screenshot from Wikipedia.

I bet you noticed that we used four awaits in our small hello puppeteer example. Asynchronous programming plays a big role in Puppeteer. Let's now talk about asynchronous programming in JavaScript.

Asynchronous programming in JavaScript

Normally, a program runs synchronously, which means that each line of code is executed one after the other. Let's take, for instance, these two lines of code:

const x = 3 + 4;
console.log(x);

Those two lines will run in order. The result of 3 + 4 will be assigned to the x constant, and then x will be printed to the console using console.log. The console.log call can't run until x is assigned.

But there are tasks, such as network requests, disk access, or any other I/O operation, that are time-consuming, and we don't necessarily want to wait for those tasks to finish to keep executing our code. For instance, we could start downloading a file, perform other tasks while that file is loading, and then check that file when the download is completed. Asynchronous programming will allow us to execute those long-running tasks without blocking our code.

An asynchronous function returns a Promise immediately to avoid blocking your code while waiting for a task. This Promise is an object that can be in one of the following three states:

  • Pending: This means that the asynchronous task is still in progress.
  • Fulfilled: This means that the asynchronous task was completed successfully.
  • Rejected: This means that the asynchronous task has failed.
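
A tiny sketch of those three states (the setTimeout is just a stand-in for a long-running task):

// Pending: the executor hasn't called resolve or reject yet.
const pending = new Promise((resolve) => setTimeout(resolve, 1000));

// Fulfilled: the task completed successfully with a value.
const fulfilled = Promise.resolve(42);

// Rejected: the task failed with an error.
const rejected = Promise.reject(new Error('Something went wrong'));
rejected.catch(() => {}); // Handle it so Node.js doesn't report an unhandled rejection.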

Let's say that we have a function called downloadAFileFromTheInternet. The most common way to wait for a task to finish is to use the await keyword:

await downloadAFileFromTheInternet();

It's important to highlight that the await keyword here is not waiting for the function itself; it is waiting for the Promise returned by that function. That means that you can also assign that Promise to a variable and await it later in the code:

const promise = downloadAFileFromTheInternet();
// some code
await promise;

Or you can just not wait for the promise at all:

downloadAFileFromTheInternet();

If you want to learn more about asynchronous JavaScript, check out the Asynchronous JavaScript Deep Dive videos by Steven Hancock (https://www.packtpub.com/product/asynchronous-javascript-deep-dive-video/9781800202665).

Puppeteer relies on asynchronous programming techniques because the communication between Puppeteer and Chrome DevTools is asynchronous. After all, the communication between Chrome DevTools and the browser is asynchronous. Think about what would happen under the hood when you click a link:

(Figure: Click timeline)

When you call page.click, the result of that action is not immediate; as we saw, there are many things going on under the hood. So, after calling page.click, you will need to do one of the things mentioned previously: await it, keep the promise in a variable and await it later, or not await it at all.
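
To make that concrete, here is a hedged sketch of the three options applied to page.click. The #next selector is hypothetical; it just stands for some link on the page:

// Option 1: await the click before moving on.
await page.click('#next');

// Option 2: keep the promise in a variable and await it later.
const clickPromise = page.click('#next');
// ... do something else ...
await clickPromise;

// Option 3: don't await it at all (be careful: a failure will surface as an unhandled rejection).
page.click('#next');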

Now that we know more about asynchronous programming, I would like to review a few promise utilities that we will use across the book.

Promise.all

Promise.all is a function that takes an array of promises and returns a new promise that is fulfilled when all of the input promises are fulfilled, and rejected as soon as any of them is rejected. Yes, a promise can either be fulfilled, meaning it completed successfully, or rejected, which means it failed.

A common scenario is clicking on a link, and waiting for the page to navigate to the next page:

await Promise.all([
  page.click('a'),
  page.waitForNavigation()
]);

This Promise.all will wait until both the click and the waitForNavigation promises are fulfilled, and it will reject as soon as either of them is rejected.

Promise.race

Like Promise.all, Promise.race expects an array of promises, but in this case, it settles as soon as the first of those promises is fulfilled or rejected.

A typical usage is for timeouts. We want to take a screenshot, but only if it takes less than 2 seconds:

await Promise.race([
  page.screenshot(),
  new Promise((resolve, reject) => {
    setTimeout(() => {
      reject(new Error('Too long!!!'));
    }, 2000);
  })
]);

In this case, if the screenshot promise takes more than 2,000 milliseconds, the promise created as the second element in the array will reject, and the whole Promise.race will be rejected with it.

Fulfilling our own promises

You saw in our previous example how you can create a promise yourself, return it or assign it to a variable, and fulfill (or reject) it later.

This is great when you want to wait for an event to happen. We can create a promise that will be resolved when the page closes:

const promise = new Promise((x) => page.on('close', x));
// …
await promise;

This kind of await is quite risky. If the Promise is never fulfilled, your code will hang. I recommend using these promises with Promise.race and timeouts.
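
Here is a minimal sketch of that recommendation, combining the close-event promise with Promise.race and a timeout (the 5-second limit is an arbitrary choice for the example):

const pageClosed = new Promise((resolve) => page.on('close', resolve));
const timeout = new Promise((resolve, reject) =>
  setTimeout(() => reject(new Error('The page did not close in time')), 5000));

// Resolves when the page closes, or rejects after 5 seconds, whichever happens first.
await Promise.race([pageClosed, timeout]);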

We will see lots of promises throughout this book. Maybe some recipes such as "fulfill our own promises" look odd now, but we will use them a lot.

Summary

We covered a lot in this first chapter. We learned about browser automation and the difference between Selenium and Puppeteer. Then we saw that Puppeteer isn't limited only to end-to-end testing and reviewed some use case scenarios. Then we got our hands dirty and coded our first Puppeteer script. In the last section of the chapter, we covered many asynchronous techniques that we will use in this book.

In the next chapter, we are going to focus on end-to-end testing. We will review some tools available on the market and will consider how to organize our code to create reliable end-to-end tests.
