Thanks to Chapter 3, Navigating through a website, we now know how to open a browser and all the different options we have to launch browsers and create new pages. We also know how to navigate through other pages. We learned about HTTP responses and how they are related to a request.
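This chapter is about interaction. Emulating user interaction is essential in UI testing. There is one pattern in unit testing called Arrange-Act-Assert (AAA). This pattern enforces a particular order in the test code: first you arrange the state the test needs, then you act by executing the behavior under test, and finally you assert that the outcome is the expected one.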
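As a rough sketch of the Arrange-Act-Assert layout, and assuming a hypothetical addToCart function that is not part of this chapter's code, a test could be organized like this:

```javascript
// Hypothetical function under test; it is not part of the chapter's code.
function addToCart(cart, product) {
  return [...cart, product];
}

function testAddToCartAddsOneItem() {
  // Arrange: set up the state the test needs.
  const cart = [];
  const product = { id: 2, name: 'Sample product' };

  // Act: execute the single behavior being tested.
  const result = addToCart(cart, product);

  // Assert: verify the outcome.
  if (result.length !== 1 || result[0].id !== 2) {
    throw new Error('Expected the cart to contain exactly the added product');
  }
  return result;
}
```

In a UI test, the Act step is where the interaction functions covered in this chapter would go.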
In this chapter, we will learn how to find elements on a page. We will understand how the development team can improve their HTML so that you can easily find elements. But if you cannot change the page HTML, we will also look at another set of tools to find the elements we need.
Once we find an element, we will want to interact with it. Puppeteer provides two sets of APIs: One is action functions, such as click, select, or type. Then we have a set of emulation functions, such as mouse events or keyboard emulation. We will cover all those functions.
This chapter will introduce a new object we haven't mentioned yet: The element handle.
By the end of this chapter, we will have added another tool to our toolbox: The Visual Studio Code debugging tools.
We will cover the following topics in this chapter:
By the end of this chapter, you will be able to emulate most types of user interaction. But first, we need to lay the groundwork. Let's talk about HTML, the Document Object Model (DOM), and CSS.
You will find all the code of this chapter on the GitHub repository (https://github.com/PacktPublishing/UI-Testing-with-Puppeteer) under the Chapter4 directory. Remember to run npm install in that directory and then go to the Chapter4/vuejs-firebase-shopping-cart directory and run npm install again.
If you want to implement the code while following this chapter, you can start from the code you left in the Chapter3 directory.
You won't be able to find elements if you don't know CSS, and you won't understand CSS if you don't understand the DOM and HTML. So, we need to start with the basics.
I bet you've heard that you can build a site with HTML, CSS, and JavaScript. You might be using different server-side technologies. Your frontend might be implemented using cool technologies such as React or Angular. But in the end, the result will be a page based on HTML, CSS, and JavaScript.
HTML is the page's content. If you go to any website, open the DevTools, and click on the Elements tab, you will see the content of the page. You will see the page's title. If it's a news site, you will see all the articles there. If you visit a blog post, you will see the text of that post.
Without CSS, an HTML page would look like text written in Notepad. CSS not only brings color and fonts, but it's also the scaffolding that gives structure to a page.
Fun fact
Firefox has a built-in tool to disable all the styles on a page. If you go to View | Page Style and click on No Style, you will see how our life would be without CSS.
The last piece is JavaScript. JavaScript brings behavior to a page. Once the browser parses the HTML and builds the DOM, it allows us to manipulate and give life to a page.
But, as I mentioned before, we need to go to the basics, to the foundations of the web. Let's begin with HTML.
HTML stands for HyperText Markup Language: HyperText because the HTML is not content per se; HTML contains the content. Markup because it uses tags to give meaning to that content. And language because, although many developers disagree and they get mad about the idea, HTML is a language.
If we read an HTML file as a data structure, we can say that HTML is a relaxed version of XML. So, to better understand HTML, we need to look at the basics of XML.
These are the basic elements of XML content:
If you look at this figure, you already know almost everything you need to know about XML. Well, maybe I'm exaggerating. But this is the idea:
XML parsers are very strict with these rules. If the XML content you are trying to parse breaks just a single rule, the parser will consider the entire XML invalid. Whether it's a missing closing element or an attribute without quotes, the parser will fail to evaluate the XML content.
But we will find that browsers are not that strict when parsing HTML content. Let's take a look at the following HTML:
This simple HTML will print Hello World in red in the browser.
Is this valid XML? No. As you can see, the <div> element is not closed. But is this valid HTML? Yes.
Important Note
The fact that a browser would try to render broken HTML doesn't mean that you should take that lightly. It's possible you have heard a developer say that a particular bug was due to a missing closing div. If the HTML is broken, for instance, it has a missing closing div, the browser will try to guess the best way to render that HTML. The decision the browser makes when trying to fix broken HTML could end up with the page working as expected or with the full page layout broken.
Another interesting concept is that the XML specification doesn't give meaning to the elements. The names of the elements, the attributes, and the resulting information coming from that content depend on who wrote the XML and who is reading it.
HTML is XML with meaning. In 1993, Tim Berners-Lee, who is known as the inventor of the World Wide Web, decided that the main element would be called HTML and that it would contain a BODY. He decided that images would be represented as IMG elements, paragraphs would be P elements, and so on. Over the years, browser and web developers followed and improved this convention, getting to what we today call HTML5. We, as a community, agreed on the meaning of HTML elements.
We agreed that if we add the text attribute with the value red, we will get the text in red, and so on. How many types of elements do we have in HTML? A lot! The good news is that you don't need to know all of them.
The more you know, the more productive you will be. However, these are the most common elements you will find on a page.
Every HTML document will be contained inside an <html> element. That HTML element will have two child elements. The first element you will find is <head>. Inside that <head> element, you will find metadata elements, such as <title> with the page title, and many <meta> elements with metadata not supported by the standard HTML. Many sites use <meta> to enforce how the page should be shown on social media. The second set of elements you will find are include elements: <link> elements, including CSS files, and <script> elements, including JavaScript code. Although script elements are accepted in the head, most sites add their script elements at the bottom of the page for faster rendering.
The second element you will find is the <body> element. The page itself will be inside this element.
Then we have the basic text elements.
<h1>, <h2>, <h3>, <h4>, <h5>, and <h6> are headings. If you have a text editor, you might have seen that there are many levels of headings and subheadings.
<p> will denote paragraphs. Then you might find <span> elements, which help style part of the text in a paragraph.
Another type of text element is <label>. These labels are linked to an input control, such as a radio button, giving context to that control. For example, a radio button or a checkbox doesn't have text; it's just a check or a radio. You need a label to give them context:
This HTML has three labels. Huey gives context to the first radio option, Dewey to the second, and Louie to the last one.
The last type of text element we will look at is list elements. Lists are expressed as a parent element, <ul> for unordered lists or <ol> for ordered lists, and <li> elements. You will see lots of these in menu bars.
There are two main action elements in HTML. The <a> anchor, also known as a link, was designed to take you to another page, but these days it's not limited to that, and it could trigger actions inside the page.
The second element is <button>, which again, although it was designed to send data to the server using an HTTP POST request, is now being used for many other kinds of actions:
Important note
The days when you would only use buttons and links to perform actions are in the past. As most HTML elements support click events, you will find pages that show elements as buttons, but in fact, those buttons are HTML elements such as DIVs.
Many times, you won't notice the difference between a link and a button. For instance, in the packtpub.com site, the search button is a button element, whereas the cart button is, in fact, an anchor.
Most of your automation code will involve clicking on these action elements.
The role of container elements is grouping elements, mostly for layout and style purposes. The most popular element is DIV. What is DIV? It can be anything: A list of items, a popup, a header, anything. It is used to create groups of elements.
One element that was the king of the container elements was TABLE. As you can infer from the name, a table represents a grid. Inside a TABLE element, you can have TR elements representing rows, TH elements representing header cells, and TD elements representing a cell inside a row. I mentioned that this was the king of containers because the community has now moved on from tables to DIVs due to performance issues, the need for more complex layouts, and responsiveness issues. But you might still see some tables on sites showing information using a grid style.
HTML5 brought a new kind of container element: The Semantic Elements. The goal of these semantic HTML elements is to communicate the type of content the element contains. So, instead of using DIVs for everything, developers should start using elements such as <header> for the site header, <footer> for the footer, <nav> for the navigation options, <article> for blog posts, and so on. The purpose of these elements is to help external tools (such as screen readers, search engines, and even the browser itself) to understand the HTML content.
The last group of elements we need to know about are the input elements. The most common input element is the multifaceted input element. Depending on the type attribute, it can be "text", "password", "checkbox", "file" (upload), and so on; the list goes on to a total of 22 types.
Then we have select elements for drop-down lists and the option element to represent the items of a drop-down list.
Of course, we shouldn't forget the <IMG> element. It's impossible to picture a site without images.
Important note
Not every input you will see these days will be one of these elements. To make inputs more user-friendly or just nicer, you will find that developers might build inputs based on many other elements. For instance, you could find a drop-down list, which instead of being a select element would be an input element, plus an arrow button, which would show a floating list on clicking it. This kind of control makes sites prettier but automation more challenging.
HTML has not only a known list of elements but also a known list of attributes. These are the most common attributes you will find:
HTML won't limit the attributes you can add to an element. You can add any attribute you want, for instance, defaultColor="blue". One convention is using data- attributes (pronounced data dash attributes). The browser will parse these attributes and make them available in the DOM. So, although defaultColor is a valid attribute, the general convention uses data-default-color="blue" instead.
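To illustrate how the browser makes data- attributes available in the DOM, here is a small sketch of the name conversion used by the element.dataset property. The toDatasetKey helper is a hypothetical name introduced only for this illustration; the browser does this conversion for you.

```javascript
// Converts an attribute name such as "data-default-color" into the key
// the browser exposes on element.dataset ("defaultColor").
// toDatasetKey is a hypothetical helper, shown only to explain the mapping.
function toDatasetKey(attributeName) {
  if (!attributeName.startsWith('data-')) {
    throw new Error(`${attributeName} is not a data- attribute`);
  }
  return attributeName
    .slice('data-'.length)
    // A dash followed by a lowercase letter becomes that letter in uppercase.
    .replace(/-([a-z])/g, (_, letter) => letter.toUpperCase());
}
```

So an element declared with data-default-color="blue" could be read in the browser as element.dataset.defaultColor.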
The other set of attributes of interest to us is the Accessible Rich Internet Applications (ARIA) attributes. These attributes are being added to help accessibility tools, such as screen readers. Why would we be interested in those attributes? Because developers express things such as the role or the state of an element. If you find a site using ARIA, finding the selected menu item would be a matter of finding the element with role="treeitem" and aria-expanded="true".
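As a minimal sketch of that idea, and assuming the page really uses the role and aria-expanded values from the example above, the lookup could be written with Puppeteer's $$ function (covered later in this chapter). The findExpandedTreeItems name is hypothetical.

```javascript
// `page` is expected to be a Puppeteer Page.
// The role and state values are the ones used as an example in the text;
// a real site may use different ARIA roles and states.
async function findExpandedTreeItems(page) {
  const selector = '[role="treeitem"][aria-expanded="true"]';
  // Resolves to an array of element handles, empty if nothing matches.
  return page.$$(selector);
}
```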
In the past few paragraphs, the DOM has been mentioned a few times. Let's talk about the DOM.
The DOM is the interface you can use in JavaScript to interact with the HTML. According to the MDN (https://www.hardkoded.com/ui-testing-with-puppeteer/dom), it is the data representation of the objects that comprise the structure and content of a document on the web. Why should we care about that? Because we are going to use the same tools to automate our pages.
In the previous section, we mentioned that an element might have an ID. You'll find that the search input at https://www.packtpub.com/ has the ID search, so you will be able to get that element in JavaScript using document.getElementById('search').
You might be wondering: How do I know the ID of a button? Or how do I check that the ID is valid? Remember we talked about the dev tools?
The developer tools can be opened by clicking on the three dots in the top-right corner of Chrome and then going to More Tools | Developer Tools. You can also use the Ctrl + Shift + J shortcut in Windows or Cmd + Option + I in macOS:
If you right-click on any element on the page, for instance, the search button, you will find the Inspect option, which will select that element in the Elements tab. There you will be able to see all the attributes of that element:
Another tab you will use a lot is the Console tab, where you will be able to run JavaScript code. If you are in the Elements tab and press the Esc key, you will get the Console tab below the Elements one. From there, you will be able to test your code:
Another set of functions that you will use a lot are document.querySelector and document.querySelectorAll. The first function returns the first element matching a CSS selector, whereas the second function returns a list of elements matching a CSS selector. So, we need to learn about some CSS selectors next.
You don't need to learn CSS to understand how to style a page, but you should master how to find elements on a page. There are around 60 different selectors (https://www.w3schools.com/cssref/css_selectors.asp) we can use for finding elements. We won't cover all 60 here, but let's go through the most common selectors:
Selector: ElementName.
Example: input will select <input> elements.
Selector: .ClassName.
Example: .input-text will select any element that contains the input-text class.
If you look at the search input in https://www.packtpub.com/, the class attribute is class="input-text algolia-search-input aa-input". This selector won't check whether the class attribute is equal to input-text. It has to contain it.
Selector: #SomeID.
Example: #search will select the element with the search ID. In this case, it does check equality.
Selector: [attribute=value].
Example: [aria-labelledby="search"] will select the element with the aria-labelledby attribute with the value search. This is an excellent example of the use of ARIA attributes for automation.
This selector is not limited to the equality check (=). You could use only [attribute] to check whether the element has the attribute, no matter the value. You can also use many other operators. For example, you can use *= to check whether the attribute contains a value or ^= to check whether it begins with a value.
What's great about CSS is that you can combine all these selectors. You could use input.input-search[aria-labelledby="search"] to select an input with the input-search class and the aria-labelledby attribute with the value search.
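The following sketch shows one way to build attribute selectors in code instead of writing them by hand. The OPERATORS table and the attributeSelector helper are hypothetical names introduced for illustration; the operators themselves are standard CSS.

```javascript
// Standard CSS attribute selector operators.
const OPERATORS = {
  equals: '=',      // the value is exactly equal
  contains: '*=',   // the value contains the substring
  startsWith: '^=', // the value begins with the substring
  endsWith: '$=',   // the value ends with the substring
};

// Hypothetical helper: builds [name], [name="value"], [name*="value"], etc.
function attributeSelector(name, value, operator = 'equals') {
  if (value === undefined) return `[${name}]`;
  const symbol = OPERATORS[operator];
  if (symbol === undefined) throw new Error(`Unknown operator: ${operator}`);
  // Escape backslashes and double quotes so the value stays a valid CSS string.
  const escaped = String(value).replace(/\\/g, '\\\\').replace(/"/g, '\\"');
  return `[${name}${symbol}"${escaped}"]`;
}

// Element, class, and attribute selectors combined into one selector.
const searchInputSelector =
  'input.input-search' + attributeSelector('aria-labelledby', 'search');
```

Building selectors this way avoids stray spaces inside the quotes, which would change the value being matched.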
You can also look for child elements by chaining selectors. You can search for elements nested inside another element by adding new selectors separated by a space. Let's take, for instance, the following selector:
form .algolia-autocomplete input
If you read it backwards, it will select an input inside an element with the algolia-autocomplete class, which is inside a form element. Notice that I said an input inside an element with the algolia-autocomplete class. That doesn't need to be the direct parent of the input element.
If you want to check strictly a parent-child relationship, you can separate selectors with a > instead of a space:
.algolia-autocomplete > input
This selector will look for an input whose direct parent element is an element with the algolia-autocomplete class.
Maybe you are thinking, why do I need to know all this information? I just want to get up and running with Puppeteer! Let me tell you something: You will spend half of your time inside the developer tools, and the most frequent element in your code will be a CSS selector. The more you know about HTML, the DOM, and CSS, the more proficient you will be at browser automation.
But now it's time to go back to the Puppeteer world.
It's time to apply everything we have learned so far. We need to master selectors because our Puppeteer code will be mostly about finding elements and interacting with them.
Let's bring back the login page from our e-commerce app:
If we want to test the login page, we need to find these three elements: The email input, the password input, and the login button.
If we right-click on each input and click on the Inspect element menu item, we will find the following:
Puppeteer provides two functions to get elements from the page. The $(selector) function will run the document.querySelector function and return the first element matching that selector or null if no elements were found. The $$(selector) function will run the document.querySelectorAll function, returning an array of elements matching the selector or an empty array if no elements were found.
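The difference between the two return values matters when you check whether something exists. Here is a minimal sketch; countMatches is a hypothetical helper name.

```javascript
// `page` is expected to be a Puppeteer Page.
async function countMatches(page, selector) {
  // $ resolves to an element handle, or null when nothing matches.
  const first = await page.$(selector);
  // $$ resolves to an array, which is empty when nothing matches.
  const all = await page.$$(selector);
  return { found: first !== null, count: all.length };
}
```

Note that the result of $ must be compared against null, whereas the result of $$ must be checked through its length.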
If we want to implement the login function in our LoginPageModel class using these new functions, finding the login inputs would be easy:
const emailInput = await this.page.$('#email');
const passwordInput = await this.page.$('#password');
Tip
To find the login button, you might think that you could use the .btn-success selector, and you could, but you shouldn't use classes used to style a button because they might change in the future if the development team changes the style. You should try to pick a CSS selector that will survive a design change.
Let's re-evaluate our login button. If you look for button elements, you will find that you have five buttons on that page, so the button selector won't work. But, we can see that the login button is the only button with a type="submit" attribute, so we could use the [type=submit] CSS selector to find this element.
But the [type=submit] selector is too generic. The developers might, for instance, add a new button with the submit type in the toolbar, breaking our code. But we can see that the login button is inside a form with the ID login-form. So now, we can create a more stable selector. So, we could look for the login button in our login function in this way:
const loginBtn = await this.page.$('#login-form [type=submit]');
Now we have everything we need to test our login page. But we are not going to interact with the login page yet. Let's go to the home page and find some more complex scenarios:
Let's say we want to test that the Macbook Pro 13.3' Retina MF841LL/A product has 15 items left in stock, and the price is $1,199.
First, a piece of advice: It's better to code these kinds of tests down the testing pyramid. You could test the API that sends those values or the function that makes that query to the database.
But let's try to solve this as a UI test:
If we take a look at the HTML, there is nothing that helps us find the product on the list, and if we were able to find the product, it's hard to find the elements inside that div element.
Here is where the collaboration between the development team and the QA team becomes valuable. How can developers help the QA team? Using data- attributes. Your team can use a data-test- attribute to help you find the elements you need:
As you can see in this HTML, it will be way easier to find elements with those new attributes. This is how we can get the values to test product ID 2:
const productId = config.productToTestId;
const productDiv = await this.page.$(`[data-test-product-id="${productId}"]`);
const stockElement = await productDiv.$('[data-test-stock]');
const priceElement = await productDiv.$('[data-test-price]');
With these four lines, we were able to find the three elements for our new test: The product container and the elements containing the stock and the price.
There are a few things to notice in this piece of code:
If we'd used page.$$('[data-test-stock]'), we would get many elements because each product has a data-test-stock element, but as we use productDiv.$('[data-test-stock]'), we'll get the element inside productDiv. This is an important resource.
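Once we have the element handles, we still need their values. The sketch below reads the text of the stock and price elements using the evaluate function of the element handle. The getProductInfo name is hypothetical, and it assumes the values are stored as the text content of those elements, which may not match the real markup.

```javascript
// `page` is expected to be a Puppeteer Page. The data-test-* attributes
// are the ones proposed in this section.
async function getProductInfo(page, productId) {
  const productDiv = await page.$(`[data-test-product-id="${productId}"]`);
  if (productDiv === null) {
    throw new Error(`Product ${productId} was not found on the page`);
  }
  // Searching from productDiv limits the search to that product.
  const stockElement = await productDiv.$('[data-test-stock]');
  const priceElement = await productDiv.$('[data-test-price]');
  // evaluate runs the callback in the browser, passing the DOM element.
  // Assumption: the values we need are the text content of each element.
  const stock = await stockElement.evaluate((el) => el.textContent.trim());
  const price = await priceElement.evaluate((el) => el.textContent.trim());
  return { stock, price };
}
```

A test could then compare the returned stock and price against the expected values.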
What if we don't have the chance to add these attributes? There is one more resource – trying to find those elements using XPath.
XPath is a language to query XML-like documents. Remember how we said that HTML was a relaxed kind of XML? This means that we could navigate through the DOM using some kind of XML query language such as XPath.
Before digging into XPath's selectors, if you want to try XPath queries, Chrome DevTools includes a set of functions you can use inside the developer tools Console tab (https://hardkoded.com/ui-testing-with-puppeteer/console). One of these functions is $x, which expects an XPath expression and returns an array of elements:
If you open the Console tab on any page, you can run $x('//*') to test the //* selector.
To better understand an XPath expression, you need to see your HTML as XML content. We are going to navigate this XML document from the very same root, the HTML element.
Selector: //. This means "From the current node, bring me everything inside, no matter the position."
Example: $x('//div//a') will return, from the root, all the divs inside the document, no matter the position, and from those divs all a elements inside that div, no matter the position.
Are you confused about the "no matter the position" part? Well, let's now see the root selector.
Selector: /. This means "From the current node, bring me all the direct child elements."
Example: If we use $x('/div//a'), we'll get no results because there is no div as a child of the root object. The only valid root option would be $x('/HTML') because the HTML element is the only one under the main root object. But we could do something such as $x('//div/a'), which would mean "Bring me all the div elements, and from there all the a elements that are a direct child of those divs."
Selector: *. This means "Bring me all the elements."
Example: When we say "all the elements," it will be based on the previous selector. $x('/*') will bring only the HTML element because that would mean "all the direct elements." But $x('//*') will bring you all the elements from the page.
Selector: [@attributeName=value].
Example: $x('//div[@class="card-body"]') will bring all the div elements where the class attribute is equal to card-body. This might look similar to the class selector in CSS, but it's not because this selector won't work if div has more than one class.
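If you need an XPath expression that behaves like the CSS class selector, a common workaround is to pad the class attribute with spaces and look for the padded class name. The sketch below builds such an expression; xpathHasClass is a hypothetical helper name.

```javascript
// Builds an XPath expression that matches elements having className as one
// of their classes, even when the class attribute lists several classes.
function xpathHasClass(tagName, className) {
  // normalize-space trims and collapses whitespace; the extra spaces on both
  // sides make sure we match a whole class name and not part of another one.
  return `//${tagName}[contains(concat(" ", normalize-space(@class), " "), " ${className} ")]`;
}
```

For card-body, this produces //div[contains(concat(" ", normalize-space(@class), " "), " card-body ")], which you can paste into $x in the Console tab.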
Up to this point, it seems just like CSS with another syntax. What's so powerful about XPath? Well, let's get to some power tools.
It turns out that the syntax we used to filter attributes is, in fact, expressions, also called predicates. This gives us the chance to not only use the @attributeName option but to also check for many other things.
Selector: [text()=value].
Example: $x('//div[text()="Admin Panel (Testing purpose)"]') will bring all the div elements whose content is the text Admin Panel (Testing purpose). You could even make it more generic and use something like this, $x('//*[text()="Admin Panel (Testing purpose)"]'), so you wouldn't care whether it's a div or another type of element.
This function is by far one of the main reasons you would see people using XPath.
Selector: [contains(text(), value)].
Example: Filter by text can be tricky. The text could have some space before or after the content. If you try to select the grid button on the page using this command, $x('//*[text()= "Grid"]'), you won't get any results because the element has some spaces after and before the word. This contains function can help us when we have spaces before or after the word, or when the word is part of a larger piece of text. This is how we can use this function: $x('//*[contains(text(),"Grid")]').
There are many more functions. Mozilla has a good list of all the available functions (https://www.hardkoded.com/ui-testing-with-puppeteer/xpath).
We get to do really complex queries with XPath. Let's take a look at our last example. We want all the elements with a price over $2,000:
$x('//div[@class="row"]/p[1][number(substring-after(text(), "$")) > 2000]')
Wow, let's see what we are doing there:
There is one more feature that makes XPath a powerful tool. Unlike CSS selectors, you can select the parent element with XPath using .. (two dots).
If we want to return the entire main div of the product with a price over $2,000, we can use the following:
$x('//div[@class="row"]/p[1][number(substring-after(text(), "$")) > 2000]/../..')
How do we use XPath expressions in Puppeteer? You already know how to do it: We have a $x function.
Let's go back to our test: We want to test that the Macbook Pro 13.3' Retina MF841LL/A has 15 items left in stock, and the price is $1,199.
What if the only way to find that product would be with the product name? We could do something like this:
const productName = config.productToTestName;
const productDiv = (await this.page.$x(`//a[text()="${productName}"]/../..`))[0];
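// These are XPath expressions, so we need $x; the leading dot keeps the search inside productDiv.
const stockElement = (await productDiv.$x('.//h6'))[0];
const priceElement = (await productDiv.$x('.//div[@class="row"]/p[1]'))[0];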
Remember that $x returns an array of elements. In this case, as we know that they will always return one element, we take the first one.
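One detail to watch when building XPath expressions from test data is quoting. XPath 1.0 has no escape character inside string literals, so a name containing both single and double quotes needs the concat function. The following sketch shows one way to handle that; xpathLiteral and productXPath are hypothetical helper names.

```javascript
// Turns any text into a valid XPath 1.0 string literal.
function xpathLiteral(text) {
  if (!text.includes('"')) return `"${text}"`;
  if (!text.includes("'")) return `'${text}'`;
  // Both kinds of quotes are present: split on the double quotes and
  // glue the pieces back together with concat().
  const parts = text.split('"').map((part) => `"${part}"`);
  return `concat(${parts.join(`, '"', `)})`;
}

// Builds the expression used above to find the product container by name.
function productXPath(productName) {
  return `//a[text()=${xpathLiteral(productName)}]/../..`;
}
```

The product name used in this chapter contains a single quote, which is safe inside a double-quoted literal, but a name with a double quote would break the hand-written template string.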
In the same way that we shouldn't rely on design classes for CSS selectors, we should try not to rely too much on the HTML structure in XPath selectors. We are assuming a couple of things in this code:
If the design team decides that the stock will look better using div instead of h6, or wraps the price inside a div element to improve mobile navigation, your test will break.
We learned how to get elements from the page, but it's important to know that the $, $$, and $x functions don't return an element from the DOM. They return something called element handles.
Element handles are a reference to a DOM element on the page. They are a pointer that helps Puppeteer send commands to the browser, referencing an existing DOM element. They are also one of the ways we have to interact with those elements.
Let's go back to our login test. We already have the three elements we need: The user input, the password input, and the login button. Now we need to enter the email and the password and click on the button.
The ElementHandle class has a function called type. The signature is type(text, [options]). The options object is not big this time. It only has a delay property. The delay is the number of milliseconds Puppeteer will wait between key presses. This is great to emulate real user interaction.
The first part of our test would look like this:
const emailInput = await this.page.$('#email');
await emailInput.type(user, {delay: 100});
const passwordInput = await this.page.$('#password');
await passwordInput.type(password, {delay: 100});
Here, we are looking for the email and password elements, and then emulating a user typing on those inputs.
Now, we need to click on the button.
The ElementHandle class also has a function called click. I bet you are already getting the pattern. The signature is click([options]). You can simply call click(), and that would do the job. But we can also use the three available options:
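As a minimal sketch, these are the click options as documented for the Puppeteer versions this chapter targets: button, clickCount, and delay. Newer Puppeteer releases may prefer different option names, so check the documentation of the version you use. The three function names below are hypothetical.

```javascript
// `handle` is expected to be a Puppeteer ElementHandle.

// clickCount: the number of clicks to emulate (2 for a double-click).
async function doubleClick(handle) {
  await handle.click({ clickCount: 2 });
}

// button: 'left' (the default), 'right', or 'middle'.
async function rightClick(handle) {
  await handle.click({ button: 'right' });
}

// delay: milliseconds to wait between the mouse down and the mouse up.
async function slowClick(handle, milliseconds = 100) {
  await handle.click({ delay: milliseconds });
}
```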
In our case, we don't need to use these options:
const loginBtn = await this.page.$('#login-form [type=submit]');
await loginBtn.click();
With these two lines, we can finally finish our login function. We find the login button and then we click on it.
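Putting the pieces together, a login function could look like the sketch below. The constructor shape and the find helper are assumptions made for this illustration and may differ from the LoginPageModel class in the repository.

```javascript
// A sketch of a page model; the real class in the repository may differ.
class LoginPageModel {
  constructor(page) {
    // `page` is expected to be a Puppeteer Page showing the login form.
    this.page = page;
  }

  // Hypothetical helper: fails with a clear message instead of a null error.
  async find(selector) {
    const element = await this.page.$(selector);
    if (element === null) throw new Error(`No element matches ${selector}`);
    return element;
  }

  async login(user, password) {
    const emailInput = await this.find('#email');
    await emailInput.type(user, { delay: 100 });
    const passwordInput = await this.find('#password');
    await passwordInput.type(password, { delay: 100 });
    const loginBtn = await this.find('#login-form [type=submit]');
    await loginBtn.click();
  }
}
```

Checking for null right after $ makes a broken selector easier to diagnose than a failure inside type or click.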
The site now has a drop-down list, a SELECT element in HTML, to switch between the grid and the list view:
As you might have guessed, the function to select an option is called select, and the signature is select(...values). It accepts a list of values because a select element with the multiple attribute can have several options selected at once.
The next thing we need to know about this function is that the value select expects is not the text you see in the option, but the value of the option. We can see that by inspecting the element:
In this case, we are lucky as the value is almost the same as the visible text, but it's not the same. If we want to select the Grid item, we need to use grid, instead of Grid.
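If your test only knows the visible text, one possible approach is to look up the matching option first and then select its value. This is a sketch; selectByText is a hypothetical helper and not a Puppeteer function.

```javascript
// `page` is expected to be a Puppeteer Page.
async function selectByText(page, selector, visibleText) {
  // $$eval runs the callback in the browser with the array of matching elements.
  const value = await page.$$eval(
    `${selector} option`,
    (options, text) => {
      const match = options.find((option) => option.textContent.trim() === text);
      return match ? match.value : null;
    },
    visibleText
  );
  if (value === null) {
    throw new Error(`No option with the text "${visibleText}" in ${selector}`);
  }
  await page.select(selector, value);
  return value;
}
```

With the drop-down list of this section, selectByText(page, '#viewMode', 'Grid') would end up selecting the grid value.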
If we switch the option to list mode, we can see that a list-group-item class is added to the elements:
This is how we can test this functionality:
const switchSelect = await page.$('#viewMode');
await switchSelect.select('list');
expect(await page.$$('.list-group-item')).not.to.be.empty;
await switchSelect.select('grid');
expect(await page.$$('.list-group-item')).to.be.empty;
Using await and page.$ every time we need to interact with an element requires a lot of boilerplate. Imagine if we had eight inputs to fill; that would be a lot. That's why both Page and Frame (if you are dealing with child frames) have most of the functions an element handle has, but they expect a selector as a first argument.
So, say we have this piece of code:
const switchSelect = await page.$('#viewMode');
await switchSelect.select('list');
It could be as simple as this:
await page.select('#viewMode', 'list');
You will find functions such as page.click(selector, [options]), page.type(selector, text, [options]), and many other interaction functions.
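These page-level functions make it easy to fill a form with many inputs in a loop. Here is a minimal sketch; fillForm is a hypothetical helper name.

```javascript
// `page` is expected to be a Puppeteer Page.
// `fields` maps a CSS selector to the text to type into that element.
async function fillForm(page, fields) {
  for (const [selector, text] of Object.entries(fields)) {
    // page.type finds the element and types into it in a single call.
    await page.type(selector, text);
  }
}
```

For the login page, this could be called as fillForm(page, {'#email': user, '#password': password}), followed by page.click('#login-form [type=submit]').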
We have covered the most common user interactions. But we can go a little deeper and try to emulate how the user would interact with the page using their keyboard and mouse.
Although you will be able to test the most common scenarios by typing or clicking on elements, there are other scenarios where you would need to emulate how the users interact with a site using the keyboard and the mouse. Let's take, for instance, a Google spreadsheet:
The Google spreadsheet page has a lot of keyboard and mouse interactions. You can move through the cells using your keyboard arrows or copy values by doing drag and drop with the mouse.
But it doesn't need to be that complicated. Let's say that you work in the QA team at GitHub.com, and you need to test the search box from the home page.
As GitHub.com is for developers, and developers for some weird reason hate using the mouse, the development team added many shortcuts on the site. We want to create a test to check that those shortcuts are working as expected:
As we can see there, the shortcut to the search input is a /. So, we need to do the following:
We are going to use the Keyboard class that the Page class exposes as a property.
The first step is to press slash. To do that, we are going to use, you guessed it, the press function. The signature is press(key, options). The first thing we need to know about press is that it's a shortcut to two other functions – down(key, options) and up(key). As you can see, you can get an almost complete keyboard emulation.
Notice that the first argument is not text but key. You will find the full list of supported keys here: https://www.hardkoded.com/ui-testing-with-puppeteer/USKeyboardLayout. There, you will find keys such as Enter, Backspace, or Shift. The press function has two options available: First, if you assign the text property, Puppeteer will create an input event with that value. It would work like a macro. For instance, if the key is p and the text is puppeteer, when you would press p, you would get puppeteer in the input element. I've never found a usage for that argument, but it's there. The down function also has this option. The second option is delay, which is the time between the key down and the key up actions.
The official Puppeteer documentation (https://www.hardkoded.com/ui-testing-with-puppeteer/keyboard) has a perfect example for this:
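// Type the text; the cursor ends up after the exclamation mark
await page.keyboard.type('Hello World!');
// Move the cursor one position left, just before the exclamation mark
await page.keyboard.press('ArrowLeft');
// Hold Shift so the next arrow presses extend a selection
await page.keyboard.down('Shift');
for (let i = 0; i < ' World'.length; i++) {
  await page.keyboard.press('ArrowLeft');
}
await page.keyboard.up('Shift');
// Delete the selected ' World'; the input now reads 'Hello!'
await page.keyboard.press('Backspace');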
Let's unpack this code:
Now we can go and test the GitHub.com home page:
const browser = await puppeteer.launch({headless: false, defaultViewport: null});
const page = await browser.newPage();
await page.goto('https://www.github.com/');
await page.keyboard.press('Slash');
await page.keyboard.type('puppeteer');
await page.keyboard.press('Enter');
If we go back to our login example, we could test that you should be able to log in by pressing Enter instead of clicking on the login button. Or if the navigation between controls is important, you can jump from the user input to the password and then to the login button by pressing Tab.
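A keyboard-only login could be sketched as follows. The loginWithKeyboard function is a hypothetical helper, and the field order (user, then password, then submit on Enter) is an assumption about the login page rather than something Puppeteer dictates.

```javascript
// Hypothetical helper: logs in using only the keyboard. The selector and
// the field order (user, then password, then submit on Enter) are
// assumptions about the page under test.
async function loginWithKeyboard(page, userSelector, user, password) {
  await page.focus(userSelector);      // put the cursor in the user input
  await page.keyboard.type(user);
  await page.keyboard.press('Tab');    // jump to the password input
  await page.keyboard.type(password);
  await page.keyboard.press('Enter');  // submit without touching the mouse
}

// Against a real page (the selector and credentials are placeholders):
//   await loginWithKeyboard(page, '#email', 'user@mail.com', 'password');
```

If the Tab order of the page changes, this test fails, which is exactly what we want when keyboard navigation is part of the requirements.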
Do you want to play tic-tac-toe? Let's play it using the mouse.
In the Chapter4 folder, you will find a tictactoe.html file with a small tic-tac-toe game made in React:
If we consider the page as a canvas, where the top-left corner of the window is the coordinate (0;0) and the bottom right is the coordinate (window width, window height), mouse interaction is about moving the mouse to an (X;Y) coordinate and clicking using one of the mouse buttons. Puppeteer offers the following functionalities.
Move the mouse using mouse.move(x, y, [options]). The only option available in this move function is steps. With steps, you can tell Puppeteer how many times you want to send mousemove events to the page. By default, it will send only one event at the end of the mouse move action.
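To picture what steps does, the following sketch lists the positions a move would report when it is split into evenly spaced events. The movePoints function is a hypothetical illustration, and the linear spacing is an assumption; it does not call Puppeteer at all.

```javascript
// Illustration only: lists the points a move from `from` to `to` would pass
// through when split into `steps` evenly spaced mousemove events.
// Linear interpolation is an assumption about how the steps are spaced.
function movePoints(from, to, steps = 1) {
  const points = [];
  for (let i = 1; i <= steps; i++) {
    points.push({
      x: from.x + (to.x - from.x) * (i / steps),
      y: from.y + (to.y - from.y) * (i / steps),
    });
  }
  return points;
}

// One step: only the final position is reported.
console.log(movePoints({ x: 0, y: 0 }, { x: 100, y: 50 }));
// Four steps: three intermediate positions plus the final one.
console.log(movePoints({ x: 0, y: 0 }, { x: 100, y: 50 }, 4));
```

With a real page, the equivalent call would be await page.mouse.move(100, 50, { steps: 4 }). More steps make the movement look more like a human hand, which matters for components that react to mousemove events along the way.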
Just as the keyboard has the up, down, and press functions, the mouse has up, down, and click.
The mouse has one extra action that the keyboard doesn't have: wheel. You can emulate mouse scrolling using mouse.wheel([options]). The options object has two properties, deltaX and deltaY, which can be positive or negative scroll values expressed in CSS pixels.
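As a sketch of how these low-level functions combine, here is a drag built from move, down, and up. The drag function is a hypothetical helper; it assumes an object shaped like page.mouse, with async move(x, y, options), down(), and up() functions.

```javascript
// Hypothetical helper: drags from one point to another using the low-level
// mouse functions. `mouse` is assumed to be page.mouse or any object with
// async move(x, y, options), down(), and up().
async function drag(mouse, from, to, steps = 10) {
  await mouse.move(from.x, from.y);
  await mouse.down();                       // press the left button (the default)
  await mouse.move(to.x, to.y, { steps });  // send intermediate mousemove events
  await mouse.up();                         // release the button to drop
}

// Against a real page (the coordinates are placeholders):
//   await drag(page.mouse, { x: 40, y: 40 }, { x: 200, y: 120 });
//   await page.mouse.wheel({ deltaY: 300 }); // scroll down 300 CSS pixels
```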
Let's go back to our tic-tac-toe game. We will do a simple test: Player 1 will use the first row and player 2 will use the second row, so player 1 will win after three moves. As this is a canvas, we need to know which coordinates we need to click.
We can use the style section of the developer tools to get those coordinates. If we look at the body, we will see a 20-pixel margin that will make (20;20) the starting point:
We also know that each square is 32 px by 32 px, so the middle of the square should be delta + (32 / 2). Let's test it:
const startingX = 20;
const startingY = 20;
const boxMiddle = 16;
// Player 1 (X), turn 1: first row, first column
await page.mouse.click(startingX + boxMiddle, startingY + boxMiddle);
// Player 2, turn 1: second row, first column
await page.mouse.click(startingX + boxMiddle, startingY + boxMiddle * 3);
// Player 1 (X), turn 2: first row, second column
await page.mouse.click(startingX + boxMiddle * 3, startingY + boxMiddle);
// Player 2, turn 2: second row, second column
await page.mouse.click(startingX + boxMiddle * 3, startingY + boxMiddle * 3);
// Player 1 (X), turn 3: first row, third column
await page.mouse.click(startingX + boxMiddle * 5, startingY + boxMiddle);
expect(await page.$eval('#status', status => status.innerHTML)).to.equal('Winner: X');
So, here we know that the tic-tac-toe grid starts at the coordinate (20;20), and from there it is simple math to find the right coordinates in our canvas. The first box will be clicked at the coordinate (startingX + boxMiddle; startingY + boxMiddle). If we want to click on the second row, the Y coordinate would be three half-squares down, startingY + boxMiddle * 3; the second column works the same way with startingX + boxMiddle * 3. We keep going like that until we have a winner.
Don't worry about the last $eval. We'll get there.
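The coordinate math can also be wrapped in a small function so that the test talks about rows and columns instead of pixels. This is a minimal sketch: cellCenter is a hypothetical helper, and the 20-pixel margin and 32-pixel squares are the values we read from the developer tools above.

```javascript
// Values taken from the text: 20 px body margin, 32 px squares.
const startingX = 20;
const startingY = 20;
const boxSize = 32;

// Hypothetical helper: center of the square at a zero-based row and column.
function cellCenter(row, col) {
  return {
    x: startingX + boxSize * col + boxSize / 2,
    y: startingY + boxSize * row + boxSize / 2,
  };
}

console.log(cellCenter(0, 0)); // { x: 36, y: 36 }
console.log(cellCenter(1, 0)); // { x: 36, y: 68 }
console.log(cellCenter(0, 2)); // { x: 100, y: 36 }
```

In a test, a move would then read const { x, y } = cellCenter(1, 1); followed by await page.mouse.click(x, y);. If the margin or the square size ever changes, only the three constants need updating.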
But this is not just for games. Many modern UIs might require some mouse interactions, for instance, hoverable dropdowns or menus. We can see one example on the W3Schools site (https://www.w3schools.com/howto/howto_css_dropdown.asp):
To be able to click on any item in that dropdown, we need to hover first on the button and then click on the option:
await page.goto("https://www.w3schools.com/howto/howto_css_dropdown.asp");
const btn = await page.$(".dropbtn");
const box = await btn.boundingBox();
await page.mouse.move(box.x + (box.width / 2), box.y + (box.height / 2));
const option = (await page.$x('//*[text()="Link 2"]'))[0];
await option.click();
As you can see, we don't need to guess the Hover me button's location. The element handle provides a function called boundingBox, which returns the position (x and y) and the element's size (width and height).
Is there an easier way? Yes, we can simply use await btn.hover(), which would hover on the element. I wanted to give you a complete example because sometimes UI components are quite sensitive to the mouse position, so you need to put the mouse in a precise location to get the desired result.
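When you do need the precise version, the center calculation can be pulled into a reusable function. This is a sketch under a few assumptions: centerOf and moveToCenter are hypothetical helpers, and the null check reflects the fact that boundingBox resolves to null when the element is not visible.

```javascript
// Hypothetical helper: center point of a bounding box
// ({ x, y, width, height }), the shape that boundingBox() resolves to.
function centerOf(box) {
  return { x: box.x + box.width / 2, y: box.y + box.height / 2 };
}

// Hypothetical helper: moves the mouse to the center of an element.
// boundingBox() resolves to null when the element is not visible,
// so fail with a clear message instead of a TypeError.
async function moveToCenter(page, handle) {
  const box = await handle.boundingBox();
  if (box === null) {
    throw new Error('Element is not visible, so it has no bounding box');
  }
  const { x, y } = centerOf(box);
  await page.mouse.move(x, y);
}

// Against a real page:
//   const btn = await page.$('.dropbtn');
//   await moveToCenter(page, btn);
```

From here it is easy to add an offset to centerOf for components that only react when the mouse is near an edge or a corner.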
Time for a bonus track. Let's talk about debugging.
Many developers consider debugging a last resort. Others would flood their code with console.log messages. I consider debugging a productivity tool.
Debugging is trying to find bugs by running an application step by step.
We have two ways of launching our tests in debug mode. The first option is creating a JavaScript debug terminal from the Terminal tab. That will create a new terminal as we did before, but in this case, Visual Studio Code will enable the debugger when you run a command from that terminal:
The second option is going to the Run tab and creating a launch.json file. You could also create that file manually inside the .vscode folder:
Once we have the file, we can create a new configuration so that we can run npm run test in the terminal:
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Test",
      "request": "launch",
      "runtimeArgs": [
        "run",
        "test"
      ],
      "runtimeExecutable": "npm",
      "skipFiles": [
        "<node_internals>/**"
      ],
      "type": "pwa-node"
    }
  ]
}
Which one is the best? Well, if you are going to work on this project for many days, creating the launch.json file is more productive; once it is created, you just need to hit F5 to be in debug mode. The terminal option is the quicker way to get started.
Once you have everything set up, it is about creating breakpoints on the lines where you want the debugger to stop and, from there, taking advantage of all the tools Visual Studio Code offers:
There you will find the following:
This chapter was massive. We began the chapter with a brief but complete introduction to HTML, the DOM, and CSS. These concepts are crucial to create top-notch tests. Then, we learned a lot about XPath, which is not a very popular tool, yet it is extremely powerful and will help you face scenarios where CSS selectors are not enough.
In the second part of this chapter, we went through the most common ways to interact with a page. Not only did we learn how to interact with elements but we also covered keyboard and mouse emulation.
I hope you enjoyed the tools section. Debugging with Visual Studio Code is a great tool to add to your toolbox.
In the next chapter, we are going to wait for stuff. Things take time on the web. Pages take time to load. Some actions on the page might trigger network calls. The next chapter is important because you will learn how to make your tests even more stable.