Playing with DOM elements

There is a lot we can do with the page we are accessing beyond getting the title of the document, with a little help from the Document Object Model (DOM) API. We are not going to discuss every object and function of the DOM API, but we will touch on some that are very useful. If you want to learn more about the DOM API, the best place to start is the Mozilla Developer Network: https://developer.mozilla.org/en-US/docs/DOM.

Selecting elements

Everything starts with the document object, and it contains nested elements. To select an element, we either traverse the entire document or use the DOM selectors. There are different methods to reference a document element, which can be done by element ID, class, name, tag, or XPath.

getElementById

This retrieves a single element by its unique ID.

getElementsByClassName

This selects all elements that have the given class name.

getElementsByName

This returns all elements whose name attribute matches the given name.

getElementsByTagName

This gets all elements with the given tag name.

querySelector

This returns the first element that matches a CSS selector.

Which function to use depends on the scenario and on how the document is structured. We can call these functions as we normally do in JavaScript code; however, because they must run within the context of the page, we need to wrap the code in the evaluate() function of the webpage module.

var content = page.evaluate(function () {
  var element = document.querySelector('#elem1');
  return element.textContent;
});
console.log(content);

In the preceding code, based on the fetched page, we select the element with the ID elem1, extract the text content, and return it to display the data in our console.
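Note that querySelector returns null when nothing matches, so reading textContent directly would throw if the element were missing. The following is a minimal sketch of a null-safe lookup. The helper name textOf and the fakeDocument stand-in are hypothetical and exist only so that the snippet runs outside a page; in a PhantomJS script, the helper would have to be defined inside the function passed to evaluate() and called with the real document.

```javascript
// Hypothetical helper: trimmed text of the first element matching `selector`,
// or null when nothing matches.
function textOf(root, selector) {
  var element = root.querySelector(selector);
  return element ? element.textContent.trim() : null;
}

// Minimal stand-in for `document`, used only so the sketch runs outside a page.
var fakeDocument = {
  querySelector: function (selector) {
    return selector === '#elem1' ? { textContent: '  Hello  ' } : null;
  }
};

console.log(textOf(fakeDocument, '#elem1'));   // prints "Hello"
console.log(textOf(fakeDocument, '#missing')); // prints "null"
```

Returning null instead of throwing lets the calling script decide how to report a missing element.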

Let's look at a real-world example.

Pinterest is a social media website that allows you to share photos in a pinboard style. Each Pinterest user can have several boards and photo postings, and can follow other users. On a user's public Pinterest profile, we can see how many photos are pinned, the number of boards created, the total number of followers, and how many users are being followed.

We want our script to accept one parameter, which is the Pinterest user ID. The first section of our script accepts that parameter. We use the system module to retrieve the parameter from the command line.

var system = require('system');
var userid = system.args[1];
var page = require('webpage').create();
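If the script is started without a user ID, system.args[1] is undefined and the profile URL would be built incorrectly. The following sketch shows one way to guard against that. The helper name getUserId and the script name pinterest.js are assumptions made for illustration; the only behavior relied on is that index 0 of system.args holds the script name.

```javascript
// Hypothetical helper: picks the first command-line argument after the script name.
// `args` is expected to look like PhantomJS's system.args, where index 0 is the script name.
function getUserId(args) {
  if (!args || args.length < 2) {
    return null;
  }
  var candidate = String(args[1]).trim();
  return candidate.length > 0 ? candidate : null;
}

console.log(getUserId(['pinterest.js', 'iamaries'])); // prints "iamaries"
console.log(getUserId(['pinterest.js']));             // prints "null"
```

In the actual script, we could call getUserId(system.args) and, when it returns null, print a usage message and call phantom.exit(1) before opening any page.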

Next, we open the user's page based on Pinterest's user profile URL at http://www.pinterest.com/userid.

We create a page object from the webpage module, then use open() to navigate to that page.

var profileUrl = "http://www.pinterest.com/" + userid;
page.open(profileUrl, function(status) {

After the page is loaded, we now extract the information from the page. Let's get the number of pins first. We can use querySelector to select a certain part of the page and retrieve the content. However, from what we have learned previously, to retrieve and use DOM functions, we need to be in the context of the page, and use the evaluate() function of webpage.

But wait; how do we find the element ID or selector that we will pass to querySelector? The answer is to inspect the page's HTML source. In most modern browsers, this is easily done.

In a Safari browser, simply right-click on a target element or text and select Inspect Element. It will show the ID or any unique identifier and will display the reference in the code.

Based on the inspected DOM element, we can decide which selector to use. On Pinterest's profile layout, each statistic is rendered as a clickable link, that is, an <a> element with an href attribute. With this information, we can use querySelector and look up the element by its href attribute. The format is as follows:

document.querySelector('[href="URL"]');

In the preceding example, we replace URL with the actual href value found in the web page. If we go back to our browser's inspector view and inspect the pins' text content, it will show us the URL value (see the following screenshot for more details):

So, we can now use querySelector('[href="/iamaries/pins/"]') to get the number of pins for this user.

if ( status === "success" ) {
  var pinterest = page.evaluate(function () { 
    var numberPins = document.querySelector('[href="/iamaries/pins/"]').innerText.trim();
    
    return {
      pins: numberPins
    };
  });

With this code, we can get the number of pins, but only for one specific user, because the user ID is hardcoded in the href value. Let's make the code more flexible by using the user ID passed on the command line. First, we pass the userid variable that we declared earlier as the second argument of page.evaluate(), as shown in the following code:

page.evaluate(function (uid) {
  // code content
}, userid);

Any arguments after the first are values that PhantomJS passes into the page context. This is necessary because code running inside evaluate() cannot see variables declared in our script; the page context is isolated from it. Notice that our callback function now declares a uid parameter, which receives the value of userid. Each extra argument passed to evaluate() maps, in order, to a parameter of the callback function.
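To see why values must be passed explicitly, it can help to imitate the isolation in plain JavaScript. The following is only a rough simulation, not how PhantomJS is actually implemented: the hypothetical simulateEvaluate function rebuilds the callback from its source text, so it loses access to the surrounding variables, and copies arguments and the return value through JSON.

```javascript
// Rough simulation (an assumption for illustration, not PhantomJS internals):
// the callback is rebuilt from its source text and data crosses the boundary as JSON.
function simulateEvaluate(fn) {
  var args = Array.prototype.slice.call(arguments, 1);
  var clonedArgs = JSON.parse(JSON.stringify(args));
  // Rebuilding from source drops any link to variables in the enclosing scope.
  var rebuilt = new Function('return (' + fn.toString() + ');')();
  var result = rebuilt.apply(null, clonedArgs);
  return result === undefined ? undefined : JSON.parse(JSON.stringify(result));
}

function demo() {
  var localUserId = 'iamaries';
  return {
    // The rebuilt callback cannot see localUserId, so typeof reports 'undefined'.
    viaClosure: simulateEvaluate(function () { return typeof localUserId; }),
    // Passing the value as an extra argument makes it available as a parameter.
    viaArgument: simulateEvaluate(function (uid) { return uid; }, localUserId)
  };
}

var outcome = demo();
console.log(outcome.viaClosure);  // prints "undefined"
console.log(outcome.viaArgument); // prints "iamaries"
```

The same reasoning explains why only JSON-serializable values, and not functions or DOM nodes, are safe to pass in or return.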

Applying this to our code gives the following:

if ( status === "success" ) {
  var pinterest = page.evaluate(function (uid) {
    var numberPins = document.querySelector('[href="/' + uid + '/pins/"]').innerText.trim();
  
    return {
      pins: numberPins
    };
  }, userid);

We have replaced the hardcoded user ID with a parameterized one, so we can now construct the selector for any user. There are other items that we can retrieve and display, such as the number of boards, the number of pins that the user likes, the number of followers, and the number of users that the user follows. Let's complete our Pinterest code.

var system = require('system');
var userid = system.args[1];
var page = require('webpage').create();

var profileUrl = "http://www.pinterest.com/" + userid;
page.open(profileUrl, function(status) {
  
  if ( status === "success" ) {
    var pinterest = page.evaluate(function (uid) {
      var numberPins = document.querySelector('[href="/' + uid + '/pins/"]').innerText.trim();
      var numberFollowers = document.querySelector('[href="/' + uid + '/followers/"]').innerText.trim();
      var numberFollowing = document.querySelector('[href="/' + uid + '/following/"]').innerText.trim();
      var numberBoards = document.querySelector('[href="/' + uid + '/boards/"]').innerText.trim();
      var numberLikes = document.querySelector('[href="/' + uid + '/likes/"]').innerText.trim();
      var userName = document.getElementsByClassName("userProfileHeaderName").item(0).innerText.trim();
      
      return {
        name: userName,
        social: {
          followers: numberFollowers,
          following: numberFollowing
        },
        stats: {
          boards: numberBoards,
          pins: numberPins,
          likes: numberLikes
        }
      };
    }, userid);
    
    console.log(pinterest.name + ' has ' + pinterest.stats.pins + 
                ', ' + pinterest.stats.boards + 
                ', ' + pinterest.stats.likes + 
                ' with ' + pinterest.social.followers + 
                ' and ' + pinterest.social.following + ' Awesome Users.');
  }
  
  phantom.exit(0);
});
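The five href lookups in the script differ only in the last path segment, so the selector string could be built by a small helper. The following is a sketch under that assumption; the name statSelector is hypothetical, and in a PhantomJS script the helper would need to be defined inside the function passed to evaluate(), because functions from the outer script are not visible in the page context.

```javascript
// Hypothetical helper: builds the attribute selector for one profile statistic,
// following the '/<user>/<section>/' href pattern used in the script.
function statSelector(uid, section) {
  return '[href="/' + uid + '/' + section + '/"]';
}

var sections = ['pins', 'followers', 'following', 'boards', 'likes'];
var selectors = {};
sections.forEach(function (section) {
  selectors[section] = statSelector('iamaries', section);
});

console.log(selectors.pins); // prints [href="/iamaries/pins/"]
```

Keeping the pattern in one place means that a change in the site's URL scheme would require only one edit.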

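Since there is a lot of data to be returned, it is better to return it as a single object. The return value of evaluate() must be JSON-serializable, so a plain object literal with nested properties works well, as shown in the following code: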

return {
  name: userName,
  social: {
    followers: numberFollowers,
    following: numberFollowing
  },
  stats: {
    boards: numberBoards,
    pins: numberPins,
    likes: numberLikes
  }
};

Now, let's use that object and display the output:

console.log(pinterest.name + ' has ' + pinterest.stats.pins + 
            ', ' + pinterest.stats.boards + 
            ', ' + pinterest.stats.likes + 
            ' with ' + pinterest.social.followers + 
            ' and ' + pinterest.social.following + 
            ' Awesome Users.');
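The same message can be built by a small function, which makes the formatting easy to try without loading a page. This is only a sketch: the function name formatSummary is hypothetical, and the sample values are made up, assuming each extracted text already includes its label (for example, "42 Pins").

```javascript
// Hypothetical helper: formats the object returned from evaluate() into one line.
function formatSummary(profile) {
  return profile.name + ' has ' + profile.stats.pins +
         ', ' + profile.stats.boards +
         ', ' + profile.stats.likes +
         ' with ' + profile.social.followers +
         ' and ' + profile.social.following + ' Awesome Users.';
}

// Made-up sample data with the same shape as the returned object.
var sample = {
  name: 'Sample User',
  social: { followers: '10 Followers', following: '5 Following' },
  stats: { boards: '3 Boards', pins: '42 Pins', likes: '7 Likes' }
};

console.log(formatSummary(sample));
// prints "Sample User has 42 Pins, 3 Boards, 7 Likes with 10 Followers and 5 Following Awesome Users."
```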

Our code is now complete. Let's try it by passing pinterest as our target user ID.

Using PhantomJS, we can extract certain information and process it, as demonstrated previously. Within the page context, we can do anything that the DOM API allows: we can modify the HTML code, content, and attributes, and even change the CSS styling. Let's explore more and do some page interactions.
