Formal inspection

In the previous section we used the URL http://search.debian.org/cgi-bin/omega and the dictionary data_dict = {'P': 'Python'}. But where did these come from?

We get these by visiting the web page containing the form we would submit to get the results manually. We then inspect the HTML source code of the web page. If we were carrying out the aforementioned search in a web browser, then we would most likely be on the http://www.debian.org page, and we would be running a search by typing our search term into the search box at the top right corner and then clicking on Search.

Most modern browsers allow you to directly inspect the source for any element on a page. To do this, right-click the element, which in this case is the search box, and then select the Inspect Element option, as shown in the screenshot here:

[Screenshot: the browser's Inspect Element view of the search box]

The source code will pop up in a section of the window. In the preceding screenshot, it's at the bottom left corner of the screen. Here, you will see some lines of code that look like the following example:

<form action="http://search.debian.org/cgi-bin/omega"
method="get" name="P">
  <p>
    <input type="hidden" value="en" name="DB"></input>
    <input size="27" value="" name="P"></input>
    <input type="submit" value="Search"></input>
  </p>
</form>

You should see the second <input> highlighted. This is the tag that corresponds to the search text box. The value of the name attribute on the highlighted <input> tag is the key that we use in our data_dict, which in this case is P. The value in our data_dict is the term that we want to search for.
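As a small sketch of how the form fields map onto the dictionary, the standard library's urllib.parse.urlencode turns data_dict into the name=value string that is sent to the server. Including the hidden DB field in the second call is an assumption made for illustration; the dictionary used in the previous section contains only P.

```python
from urllib.parse import urlencode

# The key is the name attribute of the search <input>;
# the value is the term we want to search for.
data_dict = {'P': 'Python'}
encoded = urlencode(data_dict)
print(encoded)  # P=Python

# Assumption: a browser would also send the hidden DB field from the form.
with_hidden = urlencode({'DB': 'en', 'P': 'Python'})
print(with_hidden)  # DB=en&P=Python
```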

To get the URL, we need to look above the highlighted <input> for the enclosing <form> tag. Here, our URL is the value of the action attribute, http://search.debian.org/cgi-bin/omega. The source code for this web page is included in the source code download for this book, in case Debian changes their website before you read this.

This process can be applied to most HTML pages. To do this, look for the <input> corresponding to the input text box, then find the URL from the enclosing <form> tag. If you're not familiar with HTML, then this can be a bit of a trial-and-error process. We'll be looking at some more methods of parsing HTML in the next chapter.
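The same lookup can be roughly automated. The following is a minimal sketch that uses the standard library's html.parser module, not necessarily the approach taken in the next chapter. The FormInspector class and its attribute names are hypothetical, introduced only for this illustration, and the HTML string is the form shown earlier.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Collect each form's action URL and the names of its <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []       # one dict per <form> encountered
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'form':
            self._current = {'action': attrs.get('action'),
                             'method': attrs.get('method', 'get'),
                             'inputs': []}
            self.forms.append(self._current)
        elif tag == 'input' and self._current is not None:
            # Inputs without a name attribute (such as the submit
            # button here) are skipped.
            if 'name' in attrs:
                self._current['inputs'].append(attrs['name'])

    def handle_endtag(self, tag):
        if tag == 'form':
            self._current = None


html = '''
<form action="http://search.debian.org/cgi-bin/omega"
method="get" name="P">
  <p>
    <input type="hidden" value="en" name="DB"></input>
    <input size="27" value="" name="P"></input>
    <input type="submit" value="Search"></input>
  </p>
</form>
'''

inspector = FormInspector()
inspector.feed(html)
form = inspector.forms[0]
print(form['action'])  # http://search.debian.org/cgi-bin/omega
print(form['inputs'])  # ['DB', 'P']
```

On a real page there may be several forms, so you would still need to pick the one whose inputs include the search box.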

Once we have our input name and URL, we can construct and submit the POST request, as shown in the previous section.
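As a reminder of what that looks like, here is a minimal sketch that assumes the standard library's urllib is used to build the request. It only constructs the request object; the call that would actually send it is left out so that the example runs without network access.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = 'http://search.debian.org/cgi-bin/omega'
data_dict = {'P': 'Python'}

# Supplying a bytes body makes urllib treat the request as a POST.
data = urlencode(data_dict).encode('utf-8')
req = Request(url, data=data)
print(req.get_method())  # POST

# Passing req to urllib.request.urlopen() would submit it.
```

Note that the form shown earlier declares method="get", so a browser would place the same encoded pairs in the URL's query string rather than in the request body.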
