Looking for values over an intranet or Internet

This example is similar to the previous one; the difference is that you have to look up the museum opening hours on a web page instead of through a web service. In this case, you will use the HTTP Client step.

Getting ready

You must have a database with the museum structure shown in the Appendix, Data Structures, and a web page that provides the museum opening hours. The recipe uses an ASP page named hours.asp, but you can use the language of your preference. The page receives the museum's identifier as a parameter and returns a string with the schedule. You can download a sample web page from the book's website.
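If you do not have an ASP server at hand, a stand-in page can be written in a few lines. The following is only a minimal sketch in Python, not the sample page from the book's website: the SCHEDULES table, its values, the hours_for function, and port 8000 are all hypothetical placeholders chosen for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Placeholder schedules keyed by museum id (made-up sample values).
SCHEDULES = {"25": "Mon-Fri 09:00-17:00"}


def hours_for(query_string):
    """Return (status_code, body) for a raw query string such as 'id_museum=25'."""
    ids = parse_qs(query_string).get("id_museum")
    if not ids:
        return 400, "missing id_museum"
    schedule = SCHEDULES.get(ids[0])
    if schedule is None:
        return 404, "unknown museum"
    return 200, schedule


class HoursHandler(BaseHTTPRequestHandler):
    """Answers GET requests with the schedule as plain text."""

    def do_GET(self):
        status, body = hours_for(urlparse(self.path).query)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the stand-in page; blocks until interrupted."""
    HTTPServer(("localhost", port), HoursHandler).serve_forever()
```

Calling serve() would make the page reachable at http://localhost:8000/ with any path; that is the address you would then type in the URL field of the step instead of the hours.asp address.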

How to do it...

Carry out the following steps:

  1. Create a new transformation.
  2. Drop a Table input step onto the canvas to obtain the museum's information. Use the following SQL statement:
    SELECT id_museum
    , name
    , city
    , country
    FROM museums
    JOIN cities
    ON museums.id_city=cities.id_city
    
  3. Add an HTTP Client step from the Lookup category.
  4. Double-click on the step. In the URL field under the General tab, type the HTTP address of the web page that provides the opening hours. For example: http://localhost/museum/hours.asp.
  5. Set the Result fieldname textbox to Hours.
  6. In the HTTP status code fieldname, type status.

    Note

    Under the General tab, you can include authentication credentials for the web service and proxy information, if it is needed.

  7. Under the Fields tab, set the parameter that will be sent to the page as a GET parameter. Type id_museum in both the Name and Parameter columns.
  8. The result for the transformation will be the same as the one obtained in the previous recipe.
  9. Take a look at that recipe for a preview of the final results.

How it works...

The HTTP Client step looks for the museums' opening hours over the intranet; the step sends one request to the web page for each museum in the dataset. An example of such a request, with the parameter appended, would be the following:

http://localhost/museum/hours.asp?id_museum=25

The response of the page, which contains the museum opening hours, is then stored in the Hours field.
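The per-row behavior described above can be sketched outside Kettle as follows. This is an illustration of the idea in Python, not the step's actual implementation; the function names build_url, http_get, and lookup_hours are assumptions, while the URL and the Hours and status field names come from the recipe.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost/museum/hours.asp"  # example address from the recipe


def build_url(base_url, params):
    """Append the parameters to the URL as a GET query string."""
    return f"{base_url}?{urlencode(params)}"


def http_get(url):
    """Perform a GET request and return (status_code, body_text)."""
    try:
        with urlopen(url) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        # Error responses (4xx, 5xx) still carry a status code and a body.
        return err.code, err.read().decode("utf-8", errors="replace")


def lookup_hours(rows, fetch=http_get, base_url=BASE_URL):
    """Yield each row with the Hours and status fields added, one request per row."""
    for row in rows:
        url = build_url(base_url, {"id_museum": row["id_museum"]})
        status, body = fetch(url)
        yield {**row, "Hours": body, "status": status}
```

The fetch argument is there only so that the lookup can be tried with a fake function when no web page is available; by default it issues a real request for every row.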

The status field will hold the HTTP status code of the response. For example, a status code of 200 means a successful request, whereas a status code of 400 indicates a bad request. You can check the different status codes at the following URL:

http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
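Once the status field is in the stream, it can be used to separate the rows whose lookup succeeded from those that failed. The following Python sketch shows one way to interpret the codes; the helper names describe_status, is_success, and split_by_status are hypothetical, and treating the whole 2xx range as success is an assumption that goes slightly beyond the single 200 example given above.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the standard reason phrase for a status code, or 'Unknown'."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return "Unknown"


def is_success(code):
    """Treat every 2xx code as a successful request (an assumption)."""
    return 200 <= code <= 299


def split_by_status(rows):
    """Split rows into (succeeded, failed) according to their status field."""
    succeeded = [row for row in rows if is_success(row["status"])]
    failed = [row for row in rows if not is_success(row["status"])]
    return succeeded, failed
```

Inside a transformation, the equivalent would typically be a filtering step placed after the HTTP Client step that compares the status field against 200.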

There's more...

Suppose that each museum has a different website (and therefore a different URL) with a web page that provides its opening hours. In this case, you can store that specific URL as a new field in the museum dataset. Then, in the HTTP Client step, check the Accept URL from field? checkbox and select that field from the URL field name drop-down list.
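Conceptually, taking the URL from a field means that the base address is read from each row instead of being fixed for the whole transformation. The Python sketch below illustrates this under stated assumptions: the field name hours_url and the function request_url_for are hypothetical, and the sketch assumes that the id_museum parameter is still appended to each URL.

```python
from urllib.parse import urlencode


def request_url_for(row, url_field="hours_url"):
    """Build the request URL for one row, using the base URL stored in the row."""
    base = row[url_field]
    query = urlencode({"id_museum": row["id_museum"]})
    # Use '&' when the stored URL already carries a query string.
    separator = "&" if "?" in base else "?"
    return f"{base}{separator}{query}"
```

With this approach, two museums in the same dataset can be looked up on two entirely different sites, as long as each row carries its own address.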

Note

One alternative to this step is the HTTP Post Lookup step. Using this step, you connect to the website and pass the parameters through a POST method instead of a GET method.
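The difference between the two methods is where the parameters travel: with GET they are part of the URL, whereas with POST they are sent in the request body. A minimal Python sketch of the POST variant follows; build_post_request is a hypothetical helper, and the form-encoded body is an assumption about what the receiving page expects.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(url, params):
    """Build a request that sends the parameters in the body instead of the URL."""
    body = urlencode(params).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    # urllib switches the method to POST whenever a body is supplied.
    return Request(url, data=body, headers=headers)
```

The request object can be passed to urllib.request.urlopen to send it; note that the URL itself stays free of the id_museum parameter.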

See also

For an example that uses the HTTP Client step to get data from an Internet service, take a look at the sample transformation in the Introduction of Chapter 8, Integrating Kettle and the Pentaho Suite.
