Now that we have our web scraper written, we need a way to take the VideoGame
objects that are returned and actually store them in our database. Furthermore, we need a way to communicate with our server once it's up and running and tell it to scrape the site and insert the results into our JDO database. Our gateway for communicating with our server is what's called an HTTP servlet, something we briefly mentioned earlier in the book.
Setting up your backend in this way will be especially useful when we talk later about CRON jobs which, in order to automatically run some kind of function, require a servlet to communicate with (more on this soon). For now though, let's see how we can extend the HttpServlet
class and implement its doGet()
method, which will listen for and handle all HTTP GET requests sent to it. But first, what exactly is an HTTP GET request? Well, an HTTP web request is simply a user making a request to some server, sent over the network (that is, the Internet). Depending on the type of request, the server will then handle it and send an HTTP response back to the user. There are two common types of HTTP requests: GET requests, which append any parameters to the URL itself and are typically used to retrieve data from a server, and POST requests, which send parameters in the body of the request and are typically used to submit data to a server.
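To make the GET side of that distinction concrete, a GET request carries its parameters right in the URL's query string. Here's a minimal sketch of building such a URL with the standard `URLEncoder` class (the base URL and parameter values are just placeholders, not from our actual project):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class QueryStringDemo {

    // A GET request carries its parameters in the URL's query string,
    // so building one is just a matter of URL-encoding key=value pairs
    public static String buildUrl(String base, String key, String value)
            throws UnsupportedEncodingException {
        return base + "?" + URLEncoder.encode(key, "UTF-8")
                + "=" + URLEncoder.encode(value, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Spaces become "+" under application/x-www-form-urlencoded rules
        System.out.println(buildUrl(
                "http://example.appspot.com/videoGameScrapeServlet",
                "type", "Xbox 360"));
        // prints http://example.appspot.com/videoGameScrapeServlet?type=Xbox+360
    }
}
```

A POST request, by contrast, would send that same `key=value` string in the request body rather than in the URL.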
In this case, since we aren't getting any data for a user or submitting any data from a user (in fact we're not really interacting with any users at all), it really doesn't make a difference which type of request we use, so we'll stick with the simpler GET request as follows:
import java.io.IOException;
import java.util.ArrayList;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// EXTEND THE HTTPSERVLET CLASS TO MAKE THIS METHOD AVAILABLE
// TO EXTERNAL WEB REQUESTS, NAMELY CLIENTS AND CRON JOBS
public class VideoGameScrapeServlet extends HttpServlet {

    private ArrayList<VideoGame> games;

    /**
     * METHOD THAT IS HIT WHEN HTTP GET REQUEST IS MADE
     *
     * @param request
     *            a servlet request object (any params passed can be retrieved
     *            with this)
     * @param response
     *            a servlet response that you can embed data back to user
     * @throws IOException
     * @throws ServletException
     */
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException, ServletException {
        games = new ArrayList<VideoGame>();
        String message = "Success";
        try {
            // GRAB GAMES FROM ALL PLATFORMS
            games.addAll(VideoGameScraper.getVideoGamesByConsole(VideoGameConsole.DS));
            games.addAll(VideoGameScraper.getVideoGamesByConsole(VideoGameConsole.PS2));
            games.addAll(VideoGameScraper.getVideoGamesByConsole(VideoGameConsole.PS3));
            games.addAll(VideoGameScraper.getVideoGamesByConsole(VideoGameConsole.PSP));
            games.addAll(VideoGameScraper.getVideoGamesByConsole(VideoGameConsole.WII));
            games.addAll(VideoGameScraper.getVideoGamesByConsole(VideoGameConsole.XBOX));
        } catch (Exception e) {
            e.printStackTrace();
            message = "Failed";
        }

        // HERE WE ADD ALL GAMES TO OUR VIDEOGAME JDO WRAPPER
        VideoGameJDOWrapper.batchInsertGames(games);

        // WRITE A RESPONSE BACK TO ORIGINAL HTTP REQUESTER
        response.setContentType("text/html");
        response.setHeader("Cache-Control", "no-cache");
        response.getWriter().write(message);
    }
}
So the method itself is quite simple. We already have our
getVideoGamesByConsole()
method from earlier, which goes and does all the scraping, returning a list of VideoGame
objects as a result. We then simply run it for every console that we want, and at the end use our nifty JDO database wrapper class and call its
batchInsertGames()
method for quicker insertions. Once that's done, we take the HTTP response object that is passed in and quickly write some kind of message back to the user to let them know whether or not the scraping was successful. In this case, we don't make use of the HttpServletRequest
object that gets passed in, but that object will come in very handy if the requester passes parameters into the URL. For instance, say you wanted to write your servlet in a way that only scrapes one specific game platform instead of all of them. In that case, you would need some way of passing a platform-type parameter to your servlet, and then extracting that passed-in parameter value within the servlet. Well, just like how earlier we saw that Yahoo! Finance allows you to pass in tickers as key/value pairs in the URL, to pass in a platform type we could simply do the following:
http://{your-GAE-base-url}.appspot.com/videoGameScrapeServlet?type=Xbox
Then, on the servlet side do:
public void doGet(HttpServletRequest request, HttpServletResponse response)
        throws IOException, ServletException {
    String type = request.getParameter("type");
    games = new ArrayList<VideoGame>();
    String message = "Success";
    try {
        // GRAB GAMES FROM SPECIFIC PLATFORM
        games.addAll(VideoGameScraper.getVideoGamesByConsole(type));
    } catch (Exception e) {
        e.printStackTrace();
        message = "Failed";
    }

    // ADD GAMES TO JDO DATABASE
    VideoGameJDOWrapper.batchInsertGames(games);

    // WRITE A RESPONSE BACK TO ORIGINAL HTTP REQUESTER
    response.setContentType("text/html");
    response.setHeader("Cache-Control", "no-cache");
    response.getWriter().write(message);
}
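One thing to watch here: request.getParameter() returns null when the requester omits the key (or misspells it), so a slightly more defensive version of this servlet might fall back to a default platform rather than passing null into the scraper. A sketch of just that validation step (the helper class and method names are ours, not part of the project's code):

```java
public class ParamValidator {

    // Mirrors the null-check a defensive doGet() could apply to the result
    // of request.getParameter("type"): if the key was missing or blank,
    // fall back to a default platform instead of scraping with null
    public static String typeOrDefault(String type, String fallback) {
        if (type == null || type.trim().isEmpty()) {
            return fallback;
        }
        return type.trim();
    }

    public static void main(String[] args) {
        System.out.println(typeOrDefault(null, "Xbox"));      // key was missing
        System.out.println(typeOrDefault("  Wii  ", "Xbox")); // key was passed
    }
}
```

Inside doGet(), the call would simply become something like `String type = typeOrDefault(request.getParameter("type"), "Xbox");`.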
Pretty simple, right? You just have to make sure that the key used in the URL matches the parameter you request within the servlet class. Now, the final step in hooking this all together is defining the URL path in your GAE project: making sure your GAE project knows that the URL pattern actually points to the class you just wrote. This is done in your GAE project's /war/WEB-INF/
directory, specifically in the web.xml
file. There you'll need to add the following, making sure that the servlet name and class path match the given URL pattern:
<?xml version="1.0" encoding="utf-8"?>
<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="2.5">
    <servlet>
        <servlet-name>videoGameScrapeServlet</servlet-name>
        <servlet-class>app.httpservlets.VideoGameScrapeServlet</servlet-class>
    </servlet>
    <servlet-mapping>
        <servlet-name>videoGameScrapeServlet</servlet-name>
        <url-pattern>/videoGameScrapeServlet</url-pattern>
    </servlet-mapping>
</web-app>
At this point, we have our scraper, we have our JDO database, and we even have our first servlet all hooked up and ready to go. The last part is scheduling your scraper to run periodically; that way, your database has the latest and most up-to-date data, without you having to sit in front of your computer every day and manually call your scraper. In this next section, we'll see how we can use CRON jobs to accomplish just this.