Downloading content from the Internet

The title of this recipe may not seem related to web services at first glance. However, most of the available services, like most of the content on the Internet, are based on the Hypertext Transfer Protocol (HTTP), so it is worth getting a basic understanding of simple HTTP operations before diving into the more complex world of web services.

How to do it...

To download HTTP-based content, we don't need any special libraries or tricks. All that is needed are the standard Java classes File and URL, together with their Groovy extensions:

  1. We first start with defining our target and source files:
    def outputFile = new File('image.png')
    def baseUrl = 'http://groovy.codehaus.org'
    def imagePath = '/images/groovy-logo-medium.png'
    def url = new URL("${baseUrl}${imagePath}")
  2. Then, in case the outputFile already exists, we delete it to avoid appending content to it:
    outputFile.delete()
  3. The last step is to stream the URL's content into the outputFile:
    url.withInputStream { inputStream ->
      outputFile << inputStream
    }
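
The need for the delete() call in step 2 can be illustrated with a minimal sketch. It assumes a hypothetical local temporary file as a stand-in for the remote image (a file: URL instead of an http: one), so that it runs without network access; the variable names are illustrative only:

```groovy
// Hypothetical local stand-in for the remote image; no network access is needed.
def source = File.createTempFile('source', '.bin')
source.bytes = [1, 2, 3, 4] as byte[]
def url = source.toURI().toURL()

// createTempFile gives us an existing, empty output file.
def outputFile = File.createTempFile('output', '.bin')

// First download into the empty file.
url.withInputStream { inputStream -> outputFile << inputStream }
def sizeAfterFirstRun = outputFile.length()

// Second download without deleting: the << operator appends the bytes.
url.withInputStream { inputStream -> outputFile << inputStream }
def sizeWithoutDelete = outputFile.length()

// Deleting the file first restores the expected size.
outputFile.delete()
url.withInputStream { inputStream -> outputFile << inputStream }
def sizeAfterDelete = outputFile.length()

// Clean up the temporary files.
source.delete()
outputFile.delete()
```

In this sketch, the second run doubles the file size because the left-shift operator on File appends rather than overwrites.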

How it works...

The withInputStream method is a Groovy extension added to the URL class. We already presented many extension methods that Groovy appends to the standard JDK classes in previous chapters (for example, the Using Java Classes from Groovy recipe in Chapter 2, Using Groovy Ecosystem), and we showed the way to write your own extensions in the Adding a functionality to the existing Java/Groovy classes recipe in Chapter 3, Using Groovy Language Features. More information on extended functionality, which exists in Groovy for the java.net.URL class, can be found at http://groovy.codehaus.org/groovy-jdk/java/net/URL.html.

Basically, the withInputStream method opens the stream, passes the java.io.InputStream instance to the closure, and takes care of closing the stream automatically when the closure finishes. The stream contains the binary data retrieved from the remote resource over HTTP. More information on manipulating files and input streams can be found in Chapter 4, Working with Files in Groovy.
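
To make the automatic closing more concrete, the following sketch compares a hand-written open/copy/close sequence with the withInputStream call. It is only an approximation of what the helper does, not its actual implementation, and it assumes a hypothetical local temporary file exposed through a file: URL so that it runs offline:

```groovy
// Hypothetical local resource standing in for remote content.
def source = File.createTempFile('source', '.txt')
source.setText('sample content', 'UTF-8')
def url = source.toURI().toURL()

// Manual variant: open the stream, copy it, and always close it.
def manualCopy = new ByteArrayOutputStream()
InputStream inputStream = url.openStream()
try {
    manualCopy << inputStream
} finally {
    inputStream.close()
}
def manualResult = manualCopy.toString('UTF-8')

// Groovy variant: withInputStream closes the stream for us.
def helperCopy = new ByteArrayOutputStream()
url.withInputStream { helperCopy << it }
def helperResult = helperCopy.toString('UTF-8')

// Clean up the temporary file.
source.delete()
```

Both variants should yield the same content; the Groovy variant simply removes the try/finally boilerplate.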

Under the hood, the URL class performs an HTTP GET request for the resource specified by the URL. A more detailed description of how to construct complex GET requests can be found in the Executing an HTTP GET request recipe.
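
The GET request can be made visible with a small sketch that uses java.net.HttpURLConnection explicitly. To keep it runnable without Internet access, it assumes a hypothetical local server built with the JDK's com.sun.net.httpserver.HttpServer; the /hello path and the echoed response body are invented for illustration:

```groovy
import com.sun.net.httpserver.HttpHandler
import com.sun.net.httpserver.HttpServer

// Hypothetical local server that echoes the HTTP method it receives.
def server = HttpServer.create(new InetSocketAddress('127.0.0.1', 0), 0)
server.createContext('/hello', { exchange ->
    def body = ('method=' + exchange.requestMethod).getBytes('UTF-8')
    exchange.sendResponseHeaders(200, body.length)
    exchange.responseBody.withStream { it.write(body) }
} as HttpHandler)
server.start()

def responseCode
def responseText
try {
    def url = new URL("http://127.0.0.1:${server.address.port}/hello")
    def connection = (HttpURLConnection) url.openConnection()
    // GET is already the default method; it is set here only for clarity.
    connection.requestMethod = 'GET'
    responseCode = connection.responseCode
    responseText = connection.inputStream.getText('UTF-8')
} finally {
    server.stop(0)
}
```

The server reports the method it saw, which lets us check that a plain URL connection issues a GET unless told otherwise.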

With the basics of executing an HTTP GET request covered in this recipe, you can use many REST-based APIs in read-only mode, as well as download the WSDL and XSD files of SOAP-based web services.

There's more...

If the content to download is textual, we can use a more concise way to fetch it with the help of the getText method available on the URL class:

def outputFile = new File('groovy.html')
def baseUrl = 'http://groovy.codehaus.org'
def url = new URL(baseUrl)

// Saving textual content.
outputFile.text = url.text
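
When the encoding of the textual content is known, it is probably safer to state it explicitly instead of relying on the default. The following sketch uses the charset-accepting variants of getText and setText; it assumes UTF-8 content and a hypothetical local temporary file in place of the remote page, so that it runs offline:

```groovy
// Hypothetical local page standing in for the remote HTML document.
def source = File.createTempFile('page', '.html')
source.setText('<p>caf\u00e9</p>', 'UTF-8')
def url = source.toURI().toURL()

// Read and write with an explicit charset (UTF-8 is an assumption here).
def outputFile = File.createTempFile('copy', '.html')
outputFile.setText(url.getText('UTF-8'), 'UTF-8')
def roundTripped = outputFile.getText('UTF-8')

// Clean up the temporary files.
source.delete()
outputFile.delete()
```

Passing the charset on both sides keeps non-ASCII characters intact regardless of the platform default encoding.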

See also
