Downloading the page for offline analysis with HTTrack

As stated on HTTrack's official website (http://www.httrack.com):

"It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer."

We will be using HTTrack in this recipe to download the whole content of an application's site.

Getting ready

HTTrack is not installed by default in Kali Linux, so we will need to install it, as shown:

apt-get update
apt-get install httrack
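
To verify that the installation succeeded, we can print HTTrack's help banner; its first line shows the installed version (exact output varies by release):

httrack --help | head -n 1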

How to do it...

  1. Our first step will be to create a directory to store the downloaded site and then enter it:
    mkdir bodgeit_httrack
    cd bodgeit_httrack
    
  2. The simplest way to use HTTrack is to pass it the URL of the site we want to download:
    httrack http://192.168.56.102/bodgeit/
    

    It is important to include the trailing "/"; if it is omitted, HTTrack will return a 404 error because there is no "bodgeit" file in the root of the server, as the comparison below shows.
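    To illustrate (the IP address is this recipe's test server; substitute your own target):

    # omitting the slash asks the server for a file named "bodgeit" and fails with 404
    httrack http://192.168.56.102/bodgeit
    # the trailing slash requests the directory index instead, which succeeds
    httrack http://192.168.56.102/bodgeit/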

  3. Now, if we go to file:///root/MyCookbook/test/bodgeit_httrack/index.html (or the path you selected in your test environment), we will see that we can browse the whole site offline.

How it works...

HTTrack creates a full static copy of the site, which means that all dynamic content, such as responses to user inputs, won't be available. Inside the folder where we downloaded the site, we can see the following files and directories (a sample listing follows):

  • A directory named after the server's name or address, which contains all the files that were downloaded.
  • A cookies.txt file, which contains the cookies information used to download the site.
  • An hts-cache directory, which contains the list of files detected and processed by the crawler.
  • An hts-log.txt file, which contains the errors, warnings, and other information reported while crawling and downloading the site.
  • An index.html file that redirects to the copy of the original index file located in the server-name directory.
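
As a quick check, listing the directory we created earlier should show roughly the following entries (a sketch; the server directory will be named after your target's address, and HTTrack may add a few small helper files of its own):

ls bodgeit_httrack
192.168.56.102/  cookies.txt  hts-cache/  hts-log.txt  index.html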

There's more...

HTTrack also has an extensive collection of options that allow us to customize its behavior to better fit our needs. The following are some useful modifiers to consider; a combined example follows the list:

  • -rN: Sets the depth to N levels of links to follow
  • -%eN: Sets the maximum depth for external links
  • +[pattern]: Tells HTTrack to whitelist all URLs matching [pattern], for example, +*google.com/*
  • -[pattern]: Tells HTTrack to blacklist (omit from downloading) all links matching the pattern
  • -F [user-agent]: This option allows us to define the user agent (browser identifier) we want to use to download the site
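
For example, the following command combines several of these modifiers (the URL matches this recipe's test server, and the user-agent string is just an illustrative value):

httrack http://192.168.56.102/bodgeit/ -r3 -%e0 "+*192.168.56.102/bodgeit/*" -F "Mozilla/5.0 (X11; Linux x86_64)"

This mirrors the site down to three link levels, follows no external links, downloads only URLs under /bodgeit/, and identifies the crawler as a generic Linux browser.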