Name

wget — noninteractive network downloader

Synopsis

wget [options] URL

The wget command retrieves the data at a URL and saves it to a file or writes it to standard output. It's great for capturing individual web pages, downloading files, or duplicating entire website hierarchies to arbitrary depth. For example, let's capture the Yahoo home page:

$ wget http://www.yahoo.com
23:19:51 (220.84 KB/s) - `index.html' saved [31434]

The page is saved to a file named index.html in the current directory. wget can also resume a download that was interrupted, say, by a network failure: just run wget -c with the same URL, and it picks up where it left off.

Perhaps the most useful feature of wget is its ability to download files without needing a web browser:

$ wget http://www.example.com/files/manual.pdf

This is great for large files like videos and ISO images. You can even write shell scripts to download sets of files if you know their names:

$ for i in 1 2 3; do wget http://example.com/$i.mpeg; done

A similar command is curl, which writes to standard output by default, unlike wget, which by default saves each download under its original page or file name.

$ curl http://www.yahoo.com > mypage.html
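
To get similar behavior from wget itself, you can send the download to standard output with -O - (adding -q keeps progress messages out of the pipe; the URL is just the earlier example):

$ wget -q -O - http://www.yahoo.com > mypage.html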

wget has over 70 options, so we’ll cover just a few important ones. (curl has a different set of options; see its manpage.)

Useful options

-i filename

Read URLs from the given file and retrieve them in turn.
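
For example, assuming you've put one URL per line in a file called urls.txt (a name made up for this sketch), a single command fetches them all:

$ cat urls.txt
http://www.example.com/files/chapter1.pdf
http://www.example.com/files/chapter2.pdf
$ wget -i urls.txt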

-O filename

Write all the captured HTML to the given file, one page appended after the other.
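
For instance, to concatenate two pages into a single file (the URLs are placeholders):

$ wget -O pages.html http://www.example.com/page1.html \
       http://www.example.com/page2.html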

-c

Continue mode: if a previous retrieval was interrupted, leaving only a partial file as a result, pick up where wget left off. That is, if wget had downloaded 100K of a 150K file, the -c option says to retrieve only the remaining 50K and append it to the existing file. wget can be fooled, however, if the remote file has changed since the first (partial) download, so use this option only if you know the remote file hasn’t changed.
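
For example, if a download of a large ISO image (a made-up URL here) dies partway through, rerun the same command with -c:

$ wget -c http://www.example.com/images/distro.iso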

-t N

Try N times before giving up. N=0 means try forever.
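
For example, to keep retrying a flaky server up to 10 times (placeholder URL):

$ wget -t 10 http://www.example.com/files/manual.pdf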

--progress=dot

Print dots to show the download progress.

--progress=bar

Display a progress bar while downloading.

--spider

Don’t download, just check existence of remote pages.
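
This is handy for checking whether a link is still alive without downloading anything (placeholder URL):

$ wget --spider http://www.example.com/files/manual.pdf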

-nd

Retrieve all files into the current directory, even if remotely they are in a more complex directory tree. (By default, wget duplicates the remote directory hierarchy.)

-r

Retrieve a page hierarchy recursively, including subdirectories.

-l N

Retrieve files at most N levels deep (5 by default).

-k

Inside retrieved files, modify URLs so the files can be viewed locally in a web browser.

-p

Download all necessary files to make a page display completely, such as stylesheets and images.
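
The recursive options are commonly combined. For example, to mirror a documentation tree two levels deep, rewriting links for local browsing and pulling in each page's images and stylesheets (the URL is a placeholder):

$ wget -r -l 2 -k -p http://www.example.com/docs/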

-L

Follow relative links (within a page) but not absolute links.

-A pattern

Accept mode: download only files whose names match a given pattern. Patterns may contain the same wildcards as the shell.

-R pattern

Reject mode: download only files whose names do not match a given pattern.
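
For example, to grab only the PDF files from a site recursively, or everything except MPEG videos (quote the patterns so the shell doesn't expand them; the URLs are placeholders):

$ wget -r -A '*.pdf' http://www.example.com/docs/
$ wget -r -R '*.mpeg' http://www.example.com/media/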

-I pattern

Directory inclusion: download files only from directories that match a given pattern.

-X pattern

Directory exclusion: download files only from directories that do not match a given pattern.
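
Both -I and -X accept comma-separated lists of directories. For example, to limit a recursive download to the /docs tree, or to skip /tmp and /scratch (placeholder paths and URL):

$ wget -r -I /docs http://www.example.com/
$ wget -r -X /tmp,/scratch http://www.example.com/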
