wget   stdin  stdout  - file  -- opt  --help  --version
wget [options] URL
The wget command hits a URL and downloads the data to a file or standard output. It’s great for capturing individual web pages, downloading files, or duplicating entire web site hierarchies to arbitrary depth. For example, let’s capture the Yahoo home page:
$ wget http://www.yahoo.com
23:19:51 (220.84 KB/s) - `index.html' saved [31434]
The page is saved to a file named index.html in the current directory.
wget can also resume a download that was interrupted in the middle, say, by a network failure: just run wget -c with the same URL and it picks up where it left off.
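If a connection keeps dropping, you can automate the rerun. Here’s a minimal sketch of a retry loop, assuming a POSIX shell; the function name fetch_resume, its default of 5 attempts, and the RETRY_DELAY variable are made up for illustration and are not part of wget:

```shell
# fetch_resume URL [MAX_TRIES]: hypothetical wrapper that reruns `wget -c`
# until the download completes or MAX_TRIES attempts (default 5) have failed.
# RETRY_DELAY (seconds between attempts, default 5) is also an assumption.
fetch_resume() {
  url=$1
  max_tries=${2:-5}
  delay=${RETRY_DELAY:-5}
  try=1
  while :; do
    # -c continues the partial file left behind by an earlier attempt
    if wget -c "$url"; then
      return 0
    fi
    echo "fetch_resume: attempt $try of $max_tries failed" >&2
    if [ "$try" -ge "$max_tries" ]; then
      return 1
    fi
    try=$((try + 1))
    sleep "$delay"
  done
}
```

For example, fetch_resume http://www.example.com/files/big.iso 10 allows up to ten runs. wget already retries failed connections within a single run; this wrapper only adds retries across separate runs.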
Perhaps the most useful feature of wget is its ability to download files without needing a web browser:
$ wget http://www.example.com/files/manual.pdf
This is great for large files like videos and ISO images. You can even write shell scripts to download sets of files if you know their names:
$ for i in 1 2 3; do wget http://example.com/$i.mpeg; done
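The loop above starts a separate wget process for each file. A sketch of an alternative, assuming a POSIX shell: generate the URLs yourself and pipe them to a single wget with -i -, which reads its URL list from standard input. The helper names print_urls and fetch_numbered are hypothetical:

```shell
# print_urls BASE FIRST LAST EXT: hypothetical helper that prints one URL
# per line, BASE/FIRST.EXT through BASE/LAST.EXT.
print_urls() {
  base=$1 first=$2 last=$3 ext=$4
  i=$first
  while [ "$i" -le "$last" ]; do
    printf '%s/%s.%s\n' "$base" "$i" "$ext"
    i=$((i + 1))
  done
}

# fetch_numbered BASE FIRST LAST EXT: hypothetical; "-i -" makes wget read
# the URL list from standard input and retrieve each file in turn.
fetch_numbered() {
  print_urls "$@" | wget -i -
}
```

Running fetch_numbered http://example.com 1 3 mpeg requests the same three files as the for loop, but from one wget process.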
Another similar command is curl, which writes to standard output by default, unlike wget, which by default saves each page or file under its original name.
$ curl http://www.yahoo.com > mypage.html
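If a script has to run on machines that may have only one of the two tools, a small wrapper can hide the difference. This is a sketch under assumptions (the name fetch_to_stdout is made up): it prefers curl, adding -s to hide the progress meter and -L to follow redirects as wget does by default, and otherwise falls back to wget -q -O -, where -O - sends the download to standard output:

```shell
# fetch_to_stdout URL: hypothetical helper that prints a page on standard
# output using curl if it is installed, otherwise wget.
fetch_to_stdout() {
  if command -v curl >/dev/null 2>&1; then
    # curl writes to stdout by default; -s hides the progress meter and
    # -L follows redirects, which wget does on its own
    curl -sL "$1"
  elif command -v wget >/dev/null 2>&1; then
    # -O - sends the download to stdout instead of a file; -q hides the log
    wget -q -O - "$1"
  else
    echo "fetch_to_stdout: neither curl nor wget is installed" >&2
    return 1
  fi
}
```

Then fetch_to_stdout http://www.yahoo.com > mypage.html behaves the same whichever tool is present.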
wget has over 70 options, so we’ll cover just a few important ones. (curl has a different set of options; see its manpage.)
| Option | Meaning |
|---|---|
| -i filename | Read URLs from the given file and retrieve them in turn. |
| -O filename | Write all the captured HTML to the given file, one page appended after the other. |
| -c | Continue mode: if a previous retrieval was interrupted, leaving only a partial file as a result, pick up where wget left off. |
| -t N | Try N times before giving up. N=0 means try forever. |
| --progress=dot | Print dots to show the download progress. |
| --progress=bar | Print bars to show the download progress. |
| --spider | Don’t download, just check existence of remote pages. |
| -nd | Retrieve all files into the current directory, even if remotely they are in a more complex directory tree. (By default, a recursive wget duplicates the remote directory hierarchy.) |
| -r | Retrieve a page hierarchy recursively, including subdirectories. |
| -l N | Retrieve files at most N levels deep (5 by default). |
| -k | Inside retrieved files, modify URLs so the files can be viewed locally in a web browser. |
| -p | Download all necessary files to make a page display completely, such as stylesheets and images. |
| -L | Follow relative links (within a page) but not absolute links. |
| -A pattern | Accept mode: download only files whose names match a given pattern. Patterns may contain the same wildcards as the shell. |
| -R pattern | Reject mode: download only files whose names do not match a given pattern. |
| -I pattern | Directory inclusion: download files only from directories that match a given pattern. |
| -X pattern | Directory exclusion: download files only from directories that do not match a given pattern. |
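Several of these options are usually combined. Here’s a sketch, assuming a POSIX shell, of two hypothetical wrapper functions: mirror_site makes a browsable offline copy with -r, -l, -k, and -p, and fetch_pdfs grabs only the PDF files linked from a page using -A and -nd. The function names and the default depth of 2 are illustrative choices, not wget defaults:

```shell
# mirror_site URL [DEPTH]: hypothetical helper that saves a locally
# browsable copy of a site, DEPTH levels deep (default 2, an arbitrary choice).
mirror_site() {
  url=$1
  depth=${2:-2}
  # -r recurse, -l limit the depth, -k rewrite links for local viewing,
  # -p also fetch page requisites such as images and stylesheets
  wget -r -l "$depth" -k -p "$url"
}

# fetch_pdfs URL: hypothetical helper; follows links one level down (-r -l 1),
# keeps only files matching *.pdf (-A), and puts them all in the
# current directory instead of recreating the remote tree (-nd).
fetch_pdfs() {
  # the quotes stop the shell from expanding *.pdf before wget sees it
  wget -r -l 1 -nd -A '*.pdf' "$1"
}
```

For example, mirror_site http://www.example.com 3 or fetch_pdfs http://www.example.com/docs/.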
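With a list of URLs, --spider makes a quick link checker. A sketch, assuming a POSIX shell and a file with one URL per line; check_links is a hypothetical name, and it relies on wget exiting with a nonzero status when a page can’t be retrieved:

```shell
# check_links FILE: hypothetical helper; reads one URL per line from FILE
# and reports whether wget could reach each one. Blank lines are skipped.
check_links() {
  # "|| [ -n "$url" ]" also handles a last line with no trailing newline
  while IFS= read -r url || [ -n "$url" ]; do
    if [ -z "$url" ]; then
      continue
    fi
    # --spider checks that the page exists without downloading it;
    # -q silences wget's log, so only the exit status matters here
    if wget -q --spider "$url"; then
      printf 'OK     %s\n' "$url"
    else
      printf 'BROKEN %s\n' "$url"
    fi
  done < "$1"
}
```

Running check_links urls.txt prints one OK or BROKEN line per URL.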