wget is a powerful command-line utility for downloading files from the web. It supports various protocols such as HTTP, HTTPS, FTP, and FTPS.
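Before trying the recipes below, it can help to confirm that wget is installed; a minimal check (the exact version string varies by system and wget release):

```shell
# Check whether wget is installed and capture its version line.
if command -v wget >/dev/null 2>&1; then
  version_line=$(wget --version | head -n 1)   # e.g. "GNU Wget 1.21.2 ..."
else
  version_line="wget is not installed"
fi
echo "$version_line"
```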
- Basic File Download
This command downloads the file named file.zip from the specified URL.
wget https://example.com/file.zip
Example:
wget https://sample-videos.com/zip/10mb.zip
- Download to a specific directory
wget -P /path/to/directory https://example.com/file.zip
Example:
wget -P Downloads https://sample-videos.com/zip/10mb.zip
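Note that wget creates the target directory if it does not already exist; the sketch below mirrors that behaviour manually (the directory name Downloads is simply the one from the example above):

```shell
# wget -P Downloads implicitly does the equivalent of mkdir -p first.
dir="Downloads"
mkdir -p "$dir"                      # create the directory if missing
[ -d "$dir" ] && echo "ready: $dir"  # confirm it exists before downloading
```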
- Download with a different name
wget -O newname.zip https://example.com/file.zip
Downloads the file and saves it as newname.zip.
Example:
wget -O newname.zip https://sample-videos.com/zip/10mb.zip
- Download multiple files
wget https://example.com/file1.zip https://example.com/file2.zip
Example:
wget https://sample-videos.com/zip/10mb.zip https://sample-videos.com/zip/20mb.zip
- Download in Background
wget -b https://example.com/largefile.zip
Example:
wget -b -O 20mnbfile.zip https://sample-videos.com/zip/20mb.zip
The download runs in the background, and the progress log is written to a file named wget-log in the current directory.
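Once the download is running in the background, that log file can be inspected at any time. The stand-in log line below only makes the sketch self-contained without a network download; with a real background download you would simply run `tail -f wget-log`:

```shell
# Simulate one line of the background log so the example runs offline;
# wget -b writes its real progress output to ./wget-log.
printf 'Saving to: 20mnbfile.zip\n' > wget-log
tail -n 5 wget-log   # show the last lines of the background download log
```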
- Rate Limiting Download
wget --limit-rate=200k https://example.com/largefile.zip
This limits the download rate to 200 kB/s. The k suffix means kilobytes; m means megabytes.
Example:
wget --limit-rate=200k https://sample-videos.com/zip/20mb.zip
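As a sanity check on what the cap implies: the 20 MB sample file at roughly 200 kB/s should take on the order of 100 seconds. The back-of-the-envelope arithmetic:

```shell
# Rough download-time estimate for a 20 MB file capped at 200 kB/s.
size_kb=$((20 * 1024))      # 20 MB expressed in kB
rate_kb=200                 # --limit-rate=200k is roughly 200 kB per second
eta=$((size_kb / rate_kb))  # integer seconds (ignores protocol overhead)
echo "~${eta} seconds"      # prints "~102 seconds"
```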
- Resume Interrupted Download
wget -c https://example.com/largefile.zip
The -c (--continue) flag resumes a partially downloaded file instead of starting over.
Example:
wget -c https://sample-videos.com/zip/20mb.zip
- Downloading Entire Website
wget --recursive --no-clobber --page-requisites --html-extension --convert-links --domains example.com --no-parent https://example.com
--recursive: Enables recursive retrieval, meaning wget will download not only the specified URL but also follow and download links within that page, continuing recursively.
--no-clobber: Prevents wget from overwriting existing files. If a file with the same name already exists in the local directory, wget will not download it again.
--page-requisites: Downloads all the elements needed to properly display the page offline, including inline images, stylesheets, and other resources referenced by the HTML.
--html-extension: Appends the .html extension to downloaded HTML files. This is useful when saving a complete website for offline browsing, as it helps maintain proper file extensions.
--convert-links: After downloading, converts the links in the downloaded documents to point to the local files, enabling offline browsing. This is important when you want to view the downloaded content without an internet connection.
--domains example.com: Restricts the download to files under the specified domain (example.com). This ensures that wget doesn't follow links to external domains, focusing only on the specified domain.
--no-parent: Prevents wget from ascending to the parent directory while recursively downloading. It ensures that only content within the specified URL and its subdirectories is downloaded.
https://example.com: The URL from which wget starts the recursive download.
Example:
wget --recursive --no-clobber --page-requisites --html-extension --convert-links --domains hashnode.dev --no-parent https://redterminal.hashnode.dev
- Mirror an entire website
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://example.com
--mirror: Enables mirroring, which includes recursion to download the entire website.
--convert-links: Converts the links in the downloaded documents to point to the local files for proper offline browsing.
--adjust-extension: Adds proper file extensions to downloaded files.
--page-requisites: Downloads all the elements needed to properly display the page offline, such as inline images and stylesheets.
--no-parent: Prevents wget from ascending to the parent directory while recursively downloading.
Example:
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://example.com
- Download with a user-agent
Some websites block requests that do not appear to come from a browser. In those scenarios we can set the User-Agent in the HTTP header.
wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3" https://example.com/file.zip
Example:
wget --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3" https://sample-videos.com/zip/20mb.zip
- Download with a proxy
wget honors the standard http_proxy/https_proxy settings; they can also be passed on the command line with -e:
wget -e use_proxy=yes -e https_proxy=http://proxy.example.com:8080 https://example.com/file.zip
- Download files matching a pattern
wget -r -l1 -np -nd -A "*.jpg" https://example.com/images/
-l1: limits the recursion depth to 1 level
-np: does not ascend to the parent directory
-nd: does not create a directory hierarchy; all files are saved in the current directory
-A "*.jpg": accepts only files whose names match the pattern
- Test whether a download URL exists before downloading
wget --spider https://example.com
The --spider option checks the URL without downloading it; the exit status indicates whether it exists.
Example:
wget --spider https://sample-videos.com/zip/10mb.zip
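Because --spider reports existence through its exit status, it can gate a real download in a script. A minimal sketch (the helper name check_and_download and the timeout values are illustrative, not part of wget):

```shell
# Download a URL only when --spider confirms it is reachable.
check_and_download() {
  url="$1"
  if wget -q --spider --tries=1 --timeout=10 "$url"; then
    wget -q "$url"                  # URL exists: fetch it for real
  else
    echo "URL not reachable: $url"  # URL missing or host unreachable
    return 1
  fi
}

# Usage (illustrative):
# check_and_download "https://sample-videos.com/zip/10mb.zip"
```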
- Quit download when it exceeds a certain size (quota)
wget -Q5m -i FILE-WHICH-HAS-URLS
Example, with urls.txt containing https://sample-videos.com/zip/10mb.zip and https://sample-videos.com/zip/20mb.zip on separate lines:
wget -Q5m -i urls.txt
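The file passed to -i is just a plain list of URLs, one per line; a quick way to build one (the filename urls.txt is illustrative):

```shell
# Write the sample URLs into an input file for wget -i, one per line.
printf '%s\n' \
  "https://sample-videos.com/zip/10mb.zip" \
  "https://sample-videos.com/zip/20mb.zip" > urls.txt
grep -c . urls.txt   # counts non-empty lines: prints 2
```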
Note: the quota has no effect when you download a single URL; that file is downloaded in full regardless of the quota size. The quota applies to recursive downloads and to downloads from an input file.
Let's try it with a recursive download:
wget --recursive -Q5m --no-clobber --page-requisites --html-extension --convert-links --domains hashnode.dev --no-parent https://redterminal.hashnode.dev
- Increase the total number of retries
wget --tries=75 DOWNLOAD-URL
By default wget retries 20 times; fatal errors such as "connection refused" or "404 Not Found" are not retried.