The PHP/CURL session is configured with the curl_setopt() function. Each individual configuration option is set with a separate call to this function. The script in Listing A-1 is unusual in its brevity. In normal use, there are many calls to curl_setopt(). There are over 90 separate configuration options available within PHP/CURL, making the interface very versatile.[94] The average PHP/CURL user, however, uses only a small subset of the available options. The following sections describe the PHP/CURL options you are most apt to use. While these options are listed here in order of relative importance, you may declare them in any order. If the session is left open, the configuration may be reused many times within the same session.
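As a rough sketch of how an open session keeps its configuration, the fragment below sets two options once and then changes only the target URL for each request. The second URL and the loop are assumptions added for illustration, and the transfer itself is left commented out.

```php
<?php
// Sketch: one open session, configured once and reused for two requests.
// The URLs and the 30-second time-out are illustrative values.
$s = curl_init();                                   // Open the session
curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE);      // Return results in a string
curl_setopt($s, CURLOPT_TIMEOUT, 30);               // Don't wait longer than 30 seconds

$targets = array(
    "http://www.schrenk.com/index.php",
    "http://www.schrenk.com/",
);
$configured = array();
foreach ($targets as $url) {
    // Only the URL changes; the other options stay in effect.
    $configured[$url] = curl_setopt($s, CURLOPT_URL, $url);
    // $page = curl_exec($s);   // Uncomment to perform the transfer
}
```

Each curl_setopt() call returns TRUE or FALSE, so the $configured array records whether every URL was accepted.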
Use the CURLOPT_URL option to define the target URL for your PHP/CURL session, as shown in Listing A-2.
curl_setopt($s, CURLOPT_URL, "http://www.schrenk.com/index.php");
Listing A-2: Defining the target URL
You should use a fully formed URL describing the protocol, domain, and file in every PHP/CURL file request.
The CURLOPT_RETURNTRANSFER option must be set to TRUE, as in Listing A-3, if you want the result to be returned in a string. If you don't set this option to TRUE, PHP/CURL echoes the result to the terminal.
curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE); // Return in string
Listing A-3: Telling PHP/CURL that you want the result to be returned in a string
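With CURLOPT_RETURNTRANSFER set to TRUE, curl_exec() returns the page as a string, or FALSE if the transfer fails. The helper below is a minimal sketch of checking for that; the function name fetch_page() is hypothetical.

```php
<?php
// Hypothetical helper: fetch a URL and return the result as a string,
// or FALSE if the transfer fails.
function fetch_page($url)
{
    $s = curl_init();
    curl_setopt($s, CURLOPT_URL, $url);
    curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE);  // Return in string
    return curl_exec($s);                           // String on success, FALSE on failure
}

// Compare with === so that an empty page ("") is not mistaken for a failure.
// $page = fetch_page("http://www.schrenk.com/index.php");
// if ($page === FALSE) { /* handle the error */ }
```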
The CURLOPT_REFERER option allows your webbot to spoof a hyper-reference that was clicked to initiate the request for the target file. The example in Listing A-4 tells the target server that someone clicked a link on http://www.a_domain.com/index.php to request the target web page.
curl_setopt($s, CURLOPT_REFERER, "http://www.a_domain.com/index.php");
Listing A-4: Spoofing a hyper-reference
The CURLOPT_FOLLOWLOCATION option tells cURL that you want it to follow every page redirection it finds. It's important to understand that PHP/CURL only honors header redirections and not redirections set with a refresh meta tag or with JavaScript, as shown in Listing A-5.
<?php
# Example of redirection that cURL will follow
header("Location: http://www.schrenk.com");
?>
<!-- Examples of redirections that cURL will not follow -->
<meta http-equiv="Refresh" content="0;url=http://www.schrenk.com">
<script>document.location="http://www.schrenk.com"</script>
Listing A-5: Redirects that cURL can and cannot follow
Any time you use CURLOPT_FOLLOWLOCATION, set CURLOPT_MAXREDIRS to the maximum number of redirections you care to follow. Limiting the number of redirections keeps your webbot out of infinite loops, where redirections point repeatedly to the same URL. My introduction to CURLOPT_MAXREDIRS came while trying to solve a problem brought to my attention by a network administrator, who initially thought that someone (using a webbot I wrote) launched a DoS attack on his server. In reality, the server misinterpreted the webbot's header request as a hacking exploit and redirected the webbot to an error page. There was a bug on the error page that caused it to repeatedly redirect the webbot to the error page, causing an infinite loop (and near-infinite bandwidth usage). The addition of CURLOPT_MAXREDIRS solved the problem, as demonstrated in Listing A-6.
curl_setopt($s, CURLOPT_FOLLOWLOCATION, TRUE); // Follow header redirections
curl_setopt($s, CURLOPT_MAXREDIRS, 4); // Limit redirections to 4
Listing A-6: Using the CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS options
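To see whether a redirection limit was actually reached, a script can inspect the session after the transfer. The sketch below assumes a hypothetical helper named fetch_with_redirect_limit(); it relies on curl_getinfo() with CURLINFO_REDIRECT_COUNT and on the CURLE_TOO_MANY_REDIRECTS error code, neither of which is discussed in the text above.

```php
<?php
// Hypothetical helper: fetch a page, following at most $max header redirections,
// and report how the redirections went.
function fetch_with_redirect_limit($url, $max)
{
    $s = curl_init();
    curl_setopt($s, CURLOPT_URL, $url);
    curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE);   // Return in string
    curl_setopt($s, CURLOPT_FOLLOWLOCATION, TRUE);   // Follow header redirections
    curl_setopt($s, CURLOPT_MAXREDIRS, $max);        // Limit redirections to $max
    $page = curl_exec($s);

    return array(
        'page'      => $page,                                        // String, or FALSE on failure
        'redirects' => curl_getinfo($s, CURLINFO_REDIRECT_COUNT),    // Redirections followed
        'too_many'  => (curl_errno($s) === CURLE_TOO_MANY_REDIRECTS) // TRUE if the limit was hit
    );
}
```

A webbot caught in a loop like the one described above would return with 'too_many' set to TRUE instead of consuming bandwidth indefinitely.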
Use the CURLOPT_USERAGENT option to define the name of your user agent, as shown in Listing A-7. The user agent name is recorded in server access log files and is available to server-side scripts in the $_SERVER['HTTP_USER_AGENT'] variable.
$agent_name = "test_webbot";
curl_setopt($s, CURLOPT_USERAGENT, $agent_name);
Listing A-7: Setting the user agent name
Keep in mind that many websites will not serve pages correctly if your user agent name is something other than a standard web browser.
The CURLOPT_HEADER and CURLOPT_NOBODY options tell PHP/CURL to return either the web page's header or body. By default, PHP/CURL always returns the body, but not the header. This explains why setting CURLOPT_NOBODY to TRUE excludes the body, and setting CURLOPT_HEADER to TRUE includes the header, as shown in Listing A-8.
curl_setopt($s, CURLOPT_HEADER, TRUE); // Include the header
curl_setopt($s, CURLOPT_NOBODY, TRUE); // Exclude the body
Listing A-8: Using the CURLOPT_HEADER and CURLOPT_NOBODY options
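When CURLOPT_HEADER is TRUE and the body is not excluded, the header and body arrive in one string. One way to separate them, sketched below, is to cut the string at the header size that curl_getinfo() reports for CURLINFO_HEADER_SIZE. The helper name split_response() and the hand-written sample response are assumptions for illustration.

```php
<?php
// Hypothetical helper: split a raw response (header followed by body) in two.
// $header_size is the value of curl_getinfo($s, CURLINFO_HEADER_SIZE).
function split_response($raw, $header_size)
{
    return array(
        'header' => substr($raw, 0, $header_size),
        'body'   => substr($raw, $header_size),
    );
}

// Offline demonstration with a hand-written response. In a real session,
// $raw comes from curl_exec() and $header_size from curl_getinfo().
$raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>";
$header_size = strpos($raw, "\r\n\r\n") + 4;   // Header ends after the blank line
$parts = split_response($raw, $header_size);
```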
If you don't limit how long PHP/CURL waits for a response from a server, it may wait forever—especially if the file you're fetching is on a busy server or you're trying to connect to a nonexistent or inactive IP address. (The latter happens frequently when a spider follows dead links on a website.) Setting a time-out value, as shown in Listing A-9, causes PHP/CURL to end the session if the download takes longer than the time-out value (in seconds).
curl_setopt($s, CURLOPT_TIMEOUT, 30); // Don't wait longer than 30 seconds
Listing A-9: Setting a socket time-out value
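The following sketch shows one way a script might react when a transfer fails, including when the time-out expires. The helper name fetch_with_timeout() is hypothetical, and the use of CURLOPT_CONNECTTIMEOUT (a separate limit on the connection phase) and curl_error() goes beyond what the text above covers.

```php
<?php
// Hypothetical helper: fetch a URL, giving up after $seconds seconds.
// Returns the page, or FALSE with the reason stored in $error.
function fetch_with_timeout($url, $seconds, &$error)
{
    $s = curl_init();
    curl_setopt($s, CURLOPT_URL, $url);
    curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE);      // Return in string
    curl_setopt($s, CURLOPT_CONNECTTIMEOUT, $seconds);  // Limit on connecting
    curl_setopt($s, CURLOPT_TIMEOUT, $seconds);         // Limit on the whole transfer
    $page = curl_exec($s);
    $error = ($page === FALSE) ? curl_error($s) : "";   // Text describing the failure
    return $page;
}

// Example use:
// $page = fetch_with_timeout("http://www.schrenk.com/index.php", 30, $why);
// if ($page === FALSE) { echo "Download failed: " . $why; }
```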
One of the slickest features of PHP/CURL is the ability to manage cookies sent to and received from a website. Use the CURLOPT_COOKIEFILE option to define the file where previously stored cookies exist. At the end of the session, PHP/CURL writes new cookies to the file indicated by CURLOPT_COOKIEJAR. An example is in Listing A-10; I have never seen an application where these two options don't reference the same file.
curl_setopt($s, CURLOPT_COOKIEFILE, "c:otscookies.txt"); // Read cookie file
curl_setopt($s, CURLOPT_COOKIEJAR, "c:otscookies.txt"); // Write cookie file
Listing A-10: Telling PHP/CURL where to read and write cookies
When specifying the location of a cookie file, always use the complete location of the file, and do not use relative addresses. More information about managing cookies is available in Chapter 22.
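One way to guarantee a complete file location is to build it from a directory that PHP reports as an absolute path. The sketch below uses the system temporary directory; that choice and the file name cookies.txt are assumptions for illustration only.

```php
<?php
// Build a complete (non-relative) location for the cookie file.
// The temp directory and the file name are illustrative assumptions.
$cookie_file = sys_get_temp_dir() . DIRECTORY_SEPARATOR . "cookies.txt";

$s = curl_init();
$read_ok  = curl_setopt($s, CURLOPT_COOKIEFILE, $cookie_file);  // Read cookie file
$write_ok = curl_setopt($s, CURLOPT_COOKIEJAR,  $cookie_file);  // Write cookie file
```

Holding the location in a single variable also keeps the two options pointed at the same file.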
The CURLOPT_HTTPHEADER configuration allows a cURL session to send an outgoing header message to the server. The script in Listing A-11 uses this option to tell the target server the MIME type it accepts, the content type it expects, and that the user agent is capable of decompressing compressed web responses.
Note that CURLOPT_HTTPHEADER expects to receive data in an array.
$header_array[] = "Mime-Version: 1.0";
$header_array[] = "Content-type: text/html; charset=iso-8859-1";
$header_array[] = "Accept-Encoding: compress, gzip";
curl_setopt($curl_session, CURLOPT_HTTPHEADER, $header_array);
Listing A-11: Configuring an outgoing header
You only need to use the CURLOPT_SSL_VERIFYPEER option if the target website uses SSL encryption and the protocol in CURLOPT_URL is https:. An example is shown in Listing A-12.
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE); // No certificate
Listing A-12: Configuring PHP/CURL not to verify the server's certificate
Depending on the version of PHP/CURL you use, this option may be required. Setting it to FALSE tells cURL to skip verification of the target server's SSL certificate, so use it only when that verification is unnecessary or cannot succeed.
As shown in Listing A-13, you may use the CURLOPT_USERPWD option with a valid username and password to access websites that use basic authentication. In contrast to using a browser, you will have to submit the username and password to every page accessed within the basic authentication realm.
curl_setopt($s, CURLOPT_USERPWD, "username:password");
curl_setopt($s, CURLOPT_UNRESTRICTED_AUTH, TRUE);
Listing A-13: Configuring PHP/CURL for basic authentication schemes
If you use CURLOPT_USERPWD in conjunction with CURLOPT_FOLLOWLOCATION, you should also use the CURLOPT_UNRESTRICTED_AUTH option, which will ensure that the username and password are sent to all pages you're redirected to, providing they are part of the same realm.
Exercise caution when using CURLOPT_USERPWD, as you can inadvertently send username and password information to the wrong server, where it may appear in access log files.
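One possible safeguard, sketched below, is to attach the credentials only when the target URL's host matches the host they were issued for. The helper name add_basic_auth() and its parameters are hypothetical, and this check is an illustration, not a complete defense.

```php
<?php
// Hypothetical guard: only attach credentials when the URL's host matches
// the host the credentials were issued for.
function add_basic_auth($s, $url, $expected_host, $user, $pass)
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || strcasecmp($host, $expected_host) !== 0) {
        return FALSE;   // Wrong or missing host: do not send credentials
    }
    curl_setopt($s, CURLOPT_URL, $url);
    return curl_setopt($s, CURLOPT_USERPWD, $user . ":" . $pass);
}

$s = curl_init();
$sent    = add_basic_auth($s, "http://www.schrenk.com/private/", "www.schrenk.com", "username", "password");
$blocked = add_basic_auth($s, "http://www.a_domain.com/index.php", "www.schrenk.com", "username", "password");
```

Here $sent is TRUE because the hosts match, and $blocked is FALSE because the second URL belongs to a different host.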
The CURLOPT_POST and CURLOPT_POSTFIELDS options configure PHP/CURL to emulate forms with the POST method. Since the default method is GET, you must first tell PHP/CURL to use the POST method. Then you must specify the POST data that you want to be sent to the target webserver. An example is shown in Listing A-14.
curl_setopt($s, CURLOPT_POST, TRUE); // Use POST method
$post_data = "var1=1&var2=2&var3=3"; // Define POST data values
curl_setopt($s, CURLOPT_POSTFIELDS, $post_data);
Listing A-14: Configuring POST method transfers
Notice that the POST data looks like a standard query string sent in a GET method. Incidentally, to send form information with the GET method, simply attach the query string to the target URL.
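Because the POST data is an ordinary query string, it can be built from an array instead of typed by hand. The sketch below uses PHP's http_build_query(), which also URL-encodes names and values. The array contents mirror the values used above, and the $get_url variable is an assumption added to show the GET equivalent.

```php
<?php
// Build the query string from an array of form fields.
$fields = array("var1" => 1, "var2" => 2, "var3" => 3);
$post_data = http_build_query($fields);             // "var1=1&var2=2&var3=3"

$s = curl_init();
curl_setopt($s, CURLOPT_POST, TRUE);                // Use POST method
curl_setopt($s, CURLOPT_POSTFIELDS, $post_data);    // Define POST data values

// For the GET method, attach the same query string to the target URL.
$get_url = "http://www.schrenk.com/index.php?" . $post_data;
```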
The CURLOPT_VERBOSE option controls the quantity of status messages created during a file transfer. You may find this helpful during debugging, but it is best to turn off this option during the production phase, because it produces many entries in your server log file. A typical succession of log messages for a single file download looks like Listing A-15.
* About to connect() to www.schrenk.com port 80
* Connected to www.schrenk.com (66.179.150.101) port 80
* Connection #0 left intact
* Closing connection #0
Listing A-15: Typical messages from a verbose PHP/CURL session
If you're in verbose mode on a busy server, you'll create very large log files. Listing A-16 shows how to turn off verbose mode.
curl_setopt($s, CURLOPT_VERBOSE, FALSE); // Minimal logs
Listing A-16: Turning off verbose mode reduces the size of server log files.
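A common arrangement is to tie verbose mode to a debugging flag, so it cannot be left on by accident. The sketch below assumes a hypothetical $debug variable and log file name; it also uses the CURLOPT_STDERR option, not covered above, to send the status messages to a separate file while debugging.

```php
<?php
// Sketch: verbose output only while debugging, written to a separate file.
// $debug and the log file name are assumptions for illustration.
$debug = FALSE;

$s = curl_init();
$verbose_set = curl_setopt($s, CURLOPT_VERBOSE, $debug);   // Status messages on or off
if ($debug) {
    $log = fopen(sys_get_temp_dir() . DIRECTORY_SEPARATOR . "curl_debug.log", "a");
    curl_setopt($s, CURLOPT_STDERR, $log);                 // Send the messages to this file
}
```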
By default, PHP/CURL uses port 80 for all HTTP sessions, unless you are connecting to an SSL encrypted server, in which case port 443 is used.[95] These are the standard port numbers for HTTP and HTTPS protocols, respectively. If you're connecting to a custom protocol or wish to connect to a non-web protocol, use CURLOPT_PORT to set the desired port number, as shown in Listing A-17.
curl_setopt($s, CURLOPT_PORT, 234); // Use port number 234
Listing A-17: Using nonstandard communication ports
Configuration settings must be capitalized, as shown in the previous examples. This is because the option names are predefined PHP constants. Therefore, your code will fail if you specify an option as curlopt_port instead of CURLOPT_PORT.
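Because the option names are constants, they can also serve as array keys. The sketch below sets several options in one call with curl_setopt_array(), a function not used elsewhere in this section, and uses defined() to show that only the capitalized name exists. The option values are illustrative.

```php
<?php
// The option names are constants, so they work as array keys.
$options = array(
    CURLOPT_URL            => "http://www.schrenk.com/index.php",
    CURLOPT_RETURNTRANSFER => TRUE,     // Return in string
    CURLOPT_PORT           => 234,      // Use port number 234
);

$s = curl_init();
$ok = curl_setopt_array($s, $options);  // TRUE if every option was accepted

// Only the capitalized name is a defined constant.
$upper_exists = defined("CURLOPT_PORT");
$lower_exists = defined("curlopt_port");
```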
[94] You can find a complete set of PHP/CURL options at http://www.php.net/manual/en/function.curl-setopt.php.
[95] Well-known and standard port numbers are defined at http://www.iana.org/assignments/port-numbers.