Setting PHP/CURL Options

The PHP/CURL session is configured with the curl_setopt() function. Each individual configuration option is set with a separate call to this function. The script in Listing A-1 is unusual in its brevity. In normal use, there are many calls to curl_setopt(). There are over 90 separate configuration options available within PHP/CURL, making the interface very versatile.[94] The average PHP/CURL user, however, uses only a small subset of the available options. The following sections describe the PHP/CURL options you are most apt to use. While these options are listed here in order of relative importance, you may declare them in any order. If the session is left open, the configuration may be reused many times within the same session.
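
The reuse mentioned above can be sketched as follows. This is only an illustration, not Listing A-1: the $urls list, the $results array, and the loop are assumptions added for the example. The point is that options set once stay in force for every request made with the same session, so only the target URL changes.

```php
<?php
// Sketch: configure one session, then reuse it for several downloads.
// The $urls list is illustrative; substitute your own targets.
$urls = array(
    "http://www.schrenk.com/index.php",
    "http://www.schrenk.com",
);

$s = curl_init();
curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE);   // Set once, applies to every request
curl_setopt($s, CURLOPT_TIMEOUT, 30);            // Set once, applies to every request

$results = array();
foreach ($urls as $url) {
    curl_setopt($s, CURLOPT_URL, $url);          // Only the target changes
    $results[$url] = curl_exec($s);              // String on success, FALSE on failure
}
```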

CURLOPT_URL

Use the CURLOPT_URL option to define the target URL for your PHP/CURL session, as shown in Listing A-2.

curl_setopt($s, CURLOPT_URL, "http://www.schrenk.com/index.php");

Listing A-2: Defining the target URL

You should use a fully formed URL describing the protocol, domain, and file in every PHP/CURL file request.

CURLOPT_RETURNTRANSFER

The CURLOPT_RETURNTRANSFER option must be set to TRUE, as in Listing A-3, if you want the result to be returned in a string. If you don't set this option to TRUE, PHP/CURL sends the result directly to standard output (the terminal, or the browser if the script runs under a web server).

curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE);          // Return in string

Listing A-3: Telling PHP/CURL that you want the result to be returned in a string
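
A minimal sketch of how the returned string might be captured and checked follows. The $page variable and the error handling are illustrative assumptions rather than part of the original listings.

```php
<?php
// Sketch: capture the downloaded page in a string instead of echoing it.
$s = curl_init();
curl_setopt($s, CURLOPT_URL, "http://www.schrenk.com/index.php");
curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE);   // Return in string

$page = curl_exec($s);                           // String on success, FALSE on failure
if ($page === FALSE) {
    echo "Download failed: " . curl_error($s) . "\n";
} else {
    echo "Received " . strlen($page) . " bytes\n";
}
```

Testing with === FALSE matters here, because an empty page is a valid result that a loose comparison would mistake for a failure.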

CURLOPT_REFERER

The CURLOPT_REFERER option allows your webbot to spoof a hyper-reference that was clicked to initiate the request for the target file. The example in Listing A-4 tells the target server that someone clicked a link on http://www.a_domain.com/index.php to request the target web page.

curl_setopt($s, CURLOPT_REFERER, "http://www.a_domain.com/index.php");

Listing A-4: Spoofing a hyper-reference

CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS

The CURLOPT_FOLLOWLOCATION option tells cURL that you want it to follow every page redirection it finds. It's important to understand that PHP/CURL only honors header redirections and not redirections set with a refresh meta tag or with JavaScript, as shown in Listing A-5.

<?php
# Example of redirection that cURL will follow
header("Location: http://www.schrenk.com");
?>

<!-- Examples of redirections that cURL will not follow-->
<meta http-equiv="Refresh" content="0;url=http://www.schrenk.com">
<script>document.location="http://www.schrenk.com"</script>

Listing A-5: Redirects that cURL can and cannot follow

Any time you use CURLOPT_FOLLOWLOCATION, set CURLOPT_MAXREDIRS to the maximum number of redirections you care to follow. Limiting the number of redirections keeps your webbot out of infinite loops, where redirections point repeatedly to the same URL. My introduction to CURLOPT_MAXREDIRS came while trying to solve a problem brought to my attention by a network administrator, who initially thought that someone (using a webbot I wrote) launched a DoS attack on his server. In reality, the server misinterpreted the webbot's header request as a hacking exploit and redirected the webbot to an error page. There was a bug on the error page that caused it to repeatedly redirect the webbot to the error page, causing an infinite loop (and near-infinite bandwidth usage). The addition of CURLOPT_MAXREDIRS solved the problem, as demonstrated in Listing A-6.

curl_setopt($s, CURLOPT_FOLLOWLOCATION, TRUE); // Follow header redirections
curl_setopt($s, CURLOPT_MAXREDIRS, 4);         // Limit redirections to 4

Listing A-6: Using the CURLOPT_FOLLOWLOCATION and CURLOPT_MAXREDIRS options
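
One way to see what the redirection limit did is to inspect the session after the transfer. The sketch below is an assumption-laden illustration, not the webbot from the anecdote: it relies on the curl_getinfo() and curl_errno() functions, and the output messages are invented for the example.

```php
<?php
// Sketch: follow redirections with a limit, then report what happened.
$s = curl_init("http://www.schrenk.com");
curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($s, CURLOPT_FOLLOWLOCATION, TRUE);   // Follow header redirections
curl_setopt($s, CURLOPT_MAXREDIRS, 4);           // Limit redirections to 4

$page = curl_exec($s);
if ($page === FALSE) {
    if (curl_errno($s) === CURLE_TOO_MANY_REDIRECTS) {
        echo "Stopped: more than 4 redirections\n";
    } else {
        echo "Error " . curl_errno($s) . ": " . curl_error($s) . "\n";
    }
} else {
    echo "Redirections followed: " . curl_getinfo($s, CURLINFO_REDIRECT_COUNT) . "\n";
    echo "Final URL: " . curl_getinfo($s, CURLINFO_EFFECTIVE_URL) . "\n";
}
```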

CURLOPT_USERAGENT

Use this option to define the name of your user agent, as shown in Listing A-7. The user agent name is recorded in server access log files and is available to server-side scripts in the $_SERVER['HTTP_USER_AGENT'] variable.

$agent_name = "test_webbot";
curl_setopt($s, CURLOPT_USERAGENT, $agent_name);

Listing A-7: Setting the user agent name

Keep in mind that many websites will not serve pages correctly if your user agent name is something other than a standard web browser.

CURLOPT_NOBODY and CURLOPT_HEADER

These options tell PHP/CURL to return either the web page's header or body. By default, PHP/CURL will always return the body, but not the header. This explains why setting CURLOPT_NOBODY to TRUE excludes the body, and setting CURLOPT_HEADER to TRUE includes the header, as shown in Listing A-8.

curl_setopt($s, CURLOPT_HEADER, TRUE);        // Include the header
curl_setopt($s, CURLOPT_NOBODY, TRUE);        // Exclude the body

Listing A-8: Using the CURLOPT_HEADER and CURLOPT_NOBODY options
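
When the header is included but the body is not excluded, both arrive in a single string. The sketch below shows one possible way to separate them, using the header size reported by curl_getinfo(). The split_response() helper is hypothetical and was written only for this example.

```php
<?php
// Hypothetical helper: split a combined response into header and body.
function split_response($response, $header_size)
{
    return array(
        substr($response, 0, $header_size),   // Header block
        substr($response, $header_size),      // Everything after the header
    );
}

$s = curl_init("http://www.schrenk.com/index.php");
curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE);   // Return in string
curl_setopt($s, CURLOPT_HEADER, TRUE);           // Include the header

$response = curl_exec($s);
if ($response !== FALSE) {
    $header_size = curl_getinfo($s, CURLINFO_HEADER_SIZE);
    list($header, $body) = split_response($response, $header_size);
    echo $header;                                // Show only the header
}
```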

CURLOPT_TIMEOUT

If you don't limit how long PHP/CURL waits for a response from a server, it may wait forever, especially if the file you're fetching is on a busy server or you're trying to connect to a nonexistent or inactive IP address. (The latter happens frequently when a spider follows dead links on a website.) Setting a time-out value, as shown in Listing A-9, causes PHP/CURL to end the session if the download takes longer than the time-out value (in seconds).

curl_setopt($s, CURLOPT_TIMEOUT, 30);   // Don't wait longer than 30 seconds

Listing A-9: Setting a socket time-out value
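
A webbot usually needs to know whether a failed download was a time-out or some other error. The following sketch is one way to tell them apart. It assumes libcurl's error number 28 ("operation timed out"), and the messages are illustrative.

```php
<?php
// Sketch: distinguish a time-out from other download failures.
$s = curl_init("http://www.schrenk.com/index.php");
curl_setopt($s, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($s, CURLOPT_TIMEOUT, 30);            // Don't wait longer than 30 seconds

$page = curl_exec($s);
$timed_out = ($page === FALSE && curl_errno($s) === 28);  // 28 = operation timed out

if ($timed_out) {
    echo "Gave up after 30 seconds\n";
} elseif ($page === FALSE) {
    echo "Other error: " . curl_error($s) . "\n";
} else {
    echo "Downloaded " . strlen($page) . " bytes\n";
}
```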

CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR

One of the slickest features of PHP/CURL is the ability to manage cookies sent to and received from a website. Use the CURLOPT_COOKIEFILE option to define the file where previously stored cookies exist. At the end of the session, PHP/CURL writes new cookies to the file indicated by CURLOPT_COOKIEJAR. An example is in Listing A-10; I have never seen an application where these two options don't reference the same file.

curl_setopt($s, CURLOPT_COOKIEFILE, "c:\bots\cookies.txt"); // Read cookie file
curl_setopt($s, CURLOPT_COOKIEJAR,  "c:\bots\cookies.txt"); // Write cookie file

Listing A-10: Telling PHP/CURL where to read and write cookies

When specifying the location of a cookie file, always use the complete location of the file, and do not use relative addresses. More information about managing cookies is available in Chapter 22.
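
If you would rather not hard-code a drive and directory, one option is to build the complete path at run time. The sketch below assumes the cookie file should live in the same directory as the script. The cookies.txt filename and the $cookie_file variable are illustrative.

```php
<?php
// Sketch: build a complete path to the cookie file from the script's own directory.
$cookie_file = __DIR__ . DIRECTORY_SEPARATOR . "cookies.txt";

$s = curl_init();
curl_setopt($s, CURLOPT_COOKIEFILE, $cookie_file);   // Read cookie file
curl_setopt($s, CURLOPT_COOKIEJAR,  $cookie_file);   // Write cookie file
```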

CURLOPT_HTTPHEADER

The CURLOPT_HTTPHEADER configuration allows a cURL session to send an outgoing header message to the server. The script in Listing A-11 uses this option to tell the target server the MIME type it accepts, the content type it expects, and that the user agent is capable of decompressing compressed web responses.

Note that CURLOPT_HTTPHEADER expects to receive data in an array.

$header_array[] = "Mime-Version: 1.0";
$header_array[] = "Content-type: text/html; charset=iso-8859-1";
$header_array[] = "Accept-Encoding: compress, gzip";
curl_setopt($s, CURLOPT_HTTPHEADER, $header_array);

Listing A-11: Configuring an outgoing header

CURLOPT_SSL_VERIFYPEER

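You only need to use this option if the target website uses SSL encryption and the protocol in CURLOPT_URL is https:. An example is shown in Listing A-12.

curl_setopt($s, CURLOPT_SSL_VERIFYPEER, FALSE);    // Don't verify the server's certificate

Listing A-12: Configuring PHP/CURL not to verify the server's SSL certificate

Depending on the version of PHP/CURL you use, this option may be required. CURLOPT_SSL_VERIFYPEER controls whether cURL verifies the certificate presented by the target server. When verification is enabled and cURL cannot validate that certificate (for example, because no certificate authority bundle is installed locally), the request fails. Setting the option to FALSE skips the check, at the cost of no longer confirming that you are talking to the intended server.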

CURLOPT_USERPWD and CURLOPT_UNRESTRICTED_AUTH

As shown in Listing A-13, you may use the CURLOPT_USERPWD option with a valid username and password to access websites that use basic authentication. In contrast to using a browser, you will have to submit the username and password to every page accessed within the basic authentication realm.

curl_setopt($s, CURLOPT_USERPWD, "username:password");
curl_setopt($s, CURLOPT_UNRESTRICTED_AUTH, TRUE);

Listing A-13: Configuring PHP/CURL for basic authentication schemes

If you use this option in conjunction with CURLOPT_FOLLOWLOCATION, you should also use the CURLOPT_UNRESTRICTED_AUTH option, which will ensure that the username and password are sent to all pages you're redirected to, providing they are part of the same realm.

Exercise caution when using CURLOPT_USERPWD, because you can inadvertently send username and password information to the wrong server, where it may appear in access log files.

CURLOPT_POST and CURLOPT_POSTFIELDS

The CURLOPT_POST and CURLOPT_POSTFIELDS options configure PHP/CURL to emulate forms with the POST method. Since the default method is GET, you must first tell PHP/CURL to use the POST method. Then you must specify the POST data that you want to be sent to the target webserver. An example is shown in Listing A-14.

curl_setopt($s, CURLOPT_POST, TRUE);             // Use POST method
$post_data = "var1=1&var2=2&var3=3";             // Define POST data values
curl_setopt($s, CURLOPT_POSTFIELDS, $post_data);

Listing A-14: Configuring POST method transfers

Notice that the POST data looks like a standard query string sent in a GET method. Incidentally, to send form information with the GET method, simply attach the query string to the target URL.
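
Rather than typing the query string by hand, you might build it from an array with PHP's http_build_query() function, which also URL-encodes the values. The sketch below reuses the variable names from Listing A-14. The $fields array and the $get_url variable are assumptions added for illustration.

```php
<?php
// Sketch: build the POST data from an array instead of a hand-typed string.
$fields = array("var1" => 1, "var2" => 2, "var3" => 3);
$post_data = http_build_query($fields, "", "&");   // "var1=1&var2=2&var3=3"

$s = curl_init("http://www.schrenk.com/index.php");
curl_setopt($s, CURLOPT_POST, TRUE);               // Use POST method
curl_setopt($s, CURLOPT_POSTFIELDS, $post_data);

// For the GET method, the same string is attached to the target URL instead.
$get_url = "http://www.schrenk.com/index.php?" . $post_data;
```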

CURLOPT_VERBOSE

The CURLOPT_VERBOSE option controls the quantity of status messages created during a file transfer. You may find this helpful during debugging, but it is best to turn off this option during the production phase, because it produces many entries in your server log file. A typical succession of log messages for a single file download looks like Listing A-15.

* About to connect() to www.schrenk.com port 80
* Connected to www.schrenk.com (66.179.150.101) port 80
* Connection #0 left intact
* Closing connection #0

Listing A-15: Typical messages from a verbose PHP/CURL session

If you're in verbose mode on a busy server, you'll create very large log files. Listing A-16 shows how to turn off verbose mode.

curl_setopt($s, CURLOPT_VERBOSE, FALSE);        // Minimal logs

Listing A-16: Turning off verbose mode reduces the size of server log files.

CURLOPT_PORT

By default, PHP/CURL uses port 80 for all HTTP sessions, unless you are connecting to an SSL encrypted server, in which case port 443 is used.[95] These are the standard port numbers for HTTP and HTTPS protocols, respectively. If you're connecting to a custom protocol or wish to connect to a non-web protocol, use CURLOPT_PORT to set the desired port number, as shown in Listing A-17.

curl_setopt($s, CURLOPT_PORT, 234);            // Use port number 234

Listing A-17: Using nonstandard communication ports

Note

Configuration settings must be capitalized, as shown in the previous examples. This is because the option names are predefined PHP constants. Therefore, your code will fail if you specify an option as curlopt_port instead of CURLOPT_PORT.
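
A quick way to confirm this behavior is to ask PHP whether each spelling is a defined constant. The sketch below assumes the cURL extension is loaded and is offered only as a demonstration.

```php
<?php
// Sketch: option names are case-sensitive constants defined by the cURL extension.
$upper_exists = defined("CURLOPT_PORT");   // TRUE when the cURL extension is loaded
$lower_exists = defined("curlopt_port");   // FALSE: no such constant exists

var_dump($upper_exists);
var_dump($lower_exists);
```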



[94] You can find a complete set of PHP/CURL options at http://www.php.net/manual/en/function.curl-setopt.php.

[95] Well-known and standard port numbers are defined at http://www.iana.org/assignments/port-numbers.
