Bound Parameters

Statement parameters are special tokens that are inserted into the SQL command string before it is passed to one of the sqlite3_prepare_xxx() functions. They act as a placeholder for any literal value, such as a bare number or a single quote string. After the statement is prepared, but before it is executed, you can bind specific values to each statement parameter. Once you’re done executing a statement, you can reset the statement, bind new values to the parameters, and execute the statement again—only this time with the new values.

Parameter Tokens

SQLite supports five different styles of statement parameters. These short string tokens are placed directly into the SQL command string, which can then be passed to one of the sqlite3_prepare_xxx() functions. Once the statement is prepared, the individual parameters are referenced by index.

?

An anonymous parameter with automatic index. As the statement is processed, each anonymous parameter is assigned a unique, sequential index value, starting with one.

?<index>

Parameter with explicit numeric index. Duplicate indexes allow the same value to be bound multiple places in the same statement.

:<name>

A named parameter with an automatic index. Duplicate names allow the same value to be bound multiple places in the same statement.

@<name>

A named parameter with an automatic index. Duplicate names allow the same value to be bound multiple places in the same statement. Works exactly like the colon parameter.

$<name>

A named parameter with an automatic index. Duplicate names allow the same value to be bound multiple places in the same statement. This is an extended syntax to support Tcl variables. Unless you’re doing Tcl programming, I suggest you use the colon format.

To get an idea of how these work, consider this INSERT statement:

INSERT INTO people (id, name) VALUES ( ?, ? );

The two statement parameters represent the id and name values being inserted. Parameter indexes start at one, so the first parameter that represents the id value has an index of one, and the parameter used to reference the name value has an index of two.

Notice that the second parameter, which is likely a text value, does not have single quotes around it. The single quotes are part of the string-literal representation, and are not required for a parameter value.

Warning

Statement parameters should not be put in quotes. The notation '?' designates a one-character text value, not a parameter.

Once this statement has been prepared, it can be used to insert multiple rows. For each row, simply bind the appropriate values, step through the statement, and then reset the statement. After the statement has been reset, new values can be bound to the parameters and the statement can be stepped again.

You can also use explicit index values:

INSERT INTO people (id, name) VALUES ( ?1, ?2 );

Using explicit parameter indexes has two major advantages. First, you can have multiple instances of the same index value, allowing the same value to be bound to more than one place in the same statement.

Second, explicit indexes allow the parameters to appear out of order. There can even be gaps in the index sequence. This can help simplify application code maintenance if the query is modified and parameters are added or removed.

This level of abstraction can be taken even further by using named parameters. In this case, you allow SQLite to assign parameter index values as it sees fit, in a similar fashion to anonymous parameters. The difference is that you can ask SQLite to tell you the index value of a specific parameter based off the name you’ve given it. Consider this statement:

INSERT INTO people (id, name) VALUES ( :id, :name );

In this case, the parameter values are quite explicit. As we will see in the next section, the code that binds values to these parameters is also quite explicit, making it very clear what is going on. Best of all, it doesn’t matter if new parameters are added. As long as the existing names remain unchanged, the code will properly find and bind the named parameters.

Note, however, that parameters can only be used to replace literal values, such as quoted strings or numeric values. Parameters cannot be used in place of identifiers, such as table names or column names. The following bit of SQL is invalid:

SELECT * FROM ?;   -- INCORRECT: Cannot use a parameter as an identifier

If you attempt to prepare this statement, it will fail. This is because the parameter (which acts as an unknown literal value) is being used where an identifier is required. This is invalid, and the statement will not prepare correctly.

Within a statement, it is best to choose a specific parameter style and stick with it. Mixing anonymous parameters with explicit indexes or named parameters is likely to cause confusion about what index belongs to which parameter.

Personally, I prefer to used the colon-name-style parameters. Using named parameters eliminates the need to know any specific index values, allowing you to just reference the name at runtime. The use of short, significant names can also make the intent of both your SQL statements and your bind code easier to understand.

Binding Values

When you first prepare a statement with parameters, all of the parameters start out with a NULL assigned to them. Before you execute the statement, you can bind specific values to each parameter using the sqlite3_bind_xxx() family of functions.

There are nine sqlite3_bind_xxx() functions available, plus a number of utility functions. These functions can be called any time after the statement is prepared, but before sqlite3_step() is called for the first time. Once sqlite3_step() has been called, these functions cannot be called again until the statement is reset.

All the sqlite3_bind_xxx() functions have the same first and second parameters and return the same result. The first parameter is always a pointer to an sqlite3_stmt, and the second is the index of the parameter to bind. Remember that for anonymous parameters, the first index value starts with one. For the most part, the third parameter is the value to bind. The fourth parameter, if present, indicates the length of the data value in bytes. The fifth parameter, if present, is a function pointer to a memory management callback.

Warning

Remember that bind index values start with one (1), unlike result column indexes, which start with zero (0).

All the bind functions return an integer error code, which is equal to SQLITE_OK upon success.

The bind functions are:

int sqlite3_bind_blob( sqlite3_stmt *stmt, int pidx, const void *data, int data_len, mem_callback )

Binds an arbitrary length binary data BLOB.

int sqlite3_bind_double( sqlite3_stmt *stmt, int pidx, double data )

Binds a 64-bit floating point value.

int sqlite3_bind_int( sqlite3_stmt *stmt, int pidx, int data )

Binds a 32-bit signed integer value.

int sqlite3_bind_int64( sqlite3_stmt *stmt, int pidx, sqlite3_int64 )

Binds a 64-bit signed integer value.

int sqlite3_bind_null( sqlite3_stmt *stmt, int pidx )

Binds a NULL datatype.

int sqlite3_bind_text( sqlite3_stmt *stmt, int pidx, const char *data, int data_len, mem_callback )

Binds an arbitrary length UTF-8 encoded text value. The length is in bytes, not characters. If the length parameter is negative, SQLite will compute the length of the string up to, but not including, the null terminator. It is recommended that the manually computed lengths do not include the terminator (the terminator will be included when the value is returned).

int sqlite3_bind_text16( sqlite3_stmt *stmt, int pidx, const void *data, int data_len, mem_callback )

Binds an arbitrary length UTF-16 encoded text value. The length is in bytes, not characters. If the length parameter is negative, SQLite will compute the length of the string up to, but not including, the null terminator. It is recommended that the manually computed lengths do not include the terminator (the terminator will be included when the value is returned).

int sqlite3_bind_zeroblob( sqlite3_stmt *stmt, int pidx, int len )

Binds an arbitrary length binary data BLOB, where each byte is set to zero (0x00). The only additional parameter is a length value, in bytes. This function is particularly useful for creating large BLOBs that can then be updated with the incremental BLOB interface. See sqlite3_blob_open() in Appendix G for more details.

In addition to these type-specific bind functions, there is also a specialized function:

int sqlite3_bind_value( sqlite3_stmt *stmt, int pidx, const sqlite3_value *data_value )

Binds the type and value of an sqlite3_value structure. An sqlite3_value structure can hold any data format.

The text and BLOB variants of sqlite3_bind_xxx() require you to pass a buffer pointer for the data value. Normally this buffer and its contents must remain valid until a new value is bound to that parameter, or the statement is finalized. Since that might be some time later in the code, these bind functions have a fifth parameter that controls how the buffer memory is handled and possibly released.

If the fifth parameter is either NULL or the constant SQLITE_STATIC, SQLite will take a hands-off approach and assume the buffer memory is either static or that your application code is taking care of maintaining and releasing any memory.

If the fifth parameter is the constant SQLITE_TRANSIENT, SQLite will make an internal copy of the buffer. This allows you to release your buffer immediately (or allow it to go out of scope, if it happens to be on the stack). SQLite will automatically release the internal buffer at an appropriate time.

The final option is to pass a valid void mem_callback( void* ptr ) function pointer. This callback will be called when SQLite is done with the buffer and wants to release it. If the buffer was allocated with sqlite3_malloc() or sqlite3_realloc(), you can pass a reference to sqlite3_free() directly. If you allocated the buffer with a different set of memory management calls, you’ll need to pass a reference to a wrapper function that calls the appropriate memory release function.

Once a value has been bound to a parameter, there is no way to extract that value back out of the statement. If you need to reference a value after it has been bound, you must keep track of it yourself.

To help you figure out what parameter index to use, there are three utility functions:

int sqlite3_bind_parameter_count( sqlite3_stmt *stmt )

Returns an integer indicating the largest parameter index. If no explicit numeric indexes are used ( ?<number> ), this will the be the number of unique parameters that appear in a statement. If explicit numeric indexes are used, there may be gaps in the number sequence.

int sqlite3_bind_parameter_index( sqlite3_stmt *stmt, const char *name )

Returns the index of a named parameter. The name must include any leading character (such as “:”) and must be given in UTF-8, even if the statement was prepared from UTF-16. A zero is returned if a parameter with a matching name cannot be found.

const char* sqlite3_bind_parameter_name( sqlite3_stmt *stmt, int pidx )

Returns the full text representation of a specific parameter. The text is always UTF-8 encoded and includes the leading character.

Using the sqlite3_bind_parameter_index() function, you can easily find and bind named parameters. The sqlite3_bind_xxx() functions will properly detect an invalid index range, allowing you to look up the index and bind a value in one line:

sqlite3_bind_int(stmt, sqlite3_bind_parameter_index(stmt, ":pid"), pid);

If you want to clear all of the bindings back to their initial NULL defaults, you can use the function sqlite3_clear_bindings():

int sqlite3_clear_bindings( sqlite3_stmt *stmt )

Clears all parameter bindings in a statement. After calling, all parameters will have a NULL bound to them. This will cause the memory management callback to be called on any text or BLOB values that were bound with a valid function pointer. Currently, this function always returns SQLITE_OK.

If you want to be absolutely sure bound values won’t leak from one statement execution to the next, it is best to clear the bindings any time you reset the statement. If you’re doing manual memory management on data buffers, you can free any memory used by bound values after this function is called.

Security and Performance

There are significant security advantages to using bound parameters. Many times people will manipulate SQL strings to substitute the values they want to use. For example, consider building an SQL statement in C using the string function snprintf():

snprintf(buf, buf_size,
         "INSERT INTO people( id, name ) VALUES ( %d, '%s' );",
         id_val, name_val);

In this case we do need single quotes around the string value, as we’re trying to form a literal representation. If we pass in these C values:

id_val = 23;
name_val = "Fred";

Then we get the following SQL statement in our buffer:

INSERT INTO people( id, name ) VALUES ( 23, 'Fred'),

This seems simple enough, but the danger with a statement like this is that the variables need to be sanitized before they’re passed into the SQL statement. For example, consider these values:

id_val = 23;
name_val = "Fred' ); DROP TABLE people;";

This would cause our snprintf() to create the following SQL command sequence, with the individual commands split out onto their own lines for clarity:

INSERT INTO people( id, name ) VALUES ( 23, 'Fred' );
DROP TABLE people;
' );

While that last statement is nonsense, the second statement is cause for concern.

Thankfully, things are not quite as bad as they seem. The sqlite3_prepare_xxx() functions will only prepare a single statement (up to the first semicolon), unless you explicitly pass the remainder of the SQL command string to another sqlite3_prepare_xxx() call. That limits what can be done in a case like this, unless your code automatically prepares and executes multiple statements from a single command buffer.

Be warned, however, that the interfaces provided by many scripting languages will do exactly that, and will automatically process multiple SQL statements passed in with a single call. The SQLite convenience functions, including sqlite3_exec(), will also automatically process multiple SQL commands passed in through a single string. What makes sqlite3_exec() particularly dangerous is that the convenience functions don’t allow the use of bound values, forcing you to programmatically build SQL command statements and opening you up to problems. Later in the chapter, we’ll take a closer look at sqlite3_exec() and why it usually isn’t the best choice.

Even if SQLite will only process the first command, damage can still be done with subqueries and other commands. Bad input can also force a statement to fail. Consider the result if the name value is:

Fred', 'extra junk

If you’re updating a series of records based off this id value, you had better wrap all the commands up in a transaction and be prepared to roll it back if you encounter an error. If you just assume the commands will work, you’ll end up with an inconsistent database.

This type of attack is known as an SQL injection attack. An SQL injection attack inserts SQL command fragments into data values, causing the database to execute arbitrary SQL commands. Unfortunately, it is extremely common for websites to be susceptible to this kind of attack. It also borders on inexcusable, because it is typically very easy to avoid.

One defense against SQL injections is to try to sanitize any string values received from an untrusted source. For example, you might try to substitute all single quote characters with two single quote characters (the standard SQL escape mechanism). This can get quite complex, however, and you’re putting utter faith in the code’s ability to correctly sanitize untrusted strings.

A much easier way to defend yourself against SQL injections is to use SQL statement parameters. Injection attacks depend on a data value being represented as a literal value in an SQL command statement. The attack only works if the attack value is passed through the SQL parser, where it alters the meaning of the surrounding SQL commands.

In the case of SQL parameters, the bound values are never passed through the SQL parser. An SQL statement is only parsed when the command is prepared. If you’re using parameters, the SQL engine parses only the parameter tokens. Later, when you bind a value to a specific parameter, that value is bound directly in its native format (i.e. string, integer, etc.) and is not passed through the SQL parser. As long as you’re careful about how you extract and display the string, it is perfectly safe to directly bind an untrusted string value to a parameter value without fear of an SQL injection.

Besides avoiding injection attacks, parameters can also be faster and use less memory than string manipulations. Using a function such as snprintf() requires an SQL command template, and a sufficiently large output buffer. The string manipulation functions also need working memory, plus you may need additional buffers to copy and sanitize values. Additionally, a number of datatypes, such as integers and floating-point numbers (and especially BLOBs), often take up significantly more space in their string representation. This further increases memory usage. Finally, once the final command buffer has been created, all the data needs to be passed through the SQL parser, where the literal data values are converted back into their native format and stored in additional buffers.

Compare that to the resource usage of preparing and binding a statement. When using parameters, the SQL command statement is essentially static, and can be used as is, without modification or additional buffers. The parser doesn’t need to deal with converting and storing literal values. In fact, the data values normally never leave their native format, further saving time and memory by avoiding conversion in and out of a string representation.

Using parameters is the safe and wise choice, even for situations when a statement is only used once. It may take a few extra lines of code, but the process will be safer, faster, and more memory efficient.

Example

This example executes an INSERT statement. Although this statement is only executed once, it still uses bind parameters to protect against possible injection attacks. This eliminates the need to sanitize the input value.

The statement is first prepared with a statement parameter. The data value is then bound to the statement parameter before we execute the prepared statement:

    char            *data = ""; /* default to empty string */
    sqlite3_stmt    *stmt = NULL;
    int             idx = -1;

    /* ... set "data" pointer ... */
    /* ... open database ... */

    rc = sqlite3_prepare_v2( db, "INSERT INTO tbl VALUES ( :str )", -1, &stmt, NULL );
    if ( rc != SQLITE_OK) exit( -1 );

    idx = sqlite3_bind_parameter_index( stmt, ":str" );
    sqlite3_bind_text( stmt, idx, data, -1, SQLITE_STATIC );

    rc = sqlite3_step( stmt );
    if (( rc != SQLITE_DONE )&&( rc != SQLITE_ROW )) exit ( -1 );
    
    sqlite3_finalize( stmt );

    /* ... close database ... */

In this case we look for either an SQLITE_DONE or an SQLITE_ROW return value. Both are possible. Although the INSERT itself will be fully executed on the first call to sqlite3_step(), if PRAGMA count_changes is enabled, then the statement may return a value. In this case, we want to ignore any potential return value without triggering an error, so we must check for both possible return codes. For more details, see count_changes in Appendix F.

Potential Pitfalls

It is important to understand that all parameters must have some literal associated with them. As soon as you prepare a statement, all the parameters are set to NULL. If you fail to bind an alternate value, the parameter still has a literal NULL associated with it. This has a few ramifications that are not always obvious.

The general rule of thumb is that bound parameters act as literal string substitutions. Although they offer additional features and protections, if you’re trying to figure out the expected behavior of a parameter substitution, it is safe to assume you’ll get the exact same behavior as if the parameter was a literal string substitution.

In the case of an INSERT statement, there is no way to force a default value to be used. For example, if you have the following statement:

INSERT INTO membership ( pid, gid, type ) VALUES ( :pid, :gid, :type );

Even if the type column has a default value available, there is no way this statement can use it. If you fail to bind a value to the :type parameter, a NULL will be inserted, rather than the default value. The only way to insert a default value into the type column is to use a statement that doesn’t reference it, such as:

INSERT INTO membership ( pid, gid ) VALUES ( :pid, :gid );

This means that if you’re heavily dependent on database-defined default values, you may need to prepare several variations of an INSERT statement to cover the different cases when different data values are available. Of course, if your application code is aware of the proper default values, it can simply bind that value to the proper parameter.

The other area where parameters can cause surprises is in NULL comparisons. For example, consider the statement:

SELECT * FROM employee WHERE manager = :manager;

This works for normal values, but if a NULL is bound to the :manager parameter, no rows will ever be returned. If you need the ability to test for a NULL in the manager column, make sure you use the IS operator:

SELECT * FROM employee WHERE manager IS :manager;

For more details, see IS in Appendix D.

This behavior also makes it tricky to “stack” conditionals. For example, if you have the statement:

SELECT * FROM employee WHERE manager = :manager AND project = :project;

you must provide a meaningful value for both :manager and :project. If you want the ability to search on a manager, on a project, or on a manager and a project, you need to prepare multiple statements, or you need to add a bit more logic:

...WHERE ( manager = :manager OR :manager IS NULL )
     AND ( project = :project OR :project IS NULL );

This query will ignore one (or both) of the value conditions if you assign a NULL to the appropriate parameters. This expression won’t let you explicitly search for NULLs, but that can be done with additional parameters and logic. Preparing more flexible statements reduces the number of unique statements you need to manage, but it also tends to make them more complex and can make them run slower. If they get too complex, it might make more sense to simply define a new set of statements, rather than adding more and more parameter logic to the same statement.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.223.160.61