Statement parameters are special tokens that are inserted into the SQL
command string before it is passed to one of the sqlite3_prepare_xxx()
functions. They act as a placeholder for any literal value, such
as a bare number or a single quote string. After the statement is prepared,
but before it is executed, you can bind specific values to each statement
parameter. Once you’re done executing a statement, you can reset the
statement, bind new values to the parameters, and execute the statement
again—only this time with the new values.
SQLite supports five different styles of statement parameters. These short
string tokens are placed directly into the SQL command string, which
can then be passed to one of the sqlite3_prepare_xxx()
functions. Once the statement
is prepared, the individual parameters are referenced by index.
?
An anonymous parameter with automatic index. As the statement is processed, each anonymous parameter is assigned a unique, sequential index value, starting with one.
?<index>
Parameter with explicit numeric index. Duplicate indexes allow the same value to be bound multiple places in the same statement.
:<name>
A named parameter with an automatic index. Duplicate names allow the same value to be bound multiple places in the same statement.
@<name>
A named parameter with an automatic index. Duplicate names allow the same value to be bound multiple places in the same statement. Works exactly like the colon parameter.
$<name>
A named parameter with an automatic index. Duplicate names allow the same value to be bound multiple places in the same statement. This is an extended syntax to support Tcl variables. Unless you’re doing Tcl programming, I suggest you use the colon format.
To get an idea of how these work, consider this
INSERT
statement:
INSERT INTO people (id, name) VALUES ( ?, ? );
The two statement parameters represent the
id
and name
values being inserted. Parameter
indexes start at one, so the first parameter that represents the
id
value has an index of one,
and the parameter used to reference the name
value has an index of two.
Notice that the second parameter, which is likely a text value, does not have single quotes around it. The single quotes are part of the string-literal representation, and are not required for a parameter value.
Statement parameters should not be put in
quotes. The notation '?'
designates a one-character text value, not a parameter.
Once this statement has been prepared, it can be used to insert multiple rows. For each row, simply bind the appropriate values, step through the statement, and then reset the statement. After the statement has been reset, new values can be bound to the parameters and the statement can be stepped again.
You can also use explicit index values:
INSERT INTO people (id, name) VALUES ( ?1, ?2 );
Using explicit parameter indexes has two major advantages. First, you can have multiple instances of the same index value, allowing the same value to be bound to more than one place in the same statement.
Second, explicit indexes allow the parameters to appear out of order. There can even be gaps in the index sequence. This can help simplify application code maintenance if the query is modified and parameters are added or removed.
This level of abstraction can be taken even further by using named parameters. In this case, you allow SQLite to assign parameter index values as it sees fit, in a similar fashion to anonymous parameters. The difference is that you can ask SQLite to tell you the index value of a specific parameter based off the name you’ve given it. Consider this statement:
INSERT INTO people (id, name) VALUES ( :id, :name );
In this case, the parameter values are quite explicit. As we will see in the next section, the code that binds values to these parameters is also quite explicit, making it very clear what is going on. Best of all, it doesn’t matter if new parameters are added. As long as the existing names remain unchanged, the code will properly find and bind the named parameters.
Note, however, that parameters can only be used to replace literal values, such as quoted strings or numeric values. Parameters cannot be used in place of identifiers, such as table names or column names. The following bit of SQL is invalid:
SELECT * FROM ?; -- INCORRECT: Cannot use a parameter as an identifier
If you attempt to prepare this statement, it will fail. This is because the parameter (which acts as an unknown literal value) is being used where an identifier is required. This is invalid, and the statement will not prepare correctly.
Within a statement, it is best to choose a specific parameter style and stick with it. Mixing anonymous parameters with explicit indexes or named parameters is likely to cause confusion about what index belongs to which parameter.
Personally, I prefer to used the colon-name-style parameters. Using named parameters eliminates the need to know any specific index values, allowing you to just reference the name at runtime. The use of short, significant names can also make the intent of both your SQL statements and your bind code easier to understand.
When you first prepare a statement with parameters, all of the parameters start out
with a NULL assigned to them. Before you execute the statement, you
can bind specific values to each parameter using the sqlite3_bind_xxx()
family of
functions.
There are nine sqlite3_bind_xxx()
functions available, plus a number of utility functions.
These functions can be called any time after the statement is
prepared, but before sqlite3_step()
is called for the first time. Once sqlite3_step()
has been called, these functions
cannot be called again until the statement is reset.
All the sqlite3_bind_xxx()
functions have the same first and
second parameters and return the same result. The first parameter is
always a pointer to an sqlite3_stmt
, and the second is the index of the parameter
to bind. Remember that for anonymous parameters, the first index value
starts with one. For the most part, the third parameter is the value
to bind. The fourth parameter, if present, indicates the length of the
data value in bytes. The fifth parameter, if present, is a function
pointer to a memory management callback.
Remember that bind index values start with one (1), unlike result column indexes, which start with zero (0).
All the bind functions return an integer error
code, which is equal to SQLITE_OK
upon success.
int sqlite3_bind_blob( sqlite3_stmt *stmt, int pidx,
const void *data, int data_len, mem_callback )
int sqlite3_bind_double( sqlite3_stmt *stmt, int pidx, double data )
int sqlite3_bind_int( sqlite3_stmt *stmt, int pidx, int data )
int sqlite3_bind_int64( sqlite3_stmt *stmt, int pidx, sqlite3_int64 )
int sqlite3_bind_null( sqlite3_stmt *stmt, int pidx )
int sqlite3_bind_text( sqlite3_stmt *stmt, int pidx,
const char *data, int data_len, mem_callback )
Binds an arbitrary length UTF-8 encoded text value. The length is in bytes, not characters. If the length parameter is negative, SQLite will compute the length of the string up to, but not including, the null terminator. It is recommended that the manually computed lengths do not include the terminator (the terminator will be included when the value is returned).
int sqlite3_bind_text16( sqlite3_stmt *stmt, int pidx,
const void *data, int data_len, mem_callback )
Binds an arbitrary length UTF-16 encoded text value. The length is in bytes, not characters. If the length parameter is negative, SQLite will compute the length of the string up to, but not including, the null terminator. It is recommended that the manually computed lengths do not include the terminator (the terminator will be included when the value is returned).
int sqlite3_bind_zeroblob( sqlite3_stmt *stmt, int pidx, int len )
Binds an arbitrary length binary
data BLOB, where each byte is set to zero (0x00
). The only
additional parameter is a length value, in bytes.
This function is particularly useful for creating
large BLOBs that can then be updated with the
incremental BLOB interface. See sqlite3_blob_open() in Appendix G for more details.
In addition to these type-specific bind functions, there is also a specialized function:
The text and BLOB variants of sqlite3_bind_xxx()
require you to pass
a buffer pointer for the data value. Normally this buffer and its
contents must remain valid until a new value is bound to that
parameter, or the statement is finalized. Since that might be some
time later in the code, these bind functions have a fifth parameter
that controls how the buffer memory is handled and possibly
released.
If the fifth parameter is either NULL or the
constant SQLITE_STATIC
, SQLite will
take a hands-off approach and assume the buffer memory is either
static or that your application code is taking care of maintaining and
releasing any memory.
If the fifth parameter is the constant SQLITE_TRANSIENT
, SQLite will make an
internal copy of the buffer. This allows you to release your buffer
immediately (or allow it to go out of scope, if it happens to be on
the stack). SQLite will automatically release the internal buffer at
an appropriate time.
The final option is to pass a valid void mem_callback( void* ptr )
function pointer. This callback will be called when SQLite is done
with the buffer and wants to release it. If the buffer was allocated
with sqlite3_malloc()
or sqlite3_realloc()
, you can pass a
reference to sqlite3_free()
directly. If you allocated the buffer with a different set of memory
management calls, you’ll need to pass a reference to a wrapper
function that calls the appropriate memory release function.
Once a value has been bound to a parameter, there is no way to extract that value back out of the statement. If you need to reference a value after it has been bound, you must keep track of it yourself.
To help you figure out what parameter index to use, there are three utility functions:
int sqlite3_bind_parameter_count( sqlite3_stmt *stmt )
Returns an integer indicating the largest parameter
index. If no explicit numeric indexes are used (
?
), this will the be the number of unique parameters
that appear in a statement. If explicit numeric
indexes are used, there may be gaps in the number
sequence.<number>
int sqlite3_bind_parameter_index( sqlite3_stmt *stmt, const char *name )
Returns the index of a
named parameter. The name must include
any leading character (such as “:
”) and must be given in
UTF-8, even if the statement was prepared from
UTF-16. A zero is returned if a parameter with a
matching name cannot be found.
const char* sqlite3_bind_parameter_name( sqlite3_stmt *stmt, int pidx )
Returns the full text representation of a specific parameter. The text is always UTF-8 encoded and includes the leading character.
Using the sqlite3_bind_parameter_index()
function, you can
easily find and bind named parameters. The sqlite3_bind_xxx()
functions will properly detect an
invalid index range, allowing you to look up the index and bind a
value in one line:
sqlite3_bind_int(stmt, sqlite3_bind_parameter_index(stmt, ":pid"), pid);
If you want to clear all of the bindings back to
their initial NULL defaults, you can use the function sqlite3_clear_bindings()
:
int sqlite3_clear_bindings( sqlite3_stmt *stmt )
Clears all parameter bindings in a statement. After
calling, all parameters will have a NULL bound to
them. This will cause the memory management callback
to be called on any text or BLOB values that were
bound with a valid function pointer. Currently, this
function always returns SQLITE_OK
.
If you want to be absolutely sure bound values won’t leak from one statement execution to the next, it is best to clear the bindings any time you reset the statement. If you’re doing manual memory management on data buffers, you can free any memory used by bound values after this function is called.
There are significant security advantages to using bound parameters. Many times
people will manipulate SQL strings to substitute the values they want
to use. For example, consider building an SQL statement in C using the
string function
snprintf()
:
snprintf(buf, buf_size, "INSERT INTO people( id, name ) VALUES ( %d, '%s' );", id_val, name_val);
In this case we do need single quotes around the string value, as we’re trying to form a literal representation. If we pass in these C values:
id_val = 23; name_val = "Fred";
Then we get the following SQL statement in our buffer:
INSERT INTO people( id, name ) VALUES ( 23, 'Fred'),
This seems simple enough, but the danger with a statement like this is that the variables need to be sanitized before they’re passed into the SQL statement. For example, consider these values:
id_val = 23; name_val = "Fred' ); DROP TABLE people;";
This would cause our snprintf()
to create the following SQL command
sequence, with the individual commands split out onto their own lines
for clarity:
INSERT INTO people( id, name ) VALUES ( 23, 'Fred' ); DROP TABLE people; ' );
While that last statement is nonsense, the second statement is cause for concern.
Thankfully, things are not quite as bad as they
seem. The sqlite3_prepare_xxx()
functions will only prepare a single statement (up to the first
semicolon), unless you explicitly pass the remainder of the SQL command
string to another sqlite3_prepare_xxx()
call. That limits what can be
done in a case like this, unless your code automatically prepares and
executes multiple statements from a single command buffer.
Be warned, however, that the interfaces provided
by many scripting languages will do exactly that, and will
automatically process multiple SQL statements passed in with a single
call. The SQLite convenience functions, including sqlite3_exec()
, will also automatically process multiple SQL commands
passed in through a single string. What makes sqlite3_exec()
particularly dangerous is that the
convenience functions don’t allow the use of bound values, forcing you
to programmatically build SQL command statements and opening you up to
problems. Later in the chapter, we’ll take a closer look at sqlite3_exec()
and why it usually
isn’t the best choice.
Even if SQLite will only process the first command, damage can still be done with subqueries and other commands. Bad input can also force a statement to fail. Consider the result if the name value is:
Fred', 'extra junk
If you’re updating a series of records based off this id
value, you had better wrap all the
commands up in a transaction and be prepared to roll it back if you
encounter an error. If you just assume the commands will work, you’ll
end up with an inconsistent database.
This type of attack is known as an SQL injection attack. An SQL injection attack inserts SQL command fragments into data values, causing the database to execute arbitrary SQL commands. Unfortunately, it is extremely common for websites to be susceptible to this kind of attack. It also borders on inexcusable, because it is typically very easy to avoid.
One defense against SQL injections is to try to sanitize any string values received from an untrusted source. For example, you might try to substitute all single quote characters with two single quote characters (the standard SQL escape mechanism). This can get quite complex, however, and you’re putting utter faith in the code’s ability to correctly sanitize untrusted strings.
A much easier way to defend yourself against SQL injections is to use SQL statement parameters. Injection attacks depend on a data value being represented as a literal value in an SQL command statement. The attack only works if the attack value is passed through the SQL parser, where it alters the meaning of the surrounding SQL commands.
In the case of SQL parameters, the bound values are never passed through the SQL parser. An SQL statement is only parsed when the command is prepared. If you’re using parameters, the SQL engine parses only the parameter tokens. Later, when you bind a value to a specific parameter, that value is bound directly in its native format (i.e. string, integer, etc.) and is not passed through the SQL parser. As long as you’re careful about how you extract and display the string, it is perfectly safe to directly bind an untrusted string value to a parameter value without fear of an SQL injection.
Besides avoiding injection attacks, parameters can
also be faster and use less memory than string manipulations. Using a
function such as snprintf()
requires an SQL command template, and a sufficiently large output
buffer. The string manipulation functions also need working memory,
plus you may need additional buffers to copy and sanitize values.
Additionally, a number of datatypes, such as integers and
floating-point numbers (and especially BLOBs), often take up
significantly more space in their string representation. This further
increases memory usage. Finally, once the final command buffer has
been created, all the data needs to be passed through the SQL parser,
where the literal data values are converted back into their native
format and stored in additional buffers.
Compare that to the resource usage of preparing and binding a statement. When using parameters, the SQL command statement is essentially static, and can be used as is, without modification or additional buffers. The parser doesn’t need to deal with converting and storing literal values. In fact, the data values normally never leave their native format, further saving time and memory by avoiding conversion in and out of a string representation.
Using parameters is the safe and wise choice, even for situations when a statement is only used once. It may take a few extra lines of code, but the process will be safer, faster, and more memory efficient.
This example executes an INSERT
statement. Although this statement is only executed once, it still
uses bind parameters to protect against possible injection attacks.
This eliminates the need to sanitize the input value.
The statement is first prepared with a statement parameter. The data value is then bound to the statement parameter before we execute the prepared statement:
char *data = ""; /* default to empty string */ sqlite3_stmt *stmt = NULL; int idx = -1; /* ... set "data" pointer ... */ /* ... open database ... */ rc = sqlite3_prepare_v2( db, "INSERT INTO tbl VALUES ( :str )", -1, &stmt, NULL ); if ( rc != SQLITE_OK) exit( -1 ); idx = sqlite3_bind_parameter_index( stmt, ":str" ); sqlite3_bind_text( stmt, idx, data, -1, SQLITE_STATIC ); rc = sqlite3_step( stmt ); if (( rc != SQLITE_DONE )&&( rc != SQLITE_ROW )) exit ( -1 ); sqlite3_finalize( stmt ); /* ... close database ... */
In this case we look for either an SQLITE_DONE
or an SQLITE_ROW
return value. Both are possible. Although
the INSERT
itself will be fully
executed on the first call to sqlite3_step()
, if PRAGMA
count_changes
is enabled, then the statement may
return a value. In this case, we want to ignore any potential return
value without triggering an error, so we must check for both possible
return codes. For more details, see count_changes in Appendix F.
It is important to understand that all parameters must have some literal associated with them. As soon as you prepare a statement, all the parameters are set to NULL. If you fail to bind an alternate value, the parameter still has a literal NULL associated with it. This has a few ramifications that are not always obvious.
The general rule of thumb is that bound parameters act as literal string substitutions. Although they offer additional features and protections, if you’re trying to figure out the expected behavior of a parameter substitution, it is safe to assume you’ll get the exact same behavior as if the parameter was a literal string substitution.
In the case of an INSERT
statement, there is no way to force a default value to be used. For
example, if you have the following statement:
INSERT INTO membership ( pid, gid, type ) VALUES ( :pid, :gid, :type );
Even if the type
column has a default value available, there is
no way this statement can use it. If you fail to bind a value to the
:type
parameter, a NULL will
be inserted, rather than the default value. The only way to insert a
default value into the type column is to use a statement that doesn’t
reference it, such as:
INSERT INTO membership ( pid, gid ) VALUES ( :pid, :gid );
This means that if you’re heavily dependent on
database-defined default values, you may need to prepare several
variations of an INSERT
statement
to cover the different cases when different data values are available.
Of course, if your application code is aware of the proper default
values, it can simply bind that value to the proper parameter.
The other area where parameters can cause surprises is in NULL comparisons. For example, consider the statement:
SELECT * FROM employee WHERE manager = :manager;
This works for normal values, but if a NULL is
bound to the :manager
parameter, no
rows will ever be returned. If you need the ability to test for a NULL
in the manager
column, make sure
you use the IS
operator:
SELECT * FROM employee WHERE manager IS :manager;
For more details, see IS in Appendix D.
This behavior also makes it tricky to “stack” conditionals. For example, if you have the statement:
SELECT * FROM employee WHERE manager = :manager AND project = :project;
you must provide a meaningful value for both
:manager
and :project
. If you want the ability to
search on a manager
, on a project
, or on a manager
and a project
, you need to prepare multiple statements, or
you need to add a bit more logic:
...WHERE ( manager = :manager OR :manager IS NULL ) AND ( project = :project OR :project IS NULL );
This query will ignore one (or both) of the value conditions if you assign a NULL to the appropriate parameters. This expression won’t let you explicitly search for NULLs, but that can be done with additional parameters and logic. Preparing more flexible statements reduces the number of unique statements you need to manage, but it also tends to make them more complex and can make them run slower. If they get too complex, it might make more sense to simply define a new set of statements, rather than adding more and more parameter logic to the same statement.
18.223.160.61