How it works...

First, we import the required libraries to handle argument parsing and interacting with SQLite databases:

from __future__ import print_function
import argparse
import os
import sqlite3
import sys

This recipe's command-line handler accepts two positional arguments, SQLITE_DATABASE and TABLE, which represents the path of the input database and the name of the table to review, respectively. An optional argument, column, indicated by the dash, can be used to manually supply the primary key column if it is known:

if __name__ == "__main__":
    # Command-line Argument Parser
    parser = argparse.ArgumentParser(
        description=__description__,
        epilog="Developed by {} on {}".format(
            ", ".join(__authors__), __date__)
    )
    parser.add_argument("SQLITE_DATABASE", help="Input SQLite database")
    parser.add_argument("TABLE", help="Table to query from")
    parser.add_argument("--column", help="Optional column argument")
    args = parser.parse_args()

If the optional column argument is supplied, we pass it to the main() function as a keyword argument along with the database and table name. Otherwise, we just pass the database and table name to the main() function without the col keyword argument:

    if args.column is not None:
        main(args.SQLITE_DATABASE, args.TABLE, col=args.column)
    else:
        main(args.SQLITE_DATABASE, args.TABLE)

The main() function, like the previous recipe, first performs some validation that the input database exists and is a file. Because we are using keyword arguments with this function, we must indicate this with the **kwargs argument in the function definition. This argument serves as a dictionary that stores all provided keyword arguments. In this case, if the optional column argument were supplied, this dictionary would contain a col key/value pair:

def main(database, table, **kwargs):
    print("[+] Attempting connection to {} database".format(database))
    if not os.path.exists(database) or not os.path.isfile(database):
        print("[-] Database does not exist or is not a file")
        sys.exit(1)

After validating the input file, we use sqlite3 to connect to this database and create the Cursor object we use to interact with it:

    # Connect to SQLite Database
    conn = sqlite3.connect(database)
    c = conn.cursor()

In order to identify the primary key for the desired table, we run the pragma table_info command with the table name inserted in parentheses. We use the format() method to dynamically insert the name of the table into the otherwise static string. After we store the command's results in the table_data variable, we perform validation on the table name input. If the user supplied a table name that does not exist, we will have an empty list as the result. We check for this and exit the script if the table does not exist:

    # Query Table for Primary Key
    c.execute("pragma table_info({})".format(table))
    table_data = c.fetchall()
    if table_data == []:
        print("[-] Check spelling of table name - '{}' did not return "
              "any results".format(table))
        sys.exit(2)

At this point, we create an if-else statement for the remainder of the script, depending on whether the optional column argument was supplied by the user. If col is a key in the kwargs dictionary, we immediately call the find_gaps() function and pass it the Cursor object, c, the table name, and the user-specified primary key column name. Otherwise, we try to identify the primary key(s) in the table_data variable.

The command previously executed and stored in the table_data variable returns a tuple for each column in the given table. The last element of each tuple is a binary option between 1 or 0, where 1 indicates that the column is a primary key. We iterate through each of the last elements in the returned tuples and, if they are equal to 1, the column name, stored in index one of the tuple, is appended to the potential_pks list:

    if "col" in kwargs:
        find_gaps(c, table, kwargs["col"])

    else:
        # Add Primary Keys to List
        potential_pks = []
        for row in table_data:
            if row[-1] == 1:
                potential_pks.append(row[1])

Once we have identified all primary keys, we check the list to determine if there are zero or more than one keys present. If either of these cases exists, we alert the user and exit the script. In these scenarios, the user would need to specify which column should be treated as the primary key column. If the list contains a single primary key, we pass the name of that column along with the database cursor and table name to the find_gaps() function:

        if len(potential_pks) != 1:
            print("[-] None or multiple primary keys found -- please "
                  "check if there is a primary key or specify a specific "
                  "key using the --column argument")
            sys.exit(3)

        find_gaps(c, table, potential_pks[0])

The find_gaps() method starts by displaying a message to the console, alerting the user of the current execution status of the script. We attempt the database query in a try and except block. If the user-specified column does not exist or was misspelled, we will receive an OperationalError from the sqlite3 library. This is the last validation step of user-supplied arguments and will exit the script if the except block is triggered. If the query executes successfully, we fetch all of the data and store it in the results variable:

def find_gaps(db_conn, table, pk):
    print("[+] Identifying missing ROWIDs for {} column".format(pk))
    try:
        db_conn.execute("select {} from {}".format(pk, table))
    except sqlite3.OperationalError:
        print("[-] '{}' column does not exist -- "
              "please check spelling".format(pk))
        sys.exit(4)
    results = db_conn.fetchall()

We use list comprehension and the built-in sorted() function to create a list of sorted primary keys. The results list contains tuples with one element at index 0, the primary key, which for the sms.db message table is the column named ROWID. With the sorted list of ROWIDs, we can quickly calculate the number of entries missing from the table. This would be the most recent ROWID minus the number of ROWIDs present in the list. If all entries were active in the database, this value would be zero.

We are working under the assumption that the most recent ROWID is the actual most recent ROWID. It is possible that one could delete the last few entries and the recipe would only detect the most recent active entry as the highest ROWID.

    rowids = sorted([x[0] for x in results])
    total_missing = rowids[-1] - len(rowids)

If we are not missing any values from the list, we print this fortuitous message to the console and exit with 0, indicating a successful termination. On the other hand, if we are missing entries, we print that to the console along with the count of the missing entries:

    if total_missing == 0:
        print("[*] No missing ROWIDs from {} column".format(pk))
        sys.exit(0)
    else:
        print("[+] {} missing ROWID(s) from {} column".format(
            total_missing, pk))

To calculate the missing gaps, we generate a set of all ROWIDs between the first ROWID and the last using the range() method and then compare that against the sorted list that we have. The difference() function can be used with a set to return a new set with elements in the first set that are not present in the object in parentheses. We then print the identified gaps to the console, which completes the execution of the script:

    # Find Missing ROWIDs
    gaps = set(range(rowids[0], rowids[-1] + 1)).difference(rowids)
    print("[*] Missing ROWIDS: {}".format(gaps))

An example of the output of this script may look like the following screenshot. Note how quickly the console can become cluttered based on the number of deleted messages. This, however, is not the intended end of this recipe. We will use the logic from this script in a more advanced recipe, Digging deep to recover messages, later in the chapter to identify and then attempt to locate potentially recoverable messages:

Table of Contents for How it works...

Create new playlist

Sign In

Sign Up

Table of Contents for
How it works...