Accessing databases from pandas

We can give pandas a database connection such as the one in the previous example or a SQLAlchemy connection. We will cover the latter in the later sections of this chapter. We will load the statsmodels sunactivity data, just like in the previous chapter, Chapter 7, Signal Processing and Time Series:

  1. Create a list of tuples to form the pandas DataFrame:
    rows = [tuple(x) for x in df.values]

    Contrary to the previous example, create a table without specifying data types:

    con.execute("CREATE TABLE sunspots(year, sunactivity)")
  2. The executemany() method executes multiple statements; in this case, we will be inserting records from a list of tuples. Insert all the rows into the table and show the row count as follows:
    con.executemany("INSERT INTO sunspots(year, sunactivity) VALUES (?, ?)", rows)
    c.execute("SELECT COUNT(*) FROM sunspots")
    print c.fetchone()

    The number of rows in the table is printed as follows:

    (309,)
    
  3. The rowcount attribute of the result of an execute() call gives the number of affected rows. This attribute is somewhat quirky and depends on your SQLite version. A SQL query, as shown in the previous code snippet, on the other hand is unambiguous. Delete the records where the number of events is more than 20:
    print "Deleted", con.execute("DELETE FROM sunspots where sunactivity > 20").rowcount, "rows"

    The following should be printed:

    Deleted 217 rows
    
  4. If we hand the database connection to pandas, we can execute a query and return a pandas DataFrame with the read_sql() function. Select the records until 1732 as follows:
    print read_sql("SELECT * FROM sunspots where year < 1732", con)

    The end result is the following pandas DataFrame:

        year  sunactivity
    0   1700            5
    1   1701           11
    2   1702           16
    3   1707           20
    4   1708           10
    5   1709            8
    6   1710            3
    7   1711            0
    8   1712            0
    9   1713            2
    10  1714           11
    11  1723           11
    
    [12 rows x 2 columns]
    

Refer to the panda_access.py file in this book's code bundle for the following code:

import statsmodels.api as sm
from pandas.io.sql import read_sql
import sqlite3

with sqlite3.connect(":memory:") as con:
    c = con.cursor()

    data_loader = sm.datasets.sunspots.load_pandas()
    df = data_loader.data
    rows = [tuple(x) for x in df.values]

    con.execute("CREATE TABLE sunspots(year, sunactivity)")
    con.executemany("INSERT INTO sunspots(year, sunactivity) VALUES (?, ?)", rows)
    c.execute("SELECT COUNT(*) FROM sunspots")
    print c.fetchone()
    print "Deleted", con.execute("DELETE FROM sunspots where sunactivity > 20").rowcount, "rows"

    print read_sql("SELECT * FROM sunspots where year < 1732", con)
    con.execute("DROP TABLE sunspots")

    c.close()
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.129.210.102