How to do it...

  1. Before we can begin reading tables from the chinook database, we need to set up our SQLAlchemy engine:
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///data/chinook.db')
  1. We can now step back into the world of pandas and remain there for the rest of the recipe. Let's complete a simple command and read in the tracks table with the read_sql_table function. The name of the table is the first argument and the SQLAlchemy engine is the second:
>>> tracks = pd.read_sql_table('tracks', engine)
>>> tracks.head()
  1. For the rest of the recipe, we will answer a couple of different specific queries with help from the database diagram. To begin, let's find the average length of song per genre:
>>> genre_track = genres.merge(tracks[['GenreId', 'Milliseconds']], 
on='GenreId', how='left')
.drop('GenreId', axis='columns')

>>> genre_track.head()
  1. Now we can easily find the average length of each song per genre. To help ease interpretation, we convert the Milliseconds column to the timedelta data type:
>>> genre_time = genre_track.groupby('Name')['Milliseconds'].mean()
>>> pd.to_timedelta(genre_time, unit='ms').dt.floor('s')
.sort_values()
Name Rock And Roll 00:02:14 Opera 00:02:54 Hip Hop/Rap 00:02:58 ... Drama 00:42:55 Science Fiction 00:43:45 Sci Fi & Fantasy 00:48:31 Name: Milliseconds, dtype: timedelta64[ns]
  1. Now let's find the total amount spent per customer. We will need the customers, invoices, and invoice_items tables all connected to each other:
>>> cust = pd.read_sql_table('customers', engine, 
columns=['CustomerId','FirstName',
'LastName'])
>>> invoice = pd.read_sql_table('invoices', engine,
columns=['InvoiceId','CustomerId'])
>>> ii = pd.read_sql_table('invoice_items', engine,
columns=['InvoiceId', 'UnitPrice',
'Quantity'])

>>> cust_inv = cust.merge(invoice, on='CustomerId')
.merge(ii, on='InvoiceId')
>>> cust_inv.head()
  1. We can now multiply the quantity by the unit price and then find the total amount spent per customer:
>>> total = cust_inv['Quantity'] * cust_inv['UnitPrice']
>>> cols = ['CustomerId', 'FirstName', 'LastName']
>>> cust_inv.assign(Total = total).groupby(cols)['Total']
.sum()
.sort_values(ascending=False)
.head()
CustomerId FirstName LastName 6 Helena Holý 49.62 26 Richard Cunningham 47.62 57 Luis Rojas 46.62 46 Hugh O'Reilly 45.62 45 Ladislav Kovács 45.62 Name: Total, dtype: float64
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.142.133.180