© Alejandro Serrano Mena 2019
Alejandro Serrano Mena, Practical Haskell, https://doi.org/10.1007/978-1-4842-4480-7_11

11. Safe Database Access

Alejandro Serrano Mena, Utrecht, The Netherlands

In the previous two chapters, you saw how to access the file system to read and save information in a durable way. You’ve been given all the tools for dealing with either a custom file format or a treelike information representation such as JSON.

However, most production applications don’t work with files. Instead, a database management system (DBMS) is used to store the application’s information and keep it safe. Furthermore, most DBMSs provide functionality such as transactions, logging, recovery, and replication, which is quite useful for maintenance, protection against errors, and scalability. This chapter will give you a bird’s-eye view of the many possibilities for database access in the Haskell ecosystem, in particular those that enable you to access relational DBMSs based on SQL.

In this chapter, I cover two specific libraries: Persistent and Esqueleto. These libraries are not tied to any specific DBMS but provide an abstraction layer over whatever DBMS you might choose to use. The main advantage of using Persistent is that it allows interaction with the database using user-defined Haskell abstract data types, which are converted from and to database format without further intervention (similar to object-relational mappers in the object-oriented world). This ensures a high degree of type safety when working with database code. Esqueleto is the companion library to Persistent, which focuses on specifics for SQL-based DBMSs.

Database Access Landscape

You can see that Haskell has a strong ecosystem of database access libraries just by going to the Hackage web site and looking at the number of libraries listed under the “Database” tag (more than 400 at the moment of writing). Broadly speaking, those libraries can be categorized into three groups.
  1. Native implementations of a database in Haskell. The main example of this group is acid-state, which allows you to save Haskell data types directly on disk. The main downside of this approach is that interoperability with other systems and languages suffers.
  2. Libraries for accessing a particular DBMS, such as sqlite-simple, mysql-simple, postgresql-simple (for some of the best-known relational ones), or mongoDB. These libraries are easy to understand for users of the corresponding DBMS, but they are also tied to a specific choice of system, making migration difficult.
  3. Abstraction layers over several DBMSs, which provide a common interface for several systems (usually along with specific connectors for each of them). The range of DBMSs covered differs per library; some focus only on relational, SQL-based systems, while others also try to include nonrelational ones. The main advantage of these libraries is that the choice of DBMS can be revisited while keeping almost all of the database access code in the application intact. This fits well with Haskell’s philosophy of reusability.

Abstracting Over Several DBMSs

In Hackage, you will find a group of packages that focus on the SQL world, which is the biggest one among database users. Of these, HDBC (from Haskell Database Connectivity) and hsql hide the specific details of each DBMS but otherwise expose the SQL model (tables, rows, and columns) as is, without any further abstraction. You just need to plug in the package corresponding to the specific DBMS you want to use (such as hsql-mysql or HDBC-postgresql). This gives you as the developer the full power of those systems, allowing the use of prepared statements or control over transaction boundaries. However, when using any of these libraries, you need to write the SQL statements by hand, which is usually a source of errors, and you need to marshal the results afterwards (which are given using a specific data type for SQL values) into your own data types. Because of these problems, these packages are not often used directly but rather as a dependency of a higher-level one.

Tip

The libraries that provide a common layer over different database systems are the most interesting to study. By learning just one, you get access to a big range of databases.

On that higher level, you will find HaskellDB. The idea from its developers was to expose the abstraction that the SQL databases are built upon: the relational algebra. Thus, you work with basic operations such as projection or restriction that can describe all the queries you can pose to the database. Furthermore, the schema for the tables is also embedded in the Haskell code, so you can guarantee the safety of all your queries (you won’t be accessing a nonexistent column, and you won’t be comparing columns of different types). The main drawback of HaskellDB is that it exposes an abstraction that is not as well known as SQL queries, so to use it you have to relearn how to express some concepts.

Introducing Persistent and Esqueleto

Persistent, available in Hackage and Stackage under the package name persistent, supports both relational and nonrelational DBMSs, which eases the transition between those two worlds in case it’s needed to enhance the scalability of the application. When using Persistent, you still use your own Haskell data types (i.e., you don’t need to marshal from and to your Client type), and the library generates all the glue code automatically (using Template Haskell). This gives you real type checking of your database statements, preventing a wide range of application errors.

If you come from an object-oriented programming background, you may recognize in this description the concept of an object-relational mapping (ORM), which also takes care of gluing a class to a specific table in a database. However, because it’s implemented in a functional language that embodies purity, Persistent doesn’t automatically save any change you make to the value holding the information about one of your rows. Rather, you need to explicitly ask for insertions, updates, and deletions.

One disadvantage of Persistent is that it supports only those operations common to every database it allows you to connect to. Since this includes both relational and nonrelational DBMSs, the common ground is sometimes limited. For example, you cannot easily express a join of two tables (in the SQL sense), since DBMSs such as MongoDB don’t have this concept. For that matter, some libraries have been developed that extend Persistent in one direction while restricting the applicable databases. This is the case of Esqueleto, a domain-specific language for expressing SQL queries inside Haskell using Persistent models.

Persistent itself encompasses a lot of related functionality in three areas: describing the schema of a database in such a way that it can be used with the usual Haskell data types; creating a new database schema or adapting an existing one by adding or deleting columns (this is called the migration functionality); and performing actual work with the information in the database (insertions, updates, deletions, and queries).

Connection

The DBMS is the software performing the actual work of storing and retrieving information from a database. The flow of work is quite simple: you send a statement to the DBMS, and it does some work, returning some values or information about the success of the operation. But before sending statements, you need to know how to establish the initial connection to the DBMS.

If you look around, you will see that along with the basic persistent package, there are many others with a name that comes from joining persistent- with the name of a DBMS (e.g., persistent-sqlite). Each of these packages provides a back end for a specific system, that is, the code that is needed to communicate with it and make sense of the results it returns.

Caution

In Hackage there are other packages starting with persistent- (like persistent-map). Usually only those with the “Database” tag are related to Persistent.

Each back end provides two ways to communicate with the corresponding system: via a single connection or by creating a pool of connections (a set of open connections that are reused by different operations to share resources and thus enhance performance). For the first case, each back end has a withDBMSConn function that creates a connection given the parameters needed for each specific system. This connection can then be used as a parameter to the runSqlPersistM function, which executes a set of database actions. For example, here’s the code needed to insert a Client (which holds a first name, a last name, an address, a country, and an age) into an SQLite database saved in the example.db file. Notice that at this point the code won’t compile because you haven’t provided the definitions for the entities in the database yet. You’ll learn how to do so in the next pages.
{-# LANGUAGE OverloadedStrings, TypeApplications #-}
import Database.Persist.Sqlite
import Control.Monad.Logger
exampleConn = runNoLoggingT $
  withSqliteConn @(NoLoggingT IO) @SqlBackend "example.db" $ \conn ->
    liftIO $ flip runSqlPersistM conn $ do
      spain     <- insert $ Country "Spain"
      _client1  <- insert $ Client "Alejandro" "Serrano"
                                   "Home Town, 1" spain 30
      return ()

The code above uses some fancy GHC extensions. First, withSqliteConn expects a Text value with the path to the database, so I’ve used the OverloadedStrings GHC extension in order to write that value using string literal syntax. Another extension, TypeApplications, is needed to tell withSqliteConn in which monad we are operating. In older versions of GHC you would use type annotations, but in modern ones types can be specified directly using the @T syntax. In this case, we choose an IO monad wrapper with a dummy logging interface, namely NoLoggingT. That logging interface comes from the monad-logger package, which is used heavily by persistent.

Since opening a connection and running some action in the database is so common, many of the back ends provide a special runDBMS convenience function for those simple cases. For example, the previous code can be written more concisely using runSqlite. Doing so frees you from passing the conn value yourself through several functions, as you can see in the following example. Also, the dummy logging interface is selected, so you don’t need a call to runNoLoggingT. In every other aspect, the two pieces of code are interchangeable.
exampleConn = runSqlite @IO @SqlBackend "example.db" $ do
  spain     <- insert $ Country "Spain"
  _client1  <- insert $ Client "Alejandro" "Serrano"
                               "Home Town, 1" spain 30
  return ()
Creating a pool is usually as easy as creating a single connection. Instead of withDBMSConn, you use the corresponding withDBMSPool function, which takes as input the information to contact the database and the maximum number of connections to keep open. Then, you use the runSqlPersistMPool function with the actions to execute. Here’s the same example for SQLite but using a pool of ten connections (it doesn’t make much sense to use a pool in this little example; rather, you would share the pool between different actions). Notice how similar the code is to the example with a single connection.
examplePool = runNoLoggingT $
  withSqlitePool @(NoLoggingT IO) @SqlBackend "example.db" 10 $ \pool ->
    liftIO $ flip runSqlPersistMPool pool $ do
      spain     <- insert $ Country "Spain"
      _client1  <- insert $ Client "Alejandro" "Serrano"
                                   "Home Town, 1" spain 30
      return ()
There is a large disparity of connection methods between DBMSs. Table 11-1 includes a brief explanation of the parameters for each of the more commonly encountered systems.
Table 11-1. Connection Parameters for Persistent Back Ends

Sqlite: withSqliteConn (single connection), withSqlitePool (pool).
Parameters: just the path to the file, or ":memory:" to create a temporary in-memory database.

PostgreSQL: withPostgresqlConn (single connection), withPostgresqlPool (pool).
Parameters: a connection string of the form "host=h port=p user=u password=pw dbname=db".

MySQL: withMySQLConn (single connection), withMySQLPool (pool).
Parameters: a value of type ConnectionInfo. A defaultConnectionInfo value is provided, which connects to the test database on localhost; its fields connectHost, connectUser, and so on, can then be modified to your needs.

MongoDB: withMongoDBConn (single connection), withMongoDBPool (pool).
Parameters: a set of different parameters for the database, including host, port, authentication, and idle time. The pooled version needs two extra parameters: the number of stripes and the connections per stripe.

Schemas and Migrations

As mentioned in the introduction, Persistent needs a description of the database schema you will be working with to ensure type safety. In addition, it needs information about the mapping between this schema and the actual Haskell data types you will use to represent the data. To bring all of this together, Persistent uses quite complex types to encode all the invariants and relations. Many of the techniques used in those types haven’t been introduced in the book yet; Chapter 13 will provide all that information.

The good news is that you don’t have to care about all of this because Persistent comes with a package, persistent-template, which is able to generate all the necessary code (Haskell data types, instances for some type classes, glue code for the marshaling) from a single description of the database schema. For the rest of the chapter, I assume that your project has both persistent and persistent-template listed as dependencies, in addition to the SQLite back end.

Note

persistent-template uses Template Haskell to generate all its code. You were introduced to Template Haskell in Chapter 6. As a small reminder, Template Haskell is the metaprogramming facility of GHC, which means that at compile time a function can generate new or manipulate existing code.

Describing the Entities

I’ll start with a simple example for the time machine store: describing a database for holding the information of the clients. To make things easier, let’s assume that the clients are always individuals, whose information is composed of first and last names, an address, and an age. The following code defines a database entity for holding this information. This definition will be used later to communicate with the correct tables or documents in the database.
{-# LANGUAGE TemplateHaskell, QuasiQuotes, TypeFamilies, EmptyDataDecls,
             MultiParamTypeClasses, FlexibleContexts, GADTs,
             GeneralizedNewtypeDeriving, OverloadedStrings #-}
import Database.Persist.TH
mkPersist sqlSettings [persistLowerCase|
Client
  firstName String
  lastName  String
  address   String
  age       Int
  deriving Show
|]

After several language extensions that you need to enable, which are listed inside the {-# LANGUAGE #-} pragma, there’s a call to mkPersist. This function needs settings for generating the code and the description of the database. In this case, passing sqlSettings as the first parameter tells the code generator to follow SQL standard conventions.1 The second parameter is not given using the usual Haskell syntax but rather using quasiquotation: the code between [persistLowerCase| and |] won’t be parsed by the Haskell compiler but by a custom reader (in this case, persistLowerCase). Thus, the important thing to know is how to express the database schema for this quasiquoter.

Caution

Be aware that inside a quasiquoter you’re no longer writing Haskell code but some custom Persistent syntax. The need to learn an additional (albeit small) language is seen by some people as a drawback of the persistent-template library.

As you can see, there’s a block for each kind of entity you want to store. In this case, you have only one, namely, Client. Then come all the fields that make up that entity, first the name and then the type. Finally, you can include a deriving clause just like you would in the definition of any other Haskell data type. These are the most basic constructs; you will see more as the chapter progresses.

Tip

If you write json next to the entity name in the quasiquotation (for example, Client json), Persistent will derive instances of aeson’s ToJSON and FromJSON automatically. This will be handy in the next chapter, where you’ll learn how to create a web application using Haskell.

Since the code uses persistLowerCase as a quasiquoter, each entity will be mapped to a table and each field to a column, whose names are obtained by converting camel case (each word in an identifier starts with a capital letter) to underscore case (words in an identifier are separated by underscores). In this example, the data will be saved in a client table with the columns first_name, last_name, address, and age. If you want to retain the camel-case names, use persistUpperCase instead of persistLowerCase.
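As an illustration, the renaming that persistLowerCase performs on field names can be sketched with a small helper. This is a simplified stand-in written for this example; the real conversion lives inside persistent-template and handles more corner cases.

```haskell
import Data.Char (isUpper, toLower)

-- Illustrative only: roughly the camel-case-to-underscore conversion
-- applied to field names such as firstName.
toUnderscore :: String -> String
toUnderscore = concatMap go
  where
    go c | isUpper c = ['_', toLower c]  -- new word: prepend underscore
         | otherwise = [c]
```

For instance, this sketch maps firstName to first_name, matching the column names shown above.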

In addition to the glue code for contacting the database, mkPersist will also generate data type constructors representing each of the fields in the entity. These constructors’ names are created by juxtaposing the name of the entity with the name of each of the fields, in camel case. In this case, you would obtain ClientFirstName, ClientLastName, and so on. In particular, this means you can use the same field name in different entities (e.g., using a name field for both individual and company entities) since the prefix will tell them apart. This is different from Haskell records, which cannot share field names.

You may have noticed that so far there hasn’t been any description of an identifier, that is, no indication of what the table’s primary key might be. The primary key is a piece of information that uniquely determines a row in a table, and it’s commonly used in SQL environments. Persistent developers found this idea of an identifier interesting too, and that’s why a schema always defines an implicit id field for each entity. In conclusion, in addition to all the explicitly declared fields, you have a ClientId field representing the unique identifier of each row.

This identifier field is used, in addition to performance considerations, to refer to an entity from another or to embed an entity inside another. For example, let’s add information about the country of residence of each client. A simple solution would be to add another String field for the country name. However, as database theory shows, this is not a good idea because you waste a lot of space with repeated information, and even worse, you may have problems when the same country is not written the same everywhere (e.g., "United States" vs. "US" vs. "USA"). The correct solution is to create a new entity that will hold information about countries and refer to it in Client (this is called normalization in database jargon). This is perfect for an identifier field. The new schema declaration looks like this:
mkPersist sqlSettings [persistLowerCase|
Country
  name String
  deriving Show
Client
  firstName String
  lastName  String
  address   String
  country   CountryId
  age       Int
  deriving Show
|]

With this schema definition, the insertions shown in the previous section are correct Haskell code. Notice that Persistent has created a data type for each entity. This data type can be thought of as being defined as a record whose only constructor shares its name with the entity being defined. Thus, in the example, the constructors Country and Client have been generated. Another remark is that insert returns the identifier of the new entity in the database, and the code uses that for referring to the country "Spain" in the client.
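To build an intuition for the generated code, the Client entity can be pictured roughly as the following record. This is a simplified sketch: the real code generated by mkPersist also produces key types and type class instances, and the CountryId newtype below is only a stand-in for Persistent’s actual key type.

```haskell
-- Stand-in for the key type Persistent generates for Country.
newtype CountryId = CountryId Int deriving Show

-- Roughly the record mkPersist generates for the Client entity:
-- one constructor named after the entity, with prefixed field selectors.
data Client = Client
  { clientFirstName :: String
  , clientLastName  :: String
  , clientAddress   :: String
  , clientCountry   :: CountryId
  , clientAge       :: Int
  } deriving Show
```

With this picture in mind, an expression like Client "Alejandro" "Serrano" "Home Town, 1" spain 30 is just an ordinary constructor application.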

Identifiers Are Special

In most databases, the identifier column is just another column, with the special qualities of being unique among the rows and usually being autoincremental. However, you have seen that Persistent treats those fields in a special way. Why is that the case?

The main reason is that by having special data types for each identifier of each entity, it’s not possible to mix identifiers of different entities. In this example, ClientId will be a different type from CountryId, so you cannot pass a value of the latter when an element of the former is requested.
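This mechanism can be mimicked with plain newtypes. The following sketch is not Persistent’s actual key representation; it only shows why distinct identifier types rule out mixing them up.

```haskell
-- Distinct newtypes make mixing up identifiers a compile-time error.
newtype ClientId  = ClientId Int  deriving Show
newtype CountryId = CountryId Int deriving Show

describeClient :: ClientId -> String
describeClient (ClientId n) = "client #" ++ show n

-- describeClient (CountryId 1)  -- rejected: CountryId is not ClientId
```

The commented-out line would be rejected by the type checker, which is exactly the protection Persistent’s per-entity key types give you.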

A second reason is that by hiding the details of how the identifier is represented, Persistent goes further into supporting different databases; it can choose the representation depending on the final DBMS. For example, MongoDB has a special type called ObjectId that is developed specifically for identifying an object. This is different from the path taken by SQL databases.

I have discussed how Persistent always generates a special identifier field per entity. In other cases, however, there are additional uniqueness constraints: sets of fields that uniquely identify a value in the database. Since the combined value of those fields must appear only once, the DBMS must guard it against duplicates. In that way, the database can protect the data from a whole class of incorrect values. Furthermore, the extra work during insertions and updates is often outweighed by the increase in performance that can be achieved when uniqueness is guaranteed.

A uniqueness constraint is declared inside the entity it refers to by giving it a name, which must start with a capital letter (in the following examples all the constraints are prefixed with Unique, but this is not a requirement), and then listing the fields that are included in the constraint. For example, you can consider that the combination of first and last names, an address, and a country uniquely identifies a client in a database, so it makes sense to add a uniqueness constraint to it. Furthermore, countries are also uniquely identified by their name (no two countries share a name). The following is the code for doing this, where some other entities and fields have been omitted:
mkPersist sqlSettings [persistLowerCase|
Country
  ...
  UniqueCountryName name
  deriving Show
Client
  ...
  UniqueClient firstName lastName address country
  deriving Show
|]

For data mining purposes, it’s helpful to save the age of each client. However, not all clients will be happy giving that piece of information, so it’s better to make it optional. This corresponds to the notion of a nullable column in SQL and of a value wrapped in Maybe in Haskell. Indeed, that last word is the one chosen by Persistent to indicate that a field is optional. If you want to make age optional, change its declaration from age Int to age Int Maybe.
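After that change, the Client block inside the quasiquote would read as follows (the other entities stay as before):

```
Client
  firstName String
  lastName  String
  address   String
  country   CountryId
  age       Int Maybe
  deriving Show
```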

In addition to the age, it’s also useful to save information about the gender of the clients. Near the beginning of the book, I discussed why a Boolean value is not the right choice for representing gender in Haskell. Rather, you should use a data type specific to that task. This increases type safety and removes errors caused by inconsistent mappings (is True equal to male or female?).

The same arguments apply to Persistent. When you want to save in a field one value from a set of them, don’t use a Boolean or an integer. You have already seen how to overcome this problem when the set of values is big and can change with time (as countries do). For a small, fixed list of values, the best option is to use an enumeration written in Haskell and use its type inside the declaration of the entity (in this example, creating a Gender data type and including a gender Gender field declaration in the Client entity). The only stumbling block is that you need to include some extra code to make a new data type available to Persistent, and because of limitations in Template Haskell, you must do so in a separate file. In this case, start by writing this code in a new Chapter11.Gender module:
{-# LANGUAGE TemplateHaskell #-}
module Chapter11.Gender where
import Database.Persist.TH
data Gender = Male | Female
    deriving (Show, Read, Eq)
derivePersistField "Gender"
The important part is the call to derivePersistField. Now, in the file with the schema declaration, import this new module and add a gender Gender Maybe line to it (making the gender information optional is also a good choice). Now a Client value looks like this:
Client "Alejandro" "Serrano" "Home Town, 1" spain (Just 25) (Just Male)

The previous information makes up the basics of the schema language, but many more features are available. For that reason, I strongly recommend you survey the Persistent documentation2 or the Book of Yesod3 before declaring your entities.

Creating the Database

Let’s now look at the converse scenario. Say you try to run some code similar to the code in the chapter introduction for adding a country and a client to a completely empty database. Here’s an example:
runSqlite @IO @SqlBackend "example.db" $ do
  spain     <- insert $ Country "Spain"
  _client1  <- insert $ Client "Alejandro" "Serrano"
                               "Home Town, 1" spain
                               (Just 30) (Just Male)
  return ()
When you execute this, either via an executable or via GHCi, you will receive an error message like this:
chapter11: user error (SQLite3 returned ErrorError while attempting to perform prepare "INSERT INTO "country"("name") VALUES(?)": no such table: country)

The source of this error is simple. You have declared how to map entities to a database, but none of the tables referenced in that code exists yet. The good news is that, as I mentioned in the introduction of this chapter, Persistent includes functionality to automatically create or adapt a database to the format expected by a schema. This adaptation process is called the migration of the database.

Performing a migration in Persistent consists of two steps. First, you need to tell Persistent to generate the code that will migrate the database. For that task, the library uses Template Haskell once again, and you would need to repeat the schema declaration. Since using the same schema for creating the mappings and the migration is common, you can use the convenience function share to pass the quasiquoted code to both functions. In concrete terms, this means you only need to change the call to mkPersist to read as follows:
share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persistLowerCase|
The second step is to execute the migration by calling runMigration with a single argument that is the name given to mkMigrate. For example, you can generate the tables in a Sqlite database using this:
runSqlite @IO @SqlBackend "example.db" $ runMigration migrateAll
If you include that line of code before the two insertions in the database, you will see a log of the migration being output to your screen. That log should be similar to the following. You can see in the output how all the fields and constraints in the entities are translated quite naturally into tables in the SQL world.
Migrating: CREATE TABLE "country"("id" INTEGER PRIMARY KEY,"name" VARCHAR NOT NULL,CONSTRAINT "unique_country_name" UNIQUE ("name"))
Migrating: CREATE TABLE "client"("id" INTEGER PRIMARY KEY,"first_name" VARCHAR NOT NULL,"last_name" VARCHAR NOT NULL,"address" VARCHAR NOT NULL,"country" INTEGER NOT NULL REFERENCES "country","age" INTEGER NULL,"gender" VARCHAR NULL,CONSTRAINT "unique_client" UNIQUE ("first_name","last_name","address","country"))

It’s important to notice that migrations can only add and delete columns; there’s no way for the code to know when a field name has been changed into another one. However, the simple case of adding a new field poses a question: which value should be written for the rows that are already there? The solution comes in the form of a new attribute that can be added to a field, namely, default, which defines the value to write during a migration.

Caution

The default attribute applies only to migrations. It doesn’t have any effect on your actual Haskell code, and it doesn’t allow you to leave the value of that field undefined.

As an example, let’s add a canWeSend field to the Country entity expressing whether the time machines and accessories can be shipped to that specific country. Since you have already created the tables and added some values, the migration procedure needs to know which value to write in that field for existing rows. If you consider that the best choice is True (shipments can be made to all countries already in the database), the schema definition must be changed to read as follows:
share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persistLowerCase|
Country
  name      String
  canWeSend Bool default=True
  UniqueCountryName name
  deriving Show
Client
  ...
|]

The sidebar “Using An Existing Database” shows how to deal with legacy databases having table names and column names not matching your entities and fields. Then, Exercise 11-1 guides you in defining the entity that will be used throughout this chapter for products.

Using an Existing Database

You’ve seen how to use migration to create or adapt an existing database to the table schema that Persistent expects. However, a quite common scenario involves an already existing database that you want to access.

In those cases, you can specify sql attributes to change the names of the tables that entities and fields are mapped to. Furthermore, you can specify the id attribute on an entity to declare the column in the table containing the identifier. The following is an example for Country:
Country sql=my_country_table id=the_country_identifier
  name      String
  canWeSend Bool default=True sql=send_there_is_possible
  UniqueCountryName name
  deriving Show

The table name will be my_country_table. The column named the_country_identifier will be treated as the ID column.

Exercise 11-1. Defining Products

Add the definition of a Product entity to the database. That entity must include the name and description of each product, its price as a Double value, and the number of elements in stock.

Be aware that Persistent does not allow using Integer values in fields. Rather, you should use Int64, which represents those integer values that fit in 64 bits.

Finally, let’s add a simple definition for purchases in the store. A purchase will be a collection of a client, a single product for which the client may have ordered more than one unit, and the total amount that was paid (there’s an implicit assumption that each purchase contains only one type of product). The entity Purchase in this case will relate a Client and a Product, so you need to use their identifiers, as the following code shows:
share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persistLowerCase|
...
Client
  ...
Product
  ...
Purchase
  client  ClientId
  product ProductId
  number  Int
  amount  Double
  deriving Show
|]

Queries

Querying is one of the most basic operations you can perform with a database. A good knowledge of the different options for selecting data from your sources is instrumental for obtaining the performance that applications require. As you will see in this section, Persistent provides several ways to query for data, and Esqueleto extends that feature even more with SQL-only operators such as joins.

Note

In the rest of the examples in the chapter, I will show only the statements. If you want to execute any of them, wrap them inside a runSqlite call (or the corresponding version for another database), as done in the migration examples. The code won’t show all the signatures, because some of them contain type system features that will be introduced later (in Chapter 13). GHC correctly infers the signatures in the missing cases, so you can still copy and paste the examples as they appear in the book.

Queries by Identifier or Uniqueness

Since every entity in the database is embellished by Persistent with a unique identifier, the simplest way to query a database is indeed to ask for the value with a given identifier. This is done using the function get, which returns its value wrapped in a Maybe, since for a given identifier there may be no associated value. For example, you have defined the client field in the Purchase entity to hold a ClientId; that identifier can be used afterward to retrieve all the information about a client. The following code performs that task:
import Database.Persist.Sql
getPurchaseClient p = get (purchaseClient p)  -- returns Maybe Client
If you have the identifier but have not obtained it from a previous query, and thus it is not wrapped in the appropriate identifier constructor, you can still look up the object by building the Key corresponding to that type. Although it is better avoided, this option may be needed, for example, when you obtain the identifier from the route in a web application (the URL looks like /clients/5, where 5 is the client identifier). For example, the code for asking for a Client by identifier is as follows:
getClientById n = get $ ClientKey (SqlBackendKey $ fromIntegral n)
Another way of identifying a value was discussed before: uniqueness constraints. For each of these constraints, Persistent generates a constructor that takes as parameters a value for each of the fields in its definition. Then, you can use the getBy function to query the database using that combination of data. For example, you previously defined a UniqueClient constraint on the entity Client, so you can query using it as follows:
getClientByInfo :: MonadIO m
                => String -> String -> String -> String
                -> SqlPersistT m (Maybe Client)
getClientByInfo fName lName addr cnName = do
    cn <- getBy $ UniqueCountryName cnName
    case cn of
      Just (Entity cId _) ->
        do cl <- getBy $ UniqueClient fName lName addr cId
           case cl of
             Just (Entity _ client) -> return $ Just client
             Nothing                -> return Nothing
      Nothing -> return Nothing

It's important to notice that the result of getBy is not just the value you asked for but rather a combination of the identifier associated with the value and the value itself, wrapped inside the Entity data type. In the previous example you can find two different usages of getBy: one for finding the identifier of a country given its name (check how in the Entity pattern match only the identifier is actually bound to a variable) and another for finding a client with the given personal attributes who belongs to that country.
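Since Entity is just such a pairing of key and value, the entityKey and entityVal accessors let you write the same lookup a bit more compactly (a sketch; the signature would be the same as getClientByInfo):
```haskell
getClientByInfo' fName lName addr cnName = do
  cn <- getBy $ UniqueCountryName cnName
  case cn of
    Just country -> do  -- entityKey extracts the CountryId
      cl <- getBy $ UniqueClient fName lName addr (entityKey country)
      return (entityVal <$> cl)  -- keep only the Client, drop its key
    Nothing -> return Nothing
```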

Selecting Several Entities

The full power of a database is unleashed when you perform queries not only via identifiers but also based on other fields within entities. Furthermore, the languages in which queries are posed are usually quite expressive. For example, you could be asking for all those clients coming from the United States who have bought at least two products in the last few years to create a better campaign for clients in that part of the world.

Those queries are usually not guaranteed to return one result or none but can produce a whole set of values that fulfill the requirements of the query. You can ask Persistent to represent that set using two different approaches: either as a Source from the conduit library or as a plain list. You can also choose to return just the first value of the set, wrapped in a Maybe. You may then get Nothing back if the query selects an empty set.

On an orthogonal axis, you can ask Persistent to show the entire set of fields of each value matched by the query (i.e., all the fields belonging to the entities) or just return their identifiers. In the first case, the information from the fields is wrapped inside an Entity constructor (as done with getBy), which contains both the entity and its identifier. Five out of the six possible combinations have a corresponding function in Persistent, as Table 11-2 shows.
Table 11-2

Query Functions in Persistent

                       As a Source    As a List        Only First Result in a Maybe
Identifier and value   selectSource   selectList       selectFirst
Only identifier        selectKeys     selectKeysList   Not available

All the functions listed in Table 11-2 take the same parameters and are used in the same way, so in the examples I'll use selectList because it's the one that shows the output most simply while still letting you form an idea of what's going on in the database.
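For instance, with empty filter and option lists, selectKeysList retrieves only the identifiers of every Client, without bringing the rest of the columns from the database (a sketch with an explicit signature, since with no filters nothing else pins down the entity type):
```haskell
allClientKeys :: MonadIO m => SqlPersistT m [Key Client]
allClientKeys = selectKeysList [] []
```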

The first of the two parameters is a list of filters. Each filter constrains the value of one or more fields to satisfy a specified condition. For example, you might require the age of a Client to be greater than 18. The way in which you write those filters is by using one of the constructors that Persistent has created for each of the fields in an entity and a set of built-in operators. The age filter, for example, is represented as ClientAge >=. Just 18 (you need to include Just because age was an optional field, so its values are wrapped in Maybe).

In addition to the simple equality and ordering operators (==.) and (!=.) – notice that “not equals” is written using C-like syntax, not as (/=.) – (>.), (>=.), (<.), and (<=.), Persistent supports an operator for checking whether a field takes its value from a set of wanted ones, namely, (<-.), read as “in,” and its negation (/<-.), read as “not in.” In the following example, I ask for all those clients from Spain or Germany older than 18, using that operator. Notice how I need to get the identifiers of those countries prior to creating the actual query.
import Data.Maybe (catMaybes)

getAdultsOfSpainAndGermany :: MonadIO m => SqlPersistT m [Entity Client]
getAdultsOfSpainAndGermany = do
  sp <- getBy $ UniqueCountryName "Spain"
  de <- getBy $ UniqueCountryName "Germany"
  let countries = map entityKey (catMaybes [sp, de])
  selectList [ ClientCountry <-. countries, ClientAge >=. Just 18 ] []
In case you want to know only the number of clients that fulfill those conditions, you may be tempted to just apply length to the resulting list. However, this is not very efficient since you’ll be bringing a lot of data from the database only to discard it afterward. Instead, you should use the count function, which just returns the number of values satisfying the query. Here’s an example:
countAdultsOfSpainAndGermany :: MonadIO m => SqlPersistT m Int
countAdultsOfSpainAndGermany = do
  sp <- getBy $ UniqueCountryName "Spain"
  de <- getBy $ UniqueCountryName "Germany"
  let countries = map entityKey (catMaybes [sp, de])
  count [ ClientCountry <-. countries, ClientAge >=. Just 18 ]
Sometimes the operators introduced up to this point are not enough or are not convenient for expressing disjunctive queries. In those cases, you can use the “or” operator, (||.), for merging two lists of filters into a single query. For example, say you want to obtain the clients from Spain and the United States who are at or above the legal age at which alcohol may be purchased. However, that age is not the same in both countries. In Spain it’s 18, whereas in the United States it is 21. Thus, the query must reflect that fact: get clients from Spain older than 18 or from the United States older than 21, as the code shows:
getAdultsOfSpainAndUS :: MonadIO m => SqlPersistT m [Entity Client]
getAdultsOfSpainAndUS = do
  Just (Entity spId _) <- getBy $ UniqueCountryName "Spain"
  Just (Entity usId _) <- getBy $ UniqueCountryName
                                    "United States of America"
  selectList (   [ ClientCountry ==. spId, ClientAge >=. Just 18 ]
             ||. [ ClientCountry ==. usId, ClientAge >=. Just 21 ] )
             []

I’ve been silently adding an empty list as a second parameter to selectList. That list represents the options of the query, which do not affect the results themselves but rather the way the results are presented. One typical option involves sorting by one or more fields. In the previous example, if you want to return the clients from the oldest to the youngest, you must change the list of options to [ Desc ClientAge ]. As you can see, the way to indicate the ordering is by using one of the constructors Asc or Desc together with the constructor corresponding to the field.
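For instance, the earlier getAdultsOfSpainAndGermany query could return its clients from oldest to youngest just by changing the options list (a sketch reusing that code):
```haskell
getAdultsOldestFirst :: MonadIO m => SqlPersistT m [Entity Client]
getAdultsOldestFirst = do
  sp <- getBy $ UniqueCountryName "Spain"
  de <- getBy $ UniqueCountryName "Germany"
  let countries = map entityKey (catMaybes [sp, de])
  selectList [ ClientCountry <-. countries, ClientAge >=. Just 18 ]
             [ Desc ClientAge ]
```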

Other options are used to extract just a subset of the results. If you think of the entire set of results as an ordered list, OffsetBy allows you to discard some elements at the beginning and return only those from one point to the end, and LimitTo caps the number of returned values at a certain amount, ignoring any elements past it. Those options are useful for paginating information in a web application. For example, in the store you may decide to show the products in pages of 10. Therefore, page n (starting at 1) would show ten products starting from element (n-1)*10 (the results from Persistent are zero-indexed). The corresponding query to obtain those products follows:
getProductsPage n  -- returns [Entity Product]
  = selectList [ ] [ Asc ProductPrice, LimitTo 10, OffsetBy ((n-1)*10) ]

SQL Queries with Esqueleto

Using only the functions introduced so far, such as get, selectKeysList, and count, let’s try to obtain the countries of all those clients that bought more than three products. The following implementation does it in several steps. It first gets all the possible buyers, then counts the number of purchases of each of them, keeps those with more than three, and finally gets the country of each client that is left. Notice how the code is quite intricate and performs several round-trips to the database.
getCountriesWithBigBuyers :: MonadIO m => SqlPersistT m [Country]
getCountriesWithBigBuyers = do
  buyers <- selectKeysList [ ] [ ]
  buyersPurchases <- mapM (\b -> count [ PurchaseClient ==. b ]
                            >>= \c -> return (b,c)) buyers
  let buyersPurchases' = filter (\(_,c) -> c > 3) buyersPurchases
  mapM (\(b,_) -> do Just cl <- get b
                     Just cn <- get $ clientCountry cl
                     return cn)
       buyersPurchases'

If you are used to relational databases, there’s a query feature that will come to mind to solve this problem: joins. However, Persistent aims to also support nonrelational databases, making the ability to join unavailable to you.

Note

To be completely honest, Persistent does support sending a raw query to the database. But that means there’s no type checking and no automatic marshaling, so you lose all the benefits of using the library.
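For reference, such a raw query might look like the following sketch (rawSql comes from Database.Persist.Sql; the ?? placeholder is expanded by Persistent into the entity’s columns, and the string literal assumes the OverloadedStrings extension):
```haskell
{-# LANGUAGE OverloadedStrings #-}
import Database.Persist.Sql (rawSql)

over25Raw :: MonadIO m => SqlPersistT m [Entity Client]
over25Raw = rawSql "SELECT ?? FROM client WHERE age > 25" []
```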

As I have already introduced, the solution to this problem is using another library, namely, Esqueleto. The esqueleto package provides support for writing SQL queries that are type checked. Furthermore, it reuses the schema definitions from Persistent, making the former a great companion to the latter.

As in the previous case, you can return the results in two different data structures. The select function shows the queried values as a list, whereas selectSource wraps them into a Conduit source. Both take as an argument a SqlQuery, so you need to focus on how to construct values of that type.

The first thing to do is to select which entities you’ll query. To do that, you use from. This is quite an interesting piece of the library because it takes a function as an argument, and depending on the type of that function, a certain subset of entities is queried. For example, if you want to query for clients, you must pass a function whose unique argument is of type Client. In most cases, that type is inferred by the compiler because of further filters (e.g., if you filter on ClientAge, the compiler infers that you’re querying Client), but in some cases you may need extra annotations. If you want to perform a query that involves more than one entity, you must pass them in a tuple.

The body of the function passed to from must be of a certain monadic type, so it’s fairly common to use a do block at that point. Inside that monad, several special functions can be called to specify a query. The simplest one is where_ (notice the final _, which prevents collision with the where keyword), which includes a set of restrictions over the fields of the entities, inside an expression that may contain the following:
  • Access to a field of an entity using (^.).

  • Constant values wrapped using val, or lists of them wrapped using valList.

  • Comparisons with (==.), (!=.), and the rest of the operators used in Persistent, noting that (<-.) takes the name in_ and (/<-.) is called notIn.

  • Grouping of several constraints with (&&.) (for “and”) and (||.) (for “or”). Notice that this is different from plain Persistent because in that case there is no conjunctive operator; the set of filters in the list is implicitly conjoined.

Another possible function is orderBy, which takes a list of expressions of the form asc (entity ^. field) or desc (entity ^. field), representing ascending and descending order over those fields. As a simple example, let’s just return all the clients older than 25 years, ordered by last and first names. Note that we have imported Database.Esqueleto qualified, since several names conflict with those from Persistent. However, in most cases you replace the latter entirely by the former, so there is no conflict.
import Database.Esqueleto ((^.))
import qualified Database.Esqueleto as E
getPeopleOver25 :: MonadIO m => SqlPersistT m [Entity Client]
getPeopleOver25 =
  E.select $
  E.from $ \client -> do
  E.where_ (client ^. ClientAge E.>. E.just (E.val 25))
  E.orderBy [ E.asc (client ^. ClientLastName)
            , E.asc (client ^. ClientFirstName) ]
  return client

The code makes explicit that ClientAge has a Maybe value. The constant value the field is compared to must be wrapped first with val and then with just.

The return value of the monad may not be all the entities that are queried; only a subset of the information may be relevant to the next steps, and the rest could be discarded. For example, let’s try to get the information about all the clients from Spain or Germany older than 25. As you can see in the following code, I select both Client and Country as entities to query, make sure that both entities are linked correctly (the Country in Client is the same as the Country identifier), and perform restrictions over the country name and the client age. Finally, I just return the client information, not the country name.
getPeopleOver25FromSpainOrGermany
  :: MonadIO m => SqlPersistT m [Entity Client]
getPeopleOver25FromSpainOrGermany =
  E.select $
  E.from $ \(client, country) -> do
  E.where_ (     client ^. ClientAge E.>. E.just (E.val 25)
           E.&&. country ^. CountryName `E.in_`
                   E.valList [ "Spain", "Germany" ]
           E.&&. client ^. ClientCountry E.==. country ^. CountryId )
  E.orderBy [ E.asc (client ^. ClientLastName)
            , E.asc (client ^. ClientFirstName) ]
  return client
Here you’ve actually seen an example of a join that was implicit in the link between the country from a client and the entity representing the country itself. Esqueleto allows you to be a bit more explicit, using the InnerJoin constructor in the argument instead of a bare tuple. Then, with the “on” function in the monad you specify the conditions for the join of the entities. The previous example could be rephrased as follows:
getPeopleOver25FromSpainOrGermanyJoin
  :: MonadIO m => SqlPersistT m [Entity Client]
getPeopleOver25FromSpainOrGermanyJoin =
  E.select $
  E.from $ \(client `E.InnerJoin` country) -> do
  E.on (client ^. ClientCountry E.==. country ^. CountryId)
  E.where_ (     client ^. ClientAge E.>. E.just (E.val 25)
           E.&&. country ^. CountryName `E.in_`
                   E.valList [ "Spain", "Germany" ])
  E.orderBy [ E.asc (client ^. ClientLastName)
            , E.asc (client ^. ClientFirstName) ]
  return client

In addition to inner joins, Esqueleto also supports outer joins, which are useful in many situations.

To wrap up this fast introduction to Esqueleto, you need to learn a bit about expressing grouping as you do in SQL. The grouping itself is done via the groupBy function, which takes as an argument the field to group by. Then, you can use aggregation operators such as sum_, min_, max_, avg_, or countRows over other fields to perform that operation over all the values of each group. Explaining in detail how these operations work would take a long time, and that is a task better suited to a book on SQL, from which Esqueleto takes its concepts.

As an example of grouping, let’s try to compute the amount of money spent by each client on purchases. To do so, the idea is to group the information of all Purchase entities by the Client identifier and then use the sum_ aggregator. Notice how a left outer join is needed to also aggregate over those clients that haven’t yet purchased any product.
getMoneyByClient
  :: MonadIO m => SqlPersistT m [(Entity Client, E.Value (Maybe Double))]
getMoneyByClient =
  E.select $
  E.from $ \(client `E.LeftOuterJoin` purchase) -> do
  E.on (client ^. ClientId E.==. purchase ^. PurchaseClient)
  E.groupBy (client ^. ClientId)
  let s = E.sum_ (purchase ^. PurchaseAmount)
  return (client, s)

The language supported by Esqueleto is indeed expressive and allows you to express queries far beyond what has been explained here. My recommendation for those moments when you need to perform powerful queries against a relational database is to read its documentation in more detail to discover the full generality of its constructions.

Insertions, Updates, and Deletions

At the beginning of the chapter, you were introduced to the way of inserting new values in the database, via insert. That function returns the identifier that was assigned to the new value saved in the database, which comes in handy when referring to it from other database values. However, in some cases the value won’t be inserted because of collisions on the uniqueness constraints. If this may happen, it’s recommended that you use insertUnique instead, which returns a Maybe value: Nothing if the value could not be inserted.
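As a sketch of the difference (the concrete Client values are made up, the field order follows the schema defined earlier in the chapter, and liftIO comes from Control.Monad.IO.Class):
```haskell
insertClientTwice :: MonadIO m => SqlPersistT m ()
insertClientTwice = do
  Just spain <- getBy $ UniqueCountryName "Spain"
  let alice = Client "Alice" "Jones" "Some Street 1"
                     (entityKey spain) (Just 30)
  _firstKey <- insert alice        -- returns the new Key Client
  secondTry <- insertUnique alice  -- Nothing: UniqueClient is violated
  liftIO $ print secondTry
```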

I mentioned that Persistent embraces the pure functional paradigm, so all changes made to a database value must be explicit. The easiest way to modify a value is by using replace. That function takes an identifier and a value of one of your entities and replaces whatever value was previously stored in the database under that identifier with the new value. For example, a quite slow but simple way to make sure that all the names in the database start with a capital letter would be as follows:
import Data.Char
capitalizeNamesSlow :: MonadIO m => SqlPersistT m ()
capitalizeNamesSlow = do
  clients <- selectList [] []
  mapM_ (\(Entity ident client) ->
             let c:rest = clientFirstName client
              in replace ident
                   $ client { clientFirstName = (toUpper c):rest })
        clients

This solution is slow because it needs to bring all the data from the database to the program and then send back the information for each client. In simple cases, the same task can be performed right on the database side, without any information being sent back and forth. Persistent includes support for doing such an in-place update when the operation either is assignment to a constant or involves only numerical data present in the value. The function that performs such an operation is updateWhere; it takes as parameters a list of filters specifying which elements in the database will be affected by the changes, and a list of in-place updates. Each update is represented using a field name, one of the update operators ((=.) for direct assignment, and (+=.), (-=.), (*=.), or (/=.) for applying a numerical operation), and a constant value that is the argument of the update.

For example, let’s decide to provide a 10 percent discount. However, you don’t want to provide such a big discount to all the products because in the case of expensive ones, that discount is too high. The solution is to provide only a 3 percent discount on all products whose price is higher than 10,000. The following code uses 10,000 as the dividing line and applies either a 3 percent or a 10 percent discount as appropriate:
discount :: MonadIO m => SqlPersistT m ()
discount = do
  updateWhere [ ProductPrice <=. 10000 ] [ ProductPrice *=. 0.9 ]
  updateWhere [ ProductPrice >. 10000 ] [ ProductPrice *=. 0.97 ]

As with the queries, Persistent is not powerful enough to represent some of the conditions for updating in just one expression. Thus, you have to resort either to multiple queries and updates or to using a more powerful language such as Esqueleto. In the latter case, you need to change the call to select and from into a call to update. Once again, that function takes as a parameter another function, whose type will inform Esqueleto about which entities are affected by the change. However, update does not allow tuples or joins because the changes must affect only one kind of entity. To perform queries inside an update block, you can use sub_select, which works in a similar way to select but is allowed to return only one result.

When using update instead of select, you are allowed to use one extra function inside the monad: set . This function is the one describing the changes to perform in the database. It takes as a first argument the value to modify and then a list of in-place updates as in Persistent. As an example, let’s say you decide to use a better policy for discounts. You apply them only to those products that don’t sell well. The way you define a low amount of sales is via the total amount of purchases being less than 10. You can perform the query and the update at once with the following code:
betterDiscount :: MonadIO m => SqlPersistT m ()
betterDiscount = E.update $ \product -> do
  let totalAmount = E.sub_select $
                    E.from $ \purchase -> do
                    E.where_ $ product ^. ProductId
                                 E.==. purchase ^. PurchaseProduct
                    E.groupBy (purchase ^. PurchaseProduct)
                    return $ E.sum_ (purchase ^. PurchaseAmount)
  E.where_ $ E.isNothing totalAmount E.||. totalAmount E.<. E.just (E.val 10)
  E.set product [ ProductPrice E.*=. E.val 0.9 ]

And now you come to the most difficult point for any database. After some time, it may be necessary to delete some of the information that you stored for either space or efficiency reasons. Doing so is easy if you know a way to uniquely identify the value to delete in the database, either via its identifier or via a set of fields with a uniqueness constraint. In the first case, you can use the delete function, and for the second case, the deleteBy function is provided.
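Both flavors in a short sketch (the helper names and the country value are made up):
```haskell
deleteClient :: MonadIO m => Key Client -> SqlPersistT m ()
deleteClient = delete  -- delete by identifier

removeCountry :: MonadIO m => String -> SqlPersistT m ()
removeCountry name = deleteBy (UniqueCountryName name)  -- delete by uniqueness constraint
```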

However, there are cases where you want to delete a set of values that satisfy certain conditions. Using the functions you already know, that would mean obtaining the value identifiers, maybe via selectKeysList, and then mapping over all of them with delete. But considering everything you’ve learned in this chapter, you may be expecting a function that performs this task without any round-trip of data between the database and your application. And it exists: it’s called deleteWhere (similarly to updateWhere).

For example, as a way to clean the database, you may want to delete all the products in the database that are no longer in stock. The corresponding Persistent call is as follows, assuming that the stock never gets to negative values:
cleanProductStock :: MonadIO m => SqlPersistT m ()
cleanProductStock = deleteWhere [ ProductInStock ==. 0 ]
However, after careful thought, you may find that this code, even though completely correct, would be dangerous to your database. If a client purchased one of the products that is no longer in stock, that line would leave the reference to the product identifier dangling because the product would no longer be in the database. A possible way to correct this problem is to create a better query that checks that there’s no Purchase entity with that product. Once again, Persistent itself doesn’t allow you to express that code; you need to use Esqueleto and its delete function, which works like select and update but deletes the resulting values. Notice in the following code how the notExists function performs the check for a corresponding purchase:
cleanProductStock' :: MonadIO m => SqlPersistT m ()
cleanProductStock' = E.delete $
  E.from $ \product -> do
  E.where_ $       product ^. ProductInStock E.==. E.val 0
             E.&&. (E.notExists $
                    E.from $ \purchase ->
                    E.where_ (purchase ^. PurchaseProduct
                                E.==. product ^. ProductId))

As you can see, performing updates and deletions in database values is easy using both Persistent and Esqueleto. The bulk of the learning process is to understand how to create good constraints for the values that will be affected by those operations. This shows one advantage of using high-level programming languages such as Haskell; you can express the concepts of your domain in a simple way and concentrate on the rules of your application.

Summary

In this chapter you learned how Haskell allows you to interface with a DBMS in a type-safe manner.
  • You saw the landscape of database access libraries in Hackage, and the focus moved to those that provide abstractions over several databases.

  • In particular, the choice of libraries was Persistent and Esqueleto, which allow you to access both relational and nonrelational databases and to work with them in a type-safe way, making it harder to write illegal queries, something that is easy when working with raw database statements.

  • You learned how to connect to databases both with and without a pool of connections.

  • You were exposed to the language for specifying the database schema, with the idea that using that information combined with Template Haskell will create a bunch of data types and glue code for talking with the DBMS and will migrate the database.

  • Much of the chapter was devoted to queries, either via identifiers, by a set of fields uniquely constrained, or in general by any field. For those cases where Persistent was not enough (like joins), Esqueleto was introduced.

  • Finally, you read about performing insertions, updates, and deletions with those libraries.
