In the previous two chapters, you saw how to access the file system to read and save information in a durable way. You’ve been given all the tools for dealing with either a custom file format or a treelike information representation such as JSON.
However, in most production systems, you aren’t working with files. Instead, a database management system (DBMS) is used to store the application’s information and keep it safe. Furthermore, most DBMSs provide functionality such as transactions, logging and recovery, and replication, which is quite useful for maintenance, protection against errors, and scalability. This chapter will provide you with a bird’s-eye view of the many possibilities for database access in the Haskell ecosystem, in particular those that enable you to access relational DBMSs based on SQL.
In this chapter, I cover two specific libraries: Persistent and Esqueleto. These libraries are not tied to any specific DBMS but provide an abstraction layer over whatever DBMS you might choose to use. The main advantage of using Persistent is that it allows interaction with the database using user-defined Haskell abstract data types, which are converted from and to database format without further intervention (similar to object-relational mappers in the object-oriented world). This ensures a high degree of type safety when working with database code. Esqueleto is the companion library to Persistent, which focuses on specifics for SQL-based DBMSs.
Database Access Landscape
1. Native implementations of a database in Haskell. The main example of this group is acid-state, which allows you to save Haskell data types directly on disk. The main downside of this approach is that interoperability with other systems and languages is harmed.
2. Libraries for accessing a particular DBMS, such as sqlite-simple, mysql-simple, and postgresql-simple (for some of the best-known relational ones), or mongoDB. These libraries are easy to understand for users of the corresponding DBMS but are also tied to a specific choice of system, making it difficult to migrate.
3. Abstraction layers over several DBMSs, which provide a common interface for several systems (usually along with specific connectors for each of them). The range of DBMSs covered by each library differs; some focus only on the relational, SQL-based ones, while others try to include nonrelational ones. The main advantage of these libraries is that the choice of DBMS can be revisited while keeping almost all the database access code in the application intact. This goes well with Haskell’s philosophy of reusability.
Abstracting Over Several DBMSs
In Hackage, you will find a group of packages that focus on the SQL world, which is the biggest one among database users. Of these, HDBC (from Haskell Database Connectivity) and hsql hide the specific details of each DBMS but otherwise expose the SQL model (tables, rows, and columns) as is, without any further abstraction. You just need to plug in the package corresponding to the specific DBMS you want to use (such as hsql-mysql or HDBC-postgresql). This gives you as the developer the full power of those systems, allowing the use of prepared statements or control over transaction boundaries. However, when using any of these libraries, you need to write the SQL statements by hand, which is usually a source of errors, and you need to marshal the results afterwards (which are given using a specific data type for SQL values) into your own data types. Because of these problems, these packages are not often used directly but rather as dependencies of higher-level ones.
Tip
The libraries that provide a common layer over different database systems are the most interesting to study. By learning just one, you get access to a big range of databases.
On that higher level, you will find HaskellDB. The idea from its developers was to expose the abstraction that the SQL databases are built upon: the relational algebra. Thus, you work with basic operations such as projection or restriction that can describe all the queries you can pose to the database. Furthermore, the schema for the tables is also embedded in the Haskell code, so you can guarantee the safety of all your queries (you won’t be accessing a nonexistent column, and you won’t be comparing columns of different types). The main drawback of HaskellDB is that it exposes an abstraction that is not as well known as SQL queries, so to use it you have to relearn how to express some concepts.
Introducing Persistent and Esqueleto
Persistent, available in Hackage and Stackage under the package name persistent, supports both relational and nonrelational DBMSs, which eases the transition between those two worlds in case it’s needed to enhance the scalability of the application. When using Persistent, you still use your own Haskell data types (i.e., you don’t need to marshal from and to your Client type), and the library generates all the glue code automatically (using Template Haskell). This gives you real type checking of your database statements, preventing a wide range of application errors.
If you’re from an object-oriented programming background, you may recognize in this description the concept of an object-relational mapping (ORM), which also takes care of gluing a class with a specific table in a database. However, because it’s implemented in a functional language that embodies purity, Persistent doesn’t save automatically any change you do to the value holding the information about one of your rows. Rather, you need to explicitly ask for insertions, updates, and deletions.
One disadvantage of Persistent is that it supports only those operations common to every database it allows you to connect to. Since this includes both relational and nonrelational DBMSs, the common ground is sometimes limited. For example, you cannot easily express a join of two tables (in the SQL sense), since DBMSs such as MongoDB don’t have this concept. For that matter, some libraries have been developed that extend Persistent in one direction while restricting the applicable databases. This is the case of Esqueleto, a domain-specific language for expressing SQL queries inside Haskell using Persistent models.
Persistent itself encompasses a lot of related functionality in three areas: describing the schema of a database in such a way that can be used to work with usual Haskell data types, creating and adapting the schema of an existing database into a new one and adding or deleting columns (this is called the migration functionality), and performing actual work with the information in the database (insertions, updates, deletions, and queries).
Connection
The DBMS is the software performing the actual work of storing and retrieving information from a database. The flow of work is quite simple: you send a statement to the DBMS, and it does some work, returning some values or information about the success of the operation. But before sending statements, you need to know how to establish the initial connection to the DBMS.
If you look around, you will see that along with the basic persistent package, there are many others with a name that comes from joining persistent- with the name of a DBMS (e.g., persistent-sqlite). Each of these packages provides a back end for a specific system, that is, the code that is needed to communicate with it and make sense of the results it returns.
Caution
In Hackage there are other packages starting with persistent- (like persistent-map). Usually only those with the “Database” tag are related to Persistent.
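As a concrete sketch, opening a single SQLite connection might look as follows. This assumes persistent-sqlite and monad-logger are available and uses a hypothetical example.db file; the order of the type applications follows the order in which the type variables appear in withSqliteConn’s signature and may vary between library versions.

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeApplications #-}

import Control.Monad.IO.Class (liftIO)
import Control.Monad.Logger (NoLoggingT, runNoLoggingT)
import Database.Persist.Sqlite (SqlBackend, withSqliteConn)

main :: IO ()
main = runNoLoggingT $
  -- @(NoLoggingT IO) fixes the monad in which the connection is used:
  -- plain IO wrapped with a dummy logging interface.
  -- @SqlBackend fixes the back end, since the connection is not
  -- otherwise used in this tiny example.
  withSqliteConn @(NoLoggingT IO) @SqlBackend "example.db" $ \_conn ->
    liftIO $ putStrLn "connected to example.db"
```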
The code above uses some fancy GHC extensions. First, the withSqliteConn function expects a Text value with the path to the database, so I’ve used the OverloadedStrings GHC extension in order to write that value using string literal syntax. Another extension, TypeApplications, is needed to specify in which monad withSqliteConn operates. In older versions of GHC you would use type annotations, but in modern ones types can be specified directly using the @T syntax. In this case, we choose an IO monad wrapper with a dummy logging interface, namely NoLoggingT. That logging interface comes from the monad-logger package, which is heavily used by persistent.
Connection Parameters for Persistent Back Ends
| DBMS | Single Connection | Connection Pool | Parameters |
|---|---|---|---|
| SQLite | withSqliteConn | withSqlitePool | Just the path to the file, or ":memory:" to create a temporary in-memory database. |
| PostgreSQL | withPostgresqlConn | withPostgresqlPool | A connection string of the form "host=h port=p user=u password=pw dbname=db". |
| MySQL | withMySQLConn | withMySQLPool | A value of type ConnectionInfo. A defaultConnectionInfo value is provided, which connects to the database test on localhost; its fields connectHost, connectUser, and so on, can then be modified to your needs. |
| MongoDB | withMongoDBConn | withMongoDBPool | A set of different parameters for the database, including host, port, authentication, and idle time. Note: the pooled version needs two extra parameters for the number of stripes and the number of connections per stripe. |
Schemas and Migrations
As mentioned in the introduction, Persistent needs a description of the database schema you will be working with to ensure type safety. In addition, it needs information about the mapping between this schema and the actual Haskell data types you will use to represent the data. To bring all of this together, Persistent uses quite complex types to encode all the invariants and relations. Many of the techniques used in those types haven’t been introduced in the book yet; Chapter 13 will provide all that information.
The good news is that you don’t have to care about all of this because Persistent comes with a package, persistent-template, which is able to generate all the necessary code (Haskell data types, instances for some type classes, glue code for the marshaling) from a single description of the database schema. For the rest of the chapter, I assume that your project has both persistent and persistent-template listed as dependencies, in addition to the SQLite back end.
Note
persistent-template uses Template Haskell to generate all its code. You were introduced to Template Haskell in Chapter 6. As a small reminder, Template Haskell is the metaprogramming facility of GHC, which means that at compile time a function can generate new or manipulate existing code.
Describing the Entities
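As a sketch, a schema declaration for the running Client example might look like this. The extension list shown is the one persistent-template typically requires; depending on your version of persistent, a few more extensions (such as DerivingStrategies or DataKinds) may be needed.

```haskell
{-# LANGUAGE TemplateHaskell, QuasiQuotes, TypeFamilies, GADTs,
             GeneralizedNewtypeDeriving, MultiParamTypeClasses,
             FlexibleInstances, OverloadedStrings #-}

import Database.Persist.TH

-- sqlSettings tells the generator to follow SQL conventions;
-- persistLowerCase is the quasiquoter that parses the schema text
mkPersist sqlSettings [persistLowerCase|
Client
  firstName String
  lastName  String
  address   String
  age       Int
  deriving Show
|]
```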
After the several language extensions that you need to enable, which are listed inside the {-# LANGUAGE #-} pragma, there’s a call to mkPersist. This function needs settings for generating the code and a description of the database. In this case, the code tells the code generator to follow SQL standard conventions by giving sqlSettings as the first parameter. The second parameter is not given using the usual Haskell syntax but rather using quasiquotation: the code between [persistLowerCase| and |] won’t be parsed by the Haskell compiler but by a custom reader (in this case, persistLowerCase). Thus, the important thing to know is how to express the database schema for this quasiquoter.
Caution
Be aware that inside a quasiquoter you’re no longer writing Haskell code but some custom Persistent syntax. The need to learn an additional (albeit small) language is seen by some people as a drawback of the persistent-template library.
As you can see, there’s a block for each kind of entity you want to store. In this case, you have only one, namely, Client. Then come all the fields that make up that entity, each with its name first and its type second. Finally, you can include a deriving clause as you would in the definition of any other Haskell data type. These are the most basic constructs; you will see more as the chapter progresses.
Tip
If you write json next to the entity name in the quasiquotation (for example, Client json), Persistent will derive instances of aeson’s ToJSON and FromJSON automatically. This will be handy in the next chapter, where you’ll learn how to create a web application using Haskell.
Since the code uses persistLowerCase as a quasiquoter, each entity will be mapped to a table and each field to a column, whose name will consist of converting from camel case (each word in an identifier starts with capital letter) to underscore case (different words in an identifier are separated by underscores). In this example, the data will be saved in a client table with the columns first_name, last_name, address, and age. If you want to retain the names using camel case, you need to use persistUpperCase instead of persistLowerCase.
In addition to the glue code for contacting the database, mkPersist will also generate data type constructors representing each of the fields in the entity. These constructors’ names are created by juxtaposing the name of the entity with the name of each of the fields, in camel case. In this case, you would obtain ClientFirstName, ClientLastName, and so on. In particular, this means you can use the same field name in different entities (e.g., using a field name for both individual and company entities) since the prefix will take them apart. This is different from Haskell records, which cannot share field names.
You may have noticed that so far there hasn’t been any description of an identifier, that is, no indication of what the table’s primary key might be. The primary key is a piece of information that uniquely determines a row in a table, and it’s commonly used in SQL environments. Persistent’s developers found this idea of an identifier useful too, and that’s why a schema always defines an implicit id field for each entity. As a result, in addition to all the explicitly declared fields, you have a ClientId field representing the unique identifier of each row.
With this schema definition, the insertions shown in the previous section are correct Haskell code. Notice that Persistent has created a data type for each entity. This data type can be thought of as being defined as a record whose only constructor shares its name with the entity being defined. Thus, in the example, the constructors Country and Client have been generated. Another remark is that insert returns the identifier of the new entity in the database, and the code uses that for referring to the country "Spain" in the client.
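A sketch of such insertions follows. It assumes a schema in which Country has a single name field and Client ends with a CountryId reference; the concrete field values and their order are illustrative, and runSqlite is assumed to point at an already migrated database.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad.IO.Class (liftIO)
import Database.Persist
import Database.Persist.Sqlite (runSqlite)

example :: IO ()
example = runSqlite "example.db" $ do
  -- insert returns the identifier assigned to the new row...
  spainId <- insert $ Country "Spain"
  -- ...which we use to link the new client to its country
  clientId <- insert $ Client "Alejandro" "Serrano" "Home Town, 1" 30 spainId
  liftIO $ print clientId
```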
Identifiers are Special
In most databases, the identifier column is just another column, with the special qualities of being unique among the rows and usually being autoincremental. However, you have seen that Persistent treats those fields in a special way. Why is that the case?
The main reason is that by having special data types for each identifier of each entity, it’s not possible to mix identifiers of different entities. In this example, ClientId will be a different type from CountryId, so you cannot pass a value of the latter when an element of the former is requested.
A second reason is that by hiding the details of how the identifier is represented, Persistent goes further into supporting different databases; it can choose the representation depending on the final DBMS. For example, MongoDB has a special type called ObjectId that is developed specifically for identifying an object. This is different from the path taken by SQL databases.
I have discussed how Persistent always generates a special identifier field per entity. In other cases, however, there are additional uniqueness constraints: sets of fields whose combined value uniquely identifies a row in the database. Since that combined value must appear only once, the DBMS must check for duplicates, and in that way the database protects against a whole class of incorrect values. Furthermore, the extra work on insertion and update is often outweighed by the increase in performance that can be achieved when uniqueness is guaranteed.
For data mining purposes, it’s helpful to save the age of each client. However, not all clients would be happy giving out that piece of information, so it’s better to make it optional. This corresponds to the notion of a nullable column in SQL and of a value wrapped in Maybe in Haskell. Indeed, Maybe is the word chosen by Persistent to indicate that a field is optional. If you want to make age optional, change its declaration from age Int to age Int Maybe.
In addition to the age, it’s also useful to save information about the gender of the clients. Near the beginning of the book, I discussed how using a Boolean value is not the right choice for representing the gender in Haskell. Rather, you should use a data type specific to that task. This increases type safety and removes the errors because of inconsistent mappings (is True equal to male or female?).
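One way to achieve this with Persistent is derivePersistField, which generates the marshaling code for a standalone Haskell data type so that it can be used as a column type. A sketch follows; the Gender type itself is illustrative, and the Show and Read instances are required because derivePersistField stores the value in the database using its textual representation.

```haskell
{-# LANGUAGE TemplateHaskell #-}

import Database.Persist.TH

data Gender = Male | Female
  deriving (Show, Read, Eq)

-- generates the PersistField instance so Gender can appear
-- as a column type in the schema quasiquoter
derivePersistField "Gender"
```

After this, the entity block can declare a field such as `gender Gender Maybe`, combining the custom type with optionality.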
The previous information makes up the basics of the schema language, but many more features are available. For that reason, I strongly recommend you survey the Persistent documentation or the Book of Yesod before declaring your entities.
Creating the Database
The source of this error is simple. You have declared how to map entities to a database, but none of the tables referenced in that code exists yet. The good news is that, as I mentioned in the introduction of this chapter, Persistent includes functionality to automatically create or adapt a database to the format expected by a schema. This adaptation process is called the migration of the database.
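A migration is usually set up by wrapping the schema block with share and mkMigrate, which generates a migrateAll value that runMigration can execute. A sketch, under the same assumptions as the earlier schema example:

```haskell
{-# LANGUAGE TemplateHaskell, QuasiQuotes, TypeFamilies, GADTs,
             GeneralizedNewtypeDeriving, MultiParamTypeClasses,
             FlexibleInstances, OverloadedStrings #-}

import Database.Persist.TH
import Database.Persist.Sqlite (runSqlite, runMigration)

share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persistLowerCase|
Client
  firstName String
  lastName  String
  address   String
  age       Int
  deriving Show
|]

main :: IO ()
main = runSqlite "example.db" $
  -- creates the client table if it is missing,
  -- or adapts an existing one to the declared schema
  runMigration migrateAll
```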
It’s important to notice that migrations can only add and delete columns; there’s no way for the code to know when a field name has been changed into another one. However, the simple case of adding a new field poses a question: which value should be written for those rows that were already there? The solution comes in the form of a new attribute that can be added to a field, namely, default, which defines this value to write in the case of a migration.
Caution
The default attribute applies only to migrations. It doesn’t have any effect on your actual Haskell code, and it doesn’t allow you to leave the value of that field undefined.
The sidebar “Using An Existing Database” shows how to deal with legacy databases having table names and column names not matching your entities and fields. Then, Exercise 11-1 guides you in defining the entity that will be used throughout this chapter for products.
Using an Existing Database
You’ve seen how to use migration to create or adapt an existing database to the table schema that Persistent expects. However, a quite common scenario involves an already existing database that you want to access.
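Persistent supports this through sql= attributes, which can be attached to the entity itself, to the implicit Id, or to individual fields. A sketch (the table and column names here are illustrative):

```haskell
mkPersist sqlSettings [persistLowerCase|
Country sql=my_country_table
  Id          sql=the_country_identifier
  name String sql=the_country_name
  deriving Show
|]
```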
The table name will be my_country_table. The column named the_country_identifier will be treated as the ID column.
Exercise 11-1. Defining Products
Add the definition of a Product entity to the database. That entity must include the name and description of each product, its price as a Double value, and the number of elements in stock.
Be aware that Persistent does not allow using Integer values in fields. Rather, you should use Int64, which represents those integer values that fit in 64 bits.
Queries
Querying is one of the most basic operations you can perform with a database. A good knowledge of the different options for selecting data from your sources is instrumental for obtaining the performance that applications require. As you will see in this section, Persistent provides several ways to query for data, and Esqueleto extends that feature even more with SQL-only operators such as joins.
Note
In the rest of the examples in the chapter, I will show only the statements. If you want to execute any of those statements, wrap them inside a runSqlite call (or the corresponding function for another database), as done in the migration examples. The code won’t show all signatures, because some of them contain type system features that will be introduced later (in Chapter 13). GHC correctly infers the missing signatures, so you can still copy and paste the examples as they appear in the book.
Queries by Identifier or Uniqueness
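A sketch of such a query follows. The uniqueness constraint names, UniqueCountryName and UniqueClient, are hypothetical; they depend on how your schema declares its constraints, and the database is assumed to be already populated.

```haskell
findClient = runSqlite "example.db" $ do
  -- getBy returns Maybe (Entity a): the identifier plus the value
  mspain <- getBy $ UniqueCountryName "Spain"
  case mspain of
    Nothing -> return Nothing
    -- only the identifier is bound; the country value is ignored
    Just (Entity spainId _) ->
      getBy $ UniqueClient "Alejandro" "Serrano" spainId
```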
It’s important to notice that the result of getBy is not just the value you asked for but rather a combination of the identifier associated with the value and the value itself, provided inside the Entity data type. In the previous example you can find two different usages of getBy: one for finding the identifier of the country given its name (check how in the Entity pattern matching only the identifier is actually bound to a variable) and another one for finding a client with given personal attributes and belonging to that country.
Selecting Several Entities
The full power of a database is unleashed when you perform queries not only via identifiers but also based on other fields within entities. Furthermore, the languages in which queries are posed are usually quite expressive. For example, you could be asking for all those clients coming from the United States who have bought at least two products in the last few years to create a better campaign for clients in that part of the world.
Those queries are usually not guaranteed to return one result or none but can produce a whole set of values that fulfill the requirements of the query. You can ask Persistent to represent that set using two different approaches: either as a Source from the conduit library or as a plain list. You can also choose to return just the first value of the set, wrapped in a Maybe. In that case you may get Nothing back if the query selects an empty set.
Query Functions in Persistent
| | As a Source | As a List | Only First Result in a Maybe |
|---|---|---|---|
| Identifier and value | selectSource | selectList | selectFirst |
| Only identifier | selectKeys | selectKeysList | Not available |
All the functions listed in Table 11-2 have the same parameters and are used in the same way, so in the examples I’ll use selectList because it’s the one that shows output simply while still letting you form an idea about what’s going on in the database.
The first of the two parameters is a list of filters. Each filter constrains the value of one or more fields to satisfy a specified condition. For example, you might require the age of a Client to be at least 18. The way in which you write those filters is by using the constructors that Persistent has created for each of the fields in an entity together with a set of built-in operators. The age filter, for example, is represented as ClientAge >=. Just 18 (you need to include Just because age is an optional field, so its values are wrapped in Maybe).
I’ve been silently adding an empty list as a second parameter to selectList. That list represents the options to the query, which do not affect the results themselves but rather the way the results are presented. One typical option involves sorting by one or more fields. In the previous example, if you want to return the clients from the oldest to the youngest, you must change the list of options to [Desc ClientAge]. As you can see, the way to indicate the ordering is by using one of the constructors Asc or Desc together with the constructor corresponding to the field.
SQL Queries with Esqueleto
If you are used to relational databases, there’s a query feature that will come to mind to solve this problem: joins. However, Persistent aims to also support nonrelational databases, making the ability to join unavailable to you.
Note
To be completely honest, there’s support in Persistent for sending a raw query to the database. But that means that there’s no type checking and automatic marshaling, so you’ve lost all the benefits of using the library.
As I have already introduced, the solution to this problem is using another library, namely, Esqueleto. The esqueleto package provides support for writing SQL queries that are type checked. Furthermore, it reuses the schema definitions from Persistent, making the former a great companion to the latter.
As in the previous case, you can return the results in two different data structures. The select function shows the queried values as a list, whereas selectSource wraps them into a Conduit source. Both take as an argument a SqlQuery, so you need to focus on how to construct values of that type.
The first thing to do is to select which entities you’ll query. To do that, you use from. This is quite an interesting piece of the library because it takes a function as an argument, and depending on the type of the function, a certain subset of entities is queried. For example, if you want to query for clients, you must pass a function whose unique argument is of type Client. In most of the cases, that type is inferred by the compiler because of further filters (e.g., if you filter ClientAge, the compiler would infer that you’re querying Client), but in some cases you may need extra annotations. If you want to perform a query that involves more than one entity, you must pass them on a tuple.
- Access to a field of an entity using (^.).
- Constant values wrapped using val, or lists of them wrapped using valList.
- Comparisons with (==.), (!=.), and the rest of the operators used in Persistent, noting that (<-.) takes the name in_ and (/<-.) is called notIn.
- Grouping of several constraints with (&&.) (for “and”) and (||.) (for “or”). Notice that this is different from plain Persistent, where there is no conjunction operator; the set of filters in the list is implicitly conjoined.
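Putting these pieces together, an Esqueleto query over clients might read as follows. This sketch uses the classic Database.Esqueleto API; newer esqueleto versions move to Database.Esqueleto.Experimental, where from is written slightly differently.

```haskell
import Database.Esqueleto

adultsQuery = runSqlite "example.db" $
  select $
  from $ \client -> do
    -- just (val 18): the constant is wrapped with val,
    -- then with just, because ClientAge is an optional field
    where_ (client ^. ClientAge >=. just (val 18))
    return client
```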
The code makes explicit that ClientAge has a Maybe value. The constant value the field is compared to must be wrapped first with val and then with just.
In addition to inner joins, Esqueleto supports speaking about outer joins, which are useful in many situations.
To wrap up this fast introduction to Esqueleto, you need to learn a bit about expressing grouping as you do in SQL. The grouping itself is done via the groupBy function, which takes as an argument the field to group by. Then, you can use aggregation operators such as sum_, min_, max_, avg_, or countRows over other fields to perform that operation over all the values of each group. Explaining in detail how these operations work would take a long time, and that is a task better suited to a book on SQL, from which Esqueleto takes its concepts.
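For instance, counting the clients in each country might be sketched as follows. The ClientCountry field is assumed from the running schema, and countRows needs a type annotation to fix its result type:

```haskell
clientsPerCountry = runSqlite "example.db" $
  select $
  from $ \client -> do
    -- one group per distinct country identifier
    groupBy (client ^. ClientCountry)
    let n = countRows :: SqlExpr (Value Int)
    return (client ^. ClientCountry, n)
```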
The language supported by Esqueleto is indeed expressive and allows you to express queries far beyond what has been explained here. My recommendation for those moments when you need to perform powerful queries against a relational database is to read its documentation in more detail to discover the full generality of its constructions.
Insertions, Updates, and Deletions
At the beginning of the chapter, you were introduced to the way of inserting new values in the database, via insert. That function returns the identifier that was assigned to the new value that was saved on the database, which comes in handy when referring to it on other database values. However, in some cases the value won’t be inserted because of collisions on the uniqueness constraints. If this happens, it’s recommended that you use instead insertUnique, which returns a Maybe value, which will be Nothing if the value could not be inserted.
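A sketch of insertUnique in action (assuming a uniqueness constraint on country names declared in the schema):

```haskell
tryInsert = runSqlite "example.db" $ do
  -- Nothing is returned if inserting the row would
  -- violate a uniqueness constraint
  result <- insertUnique $ Country "Spain"
  liftIO $ case result of
    Just _  -> putStrLn "inserted"
    Nothing -> putStrLn "collision with an existing row"
```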
This solution is slow because it needs to bring all the data from the database to the program and then send back all the information for each client. In simple cases, the same task can be performed right on the database side, without any information being sent back and forth. Persistent includes support for doing such an in-place update when the operation either is an assignment of a constant or involves only numerical data present in the value. The function that performs such an operation is updateWhere; it takes as parameters a list of filters to specify which elements in the database will be affected by the changes and a list of in-place updates. Each update is represented using a field name, one of the update operators ((=.) for direct assignment, and (+=.), (-=.), (*=.), or (/=.) for applying a numerical operation), and a constant value that is the argument to the update.
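As a sketch, applying a 10 percent discount to expensive products could look like this (using the Product entity from Exercise 11-1; the field names are the ones suggested there):

```haskell
applyDiscount = runSqlite "example.db" $
  updateWhere [ ProductPrice >=. 100 ]   -- filters: which rows are affected
              [ ProductPrice *=. 0.9 ]   -- in-place update, done in the DBMS
```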
As with the queries, Persistent is not powerful enough to represent some of the conditions for updating in just one expression. Thus, you have to resort either to multiple queries and updates or to using a more powerful language such as Esqueleto. In the second instance, you need to change the call to select and from into a call to update. Once again, that function takes as a parameter another function, whose type will inform Esqueleto about which entities are affected by the change. However, that update does not allow tuples or joins because the changes must affect only one kind of entity. To perform queries inside an update block, you can use sub_select, which works in a similar way to select but it’s allowed to return only one result.
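The same discount expressed as an Esqueleto update might be sketched as follows (classic Database.Esqueleto API; the Product field names are assumed from Exercise 11-1):

```haskell
import Database.Esqueleto

discount = runSqlite "example.db" $
  update $ \product -> do
    -- the new value may refer to the current one,
    -- something plain Persistent updates cannot express
    set product [ ProductPrice =. product ^. ProductPrice *. val 0.9 ]
    where_ (product ^. ProductPrice >=. val 100)
```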
And now you come to the most difficult point for any database. After some time, it may be necessary to delete some of the information that you stored for either space or efficiency reasons. Doing so is easy if you know a way to uniquely identify the value to delete in the database, either via its identifier or via a set of fields with a uniqueness constraint. In the first case, you can use the delete function, and for the second case, the deleteBy function is provided.
However, there are cases where you want to delete a set of values that satisfy certain conditions. Using the functions you already know, that would mean obtaining the value identifiers, maybe via selectKeysList, and then mapping over all of them with delete. But considering everything you’ve learned in this chapter, you may be expecting a function that performs this task without any round-trip of data between the database and your application. And it exists: it’s called deleteWhere (similarly to updateWhere).
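For example, removing every product that is out of stock might be sketched as (again assuming the Product entity from Exercise 11-1):

```haskell
cleanUp = runSqlite "example.db" $
  -- performed entirely on the database side; no data travels back
  deleteWhere [ ProductInStock ==. 0 ]
```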
As you can see, performing updates and deletions in database values is easy using both Persistent and Esqueleto. The bulk of the learning process is to understand how to create good constraints for the values that will be affected by those operations. This shows one advantage of using high-level programming languages such as Haskell; you can express the concepts of your domain in a simple way and concentrate on the rules of your application.
Summary
- You saw the landscape of database access libraries in Hackage, and the focus moved to those that provide abstractions over several databases.
- In particular, the choice of libraries was Persistent and Esqueleto, which allow you to access both relational and nonrelational databases and to work with them in a type-safe way, making it harder to write illegal queries, something that is easy to do when working with raw database statements.
- You learned how to connect to databases both with and without a pool of connections.
- You were exposed to the language for specifying the database schema, with the idea that this information, combined with Template Haskell, creates a bunch of data types and glue code for talking with the DBMS and for migrating the database.
- Much of the chapter was devoted to queries, either via identifiers, via a set of uniquely constrained fields, or in general via any field. For those cases where Persistent was not enough (like joins), Esqueleto was introduced.
- Finally, you read about performing insertions, updates, and deletions with those libraries.