Chapter 6. SQL Database Programming

This chapter deals with access to databases built according to the principles of the relational model, which—generally speaking—we refer to as SQL databases due to the language used to manipulate their data. In the next chapter, we will take a look at the emerging noSQL database model.

Note

Note that we will not go in depth about data access here. This will just be a quick review of the most common possibilities you have in order to access database systems build according to the relational model.

Here, we will quickly review the rules and foundations of the relational model (schemas, Normal Form Rules, and so on) before getting into SQL Server 2014, the database management system (DBMS) that I'll use in this chapter. Additionally, we'll go through the process of installing SQL Server 2014 Express Edition (totally free) along with some sample databases that will help us with the demos.

In this section, we will also cover a not-so-common type of project that Visual Studio offers to deal with databases, the SQL Server Project Template, and explore how we can configure many aspects of the target database that our application uses straight from Visual Studio. We will even be able to save all this configuration in a .dacpac file, which can be later replicated in any other machine.

Then, we'll cover the basic .NET Framework database engines recommended for data management. We'll start with a reminder of the initial engine that appeared with version 1.0 (ADO.NET), passing from there to the Entity Framework model (the most common and the one recommended by Microsoft), which is already in its version 6.1 (although the new version aligns with the .NET Core wave, and it has been published under the name Entity Framework 1.1 recently).

We'll discover how to build an ORM data schema from one of the demo databases and how to query and manipulate data with EF and some of the new possibilities that these versions offer to the programmer.

Overall, we'll cover the following topics in this chapter:

  • A refresher view of the Database Relational Model
  • The SQL Server 2014 Database system, installation, and features
  • The SQL Server Project template in Visual Studio
  • Basic data access from ADO.NET
  • Basic data access with Entity Framework 6.0 using the Database-first flavor as used with an ASP.NET MVC application

The relational model

Up until 1970, data access was diverse in nature and management. No standard or common approaches were available, and the term used to refer to what we now understand as databases was data banks, but their structures were quite different.

Of course, there were other models, such as the hierarchical model and the network model, but their specifications were somewhat informal.

In 1969 and the following years, an engineer at IBM (E.F. Codd) started publishing a series of papers in which he established the foundations of what we now understand as the relational model; especially, his paper, The relational model for database management, is now considered the RM manifesto. In this model, all data is represented in terms of tuples, and these tuples are grouped into relations. As a result, a database organized in terms of the relational model is called a relational database.

Properties of relational tables

The following are the properties of relational tables:

  • All data is offered in terms of collections of relations.
  • Each relation is described as a table. A table is composed of columns and rows.
  • Columns are attributes of the entity, modeled by the table's definition (in a customer table, you could have customerID, e-mail, account, and so on).
  • Each row (or tuple) represents a single entity (that is, in a customer table, 1234, [email protected], 1234, 5678, and so on, would denote a row with a single customer).
  • Every table has a set of attributes (one or more) that can be taken as a key, which uniquely identifies each entity (in the customer table, customer ID would specify only one customer, and this value should be unique in the table).

Many types of keys offer several possibilities, but the two most important are the primary key and the foreign key. While the former identifies tuples uniquely, the latter sits at another table, allowing the establishment of relations between tables. It is in this way that we can query data that relates to two or more tables based on a common field (this field does not have to be named as the other with which we will match; it only has to be of the same data type).

The relational model is based on Relational Algebra (also described by E.F. Codd, who proposed this algebra as a basis for database query languages, and thus the relation with set theory). This algebra uses a set union, set difference, and Cartesian product from a set theory and adds additional constraints to these operators, such as select, project, and join.

These operations have a direct correspondence in the SQL Language, which is used to manipulate data, and basically, we find the following:

  • Select: It recovers values for rows in a table (optionally, with a given criteria)
  • Project: It reads values for select attributes
  • Join: It combines information from two or more tables (or only one, taken as a second table)
  • Intersect: It shows rows present in two tables
  • Union: It shows rows from multiple tables and removes the duplicate rows
  • Difference: It recovers rows in one table that are not found in another table
  • Product: The cartesian product combines all rows from two or more tables and is normally used with filter clauses and foreign key relations

Another important aspect of the relational model is that ensures data integrity by means of a set of rules. Mainly, there are five rules that have to be considered:

  • The order of tuples and/or attributes doesn't matter: If you have the ID before e-mail, it is the same as the e-mail before the ID).
  • Every tuple is unique. For every tuple in a table, there is a combination of values that uniquely identifies it.
  • Every field contains only single values. Or, if you want, each of the table's cells should only hold one value. This is derived from the First Normal Form; we'll come to that later.
  • All values within an attribute (think of it as a column) are from the same domain. That is, only values allowed by the attribute's definition are allowed, either number, characters, dates, and so on. Their practical implementation will depend on the type definitions permitted by the database engine.
  • Identifiers of tables must be unique in a single database, and the same goes for columns in tables.

The principles and rules mentioned hereby are formalized by the Normal Form Rules (generally called Normalization). They establish a number of rules to reinforce data integrity. However, in practice, only the three first rules (1NF, 2NF, and 3NF) are applied in everyday business and even the third one admits a de-normalization process in order to avoid unnecessary complexities in design on some occasions.

In a very succinct way, these are the requirements of the three normative Normal Forms:

  • 1NF: It eliminates duplicative columns from the same table. That is, create separate tables for each group of related data and identify each row with a primary key.
  • 2NF: It eliminates subsets of data that apply to more than one row of a table, and creates a new table with them. Later, you can create foreign keys to maintain relationships between the tables.
  • 3NF: It deletes all columns that are not dependent upon the primary key.

With these normal forms, our data integrity is guaranteed in 99% of the cases, and we can exploit our bank of data using SQL in a very efficient way. Keep in mind that those keys that uniquely identify each table's rows allow the creation of indexes, which are additional files intended to speed up and ease data recovery.

In the case of SQL Server (and other DBMS), two types of indexes are allowed:

  • Clustered indexes: They can be only one per table. They are extremely fast data structures that should be based on a short-length field(s) and preferably over a field(s) that don't change, such as customer ID in the example we mentioned earlier.
  • Non-clustered indexes: We can define several per table and allow improvements in speed, especially in reading, joining, and filtering operations. It is recommended that the candidate fields be those that appear in WHERE and JOIN clauses, preferably.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.117.138.104