Data, tables, and normalization

The primary purpose of a database is to store data. Data is information, usually text-based, but not always, and this data could be anything from a company phone directory, to patient medical information, to an auto parts list, or even reviews of gourmet hot sauces complete with pictures of the bottles.

Database servers such as MariaDB store information, no matter what it is, in a structure called a table. Tables are two-dimensional data structures containing rows and columns. A row corresponds to a single record in a database and records are divided into columns. Think of database tables like a specialized spreadsheet.

The columns in a database can have relationships defined in one way or another. For example, the id column in an employee table may relate to the employee_id column in an address table. These relationships (also called foreign keys) are why we call MariaDB a relational database server.

A database without tables of data is nothing more than an entry in the MariaDB system database (this database is called mysql) and a directory in the file system under the datadir directory. Until we create some tables and start adding data to those tables, our new database is useless.

There are few things in MariaDB that we will spend more time on, at least in the beginning, than when we create the tables in our database.

When we create a table, we are defining its structure. This structure includes such things as the number of columns and the type of data that we want to store in each column. Data types include things such as numbers, text, and dates. For example, if we are creating an employee table, we might decide that each row will contain an employee identification number (number), a surname (text), any given names (text), a preferred name (text), a birthdate (date), and so on.

We might also want to store the e-mail addresses, phone numbers, and home addresses of the employees, but we don't necessarily want to store that kind of data in the same table. Why? Because they are things people often have more than one of. For example, many people have both personal and work e-mail addresses. The same holds true for phone numbers and, for some people, even houses. If we try to design a table that has enough fields for the multiples of phone numbers and e-mail addresses that people have, it will quickly become unwieldy with too many columns, and with possibly no single row that uses all of them. Instead, we break apart the data into multiple tables, and define the relationships between the tables.

A good rule of thumb is to break the information apart into a separate table when it is clear there could be multiples of it. For example, it wouldn't make sense to have a single orders table in a company database that contains everything. Instead, we would have a customers table for the core customer information, an addresses table to hold the multiple addresses that the customers might want us to ship items to, an items table for the various things we might ship to a customer, and lastly, an orders table to actually track the orders made by customers. Of course, this is only one way to split the information apart and we might also need to store payment, supplier, and other information.

The process by which we refine our table definitions and split our data off into multiple tables is called normalization. There isn't space here for a complete discussion of this process, but the MariaDB Knowledge Base has a page which discusses it in depth and you can refer to the following location:

https://mariadb.com/kb/en/recap-the-relational-model

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.139.67.5