The first normal form (1NF) is the first level of database normalization. You will need to complete this step before proceeding to other database normalization forms. The primary reason to implement 1NF is to eliminate repeating groups. This ensures that you can use simple SQL statements to query the data. It also ensures that you aren't duplicating data, which uses additional storage and computing time. This step will ensure that you are doing the following:
- Defining data, columns, and data types and putting related data into columns
- Eliminating repeating groups of data:
- This means that you will not have repeating columns, such as Year1, Year2, Year3, but instead will have a column that is named Year, and each row in the table will be a different year.
- Another example of this is not having multiple values in the same field, such as 1985, 1987, 1989, but instead placing each year in a row.
- This means that there are no exact duplicate rows. The example following this bullet list will explain this concept in more depth.
- Creating a primary key for each table
In the following example, you could make the first column the primary key in the people table and the foreign key in the salaries table. In the salaries table, you could create a new primary key or create a composite key that is an amalgamation of multiple fields.
Here is a denormalized sample table:
Name |
birthYear |
Salary |
SalaryYear |
Jim Jones |
1981 |
2750000, 4500000 |
2010, 2011 |
Joe Smith |
1974 |
10600000 |
2014 |
There is a right way and wrong way to normalize this table. Let's go over the wrong way first:
Name |
birthYear |
Salary1 |
Salary2 |
SalaryYear1 |
SalaryYear2 |
Jim Jones |
1981 |
2750000 |
4500000 |
2010 |
2011 |
Joe Smith |
1974 |
10600000 |
2014 |
The preceding design has introduced new problems. Even though it doesn't have groups of repeating data in one column, the salary is limited to two values. What if a player has more than two salaries? You don't have anywhere to put another salary without adding a third column. This also wastes space for those players that only have one salary, and searching through this table for a player with a specific salary becomes difficult. The same goes for the SalaryYear columns.
The right way to normalize the denormalized table to the first normal form is to ensure that there aren't repeating groups, as shown in the following table. The people table with player information would look like the following:
playerID |
nameFirst |
nameLast |
birthYear |
jjones01 |
Jim |
Jones |
1981 |
jsmith01 |
Joe |
Smith |
1974 |
The Salary value has been removed and placed in another table with the playerID field linking them to each other; therefore, the salaries table will look like the following:
salaryID |
playerID |
salary |
year |
1 |
jjones01 |
2750000 |
2010 |
2 |
jjones01 |
4500000 |
2011 |
3 |
jsmith01 |
10600000 |
2014 |
Let's go through a denormalization example by looking at the following table:
playerID | namefirst | namelast | birthYear | franchID | franchname | teamID | RBI | rank | yearID |
abbotpa01 | Paul | Abbott | 1967 | PHI | Philadelphia Phillies | PHI | 2 | 2 | 2004 |
abreubo01 | Bobby | Abreu | 1974 | PHI | Philadelphia Phillies | PHI | 79 | 1 | 2000 |
abreubo01 | Bobby | Abreu | 1974 | PHI | Philadelphia Phillies | PHI | 110 | 3 | 2001 |
alcanar01 | Arismendy | Alcantara | 1991 | CHI | Chicago Cubs | CHI | 1 | 8 | 2015 |
almoral01 | Albert | Almora | 1994 | CHI | Chicago Cubs | CHI | 14 | 8 | 2016 |
almoral01 | Albert | Almora | 1994 | CHI | Chicago Cubs | CHI | 46 | 6 | 2017 |
alvarpe01 | Pedro | Alvarez | 1987 | PIT | Pittsburg Pirates | PIT | 77 | 17 | 2015 |
alvarto01 | Tony | Alvarez | 1979 | PIT | Pittsburg Pirates | PIT | 2 | 9 | 2002 |
alvarto01 | Tony | Alvarez | 1979 | PIT | Pittsburg Pirates | PIT | 8 | 1 | 2004 |
To meet the requirements of 1NF, you would need to split this table into multiple tables. Depending on the table you are trying to normalize, you might not need to split it if it's already following the rules of 1NF.
This table only contains the information about the player and has a primary key of playerID:
playerID |
namefirst |
namelast |
birthYear |
abbotpa01 |
Paul |
Abbott |
1967 |
abreubo01 |
Bobby |
Abreu |
1974 |
alcanar01 |
Arismendy |
Alcantara |
1991 |
almoral01 |
Albert |
Almora |
1994 |
alvarpe01 |
Pedro |
Alvarez |
1987 |
alvarto01 |
Tony |
Alvarez |
1979 |
The other table would contain the rest of the fields from the denormalized table. The following table has a foreign key relationship to the preceding table regarding playerID:
playerID |
franchID |
franchname |
teamID |
RBI |
rank |
yearID |
abbotpa01 |
PHI |
Philadelphia Phillies |
PHI |
2 |
2 |
2004 |
abreubo01 |
PHI |
Philadelphia Phillies |
PHI |
79 |
1 |
2000 |
abreubo01 |
PHI |
Philadelphia Phillies |
PHI |
110 |
3 |
2001 |
alcanar01 |
CHI |
Chicago Cubs |
CHI |
1 |
8 |
2015 |
almoral01 |
CHI |
Chicago Cubs |
CHI |
14 |
8 |
2016 |
almoral01 |
CHI |
Chicago Cubs |
CHI |
46 |
6 |
2017 |
alvarpe01 |
PIT |
Pittsburg Pirates |
PIT |
77 |
17 |
2015 |
alvarto01 |
PIT |
Pittsburg Pirates |
PIT |
2 |
9 |
2002 |
alvarto01 |
PIT |
Pittsburg Pirates |
PIT |
8 |
1 |
2004 |