So just what is Normalization and why is it so critical for an efficiently designed database?
Normalization is the process of organizing your data and breaking it into smaller tables that are easier to manage. The primary reason we normalize a database is to prevent redundant data.
That alone should be reason enough; who wants to type information more than what is absolutely necessary?
Normalization is often over-looked, because we may think understanding and applying the principles of normalization is far too complicated.
In reality, normalization really isn't that complicated and if your database is not normalized it can be inaccurate, slow, inefficient, and it might not produce the data you expect. Not to mention if you have a normalized database, queries, forms, and reports are much easier to design!
Understanding primary keys is crucial to understanding normalization. A primary key is a field or a combination of fields that uniquely identifies a record.
In this day and age unique identifiers are all around us. We have a social security number, telephone number, credit card numbers, checking account numbers, or my favorite; the "savings" card many grocery stores encourage you to obtain.
Each of these values uniquely describes you and would be considered a primary key in a database—even that "savings" card from your favorite grocery store has a unique identifier. (Ever notice how the grocery coupons you receive in the mail are geared towards items you purchase?)
The above were all examples of a single-field primary key. A table could have a multiple-field primary key, also known as a composite key.
A good example of a unique identifier that is comprised of multiple fields is a check from a check book. Take the following image for example:
A check has a Bank Routing Number, Account Number, and a Check Number. So what is the unique identifier? It takes all three numbers, in combination, to create a unique identifier.
The check number identifies a check in a checking account, the Account Number identifies the checking account at the Bank, and the Bank Routing number uniquely identifies the Bank.
For example, if you write an electronic check over the Internet all of these numbers are required so your payment can be uniquely identified.
The first phase of the normalization is to determine, or if necessary create, your primary key(s). Once that is complete you can begin the normalization process.
Database design theory includes standards and guidelines to help us efficiently design our tables and create a normalized database.
These are referred to as normal forms. Let's begin with a summary of the normal forms.
First Normal Form (1NF):
• Break each field down to the smallest meaningful value
• Remove repeating groups of data
• Create a separate table for each set of related data
Second Normal Form(2NF)
• Create new tables for data that applies to more than one record in a table
• Add a related field (foreign key) to the table
Third Normal Form (3NF)
• Remove fields that do not relate to, or provide a fact about, the primary key
The normal forms are cumulative and there are actually a total of five normal forms starting with the first normal form (1NF) through the fifth normal form (5NF).
However, the fourth and fifth normal forms are rarely applied. Most databases achieve normalization by the third normal form (3NF) so the first three normal forms will be the focal point of this article.
First Normal Form
A table is in its 1NF when each field contains the smallest meaningful value and does not contain repeating groups of data.
What does this mean exactly? Let's break it down step-by-step.
Take the following table for example:
If we apply the first portion of 1NF, the Name, and Spouse\Children fields do not contain the smallest meaningful value. Instead, the first name, last name, and the children's names should be placed in separate fields:
Of course there are still flaws in the table, but keep in mind there is more than one step in the normalization process.
Once the first step is accomplished then we proceed to the second portion of 1NF: the table should not contain repeating groups. Spouse, Child 1, Child 2, and Child 3 would be considered repeating groups. They are dependants of the employee.
The dependants need to be moved to another table and each dependant should be added as records in the table:
Also consider that Lenny and Carl do not have dependants so there are no related records for them in the Dependants table. However should either add a dependant in the future we simply add a related record. There is no need to modify the database design.
Second Normal Form
Create new tables for data that applies to more than one record in a table and add a related field (foreign key) to the table:
The Manager fields applies to multiple records in the table. Therefore, it should be moved to another table, duplicate values removed and a field to establish a relationship between the tables should be added.
Third Normal Form
Remove fields that do not relate to, or provide a fact about, the primary key.
Take the Manager, Dept, and Sector fields for example. The Manager and Sector are facts about the Dept—not the primary key. These fields should be moved to another table and a field to establish a relationship between the tables should be added.
As an additional note, if your table contains a multiple-field primary key then all non-key fields should be a fact about the entire key. If a field describes only a one field in the key then it should be moved to another table.
The following image shows the completed database design.
Additional Notes and Design Considerations
In this example, technically the Dept Code field does not provide a fact about the primary key and according to the experts it, too, should be moved to another table.
I've also read many articles that suggest an Address field should be broken down into smaller fields and that City and State field are not necessary in a table. The reason behind this is both a city and state could apply to more than one record in a table. Instead a Zip Code field could be used to look up the City and State.
Going to extremes can create too many tables which in turn can make it difficult to manage your data. The key to developing an efficient database is to determine your needs.
A postal carrier may need an Address field broken down into smaller fields for sorting and grouping purposes, but do you?
Another good example is leaving the Dept Code field in our completed table design. If you also wanted to track information such as pay rate, health insurance, etc., then a new table that contains company related data for the employee would be necessary. If all you need is to track the department an employee belongs to then leaving it in the Employees table is fine.
In summary, let's take another look at the main objectives for normalizing a database:
• Save typing of repetitive data
• Increase flexibility to query, sort, summarize, and group data
• Avoid frequent restructuring of tables and other objects to accommodate new data
If you type data value more than once then consider placing the field in another table.
Consider your sorting and grouping needs. If you need to sort or group on a portion of a field, then the field is not broken down into its smallest meaningful value.
If you have multiple groups of fields, such as several telephone numbers, then consider eliminating those fields and turning them into records in another table. Think vertically—not horizontally!
For those following this series the topic of next month's article will feature tables, data types, and field properties.