Normal Forms
Overview
Normal forms are a set of rules and guidelines used in the design of relational databases to minimize data redundancy, improve data integrity, and ensure efficient organization of data. They are part of the database normalization process, which is a technique used to organize tables, attributes, and relationships in a relational database. Normalization aims to eliminate anomalies, such as update, insertion, and deletion anomalies, that can arise when data is stored in an unorganized or poorly structured manner.
There are several normal forms, each building upon the previous one, with the most common ones being:
-
First Normal Form (1NF): A table is in 1NF if it contains only atomic values (i.e., each attribute value is indivisible) and has no repeating groups of attributes. In other words, each column should contain a single value, and each row should have a unique combination of values.
-
Second Normal Form (2NF): A table is in 2NF if it is in 1NF and all its non-prime attributes (attributes that are not part of any candidate key) are fully functionally dependent on the whole primary key. This means that there should be no partial dependencies, where an attribute depends only on part of a composite primary key.
-
Third Normal Form (3NF): A table is in 3NF if it is in 2NF and all its non-prime attributes are non-transitively dependent on the primary key. This means that there should be no transitive dependencies, where an attribute depends on another attribute that is not part of the primary key, which in turn depends on the primary key.
-
Boyce-Codd Normal Form (BCNF): A table is in BCNF if it is in 3NF and, for every functional dependency X → Y, X is a superkey (a set of one or more columns that can uniquely identify a row) or Y is a prime attribute (an attribute that is part of a candidate key). BCNF is a stricter version of 3NF that further reduces redundancy and anomalies.
-
Fourth Normal Form (4NF): A table is in 4NF if it is in BCNF and has no multi-valued dependencies, where two or more independent sets of values are stored in the same table.
-
Fifth Normal Form (5NF): A table is in 5NF if it is in 4NF and cannot be decomposed into any smaller tables without losing information (i.e., it is fully decomposed).
Each normal form successively reduces redundancy and improves the structure of the database, making it easier to maintain and query. However, achieving higher normal forms might come at the cost of increased complexity and reduced performance. Therefore, it is essential to strike a balance between normalization and the specific requirements of the database and its applications.
Normal forms are a set of rules and guidelines used in designing relational databases to minimize redundancy and dependency, ensuring data integrity and consistency. They are based on the principles of functional dependencies and decomposition. Edgar F. Codd introduced the concept of normal forms in 1970.
Motivating Examples
In relational databases, data is organized into tables (relations) with rows (tuples) and columns (attributes). By adhering to normal forms, database designers can reduce the likelihood of anomalies (insert, update, and delete anomalies) that may occur due to poor organization of data.
There are several normal forms, each building on the previous one. The most commonly used are the first three normal forms (1NF, 2NF, and 3NF), and Boyce-Codd normal form (BCNF). We'll discuss these four normal forms in detail.
- First Normal Form (1NF): A table is said to be in 1NF if it satisfies the following conditions:
- Each attribute (column) contains only atomic (indivisible) values.
- Each attribute (column) contains a single value from its domain (no repeating groups or arrays).
Motivating example: Consider a table "Orders" with columns: OrderID, CustomerID, CustomerName, and Items (where Items contains multiple values for each order).
| OrderID | CustomerID | CustomerName | Items |
|---------|------------|--------------|--------------------|
| 1 | 101 | John Smith | TV, WashingMachine |
| 2 | 102 | Jane Doe | Fridge, Microwave |
The "Items" column violates 1NF as it contains multiple values. To bring it to 1NF, we split the Items column into multiple rows, one for each item:
| OrderID | CustomerID | CustomerName | Item |
|---------|------------|--------------|----------------|
| 1 | 101 | John Smith | TV |
| 1 | 101 | John Smith | WashingMachine |
| 2 | 102 | Jane Doe | Fridge |
| 2 | 102 | Jane Doe | Microwave |
- Second Normal Form (2NF): A table is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key (not just part of the primary key).
Motivating example: Using the 1NF "Orders" table:
| OrderID | CustomerID | CustomerName | Item |
|---------|------------|--------------|----------------|
| 1 | 101 | John Smith | TV |
| 1 | 101 | John Smith | WashingMachine |
| 2 | 102 | Jane Doe | Fridge |
| 2 | 102 | Jane Doe | Microwave |
The primary key is (OrderID, Item), but the CustomerName is dependent only on CustomerID. To bring the table to 2NF, we split it into two tables, "Orders" and "Customers":
Customers:
| CustomerID | CustomerName |
|------------|--------------|
| 101 | John Smith |
| 102 | Jane Doe |
Orders:
| OrderID | CustomerID | Item |
|---------|------------|---------------|
| 1 | 101 | TV |
| 1 | 101 | WashingMachine|
| 2 | 102 | Fridge |
| 2 | 102 | Microwave |
- Third Normal Form (3NF): A table is in 3NF if it is in 2NF and there are no transitive dependencies among non-key attributes (i.e., no non-key attribute depends on another non-key attribute).
Motivating example: Consider a table "Employees" with columns: EmployeeID, DepartmentID, DepartmentName, and ManagerID.
| EmployeeID | DepartmentID | DepartmentName | ManagerID |
|------------|--------------|----------------|-----------|
| 1 | 100 | HR | 10 |
| 2 | 100 | HR | 10 |
| 3 | 200 | IT | 20 |
| 4 | 200 | IT | 20 |
The primary key is EmployeeID. However, the DepartmentName is transitively dependent on EmployeeID through DepartmentID (non-key attribute). To bring the table to 3NF, we split it into two tables, "Employees" and "Departments":
Employees:
| EmployeeID | DepartmentID | ManagerID |
|------------|--------------|-----------|
| 1 | 100 | 10 |
| 2 | 100 | 10 |
| 3 | 200 | 20 |
| 4 | 200 | 20 |
Departments:
| DepartmentID | DepartmentName | ManagerID |
|--------------|----------------|-----------|
| 100 | HR | 10 |
| 200 | IT | 20 |
- Boyce-Codd Normal Form (BCNF): A table is in BCNF if it is in 3NF and for every non-trivial functional dependency X → Y, X is a superkey (i.e., X is either a primary key or a candidate key).
Motivating example: Using the 3NF "Departments" table:
| DepartmentID | DepartmentName | ManagerID |
|--------------|----------------|-----------|
| 100 | HR | 10 |
| 200 | IT | 20 |
Here, DepartmentID is the primary key, but there is a non-trivial functional dependency: ManagerID → DepartmentID, where ManagerID is not a superkey. To bring the table to BCNF, we split it into two tables, "Departments" and "Managers":
Departments:
| DepartmentID | DepartmentName |
|--------------|----------------|
| 100 | HR |
| 200 | IT |
Managers:
| ManagerID | DepartmentID |
|-----------|--------------|
| 10 | 100 |
| 20 | 200 |
In summary, normal forms help in designing efficient relational databases by minimizing redundancy and dependency. They reduce the likelihood of anomalies and ensure data integrity and consistency. By following the guidelines and rules of each normal form, a database designer can create a well-structured database that meets the requirements of various applications.