A database is a group of related data
Technically, anything that stores related data counts as a database, but we commonly think about large servers storing terabytes of data within itself
The database stores, of course, data (known facts that have implicit meaning and can be recorded), modelling what’s known as a mini-world (some part of the real world represented by the data in the database, ex. a university)
We can navigate a database through many different systems, but what’s most common nowadays is a database management system or DBMS (a software package to create/maintain computerized databases)
Back in the day, computer scientists had to use flat files (txt or ascii files), where each row is a record (instance) to form a database, but this is inefficient for a number of reasons
The first reason is inconsistent data, since there’s no type checking for pieces of data in each record
There’s also a number of other benefits of having a DBMS, including
The only real disadvantages of DBMS are their cost, their training requirements for management and their complexity, but these pale in comparison to the practical benefits
Due to their complexity, special protocols need to be made in their management, including the roles in a team meant to manage it
First, we have database administrators (DBA) who’s job includes giving new users access, upgrading the DBMS, monitoring its usage and performance, checking for security breaches and managing backups
Next is the database designer, who build the database through identifying data, relationships between data and the proper model to use
Of course, we also need to determine what our end users need, which is where the system analysts come in
Finally, we have the end users who, while not part of the technical team, are the main consumers of data from the database, accessing data by either getting reports or using software
Let’s put ourselves in the shoes of the database designer for a second to explain why they’re built the way they are
First, let’s define some terms
When we say schema, we mean a description of the database (such as defining what types of data are in each table and how many tables we have) but NOT the data itself
When we define a new database, what we are really doing is specifying its schema, meaning the initial state will be empty
For the schema, there’s three levels, the internal, conceptual and external
Internal schema describe the storage itself and their locations, deep down to the ones and zeroes
With conceptual schema, we store all the data into tables (aka the relational model, more on this later) to make things easier to read
What if we don’t need all the data as an end user? We make an external schema to present the data we need in the way we want
Because of these external schemas, we need to have data independence to make sure we don’t break the end user experience, with two types existing: logical data independence and physical data independence
Logical data independence is the capacity to change the conceptual schema without changing the external schema (ex. adding a new table shouldn’t be a problem, but making old tables reference that table or deleting old tables could)
The other type is physical data independence, where we can make changes to the physical schema without changing the conceptual schema (ex. upgrading the storage system to make data retrieval faster)
Now that we understand a schema is, we can build one, right? First, we should model our data properly
With databases, like all projects, it’s always best to create a plan before starting out, so we should start out with thinking about what data we need
Let’s say we’re building a database for a recipe website; we need to think of storing users with their username, login, authored recipes, saved recipes and anything else needed
To apply this thought process to some other application, we should think about
After we think about this, we draw a model, specifically an ER Diagram (Entity Relationship Model)
This is a model created by Peter Chen in 1976 and acts as a visual representation of data which we can map to a currently used model for database implementation (ex. Relational Model)
To better explain the process, let’s build a model of the following mini-world together
To start off with, what should we use to draw the diagram? We have several options, including
Next, we need to figure out how to represent our mini world using the right terminology, since if we hand over an ER diagram with the wrong symbols, the database application might be flawed
To start, we need to differentiate entities from attributes
An entity is something that can exist separate from anything else, while an attribute is some characteristic that is attached to an entity
For example, in “Each employee has a name, number, address, salary, sex and birthdate”, ‘employee’ would be an entity and ‘name’ would be an attribute
For each attribute, we also have a domain (meaning a set of values that the attribute can be equal to, i.e. integers, strings, characters, etc.)
Each entity is stored in a table along with everything that has the same entity type with a unique key as to differentiate them (ex. your student ID number)
In terms of ER diagrams, we use the following symbols
Now let’s look at our example again
Now we can begin to see where the entities are
Now that we’ve covered the entity part of Entity Relationship, let’s talk about the relationship part
Notice how, when we defined our entities earlier, we didn’t mark the department manager or anything else where two entities interact
This is where we define our relationships (named grouping of entities)
Things get more complicated as you expand your instances out
Representation in ER form
In the above example, sometimes we want to associate an attribute onto a relationship, especially in situations where you want the attribute to describe the relationship somehow.
We can also have ternary relationships (the above diagram says a part is supplied by a supplier for a project).
As weird as it may sound, unary relationships also have their uses, especially when an instance has a relationship with a different instance of the same type (e.g., a student is friends with another student).
We can also talk about a relationship’s cardinality ratio, with the common ones being one-to-one, many-to-one, and many-to-many:
ER representation of a one-to-many relationship
ER symbols
For our overarching example, let’s look at the relationships:
Our relationships would look something like this:
We’ve been careful about our use of the word “can” up until now, but what if some relationship is mandatory?
Participation refers to whether or not an entity must participate in a relationship.
We put a double line on the side with total participation.
Note that the maximum of one side should match the other side.
Let’s go back to our example again:
Our participation may look something like this:
In a relationship, we often run into situations where one side can’t exist without the other, with the side that won’t remain being referred to as a weak entity.
Weak entities also have some other properties, including:
In our example from before, a dependent can be considered a weak entity since, without an attached employee, we don’t need to store it.
After everything’s done, our ER diagram looks something like this:
While this is technically correct, it’s clunky and hard to read, so let’s look at how we can improve it.
Here are some guidelines that should be followed:
This won’t be tested, but it’s good to recognize.
This notation is similar to UML diagrams, where attributes are under the entity name in a rectangle.
Instead of diamonds for relationships, we simply have lines that tell us the cardinality and participation depending on the notches.
With this, we can represent all of our relationships with a different notation.