ER Diagrams

Databases

A database is a group of related data

Technically, anything that stores related data counts as a database, but we commonly think about large servers storing terabytes of data within itself

The database stores, of course, data (known facts that have implicit meaning and can be recorded), modelling what’s known as a mini-world (some part of the real world represented by the data in the database, ex. a university)

We can navigate a database through many different systems, but what’s most common nowadays is a database management system or DBMS (a software package to create/maintain computerized databases)

Back in the day, computer scientists had to use flat files (txt or ascii files), where each row is a record (instance) to form a database, but this is inefficient for a number of reasons

The first reason is inconsistent data, since there’s no type checking for pieces of data in each record

There’s also a number of other benefits of having a DBMS, including

The only real disadvantages of DBMS are their cost, their training requirements for management and their complexity, but these pale in comparison to the practical benefits

Who Handles a DBMS?

Due to their complexity, special protocols need to be made in their management, including the roles in a team meant to manage it

First, we have database administrators (DBA) who’s job includes giving new users access, upgrading the DBMS, monitoring its usage and performance, checking for security breaches and managing backups

Next is the database designer, who build the database through identifying data, relationships between data and the proper model to use

Of course, we also need to determine what our end users need, which is where the system analysts come in

Finally, we have the end users who, while not part of the technical team, are the main consumers of data from the database, accessing data by either getting reports or using software

Let’s put ourselves in the shoes of the database designer for a second to explain why they’re built the way they are

Schema

First, let’s define some terms

When we say schema, we mean a description of the database (such as defining what types of data are in each table and how many tables we have) but NOT the data itself

When we define a new database, what we are really doing is specifying its schema, meaning the initial state will be empty

For the schema, there’s three levels, the internal, conceptual and external

Internal schema describe the storage itself and their locations, deep down to the ones and zeroes

With conceptual schema, we store all the data into tables (aka the relational model, more on this later) to make things easier to read

What if we don’t need all the data as an end user? We make an external schema to present the data we need in the way we want

Because of these external schemas, we need to have data independence to make sure we don’t break the end user experience, with two types existing: logical data independence and physical data independence

Logical data independence is the capacity to change the conceptual schema without changing the external schema (ex. adding a new table shouldn’t be a problem, but making old tables reference that table or deleting old tables could)

The other type is physical data independence, where we can make changes to the physical schema without changing the conceptual schema (ex. upgrading the storage system to make data retrieval faster)

Now that we understand a schema is, we can build one, right? First, we should model our data properly

Modelling

With databases, like all projects, it’s always best to create a plan before starting out, so we should start out with thinking about what data we need

Let’s say we’re building a database for a recipe website; we need to think of storing users with their username, login, authored recipes, saved recipes and anything else needed

To apply this thought process to some other application, we should think about

After we think about this, we draw a model, specifically an ER Diagram (Entity Relationship Model)

This is a model created by Peter Chen in 1976 and acts as a visual representation of data which we can map to a currently used model for database implementation (ex. Relational Model)

Untitled

To better explain the process, let’s build a model of the following mini-world together

More on ER Diagrams

To start off with, what should we use to draw the diagram? We have several options, including

Next, we need to figure out how to represent our mini world using the right terminology, since if we hand over an ER diagram with the wrong symbols, the database application might be flawed

To start, we need to differentiate entities from attributes

An entity is something that can exist separate from anything else, while an attribute is some characteristic that is attached to an entity

For example, in “Each employee has a name, number, address, salary, sex and birthdate”, ‘employee’ would be an entity and ‘name’ would be an attribute

For each attribute, we also have a domain (meaning a set of values that the attribute can be equal to, i.e. integers, strings, characters, etc.)

Each entity is stored in a table along with everything that has the same entity type with a unique key as to differentiate them (ex. your student ID number)

In terms of ER diagrams, we use the following symbols

Untitled

Now let’s look at our example again

Now we can begin to see where the entities are

Untitled

Now that we’ve covered the entity part of Entity Relationship, let’s talk about the relationship part

Relationships

Notice how, when we defined our entities earlier, we didn’t mark the department manager or anything else where two entities interact

This is where we define our relationships (named grouping of entities)

Untitled

Things get more complicated as you expand your instances out

Untitled

Representation in ER form

In the above example, sometimes we want to associate an attribute onto a relationship, especially in situations where you want the attribute to describe the relationship somehow.

Untitled

We can also have ternary relationships (the above diagram says a part is supplied by a supplier for a project).

As weird as it may sound, unary relationships also have their uses, especially when an instance has a relationship with a different instance of the same type (e.g., a student is friends with another student).

We can also talk about a relationship’s cardinality ratio, with the common ones being one-to-one, many-to-one, and many-to-many:

Untitled

ER representation of a one-to-many relationship

Untitled

ER symbols

For our overarching example, let’s look at the relationships:

Our relationships would look something like this:

We’ve been careful about our use of the word “can” up until now, but what if some relationship is mandatory?

Participation

Participation refers to whether or not an entity must participate in a relationship.

Untitled

We put a double line on the side with total participation.

Untitled

Note that the maximum of one side should match the other side.

Let’s go back to our example again:

Our participation may look something like this:

Weak Entities

In a relationship, we often run into situations where one side can’t exist without the other, with the side that won’t remain being referred to as a weak entity.

Weak entities also have some other properties, including:

Untitled

In our example from before, a dependent can be considered a weak entity since, without an attached employee, we don’t need to store it.

General Guidelines

After everything’s done, our ER diagram looks something like this:

Untitled

While this is technically correct, it’s clunky and hard to read, so let’s look at how we can improve it.

Here are some guidelines that should be followed:

Crow’s Feet Notation

This won’t be tested, but it’s good to recognize.

This notation is similar to UML diagrams, where attributes are under the entity name in a rectangle.

Untitled

Instead of diamonds for relationships, we simply have lines that tell us the cardinality and participation depending on the notches.

Untitled

With this, we can represent all of our relationships with a different notation.

Untitled