A lot of the challenges of graphics come from figuring out how to manipulate coordinates, vectors and matrices to our benefit
With this in mind, it’s worth our time to review a bit of linear algebra
To understand this, let’s take an example
A 1-dimensional coordinate system would be a number line, but when we get up to 2 dimensions we form a Cartesian plane and so on and so forth
In general, a coordinate system is a geometrical system with tuples of numbers being used to describe unique points, lines, surfaces and other structures
In all cases, we need to define an origin (0, (0,0), etc.) and an orthonormal basis (x, (x,y), (x,y,z), etc.)
The points are the tuples themselves, which we can then use to form the rest of our structures
The orthonormal basis is defined by vectors, each of which, for our purposes, has a direction and a length
These can start from the origin, but they can start from anywhere since they don’t particularly care about location
Typically, we construct vectors using displacement, describing how the coordinate space is transformed
For example, if we have a point $p(x,y)$, then $p + (2,3)=(x+2,y+3)$
We can also do scalar multiplication, so if \(\vec{v}=\begin{bmatrix} v_1 \\ v_2 \end{bmatrix}\), then \(2\vec{v}=\begin{bmatrix} 2v_1 \\ 2v_2 \end{bmatrix}\)
This changes the vector’s length but not the line it points along (a negative scalar flips its direction); the length is defined as $|\vec{v}|=\sqrt{v_1^2+v_2^2+\dots+v_n^2}$
In general, $|c\vec{v}|=|c||\vec{v}| \ \forall c \in \mathbb{R}$
Another operation on vectors is addition, so if we have two vectors $\vec{v}, \vec{w}$, then $\vec{v} + \vec{w}=\begin{bmatrix} v_1+w_1 \\ v_2+w_2 \end{bmatrix}$
We can also take the product between two vectors in one of two ways: dot product or cross product
The dot product returns a scalar as such
\[\vec{v} \cdot \vec{w}=\sum_{i=1}^nv_iw_i\]We can use this to quickly find the length of a vector, since $\vec{v} \cdot \vec{v}=|\vec{v}|^2$
Another thing we can do is find the angle between two vectors as such
\[\frac{\vec{v}\cdot \vec{w}}{|\vec{v}||\vec{w}|}=\cos(\theta)\]Combining this with normalizing vectors via $\frac{\vec{v}}{|\vec{v}|}$, we can create orthonormal vectors, or vectors that are normalized (length 1) and orthogonal to each other
Simply put, if $\vec{v} \cdot \vec{w}=0$ when both vectors are normalized, the two vectors are perpendicular, since $\cos(90^\circ)=0$
It might be intuitive to say that vectors have a particular direction like north or south, but this doesn’t really mean anything since direction is relative
In general, when we describe direction, we mean a normalized vector
Orthonormal vectors are normalized vectors that are mutually orthogonal, which is what defines an orthonormal basis
\[\vec{i}=\begin{bmatrix} 1 \\ 0 \\ 0\end{bmatrix}, \vec{j}=\begin{bmatrix} 0 \\ 1 \\ 0\end{bmatrix}, \vec{k}=\begin{bmatrix} 0 \\ 0 \\ 1\end{bmatrix}\]For these vectors, this is pretty easy to figure out, but in general, we need the cross product, which will give the vector that is orthogonal to two vectors
\[\vec{u}\times \vec{v}=\begin{bmatrix} u_2v_3-u_3v_2 \\ u_3v_1-u_1v_3 \\ u_1v_2-u_2v_1\end{bmatrix}\]The important thing about cross products is that they give the normal of the plane spanned by two vectors, which pins down the plane they take up
Which of the two possible directions we use depends on the “handedness” of the system
If you’re using a right-hand system, point the fingers of your open right hand along $\vec{u}$ and curl them towards $\vec{v}$; your thumb then points in the direction of $\vec{u}\times \vec{v}$
This is an important distinction to make since different graphics libraries will use different systems
Something else that’s interesting is that we can use the cross product to get the sine of the angle between two vectors
\[\frac{|\vec{u}\times \vec{v}|}{|\vec{u}||\vec{v}|}=\sin(\theta)\]This, however, isn’t very useful in practice, since computing cross products is expensive
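The component formula for the cross product translates directly into code. This is only a sketch with made-up names (`Vec3`, `vec3_cross`, `vec3_dot`), written for a right-handed system:

```c
/* Hypothetical 3D vector type, used only for this illustration. */
typedef struct { double x, y, z; } Vec3;

/* Cross product u x v, following the component formula above. */
Vec3 vec3_cross(Vec3 u, Vec3 v) {
    return (Vec3){
        u.y * v.z - u.z * v.y,
        u.z * v.x - u.x * v.z,
        u.x * v.y - u.y * v.x
    };
}

/* Dot product, used here only to check orthogonality. */
double vec3_dot(Vec3 v, Vec3 w) {
    return v.x * w.x + v.y * w.y + v.z * w.z;
}
```

With $\vec{i}$ and $\vec{j}$ as inputs, the result is $\vec{k}$, and the result of any cross product has a dot product of 0 with both inputs.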
An $m \times n$ matrix is a rectangular array organized into $m$ rows and $n$ columns, denoted by $A_{m \times n}$
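For this course, we’ll mainly be focused on square matrices, since square matrices of the same size can always be multiplied together and, given square matrices $A,B,C$, $(AB)C=A(BC)$
Much like with vectors, we can do dot products: each entry of a matrix product is the dot product of a row of $A$ with a column of $B$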
\[A_{m \times n} \cdot B_{n \times p}=C_{m \times p} \\ c_{i,j}=\begin{bmatrix} a_{i,1} &a_{i,2} &\dots &a_{i,n} \end{bmatrix} \cdot \begin{bmatrix} b_{1,j} \\ b_{2,j} \\ \vdots \\ b_{n,j}\end{bmatrix}\]
Displays are a 2D grid of pixels where we change the values of pixels
Back in the day, we used to have raster displays, which draw lines of pixels one at a time
Nowadays, raster displays are pretty irrelevant, but they’re still used in graphics programs
More modern displays are bitmap-oriented, where each pixel is modified independently and controlled by a particular spot in video memory
Typically, these are laid out as an RGB stripe, but other ones do exist
Modern standard displays (LCD and LED) combine the light from many subpixels to get deeper colors
When displaying colors to someone on a screen, we have to understand a little more about how humans see color
A good way to do this is with a color model, which is a way to describe colors using a set of descriptors (ex. RGB, HSV, CMYK)
Models can be divided into additive (mixing to move away from black) and subtractive (mixing to move away from white)
To make this more mathematical in a way a computer can understand, we can add a coordinate system to get a color space (we can use a 3D system for RGB, ex. $[255,0,255]$ describes magenta, a bright purple)
Optimally we’d want all real numbers to be possible, but computers have limited memory, so we’ve come up with a few approaches
The set of colors that we can create with a color space is called a palette, which is represented with a hex code in the case of RGB
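As a small illustration of how an RGB triple maps to a hex code, here is a sketch assuming 8 bits per channel. The function names are invented for this example:

```c
#include <stdint.h>
#include <stdio.h>

/* Packs three 8-bit channels into one 24-bit value laid out as 0xRRGGBB. */
uint32_t rgb_pack(uint8_t r, uint8_t g, uint8_t b) {
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}

/* Writes the color as a "#RRGGBB" string; out must hold at least 8 chars. */
void rgb_to_hex(uint8_t r, uint8_t g, uint8_t b, char out[8]) {
    snprintf(out, 8, "#%02X%02X%02X", (unsigned)r, (unsigned)g, (unsigned)b);
}
```

Under these assumptions, $[255,0,255]$ packs to `0xFF00FF` and prints as `#FF00FF`.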
More examples
These are rarely used nowadays since we need so many colors, but back in the day they made for a great lookup table
We can’t do graphics without displays, so let’s look at how these are handled
Different graphical systems have different coordinate systems with left-handed systems (origin at the top left) being used by Tkinter, Java Swing and Direct2D
On the other hand, right-handed coordinate systems put the origin more naturally in the bottom left, which is used by Nintendo and OpenGL
We’ll use right-handed systems for theory, but left-handed systems make more sense from an application perspective since that’s the way we read matrices
Most libraries give you the option between device space (handled by your system) and user space (whichever one you want) so it doesn’t REALLY matter which one you pick, as long as you account for it in your transforms
Sometimes we want to keep all our graphics in a window and not have a program draw directly to the screen, so we can use a windowing system to maintain a separate render target
This window would then, in turn, have its own coordinate system, with the windowing system taking care of layering and combinations of windows
Due to our access to multiple windows, we tend to only work on one window at a time, each with its own context maintaining separate render settings and render target
Now that we have a display, what do we actually draw? We can start with some primitives, which are basic 2D graphical objects
Each primitive has some attributes, parameters that affect how the primitive is drawn
Some primitives include
This is analogous to how we draw shapes in real life; we draw a bunch of points, connect them together and our brain fills in the rest
We connect these together to create a 2D image, and connect THOSE together to create a 3D image
Polygons are closed shapes made of straight lines joined together
Since you can represent any polygon as a series of triangles (we’ve known this since Egypt), polygons are most often represented as triangles
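One simple way to split a polygon into triangles is a triangle fan: pick the first vertex and connect it to every pair of neighbouring vertices. This sketch assumes the polygon is convex (a fan can produce wrong results for concave shapes), and the function name is made up for illustration:

```c
#include <stddef.h>

/* Fan-triangulates a convex polygon with n vertices (indexed 0..n-1).
 * Writes (n - 2) index triples into out_indices, which must have room
 * for 3 * (n - 2) entries, and returns the number of triangles written. */
size_t fan_triangulate(size_t n, unsigned out_indices[]) {
    if (n < 3) return 0;                /* not a polygon */
    size_t t = 0;
    for (size_t i = 1; i + 1 < n; ++i) {
        out_indices[3 * t + 0] = 0;     /* shared fan vertex */
        out_indices[3 * t + 1] = (unsigned)i;
        out_indices[3 * t + 2] = (unsigned)(i + 1);
        ++t;
    }
    return t;
}
```

A pentagon, for example, comes out as the three triangles $(0,1,2)$, $(0,2,3)$ and $(0,3,4)$.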
There are lots of ways to draw these in OpenGL
Quads are deprecated
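We can be more specific with these polygons by defining winding order, the order in which a polygon’s vertices are listed, which determines which side counts as the front face and so which normal to use when rendering
This is counterclockwise (CCW) by default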
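Winding order can be checked numerically with the signed area of the polygon (the shoelace formula). This is a sketch with invented names; it assumes a coordinate system where y points up, so a positive area means counterclockwise, and the sign flips if y points down:

```c
#include <stddef.h>

/* Hypothetical 2D point type, used only for this illustration. */
typedef struct { double x, y; } Point2;

/* Signed area via the shoelace formula.
 * Positive for counterclockwise vertices when y points up. */
double signed_area(const Point2 *pts, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        Point2 a = pts[i];
        Point2 b = pts[(i + 1) % n];    /* wrap around to the first vertex */
        sum += a.x * b.y - b.x * a.y;
    }
    return 0.5 * sum;
}

/* Returns 1 if the vertices are listed counterclockwise, 0 otherwise. */
int is_ccw(const Point2 *pts, size_t n) {
    return signed_area(pts, n) > 0.0;
}
```

Listing the same triangle in the opposite order negates the area, which is exactly the front-face versus back-face distinction.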
OpenGL has two main components:
There are also two ways to draw things
Before we talk about transforms, let’s look at how we render things to different windows
Since every screen is different, graphics libraries will use normalized device coordinates (NDCs) unless explicitly told to do otherwise
Obviously most screens aren’t square, so GLFW handles this with a viewport, which is specified by a window’s width and height
This frame buffer is the same size as the window, serving as an in-memory raster image within the window
How do we actually figure out where to draw stuff from the NDC? We can use the viewport matrix
If we have a vertex at $(x,y)$ in NDC and a viewport at position $(x_0, y_0)$ with width $w$ and height $h$, we can use the following formula
$x_s=(x+1)\frac{w}{2}+x_0$
$y_s=(y+1)\frac{h}{2}+y_0$
A default viewport is defined for you, but you can use glViewport(x0,y0,w,h) to set a new one
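The viewport formula above can be written out as a plain function. This is just the arithmetic, not an OpenGL call, and the names `ScreenPos` and `ndc_to_screen` are made up for the example:

```c
/* Hypothetical result type, used only for this illustration. */
typedef struct { double x, y; } ScreenPos;

/* Maps an NDC vertex (x, y in [-1, 1]) into a viewport at (x0, y0)
 * with width w and height h, following the two formulas above. */
ScreenPos ndc_to_screen(double x, double y,
                        double x0, double y0, double w, double h) {
    return (ScreenPos){
        (x + 1.0) * (w / 2.0) + x0,
        (y + 1.0) * (h / 2.0) + y0
    };
}
```

With an 800 by 600 viewport at the origin, the NDC center $(0,0)$ lands on $(400,300)$ and the corners $(-1,-1)$ and $(1,1)$ land on $(0,0)$ and $(800,600)$.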
Now that we’ve figured out how to draw things, let’s figure out how to move stuff around
An affine transform is any geometric manipulation that preserves lines and parallelism
Each of these can be combined and handled with a single matrix multiplication, which is great because this is most of what we do (except for perspective projection, which will come in later)
Scaling is the simplest to understand, since all it involves is taking the vertices and multiplying their positions
We can apply this differently in each direction by scaling the x coordinate differently from the y coordinate
For this, we can take a pair of scale factors $(s_x, s_y)$, place them on the diagonal of a matrix and multiply it by a vertex $(x,y)$ as such
\[\begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} s_x \cdot x \\ s_y \cdot y \end{bmatrix}\]Horizontal shear is just another matrix multiplication, where the shear factor $m$ sets how far the new shape slants
\[\begin{bmatrix} 1 & m \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x+my \\ y \end{bmatrix}\]With vertical shear, it’s much of the same
\[\begin{bmatrix} 1 & 0 \\ m & 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ y+mx \end{bmatrix}\]For rotation, this can be handled simply given an angle $\theta$ that’s counterclockwise
\[\begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x\cos(\theta)-y\sin(\theta) \\ x\sin(\theta)+y\cos(\theta) \end{bmatrix}\]We can invert this with a negative angle as such
\[\begin{bmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x\cos(\theta)+y\sin(\theta) \\ -x\sin(\theta)+y\cos(\theta) \end{bmatrix}\]Reflecting in the direction $\vec{v}=(v_x,v_y)$, we get the following
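\[\begin{bmatrix} v_x^2-v_y^2 & 2v_xv_y \\ 2v_xv_y & v_y^2-v_x^2 \end{bmatrix}\]This reflects across the line through the origin along $\vec{v}$ and assumes $\vec{v}$ is normalized. To combine transforms, just multiply the corresponding matrices (lol)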
Translation is surprisingly the most complicated, since no $2 \times 2$ matrix multiplication can add a fixed offset to every point
What we can do instead is use homogeneous coordinates, which are actually quite simple: turn our coordinates and vectors into 3D vectors, where the 3rd dimension is 1 for a 2D point and 0 for a vector
\[\begin{bmatrix} x\\ y\\ 0 \end{bmatrix} \iff \begin{bmatrix} x\\ y\\ 1 \end{bmatrix}\]Now we can take our displacement vector $\vec{d}=(d_x,d_y)$ and form a matrix like so
\[\begin{bmatrix} 1 & 0 & d_x \\ 0 & 1 & d_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x+d_x \\ y+d_y \\ 1 \end{bmatrix}\]If we try this with a direction, nothing happens (try this out for yourself)
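If you'd rather see this than work it out by hand, here is a small sketch of the translation matrix acting on a point (third component 1) and on a direction (third component 0). The types and names are invented for this example:

```c
/* Hypothetical types, used only for this illustration. */
typedef struct { double m[3][3]; } Mat3;
typedef struct { double x, y, w; } Vec3h;   /* w = 1 for points, 0 for directions */

/* Homogeneous translation by (dx, dy). */
Mat3 mat3_translate(double dx, double dy) {
    return (Mat3){ { { 1.0, 0.0, dx },
                     { 0.0, 1.0, dy },
                     { 0.0, 0.0, 1.0 } } };
}

/* Matrix-vector product a * v. */
Vec3h mat3_apply(Mat3 a, Vec3h v) {
    return (Vec3h){
        a.m[0][0] * v.x + a.m[0][1] * v.y + a.m[0][2] * v.w,
        a.m[1][0] * v.x + a.m[1][1] * v.y + a.m[1][2] * v.w,
        a.m[2][0] * v.x + a.m[2][1] * v.y + a.m[2][2] * v.w
    };
}
```

The point $(1,2,1)$ translated by $(3,4)$ becomes $(4,6,1)$, while the direction $(1,2,0)$ comes back unchanged because the translation column is multiplied by 0.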
We can extend our other transforms to $3 \times 3$ matrices in order to combine them with translations
\[\begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix} \quad \rightarrow \quad \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix}\]
\[\begin{bmatrix} 1 & m \\ 0 & 1 \end{bmatrix} \quad \rightarrow \quad \begin{bmatrix} 1 & m & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\]
\[\begin{bmatrix} 1 & 0 \\ m & 1 \end{bmatrix} \quad \rightarrow \quad \begin{bmatrix} 1 & 0 & 0 \\ m & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\]
\[\begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} \quad \rightarrow \quad \begin{bmatrix} \cos(\theta) & -\sin(\theta) & 0 \\ \sin(\theta) & \cos(\theta) & 0 \\ 0 & 0 & 1 \end{bmatrix}\]
When we have a top-left origin with positive y going down, this is basically the same but with slightly modified matrices
We can also think of these transformations as simply altering the coordinate system itself, much like turning your head around to look at something from an angle instead of turning the object itself
When we think of it this way, transferring between coordinate systems becomes a matter of reflecting along the x axis
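As a rough sketch of that conversion: assuming both systems cover the same window of height $h$, the reflection also has to be paired with a shift by $h$ so the origin moves between the top-left and bottom-left corners. The function name is made up for this example:

```c
/* Converts a y coordinate between a top-left-origin system (y grows down)
 * and a bottom-left-origin system (y grows up) for a window of the given
 * height. The x coordinate is unchanged. The same function works in both
 * directions, since applying it twice gives back the original value. */
double flip_y(double y, double height) {
    return height - y;
}
```

In homogeneous form this is the matrix $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & h \\ 0 & 0 & 1 \end{bmatrix}$, a reflection followed by a translation.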
To represent 3D objects on a 2D screen, we need to project them
So far, we’ve just been setting $z=0$ in our 2D drawings, but we can set it farther back and get different effects
When we bring $z$ into our NDC in the same way, we realize that NDCs are kinda useless
What if we want to draw further out? We can define an orthographic view volume with glOrtho
glOrtho(left, right, bottom, top, near, far)
Of course, it isn’t that easy since we have the pesky global state machine, but we can set this in global state with glMatrixMode(GL_PROJECTION)
In total, the following three functions should do the trick
glMatrixMode(GL_PROJECTION);  // select the projection matrix
glLoadIdentity();  // replaces the current matrix with the identity matrix
glOrtho(left, right, bottom, top, near, far);  // multiplies in the orthographic projection
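To get a feel for what the view volume does, here is a sketch of the mapping from the view volume to NDC, written as plain arithmetic rather than OpenGL calls. It assumes the standard orthographic matrix that glOrtho builds, where the near and far planes sit at $z=-near$ and $z=-far$ in front of the viewer. The type and function names are invented for this example:

```c
/* Hypothetical 3D point type, used only for this illustration. */
typedef struct { double x, y, z; } Vec3;

/* Maps a point inside the orthographic view volume to NDC in [-1, 1]^3.
 * l/r, b/t and n/f correspond to left/right, bottom/top and near/far. */
Vec3 ortho_to_ndc(Vec3 p, double l, double r,
                  double b, double t, double n, double f) {
    return (Vec3){
        (2.0 * p.x - (r + l)) / (r - l),
        (2.0 * p.y - (t + b)) / (t - b),
        (-2.0 * p.z - (f + n)) / (f - n)   /* z is negated: -near maps to -1, -far to +1 */
    };
}
```

With a volume of $(0, 800, 0, 600, -1, 1)$, the point $(400, 300, 0)$ maps to the NDC origin, so we can draw in pixel-like coordinates and let the projection do the normalization.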