Scheduling

On most computers, we need to run multiple things at once, but how do we achieve this efficiently?

We’ve spoken about multicore systems, but this is only part of the puzzle; we need to incorporate scheduling in order to make this work

This relies on the CPU–I/O burst cycle: processes alternate short stretches of CPU execution with stretches of waiting on I/O, each stretch called a burst

There are (supposed to be) many short bursts and only a few long bursts, so we can interleave pretty easily

2023-04-17_05-34.png

The scheduler itself selects a process from the ready queue, the set of processes in memory that are ready to run but haven’t completed

A scheduling decision is needed when a process:

  1. Switches from running to waiting
  2. Switches from running to ready
  3. Switches from waiting to ready
  4. Terminates

For situations 1 and 4 the scheduler has no choice (something new must be picked), but for 2 and 3 it gets to decide whether to preempt, which is where we can shake things up a bit

We handle the actual process swap with the dispatcher, which performs the context switch, switches between user mode and kernel mode, and jumps to the right spot to resume the program being switched in

The dispatcher takes some time to do this (dispatch latency), which should be minimized as much as possible (we can use vmstat to keep an eye on context-switch activity)

2023-04-17_05-40.png

Scheduling Algorithms

There are many different ways to decide how to interleave processes, but which one do we choose?

To decide, we need to take into account CPU utilization, throughput, turnaround time, waiting time and response time

Simple Queue

The first and simplest way to go about this is first-come, first-served (FCFS): run whatever is at the front of a FIFO queue until it needs I/O or terminates

The problem is that the waiting time can be very long: if a long process gets to the queue first, everything behind it sits around waiting

To look at this deeper, consider the following processes

2023-04-17_05-41.png

This obviously isn’t good for $P_2$ and $P_3$, but with a different order things get much better

2023-04-17_05-41_1.png

This begs the question: what if we slap $P_1$ into the middle? Crunching the numbers, we get the following

\[\begin{array}{|c|c|c|c|} \hline & \text{Burst} & \text{Wait} & \text{Turnaround} \\ \hline P_2 & 3 & 0 & 3 \\ \hline P_1 & 24 & 3 & 27 \\ \hline P_3 & 3 & 27 & 30 \\ \hline \text{Avg} & - & 10 & 20 \\ \hline \end{array}\]

We notice that when short processes get stuck behind long processes, we get longer wait and turnaround times, much like cars stuck behind a slow truck on a one-lane stretch of highway (the convoy effect)

This, of course, assumes that we have everything arrive at the same time, which usually isn’t the case

If we take into account arrival time, our calculations look slightly different

2023-04-17_05-41_2.png
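To make the FCFS arithmetic concrete, here’s a minimal Python sketch (the function and `Proc` names are my own; the process data is the $P_1$/$P_2$/$P_3$ set from the table above, with arrival times included so staggered arrivals work too):

```python
# Minimal FCFS sketch: run processes to completion in arrival order.
from dataclasses import dataclass

@dataclass
class Proc:
    name: str
    arrival: int
    burst: int

def fcfs(procs):
    """Return {name: (waiting time, turnaround time)} under FCFS."""
    time = 0
    results = {}
    for p in sorted(procs, key=lambda p: p.arrival):  # FIFO by arrival (stable sort)
        time = max(time, p.arrival)   # CPU may sit idle until the process arrives
        wait = time - p.arrival       # time spent sitting in the ready queue
        time += p.burst               # run the whole burst, no preemption
        results[p.name] = (wait, time - p.arrival)
    return results

# Same numbers as the table above: P2, P1, P3, all arriving at time 0
res = fcfs([Proc("P2", 0, 3), Proc("P1", 0, 24), Proc("P3", 0, 3)])
print(res)                                         # P2: (0, 3), P1: (3, 27), P3: (27, 30)
print(sum(w for w, _ in res.values()) / len(res),  # average wait: 10.0
      sum(t for _, t in res.values()) / len(res))  # average turnaround: 20.0
```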

Stack

Another option is to use a stack, where processes are handled in LIFO order (the most recently arrived process runs first)

This option improves response time for newly created processes, but has an extremely large risk of starvation, so it’s probably best to avoid this

Shortest-Job-First

In this approach, we run the process with the shortest next CPU burst first, then the next shortest, and so on

This has the benefit that SJF is provably optimal, giving the minimum average waiting time for a given set of processes, but determining the length of the next burst is a bit complicated

We could either make a guess based on history (which is automatic but inaccurate) or make the user/developer set it (accurate but annoying to have to set)
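As a quick sketch of the policy itself (non-preemptive, and assuming the burst lengths are simply known up front; the process set is reused from the FCFS example above, not necessarily the one in the slide below):

```python
# Non-preemptive SJF sketch: of the processes that have arrived, run the one
# with the shortest next CPU burst to completion, then pick again.
def sjf(procs):
    """procs: list of (name, arrival, burst). Returns {name: (wait, turnaround)}."""
    remaining = list(procs)
    time, results = 0, {}
    while remaining:
        ready = [p for p in remaining if p[1] <= time]
        if not ready:                          # nothing has arrived yet: idle forward
            time = min(p[1] for p in remaining)
            continue
        job = min(ready, key=lambda p: p[2])   # shortest burst wins
        remaining.remove(job)
        name, arrival, burst = job
        results[name] = (time - arrival, time - arrival + burst)
        time += burst
    return results

print(sjf([("P1", 0, 24), ("P2", 0, 3), ("P3", 0, 3)]))
# {'P2': (0, 3), 'P3': (3, 6), 'P1': (6, 30)} -> average wait 3, average turnaround 13
```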

To see how optimal this really is, let’s crunch the numbers

2023-04-17_05-42.png

This gives us our best average times yet out of all the methods, but how do we actually estimate the burst length ahead of time?

One way we can do this is with an averaging formula known as exponential smoothing, with the idea that the next CPU burst will probably be a similar length


  1. $t_n$ = actual length of the $n^{th}$ CPU burst
  2. $\tau_{n+1}$ = predicted value for the next CPU burst
  3. $\alpha$ = weight given to recent history, $0 \leq \alpha \leq 1$
  4. Define: \(\tau_{n+1} = \alpha t_n + (1 - \alpha) \tau_n\)

We can modify $\alpha$ to get different effects: $\alpha = 0$ ignores recent history entirely (the prediction never changes), while $\alpha = 1$ only counts the most recent burst

This gives us reasonably good predictions (and therefore good SJF decisions) if we use it properly

For example, with an initial guess $\tau_0 = 10$ and $\alpha = 0.75$, each new prediction works out to $\tau_{n+1} = 0.75\,t_n + 0.25\,\tau_n$ (e.g. $0.75 \cdot 6 + 0.25 \cdot 10 = 7$)

\[\begin{array}{|c|c|c|c|c|c|c|c|c|} \hline t_n & & 6 & 4 & 6 & 4 & 13 & 13 & 13 \\ \hline \tau & 10 & 4.5+2.5 & 3+1.8 & 4.5+1.2 & 3+1.4 & 9.8+1.1 & 9.8+2.8 & 9.8+3.2 \\ & & =7 & =4.8 & =5.7 & =4.4 & =10.9 & =12.6 & =13 \\ \hline \end{array}\]

2023-04-17_05-45.png
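A few lines of Python reproduce the predictions above (assuming $\alpha = 0.75$ and $\tau_0 = 10$; the table rounds at every step, so its last two entries come out a shade higher than the exact values here):

```python
# Exponential averaging of CPU burst lengths: tau_next = alpha*t + (1 - alpha)*tau
def predict_bursts(bursts, tau0=10.0, alpha=0.75):
    tau = tau0
    predictions = [tau]              # tau_0 is the guess for the very first burst
    for t in bursts:
        tau = alpha * t + (1 - alpha) * tau
        predictions.append(tau)      # new guess after observing burst t
    return predictions

print([round(p, 1) for p in predict_bursts([6, 4, 6, 4, 13, 13, 13])])
# [10.0, 7.0, 4.8, 5.7, 4.4, 10.9, 12.5, 12.9]
```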

Round Robin

This is probably the simplest of the “effective methods”: just split time evenly

More specifically, we give each burst a certain amount of time (given by a time quantum) before switching to the next process, and so on in a loop

In this case, with $n$ processes and quantum $q$, no process waits more than $(n-1)q$ time units, but we have to be careful with $q$: too low and we spend most of our time context switching, too high and we’re just doing FCFS again

Despite this, round robin is the most common scheduling algorithm, with pretty good results

2023-04-17_05-45.png
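A minimal round-robin sketch (the quantum of 4 and the reuse of the $P_1$/$P_2$/$P_3$ burst lengths are my own choices for illustration):

```python
from collections import deque

# Round robin: run each process for at most one quantum, then send it to
# the back of the ready queue if it still has work left.
def round_robin(bursts, quantum):
    """bursts: {name: total burst length}, all arriving at time 0.
    Returns {name: completion time}."""
    queue = deque(bursts.items())
    remaining = dict(bursts)
    time, finish = 0, {}
    while queue:
        name, _ = queue.popleft()
        run = min(quantum, remaining[name])        # one quantum, or whatever is left
        time += run
        remaining[name] -= run
        if remaining[name] == 0:
            finish[name] = time
        else:
            queue.append((name, remaining[name]))  # back of the line
    return finish

print(round_robin({"P1": 24, "P2": 3, "P3": 3}, quantum=4))
# {'P2': 7, 'P3': 10, 'P1': 30}
```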

Priority

We can also assign each process an integer priority and always run the highest-priority ready process, with the upside of more control and the downside of possible starvation for low-priority processes

2023-04-17_05-49.png

We can address the starvation issue with aging: as time progresses, gradually increase the priority of processes that have been waiting
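A tiny sketch of priority selection with aging (the priorities, the aging rate, and the process names are all made up; here a lower number means higher priority):

```python
# Priority scheduling with aging: the longer a process waits in the ready
# queue, the better its effective priority gets, so nothing starves forever.
def pick_next(ready, now, aging_rate=0.1):
    """ready: list of dicts with 'name', 'priority', 'enqueued' (time it became ready).
    Lower effective value wins; waiting time slowly lowers it."""
    def effective(p):
        return p["priority"] - aging_rate * (now - p["enqueued"])
    return min(ready, key=effective)

ready = [
    {"name": "batch_job", "priority": 9, "enqueued": 0},    # low priority, waiting a long time
    {"name": "editor",    "priority": 2, "enqueued": 95},   # high priority, just arrived
]
print(pick_next(ready, now=100)["name"])
# batch_job: after waiting 100 units its effective priority (-1.0) beats the editor's (1.5)
```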

Lottery

Give each process some lottery tickets, draw one at random at each scheduling decision, and run whoever holds it; since every process holds at least one ticket this avoids starvation, but it can be inefficient
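A minimal lottery sketch (the ticket counts are arbitrary; more tickets just means a proportionally bigger share of the CPU):

```python
import random

def lottery_pick(tickets):
    """tickets: {process name: ticket count}. Draw one ticket uniformly at random."""
    winner = random.randrange(sum(tickets.values()))   # winning ticket number
    for name, count in tickets.items():
        if winner < count:
            return name
        winner -= count

# Every process holds at least one ticket, so every process eventually runs.
tickets = {"A": 50, "B": 30, "C": 20}
counts = {name: 0 for name in tickets}
for _ in range(10_000):
    counts[lottery_pick(tickets)] += 1
print(counts)   # roughly proportional to ticket counts, e.g. ~5000 / ~3000 / ~2000
```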

Multilevel Queue

We can also combine all of these: split the ready queue into several queues, each with its own scheduling algorithm (plus a policy for scheduling between the queues), which gives the best of all worlds but can still starve processes in the lower-priority queues

2023-04-17_05-52.png

A variation of this that solves starvation is the feedback queue, where processes can be promoted to higher priority queues as they age and demoted to lower priority queues if they use too much CPU time

Consider an example with three queues

A new process enters queue $Q_0$, which is served RR, and receives 8 milliseconds of CPU time when it gets the processor

If it does not finish in 8 milliseconds, the process is moved to queue $Q_1$

At $Q_1$, the job is again served RR and receives 16 additional milliseconds

If it still does not complete, it is preempted and moved to queue $Q_2$
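Here’s a sketch of that three-queue example (the assumption that the bottom queue simply runs jobs to completion, and the burst lengths of the two made-up processes, are mine):

```python
from collections import deque

# Three-level feedback queue: Q0 is RR with quantum 8, Q1 is RR with quantum 16,
# and anything demoted to Q2 runs to completion. New arrivals mid-run are ignored
# here to keep the sketch short.
def mlfq(bursts, quanta=(8, 16)):
    """bursts: {name: total CPU time needed}. Returns {name: finish time}."""
    queues = [deque(), deque(), deque()]
    for name, burst in bursts.items():
        queues[0].append((name, burst))                      # everything starts in Q0
    time, finish = 0, {}
    while any(queues):
        level = next(i for i, q in enumerate(queues) if q)   # highest non-empty queue
        name, left = queues[level].popleft()
        step = left if level == 2 else min(quanta[level], left)
        time += step
        left -= step
        if left == 0:
            finish[name] = time
        else:
            queues[level + 1].append((name, left))           # used its whole quantum: demote
    return finish

print(mlfq({"A": 5, "B": 30}))
# {'A': 5, 'B': 35}: B gets 8 ms in Q0, 16 more in Q1, and its last 6 in Q2
```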

On Multiprocessors

Things get more complicated when we introduce multiple processors: load sharing becomes possible, but now we also have to decide how to spread the work across them

One approach we can take is asymmetric multiprocessing, where a single processor handles all kernel-level scheduling and the other processors just execute user code

The other approach is symmetric multiprocessing (SMP), where each processor schedules itself, pulling either from a common ready queue (which can lead to race conditions) or from its own private queue (which can unbalance the load)

This is applied either on

This also brings up the issue of load balancing, since we now have multiple processors that we want to keep equally busy

We have two approaches to sort this out: push migration (a periodic task pushes threads from overloaded processors onto idle ones) and pull migration (an idle processor pulls a waiting thread from a busy one)

We usually don’t want threads to migrate to new processors, since the new processor’s cache then has to be repopulated from scratch; this preference is called processor affinity

We can enforce this further by either using soft affinity (OS attempts to keep the thread running on the same processor) or hard affinity (threads specify a set of processors to run on)

This works against load balancing: the more we load balance, the less benefit we get from affinity, so our algorithms need to strike a golden mean between the two
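As a concrete example of hard affinity, Linux lets a process restrict itself to a set of CPUs, and Python exposes that (Linux-only) via os.sched_getaffinity / os.sched_setaffinity; this sketch assumes a machine with at least two CPUs and picks CPUs 0 and 1 arbitrarily:

```python
import os

# Linux-only. Pin the current process (pid 0 means "this process") to CPUs 0 and 1:
# the kernel will only schedule its threads there, which keeps their cache contents
# warm at the cost of giving the load balancer less room to move them around.
print("before:", os.sched_getaffinity(0))
os.sched_setaffinity(0, {0, 1})
print("after: ", os.sched_getaffinity(0))
```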