On most computers, we need to run multiple things at once, but how do we achieve this efficiently?
We’ve spoken about multicore systems, but this is only part of the puzzle; we need to incorporate scheduling in order to make this work
We can take advantage of the CPU-I/O burst cycle: processes alternate between stretches of CPU execution and stretches of waiting on I/O, each called a burst
There are (supposed to be) many short bursts and only a few long bursts, so we can interleave pretty easily
The scheduler itself selects the next process to run from the ready queue, which holds the processes that are ready to execute but haven’t completed
Scheduling decisions are made when a process:
1. Switches from running to waiting
2. Switches from running to ready
3. Switches from waiting to ready
4. Terminates
For situations 1 and 4, we can’t do much with our scheduler (we have to pick a new process), but for 2 and 3 we can shake things up a bit with choice
We handle the actual process swap with the dispatcher, which switches context, switches to user mode, and jumps to where the resumed program left off
The dispatcher introduces some latency, which should be minimized as much as possible (we can use vmstat to see how often context switches are happening)
There are many different ways to decide how to interleave processes, but which one do we choose?
To decide, we take into account CPU utilization and throughput (which we want to maximize) and turnaround time, waiting time, and response time (which we want to minimize)
The first and simplest option is first-come, first-served (FCFS): run whatever comes first in a FIFO queue, letting each process run until it needs I/O or terminates
The problem is that waiting times can get very long: if a bunch of processes arrive at once, it can be a while before we get around to the later ones
To look at this deeper, consider three processes arriving in the order $P_1$, $P_2$, $P_3$, with CPU bursts of 24, 3, and 3 respectively
Running them in that order, $P_1$ waits 0 while $P_2$ and $P_3$ wait 24 and 27, for an average wait of 17 and an average turnaround of 27
This obviously isn’t good for $P_2$ and $P_3$, but if we had a different order, things get much better
This raises the question: what if we slap $P_1$ into the middle? Crunching the numbers, we get the following
Process | Burst | Wait | Turnaround |
---|---|---|---|
P2 | 3 | 0 | 3 |
P1 | 24 | 3 | 27 |
P3 | 3 | 27 | 30 |
Avg | - | 10 | 20 |
We notice that when short processes are stuck behind long processes, we get longer wait times and turnaround times (the convoy effect), much like cars stuck behind a slow truck on a one-lane stretch of highway
This, of course, assumes that we have everything arrive at the same time, which usually isn’t the case
If we take into account arrival time, our calculations look slightly different
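As a quick illustration, here is a minimal Python sketch of FCFS that also handles staggered arrivals; the arrival times (0, 1, 2) are made up for the example, only the bursts come from the table above

```python
# Minimal FCFS sketch: processes are (name, arrival_time, burst)
# Arrival times 0, 1, 2 are made up; the bursts match the table above
processes = [("P1", 0, 24), ("P2", 1, 3), ("P3", 2, 3)]

def fcfs(procs):
    procs = sorted(procs, key=lambda p: p[1])   # serve strictly in arrival order
    clock, stats = 0, []
    for name, arrival, burst in procs:
        start = max(clock, arrival)             # CPU sits idle until the process arrives
        finish = start + burst
        stats.append((name, start - arrival, finish - arrival))  # (name, wait, turnaround)
        clock = finish
    return stats

for name, wait, turnaround in fcfs(processes):
    print(f"{name}: wait={wait}, turnaround={turnaround}")
```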
Another option is to use a stack instead of a queue, handling processes last in, first out (LIFO)
This option improves response time for newly created processes, but has an extremely large risk of starvation, so it’s probably best to avoid this
The next option is shortest-job-first (SJF): take the process with the shortest next CPU burst and run it first, followed by the next shortest, and so on
SJF is provably optimal, giving the minimum average waiting time for a given set of processes, but determining the length of the next burst is a bit complicated
We could either make a guess based on history (which is automatic but inaccurate) or make the user/developer set it (accurate but annoying to have to set)
To see how optimal this really is, let’s crunch the numbers
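Here is a minimal sketch of nonpreemptive SJF, reusing the three bursts from the FCFS example (all assumed to arrive at time 0)

```python
# Nonpreemptive SJF sketch: with everything available at time 0,
# we simply run bursts from shortest to longest
bursts = {"P1": 24, "P2": 3, "P3": 3}

clock, waits = 0, {}
for name, burst in sorted(bursts.items(), key=lambda kv: kv[1]):
    waits[name] = clock                      # time spent waiting before this burst starts
    clock += burst

print(waits, "average wait:", sum(waits.values()) / len(waits))
# {'P2': 0, 'P3': 3, 'P1': 6} average wait: 3.0
```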
This gives us the best times of any method so far, but how do we actually estimate the burst length ahead of time?
One way we can do this is with an averaging formula known as exponential averaging (exponential smoothing), based on the idea that the next CPU burst will probably be similar in length to recent ones
We predict the next burst as $\tau_{n+1} = \alpha t_n + (1 - \alpha)\tau_n$, where $t_n$ is the length of the most recent actual burst and $\tau_n$ is our previous prediction
We can modify $\alpha$ to get different effects
Set $\alpha$ to 0 – the most recent history has no bearing; pick an estimate and never update it: $\tau_{n+1} = \tau_n$
Set $\alpha$ to 1 – only the most recent burst has any bearing: $\tau_{n+1} = t_n$
More commonly, set $\alpha$ to ½ so recent history and past history are weighted equally
This gets us close to those optimal SJF times, provided our predictions are reasonably good
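As a sketch, here is the exponential average in code; $\alpha = ½$ and the initial guess $\tau_0 = 10$ are just illustrative choices, and the burst sequence is a made-up textbook-style one

```python
# Exponential averaging sketch: tau_next = alpha * t + (1 - alpha) * tau
# alpha = 0.5 weights recent and past history equally; tau0 = 10 is an arbitrary seed
def predict_bursts(actual_bursts, alpha=0.5, tau0=10):
    tau, predictions = tau0, []
    for t in actual_bursts:
        predictions.append(tau)              # what we predicted before seeing this burst
        tau = alpha * t + (1 - alpha) * tau  # fold the observed burst into the estimate
    return predictions

print(predict_bursts([6, 4, 6, 4, 13, 13, 13]))  # predictions track the real bursts with a lag
```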
Round robin (RR) is probably the simplest of the “effective methods”: just split time evenly
More specifically, we give each process at most a fixed slice of time (the time quantum, $q$) before switching to the next process, and so on in a loop
In this case, with $n$ processes no one waits more than $(n-1)q$ time units for its next turn, but we have to be careful with $q$: too low and we spend too much time context switching, too high and we’re just doing FCFS again
Despite this, round robin is the most common scheduling algorithm, with pretty good results
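Here is a rough sketch of round robin on the same bursts as before; the quantum $q = 4$ is an arbitrary pick for illustration

```python
from collections import deque

# Round robin sketch: each process runs for at most q time units per turn
# q = 4 is an arbitrary quantum; bursts match the earlier example, all ready at time 0
def round_robin(bursts, q=4):
    queue = deque(bursts.items())                    # (name, remaining burst)
    clock, finish = 0, {}
    while queue:
        name, remaining = queue.popleft()
        run = min(q, remaining)
        clock += run
        if remaining - run > 0:
            queue.append((name, remaining - run))    # not done: back of the line
        else:
            finish[name] = clock
    return finish

print(round_robin({"P1": 24, "P2": 3, "P3": 3}))     # completion time of each process
```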
We can also assign each process an integer priority and always run the highest-priority ready process, with the upside of more control and the downside of possible starvation for low-priority processes
We can partially solve the starvation issue with aging: the longer a process waits, the higher its priority gets
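A loose sketch of what aging buys us (the process names, priorities, and the +1-per-decision aging rate are all made up): without aging, a steady stream of high-priority arrivals starves the low-priority job forever

```python
# Priority-with-aging sketch: lower number = higher priority
# A new high-priority job arrives every tick; without aging, "backup" starves forever
def schedule(ticks, aging=True):
    ready = {"backup": 7}                            # long-suffering low-priority process
    ran = []
    for t in range(ticks):
        ready[f"job{t}"] = 1                         # fresh high-priority arrival
        chosen = min(ready, key=ready.get)           # best (lowest) priority wins
        ran.append(chosen)
        del ready[chosen]
        if aging:
            for name in ready:
                ready[name] -= 1                     # everyone still waiting creeps upward
    return ran

print("with aging:   ", schedule(8))                 # backup eventually gets its turn
print("without aging:", schedule(8, aging=False))    # backup never runs
```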
Another option is lottery scheduling: give each process one or more lottery tickets, draw a ticket at random, and run the winner, which avoids starvation (every ticket holder has a chance each draw) but can be inefficient
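A minimal sketch of that idea, with made-up ticket counts; giving a process more tickets gives it a proportionally larger share of the CPU

```python
import random

# Lottery scheduling sketch: each process holds some tickets and every scheduling
# decision draws one ticket at random; ticket counts here are made up
tickets = {"interactive": 75, "batch": 20, "background": 5}

def draw(tickets):
    names, counts = zip(*tickets.items())
    return random.choices(names, weights=counts)[0]  # weighted random pick

picks = [draw(tickets) for _ in range(1000)]
print({name: picks.count(name) for name in tickets}) # CPU share roughly proportional to tickets
```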
We can also combine all of these in a multilevel queue, with a separate queue (and its own scheduling method) for each class of process, which gives the best of all worlds but can still starve processes in the lower-priority queues
A variation of this that solves starvation is the multilevel feedback queue, where processes can be promoted to higher-priority queues as they age and demoted to lower-priority queues if they use too much CPU time
Consider an example with three queues
A new process enters queue $Q_0$ and receives 8 milliseconds
If it does not finish in 8 milliseconds, the process is moved to queue $Q_1$
At $Q_1$ the job is again served round-robin and receives 16 additional milliseconds
If it still does not complete, it is preempted and moved to queue $Q_2$
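A rough sketch of that three-queue setup; it assumes everything arrives at time 0 and nothing new shows up (so each queue can simply be drained in priority order), $Q_2$ just runs jobs to completion, and the bursts of 30, 5, and 20 ms are made up

```python
from collections import deque

# Multilevel feedback queue sketch for the three-queue example: Q0 gets an 8 ms quantum,
# Q1 gets 16 ms, and anything still unfinished lands in Q2, which runs jobs to completion
# Assumes all jobs arrive at time 0 and nothing new arrives, so queues drain in order
def mlfq(bursts, quanta=(8, 16)):
    queues = [deque(bursts.items()), deque(), deque()]     # Q0, Q1, Q2
    clock, finish = 0, {}
    for level, q in enumerate(quanta):
        while queues[level]:
            name, remaining = queues[level].popleft()
            run = min(q, remaining)
            clock += run
            if remaining - run > 0:
                queues[level + 1].append((name, remaining - run))  # demote to the next queue
            else:
                finish[name] = clock
    while queues[2]:                                        # Q2: serve in order, to completion
        name, remaining = queues[2].popleft()
        clock += remaining
        finish[name] = clock
    return finish

print(mlfq({"P1": 30, "P2": 5, "P3": 20}))                  # made-up bursts in milliseconds
```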
Things get more complicated when we introduce multiple processors, but it also gives us the chance to balance the load across them
One approach we can take is asymmetric multiprocessing, where one processor handles all kernel-level scheduling and the other processors just execute user code
The other approach is symmetric multiprocessing (SMP), where each processor schedules itself, choosing either from a common ready queue (which can lead to race conditions) or from its own private queue (which can unbalance the load)
This is applied either on:
Multicore processors - each core is a logical CPU for the OS (faster and saves power, but can create memory stalls)
Multithreaded (hyperthreaded) cores - each hardware thread appears as a logical CPU, and the chip itself has built-in scheduling to decide which hardware thread runs on the core
This also comes with the issue of load balancing, since we now have multiple processors to keep evenly busy
We have two approaches to sort this out: push migration, where a periodic task checks each processor’s load and pushes threads from overloaded processors onto idle ones, and pull migration, where an idle processor pulls a waiting task away from a busy one
We usually don’t want threads to migrate to new processors, though, since the cache then has to be repopulated; this preference is called processor affinity
We can enforce this further by either using soft affinity (OS attempts to keep the thread running on the same processor) or hard affinity (threads specify a set of processors to run on)
This is counterbalanced by load balancing: the more we migrate threads to balance load, the less benefit we get from affinity, so our algorithms need to strike a golden mean between the two
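As an aside, hard affinity is something we can see from user space on Linux: `os.sched_setaffinity` pins a process to an explicit set of logical CPUs (this sketch assumes a Linux machine with at least two CPUs; the CPU numbers are arbitrary)

```python
import os

# Hard-affinity sketch (Linux only): pin this process to logical CPUs 0 and 1,
# so the scheduler never migrates it elsewhere and its cached data stays warm
print("allowed CPUs before:", os.sched_getaffinity(0))  # pid 0 means "this process"
os.sched_setaffinity(0, {0, 1})                          # assumes CPUs 0 and 1 exist
print("allowed CPUs after: ", os.sched_getaffinity(0))
```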