To explain this, let’s assume we have a very simple stored-program computer on which we will execute instructions
The format of instructions defines their anatomy, which consists of
For example, here are some hypothetical instructions
LDR r1,1234 (load the contents of memory address 1234 into register 1)
STR r3,2000 (store register 3 into memory address 2000)
ADD r1,r2,r3 (add r2 and r3 and store the result in r1)
SUB r3,r3,r1 (subtract r1 from r3 and store the result in r3)
To execute instructions, the computer will go through a two-stage process
The program counter then points to the next instruction, where the process repeats ad nauseam
The program counter (PC) is a type of register, along with the…
To make this a little clearer, here’s an example of a hypothetical computer
Sometimes (like with ADD), the operands will be read from the registers, transferred to the ALU, and the result sent back to the destination register (this is called a register-to-register operation)
Other times, we need to access the MAR to read/write to memory, but how do we deal with the conflict with the PC?
We do this through a multiplexer, which acts like a traffic light for the signals
As an example, let’s cover the execution process in RTL:
Fetch
[MAR] ← [PC]
[PC] ← [PC] + 4
[MBR] ← [[MAR]]
[IR] ← [MBR]
Execute LDR
[MAR] ← [IR(address)]
[MBR] ← [[MAR]]
[r1] ← [MBR]
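The RTL steps can be traced with a small Python model. This is only a sketch: the memory contents, the tuple-based instruction format and the value 99 are assumptions made for illustration, and the instruction register is loaded from the MBR because that is where the fetched word sits after the memory read.

```python
# Hypothetical model of the fetch/execute cycle. The memory layout and the
# tuple-based instruction format are assumptions, not a real encoding.
memory = {
    0: ("LDR", "r1", 1234),   # instruction stored at address 0
    1234: 99,                 # data word the LDR will load
}
regs = {"PC": 0, "MAR": 0, "MBR": 0, "IR": None, "r1": 0}

def fetch():
    regs["MAR"] = regs["PC"]            # [MAR] <- [PC]
    regs["PC"] = regs["PC"] + 4         # [PC]  <- [PC] + 4
    regs["MBR"] = memory[regs["MAR"]]   # [MBR] <- [[MAR]]
    regs["IR"] = regs["MBR"]            # [IR]  <- [MBR]

def execute_ldr():
    op, dest, address = regs["IR"]
    if op != "LDR":
        raise ValueError("this sketch only executes LDR")
    regs["MAR"] = address               # [MAR] <- [IR(address)]
    regs["MBR"] = memory[regs["MAR"]]   # [MBR] <- [[MAR]]
    regs[dest] = regs["MBR"]            # [r1]  <- [MBR]

fetch()
execute_ldr()
print(regs["r1"], regs["PC"])  # 99 4
```

Note that the MAR is written twice, once from the PC during the fetch and once from the instruction's address field during the execute, which is exactly the conflict the multiplexer resolves.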
How do we do flow control? We just have to modify the incrementor’s path
Before the incrementor can increment the PC, we have the CU check it to make sure it doesn’t have to be changed
We also have some status bits stored in the condition code register (CCR) after an operation is performed, going as follows
On RISC processors like ARM, we have to explicitly request that the status flags be updated, while on CISC processors they are typically updated automatically
We’ve seen Assembly instructions before, but let’s look at them again
For whitespace (space or tab), we need at least 1 after the operation name, but all other whitespace is up to the programmer
To branch conditionally, we can use an instruction like BPL (branch if plus, taken when the previous result was not negative)
Let’s look at a more complex example
SUBS r5,r5,#1 ;Subtract 1 from r5 and store in r5
BEQ onZero ;IF zero THEN go to onZero
notZero ADD r1,r2,r3 ;ELSE continue
...
onZero SUB r1,r2,r3 ;Here's where the branch ends up if SUBS gave 0
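As a rough Python model of what SUBS followed by BEQ is doing: the subtraction produces a 32-bit result and sets the zero (Z) and negative (N) flags, and the branch looks only at Z. The function names here are illustrative, not part of any ARM tool.

```python
MASK = 0xFFFFFFFF  # registers are modelled as 32-bit values

def subs(a, b):
    """Model SUBS: return (result, flags) for a - b on 32-bit values."""
    result = (a - b) & MASK
    flags = {"Z": result == 0, "N": bool(result >> 31)}
    return result, flags

def next_label(r5):
    """Return the label reached after SUBS r5,r5,#1 followed by BEQ onZero."""
    r5, flags = subs(r5, 1)
    return "onZero" if flags["Z"] else "notZero"

print(next_label(1))  # onZero
print(next_label(5))  # notZero
```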
We can also define variables to do something like this
\[X = \begin{cases} P + 5 & \text{if } P \geq Q \\ P + 20 & \text{otherwise} \end{cases}\]
First, we need to define these variables, which we can do with DCD
P DCD 12 ;This reserves a space in memory for the decimal number 12 and labels it P
Q DCD 9 ;This does the same thing but for Q and 9
X DCD 0 ;The memory locations are 36, 40 and 44 (we'll learn how we got this later)
Then we can do our ARM magic
LDR r0,P ;Load P into r0
LDR r1,Q ;We do this so that we can do arithmetic on the number
SUBS r2,r0,r1 ;It's easier to check if P-Q >= 0 since we can just do BPL
BPL THEN ;If P-Q>=0 then execute THEN
ADD r0,r0,#20 ;ELSE add 20 to r0 and store it in r0
B EXIT ;Skip to EXIT
THEN ADD r0,r0,#5 ;Add 5 to r0 and store in r0
EXIT STR r0,X ;Whatever's in r0 is stored in X (which branch we took doesn't matter)
STOP ;Program is done
P DCD 12 ;It sounds weird but we declare variables after our instructions
Q DCD 9
X DCD 0
Since we set P to 12 and Q to 9, we can see that P ≥ Q, so P-Q≥0
In this case, we take the branch to THEN, where we execute X=P+5, store the result and halt (ADD #20 and B EXIT are skipped)
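The same control flow can be sketched in Python to check the result by hand. This is a simplified model that assumes small values, so the sign test stands in for BPL and overflow is ignored.

```python
def run(p, q):
    """Model the IF P >= Q THEN X = P + 5 ELSE X = P + 20 program."""
    r0, r1 = p, q          # LDR r0,P / LDR r1,Q
    r2 = r0 - r1           # SUBS r2,r0,r1
    if r2 >= 0:            # BPL THEN (taken when the result is not negative)
        r0 = r0 + 5        # THEN ADD r0,r0,#5
    else:
        r0 = r0 + 20       # ADD r0,r0,#20 then B EXIT
    return r0              # EXIT STR r0,X

print(run(12, 9))  # 17
print(run(5, 9))   # 25
```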
This is how we got the address for our vars btw
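One way to reproduce those addresses is sketched below, assuming the program starts at address 0 and that every instruction and every DCD word takes 4 bytes. Both assumptions are mine, but they give the 36, 40 and 44 quoted earlier.

```python
WORD_BYTES = 4  # assumed size of each instruction and each DCD word

instructions = [
    "LDR r0,P", "LDR r1,Q", "SUBS r2,r0,r1", "BPL THEN",
    "ADD r0,r0,#20", "B EXIT", "ADD r0,r0,#5", "STR r0,X", "STOP",
]
data_labels = ["P", "Q", "X"]

def layout(start=0):
    """Return the address of each data label placed after the code."""
    first_data = start + len(instructions) * WORD_BYTES
    return {name: first_data + i * WORD_BYTES
            for i, name in enumerate(data_labels)}

print(layout())  # {'P': 36, 'Q': 40, 'X': 44}
```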
To do a loop, we can just branch to a previous instruction
Below is an example where we calculate 1+2+3+…+20
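As a minimal Python sketch of the idea, assume one register holds a running total and another counts down from 20, with a branch back to the top of the loop while the counter is not zero. The register names in the comments are hypothetical.

```python
def sum_to(n):
    """Model a count-down loop that adds n + (n-1) + ... + 1."""
    total = 0              # e.g. MOV r0,#0
    counter = n            # e.g. MOV r1,#20
    while True:
        total += counter   # ADD r0,r0,r1
        counter -= 1       # SUBS r1,r1,#1
        if counter == 0:   # BNE loop falls through once Z is set
            break
    return total

print(sum_to(20))  # 210
```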
Registers usually have the same width as the word of the computer (if the word is 32 bits, the registers will be 32 bits as well)
Let’s say you have 32 bits and you don’t use all of them (ex. 16 bits of data); where do the rest of the bits go?
This depends entirely on the processor, which might
Something to note is that if you want to do data processing on ARM processors, you have to put the data in registers first
In CISC, processors usually have two-address instructions (one memory and one register)
RISC, on the other hand, has three-address instructions where all three operands are registers (this doesn’t include LDR and STR, which are special cases)
There are three fundamental addressing modes
For ARM processors specifically, the registers are 32 bits wide and there are 16 of them (r0-r15)
Because there are 16 registers, each register operand needs a 4-bit field in the instruction so that it can be selected when the instruction is decoded
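To see the 4-bit fields concretely, here is a small Python sketch that pulls the register numbers out of a 32-bit instruction word. It assumes the classic ARM register-operand data-processing layout (Rn in bits 19-16, Rd in bits 15-12, Rm in bits 3-0), and the helper name is illustrative.

```python
def register_fields(word):
    """Extract the three 4-bit register fields from an instruction word."""
    return {
        "Rn": (word >> 16) & 0xF,  # first source register
        "Rd": (word >> 12) & 0xF,  # destination register
        "Rm": word & 0xF,          # second source register
    }

# 0xE0821003 is the encoding of ADD r1,r2,r3 under the assumed layout
print(register_fields(0xE0821003))  # {'Rn': 2, 'Rd': 1, 'Rm': 3}
```

Four bits give 16 possible values, which is exactly enough to name r0 through r15.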
The ARM also has a current program status register (CPSR) which has
They also have a large instruction set including the following
Generally, instructions in ARM go by the following
{label} op-code operand1,operand2,operand3 {;comment}
Below is an example of an ARM assembly program fragment
And here’s another example which generates the sum of the cubes of numbers
Note that these are fragments and not full programs since we need to specify the environment first
We mentioned before that we need ARM directives and instructions in order to make a full program, but what do these directives look like?
To start, let’s look at an example from last week
These include
The DCD, DCW and DCB directives tell the assembler to
Below is an example of these directives in action
loop b loop
There are also synonyms here, so let’s look at them
In the disassembly, these directives look like gibberish at first, so let’s look at it
Everything in the orange block shows the memory locations used by the directives
We mentioned pseudos before, but what exactly are they?
Pseudos are operations for which there is no direct machine-language equivalent
To execute these, the assembler has tricks it can use instead
For example, you can’t execute MOV r0,#0x12345678 to load r0 with 0x12345678, since the whole instruction is only 32 bits long and so has no room for a 32-bit constant alongside the opcode and register fields
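To make the limit concrete, here is a Python sketch that checks whether a constant fits in a MOV immediate. It assumes the classic 32-bit ARM encoding, where the immediate is an 8-bit value rotated right by an even number of bits, and the function names are illustrative.

```python
def rotate_left(value, amount):
    """Rotate a 32-bit value left by amount bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF

def is_mov_immediate(value):
    """True if value is an 8-bit constant rotated right by an even amount."""
    # Undoing an even right-rotation must leave a value that fits in 8 bits.
    return any(rotate_left(value, rot) < 256 for rot in range(0, 32, 2))

print(is_mov_immediate(0xFF))        # True
print(is_mov_immediate(0xFF000000))  # True
print(is_mov_immediate(0x12345678))  # False
```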
As an alternative, we can use LDR r0,=0x12345678 as our pseudo instead
With this, the assembler
Another example is ADR r0,label (address to register) which loads the 32-bit address of the line ‘label’ into r0
Below is an example of ADR in action
All of the pseudo translation is done automatically by the assembler, utilizing program counter relative addressing
Relative addressing allows us to specify the location of an operand with respect to a register value
The cool part comes in when we use r15 (PC) in the relative addressing
In most cases the ARM’s PC is 8 bytes ahead of the instruction currently being executed due to pipelining (where the next instruction is fetched before the current one has been executed)
To go more in-depth into how this works, let’s look at this in action in a real program
The top right shows how we calculate the offsets (offset = goal address - (address of the current instruction + 0x08 for pipelining))
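The offset arithmetic can be sketched in Python as follows. The addresses used are made-up examples rather than values from the program above.

```python
PIPELINE_OFFSET = 8  # the PC reads as the current instruction's address + 8

def pc_relative_offset(instruction_address, target_address):
    """Offset the assembler stores so that PC + offset reaches the target."""
    return target_address - (instruction_address + PIPELINE_OFFSET)

print(pc_relative_offset(0x00, 0x10))  # 8
print(pc_relative_offset(0x0C, 0x04))  # -16
```

A negative offset simply means the target sits before the point the PC has already reached.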
Each instruction listed here will have a version ending in S, which will update the status flags (remember that we have to do this to update the status flags because it’s not automatic)
We’ve covered a few of these already so let’s go over the ones we haven’t
ADC performs a 32-bit ADD while also adding the carry bit, which is useful for adding 64-bit numbers
We can do so using a double-precision strategy, like so
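The double-precision idea can be modelled in Python: add the low words with a flag-setting add, then add the high words together with the carry. This is a sketch with illustrative function names, and each 64-bit number is assumed to be held as a (high, low) pair of 32-bit words.

```python
MASK = 0xFFFFFFFF

def adds(a, b):
    """Model ADDS: return the 32-bit sum and the carry out (0 or 1)."""
    total = a + b
    return total & MASK, total >> 32

def adc(a, b, carry):
    """Model ADC: 32-bit add that also adds the incoming carry."""
    return (a + b + carry) & MASK

def add64(a_hi, a_lo, b_hi, b_lo):
    lo, carry = adds(a_lo, b_lo)   # low words first, remembering the carry
    hi = adc(a_hi, b_hi, carry)    # high words plus that carry
    return hi, lo

print(add64(1, 0xFFFFFFFF, 0, 1))  # (2, 0)
```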
RSB is the same as SUB, except that RSB reverses the two operands
This is useful when you want to subtract a register value from a constant, since you can’t list constants first
Negation subtracts a number from zero, which produces its negative in two’s complement
ARM doesn’t provide this as an instruction, but rather as a pseudo called NEG
We’ve covered MOV already, but we also have MVN, which flips the bits of the source operand before storing it in the register
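A short Python model of RSB, NEG and MVN on 32-bit values may help keep them apart. The function names mirror the mnemonics but are otherwise illustrative, and NEG is written as a reverse subtract from zero.

```python
MASK = 0xFFFFFFFF

def rsb(rn, operand2):
    """Model RSB Rd,Rn,operand2: Rd = operand2 - Rn."""
    return (operand2 - rn) & MASK

def neg(rm):
    """Model NEG as RSB with a constant zero: Rd = 0 - Rm."""
    return rsb(rm, 0)

def mvn(operand2):
    """Model MVN: Rd = bitwise NOT of the operand."""
    return ~operand2 & MASK

print(rsb(3, 10))    # 7
print(hex(neg(5)))   # 0xfffffffb
print(hex(mvn(0)))   # 0xffffffff
```

Flipping the bits and adding 1 gives the same answer as negation, which is the two's complement rule in action.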
The multiply instruction MUL Rd,Rm,Rs
Note that Rd can’t be the same as Rm because of how ARM implements MUL
MLA exists as well, but we’ve already covered that
With MLA, we can also find dot products of vectors
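A dot product with MLA can be modelled in Python by treating MLA as a multiply that adds an accumulator (Rd = Rm * Rs + Rn). The vectors below are made-up sample data.

```python
MASK = 0xFFFFFFFF

def mla(rm, rs, rn):
    """Model MLA Rd,Rm,Rs,Rn: Rd = Rm * Rs + Rn, truncated to 32 bits."""
    return (rm * rs + rn) & MASK

def dot(u, v):
    """Accumulate the element-wise products with one MLA per pair."""
    acc = 0
    for a, b in zip(u, v):
        acc = mla(a, b, acc)
    return acc

print(dot([1, 2, 3], [4, 5, 6]))  # 32
```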
We have multiplication for other types of numbers as well
Note that ARM as covered here does not have a division instruction, but division can be implemented by the programmer if they wish
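One common way to implement division in software is shift-and-subtract long division, sketched here in Python for unsigned 32-bit values. This is one possible approach, not something prescribed by ARM.

```python
def udiv(dividend, divisor):
    """Unsigned 32-bit division by repeated shift and subtract."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for bit in range(31, -1, -1):
        # Bring down the next bit of the dividend, most significant first.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder

print(udiv(100, 7))  # (14, 2)
```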
These will just be listed off since they’re self-explanatory
We can use these bitwise operations for more complicated bit manipulations as well
We’ve covered CMP already, but we have other compare instructions that we can use as well
Shifts move bits one or more places to the right or left, with the bits shifted in to fill the gap depending on the instruction used
These shifts can either be static (determined by a number representing the number of bits to shift by) or dynamic
Dynamic shifts can be any one of the following
The following shifts are in ARM itself
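As a rough Python model of how shifts behave on 32-bit values, the sketch below assumes the conventional logical shift left (LSL), logical shift right (LSR), arithmetic shift right (ASR) and rotate right (ROR) definitions. The function names are illustrative.

```python
MASK = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: zeros enter on the right."""
    return (x << n) & MASK

def lsr(x, n):
    """Logical shift right: zeros enter on the left."""
    return (x & MASK) >> n

def asr(x, n):
    """Arithmetic shift right: copies of the sign bit enter on the left."""
    x &= MASK
    if x & 0x80000000:
        x -= 1 << 32          # reinterpret as a negative signed value
    return (x >> n) & MASK

def ror(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    n %= 32
    x &= MASK
    return ((x >> n) | (x << (32 - n))) & MASK

print(hex(lsl(1, 4)))            # 0x10
print(hex(lsr(0x80000000, 4)))   # 0x8000000
print(hex(asr(0x80000000, 4)))   # 0xf8000000
print(hex(ror(1, 1)))            # 0x80000000
```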