We’ve covered conditionals already, so let’s go into more depth
In total ARM instructions has 4 bits dedicated to 16 conditions
Below is the meaning and encoding of each
Branch instructions are encoded as such
For example, let’s say we want to decode the following instruction (signifying the end of a program)
loop B loop
First, since the branch has no condition, bits 28-31 will be 1110
Secondly, since there is no link, we need to set bit 24 to 0
Lastly, let’s look at the word offset; this offset signifies how many instructions we want to move forward by (or backwards with negative numbers)
Since we want to go back to the same instruction, we want to have an offset of 2
Translating this into two’s compliment, we get 1111 1111 1111 1111 1111 1110
Overall, our instruction is 1110 1010 1111 1111 1111 1111 1111 1110, or 0xEAFFFFFE in hex
For the reverse, let’s look at the instruction 0x1AFFFFFD
Translating into binary gives us 0001 1010 1111 1111 1111 1111 1111 1101, which we can then use for the rest of our encoding
Since bits 27-24 are 1010, we know this is a branching instruction with no link
Bits 31-28 are 0001, so we know the op-code is BNE
The offset is -3, which means we go 1 instruction backwards (taking into account pipelining)
So overall this is a BNE one step backwards, which would look something like this
marker ADDS r1,r1,r2
BNE marker
For data processing instructions, they go as such
For shift operations, we have some extra divisions
The condition is still here because all instructions can be conditionally executed
For example, let’s encode the following instruction
ADD r0,r1,r2,LSR r3
Firstly, the condition is always, so bits 31-28 are 1110
Bit 25 indicates whether source 2 is a shift or a literal (it’s a shift, so we put 0)
Bits 24-21 are the op-code, for which ADD is 0100
Bit 20 indicates if there’s an S added to the op-code (there isn’t, so we put 0)
Bits 19-16 and 15-12 are the source 1 and destination registers, respectively, so we get 0001 0000
Bits 11-8 indicate the register that we will take for the shift length (it’s r3, so 0011), while bit 7 will be 0 since it’s a register-specified shift)
Bits 6-5 indicate the shift type (it’s a logical right, for which the code is 01)
Bit 4 will dictate if the shift is register-based (it is, so we put 1)
Finally, bits 3-0 indicate the second source (r2 = 0010)
Putting this all together, we get 1110 0000 1000 0001 0000 0011 0010 0011 (in hex, this is 0xE0A10332)
For comparison operations (ex. CMP, CMN, TST and TEQ), the destination is always r0 and S is always 1)
Let’s see this in action with an example
CMPGT r3,r5
In total, the instruction in binary is 1100 0001 0101 0011 0000 0000 0000 0101 (or 0xC1530005)
For MOV and MVN, source 1 is always 0
For example, let’s look at the following instruction
MOV PC,LR
Overall, the instruction is 1110 0001 1010 0000 1111 0000 0000 1110 (or 0xE1A0F00E in hex)
Bits 11-0 can also be a literal, with bits 11-8 being an alignment representing half the desired rotations right and 7-0 being an 8-bit literal
For example, let’s look at the encoded operand 1110 1111 1111
So in total, the operand would be 1111 1111 0000, or 0xFF0 in hex
There are three addressing modes that we already know for getting info into registers:
Since direct is impossible in ARM, register indirect with LDR and STR is the only way to actually access memory in ARM programs
The register with the address is indicated by square brackets, as seen in LDR r0,[r1]
We can use this for data structures such as arrays, which makes it vital for ARM programs
But how do we go backwards/forwards in an array like this? What we can do is subtract/add the register by 4 (i.e. one word), respectively
We can also do these types of offsets inside the brackets itself, like in the case of LDR r0,[r1,#4], which grabs the element one location ahead of what’s pointed to by r1
We can see this in action with the following program
Just like other registers, we can use the program counter (r15) as a pointer register as well
We can also set dynamic offsets as well, such as in LDR r0,[r1,r2]
Something else useful we can do is automatically update the pointer itself, either by updating and then using (++r0) or using and then updating (r0++)
To summerize:
So far, we’ve covered how to encode every instruction except for LDR and STR
The basic format goes at so
As an example, let’s try to encode 0x57224106
Compiling all this together, we get the following
STRPL r4,[r2,-r6,LSL#2]!
For an encoding example, let’s look at the following instruction
STRGT r1,[r2,#-0xFFF]
Overall, we get 1100 0101 0000 0010 0001 1111 1111 1111 (or 0xC5021FFF in hex)
Knowing these load and store instructions, we can not only implement arrays but stacks as well
As a refresher, stacks are a last in first out (LIFO) data structure in which items enter at one end and leave from the same end in reverse order
This type of data structure requires something called a stack pointer, which keeps track of the top of the stack and updates according to changes (moving forward on a push and backward on a pop)
There are four ways of forming a stack
Growing up and pointing to the top of the stack (TOS)
Growing up and pointing one word above the TOS
Growing down and pointing to the TOS
Growing down and pointing one word below TOS
The last instructions we will cover are the block move instructions, LDM and STM
Assume we want to load a set of consecutive words from memory
Normally, we would have to add each word separately like so
But with block move, we can combine all 4 instructions into 1
To make this easier to understand, you can think of STM as pushing a group of register content to memory and LDM as popping values from memory and loading them into register
For example, assume you have instructions like this
ADR r0,DataToGo
STMIA r0!,{r1-r3,r5}
Starting at the address pointed to by r0, the STM instruction will load words in order into the register r1, r2, r3 and r5
Note that the ! is important since it tells the computer to update r0
The two letters beside STM will tell you how the pointer will update
Usually for a stack we will have to leave some amount of space, which we can do pretty easily using SPACE
LDR r1,=0x11111111
LDR r1,=0x22222222
LDR r1,=0x33333333
LDR r1,=0x55555555
ADR r0,Stack
STMIA r0!,{r1-r3,r5}
Loop B Loop
SPACE 20 ;on exams we will be given both spaces, which one to use is up to you
Stack SPACE 20 ;we make space for each word needed + one buffer word = 16 + 4 = 20
Since these block move instructions implement stacks, we have special addons to simulate each type of stack
Note that block move ≠ stack application, they just provide a means of moving memory content more efficiently
For example, if we want to move 256 words from one table to another in memory, we would have to load the memory into registers and load it back into memory at the specified location
Block move instructions can’t do all 256 words at the same time, but they can make the process much faster
ADR r0,Table1
ADR r1,Table2
MOV r2,#32
Loop LDMFD r0!,{r3-r10}
STMFD r1!,{r3-r10}
SUBS r2,r2,#1
BNE Loop
Encoding/decoding in block move works much like memory move, except the destination and second operand are replaced with a register list
The register list of each register, with its corresponding bit being 1 is the corresponding register is included in the range
For an example, let’s look at the following instruction
STMFD r13!,{r0-r4,r10}
Overall, the instruction is encoded as 1110 1001 0010 1101 0000 0100 0001 1111 (or 0xE92D041F in hex)
For a decoding instruction, lets look at the instruction 0x08855555
Translating to binary, we get 0000 1000 1000 0101 0101 0101 0101 0101
Overall, this is the instruction that we get
STMEQIA r5,{r14,r12,r10,r8,r6,r4,r2}
;OR
STMEQEA r5,{r14,r12,r10,r8,r6,r4,r2}
In a program, oftentimes we need to execute some code multiple times throughout
Obviously we can just put in the code multiple times, but this gets messy and complicated fast
Instead of this, we can put the code into a subroutine
To understand subroutines and how they work, we need to understand its characteristics
How this is done in ARM is that the processor will save the address of the next instruction in a safe place and then load the PC with the address of the first instruction in the subroutine
After the subroutine is complete, we have a return to subroutine instruction (RTS) which will cause the processor to return to the address immediately after the subroutine call
The flow control looks something like this
CISC processors have fully automatic subroutine mechanisms, but in ARM it’s a bit more complicated
ARM’s branch with link instruction (BL) acts as a subroutine call, saving the return address in register r14
For example, let’s say we want to execute a subroutine Sub_A like so
BL Sub_A
At the end of Sub_A, we already have the return address in r14, so we can simply move r14 into r15
MOV r15,r14
;OR
MOV pc,lr
As a reminder of how these are encoded, we just take the old method and make bit 24 1
We can also emulate a CISC processor and push the return address onto the stack before branching to the target address
Once the subroutine is finished, we can then pop the return address from the stack and copy it to the PC
For example, we can do something like this (assuming we have a Full Ascending stack)
...
...
...
STR r15,[r13,#-4]! ;pre-decrement the stack pointer AND
;push the return address on the stack (DB)
B Target ;NOT BL
...
...
Something we have to note is that usually the pipelining effect will have the PC be 8 bytes ahead of the current instruction, with the exceptions being STR and STM
STR and STM have a pipelining effect of +12, meaning the PC will be 3 instructions ahead instead of 2
How do we deal with this? Just load the top of the stack and add 4
...
...
LDR r12,[r13],#+4 ;get the PC and post-increment the stack pointer
;r12 is just a general use register
SUB r15,r12,#4 ;fill the PC one instruction back from the return address
The reason why knowing the stack method is important is that it’s the only way to implement nested subroutines
The way you have to handle nested routines is to save the link register in the stack before you call another subroutine
For example, take a look at the following program
Since we call Fun_1 within Fun_2, we have to push the link register on the stack before we branch with link to Fun_1
From there, since Fun_1 is a leaf routine (a routine with no nested call inside of it), we just have to move the link register into the PC
At the end of Fun_2, we must then pop the original link register and load it into the PC (we don’t have to modify the link register value since pipelining is handled for us in branch with link)
Assume that a program uses R1 to store a value, after which the program calls a function and R1 is used again to store a different value
This presents a problem because R1 will be overwritten, which will cause bugs
To avoid this, we should push all registers that will be used onto a stack at the beginning of a function and pop all of them into the same registers before returning from the function
For example:
Note that if we’re using this approach, we can include the program counter and link register as well to save ourselves an instruction