Assembly

What is an instruction?

To explain this, let’s assume we have a very simple stored-program computer on which we execute instructions

[Figure: a very simple stored-program computer]

The format of instructions defines their anatomy, which consists of

For example, here are some hypothetical instructions:

LDR r1,1234 (load the contents of memory address 1234 into register r1)
STR r3,2000 (store the contents of register r3 in memory address 2000)
ADD r1,r2,r3 (add r2 and r3 and store the result in r1)
SUB r3,r3,r1 (subtract r1 from r3 and store the result in r3)
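To make the register and memory effects concrete, here’s a minimal Python sketch of the four hypothetical instructions above. Registers and memory are plain dictionaries, instructions are tuples, and the starting values (r2 = 7, r3 = 10, memory[1234] = 5) are made up for illustration; word widths and overflow are ignored.

```python
def execute(instruction, regs, mem):
    """Apply one hypothetical instruction (a tuple) to the registers and memory."""
    op, *operands = instruction
    if op == "LDR":            # LDR rd,addr   : rd <- [addr]
        rd, addr = operands
        regs[rd] = mem[addr]
    elif op == "STR":          # STR rs,addr   : [addr] <- rs
        rs, addr = operands
        mem[addr] = regs[rs]
    elif op == "ADD":          # ADD rd,rn,rm  : rd <- rn + rm
        rd, rn, rm = operands
        regs[rd] = regs[rn] + regs[rm]
    elif op == "SUB":          # SUB rd,rn,rm  : rd <- rn - rm
        rd, rn, rm = operands
        regs[rd] = regs[rn] - regs[rm]
    else:
        raise ValueError(f"unknown instruction {op}")


regs = {"r1": 0, "r2": 7, "r3": 10}
mem = {1234: 5, 2000: 0}
program = [
    ("LDR", "r1", 1234),
    ("STR", "r3", 2000),
    ("ADD", "r1", "r2", "r3"),
    ("SUB", "r3", "r3", "r1"),
]
for instr in program:
    execute(instr, regs, mem)
```

After the four instructions run, r1 holds 17, r3 holds -7 and memory address 2000 holds 10 (the value r3 had when STR executed).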

To execute an instruction, the computer goes through a two-stage process:

  1. Fetch (read from memory and decode)
  2. Execute (interpret and execute by the CPU’s logic)

The program counter then points to the next instruction, and the process repeats ad nauseam

The program counter (PC) is a type of register, along with the…

To make this a little more clear, here’s an example of a hypothetical computer

[Figure: a hypothetical computer]

Sometimes (as with ADD), the operands are read from registers, passed through the ALU, and the result is sent back to the destination register (this is called a register-to-register operation)

Other times, we need to use the MAR to read from or write to memory, but the MAR also receives the PC during a fetch, so how do we resolve this conflict?

We do this with a multiplexer, which acts like a traffic light, choosing which source’s signal is allowed through to the MAR

As an example, let’s cover the execution process in RTL:

Fetch

[MAR] ← [PC]

[PC] ← [PC] + 4

[MBR] ← [[MAR]]

[IR] ← [MBR]

Execute LDR

[MAR] ← [IR(address)]

[MBR] ← [[MAR]]

[r1] ← [MBR]
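The RTL for the fetch and for executing LDR can be sketched in Python, one RTL statement per line of code (the instruction register is loaded from the MBR, which holds the fetched word). This assumes a hypothetical machine whose memory maps addresses to whole words, where an instruction word is a tuple like ("LDR", "r1", 100); the class and attribute names are illustrative, not from any real simulator.

```python
class SimpleCPU:
    """Hypothetical machine: memory maps addresses to whole words, instructions are 4 bytes apart."""

    def __init__(self, memory):
        self.mem = memory                    # address -> instruction tuple or data value
        self.pc = 0
        self.mar = self.mbr = self.ir = None
        self.regs = {f"r{i}": 0 for i in range(4)}

    def fetch(self):
        self.mar = self.pc                   # [MAR] <- [PC]
        self.pc += 4                         # [PC]  <- [PC] + 4
        self.mbr = self.mem[self.mar]        # [MBR] <- [[MAR]]
        self.ir = self.mbr                   # [IR]  <- [MBR]

    def execute_ldr(self):
        _op, rd, address = self.ir           # IR holds ("LDR", rd, address)
        self.mar = address                   # [MAR] <- [IR(address)]
        self.mbr = self.mem[self.mar]        # [MBR] <- [[MAR]]
        self.regs[rd] = self.mbr             # [rd]  <- [MBR]


cpu = SimpleCPU({0: ("LDR", "r1", 100), 100: 42})
cpu.fetch()
cpu.execute_ldr()
print(cpu.regs["r1"], cpu.pc)  # 42 4
```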

How do we do flow control? We just have to modify the incrementor’s path

Before the incrementor updates the PC, the control unit (CU) checks whether the PC has to be changed instead (for example, loaded with a branch target address)

After an operation is performed, status bits describing its result are stored in the condition code register (CCR), as follows

[Figure: status bits in the condition code register]

On RISC processors like ARM, we have to explicitly request that the status flags be updated (by adding S to the instruction, as in SUBS), while most CISC processors update them automatically
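Here’s a sketch of how ARM’s four status flags (N, Z, C and V) come out of a 32-bit SUBS, assuming both operands are treated as 32-bit values. The function name is hypothetical; a conditional branch then just reads the flags (for example, BPL is taken when N is 0 and BEQ when Z is 1).

```python
MASK32 = 0xFFFFFFFF

def subs(a, b):
    """32-bit SUBS a - b: return (result, flags) with ARM-style N, Z, C, V."""
    a, b = a & MASK32, b & MASK32
    result = (a - b) & MASK32
    flags = {
        "N": result >> 31,                    # copy of the result's sign bit
        "Z": int(result == 0),                # result was zero
        "C": int(a >= b),                     # ARM sets C when there is NO borrow
        "V": ((a ^ b) & (a ^ result)) >> 31,  # signed (two's complement) overflow
    }
    return result, flags


print(subs(12, 9))  # (3, {'N': 0, 'Z': 0, 'C': 1, 'V': 0})
```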

Assembly

We’ve seen Assembly instructions before, but let’s look at them again

At least one whitespace character (space or tab) is needed after the operation name; all other whitespace is up to the programmer

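To branch conditionally, we can use an instruction such as BPL (branch on plus, taken when the N flag is clear, i.e. the last result was zero or positive) or BEQ (branch on equal, taken when the Z flag is set)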

Let’s look at a more complex example

SUBS r5,r5,#1 ;Subtract 1 from r5 and store in r5
BEQ onZero ;IF zero THEN go to onZero
notZero ADD r1,r2,r3 ;ELSE continue
...
onZero SUB r1,r2,r3 ;Here's where the branch ends up if SUBS gave 0

We can also define variables to do something like this

\[X = \begin{cases} P + 5 & \text{if } P \geq Q \\ P + 20 & \text{otherwise} \end{cases}\]

First, we need to define these variables, which we can do with DCD

P DCD 12 ;This reserves a space in memory for the decimal number 12 and labels it P
Q DCD 9 ;This does the same thing but for Q and 9
X DCD 0 ;The memory locations are 36, 40 and 44 (we'll learn how we got this later)

Then we can do our ARM magic

LDR r0,P ;Load P into r0
LDR r1,Q ;We do this so that we can do arithmetic on the number
SUBS r2,r0,r1 ;It's easier to check if P-Q >= 0 since we can just do BPL
BPL THEN ;If P-Q>=0 then execute THEN
ADD r0,r0,#20 ;ELSE add 20 to r0 and store it in r0
B EXIT ;Skip to EXIT
THEN ADD r0,r0,#5 ;Add 5 to r0 and store in r0
EXIT STR r0,X ;Whatever's in r0 is stored in X (which branch we took doesn't matter)
STOP B STOP ;Program is done, so keep branching to STOP forever
P DCD 12 ;It sounds weird but we declare variables after our instructions
Q DCD 9
X DCD 0

Since we set P to 12 and Q to 9, P ≥ Q, so P - Q = 3 ≥ 0 and the N flag is clear

In this case, the BPL branch is taken and execution skips to THEN, where we compute X = P + 5 = 17, store it at EXIT and halt (ADD r0,r0,#20 and B EXIT are skipped)

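This is how we got the addresses for our variables: the program starts at address 0 and each of the nine instructions (from LDR r0,P down to STOP) takes 4 bytes, so the last instruction sits at address 32 and P, Q and X land at 36, 40 and 44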
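That address calculation is essentially the first pass of an assembler. Here’s a minimal sketch, assuming the program starts at address 0, that every line (including the STOP line and each DCD) occupies exactly one 4-byte word, and that the program is given as hypothetical (label, statement) pairs; a real assembler handles many more cases.

```python
def assign_addresses(lines, start=0, word=4):
    """First-pass sketch: give every label the address of the word it labels."""
    symbols = {}
    address = start
    for label, _statement in lines:
        if label:
            symbols[label] = address
        address += word  # every line here takes exactly one 32-bit word
    return symbols


program = [
    (None, "LDR r0,P"),
    (None, "LDR r1,Q"),
    (None, "SUBS r2,r0,r1"),
    (None, "BPL THEN"),
    (None, "ADD r0,r0,#20"),
    (None, "B EXIT"),
    ("THEN", "ADD r0,r0,#5"),
    ("EXIT", "STR r0,X"),
    ("STOP", "B STOP"),
    ("P", "DCD 12"),
    ("Q", "DCD 9"),
    ("X", "DCD 0"),
]
symbols = assign_addresses(program)
print(symbols["P"], symbols["Q"], symbols["X"])  # 36 40 44
```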

To do a loop, we can just branch to a previous instruction

Below is an example where we calculate 1+2+3+…+20

[Figure: program that calculates 1+2+3+…+20]
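Here’s a Python sketch of one common way to write such a loop: count a register down from 20, add it to a running total, and branch back while the counter isn’t zero. The ARM instructions in the comments are a hypothetical version of the loop, not necessarily the one in the figure.

```python
def sum_1_to_n(n):
    """Add n + (n-1) + ... + 1 with a count-down loop (assumes n >= 1)."""
    total = 0                 # MOV  r0,#0        ;running total
    counter = n               # MOV  r1,#20       ;loop counter (n = 20 here)
    while True:               # loop
        total += counter      # ADD  r0,r0,r1
        counter -= 1          # SUBS r1,r1,#1     ;sets Z when the counter hits 0
        if counter == 0:      # BNE  loop         ;falls through once Z = 1
            break
    return total


print(sum_1_to_n(20))  # 210
```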

More on Registers

Registers usually have the same width as the word of the computer (if the word is 32 bits, the registers will be 32 bits as well)

Let’s say a register has 32 bits and you don’t use all of them (e.g. you only have 16 bits of data); what happens to the unused upper bits?

This depends entirely on the processor, which might
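Two common ways of filling the unused upper bits are zero extension (fill them with 0s) and sign extension (copy the sign bit upward so a two’s complement value keeps its meaning). On ARM, for example, LDRH zero-extends a 16-bit halfword and LDRSH sign-extends it. A small Python sketch of both, with hypothetical function names:

```python
def zero_extend(value, bits):
    """Keep the low `bits` bits; everything above becomes 0."""
    return value & ((1 << bits) - 1)

def sign_extend(value, bits):
    """Copy bit (bits - 1) into all higher bits (returned as a Python int)."""
    value = zero_extend(value, bits)
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value

def as_word(value):
    """Show a Python int as the 32-bit pattern a register would hold."""
    return value & 0xFFFFFFFF


print(hex(as_word(zero_extend(0x8001, 16))))  # 0x8001
print(hex(as_word(sign_extend(0x8001, 16))))  # 0xffff8001
```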

Note that ARM is a load/store architecture: data-processing instructions work only on registers, so you have to load data into registers before you can process it

CISC processors usually have two-address instructions, where one operand can be in memory and the other in a register

RISC processors, on the other hand, have three-address instructions where all three operands are registers (LDR and STR, which access memory, are special cases)

Addressing Modes

There are three fundamental addressing modes

ARM Assembly

ARM processors have 16 registers, r0 to r15, each 32 bits wide

Because there are 16 registers, each register operand needs a 4-bit field in the instruction (2^4 = 16) so the register can be identified when the instruction is decoded

The ARM also has a current program status register (CPSR) which has

[Figure: the current program status register (CPSR)]

ARM also has a large instruction set, including the following

[Figure: part of the ARM instruction set]

Generally, ARM instructions take the following form

{label} op-code operand1,operand2,operand3 {;comment}

Below is an example of an ARM assembly program fragment

[Figure: ARM assembly program fragment]

And here’s another example, which computes the sum of the cubes of a series of numbers

[Figure: sum-of-cubes program fragment]

Note that these are fragments rather than full programs, since a full program also needs assembler directives that specify its environment

Program Structure

We mentioned before that we need ARM directives and instructions in order to make a full program, but what do these directives look like?

To start, let’s look at an example from last week

[Figure: example program from last week]

These include

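The DCD, DCW and DCB directives tell the assembler to reserve memory and initialize it with the given values: DCD defines 32-bit words, DCW 16-bit halfwords and DCB bytes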
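Here’s a sketch of the bytes these directives place in memory, assuming a little-endian configuration and ignoring the alignment padding an assembler may insert before a DCD or DCW. The `layout` helper is hypothetical; it uses Python’s standard `struct` module.

```python
import struct

# Little-endian format codes: 32-bit word, 16-bit halfword, 8-bit byte
FORMATS = {"DCD": "<I", "DCW": "<H", "DCB": "<B"}

def layout(directives):
    """Concatenate the bytes for a list of (directive, value) pairs (no alignment padding)."""
    return b"".join(struct.pack(FORMATS[d], v) for d, v in directives)


print(layout([("DCD", 0x12345678)]).hex())  # 78563412 (least significant byte first)
```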

Below is an example of these directives in action

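[Figure: DCD, DCW and DCB directives in action]

loop B loop ;branch back to itself forever so the processor never runs past the end of the program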

There are also synonyms for these directives, so let’s look at them

In the disassembly, the data created by these directives looks like gibberish at first, so let’s look at it more closely

[Figure: disassembly of the directives]

Everything in the orange block is memory reserved and initialized by the directives

[Figure]

Pseudo Instructions

We mentioned pseudos before, but what exactly are they?

Pseudo-instructions are operations that have no direct machine-language equivalent

To implement them, the assembler translates each one into one or more real instructions (sometimes with extra data placed in memory)

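For example, you can’t use MOV r0,#0x12345678 to load r0 with 0x12345678: instructions are only 32 bits long, so there’s no room for a full 32-bit constant alongside the opcode and register fields (ARM data-processing immediates are 8-bit values rotated right by an even number of bits)

As an alternative, we can use the pseudo-instruction LDR r0,=0x12345678 instead

With this, the assembler places 0x12345678 in a literal pool in memory near the code and generates an LDR that loads it using PC-relative addressing (if the constant fits in a single MOV or MVN, the assembler uses that instead)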
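To see why MOV can’t take 0x12345678 directly: in the classic ARM data-processing format, an immediate operand is an 8-bit value rotated right by an even number of bits (0 to 30). This sketch, with hypothetical helper names, checks whether a 32-bit constant fits that form, roughly the test an assembler applies before falling back to a literal pool.

```python
MASK32 = 0xFFFFFFFF

def rotate_right(x, n):
    """Rotate a 32-bit value right by n bits."""
    n %= 32
    return ((x >> n) | (x << (32 - n))) & MASK32

def arm_immediate(value):
    """Return (imm8, rot) with value == ROR(imm8, 2 * rot), or None if it doesn't fit."""
    value &= MASK32
    for rot in range(16):
        # Undo a right rotation by 2*rot by rotating right by the remaining 32 - 2*rot bits
        imm8 = rotate_right(value, (32 - 2 * rot) % 32)
        if imm8 <= 0xFF:
            return imm8, rot
    return None


print(arm_immediate(0xFF000000))  # (255, 4): 0xFF rotated right by 8
print(arm_immediate(0x12345678))  # None: needs LDR r0,=0x12345678 instead
```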

Another example is ADR r0,label (address to register) which loads the 32-bit address of the line ‘label’ into r0

Below is an example of ADR in action

[Figure: ADR in action]

All of the pseudo translation is done automatically by the assembler, utilizing program counter relative addressing

Relative Addressing

Relative addressing allows us to specify the location of an operand with respect to a register value (a base address plus an offset)

The cool part comes in when we use r15 (the PC) as that register

In most cases, ARM’s PC is 8 bytes ahead of the instruction currently being executed because of pipelining (the following instructions are fetched before the current one has finished executing)

To go more in-depth into how this works, let’s look at this in action in a real program

[Figure: PC-relative addressing in a real program]

The top right shows how we calculate the offsets: offset = goal address - (address of the current instruction + 0x08), where the extra 0x08 accounts for pipelining
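Here’s that offset calculation as a small Python function, assuming the classic ARM pipeline where the PC reads as the current instruction’s address plus 8. The function name and example addresses are hypothetical.

```python
PIPELINE_AHEAD = 8  # the PC reads as (address of the current instruction + 8)

def pc_relative_offset(instruction_address, target_address):
    """Offset to add to the PC value seen by the instruction to reach the target."""
    return target_address - (instruction_address + PIPELINE_AHEAD)


print(hex(pc_relative_offset(0x08, 0x20)))  # 0x10
print(pc_relative_offset(0x20, 0x00))       # -40: a backward reference
```

For example, ADR r0,label at address 0x08 with label at 0x20 gives an offset of 0x10, so the assembler can emit ADD r0,pc,#0x10.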

Data-Processing Instructions

Each instruction listed here (apart from the compare instructions, which always update the flags) has a version ending in S that updates the status flags (remember that we have to do this because ARM doesn’t update them automatically)

We’ve covered a few of these already so let’s go over the ones we haven’t

Add

ADC performs a 32-bit addition that also adds in the carry flag from a previous operation, which is useful for adding 64-bit numbers

We can do so using a double-precision strategy, like so

[Figure: double-precision (64-bit) addition]
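Here’s a Python sketch of the double-precision idea: add the low words with ADDS (which sets C on a carry-out), then add the high words plus that carry with ADC. It assumes each unsigned 64-bit number is split into a high and a low 32-bit word; the register names in the comments are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def adds(a, b):
    """ADDS: 32-bit sum plus the carry-out that ends up in the C flag."""
    total = a + b
    return total & MASK32, total >> 32

def adc(a, b, carry):
    """ADC: a + b + C, kept to 32 bits."""
    return (a + b + carry) & MASK32

def add64(x, y):
    """Add two unsigned 64-bit numbers using only 32-bit operations."""
    x_hi, x_lo = x >> 32, x & MASK32   # x in r1:r0 (hypothetical)
    y_hi, y_lo = y >> 32, y & MASK32   # y in r3:r2 (hypothetical)
    lo, carry = adds(x_lo, y_lo)       # ADDS r4,r0,r2  ;low words, sets C
    hi = adc(x_hi, y_hi, carry)        # ADC  r5,r1,r3  ;high words + carry
    return (hi << 32) | lo


print(hex(add64(0x00000001FFFFFFFF, 1)))  # 0x200000000
```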

Subtract

RSB is the same as SUB, except that RSB reverses the two operands

This is useful when you want to subtract a register value from a constant, since an immediate constant can only appear as the last operand

Negation

Negation subtracts a number from zero, which, in two’s complement, produces its negative

ARM doesn’t provide this as a separate instruction; instead, NEG is a pseudo-instruction that the assembler translates into RSB (NEG r0,r1 becomes RSB r0,r1,#0)
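Here’s a Python sketch of SUB, RSB and the NEG expansion on 32-bit words, with a helper to read the result as a signed number. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def sub(rn, operand2):
    """SUB Rd,Rn,Operand2: Rn - Operand2 (low 32 bits)."""
    return (rn - operand2) & MASK32

def rsb(rn, operand2):
    """RSB Rd,Rn,Operand2: Operand2 - Rn, i.e. SUB with the operands reversed."""
    return (operand2 - rn) & MASK32

def neg(rm):
    """NEG Rd,Rm as the assembler expands it: RSB Rd,Rm,#0."""
    return rsb(rm, 0)

def to_signed(word):
    """Interpret a 32-bit word as a two's complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word


print(rsb(3, 10))            # 7, i.e. RSB r0,r1,#10 with r1 = 3
print(to_signed(neg(5)))     # -5
```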

Move

We’ve covered MOV already, but we also have MVN, which flips the bits of the source operand before storing it in the register
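A tiny Python sketch of MVN on 32-bit words (the function name is hypothetical). It also shows why MVN is handy for constants like 0xFFFFFFFF (that is, -1), which can’t be encoded as a MOV immediate: MVN r0,#0 produces it.

```python
MASK32 = 0xFFFFFFFF

def mvn(operand2):
    """MVN Rd,Operand2: store the bitwise NOT of the operand (32 bits)."""
    return ~operand2 & MASK32


print(hex(mvn(0)))     # 0xffffffff, i.e. -1 in two's complement
print(hex(mvn(0xFF)))  # 0xffffff00
```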

Multiply

The multiply instruction MUL Rd,Rm,Rs computes Rm × Rs and stores the low 32 bits of the product in Rd

Note that Rd can’t be the same as Rm because of how ARM implements MUL

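MLA Rd,Rm,Rs,Rn (multiply and accumulate, Rd = Rm × Rs + Rn) exists as well, but we’ve already covered that

With MLA, we can also compute dot products of vectors, accumulating one product per step

[Figure: dot product of two vectors using MLA]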
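Here’s a Python sketch of a dot product built from repeated MLAs, keeping the running sum in one accumulator register. The register choices in the comments are hypothetical (note that Rd and Rm differ, which keeps clear of the MUL/MLA restriction mentioned above).

```python
MASK32 = 0xFFFFFFFF

def mla(rm, rs, rn):
    """MLA Rd,Rm,Rs,Rn: Rd = Rm * Rs + Rn (low 32 bits)."""
    return (rm * rs + rn) & MASK32

def dot_product(a, b):
    """Accumulate a[0]*b[0] + a[1]*b[1] + ... one MLA at a time."""
    acc = 0                      # MOV r0,#0        ;accumulator
    for x, y in zip(a, b):       # load the next pair of elements into r1 and r2
        acc = mla(x, y, acc)     # MLA r0,r1,r2,r0  ;r0 = r1*r2 + r0
    return acc


print(dot_product([1, 2, 3], [4, 5, 6]))  # 32
```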

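We have multiplication for other types of numbers as well, such as the long multiplies UMULL and SMULL, which produce 64-bit products of unsigned and signed 32-bit operands

Note that the classic ARM instruction set covered here has no divide instruction, but division can be implemented in software by the programmer if needed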
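One way a programmer can implement unsigned division without a divide instruction is shift-and-subtract (restoring) division, which only needs shifts, compares and subtractions. Here’s a Python sketch of that algorithm; the function name is hypothetical, and it isn’t necessarily the method used in this course.

```python
def udiv32(dividend, divisor):
    """Unsigned 32-bit shift-and-subtract division: returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient, remainder = 0, 0
    for bit in range(31, -1, -1):
        # Bring down the next dividend bit (a shift left plus an OR in ARM terms)
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        # If the divisor fits, subtract it and set this quotient bit (roughly CMP, SUBHS, ORRHS)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder


print(udiv32(100, 7))  # (14, 2)
```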

Bitwise

These will just be listed off since they’re self-explanatory

[Figure: bitwise instructions]

We can use these bitwise operations for more complicated bit manipulations as well

[Figure: bit-manipulation examples]
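Here’s a Python sketch of a few common bit manipulations and the ARM instruction each one corresponds to: ORR to set bits, BIC to clear them, EOR to toggle them, and a right shift followed by AND to extract a field. The helper names are hypothetical, and the mask is assumed to be held in a register.

```python
MASK32 = 0xFFFFFFFF

def set_bits(x, mask):
    """ORR Rd,Rn,Rm (Rm = mask): force the masked bits to 1."""
    return (x | mask) & MASK32

def clear_bits(x, mask):
    """BIC Rd,Rn,Rm (Rm = mask): force the masked bits to 0 (AND with NOT mask)."""
    return x & ~mask & MASK32

def toggle_bits(x, mask):
    """EOR Rd,Rn,Rm (Rm = mask): flip the masked bits."""
    return (x ^ mask) & MASK32

def extract_field(x, lsb, width):
    """Shift the field down to bit 0 (LSR), then AND away everything above it."""
    return (x >> lsb) & ((1 << width) - 1)


print(hex(extract_field(0x12345678, 8, 8)))  # 0x56
```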

Compare

We’ve covered CMP already, but we have other compare instructions that we can use as well

[Figure: compare instructions]
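The compare instructions compute a result only to set the flags and then throw it away: CMP uses subtraction, CMN uses addition (comparing against the negative of the operand), TST uses AND and TEQ uses exclusive OR. A Python sketch, assuming 32-bit unsigned inputs and hypothetical function names; for TST and TEQ only N and Z are modeled, since C comes from the shifter and V is left unchanged.

```python
MASK32 = 0xFFFFFFFF

def _nz(result):
    """N and Z flags for a 32-bit result."""
    return {"N": result >> 31, "Z": int(result == 0)}

def cmp(a, b):
    """CMP a,b: flags from a - b; the result is discarded."""
    r = (a - b) & MASK32
    return {**_nz(r), "C": int(a >= b), "V": ((a ^ b) & (a ^ r)) >> 31}

def cmn(a, b):
    """CMN a,b: flags from a + b, i.e. compare a with -b."""
    total = a + b
    r = total & MASK32
    return {**_nz(r), "C": total >> 32, "V": (~(a ^ b) & (a ^ r) & 0x80000000) >> 31}

def tst(a, b):
    """TST a,b: N and Z from a AND b (Z = 1 when no masked bit is set)."""
    return _nz(a & b)

def teq(a, b):
    """TEQ a,b: N and Z from a EOR b (Z = 1 when a == b)."""
    return _nz(a ^ b)


print(cmp(5, 5))  # {'N': 0, 'Z': 1, 'C': 1, 'V': 0}
```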

Shifts

Shifts move bits one or more places to the left or right; what fills the vacated bit positions depends on the instruction used

[Figure: shift operations]

These shifts can be static (the number of bits to shift by is a constant in the instruction) or dynamic (the number of bits is held in a register)

[Figure: static and dynamic shifts]

Dynamic shifts can be any one of the following

[Figure: dynamic shift types]

The following shifts are in ARM itself

[Figure: shifts available in ARM]
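Here’s a Python sketch of ARM’s shift and rotate operations on 32-bit values: LSL and LSR fill vacated bits with 0s, ASR copies the sign bit, ROR wraps bits around, and RRX rotates right by one bit through the carry flag. The helper names are hypothetical, shift amounts are assumed to be 0 to 31, and the carry-out that ARM also produces is not modeled.

```python
MASK32 = 0xFFFFFFFF

def lsl(x, n):
    """Logical shift left: zeros come in on the right."""
    return (x << n) & MASK32

def lsr(x, n):
    """Logical shift right: zeros come in on the left."""
    return (x & MASK32) >> n

def asr(x, n):
    """Arithmetic shift right: copies of the sign bit come in on the left."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32                # reinterpret as a negative two's complement value
    return (x >> n) & MASK32        # Python's >> on negative ints is arithmetic

def ror(x, n):
    """Rotate right: bits leaving on the right re-enter on the left."""
    n %= 32
    x &= MASK32
    return ((x >> n) | (x << (32 - n))) & MASK32

def rrx(x, carry):
    """Rotate right extended: one bit right, old carry enters at bit 31."""
    return ((carry << 31) | ((x & MASK32) >> 1)) & MASK32


print(hex(asr(0x80000000, 4)))  # 0xf8000000
```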