How to control the flow of a program in x86 assembly
Like most other programming languages, x86 assembly lets us control the flow of a program using dedicated instructions.
This article provides an overview of the instructions that can be used to control the flow of a program.
See the last article in this series, How to diagnose and locate segmentation faults in x86 assembly.
Intro to x86 Disassembly
Using comparison instructions to control applications at the x86 level
The x86 instruction set comes with two commonly used comparison instructions, CMP and TEST. Let us explore the following program to understand how these two instructions work.
global _start
_start:
mov eax, 101
mov ebx, 100
mov ecx, 100
cmp eax, ebx
cmp ebx, ecx
xor eax, eax
test eax, eax
First, let us assemble and link this program using the following commands.
Now, let us load the program in GDB as shown below.
Set up a breakpoint at the entry point of the program and run the program as shown in the following excerpt.
Breakpoint 1 at 0x8049000
gef➤ run
The following instructions move the values into the respective registers specified in the instructions.
0x8049005 <_start+5> mov ebx, 0x64
0x804900a <_start+10> mov ecx, 0x64
Following is the state of the registers after executing the first 3 instructions shown above.
$ebx : 0x64
$ecx : 0x64
$edx : 0x0
$esp : 0xffffd210 → 0x00000001
$ebp : 0x0
$esi : 0x0
$edi : 0x0
$eip : 0x0804900f → <_start+15> cmp eax, ebx
The eflags are as shown below.
Now, let us run the first cmp instruction by typing si and observe the changes to the eflags register.
As we can notice, there is no difference in the flags after executing the first CMP instruction. CMP subtracts the second operand from the first, discards the result and updates the flags: the Zero Flag (ZF) is set if the difference is 0, and the Sign Flag (SF), Parity Flag (PF), Carry Flag (CF), Overflow Flag (OF) and Adjust Flag (AF) are set or cleared depending on the result. In this case, the values in EAX (101) and EBX (100) differ by 1, and that result sets none of these flags.
However, after executing the next cmp instruction, which compares two equal values, the ZERO and PARITY flags are set as shown below.
When a specific flag is set, GEF shows it in upper case letters as shown in the preceding output.
The next instruction xor eax, eax sets eax to 0. Following is the status of registers after executing this instruction.
$ebx : 0x64
$ecx : 0x64
$edx : 0x0
$esp : 0xffffd210 → 0x00000001
$ebp : 0x0
$esi : 0x0
$edi : 0x0
$eip : 0x08049015 → <_start+21> test eax, eax
The next instruction, test eax, eax, performs a bitwise AND of eax with itself, discards the result and updates the flags. Because eax contains 0, the result is 0 and the zero flag is set. Following is the status of eflags after executing this instruction.
The parity flag is also set. It reflects only the least significant byte of the result and is set when that byte has an even number of set bits; here the byte is 0, which has zero set bits.
These instructions can be used to control the flow of the program. For example, we can execute a block of code only if a specific register has the value 0, or only if a comparison with the CMP instruction finds its two operands equal.
Following is a sample use case of cmp instruction.
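The original sample is not reproduced here, so what follows is only a minimal sketch of such a use case, assuming a 32-bit Linux target and NASM syntax as in the rest of this article. The values and the label _exit are illustrative choices, not taken from the article. The block after the jne runs only when the two compared registers are equal.

```nasm
section .text
global _start
_start:
    mov eax, 100        ; illustrative values
    mov ecx, 100
    mov ebx, 0          ; exit status defaults to 0
    cmp eax, ecx        ; eax - ecx = 0, so the zero flag is set
    jne _exit           ; operands differ: skip the block below
    mov ebx, 1          ; runs only when eax == ecx
_exit:
    mov eax, 1          ; sys_exit, status taken from ebx
    int 0x80
```

Once assembled and linked as a 32-bit executable, the program should exit with status 1, which can be checked in the shell with echo $? after running it. Changing one of the two values makes the jne jump and leaves the status at 0.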
Following is a sample use case of test instruction.
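The original sample is not reproduced here, so the following is a minimal sketch under the same assumptions (32-bit Linux, NASM syntax, illustrative label _exit). It uses test eax, eax to check whether a register is zero and runs a block only in that case.

```nasm
section .text
global _start
_start:
    xor eax, eax        ; eax = 0 (stands in for a value computed earlier)
    mov ebx, 0          ; exit status defaults to 0
    test eax, eax       ; eax AND eax; the zero flag is set only when eax is 0
    jnz _exit           ; eax is non-zero: skip the block below
    mov ebx, 1          ; runs only when eax == 0
_exit:
    mov eax, 1          ; sys_exit, status taken from ebx
    int 0x80
```

Unlike cmp eax, 0, test eax, eax needs no immediate operand, which is one reason it is commonly used for zero checks.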
Using jump instructions to control applications at the x86 level
The next set of instructions is the jump instructions. They come in two types: unconditional jumps and conditional jumps. The JMP instruction is an unconditional jump, as it does not rely on any condition being met. All other jump instructions are conditional: whether they jump depends on flags that were set by earlier instructions in the program. Following is an example with both unconditional and conditional jump instructions.
section .data
equal db "eax and ebx are equal"
notequal db "eax and ebx are not equal"
section .text
global _start
_start:
mov eax, 100
mov ebx, 101
cmp eax, ebx
jz _printequal
jmp _printnotequal
_exitprogram:
mov eax, 1
mov ebx, 0
int 0x80
_printequal:
mov eax, 4
mov ebx, 1
mov ecx, equal
mov edx, 21
int 0x80
jmp _exitprogram
_printnotequal:
mov eax, 4
mov ebx, 1
mov ecx, notequal
mov edx, 25
int 0x80
jmp _exitprogram
As we can notice in the preceding program, the entry point of the program is labeled _start. When the program starts its execution, the registers eax and ebx are loaded with 100 and 101. Next, a comparison is done using the CMP instruction. Since the values in eax and ebx are not equal, the ZERO flag will not be set. After that, the jz _printequal instruction is executed. This instruction checks the ZERO flag and jumps to the label _printequal only if the flag is set, so it relies on the result of an earlier instruction such as CMP. In this case, the jump will not be taken. Following is an excerpt taken from GDB at this instruction.
0x804900a <_start+10> cmp eax, ebx
→ 0x804900c <_start+12> je 0x804901c <_printequal> NOT taken [Reason: !(Z)]
0x804900e <_start+14> jmp 0x8049034 <_printnotequal>
GEF shows that the jump is not taken because the ZERO flag is not set. The disassembly prints je rather than jz because the two mnemonics name the same instruction. Since the jump is not taken, control passes to the next instruction, which is an unconditional jump to _printnotequal. Once the code at _printnotequal has executed, another unconditional jump transfers control to _exitprogram, which exits the program gracefully.
Following is a list of conditional jump instructions.
JE (Jump if Equal): This instruction usually follows a CMP instruction and loads the EIP register with the specified address, if operands of the previous cmp instruction are equal.
Example:
mov eax, 10
mov ebx, 10
cmp eax, ebx
je _loc
_loc:
JNE (Jump if Not Equal): This instruction usually follows a CMP instruction and loads the EIP register with the specified address, if operands of the previous cmp instruction are not equal.
Example:
mov eax, 10
mov ebx, 11
cmp eax, ebx
jne _loc
_loc:
JG (Jump if Greater): This instruction usually follows a CMP instruction and loads the EIP register with the specified address, if the first operand is greater than the second operand in the previous cmp instruction. A signed comparison is performed.
Example:
mov eax, 11
mov ebx, 10
cmp eax, ebx
jg _loc
_loc:
JGE (Jump if Greater or Equal): This instruction usually follows a CMP instruction and loads the EIP register with the specified address, if the first operand is greater than or equal to the second operand in the previous cmp instruction. A signed comparison is performed.
Example:
mov eax, 11
mov ebx, 10
cmp eax, ebx
jge _loc
_loc:
JA (Jump if Above): This instruction is the same as JG except that it performs an unsigned comparison.
JAE (Jump if Above or Equal): This instruction is the same as JGE except that it performs an unsigned comparison.
JO (Jump if Overflow): This instruction loads the EIP register with the specified address if the overflow flag is set.
JNO (Jump if Not Overflow): This instruction loads the EIP register with the specified address if the overflow flag is not set.
JZ (Jump if Zero): This instruction loads the EIP register with the specified address if a previous arithmetic operation resulted in the zero flag being set. It is the same instruction as JE.
JNZ (Jump if Not Zero): This instruction loads the EIP register with the specified address if the zero flag is not set. It is the same instruction as JNE.
JS (Jump if Sign): This instruction loads the EIP register with the specified address if a previous arithmetic operation resulted in the sign flag being set.
JNS (Jump if Not Sign): This instruction loads the EIP register with the specified address if the sign flag is not set.
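The difference between the signed jumps (JG, JGE) and the unsigned jumps (JA, JAE) in the list above is easiest to see with a value whose top bit is set. The following is a minimal sketch, assuming 32-bit Linux and NASM syntax; the labels _check_above, _above and _exit are illustrative. It compares 0xFFFFFFFF with 1 twice and records in the exit status which of the two jumps was taken.

```nasm
section .text
global _start
_start:
    mov eax, -1         ; 0xFFFFFFFF: -1 if signed, 4294967295 if unsigned
    mov ecx, 1
    mov ebx, 0          ; exit status collects the observations
    cmp eax, ecx
    jg  _check_above    ; signed: -1 > 1 is false, so the jump is not taken
    or  ebx, 1          ; bit 0: JG was not taken
_check_above:
    cmp eax, ecx        ; compare again, because OR changed the flags
    ja  _above          ; unsigned: 4294967295 > 1 is true, so the jump is taken
    jmp _exit
_above:
    or  ebx, 2          ; bit 1: JA was taken
_exit:
    mov eax, 1          ; sys_exit, status taken from ebx
    int 0x80
```

The program should exit with status 3: the same CMP result is "not greater" for JG and "above" for JA. Both jumps read the flags left by CMP; JG looks at the zero, sign and overflow flags, while JA looks at the carry and zero flags.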
Using function calls to control applications at the x86 level
In x86, the call instruction is used to call a function, and the function returns using the ret instruction. When a function is called, the call instruction pushes the return address (the address of the instruction immediately after the call) onto the stack and then jumps to the function. The call instruction does not create a stack frame by itself; a function that needs one sets it up in its own prologue. When the function finishes, the ret instruction pops the saved address off the stack and execution resumes there. Let us consider the following example.
global _start
_start:
call _print
mov eax, 1
mov ebx, 0
int 0x80
_print:
mov edx,len
mov ecx,msg
mov ebx,1
mov eax,4
int 0x80
ret
section .rodata
msg db 'Hello, world!',0xa
len equ $ - msg
The first instruction after the _start label is a call to _print. When the _print function finishes, its ret instruction returns control to the exit code written immediately after the call _print instruction. Let us see what this looks like in GDB. First, let us assemble and link the program using the following commands.
Load the binary in GDB using the following command.
Set up a breakpoint at the entry point and run the program.
Breakpoint 1 at 0x8049000
gef➤ run
Following are the instructions to be executed.
↳ 0x8049011 <_print+0> mov edx, 0xe
0x8049016 <_print+5> mov ecx, 0x804a000
0x804901b <_print+10> mov ebx, 0x1
0x8049020 <_print+15> mov eax, 0x4
0x8049025 <_print+20> int 0x80
0x8049027 <_print+22> ret
Following is the stack before running the first instruction.
0xffffd214│+0x0004: 0xffffd3c7 → "/home/dev/x86/functions"
0xffffd218│+0x0008: 0x00000000
0xffffd21c│+0x000c: 0xffffd3df → "SHELL=/bin/bash"
0xffffd220│+0x0010: 0xffffd3ef → "SESSION_MANAGER=local/x86-64:@/tmp/.ICE-unix/1760,[...]"
0xffffd224│+0x0014: 0xffffd441 → "QT_ACCESSIBILITY=1"
0xffffd228│+0x0018: 0xffffd454 → "COLORTERM=truecolor"
0xffffd22c│+0x001c: 0xffffd468 → "XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/etc/xdg"
Now, run the call instruction by typing si and observe the top of the stack.
0xffffd210│+0x0004: 0x00000001
0xffffd214│+0x0008: 0xffffd3c7 → "/home/dev/x86/functions"
0xffffd218│+0x000c: 0x00000000
0xffffd21c│+0x0010: 0xffffd3df → "SHELL=/bin/bash"
0xffffd220│+0x0014: 0xffffd3ef → "SESSION_MANAGER=local/x86-64:@/tmp/.ICE-unix/1760,[...]"
0xffffd224│+0x0018: 0xffffd441 → "QT_ACCESSIBILITY=1"
0xffffd228│+0x001c: 0xffffd454 → "COLORTERM=truecolor"
Notice the address placed on the top of the stack after executing the call instruction. What address is this? Let us view the disassembly of _start, which looks as shown below.
Dump of assembler code for function _start:
0x08049000 <+0>: call 0x8049011 <_print>
0x08049005 <+5>: mov eax,0x1
0x0804900a <+10>: mov ebx,0x0
0x0804900f <+15>: int 0x80
End of assembler dump.
gef➤
As we can see in the preceding excerpt, the address placed on the stack is the address of the instruction that immediately follows the call instruction (0x08049005). Let us continue execution until the ret instruction and observe what happens when we are about to execute it.
↳ 0x8049005 <_start+5> mov eax, 0x1
0x804900a <_start+10> mov ebx, 0x0
0x804900f <_start+15> int 0x80
As we can notice in the preceding excerpt, the next instruction to be executed after ret is at the same address that was placed on the stack earlier. So, when the ret instruction is executed, that address is popped from the stack and placed in the EIP register.
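To make the stack behaviour described above concrete, the following sketch rewrites the same program without the call instruction, pushing the return address by hand and then jumping. This is only an illustration of what call does, not a recommended style; the label _return_here is a hypothetical name, and the code assumes 32-bit Linux and NASM syntax.

```nasm
section .text
global _start
_start:
    push _return_here   ; step 1 of call: push the return address
    jmp  _print         ; step 2 of call: transfer control to the target
_return_here:
    mov eax, 1          ; sys_exit
    mov ebx, 0
    int 0x80
_print:
    mov edx, len
    mov ecx, msg
    mov ebx, 1          ; file descriptor 1 (stdout)
    mov eax, 4          ; sys_write
    int 0x80
    ret                 ; pops the address of _return_here into EIP
section .rodata
msg db 'Hello, world!', 0xa
len equ $ - msg
```

The ret instruction does not know or care how the address reached the top of the stack. It simply pops whatever is there into EIP, which is why the stack pointer must be back where it was on entry before ret executes.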
Using loop instructions to control applications at the x86 level
The x86 instruction set provides the loop instruction, which decrements ECX and jumps to the address given as its operand unless the decrement causes ECX to become zero. So, the loop will continue to run until the value of ECX becomes zero. Let us examine the following program.
global _start
_start:
mov eax, 0
mov ecx, 5
_addtoeax:
inc eax
loop _addtoeax
The preceding program sets the registers eax and ecx to 0 and 5 respectively. When control first reaches _addtoeax, the value of eax is incremented and the loop _addtoeax instruction is executed. This instruction decrements ECX by 1 and, because ECX is not yet 0, jumps back to _addtoeax, where eax is incremented once again. The loop continues until ECX becomes 0. On the final pass, ECX is 1 and the increment brings EAX to 5. So, when the loop instruction executes, ECX becomes 0 and the loop terminates there.
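The program above stops at the loop instruction and has no exit code after it. The following sketch, assuming 32-bit Linux and NASM syntax, adds an exit system call that returns the final value of eax as the exit status, so the result of the loop can be checked from the shell.

```nasm
section .text
global _start
_start:
    mov eax, 0
    mov ecx, 5          ; loop counter
_addtoeax:
    inc eax
    loop _addtoeax      ; ecx = ecx - 1, then jump back while ecx != 0
    mov ebx, eax        ; eax is 5 here; use it as the exit status
    mov eax, 1          ; sys_exit
    int 0x80
```

Running the program and then typing echo $? should print 5. Note that the loop instruction decrements ECX before testing it, so entering the loop with ECX already 0 wraps the counter around and runs the body 2^32 times rather than zero times.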
Conclusion:
As discussed in this article, the x86 instruction set offers several different instructions to control the flow of a program. Depending on the requirement, we can choose whichever of these instructions is appropriate.