How to build a program and execute an application entirely built in x86 assembly
In the previous article, we discussed an overview of common x86 instructions which can be used in writing assembly programs. With the knowledge we have gained so far, we are in a good position to begin writing our first assembly program.
This article provides an overview of how to build a program and execute an application entirely built in x86 assembly.
Input and output: x86 system calls
Operating systems contain routines to perform various low-level operations. If we want to invoke these operating system routines from our program, we need to invoke system calls. A system call is a bridge between the user program and the operating system routine. If we want to write a string to the output console, instead of writing the routine from scratch every time, we can make use of a routine that already exists in the operating system. This can be achieved using a system call.
According to Wikipedia, "A system call is how a program requests a service from an operating system's kernel. This may include hardware-related services (e.g., accessing the hard disk), creating and executing new processes, and communicating with integral kernel services (like scheduling). System calls provide an essential interface between a process and the operating system."
On an Ubuntu 20.04 Desktop x64 build, the following file lists all available x86 system calls and their associated system call numbers.
Now, let us extract a few system calls and their numbers.
The following excerpt shows the READ system call, which can be used to get user input.
#define __NR_read 3
#define __NR_readlink 85
#define __NR_readdir 89
#define __NR_readv 145
#define __NR_pread64 180
#define __NR_readahead 225
#define __NR_set_thread_area 243
#define __NR_get_thread_area 244
#define __NR_readlinkat 305
#define __NR_preadv 333
#define __NR_process_vm_readv 347
#define __NR_preadv2 378
As the first entry shows, the READ system call has the system call number 3. The remaining entries are other system calls whose names happen to contain "read". Similarly, the following excerpt shows the WRITE system call, which can be used to write output to the console.
#define __NR_write 4
#define __NR_writev 146
#define __NR_pwrite64 181
#define __NR_pwritev 334
#define __NR_process_vm_writev 348
#define __NR_pwritev2 379
As the first entry shows, the write system call has the system call number 4. When we want to use these system calls in our x86 assembly programs, we should use their respective numbers.
Similarly, the following excerpt shows the exit system call.
#define __NR_exit 1
#define __NR_exit_group 252
As we can see, the exit system call has the syscall number 1.
Hello World! Creating the usual Hello World in x86
Now that we understand what system calls are and how their numbers can be found, let us write a simple program in x86 assembly to print the string Hello World!
It should be noted that we will use the write syscall to print the string Hello World! To better understand the arguments and other data this syscall requires, we can read the man page as shown in the following command.
Following is an excerpt taken out from the output of the preceding command.
The write function requires 3 arguments. The first argument is the file descriptor, which is stdout in this case and takes the value 1. The second argument is the buffer, a pointer to the message we want to print. The third argument is the number of bytes to write, which here is the length of the string. When invoking system calls, we need to place these values in the appropriate registers. The following is the standard pattern for 32-bit x86 Linux programs that invoke system calls with int 0x80.
The syscall number goes into the EAX register. The first argument goes into EBX, the second argument goes into ECX, and the third argument goes into EDX.
With all these details considered, the following is the program that prints the string Hello, world! to the output console.
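section .text
global _start
_start:
    mov ebx,1           ; first argument: file descriptor 1 (stdout)
    mov ecx,msg         ; second argument: pointer to the string
    mov edx,len         ; third argument: number of bytes to write
    mov eax,4           ; syscall number for write
    int 0x80            ; invoke the system call
section .rodata
    msg db 'Hello, world!',0xa    ; the string followed by a newline
    len equ $ - msg               ; current position minus start of msg
The program has two sections: .text and .rodata. The .rodata section defines a string under the label msg. The write routine also requires the length of the string, so instead of hardcoding it we let the assembler compute it: $ is the current position, and subtracting the address of msg gives the number of bytes in the string, which is assigned to the symbol len.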
In the .text section, the global directive exports the _start label, which is the entry point of the program. Within the entry point, we place the value 1 for stdout into EBX, a pointer to the string into ECX, and the length of the string into EDX. Lastly, we place the syscall number 4 into the EAX register. To invoke the syscall, we execute the instruction int 0x80.
We can then assemble and link the program using the following commands:
Once done, we can run the program as shown below.
Hello, world!
Segmentation fault (core dumped)
$
As we can notice in the preceding output, the string Hello, world! is printed. However, we should also notice the segmentation fault. We will discuss more about segmentation faults and how to identify the reasons for them in a later article.
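One likely cause is worth sketching here, since the exit system call (number 1) was listed earlier but is never used by the program above: after int 0x80 returns, nothing ends the process, so the CPU keeps executing whatever bytes follow the last instruction. The variant below is a minimal sketch under that assumption, not the fix presented later in this series. It adds an exit call with status 0 and otherwise leaves the program unchanged. The file name is hypothetical.

```nasm
; hello_exit.asm (hypothetical file name)
; Assumes 32-bit Linux system calls via int 0x80, NASM syntax.
section .text
global _start
_start:
    mov eax, 4          ; syscall number for write
    mov ebx, 1          ; file descriptor 1 (stdout)
    mov ecx, msg        ; pointer to the string
    mov edx, len        ; number of bytes to write
    int 0x80            ; invoke write

    mov eax, 1          ; syscall number for exit
    mov ebx, 0          ; exit status 0
    int 0x80            ; invoke exit; control never returns here

section .rodata
    msg db 'Hello, world!', 0xa
    len equ $ - msg
```

Assuming NASM and GNU ld on a 64-bit host, this can typically be built as a 32-bit executable with nasm -f elf32 hello_exit.asm -o hello_exit.o followed by ld -m elf_i386 hello_exit.o -o hello_exit. Run this way, it should print the string and return to the shell without the segmentation fault.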
Strings/ASCII: How to work with strings and ASCII in x86
In this section, let us extend our previous Hello, world! program to ask for input from the user and then print the entered text back to the screen.
Following is the program written in x86 assembly to achieve this.
section .data
    question db "What is your name? "
    greeting db "Hello, "
section .bss
    input resb 24               ; buffer for the user's input
section .text
global _start
_start:
    call _printQuestion
    call _getInput
    call _printGreeting
    call _printInput
_getInput:
    mov eax, 3                  ; syscall number for read
    mov ebx, 0                  ; file descriptor 0 (stdin)
    mov ecx, input              ; buffer that receives the input
    mov edx, 24                 ; maximum number of bytes to read
    int 0x80
    ret
_printQuestion:
    mov eax, 4                  ; syscall number for write
    mov ebx, 1                  ; file descriptor 1 (stdout)
    mov ecx, question
    mov edx, 19                 ; length of question
    int 0x80
    ret
_printGreeting:
    mov eax, 4
    mov ebx, 1
    mov ecx, greeting
    mov edx, 7                  ; length of greeting
    int 0x80
    ret
_printInput:
    mov eax, 4
    mov ebx, 1
    mov ecx, input
    mov edx, 24                 ; writes the whole buffer
    int 0x80
    ret
First, we defined the labels question and greeting, which hold the strings we want to print on the screen, within the .data section, as shown below.
section .data
question db "What is your name? "
greeting db "Hello, "
Next, the .bss section is used to reserve 24 bytes as shown below.
section .bss
input resb 24
Note that the .bss (block starting symbol) is the portion of an object file, executable, or assembly language code that contains statically allocated variables that are declared but have not been assigned a value yet.
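To make the distinction concrete, here is a minimal sketch that is separate from the program above. The labels filled and scratch and the file name are hypothetical. It places an initialized string in .data, reserves an uninitialized buffer in .bss, fills that buffer at run time, and prints both. It assumes 32-bit Linux system calls via int 0x80 and ends with the exit syscall (number 1) listed earlier.

```nasm
; bss_demo.asm (hypothetical file name)
; Assumes 32-bit Linux system calls via int 0x80, NASM syntax.
section .data
    filled     db 'from .data', 0xa   ; initialized: these bytes are stored in the executable
    filled_len equ $ - filled

section .bss
    scratch    resb 8                 ; reserved only: no initial value is given in the source

section .text
global _start
_start:
    ; Print the initialized string.
    mov eax, 4
    mov ebx, 1
    mov ecx, filled
    mov edx, filled_len
    int 0x80

    ; Give the reserved buffer a value at run time, then print it.
    mov byte [scratch], 'A'
    mov byte [scratch + 1], 0xa
    mov eax, 4
    mov ebx, 1
    mov ecx, scratch
    mov edx, 2                        ; only the two bytes written above
    int 0x80

    ; Terminate with status 0.
    mov eax, 1
    mov ebx, 0
    int 0x80
```

The point of the sketch is that db emits bytes into the output file, while resb only sets aside space that the program must fill before the contents are meaningful.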
Next is the .text section with code to read input and write the output. As we can notice, this section has 4 subroutines as shown below.
section .text
global _start
_start:
    call _printQuestion
    call _getInput
    call _printGreeting
    call _printInput
All the subroutines except for _getInput are similar to the Hello, world! program we wrote earlier as they are just used to write output to the screen. So, let us focus on _getInput subroutine in this section. Following is the assembly code written.
_getInput:
    mov eax, 3
    mov ebx, 0
    mov ecx, input
    mov edx, 24
    int 0x80
    ret
As we can notice, the EAX register holds the value 3, which is the syscall number for read. The registers EBX, ECX and EDX hold the arguments required for the read system call. EBX holds 0, which is the file descriptor for stdin. ECX points to input, the buffer that will receive the user-supplied data. EDX contains the maximum number of bytes to read, which is 24 in this case. Finally, we use int 0x80 to invoke the read syscall. After the syscall completes, the ret instruction returns control to the instruction that follows call _getInput in _start, which is the next call instruction.
The following commands can be used to assemble and link this program.
Once the executable file is produced, we can run it and the output looks as follows.
What is your name? infosec
Hello, infosec
Segmentation fault (core dumped)
$
Once again, there is a segmentation fault after the program completed its execution. We will discuss the reasons and the solutions in a later article.
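For readers who want to experiment before then, the following is a hedged sketch of how the greeting program might be tightened. It is not the solution given later in the series. It makes three changes, all of them assumptions: it calls exit (syscall number 1) after the last subroutine so execution does not run on into _getInput, it computes the string lengths with $ as the Hello, world! program did instead of hardcoding 19 and 7, and it prints only the number of bytes that read reported in EAX rather than all 24. The symbols question_len, greeting_len and input_len and the file name are hypothetical.

```nasm
; greet_exit.asm (hypothetical file name)
; Assumes 32-bit Linux system calls via int 0x80, NASM syntax.
section .data
    question     db "What is your name? "
    question_len equ $ - question   ; computed by the assembler
    greeting     db "Hello, "
    greeting_len equ $ - greeting

section .bss
    input        resb 24            ; buffer for the user's input
    input_len    resd 1             ; number of bytes read returned

section .text
global _start
_start:
    call _printQuestion
    call _getInput
    call _printGreeting
    call _printInput

    mov eax, 1                      ; syscall number for exit
    mov ebx, 0                      ; exit status 0
    int 0x80

_getInput:
    mov eax, 3                      ; syscall number for read
    mov ebx, 0                      ; stdin
    mov ecx, input
    mov edx, 24                     ; read at most 24 bytes
    int 0x80
    test eax, eax                   ; EAX holds the byte count, or a negative error code
    jns .store
    xor eax, eax                    ; on error, treat the input as empty
.store:
    mov [input_len], eax
    ret

_printQuestion:
    mov eax, 4                      ; syscall number for write
    mov ebx, 1                      ; stdout
    mov ecx, question
    mov edx, question_len
    int 0x80
    ret

_printGreeting:
    mov eax, 4
    mov ebx, 1
    mov ecx, greeting
    mov edx, greeting_len
    int 0x80
    ret

_printInput:
    mov eax, 4
    mov ebx, 1
    mov ecx, input
    mov edx, [input_len]            ; write only what was read
    int 0x80
    ret
```

Input longer than 24 bytes is still truncated by the read call, and the remainder is left unread. This sketch does not attempt to handle that case.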
Conclusion
This article has provided foundational knowledge of how to build a program and execute an application entirely built in x86 assembly. Along the way, it covered related concepts such as system calls, handling strings, declaring data, reading user input and writing data to the terminal. These concepts are the fundamental building blocks of x86 assembly programs, and they will come in handy when writing more complex ones.
See the next article in the series, Debugging your first x86 program.