ELF File Format
Executable and Linkable Format (ELF) is the standard object file format on Linux and most other UNIX-like operating systems. This post gives a big-picture introduction to the ELF file format on Linux.
We will use the readelf tool from GNU Binutils to inspect the ELF files built from the following C program.
[c]
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("HELLO WORLD!\n");
    return 0;
}
Save the code in a file named test.c, then compile it into an object file (*.o) and link it into an executable with the commands below.
[plain]
$ gcc test.c -c
$ gcc test.o -o test
The first command produces the relocatable object file test.o, and the second links it into the executable test.
File Types
Below are the four main types of ELF files:
Relocatable files: created by the compiler and assembler, usually with a .o extension. The linker combines them to produce executables or shared libraries.
Executable files: created by the linker with all relocations done and all symbols resolved, except for shared library symbols, which are resolved at run time. An executable also specifies how exec creates the program's process image.
Shared object files: created by the linker; they contain the code and symbol information needed for linking. The linker can combine a shared object with other relocatable and shared objects to create another object file, and the dynamic linker combines it with an executable and other shared objects to create a process image.
Core files: core dump files, which record the memory image and register state of a process, typically when it crashes.
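To make the four types concrete, here is a minimal Python 3 sketch (an illustration, not how file or readelf are implemented) that reads the e_type field of the ELF header: a 16-bit value at byte offset 16, stored in the byte order given by e_ident[EI_DATA] (byte 5). The helper name elf_file_type is hypothetical; the numeric ET_* values come from the ELF specification and <elf.h>.

```python
import struct

# ET_* values from the ELF specification (<elf.h>); names similar to readelf's Type line
ELF_TYPES = {
    0: "NONE (None)",
    1: "REL (Relocatable file)",
    2: "EXEC (Executable file)",
    3: "DYN (Shared object file)",
    4: "CORE (Core file)",
}


def elf_file_type(data: bytes) -> str:
    """Return a readable name for the e_type field of an ELF header (hypothetical helper)."""
    if len(data) < 18 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian
    order = {1: "<", 2: ">"}.get(data[5])
    if order is None:
        raise ValueError("unknown data encoding")
    # e_type is a 16-bit field right after the 16-byte e_ident array
    (e_type,) = struct.unpack_from(order + "H", data, 16)
    return ELF_TYPES.get(e_type, f"unknown ({e_type:#x})")
```

For example, `elf_file_type(open("test.o", "rb").read(64))` should report REL. For test it should report EXEC, or DYN if gcc built a position-independent executable, which is the default on many current distributions.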
We can check whether a file is an ELF file with the file command, as shown below.
$ file test.o test
The file command gives only brief information about the two files. The readelf tool gives much more detail:
$ readelf -h test
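The -h option displays the ELF file header. The magic number (the first four bytes, 0x7f followed by "ELF") identifies the file as an ELF file, and the Type field tells us which ELF file type it is: REL for test.o, and EXEC for test (or DYN if gcc produced a position-independent executable).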
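The Magic, Class and Data lines of readelf -h all come from the first 16 bytes of the file, the e_ident array. A minimal Python 3 sketch of decoding them follows; describe_ident and its dictionary keys are hypothetical names chosen for illustration, and only the byte values (0x7f 'E' 'L' 'F', ELFCLASS32/ELFCLASS64 = 1/2, ELFDATA2LSB/ELFDATA2MSB = 1/2) come from the ELF specification.

```python
# e_ident[EI_CLASS] (byte 4) and e_ident[EI_DATA] (byte 5) values
CLASSES = {1: "ELF32", 2: "ELF64"}
ENCODINGS = {1: "2's complement, little endian",
             2: "2's complement, big endian"}


def describe_ident(data: bytes) -> dict:
    """Check the magic number and decode the e_ident fields readelf -h shows."""
    ident = data[:16]
    if len(ident) < 16 or ident[:4] != b"\x7fELF":
        raise ValueError("bad magic number: not an ELF file")
    return {
        # Space-separated hex bytes, as on readelf's "Magic:" line
        "magic": " ".join(f"{b:02x}" for b in ident),
        "class": CLASSES.get(ident[4], "invalid"),
        "data": ENCODINGS.get(ident[5], "invalid"),
        "version": ident[6],  # EI_VERSION; 1 means EV_CURRENT
    }
```

Passing the first 16 bytes of test should give the same class and data encoding that readelf -h reports.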
The ELF header is defined in <elf.h> as a C structure: Elf32_Ehdr for 32-bit files and Elf64_Ehdr for 64-bit files.
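As a sketch of that layout, the Python 3 code below unpacks a 64-bit header with the struct module, using the Elf64_Ehdr field names from <elf.h>. It assumes a 64-bit file (32-bit files use Elf32_Ehdr, whose address and offset fields are 4 bytes wide); the namedtuple and parse_elf64_header are hypothetical helpers, not part of any library.

```python
import struct
from collections import namedtuple

# Field names and order follow Elf64_Ehdr in <elf.h>
Elf64_Ehdr = namedtuple("Elf64_Ehdr", [
    "e_ident", "e_type", "e_machine", "e_version", "e_entry",
    "e_phoff", "e_shoff", "e_flags", "e_ehsize", "e_phentsize",
    "e_phnum", "e_shentsize", "e_shnum", "e_shstrndx",
])

# 16s = e_ident, H = Elf64_Half (2 bytes), I = Elf64_Word (4 bytes),
# Q = Elf64_Addr / Elf64_Off (8 bytes); 64 bytes in total, no padding
EHDR64_FORMAT = "16sHHIQQQIHHHHHH"


def parse_elf64_header(data: bytes) -> Elf64_Ehdr:
    """Unpack the 64-byte ELF header at the start of a 64-bit ELF file."""
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2:
        raise ValueError("not a 64-bit ELF file")
    # Byte order comes from e_ident[EI_DATA]: 1 = little, 2 = big endian
    order = {1: "<", 2: ">"}.get(data[5])
    if order is None:
        raise ValueError("unknown data encoding")
    fields = struct.unpack_from(order + EHDR64_FORMAT, data, 0)
    return Elf64_Ehdr._make(fields)
```

For example, `parse_elf64_header(open("test", "rb").read(64)).e_shnum` should match the "Number of section headers" line printed by readelf -h test.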
ELF sections
The section header table contains zero or more section headers, each describing one section of the file: its name, type, flags, address, file offset and size. The linker works from this table when it combines object files; at run time, the loader uses the program header table instead. The -S option of readelf lists the section headers, so let's use it to see all the sections in our binary:
$ readelf -S test
Its output lets us walk through the structure of the ELF file, with the address and file offset of every section.
As the output shows, every section has a name and a type. The most important types are:
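- PROGBITS: holds information defined by the program, such as code (.text) or initialized data (.data).
- SYMTAB: holds the full symbol table (.symtab).
- REL: holds relocation entries without explicit addends (for example .rel.text). x86-64 uses RELA sections with explicit addends instead, such as .rela.dyn and .rela.plt.
- NOBITS: occupies no space in the file but does occupy memory at run time; .bss, for example, is zero-filled when the program is loaded.
- STRTAB: holds a string table, such as .strtab, .dynstr or .shstrtab (section names).
- DYNAMIC: holds the information needed for dynamic linking with shared libraries (the .dynamic section).
- NULL: marks an inactive section header with no associated section; the first entry of the section header table (index 0) is always of this type.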
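The readelf -S listing can be approximated in a few lines. This minimal Python 3 sketch assumes a 64-bit little-endian ELF file: it reads e_shoff, e_shentsize, e_shnum and e_shstrndx from the ELF header, unpacks each Elf64_Shdr, and looks up section names in the section-name string table (.shstrtab). list_sections is a hypothetical helper; only a handful of standard SHT_* types are named (others, such as GNU-specific types, are shown in hex), and extended section numbering is not handled.

```python
import struct

# A few SHT_* section type values from <elf.h>; others are shown in hex
SHT_NAMES = {
    0: "NULL", 1: "PROGBITS", 2: "SYMTAB", 3: "STRTAB", 4: "RELA",
    5: "HASH", 6: "DYNAMIC", 7: "NOTE", 8: "NOBITS", 9: "REL",
    11: "DYNSYM", 14: "INIT_ARRAY", 15: "FINI_ARRAY",
}

SHDR64_FORMAT = "<IIQQQQIIQQ"  # Elf64_Shdr, 64 bytes, little-endian


def list_sections(data: bytes):
    """Return (name, type, offset, size) for each entry of the section header table."""
    # ELF header: e_shoff at byte 40; e_shentsize, e_shnum, e_shstrndx at 58, 60, 62
    (e_shoff,) = struct.unpack_from("<Q", data, 40)
    e_shentsize, e_shnum, e_shstrndx = struct.unpack_from("<HHH", data, 58)
    if e_shnum == 0:
        return []  # no section headers (or extended numbering, not handled here)

    headers = [struct.unpack_from(SHDR64_FORMAT, data, e_shoff + i * e_shentsize)
               for i in range(e_shnum)]

    # sh_offset and sh_size (fields 4 and 5) of the section-name string table
    names_off, names_size = headers[e_shstrndx][4], headers[e_shstrndx][5]
    names = data[names_off:names_off + names_size]

    sections = []
    for sh_name, sh_type, _flags, _addr, sh_offset, sh_size, *_rest in headers:
        end = names.index(b"\x00", sh_name)  # names are NUL-terminated
        sections.append((names[sh_name:end].decode(),
                         SHT_NAMES.get(sh_type, hex(sh_type)),
                         sh_offset, sh_size))
    return sections
```

For example, `for s in list_sections(open("test", "rb").read()): print(*s)` prints one line per section header, in the same order as readelf -S test.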
Dynamic linking of ELF files: an ELF executable can be linked against shared objects (.so files). For example, printf is resolved from the C library, which is libc.so.6 on most current glibc-based systems. This information is stored in the dynamic section.
The idea of a shared library is to take code that would otherwise live in a static library and pre-link it into a special kind of ELF file, a shared object. When you link your program against a shared library, the linker only records the fact that you are calling a function in that library; it does not copy any executable code out of it. Instead, the linker adds entries to the executable that say which shared libraries are needed, along with the path of the dynamic linker (the .interp section). When you run the program, the kernel maps the executable into a new address space and starts the dynamic linker, which maps the required shared libraries into the same address space and performs the relocations that connect the program to them before the program's own code runs.
The ldd program prints the shared libraries that a program depends on:
$ ldd test
linux-gate.so.1 => (0xb77be000)
libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xb75f3000)
The nm tool lists the symbols in a program, both those it defines and those it imports from shared libraries (imported symbols are marked U, undefined).
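Symbol tables are arrays of fixed-size entries. The minimal Python 3 sketch below decodes Elf64_Sym entries (st_name, st_info, st_other, st_shndx, st_value, st_size) from the raw bytes of a symbol table section and its string table, which is roughly the information nm prints. It assumes a 64-bit little-endian file; read_symbols is a hypothetical helper, and a symbol whose section index is SHN_UNDEF (0) is treated as undefined, the case nm marks with U.

```python
import struct

SYM64_FORMAT = "<IBBHQQ"  # Elf64_Sym: 24 bytes, little-endian
SYM64_SIZE = struct.calcsize(SYM64_FORMAT)
BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}  # STB_* values
SHN_UNDEF = 0  # section index of undefined (imported) symbols


def read_symbols(symtab: bytes, strtab: bytes):
    """Yield (name, value, binding, defined) for each symbol table entry."""
    for off in range(0, len(symtab) - SYM64_SIZE + 1, SYM64_SIZE):
        st_name, st_info, _other, st_shndx, st_value, _size = \
            struct.unpack_from(SYM64_FORMAT, symtab, off)
        end = strtab.index(b"\x00", st_name)  # st_name is an offset into strtab
        binding = st_info >> 4                # same as ELF64_ST_BIND(st_info)
        yield (strtab[st_name:end].decode(),
               st_value,
               BINDINGS.get(binding, str(binding)),
               st_shndx != SHN_UNDEF)
```

The symtab and strtab arguments are byte slices of the file taken from each section's sh_offset and sh_size; the matching string table is the section whose index is stored in the symbol table's sh_link field.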
The dynamic section is an array of the following structures, declared in <elf.h> for 32-bit and 64-bit files respectively:
[plain]
typedef struct {
Elf32_Sword d_tag;
union {
Elf32_Word d_val;
Elf32_Addr d_ptr;
} d_un;
} Elf32_Dyn;
typedef struct {
Elf64_Sxword d_tag;
union {
Elf64_Xword d_val;
Elf64_Addr d_ptr;
} d_un;
} Elf64_Dyn;
extern Elf64_Dyn _DYNAMIC[];
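Because Elf64_Dyn is just a signed 64-bit tag followed by a 64-bit value, the dynamic section can be walked as a flat array. The minimal Python 3 sketch below collects the DT_NEEDED entries, the names of the shared libraries the executable was linked against, resolving them through the dynamic string table (.dynstr). It assumes a 64-bit little-endian file and that the raw .dynamic and .dynstr bytes have already been sliced out of the file; needed_libraries is a hypothetical helper.

```python
import struct

DT_NULL, DT_NEEDED = 0, 1  # d_tag values from <elf.h>
DYN64_FORMAT = "<qQ"       # Elf64_Dyn: Sxword d_tag, then the d_val/d_ptr union
DYN64_SIZE = struct.calcsize(DYN64_FORMAT)  # 16 bytes


def needed_libraries(dynamic: bytes, dynstr: bytes):
    """Return the library names listed in DT_NEEDED entries, in order."""
    libs = []
    for off in range(0, len(dynamic) - DYN64_SIZE + 1, DYN64_SIZE):
        d_tag, d_val = struct.unpack_from(DYN64_FORMAT, dynamic, off)
        if d_tag == DT_NULL:    # DT_NULL marks the end of the array
            break
        if d_tag == DT_NEEDED:  # d_val is an offset into .dynstr
            end = dynstr.index(b"\x00", d_val)
            libs.append(dynstr[d_val:end].decode())
    return libs
```

This lists only direct dependencies. ldd goes further: it reports every library the dynamic linker actually loads, including indirect dependencies, with paths and load addresses. The vDSO (linux-gate.so.1 in the ldd output above) is supplied by the kernel and has no DT_NEEDED entry.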
For scripting purposes, we can use the pylibelf Python module to parse ELF files:
[plain]
>>> import pylibelf
>>> ELF = pylibelf.ELF("/bin/bash")
--> Building ELF from file.
--> File is ELF64.
--> Data: LSB.
>>> dir(ELF)
>>> for i in ELF.ShdrTable:
...     print(i.sectionName)
...
.interp
.note.ABI-tag
.note.gnu.build-id
.gnu.hash
.dynsym
.dynstr
.gnu.version
.gnu.version_r
.rela.dyn
.rela.plt
.init
.plt
.text
.fini
.rodata
.eh_frame_hdr
.eh_frame
.init_array
.fini_array
.jcr
.dynamic
.got
.got.plt
.data
.bss
.gnu_debuglink
.shstrtab
>>> dir(ELF.getSectionByName(".text"))
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__len__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_fields', 'getFields', 'getType', 'parse', 'sectionName', 'sectionRawData', 'sh_addr', 'sh_addralign', 'sh_entsize', 'sh_flags', 'sh_info', 'sh_link', 'sh_name', 'sh_offset', 'sh_size', 'sh_type', 'shouldPack', 'sizeof']