Sunday, November 22, 2009

Notes on ELF executables and dynamic linking.

Getting a good grasp of the ELF file format is, IMO, crucial to understanding dynamic linking mechanisms.
This article explains how dynamic linking works and which features of ELF support it.

[Figure: Wikipedia's diagram of an ELF file]
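An ELF file mainly consists of three types of headers:
1. The ELF header, which records where the other headers are located and how large they are.
2. The program headers, which describe the segments the loader maps into memory.
3. The section headers, which describe the individual sections such as .text and .plt.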

The ELF header:
You can view it by running "readelf -h obj.o". The ELF header contains the architecture type, the file offsets at which the program and section headers begin (e_phoff, e_shoff), the size of a single program/section header (e_phentsize, e_shentsize), how many of each there are (e_phnum, e_shnum), and related information.
A special section called the section header string table (described later) holds the section names; the index of its section header (e_shstrndx) is also stored in the ELF header.

The section header:
- Each section header has a fixed size (e_shentsize in the ELF header).
- The section headers are stored one after another, starting at a particular file offset (e_shoff in the ELF header).
- Each section header has its own index. Starting at that offset, you can step from one section header to the next by moving e_shentsize bytes at a time; the number of steps taken so far is the index of the current section header.
- The name of a section is not stored in its section header but in a special section called ".shstrtab"; each header only records an offset into that section (sh_name). The index of the .shstrtab section's own header is stored in the ELF header itself (e_shstrndx).
You can dump all the section headers using readelf:
readelf -S myobj.o
(Skip this paragraph if you're not interested in how the section headers are retrieved.) This works roughly as follows. The tool goes to the offset at which the section headers start and jumps straight to the header of the .shstrtab section (section header offset + shstrndx * size of one section header). That header gives the file offset of the .shstrtab section, from which it fetches the names of all the sections. Once it has the names, it goes back to the start of the section headers and parses them one by one, looking up each section's name through the offset stored in its header.

Each section is identified by its index, as you can see:
readelf -S /bin/cat
There are 27 section headers, starting at offset 0x71bc:

Section Headers:
[Nr] Name Type Addr Off Size ES Flg Lk Inf Al
[ 0] NULL 00000000 000000 000000 00 0 0 0
[ 1] .interp PROGBITS 08048134 000134 000013 00 A 0 0 1
[ 2] .note.ABI-tag NOTE 08048148 000148 000020 00 A 0 0 4
[ 3] .hash HASH 08048168 000168 000158 04 A 5 0 4
[ 4] .gnu.hash GNU_HASH 080482c0 0002c0 000030 04 A 5 0 4
[ 5] .dynsym DYNSYM 080482f0 0002f0 0002f0 10 A 6 1 4
[ 6] .dynstr STRTAB 080485e0 0005e0 000202 00 A 0 0 1
[ 7] .gnu.version VERSYM 080487e2 0007e2 00005e 02 A 5 0 2
[ 8] .gnu.version_r VERNEED 08048840 000840 000080 00 A 6 1 4
[ 9] .rel.dyn REL 080488c0 0008c0 000020 08 A 5 0 4
[10] .rel.plt REL 080488e0 0008e0 000150 08 A 5 12 4
[11] .init PROGBITS 08048a30 000a30 000030 00 AX 0 0 4
[12] .plt PROGBITS 08048a60 000a60 0002b0 04 AX 0 0 4
[13] .text PROGBITS 08048d10 000d10 004cbc 00 AX 0 0 16
[14] .fini PROGBITS 0804d9cc 0059cc 00001c 00 AX 0 0 4
[15] .rodata PROGBITS 0804da00 005a00 000dfc 00 A 0 0 32
[16] .eh_frame PROGBITS 0804e7fc 0067fc 000004 00 A 0 0 4
[17] .ctors PROGBITS 0804ff08 006f08 000008 00 WA 0 0 4
[18] .dtors PROGBITS 0804ff10 006f10 000008 00 WA 0 0 4
[19] .jcr PROGBITS 0804ff18 006f18 000004 00 WA 0 0 4
[20] .dynamic DYNAMIC 0804ff1c 006f1c 0000d0 08 WA 6 0 4
[21] .got PROGBITS 0804ffec 006fec 000008 04 WA 0 0 4
[22] .got.plt PROGBITS 0804fff4 006ff4 0000b4 04 WA 0 0 4
[23] .data PROGBITS 080500a8 0070a8 000038 00 WA 0 0 4
[24] .bss NOBITS 080500e0 0070e0 000184 00 WA 0 0 32
[25] .gnu_debuglink PROGBITS 00000000 0070e0 000008 00 0 0 1
[26] .shstrtab STRTAB 00000000 0070e8 0000d1 00 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings)
I (info), L (link order), G (group), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
The Addr column is the virtual address at which the section is loaded (00000000 for sections such as .shstrtab that are not loaded into memory).
The Off column is the offset of the start of the section within the ELF file.
The Size column is the size of the section in bytes.
Let's describe the sections necessary for dynamic linking briefly, and then join everything together.

The .plt section
A quick disassembly of the .plt section of the 'func' program in my last post reveals:
joel@joel-laptop:~/repository/asm$ objdump -d -x -M intel func
Disassembly of section .plt:

080482c8 <__gmon_start__@plt-0x10>:
80482c8: ff 35 f8 9f 04 08 push DWORD PTR ds:0x8049ff8
80482ce: ff 25 fc 9f 04 08 jmp DWORD PTR ds:0x8049ffc
80482d4: 00 00 add BYTE PTR [eax],al
...

080482d8 <__gmon_start__@plt>:
80482d8: ff 25 00 a0 04 08 jmp DWORD PTR ds:0x804a000
80482de: 68 00 00 00 00 push 0x0
80482e3: e9 e0 ff ff ff jmp 80482c8 <_init+0x30>

080482e8 <__libc_start_main@plt>:
80482e8: ff 25 04 a0 04 08 jmp DWORD PTR ds:0x804a004
80482ee: 68 08 00 00 00 push 0x8
80482f3: e9 d0 ff ff ff jmp 80482c8 <_init+0x30>

080482f8 <printf@plt>:
80482f8: ff 25 08 a0 04 08 jmp DWORD PTR ds:0x804a008
80482fe: 68 10 00 00 00 push 0x10
8048303: e9 c0 ff ff ff jmp 80482c8 <_init+0x30>
This is called the procedure linkage table (PLT, also called the jump table). It consists of stub functions in place of the shared library functions, which are linked dynamically at run time and are therefore not present in our ELF file at compile time. Notice the function printf, which is simply a stub that jumps to another address: since printf lives in the standard C library, our ELF file contains only this stub. The address each PLT stub in turn jumps to is read from the GOT (global offset table), which is described shortly. For example, in the assembly code above, the memory at 0x804a008 holds the actual address of printf once the dynamic linker has resolved it.

The GOT section:
joel@joel-laptop:~/repository/asm$ readelf -S func|grep .got
[21] .got PROGBITS 08049ff0 000ff0 000004 04 WA 0 0 4
[22] .got.plt PROGBITS 08049ff4 000ff4 000018 04 WA 0 0 4
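The global offset tables are sets of memory locations that hold the addresses of functions and variables in the loaded shared libraries. Their values are patched by the dynamic loader/linker: .got entries are filled in when the application is loaded, while the .got.plt entries used by the PLT are, with lazy binding (glibc's default unless LD_BIND_NOW is set or the program was linked with -z now), filled in the first time each function is called. Notice that the PLT stubs jump through exactly these addresses.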

The disassembly of the plt code for printf:
080482f8 <printf@plt>:
80482f8: ff 25 08 a0 04 08 jmp DWORD PTR ds:0x804a008
80482fe: 68 10 00 00 00 push 0x10
8048303: e9 c0 ff ff ff jmp 80482c8 <_init+0x30>
Now go back to the .got.plt section header above: it starts at 0x08049ff4 and is 0x18 bytes long, so it ends at 0x0804a00c and 0x804a008 falls inside it. In other words, 0x804a008 is the virtual address of a GOT entry.

The relocation section:
joel@joel-laptop:~/repository/asm$ readelf -r func

Relocation section '.rel.dyn' at offset 0x278 contains 1 entries:
Offset Info Type Sym.Value Sym. Name
08049ff0 00000106 R_386_GLOB_DAT 00000000 __gmon_start__

Relocation section '.rel.plt' at offset 0x280 contains 3 entries:
Offset Info Type Sym.Value Sym. Name
0804a000 00000107 R_386_JUMP_SLOT 00000000 __gmon_start__
0804a004 00000207 R_386_JUMP_SLOT 00000000 __libc_start_main
0804a008 00000307 R_386_JUMP_SLOT 00000000 printf
The dynamic loader/linker goes through this table to see which addresses have to be patched with which symbols. For example, the address of the symbol printf belongs at 0x0804a008 in the GOT, so the linker places the virtual address of printf at 0x0804a008. It gets this address by looking printf up in the symbol tables of the loaded shared libraries.

[Figure: the full picture of dynamic linking and function calling]

Closing note
By the way, shared libraries have to be compiled as position-independent code (gcc's -fPIC option) so that the loader can load them anywhere in virtual memory. In the days of the a.out file format, libraries were loaded at fixed addresses and conflicts could arise! The ELF format is much more flexible in this regard and has better support for shared libraries and dynamic linking; Linux was quick to adopt it.
