Executable and Linkable Format (ELF) is a file format for executables, object files, shared libraries, and more that's used on various Unix-like systems. If you've ever downloaded and run a program on Linux, you've used an ELF executable. It's akin to an .exe file on Windows.
DWARF is a debugging information format that is used with ELF files. Debug information allows you to do neat things with a running program such as:
In this series, we'll go into a lot of detail on topics such as these. Let's take it from the top: parsing ELF files, which contain DWARF debug information.
Let's parse some files! In order to do so, we'll need a program to play around with. Throughout the rest of this series, I'm going to use this dead-simple C program called cloop that gets its own process ID, then loops forever and prints it once per second. C is a good choice because it is simple, has a minimal runtime, and is well-supported by all the tools we'll be using. Here's the full program:
#include <unistd.h>
#include <stdio.h>
int main() {
    pid_t pid = getpid();
    unsigned long long ndx = 0;
    while (1) {
        printf("c looping (pid %d): %llu\n", pid, ndx);
        fflush(stdout);
        ndx++;
        sleep(1);
    }
    return 0;
}
I'll be compiling this with gcc 14.2.1 on Manjaro Linux with kernel version 6.9.12 using this build.sh script, but feel free to play around with CC and DWARF as we go (it defaults to DWARF version 5):
#!/usr/bin/env bash
${CC:-gcc} -Wall -Wextra -Werror -no-pie -O0 -g -gdwarf-${DWARF:-5} -o cloop main.c
Additionally, for this series, I'll give some short examples in Go to help illustrate various concepts. I chose Go because it's popular, terse, simple to read, and has a large standard library to help us out. I'll intentionally omit error handling and not worry about writing efficient code to keep the examples short. I'm using Go version 1.22.7.
There's a lot of data contained within ELF files, but for our needs it's pretty straightforward, and we can ignore most of it. We just want to grab the raw binary of each debug info section as well as a couple facts about the executable.
Each binary file starts with the ELF header, followed by various "sections", each of which is just a region of the file that has a distinct job. For instance, the program text (machine code) of your executable or object is in the .text section.
We first want to open and read the contents of the binary file. It starts with 16 bytes of "ELF Identifier" header data. The first four of those bytes are the magic number 0x7f followed by 0x45, 0x4c, 0x46, or ELF in ASCII. So using our BinaryReader, we'd do:
fileBuf, _ := os.ReadFile(filePath)
reader := NewBinaryReader(bytes.NewBuffer(fileBuf), binary.NativeEndian)
magic := []byte{0x7f, 'E', 'L', 'F'}
magicBuf := make([]byte, len(magic))
reader.Read(magicBuf)
if !slices.Equal(magic, magicBuf) {
	panic("incorrect ELF magic number")
}
Next comes the rest of the e_ident header, which contains several one-byte flags, each prefixed with EI_, followed by 7 bytes of padding (bringing the total to 16) that you should skip over. The flags are, in order:
EI_CLASS: address size (1 for a 32-bit binary, 2 for a 64-bit binary)
EI_DATA: byte order (1 for two's complement little-endian, 2 for two's complement big-endian)
EI_VERSION: file format version (should always be 1 as of time of writing)
EI_OSABI: operating system and ABI (see Go's implementation for a list of values)
EI_ABIVERSION: often ignored on Linux
Next up is the rest of the ELF file header, again in order. Refer to the documentation or a robust implementation such as Go or Zig for more information on each field and its values.
e_type, uint16: file type
e_machine, uint16: machine type
e_version, uint32: file format version
e_entry, uintptr: virtual address at which the start of the program resides
e_phoff, uintptr: byte offset from the start of the file at which the program header table is located
e_shoff, uintptr: byte offset from the start of the file at which the section header table is located
e_flags, uint32: processor-specific flags
e_ehsize, uint16: the number of bytes in this ELF header
e_phentsize, uint16: the number of bytes in one entry in the program header table (all entries are the same size)
e_phnum, uint16: the number of entries in the program header table
e_shentsize, uint16: the number of bytes in one entry in the section header table (all entries are the same size)
e_shnum, uint16: the number of entries in the section header table
e_shstrndx, uint16: the section header table index of the entry associated with the section name string table
It gives us a few facts about the binary, followed by the location and layout of the program header table and the section header table (the fields that start with e_ph and e_sh, respectively). We'll use the e_sh fields to look up the section header table, read each entry in the table, and use those entries to find the debug sections we care about.
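To make the header layout concrete, here's a minimal sketch that decodes a 64-bit ELF header from a byte slice using only the standard library, rather than the BinaryReader used elsewhere in this series. The ELF64Header type and parseELF64Header function are names I made up for illustration, and the sketch assumes a 64-bit file (it returns an error for anything else).

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// ELF64Header mirrors the 64-bit ELF header fields that follow
// the 16-byte e_ident block. Every field has a fixed size, so
// encoding/binary can fill it in directly.
type ELF64Header struct {
	Type      uint16
	Machine   uint16
	Version   uint32
	Entry     uint64
	Phoff     uint64
	Shoff     uint64
	Flags     uint32
	Ehsize    uint16
	Phentsize uint16
	Phnum     uint16
	Shentsize uint16
	Shnum     uint16
	Shstrndx  uint16
}

// parseELF64Header checks the magic number, reads EI_CLASS and
// EI_DATA out of e_ident, then decodes the rest of the header.
func parseELF64Header(file []byte) (ELF64Header, error) {
	var hdr ELF64Header
	if len(file) < 16 || !bytes.Equal(file[:4], []byte{0x7f, 'E', 'L', 'F'}) {
		return hdr, fmt.Errorf("not an ELF file")
	}
	if file[4] != 2 { // EI_CLASS: 2 means 64-bit
		return hdr, fmt.Errorf("only 64-bit ELF is handled in this sketch")
	}
	var order binary.ByteOrder = binary.LittleEndian
	if file[5] == 2 { // EI_DATA: 2 means big-endian
		order = binary.BigEndian
	}
	err := binary.Read(bytes.NewReader(file[16:]), order, &hdr)
	return hdr, err
}

func main() {
	// a hand-built, mostly empty little-endian header for demonstration
	file := make([]byte, 64)
	copy(file, []byte{0x7f, 'E', 'L', 'F', 2, 1, 1})
	file[16] = 2  // e_type
	file[60] = 36 // e_shnum
	hdr, err := parseELF64Header(file)
	fmt.Println(hdr.Type, hdr.Shnum, err)
}
```

Picking the byte order from EI_DATA instead of using the host's native order is a design choice of this sketch; it lets the same code read binaries built for a machine with a different endianness.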
Note that in Go, uintptr is the built-in unsigned integer type of your machine's address size, meaning 4 bytes on 32-bit systems and 8 bytes on 64-bit systems.
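One caveat worth knowing: the size of uintptr follows the machine running the parser, not the binary being parsed. If you ever need to read a 32-bit binary on a 64-bit machine, one hedged alternative is to pick the field width from EI_CLASS and the byte order from EI_DATA. The readAddr helper below is a hypothetical name of my own, not part of the series' BinaryReader.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// readAddr decodes one address-sized field from the front of buf.
// class is the EI_CLASS byte (1 = 32-bit, 2 = 64-bit) and data is
// the EI_DATA byte (1 = little-endian, 2 = big-endian).
// It returns the value widened to uint64 and the number of bytes used.
// buf must hold at least that many bytes, or the decode will panic.
func readAddr(buf []byte, class, data byte) (uint64, int) {
	var order binary.ByteOrder = binary.LittleEndian
	if data == 2 {
		order = binary.BigEndian
	}
	if class == 1 {
		return uint64(order.Uint32(buf)), 4
	}
	return order.Uint64(buf), 8
}

func main() {
	buf := []byte{0x78, 0x56, 0x34, 0x12, 0, 0, 0, 0}
	v32, n32 := readAddr(buf, 1, 1)
	v64, n64 := readAddr(buf, 2, 1)
	fmt.Printf("%#x (%d bytes), %#x (%d bytes)\n", v32, n32, v64, n64)
}
```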
Also, in digging through the docs, you may have noticed some values such as LOPROC = 0xff00 and HIPROC = 0xffff. Both ELF and DWARF commonly reserve large ranges of high values for each processor, programming language, OS, etc. to define their own custom values for various enums. We won't be using these too much, but be aware that GNU, Go, Zig, and others commonly make use of these. You'll be able to get more information on each by reading through the source of various compilers.
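To show how these reserved ranges work in practice, here's a small sketch that classifies a value of the e_type field. The four range constants are the ones the ELF specification defines for e_type; describeType is a made-up helper for illustration only.

```go
package main

import "fmt"

// Reserved ranges of the e_type enum, per the ELF specification.
const (
	ET_LOOS   = 0xfe00 // start of the OS-specific range
	ET_HIOS   = 0xfeff // end of the OS-specific range
	ET_LOPROC = 0xff00 // start of the processor-specific range
	ET_HIPROC = 0xffff // end of the processor-specific range
)

// describeType reports which range of the e_type enum a value falls in.
func describeType(eType uint16) string {
	switch {
	case eType >= ET_LOPROC:
		// ET_HIPROC is the largest uint16, so no upper bound check is needed
		return "processor-specific"
	case eType >= ET_LOOS && eType <= ET_HIOS:
		return "OS-specific"
	default:
		return "generic"
	}
}

func main() {
	for _, eType := range []uint16{2, 0xfe10, 0xff00} {
		fmt.Printf("%#06x: %s\n", eType, describeType(eType))
	}
}
```

Other enums in ELF and DWARF follow the same pattern with their own LO and HI bounds, so the same kind of range check applies to them.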
Next up, we need to parse each section header contained within the file. The "table" is just a fancy word for "an array of section header entries". So once we're done, we'll have a list of where all sections start and end within the binary, the name of each section, and some other data.
The section header table starts at the e_shoff'th byte in the file, and is e_shentsize * e_shnum bytes long.
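Since those values come straight out of a file, it can be worth checking them before slicing. Here's a hedged sketch of that arithmetic; sectionHeaderTableRange is a hypothetical helper, and the numbers in main are made up for illustration.

```go
package main

import "fmt"

// sectionHeaderTableRange returns the half-open byte range [start, end)
// that the section header table occupies, or an error if that range
// does not fit inside a file of fileLen bytes.
func sectionHeaderTableRange(fileLen, shOff uint64, shEntSize, shNum uint16) (uint64, uint64, error) {
	// widen before multiplying so the product can't wrap around in 16 bits
	size := uint64(shEntSize) * uint64(shNum)
	if shOff > fileLen || size > fileLen-shOff {
		return 0, 0, fmt.Errorf("section header table at offset %d with size %d is outside the %d-byte file", shOff, size, fileLen)
	}
	return shOff, shOff + size, nil
}

func main() {
	// 36 entries of 64 bytes each, starting at byte 17000 of a 20000-byte file
	start, end, err := sectionHeaderTableRange(20000, 17000, 64, 36)
	fmt.Println(start, end, err)
}
```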
The fields of each section header are:
sh_name, uint32: name of the section, as an index into the section header string table
sh_type, uint32: section type enum
sh_flags, uintptr: flags for this section
sh_addr, uintptr: the address at which this section should reside within the address space of the process, if relevant
sh_offset, uintptr: offset from the first byte of the ELF file to where the start of this section resides
sh_size, uintptr: the number of bytes in the section
sh_link, uint32: section header table index of a related section (its meaning depends on the section type)
sh_info, uint32: extra information about this section (its meaning depends on the section type)
sh_addralign, uintptr: constraints on the alignment of this section's address (0 and 1 mean no constraints)
sh_entsize, uintptr: if the section contains a table of fixed-size elements (e.g. a symbol table), this is the size of each element
Read e_shnum entries, which should be exactly enough bytes. To give an example of how this might look in code, consider:
type ELFSectionHeader struct {
	sh_name      uint32
	sh_type      uint32
	sh_flags     uintptr
	sh_addr      uintptr
	sh_offset    uintptr
	sh_size      uintptr
	sh_link      uint32
	sh_info      uint32
	sh_addralign uintptr
	sh_entsize   uintptr
	// this is not part of the standard, but we'll
	// look up and store the name on this struct later
	name string
}
// widen both operands before multiplying so the product can't overflow 16 bits
sectionHeaderTable := fileBuf[shOff : shOff+uintptr(shentSize)*uintptr(shNum)]
sectionHeaderTableReader := NewBinaryReader(
	bytes.NewBuffer(sectionHeaderTable),
	binary.NativeEndian,
)
sectionHeaders := []*ELFSectionHeader{}
for ndx := 0; ndx < int(shNum); ndx++ {
	header := &ELFSectionHeader{}
	header.sh_name, _ = Read[uint32](sectionHeaderTableReader)
	header.sh_type, _ = Read[uint32](sectionHeaderTableReader)
	header.sh_flags, _ = Read[uintptr](sectionHeaderTableReader)
	header.sh_addr, _ = Read[uintptr](sectionHeaderTableReader)
	header.sh_offset, _ = Read[uintptr](sectionHeaderTableReader)
	header.sh_size, _ = Read[uintptr](sectionHeaderTableReader)
	header.sh_link, _ = Read[uint32](sectionHeaderTableReader)
	header.sh_info, _ = Read[uint32](sectionHeaderTableReader)
	header.sh_addralign, _ = Read[uintptr](sectionHeaderTableReader)
	header.sh_entsize, _ = Read[uintptr](sectionHeaderTableReader)
	sectionHeaders = append(sectionHeaders, header)
}
Once we have all this information, we're going to want to use the sh_name field to look up our section name in the section header string table. This is the ELF section with index e_shstrndx, named .shstrtab. In my case with the test C program, it's the 35th section, though yours may be different.
This table is a series of null-terminated strings all next to each other in one long array. You can read the entire table into an array, then use the sh_name field as a byte offset into it: the name starts at that offset and runs up to the next null byte.
I'll use the sh_size and sh_offset fields of the e_shstrndx'th entry to find our location within the binary:
sectionNames := sectionHeaders[shStrTabNdx]
start := sectionNames.sh_offset
end := start + sectionNames.sh_size
sectionNamesBuf := fileBuf[start:end]
for _, header := range sectionHeaders {
	for ndx := header.sh_name; ; ndx++ {
		ch := sectionNamesBuf[ndx]
		if ch == 0 {
			break
		}
		header.name += string(ch)
	}
}
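If you'd rather not build the name one byte at a time, here's a slightly more compact sketch of the same lookup using bytes.IndexByte from the standard library. The stringAt helper is a hypothetical name of my own, and the table in main is a tiny hand-written stand-in for a real .shstrtab.

```go
package main

import (
	"bytes"
	"fmt"
)

// stringAt returns the null-terminated string that starts at byte
// offset off within a string table. It returns "" if off is out of
// range, and tolerates a final string with no terminator.
func stringAt(table []byte, off uint32) string {
	if uint64(off) >= uint64(len(table)) {
		return ""
	}
	rest := table[off:]
	if end := bytes.IndexByte(rest, 0); end >= 0 {
		rest = rest[:end]
	}
	return string(rest)
}

func main() {
	// string tables begin with a null byte, so offset 0 is the empty name
	table := []byte("\x00.text\x00.debug_info\x00")
	fmt.Println(stringAt(table, 1))
	fmt.Println(stringAt(table, 7))
}
```

Because sh_name is a byte offset rather than an entry number, it's legal for it to point into the middle of another string, which linkers sometimes use to share suffixes between names.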
Now we're able to look up each debug information section by name!
There's a fair number of sections in there! You may recognize some of them, but for the most part, we care about the ones that start with .debug_, though we also care about .eh_frame. If you want to check your work, you can do so with readelf --headers cloop. We'll get into what each of these sections means over time.
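Another way to check your work from inside Go is to compare against the standard library's debug/elf package, which already knows how to parse all of this. The sketch below is not how we're parsing in this series; sectionNames is a helper name I made up, and it assumes the cloop binary is in the current directory unless you pass a different path.

```go
package main

import (
	"bytes"
	"debug/elf"
	"fmt"
	"os"
)

// sectionNames parses data as an ELF file with debug/elf and returns
// the name of every section, in section header table order.
func sectionNames(data []byte) ([]string, error) {
	f, err := elf.NewFile(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer f.Close()
	names := make([]string, 0, len(f.Sections))
	for _, s := range f.Sections {
		names = append(names, s.Name)
	}
	return names, nil
}

func main() {
	path := "cloop"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	data, err := os.ReadFile(path)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	names, err := sectionNames(data)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	for ndx, name := range names {
		fmt.Printf("[%2d] %s\n", ndx, name)
	}
}
```

If the names and order match what your own parser produced, the section header table and string table lookups are probably correct.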
There are actually a few sections that are missing from the binary on my machine that we also would want to save if they were present (there are some sections that were present in older versions of DWARF for instance, but were dropped when v5 was released). We'll want to take the content of each one of those sections and save them for parsing later:
type DWARFSections struct {
	abbrev      []byte
	line        []byte
	info        []byte
	addr        []byte
	aranges     []byte
	frame       []byte
	eh_frame    []byte
	line_str    []byte
	loc         []byte
	loclists    []byte
	names       []byte
	macinfo     []byte
	macro       []byte
	pubnames    []byte
	pubtypes    []byte
	ranges      []byte
	rnglists    []byte
	str         []byte
	str_offsets []byte
	types       []byte
}
getSection := func(header *ELFSectionHeader) []byte {
	start := header.sh_offset
	end := header.sh_offset + header.sh_size
	return fileBuf[start:end]
}
sections := &DWARFSections{}
for _, header := range sectionHeaders {
	switch header.name {
	case ".debug_abbrev":
		sections.abbrev = getSection(header)
	case ".debug_line":
		sections.line = getSection(header)
	case ".debug_info":
		sections.info = getSection(header)
	case ".debug_addr":
		sections.addr = getSection(header)
	case ".debug_aranges":
		sections.aranges = getSection(header)
	case ".debug_frame":
		sections.frame = getSection(header)
	case ".eh_frame":
		sections.eh_frame = getSection(header)
	case ".debug_line_str":
		sections.line_str = getSection(header)
	case ".debug_loc":
		sections.loc = getSection(header)
	case ".debug_loclists":
		sections.loclists = getSection(header)
	case ".debug_names":
		sections.names = getSection(header)
	case ".debug_macinfo":
		sections.macinfo = getSection(header)
	case ".debug_macro":
		sections.macro = getSection(header)
	case ".debug_pubnames":
		sections.pubnames = getSection(header)
	case ".debug_pubtypes":
		sections.pubtypes = getSection(header)
	case ".debug_ranges":
		sections.ranges = getSection(header)
	case ".debug_rnglists":
		sections.rnglists = getSection(header)
	case ".debug_str":
		sections.str = getSection(header)
	case ".debug_str_offsets":
		sections.str_offsets = getSection(header)
	case ".debug_types":
		sections.types = getSection(header)
	}
}
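If the long switch feels heavy, one alternative is to collect the sections into a map keyed by name. This is only a sketch of a different design, not what the rest of the series uses: sectionInfo is a hypothetical, trimmed-down stand-in for ELFSectionHeader, and collectDebugSections is a name I made up.

```go
package main

import (
	"fmt"
	"strings"
)

// sectionInfo is a hypothetical stand-in for ELFSectionHeader that
// keeps only the fields this example needs.
type sectionInfo struct {
	name   string
	offset uint64
	size   uint64
}

// collectDebugSections returns the raw bytes of every .debug_* section,
// plus .eh_frame, keyed by section name. Sections whose byte range
// falls outside the file are skipped.
func collectDebugSections(file []byte, headers []sectionInfo) map[string][]byte {
	out := map[string][]byte{}
	for _, h := range headers {
		if !strings.HasPrefix(h.name, ".debug_") && h.name != ".eh_frame" {
			continue
		}
		end := h.offset + h.size
		if end < h.offset || end > uint64(len(file)) {
			continue
		}
		out[h.name] = file[h.offset:end]
	}
	return out
}

func main() {
	// a fake 10-byte "file" with three fake sections
	file := []byte("AAAABBBBCC")
	headers := []sectionInfo{
		{name: ".text", offset: 0, size: 4},
		{name: ".debug_info", offset: 4, size: 4},
		{name: ".eh_frame", offset: 8, size: 2},
	}
	for name, data := range collectDebugSections(file, headers) {
		fmt.Printf("%s: %d bytes\n", name, len(data))
	}
}
```

The tradeoff is that a map picks up unknown .debug_ sections for free, but you lose the named struct fields, so a typo in a section name becomes a silent empty lookup instead of a compile error.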
Now we're almost ready to start parsing those debug info sections into something that allows us to inspect a running program!
The last thing we'll want to do with our ELF file for now is examine it to determine if it is a position independent executable (PIE), meaning an executable built entirely from position independent code (PIC). PIE means that the code can be loaded and executed at any address in the process's memory space, and is the opposite of absolute code, which must be loaded at a fixed address in memory. You can enable PIE with the -fPIE compiler flag and the -pie linker flag in gcc and clang. It ultimately doesn't restrict our capabilities as a debugger at all; it just means that we need to look up where in the process's address space our code is loaded when we start the program (we'll do that much later).
For now, we can determine if we're PIE based on the value of the FLAGS_1 field in the .dynamic section like so:
var dynamicHeader *ELFSectionHeader
for _, header := range sectionHeaders {
	if header.name == ".dynamic" {
		dynamicHeader = header
		break
	}
}
pie := false
// with no .dynamic section (e.g. a static, non-PIE binary), pie stays false
if dynamicHeader != nil {
	dynamicBuf := getSection(dynamicHeader)
	dynamicReader := NewBinaryReader(bytes.NewBuffer(dynamicBuf), binary.NativeEndian)
	for {
		tag, _ := Read[uintptr](dynamicReader)
		val, err := Read[uintptr](dynamicReader)
		if err != nil || tag == 0 { // ran out of data, or hit the DT_NULL terminator
			break
		}
		if tag == 0x6fff_fffb { // DT_FLAGS_1
			pie = val&0x0800_0000 != 0 // DF_1_PIE
			break
		}
	}
}
You can check your work on this using readelf --dynamic cloop. You may want to try compiling cloop with -fPIE and without -no-pie and re-running your parser to make sure things are looking good.
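As an extra sanity check, you can compare the PIE flag against the e_type field from the ELF header: a fixed-address executable has type ET_EXEC (2), while a PIE has type ET_DYN (3), the same type that shared libraries use. The sketch below combines the two; describeLoadModel is a hypothetical helper, and the hedge in its third case reflects my understanding that some older linkers produce PIEs without setting DF_1_PIE.

```go
package main

import "fmt"

const (
	etExec = 2          // ET_EXEC: executable loaded at a fixed address
	etDyn  = 3          // ET_DYN: shared object, which includes PIE executables
	df1PIE = 0x08000000 // DF_1_PIE bit within the DT_FLAGS_1 value
)

// describeLoadModel combines e_type from the ELF header with the
// DT_FLAGS_1 value from .dynamic (pass 0 if the entry is absent).
func describeLoadModel(eType uint16, flags1 uint64) string {
	switch {
	case eType == etExec:
		return "fixed-address executable"
	case eType == etDyn && flags1&df1PIE != 0:
		return "position independent executable"
	case eType == etDyn:
		return "shared object (or a PIE built without DF_1_PIE)"
	default:
		return "neither an executable nor a shared object"
	}
}

func main() {
	fmt.Println(describeLoadModel(2, 0))
	fmt.Println(describeLoadModel(3, 0x08000000))
	fmt.Println(describeLoadModel(3, 0))
}
```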
That's it for today! We learned what ELF and DWARF are as well as how to parse just enough ELF to get the debug information sections we care about. ELF is probably the easiest section in this series, so strap in.