Jim Calabro


How DWARF Works: Go Code Reference

Sep 25, 2024
This is part of the series on DWARF.

This page collects Go helper snippets that illustrate how parsing works and are used across many sections of the series. For brevity, they often intentionally ignore error handling.

LEB128

LEB128 (Little Endian Base 128), in both its signed and unsigned forms, is an integer encoding used all over the place in DWARF. It's variable-width, so the format itself puts no upper limit on a value's size. In practice, I've never seen anything use more than 64 bits, but it's perfectly reasonable to assume that's not always the case.

I wrote a simple library to parse these a couple of years ago, and I'll be using it throughout the series. You can import it like any other Go library, or read its implementation and write your own. Zig also has a good implementation for reference.
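If you'd rather see the idea than pull in a dependency, here is a minimal sketch of both decoders. It is not the API of the library mentioned above: `decodeULEB128`, `decodeSLEB128`, and `errTruncated` are hypothetical names, and the sketch assumes values fit in 64 bits (extra high bits are silently dropped rather than reported as an error).

```go
package main

import (
	"errors"
	"fmt"
)

// errTruncated (hypothetical) is returned when the input ends before a byte
// without the continuation bit (0x80) is found.
var errTruncated = errors.New("leb128: truncated input")

// decodeULEB128 decodes an unsigned LEB128 value from the start of buf and
// returns it along with the number of bytes consumed. Each byte contributes
// its low 7 bits, least significant group first; a set high bit (0x80) means
// another byte follows. Bits beyond the 64th are dropped in this sketch.
func decodeULEB128(buf []byte) (uint64, int, error) {
	var result uint64
	var shift uint
	for i, b := range buf {
		result |= uint64(b&0x7f) << shift
		shift += 7
		if b&0x80 == 0 {
			return result, i + 1, nil
		}
	}
	return 0, 0, errTruncated
}

// decodeSLEB128 works like the unsigned version, but if bit 6 (0x40) of the
// final byte is set, the result is sign-extended by filling every bit above
// the last 7-bit group with ones.
func decodeSLEB128(buf []byte) (int64, int, error) {
	var result int64
	var shift uint
	for i, b := range buf {
		result |= int64(b&0x7f) << shift
		shift += 7
		if b&0x80 == 0 {
			if shift < 64 && b&0x40 != 0 {
				result |= int64(-1) << shift
			}
			return result, i + 1, nil
		}
	}
	return 0, 0, errTruncated
}

func main() {
	u, n, _ := decodeULEB128([]byte{0xb9, 0x64})
	fmt.Println(u, n) // prints: 12857 2

	s, n, _ := decodeSLEB128([]byte{0x80, 0x7f})
	fmt.Println(s, n) // prints: -128 2
}
```

Each byte carries 7 payload bits, so a 64-bit value needs at most 10 bytes; the signed variant differs only in the final sign-extension step.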

Initial Length

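An initial length field in DWARF is a special integer encoding that takes either 4 or 12 bytes and encodes either a 4- or 8-byte integer. To parse one, read a 4-byte integer; if it's any value other than 0xffffffff, use it as-is. If it is that special value, read another eight bytes, discard the first four bytes you read, and use the final eight. The DWARF spec reserves 0xfffffff0 through 0xfffffffe, so those values shouldn't appear as 32-bit lengths. The form matters beyond the length itself: the 12-byte form marks the unit as 64-bit DWARF, in which fields holding section offsets are 8 bytes wide instead of 4.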

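// readInitialLength reads a DWARF initial length field. Errors are ignored
// for brevity. Returning a uintptr assumes a 64-bit host; on a 32-bit host,
// a 64-bit length would be truncated.
func readInitialLength(reader *BinaryReader) uintptr {
    val32, _ := Read[uint32](reader)
    if val32 != 0xffffffff {
        // 32-bit DWARF: the four bytes are the length itself
        return uintptr(val32)
    }

    // 64-bit DWARF: 0xffffffff is only an escape; the real length follows
    val64, _ := Read[uint64](reader)
    return uintptr(val64)
}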
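The helper above folds both forms into a single number, but a caller usually also needs to know which form it saw, since that decides how wide later offset fields are. Here is a minimal sketch that reports the format and rejects the reserved values. It assumes little-endian data and works on a plain byte slice instead of `BinaryReader`; `parseInitialLength` is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// parseInitialLength (hypothetical) decodes a DWARF initial length field from
// the start of buf, assuming little-endian data. It returns the unit length,
// whether the unit uses the 64-bit DWARF format, and how many bytes the field
// itself occupied (4 or 12).
func parseInitialLength(buf []byte) (length uint64, is64 bool, n int, err error) {
	if len(buf) < 4 {
		return 0, false, 0, errors.New("truncated initial length")
	}
	v32 := binary.LittleEndian.Uint32(buf)
	switch {
	case v32 < 0xfffffff0:
		// 32-bit DWARF: the value is the length
		return uint64(v32), false, 4, nil
	case v32 == 0xffffffff:
		// 64-bit DWARF: the real length is the next eight bytes
		if len(buf) < 12 {
			return 0, false, 0, errors.New("truncated 64-bit initial length")
		}
		return binary.LittleEndian.Uint64(buf[4:]), true, 12, nil
	default:
		// 0xfffffff0 through 0xfffffffe are reserved
		return 0, false, 0, fmt.Errorf("reserved initial length value %#x", v32)
	}
}

func main() {
	// 32-bit form: 42 stored directly in four bytes
	fmt.Println(parseInitialLength([]byte{0x2a, 0x00, 0x00, 0x00}))
	// prints: 42 false 4 <nil>

	// 64-bit form: the 0xffffffff escape followed by an 8-byte length
	fmt.Println(parseInitialLength([]byte{
		0xff, 0xff, 0xff, 0xff,
		0x2a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	}))
	// prints: 42 true 12 <nil>
}
```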

Binary Reader

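Go has an encoding/binary package, but it only decodes fixed-size types (platform-sized int, uint, and uintptr are rejected), and it doesn't tell us how far into the buffer we've read (our offset). We can wrap it in our own BinaryReader that tracks the offset and always decodes with the same byte order. For simplicity, that is the host's native byte order, which is only correct when the binary being inspected has the same endianness as the machine doing the parsing: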

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"unsafe"
)

type Signed interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64
}

type Unsigned interface {
	~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr
}

type Integer interface {
	Signed | Unsigned
}

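// BinaryReader wraps a bytes.Buffer and remembers its starting length so it
// can report how far into the data we've read.
type BinaryReader struct {
    *bytes.Buffer

    startLen int
}

// Offset returns how many bytes have been read since the start of the buffer.
func (br *BinaryReader) Offset() int {
    return br.startLen - br.Len()
}

// NewBinaryReader wraps r. Offset assumes nothing is written to r afterward,
// since writes would increase r.Len().
func NewBinaryReader(r *bytes.Buffer) *BinaryReader {
    return &BinaryReader{Buffer: r, startLen: r.Len()}
}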

// Read decodes one integer of type T from br. The size of T determines how
// many bytes are consumed (1, 2, 4, or 8), so platform-sized types like int
// and uintptr read 4 bytes on 32-bit hosts and 8 bytes on 64-bit hosts.
func Read[T Integer](br *BinaryReader) (T, error) {
    var zero T
    size := unsafe.Sizeof(zero)

    // NativeEndian is only correct when the inspected binary has the same
    // byte order as the host; DWARF data is stored in the target's byte order.
    enc := binary.NativeEndian

    // read exactly size bytes from the reader
    buf := make([]byte, size)
    if err := binary.Read(br, enc, buf); err != nil {
        return zero, err
    }

    // Converting the unsigned value to T keeps the bit pattern, so signed
    // types come out as the correct two's-complement value. Switching on the
    // size rather than the concrete type also handles named types such as
    // `type Offset uint32`, which the ~ constraints allow.
    switch size {
    case 1:
        return T(buf[0]), nil
    case 2:
        return T(enc.Uint16(buf)), nil
    case 4:
        return T(enc.Uint32(buf)), nil
    case 8:
        return T(enc.Uint64(buf)), nil
    default:
        return zero, fmt.Errorf("unsupported integer size %d", size)
    }
}