Last time, we started making a debugger and live editor for (re)creating assembly and C programs.
We got all the assembly parts: read/write registers and memory, single step, single instruction execution, function calls (although not perfect), set/restore breakpoints, memory allocation and examining upcoming instructions.
This post will try to do similar things in the C portion. Next time, we'll try using it to make something.
Since we know the location of C variables from the binary's header (extracted through our variables function), we can just read and write those locations in memory directly. It helps to know the variable's type so we know how many bytes to read and write.
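A quick way to check which struct format code goes with which C type is struct.calcsize (these are native sizes, assuming 64-bit Linux; they are platform-dependent in general):

```python
import struct

# Native byte sizes of struct format codes on 64-bit Linux
# (platform-dependent in general, so check on your target).
print(struct.calcsize('h'))  # 2: short
print(struct.calcsize('i'))  # 4: int
print(struct.calcsize('l'))  # 8: long (on LP64 systems like 64-bit Linux)
```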
For example, for ints, we could define
def read_int(var_name):
    addr = start + c_variables[var_name]
    type_, size = 'i', 4  # struct format code and byte size for a C int
    return struct.unpack(type_, process.readBytes(addr, size))[0]
and
def write_int(var_name, value):
    addr = start + c_variables[var_name]
    type_, size = 'i', 4
    process.writeBytes(addr, struct.pack(type_, value))
Arrays just take up contiguous regions of memory and so can be read in one go as bytes and then parsed.
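For instance, a hypothetical read_int_array could fetch all the bytes with a single process.readBytes call and then unpack them; the parsing step looks like this (with simulated bytes standing in for the traced process's memory):

```python
import struct

def parse_int_array(raw, fmt='i', size=4):
    # Split one contiguous byte region into a list of ints, the way we would
    # parse the result of a single process.readBytes(addr, n * size) call.
    return [struct.unpack(fmt, raw[i*size:(i+1)*size])[0]
            for i in range(len(raw) // size)]

# Simulate the bytes backing a C array: int arr[3] = {10, 20, 30};
raw = struct.pack('3i', 10, 20, 30)
print(parse_int_array(raw))  # [10, 20, 30]
```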
We'll now set up some (unexpected-)error handling. We could do this at any other time, but setting it up now will make troubleshooting any issues from later additions easier.
By default, a program that gets a segmentation fault (segfault, signal SIGSEGV) just crashes and exits [1]. Instead, we set up a function to be called on segfaults that prints out some useful information.
We essentially use this article here.
We'll create a function bt_sighandler which prints the backtrace, and use the sigaction system call [2] to have it called on a segfault.
void register_signals(){
    struct sigaction sa;
    sa.sa_handler = (void *)bt_sighandler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGSEGV, &sa, NULL);
}
Instead, we could have also used our debugger to print the backtrace.
Note that we need to let the process continue to run the body of bt_sighandler if it's paused by our debugger.
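To see the registration mechanism without touching the C side, here is the same idea with Python's signal module (a sketch using SIGUSR1 rather than SIGSEGV, since deliberately segfaulting the interpreter is awkward to demo; POSIX only):

```python
import os
import signal

caught = []

def handler(signum, frame):
    # Stand-in for bt_sighandler: just record that the signal arrived.
    caught.append(signum)

# signal.signal wraps sigaction: from now on, this handler runs on SIGUSR1.
signal.signal(signal.SIGUSR1, handler)
os.kill(os.getpid(), signal.SIGUSR1)  # deliver the signal to ourselves
print(caught == [signal.SIGUSR1])  # True
```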
Our next goal is to get a C REPL. For that, we'll first take a look at memory maps. While not strictly needed, it will give us a better idea of where things are now that they'll start being more scattered.
When reading and writing to memory from our debugger, we refer to absolute addresses. When loading things like libraries, variables and the a.out
executable, each file is usually copied to some non-overlapping region.
The region is available in /proc/<pid>/maps
in text form (where <pid>
is the pid of the traced process). That's why getting the start of the region for a.out
[3] and adding it to the address read from the header for some function gave us the actual address in memory of the beginning of that function.
The file /proc/<pid>/maps
is text formatted as follows [4].
address perms offset dev inode pathname
08048000-08056000 r-xp 00000000 03:0c 64593 /usr/sbin/gpm
We can parse this and put it in a python dict.
from collections import defaultdict

def memory_maps():
    mmap = defaultdict(dict)
    mmaped_counter = 0
    for line in open("/proc/%s/maps" % pid).readlines():
        if len(line.split()) == 5:
            # Anonymous region with no pathname: make up a key for it.
            line += "[mmaped-%s]" % mmaped_counter
            mmaped_counter += 1
        region, permissions, offset, dev, inode, filename = line.split()
        start, end = [int(x, 16) for x in region.split("-")]
        mmap[filename][permissions] = {"start": start, "end": end,
                                       "offset": int(offset, 16),
                                       "dev": dev, "inode": int(inode)}
    return mmap
The content of /proc/<pid>/maps
changes as we allocate (or deallocate) memory.
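As a quick sanity check (Linux only), we can look at our own process's map the same way; /proc/self/maps has exactly the format our parser expects:

```python
# /proc/self/maps describes the current process, in the same format as
# the file our debugger reads for the traced pid.
with open("/proc/self/maps") as f:
    fields = f.readline().split()

start, end = [int(x, 16) for x in fields[0].split("-")]
print(end > start)            # True: every region has a positive size
print(len(fields) in (5, 6))  # True: the pathname is the optional sixth field
```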
For convenience, we'll also add a function for finding the region to which an address belongs.
mmap = memory_maps()

def find_section(mmap, address):
    for filename in mmap:
        for perm, region in mmap[filename].items():
            if region['start'] <= address <= region['end']:
                return (filename, perm), address - region['start']
We can use it like this.
>>> find_section(mmap, process.getreg('rip'))
(('/usr/lib/libc-2.26.so', 'r-xp'), 946896L)
Getting a C REPL is a bit trickier. We will use dynamic library loading (of libdl) to accomplish this. To run a line of C, we will
1. write it to a temporary C file (run_once.c),
2. compile that file into a shared library (gcc -shared -x c -o run_once.so -fPIC run_once.c),
3. and dynamically load that library (dlopen("./run_once.so", RTLD_LAZY)).
Since we also created the ptraced C program, it is easiest to add a function there for us to call with the debugger later.
#include <dlfcn.h>

void* dlrun_once; // Will contain the dynamically loaded library

void reload_run_once(){
    if (dlrun_once != NULL) dlclose(dlrun_once);
    dlrun_once = dlopen("./run_once.so", RTLD_LAZY);
}
To be able to access variables, we will declare them as extern
in the temporary C file. We will also include any other libraries needed (like <stdio.h>
to use printf
).
extern int foobar;
In our Python program, we'll put the entire temporary program in a string, create a subprocess to call gcc and pipe our program through stdin. The list c_globals
will hold the list of variables to be extern
ed that we want to access.
import shlex

c_globals = []

def load_lib_vars(filename):
    global lib_vars, lib_start
    line1 = [l for l in open("/proc/%s/maps" % pid).readlines()
             if l.endswith(filename + '\n')][0]
    lib_start = int(line1.split("-")[0], 16)
    lib_vars = elfreader.variables(filename)
    c_variables.update(lib_vars)

def run_c(c_lines):
    program = """
#include <stdio.h>
%s;
void run_once(){
%s
}""" % (";\n".join(c_globals), c_lines.encode('string_escape'))
    command = "gcc -shared -x c -o run_once.so -fPIC -"
    gcc_proc = subprocess.Popen(shlex.split(command), stdin=subprocess.PIPE)
    gcc_proc.communicate(input=program)  # sends the program and closes stdin
    if not gcc_proc.returncode:  # No error
        c_func_call("reload_run_once")
        load_lib_vars('run_once.so')
        c_func_call("run_once", lib_start)
Try it out
>>> run_c('printf("Hello world\n");')
Hello world
(Don't forget that stdout is buffered in C so without the end of line \n
, we would see no text appear.)
>>> c_globals.append("int foobar")
>>> run_c('printf("The value of foobar is %i\n", foobar);')
The value of foobar is 12
Note that if we instead want to call the dynamically loaded function from C rather than from the debugger, we can store the function in a variable and call it.
typedef int (*func_ptr_t)(void); // Type should match the function: void input and int output here
void* lib_func;
lib_func = dlsym(dlrun_once, "some_func_name");
printf("lib_func output: %d\n", ((func_ptr_t)lib_func)());
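This dlopen/dlsym pair is also what Python's ctypes wraps, which gives a quick way to watch the mechanism work; here we load libc (rather than our run_once.so) and call a symbol from it:

```python
import ctypes
import ctypes.util

# ctypes.CDLL is dlopen under the hood and attribute access is dlsym,
# mirroring dlopen("./run_once.so", RTLD_LAZY) + dlsym from the C side.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
print(libc.abs(-7))  # 7
```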
To dynamically define functions, we can just add them to the source compiled into run_once.so. We could also create other shared libraries live and dynamically load them.
Eventually, these would be included in our main program and recompiled in.
By default the header of a.out
does not include information about line numbers in the C source. But we can ask gcc to include them with the -gdwarf-2
flag. This adds the DWARF headers [3] to a.out
(previously we only read the ELF headers).
I won't say much more about the DWARF format because I don't know it well. Basically, it looks like a list of things ("compilation units"), and some of those things are XML-like documents ("debugging information entries") that may be nested and refer to each other. We're only interested in a few items from that list for now.
For each file loaded into memory (obtained from memory_maps()
), we'll see if it has DWARF headers and extract this list of XML-like documents. Store it all in all_dwarf_info
.
from elftools.elf.elffile import ELFFile
from elftools.dwarf.descriptions import describe_form_class

all_dwarf_info = {}

def die_bounds(die):
    lowpc = die.attributes['DW_AT_low_pc'].value
    highpc_attr = die.attributes['DW_AT_high_pc']
    highpc_attr_class = describe_form_class(highpc_attr.form)
    if highpc_attr_class == 'address':
        highpc = highpc_attr.value
    elif highpc_attr_class == 'constant':
        highpc = highpc_attr.value + lowpc
    else:
        raise Exception('Error: invalid DW_AT_high_pc class: %s' % highpc_attr_class)
    return lowpc, highpc

def load_dwarf_info(mmap):
    """ Load or reload all dwarf info from mmap. """
    for filename in mmap:
        if filename.startswith("["):
            continue
        elffile = ELFFile(open(filename, "rb"))
        if not elffile.has_dwarf_info():
            continue
        dwarfinfo = elffile.get_dwarf_info()
        # Information from Compilation Units (CUs)
        cus = []
        for cu in dwarfinfo.iter_CUs():
            lineprog = dwarfinfo.line_program_for_CU(cu)
            states = [entry.state for entry in lineprog.get_entries()
                      if entry.state and not entry.state.end_sequence]
            addresses = [state.address for state in states]
            dies = [{"entry": die,
                     "bounds": die_bounds(die),
                     "name": die.attributes['DW_AT_name'].value}
                    for die in cu.iter_DIEs()
                    if die.tag == 'DW_TAG_subprogram']
            cus.append({"lineprog": lineprog, "states": states,
                        "addresses": addresses, "entries": dies})
        all_dwarf_info[filename] = {"dwarfinfo": dwarfinfo, "units": cus}
And with this information, we can find the file name, function name and line number for any address. We just need to locate the address we are looking for between two addresses with known line numbers.
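The lookup in isolation, with a made-up line table (hypothetical addresses, not from a real binary): bisect finds the last entry at or before the address we want.

```python
from bisect import bisect

# Hypothetical line table: sorted entry addresses and their source lines.
addresses = [0x1000, 0x1010, 0x1024, 0x1038]
line_nums = [10, 11, 12, 13]

def line_for(address):
    # Last entry whose address is <= the address we're looking up.
    index = bisect(addresses, address) - 1
    return line_nums[index] if index >= 0 else None

print(line_for(0x1015))  # 11: 0x1015 falls between the 0x1010 and 0x1024 entries
```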
from bisect import bisect

def address_info(address):
    for filename, dwarfinfo in all_dwarf_info.items():
        for cu in dwarfinfo["units"]:
            index = bisect(cu["addresses"], address) - 1
            if -1 < index < len(cu["addresses"]) - 1:
                state = cu["states"][index]
                # Could probably bisect here too
                func_name = None
                for entry in cu["entries"]:
                    if entry["bounds"][0] <= address < entry["bounds"][1]:
                        func_name = entry["name"]
                        break
                return {"function": func_name,
                        "file": cu["lineprog"]['file_entry'][state.file - 1].name,
                        "line": state.line}
Let's put all these header-related helpers, including variables(), into a file, say elfreader.py.
Since all this information comes from headers, we still have to shift it by the start of the appropriate region.
>>> elfreader.load_dwarf_info(mmap)
>>> elfreader.address_info(process.getreg('rip') - start)
{'function': 'main', 'line': 79, 'file': 'sample2.c'}
By convention, the stack is between the value of registers rbp
(lower address) and rsp
(higher address) and rsp
increases when there are more stack frames added. We're on 64-bit Linux so each frame takes up 8 bytes.
Edit: Thanks to saagarjha for pointing out that the previous paragraph is false. There are conventions, but they're not those and they aren't that simple [6].
This is the same stack for both assembly and C.
def get_stack():
    bottom = process.getreg('rsp')
    top = process.getreg('rbp')
    stack_bytes = process.readBytes(bottom, top - bottom)
    return [struct.unpack('l', stack_bytes[i*8: (i+1)*8])[0]
            for i in xrange(len(stack_bytes) / 8)]
Now we can combine this with elfreader.address_info
to get the information of all stack frames.
Sometimes, things other than stack frames are put on the stack. We can try to exclude those with a heuristic and hope their values don't lie in a memory-mapped region.
def line_numbers():
    elfreader.load_dwarf_info(mmap)
    lines = [elfreader.address_info(find_section(mmap, frame)[1])
             for frame in get_stack() + [process.getreg('rip')]
             if find_section(mmap, frame)]
    return [line for line in lines if line is not None]
Try it out.
>>> line_numbers()
[{'file': 'sample2.c', 'function': 'main', 'line': 71},
{'file': 'sample2.c', 'function': 'main', 'line': 69}]
We'll use the fork
system call to allow us to experiment with the live process being edited and undo changes. This won't be a "full" undo because external resources like file descriptors aren't restored.
fork() creates two processes: a "parent" and a "child".
To save the state of the process, we create a fork and store the parent in a list of processes. (The stored copy is frozen by ptrace until needed.)
To load the state of the process, we create a fork of the state we want to revert to and replace the current process.
In our C file, add a helper function (this could have been dynamically loaded in or translated to assembly and executed line-by-line from the debugger).
int pid;

void make_fork(){
    pid = fork();
    if (pid == 0) {
        raise(SIGSTOP);
    }
}
The forked parent receives the pid of the child as the output of fork() and the child receives 0. Here, we raise(SIGSTOP) in the child, which halts its execution, and we'll later send it SIGCONT with ptrace. There might be a better way to do this.
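The handshake can be sketched in plain Python (POSIX only; os.fork and a self-stopped child stand in for make_fork and the traced process):

```python
import os
import signal

pid = os.fork()
if pid == 0:
    os.kill(os.getpid(), signal.SIGSTOP)  # like raise(SIGSTOP) in make_fork
    os._exit(7)                           # then exit with a recognizable code

# Parent: wait until the child is stopped, resume it, collect its exit status.
_, status = os.waitpid(pid, os.WUNTRACED)
assert os.WIFSTOPPED(status)
os.kill(pid, signal.SIGCONT)
_, status = os.waitpid(pid, 0)
print(os.WEXITSTATUS(status))  # 7
```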
Now call make_fork
from the debugger.
states = []

def save_state(skip_save=False):
    global child_pid, parent_process, child_process, process
    old_regs = process.getregs()
    c_func_call('make_fork')
    child_pid = read_int('pid')
    parent_process = process
    child_process = process = debugger.addProcess(child_pid, False)
    process.cont()
    process.waitSignals(signal.SIGSTOP)
    process.setregs(old_regs)
    if not skip_save:
        states.append(parent_process)

def load_state(state=None):
    global process
    state = states[-1] if state is None else state
    process.kill(9)
    process = state
    save_state(True)
Edit: Fixed a segfault bug in save_state
: when called from load_state
, the instruction pointer manually set by c_func_call
wasn't restored.
It's a bit brutal to send SIGKILL (signal 9) but I haven't found a nicer way.
save_state
and load_state
don't always produce the expected result for the moment.
This is speculative, but I think the reason for some of the failures with c_func_call (including when called to save and load state, as discussed just now) is that we are inside some libc function in some intermediate state, and calling another function from there, which itself calls libc functions, causes that intermediate state to be reused, resulting in unexpected behaviours and crashes. I could try to learn more about this by stepping through the execution when I make a c_func_call, but that might require knowing more libc internals, and the final product might not use libc much or at all.
Currently, the child process of a fork becomes a zombie when killed. We could call wait() to fix this. However, we aren't using the parent-child relation of processes in its intended way and don't know whether children are alive or not. For example, restoring to state 3 from a fork of state 5 means state 4, the child of state 3, is still alive. Restoring to state 5 from a fork of state 5, on the other hand, means all children of state 5 are terminated.
So instead, we will again use sigaction
to ignore SIGCHLD
, the signal sent to the parent when the child exits. We add this to register_signals()
.
struct sigaction sa2;
sa2.sa_handler = SIG_IGN;
sigemptyset(&sa2.sa_mask);
sa2.sa_flags = 0;
sigaction(SIGCHLD, &sa2, 0);
We should also kill all the processes with paused state we have left.
import atexit

def cleanup():
    print("Cleaning up child processes.")
    for proc in [process] + states:
        try:
            proc.kill(9)
        except OSError:
            pass

atexit.register(cleanup)
If the C process we're editing needs SIGCHLD for some other reason, we could try to ignore it from our debugger instead of the C program.
Try it out. We'll put a loop in our main function that increments a counter about every 0.1 seconds.
int count;

int main(){
    register_signals();
    printf("Starting main loop\n");
    count = 0;
    while (1){
        count += 1;
        printf("%i: Sleep loop %i\n", pid, count);
        usleep(100000);
    }
}
Add a function in Python to wait until that count reaches some value.
def wait_for_count(min_count=1):
    while read_int('count') < min_count:
        step()
And run our test
>>> wait_for_count(2)
Starting main loop
0: Sleep loop 1
>>> save_state()
>>> wait_for_count(4)
0: Sleep loop 2
0: Sleep loop 3
>>> save_state()
>>> wait_for_count(6)
0: Sleep loop 4
0: Sleep loop 5
>>> load_state()
>>> wait_for_count(6)
0: Sleep loop 4
0: Sleep loop 5
>>> load_state(states[0])
>>> wait_for_count(6)
0: Sleep loop 2
0: Sleep loop 3
0: Sleep loop 4
0: Sleep loop 5
That worked as expected. wait_for_count(2)
pauses when count
reaches 2, which is before its value is printed by its sleep loop.
Now that we are ready, next time we'll get to the best part: actually using this thing to make a program. I was hoping to squeeze in a short sample session but this is already getting really long.
There's probably a number of convenience functions that are missing and some useful bundling of features that's still absent. We'll figure all that out later.
We are using a few more globals than would be good normally. This is in part because we'll use the Python interpreter as a REPL for the debugger. Still, we could make this library cleaner and then add helper functions that are easier to type.
It seems there used to be a lot of programs for checkpointing, saving the state of a process. I was trying to find a small one to run, read and extract the steps of. Unfortunately, I couldn't find much recent activity on the topic. Most of the things I looked at were from before 2005.
Notes:
- ulimit can be used to increase that.
- /proc/<pid>/maps every time I tried it. I don't know if that's always the case.
- rbp and rsp need not preserve their meaning of locations in the stack (so the next paragraph also needs correction/clarification).

The assembly registers referred to in this post are:
- rip (instruction pointer): address of the next asm instruction to run.
- rsp (stack pointer): end of the stack (which happens to be the "bottom" because the stack grows down).
- rbp (base pointer): beginning of the stack (which happens to be the "top" because the stack grows down).

The source for this post is here. The files are tutorial2.py, elfreader.py and sample2.c.
Posted on Jun 22, 2018