Last time, we started making a debugger and live editor for (re)creating assembly and C programs.
We got all the assembly parts: read/write registers and memory, single step, single instruction execution, function calls (although not perfect), set/restore breakpoints, memory allocation and examining upcoming instructions.
This post will try to do similar things in the C portion. Next time, we'll try using it to make something.
Since we know the location of C variables from the binary's header (extracted through our variables function), we can just read and write those locations in memory directly. It helps to know the variable's type so we know how many bytes to read and write.
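A quick way to check which struct format code goes with which C type is struct.calcsize (these are native sizes, assuming 64-bit Linux; they are platform-dependent in general):

```python
import struct

# Native byte sizes of struct format codes on 64-bit Linux
# (platform-dependent in general, so check on your target).
print(struct.calcsize('h'))  # 2: short
print(struct.calcsize('i'))  # 4: int
print(struct.calcsize('l'))  # 8: long (on LP64 systems like 64-bit Linux)
```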
For example, for ints, we could define
def read_int(var_name):
    addr = start + c_variables[var_name]
    type_, size = 'i', 4  # struct format code and byte size for a C int
    return struct.unpack(type_, process.readBytes(addr, size))[0]
and
def write_int(var_name, value):
    addr = start + c_variables[var_name]
    type_, size = 'i', 4
    process.writeBytes(addr, struct.pack(type_, value))
Arrays just take up contiguous regions of memory and so can be read in one go as bytes and then parsed.
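For instance, a hypothetical read_int_array could fetch all the bytes with a single process.readBytes call and then unpack them; the parsing step looks like this (with simulated bytes standing in for the traced process's memory):

```python
import struct

def parse_int_array(raw, fmt='i', size=4):
    # Split one contiguous byte region into a list of ints, the way we would
    # parse the result of a single process.readBytes(addr, n * size) call.
    return [struct.unpack(fmt, raw[i*size:(i+1)*size])[0]
            for i in range(len(raw) // size)]

# Simulate the bytes backing a C array: int arr[3] = {10, 20, 30};
raw = struct.pack('3i', 10, 20, 30)
print(parse_int_array(raw))  # [10, 20, 30]
```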
We'll now set up some (unexpected-)error handling. We could do this at any other time, but setting it up now will make troubleshooting any issues from later additions easier.
By default, a program that gets a segmentation fault (segfault, signal SIGSEGV) just crashes and exits [1]. Instead, we set up a function to be called on segfaults that prints out some useful information.
We essentially use this article here.
We'll create a function bt_sighandler which prints the backtrace, and use the sigaction system call [2] to have it called on a segfault.
void register_signals(){
    struct sigaction sa;
    sa.sa_handler = (void *)bt_sighandler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGSEGV, &sa, NULL);
}
Instead, we could have also used our debugger to print the backtrace.
Note that we need to let the process continue to run the body of bt_sighandler if it's paused by our debugger.
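To see the registration mechanism without touching the C side, here is the same idea with Python's signal module (a sketch using SIGUSR1 rather than SIGSEGV, since deliberately segfaulting the interpreter is awkward to demo; POSIX only):

```python
import os
import signal

caught = []

def handler(signum, frame):
    # Stand-in for bt_sighandler: just record that the signal arrived.
    caught.append(signum)

# signal.signal wraps sigaction: from now on, this handler runs on SIGUSR1.
signal.signal(signal.SIGUSR1, handler)
os.kill(os.getpid(), signal.SIGUSR1)  # deliver the signal to ourselves
print(caught == [signal.SIGUSR1])  # True
```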
Our next goal is to get a C REPL. For that, we'll first take a look at memory maps. While not strictly needed, it will give us a better idea of where things are now that they'll start being more scattered.
When reading and writing to memory from our debugger, we refer to absolute addresses. When loading things like libraries, variables and the a.out
executable, each file is usually copied to some non-overlapping region.
The region is available in /proc/<pid>/maps
in text form (where <pid>
is the pid of the traced process). That's why getting the start of the region for a.out
[3] and adding it to the address read from the header for some function gave us the actual address in memory of the beginning of that function.
The file /proc/<pid>/maps
is text formatted as follows [4].
address perms offset dev inode pathname
08048000-08056000 r-xp 00000000 03:0c 64593 /usr/sbin/gpm
We can parse this and put it in a python dict.
from collections import defaultdict

def memory_maps():
    mmap = defaultdict(dict)
    mmaped_counter = 0
    for line in open("/proc/%s/maps" % pid).readlines():
        if len(line.split()) == 5:
            # Anonymous region with no pathname: make up a key for it.
            line += "[mmaped-%s]" % mmaped_counter
            mmaped_counter += 1
        region, permissions, offset, dev, inode, filename = line.split()
        start, end = [int(x, 16) for x in region.split("-")]
        mmap[filename][permissions] = {"start": start, "end": end,
                                       "offset": int(offset, 16),
                                       "dev": dev, "inode": int(inode)}
    return mmap
The content of /proc/<pid>/maps
changes as we allocate (or deallocate) memory.
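As a quick sanity check (Linux only), we can look at our own process's map the same way; /proc/self/maps has exactly the format our parser expects:

```python
# /proc/self/maps describes the current process, in the same format as
# the file our debugger reads for the traced pid.
with open("/proc/self/maps") as f:
    fields = f.readline().split()

start, end = [int(x, 16) for x in fields[0].split("-")]
print(end > start)            # True: every region has a positive size
print(len(fields) in (5, 6))  # True: the pathname is the optional sixth field
```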
For convenience, we'll also add a function for finding the region to which an address belongs.
mmap = memory_maps()

def find_section(mmap, address):
    for filename in mmap:
        for perm, region in mmap[filename].items():
            if region['start'] <= address <= region['end']:
                return (filename, perm), address - region['start']
We can use it like this.
>>> find_section(mmap, process.getreg('rip'))
(('/usr/lib/libc-2.26.so', 'r-xp'), 946896L)
Getting a C REPL is a bit trickier. We will use dynamic library loading (of libdl) to accomplish this. To run a line of C, we will
1. write it to a temporary C file (run_once.c),
2. compile that file into a shared library (gcc -shared -x c -o run_once.so -fPIC run_once.c),
3. and dynamically load that library (dlopen("./run_once.so", RTLD_LAZY)).
Since we also created the ptraced C program, it is easiest to add a function there for us to call with the debugger later.
#include <dlfcn.h>

void* dlrun_once; // Will contain the dynamically loaded library

void reload_run_once(){
    if (dlrun_once != NULL) dlclose(dlrun_once);
    dlrun_once = dlopen("./run_once.so", RTLD_LAZY);
}
To be able to access variables, we will declare them as extern
in the temporary C file. We will also include any other libraries needed (like <stdio.h>
to use printf
).
extern int foobar;
In our Python program, we'll put the entire temporary program in a string, create a subprocess to call gcc and pipe our program through stdin. The list c_globals
will hold the list of variables to be extern
ed that we want to access.
import shlex

c_globals = []

def load_lib_vars(filename):
    global lib_vars, lib_start
    line1 = [l for l in open("/proc/%s/maps" % pid).readlines()
             if l.endswith(filename + '\n')][0]
    lib_start = int(line1.split("-")[0], 16)
    lib_vars = elfreader.variables(filename)
    c_variables.update(lib_vars)

def run_c(c_lines):
    program = """
#include <stdio.h>
%s;
void run_once(){
%s
}""" % (";\n".join(c_globals), c_lines.encode('string_escape'))
    command = "gcc -shared -x c -o run_once.so -fPIC -"
    gcc_proc = subprocess.Popen(shlex.split(command), stdin=subprocess.PIPE)
    gcc_proc.communicate(input=program)  # sends the program and closes stdin
    if not gcc_proc.returncode:  # No error
        c_func_call("reload_run_once")
        load_lib_vars('run_once.so')
        c_func_call("run_once", lib_start)
Try it out
>>> run_c('printf("Hello world\n");')
Hello world
(Don't forget that stdout is buffered in C so without the end of line \n
, we would see no text appear.)
>>> c_globals.append("int foobar")
>>> run_c('printf("The value of foobar is %i\n", foobar);')
The value of foobar is 12
Note that if we instead want to call the dynamically loaded function from C rather than from the debugger, we can store the function in a variable and call it.
typedef int (*func_ptr_t)(void); // Type should match the function: void input and int output here
void* lib_func;
lib_func = dlsym(dlrun_once, "some_func_name");
printf("lib_func output: %d\n", ((func_ptr_t)lib_func)());
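This dlopen/dlsym pair is also what Python's ctypes wraps, which gives a quick way to watch the mechanism work; here we load libc (rather than our run_once.so) and call a symbol from it:

```python
import ctypes
import ctypes.util

# ctypes.CDLL is dlopen under the hood and attribute access is dlsym,
# mirroring dlopen("./run_once.so", RTLD_LAZY) + dlsym from the C side.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
print(libc.abs(-7))  # 7
```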
To dynamically define functions, we can just add them to the source compiled into run_once.so. We could also create other shared libraries live and dynamically load them.
Eventually, these would be included in our main program and recompiled in.
By default the header of a.out
does not include information about line numbers in the C source. But we can ask gcc to include them with the -gdwarf-2
flag. This adds the DWARF headers [3] to a.out
(previously we only read the ELF headers).
I won't say much more about the DWARF format because I don't know it well. Basically, it looks like a list of things ("compilation units"), and some of those things are XML-like documents ("debugging information entries") that may be nested and refer to each other. We're only interested in a few items from that list for now.
For each file loaded into memory (obtained from memory_maps()
), we'll see if it has DWARF headers and extract this list of XML-like documents. Store it all in all_dwarf_info
.
from elftools.elf.elffile import ELFFile
from elftools.dwarf.descriptions import describe_form_class

all_dwarf_info = {}

def die_bounds(die):
    lowpc = die.attributes['DW_AT_low_pc'].value
    highpc_attr = die.attributes['DW_AT_high_pc']
    highpc_attr_class = describe_form_class(highpc_attr.form)
    if highpc_attr_class == 'address':
        highpc = highpc_attr.value
    elif highpc_attr_class == 'constant':
        highpc = highpc_attr.value + lowpc
    else:
        raise Exception('Error: invalid DW_AT_high_pc class: %s' % highpc_attr_class)
    return lowpc, highpc

def load_dwarf_info(mmap):
    """ Load or reload all dwarf info from mmap. """
    for filename in mmap:
        if filename.startswith("["):
            continue
        elffile = ELFFile(open(filename, "rb"))
        if not elffile.has_dwarf_info():
            continue
        dwarfinfo = elffile.get_dwarf_info()
        # Information from Compilation Units (CUs)
        cus = []
        for cu in dwarfinfo.iter_CUs():
            lineprog = dwarfinfo.line_program_for_CU(cu)
            states = [entry.state for entry in lineprog.get_entries()
                      if entry.state and not entry.state.end_sequence]
            addresses = [state.address for state in states]
            dies = [{"entry": die,
                     "bounds": die_bounds(die),
                     "name": die.attributes['DW_AT_name'].value}
                    for die in cu.iter_DIEs()
                    if die.tag == 'DW_TAG_subprogram']
            cus.append({"lineprog": lineprog, "states": states,
                        "addresses": addresses, "entries": dies})
        all_dwarf_info[filename] = {"dwarfinfo": dwarfinfo, "units": cus}
And with this information, we can find the file name, function name and line number for any address. We just need to locate the address we are looking for between two addresses with known line numbers.
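The lookup in isolation, with a made-up line table (hypothetical addresses, not from a real binary): bisect finds the last entry at or before the address we want.

```python
from bisect import bisect

# Hypothetical line table: sorted entry addresses and their source lines.
addresses = [0x1000, 0x1010, 0x1024, 0x1038]
line_nums = [10, 11, 12, 13]

def line_for(address):
    # Last entry whose address is <= the address we're looking up.
    index = bisect(addresses, address) - 1
    return line_nums[index] if index >= 0 else None

print(line_for(0x1015))  # 11: 0x1015 falls between the 0x1010 and 0x1024 entries
```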
from bisect import bisect

def address_info(address):
    for filename, dwarfinfo in all_dwarf_info.items():
        for cu in dwarfinfo["units"]:
            index = bisect(cu["addresses"], address) - 1
            if -1 < index < len(cu["addresses"]) - 1:
                state = cu["states"][index]
                # Could probably bisect here too
                func_name = None
                for entry in cu["entries"]:
                    if entry["bounds"][0] <= address < entry["bounds"][1]:
                        func_name = entry["name"]
                        break
                return {"function": func_name,
                        "file": cu["lineprog"]['file_entry'][state.file - 1].name,
                        "line": state.line}
Let's put all these header-related helpers, including variables(), into a file, say elfreader.py.
Since all this information comes from headers, we still have to shift it by the start of the appropriate region.
>>> elfreader.load_dwarf_info(mmap)
>>> elfreader.address_info(process.getreg('rip') - start)
{'function': 'main', 'line': 79, 'file': 'sample2.c'}
By convention, the stack is between the value of registers rbp
(lower address) and rsp
(higher address) and rsp
increases when there are more stack frames added. We're on 64-bit Linux so each frame takes up 8 bytes.
Edit: Thanks to saagarjha for pointing out that the previous paragraph is false. There are conventions, but they're not those and they aren't that simple [6].
This is the same stack for both assembly and C.
def get_stack():
    bottom = process.getreg('rsp')
    top = process.getreg('rbp')
    stack_bytes = process.readBytes(bottom, top - bottom)
    return [struct.unpack('l', stack_bytes[i*8: (i+1)*8])[0]
            for i in xrange(len(stack_bytes) / 8)]
Now we can combine this with elfreader.address_info
to get the information of all stack frames.
Sometimes, things other than stack frames are put on the stack. We can try to exclude those with a heuristic and hope their values don't lie in a memory-mapped region.
def line_numbers():
    elfreader.load_dwarf_info(mmap)
    lines = [elfreader.address_info(find_section(mmap, frame)[1])
             for frame in get_stack() + [process.getreg('rip')]
             if find_section(mmap, frame)]
    return [line for line in lines if line is not None]
Try it out.
>>> line_numbers()
[{'file': 'sample2.c', 'function': 'main', 'line': 71},
{'file': 'sample2.c', 'function': 'main', 'line': 69}]
We'll use the fork
system call to allow us to experiment with the live process being edited and undo changes. This won't be a "full" undo because external resources like file descriptors aren't restored.
fork() creates two processes: a "parent" and a "child".
To save the state of the process, we create a fork and store the parent in a list of processes. (The stored copy is frozen by ptrace until needed.)
To load the state of the process, we create a fork of the state we want to revert to and replace the current process.
In our C file, add a helper function (this could have been dynamically loaded in or translated to assembly and executed line-by-line from the debugger).
int pid;

void make_fork(){
    pid = fork();
    if (pid == 0) {
        raise(SIGSTOP);
    }
}
The forked parent receives the pid of the child as the output of fork() and the child receives 0. Here, we raise(SIGSTOP) in the child, which halts its execution, and we'll later send it SIGCONT with ptrace. There might be a better way to do this.
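The handshake can be sketched in plain Python (POSIX only; os.fork and a self-stopped child stand in for make_fork and the traced process):

```python
import os
import signal

pid = os.fork()
if pid == 0:
    os.kill(os.getpid(), signal.SIGSTOP)  # like raise(SIGSTOP) in make_fork
    os._exit(7)                           # then exit with a recognizable code

# Parent: wait until the child is stopped, resume it, collect its exit status.
_, status = os.waitpid(pid, os.WUNTRACED)
assert os.WIFSTOPPED(status)
os.kill(pid, signal.SIGCONT)
_, status = os.waitpid(pid, 0)
print(os.WEXITSTATUS(status))  # 7
```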
Now call make_fork
from the debugger.
states = []

def save_state(skip_save=False):
    global child_pid, parent_process, child_process, process
    old_regs = process.getregs()
    c_func_call('make_fork')
    child_pid = read_int('pid')
    parent_process = process
    child_process = process = debugger.addProcess(child_pid, False)
    process.cont()
    process.waitSignals(signal.SIGSTOP)
    process.setregs(old_regs)
    if not skip_save:
        states.append(parent_process)

def load_state(state=None):
    global process
    state = states[-1] if state is None else state
    process.kill(9)
    process = state
    save_state(True)
Edit: Fixed a segfault bug in save_state
: when called from load_state
, the instruction pointer manually set by c_func_call
wasn't restored.
It's a bit brutal to send SIGKILL (signal 9) but I haven't found a nicer way.
save_state
and load_state
don't always produce the expected result for the moment.
This is speculative, but I think the reason for some of the failures with c_func_call (including when called to save and load state, as discussed just now) is that we are inside some libc function in some intermediate state, and calling another function from there, which itself calls libc functions, causes that intermediate state to be reused, resulting in unexpected behaviours and crashes. I could try to learn more about this by stepping through the execution when I make a c_func_call, but that might require knowing more libc internals, and the final product might not use libc much or at all.
Currently, the child process of a fork becomes a zombie when killed. We could call wait() to fix this. However, we aren't using the parent-child relation of processes in its intended way and don't know whether children are alive or not. For example, restoring to state 3 from a fork of state 5 means state 4, the child of state 3, is still alive. Restoring to state 5 from a fork of state 5, on the other hand, means all children of state 5 are terminated.
So instead, we will again use sigaction
to ignore SIGCHLD
, the signal sent to the parent when the child exits. We add this to register_signals()
.
struct sigaction sa2;
sa2.sa_handler = SIG_IGN;
sigemptyset(&sa2.sa_mask);
sa2.sa_flags = 0;
sigaction(SIGCHLD, &sa2, 0);
We should also kill all the processes with paused state we have left.
import atexit

def cleanup():
    print("Cleaning up child processes.")
    for proc in [process] + states:
        try:
            proc.kill(9)
        except OSError:
            pass

atexit.register(cleanup)
If the C process we're editing needs SIGCHLD for some other reason, we could try to ignore it from our debugger instead of the C program.
Try it out. We'll put a loop in our main function that increments a counter about every 0.1 seconds.
int count;

int main(){
    register_signals();
    printf("Starting main loop\n");
    count = 0;
    while (1){
        count += 1;
        printf("%i: Sleep loop %i\n", pid, count);
        usleep(100000);
    }
}
Add a function in Python to wait until that count reaches some value.
def wait_for_count(min_count=1):
    while read_int('count') < min_count:
        step()
And run our test
>>> wait_for_count(2)
Starting main loop
0: Sleep loop 1
>>> save_state()
>>> wait_for_count(4)
0: Sleep loop 2
0: Sleep loop 3
>>> save_state()
>>> wait_for_count(6)
0: Sleep loop 4
0: Sleep loop 5
>>> load_state()
>>> wait_for_count(6)
0: Sleep loop 4
0: Sleep loop 5
>>> load_state(states[0])
>>> wait_for_count(6)
0: Sleep loop 2
0: Sleep loop 3
0: Sleep loop 4
0: Sleep loop 5
That worked as expected. wait_for_count(2)
pauses when count
reaches 2, which is before its value is printed by its sleep loop.
Now that we are ready, next time we'll get to the best part: actually using this thing to make a program. I was hoping to squeeze in a short sample session but this is already getting really long.
There's probably a number of convenience functions that are missing and some useful bundling of features that's still absent. We'll figure all that out later.
We are using a few more globals than would be good normally. This is in part because we'll use the Python interpreter as a REPL for the debugger. Still, we could make this library cleaner and then add helper functions that are easier to type.
It seems there used to be a lot of programs for checkpointing, saving the state of a process. I was trying to find a small one to run, read and extract the steps of. Unfortunately, I couldn't find much recent activity on the topic. Most of the things I looked at were from before 2005.
Notes:
- ulimit can be used to increase that.
- /proc/<pid>/maps every time I tried it. I don't know if that's always the case.
- rbp and rsp need not preserve their meaning of locations in the stack (so the next paragraph also needs correction/clarification).

The assembly registers referred to in this post are:
- rip (instruction pointer): address of the next asm instruction to run.
- rsp (stack pointer): end of the stack (which happens to be the "bottom" because the stack grows down).
- rbp (base pointer): beginning of the stack (which happens to be the "top" because the stack grows down).

The source for this post is here. The files are tutorial2.py, elfreader.py and sample2.c.
Posted on Jun 22, 2018