gdb and lldb are the debuggers I know best. While they are both customizable with scripts, there are many times when I'd like to have much more control over how my debugger works (both the interactive portion and its internal representation).
Being able to recreate flpc in a more interactive way is one of these times. In this post, I try to make a debugger from more primitive pieces: the ptrace system call wrapped by the python-ptrace library, pyelftools and, later on, the disassembly library distorm3.
Because those debuggers are very large projects, trying to remake them seems daunting. But since I mostly want to debug and live-edit a binary I've created, I don't need maximum compatibility. Simplicity will be favoured over completeness when it seems like a good trade. Hopefully, this post itself exposes enough of the underlying ideas to bridge the gap in case of a slightly different environment and standard.
The source for this post is here. Everything is made and tested for Linux x86_64. The lines are in the order of this tutorial, with functions and imports moved closer to the front. So not only is the final debugger interactive, the steps for making the debugger are also interactive.
To avoid permission issues, we will launch the debugged process as a child.
import subprocess
import ptrace.debugger
shell_command = ["./a.out"]
child_proc = subprocess.Popen(shell_command)
pid = child_proc.pid
debugger = ptrace.debugger.PtraceDebugger()
process = debugger.addProcess(pid, False)
This uses the ptrace system call to attach to the child process and pause it. process now contains many convenient methods. (This follows the hello world example from python-ptrace.)
Let's start simple but not linger here for too long.
Get registers
>>> regs = process.getregs()
>>> registers = {k:getattr(regs, k) for k in dir(regs) if not k.startswith('_')}
>>> registers
{'cs': 51L,
'ds': 0L,
'eflags': 519L,
[...]
'rax': 0L,
'rbp': 140733962602848L,
'rbx': 3L,
'rcx': 139901135742274L,
'rdi': 3L,
'rdx': 140733962602656L,
'rip': 139901135742280L,
'rsi': 140733962602656L,
'rsp': 140733962602520L,
'ss': 43L}
Read bytes from memory
>>> import binascii
>>> binascii.hexlify(process.readBytes(registers['rsp'], 8))
'70987e453d7f0000'
Next we'll want to run assembly instructions one at a time. Let's gather the ingredients.
Single step
>>> process.getreg('rip')
140187902313503L
>>> process.singleStep()
>>> process.getreg('rip')
140187902313507L
(rip is the instruction pointer. The r prefix indicates its length of 64 bits. We see that it has indeed advanced when we took a step.)
Continue until the child raises a signal (SIGTRAP in this case). This may result in an error if the process terminates or raises a different signal.
>>> import signal
>>> process.waitSignals(signal.SIGTRAP)
ProcessSignal('Signal SIGTRAP',)
process.singleStep is non-blocking so we'll add a blocking version for convenience.
def step():
process.singleStep()
process.waitSignals(signal.SIGTRAP)
(It's not very clean, but let's use process as a global for the moment.)
Write to memory. In assembly, the instruction int 3 raises SIGTRAP. This instruction can be written as a single byte, 0xCC.
>>> process.writeBytes(process.getreg('rip'), chr(0xCC))
>>> process.cont()
>>> process.waitSignals(signal.SIGTRAP)
ProcessSignal('Signal SIGTRAP',)
(We can also check the rip register before and after to see that it increases by exactly 1.)
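Concretely, that check might look like this (a sketch, repeating the snippet above but saving rip first):
>>> before = process.getreg('rip')
>>> process.writeBytes(before, chr(0xCC))
>>> process.cont()
>>> process.waitSignals(signal.SIGTRAP)
ProcessSignal('Signal SIGTRAP',)
>>> process.getreg('rip') - before
1L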
Set a register
>>> process.setreg('rax', 0)
Now we have everything we need to run a single instruction given as bytes.
def run_asm(instr):
old_rip = process.getreg('rip')
old_values = process.readBytes(old_rip, len(instr))
process.writeBytes(old_rip, instr)
step()
# Rewind rip unless the instruction altered it.
if process.getreg('rip') == old_rip + len(instr):
process.setreg('rip', old_rip)
process.writeBytes(old_rip, old_values)
This overwrites the bytes under the instruction pointer with our instruction instr, takes a step, and then reverts the overwritten bytes and the position of the instruction pointer. That last part (rewinding rip) is only done if the instruction didn't change the instruction pointer itself (as a jump or call would).
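For instance, here is a sketch that runs inc rax through run_asm and checks the effect (the encoding 48 FF C0 for inc rax is supplied by me for illustration):
>>> before = process.getreg('rax')
>>> run_asm(chr(0x48) + chr(0xff) + chr(0xc0))
>>> process.getreg('rax') - before
1L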
With a lookup table of assembly instructions to bytes, this could be put into a loop and made into a REPL.
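Here is a minimal sketch of such a REPL. The two-entry table and the prompt are purely illustrative and would need many more instructions to be useful:
ASM_TABLE = {
    'nop': chr(0x90),
    'inc rax': chr(0x48) + chr(0xff) + chr(0xc0),
}

def asm_repl():
    while True:
        line = raw_input('asm> ').strip()
        if line == 'quit':
            break
        if line in ASM_TABLE:
            run_asm(ASM_TABLE[line])
            print('rip: ' + hex(process.getreg('rip')))
        else:
            print('unknown instruction: ' + line)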
What if we want to call an assembly function and pause after returning from it?
Let's write a Python function func_call(func_addr) for that. (Run this function line by line to examine its intermediate state.) First, save some of our current state.
def func_call(func_addr):
old_rip = process.getreg('rip')
old_regs = process.getregs()
old_values = process.readBytes(old_rip, 6)
We could just use run_asm with the call instruction. That's byte 0xE8 followed by 4 bytes in little endian giving the difference between the address right after the call instruction (old_rip + 5, since the whole instruction is 5 bytes) and our destination (we'll need import struct to pack it). To pause the child after the call, we can write int 3 (byte 0xCC) right after our call instruction.
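As a quick sanity check on this encoding (with made-up addresses, purely for illustration): if old_rip were 0x1000 and func_addr were 0x1100, the offset would be 0x1100 - 0x1005 = 0xfb and the bytes written would be
>>> binascii.hexlify(chr(0xE8) + struct.pack('i', 0x1100 - (0x1000 + 5)) + chr(0xCC))
'e8fb000000cc'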
diff = func_addr - (old_rip + 5)
new_values = chr(0xE8) + struct.pack('i', diff) + chr(0xCC)
process.writeBytes(old_rip, new_values)
step()
We can double check that the call was made
new_rip = process.getreg('rip')
assert(new_rip == func_addr)
Now let's run until it hits a SIGTRAP (hopefully the one we set).
process.cont()
process.waitSignals(signal.SIGTRAP)
And now restore the bytes we've overwritten and the register values. In some cases, we might want to keep some of those.
process.writeBytes(old_rip, old_values)
process.setregs(old_regs)
In fact, let's try to call C functions (compiled into the binary), for the moment with no arguments and a void return.
We just need to find the address of the function. We can get it from the binary's symbol table using pyelftools.
from elftools.elf.elffile import ELFFile
from elftools.elf.sections import SymbolTableSection
def variables(filename="a.out"):
f = ELFFile(open(filename))
symb_sections = [section for section in f.iter_sections()
if isinstance(section, SymbolTableSection)]
variables = {symb.name:symb['st_value'] for section in symb_sections
for symb in section.iter_symbols()}
return variables
and now call
>>> c_variables = variables("a.out")
>>> func_call(c_variables['some_func_name'])
In fact, this method gets all static variables (I think), not just functions. For shared libraries, we can call variables with the full path to the .so file of that library.
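For example (a sketch; the libc path below is distro-specific and just an assumption):
>>> libc_variables = variables("/lib/x86_64-linux-gnu/libc.so.6")
>>> 'getpid' in libc_variables
True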
However, this won't always work because the region of memory actually used doesn't always start at 0, and we need to add the start of that region as an offset.
For the moment, we can get it like so (0x400000 is the default load address of a non-position-independent executable, in which case the symbol values are already absolute and no offset is needed). We will explain and explore memory regions and /proc/pid/maps a bit later.
>>> line1 = open("/proc/%s/maps" % pid).readline()
>>> _start = int(line1.split("-")[0], 16)
>>> start = _start if _start != 0x400000 else 0
>>> func_call(start + c_variables['some_func_name'])
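For a shared library, the offset we need is the start of that library's own mapping rather than the first line of the maps file. Here is a small sketch of a helper for that (region_start and its parsing are mine, not part of python-ptrace):
def region_start(pid, name):
    # Return the start address of the first mapping whose path contains `name`.
    for line in open("/proc/%s/maps" % pid):
        fields = line.split()
        if len(fields) >= 6 and name in fields[5]:
            return int(fields[0].split("-")[0], 16)
Something like func_call(region_start(pid, "libfoo.so") + libfoo_variables['some_func']) should then work, with libfoo.so and libfoo_variables standing in for a real library and its variables dict.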
Now that we have functions' addresses, we can set a breakpoint by just writing int 3 (byte 0xCC) at the start of the function.
def set_breakpoint(addr):
old = process.readBytes(addr, 1)
process.writeBytes(addr, chr(0xCC))
return old
and restore the overwritten value once we hit the breakpoint (after the trap, rip points one byte past the int 3, so we move it back by one and restore the original byte at that address)
def restore_breakpoint(old):
rip = process.getreg('rip')
process.setreg('rip', rip - 1)
addr = rip - 1
process.writeBytes(addr, old)
They can then be used like so
>>> old = set_breakpoint(start + c_variables['my_func'])
>>> process.cont()
>>> process.waitSignals(signal.SIGTRAP)
>>> restore_breakpoint(old)
There are some issues with the first approach to calling functions, although in general, it works surprisingly well.
Call distance too high: call (0xE8) only takes a 4-byte signed offset as its argument, but the distance to an address (diff) may need 8 bytes to describe. We could either wait (step) until we are within range of the function we want to call (this only works if we don't need to call the function right away) or we could put the destination in a register, say rax, and call rax (bytes FF D0).
Overwritten bytes: Since we overwrite 6 bytes (5 for call, one for int 3) and only restore them after the function returns, anything else that reads or executes them in the meantime will get unexpected values. This happens, for example, if we made the call while inside the function body and the program reaches old_rip again before the call returns.
We could potentially restore 5 of the 6 bytes after one step, leaving only the 0xCC. This only reduces the size of the problem.
We could manually craft a stack frame. I think this is what gdb does.
Instead, we will reserve a new piece of memory and write our instructions there.
We can use the mmap system call (call number 9) to reserve some memory. This syscall needs some magic constants, some of which are in ptrace.syscall.
import ptrace.syscall
MMAP_PROT_BITMASK = {k:v for v,k in ptrace.syscall.posix_arg.MMAP_PROT_BITMASK}
MMAP_PROT_BITMASK['PROT_ALL'] = MMAP_PROT_BITMASK['PROT_READ']\
| MMAP_PROT_BITMASK['PROT_WRITE']\
| MMAP_PROT_BITMASK['PROT_EXEC']
MAP_PRIVATE = 0x02
MAP_ANONYMOUS = 0x20
syscalls = {k: v for v, k in ptrace.syscall.linux_syscall64.SYSCALL_NAMES.items()}
With the following function, we can call mmap. The syscall instruction is bytes 0F 05.
def reserve_memory(size):
old_regs = process.getregs()
regs = {'rax': syscalls['mmap'], 'rdi': 0, 'rsi': size,
'rdx': MMAP_PROT_BITMASK['PROT_ALL'],
'r10': MAP_PRIVATE | MAP_ANONYMOUS,
'r8': -1, 'r9': 0}
for reg, value in regs.items():
process.setreg(reg, value)
run_asm(chr(0x0f) + chr(0x05))
result = process.getreg('rax')
process.setregs(old_regs)
return result
This strategy is adapted from this example. For reference, the constants are
syscalls['mmap'] = 9
MMAP_PROT_BITMASK['PROT_ALL'] = 7
MAP_PRIVATE | MAP_ANONYMOUS = 34
The address at which memory is reserved is in rax after the call, so we extract and return it.
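A quick sketch of exercising it (the returned address will of course differ from run to run):
>>> scratch = reserve_memory(4096)
>>> process.writeBytes(scratch, "hello")
>>> process.readBytes(scratch, 5)
'hello'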
This lets us make a modified, slightly safer version of our function call
def safe_func_call(func_addr):
old_rip = process.getreg('rip')
old_regs = process.getregs()
tmp_addr = reserve_memory(6)
process.setreg('rip', tmp_addr)
# call rax
process.setreg('rax', func_addr)
new_values = chr(0xff) + chr(0xd0) + chr(0xcc)
process.writeBytes(tmp_addr, new_values)
step()
new_rip = process.getreg('rip')
assert(new_rip == func_addr)
process.cont()
process.waitSignals(signal.SIGTRAP)
process.setregs(old_regs)
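A sketch of using it, and of checking that the bytes at the original rip were never touched (reusing names from earlier):
>>> before = process.readBytes(process.getreg('rip'), 6)
>>> safe_func_call(start + c_variables['some_func_name'])
>>> process.readBytes(process.getreg('rip'), 6) == before
True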
This version may still segfault sometimes. I'm not entirely sure why.
Let's add a look function to our debugger that tells us what the next instructions are. We need the distorm3 disassembler for this, which can be installed using pip.
PtraceProcess.disassemble then gives us an iterator over the next ten instructions
def look(addr=None):
print("ip:", hex(process.getreg('rip')))
for i, instr in enumerate(process.disassemble(start=addr)):
hexa = instr.hexa
hexa = ' '.join(hexa[i:i+2] for i in range(0, len(hexa), 2))
print(str(i).ljust(4), hexa.ljust(24), instr.text.lower())
Running this gives something like
>>> look()
ip: 0x555c9860810dL
0 48 89 c2 mov rdx, rax
1 48 8d 05 79 0f 20 00 lea rax, [rip+0x200f79]
2 48 89 10 mov [rax], rdx
3 48 8d 05 6f 0f 20 00 lea rax, [rip+0x200f6f]
4 48 8b 00 mov rax, [rax]
5 48 89 c6 mov rsi, rax
6 48 8d 3d af 02 00 00 lea rdi, [rip+0x2af]
7 b8 00 00 00 00 mov eax, 0x0
8 e8 d8 fa ff ff call 0x555c98607c10
9 48 8d 05 51 0f 20 00 lea rax, [rip+0x200f51]
PtraceProcess.dumpCode works similarly with different formatting.
This post is already getting long. I will write about reading/writing C variables, running single C statements, shared libraries, dynamic loading and memory maps (/proc/pid/maps) next time.
Originally, I wasn't sure if I'd go this low-level for my project (or if I really needed to). Instead, I could just start debugging once my interpreter is up. It'd still be useful to have a separate interpreter and debugger so that the former's state can be modified while frozen. (Imagine trying to alter the call stack while the control flow of the stack-altering function is itself determined by the top of that stack! It may be possible but is already very hard to reason about.)
I might still try to make the interpreter support having an external debugger plugin at some point.
The source for this post is here.
Posted on Jun 14, 2018