Using LLDB for reverse engineering Dec 20 2019
I've been exploring reverse engineering, and it's a fascinating topic. There are many ways to analyse a binary, but the analysis is usually divided into two types: static and dynamic. In static analysis, you disassemble (or decompile) the binary, read the resulting code and try to figure out what it does. In dynamic analysis, on the other hand, you execute the binary and analyse it while it runs, which generally means using a debugger. There are many debuggers out there; in this post, we are going to use LLDB to analyse a binary. I'll explain the basic commands we would use and a general setup that I find useful when doing dynamic analysis.
LLDB is the debugger that comes with Xcode and the developer tools on macOS, so it'll already be there if you develop macOS/*OS applications. Let's begin by writing and analysing a simple C program.
Hello, world!
Alright, we are going to write a basic C program and compile it. Create a new file named hello.c and add the following content:
#include <stdio.h>

int main(int argc, char* argv[]) {
  printf("Hello, world!");
  return 0;
}
Now compile it using Clang (you can use GCC or any other compiler; I'm just trying to stick to the LLVM tools used in the Apple ecosystem):
$ clang hello.c
# this should create a.out
Now we are going to use lldb to analyse a.out.
$ lldb a.out
The lldb command provides us with a REPL where we can run the program, set breakpoints and analyse the code.
Let's run the program:
(lldb) r
Process 46295 launched: '/Users/perensejo/a.out' (x86_64)
Hello, world!Process 46295 exited with status = 0 (0x00000000)
Now we know what the program does when we execute it, but what we are interested in is how it does it.
We are going to assume we don't know anything about the binary, so let's first look at its symbol table. We could use the nm(1) command in the shell:
$ nm a.out
0000000100002008 d __dyld_private
0000000100000000 T __mh_execute_header
0000000100000f50 T _main
U _printf
U dyld_stub_binder
Alternatively, we can show the symbol table from inside the debugger using the image command:
(lldb) image dump symtab a.out
Symtab, file = /Users/pascualin/a.out, num_symbols = 5:
Debug symbol
|Synthetic symbol
||Externally Visible
|||
Index UserID DSX Type File Address/Value Load Address Size Flags Name
------- ------ --- --------------- ------------------ ------------------ ------------------ ---------- ----------------------------------
[ 0] 0 Data 0x0000000100002008 0x0000000000000008 0x000e0000 _dyld_private
[ 1] 1 X Data 0x0000000100000000 0x0000000000000f50 0x000f0010 _mh_execute_header
[ 2] 2 X Code 0x0000000100000f50 0x0000000000000031 0x000f0000 main
[ 3] 3 Trampoline 0x0000000100000f82 0x0000000000000006 0x00010100 printf
[ 4] 4 X Undefined 0x0000000000000000 0x0000000000000000 0x00010100 dyld_stub_binder
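The one-letter types in the nm output follow the conventions described in the macOS nm(1) man page: T is a symbol in the text (code) section, D is data, B is bss, and U is undefined, meaning it is resolved at load time (printf, for example, comes from the C library and is bound by dyld). Uppercase letters mark external symbols and lowercase letters mark local ones, such as _dyld_private. Note also that nm shows the Mach-O names with their leading underscore (_main), while lldb strips it (main). As a small illustration, here is a hypothetical helper that decodes the common letters; it is a sketch, not a complete table of every type nm can print.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: describe what kind of symbol a common nm(1) type
   letter denotes. Letters not handled here are reported as "other". */
const char *nm_type_kind(char type) {
    switch (toupper((unsigned char)type)) {
    case 'T': return "text (code)";
    case 'D': return "data";
    case 'B': return "bss (uninitialised data)";
    case 'U': return "undefined (resolved at load time)";
    default:  return "other";
    }
}

/* Uppercase type letters mark external (global) symbols,
   lowercase letters mark local (non-external) ones. */
bool nm_type_is_external(char type) {
    return isupper((unsigned char)type) != 0;
}
```

For the listing above, nm_type_kind('T') describes _main as code, and nm_type_is_external('d') is false for _dyld_private.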
To learn more about lldb's commands, I recommend reading the help built into lldb. For example, to check what the image command does, run help image inside lldb and you'll get a nice description with all the options the command supports (you can also run help help or help apropos to learn more).
Ok, we can see that the binary has a main function. Let's set a breakpoint on main and see what is going on. Yes, I know, a C program always has a main entry point, so this was a bit of an excuse to show you the binary's symbol table.
Anyway, let's set the breakpoint and run the program again. I'm using the short form of the commands, but you can always type the long form and press Tab to auto-complete:
(lldb) b main
(lldb) r
Process 46305 launched: '/Users/fulano/a.out' (x86_64)
Process 46305 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1
frame #0: 0x0000000100000f50 a.out`main
a.out`main:
-> 0x100000f50 <+0>: pushq %rbp
0x100000f51 <+1>: movq %rsp, %rbp
0x100000f54 <+4>: subq $0x20, %rsp
0x100000f58 <+8>: movl $0x0, -0x4(%rbp)
Target 0: (a.out) stopped.
We stopped at the beginning of our main function. This is not an introduction to assembly language, so I won't go into the details; I will assume you have some familiarity with assembly. Let's have a look at the registers:
(lldb) register read
General Purpose Registers:
rax = 0x0000000100000f50 a.out`main
rbx = 0x0000000000000000
rcx = 0x00007ffeefbfe000
rdx = 0x00007ffeefbfdc18
rdi = 0x0000000000000001
rsi = 0x00007ffeefbfdc08
rbp = 0x00007ffeefbfdbf8
rsp = 0x00007ffeefbfdbe8
r8 = 0x0000000000000000
r9 = 0x0000000000000000
r10 = 0x0000000000000000
r11 = 0x0000000000000000
r12 = 0x0000000000000000
r13 = 0x0000000000000000
r14 = 0x0000000000000000
r15 = 0x0000000000000000
rip = 0x0000000100000f50 a.out`main
rflags = 0x0000000000000246
cs = 0x000000000000002b
fs = 0x0000000000000000
gs = 0x0000000000000000
As you can see, the instruction pointer (rip) is at 0x100000f50, which is exactly where we stopped, good. The next instruction to be executed is:
-> 0x100000f50 <+0>: pushq %rbp
So we are about to push the contents of the rbp register onto the stack. Let's first look at where the stack pointer points:
(lldb) register read rsp
rsp = 0x00007ffeefbfdbe8
That is an address in memory, but what is stored at that address? We can use the memory read command (I'll use its short form, x):
(lldb) x/10w $rsp
0x7ffeefbfdbe8: 0x6e44f7fd 0x00007fff 0x6e44f7fd 0x00007fff
0x7ffeefbfdbf8: 0x00000000 0x00000000 0x00000001 0x00000000
0x7ffeefbfdc08: 0xefbfe088 0x00007ffe
Depending on how you prefer to look at your stack, you might want to show it in a single column. I do, so let's add the -l 1 option to print one item per line:
(lldb) x/10w -l 1 $rsp
0x7ffeefbfdbe8: 0x6e44f7fd
0x7ffeefbfdbec: 0x00007fff
0x7ffeefbfdbf0: 0x6e44f7fd
0x7ffeefbfdbf4: 0x00007fff
0x7ffeefbfdbf8: 0x00000000
0x7ffeefbfdbfc: 0x00000000
0x7ffeefbfdc00: 0x00000001
0x7ffeefbfdc04: 0x00000000
0x7ffeefbfdc08: 0xefbfe088
0x7ffeefbfdc0c: 0x00007ffe
That's more like it. So the stack pointer points to the top of the stack, 0x7ffeefbfdbe8, and we are about to execute the following instruction:
-> 0x100000f50 <+0>: pushq %rbp
Let's see what is inside rbp:
(lldb) register read rbp
rbp = 0x00007ffeefbfdbf8
So if we push it, we should see 0x7ffeefbfdbf8 at the top of the stack. Let's check by running the next instruction (ni):
(lldb) ni
Process 46305 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step over
frame #0: 0x0000000100000f51 a.out`main + 1
a.out`main:
-> 0x100000f51 <+1>: movq %rsp, %rbp
0x100000f54 <+4>: subq $0x20, %rsp
0x100000f58 <+8>: movl $0x0, -0x4(%rbp)
0x100000f5f <+15>: movl %edi, -0x8(%rbp)
Let's look at the stack again:
(lldb) x/10w -l 1 $rsp
0x7ffeefbfdbe0: 0xefbfdbf8
0x7ffeefbfdbe4: 0x00007ffe
0x7ffeefbfdbe8: 0x6e44f7fd
0x7ffeefbfdbec: 0x00007fff
0x7ffeefbfdbf0: 0x6e44f7fd
0x7ffeefbfdbf4: 0x00007fff
0x7ffeefbfdbf8: 0x00000000
0x7ffeefbfdbfc: 0x00000000
0x7ffeefbfdc00: 0x00000001
0x7ffeefbfdc04: 0x00000000
As you can see, the stack now shows 0x7ffeefbfdbf8 at the top. But it doesn't look right: the value seems to be split, with the low half (0xefbfdbf8) on top and the high half (0x00007ffe) below it. This is because we are using x/10w, which displays 32-bit words, while the pushed register is 64 bits wide (and x86-64 stores values little-endian, low bytes first). So we should set the item size to 8 bytes with -s 8:
(lldb) x/10xw -s 8 -l 1 $rsp
0x7ffeefbfdbe0: 0x00007ffeefbfdbf8
0x7ffeefbfdbe8: 0x00007fff6e44f7fd
0x7ffeefbfdbf0: 0x00007fff6e44f7fd
0x7ffeefbfdbf8: 0x0000000000000000
0x7ffeefbfdc00: 0x0000000000000001
0x7ffeefbfdc08: 0x00007ffeefbfe088
0x7ffeefbfdc10: 0x0000000000000000
0x7ffeefbfdc18: 0x00007ffeefbfe0b4
0x7ffeefbfdc20: 0x00007ffeefbfe0c2
0x7ffeefbfdc28: 0x00007ffeefbfe105
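Both dumps show the same bytes; only the item size differs. pushq %rbp subtracts 8 from rsp (0x7ffeefbfdbe8 becomes 0x7ffeefbfdbe0) and writes rbp there in little-endian order, so a 32-bit view shows the low word 0xefbfdbf8 first and the high word 0x00007ffe at the next address. The sketch below models that with a small byte array standing in for the stack; the Stack type, its 64-byte size and the helper names are hypothetical and only meant to illustrate the arithmetic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a small stack region. */
enum { STACK_SIZE = 64 };

typedef struct {
    uint8_t  mem[STACK_SIZE];
    uint64_t base;  /* address represented by mem[0] */
    uint64_t rsp;   /* current stack pointer */
} Stack;

/* Store a 64-bit value little-endian (lowest byte at the lowest address). */
static void store_le64(uint8_t *p, uint64_t v) {
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Read one 32-bit little-endian word, like one item of x/w. */
static uint32_t load_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Read one 64-bit little-endian value, like one item of x/xw -s 8. */
static uint64_t load_le64(const uint8_t *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* pushq: move rsp down by 8 bytes, then write the value at the new rsp. */
void push64(Stack *s, uint64_t value) {
    s->rsp -= 8;
    store_le64(&s->mem[s->rsp - s->base], value);
}

uint32_t read_word32(const Stack *s, uint64_t addr) {
    return load_le32(&s->mem[addr - s->base]);
}

uint64_t read_word64(const Stack *s, uint64_t addr) {
    return load_le64(&s->mem[addr - s->base]);
}
```

Starting from rsp = 0x7ffeefbfdbe8 and pushing 0x00007ffeefbfdbf8, read_word32 at 0x7ffeefbfdbe0 and 0x7ffeefbfdbe4 gives the two halves seen with x/10w, while read_word64 at 0x7ffeefbfdbe0 gives the whole value seen with -s 8.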
Now the display looks right. Let's keep moving and show the disassembly of the function we are currently in by typing di (short for disassemble):
(lldb) di
a.out`main:
0x100000f50 <+0>: pushq %rbp
-> 0x100000f51 <+1>: movq %rsp, %rbp
0x100000f54 <+4>: subq $0x20, %rsp
0x100000f58 <+8>: movl $0x0, -0x4(%rbp)
0x100000f5f <+15>: movl %edi, -0x8(%rbp)
0x100000f62 <+18>: movq %rsi, -0x10(%rbp)
0x100000f66 <+22>: leaq 0x35(%rip), %rdi ; "Hello, world!"
0x100000f6d <+29>: movb $0x0, %al
0x100000f6f <+31>: callq 0x100000f82 ; symbol stub for: printf
0x100000f74 <+36>: xorl %ecx, %ecx
0x100000f76 <+38>: movl %eax, -0x14(%rbp)
0x100000f79 <+41>: movl %ecx, %eax
0x100000f7b <+43>: addq $0x20, %rsp
0x100000f7f <+47>: popq %rbp
0x100000f80 <+48>: retq
Or we can read the memory at the instruction pointer (rip) using x with the i (instruction) format:
(lldb) x/10i $rip
-> 0x100000f51: 48 89 e5 movq %rsp, %rbp
0x100000f54: 48 83 ec 20 subq $0x20, %rsp
0x100000f58: c7 45 fc 00 00 00 00 movl $0x0, -0x4(%rbp)
0x100000f5f: 89 7d f8 movl %edi, -0x8(%rbp)
0x100000f62: 48 89 75 f0 movq %rsi, -0x10(%rbp)
0x100000f66: 48 8d 3d 35 00 00 00 leaq 0x35(%rip), %rdi ; "Hello, world!"
0x100000f6d: b0 00 movb $0x0, %al
0x100000f6f: e8 0e 00 00 00 callq 0x100000f82 ; symbol stub for: printf
0x100000f74: 31 c9 xorl %ecx, %ecx
0x100000f76: 89 45 ec movl %eax, -0x14(%rbp)
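The two stores at 0x100000f5f and 0x100000f62 save edi and rsi into main's stack frame. x86-64 macOS follows the System V AMD64 calling convention, where the first six integer or pointer arguments arrive in rdi, rsi, rdx, rcx, r8 and r9, so these are argc and argv; that is why register read showed rdi = 0x1 (just the program name). rdx = 0x7ffeefbfdc18, which points just past argv's terminating NULL in the 64-bit stack dump, looks like the environment pointer that macOS also passes to main. The sketch below, with hypothetical helper names, encodes the register order and the rbp-relative arithmetic: with rbp = 0x7ffeefbfdbe0 after movq %rsp, %rbp, argc lands at -0x8(%rbp) = 0x7ffeefbfdbd8 and argv at -0x10(%rbp) = 0x7ffeefbfdbd0.

```c
#include <stddef.h>
#include <stdint.h>

/* System V AMD64: registers for the first six integer/pointer arguments. */
static const char *const kIntArgRegs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

/* Register holding integer argument `index` (0-based),
   or NULL if that argument is passed on the stack instead. */
const char *int_arg_register(size_t index) {
    if (index >= sizeof kIntArgRegs / sizeof kIntArgRegs[0])
        return NULL;
    return kIntArgRegs[index];
}

/* Address of an rbp-relative slot such as -0x8(%rbp).
   Unsigned arithmetic wraps, so negative offsets subtract correctly. */
uint64_t frame_slot(uint64_t rbp, int64_t offset) {
    return rbp + (uint64_t)offset;
}
```

So int_arg_register(0) and int_arg_register(1) name the registers that carry argc and argv, and frame_slot(0x7ffeefbfdbe0, -0x8) gives the address where argc is saved.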
I hope you are getting a better feel for using memory read (x for short) and the registers. Now let's skip a few instructions, using ni -c 5 to step over five of them, and stop where the "Hello, world!" string is about to be passed to printf:
(lldb) ni -c 5
-> 0x100000f66 <+22>: leaq 0x35(%rip), %rdi ; "Hello, world!"
0x100000f6d <+29>: movb $0x0, %al
0x100000f6f <+31>: callq 0x100000f82 ; symbol stub for: printf
0x100000f74 <+36>: xorl %ecx, %ecx
Now let's imagine the debugger hadn't added the comment showing the string. The leaq instruction loads into rdi the address of the memory that holds "Hello, world!", so once we execute it, that address will be in rdi:
(lldb) ni
-> 0x100000f6d <+29>: movb $0x0, %al
0x100000f6f <+31>: callq 0x100000f82 ; symbol stub for: printf
0x100000f74 <+36>: xorl %ecx, %ecx
0x100000f76 <+38>: movl %eax, -0x14(%rbp)
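Without the comment, we can still work out where rdi will point. leaq 0x35(%rip), %rdi is RIP-relative: the displacement is added to the address of the next instruction, not the current one. The lea at 0x100000f66 is 7 bytes long (48 8d 3d 35 00 00 00, from the x/10i output), so the target is 0x100000f6d + 0x35 = 0x100000fa2. The callq at 0x100000f6f (e8 0e 00 00 00) works the same way: 0x100000f74 + 0x0e = 0x100000f82, the printf stub. The hypothetical helpers below decode the little-endian signed 32-bit displacement and do that arithmetic; they assume you already know the instruction's length and where its displacement starts.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a signed 32-bit little-endian displacement, e.g. the last four
   bytes of `leaq disp32(%rip), %reg` or of `callq rel32`. */
int32_t read_disp32(const uint8_t *p) {
    uint32_t u = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                 (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    /* Portable two's-complement conversion that avoids the
       implementation-defined cast of values above INT32_MAX. */
    if (u & 0x80000000u)
        return -(int32_t)(~u) - 1;
    return (int32_t)u;
}

/* RIP-relative target: the displacement is added to the address of the
   NEXT instruction, i.e. insn_addr + insn_len. */
uint64_t rip_relative_target(uint64_t insn_addr, size_t insn_len, int32_t disp) {
    return insn_addr + (uint64_t)insn_len + (uint64_t)(int64_t)disp;
}
```

With the lea bytes, rip_relative_target(0x100000f66, 7, 0x35) is 0x100000fa2, which is exactly the address the next memory read starts at.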
Let's read the memory that rdi points to (let's read 4 words):
(lldb) x/4w $rdi
0x100000fa2: "Hello, world!"
0x100000fb0: "\x01"
0x100000fb2: ""
0x100000fb3: ""
We can also use the s format, which reads a string until it reaches a NUL (\x00) terminator:
(lldb) x/s $rdi
0x100000fa2: "Hello, world!"
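x/s keeps reading bytes until it hits a NUL (0x00) byte. That is also why, in the x/4w output, the string at 0x100000fa2 is followed by an item at 0x100000fb0: 13 characters plus the terminator is 14 (0xe) bytes. Here is a minimal sketch of that behaviour over a hypothetical buffer holding a copy of the target's memory; the function name and its bounds handling are illustrative choices, not lldb internals.

```c
#include <stddef.h>
#include <stdint.h>

/* Mimic x/s: copy bytes from `mem` until a NUL terminator, stopping early if
   `out` fills up. Returns the number of bytes consumed including the NUL (so
   addr + result is where the next string starts), or 0 if no terminator was
   found within `mem_len`. When out_size > 0, `out` is always NUL-terminated. */
size_t read_c_string(const uint8_t *mem, size_t mem_len, char *out, size_t out_size) {
    size_t copied = 0;
    for (size_t i = 0; i < mem_len; i++) {
        if (mem[i] == 0) {
            if (out_size > 0)
                out[copied] = '\0';
            return i + 1;
        }
        if (copied + 1 < out_size)  /* keep room for the terminator */
            out[copied++] = (char)mem[i];
    }
    if (out_size > 0)
        out[copied] = '\0';
    return 0;
}
```

Feeding it the bytes "Hello, world!" followed by 00 01 00 returns 14, and calling it again 14 bytes further on yields the "\x01" string that lldb printed at 0x100000fb0.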
Perfect. Next comes the call to printf and the rest of the program's teardown. You can keep debugging on your own, or use the continue command (c), which resumes execution until the next breakpoint or, as in our case where there is none, the end of the program.
That should be enough to get you started, but there are a few more details I want to show you. First, if we are debugging a program that we wrote, we have access to the source code, so we can compile it with additional information for the debugger. Second, we'll see how to set up a command file to make your debugging life easier.
Debugger information
Let's now compile our code with the -glldb flag, which emits debug information tuned for LLDB:
$ clang -glldb hello.c
# This generates a.out
Again, let's jump into lldb and set a breakpoint on main:
$ lldb a.out
(lldb) target create "a.out"
Current executable set to 'a.out' (x86_64).
(lldb) b main
Breakpoint 1: where = a.out`main + 22 at hello.c:4:3, address = 0x0000000100000f66
(lldb)
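Notice that this time the breakpoint resolves to main + 22 (hello.c:4:3) rather than the first instruction of main: with line information available, lldb skips the function prologue and stops at the first line of the function body. Now run the program: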
(lldb) r
Process 46448 launched: '/Users/derik/Documents/Development/re/a.out' (x86_64)
Process 46448 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100000f66 a.out`main(argc=1, argv=0x00007ffeefbfdc08) at hello.c:4:3
1 #include <stdio.h>
2
3 int main(int argc, char* argv[]) {
-> 4 printf("Hello, world!");
5 return 0;
6 }
Target 0: (a.out) stopped.
Now the debugger shows us the source code, which is useful. To go to the next line of source code, use the next command (n for short):
(lldb) n
Process 46448 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
frame #0: 0x0000000100000f79 a.out`main(argc=1, argv=0x00007ffeefbfdc08) at hello.c:5:3
2
3 int main(int argc, char* argv[]) {
4 printf("Hello, world!");
-> 5 return 0;
6 }
Target 0: (a.out) stopped.
As you can see, it stepped over the whole printf line and went straight to return 0. With the additional debug information, we can use n to step over one source line, and ni to step one assembly instruction at a time, which is quite handy.
Let's rerun the program and try to step through the assembly instructions:
(lldb) r
There is a running process, kill it and restart?: [Y/n] y
Process 46457 exited with status = 9 (0x00000009)
Process 46463 launched: '/Users/derik/Documents/Development/re/a.out' (x86_64)
Process 46463 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100000f66 a.out`main(argc=1, argv=0x00007ffeefbfdc08) at hello.c:4:3
1 #include <stdio.h>
2
3 int main(int argc, char* argv[]) {
-> 4 printf("Hello, world!");
5 return 0;
6 }
Target 0: (a.out) stopped.
(lldb) ni
Process 46463 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step over
frame #0: 0x0000000100000f6d a.out`main(argc=1, argv=0x00007ffeefbfdc08) at hello.c:4:3
1 #include <stdio.h>
2
3 int main(int argc, char* argv[]) {
-> 4 printf("Hello, world!");
5 return 0;
6 }
Target 0: (a.out) stopped.
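It looks like nothing happened, but the program did move forward by one instruction (from 0x100000f66 to 0x100000f6d, as the frame addresses show). Both instructions belong to line 4, so the source view doesn't change. To see where we are, use the di command to show the disassembly: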
(lldb) di
a.out`main:
0x100000f50 <+0>: pushq %rbp
0x100000f51 <+1>: movq %rsp, %rbp
0x100000f54 <+4>: subq $0x20, %rsp
0x100000f58 <+8>: movl $0x0, -0x4(%rbp)
0x100000f5f <+15>: movl %edi, -0x8(%rbp)
0x100000f62 <+18>: movq %rsi, -0x10(%rbp)
0x100000f66 <+22>: leaq 0x35(%rip), %rdi ; "Hello, world!"
-> 0x100000f6d <+29>: movb $0x0, %al
0x100000f6f <+31>: callq 0x100000f82 ; symbol stub for: printf
0x100000f74 <+36>: xorl %ecx, %ecx
0x100000f76 <+38>: movl %eax, -0x14(%rbp)
0x100000f79 <+41>: movl %ecx, %eax
0x100000f7b <+43>: addq $0x20, %rsp
0x100000f7f <+47>: popq %rbp
0x100000f80 <+48>: retq
Now we can combine ni and di to follow the steps in the assembly code.
You can continue playing with that on your own. Let's now create a custom configuration that will be helpful when we are reverse engineering a binary.
LLDB custom hooks
We can pass lldb a file of lldb commands to run when the debugger starts, using the -s option.
That is useful on its own, but it becomes much better when we add stop hooks to that file: commands that run every time the process stops (after each step or at a breakpoint). ta st a is short for target stop-hook add, and its -o option takes a one-line command to run. Create a file named revengsetup with the following content:
ta st a -o "x/x $rax "
ta st a -o "x/x $rbx "
ta st a -o "x/x $rcx "
ta st a -o "x/x $rdx "
ta st a -o "x/x $rdi "
ta st a -o "x/x $rsi "
ta st a -o "x/x $rbp "
ta st a -o "x/x $rsp "
ta st a -o "x/8w -s 8 -l1 $rsp"
ta st a -o "x/10i $rip"
b main
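These hooks display the memory that each of the main registers points to, the top eight 64-bit slots of the stack, and the next ten instructions; the last line sets a breakpoint on main. Note that x/x $rax shows the memory at the address held in rax, not the value of rax itself, so for a register that doesn't hold a valid address (such as rbx = 0 earlier) that hook just reports a read error; register read is the way to see the raw values.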
Let's try it out with our a.out:
$ lldb -s revengsetup a.out
(lldb) r
Run it, and each time the process stops you'll see all of that information on screen. Very handy.
Final thoughts
There is a lot more to reverse engineering than just using a debugger, but it is useful to become proficient with one. This was just a short introduction to get you started; there are more resources out there on the Internet. I wrote this post because the information I found was mostly directed at GDB, and that information was often buried inside assembly language tutorials or books. I wanted to give you a concise way to jump into lldb without having to wade through lots of pages on how to write assembly. I hope you find it useful.
Let me know what you think; as always, feedback is welcome.
Also let me know what you are reverse engineering; it is always fun to talk about this stuff.
Related topics/notes of interest
- The GDB to LLDB command map, useful because there is much more information on how to use GDB than LLDB; if you learn how to do something in GDB, you can look up the LLDB equivalent there.
- A stack overflow answer that explains the difference between GDB and LLDB, a simple explanation.
- If you want to learn about assembly, I would recommend http://opensecuritytraining.info/Training.html.
- Also, Reverse Engineering for Beginners.
- Reverse Engineering subreddit, a lot of useful information there.