Using LLDB for reverse engineering
I’ve been exploring reverse engineering, and it’s a fascinating topic. There are many ways to analyse a binary, and the analysis is usually divided into two types: static and dynamic. In static analysis, you disassemble (or decompile) the binary and read the resulting code to figure out what it does. In dynamic analysis, you execute the binary and analyse it while it is running, usually with a debugger. There are many debuggers out there; in this post, we are going to use LLDB. I’ll explain the basic commands and a general setup that I find useful when doing dynamic analysis.
LLDB is the debugger that comes with Xcode when you install the developer tools on macOS, so it’ll be there if you are already developing some macOS/*OS applications. So let’s begin with writing and analysing a simple C program.
Hello, world!
Alright, we are going to write a basic C program and compile it. Create a new file, name it hello.c, and add the following content:
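A minimal version of the program could look like the sketch below. It assumes nothing beyond what the rest of this post relies on: a main function, a single printf call with the “Hello, world!” string, and a return 0.

```c
#include <stdio.h>

int main(void) {
    /* The string literal is stored in the binary; printf receives its address. */
    printf("Hello, world!\n");
    return 0;
}
```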
Now compile it using Clang (you can use GCC or any other compiler; I’m just trying to stick to the LLVM tools used in the Apple ecosystem):
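Assuming the file is saved as hello.c in the current directory, the compile step can be as simple as this. With no -o option, Clang names the executable a.out.

```sh
# Compile and link hello.c; the executable is written to ./a.out by default.
clang hello.c
```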
Now we are going to use lldb to analyse a.out:
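From the same directory, a session can be started roughly like this:

```sh
# Start lldb with a.out loaded as the target; the program is not running yet.
lldb a.out
```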
The lldb command provides us with a REPL where we can run the program, set breakpoints and analyse the code.
Let’s run the program:
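At the (lldb) prompt, launching the target looks like the line below (lines starting with # are comments that lldb ignores):

```
# Launch the target; with no breakpoints set, it runs to completion.
run
```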
Now, we know what it does when we execute it, but how it does it is what we are interested in.
We are going to assume we don’t know anything about the binary, so let’s first look at its symbol table. We could use the nm(1) command in the shell:
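A sketch of the shell invocation, assuming the binary is still called a.out:

```sh
# List the symbols in a.out. On macOS, C symbols carry a leading underscore,
# so the main function shows up as _main.
nm a.out
```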
Or, from the debugger, we can show the symbol table using the image command:
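Inside lldb, one way to do this is with the dump symtab subcommand, naming the module we care about so that the system libraries are left out:

```
# Dump the symbol table of the a.out module only.
image dump symtab a.out
```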
To learn more about lldb’s commands, I recommend reading the help included in lldb. For example, to check what the image command does, we can use help image inside lldb and get a description with all the options the command supports (you can also run help help or help apropos to learn more).
Ok, we can see that the binary has a main function. Let’s set a breakpoint on main and see what is going on. Yes, I know, binaries on macOS are required to have a main entry point, but it was an excuse to show you the symbol table for the binary.
Anyway, let’s set the breakpoint and rerun the program. I’m using the short form of the commands, but you can always use the long form and press Tab to auto-complete:
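In short form, the two steps could look like this:

```
# Set a breakpoint on the main symbol (long form: breakpoint set --name main).
b main
# Launch the program again (r is the short form of run); execution stops at the breakpoint.
r
```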
Alright, we got stopped at the beginning of our main function. This is not an introduction to Assembly language, so I won’t go into the details. I will assume you have some familiarity with assembly languages. Let’s have a look at our registers:
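A command that shows the registers of the current frame:

```
# Print the general-purpose registers (rax, rbx, ..., rsp, rbp, rip, ...).
register read
```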
As you can see, the instruction pointer (rip) is at 0x100000f50, which is exactly where we stopped. Good. The instruction about to be executed is a push of rbp (pushq %rbp in the AT&T syntax that LLDB prints by default).
So we are going to push the contents of the rbp register onto the stack. Let’s first look at where the stack pointer “points”:
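register read also accepts register names, so reading only the stack pointer could be done like this:

```
# Print only the stack pointer.
register read rsp
```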
That is the address in memory, but what is on that address? We can use the memory command (I’ll use the short form):
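One plausible form of the short command, assuming the GDB-style format specifiers that x accepts, reads ten 32-bit words starting at the address held in rsp:

```
# x is the short form of memory read; /10xw means 10 items, hex, word-sized (4 bytes).
x/10xw $rsp
```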
Depending on how you prefer to look at your stack, you might want to show it in a single column. I prefer that, so let’s add more formatting options to the command:
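A sketch of a single-column read using the long options of memory read (the count of ten items is just an example value):

```
# Ten 4-byte values in hex, one per line, starting at the stack pointer.
memory read --size 4 --format x --count 10 --num-per-line 1 $rsp
```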
That’s more like it. Ok, so our stack pointer points to the top of the stack, 0x7ffeefbfdbe8, and we were about to execute the instruction that pushes rbp.
Let’s see what is inside rbp:
So if we push it onto the stack, we should see 0x7ffeefbfdbf8 at the top of the stack. Let’s see if that’s true by running the next instruction (ni):
Again, let’s look at our stack:
As you can see, our stack now shows 0x7ffeefbfdbf8 on top. But that doesn’t look right: one half of the hex number is on one line and the other half on the next. This is because of the w in the format we passed to x. It displays words (32 bits), and we are on a 64-bit architecture, so we should use:
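Either of the following asks for 8-byte items instead of 4-byte words (again, the count of ten is only an example):

```
# Ten 8-byte values in hex, one per line, so each stack slot is shown whole.
memory read --size 8 --format x --count 10 --num-per-line 1 $rsp
# GDB-style form of the same read (g means 8-byte "giant" words); the column layout may differ.
x/10xg $rsp
```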
And now the display looks right. Let’s keep moving and show the disassembly of the code we are currently in. We can do it by typing di:
Or we can read the memory using x (with the i format) at the address in our instruction pointer (rip):
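For example, assuming we want to see the next five instructions:

```
# Decode five instructions starting at the address held in rip.
x/5i $rip
```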
I hope you are getting a better feel for using memory read (x in its short form) and the registers. Ok, let’s skip a few instructions and stop where we see the “Hello, world!” string about to be passed to printf:
Alright, let’s imagine the debugger didn’t add that comment showing that it’s getting the string. We see that the rdi register will point to the memory address that contains the “Hello, world!” string. The address will be in rdi after we execute the instruction:
Let’s read the memory that rdi points to (let’s read 4 words):
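Once the instruction that loads rdi has executed, a four-word read could look like this:

```
# Four 4-byte words in hex, starting at the address held in rdi.
x/4xw $rdi
```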
We can also take advantage of the s format, which reads a string until it reaches a null character (\x00):
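With the same rdi value as above, the string form is:

```
# Interpret the bytes at the address in rdi as a NUL-terminated C string.
x/s $rdi
```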
Perfect, you can then see that we have a call to printf and the rest of the teardown of the program. You can continue debugging it on your own, or just use the command continue that will continue until the next breakpoint (which we don’t have) or the end of the program in our case.
Ok, that should be enough to get you started. There are a few more details I want to show you. First, if we are debugging a program that we wrote, we have access to the code, so we can compile it with additional information for the debugger. Second, we’ll see how to set up a command file to make your debugging life easier.
Debugger information
Ok, let’s now compile our code using the -glldb flag. That flag makes the compiler emit debug information for our debugger:
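Assuming the same hello.c as before, the compile step becomes:

```sh
# -glldb emits debug information tuned for LLDB; the output is still ./a.out.
clang -glldb hello.c
```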
Again, let’s jump into lldb.
And run the program:
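Assuming we want to stop at the same place as before, the session inside lldb could be:

```
# Break on main again, then launch; with debug information present,
# lldb shows the source lines around the breakpoint when it stops.
b main
run
```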
Alright, now the debugger shows us the source code, which is useful. If we want to go to the next line of the source code, we just use the next command (n in its short form):
As you can see, it went straight to the return 0 statement. When we have the additional debugging information, n goes to the next source line, and ni still steps through the individual assembly instructions, which is quite handy.
Let’s rerun our program and try to show the assembly instructions:
Alright, nothing happened. Why? Well, we are not displaying the assembly code. Use the di command to show the disassembly:
Now we can use ni followed by di to view each step in the assembly code.
You can continue playing with that on your own. Let’s now create a custom configuration that will be helpful when we are reverse engineering a binary.
LLDB custom hooks
We can pass lldb, as an argument, a file containing lldb commands to be executed when the debugger starts.
That could be useful, but it becomes much better when we add some lldb hooks to that file. We can define hooks that run every time the debugger stops (on each step or breakpoint). Create a file named revengsetup with the following content:
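A sketch of what such a file could contain, using target stop-hook add with one-line commands. The three commands and the count of ten stack slots are my own choices for illustration, and the hooks assume a target is already loaded when the file is read.

```
# revengsetup: lldb commands to run at startup (a sketch; adjust to taste).
# Each stop-hook runs its one-line command every time the process stops.

# Show the general-purpose registers.
target stop-hook add --one-liner "register read"

# Show ten 64-bit stack slots, one per line, starting at the stack pointer.
target stop-hook add --one-liner "memory read --size 8 --format x --count 10 --num-per-line 1 $rsp"

# Disassemble the instructions around the current program counter.
target stop-hook add --one-liner "disassemble --pc"
```

Keeping each hook to a single command makes it easy to remove one later with target stop-hook delete.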
What we are doing is adding hooks that display useful information on the state of the registers, the stack, and disassembly code of the current instructions.
Let’s try it out with our a.out:
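Assuming the command file is named revengsetup and sits next to the binary, lldb’s -s (--source) option can load it:

```sh
# -s makes lldb execute the commands in revengsetup after a.out has been loaded.
lldb -s revengsetup a.out
```

From there, setting a breakpoint with b main and typing run should trigger the hooks at the first stop.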
Run the program, and you’ll be able to see all the information on your screen. Very handy.
Final thoughts
There is a lot more to reverse engineering than just using a debugger, but it is useful to become proficient with one. This was just a short introduction to get you started; there are more resources out there on the Internet. I wrote this post because the information I found was mostly directed at GDB, and the GDB information was also buried in assembly language tutorials or books. I wanted to present you with a concise way to jump into lldb without having to wade through lots of pages on how to write assembly. I hope you find it useful.
Let me know what you think. As always, feedback is welcome.
And also let me know what you are reverse engineering; it is always fun to talk about this stuff.
Related topics/notes of interest
- The GDB to LLDB command map: there is a lot of information on how to use GDB but less on LLDB, so if you learn how to do something in GDB, you can probably find the LLDB equivalent there.
- A Stack Overflow answer that gives a simple explanation of the difference between GDB and LLDB.
- If you want to learn about assembly, I would recommend http://opensecuritytraining.info/Training.html.
- Also, Reverse Engineering for Beginners.
- Reverse Engineering subreddit, a lot of useful information there.