Scanning a process' memory using LLDB
When performing dynamic analysis, a useful source of information is the process’ memory, which we can examine for specific patterns. For example, imagine we would like to obtain information about the current process’ code signature. To get this information, we could search for the specific magic value (CSMAGIC_EMBEDDED_SIGNATURE, 0xfade0cc0; you can verify it in codesign.h) and find where that structure is kept in memory. In this post, I’ll show you how to use the Python API provided by the lldb debugger to scan a process’ memory for patterns.
Let’s get started.
Using LLDB’s Python API
First, let’s take LLDB Script Binding capabilities for a spin. Open lldb in your terminal and let’s start exploring.
As you can see, we can use Python in our lldb sessions. We can do pretty much any task you would expect, like checking our Python path:
We could also check the Python version:
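Both checks are plain Python. A minimal sketch of what could be typed at the embedded interpreter’s prompt (reached by running script with no arguments); the paths and version printed depend on the LLDB build:

```python
import sys

# Directories the embedded interpreter searches when importing modules.
for entry in sys.path:
    print(entry)

# Version of the Python interpreter that this LLDB build embeds.
print(sys.version)
```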
Alright, that’s the Python you already know. Where it gets interesting is when you have access to the lldb API through the Scripting Bindings. The bindings provide us with access to a few handy objects that we can use to obtain information and interact with our debugging session.
If you want to follow the examples with me, we’ll create and use an Xcode project. The approach we’ll use here is handy when using the Xcode LLDB debugger. If you are reversing a binary or doing black-box testing where you don’t have the source code, you can set up a reversing lab to circumvent SIP.
A comment on SIP and having a reversing lab
I feel comfortable using LLDB, so to use it for dynamic analysis in my reversing sessions, I need lldb to be able to attach to any running process. Out of the box, macOS has restrictions (which I’m happy it implements) that prevent you from messing with other running processes. These restrictions are enforced by what is commonly referred to as SIP (System Integrity Protection). If you try to attach lldb to a running process, you’ll get a message saying that you can’t:
If you want lldb to be able to attach to any running process, you’ll have to disable SIP (this makes your system less secure). I like having SIP enabled on my day-to-day machine; it makes me feel safe. What I do is keep a virtual machine running a version of macOS where SIP is disabled, and I use that machine as my lab.
Also, if you want to use LLDB to debug iOS apps, you’ll need a jailbroken device (to install the debug server). I won’t explain how to disable SIP or how to jailbreak your phone, but it’s not a complicated process, and you’ll find many posts and information on the internet.
In this post, we’ll focus on using the Python bindings to scan a process’ memory via Xcode, which doesn’t require us to disable SIP. Later, you can apply the same knowledge to any other process in your lab. So let’s begin by creating an Xcode project.
Creating our demo project
Open Xcode and create a new project. You can select Swift or Objective-C; it doesn’t matter. Just create a new macOS application project. I’ll name my project “MemScanDemo”.
Build it, run it, and pause the program to enter the debugger (you can press Ctrl+Command+Y, or press the pause button on your debugging bar). This will open lldb, where you’ll have access to the scripting API.
Let’s see it in action.
Script Bridging API
We already saw how to execute Python commands in lldb; now let me introduce you to a few objects provided by lldb. The first object we’ll explore is the lldb.debugger object. This object allows us to interact with the debugging session from Python. Let’s see what information we can get about our target:
We can use the API’s documentation (https://lldb.llvm.org/python_reference/) to see what information SBTarget provides. From the SBTarget documentation, we can see that it is possible to obtain how many modules the current target has:
That’s a lot of modules!
We can also iterate over them and display them if we want to. If we run the script command with no arguments, LLDB drops us into an interactive Python interpreter where we can enter multiline commands; typing quit on a line by itself exits the interpreter.
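As a sketch of both steps, the helper below counts the modules of a target and collects their names. The name module_names is hypothetical; GetNumModules, GetModuleAtIndex, GetFileSpec and GetFilename are the SB API calls it relies on. Inside LLDB, the target would come from lldb.debugger.GetSelectedTarget():

```python
def module_names(target):
    """Return the file name of every module loaded in an SBTarget-like object."""
    names = []
    for index in range(target.GetNumModules()):
        module = target.GetModuleAtIndex(index)
        # GetFileSpec() describes the module's file on disk.
        names.append(module.GetFileSpec().GetFilename())
    return names


# Inside an LLDB session one could then run, for example:
#   for name in module_names(lldb.debugger.GetSelectedTarget()):
#       print(name)
```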
It’s interesting how many modules are included in just this empty project. From “General Knowledge” we get that, per 1000 lines of code, the average code base will have between 1 and 25 bugs. That makes you wonder how the software industry thrives. It might be a “good” statistic if you are a bug bounty hunter, though.
Anyway, I think you can see how useful accessing the LLDB API through Python is. We can manipulate and analyse data with our scripts. It would be tedious to write every line of Python directly in the LLDB REPL. To avoid that, we can write our script in a file and then map it to a custom command in LLDB.
Creating a Python script and using it on LLDB
Create a file wherever it makes sense in your file system structure. I’ll just add it to my ~/.config/lldb/scripts/ directory. I’ll call our file memscan.py. Add the following content:
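A minimal sketch of what a first version of the file might contain, just enough to confirm that the import works. The greet function is a hypothetical placeholder, not part of any LLDB API:

```python
# memscan.py: a hypothetical first version, only meant to prove the import works.

def greet():
    """Return a short message so we can tell the module was loaded."""
    return "memscan.py loaded"
```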
We can import it from LLDB:
Excellent, now we can make use of the Python API.
But first, let’s do one additional step to make it even better.
Creating an LLDB command out of a Python script
When we run command script import, LLDB will look for a function called __lldb_init_module and execute it when the Python module is loaded. The function has the following signature:
Remember, we have access to the lldb debugger from the API, which means we have access to LLDB commands. With that in mind, what we can do is execute an LLDB command that creates an LLDB command using our script.
Change our memscan.py file to have the following content:
We are adding an LLDB command with the name memscan. When LLDB executes memscan it will automatically pass four arguments:
- The current debugger instance.
- The arguments passed to the command (in the documentation this parameter is called command, but I find arguments more descriptive).
- A result object that will be passed back to LLDB after execution.
- A Python Dict object with the execution context (variables and functions) of the current embedded script session.
If we try to run the memscan command now we’ll get an error:
That is because we need to reload our script:
How cool is that?
One more thing: thanks to Dave Lee for sharing the following tip.
Instead of declaring the function:
We can use a decorator provided by LLDB! (If you need a refresher on decorators, read this post)
Using the decorator our code will look like this:
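Inside LLDB this amounts to placing @lldb.command("memscan") above the function. To see what such a decorator does for us, here is a hand-rolled equivalent. The helper register_command is hypothetical and not part of LLDB; it takes the debugger explicitly so the sketch also runs outside a debugging session:

```python
def register_command(debugger, name):
    """Return a decorator that binds the decorated function to an LLDB command."""
    def decorator(function):
        # Same registration we previously did by hand in __lldb_init_module.
        debugger.HandleCommand(
            "command script add -f %s.%s %s"
            % (function.__module__, function.__name__, name)
        )
        return function  # the function itself is left unchanged
    return decorator


# With LLDB's built-in decorator the equivalent would be:
#   @lldb.command("memscan")
#   def memscan(debugger, command, result, internal_dict): ...
```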
Back to our post. Thanks, Dave!
This post is not a deep dive into LLDB Script Binding. If you would like to learn more, I encourage you to read Derek Selander’s book Advanced Apple Debugging and Reverse Engineering. You should also take a gander at his GitHub repository of LLDB aliases, commands, and scripts. If buying the book seems like too much of an investment at the moment, you can have a look at LLDB’s official documentation for Python Scripting.
With the basic examples out of the way, we can move on to scanning our current process’ memory. I’ll assume you know basic Python programming and understand how to bind scripts to make an LLDB command. If you need a refresher on Python or would like to go more in-depth into LLDB Script Binding, now is the time. From this point on, I’ll assume we have the basics covered.
Let’s work on our script.
Our memory scanning script
As an example, we’ll be searching for CSMAGIC_EMBEDDED_SIGNATURE (0xfade0cc0) in our process’ memory. Our function is not that complex. We’ll make use of the ReadMemory method of SBProcess to get the bytes in memory. The signature for ReadMemory is the following:
- initial_address is the memory address to start reading from.
- bytes_to_read is the number of bytes to read.
- error is an instance of SBError from where we can check if the read was successful.
The function returns the content of the memory as a bytes-like object. With access to those bytes, we are only left with the task of searching for the pattern.
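A small wrapper makes the error handling explicit. This is a sketch: read_blob is a hypothetical helper name, and the caller is assumed to pass in a fresh lldb.SBError() as the error argument:

```python
def read_blob(process, address, size, error):
    """Read `size` bytes at `address` from an SBProcess-like object.

    `error` is expected to be an lldb.SBError() created by the caller.
    """
    data = process.ReadMemory(address, size, error)
    if not error.Success():
        # GetCString() gives a human-readable description of the failure.
        raise RuntimeError("memory read failed: %s" % error.GetCString())
    return bytes(data)


# Inside LLDB, for example:
#   blob = read_blob(process, load_address, size, lldb.SBError())
```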
Our script will search for a hex pattern in one of the process image segments or sections. Our basic command usage will look like this:
If you want a refresher on the Mach-O format, you should check this post by Objc.io on Mach-O executables.
The core of our function will receive the blob of memory we read from the process, and we’ll search for our pattern there. This is how we’ll scan for our pattern:
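A minimal sketch of such a search, assuming the blob and the pattern are both bytes-like objects and that we want the offset of every match rather than only the first one. The original implementation may differ in its details:

```python
def find_in_blob(blob, pattern):
    """Return every offset at which `pattern` occurs inside `blob`."""
    offsets = []
    start = blob.find(pattern)
    while start != -1:
        offsets.append(start)
        # Resume one byte later so overlapping matches are reported too.
        start = blob.find(pattern, start + 1)
    return offsets
```

Each offset is relative to the start of the blob, so adding the address the blob was read from gives the location of the match in the process.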
You might be asking, how do we get that blob of memory?
As I mentioned before, we have access to objects that represent different parts of our LLDB session via the Python Script Binding. We are going to use these objects to access the current module’s sections and go from there. To get a reference to our module, we are going to do this:
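One way to do it, sketched here under the assumption that the module we care about is the target’s main executable, is to ask the target for its executable file and look up the matching module. The helper name main_module is hypothetical:

```python
def main_module(target):
    """Return the SBModule that corresponds to the target's main executable."""
    # GetExecutable() returns the file spec of the main binary;
    # FindModule() looks up the loaded module for that file spec.
    return target.FindModule(target.GetExecutable())


# Inside LLDB, for example:
#   module = main_module(debugger.GetSelectedTarget())
```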
From our module, we are going to obtain the SBSection. From our SBSection object, we’ll get the address where that section was loaded in the current target.
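A sketch of that lookup: given a module and a target, it resolves a segment (optionally a section inside it) and returns where it is loaded and how large it is. Those two values are exactly what ReadMemory needs. The helper name section_range is hypothetical:

```python
def section_range(module, target, segment, section=None):
    """Return (load_address, size) for a segment, or for a section inside it."""
    found = module.FindSection(segment)
    if found.IsValid() and section:
        # Sections such as __entitlements live inside a parent segment.
        found = found.FindSubSection(section)
    if not found.IsValid():
        raise ValueError("could not find %s %s" % (segment, section or ""))
    # GetLoadAddress() is the address in the running process, not the file offset.
    return found.GetLoadAddress(target), found.GetByteSize()


# Inside LLDB, for example:
#   address, size = section_range(module, target, "__LINKEDIT")
```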
Now that we have the memory blob, we have everything we need to call our find_in_blob function. With the blob and the pattern we would call our function like so:
That’s the main idea. We get the blob and then search for our pattern.
I’m going to show you the whole script putting everything together, including the arguments supported by the script, validations and all.
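As an illustration of the argument-handling block alone, here is a sketch built on argparse and shlex. Every option name below (the positional pattern, -s, -S, -e) is hypothetical and only shows one way the command string that LLDB passes in could be turned into options:

```python
import argparse
import shlex


def parse_arguments(command):
    """Parse the raw argument string LLDB hands to the command function."""
    parser = argparse.ArgumentParser(prog="memscan")
    parser.add_argument("pattern", help="hex pattern to look for, e.g. fade0cc0")
    parser.add_argument("-s", "--segment", default="__LINKEDIT")
    parser.add_argument("-S", "--section", default=None)
    parser.add_argument("-e", "--endianness",
                        choices=("big", "little"), default="big")
    # shlex.split honours quoting, like a shell would.
    # Note: argparse raises SystemExit when the arguments are invalid.
    return parser.parse_args(shlex.split(command))
```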
I tried to break the code into logical blocks; that way, we don’t get tangled in validation code and option generation. Take your time and go through the code. Once you understand the idea, continue reading.
Running our script
Alright, following our example, we are going to search for the CSMAGIC_EMBEDDED_SIGNATURE (0xfade0cc0). The embedded signature should be located in the __LINKEDIT segment (unless you are using the Xcode simulator, in which case it’ll be in __TEXT.__entitlements). Let’s reload our script and test it:
We can check the memory at that offset just to validate:
Cool, it’s there!
Because you are very observant, you might have noticed that the memory display is in little-endian. The default endianness of our script (if you read through the code) is big-endian, but we can also do the same search using little-endian.
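The difference between the two modes comes down to byte order. As a sketch (pattern_to_bytes is a hypothetical helper, and the way endianness is selected is an assumption), a big-endian pattern is used as typed, while a little-endian one is reversed before searching:

```python
def pattern_to_bytes(hex_pattern, endianness="big"):
    """Turn a hex string such as 'fade0cc0' into the bytes to search for."""
    if hex_pattern.lower().startswith("0x"):
        hex_pattern = hex_pattern[2:]
    raw = bytes.fromhex(hex_pattern)
    # A little-endian value is stored with its bytes in reverse order.
    return raw if endianness == "big" else raw[::-1]
```

With this helper, fade0cc0 read as big-endian and c00cdefa read as little-endian both produce the byte sequence fa de 0c c0.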
And we get the same result, which could also come in handy.
Final Thoughts
We can now scan sections of our process to find specific patterns, which will come in handy when doing dynamic analysis of binaries. The script I shared with you is for you to use as a base; you should be able to modify it to suit your purposes. I hope you find it useful. I used a similar script to obtain the entitlements of the binary I was analysing.
Remember, there are many ways to solve problems, and rare are the cases when only one solution exists. We could have done something similar using Frida. In the end, the tool you use doesn’t matter that much; what matters is how you use it.
I encourage you to play around with LLDB Script Binding capabilities. The documentation is not the best, but after a combination of experimentation, googling, and source code reading you’ll have a powerful tool at your disposal.
Related topics/notes of interest
- A good resource to learn how to wield LLDB like a pro is Derek Selander’s book Advanced Apple Debugging and Reverse Engineering. As with any book, some parts have become outdated. Still, I find that useful: it lets me verify that I understand the topic and can adapt to the updates Apple has made to its code.
- If you want to learn more about what’s in the __LINKEDIT segment, this DYLD article by J. Levin is great.
- The LLDB Python reference describes the LLDB Python object’s API.
- If for some reason you want to explore other dynamic instrumentation tools, have a look at Frida.