Scanning a process' memory using LLDB
Mar 23 2020

When performing dynamic analysis, a useful source of information is the process’ memory, which we can examine for specific patterns. For example, imagine we would like to obtain information about the current process’ code signature. To get it, we could search for the signature’s magic number (CSMAGIC_EMBEDDED_SIGNATURE, 0xfade0cc0; you can verify it in codesign.h) and find where that structure is kept in memory. In this post, I’ll show you how to use the Python API provided by the lldb debugger to scan a process’ memory for patterns.

Let’s get started.

Using LLDB’s Python API

First, let’s take LLDB Script Binding capabilities for a spin. Open lldb in your terminal and let’s start exploring.

$ lldb
(lldb) script name = "Derik"
(lldb) script print("Hello, {}!".format(name))
Hello, Derik!

As you can see, we can use Python in our lldb sessions. We can do pretty much any task you would expect, like checking our Python path:

(lldb) script import sys; print(sys.path)
['/Applications/Xcode.app/Contents/SharedFrameworks/LLDB.framework/Versions/A', '/Applications/Xcode.app/Contents/SharedFrameworks/LLDB.framework/Resources/Python3', '/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python37.zip', '/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7', '/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/lib-dynload', '/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/site-packages', '.']

We could also check the Python version:

(lldb) script import sys; print(sys.version)
3.7.3 (default, Dec 13 2019, 19:58:14)
[Clang 11.0.0 (clang-1100.0.33.17)]

Alright, that’s the Python you already know. Where it gets interesting is when you have access to the lldb API through the Scripting Bindings. The bindings provide us with access to a few handy objects that we can use to obtain information and interact with our debugging session.

If you want to follow the examples with me, we’ll create and use an Xcode project. The approach we’ll use here is handy when using the Xcode LLDB debugger. If you are reversing a binary or doing black-box testing where you don’t have the source code, you can set up a reversing lab to circumvent SIP.

A comment on SIP and having a reversing lab

I feel comfortable using LLDB, so to use lldb for dynamic analysis in my reversing sessions, I need lldb to be able to attach to any running process. macOS out of the box has restrictions, which I’m happy they implement, that prevent you from messing with other running processes. These restrictions are enforced by what is commonly referred to as SIP (System Integrity Protection). If you try to attach lldb to a running process you’ll get a message saying that you can’t:

$ lldb -p 47615
(lldb) process attach --pid 47615
error: attach failed: cannot attach to process due to System Integrity Protection

If you want lldb to be able to attach to any running process, you’ll have to disable SIP (this makes your system less secure). I like having SIP enabled on my day-to-day machine; it makes me feel safe. What I do is keep a virtual machine running a version of macOS where SIP is disabled, and I use that machine as my lab.

Also, if you want to use LLDB to debug iOS apps, you’ll need a jailbroken device (to install the debug server). I won’t explain how to disable SIP or how to jailbreak your phone, but it’s not a complicated process, and you’ll find many posts and information on the internet.

In this post, we’ll focus on using the Python bindings to scan a process’ memory via Xcode, which doesn’t require us to disable SIP. Later, you can apply the same knowledge to any other process in your lab. So let’s begin by creating an Xcode project.

Creating our demo project

Open Xcode and create a new project. You can select Swift or Objective-C; it doesn’t matter. Just create a new macOS application project. I’ll name my project “MemScanDemo”.

Build it, run it, and pause the program to enter the debugger (you can press Ctrl+Command+Y, or press the pause button on your debugging bar). This will open lldb, where you’ll have access to the scripting API.

Let’s see it in action.

Script Bridging API

We already saw how to execute Python commands in lldb; now let me introduce you to a few objects provided by lldb. The first object we’ll explore is lldb.debugger. This object allows us to interact with the debugging session from Python. Let’s see what information we can get about our target:

(lldb) script target = lldb.debugger.GetSelectedTarget()
(lldb) script print(target)
MemScanDemo
(lldb) script print(target.__class__)
<class 'lldb.SBTarget'>

We can use the API’s documentation (https://lldb.llvm.org/python_reference/) to see what information SBTarget provides. From the SBTarget documentation, we can see that it lets us obtain how many modules the current target has:

(lldb) script print(target.GetNumModules())
456

That’s a lot of modules!

We can also iterate over them and display them if we want to. If we run the script command with no arguments, it opens an interactive Python interpreter where we can enter multiline statements; typing quit on a line by itself exits the interpreter.

(lldb) script
>>> for i in range(target.GetNumModules()):
...   print(target.GetModuleAtIndex(i))
...
# Press enter on the empty line to close the indented block and run the for loop.
# You should see a list of all modules.
>>> quit
(lldb)
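
Once we move from the REPL to a script file, the same loop can live in a small helper. What follows is only a minimal sketch: module_names is a hypothetical helper name, and it relies solely on GetNumModules, GetModuleAtIndex, GetFileSpec and GetFilename from the LLDB Python API.

```python
def module_names(target):
    """Return the file name of every module loaded in `target` (an lldb.SBTarget)."""
    names = []
    for i in range(target.GetNumModules()):
        module = target.GetModuleAtIndex(i)
        # GetFileSpec() returns an SBFileSpec; GetFilename() is its base name.
        names.append(module.GetFileSpec().GetFilename())
    return names
```

If the helper is defined in the interactive interpreter, calling module_names(lldb.debugger.GetSelectedTarget()) should give a plain Python list that is easy to filter or sort.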

It’s interesting how many modules are included in just this empty project. From “General Knowledge” we get that for every 1000 lines of code the average code base will have between 1 and 25 bugs. That makes you wonder how the software industry thrives. Might be a “good” statistic if you are a Bug Bounty hunter, tho.

Anyways, I think you can see how useful accessing the LLDB API through Python is. We can manipulate and analyse data with our scripts. It would be tedious, though, to write every line of Python directly in the LLDB REPL. To avoid that, we can write our script in a file and then map it to a custom command in LLDB.

Creating a Python script and using it on LLDB

Create a file wherever it makes sense in your file system structure. I’ll just add it to my ~/.config/lldb/scripts/ directory. I’ll call our file memscan.py. Add the following content:

def greeting():
	print("Hola Mundo!")

We can import it from LLDB:

(lldb) command script import ~/.config/lldb/scripts/memscan.py
(lldb) script memscan.greeting()
Hola Mundo!
(lldb)

Excellent, now we can make use of the Python API.

But first, let’s do one additional step to make it even better.

Creating an LLDB command out of a Python script

When we run the command: command script import, LLDB will look for a function called __lldb_init_module that will be executed when the Python package is loaded. The function has the following signature:

def __lldb_init_module(debugger, internal_dict):

Remember, we have access to the lldb debugger from the API, that means we have access to LLDB commands. With that in mind, what we can do is execute an LLDB command that creates an LLDB command using our script.

Change our memscan.py file to have the following content:

def __lldb_init_module(debugger, internal_dict):
  debugger.HandleCommand(
  'command script add -f memscan.scan_memory memscan')

def scan_memory(debugger, arguments, result, internal_dict):
  print("Hola Mundo!")

We are adding an LLDB command with the name memscan. When LLDB executes memscan it will automatically pass four arguments:

The current debugger instance.
The arguments passed to the command (in the documentation this parameter is called command, but I find arguments more descriptive).
A result object that will be passed back to LLDB after execution.
A Python Dict object with the execution context (variables and functions) of the current embedded script session.
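
To make those four parameters concrete, here is a minimal sketch of a command function that uses two of them. The command name echoargs and the function echo_args are hypothetical; the sketch assumes only that the result object offers AppendMessage and SetError, as SBCommandReturnObject does.

```python
import shlex

def echo_args(debugger, arguments, result, internal_dict):
    """Hypothetical LLDB command: echo each argument it receives."""
    # `arguments` arrives as one raw string; shlex splits it like a shell would.
    words = shlex.split(arguments)
    if not words:
        # SetError marks the command as failed and shows the message to the user.
        result.SetError("usage: echoargs <word> [<word> ...]")
        return
    for index, word in enumerate(words):
        # AppendMessage writes to the command's output instead of plain stdout.
        result.AppendMessage("{}: {}".format(index, word))
```

Registering it would follow the same pattern as memscan, for example with command script add -f memscan.echo_args echoargs if the function lived in memscan.py.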

If we try to run the memscan command now we’ll get an error:

(lldb) memscan
error: 'memscan' is not a valid command.

That is because we need to reload our script:

(lldb) command script import ~/.config/lldb/scripts/memscan.py
(lldb) memscan
Hola Mundo!

How cool is that?

One more thing: thanks to Dave Lee for sharing the following tip.

Instead of declaring the function:

def __lldb_init_module(debugger, internal_dict):
  debugger.HandleCommand(
  'command script add -f memscan.scan_memory memscan')

We can use a decorator provided by LLDB! (If you need a refresher on decorators, read this post)

Using the decorator our code will look like this:

import lldb

@lldb.command("memscan")
def scan_memory(debugger, arguments, result, internal_dict):
  print("Hola Mundo!")

Back to our post. Thanks, Dave!

This post is not a deep dive into LLDB Script Binding. If you would like to learn more, I encourage you to read Derek Selander’s book Advanced Apple Debugging and Reverse Engineering. You should also take a gander at his GitHub repository of LLDB aliases, commands, and scripts. If buying the book seems like too much of an investment at the moment, you can have a look at LLDB’s official documentation for Python Scripting.

With the basic examples out of the way, we can move on to scanning our current process’s memory. I’ll assume you know basic Python programming and understand how to bind the scripts to make an LLDB command. If you need a refresher on Python or would like to go more in-depth into LLDB Script Binding, now is the time. From this point on, I’ll assume we have the basics covered.

Let’s work on our script.

Our memory scanning script

As an example, we’ll be searching for CSMAGIC_EMBEDDED_SIGNATURE (0xfade0cc0) in our process’ memory. Our function is not that complex. We’ll make use of the ReadMemory method of SBProcess to get the bytes in memory. The signature for ReadMemory is the following:

def ReadMemory(initial_address, bytes_to_read, error):

initial_address: the memory address to start reading from.
bytes_to_read: the number of bytes to read.
error: an instance of SBError that we can check to see if the read was successful.

The function returns a bytes-like object with the content of the memory. With that object in hand, we are only left with the task of searching for the pattern.
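
Since a failed read is reported through the error object, it can help to wrap the call so failures are not silently ignored. The following is a minimal sketch, not the post’s own code: read_memory and its make_error parameter are hypothetical names, and inside LLDB you would pass lldb.SBError as make_error.

```python
def read_memory(process, address, size, make_error):
    """Read `size` bytes at `address`, raising if LLDB reports a failure.

    `process` is an lldb.SBProcess; `make_error` is a callable returning a
    fresh error object (pass lldb.SBError when running inside LLDB).
    """
    err = make_error()
    data = process.ReadMemory(address, size, err)
    if not err.Success():
        raise RuntimeError("read of {} bytes at {} failed: {}".format(
            size, hex(address), err.GetCString()))
    return data
```

Passing the error factory in as a parameter is only a convenience so the helper can be exercised outside a debugging session with stand-in objects.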

Our script will search for a hex pattern in one of the process image segments or sections. Our basic command usage will look like this:

(lldb) memscan pattern -s [SEGMENT|SECTION]

If you want a refresher on the Mach-O format, you should check this post by Objc.io on Mach-O executables.

The core of our function will receive the blob of memory we read from the process, and we’ll search for our pattern there. This is how we’ll scan for our pattern:

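def find_in_blob(blob, sequence):
  # Returns the offset of the first match, or None when there is no match
  if len(blob) < len(sequence):
    return None
  for i in range(len(blob) - len(sequence) + 1):
    # Compare the slice itself with the pattern
    if bytes(blob[i:i+len(sequence)]) == sequence:
      return i
  return None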
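
Python’s built-in bytes.find performs the same kind of search, so a variant that reports every match rather than only the first can be sketched in a few lines. find_all is a hypothetical name and is not part of the script in this post.

```python
def find_all(blob, sequence):
    """Return the offsets of every occurrence of `sequence` in `blob`."""
    if not sequence:
        return []
    blob = bytes(blob)
    offsets = []
    start = blob.find(sequence)
    while start != -1:
        offsets.append(start)
        # Resume one byte later so overlapping matches are also reported.
        start = blob.find(sequence, start + 1)
    return offsets
```

Returning a list also sidesteps the ambiguity between "no match" and "match at offset 0", since an empty list is the only falsy result.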

You might be asking, how do we get that blob of memory?

As I mentioned before, we have access to objects that represent different parts of our LLDB session via the Python Script Binding. We are going to use these objects to access the current module’s sections and go from there. To get a reference to our module, we are going to do this:

  process = debugger.GetSelectedTarget().GetProcess()
  target = debugger.GetSelectedTarget()
  module = target.GetModuleAtIndex(0)
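
The snippet above assumes the main executable is the module at index 0. A slightly more defensive sketch asks the target for the module that matches its executable and only falls back to index 0 if that lookup fails. main_module is a hypothetical helper name; it uses GetExecutable, FindModule and IsValid from the LLDB Python API.

```python
def main_module(target):
    """Return the module for the target's main executable.

    Falls back to index 0 when the lookup does not produce a valid module.
    """
    # GetExecutable() returns the SBFileSpec of the main executable.
    module = target.FindModule(target.GetExecutable())
    if module.IsValid():
        return module
    return target.GetModuleAtIndex(0)
```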

From our module, we are going to obtain the SBSection. From our SBSection object, we’ll get the address where that section was loaded in the current target and its size in bytes, and then read that range from the process.

  section = module.FindSection(section_name)
  start_addr = section.GetLoadAddress(target)
  bytes_to_read = section.GetByteSize()
  err = lldb.SBError()
  section_blob = process.ReadMemory(start_addr,bytes_to_read, err)
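
If you are not sure which names FindSection will accept, it can help to list what the module actually contains. The following is a minimal sketch under the assumption that top-level sections correspond to Mach-O segments and their sub-sections to Mach-O sections; list_sections is a hypothetical helper name.

```python
def list_sections(module, target):
    """Return (name, load_address, size) for each segment and its sections."""
    rows = []
    for i in range(module.GetNumSections()):
        segment = module.GetSectionAtIndex(i)
        rows.append((segment.GetName(), segment.GetLoadAddress(target),
                     segment.GetByteSize()))
        # On Mach-O, sections such as __text are sub-sections of a segment.
        for j in range(segment.GetNumSubSections()):
            sub = segment.GetSubSectionAtIndex(j)
            name = "{}.{}".format(segment.GetName(), sub.GetName())
            rows.append((name, sub.GetLoadAddress(target), sub.GetByteSize()))
    return rows
```

Each row carries the same load address and byte size that the snippet above passes to ReadMemory.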

Now that we have the memory blob, we have everything we need to call our find_in_blob function. With the blob and the pattern we would call our function like so:

  match = find_in_blob(bytes(section_blob),bytes([0xfa,0xde,0x0c,0xc0]))

That’s the main idea. We get the blob and then search for our pattern.

I’m going to show you the whole script putting everything together, including the arguments supported by the script, validations and all.

import lldb
import shlex
import optparse

def __lldb_init_module(debugger, internal_dict):
  debugger.HandleCommand(
  'command script add -f memscan.scan_memory memscan')

def scan_memory(debugger, arguments, result, internal_dict):
  '''
  Displays the offset of the match in memory of the pattern.
  '''

  target = debugger.GetSelectedTarget()
  process = target.GetProcess()
  module = target.GetModuleAtIndex(0)

  # Build the parser up front so it is available for the help message
  parser = generate_option_parser()
  raw_args = shlex.split(arguments)
  if len(raw_args) == 0 or raw_args == ['-h']:
    parser.print_help()
    return

  # `get_values` returns a tuple (pattern_bytes, section) from the command arguments
  # but also makes sure everything is valid; on failure it sets an error on
  # `result` and returns None
  values = get_values(parser, raw_args, module, result)
  if values is None:
    return
  (pattern_bytes, section) = values

  start_addr = section.GetLoadAddress(target)
  bytes_to_read = section.GetByteSize()
  err = lldb.SBError()
  section_blob = process.ReadMemory(start_addr, bytes_to_read, err)
  if not err.Success():
    result.SetError("Couldn't read memory: {}".format(err.GetCString()))
    return

  # Compare against None: an offset of 0 is a valid match
  offset_found = find_in_blob(section_blob, pattern_bytes)
  if offset_found is None:
    print("Pattern not found")
    return
  print("Offset from section {}({}): {}".format(section.GetName(), hex(start_addr), hex(offset_found)))
  print("Full offset: {}".format(hex(start_addr + offset_found)))
   

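def find_in_blob(blob, sequence):
  # Returns the offset of the first match, or None when there is no match
  if len(blob) < len(sequence):
    return None
  for i in range(len(blob) - len(sequence) + 1):
    # Compare the slice itself with the pattern
    if bytes(blob[i:i+len(sequence)]) == sequence:
      return i
  return None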

def get_values(parser, raw_args, module, result):
  try:
    (options, args) = parser.parse_args(raw_args)
  except SystemExit:
    # optparse calls sys.exit() when it finds invalid options
    result.SetError(parser.usage)
    return

  if not args:
    result.SetError("missing pattern to search.\n{}".format(parser.usage))
    return

  pattern = 0
  try:
    pattern = int(args[0], base=16)
  except ValueError:
    result.SetError("parsing argument. Argument should be in hex")
    return

  if not options.endianness in ["big", "little"]:
    result.SetError("endianness can only be 'little' or 'big'")
    return

  if not options.section:
    result.SetError("missing section, please specify section.\n{}".format(parser.usage))
    return

  section = module.FindSection(options.section)
  if not section:
    result.SetError("Couldn't find section: {}".format(options.section))
    return

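  # Round the bit length up to whole bytes (at least one) so patterns such as 0x0ade0cc0 still fit
  pattern_bytes = pattern.to_bytes(max(1, (pattern.bit_length() + 7) // 8), byteorder=options.endianness)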
  return (pattern_bytes, section)

def generate_option_parser():
  usage = "usage: memscan <PATTERN> <-s|--section> <SECTION_NAME> [options]"
  parser = optparse.OptionParser(usage=usage)
  parser.add_option("-s", "--section",
           action="store",
           default=None,
           dest="section",
           help="Define the section to search for pattern")
  parser.add_option("-e", "--endianness",
           action="store",
           default='big',
           dest="endianness",
           help="Define pattern endianness")
  return parser

I tried to break the code into logical blocks; that way, we don’t get tangled in validation code and option generation. Take your time and go through the code. Once you understand the idea, continue reading.
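
One step worth isolating is the conversion of the hex argument into the bytes we search for. The sketch below is an alternative to going through int: it derives the byte count from the number of hex digits, so leading zero bytes in a pattern are kept. pattern_to_bytes is a hypothetical helper and is not used by the script above.

```python
def pattern_to_bytes(text, byteorder="big"):
    """Convert a hex string such as '0xfade0cc0' into bytes.

    The byte count comes from the number of hex digits, so leading zero
    bytes in the pattern are preserved.
    """
    digits = text.lower()
    if digits.startswith("0x"):
        digits = digits[2:]
    if not digits:
        raise ValueError("empty pattern")
    if len(digits) % 2:
        digits = "0" + digits  # pad to a whole number of bytes
    raw = bytes.fromhex(digits)  # raises ValueError on non-hex input
    # bytes.fromhex yields the digits in written (big-endian) order.
    return raw if byteorder == "big" else raw[::-1]
```

With this approach a pattern like 0x00fade stays three bytes long instead of shrinking to two.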

Running our script

Alright, following our example, we are going to search for CSMAGIC_EMBEDDED_SIGNATURE (0xfade0cc0). The embedded signature should be located in the __LINKEDIT section (unless you are using the Xcode simulator, in which case it’ll be in __TEXT.__entitlements). Let’s reload our script and test it:

(lldb) command script import ~/.config/lldb/scripts/memscan.py
(lldb) memscan 0xfade0cc0 -s __LINKEDIT
Offset from section __LINKEDIT(0x100005000): 0x3200
Full offset: 0x100008200
(lldb)

We can check the memory at that offset just to validate:

(lldb) x/32wx 0x100008200
0x100008200: 0xc00cdefa 0x29170000 0x04000000 0x00000000
0x100008210: 0x2c000000 0x02000000 0x6e020000 0x05000000
0x100008220: 0x2a030000 0x00000100 0x9f040000 0x020cdefa
0x100008230: 0x42020000 0x00050200 0x00000100 0x22010000
0x100008240: 0x60000000 0x05000000 0x09000000 0x00820000
0x100008250: 0x0c000220 0x00000000 0x00000000 0x77000000
0x100008260: 0x00000000 0x00000000 0x00000000 0x00000000
0x100008270: 0x00000000 0x00000000 0x00000000 0x00000000

Cool, it’s there!
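
Once the magic is located, the bytes that follow can be decoded as well. The sketch below assumes the SuperBlob layout from Apple’s code-signing headers: a magic, a total length and an entry count, followed by that many (type, offset) index entries, all stored as big-endian 32-bit integers. parse_superblob_header is a hypothetical helper name.

```python
import struct

CSMAGIC_EMBEDDED_SIGNATURE = 0xfade0cc0

def parse_superblob_header(blob):
    """Decode the header and index of an embedded-signature SuperBlob.

    Returns (length, [(type, offset), ...]); all fields are big-endian uint32.
    """
    magic, length, count = struct.unpack_from(">III", blob, 0)
    if magic != CSMAGIC_EMBEDDED_SIGNATURE:
        raise ValueError("unexpected magic: {}".format(hex(magic)))
    entries = []
    for i in range(count):
        # Each index entry is a (type, offset) pair following the 12-byte header.
        entries.append(struct.unpack_from(">II", blob, 12 + i * 8))
    return length, entries
```

Applied to the dump above, this reading gives a length of 0x1729 and four index entries, the first pointing at offset 0x2c, which is where the next magic (0xfade0c02) shows up in the dump.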

Because you are very observant, you might have noticed that the memory display is in little-endian. The default endianness of our script (if you read through the code) is big-endian, but we can also do the same search using little-endian.

(lldb) memscan 0xc00cdefa -s __LINKEDIT -e little
Offset from section __LINKEDIT(0x100005000): 0x3200
Full offset: 0x100008200
(lldb)

And we get the same result, which could also be handy.
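
To see why both invocations land on the same address, it helps to look at the bytes themselves. This is a small standalone sketch using only the standard library; the variable names are illustrative.

```python
import struct

MAGIC = 0xfade0cc0

# Bytes as they sit in memory: the signature stores its magic big-endian.
in_memory = struct.pack(">I", MAGIC)

# A word-sized memory dump on a little-endian machine reads those same
# 4 bytes as a little-endian integer, which is why the display looks reversed.
displayed_word = struct.unpack("<I", in_memory)[0]

# Both command invocations therefore search for the same byte sequence.
big = (0xfade0cc0).to_bytes(4, byteorder="big")
little = (0xc00cdefa).to_bytes(4, byteorder="little")
```

Here in_memory, big and little all hold the same four bytes (fa de 0c c0), and displayed_word is 0xc00cdefa, matching the first word of the dump.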

Final Thoughts

We can now scan sections of our process to find specific patterns, which will come in handy when doing dynamic analysis of binaries. The script I shared with you is for you to use as a base; you should be able to modify it to suit your purposes. I hope you find it useful. I used a similar script to obtain the entitlements of the binary I was analysing.

Remember, there are many ways to solve problems, and rare are the cases when only one solution exists. We could have done something similar using Frida. In the end, the tool you use doesn’t matter that much; what matters is how you use it.

I encourage you to play around with LLDB Script Binding capabilities. The documentation is not the best, but after a combination of experimentation, googling, and source code reading, you’ll have a powerful tool at your disposal.

A good resource to learn how to wield LLDB like a pro is Derek Selander’s book Advanced Apple Debugging and Reverse Engineering. As with any book, some parts have gotten outdated. Still, I find that useful: it lets me verify that I understand the topic and that I can adapt the examples to the updates Apple has made to its code.
If you want to learn more about what’s in the __LINKEDIT section, this DYLD article by J. Levin is great.
The LLDB Python reference describes the LLDB Python object’s API.
If for some reason you want to explore other dynamic instrumentation tools, have a look at Frida.

** If you want to check what else I'm currently doing, be sure to follow me on twitter @rderik or subscribe to the newsletter. If you want to send me a direct message, you can send it to derik@rderik.com.