Let's write some assembly code in macOS for Intel x86_64

Dec 12 2019

This is going to be a small article on the basics of working with Assembly Language. We won’t go deep into building extensive programs in assembly. The main idea of this post is to clarify the workflow for creating an assembly program and some key concepts so you can comfortably begin your assembly explorations.

Let’s first learn about the different assembly syntaxes and types.

Different types of assembly languages

Assembly languages were created as an abstraction above machine code (the actual 1s and 0s). But the abstraction is not independent of the hardware that runs it: depending on the hardware, we have different assembly languages. The ISA (Instruction Set Architecture) defines the registers, data types, and instructions supported by a specific computer architecture, and, as you can imagine, it changes from one processor family to another. Differing ISAs are one reason multiple assembly languages exist.

Another reason for the variety of assembly languages is the assembler, the program that translates assembly language into machine code.

In this post, we are going to focus on Intel’s x86 processors (specifically x86_64), simply because macOS laptops, at the time of writing, run on x86 processors. For x86 there are many assemblers (NASM, GAS, YASM, and many more), and each supports its own “style” of assembly. There are two main syntax branches for x86: Intel and AT&T (you can read about some of the differences in this IBM article).

In summary, we have different assembly languages depending on the architecture, and also depending on the assembler program.

If you want to write assembly, assemble it, and run it on your computer, you need to make sure you are using an assembly language and an assembler that match your architecture.

Enough background, let’s write some code.

Writing our first hello world program!

Create a file hello_intel.asm with the following content (we’ll use Intel syntax):

  section  .data
message: db    "Hello, World!", 0Ah, 00h
  global  _main
  section  .text
_main:
  mov    rax, 0x02000004    ; system call for write
  mov    rdi, 1             ; file descriptor 1 is stdout
  mov    rsi, qword message ; get string address
  mov    rdx, 14            ; number of bytes: 13 characters + newline
  syscall                   ; execute syscall (write)
  mov    rax, 0x02000001    ; system call for exit
  mov    rdi, 0             ; exit code 0
  syscall                   ; execute syscall (exit)
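The listing hardcodes three numbers: the two values loaded into rax and the byte count in rdx. A minimal Python sketch of where they come from, assuming the constants defined in XNU’s syscall_sw.h (a Unix syscall class of 2, shifted left by 24 bits) and the BSD syscall numbers from syscalls.master (exit = 1, write = 4). The helper unix_syscall is hypothetical, for illustration only:

```python
# Sketch: where the constants in hello_intel.asm come from.
# Assumption: values mirror XNU's syscall_sw.h and syscalls.master.
SYSCALL_CLASS_SHIFT = 24  # the syscall class sits above bit 24
SYSCALL_CLASS_UNIX = 2    # class 2 = BSD ("Unix") system calls
SYS_EXIT = 1              # BSD syscall number for exit
SYS_WRITE = 4             # BSD syscall number for write

MESSAGE = b"Hello, World!\n"  # bytes write() should emit; the 00h is not needed


def unix_syscall(number: int) -> int:
    """Hypothetical helper: tag a BSD syscall number with the Unix class."""
    return (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | number


if __name__ == "__main__":
    print(f"rax for write: {unix_syscall(SYS_WRITE):#010x}")
    print(f"rax for exit:  {unix_syscall(SYS_EXIT):#010x}")
    print(f"rdx (bytes):   {len(MESSAGE)}")
```

The byte count is 14 because "Hello, World!" is 13 characters and the trailing newline (0Ah) is one more; the terminating 00h only matters to C-style string functions, not to write.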

Now we can generate the object file using yasm. If you don’t have it installed, you can install it with the Homebrew package manager (brew install yasm).

$ yasm -f macho64 hello_intel.asm
# this generates the hello_intel.o object file

Now we have to use the linker to link it against libSystem, the system’s dynamic library (dylib).

$ ld -lSystem -o hello_intel hello_intel.o
# this will generate hello_intel executable
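Both hello_intel.o and hello_intel are Mach-O files; the header tells them apart. A rough Python sketch for peeking at that header, assuming the layout and constants from Apple’s mach-o/loader.h and mach/machine.h (MH_MAGIC_64 = 0xfeedfacf, CPU_TYPE_X86_64 = 0x01000007, MH_OBJECT = 1, MH_EXECUTE = 2). The describe_macho helper is hypothetical, for illustration:

```python
# Sketch: classify a file by its 64-bit Mach-O header.
# Assumption: constants come from <mach-o/loader.h> and <mach/machine.h>.
import struct
import sys

MH_MAGIC_64 = 0xFEEDFACF      # 64-bit Mach-O magic number
CPU_TYPE_X86_64 = 0x01000007  # CPU_TYPE_X86 | CPU_ARCH_ABI64
FILETYPES = {0x1: "MH_OBJECT", 0x2: "MH_EXECUTE"}


def describe_macho(header: bytes) -> str:
    """Hypothetical helper: classify the first 16 bytes of a Mach-O file."""
    if len(header) < 16:
        return "too short to be a Mach-O header"
    # magic, cputype, cpusubtype, filetype: four little-endian 32-bit fields
    magic, cputype, _cpusubtype, filetype = struct.unpack("<IiiI", header[:16])
    if magic != MH_MAGIC_64:
        return "not a 64-bit little-endian Mach-O file"
    arch = "x86_64" if cputype == CPU_TYPE_X86_64 else f"cputype {cputype:#x}"
    return f"{arch} {FILETYPES.get(filetype, hex(filetype))}"


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, "->", describe_macho(f.read(16)))
```

Passing it hello_intel.o and hello_intel should report an x86_64 MH_OBJECT and an x86_64 MH_EXECUTE respectively; the file command gives similar information without any code.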

If we run it, we’ll get our desired output:

$ ./hello_intel
Hello, World!
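For comparison, the executable does roughly what this Python sketch does: one write to file descriptor 1, then an exit with status 0. This assumes Python’s os.write and os._exit, which are thin wrappers over the write and _exit calls; the hello function is hypothetical:

```python
# Sketch: the same two system calls through Python's thin wrappers.
import os


def hello(fd: int = 1) -> int:
    """Hypothetical helper: write the greeting to fd and return bytes written."""
    return os.write(fd, b"Hello, World!\n")  # like write(fd, buf, 14)


if __name__ == "__main__":
    hello()     # fd 1 is stdout
    os._exit(0)  # exit with status 0, skipping Python's normal cleanup
```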

To show you the difference between Intel and AT&T syntax, we are going to write the same program, this time in AT&T syntax. We’ll use the as command, the assembler that comes by default with macOS. It uses AT&T syntax, which is common on *nix systems. Let’s create a new file, hello_atnt.asm, with the following content:

.section __DATA, __data
message:
  .asciz "Hello world!\n"
.section __TEXT, __text
  .globl  _main
_main:
  mov   $0x02000004, %rax            # system call for write
  mov   $1, %rdi                     # file descriptor 1 is stdout
  leaq  message(%rip), %rsi          # get string address (RIP-relative)
  mov   $13, %rdx                    # number of bytes
  syscall                            # execute syscall (write)
  mov   $0x02000001, %rax            # system call for exit
  xor   %rdi, %rdi                   # exit code 0 (xor with itself zeroes rdi)
  syscall                            # execute syscall (exit)
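The two listings encode the same instructions, and for simple register and immediate operands the translation between syntaxes is mechanical: reverse the operands and drop the % and $ sigils. A rough Python sketch; the att_to_intel helper is hypothetical and deliberately ignores memory operands such as message(%rip), directives, and size suffixes such as movq:

```python
# Sketch: mechanical AT&T -> Intel rewrite for register/immediate operands.
# Memory operands like message(%rip) and size suffixes like movq are NOT handled.


def att_to_intel(line: str) -> str:
    """Hypothetical helper: reverse the operands and drop the % and $ sigils."""
    code = line.split("#", 1)[0].strip()  # drop the trailing # comment
    if not code:
        return ""
    parts = code.split(None, 1)            # mnemonic, then the operand list
    mnemonic = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    operands = [op.strip().lstrip("%$") for op in rest.split(",") if op.strip()]
    operands.reverse()  # AT&T: source, destination -> Intel: destination, source
    return f"{mnemonic} {', '.join(operands)}" if operands else mnemonic


if __name__ == "__main__":
    for att in ("mov   $0x02000004, %rax", "mov   $1, %rdi", "syscall"):
        print(f"{att:28} -> {att_to_intel(att)}")
```

It also shows why xor $0, %rdi would not zero a register: translated, it is xor rdi, 0, which leaves rdi unchanged. xor %rdi, %rdi (Intel: xor rdi, rdi) is the usual idiom for setting a register to zero.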

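As you can see, AT&T syntax uses different assembler directives (.section __DATA, __data, .asciz, .globl), prefixes registers with % and immediate values with $, and reverses the order of the operands: source first, destination last. Intel syntax feels like we are doing rax = 0x02000004, while AT&T feels more like $0x02000004 -> %rax. Let’s generate the object file: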

$ as hello_atnt.asm -o hello_atnt.o
# -o sets the output object file to hello_atnt.o

Now we can link it in the same way we did with the Intel assembly.

$ ld -lSystem -o hello_atnt hello_atnt.o
# we get the executable hello_atnt

And if we run the executable we get what we were expecting:

$ ./hello_atnt
Hello world!
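The two workflows differ only in the assembler step; linking is identical. A minimal Python sketch that scripts both, assuming yasm, as, and ld are on your PATH. The build_commands helper is hypothetical, and it passes yasm an explicit -o instead of relying on its default output name:

```python
# Sketch: drive the assemble + link steps from this post with subprocess.
# Assumption: yasm, as and ld are installed and on PATH.
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def build_commands(source: str, syntax: str) -> list[list[str]]:
    """Hypothetical helper: return the assemble and link commands for source."""
    src = Path(source)
    obj = str(src.with_suffix(".o"))  # hello_intel.asm -> hello_intel.o
    exe = str(src.with_suffix(""))    # hello_intel.asm -> hello_intel
    if syntax == "intel":
        assemble = ["yasm", "-f", "macho64", "-o", obj, str(src)]
    elif syntax == "att":
        assemble = ["as", str(src), "-o", obj]
    else:
        raise ValueError(f"unknown syntax: {syntax}")
    link = ["ld", "-lSystem", "-o", exe, obj]  # same link step for both
    return [assemble, link]


if __name__ == "__main__":
    source, syntax = sys.argv[1], sys.argv[2]  # e.g. hello_intel.asm intel
    for cmd in build_commands(source, syntax):
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)  # stop at the first failing step
```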

Great! We created a simple executable from assembly code. From here you can start exploring the exciting world of Assembly language on macOS.

Final thoughts

When searching for assembly language examples, most of them come from the reverse engineering perspective. That makes sense, because few people write whole programs in assembly. Still, I think our understanding is only complete if we can also write a simple assembly program ourselves.

Anyway, I hope this small post was helpful or at least entertaining :). Let me know what you think, and if you know of any useful assembly language resources, send them my way.


** There is no comment system yet, but you can send me a message on twitter @rderik or send me an email: derik[at]rderik[dot]com.