Let's write some assembly code in macOS for Intel x86_64
Dec 12 2019
This is going to be a small article on the basics of working with Assembly Language. We won’t go deep into building extensive programs in assembly. The main idea of this post is to clarify the workflow for creating an assembly program and some key concepts so you can comfortably begin your assembly explorations.
Let’s first learn about the different assembly syntaxes and types.
Different types of assembly languages
Assembly languages were created as an abstraction above machine code (the actual 1’s and 0’s). But the abstraction is not separate from the hardware that runs it, which means that different hardware gets different assembly languages. The ISA (Instruction Set Architecture) defines the registers, data types, and instructions supported by a specific computer architecture, and it changes from one architecture to another. This variety of architectures is one reason multiple assembly languages exist.
Another reason to have different types of assembly languages is the assembler. The assembler is the program that translates from the higher-level assembly language to machine code.
In this post, we are going to focus on Intel’s x86 processors, simply because macOS laptops, at the time of writing, run on them. For x86 there are many assemblers (NASM, GAS, YASM, and many more), and each supports its own “style” of assembly. There are two main syntax branches for x86, Intel and AT&T (you can read about some of the differences in the IBM article listed at the end of this post).
In summary, we have different assembly languages depending on the architecture, and also depending on the assembler program.
If you want to write assembly, assemble it, and run it on your computer, you need to make sure that the assembly language and the assembler you use match your architecture.
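As a quick sanity check before picking an assembler, you can ask the machine which architecture it reports. The sketch below uses Python's standard platform module. The suggest_isa helper and its small name table are hypothetical, written only for illustration, and cover just a few common machine names.

```python
import platform

# Hypothetical helper: maps the machine name reported by the OS to the
# ISA family you would need an assembler for. Only a few common names
# are handled; anything else is reported as "unknown".
def suggest_isa(machine: str) -> str:
    name = machine.lower()
    if name in ("x86_64", "amd64"):
        return "x86_64"
    if name in ("arm64", "aarch64"):
        return "arm64"
    return "unknown"

if __name__ == "__main__":
    machine = platform.machine()
    print(f"machine reports {machine!r}, ISA family: {suggest_isa(machine)}")
```

The examples in this post assume the x86_64 result.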
Enough background, let’s write some code.
Writing our first hello world program!
Create a file hello_intel.asm with the following content (we’ll use Intel syntax):
    section .data
    message: db "Hello, World!", 0Ah, 00h
    global _main
    section .text
    _main:
        mov rax, 0x02000004     ; system call for write
        mov rdi, 1              ; file descriptor 1 is stdout
        mov rsi, qword message  ; get string address
        mov rdx, 14             ; number of bytes (13 characters plus the newline)
        syscall                 ; execute syscall (write)
        mov rax, 0x02000001     ; system call for exit
        mov rdi, 0              ; exit code 0
        syscall                 ; execute syscall (exit)
Now we can generate the object file using yasm. If you don’t have it installed on your computer, you can install it with the Homebrew package manager.
    $ yasm -f macho64 hello_intel.asm
    # this generates the hello_intel.o object file
Now we have to use the linker to link it to the system’s dylibs (dynamic libraries).
    $ ld -lSystem -o hello_intel hello_intel.o
    # this will generate the hello_intel executable
If we run it we’ll get our desired output:
    $ ./hello_intel
    Hello, World!
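The constants 0x02000004 and 0x02000001 in the listing may look arbitrary. On 64-bit macOS, the value placed in rax packs a syscall class into the upper bits and the syscall number into the lower bits. Class 2 is the BSD (Unix) class, and write and exit are BSD syscalls 4 and 1. The Python sketch below just redoes that arithmetic, and also counts the bytes of the message so that the length passed in rdx can be checked instead of counted by hand. The function names are mine, chosen for illustration.

```python
# Class shift and class id as used by the XNU kernel on x86_64.
SYSCALL_CLASS_SHIFT = 24
SYSCALL_CLASS_UNIX = 2  # BSD syscalls such as write and exit

def macos_syscall(bsd_number: int) -> int:
    """Combine the Unix class with a BSD syscall number."""
    return (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | bsd_number

def byte_length(text: str, newline: bool = True) -> int:
    """Bytes needed to write text, optionally followed by one newline."""
    return len(text.encode("ascii")) + (1 if newline else 0)

if __name__ == "__main__":
    print(hex(macos_syscall(4)))                        # write -> 0x2000004
    print(hex(macos_syscall(1)))                        # exit  -> 0x2000001
    print(byte_length("Hello, World!", newline=False))  # 13
    print(byte_length("Hello, World!"))                 # 14
```

So the message is 13 bytes without its trailing newline and 14 with it. The terminating 00h byte in the data section is never needed by write, which takes an explicit length.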
To show you the difference between Intel and AT&T syntax, we are going to write the same program again, this time in AT&T syntax. We’ll assemble it with as, the assembler that ships with macOS by default. Like most assemblers on *nix systems, it uses AT&T syntax. Let’s create a new file, hello_atnt.asm, with the following content:
    .section __DATA, __data
    message: .asciz "Hello world!\n"

    .section __TEXT, __text
    .globl _main
    _main:
        mov $0x02000004, %rax               # system call for write
        mov $1, %rdi                        # file descriptor 1 is stdout
        movq message@GOTPCREL(%rip), %rsi   # get string address
        mov $13, %rdx                       # number of bytes
        syscall                             # execute syscall (write)
        mov $0x02000001, %rax               # system call for exit
        xor %rdi, %rdi                      # exit code 0 (xor of a register with itself clears it)
        syscall                             # execute syscall (exit)
As you can see, in AT&T syntax the operands carry more decoration ($ on immediate values, % on registers), and the order of the operands is reversed. Intel syntax reads like an assignment, rax = 0x02000004, while AT&T reads more like a move from source to destination, $0x02000004 -> %rax. Let’s generate the object file:
    $ as hello_atnt.asm -o hello_atnt.o
    # we specify the object file to be hello_atnt.o
Now we can link it in the same way we did with the Intel assembly.
    $ ld -lSystem -o hello_atnt hello_atnt.o
    # we get the executable hello_atnt
And if we run the executable we get what we were expecting:
    $ ./hello_atnt
    Hello world!
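Coming back to the operand-order difference between the two listings, here is a toy Python sketch that rewrites a simple two-operand Intel instruction in AT&T form. It is only an illustration of the rule "swap the operands, prefix registers with % and immediates with $". The intel_to_att helper is hypothetical, knows only the four registers used in this post, and does not handle memory operands, size suffixes, or labels.

```python
REGISTERS = {"rax", "rdi", "rsi", "rdx"}  # only the registers used in this post

def to_att_operand(operand: str) -> str:
    """Decorate a register or a numeric immediate the AT&T way."""
    operand = operand.strip()
    if operand.lower() in REGISTERS:
        return "%" + operand.lower()
    # Anything else must parse as an integer literal (decimal or 0x hex);
    # int() raises ValueError for operands this toy does not support.
    int(operand, 0)
    return "$" + operand

def intel_to_att(instruction: str) -> str:
    """Rewrite 'op dst, src' (Intel) as 'op src, dst' (AT&T)."""
    mnemonic, operands = instruction.strip().split(None, 1)
    dst, src = operands.split(",")
    return f"{mnemonic} {to_att_operand(src)}, {to_att_operand(dst)}"

if __name__ == "__main__":
    for line in ("mov rax, 0x02000004", "mov rdi, 1", "mov rdx, 13"):
        print(f"{line:24} ->  {intel_to_att(line)}")
```

For example, intel_to_att("mov rax, 0x02000004") returns "mov $0x02000004, %rax", which matches the first instruction of the AT&T listing.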
Great! We created a simple executable from assembly code. From here you can start exploring the exciting world of Assembly language on macOS.
When you search for assembly language examples, most of them are written from the reverse engineering perspective, which makes sense: few people write whole programs in assembly. Still, I think the understanding is only complete if we can also write at least a simple assembly program ourselves.
Anyways, I hope this small post was helpful or at least entertaining :). Let me know what you think, and if you know of any useful assembly language resources, send them my way.
Related topics/notes of interest
- NASM tutorial
- GAS Examples
- Stack overflow - minimal Mach-O 64 binary
- IBM - Linux Assemblers: a comparison of GAS and NASM, shows the difference between Intel syntax and AT&T.
- Reverse engineering for Beginners
- Open Security Training - Intro to x86