64bit x86_64 APIs are Magic But Not The Way You Think

This is probably going to confuse the hell out of a lot of people, but what if I told you that most applications you will ever write can now be written in portable 64-bit assembler?

I discovered this while porting an old application, which I had started writing in NASM, to build with YASM on Mac OS X 10.9. The original system I wrote the code on was a Mac OS X 10.5 machine, and I had implemented a set of scripts which generated assembler macros based on the /usr/include/sys/.h header files. In theory I could call every BSD syscall underlying the Mac operating system, which meant I had access to files, network sockets, memory allocation, and raw hardware. It turns out that you don't need a lot of code to do the typical things involved in writing something like a web server. For example, I could make an HTTP request in assembler using little more than socket, bind, connect, read, and write. I also wrote a bare-bones web server with socket, bind, listen, accept, read, and write. These are incredibly small programs, and since they don't link against libc or anything like it, they start and stop nearly instantly.

The trickiest part is actually understanding what data structures the C libraries expect. The horrific struct sockaddr_in used by bind, connect, sendto, and recvfrom looks like:

.size: db 0 ; sin_len (BSD layout)
.family: db 2 ; AF_INET
.port: db 0x1F,0x90 ; 8080 in network byte order
.ipaddr: db 0,0,0,0 ; INADDR_ANY
.padding: dq 0 ; sin_zero

In this example, I have a PF_INET (AF_INET) socket listening on IPv4's wildcard address (INADDR_ANY) on port 0x1F90 (8080, http-alt). This is no more difficult than the C code, and it is actually much easier to see what is going on. Since .port and .ipaddr are in network byte order, it is really easy to change things like .ipaddr: just write the numbers separated by , instead of . and you're done: 127,0,0,1 instead of htonl((127 << 24) + 1).
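The byte-order point above can be sketched with a couple of shell helpers. This is only an illustration, assuming a POSIX shell; the function names port_to_db and ipaddr_to_db are hypothetical and not part of the original project. They print the .port and .ipaddr lines for a given port number and dotted IPv4 address.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original project): print NASM db
# directives for a port and a dotted IPv4 address in network byte order.

# port_to_db PORT: emit the high byte first, then the low byte.
port_to_db() {
    printf '.port: db 0x%02X,0x%02X\n' $(( ($1 >> 8) & 0xFF )) $(( $1 & 0xFF ))
}

# ipaddr_to_db A.B.C.D: the octets are already written in network order,
# so only the separator changes from . to ,
ipaddr_to_db() {
    printf '.ipaddr: db %s\n' "$(printf '%s' "$1" | tr '.' ',')"
}

port_to_db 8080
ipaddr_to_db 127.0.0.1
```

For port 8080 and address 127.0.0.1 this prints ".port: db 0x1F,0x90" and ".ipaddr: db 127,0,0,1", which can be pasted straight into the structure.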

Similarly all of your syscalls are going to look like:

mov rdi, 2 ; PF_INET
mov rsi, 1 ; SOCK_STREAM
mov rdx, 0 ; default protocol
mov rax, socket ; syscall number macro
syscall
mov r12, rax ; store fd in r12 as it is preserved across syscalls

Where the registers are going to follow this scheme:

; rax = syscall
; rdi = arg1
; rsi = arg2
; rdx = arg3
; r10 = arg4 ; rcx on 10.5 or earlier
; r8 = arg5
; r9 = arg6

What shocked me was that my code was only broken because of an ABI break on Mac OS X, where they went from using rcx to pass arg4 to the OS to using r10, the same as Linux. This standardization meant that with a small script:

cat /usr/include/sys/syscall.h | grep -v "old " | grep "^#define" | sed 's%#define%\%define%' | sed 's%SYS_%%' | sed 's%$% + 0x2000000%' | tail -n +3 > syscall.asm

I could generate macros for all of the POSIX calls on Mac OS X, and a second script:

cat /usr/include/asm/unistd_64.h | grep -v "old " | grep "^#define" | sed 's%#define%\%define%' | sed 's%__NR_%%' | tail -n +2

does the same for Linux. With only some minor tweaks to specific data structures, like those used by fstat, the code for both operating systems is nearly identical. I have very few places in the code where there are %ifdef ... %else macro blocks to customize the behavior for a specific operating system. My largest assembler project has only 2 points where I had to resort to that. When you think about it, that is the sum of the differences between the two platforms.
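As a hedged illustration of the two header conversions, here is a minimal sketch that runs the same kind of pipeline over tiny hand-written samples instead of the real headers. The helper name header_to_macros and the sample variables are hypothetical. The sample lines imitate syscall.h and unistd_64.h but are heavily simplified, and the real headers contain extra lines (such as include guards) that the original scripts drop with tail.

```shell
#!/bin/sh
# header_to_macros PREFIX [SUFFIX]  (hypothetical helper)
# Reads C "#define PREFIXname N" lines on stdin and prints NASM
# "%define name N SUFFIX" lines. PREFIX and SUFFIX must not contain "/".
header_to_macros() {
    grep '^#define' |
        sed -e 's/^#define/%define/' -e "s/$1//" -e "s/\$/${2:-}/"
}

# Simplified stand-ins for the two system headers (assumed samples).
mac_sample='#define SYS_exit 1
#define SYS_read 3
/* 8  old creat */
#define SYS_write 4'

linux_sample='#define __NR_read 0
#define __NR_write 1
#define __NR_exit 60'

# Mac OS X: strip SYS_ and append the 0x2000000 offset.
printf '%s\n' "$mac_sample" | header_to_macros SYS_ ' + 0x2000000'

# Linux: strip __NR_ and leave the number alone.
printf '%s\n' "$linux_sample" | header_to_macros __NR_
```

Both runs produce macros with the same names (exit, read, write), so the assembler source can say mov rax, write on either platform and pick up the right number from whichever generated file is included.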
And you'll realize that assembler has become much more portable. That said, as soon as you want to venture into the realms that make each OS unique, you're SOL. Part of the reason I have projects like my Framebuffer Server is to ignore the differences between the OSes by virtualizing the OS device interface. Since the OS is standing between me and the hardware, I've decided to build virtual devices that work like the hardware interfaces I wish I had access to. This further removes the differences. Hopefully I'll soon release my Audio Server and be able to play PCM data by dd'ing to a file. But since I have assembler support for mmapping a shared file, I can always code my version in assembler. And since projects like SDL make it easy to program cross-platform multimedia applications, I can run my virtual devices on both Mac OS X and Linux.