Ghidra is an incredibly powerful tool for reverse engineering, but it’s far from perfect. Recently, while reversing some ARMv7-based firmware, I found Ghidra’s decompiler lacking when it came to system call invocations, or rather “supervisor calls” (SVCs), as they are called in the ARM architecture. The decompiler was not recognizing the arguments passed to the syscall, which made it annoyingly difficult to reverse engineer functions containing lots of syscall invocations. The image below shows how the decompiler fails to recognize the arguments passed to the four syscalls invoked on highlighted lines 8, 11, 12, and 21.

Decompiler Fails to Identify Syscall Arguments

As shown above, the decompiler shows that a certain system call is made, which is identified by its number, but we cannot tell from the decompiler output what arguments are being passed to the system call. Let’s look at the disassembly of this same function to see what’s going on at the assembly level. The figure below shows the full disassembly for the same function.

Disassembly of Function that makes System Calls

The first system call, syscall 7, is made at address 0x1223C in the figure above. In the ARMv7 Application Binary Interface, arguments for system calls are passed via the registers r0, r1, r2, and so on. We can therefore see that the arguments passed to syscall 7 are as follows:

Argument   Register   Value
First      r0         Value at address 0x12260
Second     r1         0x00039000
Third      r2         0x00000004
Fourth     r3         0xD8000000

Arguments to Syscall 7

Identifying the arguments in this case wasn’t too difficult, but you can imagine things get a lot more complex in other circumstances, for example, in a very large function where the arguments are not simply immediate values but were set up much earlier in the function. It would be much better if the decompiler could just take care of this business for us. Thankfully, there’s a way to fix this in Ghidra! First, we’ll walk through how to fix this situation manually. Then, I’ll provide a Ghidra script that automates the task and fixes up all the syscalls for us.

Option 1: Manually Fixing Up Syscall Invocations (the hard way)

Ghidra isn’t always the most intuitive tool to use, but thankfully, there’s lots of helpful documentation around if you know where to look. The solution for fixing up system call invocations is actually documented in a Ghidra guide called Moderately Advanced Ghidra Usage. We will walk through the process step-by-step for the example shown above. If you don’t care about the details of how to do this manually, you can skip straight to the next section, which provides a helpful script for fixing up all ARMv7 system calls in a program.

The solution for fixing up syscalls makes use of the OTHER space in Ghidra. The OTHER space is a separate address space that, in the past, was used for storing metadata from an executable that doesn’t get loaded into memory, such as the .comment section of an ELF file. Ghidra 9.1 added the ability to make references into the OTHER space. Well, technically, you can’t make explicit references to the OTHER space, but you can create an overlay block on the OTHER space, which can be referenced directly. Overlays allow different regions of memory to be accessed from the same address range. In this case, we’ll define the function signatures for the different system calls in an overlay on the OTHER space, and then we’ll fix up all the system call invocations to reference these function signatures.

First, let’s create the overlay to the OTHER space. Open the Memory Map by clicking Window -> Memory Map. You should see the dialog below. Then click on the green plus sign to add a new block to memory.

Memory Map Dialog

Set up the new memory block as follows:

  • Block Name: syscall_block (this can be whatever you want)
  • Start Addr: Select “OTHER” in the drop down and leave the starting address as 0x00000000
  • Length: 0x1000 (we’ll use one byte for every syscall we want to define)
  • Overlay: Click the Overlay checkbox to make this an overlay

Click OK to add the block, and you should see it show up in the Memory Map. It should also show up in the Program Tree dialog in the main UI as shown below.

Program Tree

Next, navigate to address 0x00000000 in the syscall_block, which is the start of the block. You can get there by double-clicking on the syscall_block in the Program Tree.

Go to the offset corresponding to the syscall you’d like to define. In our case, we’re going to define syscall 7. Select both the address and the “??” and then press the “F” key to define a function at this address. If you get a popup showing multiple actions, select “Create Function”. You should now have a function defined at address 0x00000007 of the syscall_block region as shown below.

Defining a Syscall Signature
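
Because the overlay block reserves one byte per syscall, the address of a syscall’s placeholder function is simply the syscall number used as an offset into the block. The following is a minimal sketch of that mapping in plain Java, with no Ghidra API involved. The class and method names are hypothetical, the block name and length are the values chosen in the Memory Map dialog above, and it assumes the offset is written in hexadecimal, as Ghidra normally displays addresses.

```java
class SyscallOverlayAddress {
    // Values chosen in the Memory Map dialog (assumed to match your setup).
    static final String BLOCK_NAME = "syscall_block";
    static final long BLOCK_LENGTH = 0x1000;

    /**
     * Builds the "block::offset" string for a syscall number.
     * Hypothetical helper for illustration; the offset is rendered in hex.
     */
    static String overlayAddress(long syscallNumber) {
        if (syscallNumber < 0 || syscallNumber >= BLOCK_LENGTH) {
            throw new IllegalArgumentException(
                "syscall number outside overlay block: " + syscallNumber);
        }
        return BLOCK_NAME + "::" + Long.toHexString(syscallNumber);
    }

    public static void main(String[] args) {
        System.out.println(overlayAddress(7));   // syscall_block::7
        System.out.println(overlayAddress(16));  // syscall_block::10
    }
}
```

With a block length of 0x1000, this scheme covers syscall numbers 0 through 0xFFF. A larger block would be needed for anything beyond that.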

We can now edit the function signature just like any function in Ghidra. Right-click on the function name at the top and select “Edit Function” to bring up the dialog shown below. Here, we’ve added four function arguments of type uint and set the Calling Convention to “default”.

Edit Function Dialog

Now that we have our function signature defined, we can set up the syscall invocation to reference this function defined in the OTHER space overlay. To do that, right-click on the “svc” instruction at address 0x1223C and choose References -> Add/Edit, or use the “R” shortcut. In the References Editor, click the green plus sign to add a reference and use the following options:

  • To Address: syscall_block::7
  • Ref-Type: CALLOTHER_OVERRIDE_CALL

Add Reference Dialog

After you click the Add button, you should see the decompiler update to reflect the changes as shown below. Yay! The decompiler now recognizes that the syscall is a function call with four arguments!

Decompiler Showing Syscall According to Function Signature

You can now treat that function as any other function and rename it or modify the arguments as needed.

Option 2: Running the Script (the easy way)

The manual procedure is fine if you just have one or two syscalls you want to inspect, but it’s way too cumbersome for any more than that. Thankfully, Ghidra ships with a script for automatically fixing up all the syscall invocations, named ResolveX86orX64LinuxSyscallsScript.java. The only problem is that the script supports only x86 and x64 Linux programs, so it won’t work for other architectures as is. Fortunately, it wasn’t too hard to port the script to ARMv7. Here are the main changes I made.

The first change was to modify the function checkARM32Instruction() to determine if a given instruction is a syscall. For ARMv7, this is easy to do. We can simply look at the instruction mnemonic and check for “swi” (non-thumb mode) or “svc” (thumb mode).

    /**
     * Checks whether an ARM native instruction is a system call
     * @param inst instruction to check
     * @return true precisely when the instruction is a system call
     */
    private static boolean checkARM32Instruction(Instruction inst) {
        String mnemonic = inst.getMnemonicString();
        return mnemonic.equals("svc") || mnemonic.equals("swi");
    }

The second change was to the function resolveConstants(), which maps a particular syscall invocation to the syscall number used. For example, the system call invoked at address 0x1223C in our disassembled example from above is syscall 7. For ARMv7, the system call number is embedded within the instruction itself rather than being passed via a register, so it’s very straightforward to map a given system call invocation to a system call number. We can either look at the first byte of the raw instruction, or we can read the instruction’s first operand as Ghidra decodes it. The latter is very easy to do, as shown by the call to getDefaultOperandRepresentation(0) in the code below.

    /**
     * Reads the system call number from the immediate operand of each
     * system call instruction
     * 
     * @param funcsToCalls map from functions containing syscalls to address in each function of 
     * the system call
     * @param program containing the functions
     * @return map from addresses of system calls to system call numbers
     * @throws CancelledException if the user cancels
     */
    private Map<Address, Long> resolveConstants(Map<Function, Set<Address>> funcsToCalls,
            Program program, TaskMonitor tMonitor) throws CancelledException {
        Map<Address, Long> addressesToSyscalls = new HashMap<>();
        for (Function func : funcsToCalls.keySet()) {
            for (Address callSite : funcsToCalls.get(func)) {
                long val = Long.decode(getInstructionAt(callSite).getDefaultOperandRepresentation(0));
                addressesToSyscalls.put(callSite, val);
            }
        }
        return addressesToSyscalls;
    }
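
For comparison, here is a rough sketch of the other approach mentioned above: pulling the number out of the raw instruction encoding. It is plain Java rather than Ghidra script code, and the class and method names are hypothetical. It assumes the standard ARMv7 SVC encodings: in ARM state, bits 27:24 are 1111 and the immediate is the low 24 bits, while in Thumb state, the high byte is 0xDF and the immediate is the low 8 bits. It also assumes a little-endian image, which is why the first byte of the raw instruction holds the low 8 bits of the immediate.

```java
class SvcImmediateDecoder {
    /** Assembles four little-endian bytes into a 32-bit instruction word. */
    static int littleEndianWord(byte[] raw) {
        return (raw[0] & 0xFF) | (raw[1] & 0xFF) << 8
             | (raw[2] & 0xFF) << 16 | (raw[3] & 0xFF) << 24;
    }

    /**
     * ARM state: cond[31:28] 1111[27:24] imm24[23:0].
     * Returns the immediate, or -1 if the word is not an SVC.
     */
    static int decodeArm(int word) {
        boolean unconditionalSpace = (word >>> 28) == 0xF; // cond 1111 is not SVC
        if (unconditionalSpace || (word & 0x0F000000) != 0x0F000000) {
            return -1;
        }
        return word & 0x00FFFFFF;
    }

    /**
     * Thumb state: 11011111[15:8] imm8[7:0].
     * Returns the immediate, or -1 if the halfword is not an SVC.
     */
    static int decodeThumb(int halfword) {
        if ((halfword & 0xFF00) != 0xDF00) {
            return -1;
        }
        return halfword & 0xFF;
    }

    public static void main(String[] args) {
        byte[] raw = {0x07, 0x00, 0x00, (byte) 0xEF};             // svc 7, ARM state
        System.out.println(decodeArm(littleEndianWord(raw)));     // 7
        System.out.println(decodeThumb(0xDF07));                  // 7
    }
}
```

Reading the operand through Ghidra, as the ported script does, avoids having to track the ARM/Thumb mode and endianness by hand, which is the main reason to prefer it.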

One last thing to note is that the script supports reading a mapping of syscall numbers to names. For example, if you are reversing a Linux userspace program that makes direct syscalls, you can give the script a listing of all the standard ARM Linux syscalls, and it will name the syscall functions accordingly. Ghidra ships with two such mapping files named x86_linux_syscall_numbers and x64_linux_syscall_numbers (both in the directory Ghidra/Features/Base/data), which support 32-bit and 64-bit x86 syscall numbering for Linux. The numbering is specific to each architecture, so you’ll have to port the listing to ARMv7 if you are working with standard Linux syscalls in your program. By default, the script will just use “syscall_<number>” to name the functions, so don’t worry about this listing if you’re not working with a Linux program. The format of the mapping file is very simple, as shown below.

#format = number(decimal) syscall_name
00 setup
01 exit
02 fork
03 read
04 write
05 open
06 close
07 waitpid
08 creat
09 link
10 unlink
11 execve
12 chdir
13 time
14 mknod
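
To make the format concrete, here is a small sketch of how such a listing could be parsed and used for naming. This is not the script’s actual code. The class and method names are hypothetical, and the exact formatting of the fallback name (decimal, no padding) is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SyscallNameMap {
    /** Parses "number(decimal) syscall_name" lines, skipping blanks and # comments. */
    static Map<Long, String> parse(List<String> lines) {
        Map<Long, String> names = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            String[] parts = trimmed.split("\\s+", 2);
            if (parts.length < 2) {
                continue; // malformed line: number without a name
            }
            // parseLong reads "08" as decimal 8. Long.decode would treat the
            // leading zero as an octal prefix and reject "08" and "09".
            names.put(Long.parseLong(parts[0]), parts[1].trim());
        }
        return names;
    }

    /** Returns the mapped name, or a "syscall_<number>" fallback (format assumed). */
    static String nameFor(Map<Long, String> names, long number) {
        String name = names.get(number);
        return name != null ? name : "syscall_" + number;
    }
}
```

Note the zero-padded numbers in the listing: they are decimal, so a parser that honors C-style octal prefixes would fail on entries 08 and 09.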

You can download the ARMv7 version of the syscall fixup script from our GitHub repo here: https://github.com/syscall7/ghidra-scripts. At this time, it only supports ARMv7 (32-bit), but it should be fairly straightforward to port it to ARMv8 (64-bit) or another architecture.

About the Author: Anthony DeRosa

Anthony DeRosa is a software security researcher with 20 years of experience in static and dynamic reverse engineering. He holds a master’s degree in Electrical and Computer Engineering from Johns Hopkins University. He is the founder of Syscall 7, a software development and analysis firm based in Baltimore, MD. He serves as an expert witness in technology-related litigation and currently leads a team of engineers supporting patent infringement litigation through source code analysis, software reverse engineering, and runtime testing.