Machine Emulation with Ghidra

We recently finished a job that required reverse engineering a 30-year-old firmware image for an embedded system. Our goal was to recover an equation involving floating point operations from the firmware. Unfortunately, the processor was so old that it didn’t have hardware floating point support, so the developers used a software floating point library instead. If you’ve never looked at the internals of a soft float implementation, I can assure you that it’s not something you want to manually reverse, even if you had good decompilation support from Ghidra (which we didn’t)! Instead of reversing all the floating point functions through manual inspection, we decided to emulate each floating point function to quickly identify the mathematical operation it implemented (i.e., add, subtract, exp, log2, etc.).

The particular processor we were working with was the Intel 80196, and few tools exist for working with this old chip or its instruction set. Fortunately for us, Ghidra had recently added support for this machine, so we decided to take Ghidra’s emulation capabilities for a spin. Those capabilities are not very well documented, but there are some helpful files in the Ghidra repository that were enough to get us started.

Emulation Examples Bundled with Ghidra

Buried in the Ghidra docs directory, you will find a directory named Emulation with an example of how to use Ghidra to emulate machine code. We’re going to first walk through the steps necessary to get this example working for x86, and then later on we’ll show you how to port the emulation script to another architecture, like ARM.

First things first. If you don’t have Ghidra installed, grab the latest release from https://ghidra-sre.org.

Once you extract the Ghidra release, you should find the Emulation directory buried in the docs directory as shown below.

~/Software/ghidra_9.1_PUBLIC/docs/GhidraClass/ExerciseFiles/Emulation/Source$ ls -la
drwxr-xr-x@ 5 anthony  staff   160B Dec 31 15:11 ./
drwxr-xr-x@ 4 anthony  staff   128B Nov 15 14:40 ../
-rw-r--r--@ 1 anthony  staff   588B Oct 23 17:58 README.txt
-rw-r--r--@ 1 anthony  staff   1.6K Oct 23 17:58 deobExample.c
-rw-r--r--@ 1 anthony  staff   1.8K Oct 23 17:58 deobHookExample.c

The two C source files represent two very similar programs that simply de-obfuscate a string that has been XOR’ed with 0xAA. Here’s the deobExample.c file.

int length(char *s) {
	int len = 0;
	while (*s++ != 0) {
		++len;
        }
	return len;
}

const char data[] = {
        0xec, 0xc3, 0xd8, 0xd9, 0xde, 0x8a, 0xcf, 0xc4,
        0xde, 0xd8, 0xd3, 0x00, 0xf9, 0xcf, 0xc9, 0xc5,
        0xc4, 0xce, 0x8a, 0xcf, 0xc4, 0xde, 0xd8, 0xd3,
        0x00, 0xfe, 0xc2, 0xc3, 0xd8, 0xce, 0x8a, 0xcf,
        0xc4, 0xde, 0xd8, 0xd3, 0x00, 0x00
};

char buffer[64];

char * deobfuscate(char *src, char *dst, int len) {
        char *ptr = dst;
        for (int i = 0; i < len; i++) {
            *ptr++ = *src++ ^ 0xAA;
        }
        *ptr = 0;
        return dst;
}

void use_string(char * str, int index) {
//	fprintf(stderr, "String[%d]: %s\n", index, str);
}

int main (int argc, char **argv) {
    char *ptr = (char *)data;
    int index = 0;
    while (*ptr != 0) {
        int len = length(ptr);
        char *str = deobfuscate(ptr, buffer, len);
	use_string(str, index++);
	ptr += len + 1;
    }
    return 0;
}
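
To make the obfuscation concrete before we bring Ghidra into the picture, here is a minimal stand-alone sketch in Java that walks the same table the way main does: decode one NUL-terminated entry with XOR 0xAA, skip the terminator, and stop at the empty entry. The class and method names (XorTableDecoder, decodeAll) are ours and are not part of the Ghidra example.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone re-implementation of the sample's decoding loop (not part of Ghidra). */
final class XorTableDecoder {
    // Same bytes as the data[] array in deobExample.c.
    static final int[] DATA = {
        0xec, 0xc3, 0xd8, 0xd9, 0xde, 0x8a, 0xcf, 0xc4,
        0xde, 0xd8, 0xd3, 0x00, 0xf9, 0xcf, 0xc9, 0xc5,
        0xc4, 0xce, 0x8a, 0xcf, 0xc4, 0xde, 0xd8, 0xd3,
        0x00, 0xfe, 0xc2, 0xc3, 0xd8, 0xce, 0x8a, 0xcf,
        0xc4, 0xde, 0xd8, 0xd3, 0x00, 0x00
    };

    /** Decodes consecutive NUL-terminated entries until an empty entry is reached. */
    static List<String> decodeAll(int[] table, int key) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < table.length && table[i] != 0) {
            StringBuilder sb = new StringBuilder();
            while (i < table.length && table[i] != 0) {
                sb.append((char) ((table[i] ^ key) & 0xff));
                i++;
            }
            out.add(sb.toString());
            i++; // skip the terminator, like "ptr += len + 1" in main
        }
        return out;
    }

    public static void main(String[] args) {
        decodeAll(DATA, 0xAA).forEach(System.out::println);
    }
}
```

Running it prints the three strings that the emulation script recovers later in this post, which gives us a known-good answer to compare the emulator against.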

Compile these two programs using gcc as follows:

$ gcc -o deobExample deobExample.c
$ gcc -o deobHookExample deobHookExample.c

Once you’ve compiled both programs, import them into a Ghidra project as shown below. You can accept all the defaults during the Import process.

Open the first program, deobExample, and say “Yes” to analyze the program. Navigate to the main function, and you should see something similar to below:

There are two Ghidra scripts bundled with Ghidra that demonstrate how to emulate using Ghidra’s EmulatorHelper class. These two scripts can be found in the following directory:

~/Software/ghidra_9.1_PUBLIC/Ghidra/Features/Base/ghidra_scripts$ ls -la Emu*
-rw-r--r--@ 1 anthony  staff   7.6K Oct 23 17:58 EmuX86DeobfuscateExampleScript.java
-rw-r--r--@ 1 anthony  staff    11K Oct 23 17:58 EmuX86GccDeobfuscateHookExampleScript.java

You can also bring up these two scripts by navigating to Window -> Script Manager.

Search for “EmuX86*” to locate the two scripts. If you right-click on the script, you can select “Edit with basic editor” to bring up the source for the script.

The idea of the EmuX86DeobfuscateExampleScript.java example script is to show how you could use Ghidra’s emulation capabilities to perform dynamic analysis. In this case, the dynamic analysis is to figure out what the de-obfuscated strings are. Yes, this is a contrived example, and you might just as easily write a little script to de-obfuscate the strings yourself, but in some real world problems, like emulating floating point instructions, emulation is really the most efficient approach.

Before we dive into the details of how the script operates, here’s a high level view of what EmuX86DeobfuscateExampleScript.java is doing. The script sets a breakpoint just before the call to the deobfuscate function and another breakpoint just after it returns. When the breakpoint before the call to deobfuscate is hit, the script records the address of the obfuscated string. When the breakpoint after the call to deobfuscate is hit, the script gets the de-obfuscated string from the return value (the RAX register for x86-64), and it creates a helpful comment in the disassembly and decompilation listings showing the address of the obfuscated string as well as the value of the de-obfuscated string. See screenshot below of the comments inserted by the script showing the de-obfuscated strings. The de-obfuscated strings are “First entry”, “Second entry”, and “Third entry”.

Okay, let’s dive into EmuX86DeobfuscateExampleScript.java and step through what the script is doing. The entry point for the script is the run function, and the first thing this function does is verify that it is operating against the deobExample program. This is just a helpful sanity check.

    @Override
    protected void run() throws Exception {

        String format =
            currentProgram.getOptions(Program.PROGRAM_INFO).getString("Executable Format", null);

        if (currentProgram == null || !currentProgram.getName().startsWith(PROGRAM_NAME) ||
            !"x86:LE:64:default".equals(currentProgram.getLanguageID().toString()) ||
            !ElfLoader.ELF_NAME.equals(format)) {

            printerr(
                "This emulation example script is specifically intended to be executed against the\n" +
                    PROGRAM_NAME +
                    " program whose source is contained within the GhidraClass exercise files\n" +
                    "(see docs/GhidraClass/ExerciseFiles/Emulation/" + PROGRAM_NAME + ".c).\n" +
                    "This program should be compiled using gcc for x86 64-bit, imported into your project, \n" +
                    "analyzed and open as the active program before running ths script.");
            return;
        }
        /* ... */
    }

Next, we get the address of the main function and store this in mainFunctionEntry, and we get the first instruction in the main function and store that in entryInstr. We also get the instruction that calls the deobfuscate function, which we’ll use to set a breakpoint later.

        // Identify function to be emulated
        mainFunctionEntry = getSymbolAddress("main");

        // Obtain entry instruction in order to establish initial processor context
        Instruction entryInstr = getInstructionAt(mainFunctionEntry);
        if (entryInstr == null) {
            printerr("Instruction not found at main entry point: " + mainFunctionEntry);
            return;
        }

        // Identify important symbol addresses
        // NOTE: If the sample is recompiled the following addresses may need to be adjusted
        Instruction callSite = getCalledFromInstruction("deobfuscate");
        if (callSite == null) {
            printerr("Instruction not found at call site for: deobfuscate");
            return;
        }

        deobfuscateCall = callSite.getAddress();
        deobfuscateReturn = callSite.getFallThrough(); // instruction address immediately after deobfuscate call
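
The fall-through address is simply the address of the call plus the length of the call instruction. As a rough illustration, and assuming the compiler emitted a direct near call (opcode 0xE8 followed by a 32-bit displacement, 5 bytes in total), the arithmetic looks like the sketch below. NearCallDecoder is a hypothetical helper of ours; in the script Ghidra does all of this for us through getFallThrough().

```java
/** Worked arithmetic for an x86-64 "call rel32" instruction (illustrative only). */
final class NearCallDecoder {
    static final int CALL_REL32_LENGTH = 5; // opcode 0xE8 + 4-byte displacement

    /** Address of the instruction after the call: where the "return" breakpoint goes. */
    static long fallThrough(long callAddress) {
        return callAddress + CALL_REL32_LENGTH;
    }

    /** Call target: fall-through address plus the signed little-endian displacement. */
    static long target(long callAddress, byte[] insn) {
        if (insn.length != CALL_REL32_LENGTH || (insn[0] & 0xff) != 0xE8) {
            throw new IllegalArgumentException("not an E8 rel32 call");
        }
        int rel32 = (insn[1] & 0xff)
            | ((insn[2] & 0xff) << 8)
            | ((insn[3] & 0xff) << 16)
            | ((insn[4] & 0xff) << 24);
        return fallThrough(callAddress) + rel32;
    }
}
```

The displacement is relative to the fall-through address, not to the call itself, which is why both the target and the return breakpoint are derived from the same value.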

Next, we delete any comments that might have been inserted from previous runs of this script with the setPreComment helper function.

        // Remove prior pre-comment
        setPreComment(deobfuscateReturn, null);

We then create an instance of the EmulatorHelper class, which is a helper class that Ghidra provides to make the emulation capabilities easier to work with. The class exports helper functions for creating breakpoints, reading and writing registers, reading and writing memory, and controlling the emulator.

        // Establish emulation helper
        emuHelper = new EmulatorHelper(currentProgram);

The next thing we do is set up our execution context. Because we are in complete control of the emulator, we can choose the address at which to start executing code. As we’ll see later, the script starts execution at the beginning of the main function, but on a real operating system, the program would actually start with C runtime initialization code that would set up the stack pointer and pass along the command line arguments. Since we are skipping all that, we need to perform some basic initialization ourselves. The next lines in the script do just that and set the stack pointer to a safe location just below the midpoint of the address space, well away from the program’s own code and data.

            // Initialize stack pointer
            long stackOffset = (entryInstr.getAddress().getAddressSpace().getMaxAddress().getOffset() >>> 1) - 0x7fff;
            emuHelper.writeRegister(emuHelper.getStackPointerRegister(), stackOffset);
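
To see what that expression evaluates to, here is the same arithmetic pulled out into a stand-alone sketch. We assume the highest offset is 0xFFFFFFFFFFFFFFFF for a 64-bit address space and 0xFFFFFFFF for a 32-bit one; StackPointerChoice is our own name, not something from the script.

```java
/** Reproduces the script's stack offset arithmetic outside of Ghidra. */
final class StackPointerChoice {
    /** Mirrors the script's expression: half of the highest offset, minus 0x7fff. */
    static long initialStackOffset(long maxAddressOffset) {
        // The unsigned shift matters: the 64-bit maximum is -1 as a signed Java long.
        return (maxAddressOffset >>> 1) - 0x7fff;
    }

    public static void main(String[] args) {
        // 64-bit space: highest offset is 0xFFFFFFFFFFFFFFFF, i.e. -1 as a Java long.
        System.out.println(Long.toHexString(initialStackOffset(-1L)));
        // 32-bit space: highest offset is 0xFFFFFFFF.
        System.out.println(Long.toHexString(initialStackOffset(0xFFFFFFFFL)));
    }
}
```

Under those assumptions the results are 0x7fffffffffff8000 and 0x7fff8000, both aligned to a 0x8000 boundary and far from where a typical ELF image is loaded.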

Next we set up a breakpoint just before and immediately after the call to deobfuscate.

            // Setup breakpoints
            emuHelper.setBreakpoint(deobfuscateCall);
            emuHelper.setBreakpoint(deobfuscateReturn);

Since we are starting execution from the main function, we need a way to end the emulation once the main function returns. We could simply place a breakpoint at the last instruction of main, but then we’d have to do more work to determine the last instruction in the main function. Instead, the script places a bogus return address on the stack before starting at the main function, and then it sets a breakpoint on this bogus return address so that we can detect when main exits. The bogus return address is called CONTROLLED_RETURN_OFFSET in the script and its value is set to 0.

            // Set controlled return location so we can identify return from emulated function
            controlledReturnAddr = getAddress(CONTROLLED_RETURN_OFFSET);
            emuHelper.writeStackValue(0, 8, CONTROLLED_RETURN_OFFSET);
            emuHelper.setBreakpoint(controlledReturnAddr);
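
If it is not obvious why writing 8 bytes at stack offset 0 works, the toy model below may help. It is a sketch of ours, not Ghidra code: ToyStack imitates a little-endian stack, its writeStackValue method only mimics the shape of the helper call above, and ret does what an x86-64 return does, which is to pop 8 bytes from the top of the stack into the program counter.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Toy little-endian stack showing how a planted return address is consumed. */
final class ToyStack {
    private final ByteBuffer memory;
    private int sp;  // index into memory, standing in for RSP
    long pc;         // standing in for RIP

    ToyStack(int size, int initialSp) {
        memory = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
        sp = initialSp;
    }

    /** Stores the low "size" bytes of value at sp + relativeOffset, least significant first. */
    void writeStackValue(int relativeOffset, int size, long value) {
        for (int i = 0; i < size; i++) {
            memory.put(sp + relativeOffset + i, (byte) (value >>> (8 * i)));
        }
    }

    /** x86-64 style return: pop 8 bytes from the top of the stack into pc. */
    void ret() {
        pc = memory.getLong(sp);
        sp += 8;
    }

    int stackPointer() {
        return sp;
    }
}
```

Whatever value sits at the top of the stack when main executes its final return becomes the next execution address, so a breakpoint on that value fires exactly when main is done.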

Now we get to the meat of the emulation. The script executes the emulation loop using the global monitor instance. The emuHelper.run function emulates until an event occurs, like a breakpoint or an error. The script continues in this loop until one of three exit conditions occurs:

The emulation is canceled by the user
The bogus return address is hit, indicating that main is returning
The emulation triggers an error condition

Emulation is kicked off by the call to “emuHelper.run(mainFunctionEntry, entryInstr, monitor);” below.

            // Execution loop until return from function or error occurs
            while (!monitor.isCancelled()) {
                boolean success =
                    (emuHelper.getEmulateExecutionState() == EmulateExecutionState.BREAKPOINT)
                            ? emuHelper.run(monitor)
                            : emuHelper.run(mainFunctionEntry, entryInstr, monitor);
                Address executionAddress = emuHelper.getExecutionAddress();
                if (monitor.isCancelled()) {
                    println("Emulation cancelled");
                    return;
                }
                if (executionAddress.equals(controlledReturnAddr)) {
                    println("Returned from function");
                    return;
                }
                if (!success) {
                    String lastError = emuHelper.getLastError();
                    printerr("Emulation Error: " + lastError);
                    return;
                }
                processBreakpoint(executionAddress);
            }
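
The control flow of that loop is easier to see with the Ghidra specifics stripped away. The sketch below is a toy of ours, not the real EmulatorHelper: ToyEmulator "executes" a program in which every address just names its successor, and ToyEmulationLoop drives it with the same pattern as the script (start at the entry point the first time, resume from a breakpoint afterwards, and stop when the controlled return address is reached).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy straight-line "CPU": each address maps to the address executed next. */
final class ToyEmulator {
    private final Map<Long, Long> nextPc = new HashMap<>();
    private final Set<Long> breakpoints = new HashSet<>();
    private long pc;
    private boolean atBreakpoint;

    void addInstruction(long address, long successor) { nextPc.put(address, successor); }
    void setBreakpoint(long address) { breakpoints.add(address); }
    long getExecutionAddress() { return pc; }
    boolean isAtBreakpoint() { return atBreakpoint; }

    /** Starts at the given address and runs until a breakpoint or a fault. */
    boolean run(long start) {
        pc = start;
        return resume();
    }

    /** Continues from the current pc; returns false if there is nothing to execute. */
    boolean resume() {
        atBreakpoint = false;
        do {
            Long successor = nextPc.get(pc);
            if (successor == null) {
                return false; // fault: no instruction at pc
            }
            pc = successor; // always execute at least one instruction before checking
        } while (!breakpoints.contains(pc));
        atBreakpoint = true;
        return true;
    }
}

/** Driver with the same shape as the script's execution loop. */
final class ToyEmulationLoop {
    static final long CONTROLLED_RETURN = 0L;

    static List<Long> emulate(ToyEmulator emu, long entry) {
        List<Long> hits = new ArrayList<>();
        emu.setBreakpoint(CONTROLLED_RETURN);
        while (true) {
            boolean ok = emu.isAtBreakpoint() ? emu.resume() : emu.run(entry);
            long where = emu.getExecutionAddress();
            if (where == CONTROLLED_RETURN) {
                return hits; // the emulated function returned
            }
            if (!ok) {
                throw new IllegalStateException("fault at 0x" + Long.toHexString(where));
            }
            hits.add(where); // stands in for processBreakpoint(where)
        }
    }
}
```

Note that the check for the controlled return address comes before the error check, as it does in the script, so reaching the sentinel is always treated as a clean exit.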

If none of the exit conditions has occurred, then the breakpoint is processed with the call to processBreakpoint. This function is called for the two breakpoints that are placed just before and immediately after the call to deobfuscate in the target program. The processing is very straightforward. For the case that hits just before deobfuscate is called, we remember the address of the obfuscated string that is passed to deobfuscate in the RDI register. For the case that hits immediately after the call to deobfuscate, we get a pointer to the de-obfuscated string in the return register RAX. Finally, we add a comment to the disassembly and decompilation listings with a call to setPreComment.

    /**             
     * Perform processing for the various breakpoints.
     * @param addr current execute address where emulation has been suspended
     * @throws Exception if an error occurs
     */             
    private void processBreakpoint(Address addr) throws Exception {
                
        if (addr.equals(deobfuscateCall)) {
            lastDeobfuscateArg0 = emuHelper.readRegister("RDI").longValue();
        }           
                    
        else if (addr.equals(deobfuscateReturn)) {
            long deobfuscateReturnValue = emuHelper.readRegister("RAX").longValue();
            String str = "deobfuscate(src=0x" + Long.toHexString(lastDeobfuscateArg0) + ") -> \"" +
                emuHelper.readNullTerminatedString(getAddress(deobfuscateReturnValue), 32) + "\"";
            String comment = getPreComment(deobfuscateReturn);
            if (comment == null) {
                comment = "";
            }
            else {
                comment += "\n";
            }
            comment += str;
            println("Updated pre-comment at " + deobfuscateReturn);
            setPreComment(deobfuscateReturn, comment);
        }
    }
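
The second argument to readNullTerminatedString caps how many bytes are read, so with the value 32 used here a longer string would presumably come back truncated. The sketch below models that behavior over a plain byte array; ToyMemoryReader is our own stand-in for emulator memory and is not how Ghidra implements the call.

```java
import java.nio.charset.StandardCharsets;

/** Models a bounded C-string read over a byte array standing in for emulator memory. */
final class ToyMemoryReader {
    /** Reads from start until a NUL byte, maxLength bytes, or the end of memory. */
    static String readNullTerminatedString(byte[] memory, int start, int maxLength) {
        int limit = Math.min(memory.length, start + maxLength);
        int end = start;
        while (end < limit && memory[end] != 0) {
            end++;
        }
        return new String(memory, start, end - start, StandardCharsets.US_ASCII);
    }
}
```

If you adapt the script to a target with longer strings, the cap is the first thing to raise.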

When you run the script, you should see the comments added to the listings and you should also see some output in the console listing as shown below.

Porting to Other Processors

While the example bundled with Ghidra is x86 specific, there’s actually not much of the script that is architecture dependent. The only things you would need to do to extend this to another architecture are:

Set up the initial state of the registers as required by your target architecture
Change the registers read for the function arguments and return value to match the calling convention of the target processor
Modify how you set the bogus return address to detect when main returns (i.e., some architectures might have a dedicated register used for the return address instead of popping it off the stack).
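
One way to keep those three differences in a single place is a small per-architecture profile, sketched below. This is only an idea for organizing your own script: CallProfile and its keys ("x86-64", "arm32") are hypothetical names of ours, and the register choices follow the System V AMD64 convention (first argument in RDI, return value in RAX, return address on the stack) and the 32-bit ARM procedure call standard (first argument and return value in r0, return address in lr).

```java
import java.util.Map;

/** Hypothetical per-architecture settings that an emulation script would consult. */
final class CallProfile {
    final String firstArgRegister;
    final String returnValueRegister;
    final String linkRegister; // null when the return address lives on the stack

    CallProfile(String firstArgRegister, String returnValueRegister, String linkRegister) {
        this.firstArgRegister = firstArgRegister;
        this.returnValueRegister = returnValueRegister;
        this.linkRegister = linkRegister;
    }

    /** True when the sentinel return address must be written to the stack. */
    boolean returnAddressOnStack() {
        return linkRegister == null;
    }

    static final Map<String, CallProfile> PROFILES = Map.of(
        "x86-64", new CallProfile("RDI", "RAX", null),
        "arm32", new CallProfile("r0", "r0", "lr"));
}
```

A script built this way would branch on returnAddressOnStack() when planting the sentinel and look up the register names in the breakpoint handler, leaving the rest of the logic untouched.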

Let’s recompile the deobExample.c file for ARM and modify the emulation script to work with the ARM architecture.

First, we’ll need to install an ARM cross compiler.

$ sudo apt install gcc-8-arm-linux-gnueabi

Next, compile the program using the cross compiler just installed.

$ arm-linux-gnueabi-gcc-8 -o deobExample-arm deobExample.c -nostartfiles

Now we need to make a few changes to the script to work with ARM. Let’s make a copy of the script and leave it in the same directory.

$ cp EmuX86DeobfuscateExampleScript.java EmuARMDeobfuscateExampleScript.java

First thing we’ll do is change PROGRAM_NAME so that it matches the program we compiled above.

private static String PROGRAM_NAME = "deobExample-arm";

We also need to rename the class to EmuARMDeobfuscateExampleScript since we renamed the source .java file.

public class EmuARMDeobfuscateExampleScript extends GhidraScript {

We’ll also comment out the check that verifies the target program’s language ID is “x86:LE:64:default”.

        if (currentProgram == null || !currentProgram.getName().startsWith(PROGRAM_NAME) ||
            //!"x86:LE:64:default".equals(currentProgram.getLanguageID().toString()) ||
            !ElfLoader.ELF_NAME.equals(format)) {

Next, we’ll change how the bogus return address is set. ARM has a dedicated register, LR, that receives the return address when a function is called, and the function returns by branching to that address. We’ll initialize LR to our bogus return address instead of placing the return address on the stack as we did for x86.

            // Set controlled return location so we can identify return from emulated function
            controlledReturnAddr = getAddress(CONTROLLED_RETURN_OFFSET);
            //emuHelper.writeStackValue(0, 8, CONTROLLED_RETURN_OFFSET);
            emuHelper.writeRegister(currentProgram.getProgramContext().getRegister("LR"), CONTROLLED_RETURN_OFFSET);
            emuHelper.setBreakpoint(controlledReturnAddr);

Finally, we need to change the registers that are used to access the function parameters and return value in the call to deobfuscate. Under the ARM procedure call standard, R0 carries the first function argument on entry and the return value on exit, so we’ll update the breakpoint handler to read R0 in both places.

    /**
     * Perform processing for the various breakpoints.
     * @param addr current execute address where emulation has been suspended
     * @throws Exception if an error occurs
     */
    private void processBreakpoint(Address addr) throws Exception {

        if (addr.equals(deobfuscateCall)) {
            lastDeobfuscateArg0 = emuHelper.readRegister(currentProgram.getProgramContext().getRegister("R0")).longValue();
        }

        else if (addr.equals(deobfuscateReturn)) {
            long deobfuscateReturnValue = emuHelper.readRegister(currentProgram.getProgramContext().getRegister("R0")).longValue();
            String str = "deobfuscate(src=0x" + Long.toHexString(lastDeobfuscateArg0) + ") -> \"" +
                emuHelper.readNullTerminatedString(getAddress(deobfuscateReturnValue), 32) + "\"";
            String comment = getPreComment(deobfuscateReturn);
            if (comment == null) {
                comment = "";
            }
            else {
                comment += "\n";
            }
            comment += str;
            println("Updated pre-comment at " + deobfuscateReturn);
            setPreComment(deobfuscateReturn, comment);
        }
    }

To get your script to show up in the Script Manager window, you might have to add the directory where your script lives as shown below.

After running the modified script for ARM, you should see the same comments populated now in the main function.

Conclusion

Ghidra’s emulation engine is a powerful tool for dynamic analysis. While the interface is not well documented, the source code and examples contain a treasure trove if you are willing to dig. I’ll leave you with an exercise to try on your own. In the other script, EmuX86GccDeobfuscateHookExampleScript.java, which we didn’t look at in this post, there is a reference to the ability to single-step through the emulated code using a call to emuHelper.step(). Modify one of the scripts we worked with to get single stepping to work. You can use debug output after emulating each instruction to print out some aspect of the processor state, like the value of the current program counter.

Happy emulating!

About the Author: Anthony DeRosa

Anthony DeRosa is a software security researcher with 20 years of experience in static and dynamic reverse engineering. He holds a Master’s degree in Electrical and Computer Engineering from Johns Hopkins University. He is the founder of Syscall 7, a software development and analysis firm based in Baltimore, MD. He serves as an expert witness in technology-related litigation and currently leads a team of engineers supporting patent infringement litigation through source code analysis, software reverse engineering, and runtime testing.