Modifying and running a binary by recompiling a reverse engineered disassembly

You're reverse engineering an executable and you want to modify it. Perhaps you want to add or tweak functionality, or perhaps you want to modify parts of the program to better understand how it works. Either way, you want to start with a working binary and incrementally replace some parts of it (but not others) with C code as your understanding of the binary grows.

I'm somewhat surprised by the lack of tooling in this area (if someone knows of such tools, please tell me), so I had to come up with my own makeshift method. It goes like this:

Use a disassembler to create a disassembly file. Each line of the disassembly must show the virtual address of an instruction, its machine code bytes, and its disassembly.

For example, the disassembler I used, retdec, produces a disassembly file which looks like this:
```
 0x43278f:   66 b8 80 07                            mov ax, 0x780
```
Use a simple text processing script to munge that file into an assembly file which looks like this:
```
 .byte 0x66,0xb8,0x80,0x07                                     #43278f; mov ax, 0x780
```
Note that the original machine code is preserved verbatim using .byte directives. The comment on the right gives the virtual address and then the disassembly. This is done rather than writing the disassembly directly for several reasons:
- As far as I am aware, many x86 instructions at the assembly level can be encoded in several different ways, and it's likely that GNU as will choose a different way of encoding an instruction than the assembler used to originally build the executable for at least one instruction. Worse, this encoding will probably have a different length, which will throw off the offsets of all the code that comes after it and break everything. We need the offsets of all existing code to match exactly.
- The disassembler I used gave me output in Intel assembly format and GNU as wants input in AT&T assembly format.
- I'm also pleasantly surprised by the speed of GNU as in this application. The .s disassembly file fed to GNU as is 180MiB in size, which it manages to assemble in 1.39s. Parsing .byte directives is likely to be a lot faster than parsing actual assembly, so even if I were to try and use the disassembly lines directly, I'd probably lose this speed advantage. Given that as a human I can only view and work with a tiny amount of the disassembly at a time, this seems like a bad tradeoff to make, even if the variable encoding issue above were not a problem.
Use GNU as, GNU ld and a custom linker script to reassemble and relink the disassembly in such a way that the offsets of all code and data match exactly.
Since the length of any existing code or existing data mustn't be changed, if you want to replace the implementation of a given function, you'll need to allocate a new text section for your added code out of the way of any sections allocated by the original binary, then change the first bytes of the original function to trampoline to your new code. This allows you to incrementally replace the existing code with your own code on a gradual basis.
Make changes as desired, preferably testing incrementally to ensure you haven't broken everything.
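
To make the trampoline idea above concrete, here is a minimal sketch of how the patch bytes could be computed. It assumes the trampoline is a plain x86 near jump (opcode 0xE9 followed by a 32-bit displacement measured from the end of the jump instruction); the function names and both addresses are made up for illustration. The five bytes it prints would overwrite the first five bytes of the original function, so nothing after them moves.

```python
import struct

def jmp_rel32(src, dst):
    """Encode `jmp dst` as it would appear at virtual address `src` (E9 rel32)."""
    # The displacement is relative to the address of the *next* instruction,
    # i.e. src plus the 5-byte length of the jump itself.
    rel = dst - (src + 5)
    return b"\xe9" + struct.pack("<i", rel)

def as_byte_directive(code, addr, note):
    """Format raw bytes as a .byte line with an address comment."""
    operands = ",".join(f"0x{b:02x}" for b in code)
    return f" .byte {operands}  #{addr:x}; {note}"

# Hypothetical addresses: original function at 0x401000,
# replacement code placed in a new section at 0xc00000.
patch = jmp_rel32(0x401000, 0xC00000)
print(as_byte_directive(patch, 0x401000, "jmp 0xc00000 (trampoline)"))
```

If the original function is shorter than five bytes this approach does not fit, and a different hook point would be needed.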

Tutorial

Tools

We'll need a disassembler to begin with; I used the FOSS tool retdec, which seems to work well as a disassembler. The binary is a 32-bit Win32 executable, so we'll need a suitable toolchain; I used mingw-w64 as NixOS provides builds of it, but clang should also be usable assuming that lld's linker script support is now mature enough.

If you're using NixOS, you can get a shell with these tools just by running:

$ nix-shell -I nixpkgs=$PATH_TO_NIXPKGS -p \
  retdec-full \
  pkgsCross.mingw32.buildPackages.gcc \
  pkgsCross.mingw32.windows.mingw_w64_headers

retdec will probably have to be compiled on your machine, which will take a while, but NixOS conveniently provides builds for mingw-w64.

Note that I recommend using the latest nixpkgs master; git clone https://github.com/nixos/nixpkgs and insert the path to the cloned repository above. Alternatively you can use your configured nixpkgs channel by omitting the -I nixpkgs=... option above, but this may give you older versions of retdec and mingw-w64.

If you're not using NixOS, you can still install and use Nix and nixpkgs and use the above command; or you can obtain retdec and mingw-w64 by some other means.

Disassembly

Firstly, let's use retdec to disassemble the binary.

$ retdec-decompiler.py \
    -k \
    --backend-keep-library-funcs \
    --no-default-static-signatures \
    foo.exe |& tee retdec.log

If you are working with a large input binary, this will TAKE A LONG TIME and WILL USE A LOT OF MEMORY. retdec is actually a decompiler, but my input binary (~7.5MiB) was sufficiently large that I was never actually able to get retdec to complete the decompile stage without running out of memory. Fortunately, it outputs the disassembly file we need well before this stage.

Note that I recommend saving the log above with the tee command, as it can be useful to examine. Also note the --backend-keep-library-funcs and --no-default-static-signatures arguments; these are important, and here's why:

retdec contains functionality designed to recognise statically linked standard library functions and name them automatically, so that you don't have to figure out which function is e.g. memcpy. The problem with this is that when retdec recognises a function as a statically linked standard library function, it does not emit the machine code or disassembly for that function into the disassembly file. This is a disaster for us since we need an exact byte-for-byte correspondence between the input .text section and our disassembly file. Thus, it's sadly necessary to disable this functionality. (Note that you may think that --backend-keep-library-funcs would suffice, but I found this not to affect generation of the disassembly file, which seems to be considered part of the frontend. Disabling the entire static signature system with --no-default-static-signatures is necessary to prevent the issue described above.)

retdec limits itself to consuming half your system's RAM by default. If you like, you can give it more by passing --max-memory <max-memory-in-bytes> or, if you're feeling adventurous, --no-memory-limit.

Disassembly transformation

At this point, retdec has either completed, or run out of memory and crashed, but hopefully not before leaving behind a number of files, including one named foo.c.frontend.dsm, which is our disassembly file. We will need to transform this into a GNU as file. This can be done via a simple script; you can find the Python 3 script I wrote to do this here.

Run the script:

$ ./dsm2s ./foo.c.frontend.dsm > foo.s
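
For readers who want to write their own converter, the core of the transformation might look roughly like the sketch below. This is not the script linked above; it is a simplified illustration that only understands instruction lines in the format shown earlier (address, space-separated hex bytes, then the disassembly). Real retdec output also contains section markers, function comments, data hexdumps and strings, which a complete script has to handle. Tagging unrecognised lines with NOMATCH: is an assumption chosen to mirror the marker that is grepped for in the next section.

```python
import re

# address, then hex bytes separated by single spaces, then 2+ spaces, then text
INSN = re.compile(
    r"^\s*0x(?P<addr>[0-9a-fA-F]+):\s+"
    r"(?P<bytes>[0-9a-fA-F]{2}(?: [0-9a-fA-F]{2})*)\s{2,}"
    r"(?P<asm>.*)$"
)

def convert_line(line):
    """Turn one instruction line into a .byte directive with a comment."""
    line = line.rstrip("\n")
    if not line.strip():
        return ""
    m = INSN.match(line)
    if m is None:
        # Anything this sketch does not understand is flagged, not dropped.
        return "# NOMATCH: " + line.strip()
    operands = ",".join("0x" + b.lower() for b in m["bytes"].split(" "))
    directive = " .byte " + operands
    # Pad so that the address comments line up in a column.
    return f"{directive:<62}#{m['addr']}; {m['asm']}"

print(convert_line(" 0x43278f:   66 b8 80 07                            mov ax, 0x780"))
```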

Disassembly examination

The output disassembly should now be examined for any obvious issues.

$ grep -Ei 'NOMATCH:|statically' foo.s

If you get any output, something has gone wrong. Either the transformation script saw a line of retdec disassembly output it didn't understand, or you forgot to pass --backend-keep-library-funcs --no-default-static-signatures to retdec as mentioned above and retdec has omitted some functions from the output. In this case, you will need to rerun retdec.

Assembling and linking

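Let's write a simple makefile. Note that the heredoc delimiter is quoted so that the shell doesn't expand $@ and $<, and that recipe lines must begin with a tab character:

$ cat >Makefile <<'END'
nfoo.exe: foo.o foo.ld
	i686-w64-mingw32-gcc -o "$@" -nostdlib -T foo.ld -mwindows "$<"

foo.o: foo.s
	i686-w64-mingw32-gcc -c -o "$@" "$<"
END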

We will also need a custom linker script:

$ cat >foo.ld <<END
ENTRY(Start)

SECTIONS {
  Start = 0xENTRYPOINT;

  /* TEXT */
  . = 0xTEXT_START;
  .text . : {
    *(.text)
  }

  /* RDATA */
  . = 0xRDATA_START;
  .rdata . : {
    *(.rdata)
  }

  /* DATA */
  . = 0xDATA_START;
  .data . : {
    *(.data)
  }

  /* BSS */
  . = 0xBSS_START;
  .bss . : {
    *(.bss)
  }

  /* RSRC */
  . = 0xRSRC_START;
  .rsrc . : {
    *(.rsrc)
  }
}
END

The placeholders 0xENTRYPOINT and 0xTEXT_START, etc. must be replaced with the hardcoded offsets of the entrypoint and sections. To find these, run i686-w64-mingw32-objdump -x foo.exe on the original executable.
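
If you would rather not read the values off by eye, a small parser can pull them out of the objdump output. The sketch below assumes the usual GNU objdump layout (a "start address" line and a "Sections:" table with Size and VMA columns); the sample text and all of its numbers are invented for illustration, and in practice you would feed in the captured output of the command above.

```python
import re

START = re.compile(r"^start address 0x([0-9a-fA-F]+)")
SECTION = re.compile(r"^\s*\d+\s+(\S+)\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s")

def parse_objdump(text):
    """Return (entry_point, {section_name: (vma, size)})."""
    entry, sections, in_table = None, {}, False
    for line in text.splitlines():
        m = START.match(line)
        if m:
            entry = int(m.group(1), 16)
        elif line.startswith("Sections:"):
            in_table = True
        elif in_table and not line.strip():
            in_table = False          # a blank line ends the section table
        elif in_table:
            m = SECTION.match(line)
            if m:
                name, size, vma = m.groups()
                sections[name] = (int(vma, 16), int(size, 16))
    return entry, sections

# Made-up excerpt in the shape of `objdump -x` output.
SAMPLE = """\
start address 0x00a1b2c3

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         006a370e  00401000  00401000  00001000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rdata        0005c000  00aa5000  00aa5000  006a5000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
"""

entry, sections = parse_objdump(SAMPLE)
print(f"entry point: 0x{entry:x}")
for name, (vma, size) in sections.items():
    print(f"{name:8} start=0x{vma:x} end=0x{vma + size:x}")
```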

We will also need to make some manual changes to the disassembly file. Firstly, you will probably find that some bytes in the disassembly have been output by retdec as 0x??. These bytes occur at the end of sections and should usually just be removed: change .byte 0xAB,0xCD,0x?? to .byte 0xAB,0xCD, and so on. Make sure not to leave an empty entry behind, such as .byte 0xAB,,0xCD or a trailing comma, as the empty entry will still be assembled as a byte (so this example would be interpreted as three bytes).
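
Doing this by hand is fine for a handful of lines, but it can also be scripted. The following is a rough sketch, assuming the .byte layout shown at the top of this article (operands, then an optional # comment); the function name is hypothetical. It drops 0x?? operands, rejoins what is left without empty entries, and pads the result so the comment column stays where it was.

```python
import re

BYTE_LINE = re.compile(r"^(\s*\.byte\s+)([^#]*)(#.*)?$")

def strip_unknown_bytes(line):
    """Remove 0x?? operands from a .byte line without leaving empty operands."""
    line = line.rstrip("\n")
    m = BYTE_LINE.match(line)
    if m is None:
        return line                      # not a .byte line: leave untouched
    head, operands, comment = m.group(1), m.group(2), m.group(3) or ""
    # Note: operands that were already empty are dropped here as well.
    kept = [op.strip() for op in operands.split(",")
            if op.strip() not in ("", "0x??")]
    if not kept:
        return "# " + line.lstrip()      # nothing left: comment the line out
    body = (head + ",".join(kept)).ljust(len(head) + len(operands))
    return (body + comment).rstrip()

print(strip_unknown_bytes(".byte 0xAB,0xCD,0x??"))
```

Since removing bytes changes section sizes, it is worth reviewing what such a script removed rather than trusting it blindly.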

At this point the disassembly should now compile and link, but there are several issues remaining:

Firstly, in order for our strategy to work, the offsets of all functions and data must align exactly. Rarely occurring bugs in the retdec disassembler may cause the occasional machine code byte not to be emitted, causing all functions after that point to be misaligned. Fortunately, after fixing the static library recognition issue, this only occurred for two bytes across two functions in the entire 7.5MiB executable in my case.
Secondly, there are some minor tweaks needed for the BSS section, discussed below.
Thirdly, creating a compatible executable doesn't just require us to create compatible sections. We also need to recreate the import tables used by the original binary. Fortunately, the import table and import address table used by PE files will all be preserved in the data sections, but we do need to tell the linker to use them so that it sets the address of them in the PE file headers.

We'll deal with the first issue first.

Finding the missing bytes

After assembling and linking for the first time, examine your new executable by running i686-w64-mingw32-objdump -x on it and compare the output to that of running the same command on the original executable. You want everything to match as much as possible, particularly section load addresses and section sizes.

If your load addresses are off, check the values you used in the linker script.

If one of your sections (text/rdata/data/etc.) has a size smaller than that of the original file, you have a serious problem, probably caused by the retdec bug described above. (If a section has a very slightly larger size, this is PROBABLY caused by GNU ld using slightly more conservative padding of sections than the linker used to link the original binary did. This should be harmless and can be ignored, assuming another section doesn't start in virtual memory immediately after that section. That, or extraneous bytes have somehow gotten emitted somewhere (retdec bug?), which is far more serious.) The nature of this bug appears to be that if retdec can't disassemble an instruction, it might simply not emit the machine code for that instruction. retdec tries to automatically distinguish between code and data, and output hex dumps for data instead of trying to disassemble it, but in rare circumstances it might mistake unused data or padding between functions in .text for code. This only occurred for two functions in the entire disassembly file in my case.

However, you do need to identify the location of the issue. Here is how I went about it. In the disassembly file, search for .section .text, which marks the start of the text section. Immediately after this line place the following:

.globl TEXT_begin
TEXT_begin:

Then search for the next .section directive, which will probably be something like .section .rdata. Immediately before this, place the following:

.globl TEXT_end
TEXT_end:

Run make, and examine your new executable by running i686-w64-mingw32-nm on it. The addresses of TEXT_begin and TEXT_end tell you where the text in your disassembly file truly begins and ends. The reason why this is more reliable than the output of i686-w64-mingw32-objdump -x is that the section sizes can be rounded up very slightly (by a couple of bytes) for alignment purposes, whereas this will give you accurate output. Run i686-w64-mingw32-objdump -x on the original binary and add the load address of the section to its size to determine where the section ends in memory in the original binary.
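
The comparison can be sketched in a few lines. The example below assumes nm's default three-column output (address, type, name) and uses invented numbers: the nm text, the section VMA and the section size are all hypothetical stand-ins for what you would read from your own build and from objdump on the original binary.

```python
def parse_nm(text):
    """Map symbol names to addresses from default `nm` output."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:              # "<address> <type> <name>"
            address, _kind, name = parts
            symbols[name] = int(address, 16)
    return symbols

def text_deficit(symbols, original_vma, original_size):
    """How many bytes short our .text is, compared to the original's end."""
    original_end = original_vma + original_size
    return original_end - symbols["TEXT_end"]

# Made-up nm output for the rebuilt executable.
NM_OUTPUT = """\
00401000 T TEXT_begin
00aa470c T TEXT_end
"""

symbols = parse_nm(NM_OUTPUT)
print(hex(symbols["TEXT_begin"]), hex(symbols["TEXT_end"]))
# Hypothetical VMA and size of .text in the original binary.
print("bytes missing:", text_deficit(symbols, 0x401000, 0x6A370E))
```

Bear in mind that the size objdump reports for the original section may itself include a little padding, so a small positive result is a hint to investigate rather than proof of missing bytes.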

Binary search. You now have the true beginning and end of your text section, which is presumably a few bytes short of the original binary's. To locate the missing bytes, I applied a simple binary search technique, calculated by hand:

1. Search the disassembly file for the TEXT_begin address. Every line contains the virtual memory address it should appear at, so this locates the first byte in the text section. This should be right at the top of the file, immediately after .section. Note the line number of this line.
2. Find the line corresponding to the TEXT_end address in the same way. Note this line number.
3. Find the midpoint between the two lines by calculating beginning line number + (end line number - beginning line number)/2.
4. At the given line, insert the following:
```
.globl TEXT_midpoint01
TEXT_midpoint01:
```
5. Run make and then run i686-w64-mingw32-nm on your new executable to get the symbols. This tells you what the linker thinks the address of the new symbol TEXT_midpoint01 is. If this address is correct (meaning that it corresponds with the address printed on the line immediately after that label in the disassembly file), then the missing byte must be after that point. If the address is a few bytes short, then the missing byte must be before it.
6. Continue the binary search by going back to step 3 and repeating with the new narrower search window, naming the new symbol TEXT_midpoint02 etc. Eventually (after 17 iterations in my case) you will narrow it down to just an instruction or two.
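
The hand procedure above is an ordinary bisection over line numbers. As a sketch, the loop looks like this, with the slow part (insert a label, run make, run nm, compare the address) hidden behind a caller-supplied callback. The name label_is_good and the culprit line number are hypothetical; here the callback is simulated so that the example runs on its own.

```python
def bisect_missing_byte(first_line, last_line, label_is_good):
    """Narrow [first_line, last_line] down to two adjacent lines.

    first_line must be known good (label lands on its expected address)
    and last_line known bad (label lands short of its expected address).
    """
    good, bad = first_line, last_line
    probes = 0
    while bad - good > 1:
        mid = good + (bad - good) // 2
        if label_is_good(mid):
            good = mid        # missing byte is somewhere after mid
        else:
            bad = mid         # missing byte is at or before mid
        probes += 1
    return good, bad, probes

# Simulated check: pretend every label before this made-up line is good.
culprit_line = 1696570
good, bad, probes = bisect_missing_byte(
    14, 1848441, lambda line: line < culprit_line
)
print(f"last good line {good}, first bad line {bad}, {probes} probes")
```

Each probe in real use costs one assemble-and-link cycle, which is why the fast assembly time mentioned earlier matters.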

I kept track of this process using a text editor as “pen and paper”. Here's what it looked like by the end:

     14     TEXT_begin  expect 0x401000   good
 924213     TEXT_m01    expect 0x75f5a8   good
1386327     TEXT_m02    expect 0x92ee6e   good
1617384     TEXT_m03    expect 0xa04321   good
1675148     TEXT_m05    expect 0xa30f0c   good
1689589     TEXT_m07    expect 0xa3b201   good
1693199     TEXT_m09    expect 0xa3d6ba   good
1695004     TEXT_m10    expect 0xa3e925   good
1695906     TEXT_m11    expect 0xa3f309   good
1696357     TEXT_m12    expect 0xa3f7c5   good
1696470     TEXT_m14    expect 0xa3f912   good
1696526     TEXT_m15    expect 0xa3f9a1   good
1696554     TEXT_m16    expect 0xa3fa0a   good
1696568     TEXT_m17    expect
  culprit: 0xa3fa36
1696583     TEXT_m13    expect 0xa3fa68   bad
1696809     TEXT_m08    expect 0xa3fcd8   bad
1704030     TEXT_m06    expect 0xa4493a   bad
1732912     TEXT_m04    expect 0xa57293   bad
1848441     TEXT_end    expect 0xaa470e   bad

Some notes: By adding the label lines to the disassembly, you change the line numbers of all statements following, making the results a little off, but this is too small to pose an issue for the above purposes. You could use the addresses on each line instead if you liked.

In my case, the culprit looked like this:

# function: function_a3fa36 at 0xa3fa36 -- 0xa3fa37
# data inside code section at 0xa3fa37 -- 0xa3fa38

The ranges above are [inclusive,exclusive). Note that a function is listed comprising exactly one byte, yet no byte is emitted, and then the disassembler changes its mind and decides to do a hexdump instead... but never emits the single byte of the “function”.

I fixed this by inserting a .byte 0xF4 directive, which corresponds to the x86 HLT (halt) instruction, which will cause the program to crash. My assumption is that this is dead code or not actually a function. An identical one-byte deficit occurred just two lines later, which I fixed in the same way. These were responsible for the two byte deficit I was seeing in the size of my .text section. These were the only serious issues in the entire 180MiB disassembly.

BSS issues

The executable I was working with dated from 2001 and was linked with MSVC6. I don't fully understand how its BSS region works, because it doesn't have a separate BSS section and instead seems to rely on some obscure use of PE file headers.

In any case, retdec disassembled the BSS section as explicit zero bytes and didn't mark it as a separate section, so I had to fix this up manually. Use i686-w64-mingw32-objdump -x as necessary to find the start of BSS (you should have already put it in your linker script), and locate the correct virtual address in the disassembly file. It will look something like this:

.byte 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00     #b74ff9;|................|

You will need to add a .section .bss directive. Note that the addresses are not divisible by 16 but instead offset; this is because retdec likes to decode ASCII strings as quoted strings rather than hex dumps, meaning that following hex data is offset. When this happens I split the line in a visually appealing way like this:

.byte 0x00,0x00,0x00,0x00,0x00,0x00,0x00                                                  #b74ff9;|................|
.section .bss
.byte                                    0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00     #b74ff9;|................|

Make sure you get the directive at exactly the right address. As usual, avoid accidentally leaving any trailing commas, as a trailing comma will be interpreted as an additional byte even if it is not followed by any digits.
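
Because a stray comma is easy to miss in a file this size, a quick lint pass can help after splitting lines by hand. This is a minimal sketch, assuming comments start with # as in the examples above; the function name is hypothetical. It reports .byte lines that contain an empty operand, whether between two commas or after a trailing one.

```python
import re

BYTE_DIRECTIVE = re.compile(r"^\s*\.byte\b(.*)$")

def find_suspect_byte_lines(lines):
    """Yield (line_number, line) for .byte lines with an empty operand."""
    for number, line in enumerate(lines, start=1):
        code = line.split("#", 1)[0]     # ignore the trailing comment
        m = BYTE_DIRECTIVE.match(code)
        if m is None:
            continue
        operands = m.group(1).strip()
        if operands and any(op.strip() == "" for op in operands.split(",")):
            yield number, line

sample = [
    ".byte 0x00,0x00,0x00,      #b74ff9;|................|",
    ".section .bss",
    ".byte      0x00,0x00,0x00  #b74ff9;|................|",
    ".byte 0xAB,,0xCD",
]
for number, line in find_suspect_byte_lines(sample):
    print(f"line {number}: {line}")
```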

Import tables

Since our disassembly is a dump of all section contents, and import tables live in sections, our disassembly fortunately contains a copy of the original executable's import and import address tables. In order to use these, we just need to tell GNU ld to use them. Import tables are located by a PE loader via fields in the PE file headers, so we need to tell GNU ld to set these fields to the virtual addresses of our import and import address tables. Fortunately, after grepping through binutils source code, I was able to determine that GNU ld has a way to do this: certain symbols with specific names are taken as designating import and import address table begin and end addresses. If GNU ld sees these symbols present, it will use their values to set the import table addresses in the PE header.

Firstly, we will need to add symbols in our disassembly file locating the beginnings and ends of both our import and import address tables. You can determine the addresses of these using i686-w64-mingw32-objdump -x on the original binary. Again, the hex offset means we have to split lines here.

.byte 0x00,0xff,0xff,0xff,0xff,0xfc,0x46,0xaa,0x00                                        #b00f47;|......F....p....|
.globl my_import_table_start
my_import_table_start:
.byte                                              0xf4,0x17,0x70,0x00,0x00,0x00,0x00     #b00f47;|......F....p....|
[omitted]
.byte 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00                    #b01097;|.............z4p|
.globl my_import_table_end
my_import_table_end:
.byte                                                                  0x7a,0x34,0x70     #b01097;|.............z4p|

The same must be done for the import address table:

.globl my_iat_start
my_iat_start:
.byte 0x7a,0x34,0x70,0x00,0x68,0x34,0x70,0x00,0x88,0x34,0x70,0x00,0x10,0x34,0x70,0x00     #aa5000;|z4p.h4p..4p..4p.|
[omitted]
.byte 0x14,0x35,0x70,0x00,0x24,0x35,0x70,0x00,0x00,0x00,0x00,0x00
.globl my_iat_end
my_iat_end:
.byte                                                             0x00,0x00,0x00,0x00     #aa57a0;|.5p.$5p.........|

Now we must amend our linker script to generate the necessary magic symbols which GNU ld uses. Amend the .rdata section defined in the linker script as follows:

/* RDATA */
. = 0xRDATA_START;
.rdata . : {
  *(.rdata)
  .idata$2 = my_import_table_start - 0x400000;
  .idata$4 = my_import_table_end   - 0x400000;
  .idata$5 = my_iat_start          - 0x400000;
  .idata$6 = my_iat_end            - 0x400000;
}

Note that the cryptic symbols .idata$2, .idata$4, .idata$5 and .idata$6 specify the import table start address, import table end address, IAT start address and IAT end address respectively. (Search for these in the binutils source code if you're curious as to how they work.) Note also that above I had to offset these relative to the image base, which on Windows is usually 0x400000 and in any case can be ascertained from objdump.
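
The arithmetic involved is simple but easy to get wrong by one line, so here it is spelled out as a sketch. It uses the addresses from the excerpts above: a label placed in the middle of a split line sits at the line's address plus the number of bytes that precede it, and subtracting the image base turns that virtual address into the image-relative value used in the linker script. The helper names are made up, and the image base is assumed to be 0x400000 as in this article.

```python
IMAGE_BASE = 0x400000  # assumed; check the ImageBase reported by objdump

def label_address(line_address, bytes_before_label):
    """Virtual address of a label inserted part-way through a .byte line."""
    return line_address + bytes_before_label

def rva(virtual_address, image_base=IMAGE_BASE):
    """Convert a virtual address to an address relative to the image base."""
    return virtual_address - image_base

# Line addresses and byte counts taken from the split lines shown above.
import_start = label_address(0xB00F47, 9)
import_end = label_address(0xB01097, 13)
iat_start = label_address(0xAA5000, 0)
iat_end = label_address(0xAA57A0, 12)

print(f"import table: rva=0x{rva(import_start):x} size=0x{import_end - import_start:x}")
print(f"IAT:          rva=0x{rva(iat_start):x} size=0x{iat_end - iat_start:x}")
```

The values printed this way are the ones to compare against the import-related data directory entries that objdump -x shows for the original binary.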

We also defined a .rsrc section earlier. In much the same vein as for the IAT, if GNU ld sees a section named .rsrc, it automatically sets the PE header to point to it as the executable's resources, so we don't need to do anything more to have the resource section work.

Linker assertions

This is probably the most important advice so far: Add assertions to your linker script to ensure that your offsets are correct and don't slide.

/* TEXT */
. = 0xTEXT_START;
.text . : {
  *(.text)
}

ASSERT(TEXT_end == 0xCORRECT_TEXT_END,                    "text end (object)")
ASSERT(ADDR(.text) + SIZEOF(.text) == 0xPADDED_TEXT_END,  "text end (padded)")
ASSERT(.                           == 0xPADDED_TEXT_END,  "position")

You should add similar ASSERT lines after each section (text, rdata, data, etc.). This will generate a linker error if you ever change anything in your disassembly that causes some function to grow or shrink, which would otherwise have catastrophic consequences.

Note that TEXT_end is a symbol we defined inside the disassembly file earlier to get the true, linker-unpadded text size. The second and third ASSERT lines aren't as important, especially if you don't have one section in the original binary starting immediately after another, in which case the linker adding a few bytes of padding is likely of no consequence.

You can determine 0xPADDED_TEXT_END from the output of i686-w64-mingw32-objdump -x: take the VMA and size fields shown for the section, check that the size is sane, and add the two.

Eyeballing objdump

Now run make once again and compare the output of i686-w64-mingw32-objdump -x on the new executable and the original executable. Eyeball the output for differences, which should be minimised. Exact correspondence is unlikely to occur and in many circumstances will probably be impossible, but all consequential differences should be eliminable. Slight differences in padding or the file offset at which section data is placed should be inconsequential in the vast majority of cases (unless an executable is doing something weird and trying to examine its own binary; self-extractors perhaps).

The important things which must match are certain data directory descriptors, particularly the import and import address table sizes and addresses; section load addresses; and section sizes, the sanity-checking of which has already been discussed; and the entrypoint address.
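
Eyeballing works, but a textual diff makes the remaining differences harder to overlook. A minimal sketch using Python's standard difflib is shown below; the two dump excerpts are invented, and in practice you would pass in the saved output of objdump -x for each executable. The file labels are only used in the diff header.

```python
import difflib

def diff_dumps(original, rebuilt):
    """Return unified-diff lines between two objdump -x text dumps."""
    return list(difflib.unified_diff(
        original.splitlines(),
        rebuilt.splitlines(),
        fromfile="foo.exe",
        tofile="nfoo.exe",
        lineterm="",
        n=0,                 # no context lines: show only what differs
    ))

# Made-up excerpts standing in for real objdump output.
ORIGINAL = "  0 .text  006a370e  00401000\nImageBase 00400000\n"
REBUILT = "  0 .text  006a3710  00401000\nImageBase 00400000\n"

for line in diff_dumps(ORIGINAL, REBUILT):
    print(line)
```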

Although probably not strictly necessary, at this point I amended the Makefile to make the new executable slightly more similar to the original one:

nfoo.exe: foo.o foo.ld
	i686-w64-mingw32-gcc -o "$@" -nostdlib -T foo.ld -mwindows -Wl,--file-alignment -Wl,0x1000 -Wl,--major-image-version -Wl,0 "$<"

The -Wl,--file-alignment -Wl,0x1000 option uses a higher file alignment than that used by GNU ld by default; it seems MSVC6 preferred a higher alignment of sections in files, even though Windows does not require this. Setting this should be unnecessary but makes the objdump output more similar.

The -Wl,--major-image-version -Wl,0 option sets a versioning field in the PE header. Nothing cares about this, but it's one less difference.

The moment of truth

After doing all this, I ran make one more time and checked that the output of objdump looked perfect, or at least as perfect as was obtainable without modifying GNU ld; convincing ld not to pad sections by two or three bytes probably wasn't happening, but also doesn't matter.

Finally, I ran the binary under wine. And, amazingly, it worked perfectly the first time, just like the original.

Caveats

This technique was performed on an old, non-relocatable executable compiled in 2001 with MSVC6. I don't see why this technique wouldn't also be applicable to modern, relocatable, ASLR-enabled executables, but I haven't thought about any issues those might pose while writing this. YMMV.