Modifying and running a binary by recompiling a reverse engineered disassembly
You're reverse engineering an executable and you want to modify it. Perhaps you want to add or tweak functionality of the executable, or perhaps you want to be able to modify parts of the program to get a better understanding of how it works. Either way, you want to start with a working binary and gradually replace some parts of it (but not others) with C code as you gradually understand more of the binary.
I'm somewhat surprised by the lack of tooling in this area (if someone knows of such tools, please tell me), so I had to come up with my own makeshift method. It goes like this:
Use a disassembler to create a disassembly file. For each line, the disassembly must show the machine code bytes for each instruction, as well as the virtual address and disassembly of the instruction.
For example, the disassembler I used, retdec, produces a disassembly file which looks like this:
0x43278f: 66 b8 80 07 mov ax, 0x780
Use a simple text processing script to munge that file into an assembly file which looks like this:
.byte 0x66,0xb8,0x80,0x07 #43278f; mov ax, 0x780
Note that the original machine code is preserved verbatim using .byte directives. The comment on the right gives the virtual address and then the disassembly. This is done rather than writing the disassembly directly for several reasons:
As far as I am aware, many x86 instructions at the assembly level can be encoded in several different ways, and it's likely that GNU as will encode at least one instruction differently from the assembler originally used to build the executable. Worse, this encoding will probably have a different length, which will throw off the offsets of all the code that comes after it and break everything. We need the offsets of all existing code to match exactly.
The disassembler I used gave me output in Intel assembly format and GNU as wants input in AT&T assembly format.
I'm also pleasantly surprised by the speed of GNU as in this application. The .s disassembly file fed to GNU as is 180MiB in size, which it manages to assemble in 1.39s. Parsing .byte directives is likely to be a lot faster than parsing actual assembly, so even if I were to try and use the disassembly lines directly, I'd probably lose this speed advantage. Given that as a human I can only view and work with a tiny amount of the disassembly at a time, even if the variable encoding issue above were not a problem, this seems like a bad tradeoff to make.
Use GNU as, GNU ld and a custom linker script to reassemble and relink the disassembly in such a way that the offsets of all code and data match exactly.
Since the length of any existing code or existing data mustn't be changed, if you want to replace the implementation of a given function, you'll need to allocate a new text section for your added code out of the way of any sections allocated by the original binary, then change the first bytes of the original function to trampoline to your new code. This allows you to incrementally replace the existing code with your own code on a gradual basis.
Make changes as desired, preferably testing incrementally to ensure you haven't broken everything.
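To make the trampoline idea from the step above concrete, here is a minimal sketch in Python. It assumes a 32-bit x86 target and uses the 5-byte `jmp rel32` encoding (opcode 0xE9 followed by a little-endian displacement measured from the end of the jump). The function names and both addresses are hypothetical, chosen only for illustration; the output is a .byte line that would overwrite the first five bytes of the original function, leaving the total length unchanged.

```python
import struct


def jmp_rel32(patch_addr, target_addr):
    """Encode a 5-byte x86 `jmp rel32` located at patch_addr that jumps to target_addr."""
    # The displacement is relative to the address of the next instruction,
    # i.e. patch_addr plus the 5 bytes occupied by the jump itself.
    displacement = target_addr - (patch_addr + 5)
    if not -2**31 <= displacement < 2**31:
        raise ValueError("target is out of rel32 range")
    return b"\xe9" + struct.pack("<i", displacement)


def as_byte_directive(raw):
    """Format raw bytes as a GNU as .byte directive."""
    return ".byte " + ",".join(f"0x{b:02x}" for b in raw)


# Hypothetical addresses: original function at 0x401000, replacement at 0xc00000.
print(as_byte_directive(jmp_rel32(0x401000, 0xc00000)))
```

The function being patched has to be at least five bytes long for this to fit; any bytes after the jump can stay as they are, since they are never reached.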
Tutorial
Tools
We'll need a disassembler to begin with; I used the FOSS tool retdec, which seems to work well as a disassembler. The binary is a 32-bit Win32 executable, so we'll need a suitable toolchain; I used mingw-w64 as NixOS provides builds of it, but clang should also be usable assuming that lld's linker script support is now mature enough.
If you're using NixOS, you can get a shell with these tools just by running:
$ nix-shell -I nixpkgs=$PATH_TO_NIXPKGS -p \
retdec-full \
pkgsCross.mingw32.buildPackages.gcc \
pkgsCross.mingw32.windows.mingw_w64_headers
retdec will probably have to be compiled on your machine, which will take a while, but NixOS conveniently provides builds for mingw-w64.
Note that I recommend using the latest nixpkgs master; git clone https://github.com/nixos/nixpkgs and insert the path to the cloned repository above. Alternatively you can use your configured nixpkgs channel by omitting the -I nixpkgs=... option above, but this may give you older versions of retdec and mingw-w64.
If you're not using NixOS, you can still install and use Nix and nixpkgs and use the above command; or you can obtain retdec and mingw-w64 by some other means.
Disassembly
Firstly, let's use retdec to disassemble the binary.
$ retdec-decompiler.py \
-k \
--backend-keep-library-funcs \
--no-default-static-signatures \
foo.exe |& tee retdec.log
If you are working with a large input binary, this will TAKE A LONG TIME and WILL USE A LOT OF MEMORY. retdec is actually a decompiler, but my input binary (~7.5MiB) was sufficiently large that I was never actually able to get retdec to complete the decompile stage without running out of memory. Fortunately, it outputs the disassembly file we need well before this stage.
Note that I recommend saving the log above with the tee command, as it can be useful to examine. Also note the --backend-keep-library-funcs and --no-default-static-signatures arguments; these are important, and here's why:
retdec contains functionality designed to recognise statically linked standard library functions and name them automatically, so that you don't have to figure out which function is e.g. memcpy. The problem with this is that when retdec recognises a function as a statically linked standard library function, it does not emit the machine code or disassembly for that function into the disassembly file. This is a disaster for us since we need an exact byte-for-byte correspondence between the input .text section and our disassembly file. Thus, it's sadly necessary to disable this functionality. (Note that you may think that --backend-keep-library-funcs would suffice, but I found this not to affect generation of the disassembly file, which seems to be considered part of the frontend. Disabling the entire static signature system with --no-default-static-signatures is necessary to prevent the issue described above.)
retdec limits itself to consuming half your system's RAM by default. If you like, you can give it more by passing --max-memory <max-memory-in-bytes> or, if you're feeling adventurous, --no-memory-limit.
Disassembly transformation
At this point, retdec has either completed, or run out of memory and crashed, but hopefully not before leaving behind a number of files, including one named foo.c.frontend.dsm, which is our disassembly file. We will need to transform this into a GNU as file. This can be done via a simple script; you can find the Python 3 script I wrote to do this here.
Run the script:
$ ./dsm2s ./foo.c.frontend.dsm > foo.s
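The linked script is the one to use. Purely to illustrate the kind of transformation involved, the core of it might look like the following minimal sketch. It assumes the single-line instruction format shown at the start of this article (address, colon, space-separated hex byte pairs, then the disassembly text). The function name dsm_line_to_as is hypothetical, and emitting a NOMATCH marker for unrecognised lines is an assumption made to fit the grep check in the next section. Real retdec output contains other kinds of lines (comments, data dumps, strings) that this sketch does not handle.

```python
import re

# Address, then one or more two-digit hex bytes, then the disassembly text.
LINE_RE = re.compile(
    r"^0x([0-9a-f]+):\s+((?:[0-9a-f]{2}(?:\s+|$))+)(.*)$",
    re.IGNORECASE,
)


def dsm_line_to_as(line):
    """Turn one instruction line of the disassembly into a .byte line for GNU as."""
    line = line.rstrip("\n")
    match = LINE_RE.match(line)
    if match is None:
        # Flag anything we do not understand so it can be found with grep.
        return "# NOMATCH: " + line
    addr, raw_bytes, text = match.groups()
    directive = ",".join("0x" + b.lower() for b in raw_bytes.split())
    return f".byte {directive} #{addr.lower()}; {text}".rstrip()


print(dsm_line_to_as("0x43278f: 66 b8 80 07 mov ax, 0x780"))
```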
Disassembly examination
The output disassembly should now be examined for any obvious issues.
grep -Ei 'NOMATCH:|statically' foo.s
If you get any output, something has gone wrong. Either the transformation script saw a line of retdec disassembly output it didn't understand, or you forgot to pass --backend-keep-library-funcs --no-default-static-signatures to retdec as mentioned above and retdec has omitted some functions from the output. In this case, you will need to rerun retdec.
Assembling and linking
Let's write a simple makefile:
$ cat >Makefile <<'END'
nfoo.exe: foo.o foo.ld
	i686-w64-mingw32-gcc -o "$@" -nostdlib -T foo.ld -mwindows "$<"
foo.o: foo.s
	i686-w64-mingw32-gcc -c -o "$@" "$<"
END
We will also need a custom linker script:
$ cat >foo.ld <<'END'
ENTRY(Start)
SECTIONS {
    Start = 0xENTRYPOINT;
    /* TEXT */
    . = 0xTEXT_START;
    .text . : {
        *(.text)
    }
    /* RDATA */
    . = 0xRDATA_START;
    .rdata . : {
        *(.rdata)
    }
    /* DATA */
    . = 0xDATA_START;
    .data . : {
        *(.data)
    }
    /* BSS */
    . = 0xBSS_START;
    .bss . : {
        *(.bss)
    }
    /* RSRC */
    . = 0xRSRC_START;
    .rsrc . : {
        *(.rsrc)
    }
}
END
The placeholders 0xENTRYPOINT and 0xTEXT_START, etc. must be replaced with the hardcoded offsets of the entrypoint and sections. To find these, run i686-w64-mingw32-objdump -x foo.exe on the original executable.
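If you would rather not read the addresses off by eye, the section table in the objdump output can be parsed mechanically. The following is a minimal sketch, assuming the usual binutils "Sections:" layout with Size, VMA, LMA and file offset columns. The function name parse_sections is hypothetical, and the sample text is illustrative rather than real output from my binary (the .rdata size in particular is made up).

```python
import re

# Matches rows such as: "  0 .text  006a370e  00401000  00401000  00001000  2**2"
SECTION_RE = re.compile(
    r"^\s*\d+\s+(\S+)\s+([0-9a-f]{8})\s+([0-9a-f]{8})\s+[0-9a-f]{8}\s+[0-9a-f]{8}\s+2\*\*\d+",
    re.IGNORECASE | re.MULTILINE,
)


def parse_sections(objdump_text):
    """Return {section name: (vma, size)} from objdump's section table."""
    sections = {}
    for name, size, vma in SECTION_RE.findall(objdump_text):
        sections[name] = (int(vma, 16), int(size, 16))
    return sections


# Illustrative sample only, not real output.
SAMPLE = """
Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         006a370e  00401000  00401000  00001000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rdata        0005c000  00aa5000  00aa5000  006a5000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
"""

for name, (vma, size) in parse_sections(SAMPLE).items():
    print(f"{name}: start {vma:#x}, end {vma + size:#x}")
```

The start addresses are what go into the linker script; the end addresses become useful later when checking section sizes.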
We will also need to make some manual changes to the disassembly file. Firstly, you will probably find that some bytes in the disassembly have been output by retdec as 0x??. These bytes occur at the end of sections and should usually just be removed. Change .byte 0xAB,0xCD,0x?? to .byte 0xAB,0xCD etc. Make sure not to leave entries like .byte 0xAB,,0xCD or trailing commas, as the empty entry will be interpreted as an extra byte (three bytes in this example).
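This cleanup is easy to get wrong by hand, so here is a minimal sketch of how it could be scripted. It assumes every affected line has the `.byte ... #comment` shape produced by the transformation step; the function name strip_unknown_bytes is hypothetical. Note that it removes 0x?? wherever it appears, so it is worth checking first that the placeholders really do occur only at the ends of sections.

```python
def strip_unknown_bytes(line):
    """Drop 0x?? placeholders from a .byte line without leaving empty entries."""
    if not line.startswith(".byte "):
        return line
    # The byte list never contains '#', so the first '#' starts the comment.
    body, sep, comment = line[len(".byte "):].partition("#")
    kept = [tok.strip() for tok in body.split(",")
            if tok.strip() not in ("", "0x??")]
    if not kept:
        # Nothing left to emit; keep only the comment, if there was one.
        return "#" + comment if sep else ""
    result = ".byte " + ",".join(kept)
    return result + " #" + comment if sep else result


print(strip_unknown_bytes(".byte 0xAB,0xCD,0x??"))
```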
At this point the disassembly should now compile and link, but there are several issues remaining:
Firstly, in order for our strategy to work, the offsets of all functions and data must align exactly. Rarely occurring bugs in the retdec disassembler may cause the occasional machine code byte not to be emitted, causing all functions after that point to be misaligned. Fortunately after fixing the static library recognition issue, for me, this only occurred for two bytes across two functions in the entire 7.5MiB executable.
Secondly, there are some minor tweaks needed for the BSS section, discussed below.
Thirdly, creating a compatible executable doesn't just require us to create compatible sections. We also need to recreate the import tables used by the original binary. Fortunately, the import table and import address table used by PE files will all be preserved in the data sections, but we do need to tell the linker to use them so that it sets the address of them in the PE file headers.
We'll deal with the first issue first.
Finding the missing bytes
After assembling and linking for the first time, examine your new executable by running i686-w64-mingw32-objdump -x on it and compare the output to that of running the same command on the original executable. You want everything to match as much as possible, particularly section load addresses and section sizes.
If your load addresses are off, check the values you used in the linker script.
If one of your sections (text/rdata/data/etc.) has a size smaller than that of the original file, you have a serious problem, probably caused by the retdec bug described above. (If a section has a very slightly larger size, this is PROBABLY caused by GNU ld using slightly more conservative padding of sections than the linker used to link the original binary did. This should be harmless and can be ignored, assuming another section doesn't start in virtual memory immediately after that section. That, or extraneous bytes have somehow gotten emitted somewhere (retdec bug?), which is far more serious.) The nature of this bug appears to be that if retdec can't disassemble an instruction, it might simply not emit the machine code for that instruction. retdec tries to automatically distinguish between code and data, and output hex dumps for data instead of trying to disassemble it, but in rare circumstances it might mistake unused data or padding between functions in .text for code. This only occurred for two functions in the entire disassembly file in my case.
However, you do need to identify the location of the issue. Here is how I went about it. In the disassembly file, search for .section .text, which marks the start of the text section. Immediately after this line place the following:
.globl TEXT_begin
TEXT_begin:
Then search for the next .section directive, which will probably be something like .section .rdata. Immediately before this, place the following:
.globl TEXT_end
TEXT_end:
Run make, and examine your new executable by running i686-w64-mingw32-nm on it. The addresses of TEXT_begin and TEXT_end tell you where the text in your disassembly file truly begins and ends. The reason why this is more reliable than the output of i686-w64-mingw32-objdump -x is that the section sizes can be rounded up very slightly (by a couple of bytes) for alignment purposes, whereas this will give you accurate output. Run i686-w64-mingw32-objdump -x on the original binary and add the load address of the section to its size to determine where the section ends in memory in the original binary.
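The comparison described above can be sketched in a few lines of Python. This assumes nm's default three-column output (address, symbol type, name); the function names are hypothetical, and the sample nm text and the section size are illustrative values picked to be consistent with the addresses in this article, not real tool output.

```python
def parse_nm(nm_text):
    """Return {symbol name: address} from nm's default output."""
    symbols = {}
    for line in nm_text.splitlines():
        parts = line.split()
        # Defined symbols have three columns; undefined ones lack an address.
        if len(parts) == 3:
            addr, _kind, name = parts
            symbols[name] = int(addr, 16)
    return symbols


def text_deficit(symbols, original_vma, original_size):
    """How many bytes short our text is compared with the original section."""
    expected_end = original_vma + original_size
    return expected_end - symbols["TEXT_end"]


# Illustrative sample only.
NM_SAMPLE = """\
00401000 T TEXT_begin
00aa470c T TEXT_end
"""

print(text_deficit(parse_nm(NM_SAMPLE), 0x401000, 0x6A370E))
```

A result of zero means the text matches the original; a positive number is the count of bytes still to be tracked down.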
Binary search. You now have the true beginning and end of your text section, which is presumably short a few bytes. To locate the missing bytes causing the text to be short compared to the original binary's text by a few bytes, I applied a simple binary search technique, calculated by hand:
1. Search the disassembly file for the TEXT_begin address. Every line contains the virtual memory address it should appear at, so this locates the first byte in the text section. This should be right at the top of the file, immediately after .section. Note the line number of this line.
2. Find the line corresponding to the TEXT_end address in the same way. Note this line number.
3. Find the midpoint between the two lines by calculating beginning line number + (end line number - beginning line number)/2.
4. At the given line, insert the following:
.globl TEXT_midpoint01
TEXT_midpoint01:
5. Run make and then run i686-w64-mingw32-nm on your new executable to get the symbols. This tells you what the linker thinks the address of the new symbol TEXT_midpoint01 is. If this address is correct (meaning that it corresponds with the address printed on the line immediately after that label in the disassembly file), then the missing byte must be after that point. If the address is a few bytes short, then the missing byte must be before it.
6. Continue the binary search by going back to step 3 and repeating with the new narrower search window, naming the new symbol TEXT_midpoint02 etc. Eventually (after 17 iterations in my case) you will narrow it down to just an instruction or two.
I kept track of this process using a text editor as “pen and paper”. Here's what it looked like by the end:
14 TEXT_begin expect 0x401000 good
924213 TEXT_m01 expect 0x75f5a8 good
1386327 TEXT_m02 expect 0x92ee6e good
1617384 TEXT_m03 expect 0xa04321 good
1675148 TEXT_m05 expect 0xa30f0c good
1689589 TEXT_m07 expect 0xa3b201 good
1693199 TEXT_m09 expect 0xa3d6ba good
1695004 TEXT_m10 expect 0xa3e925 good
1695906 TEXT_m11 expect 0xa3f309 good
1696357 TEXT_m12 expect 0xa3f7c5 good
1696470 TEXT_m14 expect 0xa3f912 good
1696526 TEXT_m15 expect 0xa3f9a1 good
1696554 TEXT_m16 expect 0xa3fa0a good
1696568 TEXT_m17 expect
culprit: 0xa3fa36
1696583 TEXT_m13 expect 0xa3fa68 bad
1696809 TEXT_m08 expect 0xa3fcd8 bad
1704030 TEXT_m06 expect 0xa4493a bad
1732912 TEXT_m04 expect 0xa57293 bad
1848441 TEXT_end expect 0xaa470e bad
Some notes: By adding the label lines to the disassembly, you change the line numbers of all statements following, making the results a little off, but this is too small to pose an issue for the above purposes. You could use the addresses on each line instead if you liked.
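The search loop itself can be sketched as follows. I did it by hand, so this is an illustration of the procedure rather than a tool I used. The callback address_is_correct is hypothetical: in practice it would insert a label at the given line, run make and nm, and compare the reported address with the one in the line's comment. Here it is simulated with a made-up culprit line so that the sketch runs on its own.

```python
def find_first_bad(lo, hi, address_is_correct):
    """Narrow down to adjacent lines (last_good, first_bad).

    lo must be a line whose label address is known to be correct and hi a
    line whose label address is known to be wrong.
    """
    steps = 0
    while hi - lo > 1:
        mid = lo + (hi - lo) // 2
        if address_is_correct(mid):
            lo = mid   # missing byte is after mid
        else:
            hi = mid   # missing byte is at or before mid
        steps += 1
    return lo, hi, steps


# Simulation only: pretend every label from line 1696570 onwards is misplaced.
CULPRIT_LINE = 1696570
last_good, first_bad, steps = find_first_bad(
    14, 1848441, lambda line: line < CULPRIT_LINE
)
print(last_good, first_bad, steps)
```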
In my case, the culprit looked like this:
# function: function_a3fa36 at 0xa3fa36 -- 0xa3fa37
# data inside code section at 0xa3fa37 -- 0xa3fa38
The ranges above are [inclusive,exclusive). Note that a function is listed comprising exactly one byte, yet no byte is emitted, and then the disassembler changes its mind and decides to do a hexdump instead... but never emits the single byte of the “function”.
I fixed this by inserting a .byte 0xF4 directive, which corresponds to the x86 HLT (halt) instruction, which will cause the program to crash. My assumption is that this is dead code or not actually a function. An identical one-byte deficit occurred just two lines later, which I fixed in the same way. These were responsible for the two byte deficit I was seeing in the size of my .text section. These were the only serious issues in the entire 180MiB disassembly.
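A different approach from the binary search I actually used, which I have not tried on the real file, would be to exploit the fact that every .byte line carries its own address in the trailing comment: each line's address plus its byte count should equal the next line's address. The following minimal sketch assumes the `.byte ... #addr;` format produced by the transformation step; the function name find_gaps is hypothetical and the sample lines are invented for illustration. It would also report section boundaries and any lines that were split by hand while keeping the same address comment, so its output needs reading with care.

```python
import re

BYTE_LINE_RE = re.compile(r"^\.byte\s+([^#]*?)\s*#([0-9a-f]+);", re.IGNORECASE)


def find_gaps(lines):
    """Yield (line_number, expected_addr, actual_addr) where addresses do not chain."""
    expected = None
    for number, line in enumerate(lines, start=1):
        match = BYTE_LINE_RE.match(line)
        if match is None:
            continue  # comments, labels and directives carry no bytes
        count = len([b for b in match.group(1).split(",") if b.strip()])
        addr = int(match.group(2), 16)
        if expected is not None and addr != expected:
            yield number, expected, addr
        expected = addr + count


# Invented sample lines resembling the culprit described above.
SAMPLE_LINES = [
    ".byte 0x55 #a3fa33; push ebp",
    ".byte 0x8b,0xec #a3fa34; mov ebp, esp",
    "# function: function_a3fa36 at 0xa3fa36 -- 0xa3fa37",
    ".byte 0xcc #a3fa37;|.|",
]

for number, expected, actual in find_gaps(SAMPLE_LINES):
    print(f"line {number}: expected {expected:#x}, found {actual:#x}")
```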
BSS issues
The executable I was working with dated from 2001 and was linked with MSVC6. I don't fully understand how its BSS region works, because it doesn't have a separate BSS section and instead seems to rely on some obscure use of PE file headers.
In any case, retdec disassembled the BSS section as explicit zero bytes and didn't mark it as a separate section, so I had to fix this up manually. Use i686-w64-mingw32-objdump -x as necessary to find the start of BSS (you should have already put it in your linker script), and locate the correct virtual address in the disassembly file. It will look something like this:
.byte 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 #b74ff9;|................|
You will need to add a .section .bss directive. Note that the addresses are not divisible by 16 but instead offset; this is because retdec likes to decode ASCII strings as quoted strings rather than hex dumps, meaning that following hex data is offset. When this happens I split the line in a visually appealing way like this:
.byte 0x00,0x00,0x00,0x00,0x00,0x00,0x00 #b74ff9;|................|
.section .bss
.byte 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 #b74ff9;|................|
Make sure you get the directive at exactly the right address. As usual, avoid accidentally leaving any trailing commas, as a trailing comma will be interpreted as an additional byte even if not followed by any digits.
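Splitting a line at the right byte is a matter of subtracting the line's address from the address where the split should fall. A minimal sketch of that arithmetic follows, assuming the `.byte ... #addr;comment` line format used throughout. The function name split_byte_line is hypothetical, and the split address 0xb75000 is inferred from the 7/9 split in the example above rather than stated anywhere. Like the manual edit, it copies the original comment onto both halves.

```python
def split_byte_line(line, split_addr):
    """Split a '.byte ... #addr;comment' line so the second half starts at split_addr."""
    body, _, comment = line.partition("#")
    values = [v.strip() for v in body[len(".byte"):].split(",") if v.strip()]
    start_addr = int(comment.split(";", 1)[0], 16)
    offset = split_addr - start_addr
    if not 0 < offset < len(values):
        raise ValueError("split address does not fall inside this line")
    first = ".byte " + ",".join(values[:offset]) + " #" + comment
    second = ".byte " + ",".join(values[offset:]) + " #" + comment
    return first, second


LINE = ".byte " + ",".join(["0x00"] * 16) + " #b74ff9;|................|"
first, second = split_byte_line(LINE, 0xB75000)
print(first)
print(".section .bss")
print(second)
```

The same helper applies to the import table labels in the next section, where lines also have to be split at a specific address.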
Import tables
Since our disassembly is a dump of all section contents, and import tables live in sections, our disassembly fortunately contains a copy of the original executable's import and import address tables. In order to use these, we just need to tell GNU ld to use them. Import tables are located by a PE loader via fields in the PE file headers, so we need to tell GNU ld to set these fields to the virtual addresses of our import and import address tables. Fortunately, after grepping through binutils source code, I was able to determine that GNU ld has a way to do this: certain symbols with specific names are taken as designating import and import address table begin and end addresses. If GNU ld sees these symbols present, it will use their values to set the import table addresses in the PE header.
Firstly, we will need to add symbols in our disassembly file locating the beginnings and ends of both our import and import address tables. You can determine the addresses of these using i686-w64-mingw32-objdump -x on the original binary. Again, the hex offset means we have to split lines here.
.byte 0x00,0xff,0xff,0xff,0xff,0xfc,0x46,0xaa,0x00 #b00f47;|......F....p....|
.globl my_import_table_start
my_import_table_start:
.byte 0xf4,0x17,0x70,0x00,0x00,0x00,0x00 #b00f47;|......F....p....|
[omitted]
.byte 0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00 #b01097;|.............z4p|
.globl my_import_table_end
my_import_table_end:
.byte 0x7a,0x34,0x70 #b01097;|.............z4p|
The same must be done for the import address table:
.globl my_iat_start
my_iat_start:
.byte 0x7a,0x34,0x70,0x00,0x68,0x34,0x70,0x00,0x88,0x34,0x70,0x00,0x10,0x34,0x70,0x00 #aa5000;|z4p.h4p..4p..4p.|
[omitted]
.byte 0x14,0x35,0x70,0x00,0x24,0x35,0x70,0x00,0x00,0x00,0x00,0x00
.globl my_iat_end
my_iat_end:
.byte 0x00,0x00,0x00,0x00 #aa57a0;|.5p.$5p.........|
Now we must amend our linker script to generate the necessary magic symbols which GNU ld uses. Amend the .rdata section defined in the linker script as follows:
/* RDATA */
. = 0xRDATA_START;
.rdata . : {
    *(.rdata)
    .idata$2 = my_import_table_start - 0x400000;
    .idata$4 = my_import_table_end - 0x400000;
    .idata$5 = my_iat_start - 0x400000;
    .idata$6 = my_iat_end - 0x400000;
}
Note that the cryptic symbols .idata$2, .idata$4, .idata$5 and .idata$6 specify the import table start address, import table end address, IAT start address and IAT end address respectively. (Search for these in the binutils source code if you're curious as to how they work.) Note also that above I had to offset these relative to the image base, which on Windows is usually 0x400000 and in any case can be ascertained from objdump.
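As a worked example of the arithmetic, the label addresses can be derived from the split lines shown earlier: the address in a line's comment plus the number of bytes placed before the label. The sketch below does this for the import table, assuming an image base of 0x400000 as in my case; the helper names are hypothetical. As a sanity check, it uses the fact that each PE import descriptor is 20 bytes, so the table size should be a multiple of 20.

```python
IMAGE_BASE = 0x400000            # taken from objdump; 0x400000 in my case
IMPORT_DESCRIPTOR_SIZE = 20      # size of one PE import descriptor in bytes


def label_address(line_addr, bytes_before_label):
    """Virtual address of a label placed partway through a dump line."""
    return line_addr + bytes_before_label


def rva(virtual_address, image_base=IMAGE_BASE):
    """Convert a virtual address to an address relative to the image base."""
    return virtual_address - image_base


# 9 bytes precede my_import_table_start on the line at 0xb00f47,
# and 13 bytes precede my_import_table_end on the line at 0xb01097.
import_start = label_address(0xB00F47, 9)
import_end = label_address(0xB01097, 13)

print(f"start VA {import_start:#x}, RVA {rva(import_start):#x}")
print(f"end VA {import_end:#x}, RVA {rva(import_end):#x}")
print(f"size {import_end - import_start} bytes")
```

Here the size works out to 340 bytes, or 17 descriptors including the terminating all-zero one, which is a good sign that the labels are in the right place.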
We also defined a .rsrc section earlier. In much the same vein as for the IAT, if GNU ld sees a section named .rsrc, it automatically sets the PE header to point to it as the executable's resources, so we don't need to do anything more to have the resource section work.
Linker assertions
This is probably the most important advice so far: Add assertions to your linker script to ensure that your offsets are correct and don't slide.
/* TEXT */
. = 0xTEXT_START;
.text . : {
    *(.text)
}
ASSERT(TEXT_end == 0xCORRECT_TEXT_END, "text end (object)")
ASSERT(ADDR(.text) + SIZEOF(.text) == 0xPADDED_TEXT_END, "text end (padded)")
ASSERT(. == 0xPADDED_TEXT_END, "position")
You should add similar ASSERT lines after each section (text, rdata, data, etc.). This will generate a linker error if you ever change anything in your disassembly that causes some function to grow or shrink, which would otherwise have catastrophic consequences.
Note that TEXT_end is a symbol we defined inside the disassembly file earlier to get the true, linker-unpadded text size. The second and third ASSERT lines aren't as important, especially if you don't have one section in the original binary starting immediately after another, in which case the linker adding a few bytes of padding is likely of no consequence.
You can determine 0xPADDED_TEXT_END by examining the output of i686-w64-mingw32-objdump -x: take the VMA and size fields shown for the section, check that the size is sane, and add the two.
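Since the same three lines are needed for every section, they can be generated rather than typed. The following is a minimal sketch; the function name section_asserts is hypothetical, as is the naming convention of an upper-case NAME_end symbol for each section (only TEXT_end was actually defined above), and the padded end address in the example is illustrative.

```python
def section_asserts(name, true_end, padded_end):
    """Build the three linker ASSERT lines for one section.

    name is the section name without the leading dot, e.g. "text".
    """
    symbol = name.upper() + "_end"
    return [
        f'ASSERT({symbol} == {true_end:#x}, "{name} end (object)")',
        f'ASSERT(ADDR(.{name}) + SIZEOF(.{name}) == {padded_end:#x}, '
        f'"{name} end (padded)")',
        f'ASSERT(. == {padded_end:#x}, "position")',
    ]


# Illustrative values: true end from the label, padded end from VMA + size.
for line in section_asserts("text", 0xAA470E, 0xAA4710):
    print(line)
```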
Eyeballing objdump
Now run make once again and compare the output of i686-w64-mingw32-objdump -x on the new executable and the original executable. Eyeball the output for differences, which should be minimised. Exact correspondence is unlikely to occur and in many circumstances will probably be impossible, but all consequential differences should be eliminable. Slight differences in padding or the file offset at which section data is placed should be inconsequential in the vast majority of cases (unless an executable is doing something weird and trying to examine its own binary; self-extractors perhaps).
The important things which must match are certain data directory descriptors, particularly the import and import address table sizes and addresses; section load addresses; and section sizes, the sanity-checking of which has already been discussed; and the entrypoint address.
Although probably not strictly necessary, at this point I amended the Makefile to make the new executable slightly more similar to the original one:
foo.exe: foo.o foo.ld
	i686-w64-mingw32-gcc -o "$@" -nostdlib -T foo.ld -mwindows -Wl,--file-alignment -Wl,0x1000 -Wl,--major-image-version -Wl,0 "$<"
The -Wl,--file-alignment -Wl,0x1000 option uses a higher file alignment than that used by GNU ld by default; it seems MSVC6 preferred a higher alignment of sections in files, even though Windows does not require this. Setting this should be unnecessary but makes the objdump output more similar.
The -Wl,--major-image-version -Wl,0 option sets a versioning field in the PE header. Nothing cares about this, but it's one less difference.
The moment of truth
After doing all this, I ran make one more time and checked that the output of objdump looked perfect, or at least as perfect as was obtainable without modifying GNU ld; convincing ld not to pad sections by two or three bytes probably wasn't happening, but also doesn't matter.
Finally, I ran the binary under wine. And, amazingly, it worked perfectly the first time, just like the original.
Caveats
This technique was performed on an old, non-relocatable executable compiled in 2001 with MSVC6. I don't see why this technique wouldn't also be applicable to modern, relocatable, ASLR-enabled executables, but I haven't thought about any issues those might pose while writing this. YMMV.