Embedding of binary data into programs
Methods of embedding binary blobs of any nature into a program using native toolchains.
bin2h-style tools
These tools turn an arbitrary file into a C source file that looks something like:
uint8_t filename[] = {0x42, 0x42, ... };
There are a lot of these tools around and they're trivial to make:
- xxd (included with vim; can also reverse this)
- ImageMagick: convert file.bin file.h
- Search for bin2h or bin2c.
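To illustrate how little such a tool needs, here is a minimal sketch of the conversion step. The function name `bin2h`, the 12-bytes-per-line layout and the extra `NAME_len` variable are choices made for this example, not the behaviour of any particular existing tool.

```c
#include <stdio.h>

/* Hypothetical helper: writes the contents of `in` to `out` as a C array
 * called `name`, followed by a `<name>_len` variable holding the length.
 * Returns the number of bytes converted, or -1 on a read error.
 * Note: an empty input yields an empty initializer, which ISO C before C23
 * rejects; a real tool would special-case that. */
long bin2h(FILE *in, FILE *out, const char *name)
{
    long count = 0;
    int c;

    fprintf(out, "const unsigned char %s[] = {", name);
    while ((c = fgetc(in)) != EOF) {
        /* Start a new indented line every 12 bytes, otherwise separate with a space. */
        fprintf(out, "%s0x%02x,", (count % 12 == 0) ? "\n    " : " ", (unsigned)c);
        count++;
    }
    if (ferror(in))
        return -1;

    /* A trailing comma inside the initializer is valid C. */
    fprintf(out, "\n};\nconst unsigned long %s_len = %ld;\n", name, count);
    return count;
}
```

A command-line wrapper would open the input with `fopen(path, "rb")` and pass `stdout` as `out`.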
bin2obj-style tools
These skip the C source stage by producing an object file directly. The disadvantage is that the tool is then platform-dependent.
- objcopy; *NIX/mingw.
  objcopy -I binary -O elf32-i386 -B i386 file.bin file.o
- GNU ld; *NIX/mingw.
  ld -r -b binary -o file.o file.bin
  ld has the advantage that it doesn't require you to explicitly specify the desired object format and architecture.
For Windows, look for tools named bin2coff or bin2obj. Again, there appear to be a lot of tools going by these names. A cursory examination shows that tools generating both COFF and OMF object files are available.
Creating an object file with a single symbol isn't too difficult, so you can also write such a tool yourself.
Various considerations:
- The symbols exported by such tools may vary. GNU objcopy and GNU ld export these symbols:
  _binary_FILENAME_start
  _binary_FILENAME_end
  _binary_FILENAME_size
  Invalid characters such as . in the input filename are converted to _. GNU objcopy does not appear to have any way to override the symbol names used, so the input filename and the desired symbol name must match.
- Pay attention to which section the data is placed in. For objcopy, you can customize this with
  --rename-section .data=.rodata,alloc,load,readonly,data,contents
- You might have to use extern "C" for the declarations if you're using C++. Some examples I've seen use asm("") to override the symbol names, like so:
  extern uint8_t _binary_FILENAME_start[] asm("_binary_FILENAME_start");
- Important: These symbols represent the start and end addresses of the data. They are not pointers. You should declare them with something like:
  extern uint8_t _binary_FILENAME_start[];
  extern uint8_t _binary_FILENAME_end;
  extern uint8_t _binary_FILENAME_size;
  The data then starts at _binary_FILENAME_start and ends just before &_binary_FILENAME_end.
  This is a curious case where declaring an extern void variable and taking its address would make sense. This is invalid in ANSI C, though you can do it in GCC. Doing so generates an annoying warning which can't be disabled. It is probably best just to use uint8_t and cast to void*.
  The _size symbol is particularly bizarre, since it's a length exposed as a symbol's address, not a variable. Access it with (size_t)&_binary_FILENAME_size.
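Since the symbol name has to be predicted from the input filename, it can help to compute it rather than guess. The sketch below assumes that every character that is not a letter or digit is replaced with `_`, and that the filename is used exactly as passed on the command line (directory components included). Both are assumptions about the GNU tools rather than something stated above, and `binary_symbol_name` is a hypothetical helper name.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: builds "_binary_<filename>_<suffix>" in `dst`, where
 * `suffix` is "start", "end" or "size", then replaces every character that
 * is not alphanumeric with '_' (assumed mangling rule).
 * Returns 0 on success, -1 if `dst` is too small. */
int binary_symbol_name(char *dst, size_t dst_size,
                       const char *filename, const char *suffix)
{
    int n = snprintf(dst, dst_size, "_binary_%s_%s", filename, suffix);
    if (n < 0 || (size_t)n >= dst_size)
        return -1;

    for (char *p = dst; *p != '\0'; p++) {
        if (!isalnum((unsigned char)*p))
            *p = '_';
    }
    return 0;
}
```

Under these assumptions, an input passed as assets/logo.png would need declarations named `_binary_assets_logo_png_start` and so on, which is one reason to run the tool from the directory containing the file.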
Via Windows Resource Files
You can actually embed files via Windows .rc files. A caveat of this is that the data gets put in the resource table, not one of the main sections. On the other hand, this means you can use Windows's resource lookup functions (FindResource, LoadResource, SizeofResource, LockResource).
Another advantage is that you may be able to change the binary files out for others after compilation using resource editing tools. The key term to search for is “user-defined resource”.
#define BINARY_FILE 256
#define RES_SOME_FILE 123
RES_SOME_FILE BINARY_FILE "filename.bin"
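On the C side, the four lookup functions are typically chained as in the sketch below. It assumes the resource lives in the executable itself (hence the NULL module handle) and reuses the two IDs from the .rc snippet. `load_embedded` is a hypothetical helper name, and the non-Windows branch is only a stub so that the file compiles elsewhere.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

#define BINARY_FILE   256  /* user-defined resource type, as in the .rc file */
#define RES_SOME_FILE 123  /* resource ID, as in the .rc file */

/* Hypothetical helper: returns a read-only pointer to the embedded data and
 * stores its length in *size. Returns NULL (with *size == 0) on failure. */
const void *load_embedded(size_t *size)
{
    *size = 0;

    /* NULL means "the module that created the current process". */
    HRSRC res = FindResource(NULL, MAKEINTRESOURCE(RES_SOME_FILE),
                             MAKEINTRESOURCE(BINARY_FILE));
    if (res == NULL)
        return NULL;

    HGLOBAL handle = LoadResource(NULL, res);
    if (handle == NULL)
        return NULL;

    *size = SizeofResource(NULL, res);
    return LockResource(handle);  /* pointer to the first byte of the data */
}
#else
/* Stub for other platforms: there is no resource table to search. */
const void *load_embedded(size_t *size)
{
    *size = 0;
    return NULL;
}
#endif
```

The returned memory belongs to the loaded image, so it should be treated as read-only and never freed.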
Via Assembly
Many assemblers support an 'include file' directive. nasm and yasm support the incbin directive; GNU as supports the .incbin directive. The arguments to these directives appear to be identical, and in all cases take a filename and an optional offset and length.
    .section .rodata
    /* The embedded data itself. */
    .global mydata
    .type mydata, @object
    .align 4
mydata:
    .incbin "data.bin"
mydata_end:                     /* marks the end of the data, before any alignment padding */
    /* The length of the data, stored as an integer variable. */
    .global mydata_size
    .type mydata_size, @object
    .align 4
mydata_size:
    .int mydata_end - mydata
Via Inline Assembly
This is quite insane. It can be found in ipxe.
extern uint8_t _binary_FILENAME_start[];
extern uint8_t _binary_FILENAME_end;
__asm__(
    ".section \".rodata\", \"a\", @progbits\n"
    "_binary_FILENAME_start:\n"
    ".incbin \"" PATH "\"\n"
    "_binary_FILENAME_end:\n"
    ".previous\n"
);