Embedding of binary data into programs
Methods of embedding binary blobs of any nature into a program using native toolchains.
- bin2h-style tools
  - These tools turn an arbitrary file into a C source file that looks something like:
      uint8_t filename[] = {0x42, 0x42, ... };
  - There are a lot of these tools around and they're trivial to make:
    - xxd -i (xxd is included with vim; with -r it can also turn a hex dump back into binary)
    - ImageMagick: convert file.bin file.h
    - Search for bin2h or bin2c.
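To show how little such a tool needs, here is a minimal sketch of the core of a bin2h-style converter. The function name bin2h, the NAME_len companion variable, and the twelve-bytes-per-line layout are choices made for this illustration, not the behavior of any particular existing tool.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write the bytes read from `in` to `out` as a C array definition.
 * `name` is assumed to already be a valid C identifier.
 * Returns the number of bytes converted, or -1 on an I/O error.
 * Note: an empty input produces an empty initializer, which is not
 * valid ISO C before C23.
 */
long bin2h(FILE *in, FILE *out, const char *name)
{
    long count = 0;
    int c;

    assert(in != NULL && out != NULL && name != NULL);

    fprintf(out, "#include <stdint.h>\n\nconst uint8_t %s[] = {", name);
    while ((c = fgetc(in)) != EOF) {
        /* Start a new indented line every twelve bytes. */
        fprintf(out, "%s0x%02x,",
                (count % 12 == 0) ? "\n    " : " ", (unsigned)c);
        count++;
    }
    /* Hypothetical companion variable holding the array length. */
    fprintf(out, "\n};\nconst unsigned long %s_len = %ldUL;\n", name, count);

    return (ferror(in) || ferror(out)) ? -1 : count;
}
```

A command-line wrapper would open the input with fopen(path, "rb") and pass stdout as the output stream.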
 
- bin2obj-style tools
  - These skip the C source stage by producing an object file directly. The disadvantage is that the tool is then platform-dependent.
  - objcopy; *NIX/mingw.
      objcopy -I binary -O elf32-i386 -B i386 file.bin file.o
  - GNU ld; *NIX/mingw.
      ld -r -b binary -o file.o file.bin
    ld has the advantage that it doesn't require you to explicitly specify the desired object format and architecture.
  - For Windows, look for tools named bin2coff or bin2obj. Again, there appear to be a lot of tools going by these names. A cursory examination shows that tools generating both COFF and OMF object files are available.
  - Creating an object file with a single symbol isn't too difficult, so you can also write such a tool yourself.
  - Various considerations:
    - The symbols exported by such tools may vary. GNU objcopy and GNU ld export these symbols:
        _binary_FILENAME_start
        _binary_FILENAME_end
        _binary_FILENAME_size
    - Invalid characters such as . in the input filename are converted to _. GNU's objcopy does not appear to have any way to choose the symbol names when it generates them, so the input filename and the desired symbol name must match, unless you rename the symbols afterwards (objcopy's --redefine-sym old=new option should do this).
    - Pay attention to which section the data is placed in. For objcopy, you can customize this with:
        --rename-section .data=.rodata,alloc,load,readonly,data,contents
    - You might have to use extern "C" for the declarations if you're using C++. Some examples I've seen use asm("") to override the symbol names like so:
        extern uint8_t _binary_FILENAME_start[] asm("_binary_FILENAME_start");
    - Important: These symbols represent the start and end addresses of the data. They are not pointers. You should access them with something like:
        extern uint8_t _binary_FILENAME_start[];
        extern uint8_t _binary_FILENAME_end;
        extern uint8_t _binary_FILENAME_size;
      This is a curious case where declaring an extern void variable and taking its address would make sense. This is invalid in ANSI C, though you can do it in GCC. Doing so generates an annoying warning which can't be disabled. It's probably best just to use uint8_t and cast to void*.
      The _size symbol is particularly bizarre, since it's a length exposed as a symbol's address, not a variable. Access it with (size_t)&_binary_FILENAME_size. Computing the length as &_binary_FILENAME_end - _binary_FILENAME_start avoids the _size symbol altogether.
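The naming and access rules above can be sketched in C. Both helpers below, blob_symbol_name and blob_length, are hypothetical names invented for this illustration. The sketch assumes that every non-alphanumeric character of the input path, as it was typed on the command line, becomes an underscore; check the actual names with nm if in doubt.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build "_binary_<filename>_<suffix>" in buf, replacing every character
 * that is not alphanumeric with '_' (assumed to mirror what GNU objcopy
 * and GNU ld do). Returns 0 on success, -1 if buf is too small.
 */
int blob_symbol_name(char *buf, size_t buflen,
                     const char *filename, const char *suffix)
{
    int n = snprintf(buf, buflen, "_binary_%s_%s", filename, suffix);

    if (n < 0 || (size_t)n >= buflen)
        return -1;
    for (char *p = buf; *p != '\0'; p++)
        if (!isalnum((unsigned char)*p))
            *p = '_';
    return 0;
}

/* Length of a blob given its start and end addresses. */
size_t blob_length(const uint8_t *start, const uint8_t *end)
{
    assert(end >= start);
    return (size_t)(end - start);
}

/*
 * Hypothetical use with an object generated from "file.bin":
 *
 *   extern uint8_t _binary_file_bin_start[];
 *   extern uint8_t _binary_file_bin_end[];
 *   size_t n = blob_length(_binary_file_bin_start, _binary_file_bin_end);
 */
```

Declaring both symbols as arrays, as in the comment, means neither needs an explicit & when passed around.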
 
- Via Windows Resource Files
  - You can actually embed files via Windows .rc files. A caveat of this is that the data gets put in the resource table, not one of the main sections. On the other hand, this means you can use Windows's resource lookup functions (FindResource, LoadResource, SizeofResource, LockResource).
  - Another advantage is that you may be able to swap the binary files for others after compilation using resource editing tools. The key term to search for is “user-defined resource”.
      #define BINARY_FILE 256
      #define RES_SOME_FILE 123
      RES_SOME_FILE BINARY_FILE "filename.bin"
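Reading such a resource back at run time could look roughly like the sketch below. It uses the four lookup functions named above with the IDs from the .rc snippet. The wrapper load_embedded_resource is a hypothetical name, and the non-Windows branch is only a stub so that the file compiles elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* IDs matching the .rc snippet. */
#define BINARY_FILE   256
#define RES_SOME_FILE 123

/*
 * Return a pointer to the resource data embedded in the running
 * executable and store its size in *size. Returns NULL (and size 0)
 * if the resource is missing or the platform is not Windows.
 */
const void *load_embedded_resource(int name_id, int type_id, size_t *size)
{
    assert(size != NULL);
    *size = 0;
#ifdef _WIN32
    /* A NULL module handle means "the current executable". */
    HRSRC info = FindResource(NULL, MAKEINTRESOURCE(name_id),
                              MAKEINTRESOURCE(type_id));
    if (info == NULL)
        return NULL;
    HGLOBAL handle = LoadResource(NULL, info);
    if (handle == NULL)
        return NULL;
    const void *data = LockResource(handle);
    if (data != NULL)
        *size = (size_t)SizeofResource(NULL, info);
    return data;
#else
    (void)name_id;
    (void)type_id;
    return NULL;
#endif
}
```

A caller would use load_embedded_resource(RES_SOME_FILE, BINARY_FILE, &size). The returned memory belongs to the loaded module, so it must not be freed or written to.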
- Via Assembly
  - Many assemblers support an 'include file' directive. nasm and yasm support the incbin directive; GNU as supports the .incbin directive. The arguments to these directives appear to be identical, and in all cases take a filename and an optional offset and length.
      .section .rodata
      .global mydata
      .type mydata, @object
      .align 4
      mydata:
      .incbin "data.bin"
      mydata_end:
      .global mydata_size
      .type mydata_size, @object
      .align 4
      mydata_size:
      .int mydata_end - mydata
    The mydata_end label sits directly after the data so that the stored size does not include the alignment padding in front of mydata_size.
- Via Inline Assembly
  - This is quite insane. It can be found in ipxe. PATH is a macro that expands to a string literal containing the file's path.
      extern uint8_t _binary_FILENAME_start[];
      extern uint8_t _binary_FILENAME_end;
      __asm__(
          ".section \".rodata\", \"a\", @progbits\n"
          "_binary_FILENAME_start:\n"
          ".incbin \"" PATH "\"\n"
          "_binary_FILENAME_end:\n"
          ".previous\n"
      );