Embedding of binary data into programs

Methods of embedding binary blobs of any nature into a program using native toolchains.

  • bin2h-style tools

    These tools take an arbitrary file and produce a C file that looks something like:

    uint8_t filename[] = {0x42, 0x42, ... };

    There are a lot of these tools around and they're trivial to make:

    • xxd -i file.bin (xxd ships with Vim; xxd -r can also turn a hex dump back into binary)
    • ImageMagick convert file.bin file.h
    • Search for bin2h or bin2c.
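To show how little such a tool needs, here is a minimal sketch of a bin2h-style converter in C. The function name bin2h and the exact output layout (an unsigned char array plus a NAME_len variable) are choices made for this example, not the format of any particular existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Minimal bin2h sketch (hypothetical helper, not any particular tool):
 * writes the bytes of `in` to `out` as
 *     const unsigned char NAME[] = { 0x.., ... };
 *     const size_t NAME_len = N;
 * Returns the number of bytes converted, or -1 on error. */
long bin2h(FILE *in, FILE *out, const char *name)
{
    static const char ident[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_";
    long count = 0;
    int c;

    assert(in != NULL && out != NULL && name != NULL);

    /* Reject names that are not valid C identifiers. */
    if (name[0] == '\0' || (name[0] >= '0' && name[0] <= '9') ||
        strspn(name, ident) != strlen(name))
        return -1;

    fprintf(out, "#include <stddef.h>\n\n");
    fprintf(out, "const unsigned char %s[] = {", name);
    while ((c = fgetc(in)) != EOF) {
        if (count % 12 == 0)            /* start a new line every 12 bytes */
            fputs("\n    ", out);
        fprintf(out, "0x%02x, ", (unsigned)c);
        count++;
    }
    if (ferror(in))
        return -1;

    /* Note: an empty input produces a zero-length array, which standard C
     * does not allow. */
    fprintf(out, "\n};\nconst size_t %s_len = %ld;\n", name, count);
    return count;
}
```

A main() that opens argv[1] in "rb" mode and passes stdout as out turns this into a usable command-line tool. Binary mode matters on Windows, where text mode would translate line endings in the input.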
  • bin2obj-style tools

    These skip the C source stage by producing an object file directly. The disadvantage is that the output is tied to a specific object format and architecture, and so, often, is the tool.

    • objcopy; *NIX/mingw.

        objcopy -I binary -O elf32-i386 -B i386 file.bin file.o
    • GNU ld; *NIX/mingw.

        ld -r -b binary -o file.o file.bin

      ld has the advantage that it doesn't require you to explicitly specify the desired object format and architecture.

    • For Windows, look for tools named bin2coff or bin2obj. Again, there appear to be many tools going by these names; a cursory look turns up tools that generate COFF as well as OMF object files.

    • Writing an object file that contains a single symbol isn't hard, so you can also build such a tool yourself.

    Various considerations:

    • The symbols exported by such tools may vary. GNU objcopy and GNU ld export these symbols:

      _binary_FILENAME_start
      _binary_FILENAME_end
      _binary_FILENAME_size
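    • Invalid characters such as . in the input filename are converted to _. The name is derived from the filename exactly as passed on the command line, so dir/file.bin yields _binary_dir_file_bin_start; running the tool from the file's own directory avoids this. GNU objcopy does not appear to have an option for choosing the names during conversion, but objcopy --redefine-sym old=new can rename the symbols afterwards; otherwise the input filename has to match the desired symbol name.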

    • Pay attention to which section the data is placed in. By default it goes into the writable .data section; with objcopy you can move it to a read-only section with

        --rename-section .data=.rodata,alloc,load,readonly,data,contents
    • In C++, the declarations need extern "C" so the names aren't mangled. Some examples I've seen instead use an asm label to give the exact symbol name, which sidesteps both mangling and any leading underscore the compiler would otherwise add:

      extern uint8_t _binary_FILENAME_start[] asm("_binary_FILENAME_start");
    • Important: these symbols are located at the start and end of the data; they are not pointers to it. Declare them with something like:

      extern uint8_t _binary_FILENAME_start[];
      extern uint8_t _binary_FILENAME_end;
      extern uint8_t _binary_FILENAME_size;

      This is a curious case where declaring an extern void variable and taking its address would make sense. That is invalid in ANSI C, though GCC accepts it, with an annoying warning that can't be disabled. It's probably best to use uint8_t and cast to void * where needed.

      The _size symbol is particularly bizarre: it isn't a variable at all, and the length is encoded as the symbol's address. Access it with (size_t)&_binary_FILENAME_size.
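A sketch of consuming these symbols from C, assuming the object was built from a file named data.bin with ld -r -b binary -o data.o data.bin, so FILENAME becomes data_bin. The accessor data_bin() is a hypothetical name. It computes the length as end minus start; one reason to prefer that over &_binary_data_bin_size is that in position-independent executables the absolute _size symbol may get relocated and come out wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Symbols generated by GNU ld/objcopy for an input file named "data.bin"
 * (the '.' becomes '_'). Declared as arrays: the symbol *is* the data,
 * not a pointer to it. In C++, wrap these in extern "C" { ... }. */
extern const uint8_t _binary_data_bin_start[];
extern const uint8_t _binary_data_bin_end[];

/* Hypothetical accessor: returns the embedded bytes and stores their
 * length in *len. The length comes from the end and start addresses
 * rather than from the absolute _size symbol. Converting to uintptr_t
 * avoids subtracting pointers to two distinct C objects. */
const void *data_bin(size_t *len)
{
    assert(len != NULL);
    *len = (size_t)((uintptr_t)_binary_data_bin_end -
                    (uintptr_t)_binary_data_bin_start);
    return _binary_data_bin_start;
}
```

Build with something like cc main.c data.o. Returning const void * follows the advice above of declaring the symbols as uint8_t and handing out a void pointer.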

  • Via Windows Resource Files

    You can also embed files via Windows .rc files. The caveat is that the data is stored in the resource section rather than one of the ordinary data sections, so there is no symbol to reference it by. On the other hand, you can locate it with the Windows resource functions (FindResource, LoadResource, SizeofResource, LockResource).

    Another advantage is that you may be able to swap the embedded files for others after compilation, using resource editing tools. The key term to search for is “user-defined resource”.

    #define BINARY_FILE 256
    #define RES_SOME_FILE 123
    RES_SOME_FILE  BINARY_FILE  "filename.bin"
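A sketch of reading the resource back at run time with the functions named above, assuming the .rc snippet above is compiled into the executable. In practice the two #defines would live in a header shared by the .rc file and the C code; they are repeated here to keep the sketch self-contained. The function name load_some_file is hypothetical.

```c
#include <windows.h>

/* Must match the IDs used in the .rc file. */
#define BINARY_FILE   256
#define RES_SOME_FILE 123

/* Returns a pointer to the embedded file and stores its size in *size,
 * or returns NULL if the resource cannot be found or loaded. The memory
 * belongs to the loaded module: treat it as read-only and don't free it. */
const void *load_some_file(DWORD *size)
{
    HMODULE module = GetModuleHandle(NULL);   /* the .exe itself */
    HRSRC res = FindResource(module, MAKEINTRESOURCE(RES_SOME_FILE),
                             MAKEINTRESOURCE(BINARY_FILE));
    if (res == NULL)
        return NULL;

    HGLOBAL handle = LoadResource(module, res);
    if (handle == NULL)
        return NULL;

    *size = SizeofResource(module, res);
    return LockResource(handle);
}
```

For a resource stored in a DLL, pass the DLL's HMODULE instead of GetModuleHandle(NULL).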
  • Via Assembly

    Many assemblers have a directive for including a binary file verbatim: NASM and YASM provide incbin, and GNU as provides .incbin. The arguments are effectively the same in both: a filename, then an optional number of bytes to skip and an optional number of bytes to include.

    .section .rodata
    .global mydata
    .type mydata, @object
    .align 4
    
    mydata:
    .incbin "data.bin"
    mydata_end:    /* placed before the alignment padding below */
    
    .global mydata_size
    .type mydata_size, @object
    .align 4
    mydata_size:
    .int mydata_end - mydata
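On the C side, the symbols exported by the assembly file above could be declared as follows. mydata_get() is a hypothetical helper, and the sketch assumes .int emits a 32-bit value, which is the case on x86 and most other common targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Symbols exported by the assembly file. Unlike objcopy's _size symbol,
 * mydata_size is an ordinary variable holding the length (assumed to be
 * 32 bits wide, matching the .int directive on common targets). */
extern const uint8_t mydata[];
extern const uint32_t mydata_size;

/* Hypothetical accessor returning the data and storing its length. */
const uint8_t *mydata_get(size_t *len)
{
    assert(len != NULL);
    *len = (size_t)mydata_size;
    return mydata;
}
```

Assemble the .s file alongside the C sources (for example cc main.c mydata.s) and the linker resolves both symbols.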

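    With GCC or Clang, the same .incbin directive can also be used from inline assembly in a C file. Here PATH is a macro that expands to the file's path as a string literal: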

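    #include <stdint.h>

    extern const uint8_t _binary_FILENAME_start[];
    extern const uint8_t _binary_FILENAME_end;
    /* .global makes the symbols visible to other translation units.
       On ARM, where @ starts a comment, write %progbits instead. */
    __asm__(
     ".section \".rodata\", \"a\", @progbits\n"
     ".global _binary_FILENAME_start\n"
     "_binary_FILENAME_start:\n"
     ".incbin \"" PATH "\"\n"
     ".global _binary_FILENAME_end\n"
     "_binary_FILENAME_end:\n"
     ".previous\n"
    );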
