Adventures in reverse engineering Broadcom NIC firmware
2023: This article will be given as a talk at 37C3.
For some time now, I've been reverse engineering the firmware of the Broadcom BCM5719 Ethernet NIC chip, so that open source firmware can be produced for it. The BCM5719 is a PCIe chip which provides up to four Gigabit Ethernet ports, and is mainly intended for use in server applications. It can be used with the Linux "tg3" driver and is approximately the twelfth generation of chips in a long line of Ethernet NICs ultimately descended from the Tigon range of NICs made by Alteon, the IP of which got transferred to Broadcom at some point.
One example motivating the production of open source firmware for the BCM5719 is that it's the only closed-source firmware blob found in the Talos II, a high-performance POWER9-based system otherwise wholly free of firmware blobs.
The reverse engineering project, Project Ortega, began in December 2017 and involved reverse engineering proprietary firmware to determine what any open source replacement would need to do. Mainly this involved producing a reverse engineered C codebase from the disassembly of proprietary firmware, then producing a natural-language specification for others to reimplement; the actual reversed code itself is not published. In other words, this is a clean-room reverse engineering workflow.
The reverse engineering side is now pretty much done and availability of open source firmware for the BCM5719 is waiting on the completion of a reimplementation effort (thanks to Evan Lojewski). This is a clean-room implementation and doesn't share any code with Project Ortega or the proprietary firmware, but is produced using the human-readable specifications delivered by Project Ortega. Once this is delivered, it will be possible to use Raptor's POWER9 systems with 100% free, open source firmware. As far as I am aware, there is no other machine in the same performance class which can make such a claim.
The rest of this article describes the entire journey of getting to this point, and briefly discusses the innards of the BCM5719.
Reverse engineering: the road to MIPS
I'm not the first person to have tried reverse engineering Broadcom NICs. These slides, these slides and these slides discuss previous reverse engineering efforts, though of an older device. The authors describe certain debugging tools they produced as part of their reverse engineering. However, they never published these tools, so except for a few hints which could be gleaned from the slides, and the knowledge that MIPS cores were involved, I was starting from scratch.
The first step in reverse engineering was examining the firmware images fed to the device. I was able to find such images online, and a cursory binwalk (binwalk -A) told me the image indeed started with a chunk of big-endian MIPS code.
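To give a feel for how an opcode scan can identify both architecture and endianness, here is a minimal sketch in C. It is not how binwalk itself is implemented; it simply counts occurrences of the MIPS return instruction "jr $ra" (encoded as 0x03E00008) when the image is read as big-endian versus little-endian words. The function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encoding of the MIPS instruction "jr $ra" (the usual function return). */
#define MIPS_JR_RA 0x03E00008u

/* Count 4-byte-aligned words equal to "jr $ra" when the buffer is read as
 * big-endian (big_endian != 0) or little-endian (big_endian == 0). */
size_t count_jr_ra(const uint8_t *buf, size_t len, int big_endian)
{
    size_t hits = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 4 <= len; i += 4) {
        uint32_t w;
        if (big_endian)
            w = ((uint32_t)buf[i] << 24) | ((uint32_t)buf[i + 1] << 16) |
                ((uint32_t)buf[i + 2] << 8) | (uint32_t)buf[i + 3];
        else
            w = ((uint32_t)buf[i + 3] << 24) | ((uint32_t)buf[i + 2] << 16) |
                ((uint32_t)buf[i + 1] << 8) | (uint32_t)buf[i];
        if (w == MIPS_JR_RA)
            hits++;
    }
    return hits;
}

/* Returns 1 if the buffer looks like big-endian MIPS, -1 if it looks
 * little-endian, and 0 if there is no evidence either way. */
int guess_mips_endianness(const uint8_t *buf, size_t len)
{
    size_t be = count_jr_ra(buf, len, 1);
    size_t le = count_jr_ra(buf, len, 0);
    if (be > le) return 1;
    if (le > be) return -1;
    return 0;
}
```

A real image contains many function returns, so one byte order should win by a wide margin; a tie suggests the data isn't MIPS code at all.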
When you start reverse engineering firmware, the big problem you have is you don't know what's up, down, left, or right; that is, you have no particular plane of reference, because you don't know what any of the memory addresses do. Figuring out where the code you're examining is loaded, and which area of SRAM is used for stack space are the first, and easiest, steps; then you're stuck examining a program which, in terms of its input/output, deals solely in access to mysterious I/O registers of unknown meaning, semantics, and purpose. There's only one way to get started: try and find a clue, any clue, about what any part of it does, even just one part of it, and try and use that to infer what other parts of it do, and so on. The realisations tend to “snowball”; you figure out what one register does, and then suddenly you understand what another (previously unknown) block of code, which interacts with that register, does, which in turn results in you suddenly understanding what another register which that same block of (now comprehended) code relates to, and so on. Then at some point this “avalanching” process ends and you have to find a new “thread” to pull on to get it started again. As a result, reverse engineering as a process tends to alternate between periods of exhilaration and of feeling like it's completely hopeless and there's no prospect of ever figuring out what's going on. Looking back at what I now know about BCM5719 firmware, I'm astonished I got to this point; I've traditionally thought myself bad at reverse engineering.
Actually, I only took a look at the BCM5719 images in the first place out of curiosity, not intending to actually reverse engineer them; but I started doing so when I found that what I was disassembling looked significantly easier to understand and reverse engineer than I was expecting. The reason for this was the first big break that gave a starting point to assign meaning to register addresses: Broadcom (rather unusually for them) actually publishes a register manual for the BCM5719, and the register addresses in that manual actually appeared to correspond with many of the register accesses found inside the MIPS firmware. This meant that most of the work performed by the MIPS firmware could actually be readily comprehended, leaving only a small number of undocumented registers of unknown purpose; of those, many have subsequently been successfully guessed, leaving only a handful of miscellaneous and inconsequential mysteries.
The role of the MIPS. There's one MIPS core per port in these NICs, so for the 4-port BCM5719, there are four MIPS cores. These are referred to as the “RX RISC” internally, but despite the name, they don't actually have anything to do with RX and, in fact, aren't involved in the data plane (the flow of network traffic) at all.
I mentioned above that the BCM5719 is approximately the twelfth generation in a long line of Ethernet NICs. As a result, the design has been tweaked and mutated countless times over those generations and, as far as I can tell, never fully rearchitected, only changed just enough to meet new requirements each time. This leads to peculiarities like the “RX RISC” MIPS core which doesn't have anything to do with RX.
The story appears to go like this: once upon a time, the distant ancestors of the BCM5719 had two MIPS cores per port; an “RX RISC” (CPU 0) and a “TX RISC” (CPU 1), which were involved in the actual transmission/reception process. At some point, however, these functions were moved into hardware. The MIPS cores couldn't be scrapped entirely, though, because of some random dregs of functionality that also happened to be implemented on them. We're talking about highly miscellaneous things, like loading the MAC addresses into registers from flash at boot up, wake on LAN support, or even things as truly inane as implementing the PCIe VPD capability. These assorted functions were likely too random for anyone to want to bother to move them into hardware, and there would be no advantage to doing so. So one of the MIPS cores was picked to stay, and the other was scrapped; “RX RISC” was probably kept over “TX RISC” because it was numbered CPU 0. In actuality though, “RX RISC” now implements only random dregs of functionality and has long since ceased to have anything to do with RX.
As I mentioned above, each port's MIPS core mainly deals in registers which are also in the official register manual and can be accessed by drivers. There's little that can be done by the MIPS core that can't be done by the host, making the MIPS something of an “autoconfigurator”. The bulk of its utility is in its ability to run when the host isn't (WoL, etc.), but it has few special powers. It's also not powerful; it's something along the lines of a MIPS II and has no hardware multiply or divide support.
Compiler adventures. After a somewhat trivial but highly laborious process of translating MIPS disassembly to C, I had a functional reference codebase for the MIPS side of the device. Actually compiling this turned out to be an amusing exercise, because MIPS cores without hardware multiply or divide support aren't officially a thing anymore, which means that neither clang nor GCC supports targeting such devices. The last version of GCC to support targeting MIPS cores without hardware multiply/divide is apparently version 2.96 (!). Not wanting to have to use an ancient version of GCC, I sought alternatives, and ended up cheesing it by invoking clang in a very particular way. Mercifully, this worked. I was able to confirm the functionality of my reversed C code by compiling it and running it on the device, and confirming that everything still worked.
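To illustrate what "no hardware multiply or divide" means for compiled code, here is a minimal sketch of how those operations can be done with nothing but shifts, adds, subtracts and compares. This is not the particular clang invocation mentioned above (which isn't reproduced here), and the function names are made up; it only shows the kind of software routine such a core has to fall back on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift-and-add multiplication: for every set bit in b, add the
 * correspondingly shifted copy of a. The result wraps modulo 2^32. */
uint32_t soft_mul32(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u)
            result += a;
        a <<= 1;
        b >>= 1;
    }
    return result;
}

/* Restoring (bit-by-bit) unsigned division. Returns the quotient and, if
 * rem is non-NULL, stores the remainder there. */
uint32_t soft_udiv32(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0, r = 0;
    assert(d != 0);
    for (int i = 31; i >= 0; i--) {
        /* Bring down the next bit of the dividend. Before this shift r is
         * at most n >> (i + 1), so the shift cannot overflow. */
        r = (r << 1) | ((n >> i) & 1u);
        if (r >= d) {
            r -= d;
            q |= (uint32_t)1 << i;
        }
    }
    if (rem != NULL)
        *rem = r;
    return q;
}
```

For example, soft_udiv32(100, 7, &r) yields a quotient of 14 with r set to 2.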
Decoding the APE
With the reverse engineering of the relatively easy MIPS side of the device finished, that left the APE. The register manual briefly mentioned that the APE was another CPU on the chip (unlike the MIPS cores, there's one APE for the entire chip, not per port), but barely any of the registers in the manual related to it, and it didn't even state what architecture it used. The firmware image fed to the APE appeared to be lightly compressed, and binwalk couldn't figure out its architecture. I knew the APE would be important, though, because the APE firmware contained strings such as "NCSI", implying that the APE firmware implemented the device's NC-SI functionality.
NC-SI, for those unfamiliar with it, stands for “Network Controller Sideband Interface”. It essentially provides a way for a NIC to be attached to a BMC (in addition to being attached to the host via PCIe). NC-SI can use either a variant of RGMII or SMBus and the BCM5719 supports both, but for our purposes we're only interested in the RGMII variant.
Machines such as the Talos II use NC-SI to provide the machine's BMC with network connectivity, so implementing the NC-SI functionality in any open source firmware is important. Thus, I also needed to reverse engineer the APE... but had no idea as to how the image was compressed. It clearly wasn't compressed with any common compression algorithm. Mercifully unlike the MIPS firmware, it had at least a few strings, which is how I was able to tell it was compressed; a hex dump showed chunks of human-readable text with garbage interrupting them. This implied that the compression algorithm was rather crummy, quite possibly something homegrown and only minimally effective, so I had some hopes of being able to figure it out.
The obvious way to figure out the compression would be to dump the APE's boot ROM, since it would necessarily contain an implementation of the decompression algorithm, but I had no way of accessing the APE's boot ROM. Whereas the MIPS cores could easily be debugged and their entire address space accessed over PCIe via a number of registers exposed for controlling them, no such means of access was ever found for the APE, and scanning the address space of the MIPS cores made it clear that the APE's boot ROM was mapped only in the APE's address space.
Thus, I had no way of gaining access to the APE's address space over PCIe, since the hardware itself provided no means of such access. The only way for me to get control of the APE, then, would be to provide a well-formed firmware image so that it could be loaded in the same way that the official APE firmware is loaded by the APE's boot ROM. However, formulating such a firmware image would require me to understand the compression algorithm used by the APE boot ROM, which would require me to dump the APE boot ROM, which would require me to get access to the APE's address space, which would require me to gain execution on the APE, which would require me to provide a well-formed firmware image. In other words, bootstrapping access to the APE had turned into a circular problem.
A hunch. After extensive amounts of time trying and failing to eyeball the compression algorithm from hexdumps of compressed code, and trying any decompression algorithm I could think of against it, I in desperation decided to investigate the PXE option ROM on a hunch.
A PXE option ROM can be placed in the flash chip attached to the NIC to be served to the host on boot. On x86 platforms, this option ROM contains x86 code executed by the host, which for NICs generally implements PXE/iSCSI boot functionality. The BCM5719 firmware image I was working with at this time was for a PCIe add-in card, and thus had two x86 option ROMs included: an x86 real mode option ROM for legacy PC BIOS systems, and an x86-64 UEFI option ROM for modern x86-64 systems.
Since implementing PXE boot functionality wasn't my priority or even my interest, I hadn't paid any real attention to the PXE option ROM found in Broadcom's firmware image. But cursory eyeballing of a hexdump had shown a four-character ASCII eyecatcher “CMPS”, suggesting part of the PXE option ROM was compressed. A hex dump of the compressed data showed patterns eerily similar to the compressed APE code, so I hypothesised that the decompression algorithm used for the option ROM was in fact the same as that used for the APE. My theory was that this algorithm was first adopted for the option ROM and, later on, when the APE core was added and a compression algorithm was needed for its firmware, it was already lying around.
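Spotting an eyecatcher like this doesn't have to be done by eye. As a small sketch, the following C function scans a buffer for a four-byte ASCII tag and returns its offset; the function name and the sample bytes in the usage note are made up for illustration and say nothing about the real option ROM layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the offset of the first occurrence of the 4-byte tag in buf,
 * or -1 if the tag does not occur. */
long find_tag(const uint8_t *buf, size_t len, const char tag[4])
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, tag, 4) == 0)
            return (long)i;
    }
    return -1;
}
```

Called as find_tag(rom, rom_len, "CMPS"), it gives a starting point for working out where the compressed region begins.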
Not wanting to delve into x86 real mode code (who does?) I set about reverse engineering the UEFI option ROM. This immediately went nowhere as I discovered that the UEFI option ROM was compressed with a UEFI standard compression algorithm. The decompression code I was looking for would be found only in the real mode variant of the option ROM. With utter dread, I waded into a truly horrible reverse engineering experience — without a doubt, the most horrible and most mentally draining reverse engineering process I've ever suffered. After a long period just locating which bit was the decompression code, I then set about “raising” the x86 disassembly to C, as I do. Doing this directly however proved a nonstarter: this being x86 real mode code, working with pointers isn't simple. The disassembled code constantly changed the value of segment registers, making it extremely hard to follow with a C mindset. Trying to follow this code and convert it to C proved cognitively exhausting, and I had no confidence in the accuracy of the C I was producing.
Instead, I changed my approach and decided to cheese it: I decided to “emulate” x86 real mode inside C by translating x86 real mode disassembly into C very directly, modelling segment registers explicitly in C. An instruction which sets the segment register ES became SetES(x), and an instruction which loads from an offset relative to segment register ES became LoadES(offset). I kept each disassembled instruction as a comment above each line of C produced from it, and massaged my C code until I appeared to have something equivalent. Once I was successfully able to decompress the PXE option ROM, I was able to then refactor this “x86 real mode in C” code into something more comprehensible, rerunning the code every step of the way to ensure I hadn't changed the algorithm.
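As a rough sketch of what this style of translation can look like, the following C models ES as a global and maps segment:offset pairs onto a flat 1 MiB array using the real mode rule that the linear address is segment * 16 + offset. Only the names SetES and LoadES come from the description above; StoreES, Linear, the byte-wide access and the 20-bit wraparound are assumptions made for illustration, not the actual reversed code.

```c
#include <assert.h>
#include <stdint.h>

/* 1 MiB of emulated real-mode memory (20-bit address space). */
#define REAL_MODE_MEM_SIZE 0x100000u

static uint16_t g_es;                        /* modelled ES register */
static uint8_t  g_mem[REAL_MODE_MEM_SIZE];   /* modelled memory */

/* Real-mode address translation: linear = segment * 16 + offset,
 * wrapped to 20 bits (an assumption; see the lead-in). */
uint32_t Linear(uint16_t seg, uint16_t offset)
{
    return (((uint32_t)seg << 4) + offset) & (REAL_MODE_MEM_SIZE - 1u);
}

/* mov es, ax */
void SetES(uint16_t x)
{
    g_es = x;
}

/* mov al, es:[offset] */
uint8_t LoadES(uint16_t offset)
{
    return g_mem[Linear(g_es, offset)];
}

/* mov es:[offset], al   (hypothetical counterpart to LoadES) */
void StoreES(uint16_t offset, uint8_t value)
{
    g_mem[Linear(g_es, offset)] = value;
}
```

The point of the exercise is that different segment:offset pairs can alias the same byte (1234:0010 and 1235:0000 both resolve to linear address 0x12350), which is exactly what makes real mode code painful to express with ordinary C pointers.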
Once I finally had a concise, sane description of the decompression algorithm in C, the algorithm turned out to be hilariously simple. I was also then able to figure out the origins of the compression algorithm; it's called LZSS, and the particular LZSS format used here turns out to originate from some public domain DOS code which someone posted on a Japanese BBS in 1988. In fact, this public domain code was linked from the Wikipedia article on LZSS all along. I confirmed that the code matched the algorithm I'd reverse engineered, which at least meant I didn't have to write a compression algorithm for it; I could use the original compression routine of 1988.
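To show just how simple this family of algorithm is, here is a minimal LZSS decoder sketch in C. It assumes the parameters of the classic public domain LZSS.C by Haruhiko Okumura (a 4096-byte ring buffer pre-filled with spaces, matches of 3 to 18 bytes, and one flag byte per eight items); those constants are not stated above and the Broadcom format may well use different values, so treat them as placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LZSS_N         4096  /* ring buffer size (assumed) */
#define LZSS_F         18    /* longest match (assumed) */
#define LZSS_THRESHOLD 2     /* matches encode length - (THRESHOLD + 1) */

/* Decodes in[0..in_len) into out, writing at most out_cap bytes.
 * Returns the number of bytes written. */
size_t lzss_decode(const uint8_t *in, size_t in_len,
                   uint8_t *out, size_t out_cap)
{
    uint8_t ring[LZSS_N] = {0};
    size_t ip = 0, op = 0;
    unsigned r = LZSS_N - LZSS_F;   /* initial write position in the ring */
    unsigned flags = 0;

    assert(in != NULL && out != NULL);
    memset(ring, ' ', LZSS_N - LZSS_F);

    while (op < out_cap) {
        /* The high byte of flags counts down the eight items per flag byte. */
        flags >>= 1;
        if ((flags & 0x100u) == 0) {
            if (ip >= in_len)
                break;
            flags = in[ip++] | 0xFF00u;
        }
        if (flags & 1u) {
            /* Flag bit 1: a literal byte. */
            if (ip >= in_len)
                break;
            uint8_t c = in[ip++];
            out[op++] = c;
            ring[r] = c;
            r = (r + 1) & (LZSS_N - 1);
        } else {
            /* Flag bit 0: a (position, length) reference into the ring. */
            if (in_len - ip < 2)
                break;
            unsigned pos = in[ip++];
            unsigned len = in[ip++];
            pos |= (len & 0xF0u) << 4;
            len = (len & 0x0Fu) + LZSS_THRESHOLD + 1;
            for (unsigned k = 0; k < len && op < out_cap; k++) {
                uint8_t c = ring[(pos + k) & (LZSS_N - 1)];
                out[op++] = c;
                ring[r] = c;
                r = (r + 1) & (LZSS_N - 1);
            }
        }
    }
    return op;
}
```

This also explains the hex dump pattern described earlier: literal runs survive as readable text, with a flag byte every eight items and two-byte back-references interrupting them as "garbage".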
Thankfully, trying to decompress the APE image with this algorithm proved successful. My hunch had been correct, and I hadn't waded through x86 real mode code for nothing. It was only at this point that I finally discovered that the APE was an ARM Cortex-M3 running in little-endian mode.
Panic. Although I now knew how to compress APE firmware images to be flashed to the device, and flash them to the device, I had at this point discovered the meaning of a chunk of previously unknown data at the end of the APE firmware image: an RSA signature. This indicated almost certain doom; it seemed a given that the APE boot ROM verified the signature of the APE firmware, making any effort to make an open-source replacement a non-starter.
I was certain that I was screwed at this point, but someone prodded me to keep giving it a go anyway; maybe they didn't check the signature, or maybe they had a bug in their verification code which could be exploited. I was dubious, but trundled on with withered motivation for a while. I decided to examine the proprietary APE code which I had now decompressed, to see if it might give me some means of access to the APE. As I mentioned above, the areas of SRAM in which the APE code is stored and from which it is executed aren't accessible at all over PCIe, so if the use of the bootloader in the APE boot ROM is ruled out, the only prospect of gaining access to the APE is if the proprietary firmware running on it will let me in somehow.
Poring over disassembly of the proprietary APE code, I was able to find a message which can be sent to the APE's firmware from the host, which will cause it to write a word to an address of your choice within its address space... provided that it is within a certain narrow range. Bizarrely however, this range included some of the APE's executable code; I simply had to invoke this function repeatedly to copy every word of the image I wanted to execute to the APE's SRAM, then stimulate the APE into jumping to the right address, which was done by sending another seemingly innocent message to its firmware, the ordinary handling of which just happened to cause jumping into that area.
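The shape of this procedure can be sketched as a simulation in C. Everything concrete here is hypothetical: the window base and size, the structure and the function names are invented, and the real message format is not reproduced. The sketch only shows the idea of building an image upload out of a range-checked "write one word" primitive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical writable window; the real range is not given here. */
#define SIM_WINDOW_BASE  0x00100000u
#define SIM_WINDOW_WORDS 64u

/* Simulated target memory standing in for the writable region. */
typedef struct {
    uint32_t sram[SIM_WINDOW_WORDS];
} SimTarget;

/* Stand-in for the firmware message: writes one word, but only to aligned
 * addresses inside the window. Returns 0 on success, -1 if rejected. */
int sim_write_word(SimTarget *t, uint32_t addr, uint32_t value)
{
    assert(t != NULL);
    if (addr < SIM_WINDOW_BASE || (addr & 3u) != 0)
        return -1;
    uint32_t idx = (addr - SIM_WINDOW_BASE) / 4u;
    if (idx >= SIM_WINDOW_WORDS)
        return -1;
    t->sram[idx] = value;
    return 0;
}

/* Upload an image by invoking the write-word primitive once per word.
 * Returns 0 if every word was accepted, -1 otherwise. */
int upload_image(SimTarget *t, uint32_t base,
                 const uint32_t *image, size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        if (sim_write_word(t, base + (uint32_t)(4u * i), image[i]) != 0)
            return -1;
    }
    return 0;
}
```

On the real device, the final step of triggering execution depends entirely on the firmware's own message handling, so it is left out of the simulation.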
(In case you're wondering, no, this isn't a remotely exploitable vulnerability. This can only be done from the host, and the host has the ability to reflash the entire device's firmware anyway, so.)
After writing the shellcode to facilitate this mode of access, I finally had a way to access the APE's address space. Only at this point was I able to dump the APE's boot ROM. After disassembling that boot ROM, I finally understood properly how to formulate APE images, and was also able to confirm the decompression algorithm again. (It would have been much easier to reverse engineer this ARM decompression code than the x86 real mode decompression code, but on the other hand, it would never have been possible for me to figure out how to get execution on the APE — and thus be able to dump the APE boot ROM — without knowing how to decompress the APE image. Chicken and egg...)
Most importantly of all, however, I was able to verify the complete absence of any RSA signature verification code. The RSA signature on the APE firmware appears, in fact, quite vestigial. (There is some evidence to suggest that Broadcom once added certain utterly random functions to its NICs; namely card readers and TPMs (yes really). Although these had long since been removed by the time of the BCM5719, it seems likely that this RSA signing was added when the device was to have TPM functionality, and presumably removed from the boot ROM once that functionality was no longer needed.)
More clues. I now had all the tools I needed to start reverse engineering the APE firmware and targeting the APE with my own code, but still had basically no conception of the registers for the various APE-specific I/O peripherals to which only the APE has access; unlike the registers on the MIPS side, these are not documented at all in the register manual. Given the total lack of documentation by comparison to the MIPS side, I was quite dubious that I would ever be able to properly figure out what the APE is actually doing. Fortunately, however, I was able to extract more clues from Broadcom's diagnostic tools.
Broadcom's diagnostic tool is a command-line tool with various subcommands which execute diagnostic functions against various models of Broadcom NIC. It's available in three versions: a DOS version, a UEFI version and a Windows version. The DOS version is available on their website, the UEFI version appears to be available on their website but on closer inspection the ZIP file contains the DOS version (did someone upload the wrong file?), and as far as their website is concerned you'd think the Windows version doesn't exist, yet it definitely does, because you can find it lurking on the FTP sites of server OEMs like Dell or Supermicro. It's no joke that if you need drivers or diagnostic tools for Broadcom NICs, your best hope is probably a server OEM's website rather than Broadcom's website—oh wait, I already ranted about this.
Although the diagnostic tool was quite helpful, ironically this is not because I ever managed to run it. None of the DOS, UEFI or Windows versions has ever worked for me. Instead, the diagnostic tools are useful because they contain various routines to probe APE registers, and then print the contents of these registers along with their names. It's not much information, but it's all I have, and it makes all the difference. Pretty much everything I know about the APE that isn't guessed is from the dry reverse engineering of this diagnostic tool.
Finishing up. After extensive reverse engineering of the APE, I finally was able to come to a good understanding of how the NC-SI functionality is implemented. A fair third of the firmware related to the SMBus variant of NC-SI, which I don't care about and was able to ignore; another third or so of it appears to relate to miscellaneous “monitoring”-type functionality, like getting temperature readings over SMBus, etc., which I could also ignore. That left the important chunk of it, which related to NC-SI over RGMII. After some clues from the diagnostic tool and a lot of guessing, I ended up with a complete idea of how frames are transmitted to the BMC, received from the BMC, transmitted to the network and received from the network.
It's worth noting that the BCM5719's NC-SI connectivity is slow, much slower than the Gigabit Ethernet ports to which it's linked, and always will be; whereas TX/RX is done in hardware on the host side, for frames going between a BMC and the network, the frame bodies have to be manually shunted around, 32 bits at a time, between one set of registers and another.
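To make the cost of this concrete, here is a sketch of what shunting a frame one word at a time can look like. The register is replaced by a callback and a simulated FIFO, and the big-endian packing with zero padding of the last word is an assumption for illustration; none of these names correspond to real BCM5719 registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callback standing in for a write to a (hypothetical) data register. */
typedef void (*reg_write_fn)(void *ctx, uint32_t word);

/* Simulated sink which records every word written to it. */
typedef struct {
    uint32_t words[16];
    size_t   count;
} SimFifo;

void sim_fifo_write(void *ctx, uint32_t word)
{
    SimFifo *f = (SimFifo *)ctx;
    assert(f->count < sizeof f->words / sizeof f->words[0]);
    f->words[f->count++] = word;
}

/* Push a frame through the register one 32-bit word at a time. Bytes are
 * packed most significant first and the final word is zero padded.
 * Returns the number of words written. */
size_t shunt_frame(const uint8_t *frame, size_t len,
                   reg_write_fn write_reg, void *ctx)
{
    size_t words = 0;
    for (size_t i = 0; i < len; i += 4) {
        uint32_t w = 0;
        for (size_t b = 0; b < 4; b++) {
            w <<= 8;
            if (i + b < len)
                w |= frame[i + b];
        }
        write_reg(ctx, w);
        words++;
    }
    return words;
}
```

A full-size 1514-byte frame therefore costs 379 separate register writes on the CPU, which is why this path can never keep up with a Gigabit link.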
The only big hurdle was one final remaining bug in my proof of concept code, which had me tearing my hair out. The bug manifested in a truly bizarre way: transmissions of IPv4 frames containing TCP or UDP traffic would be mysteriously eaten, never to be seen on the network. ARP and ICMP traffic, however, would work fine, as would TCP or UDP traffic if I sent it with a modified Ethertype. Eventually this turned out to be something stupidly simple, if peculiar in the symptoms it caused: I was including the FCS field of each Ethernet frame in what I passed to the rest of the hardware for transmission to the network, and the hardware really wanted me not to include that field. I don't know what was going on there, but the most likely explanation is that it was confusing a state machine hardcoded inside hardware intended to implement some sort of TCP/UDP checksum offload.
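For reference, the FCS is the last four bytes of an Ethernet frame and holds the standard CRC-32 of everything before it, stored least significant byte first. A minimal sketch of checking for it and dropping it before handing a frame on might look like this; the function names are made up, and whether checking is needed at all (rather than unconditionally dropping four bytes) depends on details of the hardware which aren't covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 as used by Ethernet (reflected polynomial 0xEDB88320). */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Returns 1 if the last four bytes of the frame are a valid FCS for the
 * bytes preceding them, 0 otherwise. */
int frame_has_valid_fcs(const uint8_t *frame, size_t len)
{
    if (len < 4)
        return 0;
    uint32_t want = crc32_ieee(frame, len - 4);
    uint32_t got = (uint32_t)frame[len - 4] |
                   ((uint32_t)frame[len - 3] << 8) |
                   ((uint32_t)frame[len - 2] << 16) |
                   ((uint32_t)frame[len - 1] << 24);
    return want == got;
}

/* Returns the length to pass on for transmission: the frame length with
 * a trailing valid FCS removed, or the original length if none is found. */
size_t strip_fcs(const uint8_t *frame, size_t len)
{
    return frame_has_valid_fcs(frame, len) ? len - 4 : len;
}
```

The well-known check value for this CRC is 0xCBF43926 for the ASCII string "123456789", which makes a handy self-test.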
Documentation. Since this entire reverse engineering project involved my extensive exposure to reverse engineered, proprietary code, I can't exactly just go and write FOSS firmware for this thing. The writing of open source firmware, therefore, was always going to have to be undertaken by someone else. The objective of this project, then, was to produce the documentation for how to write that firmware; that objective has been pretty much completed. Enjoy the libre firmware when it's ready!