Fuzzing PCI express: security in plaintext
By Julia Hansbrough, Software Engineer
Google recently launched GPUs on Google Cloud Platform (GCP), which allow customers to leverage this hardware for highly parallel workloads. These GPUs are connected to our cloud machines via a variety of PCIe switches, which required us to have a deep understanding of PCIe security.

Securing PCIe devices requires overcoming some inherent challenges. For instance, GPUs have become far more complex in the past few decades, opening up new avenues for attack. Since GPUs are designed to directly access system memory, and since hardware has historically been considered trusted, it’s difficult to ensure that all the settings meant to keep a GPU contained are configured correctly, and difficult to verify that such settings even work. And since GPU manufacturers don’t make the source code or binaries available for the GPU’s main processes, we can’t examine those to gain more confidence. You can read more about the challenges presented by the PCI and PCIe specs here.

Given the risk of malicious behavior from compromised PCIe devices, Google needed a plan for combating these types of attacks, especially in a world of cloud services and publicly available virtual machines. Our approach has been to focus on mitigation: ensuring that compromised PCIe devices can’t jeopardize the security of the rest of the computer. Fuzzing to the rescue[…]
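The excerpt cuts off right at the fuzzing section, but the general shape of this kind of fuzzing is easy to sketch: throw random writes at a device's configuration registers and check that protected invariants survive. The following is only a toy illustration in Python, not Google's actual fuzzer; the `ConfigSpace` model and its invariant are hypothetical stand-ins for real hardware.

```python
import random

class ConfigSpace:
    """Toy model of a 256-byte PCI configuration space."""
    def __init__(self):
        self.regs = bytearray(256)
        # Vendor/device ID registers at offset 0 are read-only in real hardware.
        self.regs[0:4] = bytes([0x86, 0x80, 0x34, 0x12])

    def write(self, offset, value):
        # Model the read-only ID registers: writes to 0x00-0x03 are dropped.
        if offset >= 4:
            self.regs[offset] = value & 0xFF

def invariant_holds(dev):
    # The device must never let its ID registers change, no matter what we write.
    return bytes(dev.regs[0:4]) == bytes([0x86, 0x80, 0x34, 0x12])

def fuzz(dev, iterations=10000, seed=0):
    """Hammer random offsets with random bytes, checking the invariant each time."""
    rng = random.Random(seed)
    for _ in range(iterations):
        dev.write(rng.randrange(256), rng.randrange(256))
        if not invariant_holds(dev):
            return False  # a write corrupted protected state
    return True

print(fuzz(ConfigSpace()))  # True: the toy model preserves its ID registers
```

A real fuzzer would of course target actual device registers (e.g. via MMIO or sysfs) and check far richer invariants, such as DMA staying inside IOMMU-permitted ranges.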
Code demonstrating how to load custom ISA on Intel Haswell GPUs via OpenGL. Also includes various ISA utilities and benchmarks. This code works on Windows 8.1. […] For more information, see my related blog posts:
GPU Ray-Tracing The Wrong Way: http://www.joshbarczak.com/blog/?p=1197
SPMD Is Not Intel’s Cup of Tea: http://www.joshbarczak.com/blog/?p=1120
You Compiled This Driver, Trust Me: http://www.joshbarczak.com/blog/?p=1028
Stealing Webpages Rendered on Your Browser by Exploiting GPU Vulnerabilities
Graphics processing units (GPUs) are important components of modern computing devices for not only graphics rendering, but also efficient parallel computations. However, their security problems are ignored despite their importance and popularity. In this paper, we first perform an in-depth security analysis on GPUs to detect security vulnerabilities. We observe that contemporary, widely-used GPUs, both NVIDIA’s and AMD’s, do not initialize newly allocated GPU memory pages which may contain sensitive user data. By exploiting such vulnerabilities, we propose attack methods for revealing a victim program’s data kept in GPU memory both during its execution and right after its termination. We further show the high applicability of the proposed attacks by applying them to the Chromium and Firefox web browsers which use GPUs for accelerating webpage rendering. We detect that both browsers leave rendered webpage textures in GPU memory, so that we can infer which webpages a victim user has visited by analyzing the remaining textures. The accuracy of our advanced inference attack that uses both pixel sequence matching and RGB histogram matching is up to 95.4%.
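The "RGB histogram matching" step of the inference attack described in the abstract can be sketched with a few lines of NumPy: build a normalized per-channel histogram for the recovered texture, then pick the candidate webpage whose histogram is closest. This is only an illustrative reconstruction with synthetic data, not the authors' code.

```python
import numpy as np

def rgb_histogram(img, bins=8):
    """Normalized per-channel histogram, concatenated into one feature vector."""
    hists = []
    for c in range(3):
        h, _ = np.histogram(img[..., c], bins=bins, range=(0, 256))
        hists.append(h / h.sum())
    return np.concatenate(hists)

def match(recovered, candidates):
    """Return the candidate page whose histogram is closest (L1 distance)."""
    target = rgb_histogram(recovered)
    dists = {name: np.abs(rgb_histogram(img) - target).sum()
             for name, img in candidates.items()}
    return min(dists, key=dists.get)

rng = np.random.default_rng(0)
page_a = rng.integers(0, 128, size=(64, 64, 3))    # a darker page
page_b = rng.integers(128, 256, size=(64, 64, 3))  # a lighter page
# A "recovered texture": page_a plus a little rendering noise
leaked = np.clip(page_a + rng.integers(-10, 10, size=page_a.shape), 0, 255)
print(match(leaked, {"page_a": page_a, "page_b": page_b}))  # page_a
```

The paper's full attack combines this with pixel-sequence matching to reach its reported 95.4% accuracy; the histogram alone is the coarse first cut.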
As I’ve mentioned, I don’t know much about the firmware modding community. It is amazing the things they can do to a system, in ways completely unrelated to security. 🙂 But other readers of this blog are accomplished firmware modders, and one of the smarter ones has suggested a new tool to mention on the blog: GOPupd.
UBU has already been mentioned in previous blog posts; like this tool, it comes from the win-raid.com forum members. GOPupd is also hosted on the win-raid.com forum by one of its members, LordKAG, whom you may have noticed as the source of many UEFITool bug reports.
GOPupd is a tool that updates the GOP portion of a video BIOS dumped from various AMD/ATI and Nvidia graphics cards. Advanced users can use the tool not only to dump a GOP, but also to insert one into a VBIOS that lacks it, basically making an older GPU compatible with pure UEFI (non-CSM) mode. That sounds like a risky operation, but it appears that many readers of this blog are smarter than the writer of this blog, so I presume a few of you would be able to handle it; I’m not sure I would. 🙂 The tool is written in Python. You have to register on the forum to get access to the download URLs.
[…] If you are interested in this thread, then you should know a thing or two about GOP. If you need/want pure UEFI Boot (CSM disabled) or Fast Boot, then you need a GOP for your GPU/iGPU, otherwise it is optional (for now). For the iGPU side there is not much you can do, because manufacturers have included them in the UEFI firmware, with GOP drivers from Intel, AMD, ASPEED, Nvidia (recently) and even Matrox. This thread only deals with external cards and only with AMD and Nvidia. This is further limited by the fact that only specific generations have GOP support: for AMD there is a list of IDs in each GOP version, but it is safe to assume that every card after 7xxx generation should work, maybe even 6xxx; for Nvidia there are 6 generations supported – GT21x, GF10x, GF119, GK1xx/GK2xx, GM1xx, GM2xx. […]
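The container format that tools like GOPupd manipulate is the PCI expansion ROM: a ROM holds one or more images, each starting with a 0x55AA signature and carrying a PCI Data Structure ("PCIR") whose code-type byte distinguishes a legacy x86 BIOS image (0x00) from an EFI/GOP image (0x03). As a rough sketch (not GOPupd itself), here is a minimal Python scanner run against a synthetic two-image ROM built in the same script:

```python
import struct

CODE_TYPES = {0x00: "Legacy x86 BIOS", 0x03: "EFI (GOP)"}

def list_rom_images(rom):
    """Walk a PCI expansion ROM and report each image's code type."""
    images, off = [], 0
    while off + 2 <= len(rom) and rom[off:off + 2] == b"\x55\xAA":
        # Offset 0x18 in the image holds a pointer to the PCI Data Structure.
        pcir_off = off + struct.unpack_from("<H", rom, off + 0x18)[0]
        assert rom[pcir_off:pcir_off + 4] == b"PCIR"
        length_units, = struct.unpack_from("<H", rom, pcir_off + 0x10)  # in 512-byte units
        code_type = rom[pcir_off + 0x14]
        last = bool(rom[pcir_off + 0x15] & 0x80)  # indicator bit 7: last image
        images.append(CODE_TYPES.get(code_type, hex(code_type)))
        if last:
            break
        off += length_units * 512
    return images

def make_image(code_type, last):
    """Build a minimal 512-byte synthetic ROM image for demonstration only."""
    img = bytearray(512)
    img[0:2] = b"\x55\xAA"
    struct.pack_into("<H", img, 0x18, 0x40)       # PCIR structure at offset 0x40
    img[0x40:0x44] = b"PCIR"
    struct.pack_into("<H", img, 0x40 + 0x10, 1)   # image length: 1 * 512 bytes
    img[0x40 + 0x14] = code_type
    img[0x40 + 0x15] = 0x80 if last else 0x00
    return bytes(img)

rom = make_image(0x00, last=False) + make_image(0x03, last=True)
print(list_rom_images(rom))  # ['Legacy x86 BIOS', 'EFI (GOP)']
```

A card with no GOP is simply a ROM whose image chain contains no code-type-0x03 entry; "inserting a GOP" means appending such an image and clearing the last-image bit on the one before it.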
Budi Purnomo of AMD posted a message on the GPUopen.com site, about AMD’s GPUOpen initiative, including HSAIL GDB and a related AMD GPU Debug SDK for it. These both sound very interesting, thanks AMD!
Today as part of AMD’s GPUOpen initiative, we are happy to announce the release of HSAIL GDB version 1.0 (prebuilt binary and source code). This is AMD’s first debugger product that is built based on the new GCN debugger core technology. HSAIL GDB marks our first step toward building a rich debugging ecosystem for HSA and HCC. Using HSAIL GDB, you can debug the execution of host CPU code and GPU agent kernel code (at the HSAIL language level) in a single debug session. HSA applications, including HCC and HIP, are supported on AMD APU platforms. To learn more about the capability of HSAIL GDB, I encourage you to read through the HSAIL GDB tutorial.

In addition, we also released a new AMD GPU Debug SDK (pre-built binary and source code). This AMD GPU Debug SDK enables tool vendors to build their own rich GPU debugging ecosystem for AMD platforms based on the same GCN debugger core technology introduced within HSAIL GDB.

The hardware based implementation provided in the GCN debugger core technology is a vast improvement over the previous debugger implementation provided in the AMD CodeXL OpenCL™ debugger, which relies on repeated kernel recompilation and replay. Using the GCN debugger technology, we are able to stop all the massively parallel threads in the GPU at a breakpoint, inspect the machine state by reading registers and memory, and then resume and execute all the GPU threads. The instruction pointer at the ISA level can be correlated with the HSAIL line.

This project is the result of much hard work from the hardware and software teams within AMD over the past several years, requiring much innovation in the hardware, firmware, kernel mode driver, user mode driver, runtime, compiler and the tools domain. […]
I know nothing about firmware for GPUs, a lot to learn on this topic… 😦
Joshua Barczak has a new blog post on reversing an Intel Haswell GPU, and has sample code on Github.
[…] Why Did I Go To All This Trouble? There are a couple of reasons why this is more than just a pointless exercise in reverse engineering. This hardware contains a lot of goodies that the graphics APIs simply do not expose, and I’m wondering if I can exploit any of them to demonstrate cool stuff. I obviously can’t do anything in a shipping product, but perhaps we’ll find ways of using the hardware that aren’t anticipated by current APIs and programming models.

There is a lot of unexposed functionality in this architecture. You can send messages back and forth between threads. Threads can spawn child threads and read/write their register files using message passing. I don’t know whether all the right bits are set by the GL driver to enable this, but if it works, it might be fun to experiment with. You can mix SIMD modes: use SIMD4 instructions in one cycle and SIMD8/SIMD16 in another. You can use swizzles or register movs to do all sorts of cross-lane communication at very little cost. You can do warp-level programming, where each thread is one 16-wide instruction stream instead of 16 one-wide streams. You can switch modes multiple times within the same thread if you like.

As some of my Intel friends like to point out, on this hardware, every thread has 4KB of register space. The register file in total is roughly the size of the cache. There’s no “use fewer registers to boost occupancy”; the occupancy is just there. There is also GPR indexing, and unlike on other architectures it is actually potentially useful. Hardware threads can implement small fixed-size data structures entirely in their register files without it totally sucking. […]
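The "4KB of register space per thread" figure checks out as back-of-envelope arithmetic, assuming the commonly cited Gen7.5 numbers (128 GRFs of 256 bits per hardware thread, 7 threads per EU) from Intel's public graphics documentation:

```python
# Back-of-envelope check of the "4KB of register space per thread" claim,
# assuming Gen7.5 (Haswell) figures: 128 GRFs, each 256 bits wide.
GRF_COUNT = 128
GRF_BYTES = 256 // 8          # each GRF is 256 bits = 32 bytes

per_thread = GRF_COUNT * GRF_BYTES
print(per_thread)             # 4096 bytes = 4 KB per hardware thread

# With 7 hardware threads per EU, the EU's register file totals:
THREADS_PER_EU = 7
print(per_thread * THREADS_PER_EU)  # 28672 bytes = 28 KB per EU
```

That fixed per-thread allocation is exactly why there is no "use fewer registers to boost occupancy" trade-off on this hardware: the register budget does not shrink as more threads run.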