GPU-pass-through-compatibility-check: Automatically set up a Linux system for PCI pass-through and check if it is compatible

This project consists of 3 parts.
1) A script that automatically checks to what extent a computer is compatible with GPU pass-through in its given configuration.
2) A script that automatically installs and configures your system for GPU pass-through (only tested on fresh installs of Fedora 28 x64 with GNOME, booted in UEFI mode!)
3) Instructions on how to create a bootable Linux USB stick that automatically runs the script when you boot from it without any user interaction required.
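
To give a feel for what a compatibility check like this might look at, here is a minimal Python sketch that lists IOMMU groups from sysfs. This is not the project's script, and I have not checked which tests it actually performs; the function names `list_iommu_groups` and `shares_group` are made up for illustration. The only real interface assumed is the Linux sysfs directory `/sys/kernel/iommu_groups`.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the PCI addresses it contains.

    Returns an empty dict when the directory is missing, which usually
    means the IOMMU is disabled in firmware or on the kernel command line.
    """
    base = Path(root)
    if not base.is_dir():
        return {}
    groups = {}
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(
                entry.name for entry in devices_dir.iterdir()
            )
    return groups


def shares_group(groups, pci_address):
    """Return the other devices in the same group, or None if not found."""
    for members in groups.values():
        if pci_address in members:
            return [m for m in members if m != pci_address]
    return None
```

Calling `shares_group(list_iommu_groups(), "0000:01:00.0")` with your GPU's PCI address shows what else sits in its group. A GPU that shares a group with unrelated devices is usually harder to pass through cleanly.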

example output

Rendered Insecure: GPU Side Channel Attacks are Practical

Graphics Processing Units (GPUs) are commonly integrated with computing devices to enhance the performance and capabilities of graphical workloads. In addition, they are increasingly being integrated in data centers and clouds such that they can be used to accelerate data intensive workloads. Under a number of scenarios the GPU can be shared between multiple applications at a fine granularity allowing a spy application to monitor side channels and attempt to infer the behavior of the victim. For example, OpenGL and WebGL send workloads to the GPU at the granularity of a frame, allowing an attacker to interleave the use of the GPU to measure the side-effects of the victim computation through performance counters or other resource tracking APIs. We demonstrate the vulnerability using two applications. First, we show that an OpenGL based spy can fingerprint websites accurately, track user activities within the website, and even infer the keystroke timings for a password text box with high accuracy. The second application demonstrates how a CUDA spy application can derive the internal parameters of a neural network model being used by another CUDA application, illustrating these threats on the cloud. To counter these attacks, the paper suggests mitigations based on limiting the rate of the calls, or limiting the granularity of the returned information.
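
The two mitigations named at the end of the abstract (limiting the call rate, and limiting the granularity of what is returned) can be sketched in a few lines of Python. This is only an illustration of the idea, not the paper's implementation: the `CoarsenedCounter` class, its parameters, and the `read_raw` callback are all hypothetical, and a real fix would live in the driver or API layer.

```python
import time


class CoarsenedCounter:
    """Wrap a counter source, limiting query rate and value granularity."""

    def __init__(self, read_raw, min_interval_s=0.1, granularity=1024,
                 clock=time.monotonic):
        self._read_raw = read_raw              # hypothetical raw counter reader
        self._min_interval_s = min_interval_s  # rate limit between fresh reads
        self._granularity = granularity        # values are rounded down to this
        self._clock = clock
        self._last_time = None
        self._last_value = None

    def read(self):
        now = self._clock()
        too_soon = (self._last_time is not None
                    and now - self._last_time < self._min_interval_s)
        if too_soon:
            # Rate limiting: repeat the cached value instead of sampling again.
            return self._last_value
        raw = self._read_raw()
        # Granularity limiting: drop the low-order detail of the reading.
        self._last_value = (raw // self._granularity) * self._granularity
        self._last_time = now
        return self._last_value
```

A spy that polls faster than `min_interval_s` just sees the same rounded value repeated, which is the intended effect: less timing and magnitude detail to correlate with the victim's work.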

Click to access ccs18_gpu_side_channel.pdf

GPUTop: a GPU profiling tool

Intel posted info about a new blog post using GPUTop with Celadon (Intel-flavored Android):

We are excited to bring out a new tutorial for profiling gpu on Android. Gputop exposes many GPU parameters module wise such as frequency, busyness, threads, EU activeness etc. These are very helpful in identifying performance bottlenecks as well as impact of performance improvements on the GPU either through graphics software stack or through the graphics application. If you are learning/ new to gpu, this should attract you even more. Please take a look, try out and feel free to share your feedback.

GPU Top is a tool to help developers understand GPU performance counters and provide graphical and machine readable data for the performance analysis of drivers and applications. GPU Top is compatible with all GPU programming APIs such as OpenGL, OpenCL or Vulkan since it primarily deals with capturing periodic sampled metrics. GPU Top so far includes a web based interactive UI as well as a non-interactive CSV logging tool suited to being integrated into continuous regression testing systems. Both of these tools can capture metrics from a remote system so as to try to minimize their impact on the system being profiled. GPUs supported so far include: Haswell, Broadwell, Cherryview, Skylake, Broxton, Apollo Lake, Kabylake, Cannonlake and Coffeelake.
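
As a rough idea of how CSV logs of periodic samples could feed a regression test, here is a small Python sketch that compares the mean of one metric between a baseline run and a candidate run. The CSV layout and the column name used in the usage note are assumptions for illustration only; I have not checked what columns GPU Top's logger actually writes, and `column_mean` and `regressed` are made-up helper names.

```python
import csv
import io
from statistics import mean


def column_mean(csv_text, column):
    """Average one numeric column of a CSV document with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return mean(float(row[column]) for row in reader)


def regressed(baseline_csv, candidate_csv, column, tolerance=0.05):
    """True when the candidate mean exceeds the baseline by more than tolerance."""
    base = column_mean(baseline_csv, column)
    cand = column_mean(candidate_csv, column)
    return cand > base * (1.0 + tolerance)
```

For example, `regressed(old_log, new_log, "gpu_busy")` (with a hypothetical `gpu_busy` column) would flag a build whose average busyness rose by more than 5%. Whether higher is worse depends on the metric, so a real harness would need a per-metric direction.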

A Survey of Techniques for Improving Security of GPUs

Graphics processing unit (GPU), although a powerful performance-booster, also has many security vulnerabilities. Due to these, the GPU can act as a safe-haven for stealthy malware and the weakest ‘link’ in the security ‘chain’. In this paper, we present a survey of techniques for analyzing and improving GPU security. We classify the works on key attributes to highlight their similarities and differences. More than informing users and researchers about GPU security techniques, this survey aims to increase their awareness about GPU security vulnerabilities and potential countermeasures.


Intel Security Essentials: A Built-in Foundation with Security at the Core

Intel Threat Detection Technology (TDT) announced at RSA. Includes GPU-powered antivirus code.

Intel Security Essentials


AMD Vega Pro GPU contains a Security Processor





A Study of Overflow Vulnerabilities on GPUs

A Study of Overflow Vulnerabilities on GPUs
Bang Di, Jianhua Sun, Hao Chen

GPU-accelerated computing gains rapidly-growing popularity in many areas such as scientific computing, database systems, and cloud environments. However, there are less investigations on the security implications of concurrently running GPU applications. In this paper, we explore security vulnerabilities of CUDA from multiple dimensions. In particular, we first present a study on GPU stack, and reveal that stack overflow of CUDA can affect the execution of other threads by manipulating different memory spaces. Then, we show that the heap of CUDA is organized in a way that allows threads from the same warp or different blocks or even kernels to overwrite each other’s content, which indicates a high risk of corrupting data or steering the execution flow by overwriting function pointers. Furthermore, we verify that integer overflow and function pointer overflow in struct also can be exploited on GPUs. But other attacks against format string and exception handler seems not feasible due to the design choices of CUDA runtime and programming language features. Finally, we propose potential solutions of preventing the presented vulnerabilities for CUDA.

Click to access npc16-overflow.pdf

Google on fuzzing PCIe

Fuzzing PCI express: security in plaintext
By Julia Hansbrough, Software Engineer

Google recently launched GPUs on Google Cloud Platform (GCP), which will allow customers to leverage this hardware for highly parallel workloads. These GPUs are connected to our cloud machines via a variety of PCIe switches, and that required us to have a deep understanding of PCIe security. Securing PCIe devices requires overcoming some inherent challenges. For instance, GPUs have become far more complex in the past few decades, opening up new avenues for attack. Since GPUs are designed to directly access system memory, and since hardware has historically been considered trusted, it’s difficult to ensure all the settings to keep it contained are set accurately, and difficult to ensure whether such settings even work. And since GPU manufacturers don’t make the source code or binaries available for the GPU’s main processes, we can’t examine those to gain more confidence. You can read more about the challenges presented by the PCI and PCIe specs here. With the risk of malicious behavior from compromised PCIe devices, Google needed to have a plan for combating these types of attacks, especially in a world of cloud services and publicly available virtual machines. Our approach has been to focus on mitigation: ensuring that compromised PCIe devices can’t jeopardize the security of the rest of the computer. Fuzzing to the rescue[…]

HAXWell: loads custom ISA on Intel Haswell GPUs

Code demonstrating how to load custom ISA on Intel Haswell GPUs via OpenGL. Also includes various ISA utilities and benchmarks. This code works on Windows 8.1. […] For more information, see my related blog posts:
GPU Ray-Tracing The Wrong Way:
SPMD Is Not Intel’s Cup of Tea:
You Compiled This Driver, Trust Me:

GPU security analysis from POSTECH

Stealing Webpages Rendered on Your Browser by Exploiting GPU Vulnerabilities

Graphics processing units (GPUs) are important components of modern computing devices for not only graphics rendering, but also efficient parallel computations. However, their security problems are ignored despite their importance and popularity. In this paper, we first perform an in-depth security analysis on GPUs to detect security vulnerabilities. We observe that contemporary, widely-used GPUs, both NVIDIA’s and AMD’s, do not initialize newly allocated GPU memory pages which may contain sensitive user data. By exploiting such vulnerabilities, we propose attack methods for revealing a victim program’s data kept in GPU memory both during its execution and right after its termination. We further show the high applicability of the proposed attacks by applying them to the Chromium and Firefox web browsers which use GPUs for accelerating webpage rendering. We detect that both browsers leave rendered webpage textures in GPU memory, so that we can infer which webpages a victim user has visited by analyzing the remaining textures. The accuracy of our advanced inference attack that uses both pixel sequence matching and RGB histogram matching is up to 95.4%.
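
The abstract mentions RGB histogram matching as one half of the inference attack. The paper's actual algorithm is not reproduced here; the following is a generic Python sketch of histogram matching using histogram intersection as the similarity score, which is my own choice. All function names are hypothetical, and pixels are plain `(r, g, b)` tuples rather than real GPU memory dumps.

```python
def rgb_histogram(pixels, bins=4):
    """Normalized joint RGB histogram of (r, g, b) tuples with 0-255 channels."""
    counts = [0] * (bins ** 3)
    total = 0
    for r, g, b in pixels:
        # Scale each channel into [0, bins) and flatten to one bucket index.
        index = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        counts[index] += 1
        total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means the two histograms are identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def best_match(leftover_pixels, known_pages, bins=4):
    """Return (name, score) of the known page most similar to the leftover data."""
    target = rgb_histogram(leftover_pixels, bins)
    scores = {
        name: histogram_intersection(target, rgb_histogram(px, bins))
        for name, px in known_pages.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Histograms ignore pixel order, so this kind of matching tolerates textures that come back from memory rearranged, which is presumably why the authors pair it with pixel sequence matching.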

Click to access StealingWebpagesRenderedonYourBrowserbyExploitingGPUVulnerabilities.pdf

tool: GOPupd

As I’ve mentioned, I don’t know much about the firmware modding community. It is amazing the things they can do to a system, in ways completely unrelated to security. 🙂 But other readers of this blog are accomplished firmware modders, and one of the smarter ones has suggested a new tool to mention on the blog: GOPupd.

UBU has already been mentioned in previous blog posts; it was written by forum members, as was this other tool. GOPupd is also hosted on the forum by one of its members, LordKAG, whom you may have noticed as the source of many of UEFITool’s bug reports.

GOPupd is a tool that updates the GOP portion of a VideoBIOS dumped from various AMD/ATI and Nvidia graphics cards. Advanced users can use the tool not only to update an existing GOP, but also to insert a GOP into a VBIOS that lacks one, basically making an older GPU compatible with pure UEFI (non-CSM) mode. That sounds like a risky operation, but it appears that many readers of this blog are smarter than the writer of this blog, so I presume a few of you would be able to handle this; I’m not sure I would. 🙂 The tool is written in Python. You have to register on the forum to get access to their download URLs.

[…] If you are interested in this thread, then you should know a thing or two about GOP. If you need/want pure UEFI Boot (CSM disabled) or Fast Boot, then you need a GOP for your GPU/iGPU, otherwise it is optional (for now). For the iGPU side there is not much you can do, because manufacturers have included them in the UEFI firmware, with GOP drivers from Intel, AMD, ASPEED, Nvidia (recently) and even Matrox. This thread only deals with external cards and only with AMD and Nvidia. This is further limited by the fact that only specific generations have GOP support: for AMD there is a list of IDs in each GOP version, but it is safe to assume that every card after 7xxx generation should work, maybe even 6xxx; for Nvidia there are 6 generations supported – GT21x, GF10x, GF119, GK1xx/GK2xx, GM1xx, GM2xx. […]

AMD announces HSAIL GDB and GPU Debug SDK

Budi Purnomo of AMD posted a message on the site, about AMD’s GPUOpen initiative, including HSAIL GDB and a related AMD GPU Debug SDK for it. These both sound very interesting, thanks AMD!

Today as part of AMD’s GPUOpen initiative, we are happy to announce the release of HSAIL GDB version 1.0 (prebuilt binary and source code).  This is AMD’s first debugger product that is built based on the new GCN debugger core technology. HSAIL GDB marks our first step toward building a rich debugging ecosystem for HSA and HCC.  Using HSAIL GDB, you can debug the execution of host CPU code and GPU agent kernel code (at the HSAIL language level) in a single debug session. HSA applications, including HCC and HIP, are supported on AMD APU platforms.  To learn more about the capability of HSAIL GDB, I encourage you to read through the HSAIL GDB tutorial. In addition, we also released a new AMD GPU Debug SDK (pre-built binary and source code).  This AMD GPU Debug SDK enables tool vendors to build their own rich GPU debugging ecosystem for AMD platforms based on the same GCN debugger core technology introduced within HSAIL GDB. The hardware based implementation provided in the GCN debugger core technology is a vast improvement over the previous debugger implementation provided in the AMD CodeXL OpenCL™ debugger which relies on repeated kernel recompilation and replay.  Using the GCN debugger technology, we are able to stop all the massively parallel threads in the GPU at a breakpoint, inspect the machine state by reading registers and memory, and then resume and execute all the GPU threads.  The instruction pointer at the ISA level can be correlated with the HSAIL line.  This project is the result of much hard work from the hardware and software teams within AMD over the past several years requiring much innovation in the hardware, firmware, kernel mode driver, user mode driver, runtime, compiler and the tools domain. […]

Full announcement:


I know nothing about firmware for GPUs, a lot to learn on this topic… 😦


Intel Haswell GPU reversing

Joshua Barczak has a new blog post on reversing an Intel Haswell GPU, and has sample code on GitHub.

[…] Why Did I Go To All This Trouble? There are a couple of reasons why this is more than just a pointless exercise in reverse engineering. This hardware contains a lot of goodies that the graphics APIs simply do not expose, and I’m wondering if I can exploit any of them to demonstrate cool stuff. I obviously can’t do anything in a shipping product, but perhaps we’ll find ways of using the hardware that aren’t anticipated by current APIs and programming models. There is a lot of unexposed functionality in this architecture. You can send messages back and forth between threads. Threads can spawn child threads and read/write their register files using message passing. I dont know whether all the right bits are set by the GL driver to enable this, but if it works, it might be fun to experiment with. You can mix SIMD modes. Use SIMD4 instructions in one cycle and SIMD8/SIMD16 in another. You can use swizzles or register movs to do all sorts of cross-lane communication at very little cost. You can do warp-level programming, where each thread is 1 16-wide instruction stream instead of 16 1-wide streams. You can switch modes multiple times within the same thread if you like. As some of my Intel friends like to point out, on this hardware, every thread has 4KB of register space. The register file in total is roughly the size of the cache. There’s no “use fewer registers to boost occupancy”, the occupancy is just there. There is also GPR indexing, and unlike on other architectures it is actually potentially useful. Hardware threads can implement small fixed-size data structures entirely in their register files without it totally sucking. […]

Full post:

GPU security

Someone just pointed out that Mozilla Firefox treats GPUs differently when it uses WebGL:

This got me thinking how little I know about GPU security. 😦

There’s a bit on GPU security in the August Intel security report:

Click to access rp-quarterly-threats-aug-2015.pdf

I noticed the tool Cryptohaze, but it appears not to have been updated since 2013 or so:

“Cryptohaze is the home of high performance, open source, network-enabled, US-based cross-platform GPU and OpenCL accelerated password auditing tools for security professionals. The tools run on all platforms that support CUDA or OpenCL (currently Windows, Linux, OS X). If you don’t have a GPU – the OpenCL code will run just fine on your host CPU!”

If you know of other useful GPU security tools, please speak up!

MIAOW and Raven3 at HotChips

HotChips ended this week. As mentioned in the last post on this event:

not only is the Open Source ISA RISC-V there, but so was an Open Hardware GPU, MIAOW (Many-core Integrated Accelerator of Wisconsin):

Rick Merritt of EE Times has written new articles on both the RISC-V ISA and the MIAOW GPU:

AMD releases clSPARSE library

Earlier this week, Kent Knox of AMD announced the beta release of a new library on their blog.

The clSPARSE library, created by AMD in partnership with Vratis Ltd., is an open source sparse linear algebra library that uses OpenCL(TM) to accelerate performance with GPU Compute. clSPARSE expands upon the existing clMathLibraries offerings: dense clBLAS (Basic Linear Algebra Subprograms), clFFT (Fast Fourier Transform) and clRNG (random number generator), and adds new sparse operations:

* Sparse matrix – dense vector multiply (SpM-dV)
* Sparse matrix – dense matrix multiply
* Iterative conjugate gradient (CG) solver
* Iterative biconjugate gradient stabilized (BiCGStab) solver
* Dense to Compressed Sparse Row (CSR) conversions (and converse)
* Coordinate list (COO) to CSR conversions (and converse)
* Functions to read matrix market files

clSPARSE contains optimized kernels that compute on matrices represented in CSR (Compressed Sparse Row) format. The library provides conversion routines to and from the CSR compressed matrix format, which is the required sparse matrix format for the SpM-dV multiply, CG, and BiCGStab solvers. clSPARSE exports a C interface which allows developers to build wrappers around clSPARSE in any language they need. This means users do not have to write sparse OpenCL kernels to gain the performance benefits of sparse GPU acceleration. OpenCL fluency is still required. The implementation is abstracted, allowing you to focus on memory placement and transport.
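
For readers who have not met CSR before, here is a minimal pure-Python sketch of a COO to CSR conversion and a sparse matrix times dense vector multiply over the CSR arrays. It runs on the CPU and does not use clSPARSE or its C interface at all; the function names `coo_to_csr` and `csr_matvec` are my own, chosen only to illustrate the data layout the announcement describes.

```python
def coo_to_csr(n_rows, rows, cols, vals):
    """Convert coordinate-list (COO) triples into CSR arrays.

    Returns (row_ptr, col_idx, data), where the entries of row i live in
    positions row_ptr[i] .. row_ptr[i + 1] - 1 of col_idx and data.
    """
    row_ptr = [0] * (n_rows + 1)
    for r in rows:
        row_ptr[r + 1] += 1              # count entries per row
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]     # prefix sum gives row start offsets

    col_idx = [0] * len(vals)
    data = [0.0] * len(vals)
    next_slot = row_ptr[:-1]             # slice copy: next free slot per row
    for r, c, v in zip(rows, cols, vals):
        slot = next_slot[r]
        col_idx[slot] = c
        data[slot] = v
        next_slot[r] += 1
    return row_ptr, col_idx, data


def csr_matvec(row_ptr, col_idx, data, x):
    """Multiply a CSR matrix by a dense vector x and return the dense result."""
    y = []
    for row in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += data[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

Each output row only touches its own slice of `data`, so rows can be computed independently, which is what makes the format a good fit for GPU kernels.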

This new AMD open source library is released under the Apache License, Version 2.0, and uses the CMake build tool.

More Information: