Joshua Barczak has a new blog post on reverse engineering an Intel Haswell GPU, with sample code on GitHub.
[…]

Why Did I Go To All This Trouble?

There are a couple of reasons why this is more than just a pointless exercise in reverse engineering. This hardware contains a lot of goodies that the graphics APIs simply do not expose, and I’m wondering if I can exploit any of them to demonstrate cool stuff. I obviously can’t do anything in a shipping product, but perhaps we’ll find ways of using the hardware that aren’t anticipated by current APIs and programming models.

There is a lot of unexposed functionality in this architecture.

You can send messages back and forth between threads. Threads can spawn child threads and read/write their register files using message passing. I don’t know whether all the right bits are set by the GL driver to enable this, but if it works, it might be fun to experiment with.

You can mix SIMD modes. Use SIMD4 instructions in one cycle and SIMD8/SIMD16 in another. You can use swizzles or register movs to do all sorts of cross-lane communication at very little cost. You can do warp-level programming, where each thread is one 16-wide instruction stream instead of 16 one-wide streams. You can switch modes multiple times within the same thread if you like.

As some of my Intel friends like to point out, on this hardware, every thread has 4KB of register space. The register file in total is roughly the size of the cache. There’s no “use fewer registers to boost occupancy”, the occupancy is just there.

There is also GPR indexing, and unlike on other architectures it is actually potentially useful. Hardware threads can implement small fixed-size data structures entirely in their register files without it totally sucking.

[…]
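To make the cross-lane point in the quote a little more concrete, here is a rough CPU-side model in Python. It is not Haswell ISA and is not taken from Barczak's sample code. The names `swizzle`, `simd_add`, and `butterfly_sum` are hypothetical, and a 16-element list stands in for one SIMD16 register. The sketch treats a thread as a single 16-wide instruction stream and sums all lanes in four swizzle-plus-add steps.

```python
LANES = 16  # one SIMD16 register, modeled here as a plain Python list (assumption)


def swizzle(reg, pattern):
    """Return a new register whose lane i holds reg[pattern[i]]."""
    return [reg[src] for src in pattern]


def simd_add(a, b):
    """Lane-wise add, standing in for a single SIMD-wide instruction."""
    return [x + y for x, y in zip(a, b)]


def butterfly_sum(reg):
    """Sum all lanes in log2(len(reg)) swizzle+add steps.

    Each step pairs lane i with lane i XOR step, so after the last step
    every lane holds the total. Assumes len(reg) is a power of two.
    """
    width = len(reg)
    step = 1
    while step < width:
        partner = [lane ^ step for lane in range(width)]
        reg = simd_add(reg, swizzle(reg, partner))
        step *= 2
    return reg


values = list(range(LANES))      # lanes hold 0, 1, ..., 15
result = butterfly_sum(values)   # every lane now holds 120
```

The reduction needs four steps rather than fifteen because each step operates on the whole register at once, which is the practical difference between one 16-wide stream and 16 independent one-wide streams.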