new Windows PDB tool: pdb_type_theft.py

As pointed out by ZDI, Dustin Childs of HP Security Research (HPSR) wrote an article on Windows binaries and symbols. It describes how some type information is missing from the current symbol files (PDBs), and how he wrote a new tool, pdb_type_theft.py, to copy the missing information from older PDBs.

In August of this year, Microsoft published an update to NTDLL and along with it, released updated symbols for debugging. These symbols are available as PDBs (program databases). Unfortunately, the symbols that were released contain type information that is missing standard structures and enumerations. As a result, debugging applications on Windows became a far more involved task. Microsoft is aware of the issue but has yet to release updated PDBs that rectify this issue. While they are working on it, I found myself wondering if I could avoid their involvement altogether. Barring any changes to the structures and enumerations, the information from previous versions of the PDBs should still be valid. As such, if I could copy the type information from a previous PDB and inject it into the current PDB, I’d theoretically be able to have everything I expect from a working build process. […] This script requires having a PDB with the type information you want available to copy into another PDB.  If you are not in the habit of snapshotting your VMs after every update, the following links may be helpful […]

Full article and source:
http://community.hpe.com/t5/Security-Research/PDB-Type-Theft/ba-p/6801065
https://github.com/thezdi/scripts/blob/master/pdb_type_theft.py

(If you’ve read a few blog entries, you know that I misspell things a lot. Sorry. The other day, Microsoft finally made the PDB spec public, and I blogged on it, calling it “PDF”. Sigh.)

Microsoft publishes PDF file format

Microsoft executables use a symbol format that had not been publicly documented; it was kept close to the C/C++ compiler team.

https://github.com/Microsoft/microsoft-pdb

Microsoft symbols can be embedded in the image or stored in a separate “sidecar” file. This spec will help tool developers understand the symbols for the code. Microsoft does not ship symbols for all of its code; many are stripped before shipping. Once LLVM clang or GCC supports proper Windows symbols, those compilers can finally become “first-class citizens” on the Windows platform, where the Windows system debugger will recognize their symbols, and the outdated C89-centric Microsoft C will no longer be needed to do Windows development! It also means reverse-engineering tools now have the potential to find more information about Windows apps/drivers, if they haven’t already reversed the format earlier.
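For a taste of what the sidecar file looks like on disk, here is a minimal Python sketch that checks for the MSF 7.00 container signature that opens a modern PDB and decodes the six 32-bit fields that follow it. The field names follow the LLVM project's write-up of the format, not necessarily the names used in Microsoft's repository, and parse_msf_superblock is a made-up helper name, so treat this as an illustration rather than a reference parser.

```python
import struct

# 32-byte signature that opens an MSF 7.00 container (the on-disk wrapper of modern PDBs).
MSF7_MAGIC = b"Microsoft C/C++ MSF 7.00\r\n\x1aDS\x00\x00\x00"

# Field names are borrowed from LLVM's description of the format (an assumption,
# not Microsoft's own naming).
SUPERBLOCK_FIELDS = (
    "block_size",
    "free_block_map_block",
    "num_blocks",
    "num_directory_bytes",
    "unknown",
    "block_map_addr",
)


def parse_msf_superblock(data):
    """Return the superblock fields as a dict, or None if data is not MSF 7.00.

    Hypothetical helper: it only looks at the first 56 bytes and does not
    follow the block map or read any streams.
    """
    needed = len(MSF7_MAGIC) + 4 * len(SUPERBLOCK_FIELDS)
    if len(data) < needed or not data.startswith(MSF7_MAGIC):
        return None
    # Six little-endian unsigned 32-bit integers follow the signature.
    values = struct.unpack_from("<6I", data, len(MSF7_MAGIC))
    return dict(zip(SUPERBLOCK_FIELDS, values))
```

To try it on a real file, read the first 56 bytes of a .pdb and pass them in; anything that is not an MSF 7.00 container comes back as None.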

UEFI uses the Microsoft executable tools up until the last second, when PE images are converted to TE images. Terse Executables are a slight variation on PE images, more suited for firmware. I am not sure how this new symbol spec will impact UEFI, if at all.
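To make the PE versus TE difference concrete, here is a small Python sketch that tells the two apart by their leading signatures and unpacks the fixed part of a TE header. The layout is my reading of EFI_TE_IMAGE_HEADER from the UEFI Platform Initialization spec and EDK II (a "VZ" signature followed by a trimmed-down set of PE fields); classify_image and parse_te_header are hypothetical names, and the sketch does not validate sections or data directories.

```python
import struct

# Fixed part of the TE header (24 bytes); two 8-byte data directory entries
# follow it, for 40 bytes in total. Layout assumed from EFI_TE_IMAGE_HEADER.
TE_HEADER = struct.Struct("<2sHBBHIIQ")
TE_FIELDS = (
    "signature",
    "machine",
    "number_of_sections",
    "subsystem",
    "stripped_size",
    "address_of_entry_point",
    "base_of_code",
    "image_base",
)


def classify_image(data):
    """Return 'TE', 'PE', or 'unknown' based on the leading signatures."""
    if data[:2] == b"VZ":
        return "TE"
    if data[:2] == b"MZ" and len(data) >= 0x40:
        # e_lfanew at offset 0x3C points at the 'PE\0\0' signature.
        (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
        if data[pe_offset:pe_offset + 4] == b"PE\x00\x00":
            return "PE"
    return "unknown"


def parse_te_header(data):
    """Unpack the fixed TE header fields into a dict, or return None."""
    if len(data) < TE_HEADER.size or data[:2] != b"VZ":
        return None
    return dict(zip(TE_FIELDS, TE_HEADER.unpack_from(data, 0)))
```

The stripped_size field records how many bytes of the original PE headers were removed during conversion, which is a big part of why TE images come out smaller than the PE images they start from.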