Open-sourcing Sonar, a new extensible debugging tool
One challenge that comes from having many engineers working collaboratively on larger apps is that typically no single person knows how every module works. This segmentation of knowledge and expertise can make it difficult to develop new features, investigate bugs, or optimize performance. To help engineers at Facebook manage this complexity, we built Sonar, an extensible cross-platform debugging tool. Sonar gives us a surface where framework experts and developers can convey important information to framework users. Now we are adding functionality and sharing Sonar as an open source project to help others accelerate the app development process. With Sonar, engineers have a highly flexible, intuitive way to inspect and understand the structure and behavior of their iOS and Android applications. We believe Sonar improves on current tools by providing a more visual and interactive experience that is extensible to fit engineers’ specific needs.[…]
Manager, Firmware (Oculus)
As a Firmware Engineering Manager at Oculus you will lead, manage, and inspire engineering teams developing next-generation platforms for virtual reality. Firmware for VR systems spans multiple target classes, requires deep collaboration across engineering disciplines and the full software stack (from content to RTL), and directly impacts user immersion. You’ll guide architecture and delivery of highly performant and reliable firmware across multiple platforms and product lines. The ideal candidate will have deep embedded system technical knowledge along with a passion for building top teams who deliver great consumer products focused on incredible customer experiences.[…]
I only recently learned about Facebook’s osquery project. If you have not looked at it, it is fairly impressive.
Mike Arpaia and Ted Reed of Facebook have post on Facebook infrastructure, and they include firmware in their coverage of infrastructure testing:
In late 2014, we released osquery to the open source community. It’s now an increasingly important element of maintaining insight into the security of Facebook infrastructure. As such, it’s held to incredibly strict security standards to ensure we’re not introducing new vulnerabilities into our network. We also committed to a high standard of code quality when we open-sourced it because we want to build a community of trust with a secure software development ecosystem. In this same vein, we believe it’s important for people who use osquery to know what we do to keep it secure. […]
Ted Reed of Facebook — aka the Teddy Reed who creates UEFI Firmware Parser and related tools — posted a VERY GOOD article on how Facebook defends systems against hardware and firmware attacks, including coverage of Facebook’s osquery tool, and his recent Usenix Enigma presentation. Excerpt of introduction (with whitespace editing by me, sorry):
Hardware and Firmware Attacks: Defending, Detecting, and Responding
The attack landscape for firmware is maturing and needs more attention from defense and detection communities. Recent examples of firmware attacks include the Equation Group’s attacks on drive firmware, Hacking Team’s commercialized EFI RAT, Flame, and Duqu. Simple tools like osquery give defenders important insights about what’s happening on their network so they can quickly detect a potential compromise. Facebook released osquery as an open source project in 2014. Facebook recently added hardware monitoring to osquery, which already aids security teams in vulnerability management, incident response, OS X attacks, and IT compliance. Firmware on commodity laptops and servers is interesting to me as a security engineer for several reasons. This code often bootstraps trust protocols and protective architecture primitives. At the same time, it is a target for vulnerabilities aimed at bypassing those exact controls to unlock, jailbreak, and homebrew — for either good or malicious purposes. Firmware is also a vector for virtualization escapes, hypervisor attacks, and extreme persistence. That risk is magnified by the same fragmentation problem plaguing Android devices, but with an even more complex ecosystem of developers and supported devices. Recent examples of firmware attacks include the Equation Group’s attacks on drive firmware, Hacking Team’s commercialized EFI RAT, Flame, and Duqu. Trammell Hudson’s Thunderstrike-style local system takeover is fast and effective. Drew Suarez’s demonstrations of firmware flashing of Android devices take four seconds of a distracted local user’s attention. Additionally, Computrace has used a UEFI DXE driver capable of injecting a RAT onto unencrypted NTFS partitions for several years. All of this makes firmware security critical for protecting your enterprise. This week, I shared recent work on firmware security at the Enigma 2016 Conference, hosted by USENIX. Since releasing osquery to open source in 2014, I’ve been using it to explore new ways to recognize vulnerable systems and potential compromise. Defensive security professionals should begin scoping firmware components and use simple tools like osquery to gather insight and signal from their corporate network. […]
I’ve not used Facebook’s osquery before, so I have a lot of catching up to do. ;-(
Big Sur is our newest Open Rack-compatible hardware designed for AI computing at a large scale. In collaboration with partners, we’ve built Big Sur to incorporate eight high-performance GPUs of up to 300 watts each, with the flexibility to configure between multiple PCI-e topologies. Leveraging NVIDIA’s Tesla Accelerated Computing Platform, Big Sur is twice as fast as our previous generation, which means we can train twice as fast and explore networks twice as large. And distributing training across eight GPUs allows us to scale the size and speed of our networks by another factor of two.
In addition to the improved performance, Big Sur is far more versatile and efficient than the off-the-shelf solutions in our previous generation. While many high-performance computing systems require special cooling and other unique infrastructure to operate, we have optimized these new servers for thermal and power efficiency, allowing us to operate them even in our own free-air cooled, Open Compute standard data centers. Big Sur was built with the NVIDIA Tesla M40 in mind but is qualified to support a wide range of PCI-e cards. We also anticipate this will achieve efficiencies in production and manufacturing, meaning we’ll get a lot more computational power per dollar we invest.
Servers can also require maintenance and hefty operational resources, so, like the other hardware in our data centers, Big Sur was designed around operational efficiency and serviceability. We’ve removed the components that don’t get used very much, and components that fail relatively frequently — such as hard drives and DIMMs — can now be removed and replaced in a few seconds. Touch points for technicians are all Pantone 375 C green, the same touch-point color as all of Facebook’s custom data center hardware, which allows technicians to intuitively identify, access and remove parts. No special training or service guide is really needed. Even the motherboard can be removed within a minute, whereas on the original AI hardware platform it would take over an hour. In fact, Big Sur is almost entirely toolless — the CPU heat sinks are the only things you need a screwdriver for.
Collaboration through open source
We plan to open-source Big Sur and will submit the design materials to the Open Compute Project (OCP). Facebook has a culture of support for open source software and hardware, and FAIR has continued that commitment by open-sourcing our code and publishing our discoveries as academic papers freely available from open-access sites. We’re very excited to add hardware designed for AI research and production to our list of contributions to the community.
We want to make it a lot easier for AI researchers to share techniques and technologies. As with all hardware systems that are released into the open, it’s our hope that others will be able to work with us to improve it. We believe that this open collaboration helps foster innovation for future designs, putting us all one step closer to building complex AI systems that bring this kind of innovation to our users and, ultimately, help us build a more open and connected world.
I just learned about Facebook’s OpenBMC, thanks to Sai Dasari of Facebook, who just posted a message to the Open Compute Project’s hardware management list, talking about DMTF Redfish and Facebook’s OpenBMC.
OpenBMC is an open software framework to build a complete Linux image for a Board Management Controller (BMC).
When we were developing Facebook’s top-of-rack “Wedge” switch, we followed our usual process in the beginning; our partner was responsible for developing the BMC software. However, in the first months of the project, many requirements for the BMC software emerged, introducing extra complexity, coordination, and delays into the BMC software-development process. To address these challenges, at one of Facebook’s hackathon events, four engineers worked to create our own BMC software. Within 24 hours, we were able to build a minimum BMC software image, including an SSH server and the ability to change fan speed, power-on the host CPU, and blink some LEDs. It was far from a production image, but it gave us a strong confidence that we could eventually develop our own BMC software for “Wedge.” Fast-forward eight months, and we’ve deployed our solution — code-named “OpenBMC” — into production along with Wedge. And today we’re sharing OpenBMC with the open source community in the hope that we can collaborate based on this open software framework for next-generation system management.