Big Sur is our newest Open Rack-compatible hardware designed for AI computing at a large scale. In collaboration with partners, we’ve built Big Sur to incorporate eight high-performance GPUs of up to 300 watts each, with the flexibility to configure between multiple PCI-e topologies. Leveraging NVIDIA’s Tesla Accelerated Computing Platform, Big Sur is twice as fast as our previous generation, which means we can train twice as fast and explore networks twice as large. And distributing training across eight GPUs allows us to scale the size and speed of our networks by another factor of two.
In addition to the improved performance, Big Sur is far more versatile and efficient than the off-the-shelf solutions in our previous generation. While many high-performance computing systems require special cooling and other unique infrastructure to operate, we have optimized these new servers for thermal and power efficiency, allowing us to operate them even in our own free-air cooled, Open Compute standard data centers. Big Sur was built with the NVIDIA Tesla M40 in mind but is qualified to support a wide range of PCI-e cards. We also anticipate this will achieve efficiencies in production and manufacturing, meaning we’ll get a lot more computational power per dollar we invest.
Servers can also require maintenance and hefty operational resources, so, like the other hardware in our data centers, Big Sur was designed around operational efficiency and serviceability. We’ve removed the components that don’t get used very much, and components that fail relatively frequently — such as hard drives and DIMMs — can now be removed and replaced in a few seconds. Touch points for technicians are all Pantone 375 C green, the same touch-point color as all of Facebook’s custom data center hardware, which allows technicians to intuitively identify, access and remove parts. No special training or service guide is really needed. Even the motherboard can be removed within a minute, whereas on the original AI hardware platform it would take over an hour. In fact, Big Sur is almost entirely toolless — the CPU heat sinks are the only things you need a screwdriver for.
Collaboration through open source
We plan to open-source Big Sur and will submit the design materials to the Open Compute Project (OCP). Facebook has a culture of support for open source software and hardware, and FAIR has continued that commitment by open-sourcing our code and publishing our discoveries as academic papers freely available from open-access sites. We’re very excited to add hardware designed for AI research and production to our list of contributions to the community.
We want to make it a lot easier for AI researchers to share techniques and technologies. As with all hardware systems that are released into the open, it’s our hope that others will be able to work with us to improve it. We believe that this open collaboration helps foster innovation for future designs, putting us all one step closer to building complex AI systems that bring this kind of innovation to our users and, ultimately, help us build a more open and connected world.