NOAA Scores $100 Million for ‘Rhea’ Research Supercomputer

Weather and climate simulation centers around the world are trying to find ways to combine traditional HPC simulation and modeling with AI-based prediction to create more accurate forecasts that reach further into the future, for both near-term weather and long-term climate.

The National Oceanic and Atmospheric Administration in the United States is no exception, but it is unique in that it has been allocated $100 million by Congress and the Biden administration, through the Bipartisan Infrastructure Law and the Inflation Reduction Act, to improve the research supercomputers at its HPC center in Fairmont, West Virginia.

That seems like a lot of money for an HPC system, and it is, but that $100 million represents only a small fraction of the money Congress has allocated to weather and climate issues through these two bills.

The Bipartisan Infrastructure Law, signed by President Biden in November 2021, allocates $108 billion over five years to support transportation and related infrastructure, including weather and climate prediction and mitigation. NOAA is allocated three separate tranches of funding under the law: $904 million for climate and data services, $1.47 billion for natural infrastructure projects to rebuild U.S. coastlines, and $592 million to manage fisheries and other protected resources.

The Inflation Reduction Act also earmarked $2.6 billion for rebuilding and strengthening coastal communities and $200 million for climate data and services.

NOAA has both production and research systems, which are funded separately, and General Dynamics Information Technology has been the prime contractor in recent procurement rounds. NOAA’s Weather and Climate Operational Supercomputing System (WCOSS) got its latest pair of machines, the twin HPC systems nicknamed “Cactus” and “Dogwood,” in June 2022. These drive the national and regional weather forecasts for the National Weather Service, which in turn relays forecast information to AccuWeather, The Weather Channel, and numerous other weather services.

The new “Rhea” supercomputer will be installed at the NOAA Environmental Security Computing Center (NESCC) in Fairmont, which operates under the auspices of the Research and Development HPC System (RDHPCS) office.

The Rhea system will be the successor to the “Hera” system, which was installed at the Fairmont data center in June 2020. Hera is a Cray CS system with a total of 63,840 cores and a peak FP64 performance of 3.27 petaflops, 2 petaflops of which comes from GPU accelerators. The Hera system was also integrated by General Dynamics and is one of five research supercomputers operated by NOAA across its HPC centers in Fairmont; Boulder, Colorado; Princeton, New Jersey; Oak Ridge, Tennessee; and Starkville, Mississippi (at Mississippi State University).

At the time, four years ago, the combined performance of those five research centers was 17 petaflops, and most of that capacity was driven by CPUs with a healthy dose of GPU nodes inside or alongside those clusters. Hera has a Lustre file system built by DataDirect Networks that has a capacity of 18.5PB.

According to a statement from NOAA, Rhea will add about 8 petaflops of total compute (presumably at FP64 precision) to the agency’s current capacity of about 35 petaflops across those five research facilities. We’ve learned from sources inside NOAA that another machine, nicknamed “Ursa,” is expected to be installed sometime this winter, and that its timing will depend on how NOAA handles the supply chain and the new data center being built in Fairmont (a town founded by some ambitious New Jersey and Delaware settlers in 1774 at the confluence of Prickett’s Creek and the Monongahela River, just a few miles down Interstate 79 from Prickett’s Fort State Park). Once Rhea and Ursa are installed, NOAA will have a total capacity of about 48 petaflops.
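For those keeping score at home, here is a quick back-of-envelope tally in Python of what those figures imply; the numbers are NOAA’s, but the arithmetic, and the inference that Ursa weighs in at roughly 5 petaflops, is ours:

```python
# Rough capacity tally implied by NOAA's stated figures (our arithmetic,
# not an official breakdown).
current_pf = 35.0           # existing research capacity across the five centers
rhea_pf = 8.0               # what NOAA says Rhea will add
total_with_ursa_pf = 48.0   # NOAA's stated total once Rhea and Ursa are in

after_rhea_pf = current_pf + rhea_pf                  # ~43 petaflops
implied_ursa_pf = total_with_ursa_pf - after_rhea_pf  # ~5 petaflops

print(f"After Rhea: {after_rhea_pf:.0f} PF; implied Ursa contribution: {implied_ursa_pf:.0f} PF")
```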

The Fairmont data center was built in 2010, and as part of the $100 million allocation, GDIT is subcontracting with iM Data Centers to build a modular data center at NESCC. The new site at NESCC will be designed to accommodate additional modular data centers, and we assume the machines will be interconnected as needed.

NOAA has been vague about the performance of the Rhea machine, and for good reason. The machine will be built next year and will hopefully be operational by fall 2025. But as we all know, supply chains are still a bit wonky, and parts can change or be in short supply, delaying projects. The folks at NESCC don’t want to jinx anything or make too many promises.

What we can tell you is that Rhea will be based on servers from Hewlett Packard Enterprise (likely Cray CS chassis and racks like the current Hera system, although NOAA isn’t saying), and the primary compute nodes will be two-socket servers using 96-core versions of AMD’s upcoming “Turin” Epyc 9005 processors. Those Rhea nodes will have 768 GB of memory, which is pretty good for an HPC system, and will be interconnected with a 200 Gb/sec NDR Quantum-2 InfiniBand fabric from Nvidia, configured in a fat tree topology with full bisection bandwidth.

A subset of the Rhea nodes will be equipped with Nvidia “Hopper” H100 GPUs with 96GB of main memory. NOAA is not being specific about how the flops will break down with Rhea. But with Hera, the ratio of CPU to GPU compute was 1.27 to 2, and if that ratio holds, Rhea will deliver 3.1 petaflops of Turin CPU compute and 4.9 petaflops of H100 GPU compute at FP64 precision.
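For the curious, here is a minimal sketch of that ratio arithmetic, under the assumption (ours, not NOAA’s) that Rhea’s roughly 8 petaflops splits between CPUs and GPUs in the same 1.27:2 proportion as Hera:

```python
# Split Rhea's ~8 PF of FP64 compute using Hera's CPU:GPU ratio of 1.27:2
# (an assumption carried over from Hera, not a NOAA statement).
rhea_total_pf = 8.0
hera_cpu_pf, hera_gpu_pf = 1.27, 2.0
hera_total_pf = hera_cpu_pf + hera_gpu_pf  # Hera's 3.27 PF peak

rhea_cpu_pf = rhea_total_pf * hera_cpu_pf / hera_total_pf  # ~3.1 PF from Turin CPUs
rhea_gpu_pf = rhea_total_pf * hera_gpu_pf / hera_total_pf  # ~4.9 PF from H100 GPUs

print(f"Estimated Rhea split: {rhea_cpu_pf:.1f} PF CPU, {rhea_gpu_pf:.1f} PF GPU")
```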

For Rhea’s storage, NOAA is tapping DDN for a Lustre parallel file system, as it did for the Hera system. And like many HPC shops these days, NOAA will also be test-driving a relatively small all-flash NFS array from Vast Data.

It’s not clear exactly how the $100 million given to NOAA under the two laws mentioned above was allocated. We assume this funding includes the cost of the Rhea system, the modular data center that will house it, and the construction of the data center site for the several modular data centers that will eventually be built there. It also likely includes electricity and cooling for a certain number of years, and may also include the cost of the Ursa system. We think $100 million could buy a decent amount of flops in CPUs and GPUs, and we can only hope that NOAA can do the hard work of integrating AI into weather and climate models to make these predictions better in terms of resolution and reach into the future.

That’s what supercomputers are for, and we think NOAA needs a lot more money to build bigger clusters to push the boundaries of weather and climate modeling. Still, in an uncertain economic and political world, this is a good start.
