By: George Clement
Whether deployed on premises, in the cloud, or at the edge, high performance computing (HPC) solutions are now transforming a wide range of industries, including financial services, healthcare, manufacturing, oil and gas, and research and educational institutions.
Often integrating artificial intelligence (AI) and machine learning technologies, HPC can process data and perform complex calculations at quadrillions of operations per second. Today, the data science applications, large-scale analytics, and recommendation engines driven by HPC are helping to detect credit card fraud, make faster and more accurate patient diagnoses, and help scientists find sources of renewable energy.
Indeed, HPC is more vital than ever as both public and private organizations seek faster paths to solving their toughest technical and research challenges. According to Grand View Research, the global HPC market is expected to grow at a compound annual growth rate (CAGR) of 6.5 percent from 2020 to 2027, reaching $53.6 billion, up from $39.1 billion in 2019. There was a time when HPC systems were used primarily by the aerospace and navigation industries. However, the growing adoption of cloud computing, continuous developments in AI, and the rising need for business analytics have prompted a broad range of industries to adopt HPC systems.
While HPC is undoubtedly powering game-changing projects, HPC environments present several formidable challenges in the data center. Chief among these are power requirements, which can mean high energy costs. HPC also requires denser banks of compute resources to increase capacity and reduce latency while minimizing floorspace. An HPC cluster can comprise hundreds or even thousands of compute servers, and to avert unplanned downtime, special consideration must be given to future-proofing power availability.
Running a high-power-density HPC deployment also generates significant heat. In fact, due to the size and density of the compute workloads, servers in an HPC environment can run approximately 30 percent hotter than traditional computing platforms. Especially in an older data center where the cooling systems were designed to accommodate significantly lower power densities, HPC will expose ineffective or insufficient cooling capacity.
While large cloud and hyperscale data centers utilize ultra-efficient cooling systems, at legacy facilities cooling alone can account for 30 to 40 percent of a data center's power costs.
The common response to insufficient cooling capacity is to overcool the facility. However, this wastes electrical energy and expands the data center's carbon footprint.
Now that we understand some of the challenges that HPC presents to the data center, let’s take a look at how one world-renowned research institute was able to gain real-time power, thermal, and utilization analysis in its high-performance computing environment, thereby improving server utilization and uptime.
The Institute for Health Metrics and Evaluation (IHME) is an independent global health research center at the University of Washington. IHME provides rigorous measurement of the world's most pressing health problems and evaluates the strategies used to address them. IHME makes this information freely available so that policymakers have the evidence they need to make informed decisions about how to allocate resources to best improve population health.
The IHME IT staff deployed Intel® Data Center Manager (Intel® DCM) to monitor more than 600 servers in its HPC data center environment at the university's colocation facility. Intel® DCM is a software solution that collects and analyzes real-time health, power, and thermal data from a variety of data center devices, providing the clarity needed to improve data center reliability and efficiency.
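To make the idea of real-time power and thermal telemetry concrete, here is a minimal sketch of how such readings can be gathered out of band from server baseboard management controllers using the DMTF Redfish API. This is an illustration only, not Intel DCM's own implementation or interface; the host addresses, chassis ID, and credentials are placeholders, and the chassis resource path varies by server vendor.

```python
"""Illustrative sketch: poll per-server power draw and temperature sensors
over the DMTF Redfish API and print a simple summary. The BMC addresses,
chassis ID, and credentials below are hypothetical placeholders."""

import requests
from statistics import mean

BMC_HOSTS = ["10.0.0.11", "10.0.0.12"]   # hypothetical BMC addresses
CHASSIS_ID = "1"                          # vendor-specific; often "1" or "Self"
AUTH = ("monitor", "password")            # placeholder credentials


def read_power_watts(bmc: str) -> float:
    """Total power draw reported by the chassis Power resource."""
    url = f"https://{bmc}/redfish/v1/Chassis/{CHASSIS_ID}/Power"
    data = requests.get(url, auth=AUTH, verify=False, timeout=10).json()
    return sum(pc.get("PowerConsumedWatts", 0) for pc in data.get("PowerControl", []))


def read_temperatures_c(bmc: str) -> list[float]:
    """All temperature sensor readings from the chassis Thermal resource."""
    url = f"https://{bmc}/redfish/v1/Chassis/{CHASSIS_ID}/Thermal"
    data = requests.get(url, auth=AUTH, verify=False, timeout=10).json()
    return [t["ReadingCelsius"] for t in data.get("Temperatures", [])
            if t.get("ReadingCelsius") is not None]


if __name__ == "__main__":
    for bmc in BMC_HOSTS:
        watts = read_power_watts(bmc)
        temps = read_temperatures_c(bmc)
        avg = mean(temps) if temps else float("nan")
        hottest = max(temps) if temps else float("nan")
        print(f"{bmc}: {watts:.0f} W, avg {avg:.1f} °C, hottest sensor {hottest:.1f} °C")
```

A monitoring platform such as Intel DCM aggregates this kind of data continuously across hundreds or thousands of nodes, which is what makes fleet-wide analysis of hot spots and power headroom practical.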