AsianScientist (Jul. 19, 2019) – The Tianhe-1A supercomputer made history when it was unveiled in 2010 as the first Chinese supercomputer to top the TOP500 list of the world’s fastest supercomputers. For the first time since the biannual list was first published in 1993, a Chinese supercomputer was recognized as the most powerful in the world, displacing long-time leaders like the US and Japan.
In the decade since, continued Chinese dominance of the TOP500 list has proven that the feat was no flash in the pan but a milestone in China’s high performance computing (HPC) journey. Tianhe-1A’s achievement was repeated in 2013 by its successor—Tianhe-2—and once again in 2016 by Sunway TaihuLight, enabling China to claim bragging rights as owner of the world’s most powerful supercomputer for five years in a row.
While China’s rise might appear unusually rapid—the country did not have a system powerful enough to make it into the top ten until 2010 and only began appearing in the TOP500 at all in 2001—its current supercomputing prowess is actually the result of decades of work behind the scenes.
To understand how Chinese supercomputing has developed and where it is headed in the future, we spoke to Professor Lu Yutong, director of the National Supercomputing Center in Guangzhou, China, at the SupercomputingAsia conference held in Singapore from March 11-14, 2019.
From generation to generation
One of the few people to have extensive first-hand knowledge of the successive generations of Chinese supercomputers, Lu traces her academic lineage to the National University of Defense Technology (NUDT), home to China’s first supercomputer—Yinhe-1—which was launched in 1983. As a computer science undergraduate, Lu worked on compiler software for Yinhe-2, developing a taste for writing code and going on to complete both her master’s and doctoral degrees at the same institution.
“In a way, I’ve grown up with the Chinese domestic supercomputer,” Lu shared.
At NUDT, Lu focused on designing and implementing supercomputers, rising through the ranks to become the design director of the Tianhe-1A and Tianhe-2 systems, both of which held number-one positions on the TOP500 list. Apart from cementing China’s place in the international supercomputing scene, the Tianhe series also represented a significant shift in supercomputer architecture trends—from relying solely on general-purpose central processing units (CPUs) to a mix of CPUs and accelerators.
“The Tianhe-1 was the first heterogeneous large-scale supercomputer in the world, using a mix of CPUs and graphics processing units (GPUs),” Lu explained. “At that time, many scholars asked me why we chose the heterogeneous approach as it was relatively hard to use. We did indeed have a difficult time optimizing the applications on Tianhe-1 as the GPU part was not as mature as it is today; we had to work more on the software, libraries and data transfer optimization.”
“Supercomputer design, however, is a trade-off,” she continued, explaining how using a heterogeneous architecture allowed the Tianhe team to achieve performance while balancing equally important considerations such as power consumption, system cost and footprint.
Trends in the TOP500 have since validated their approach, Lu said, with heterogeneous systems now making up the majority of the most powerful systems on the list in the last ten years.
China’s remarkable progress on the Tianhe systems did not go unnoticed by the international community, and not all reactions were positive. Alarmed by the country’s newfound potential to use supercomputers for the design of nuclear weapons, in 2015 the US banned the sale of Intel chips to China. However, rather than stymying China’s advance, the move is largely considered to have backfired, accelerating the development of domestically-produced chips instead.
Just one year after the Intel ban, China’s Sunway TaihuLight stormed to first place on the TOP500 list with a peak performance of 125 petaFLOPS, more than double the previous record holder Tianhe-2’s 54 petaFLOPS. But more than its impressive performance, Sunway TaihuLight stood out for being a fully made-in-China or ‘self-controllable’ machine, featuring not only custom interconnects and memory but also the Shenwei SW26010 processor that was designed and fabricated domestically. The message was clear: China was determined to succeed at supercomputing, with or without the help of US chips.
That year, three out of six finalists for the Gordon Bell Prize—which honors the most outstanding HPC applications—were from China, and the highly coveted award was ultimately given to a team that ran a weather simulation over ten million cores of Sunway TaihuLight, definitively demonstrating that it was no mere ‘stunt machine’ but a fully functional, even prize-winning, one.
This application-centric focus has also guided Lu in her role as director of the National Supercomputing Center in Guangzhou. For her, while designing and deploying increasingly powerful systems remain important, developing a flexible and easy-to-use interface is a top priority.
“Many of our users are not computer science experts but domain-specific scientists so they may not be very familiar with HPC programming models such as message passing interface (MPI),” Lu said. “We have tried to design a platform that can be used by scientists and even industry users, providing some domain-specific custom applications to help them.”
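To give a flavor of why MPI’s explicit message-passing style can feel unfamiliar to domain scientists, here is a rough analogy—not actual MPI—that mimics the point-to-point send/receive pattern using Python’s standard multiprocessing module. The worker plays the role of a second ‘rank’ that receives a chunk of data, computes on it and sends a partial result back.

```python
# A rough analogy to MPI-style point-to-point message passing,
# sketched with Python's standard multiprocessing module (not real MPI).
from multiprocessing import Process, Pipe

def worker(conn):
    # "Rank 1": receive a chunk of data, compute, send the result back.
    chunk = conn.recv()       # analogous to MPI_Recv
    conn.send(sum(chunk))     # analogous to MPI_Send
    conn.close()

def main():
    parent, child = Pipe()
    p = Process(target=worker, args=(child,))
    p.start()
    # "Rank 0": distribute work, then gather the partial result.
    parent.send([1, 2, 3, 4])  # analogous to MPI_Send
    partial = parent.recv()    # analogous to MPI_Recv
    p.join()
    print(partial)             # prints 10

if __name__ == "__main__":
    main()
```

Even in this toy form, the programmer must reason explicitly about which process holds which data and when messages are exchanged—exactly the burden that platforms like Lu’s aim to hide behind domain-specific applications.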
China’s three-horse race to exascale
China’s reign at the peak of the TOP500 has recently been disrupted by the US, whose Summit and Sierra systems now hold first and second position respectively on the latest edition of the list. Together with the Perlmutter supercomputer, which is expected to be deployed at the Lawrence Berkeley National Laboratory in 2020, Summit and Sierra are stepping stones towards the goal of exascale supercomputers, machines that can perform a billion billion (10¹⁸) calculations per second. To date, the US has set aside US$430 million for its Exascale Computing Project, and at least US$400 million for the Aurora exascale supercomputer, expected to come online in 2021.
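To put the exascale target in perspective, a little unit arithmetic using the figures cited in this article shows how large the remaining gap is:

```python
# Simple unit arithmetic comparing exascale with systems named in the article.
PETA = 10**15
EXA = 10**18

taihulight_peak = 125 * PETA  # Sunway TaihuLight peak performance in FLOPS
tianhe2_peak = 54 * PETA      # Tianhe-2 peak as cited in the article

# An exascale machine is 8x TaihuLight's peak performance.
print(EXA // taihulight_peak)  # prints 8
```

In other words, an exascale system must deliver roughly eight times the peak performance of Sunway TaihuLight, the fastest machine China has fielded to date.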
China, too, has made considerable investments into developing exascale or e-class supercomputers, and is a serious contender in the race to exascale. Three teams—namely Sunway, Sugon and Tianhe—have successfully deployed pre-exascale prototypes, drawing on backgrounds in academia, industry and national supercomputing center operations, respectively. While the teams may vary in their architectural approaches, all three prototypes make use of self-controlled chips, underscoring the importance of self-reliance to the Chinese.
Lu leads the Tianhe team, which completed their prototype in July 2018. Installed at the National Supercomputing Center in Tianjin, the Tianhe prototype is a heterogeneous system that relies on an undisclosed processor paired with an improved version of the Matrix-2000 accelerator used in Tianhe-2A. The Sugon prototype is similarly heterogeneous, making use of a Hygon processor and accelerator, while the Sunway prototype features the SW26010 chips used in TaihuLight.
“Even in the US, Japan and Europe, people are still not sure which architecture will suit their priorities best, so we need different approaches,” Lu said. “For Tianhe, we will have more users, not only numerically but also in terms of the range of application areas. Our goal is to establish a general purpose exascale system rather than a specialized one, so we have a different technical approach from the other two teams.”
The strategy of investing in different architectures in parallel is designed to deepen and broaden Chinese expertise, with knowledge gleaned from all three approaches expected to contribute to the final exascale system to be deployed between 2020 and 2021. “Nothing will be wasted,” Lu added.
The future is intelligent
Regardless of which approach ultimately prevails in China, the US, Japan or Europe, all signs suggest that key applications for future exascale systems are big data and artificial intelligence (AI). Supercomputers have traditionally been used for HPC applications such as climate modeling and scientific computation, and while these workloads will likely remain a mainstay of supercomputing centers around the world, there is increasing convergence between HPC and big data/AI applications.
Last year, Japan launched the AI Bridging Cloud Infrastructure (ABCI), a supercomputer purpose-built for AI and machine learning applications on the cloud. Similarly, the Aurora exascale supercomputer has been earmarked for AI projects in neuroscience and personalized medicine, among others. China, meanwhile, has identified deep learning applications such as tumor diagnosis and video analytics as key focus areas.
“Every country wants to get to exascale not just to be the first but because of the requirements of scientific research, technological innovation and industry; the world needs computing power. In other words, exascale is not the end goal but a means to an end,” Lu said.
“Over the past 40 years we have seen very clearly how supercomputing leads technology development for the whole IT industry,” she continued, citing the example of how China’s decades-long investment in supercomputing has paid innovation dividends for its domestic IT sector with success stories like Sugon, Inspur and Huawei.
Investment in supercomputing capabilities is particularly strategic for Asia, Lu added, noting that Asia has a high demand for HPC simply by virtue of its large population.
“I hope to see more collaboration between Asian countries, particularly more economically developed ones like Japan, Korea and Singapore; we need to work together to boost the HPC community in Asia.”
This article was first published in the print version of Supercomputing Asia, July 2019.
Copyright: Asian Scientist Magazine.
Disclaimer: This article does not necessarily reflect the views of AsianScientist or its staff.