AsianScientist (Jan. 25, 2018) – In the relatively staid world of supercomputing, there is nothing quite like the twice-yearly announcement of the TOP500 list, the closest thing the community has to its own version of the Olympics. In June and November every year, the ranking of the world’s top supercomputers is released to an audience of thousands, including scores of journalists who instantly begin tweeting the results to those unlucky enough to miss out on the event.
For the last 25 years, the closely watched TOP500 list has kept the world informed about the latest developments in the fast-moving field of high performance computing (HPC). It might come as a surprise, then, to find out that for Professor Jack Dongarra, benchmarking is but a hobby.
“The whole benchmarking thing came about almost as an accident,” Dongarra told Supercomputing Asia. “When I’m talking to somebody outside of my field and they ask what hobbies I have, I would say, ‘I benchmark supercomputers’.”
Unpacking the LINPACK story
Computers on the TOP500 list are ranked according to how quickly they can solve a dense system of linear equations using the LINPACK software library, which, as the name suggests, is a package of linear algebra routines. LINPACK was developed over the course of three intense summers, during which Dongarra, Cleve Moler, Pete Stewart and James Bunch met at Argonne National Laboratory to thrash out what would eventually be included. In 1979, the group finally released LINPACK to the world, along with a manual to show people how to use the software.
Tucked into the back of that manual, in what first began as a series of hand-written scribbles, was an appendix that became the very first LINPACK benchmark.
“It was a table of performance numbers for one of the routines in LINPACK that I had the opportunity to run on maybe close to 20 different computers,” said Dongarra, who is currently the director of the Innovative Computing Laboratory at the University of Tennessee in the US. “The table grew as I maintained it over the years, adding to it whenever somebody came to Argonne and wanted to sell us a computer.”
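In modern terms, that measurement boils down to timing the solution of a dense n-by-n system of linear equations and converting the elapsed time into a floating-point rate using the standard operation count of 2n³/3 + 2n². The C sketch below illustrates the idea on a single node; it assumes the LAPACKE library is available and is only an illustration, not the distributed High Performance LINPACK code actually used to rank TOP500 machines.

```c
/* Illustrative sketch of the LINPACK metric: time the solution of a dense
 * n-by-n system Ax = b, then report (2/3*n^3 + 2*n^2) / time as GFLOP/s.
 * Assumes LAPACKE is installed; link with -llapacke -llapack -lblas. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <lapacke.h>

int main(void) {
    const int n = 2000;                      /* illustrative problem size */
    double *A = malloc((size_t)n * n * sizeof *A);
    double *b = malloc((size_t)n * sizeof *b);
    int *ipiv = malloc((size_t)n * sizeof *ipiv);
    if (!A || !b || !ipiv) return 1;

    srand(42);                               /* random, almost surely nonsingular */
    for (size_t i = 0; i < (size_t)n * n; i++) A[i] = rand() / (double)RAND_MAX;
    for (int i = 0; i < n; i++) b[i] = rand() / (double)RAND_MAX;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    int info = LAPACKE_dgesv(LAPACK_COL_MAJOR, n, 1, A, n, ipiv, b, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (info != 0) { fprintf(stderr, "dgesv failed: %d\n", info); return 1; }

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    double flops = 2.0 / 3.0 * n * (double)n * n + 2.0 * (double)n * n;
    printf("n = %d  time = %.3f s  rate = %.2f GFLOP/s\n",
           n, secs, flops / secs / 1e9);
    free(A); free(b); free(ipiv);
    return 0;
}
```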
By the 1990s, Dongarra’s list had hundreds of entries, allowing him to rank the different computers based on how they had performed on the LINPACK benchmark. Hans Meuer, a computer scientist at the University of Mannheim in Germany, maintained a similar list, ranking computers by their peak performance. In 1992, Meuer approached Dongarra to suggest combining their lists, and the first TOP500 list was published the following year.
From benchmark to bookends
In the quarter century since then, supercomputers have improved enormously. The Fujitsu Numerical Wind Tunnel at Japan’s National Aerospace Laboratory led the November 1993 list with a LINPACK performance of 124 gigaFLOPS. In comparison, the most powerful computer on the November 2017 list, China’s Sunway TaihuLight, delivered 93 petaFLOPS, roughly 750,000 times faster.
Despite these dramatic improvements, the LINPACK benchmark is still used to compare machines on the TOP500, and it does not look set to be replaced any time soon. Part of the reason it has stood the test of time, Dongarra said, is that it is easy to run and understandable to a broad community.
“But most importantly, it provides a historic reference point, giving us a good snapshot of supercomputing over the last 25 years. It allows us to look at trends and what the impact of various architectures has been,” he continued.
Nonetheless, LINPACK has its limitations. When it was designed nearly 40 years ago, floating-point arithmetic was very computationally expensive, making it the most important thing to optimize.
“Today, our computers are over-provisioned for floating-point calculations; the more important thing now is data movement,” Dongarra said.
To perform numerical operations on data, the data first has to be moved through the computer’s memory hierarchy: from main memory, through the different levels of cache, and finally into the registers where the arithmetic is actually performed. While a modern processor can execute arithmetic very quickly, completing roughly 32 floating-point operations every cycle, it can take several hundred clock cycles to move a piece of data from main memory into a register.
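To make that imbalance concrete, here is a minimal C sketch (an illustration added here, not part of the original article) that times a streaming kernel doing only two floating-point operations per array element while moving 24 bytes of data per element. On a typical machine the reported GFLOP/s figure falls far below the processor’s arithmetic peak, because memory bandwidth, not the floating-point units, sets the pace; the array size is chosen arbitrarily so the data cannot fit in cache.

```c
/* Minimal sketch: a streaming "triad" kernel, a[i] = b[i] + s*c[i].
 * Each element needs 2 flops but about 24 bytes of memory traffic, so the
 * measured rate reflects memory bandwidth rather than arithmetic speed. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 20000000L   /* ~160 MB per array, far larger than any cache */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];            /* 2 flops, ~24 bytes of traffic */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
    printf("GFLOP/s: %.2f   GB/s: %.2f\n",
           2.0 * N / secs / 1e9, 24.0 * N / secs / 1e9);
    fprintf(stderr, "%f\n", a[N / 2]);       /* keep the result live */
    free(a); free(b); free(c);
    return 0;
}
```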
A benchmark that captures how well a machine handles memory, and not just how quickly it performs floating-point operations, therefore gives a more accurate reflection of how a supercomputer copes with real-world problems. The high performance conjugate gradient (HPCG) benchmark is one such measure, and since the November 2017 edition it has featured on the TOP500 list alongside the LINPACK benchmark.
“We think of the LINPACK and HPCG as bookends, where LINPACK gives you a number that is very close to the peak performance of a system while the HPCG reflects the lower end,” Dongarra explained. “Your own application—which is the best benchmark of performance—will fit somewhere between those two points, most likely towards the bottom.”
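For a sense of why HPCG sits at the lower bookend, the sketch below implements a plain, unpreconditioned conjugate gradient loop in C on a simple 1D Laplacian, a deliberate simplification of the 3D, preconditioned problem the real HPCG benchmark solves. Every iteration is dominated by a sparse matrix-vector product, dot products and vector updates, all of which stream through large arrays while doing only a few floating-point operations per byte of data touched.

```c
/* Simplified conjugate gradient (CG) loop: the iteration pattern underlying
 * HPCG, here applied matrix-free to a 1D Laplacian (2 on the diagonal, -1
 * off it). Each step streams through several large vectors, so it is
 * memory-bound rather than compute-bound. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N 1000000L

static void spmv(const double *x, double *y) {   /* y = A*x */
    y[0] = 2.0 * x[0] - x[1];
    for (long i = 1; i < N - 1; i++)
        y[i] = -x[i - 1] + 2.0 * x[i] - x[i + 1];
    y[N - 1] = -x[N - 2] + 2.0 * x[N - 1];
}

static double dot(const double *u, const double *v) {
    double s = 0.0;
    for (long i = 0; i < N; i++) s += u[i] * v[i];
    return s;
}

int main(void) {
    double *x = calloc(N, sizeof *x), *r = malloc(N * sizeof *r);
    double *p = malloc(N * sizeof *p), *Ap = malloc(N * sizeof *Ap);
    if (!x || !r || !p || !Ap) return 1;

    for (long i = 0; i < N; i++) r[i] = p[i] = 1.0;   /* b = 1, x0 = 0 */
    double rr = dot(r, r);

    for (int it = 0; it < 200 && sqrt(rr) > 1e-8; it++) {
        spmv(p, Ap);                                  /* memory-bound matvec */
        double alpha = rr / dot(p, Ap);
        for (long i = 0; i < N; i++) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;
        for (long i = 0; i < N; i++) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        if (it % 50 == 0) printf("iter %d  residual %.3e\n", it, sqrt(rr));
    }
    free(x); free(r); free(p); free(Ap);
    return 0;
}
```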
Tracing the trends
Although the LINPACK benchmark no longer reflects how supercomputers are used today, Dongarra nonetheless feels that it provides a valuable perspective.
“We don’t want to lose that historic information, but we want to augment it with other kinds of benchmarks such as the HPCG benchmark.”
By allowing us to compare different computer architectures over the decades, for example, the LINPACK benchmark reveals some interesting trends. In the early 1990s, the list was dominated by vector-based machines, which delivered very high performance but were also very expensive. As cheaper, mass-produced or ‘commodity’ processors became more powerful, they gradually displaced vector computers and ushered in the era of parallel and distributed computing.
“Today, we see diverse architectures for HPC emerging. One is to use more and more commodity processors; the second is to offload the floating-point computation to an accelerator such as a graphics processing unit; and the third is to use what I call lightweight cores,” Dongarra said.
As important as these different hardware architectures are, Dongarra stresses that the software ecosystem for scientific simulation and computational modeling must be developed in tandem with the investments in building machines.
“At the application level, the science has to be captured in mathematical models, which in turn are expressed algorithmically and ultimately encoded as software,” he said. “This process also relies on a large infrastructure of mathematical libraries, protocols and system software that takes years to build up and must be maintained, ported and enhanced.”
The scientific problems that supercomputers were designed to solve require close collaboration between domain-specific scientists, computer scientists and applied mathematicians, Dongarra continued.
“To be able to run scientific applications on petascale systems with tens of thousands of processors, and extract all the performance that these platforms can deliver, demands that all parties involved work together to develop the necessary software.”
Scaling software to the next level
This brings us away from Dongarra’s hobby and back to his ‘day job,’ where he is a principal investigator for three of the 35 software development projects funded by the US Department of Energy’s Exascale Computing Project.
The first project, called Software for Linear Algebra Targeting Exascale (SLATE), is a next-generation linear algebra library that Dongarra and his team are developing to run efficiently on exascale systems, supercomputers at least ten times faster than China’s Sunway TaihuLight.
Parallel Runtime Scheduling and Execution Control (PaRSEC), on the other hand, is a runtime system that will allocate tasks from SLATE to the hardware components available on a supercomputer, a job complicated by the sheer number of ways in which the many tasks can be prioritized and executed.
Last but not least is the Exascale Performance Application Programming Interface (EXA-PAPI), an interface that tracks diagnostic information such as the use of memory bandwidth to help users understand how well their software is performing on the hardware.
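As an illustration of the kind of measurement involved, the sketch below uses the existing PAPI C library, which the Exa-PAPI project builds on, to count the floating-point operations executed around a small kernel; the same mechanism can be pointed at cache-miss or memory-traffic events where the processor exposes them. The event and kernel chosen here are illustrative assumptions rather than anything taken from the project itself.

```c
/* Sketch using the PAPI C API to count floating-point operations around a
 * kernel. PAPI_FP_OPS is a preset event that is not available on every CPU;
 * it is used here purely for illustration. Link with -lpapi. */
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

int main(void) {
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI initialisation failed\n");
        return 1;
    }
    int eventset = PAPI_NULL;
    if (PAPI_create_eventset(&eventset) != PAPI_OK ||
        PAPI_add_event(eventset, PAPI_FP_OPS) != PAPI_OK) {
        fprintf(stderr, "PAPI_FP_OPS is not available on this CPU\n");
        return 1;
    }

    const int n = 1000000;
    double *a = malloc(n * sizeof *a);
    if (!a) return 1;
    for (int i = 0; i < n; i++) a[i] = (double)i;

    long long count = 0;
    PAPI_start(eventset);
    double sum = 0.0;
    for (int i = 0; i < n; i++) sum += a[i] * 0.5;   /* the kernel being measured */
    PAPI_stop(eventset, &count);

    printf("sum = %f, floating-point ops counted: %lld\n", sum, count);
    free(a);
    return 0;
}
```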
“These three projects tackle some of the most challenging—and interesting—problems standing in the way of exascale computing,” Dongarra said.
“Software routinely outlasts—by years, and sometimes even decades—the hardware that it was originally designed to run on, as well as the individuals who designed and developed it. With so many problems to overcome, and the new ways of thinking that it has prompted, this is one of the most exciting times I have faced in my career,” he concluded.
This article was first published in the print version of Supercomputing Asia, January 2018.
———
Copyright: Asian Scientist Magazine.
Disclaimer: This article does not necessarily reflect the views of AsianScientist or its staff.