TOP 10 Sites for June 2018


Ali Moradi Alamdarloo No Comments

The TOP500 celebrates its 25th anniversary with a major shakeup at the top of the list. For the first time since November 2012, the US claims the most powerful supercomputer in the world, leading a significant turnover in which four of the five top systems were either new or substantially upgraded.

Summit, an IBM-built supercomputer now running at the Department of Energy’s (DOE) Oak Ridge National Laboratory (ORNL), captured the number one spot with a performance of 122.3 petaflops on High Performance Linpack (HPL), the benchmark used to rank the TOP500 list. Summit has 4,356 nodes, each one equipped with two 22-core Power9 CPUs and six NVIDIA Tesla V100 GPUs. The nodes are linked together with a Mellanox dual-rail EDR InfiniBand network.
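The node counts above imply Summit's aggregate processor totals directly; a quick back-of-the-envelope sketch using only the figures quoted here:

```python
# Summit component totals, from the figures quoted above.
NODES = 4356
CPUS_PER_NODE = 2    # two 22-core Power9 CPUs per node
CORES_PER_CPU = 22
GPUS_PER_NODE = 6    # six NVIDIA Tesla V100 GPUs per node

total_cpu_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU
total_gpus = NODES * GPUS_PER_NODE

print(f"CPU cores: {total_cpu_cores:,}")  # 191,664
print(f"GPUs:      {total_gpus:,}")       # 26,136
```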

Sunway TaihuLight, a system developed by China’s National Research Center of Parallel Computer Engineering & Technology (NRCPC) and installed at the National Supercomputing Center in Wuxi, drops to number two after leading the list for the past two years. Its HPL mark of 93 petaflops has remained unchanged since it came online in June 2016.

Sierra, a new system at the DOE’s Lawrence Livermore National Laboratory, took the number three spot, delivering 71.6 petaflops on HPL. Built by IBM, Sierra’s architecture is quite similar to that of Summit, with each of its 4,320 nodes powered by two Power9 CPUs plus four NVIDIA Tesla V100 GPUs and using the same Mellanox EDR InfiniBand as the system interconnect.

Tianhe-2A, also known as Milky Way-2A, moved down two notches into the number four spot, despite receiving a major upgrade that replaced its five-year-old Xeon Phi accelerators with custom-built Matrix-2000 coprocessors. The new hardware increased the system’s HPL performance from 33.9 petaflops to 61.4 petaflops, while bumping up its power consumption by less than four percent. Tianhe-2A was developed by China’s National University of Defense Technology (NUDT) and is installed at the National Supercomputer Center in Guangzhou, China.
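The upgrade numbers above imply a notable jump in energy efficiency as well; a small sketch, treating the "less than four percent" power bump as a worst-case 4%:

```python
# Tianhe-2A upgrade: performance vs. power, from the figures above.
hpl_before_pf = 33.9   # petaflops before the Matrix-2000 upgrade
hpl_after_pf = 61.4    # petaflops after
power_increase = 0.04  # "less than four percent" taken as an upper bound

speedup = hpl_after_pf / hpl_before_pf
# Efficiency gain, assuming the full 4% power increase actually occurred:
efficiency_gain = speedup / (1 + power_increase)

print(f"HPL speedup: {speedup:.2f}x")                 # ~1.81x
print(f"Efficiency gain: at least {efficiency_gain:.2f}x")  # ~1.74x
```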

The new AI Bridging Cloud Infrastructure (ABCI) is the fifth-ranked system on the list, with an HPL mark of 19.9 petaflops. The Fujitsu-built supercomputer is powered by 20-core Xeon Gold processors along with NVIDIA Tesla V100 GPUs. It’s installed in Japan at the National Institute of Advanced Industrial Science and Technology (AIST).

Piz Daint (19.6 petaflops), Titan (17.6 petaflops), Sequoia (17.2 petaflops), Trinity (14.1 petaflops), and Cori (14.0 petaflops) move down to the number six through 10 spots, respectively.

The US Again Has the World’s Most Powerful Supercomputer


PLENTY OF PEOPLE around the world got new gadgets Friday, but one in eastern Tennessee stands out. Summit, a new supercomputer unveiled at Oak Ridge National Lab, is, unofficially for now, the most powerful calculating machine on the planet. It was designed in part to scale up the artificial intelligence techniques that power some of the recent tricks in your smartphone.

America hasn’t possessed the world’s most powerful supercomputer since June 2013, when a Chinese machine first claimed the title. Summit is expected to end that run when the official ranking of supercomputers, from an organization called Top500, is updated later this month.

Supercomputers have lost some of their allure in the era of cloud computing and humongous data centers. But many thorny computational problems require the giant machines. A US government report last year said the nation should invest more in supercomputing, to keep pace with China on defense projects such as nuclear weapons and hypersonic aircraft, and commercial innovations in aerospace, oil discovery, and pharmaceuticals.

Summit, built by IBM, occupies floor space equivalent to two tennis courts, and slurps 4,000 gallons of water a minute around a circulatory system to cool its 37,000 processors. Oak Ridge says its new baby can deliver a peak performance of 200 quadrillion calculations per second (that’s 200 followed by 15 zeros) using a standard measure used to rate supercomputers, or 200 petaflops. That’s about a million times faster than a typical laptop, and nearly twice the peak performance of China’s top-ranking Sunway TaihuLight.
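The "million times faster" comparison checks out arithmetically; this sketch assumes a roughly 200-gigaflop peak for a typical laptop, which is an illustrative figure, not one from the article:

```python
# Sanity check on the "about a million times faster than a typical laptop" claim.
summit_peak_flops = 200e15  # 200 petaflops peak, from the article
laptop_flops = 200e9        # ~200 gigaflops: an assumed, illustrative laptop peak

ratio = summit_peak_flops / laptop_flops
print(f"Summit / laptop: {ratio:,.0f}x")  # 1,000,000x
```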

DOE Opens RFP for Exascale; Up to $1.8 Billion for CORAL-2




Sierra, Livermore’s next advanced technology high performance computing system, will join LLNL’s lineup of supercomputers in 2017–2018. The new system will provide computational resources that are essential for nuclear weapon scientists to fulfill the National Nuclear Security Administration’s stockpile stewardship mission through simulation in lieu of underground testing. Advanced Simulation and Computing (ASC) Program scientists and engineers will use Sierra to assess the performance of nuclear weapon systems as well as nuclear weapon science and engineering calculations. These calculations are necessary to understand key issues of physics, the knowledge of which later makes its way into the integrated design codes. This work on Sierra has important implications for other global and national challenges such as nonproliferation and counterterrorism.

For Sierra’s installation, the CORAL partnership (Collaboration of Oak Ridge, Argonne, and Livermore) was formed to procure high performance computers from multiple vendors. IBM was selected as the vendor for LLNL. The IBM-built Sierra supercomputer is projected to provide four to six times the sustained performance and five to seven times the workload performance of Sequoia, with a 125 petaFLOP/s peak. At approximately 11 megawatts, Sierra will also be about five times more power efficient than Sequoia. By combining two types of processor chips—IBM’s Power 9 processors and NVIDIA’s Volta graphics processing units (GPUs)—Sierra is designed for more efficient overall operations and is expected to be a promising architecture for extreme-scale computing.
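From the 125 petaFLOP/s peak and roughly 11 megawatts quoted above, Sierra's peak power efficiency follows directly:

```python
# Sierra's implied peak power efficiency, from the figures above.
peak_flops = 125e15  # 125 petaFLOP/s peak
power_w = 11e6       # approximately 11 megawatts

gflops_per_watt = peak_flops / power_w / 1e9
print(f"Peak efficiency: {gflops_per_watt:.1f} GFLOPS/W")  # ~11.4
```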

In late 2016, LLNL acquired three small-scale “early access” (EA) versions of Sierra, consisting of IBM Minsky compute nodes with 20 Power 8 cores each and 4 NVIDIA Pascal GPUs. These small systems feature components only one generation behind those of Sierra. EA systems enable application porting and tuning in advance of the CORAL Sierra system delivery and acceptance (late 2017 to mid-2018). To enable this work, beta software co-designed by the CORAL laboratories and IBM is being installed on the EA systems.

Sierra at a glance: IBM Power and NVIDIA Volta processors; capability parallel computing; 1.29 PB.

Energy-Efficient and Power-Constrained Techniques for Exascale Computing


The future of computing will be driven by constraints on power consumption. Achieving an exaflop will be limited to no more than 20 MW of power, forcing co-design innovations in both hardware and software to improve overall efficiency. On the hardware side, processor designs are shifting to many-core architectures to increase the ratio of computational power to power consumption. Research and development efforts on other hardware components, such as the memory and interconnect, further enhance energy efficiency and overall reliability. On the software side, simulation codes and parallel programming models will need modifications to adapt to the increased concurrency and other new features of future architectures. Developing power-aware runtime systems is key to fully utilizing the limited resources. In this paper, we survey the current research in energy-efficient and power-constrained techniques in software, then present an analysis of these techniques as they apply to a specific high-performance computing use case.
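The 20 MW cap quoted above translates into a concrete efficiency target; a one-line derivation:

```python
# The 20 MW exascale power cap implies a hard system-wide efficiency target.
exaflop = 1e18      # FLOP/s
power_cap_w = 20e6  # 20 megawatts

required_gflops_per_watt = exaflop / power_cap_w / 1e9
print(f"Required efficiency: {required_gflops_per_watt:.0f} GFLOPS/W")  # 50
```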

Algorithms and Scheduling Techniques for Exascale Systems


Exascale systems to be deployed in the near future will come with deep hierarchical parallelism, will exhibit various levels of heterogeneity, will be prone to frequent component failures, and will face tight power consumption constraints. The notion of application performance in these systems becomes multi-criteria, with fault-tolerance and power consumption metrics to be considered in addition to sheer compute speed. As a result, many of the proven algorithmic techniques used in parallel computing for decades will not be effective in Exascale systems unless they are adapted or in some cases radically changed. The Dagstuhl seminar “Algorithms and Scheduling Techniques for Exascale Systems” was aimed at sharing open problems, new results, and prospective ideas broadly connected to the Exascale problem. This report provides a brief executive summary of the seminar and lists all the presented material.

Seminar: 15–20 September 2013

The Power and Possibilities of Exascale Computing


Eighteen zeroes. Exascale computing means the ability to run a quintillion calculations per second, and, combined with memory-driven computing, it will touch all aspects of our lives. The race to exascale is the space race of this century.

High Performance Computing in 2017: Hits and Misses


The past 12 months encompassed a number of new developments in HPC, as well as an intensification of existing trends. TOP500 News takes a look at the top eight hits and misses of 2017.

Hit: Machine learning, the killer app for HPC

Machine learning, and the broader category of AI, continued to spread its influence across the HPC landscape in 2017. Web-based applications in search, ad-serving, language translation and image recognition continued to get smarter this year, as more sophisticated neural network models were developed. What’s new this year is the beginning of a trend that inserts this technology into a broad range of traditional HPC workflows.

In applications as distinct as weather modeling, financial risk analysis, astrophysics simulations, and diagnostic medicine, developers used machine learning software to improve the accuracy of their models and speed time-to-result. At the same time, conventional supercomputing platforms are also being used for machine learning R&D. In one of the most impressive computing demonstrations of the year, a poker-playing AI known as Libratus trained itself on the Bridges supercomputer at the Pittsburgh Supercomputing Center, and went on to crush four of the best professional players in the world. As more powerful GPUs make their way into supercomputers (see below), we should see a lot more cutting-edge machine learning research being performed on these machines.

Hit: NVIDIA makes Volta GPU a deep learning monster

NVIDIA intensified its dominance in the AI space, with the launch of its Volta V100 GPU in May. With special circuitry for tensor processing, the V100 put unprecedented amounts of deep learning processing power – 120 teraflops per chip – into the hands of anyone with a spare PCIe port. Amazon and Microsoft will be the earliest adopters of the technology, followed soon thereafter by Baidu.

In addition to its deep learning prowess, the V100 GPU also delivers 7 double precision teraflops, making it eminently suitable for conventional HPC setups. The devices are already being deployed in the Department of Energy’s two most powerful supercomputers, Summit and Sierra, both of which are expected to come online in the first half of 2018. Those systems promise to be in high demand for both traditional HPC simulations and machine learning applications.

Miss: Intel fumbles pre-exascale deployment, drops Knights Hill

In October, the Department of Energy reported that its 180-petaflop Aurora supercomputer, which was slated to be installed at Argonne National Lab next year, was canceled. The system was to be powered by Knights Hill, Intel’s next-generation Xeon Phi processors. Instead, Aurora will be remade into a one-exaflop system to be deployed in the 2020-2021 timeframe.

The rationale for the change in plans was not made clear, and as we wrote at the time, “something apparently went wrong with the Aurora work, and the Knights Hill chip looks like the prime suspect.” In November, Intel revealed it had dumped the Knights Hill product, without specifying any alternate roadmap for the Xeon Phi line.

Hit and Miss: AMD offers alternatives to Intel and NVIDIA silicon

In June, AMD launched EPYC, the chipmaker’s first credible alternative to Intel’s Xeon product line since the original Opteron processors. The EPYC 7000 series processors have more cores, more I/O connectivity, and better memory bandwidth than Intel’s “Skylake” Xeon CPUs, which were launched in July. Although AMD initially missed the opportunity to talk about the EPYC processors during ISC 2017, subsequent third-party testing and a more concerted effort by AMD at SC17 revealed that the EPYC processors had advantages for at least some HPC workloads. Nonetheless, Intel will prove difficult to dislodge from its position atop the datacenter food chain.

At SC17, AMD also talked up its Radeon Instinct GPUs (initially announced in December 2016), the chipmaker’s first serious foray into the machine learning datacenter space. These processors have plenty of flops to offer, but nothing approaching the performance of the V100 for deep learning, since the Radeon hardware lacks the specialized arithmetic units that NVIDIA added for neural net acceleration. AMD is counting on its more open approach to GPU software to lure CUDA customers away from NVIDIA’s clutches.

Hit: Cavium becomes the center of gravity for ARM-powered HPC

Cavium’s second-generation ThunderX2 ARM server SoC was soft-launched way back in May 2016, but it wasn’t until this year that the chip got some attention from users and vendors. The processor offers decent performance, superior memory bandwidth, and an abundance of external connectivity to distinguish it from the offerings of other ARM chip vendors taking aim at the datacenter.

In January, the EU’s Mont-Blanc project selected the ThunderX2 for its phase three pre-exascale prototype, which will be constructed by Atos/Bull. The French computer-maker intends to productize the ARM-based Mont-Blanc design as an option on its Sequana supercomputer line. In November, Cray followed suit with a ThunderX2-powered XC50 blade, which will become the basis of the Isambard supercomputer in the UK. HPE, Gigabyte Technology, and Ingrasys also came up with their own versions of ThunderX2-based servers. With the ARM software ecosystem for the datacenter also starting to fill out, 2018 could be a breakout year for the architecture in high performance computing and elsewhere.

Hit: Microsoft inches its way back into HPC

Between Microsoft’s acquisition of Cycle Computing and the next upgrade of its FPGA-accelerated Azure cloud, Microsoft looks like it’s becoming a bigger HPC player, at least in terms of technology prowess. Although the company still offers plenty of NVIDIA GPUs in Azure for cloud customers interested in accelerating HPC, data analytics, and deep learning workloads, the long-term strategy appears to be moving toward an FPGA approach. If Microsoft manages to pull this off, it could drive a lot more interest in reconfigurable computing from performance-minded users, while simultaneously becoming a technology leader in this area.

Hit: Quantum computing on the cusp

Perhaps the fastest-moving HPC technology of 2017 was quantum computing, which, in a fairly short space of time, grew from an obscure set of research projects into a technology battle between some of the biggest names in the industry. The most visible of these are IBM and Google, both of which built increasingly capable quantum computers over the past 12 months. IBM currently has a 20-qubit system available for early users, with a 50-qubit prototype waiting in the wings. The company even managed to collect a handful of paying customers for this early hardware. Meanwhile, Google is fiddling with a 22-qubit system, with a 49-qubit machine promised before the end of the year.

In October, Intel made its own quantum intentions known, with the revelation of a 17-qubit processor. For its part, Microsoft is working on a topological quantum computer, and while it has yet to field a working prototype, the company has come up with a software toolkit for the technology, complete with its own quantum computing programming language (Q#). In a similar vein, Atos/Bull launched a 40-qubit quantum simulator this year, softening the ground for the eventual hardware that everyone expects is right around the corner. 2018 is shaping up to be an even more exciting year for qubit followers.

Miss: Exascale computing fatigue

While exascale projects around the world made a lot of news in 2016, with the different players jockeying for position, this year the news has been a lot more subdued. Maybe that’s because the various efforts in China, Japan, Europe, and the US are now pretty well set in place, and are just methodically moving forward at their own pace. But with the rise of AI and machine learning, and more generally, data analytics, the artificial milestone of reaching an exaflop on double precision floating point math seems a lot less relevant.

Consider that the DOE’s 200-petaflop Summit supercomputer will deliver three peak exaflops of deep learning performance, and drawing on that capability with large-scale neural networks may dwarf any advances made with the first “true” exascale machines used for traditional modeling. In a Moor Insights and Strategy white paper, senior analyst Karl Freund writes: “It is becoming clear that the next big advances in HPC may not have to wait for exascale-class systems, but are being realized today using Machine Learning methodologies. In fact, the convergence of HPC and [machine learning] could potentially redefine what an exascale system should even look like.”

In a world where machine learning can outperform oncologists, poker-players, and hedge fund analysts, it’s hard to argue with that assessment.

On the Road to Exascale


Exascale computing

Exascale computing refers to computing systems capable of at least one exaFLOPS, or a billion billion calculations per second. Such capacity represents a thousandfold increase over the first petascale computer, which came into operation in 2008. At a supercomputing conference in 2009, Computerworld projected exascale implementation by 2018. Exascale computing would be a significant achievement in computer engineering, as it is believed to be on the order of the processing power of the human brain at the neural level (the functional equivalent may be lower). It is, for instance, the target power of the Human Brain Project.
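The scale relationships in this definition are easy to verify in numbers:

```python
# Scale relationships quoted above, in plain numbers.
petaflop = 1e15  # FLOP/s of a petascale system
exaflop = 1e18   # FLOP/s of an exascale system

# "a billion billion calculations per second"
assert exaflop == 1e9 * 1e9
# "a thousandfold increase over the first petascale computer"
print(f"exaflop / petaflop = {exaflop / petaflop:.0f}")  # 1000
```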

Why do we need exascale computers?

The only bad news is that we may need even more than exascale computing. Some of the key computational challenges facing not just individual companies but civilisation as a whole will only become tractable with exascale computing.

Everyone is concerned about climate change, and climate modelling matters accordingly. The computational challenges of modelling oceanic clouds, ice, and topography are all tremendously important, and today we need at least two orders of magnitude of improvement on that problem alone.

Controlled fusion – a big activity shared with Europe and Japan – can only be done with exascale computing and beyond. There is also medical modelling, whether it is life sciences itself or the design of future drugs for ever more rapidly changing and evolving viruses – again, a true exascale problem.

Exascale computing is really the only viable means of managing our future. It is probably crucial to the progress and advancement of the modern age.

Sunway TaihuLight

The Sunway TaihuLight is a Chinese supercomputer which, as of June 2016, is ranked number one in the TOP500 list as the fastest supercomputer in the world, with a LINPACK benchmark rating of 93 petaflops. This is nearly three times as fast as the previous record holder, the Tianhe-2, which ran at 34 petaflops. As of June 2016, it is also ranked as the third most energy-efficient supercomputer in the TOP500, with an efficiency of 6,051.30 MFLOPS/W. It was designed by the National Research Center of Parallel Computer Engineering & Technology (NRCPC) and is located at the National Supercomputing Center in Wuxi, in Jiangsu province, China.
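The two figures above, 93 petaflops on HPL and 6,051.30 MFLOPS/W, together imply TaihuLight's power draw while running the benchmark:

```python
# TaihuLight's implied HPL power draw, from the two figures above.
hpl_flops = 93e15                   # 93 petaflops sustained on LINPACK
efficiency_flops_per_w = 6051.30e6  # 6,051.30 MFLOPS/W

power_mw = hpl_flops / efficiency_flops_per_w / 1e6
print(f"Implied power draw: {power_mw:.1f} MW")  # ~15.4 MW
```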