Tender: Acquisition, delivery, installation and hardware and software maintenance of precursor-to-exascale supercomputers
Contracting authority: The European Commission on behalf of the European High Performance Computing Joint Undertaking (EuroHPC JU)
Lot 1: Acquisition, Delivery, Installation and Maintenance of Supercomputer LUMI for Hosting Entity CSC, the Finnish IT Center for Science. Budget: €144,500,000
Lot 2: Acquisition, Delivery, Installation and Maintenance of Supercomputer MareNostrum 5 for Hosting Entity Barcelona Supercomputing Center. Budget: €151,410,000
Lot 3: Acquisition, Delivery, Installation and Maintenance of Supercomputer Leonardo for Hosting Entity CINECA. Budget: €120,000,000
LOT 1 – Purpose and Scope
This lot foresees the procurement and maintenance, for up to 72 months, of a leadership-class pre-exascale accelerated supercomputer that enables the convergence of high-performance computing (HPC), artificial intelligence (AI) and high-performance data analytics (HPDA). The system will be a true precursor to exascale, spearheaded by a large accelerated partition utilizing the latest-generation graphics processing units (GPUs). Accelerated computing, backed by extreme-performance interconnect and storage, is required both for exascale HPC and for AI (deep learning and, more generally, machine learning) workloads.
The supercomputer should aim at a sustained capability (sustained Linpack) of around 150 PFlop/s and an anticipated maximum (Linpack) power consumption of around 10 MW.
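As a back-of-envelope check (not part of the tender requirements), the two targets above imply an energy-efficiency figure that can be computed directly; the numbers below are simply the stated 150 PFlop/s and 10 MW:

```python
# Illustrative efficiency implied by the LUMI targets stated above:
# 150 PFlop/s sustained Linpack at an anticipated maximum of ~10 MW.

sustained_pflops = 150   # PFlop/s, sustained Linpack target
power_mw = 10            # MW, anticipated maximum Linpack power

# 1 PFlop/s = 1e6 GFlop/s; 1 MW = 1e6 W
gflops_per_watt = (sustained_pflops * 1e6) / (power_mw * 1e6)
print(f"{gflops_per_watt:.0f} GFlops/W")  # 15 GFlops/W
```

At roughly 15 GFlops/W, the target sits well above typical CPU-only systems of the period, consistent with the emphasis on a large GPU-accelerated partition.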
The accelerated partition (LUMI-G) represents the Tier-0 floating-point workhorse of the system and excels at both HPC and high-performance AI workloads. At the convergence of AI and HPC, the partition will be designed with a high-performance, low-latency interconnect, high-performance storage providing high bandwidth and IOPS, and accelerated compute nodes delivering very good floating-point performance across half, single and double precision.
The Tier-1 CPU partition (LUMI-C) targets the wide base of scientific applications and workflows which have not been ported to accelerators, or which do not benefit from accelerators due to a lack of inherent parallelism in the problem. This enables scientists to use the best tool for each step in their workflow and expands the impact of the whole system. It should comprise on the order of 1500-1800 dual-socket x86 nodes. CPU SKUs of the highest performance are valued. The partition will feature nodes with a range of memory configurations (e.g. 256, 1024 and 2048 GB per node, for 95%, 4% and 1% of the nodes, respectively) to support a wide range of workloads while keeping costs in check.
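The memory mix above can be sketched concretely. Assuming 1700 nodes (a hypothetical midpoint of the stated 1500-1800 range, chosen only for illustration), the per-tier node counts and aggregate RAM work out as follows:

```python
# Sketch of the LUMI-C memory mix described above. The 1700-node total
# is an assumed midpoint of the 1500-1800 range, for illustration only.

total_nodes = 1700
mix = {256: 0.95, 1024: 0.04, 2048: 0.01}  # GB per node -> share of nodes

nodes = {gb: round(total_nodes * share) for gb, share in mix.items()}
total_ram_tb = sum(gb * n for gb, n in nodes.items()) / 1024

print(nodes)                            # {256: 1615, 1024: 68, 2048: 17}
print(f"{total_ram_tb:.2f} TB aggregate RAM")  # 505.75 TB aggregate RAM
```

The point of the skewed distribution is visible in the result: the few large-memory nodes add capability for memory-hungry jobs while contributing only about a fifth of the total RAM cost-wise.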
The pre/post-processing and data analysis partition (LUMI-D) is a special resource for interactive use, featuring a couple of fat GPU nodes with a large amount of local memory and 8-16 GPU cards each. It should also feature a couple of SMP-like fat CPU nodes, each with 4 to 8 CPUs running at a high frequency and over 10 TB of shared memory. These are geared for machine learning tasks, data analytics, meshing, interactive visualization and other non-batch, non-MPI-parallel workloads, and in general for GPU-focused applications which do not scale well across the interconnect and make extensive use of the large shared memory on the GPU and CPU side. Further use cases include in-memory analytics and databases.
The partitions can be separate islands; no jobs need to run across multiple partitions. However, the partitions should be visible from the same login nodes and be able to access the storage resources described below.
In addition, and in synergy with the main HPC system, a small set of resources (on the order of two racks of the same technology as in the CPU partition) will be dedicated to running an IaaS solution as well as a Kubernetes-style container cloud. These allow scientific communities and users to develop services and solutions for accessing, analysing and sharing the large data sets computed on the HPC system. The vendor may offer its own approach for the cloud resources, or simply provide hardware for running CSC's cloud stack, that is, an OpenStack IaaS cloud and a container cloud based on OpenShift.
The storage ecosystem should feature the following components, with volumes cost-optimised for supporting the anticipated use cases, avoiding excessively large solutions:
• A highly capable parallel file system (LUMI-P), by default based on Lustre, although other alternatives can be proposed as well. The anticipated volume is 60 PB or more. It should feature some resiliency, e.g. by consisting of several separate filesystems. If a single filesystem is proposed, the vendor needs to demonstrate sufficient resiliency, including availability during maintenance.
• An accelerated I/O (SSD/NVMe) layer (LUMI-F), providing automated tiering to and from the LUMI-P filesystem and acting as a cache layer. The target is to achieve more than 1 TB/s sustained bandwidth and an extreme IOPS capability. The anticipated volume is around 5 PB.
• An object storage service (LUMI-O) for project-time storage and convenient data management. This allows users to move data quickly and reliably, apply fine-grained authorization, and share datasets and other results with colleagues or with the whole world. It would also act as the storage solution for the add-on cloud computing services. Its expected size is at least 30 PB.
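For a quick overview, the three LUMI storage tiers described above can be tabulated with their minimum stated volumes (the descriptions are paraphrased from the text; only the PB figures are tender values):

```python
# Summary of the three LUMI storage tiers described above.
# Volumes are the minimum/anticipated figures stated in the tender text.

tiers = {
    "LUMI-P": {"role": "parallel file system (Lustre by default)", "volume_pb": 60},
    "LUMI-F": {"role": "SSD/NVMe cache layer, >1 TB/s sustained", "volume_pb": 5},
    "LUMI-O": {"role": "object storage for project-time data",     "volume_pb": 30},
}

total_pb = sum(t["volume_pb"] for t in tiers.values())
print(f"minimum total storage: {total_pb} PB")  # minimum total storage: 95 PB
```

Note the design intent: the fast LUMI-F tier is an order of magnitude smaller than LUMI-P, acting purely as a cache rather than as primary capacity.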
The above is indicative; alternative solutions providing similar volume, bandwidth and IOPS capabilities can be proposed.
LOT 2 – Purpose and Scope
This lot foresees the procurement and maintenance, for up to 72 months, of a world-class supercomputer with a sustained capability (sustained Linpack) above 150 PFlop/s, more than 2.4 PB of main memory (RAM), more than 200 PB of disk storage, nearly 400 PB available for long-term storage (tapes), and an anticipated maximum (Linpack) power consumption of around 12.3 MW. The system will be located in an innovative hosting site designed specifically to meet the needs of this supercomputer, which will be fully powered with green energy and will include heat reuse.
The high-performance computer and the remaining components of this lot are planned to be installed at the Barcelona Supercomputing Center facilities in Q3 2020, in order to be fully operational by 31 December 2020.
The BSC-CNS lot specifications include the following requirements:
• An efficient use of energy, ranging from the site cooling infrastructure design to an application-level energy framework, leading to an effective PUE below 1.08.
• Visibility and public outreach, making the EuroHPC system an iconic supercomputer, open to visitors and attractive for hosting courses, workshops and other events.
The EuroHPC supercomputer intended for this lot needs to fulfil two basic requirements: (1) to have an aggregated sustained performance of at least 150 PFlop/s, and (2) to provide service to a large group of user communities. Fulfilling both requirements with the technologies available in 2020-21 will require installing a heterogeneous machine, with one partition using accelerators and a second one (or several) with general-purpose processors without accelerators. The accelerators will provide the PFlop/s, and the general-purpose processors the ease of use for all domains.
The general-purpose processor partition has to be as large as possible, to enable use of the supercomputer by any application, while the GPU partition has to complement it to achieve this lot's performance targets.
The supercomputer will include a new high-performance storage system used by the EuroHPC supercomputer. The storage system has to guarantee that the performance and capabilities of the new computing system are matched by the latest storage technology.
The high-performance storage is expected to provide a minimum of 200 petabytes of net capacity and a minimum aggregated performance of 1 terabyte per second.
LOT 3 – Purpose and Scope
This lot foresees the procurement and maintenance for up to 72 months of a new supercomputing system that will support EuroHPC JU and CINECA consortium in providing leading-edge innovative computing resources to European research.
With the deployment of the new system, scheduled for the second half of 2020, the main goal is to achieve at least 10x sustained performance on user applications with respect to today's PRACE Tier-0 systems, by exploiting a raw computing power on the order of 150 PFlops (measured with Linpack) and featuring node architectures that maximize performance throughput per unit of energy.
Moreover, the Leonardo supercomputer shall represent a concrete step towards the architectural convergence of high-performance computing and high-performance data analytics (a data-centric architecture) and will be able to run both compute- and data-intensive workloads, which is particularly relevant for solutions with high energy efficiency. In this regard, the system shall be able to benefit from innovative hardware and software solutions and the most relevant technology outcomes of European projects, including innovations that aim to reduce power consumption, improve power management and enhance I/O performance, helping to make supercomputing sustainable towards exascale for a broad range of scientific applications.
The HPC system targets a modular solution in order to maximize efficiency and sustainability. In particular, the system architecture shall feature three main modules, all tightly integrated in a single system to increase usability:
• A booster partition targeting energy-efficient solutions that can sustain a Linpack performance in the range of 150-180 PFlops, providing the raw computing power required to tackle the upcoming HPC challenges.
• A general-purpose partition featuring state-of-the-art general-purpose processors to run workloads that need time to be adapted to exploit the booster module's capabilities, equivalent in terms of performance to the Marconi supercomputer.
• A memory- and data-centric partition, supporting the other two partitions and available for innovative usage models (e.g. virtual machines, interactive computing).
To maximize energy efficiency and enable free cooling, the Leonardo design requires that the system exploit direct liquid cooling with warm water (40-50 °C).
The data centre includes 890 sqm of data hall, 350 sqm of data storage, electrical, cooling and ventilation systems, offices and ancillary spaces, and is designed for extreme energy efficiency, targeting a PUE of 1.08. The HPC area can be expanded by a further 700 sqm if needed.
The facility is designed for up to 20 MW of IT load but, in the first phase of operation of the Tecnopolo (2020-2025), it will be equipped with an infrastructure capable of supporting 10 MW of IT load. The Bologna Tecnopolo will be interconnected at the European level through a dedicated point of presence of the GARR National Consortium, providing two links in failover, each with a bandwidth of 100 Gbit/s, directly connected to the GÉANT point of presence in Milan. The roadmap is consistent with technological innovation in networking and will increase the effective bandwidth to 400 Gbit/s during the lifetime of this project, and up to 1 Tbit/s in the mid-term.
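To put the link capacity above in perspective, a rough transfer-time estimate for a hypothetical 1 PB dataset over a single active 100 Gbit/s link (the failover pair leaves one link active) works out as follows; the dataset size is an assumption for illustration, not a tender figure:

```python
# Rough transfer-time estimate over the GARR connectivity described above:
# two 100 Gbit/s links in failover, so one active link. The 1 PB dataset
# is a hypothetical example; protocol overheads are ignored.

link_gbit_s = 100    # active link bandwidth, Gbit/s
dataset_pb = 1       # hypothetical dataset size, PB

# 1 PB = 8e6 Gbit (1 byte = 8 bits; 1 PB = 1e6 GB)
seconds = dataset_pb * 8 * 1e6 / link_gbit_s
hours = seconds / 3600
print(f"~{hours:.0f} hours to move {dataset_pb} PB")  # ~22 hours
```

Even under these idealized assumptions, moving a petabyte takes the better part of a day, which motivates the planned upgrades to 400 Gbit/s and eventually 1 Tbit/s.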
The system will be integrated with a high-speed network with state-of-the-art topology, bandwidth and latency, and supported by storage with a capacity of 150 PB, connected with a bandwidth of at least 1 TB/s.