Program

Workshop Date: June 3, 2022

All times are in Eastern Daylight Time (EDT)

9:30 - 9:40 Welcome from the organizers

Session 1
Chairs: Prof. Danilo Ardagna and Prof. Stacy Patterson

9:40 - 10:20 Design of secure power monitors for accelerators by exploiting ML techniques in the Euro-HPC TEXTAROSSA project

Invited talk by Prof. William Fornaciari, Politecnico di Milano – DEIB, Italy.

Abstract:

The evolution of High Performance Computing (HPC) has to face several obstacles, including power/thermal management of the cores and the presence of heterogeneous computing platforms. Within the Euro-HPC project TEXTAROSSA, started in spring 2021, several HPC applications that exploit AI and need to process large volumes of data in a secure context are accelerated with purpose-built accelerators implemented in hardware.

Such customized heterogeneity of execution has many benefits, but it complicates power management: the accelerators, possibly generated through high-level synthesis, neither provide run-time information on their power consumption nor allow controlling the security of the information flow against implementation attacks. In such a scenario, any global power manager or resource orchestrator can operate only with a partial, delayed picture of the overall system, at the risk of being trapped in poor power optimizations and unbalanced resource exploitation.

The goal of the talk is to show how it is possible to exploit popular ML techniques for a twofold purpose:

  • Automatically generate an on-line power monitor that augments the hardware description of any component, in particular cores and accelerators, and provides on-line power estimates within a few milliseconds (a sketch of this idea follows the list).

  • Select, from the space of all possible power monitors, those that do not leak information that could be used to mount side-channel attacks.
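
To make the first point concrete, below is a minimal sketch of a counter-based power model: a linear regression from micro-architectural event counts to measured power, of the kind that could later be synthesized into an on-line hardware monitor. The counter names and training data are hypothetical illustrations, not the TEXTAROSSA flow.

```python
# Minimal sketch of a counter-based power model (illustrative only):
# a linear regression mapping micro-architectural event counts to
# measured power, of the kind that could later be synthesized into
# an on-line hardware monitor. Counters and data are hypothetical.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row: events observed in one sampling window
# [cache_misses, alu_ops, mem_reads, branch_mispredictions]
events = np.array([
    [1200, 50000, 3000, 400],
    [ 300, 90000, 1000, 150],
    [2500, 20000, 7000, 900],
    [ 800, 70000, 2000, 300],
])
measured_watts = np.array([3.1, 4.2, 2.7, 3.8])  # ground-truth power

model = LinearRegression().fit(events, measured_watts)

# At run time the monitor evaluates one dot product per window,
# cheap enough to produce an estimate within milliseconds.
window = np.array([[1000, 60000, 2500, 350]])
print(f"estimated power: {model.predict(window)[0]:.2f} W")
```

The second point then amounts to searching this space of candidate models for estimators whose outputs are weakly correlated with secret-dependent activity, so the monitor itself cannot be used as a side channel.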

Speaker Bio:

William Fornaciari is Associate Professor at POLIMI. He has published six books and over 300 papers, collecting six best paper awards and one certificate of appreciation from IEEE. He has filed three international patents on low-power design and two on cybersecurity. Since 1997 he has been involved in 20 EU-funded international projects (two as project coordinator and two as technical manager) and has been part of the pool of experts of the Call For Tender No. 964-2005 – WING – Watching IST INnovation and knowledge, studying the impact of FP5 and FP6 expenditures to support the identification of FP7 and Horizon 2020 research directions. During FP7 he won the 2016 HiPEAC Technology Transfer Award for the output of the CONTREX project. In H2020 he coordinates the FET-HPC RECIPE project and also participated in the SafeCOP, M2DC and MANGO projects as principal investigator. In 2022 he is the Project Technical Manager of the Euro-HPC TEXTAROSSA project, principal investigator of the Euro-HPC pilot EUPEX, and also cooperates with the Euro-HPC “The European Pilot”.

Since 2021 he has been on the board of directors of the CINI labs on Embedded Systems and Smart Manufacturing and on that on High-Performance Computing. He cooperated for 20 years with the Technology Transfer Centre of POLIMI, actively working with companies on the development of leading-edge products. In 2019 his team won the Switch to Product (S2P) competition with a solution for hardware protection against side-channel attacks, leading to the creation of the spin-off Blue Signals srl. His main research interests cover multi/many-core architectures, high-performance computing, design of low-power hardware and software, run-time resource management, thermal and power management, and EDA-based design methodologies tailored to enhance the security of IoT/embedded systems.

10:20 - 10:40 Distributing Deep Learning Hyperparameter Tuning for 3D Medical Image Segmentation

Oriol Aranda (Barcelona Supercomputing Center), Josep Lluis Berral Garcia (Barcelona Supercomputing Center, Universitat Politecnica de Catalunya), Juan Luis Dominguez (Barcelona Supercomputing Center), Jordi Torres (Barcelona Supercomputing Center, Universitat Politecnica de Catalunya)


10:40 - 11:20 AI/ML Pipelines using CodeFlare

Invited talk by Dr. Mudhakar Srivatsa, IBM Research, USA

Abstract:

Pipelines have become a ubiquitous construct in machine learning, spanning tasks from data cleaning and preprocessing, through training foundational models, model optimization and transfer learning, to low-latency inferencing. While the pipeline construct has existed for many years (e.g., scikit-learn pipelines, Spark pipelines), this talk will focus on a process-calculus-style definition of pipelines, called CodeFlare pipelines, that makes it readily amenable to scaling complex AI/ML workflows on a commodity cluster. CodeFlare pipelines not only enable data scientists to introduce compute, data and multi-stage parallelism using simple annotations on the pipeline graph, but also operationalize them on a hybrid cloud platform (Red Hat OpenShift), making the solution deployable just about anywhere and able to leverage the benefits of serverless computing. This talk will cover a basic realization of CodeFlare pipelines on the Ray platform (1.7.0 release) that has shown near-linear scalability for transfer learning and inferencing on foundational models.
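
As a rough illustration of the parallelism described above, independent pipeline stages can be expressed as remote tasks so that per-partition work runs in parallel on a cluster. This is a sketch on plain Ray, not the actual CodeFlare pipelines API; the stage names and data are assumptions for illustration.

```python
# Illustrative sketch of pipeline stage/data parallelism on Ray
# (not the CodeFlare pipelines API): each stage is a remote task,
# and data partitions flow through the stages in parallel.
import ray

ray.init()

@ray.remote
def preprocess(partition):
    # Stand-in for cleaning / feature extraction on one partition.
    return [x * 2 for x in partition]

@ray.remote
def score(partition):
    # Stand-in for applying a trained model to one partition.
    return sum(partition)

partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Data parallelism: each partition is preprocessed and scored
# independently; Ray schedules the tasks across the cluster.
pre = [preprocess.remote(p) for p in partitions]
out = [score.remote(r) for r in pre]
print(ray.get(out))  # [12, 30, 48]
```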

Speaker Bio:

Mudhakar Srivatsa is a distinguished research staff member in the Hybrid Cloud department at the IBM T. J. Watson Research Center. His work focuses on cloud-native scaling of AI/ML workloads with applications to large-scale spatial and time-series data. He has led the deployment of AI-assisted solutions for air traffic control, IT operations, combating piracy in the maritime domain, and public safety in dense urban environments such as stadiums and music festivals.

11:20 - 11:40 A Methodology to Build Decision Analysis Tools Applied to Distributed Reinforcement Learning

Gabriel Antoniu (INRIA), Alexandru Costan (INRIA), Loïc Cudennec (DGA Maîtrise de l'Information), Cédric Prigent (Inria Rennes - Bretagne Atlantique)


11:40 - 12:50 Break


Session 2
Chairs: Prof. Alex Gittens and Hadjer Benmeziane

12:50 - 13:10 MadPipe: Memory Aware Dynamic Programming Algorithm for Pipelined Model Parallelism

Olivier Beaumont (INRIA, University of Bordeaux), Lionel Eyraud-Dubois (INRIA, University of Bordeaux), Alena Shilova (INRIA, University of Bordeaux)


13:10 - 13:50 Sustainable AI @ Scale: Accelerating AI models for billions of users

Invited talk by Dr. Michael Gschwind, Meta AI and MLPerf

Abstract:

AI is a foundational technology at Meta: we use AI to identify relevant and interesting content that our users delight in interacting with, to translate content to transcend language barriers, and to keep our communities safe by identifying inappropriate content, such as bullying, domestic violence, and terrorism, in images, videos, and text. The need for ever higher quality models intersects with the imperative of keeping AI growth sustainable. While academic research has emphasized exponential resource growth of models to deliver quality, we must bend the curve to ensure sustainable growth and minimize environmental impact. As we look for ever higher quality, larger-scale models to deliver on our mission to connect users and build safe communities, AI accelerators provide the foundation for scaling up quality while keeping power consumption manageable and delivering on our sustainability commitments.

Speaker Bio:

Michael Gschwind leads Accelerator Enablement at Meta AI, where he is responsible for AI software development for accelerators and AI deployment at scale. He was previously VP and AI Chief Architect at Huawei and at IBM, leading hardware and software development of AI and general-purpose systems. Michael was chief architect of three supercomputers that were the fastest systems of their day and of three game console chips (PlayStation 3, Xbox 360, Wii). He invented Cell, the first programmable accelerator, and led development of accelerator hardware and software. He is an IEEE Fellow, the author of over 100 technical papers, and one of the most prolific inventors in history with over 800 patents in the field.

13:50 - 14:10 APPFL: Open-Source Software Framework for Privacy-Preserving Federated Learning

Youngdae Kim (Argonne National Laboratory), Kibaek Kim (Argonne National Laboratory), Ravi Madduri (Argonne National Laboratory), Minseok Ryu (Argonne National Laboratory)


14:10 - 14:40 Break


Session 3
Chairs: Dr. Kaoutar El Maghraoui and Dr. Praveen Venkateswaran

14:40 - 15:20 When Moore Just Isn’t Enough: Scaling ML in the Datacenter

Invited talk by Mr. David Kanter, MLCommons

Abstract:

As the industry drives towards more capable ML, workloads are rapidly evolving and the need for performance is nearly unlimited. Through software/hardware co-design, performance gains as measured by MLPerf™ have vastly outstripped the pace of Moore's Law. This talk will discuss challenges in benchmarking ML training, explore the design space, and identify opportunities in future ML systems.
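
One source of those benchmarking challenges is that MLPerf™ Training measures time-to-quality, i.e., wall-clock time until a model reaches a target metric, rather than raw throughput. The toy sketch below illustrates that metric; the stand-in training loop and target are hypothetical, not MLPerf code.

```python
# Toy sketch of a time-to-quality measurement in the spirit of
# MLPerf Training (hypothetical model and target, not MLPerf code):
# the score is wall-clock time to reach a target metric,
# not raw throughput.
import time

TARGET_ACCURACY = 0.75

def train_one_epoch(epoch):
    # Stand-in for real training; accuracy improves with epochs.
    time.sleep(0.1)
    return 1.0 - 0.5 / (epoch + 1)

start = time.perf_counter()
epoch, accuracy = 0, 0.0
while accuracy < TARGET_ACCURACY:
    accuracy = train_one_epoch(epoch)
    epoch += 1

elapsed = time.perf_counter() - start
print(f"reached {accuracy:.3f} after {epoch} epochs in {elapsed:.2f}s")
```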

Speaker Bio:

David Kanter is a Founder and the Executive Director of MLCommons™ where he helps lead the MLPerf™ benchmarks and other initiatives. He has 16+ years of experience in semiconductors, computing, and machine learning. He founded a microprocessor and compiler startup, was an early employee at Aster Data Systems, and has consulted for industry leaders such as Intel, Nvidia, KLA, Applied Materials, Qualcomm, Microsoft and many others. David holds a Bachelor of Science degree with honors in Mathematics with a specialization in Computer Science, and a Bachelor of Arts with honors in Economics from the University of Chicago.

15:20 - 15:35 Throughput-oriented and Accuracy-aware DNN Training with BFloat16 on GPU

Zhen Xie (Argonne National Laboratory), Siddhisanket Raskar (Argonne National Laboratory), Murali Emani (Argonne National Laboratory)


15:35 - 16:15 Designing Effective Sparse Expert Models

Invited talk by Dr. Barret Zoph, Google Brain

Abstract:

Scale has opened new frontiers in natural language processing, but at a high cost. In response, Mixture-of-Experts (MoE) and Switch Transformers have been proposed as an energy-efficient path to even larger and more capable language models. But advancing the state of the art across a broad set of natural language tasks has been hindered by training instabilities and uncertain quality during fine-tuning. Our work focuses on these issues and acts as a design guide. We conclude by scaling a sparse model to 269B parameters, with a computational cost comparable to a 32B dense encoder-decoder Transformer (Stable and Transferable Mixture-of-Experts, or ST-MoE-32B). For the first time, a sparse model achieves state-of-the-art performance in transfer learning across a diverse set of tasks, including reasoning (SuperGLUE, ARC Easy, ARC Challenge), summarization (XSum, CNN-DM), closed-book question answering (WebQA, Natural Questions), and adversarially constructed tasks (Winogrande, ANLI R3).
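
To make the sparse-expert idea concrete, the sketch below shows Switch-style top-1 routing in PyTorch: a learned router sends each token to a single expert, so per-token compute stays roughly constant while total parameters grow with the number of experts. This is a minimal illustration under those assumptions, not the ST-MoE implementation.

```python
# Minimal sketch of Switch-style top-1 expert routing in PyTorch
# (illustrative, not the ST-MoE implementation): a learned router
# sends each token to one expert, so per-token compute stays fixed
# while total parameters grow with the number of experts.
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, num_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):              # x: (tokens, d_model)
        probs = torch.softmax(self.router(x), dim=-1)
        gate, idx = probs.max(dim=-1)  # top-1 gate value and expert id
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e            # tokens routed to expert e
            if mask.any():
                # scale by the gate so routing stays differentiable
                out[mask] = gate[mask].unsqueeze(1) * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(Top1MoE()(tokens).shape)  # torch.Size([10, 64])
```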

Speaker Bio:

Barret Zoph is currently a Staff Research Scientist on the Google Brain team, working on making large-scale language modeling more efficient. Previously he worked on data augmentation, semi-supervised learning, and neural architecture search.

16:15 - 16:35 Adaptive Optimization for Sparse Data on Heterogeneous GPUs

Yujing Ma (University of California Merced), Florin Rusu (University of California Merced, Lawrence Berkeley National Lab), Alexander Sim (Lawrence Berkeley National Laboratory, Energy Sciences Network (ESnet)), Kesheng Wu (Lawrence Berkeley National Laboratory)


16:35 - 16:45 Closing Remarks