Jaemin Choi

Senior Deep Learning Architect

NVIDIA Corporation


Jaemin Choi is a Senior Deep Learning Architect at NVIDIA Corporation. With broad interests in Deep Learning, High Performance Computing (HPC) and GPU Computing, his work involves analyzing and optimizing deep learning training performance at scale and driving HW/SW co-design of NVIDIA’s full deep learning platform stack from silicon to DL frameworks.

He received his PhD degree in Computer Science from the University of Illinois Urbana-Champaign, where he performed research on GPU-accelerated high-performance computing with the Charm++ parallel programming system at the Parallel Programming Laboratory (PPL) led by Prof. Laxmikant (Sanjay) Kale.

  • Large-scale Training of Deep Learning Models
  • High Performance Computing
  • GPU Computing
  • Parallel Programming Models and Runtime Systems
  • Performance Modeling

  • PhD in Computer Science, 2016-2022, University of Illinois Urbana-Champaign
  • BSc in Computer Science and Engineering, 2010-2016, Seoul National University


NVIDIA Corporation
Senior Deep Learning Architect
Aug 2022 – Present Santa Clara, CA
  • Key contributor to NVIDIA’s success in the MLPerf Training benchmarks, focused on optimizing training performance for large language models (GPT-3), parameter-efficient fine-tuning (PEFT on Llama 2 70B), text-to-image models (Stable Diffusion), and computer vision models (RetinaNet).
  • Benchmark and project the performance of deep learning workloads on the latest and next-generation NVIDIA GPUs to identify performance bottlenecks and build roadmaps toward peak performance.
  • Optimize training performance across all scales, from a single DGX to thousands of compute nodes on large-scale supercomputers such as NVIDIA Eos.
  • Collaborate with deep learning framework, library, and kernel development teams at NVIDIA, including PyTorch, cuDNN, cuBLAS, DALI, NeMo, Megatron-LM, and Transformer Engine.
Parallel Programming Laboratory, UIUC
Research Assistant
Aug 2016 – Aug 2022 Urbana, IL
  • Optimized performance for GPU-accelerated applications on modern heterogeneous HPC platforms by developing new features in the Charm++ parallel programming system, including asynchronous completion notification and GPU-aware communication.
  • Developed CharminG, a GPU-resident parallel programming framework built on CUDA and NVSHMEM, with the goal of performing task scheduling and communication inside the GPU devices.
  • Improved support for NVIDIA and Intel GPUs in the NAMD molecular dynamics simulation framework.
Intel Corporation
Graduate Technical Intern
May 2021 – Aug 2022 Austin, TX (Virtual)
  • Developed support for Intel GPUs in Open MPI using Intel oneAPI Level Zero and Libfabric/OFI.
  • Validated point-to-point and collective MPI calls on Intel GPU clusters with the OSU Micro-Benchmarks suite.
Lawrence Livermore National Laboratory
Research Intern
May 2019 – Aug 2019 Livermore, CA
  • Created performance models using parallel discrete event simulation (PDES) and the roofline model to analyze and predict the performance of GPU-accelerated proxy applications in the Exascale Computing Project (ECP), including SW4lite and miniFE.
Walt Disney Animation Studios
Technology Research Intern
May 2018 – Aug 2018 Burbank, CA
  • Optimized memory usage in a parallel path tracing renderer via de-duplication of scene objects.


  • jaemin@acm.org
  • 2788 San Tomas Expy, Santa Clara, CA 95051
  • NVIDIA Corporation