Course Outline

Introduction

  • What is ROCm?
  • What is HIP?
  • ROCm vs CUDA vs OpenCL
  • Overview of ROCm and HIP features and architecture
  • ROCm for Windows vs ROCm for Linux

Installation

  • Installing ROCm on Windows
  • Verifying the installation and checking device compatibility
  • Updating or uninstalling ROCm on Windows
  • Troubleshooting common installation issues

Getting Started

  • Creating a new ROCm project using Visual Studio Code on Windows
  • Exploring the project structure and files
  • Compiling and running the program
  • Displaying the output using printf and fprintf

ROCm API

  • Using the ROCm API in the host program
  • Querying device information and capabilities
  • Allocating and deallocating device memory
  • Copying data between host and device
  • Launching kernels and synchronizing threads
  • Handling errors and exceptions

HIP Language

  • Using the HIP language in the device program
  • Writing kernels that execute on the GPU and manipulate data
  • Using data types, qualifiers, operators, and expressions
  • Using built-in functions, variables, and libraries

ROCm and HIP Memory Model

  • Using different memory spaces, such as global, shared, constant, and local
  • Using different memory objects, such as pointers, arrays, textures, and surfaces
  • Using different memory access modes, such as read-only, write-only, read-write, etc.
  • Using the memory consistency model and synchronization mechanisms

ROCm and HIP Execution Model

  • Using the levels of the execution hierarchy: threads, blocks, and grids
  • Using thread indexing built-ins, such as hipThreadIdx_x, hipBlockIdx_x, hipBlockDim_x, etc.
  • Using block functions, such as __syncthreads, __threadfence_block, etc.
  • Using grid functions, such as hipGridDim_x, hipGridSync, cooperative groups, etc.
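
The thread, block, and grid hierarchy listed above can be previewed without a GPU. The following is a minimal sketch in plain C++, not HIP code: it emulates a one-dimensional launch on the CPU to show the index arithmetic that a kernel typically performs with the block index, block size, and thread index. The names Dim, global_index, blocks_for, and emulate_vector_add are hypothetical and exist only for this illustration; on a real device the index values are supplied by the runtime and the loops run in parallel.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the x component of a HIP built-in such as
// hipBlockIdx_x or hipThreadIdx_x. On a GPU the runtime supplies these values.
struct Dim {
    unsigned x;
};

// Global index of a thread in a 1D launch: block index * block size + thread index.
unsigned global_index(Dim block_idx, Dim block_dim, Dim thread_idx) {
    return block_idx.x * block_dim.x + thread_idx.x;
}

// Number of blocks needed to cover n elements (ceiling division).
unsigned blocks_for(unsigned n, unsigned block_size) {
    return (n + block_size - 1) / block_size;
}

// CPU emulation of a 1D grid launch that adds two vectors element by element.
// The two loops play the role of the grid (outer) and the block (inner).
std::vector<float> emulate_vector_add(const std::vector<float>& a,
                                      const std::vector<float>& b,
                                      unsigned block_size) {
    assert(a.size() == b.size());
    assert(block_size > 0);
    const unsigned n = static_cast<unsigned>(a.size());
    std::vector<float> out(n, 0.0f);

    const Dim grid_dim{blocks_for(n, block_size)};
    const Dim block_dim{block_size};

    for (unsigned blk = 0; blk < grid_dim.x; ++blk) {
        for (unsigned thr = 0; thr < block_dim.x; ++thr) {
            const unsigned i = global_index(Dim{blk}, block_dim, Dim{thr});
            // Bounds guard: the last block may contain threads past the end of the data.
            if (i < n) {
                out[i] = a[i] + b[i];
            }
        }
    }
    return out;
}
```

For example, with 10 elements and a block size of 4, blocks_for returns 3, and the last block contains two threads that the bounds guard skips. The same guard is normally needed in a real kernel whenever the data size is not a multiple of the block size.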

Debugging

  • Debugging ROCm and HIP programs on Windows
  • Using the Visual Studio Code debugger to set breakpoints and inspect variables, the call stack, etc.
  • Using the ROCm Debugger to debug ROCm and HIP programs on AMD devices
  • Using the ROCm Profiler to analyze ROCm and HIP programs on AMD devices

Optimization

  • Optimizing ROCm and HIP programs on Windows
  • Using coalescing techniques to improve memory throughput
  • Using caching and prefetching techniques to reduce memory latency
  • Using shared memory and local memory techniques to optimize memory accesses and bandwidth
  • Using profiling tools to measure and improve execution time and resource utilization

Summary and Next Steps

Requirements

  • An understanding of the C/C++ language and parallel programming concepts
  • Basic knowledge of computer architecture and memory hierarchy
  • Experience with command-line tools and code editors
  • Familiarity with the Windows operating system and PowerShell

Audience

  • Developers who wish to learn how to install and use ROCm on Windows to program AMD GPUs and exploit their parallelism
  • Developers who wish to write high-performance and scalable code that can run on different AMD devices
  • Programmers who wish to explore the low-level aspects of GPU programming and optimize their code performance
 21 Hours
