Course Outline
Introduction
- What is OpenCL?
- Comparison of OpenCL, CUDA, and SYCL
- Overview of OpenCL features and architecture
- Setting up the Development Environment
Getting Started
- Creating a new OpenCL project using Visual Studio Code
- Exploring the project structure and files
- Compiling and running the program
- Displaying the output using printf and fprintf
OpenCL API
- Understanding the role of the OpenCL API in host programs
- Using the OpenCL API to query device information and capabilities
- Using the OpenCL API to create contexts, command queues, buffers, kernels, and events
- Using the OpenCL API to enqueue commands such as read, write, copy, map, unmap, execute, and wait
- Handling errors using OpenCL API error codes (and exceptions when using the C++ bindings)
OpenCL C
- Understanding the role of OpenCL C in device programs
- Writing kernels using OpenCL C that execute on the device and manipulate data
- Using OpenCL C data types, qualifiers, operators, and expressions
- Using OpenCL C built-in functions, such as math, geometric, and relational functions
- Utilizing OpenCL C optional features and extensions, such as atomics, images, and cl_khr_fp16
OpenCL Memory Model
- Understanding the differences between host and device memory models
- Using OpenCL memory spaces, including global, local, constant, and private
- Using OpenCL memory objects such as buffers, images, and pipes
- Applying OpenCL memory access modes, such as read-only, write-only, and read-write
- Managing the OpenCL memory consistency model and synchronization mechanisms
OpenCL Execution Model
- Understanding the differences between host and device execution models
- Defining parallelism using OpenCL work-items, work-groups, and ND-ranges
- Utilizing OpenCL work-item functions such as get_global_id, get_local_id, and get_group_id
- Utilizing OpenCL work-group functions such as barrier, work_group_reduce_<op>, and work_group_scan_inclusive_<op>
- Querying ND-range dimensions with work-item functions such as get_num_groups, get_global_size, and get_local_size
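As a rough illustration of how the work-item functions listed above relate to one another, the following is a minimal host-side sketch in plain C. It does not call the OpenCL API. The `NDRange1D` struct and the helper functions are hypothetical names introduced only for this example, and it assumes a one-dimensional ND-range with a zero global offset and a global size that is a multiple of the local size.

```c
#include <stddef.h>

/* Hypothetical host-side model of a 1-D ND-range; not part of the OpenCL API. */
typedef struct {
    size_t global_size; /* total work-items, as get_global_size(0) would report */
    size_t local_size;  /* work-items per work-group, as get_local_size(0) would report */
} NDRange1D;

/* Number of work-groups, as get_num_groups(0) would report.
   Assumes global_size is a multiple of local_size. */
size_t num_groups(NDRange1D r) {
    return r.global_size / r.local_size;
}

/* Global ID of a work-item from its group ID and local ID
   (the relationship between get_global_id, get_group_id, and get_local_id
   when the global offset is zero). */
size_t global_id(NDRange1D r, size_t group_id, size_t local_id) {
    return group_id * r.local_size + local_id;
}

/* Inverse mappings: recover the group ID and local ID from a global ID. */
size_t group_id_of(NDRange1D r, size_t gid) {
    return gid / r.local_size;
}

size_t local_id_of(NDRange1D r, size_t gid) {
    return gid % r.local_size;
}
```

With 1024 work-items in groups of 64, this model gives 16 work-groups, and the work-item with local ID 5 in group 3 has global ID 3 * 64 + 5.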
Debugging
- Understanding common errors and bugs in OpenCL programs
- Using the Visual Studio Code debugger to set breakpoints and inspect variables and the call stack
- Using CodeXL to debug and analyze OpenCL programs on AMD devices
- Using Intel VTune to profile and analyze OpenCL programs on Intel devices
- Using NVIDIA Nsight to debug and analyze OpenCL programs on NVIDIA devices
Optimization
- Understanding factors that impact OpenCL program performance
- Using OpenCL vector data types and vectorization techniques to improve arithmetic throughput
- Using loop unrolling and loop tiling in OpenCL kernels to reduce control overhead and increase locality
- Using OpenCL local memory and local memory functions to optimize memory accesses and bandwidth
- Using OpenCL profiling and profiling tools to measure and improve execution time and resource utilization
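To make the loop tiling item above concrete, here is a small sketch in plain C rather than OpenCL C, so it can be compiled without an OpenCL runtime. The function names and the `TILE` constant are assumptions made for this illustration. In a real kernel the tile would typically map to a work-group and be staged in local memory, with the tile size tuned per device.

```c
#include <stddef.h>

#define TILE 4 /* hypothetical tile edge; real kernels tune this per device */

/* Naive transpose of a rows x cols row-major matrix: out[c][r] = in[r][c]. */
void transpose_naive(const float *in, float *out, size_t rows, size_t cols) {
    for (size_t r = 0; r < rows; ++r)
        for (size_t c = 0; c < cols; ++c)
            out[c * rows + r] = in[r * cols + c];
}

/* Tiled transpose: the same result, but the matrix is visited in
   TILE x TILE blocks so that reads and writes stay within a small
   region at a time. Partial tiles at the edges are handled by the
   extra bounds checks in the inner loops. */
void transpose_tiled(const float *in, float *out, size_t rows, size_t cols) {
    for (size_t r0 = 0; r0 < rows; r0 += TILE)
        for (size_t c0 = 0; c0 < cols; c0 += TILE)
            for (size_t r = r0; r < r0 + TILE && r < rows; ++r)
                for (size_t c = c0; c < c0 + TILE && c < cols; ++c)
                    out[c * rows + r] = in[r * cols + c];
}
```

Both functions produce identical output. Only the traversal order differs, which is where any locality benefit would come from.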
Summary and Next Steps
Requirements
- An understanding of the C/C++ programming languages and parallel programming concepts.
- Basic knowledge of computer architecture and memory hierarchy.
- Experience with command-line tools and code editors.
Audience
- Developers aiming to learn how to program heterogeneous devices using OpenCL and exploit their parallelism.
- Developers seeking to write portable and scalable code that runs on various platforms and devices.
- Programmers interested in exploring the low-level aspects of heterogeneous programming and optimizing code performance.
28 Hours