This course helps students learn concepts that scale the use of GPUs, and of the CPUs that manage them, beyond the most common consumer-grade GPU installations. They will learn how to manage asynchronous workflows, sending and receiving events to encapsulate data transfers and control signals. Students will also walk through applying GPUs to data sorting and image processing, implementing their own software using these techniques and libraries.

By the end of the course, you will be able to do the following:

- Develop software that can use multiple CPUs and GPUs
- Develop software that uses CUDA’s events and streams capability to create asynchronous workflows
- Use the CUDA computational model to solve canonical programming challenges, including data sorting and image processing

To be successful in this course, you should have an understanding of parallel programming and experience programming in C/C++.

This course is especially applicable to software developers and data scientists working in high performance computing, data processing, and machine learning.