GPU Programming Introduction

General-purpose computing on a GPU (Graphics Processing Unit), better known as GPU programming, is the use of a GPU together with a CPU (Central Processing Unit) to accelerate computation in applications traditionally handled only by the CPU. Even though GPU programming has been practically viable only for the past two decades, its applications now span virtually every industry. For example, GPU programming has been used to accelerate video, digital image, and audio signal processing, statistical physics, scientific computing, medical imaging, computer vision, neural networks and deep learning, cryptography, and even intrusion detection, among many other areas.

This article serves as a theoretical introduction aimed at those who would like to learn how to write GPU-accelerated programs as well as those who have just a general interest in this fascinating topic.

The Difference Between a GPU and a CPU

Long before high-resolution, high-fidelity 3D graphics became the norm, most computers had no GPU. Instead, the CPU carried out all the instructions of computer programs by performing the basic arithmetic, logical, control, and input/output (I/O) operations specified by the instructions. For this reason, the CPU is often described as the brain of the computer.

In recent years, however, the GPU, which was designed to accelerate the creation of images for output to a display device, has increasingly helped the CPU solve problems that were previously handled by the CPU alone.

Graphics card manufacturer Nvidia offers a simple way to understand the fundamental difference between a GPU and a CPU: “A CPU consists of a few cores optimized for sequential serial processing while a GPU has a massively parallel architecture consisting of thousands of smaller, more efficient cores designed for handling multiple tasks simultaneously.”

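The ability to handle many tasks at the same time makes GPUs highly suitable for work that splits into independent pieces, such as searching for a word in a document, where each part of the text can be scanned separately. Other tasks, such as calculating the Fibonacci sequence term by term, where every value depends on the two before it, gain little or nothing from parallel processing.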
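The contrast between the two kinds of task can be sketched in plain Python. This is only an illustration running on the CPU, not GPU code, and the function names and the default chunk count are assumptions made for the example. The word search splits into chunks that never need to look at each other, so their partial counts can be computed in any order and then summed. The Fibonacci loop cannot be split this way, because each step needs the result of the previous one.

```python
from concurrent.futures import ThreadPoolExecutor


def count_in_chunk(words, target):
    """Count occurrences of target in one chunk; no other chunk is needed."""
    return sum(1 for w in words if w == target)


def parallel_count(text, target, num_chunks=4):
    """Split the words into chunks and count each chunk independently."""
    words = text.split()
    size = max(1, -(-len(words) // num_chunks))  # ceiling division, at least 1
    chunks = [words[i:i + size] for i in range(0, len(words), size)]
    with ThreadPoolExecutor() as pool:
        # Each chunk is an independent task; the order of completion is irrelevant.
        partial = pool.map(count_in_chunk, chunks, [target] * len(chunks))
        return sum(partial)


def fibonacci(n):
    """Return the n-th Fibonacci number, with F(0) = 0 and F(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        # Loop-carried dependency: this step needs the previous step's values.
        a, b = b, a + b
    return a


if __name__ == "__main__":
    print(parallel_count("the cat and the dog and the bird", "the"))
    print(fibonacci(10))
```

The thread pool here only demonstrates that the chunks are independent; it is not meant to show a speed-up. On a GPU, the same independence is what would allow thousands of cores to each take a small piece of the input.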

Among the tasks that do benefit significantly from parallel processing is deep learning, one of the most sought-after skills in tech today. Deep learning algorithms loosely mimic the activity in layers of neurons in the neocortex, allowing machines to learn how to understand language, recognize patterns, or compose music.

As a result of the growing importance of artificial intelligence, the demand for developers who understand general-purpose computing on a GPU has been soaring.

CUDA Versus OpenCL Versus OpenACC

Because GPUs understand computational problems in terms of graphics primitives, early efforts to use GPUs as general-purpose processors required reformulating computational problems in the language of graphics cards.

Fortunately, it’s now much easier to do GPU-accelerated computing thanks to parallel computing platforms such as Nvidia’s CUDA, OpenCL, or OpenACC. These platforms allow developers to ignore the language barrier that exists between the CPU and the GPU and, instead, focus on higher-level computing concepts.


CUDA

Initially released by Nvidia in 2007, CUDA (Compute Unified Device Architecture) is the dominant proprietary framework today. “With CUDA, developers are able to dramatically speed up computing applications by harnessing the power of GPUs,” is how Nvidia describes the framework.

Developers can call CUDA from programming languages such as C, C++, Fortran, or Python without any skills in graphics programming. What’s more, the CUDA Toolkit from Nvidia contains everything developers need to start creating GPU-accelerated applications that greatly outperform their CPU-bound counterparts.

The CUDA Toolkit is available for Microsoft Windows and Linux (macOS was supported through CUDA 10.2). The CUDA platform also supports other computational interfaces, including OpenCL, Microsoft’s DirectCompute, OpenGL Compute Shaders, and C++ AMP.
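To give a feel for the programming model without requiring a GPU, the sketch below emulates in plain Python how a one-dimensional CUDA-style launch is typically organized: the same small function (the kernel) runs once per thread, and each thread derives the index of the element it owns from its block index, the block size, and its position within the block. This is not CUDA code and it calls no CUDA API. The names `vector_add_kernel`, `launch`, and `vector_add`, as well as the block size of 4, are hypothetical and chosen only for illustration.

```python
def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Work done by one emulated GPU thread: add a single pair of elements."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(out):  # guard: the last block may contain surplus threads
        out[i] = a[i] + b[i]


def launch(kernel, num_blocks, threads_per_block, *args):
    """Emulate a kernel launch by visiting every thread one after another.

    On a real GPU these iterations would run concurrently.
    """
    for block_idx in range(num_blocks):
        for thread_idx in range(threads_per_block):
            kernel(block_idx, threads_per_block, thread_idx, *args)


def vector_add(a, b, threads_per_block=4):
    """Add two equal-length lists using the emulated launch."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    n = len(a)
    out = [0] * n
    # Enough blocks to cover all n elements (ceiling division).
    num_blocks = (n + threads_per_block - 1) // threads_per_block
    launch(vector_add_kernel, num_blocks, threads_per_block, a, b, out)
    return out


if __name__ == "__main__":
    print(vector_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50]))
```

With five elements and four threads per block, two blocks are launched and the bounds check keeps the three surplus threads of the second block from writing past the end of the output.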


OpenCL

Initially released by the Khronos Group in 2009, OpenCL is the most popular open, royalty-free standard for cross-platform, parallel programming. According to the Khronos Group, “OpenCL greatly improves the speed and responsiveness of a wide spectrum of applications in numerous market categories including gaming and entertainment titles, scientific and medical software, professional creative tools, vision processing, and neural network training and inferencing.”

OpenCL has so far been implemented by Altera, AMD, Apple, ARM, Creative, IBM, Imagination, Intel, Nvidia, Qualcomm, Samsung, Vivante, Xilinx, and ZiiLABS, and it supports all popular operating systems across all major platforms, making it extremely versatile. OpenCL defines a C-like language for writing programs, but third-party APIs exist for other programming languages and platforms such as Python or Java.


OpenACC

OpenACC is the youngest programming standard for parallel computing described in this article. It was initially released in 2011 by a group of companies comprising Cray, CAPS, Nvidia, and PGI (the Portland Group) to simplify parallel programming of heterogeneous CPU/GPU systems.

“OpenACC is a user-driven directive-based performance-portable parallel programming model designed for scientists and engineers interested in porting their codes to a wide-variety of heterogeneous HPC hardware platforms and architectures with significantly less programming effort than required with a low-level model,” states OpenACC on its official website.

Developers interested in OpenACC can annotate C, C++, and Fortran source code with directives that tell the compiler which regions should be accelerated. The goal is to provide a model for accelerator programming that is portable across operating systems and various types of host CPUs and accelerators.

Which One Should I Use?

The choice between these three parallel computing platforms depends on your goals and the environment you work in. For example, CUDA is widely used in academia, and it’s also considered to be the easiest one to learn. OpenCL is by far the most portable parallel computing platform, although programs written in OpenCL still need to be individually optimized for each target platform. OpenACC, being directive-based, asks for the fewest changes to existing C, C++, or Fortran code.

Further Reading

To become familiar with CUDA, we recommend you follow the instructions in the CUDA Quick Start Guide, which explains how to get CUDA up and running on Linux and Windows. AMD’s OpenCL Programming Guide provides a fantastic, in-depth overview of OpenCL, but it assumes that the reader is familiar with the first three chapters of the OpenCL Specification. OpenACC offers a three-step introductory tutorial designed to demonstrate how to take advantage of GPU programming, and more information can be found in the OpenACC specification.
