Your very own CUDA kernel!

Anshuman Mishra
Jun 29, 2024


Where I get you started with CUDA programming: you set up your coding environment and write your first CUDA kernel.

Folks, recently I wrote a tweet about how you can easily get started with CUDA programming. The tweet got some attention, so I decided to convert it to a blog post!

Setting up your code editor.

Alright, so our code editor of choice for learning CUDA would be… VS Code? No. Okay, CLion, because CUDA is basically C? No. Then Dev-C++??? No.

Google Colab.

Wait, what? Google Colab?? Yes, you heard that right! The fundamental requirement for learning CUDA programming is an NVIDIA GPU, and Google Colab gives you one for free! Obviously you can use Kaggle, or even spin up your own virtual machine in the cloud with an H100, but Google Colab suffices for learning CUDA.

Spin up a new Colab notebook and set the GPU runtime to T4. At the time of writing this post, you can do this via Runtime → Change Runtime Type → T4 GPU.

Run the following commands in the first cell of the notebook:

!pip install git+https://github.com/andreinechaev/nvcc4jupyter.git
%load_ext nvcc4jupyter

Well, this is a pretty useful extension that enables us to run CUDA C code in Google Colab via nvcc. nvcc is NVIDIA's compiler: it compiles your CUDA C code to machine code, which you can then execute like any other compiled C program!
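To sanity-check your setup before writing any CUDA code, you can ask Colab which compiler and GPU it gave you (the exact version strings will vary by runtime):

```shell
# Run inside a Colab cell; the leading ! sends the command to the shell
!nvcc --version      # prints the CUDA compiler version
!nvidia-smi          # shows the attached GPU (should list a Tesla T4)
```

If `nvidia-smi` errors out, double-check that the runtime type is set to T4 GPU.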

Hello world from GPU!

It’s time to write your first CUDA program already. We’ll also do some bash wizardry on the way!

Create another cell, and type in:

Writing a basic hello world kernel in cuda!
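The original post shows this cell as a screenshot. Here is a sketch of what it likely contains, reconstructed from the surrounding description (the `%%bash` magic, the `cat <<EOF>` heredoc, and the `helloWorldFromGPU` kernel discussed below); the exact message string is my guess:

```shell
%%bash
cat <<EOF> hello_world.cu
#include <stdio.h>

// __global__ marks this function as a kernel: it runs on the GPU (device)
__global__ void helloWorldFromGPU(void) {
    printf("Hello World from GPU!\n");
}

int main(void) {
    // Launch the kernel with 1 block of 1 thread
    helloWorldFromGPU<<<1, 1>>>();
    // Wait for the GPU to finish before the program exits,
    // otherwise the process may end before the printf flushes
    cudaDeviceSynchronize();
    return 0;
}
EOF
nvcc hello_world.cu -o hello_world
./hello_world
```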

Pretty simple C code, right? Just a function being called from main. Still, there are a bunch of things to learn here before we move ahead.

  1. %%bash is a cell magic in Colab: if you place it at the top of a cell, Colab will treat the rest of that cell as bash.
  2. cat <<EOF> hello_world.cu ……. EOF is a simple trick (a heredoc) for writing a multiline string to a file. In our case, the file gets a .cu extension, which is the CUDA file format. It goes without saying that you can name your file whatever you wish! And yes, you can view the content of your file by just running
!cat hello_world.cu

There are two weird things here, which I guess you might have already noticed.

Why is the helloWorldFromGPU function qualified with __global__, and why is the call embellished with <<<1, 1>>> when it's invoked from main?

Well, CUDA C adds the __global__ qualifier to standard C. This mechanism alerts the compiler that a function should be compiled to run on the device (GPU) instead of the host (CPU). In this simple example, nvcc hands the function helloWorldFromGPU() to the compiler that handles device code, and feeds main() to the host compiler, just as it would for a plain C program.

For now, you can think of the triple angular brackets in <<<1, 1>>> as parameters to the runtime that describe how to launch the kernel. Don't worry, we won't be skipping them; we'll learn more about them in a later section of this post.

By the way, you already wrote your first CUDA kernel: helloWorldFromGPU. "CUDA kernel" is just a fancy term for a function that executes on the device, i.e. the GPU.

Getting device specs…

CUDA code to get specs of your gpu device
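The original post shows this cell as a screenshot too. A sketch of a device-query cell in the same heredoc style, using the CUDA runtime's cudaGetDeviceProperties API (the particular properties printed here are my selection, not necessarily the original's):

```shell
%%bash
cat <<EOF> device_specs.cu
#include <stdio.h>

int main(void) {
    int count;
    cudaGetDeviceCount(&count);  // number of CUDA-capable GPUs
    for (int i = 0; i < count; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);  // fill prop with device i's specs
        printf("Device %d: %s\n", i, prop.name);
        printf("  Compute capability: %d.%d\n", prop.major, prop.minor);
        printf("  Global memory: %zu MB\n", prop.totalGlobalMem >> 20);
        printf("  Multiprocessors: %d\n", prop.multiProcessorCount);
        printf("  Max threads per block: %d\n", prop.maxThreadsPerBlock);
    }
    return 0;
}
EOF
nvcc device_specs.cu -o device_specs
./device_specs
```

On a Colab T4 runtime you should see a single device named "Tesla T4".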

Those weird angular brackets…

It’s time to give a proper explanation of those weird angular brackets, which we described as runtime parameters earlier. The first number represents the number of parallel blocks in which we would like the device to execute our kernel; in this case, we’re passing the value 1. With the second parameter, we specify how many threads to launch per block. We’ll learn about these in detail in the next iteration!

Until next time!


Anshuman Mishra

current: sweasearcher @ http://flip.ai, contributor @kerasteam | http://kaggle.com/shivanshuman | Prev: GSoC @TensorFlow @amazon @BNYMellon