25 May, 2020

NVLINK on GeForce RTX 2080ti

Read time

5 min

NVLink is a wire-based communication protocol developed by NVIDIA. On cards such as the GeForce RTX 2080 Ti, it provides a high-bandwidth hardware link between two video cards. It can be used for many things, from basic SLI for faster gaming to potentially pooling GPU memory for rendering large and complex scenes.

NVLink is used for data and control code transfers between CPUs and GPUs, and directly between GPUs. It specifies a point-to-point connection with data rates of 20 and 25 Gbit/s per differential pair (NVLink 1.0 and 2.0, respectively). With two GPUs combined this way, a workload can be distributed across both of them.
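As a rough back-of-the-envelope check, the per-pair data rate can be converted into a per-link bandwidth. The sketch below relies on details that are not stated in this post: eight differential pairs per direction in each NVLink 2.0 link, and two links between the RTX 2080 Ti cards. Treat the result as a theoretical peak, not a measured figure.

```python
# Assumptions (not from this post): NVLink 2.0 with 8 differential pairs
# per direction per link, and 2 links between the two RTX 2080 Ti cards.
GBIT_PER_PAIR = 25        # data rate per differential pair, in Gbit/s
PAIRS_PER_DIRECTION = 8   # assumed pairs per direction in one link
LINKS = 2                 # assumed number of links between the cards


def link_bandwidth_gbyte(gbit_per_pair, pairs):
    """Peak one-way bandwidth of a single link in GB/s (8 bits per byte)."""
    return gbit_per_pair * pairs / 8


per_link = link_bandwidth_gbyte(GBIT_PER_PAIR, PAIRS_PER_DIRECTION)
per_direction = per_link * LINKS
print(f"{per_link:.0f} GB/s per link, {per_direction:.0f} GB/s per direction")
```

Under these assumptions, each link carries 25 GB/s one way, and the pair of links carries 50 GB/s per direction.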

By the end of this post, you will be able to compare the performance of two RTX 2080 Ti GPUs connected with NVLink against a single GPU, based on tests I ran recently.

Hardware Setup:

2 x GeForce RTX 2080 Ti
Intel Core i9-10940X CPU
3.30 GHz Processor Base Frequency
126 GB Memory

Software Setup:

Ubuntu 18.04.4 LTS
NVIDIA display driver 440.44
CUDA 10.2
Torch 1.4.0

Before going into NVLink, let's first look at the problem. In this post, I use a PyTorch model called DeepNeuro, an open-source toolset of deep learning applications for medical imaging.

We can check the status of the NVLink connections on the available GPUs using the nvidia-smi nvlink --status command.

Nvlink Status

This shows the current state of each NVLink link on every GPU, together with its bandwidth.
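The same check can be scripted. Below is a minimal sketch that runs the status command and collects the per-link speeds. The line layout it expects ("GPU n:" followed by "Link n: x GB/s") is an assumption about the driver's output, and SAMPLE is hypothetical text, not output captured from the machine used in this post.

```python
import re
import subprocess


def parse_nvlink_status(text):
    """Return {gpu_index: {link_index: speed_in_GB_per_s}} from status text."""
    links = {}
    gpu = None
    for line in text.splitlines():
        gpu_match = re.match(r"\s*GPU (\d+):", line)
        link_match = re.match(r"\s*Link (\d+): ([\d.]+) GB/s", line)
        if gpu_match:
            gpu = int(gpu_match.group(1))
            links[gpu] = {}
        elif link_match and gpu is not None:
            links[gpu][int(link_match.group(1))] = float(link_match.group(2))
    return links


def read_nvlink_status():
    """Run nvidia-smi; needs a machine with the NVIDIA driver installed."""
    result = subprocess.run(
        ["nvidia-smi", "nvlink", "--status"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return parse_nvlink_status(result.stdout)


# Hypothetical sample text so the parser can be tried without a GPU.
SAMPLE = """GPU 0: GeForce RTX 2080 Ti (UUID: GPU-xxxx)
\t Link 0: 25.781 GB/s
\t Link 1: 25.781 GB/s
GPU 1: GeForce RTX 2080 Ti (UUID: GPU-yyyy)
\t Link 0: 25.781 GB/s
\t Link 1: 25.781 GB/s
"""
print(parse_nvlink_status(SAMPLE))
```

Links that the driver reports as inactive do not match the speed pattern, so they are simply left out of the result.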

To see the NVLink topology, use the nvidia-smi topo -m command.

In the resulting matrix, the entry for the GPU0 and GPU1 pair has the form NV#, which means the connection between them traverses a bonded set of # NVLinks.

Nvlink Topology

This shows which GPUs are connected to each other and by which technology.
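To make the matrix entries easier to read, here is a small sketch that turns a topology code into a description. The codes follow the legend that nvidia-smi topo -m prints below the matrix; the wording is paraphrased and may differ between driver versions, and describe_link is a hypothetical helper name.

```python
import re

# Paraphrased from the legend printed by nvidia-smi topo -m.
LEGEND = {
    "X": "self",
    "SYS": "PCIe plus the SMP interconnect between NUMA nodes",
    "NODE": "PCIe plus the interconnect between PCIe host bridges in a NUMA node",
    "PHB": "PCIe plus a PCIe host bridge (typically the CPU)",
    "PXB": "multiple PCIe bridges, without crossing the PCIe host bridge",
    "PIX": "at most a single PCIe bridge",
}


def describe_link(code):
    """Describe one cell of the topology matrix, e.g. 'NV2' or 'PHB'."""
    nvlink = re.fullmatch(r"NV(\d+)", code)
    if nvlink:
        return f"bonded set of {nvlink.group(1)} NVLinks"
    return LEGEND.get(code, "unknown")


print(describe_link("NV2"))
print(describe_link("PHB"))
```

An NV entry between two GPUs means their traffic can go over NVLink; any of the other codes means it goes over PCIe.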

DeepNeuro with a single GPU:

I trained a deep learning model using 3150 images with a batch size of 150. Here I'm using only one GPU, so the entire workload runs on that single GPU. In this kind of setup, as the workload grows, a single GPU may no longer be able to handle the load and training fails with Out of Memory errors.
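For reference, the dataset size and batch size above determine how many batches the GPU processes in each epoch. A quick sketch of that arithmetic (steps_per_epoch is a hypothetical helper name):

```python
import math

NUM_IMAGES = 3150   # dataset size used in this post
BATCH_SIZE = 150    # batch size used in this post


def steps_per_epoch(num_images, batch_size):
    """Number of batches needed to see every image once."""
    return math.ceil(num_images / batch_size)


print(steps_per_epoch(NUM_IMAGES, BATCH_SIZE))  # 21
```

Each of those 21 batches has to fit into the memory of the one GPU together with the model, which is why larger workloads eventually run out of memory.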

Monitoring the running status of the DeepNeuro model without NVLink

DeepNeuro with Nvlink:

NVLink provides a fast communication path between the GPUs; the workload itself is distributed by parallelizing the model. To make use of NVLink, I parallelized the PyTorch model so that each training task is split across both GPUs.
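A minimal sketch of one common way to parallelize a PyTorch model, assuming torch.nn.DataParallel, which splits every input batch across the visible GPUs (a batch of 150 becomes 75 samples per GPU on two cards). TinyClassifier is a hypothetical stand-in and not the DeepNeuro architecture, and the input size is chosen only for illustration.

```python
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in model; not the DeepNeuro architecture."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(8, num_classes)

    def forward(self, x):
        x = self.features(x).flatten(1)
        return self.head(x)


def build_model():
    """Wrap the model in DataParallel only when more than one GPU is visible."""
    model = TinyClassifier()
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)  # splits each batch along dimension 0
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    return model.to(device), device


model, device = build_model()
batch = torch.randn(150, 1, 64, 64, device=device)  # illustrative input size
logits = model(batch)
print(logits.shape)
```

The same script falls back to a single GPU or to the CPU when fewer devices are available, so the training loop does not need to change between the two setups.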

Monitoring the running status of the DeepNeuro model with NVLink

With the model parallelized over NVLink, the workload on each individual GPU drops, and so does the processing time.

Time vs epochs plot

The plot shows the time taken against the number of epochs. For a given number of epochs, the multi-GPU run finishes sooner than the single-GPU run; with NVLink, training is nearly 20% faster.
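A figure like "20% faster" can be read in two ways, so it helps to spell out the arithmetic. The sketch below computes both readings; the timings passed in are hypothetical round numbers chosen for illustration, not measurements from this post.

```python
def percent_time_saved(single_gpu_seconds, multi_gpu_seconds):
    """Reduction in wall-clock time relative to the single-GPU run."""
    return 100.0 * (single_gpu_seconds - multi_gpu_seconds) / single_gpu_seconds


def percent_throughput_gain(single_gpu_seconds, multi_gpu_seconds):
    """Increase in work done per second relative to the single-GPU run."""
    return 100.0 * (single_gpu_seconds / multi_gpu_seconds - 1.0)


# Hypothetical timings: 100 s on one GPU, 80 s on two GPUs.
print(percent_time_saved(100.0, 80.0))       # 20.0
print(percent_throughput_gain(100.0, 80.0))  # 25.0
```

The same pair of timings gives 20% less time but 25% more throughput, so it is worth stating which of the two a reported number refers to.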

Other Examples:

I also trained another PyTorch classification model that predicts whether a patient has pneumonia. This example makes the difference in computational speed between a single GPU and multiple GPUs clearer.

Single GPU:

I trained a PyTorch model with 5000 images on a single GPU, and it uses nearly 73% of the GPU memory. If the dataset grows, this setup may run into Out of Memory errors.

Monitoring the running status of the PyTorch model with a single NVIDIA GPU

Multi GPU:

The workload is shared by both GPUs, and around 30–40% of GPU memory is utilized.
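Memory percentages like the ones quoted for the single-GPU and multi-GPU runs can also be read programmatically. This is a minimal sketch using the nvidia-smi --query-gpu option with CSV output; parse_memory_csv is a hypothetical helper, and SAMPLE contains made-up values rather than readings from the runs in this post.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Return {gpu_index: percent_of_memory_used} from the CSV output."""
    usage = {}
    for line in text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        usage[int(index)] = 100.0 * float(used) / float(total)
    return usage


def read_memory_usage():
    """Run nvidia-smi; needs a machine with the NVIDIA driver installed."""
    result = subprocess.run(
        QUERY, stdout=subprocess.PIPE, universal_newlines=True, check=True
    )
    return parse_memory_csv(result.stdout)


# Made-up values (MiB used, MiB total) so the parser can be tried without a GPU.
SAMPLE = "0, 4400, 11000\n1, 3300, 11000\n"
print(parse_memory_csv(SAMPLE))
```

Calling read_memory_usage() periodically during training gives a per-GPU trace that can be compared between the single-GPU and multi-GPU runs.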

Monitoring the running status of the PyTorch model with NVLink

Time vs epochs plot

Comparing RTX 2080 Ti with other GPUs:

Boasting up to six times the performance of the older GTX 1080 series graphics cards, NVIDIA's latest RTX 2080 and RTX 2080 Ti are GPU beasts.

Comparing RTX 2080 Ti with other GPUs

Comparing Average G3D Mark of GPUs:

This graph shows the relative performance of the video card compared to 10 other common video cards in terms of PassMark G3D Mark, with higher numbers indicating better performance. G3D Mark is the 3D graphics score reported by PassMark's PerformanceTest benchmark. It should not be confused with 3DMark, a separate benchmarking tool developed by UL (formerly Futuremark) to measure a computer's 3D graphics rendering and CPU workload processing capabilities.

Comparing Average G3D Mark of GPUs
