I use this line to select a GPU:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

But I want to use two GPUs in Jupyter, like this:

device = torch.device("cuda:0,1" if torch.cuda.is_available() else "cpu")

Assuming that you want to distribute the data across the available GPUs (if you have a batch size of 16 and 2 GPUs, each GPU gets 8 samples), and not spread parts of the model across different GPUs, this can be done as follows:

If you want to use all the available GPUs:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = CreateModel()

model= nn.DataParallel(model)
model.to(device)

If you want to use specific GPUs:
(For example, using 2 out of 4 GPUs)

device = torch.device("cuda:1,3" if torch.cuda.is_available() else "cpu") ## specify the GPU id's, GPU id's start from 0.

model = CreateModel()

model = nn.DataParallel(model, device_ids=[1, 3])
model.to(device)

To use specific GPUs by setting an OS environment variable:

Before executing the program, set the CUDA_VISIBLE_DEVICES variable as follows:

export CUDA_VISIBLE_DEVICES=1,3 (assuming you want to select the 2nd and 4th GPU)

Then, within the program, you can just use DataParallel() as though you want to use all the GPUs (similar to the first case). Here the GPUs available to the program are restricted by the environment variable.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = CreateModel()

model = nn.DataParallel(model)
model.to(device)

In all of these cases, the data has to be mapped to the same device.

If X and y are the data (note that .to() is not in-place for tensors, so reassign the result):

X = X.to(device)
y = y.to(device)
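
For context, here is a minimal sketch of how these pieces fit together in a training loop. CreateModel comes from the snippets above; dataloader, the loss, and the optimizer are just illustrative placeholders for your own code:

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.DataParallel(CreateModel()).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for X, y in dataloader:
    X = X.to(device)   # DataParallel splits this batch across the GPUs
    y = y.to(device)
    optimizer.zero_grad()
    loss = criterion(model(X), y)   # outputs are gathered back on device_ids[0]
    loss.backward()
    optimizer.step()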

Using multiple GPUs is as simple as wrapping the model in DataParallel and increasing the batch size. The official PyTorch data parallelism tutorials are a good quick start.

Another option would be to use some helper libraries for PyTorch:

PyTorch Ignite library: Distributed GPU training

It provides a context manager for distributed configuration on:

  • nccl – torch native distributed configuration on multiple GPUs
  • xla-tpu – TPUs distributed configuration
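
A minimal sketch of that pattern with the nccl backend; the function name train_fn, the config dict, and nproc_per_node=2 are illustrative assumptions, not fixed names:

import ignite.distributed as idist

def train_fn(local_rank, config):
    device = idist.device()  # device assigned to this process, e.g. cuda:0 or cuda:1
    print(f"process {local_rank} running on {device}")

# spawn one process per GPU using the nccl backend
with idist.Parallel(backend="nccl", nproc_per_node=2) as parallel:
    parallel.run(train_fn, {"batch_size": 16})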

PyTorch Lightning: Multi-GPU training

This is possibly the best option IMHO to train on CPU/GPU/TPU without changing your original PyTorch code.
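
A minimal sketch of that idea, assuming MyModel is a LightningModule you define, train_loader is your DataLoader, and a recent Lightning version (older releases used the gpus= argument instead):

import pytorch_lightning as pl

model = MyModel()  # a pl.LightningModule defined by you
trainer = pl.Trainer(accelerator="gpu", devices=2)  # Lightning picks a distributed strategy for multi-GPU
trainer.fit(model, train_loader)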

Worth checking Catalyst for similar distributed GPU options.

When I ran naiveinception_googlenet, the above methods didn’t work for me. The following method solved my problem.

import os

# set these before torch initializes CUDA (i.e. before the first CUDA call), otherwise they have no effect
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # order GPUs by PCI bus id so the ids match nvidia-smi
os.environ["CUDA_VISIBLE_DEVICES"] = "0,3"      # specify which GPU(s) to be used