I guess the question speaks for itself. I’m interested in doing some serious computations but am not a programmer by trade. I can string enough python together to get done what I want. But can I write a program in python and have the GPU execute it using CUDA? Or do I have to use some mix of python and C?

The examples on Klockner’s (sp) “pyCUDA” webpage had a mix of both python and C, so I’m not sure what the answer is.

If anyone wants to chime in about Opencl, feel free. I heard about this CUDA business only a couple of weeks ago and didn’t know you could use your video cards like this.

You should take a look at CUDAmat and Theano. Both are approaches to writing code that executes on the GPU without really having to know much about GPU programming.

I believe that, with PyCUDA, your computational kernels will always have to be written as “CUDA C Code”. PyCUDA takes charge of a lot of otherwise-tedious book-keeping, but does not build computational CUDA kernels from Python code.

pyopencl offers an interesting alternative to PyCUDA. It is described as a “sister project” to PyCUDA. It is a complete wrapper around OpenCL’s API.

As far as I understand, OpenCL has the advantage of running on GPUs beyond Nvidia’s.

Great answers already, but another option is Clyther. It will let you write OpenCL programs without even using C, by compiling a subset of Python into OpenCL kernels.

A promising library is Copperhead (alternative link), you just need to decorate the function that you want to be run by the GPU (and then you can opt-in / opt-out it to see what’s best between cpu or gpu for that function)

There is a good, basic set of math constructs with compute kernels already written that can be accessed through pyCUDA’s cumath module. If you want to do more involved or specific/custom stuff you will have to write a touch of C in the kernel definition, but the nice thing about pyCUDA is that it will do the heavy C-lifting for you; it does a lot of meta-programming on the back-end so you don’t have to worry about serious C programming, just the little pieces. One of the examples given is a Map/Reduce kernel to calculate the dot product:

dot_krnl = ReductionKernel(np.float32, neutral="0",
arguments="float *x, float *y")

The little snippets of code inside each of those arguments are C lines, but it actually writes the program for you. the ReductionKernel is a custom kernel type for map/reducish type functions, but there are different types. The examples portion of the official pyCUDA documentation goes into more detail.

Good luck!

Scikits CUDA package could be a better option, provided that it doesn’t require any low-level knowledge or C code for any operation that can be represented as numpy array manipulation.

I was wondering the same thing and carried a few searches. I found the article linked below which seems to answer your question. However, you asked this back in 2014 and the Nvidia article does not have a date.


The video goes through the set up, an initial example and, quite importantly, profiliing. However, I do not know if you can implement all of the usual general compute patterns. I would think you can because as far as I could there are no limitations in NumPy.