Python: Multicore processing?

Each Answer to this Q is separated by one/two green lines.

I’ve been reading about Python’s multiprocessing module. I still don’t think I have a very good understanding of what it can do.

Let’s say I have a quadcore processor and I have a list with 1,000,000 integers and I want the sum of all the integers. I could simply do:

list_sum = sum(my_list)

But this only sends it to one core.

Is it possible, using the multiprocessing module, to divide the array up and have each core get the sum of it’s part and return the value so the total sum may be computed?

Something like:

core1_sum = sum(my_list[0:500000])          #goes to core 1
core2_sum = sum(my_list[500001:1000000])    #goes to core 2
all_core_sum = core1_sum + core2_sum        #core 3 does final computation

Any help would be appreciated.

Yes, it’s possible to do this summation over several processes, very much like doing it with multiple threads:

from multiprocessing import Process, Queue

def do_sum(q,l):

def main():
    my_list = range(1000000)

    q = Queue()

    p1 = Process(target=do_sum, args=(q,my_list[:500000]))
    p2 = Process(target=do_sum, args=(q,my_list[500000:]))
    r1 = q.get()
    r2 = q.get()
    print r1+r2

if __name__=='__main__':

However, it is likely that doing it with multiple processes is likely slower than doing it in a single process, as copying the data forth and back is more expensive than summing them right away.

Welcome the world of concurrent programming.

What Python can (and can’t) do depends on two things.

  1. What the OS can (and can’t) do. Most OS’s allocate processes to cores. To use 4 cores, you need to break your problem into four processes. This is easier than it sounds. Sometimes.

  2. What the underlying C libraries can (and can’t) do. If the C libraries expose features of the OS AND the OS exposes features of the hardware, you’re solid.

To break a problem into multiple processes — especially in GNU/Linux — is easy. Break it into a multi-step pipeline.

In the case of summing a million numbers, think of the following shell script. Assuming some hypothetical program that sums either a range of numbers or a list of numbers on stdin.

( 0 500000 & 50000 1000000 ) |

This would have 3 concurrent processes. Two are doing sums of a lot of numbers, the third is summing two numbers.

Since the GNU/Linux shells and the OS already handle some parts of concurrency for you, you can design simple (very, very simple) programs that read from stdin, write to stdout, and are designed to do small parts of a large job.

You can try to reduce the overheads by using subprocess to build the pipeline instead of allocating the job to the shell. You may find, however, that the shell builds pipelines very, very quickly. (It was written directly in C and makes direct OS API calls for you.)

Sure, for example:

from multiprocessing import Process, Queue

thelist = range(1000*1000)

def f(q, sublist):

def main():
    start = 0
    chunk = 500*1000
    queue = Queue()
    NP = 0
    subprocesses = []
    while start < len(thelist):
      p = Process(target=f, args=(queue, thelist[start:start+chunk]))
      NP += 1
      print 'delegated %s:%s to subprocess %s' % (start, start+chunk, NP)
      start += chunk
    total = 0
    for i in range(NP):
      total += queue.get()
    print "total is", total, '=', sum(thelist)
    while subprocesses:

if __name__ == '__main__':

results in:

$ python2.6 
delegated 0:500000 to subprocess 1
delegated 500000:1000000 to subprocess 2
total is 499999500000 = 499999500000

note that this granularity is too fine to be worth spawning processes for — the overall summing task is small (which is why I can recompute the sum in main as a check;-) and too many data is being moved back and forth (in fact the subprocesses wouldn’t need to get copies of the sublists they work on — indices would suffice). So, it’s a “toy example” where multiprocessing isn’t really warranted. With different architectures (use a pool of subprocesses that receive multiple tasks to perform from a queue, minimize data movement back and forth, etc, etc) and on less granular tasks you could actually get benefits in terms of performance, however.

The answers/resolutions are collected from stackoverflow, are licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0 .