I’m having a bit of trouble deciding whatever to use python multiprocessing or celery or pp for my application.

My app is very CPU heavy but currently uses only one cpu so, I need to spread it across all available cpus(which caused me to look at python’s multiprocessing library) but I read that this library doesn’t scale to other machines if required. Right now I’m not sure if I’ll need more than one server to run my code but I’m thinking of running celery locally and then scaling would only require adding new servers instead of refactoring the code(as it would if I used multiprocessing).

My question: is this logic correct? and is there any negative(performance) with using celery locally(if it turns out a single server with multiple cores can complete my task)? or is it more advised to use multiprocessing and grow out of it into something else later?


p.s. this is for a personal learning project but I would maybe one day like to work as a developer in a firm and want to learn how professionals do it.

I just finished a test to decide how much celery adds as overhead over multiprocessing.Pool and shared arrays. The test runs the wiener filter on a (292, 353, 1652) uint16 array. Both versions use the same chunking (roughly:divide the 292,353 dimensions by the square root of the number of available cpu’s). Two celery versions were tried: one solution sends pickled data the other opens the underlying data file in every worker.

Result: on my 16 core i7 CPU celery takes about 16s, multiprocessing.Pool with shared arrays about 15s. I find this difference surprisingly small.

Increasing granularity increases the difference obviously (celery has to pass more messages): celery takes 15 s, multiprocessing.Pool takes 12s.

Take into account that celery workers were already running on the host whereas the pool workers are forked at each run. I am not sure how could I start multiprocessing pool at the beginning since I pass the shared arrays in the initializer:

with closing(Pool(processes=mp.cpu_count(), initializer=poolinit_gen, initargs=(sourcearrays, resarrays))) as p:

and only the resarrays are protected by locking.

I have actually never used Celery, but I have used multiprocessing.

Celery seems to have several ways to pass messages (tasks) around, including ways that you should be able to run workers on different machines. So a downside might be that message passing could be slower than with multiprocessing, but on the other hand you could spread the load to other machines.

You are right that multiprocessing can only run on one machine. But on the other hand, communication between the processes can be very fast, for example by using shared memory. Also if you need to process very large amounts of data, you could easily read and write data from and to the local disk, and just pass filenames between the processes.

I don’t know how well Celery would deal with task failures. For example, task might never finish running, or might crash, or you might want to have the ability to kill a task if it did not finish in certain time limit. I don’t know how hard it would be to add support for that if it is not there.

multiprocessing does not come with fault tolerance out of the box, but you can build that yourself without too much trouble.