I have a Python script that currently uses multiprocessing to perform tasks in parallel. My advisor recommended that I use GNU Parallel to speed it up, since Python programs always execute on a single core. Should I keep the multiprocessing script as it is and use GNU Parallel on top of it? Or should I remove the multiprocessing part and then use GNU Parallel? Does it make a difference?
There is a really simple answer: Try it and measure.
The performance of parallelization depends on many factors these days: your application, your workload, and sometimes your hardware.
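The "measure" half of that advice can be as simple as timing each variant with a wall-clock timer. A minimal sketch (`work` is a hypothetical stand-in for your actual task):

```python
import time

def timed(fn, *args):
    """Run fn(*args) and report wall-clock time, so variants can be compared."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

def work(n):
    # stand-in for the real workload being parallelized
    return sum(i * i for i in range(n))

result, elapsed = timed(work, 100_000)
print(f"work took {elapsed:.4f}s")
```

Run the same measurement for the multiprocessing version, the GNU Parallel version, and the combination, and keep whichever is fastest on your data and hardware.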
Related
I heard that setting OMP_NUM_THREADS=1 before calling a Python script that uses multiprocessing makes the script faster.
Is that true or not? If yes, why so?
Since you said in a comment that your Python program is calling a C module that uses OpenMP:
OpenMP does multi-threading within a process, and the default number of threads is typically the number that the CPU can actually run simultaneously. (This is generally the number of CPU cores, or a multiple of that number if the CPU has an SMT feature such as Intel's Hyper-Threading.) So if you have, for example, a quad-core non-hyperthreaded CPU, OpenMP will want to run 4 threads by default.
When you use Python's multiprocessing module, your program starts multiple Python processes which can run simultaneously. You can control the number of processes, but often you'll want it to be the number of CPU cores/threads, e.g. returned by multiprocessing.cpu_count().
So, what happens on that quad-core CPU if you run a multiprocessing program that starts 4 Python processes, each of which calls an OpenMP function that runs 4 threads? You end up running 16 threads on 4 cores. That'll work, but not at peak efficiency, since each core will have to spend some time switching between tasks.
Setting OMP_NUM_THREADS=1 basically turns off the OpenMP multi-threading, so each of your Python processes remains single-threaded.
Make sure you're starting enough Python processes if you do this, though! If you have 4 CPU cores and you only run 2 single-threaded Python processes, you'll have 2 cores utilized and the other 2 sitting idle. (In this case you might want to set OMP_NUM_THREADS=2.)
Resolved in comments:
OMP_NUM_THREADS is an option for OpenMP, a C/C++/Fortran API for doing multi-threading within a process.
It's unclear how that's even related to Python multiprocessing.
Is your Python program calling modules written in C that use OpenMP internally? – Wyzard
I have a kivy application in python which uses some threads.
As Python is not able to run these threads on different cores due to the Global Interpreter Lock, I would like to try PyPy and see if I can make the threads run faster on different cores, since PyPy is different and offers Stackless (whatever that is? :).
Does somebody have some information to share on how to make a simple Python program, which launches some threads via the threading module, run with the PyPy interpreter so that it uses this Stackless feature?
PyPy won't solve the problem of Python running only one thread at a time, since it also makes use of the GIL - http://doc.pypy.org/en/latest/faq.html#does-pypy-have-a-gil-why
Besides that, Kivy is a complex project embedding Python itself - although I don't know it very well, I doubt it is possible to switch the Python it uses to PyPy.
Depending on what you are doing, you may want to use the multiprocessing module instead of threading - it is a near drop-in replacement that makes transparent inter-process calls to Python functions, and can therefore take advantage of multiple cores.
https://docs.python.org/3/library/multiprocessing.html
This is standard in CPython and can likely be used from within Kivy, if (and only if) all code in the subprocess just takes care of number-crunching and so on, and all user interaction and display updates are made in the main process.
I've researched this topic quite a bit and can't seem to come to a conclusion.
So I know OpenCL can be used for parallel processing using both the GPU and CPU (in contrast to CUDA). Since I want to do parallel processing with GPU and CPU, would it be better to use Multiprocessing module from python + PyOpenCL/PyCUDA for parallel processing or just use PyOpenCL for both GPU and CPU parallel programming?
I'm pretty new to this, but intuitively I would imagine Python's multiprocessing module to be the best way to do CPU parallel processing in Python.
Any help or direction would be much appreciated
I don't know if you already got your answer, but keep in mind that GPUs are designed for floating-point operations, and executing a complete Python process on one could be slower than you might expect from a GPU.
Anyway, since you are new to parallel processing, you should start with the multiprocessing module, because GPU programming, and the OpenCL library itself, are difficult to learn when you have no foundation.
You may take a look here https://philipwfowler.github.io/2015-01-13-oxford/intermediate/python/04-multiprocessing.html
I have a script a.py that runs some tasks with multiple threads; please note that I have no control over a.py.
I'm looking for a way to limit the number of threads it can use, as I found that using more threads than my CPU has cores slows the script down.
It could be something like:
python --nthread=2 a.py
or modifying something in my OS is also acceptable.
I am using Ubuntu 16.04.
As requested:
a.py just uses a module from scikit-learn, the MLPRegressor.
I also asked this question here.
A more general way, not specific to python:
taskset -c 1-3 python yourProgram.py
In this case the process is pinned to CPU cores 1-3 (3 cores in total). Any parallelization invoked by your program will share those resources.
For a solution that fits your exact problem, you should first identify which part of the code parallelizes. For instance, if it is due to numpy routines, you could limit the thread count by using:
OMP_NUM_THREADS=4 python yourProgram.py
Again, the first solution is general and handled by the OS, whereas the second is Python (numpy) specific.
Read the threading doc, it said:
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
If you would like to take advantage of multi-core processing, it is better to update the code to use the multiprocessing module.
If you prefer to continue using threading anyway, one option is to pass the number of threads to the application as an argument, like:
python a.py --nthread=2
Then you can update the script code to limit the threads.
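If you can modify the script, one way to honour such a flag is to parse it and cap a thread pool with it. A minimal sketch, assuming a hypothetical `--nthread` flag and a stand-in `task` function:

```python
import argparse
from concurrent.futures import ThreadPoolExecutor

def task(n):
    # stand-in for the script's real per-item work
    return n * n

def main(argv=None):
    parser = argparse.ArgumentParser()
    # hypothetical flag: the script itself must read it and respect it
    parser.add_argument("--nthread", type=int, default=2)
    args = parser.parse_args(argv)
    # the executor never runs more than args.nthread threads at once
    with ThreadPoolExecutor(max_workers=args.nthread) as pool:
        return list(pool.map(task, range(8)))

if __name__ == "__main__":
    print(main())
```

Note this only works for threads the script creates itself; threads spawned inside a library such as scikit-learn are not affected, which is why the environment-variable and `taskset` approaches above exist.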
is there any benefit in using multiprocessing versus running several python interpreters in parallel for long-running, embarrassingly parallel tasks?
At the moment, I'm just firing up several python interpreters that run the analysis over slices of input data, each of them dumping the results into a separate pickle file. It is trivial to slice the input data as well as to combine the results. I'm using python 3.4 on OS X and linux for that.
Is rewriting the code with the multiprocessing module worth the effort? It seems to me that it isn't, but then I'm far from being an expert...
Well, one advantage of multiprocessing is that it provides tools for interprocess communication, and you can have shared variables with different sorts of restrictions. But if you don't need that, your approach is perfectly viable. What you are doing is basically what most map-reduce systems automate. I am sure there is some slight performance penalty in starting an entire extra interpreter, but it's probably insignificant.