I heard that setting OMP_NUM_THREADS=1 before calling a Python script that uses multiprocessing makes the script faster.
Is it true or not? If so, why?
Since you said in a comment that your Python program is calling a C module that uses OpenMP:
OpenMP does multi-threading within a process, and the default number of threads is typically the number that the CPU can actually run simultaneously. (This is generally the number of CPU cores, or a multiple of that number if the CPU has an SMT feature such as Intel's Hyper-Threading.) So if you have, for example, a quad-core non-hyperthreaded CPU, OpenMP will want to run 4 threads by default.
When you use Python's multiprocessing module, your program starts multiple Python processes which can run simultaneously. You can control the number of processes, but often you'll want it to be the number of CPU cores/threads, e.g. the value returned by multiprocessing.cpu_count().
So, what happens on that quad-core CPU if you run a multiprocessing program that starts 4 Python processes, and each calls an OpenMP function that runs 4 threads? You end up running 16 threads on 4 cores. That'll work, but not at peak efficiency, since each core will have to spend some time switching between tasks.
Setting OMP_NUM_THREADS=1 basically turns off the OpenMP multi-threading, so each of your Python processes remains single-threaded.
Make sure you're starting enough Python processes if you do this, though! If you have 4 CPU cores and you only run 2 single-threaded Python processes, you'll have 2 cores utilized and the other 2 sitting idle. (In this case you might want to set OMP_NUM_THREADS=2.)
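A minimal sketch of that combination, assuming the real OpenMP work lives in some C extension (the work function below is just a placeholder for it):

import multiprocessing
import os

# Keep each worker process single-threaded so the process pool itself provides
# the parallelism. This must be set before the OpenMP runtime is initialised,
# i.e. before importing the C module that uses it.
os.environ["OMP_NUM_THREADS"] = "1"

def work(item):
    return item * item  # placeholder for a call into the OpenMP-backed C module

if __name__ == "__main__":
    # one worker per CPU core/thread reported by the OS
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        results = pool.map(work, range(100))
    print(results[:5])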
Resolved in comments:
OMP_NUM_THREADS is an option for OpenMP, a C/C++/Fortran API for doing multi-threading within a process.
It's unclear how that's even related to Python multiprocessing.
Is your Python program calling modules written in C that use OpenMP internally? – Wyzard
Related
I have a Python script that is currently using multiprocessing to perform tasks in parallel. My advisor recommended that I use GNU Parallel to speed it up, since Python programs always execute on a single core. Should I keep the multiprocessing script as it is and use GNU Parallel on top of it? Or should I remove the multiprocessing part and then use GNU Parallel? Does it make a difference?
Does it make a difference?
There is a really simple answer: Try it and measure.
The performance of parallelization these days depends on so many factors and it really depends on your application (and sometimes on your hardware).
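One simple way to try it and measure, assuming the script takes its input files as arguments (script.py and the file names here are placeholders):

time python script.py input1.txt input2.txt input3.txt
time parallel python script.py ::: input1.txt input2.txt input3.txt

The first line runs your existing (multiprocessing) script over all inputs at once; the second has GNU Parallel start one single-input run per file. Compare the wall-clock times on your own data and hardware.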
So I know that even a multithreaded Python process cannot use multiple cores at the same time.
But, by default, does that mean a Python process is "pinned" to one CPU? By pinned, I mean: will the Python process always use the same CPU, or can the same process use different CPUs of my machine over time?
By default, a Python process is not pinned to a particular CPU core. In fact, despite the GIL, a single Python process can spawn multiple threads -- each of which can be scheduled simultaneously by the OS on different CPU cores. Although the GIL makes it difficult for more than one thread to actually make progress at any given time (since they must all contend for the lock), even this can happen (native code can release the GIL unless/until it needs to access Python data structures).
You can, of course, use your operating system's utilities to pin any process (including Python) to a specific CPU core.
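On Linux, this can also be done from inside Python; a small sketch using os.sched_setaffinity (the core numbers are just an example):

import os

print(os.sched_getaffinity(0))   # cores this process may currently run on, e.g. {0, 1, 2, 3}
os.sched_setaffinity(0, {0, 1})  # restrict the current process (pid 0 = this process) to cores 0 and 1
print(os.sched_getaffinity(0))   # now {0, 1}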
I have a more general beginner's question about multiprocessing in Python (please forgive me if I'm utterly wrong in the following). Let's assume I launch two or more IPython consoles in parallel and run some independent functions/scripts via those consoles; does that mean these tasks are performed on multiple cores (one core per task)? If yes, would it be better to collect the tasks in a "main module" and use the multiprocessing library?
There's no difference between starting processes in two terminals and using multiprocessing:
when you open two Python consoles, you have two processes, each with its own PID
when you run two multiprocessing processes, they are forked (on Linux) or started as separate Python instances (Windows) and thus run as independent processes.
What the OS does with these processes is beyond your control. If both processes use a lot of CPU and there are few other processes running, they will be spread across cores.
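For comparison, a minimal multiprocessing equivalent of running two independent functions from two consoles might look like this (task_a and task_b are placeholders):

from multiprocessing import Process
import os

def task_a():
    print("task_a running in process", os.getpid())

def task_b():
    print("task_b running in process", os.getpid())

if __name__ == "__main__":
    procs = [Process(target=task_a), Process(target=task_b)]
    for p in procs:
        p.start()   # each target runs in its own OS process
    for p in procs:
        p.join()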
I have a script a.py that runs some tasks with multiple threads; please note that I have no control over a.py.
I'm looking for a way to limit the number of threads it can use, as I found that using more threads than my CPU has cores slows down the script.
It could be something like:
python --nthread=2 a.py
Modifying something in my OS is also acceptable.
I am using Ubuntu 16.04.
As requested:
a.py just uses the MLPRegressor module from scikit-learn.
I also asked this question here.
A more general way, not specific to Python:
taskset -c 1-3 python yourProgram.py
In this case CPUs 1-3 (3 in total) will be used. Any parallelization invoked by your program will share those resources.
For a solution that fits your exact problem, you should first identify which part of the code parallelizes. For instance, if it is due to numpy routines, you could limit it by using:
OMP_NUM_THREADS=4 python yourProgram.py
Again, the first solution is general and handled by the OS, whereas the second is Python (numpy) specific.
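If you would rather not prefix the command every time, the same variable can be set from inside the script; this sketch assumes the variable is read when the library initialises its thread pool, so it has to happen before the import:

import os

# must run before numpy / scikit-learn are imported, otherwise the
# OpenMP/BLAS thread pool may already be sized to the default
os.environ["OMP_NUM_THREADS"] = "4"

import numpy as np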
Read the threading docs; they say:
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
If you would like to take better advantage of multi-core processing, update the code to use the multiprocessing module.
If you prefer to continue using threading anyway, one option is to pass the number of threads to the application as an argument, like:
python a.py --nthread=2
Then you can update the script code to limit the threads.
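A minimal sketch of such an update (the work function is a placeholder for whatever a.py actually does per item):

import argparse
from concurrent.futures import ThreadPoolExecutor

def work(item):
    return item * item  # placeholder for the real per-item task

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--nthread", type=int, default=2)
    args = parser.parse_args()

    # cap the number of worker threads at the value passed on the command line
    with ThreadPoolExecutor(max_workers=args.nthread) as pool:
        results = list(pool.map(work, range(10)))
    print(results)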
I currently have a data-intensive process running on Ubuntu 11.04 that needs to use multiple CPUs.
Given that I have 4 cores, I wrote the command:
taskset -c 0,1,2,3 python sample.py
I am only achieving 100% on one CPU, and the others are idle <2%.
Any tips how to ramp all 4 CPUs up to 100% to make the task faster?
Cheers!
The application needs to be prepared to use more than one core: its tasks need to be divided into separate threads or processes. Otherwise there is little to no use of more than one CPU.
The standard Python interpreter (CPython) has a GIL that prevents more than one thread from executing Python code at a time. Consider using the multiprocessing module, or alternative implementations such as PyPy.
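A minimal sketch of dividing the work across processes (crunch is a placeholder for the real data-intensive task in sample.py):

from concurrent.futures import ProcessPoolExecutor

def crunch(n):
    return sum(i * i for i in range(n))  # placeholder for one independent slice of the work

if __name__ == "__main__":
    # one task per core; each runs in its own process, so all 4 cores can reach 100%
    with ProcessPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(crunch, [5_000_000] * 4))
    print(results)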