Linux Taskset Command: Setting 100% CPU Usage for Multiple Processes - python

I currently have a data-intensive process running on Ubuntu 11.04 that needs to use multiple CPUs.
Given that I have 4 cores, I ran the command:
taskset -c 0,1,2,3 python sample.py
I am only achieving 100% on one CPU; the others are nearly idle (<2%).
Any tips on how to ramp all 4 CPUs up to 100% to make the task faster?
Cheers!

The application needs to be written to use more than one core: its work must be divided into separate threads or processes. Otherwise there is little to no usage of more than one CPU. Note that taskset only restricts which CPUs a process is allowed to run on; it does not create any parallelism by itself.

The standard Python interpreter (CPython) has a GIL (Global Interpreter Lock) that prevents more than one thread from executing Python bytecode at a time. Consider using the multiprocessing module, or an alternative implementation such as PyPy.
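For illustration, a minimal multiprocessing sketch that can drive all 4 cores; the work function and the data split are placeholders, not taken from sample.py:

import multiprocessing

def work(chunk):
    # Placeholder for the real per-chunk computation in sample.py
    return sum(i * i for i in chunk)

if __name__ == "__main__":
    # Split the input into 4 chunks, one per core
    chunks = [range(n, 10000000, 4) for n in range(4)]
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(work, chunks)
    print(sum(results))

Run under taskset as before; with 4 worker processes, the scheduler can now place one on each of the 4 allowed CPUs.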

Related

Using multiprocessing and GNU Parallel at the same time?

I have a Python script that currently uses multiprocessing to perform tasks in parallel. My advisor recommended that I use GNU Parallel to speed it up, since Python programs always execute on a single core. Should I keep the multiprocessing script as it is and use GNU Parallel on top of it? Or should I remove the multiprocessing part and then use GNU Parallel? Does it make a difference?
Does it make a difference?
There is a really simple answer: Try it and measure.
The performance of parallelization these days depends on many factors, above all on your application itself (and sometimes on your hardware).
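For instance, a rough first measurement could be as simple as timing both variants; worker.py and the task arguments here are placeholders for your own script and inputs:

time python my_multiprocessing_script.py
time parallel python worker.py ::: task1 task2 task3 task4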

Dask on single OSX machine - is it parallel by default?

I have installed Dask on OSX Mojave. Does it execute computations in parallel by default? Or do I need to change some settings?
I am using the DataFrame API. Does that make a difference to the answer?
I installed it with pip. Does that make a difference to the answer?
Yes, Dask is parallel by default.
Unless you specify otherwise, or create a distributed Client, execution will happen with the "threaded" scheduler, using a number of threads equal to your number of cores. Note, however, that because of the Python GIL (only one Python instruction is executed at a time), you may not get as much parallelism as is available, depending on how good your specific tasks are at releasing the GIL. That is why you have a choice of schedulers.
Being on OSX and installing with pip make no difference. Using dataframes makes a difference in that it dictates the sorts of tasks you're likely running. Pandas is good at releasing the GIL for many operations.
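As a small sketch of the scheduler choice (the pandas frame here is synthetic and the column name arbitrary):

import pandas as pd
import dask.dataframe as dd

pdf = pd.DataFrame({"x": range(1000000)})
df = dd.from_pandas(pdf, npartitions=8)

# Default: the "threaded" scheduler, bounded by how well tasks release the GIL
total = df.x.sum().compute()

# Switch to processes to sidestep the GIL, at the cost of moving data
# between worker processes
total = df.x.sum().compute(scheduler="processes")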

Is a Python process pinned to one CPU, or can it use multiple CPUs over time?

So I know that even a multithreaded Python process cannot use multiple cores at the same time.
But, by default, does that mean a Python process is "pinned" to one CPU? By pinned, I mean: will the Python process always use the same CPU, or can the same process use different CPUs of my machine over time?
By default, a Python process is not pinned to a particular CPU core. In fact, despite the GIL, a single Python process can spawn multiple threads, each of which can be scheduled simultaneously by the OS on different CPU cores. Although the GIL makes it difficult for more than one thread to actually make progress at any given time (since they must all contend for the lock), even this can happen: native code can release the GIL unless or until it needs to access Python data structures.
You can, of course, use your operating system utilities to pin any process (including Python) to a specific CPU core.
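On Linux, for example, you can do this with taskset, or from within Python itself; a minimal sketch (os.sched_setaffinity is Linux-only):

import os

print(os.sched_getaffinity(0))  # the set of CPUs this process may run on
os.sched_setaffinity(0, {0})    # pin the current process (pid 0 means "self") to CPU 0
print(os.sched_getaffinity(0))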

Use of OMP_NUM_THREADS=1 for Python Multiprocessing

I heard that setting OMP_NUM_THREADS=1 before calling a Python script that uses multiprocessing makes the script faster.
Is it true or not? If yes, why?
Since you said in a comment that your Python program is calling a C module that uses OpenMP:
OpenMP does multi-threading within a process, and the default number of threads is typically the number that the CPU can actually run simultaneously. (This is generally the number of CPU cores, or a multiple of that number if the CPU has an SMT feature such as Intel's Hyper-Threading.) So if you have, for example, a quad-core non-hyperthreaded CPU, OpenMP will want to run 4 threads by default.
When you use Python's multiprocessing module, your program starts multiple Python processes which can run simultaneously. You can control the number of processes, but often you'll want it to be the number of CPU cores/threads, e.g. the value returned by multiprocessing.cpu_count().
So, what happens on that quad-core CPU if you run a multiprocessing program that starts 4 Python processes, and each calls an OpenMP function that runs 4 threads? You end up running 16 threads on 4 cores. That'll work, but not at peak efficiency, since each core will have to spend some time switching between tasks.
Setting OMP_NUM_THREADS=1 basically turns off the OpenMP multi-threading, so each of your Python processes remains single-threaded.
Make sure you're starting enough Python processes if you do this, though! If you have 4 CPU cores and you only run 2 single-threaded Python processes, you'll have 2 cores utilized and the other 2 sitting idle. (In this case you might want to set OMP_NUM_THREADS=2.)
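A sketch of how the two pieces are typically combined; the OpenMP-backed module here is hypothetical, standing in for whatever C extension your program calls:

import os
os.environ["OMP_NUM_THREADS"] = "1"  # set before the OpenMP runtime initializes;
                                     # worker processes inherit this environment

import multiprocessing
# import my_openmp_module  # hypothetical OpenMP-backed C extension

def work(item):
    # return my_openmp_module.crunch(item)  # each worker stays single-threaded
    return item * item  # placeholder computation

if __name__ == "__main__":
    with multiprocessing.Pool(processes=multiprocessing.cpu_count()) as pool:
        results = pool.map(work, range(100))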
Resolved in comments:
OMP_NUM_THREADS is an option for OpenMP, a C/C++/Fortran API for doing multi-threading within a process.
It's unclear how that's even related to Python multiprocessing.
Is your Python program calling modules written in C that use OpenMP internally? – Wyzard

How to set the max number of threads a Python script can use when calling it from the shell

I have a script a.py that runs some tasks with multiple threads; please note that I have no control over a.py.
I'm looking for a way to limit the number of threads it can use, as I found that using more threads than I have CPU cores slows down the script.
It could be something like:
python --nthread=2 a.py
Modifying something in my OS is also acceptable.
I am using Ubuntu 16.04.
As requested:
a.py just uses the MLPRegressor module from scikit-learn.
I also asked this question here.
A more general way, not specific to python:
taskset -c 1-3 python yourProgram.py
In this case CPUs 1-3 (3 in total) will be used. Any parallelization invoked by your program will share those resources.
For a solution that fits your exact problem, you should first identify which part of the code parallelizes. For instance, if it is due to numpy routines, you could limit it by using:
OMP_NUM_THREADS=4 python yourProgram.py
Again, the first solution is general and handled by the OS, whereas the second is Python (numpy) specific.
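If numpy is indeed the culprit, the same idea can also be applied from inside the script; a minimal sketch, where which variable actually matters depends on the BLAS backend your numpy build uses:

import os
os.environ["OMP_NUM_THREADS"] = "2"       # OpenMP-based backends
os.environ["OPENBLAS_NUM_THREADS"] = "2"  # OpenBLAS
os.environ["MKL_NUM_THREADS"] = "2"       # Intel MKL

import numpy as np  # import only after the variables are set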
Read the threading doc; it says:
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
If you would like to take better advantage of multi-core processing, update the code to use the multiprocessing module.
If you prefer to continue using threading anyway, one option is to pass the number of threads to the application as an argument, like:
python a.py --nthread=2
Then you can update the script code to cap the number of threads accordingly.
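A minimal sketch of that option, assuming you can edit the script; the --nthread flag is the one proposed above, and the worker function is a placeholder:

import argparse
from concurrent.futures import ThreadPoolExecutor

parser = argparse.ArgumentParser()
parser.add_argument("--nthread", type=int, default=2)
args = parser.parse_args()

def work(item):
    return item * item  # placeholder for the script's real task

with ThreadPoolExecutor(max_workers=args.nthread) as pool:
    results = list(pool.map(work, range(100)))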
