Dask on single OSX machine - is it parallel by default? - python

I have installed Dask on OSX Mojave. Does it execute computations in parallel by default? Or do I need to change some settings?
I am using the DataFrame API. Does that make a difference to the answer?
I installed it with pip. Does that make a difference to the answer?

Yes, Dask is parallel by default.
Unless you specify otherwise, or create a distributed Client, execution will happen with the "threaded" scheduler, in a number of threads equal to your number of cores. Note, however, that because of the python GIL (only one python instruction executed at a time), you may not get as much parallelism as available, depending on how good your specific tasks are at releasing the GIL. That is why you have a choice of schedulers.
Being on OSX, installing with pip: these make no difference. Using dataframes makes a difference in that it dictates the sorts of tasks you're likely running. Pandas is good at releasing the GIL for many operations.

Related

Isolated Sub-Interpreters in Python without GIL

There are PEP-554 and PEP-684. Both are designed to support multiple interpreters on Thread level.
Does anyone know if these PEPs are implemented somewhere at least in experimental or pre-release versions of Python like 3.11?
I found out that Python 3.10 (maybe even 3.9) has these features in experimental build. If you build CPython by configuring with following flag:
./configure --with-experimental-isolated-subinterpreters
or by adding define to compile command when compiling all .c files:
#define EXPERIMENTAL_ISOLATED_SUBINTERPRETERS 1
I posted request of enabling this feature into one famous project, see issue here.
After enabling this feature as I suppose I will be able to create separate interpreters inside multiple threads (not processes), meaning that I don't need multiprocessing anymore.
More than that when using multiple Interpreters according to this feature description there is no need to have single GIL, every Interpreter in separate thread has its own GIL. It means that even if Interpreters are created inside thread, still all CPU cores are used, same like in multiprocessing. Current Python suffers from GIL only because it forces to use only single CPU core, thus multiprocessing is used by people to overcome this and use all CPU cores.
In description of these features it was said that authors had to modify by hand 1500 static and global variables by moving them all into per-thread local table inside thread state structure.
Presumably all these new features can be used right now only from Python C API.
If someone here knows how to use these isolated sub-interpreters features, can you provide some Python code or C API code with detailed example on how to use them?
Specifically I'm interested in how to use interpreters in such a way that all CPU cores are used, i.e. I want to know how to avoid single GIL, but to use multiple GILs (actually local locks, LILs). Of course I want inside threads, without using multiprocessing.

Using multiprocessing and GNU Parallel at the same time?

I have a Python script that is currently using multiprocessing to perform tasks in parallel. My advisor recommended me to use GNU Parallel to speed it up since Python programs always execute on a single core. Should I keep the multiprocessing script as it is and use GNU Parallel on top of it? Or should I remove the multiprocessing part and then use GNU Parallel? Does it make a difference?
Does it make a difference?
There is a really simple answer: Try it and measure.
The performance of parallelization these days depends on so many factors and it really depends on your application (and sometimes on your hardware).

Is a python processed pinned to one CPU, or can it uses multiple CPU overtime?

So I know that even multithreaded python process can not use multiple core at the same time.
But, by default, does that mean a python process is "pinned" to one CPU? By pinned, I mean, will the python process use always the same CPU, or can the same process, use overtime the different CPU of my machine?
By default, a python process is not pinned to a particular CPU core. In fact, despite the GIL, a single python process can spawn multiple threads -- each of which can be scheduled simultaneously by the OS on different CPU cores. Although the GIL makes it difficult for more than one thread to actually make progress at any given time (since they must all contend for the lock), even this can happen (native code can release the GIL unless / until it needs to access Python datastructures).
You can, of course, use your operating system utilities to pin any process (including Python) to a specific CPU core.

Python and Threads with PyPy?

I have a kivy application in python which uses some threads.
As python is not able to run these threads on different Cores due to the Global Interpreter Lock, I would have liked to try to use PyPy for it and see if I can make the threads run faster of different cores since PyPy is different and offers stackless (what ever that is? :).
Does somebody have some information to share on how to make a simple python program, which launches some threads by the module threading, running with the pypy interpreter such that it uses this stackless feature?
Pypy won't resolve Python problems of running a single-thread each time, since it also makes use of the GIL - http://doc.pypy.org/en/latest/faq.html#does-pypy-have-a-gil-why
Besides that, Kivy is a complex project embedding Python itself - although I don't know it very well, I doubt it is possible to switch the Python used in it for Pypy.
Depending on what you are doing, you may want to use the multiprocessing module instead of threading - it is a drop-in replacement that will make transparent inter-process calls to Python functions, and can therefore take advantage of multiple-cores.
https://docs.python.org/3/library/multiprocessing.html
This is standard in cPython and can likely be used from within Kivy, if (and only if) all code in the subprocess just take care of number-crunching, and so on, and all user interaction and display updates are made on the main process.

How to set the max thread a python script could use when calling from shell

Have script a.py, it will run some task with multiple-thread, please noticed that I have no control over the a.py.
I'm looking for a way to limit the number of thread it could use, as I found that using more threads than my CPU core will slow down the script.
It could be something like:
python --nthread=2 a.py
or Modifying something in my OS is also acceptable .
I am using ubuntu 16.04
As requested:
the a.py is just a module in scikit-learn the MLPRegressor .
I also asked this question here.
A more general way, not specific to python:
taskset -c 1-3 python yourProgram.py
In this case the threads 1-3 (3 in total) will be used. Any parallelization invoked by your program will share those resources.
For a solution that fits your exact problem you should better identify which part of the code parellelizes. For instance, if it is due to numpy routines, you could limit it by using:
OMP_NUM_THREADS=4 python yourProgram.py
Again, the first solution is general and handled by the os, whereas the second is python (numpy) specific.
Read the threading doc, it said:
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
If you will like to take advantage of the multi-core processing better update the code to use multiprocessing module.
In case that you prefer anyway continue using threading, you have one option passing the number of threads to the application as an argument like:
python a.py --nthread=2
Then you can update the script code to limit the threads.

Categories

Resources