n_jobs doesn't work in sklearn classes - python

Does anybody use the "n_jobs" parameter of sklearn classes? I work with sklearn in Anaconda 3.4, 64-bit. The Spyder version is 2.3.8. My script never finishes executing after I set the "n_jobs" parameter of some sklearn class to a non-zero value. Why is this happening?

Several scikit-learn tools such as GridSearchCV and cross_val_score rely internally on Python's multiprocessing module to parallelize execution across several Python processes when n_jobs > 1 is passed as an argument.
Taken from the sklearn documentation:
The problem is that Python multiprocessing does a fork system call
without following it with an exec system call for performance reasons.
Many libraries like (some versions of) Accelerate / vecLib under OSX,
(some versions of) MKL, the OpenMP runtime of GCC, nvidia’s Cuda (and
probably many others), manage their own internal thread pool. Upon a
call to fork, the thread pool state in the child process is corrupted:
the thread pool believes it has many threads while only the main
thread state has been forked. It is possible to change the libraries
to make them detect when a fork happens and reinitialize the thread
pool in that case: we did that for OpenBLAS (merged upstream in master
since 0.2.10) and we contributed a patch to GCC’s OpenMP runtime (not
yet reviewed).
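
A common workaround (a minimal sketch, assuming Python 3.4+ and a scikit-learn version with the sklearn.model_selection layout) is to protect the entry point with an if __name__ == "__main__" guard and to switch the multiprocessing start method away from plain fork, so worker processes are not forked from a parent whose BLAS/OpenMP thread pools are already initialized:

import multiprocessing

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def main():
    X, y = load_iris(return_X_y=True)
    param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
    # n_jobs=-1 uses all cores; each candidate fit runs in a separate process.
    search = GridSearchCV(SVC(), param_grid, n_jobs=-1, cv=3)
    search.fit(X, y)
    print(search.best_params_)

if __name__ == "__main__":
    # "forkserver" (or "spawn") avoids forking over already-initialized
    # thread-pool state, which is what hangs under plain "fork".
    multiprocessing.set_start_method("forkserver")
    main()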

Related

Isolated Sub-Interpreters in Python without GIL

There are PEP 554 and PEP 684. Both are designed to support multiple interpreters at the thread level.
Does anyone know whether these PEPs are implemented anywhere, at least in experimental or pre-release versions of Python such as 3.11?
I found out that Python 3.10 (maybe even 3.9) has these features in an experimental build, if you build CPython by configuring with the following flag:
./configure --with-experimental-isolated-subinterpreters
or by adding this define to the compile command when compiling all .c files:
#define EXPERIMENTAL_ISOLATED_SUBINTERPRETERS 1
I posted a request to enable this feature in one well-known project; see the issue here.
After enabling this feature, I suppose I will be able to create separate interpreters inside multiple threads (not processes), meaning that I won't need multiprocessing anymore.
More than that, according to the feature description, with multiple interpreters there is no need for a single GIL: every interpreter in its own thread has its own GIL. This means that even though the interpreters are created inside threads, all CPU cores are still used, just as with multiprocessing. Current Python suffers from the GIL only because it forces everything onto a single CPU core, which is why people use multiprocessing to work around it and use all CPU cores.
In the description of these features it was said that the authors had to move roughly 1,500 static and global variables by hand into a per-thread local table inside the thread state structure.
Presumably all these new features can currently be used only from the Python C API.
If someone here knows how to use these isolated sub-interpreter features, can you provide some Python code or C API code with a detailed example of how to use them?
Specifically, I'm interested in using interpreters in such a way that all CPU cores are used, i.e. I want to know how to avoid the single GIL and instead use multiple GILs (actually local interpreter locks, LILs). And of course I want this inside threads, without using multiprocessing.
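
Not an authoritative answer, but a rough sketch of what is already reachable from Python today: CPython 3.8+ ships a private, experimental _xxsubinterpreters module that exposes a PEP 554-style API. Note that in stock builds all of these interpreters still share one GIL; a per-interpreter GIL is exactly what the experimental flag above / PEP 684 is meant to provide.

import threading
import _xxsubinterpreters as interpreters   # private, experimental API

def run_in_own_interpreter(code):
    interp = interpreters.create()           # create an isolated interpreter
    try:
        interpreters.run_string(interp, code)
    finally:
        interpreters.destroy(interp)

CODE = """
total = sum(i * i for i in range(1_000_000))
print("done:", total)
"""

# One OS thread per interpreter, no multiprocessing involved.
threads = [threading.Thread(target=run_in_own_interpreter, args=(CODE,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()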

Why can TensorFlow use multiple threads while Python can only execute one thread at a time due to the GIL?

I'm wondering why we can have TensorFlow run in a multi-threaded fashion while Python can only execute one thread at a time due to the GIL?
The GIL's restriction is slightly more subtle: only one thread at a time can be executing Python bytecode.
Extensions using Python's C API (like tensorflow) can release the GIL if they don't need it. I/O operations like using files or sockets also tend to release the GIL because they generally involve lots of waiting.
So threads executing extensions or waiting for I/O can run while another thread is executing Python bytecode.
Most of the TensorFlow core is written in C++ and the Python APIs are just wrappers around it. While the C++ code is running, the regular Python restrictions do not apply.
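
A small illustration of that point (nothing TensorFlow-specific is assumed here): threads whose work releases the GIL, simulated below with time.sleep, overlap almost perfectly, while threads running pure-Python bytecode run essentially one at a time.

import threading
import time

def gil_releasing_work():
    time.sleep(1)                 # sleeping releases the GIL, like blocking I/O

def pure_python_work():
    total = 0
    for i in range(10_000_000):   # bytecode loop, holds the GIL
        total += i

def timed(target, n_threads=4):
    threads = [threading.Thread(target=target) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

print("GIL-releasing threads:", timed(gil_releasing_work))   # ~1 s, not ~4 s
print("pure-Python threads:  ", timed(pure_python_work))     # roughly serial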

Dask on single OSX machine - is it parallel by default?

I have installed Dask on OSX Mojave. Does it execute computations in parallel by default? Or do I need to change some settings?
I am using the DataFrame API. Does that make a difference to the answer?
I installed it with pip. Does that make a difference to the answer?
Yes, Dask is parallel by default.
Unless you specify otherwise, or create a distributed Client, execution will happen with the "threaded" scheduler, in a number of threads equal to your number of cores. Note, however, that because of the Python GIL (only one thread can execute Python bytecode at a time), you may not get as much parallelism as is available, depending on how good your specific tasks are at releasing the GIL. That is why you have a choice of schedulers.
Being on OSX, installing with pip: these make no difference. Using dataframes makes a difference in that it dictates the sorts of tasks you're likely running. Pandas is good at releasing the GIL for many operations.
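
For concreteness, a minimal sketch (the column names and sizes are made up) of how to pick a scheduler per call or globally; "threads" is the single-machine default for dask.dataframe, and "processes" sidesteps the GIL at the cost of moving data between processes:

import dask
import dask.dataframe as dd
import pandas as pd

pdf = pd.DataFrame({"x": range(1_000_000), "y": range(1_000_000)})
ddf = dd.from_pandas(pdf, npartitions=8)

mean_threads = ddf.x.mean().compute(scheduler="threads")    # default scheduler
mean_procs = ddf.x.mean().compute(scheduler="processes")    # avoids the GIL

# Or set the scheduler globally for a block of code:
with dask.config.set(scheduler="threads"):
    print(ddf.y.sum().compute())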

What parts of the standard library can run multi-core?

I am learning multi-threaded Python (CPython). I'm aware of the GIL and how it limits threading to a single core (in most circumstances).
I know that I/O functionality can run across multiple cores; however, I have been unable to find a list of which parts of the standard library can do so. I believe that urllib can run across multiple cores, allowing a download to happen in a thread on a separate core (but I have been unable to find confirmation of this in the docs).
What I am trying to find out is which parts of the standard library will run on multiple cores, as this doesn't seem to be specified in the documentation.
Taken from the docs:
However, some extension modules, either standard or third-party, are designed so as to release the GIL when doing computationally-intensive tasks such as compression or hashing. Also, the GIL is always released when doing I/O.
With the multiprocessing package you can write truly parallel programs where separate processes run on different cores. There is no limitation on which libraries (standard or not) each sub-process can use.
The tricky part about multi-process programming is when the processes need to exchange information (e.g., pass each other values, wait for each other to finish with a certain task). The multiprocessing package contains several tools for that.
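
A minimal sketch of that approach: a CPU-bound, pure-Python function runs in a pool of worker processes (one interpreter and one GIL each), and Pool.map takes care of shipping the inputs and results between them.

import multiprocessing as mp

def cpu_bound(n):
    # Pure-Python work that would serialize under threads because of the GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with mp.Pool() as pool:                   # defaults to one worker per core
        results = pool.map(cpu_bound, [2_000_000] * 8)
    print(results)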

How do twisted and multiprocessing.Process create zombies?

In Python, using Twisted's LoopingCall, multiprocessing.Process, and multiprocessing.Queue, is it possible to create a zombie process? And, if so, how?
A zombie is a process which has completed but whose completion has not yet been noticed by the process which started it. It's the Twisted process's responsibility to reap its children.
If you start the process with spawnProcess, everything should always work as expected. However, as described in bug #733 in Twisted (which has long been fixed), there are a plethora of nasty edge-cases when you want to use Twisted with other functions that spawn processes, as Python's API historically made it difficult to cooperate between signal handlers.
This is all fixed in recent versions of the code, but I believe you may still encounter this bug in the following conditions:
You are using a version of Twisted earlier than 10.1.
You are using a version of Python earlier than 2.6.
You are not building Twisted's native extension modules (if you're working from a development checkout or unpacked tarball rather than an installed version, you can fix this with python setup.py build_ext -i).
You are using a module like popen or subprocess.
Hopefully upgrading Twisted or running the appropriate command will fix your immediate issue, but you should still consider using spawnProcess, since that lets you treat process output as a normal event in the reactor event loop.
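
For reference, a hedged sketch of the spawnProcess approach (the command being run is just an example): output and exit notifications arrive as ordinary reactor events, and Twisted reaps the child so no zombie is left behind.

from twisted.internet import protocol, reactor

class EchoChild(protocol.ProcessProtocol):
    def outReceived(self, data):
        print("child stdout:", data.decode(errors="replace"))

    def processEnded(self, reason):
        # Called once the child has been reaped; reason wraps ProcessDone
        # or ProcessTerminated.
        print("child ended:", reason.value)
        reactor.stop()

if __name__ == "__main__":
    reactor.spawnProcess(EchoChild(), "/bin/ls", ["/bin/ls", "-l"], env=None)
    reactor.run()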
