Are distutils and setuptools thread-safe? - python

Does anybody know if I can safely compile multiple extensions concurrently with threading?
I realise this might not speed things up (although compilers are run in subprocesses, so maybe!), but I'm in a situation where GUI actions can start a simulation which may involve a compilation step, so I'd like to know if I need to prevent multiple simulations from compiling at the same time, or if this is fine

Not only they're not thread-safe — they're not process-safe: you cannot call setup() multiple times in one process. See an example of errors from this.
To work around the limitation you have to run python setup.py or pip in a subprocess and then the question of thread-safety no longer applies.

Related

Isolated Sub-Interpreters in Python without GIL

There are PEP-554 and PEP-684. Both are designed to support multiple interpreters on Thread level.
Does anyone know if these PEPs are implemented somewhere at least in experimental or pre-release versions of Python like 3.11?
I found out that Python 3.10 (maybe even 3.9) has these features in experimental build. If you build CPython by configuring with following flag:
./configure --with-experimental-isolated-subinterpreters
or by adding define to compile command when compiling all .c files:
#define EXPERIMENTAL_ISOLATED_SUBINTERPRETERS 1
I posted request of enabling this feature into one famous project, see issue here.
After enabling this feature as I suppose I will be able to create separate interpreters inside multiple threads (not processes), meaning that I don't need multiprocessing anymore.
More than that when using multiple Interpreters according to this feature description there is no need to have single GIL, every Interpreter in separate thread has its own GIL. It means that even if Interpreters are created inside thread, still all CPU cores are used, same like in multiprocessing. Current Python suffers from GIL only because it forces to use only single CPU core, thus multiprocessing is used by people to overcome this and use all CPU cores.
In description of these features it was said that authors had to modify by hand 1500 static and global variables by moving them all into per-thread local table inside thread state structure.
Presumably all these new features can be used right now only from Python C API.
If someone here knows how to use these isolated sub-interpreters features, can you provide some Python code or C API code with detailed example on how to use them?
Specifically I'm interested in how to use interpreters in such a way that all CPU cores are used, i.e. I want to know how to avoid single GIL, but to use multiple GILs (actually local locks, LILs). Of course I want inside threads, without using multiprocessing.

Python and Threads with PyPy?

I have a kivy application in python which uses some threads.
As python is not able to run these threads on different Cores due to the Global Interpreter Lock, I would have liked to try to use PyPy for it and see if I can make the threads run faster of different cores since PyPy is different and offers stackless (what ever that is? :).
Does somebody have some information to share on how to make a simple python program, which launches some threads by the module threading, running with the pypy interpreter such that it uses this stackless feature?
Pypy won't resolve Python problems of running a single-thread each time, since it also makes use of the GIL - http://doc.pypy.org/en/latest/faq.html#does-pypy-have-a-gil-why
Besides that, Kivy is a complex project embedding Python itself - although I don't know it very well, I doubt it is possible to switch the Python used in it for Pypy.
Depending on what you are doing, you may want to use the multiprocessing module instead of threading - it is a drop-in replacement that will make transparent inter-process calls to Python functions, and can therefore take advantage of multiple-cores.
https://docs.python.org/3/library/multiprocessing.html
This is standard in cPython and can likely be used from within Kivy, if (and only if) all code in the subprocess just take care of number-crunching, and so on, and all user interaction and display updates are made on the main process.

How to set the max thread a python script could use when calling from shell

Have script a.py, it will run some task with multiple-thread, please noticed that I have no control over the a.py.
I'm looking for a way to limit the number of thread it could use, as I found that using more threads than my CPU core will slow down the script.
It could be something like:
python --nthread=2 a.py
or Modifying something in my OS is also acceptable .
I am using ubuntu 16.04
As requested:
the a.py is just a module in scikit-learn the MLPRegressor .
I also asked this question here.
A more general way, not specific to python:
taskset -c 1-3 python yourProgram.py
In this case the threads 1-3 (3 in total) will be used. Any parallelization invoked by your program will share those resources.
For a solution that fits your exact problem you should better identify which part of the code parellelizes. For instance, if it is due to numpy routines, you could limit it by using:
OMP_NUM_THREADS=4 python yourProgram.py
Again, the first solution is general and handled by the os, whereas the second is python (numpy) specific.
Read the threading doc, it said:
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
If you will like to take advantage of the multi-core processing better update the code to use multiprocessing module.
In case that you prefer anyway continue using threading, you have one option passing the number of threads to the application as an argument like:
python a.py --nthread=2
Then you can update the script code to limit the threads.

True multithreading with boost.python

I'm trying to test a multi-threaded C++ DLL. This DLL is supposed to be thread-safe. I have it wrapped with boost.python, and I'd like to create multiple python threads to exercise the DLL through the boost.python wrapper. I'm actually trying to cause threading problems.
What I can't seem to find good documentation on is whether the python interpreter will support two of its threads (on different cores, say) calling into an imported module concurrently, and whether the GIL needs tending at all given that I don't want any added safety above what the DLL is supposed to provide.
Can anyone describe or refer me to a description of python calling DLL modules from multiple threads and how the GIL is suppsed to be used in this case?
How to release GIL when calling a C++ function from Python via Boost.Pyhton:
http://wiki.python.org/moin/boost.python/HowTo#Multithreading_Support_for_my_function
The answer is no, the GIL will never truly multi-thread unless the DLL manually releases the lock. Python allows exactly one thread to run at a time unless the extension manually says, "I'm blocked, carry on without me." This is commonly done with the Py_BEGIN_ALLOW_THREADS macro (and undone with Py_END_ALLOW_THREADS) defined in python's include/ceval.h. Once an extension does this, python will allow another thread to run, and the first thread doing any python stuff will likely cause problems (as the comment question notes.) It's really meant for blocking on I/O or going into heavy compute time.

How do twisted and multiprocessing.Process create zombies?

In python, using twisted loopingcall, multiprocessing.Process, and multiprocessing.Queue; is it possible to create a zombie process. And, if so, then how?
A zombie is a process which has completed but whose completion has not yet been noticed by the process which started it. It's the Twisted process's responsibility to reap its children.
If you start the process with spawnProcess, everything should always work as expected. However, as described in bug #733 in Twisted (which has long been fixed), there are a plethora of nasty edge-cases when you want to use Twisted with other functions that spawn processes, as Python's API historically made it difficult to cooperate between signal handlers.
This is all fixed in recent versions of the code, but I believe you may still encounter this bug in the following conditions:
You are using a version of Twisted earlier than 10.1.
You are using a version of Python earlier than 2.6.
You are not building Twisted's native extension modules (if you're working from a development checkout or unpacked tarball rather than an installed version, you can fix this with python setup.py build_ext -i).
You are using a module like popen or subprocess.
Hopefully upgrading Twisted or running the appropriate command will fix your immediate issue, but you should still consider using spawnProcess, since that lets you treat process output as a normal event in the reactor event loop.

Categories

Resources