Why Do I have to worry about Thread Safety in CPython?

Why Do I have to worry about Thread Safety in CPython? - python

From what I understand, the Global Interpreter Lock allows only a single thread to access the interpreter and execute bytecode. If that's the case, then at any given time, only a single thread will be using the interpreter and its memory.
With that I believe that it is fair to exclude the possibility of having race cases, since no two threads can access the interpreter's memory at the same time, yet I still see warnings about making sure data structures are "thread safe". There is a possibility that it may be covering all implementations of the python interpreter (like cython) which can switch off the GIL and allow true multi threading.
I understand the importance of thread safety in interpreter environments that do not have the GIL enabled. However, for CPython, why is thread safety encouraged when writing multi threaded python code? What is the worse that can happen in the CPython environment?

Of course race conditions can still take place, because access to datastructures is not atomic.
Say you test for a key being present in a dictionary, then do something to add the key:
if key not in dictionary:
# calculate new value
value = elaborate_calculation()
dictionary[key] = value
The thread can be switched at any point after the not in test has returned true, and another thread will also come to the conclusion that the key isn't there. Now two threads are doing the calculation, and you don't know which one will win.
All that the GIL does is protect Python's internal interpreter state. This doesn't mean that data structures used by Python code itself are now locked and protected.

An important note: the multiprocessing module in Python is synchonous to some degree despite the GIL, in that access to the same variable can occur across different processes simultaneously.
This has a likelyhood of corrupting your data, or at least disrupting your control flow, which would be why thread safety is reccomended.
As to why it happens, despite there only being one interpriter, there isn't anything stopping (at least as far as I can tell) two preinterprited pieces of code accessing the same parts of the shared memory synchonously. When doing say:
import multiprocessing
def my_func ():
print("hello world")
my_process=multiprocessing.Process (target=my_func, args=(,))
my_process.start ()
my_process.join ()
My understanding is that the time it takes to interprit (in this case) my_func was buried in the overhead it takes to spawn a new process.
In this case, the term "process" is more suitable here, because there are worker threads that are temporarily spawned just to copy data, so there's some data handshaking doing on, so it's actually quite a bit of a different process (pun intended) than the spawning of a traditional thread.
I hope this helps.

Related

Make two competing functions and kill the slow one

In python, I have to fetch crypto data from binance every minute and do some calculations. For fetching data I have two functions func_a() and func_b(). They both do the same thing but in wildly different manner. Sometimes func_a is faster and sometimes func_b is faster. I want to run both the functions in parallel, if any of the function returns result to me faster, I want to kill the other one and move on (because they both are going to bring the same result).
How can I achieve this in python? Please mind that I do not want to replace these functions or their mechanics.

Python threads aren't very suitable for this purpose for two reasons:
The Python GIL means that if you spawn two CPU-bound threads, each of the two threads will run at half its normal speed (because only one thread is actually running at any given instant; the other is waiting to acquire the interpreter lock)
There is no reliable way to unilaterally kill a thread, because if you do that, any resources it had allocated will be leaked, causing major problems.
If you really want to be able to cancel a function-in-progress, then, you have two options:
Modify the function to periodically check a "please_quit" boolean variable (or whatever) and return immediately if that boolean's state has changed to True. Then your main thread can set the please_quit variable and then call join() on the thread, and rest assured that the thread will quit ASAP. (This does require that you have the ability to modify the function's implementation)
Spawn child processes instead of child threads. A child process takes more resources to launch, but it can run truly in parallel (since it has its own separate Python interpreter) and it is safe (usually) to unilaterally kill it, because the OS will automatically clean up all of the process's held resources when the process is killed.

Would setting a mutex manually improve performance?

My python program is definitely cpu bound but 40% to 55% of the time spent is performed in C code in the z3 solver (which doesn’t knows anything against the gil) where each single call to the C function (z3_optimize_check) take almost a minute to complete (so far the parallel_enable parameter still result in this function working in single thread mode and blocking the main thread).
I can’t use multiprocessing as z3_objects aren’t serializable friendly (except if someone here can prove otherwise). As they are several tasks (where each tasks adds more z3 work in a dict for other tasks), I initially set up mulithreading directly. But the Gil definitely hurts performance more than there is a benefit (especially with hyperthreading) despite the huge time spent in the solver.
But if I set up a blocking mutex manually (through threading.Lock.aquire()) in the z3py module just after the switch from C code which would allows an other thread running only if all other threads are performing solver work, would this remove the gil performance penalty (since their would be only 1 thread at time executing python code and it would always be the same one until the lock is released before z3_optimize_check)?
I mean would using threading.Lock.aquire() triggers calls to PyEval_SaveThread() as if z3 was doing it directly?

so far the parallel_enable parameter still result in this function working in single thread mode and blocking the main thread
I think you are misunderstanding that. z3 running in parallel mode means that you call it from a single Python thread, and then it spawns multiple OS-level threads for itself, doing the job, cleaning up the threads and returning the result for you. It does not miraculously enable Python running without GIL.
From the viewpoint of Python, it still does one thing at a time, and that one thing is making the call to z3. And it is holding GIL for the entire time. So if you see more than one CPU core/thread utilized while the calculation is running, that is the effect of parallel mode of z3, internally branching to multiple threads.
There is another thing, releasing GIL, like what blocking I/O operations do. It does not happen by magic, there is a call-pair for that:
PyThreadState* PyEval_SaveThread()
Release the global interpreter lock (if it has been created) and reset the thread state to NULL, returning the previous thread state (which is not NULL). If the lock has been created, the current thread must have acquired it.
void PyEval_RestoreThread(PyThreadState *tstate)
Acquire the global interpreter lock (if it has been created) and set the thread state to tstate, which must not be NULL. If the lock has been created, the current thread must not have acquired it, otherwise deadlock ensues.
These are C calls, so they are accessible for extension developers. When developers know that the code will run for a long time, without the need for accessing Python internals, PyEval_SaveThread() can be used, and then Python can proceed with other Python threads. And when the long whatever is done, the thread can re-introduce itself and apply for GIL using PyEval_RestoreThread().
But, these things happen only if developers make them happen. And with z3 it might not be the case.
To provide a direct answer to your question: no, Python code can not release GIL and keep it released, as GIL is the lock what a Python thread has to hold when it proceeds. So whenever a Python "instruction" returns, GIL is held again.
Apparently somehow I managed to not include the link I wanted to, so they are on page https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock (and the linked paragraph discusses what I shortly summarized).
Z3 is open source (https://github.com/Z3Prover/z3), and the source code does not contain neither PyEval_SaveThread, nor the wrapper-shortcut Py_BEGIN_ALLOW_THREADS character sequences.
But, it has a parallel Python example, btw. https://github.com/Z3Prover/z3/blob/master/examples/python/parallel.py, with
from multiprocessing.pool import ThreadPool
So I would assume that it might be tested and working with multiprocessing.

How to share an integer between threads in python

I've got 2 threads:
A worker thread, that loops looking for input from an ssh socket
A manager thread, that processes stuff from the worker thread
They use a Queue to communicate - as stuff comes in, the worker places it on the Queue if it's important, and the manager takes it off to process.
However, I'd like the manager to also know the last time anything came in - whether important or not.
My thought was that the worker could set an integer (say), and the manager could read it. But there doesn't seem to be a threading primitive that supports this.
Is it safe for the manager to just read the worker's instance variables, providing it doesn't write to them? Or will that give some shared memory issues? Is there some way I can share this state without putting all the junk stuff in the Queue?

Is it safe for the manager to just read the worker's instance
variables, providing it doesn't write to them?
Yes, this is safe in CPython. Because of the GIL, it's impossible for one thread to be reading the value of a variable while another thread is in process of writing it. This is because both operations are a single bytecode instruction, which makes them atomic - the GIL will be held for the entire instruction, so no other thread can be executing at the same time. One has to happen either before or after the other. You'll only run into issues if you have two different threads trying to do non-atomic operations on the same object (like incrementing the integer, for example). If that were the case, you'd need to use a threading.Lock() that was shared between the two threads to synchronize access to the integer.
Do note that the behavior of bytecode (and even the existence of the GIL) is considered an implementation detail, and is therefore subject to change:
CPython implementation detail: Bytecode is an implementation detail of
the CPython interpreter! No guarantees are made that bytecode will not
be added, removed, or changed between versions of Python.
So, if you want to be absolutely safe across all versions and implementations of Python, use a Lock, even though it's not actually necessary right now (and in reality, probably won't ever be) in CPython.
Using a Lock to synchronize access to a variable is very straightforward:
lock = threading.Lock()
Thread 1:
with lock:
print(shared_int) # Some read operation
# Lock is release once we leave the with block
Thread 2:
with lock:
shared_int = 55 # Some write operation

Can you race condition in Python while there is a GIL?

My understanding is that due to the Global Interpreter Lock (GIL) in cPython, only one thread can ever be executed at any one time. Does this or does this not automatically protected against race conditions, such as the lost update problem?

Due to the GIL, there is only ever one thread per process active to execute Python bytecode; the bytecode evaluation loop is protected by it.
The lock is released every sys.getswitchinterval() seconds, at which point a thread switch can take place. This means that for Python code, a thread switch can still take place, but only between byte code instructions. Any code that relies on thread safety needs to take this into account. Actions that can be done in one bytecode can be thread safe, everything else is not.
Even a single byte code instruction can trigger other Python code; for example the line object[index] can trigger a __getitem__ call on a custom class, implemented itself in Python. Thus a single BINARY_SUBSCR opcode is not necessarily thread safe, depending on the object type.

Python embedding with threads -- avoiding deadlocks?

Is there any way to embed python, allow callbacks from python to C++, allowing the Pythhon code to spawn threads, and avoiding deadlocks?
The problem is this:
To call into Python, I need to hold the GIL. Typically, I do this by getting the main thread state when I first create the interpreter, and then using PyEval_RestoreThread() to take the GIL and swap in the thread state before I call into Python.
When called from Python, I may need to access some protected resources that are protected by a separate critical section in my host. This means that Python will hold the GIL (potentially from some other thread than I initially called into), and then attempt to acquire my protection lock.
When calling into Python, I may need to hold the same locks, because I may be iterating over some collection of objects, for example.
The problem is that even if I hold the GIL when I call into Python, Python may give it up, give it to another thread, and then have that thread call into my host, expecting to take the host locks. Meanwhile, the host may take the host locks, and the GIL lock, and call into Python. Deadlock ensues.
The problem here is that Python relinquishes the GIL to another thread while I've called into it. That's what it's expected to do, but it makes it impossible to sequence locking -- even if I first take GIL, then take my own lock, then call Python, Python will call into my system from another thread, expecting to take my own lock (because it un-sequenced the GIL by releasing it).
I can't really make the rest of my system use the GIL for all possible locks in the system -- and that wouldn't even work right, because Python may still release it to another thread.
I can't really guarantee that my host doesn't hold any locks when entering Python, either, because I'm not in control of all the code in the host.
So, is it just the case that this can't be done?

"When calling into Python, I may need to hold the same locks, because I may be iterating over some collection of objects, for example."
This often indicates that a single process with multiple threads isn't appropriate. Perhaps this is a situation where multiple processes -- each with a specific object from the collection -- makes more sense.
Independent process -- each with their own pool of threads -- may be easier to manage.

The code that is called by python should release the GIL before taking any of your locks.
That way I believe it can't get into the dead-lock.

There was recently some discussion of a similar issue on the pyopenssl list. I'm afraid if I try to explain this I'm going to get it wrong, so instead I'll refer you to the problem in question.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.