The threading module in Python provides two kinds of locks: a plain lock (Lock) and a reentrant lock (RLock). It seems to me that if I need a lock, I should always prefer RLock over Lock, mainly to prevent deadlocks.
That said, I see two reasons to prefer a Lock over an RLock:
RLock has a more complicated internal structure and may therefore have worse performance.
For some reason, I want to prevent a thread from recursively acquiring the lock.
Is my reasoning correct? Can you point out other aspects?
Two points:
In officially released Python versions (2.4, 2.5, ... up to 3.1), an RLock is much slower than a Lock, because Lock is implemented in C and RLock in Python (this changes in 3.2, which adds a C implementation of RLock).
A Lock can be released from any thread (not necessarily the thread that acquired it), while an RLock must be released by the same thread that acquired it.
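A minimal sketch of the ownership difference, written for Python 3 (the variable and function names are just for illustration): the main thread acquires both locks, then a second thread tries to release them. The plain Lock has no owner and is released; the RLock refuses with a RuntimeError.

```python
import threading

plain = threading.Lock()
reentrant = threading.RLock()

# The main thread acquires both locks.
plain.acquire()
reentrant.acquire()

errors = []

def release_from_other_thread():
    # Allowed: a Lock has no owner, so any thread may release it.
    plain.release()
    # Not allowed: an RLock may only be released by the thread holding it.
    try:
        reentrant.release()
    except RuntimeError as exc:
        errors.append(exc)

t = threading.Thread(target=release_from_other_thread)
t.start()
t.join()

print(plain.locked())  # False: the other thread released it
print(len(errors))     # 1: the RLock refused
reentrant.release()    # the owning (main) thread can still release it
```

This is why a Lock can double as a simple cross-thread signal, while an RLock cannot.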
Bottom line: I'd suggest using an RLock only if it matches the semantics you are looking for; otherwise, stick to Lock by default.
Normally you should structure your code so that you never need to lock recursively in normal operation (this basically forces you to hold locks tightly around the data structures they are protecting). Therefore, you want anomalous recursive locking to be caught rather than silently allowed.
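To see how the two lock types react to an accidental recursive acquisition, here is a small sketch (the helper name is hypothetical). It uses a non-blocking second acquire so the demonstration cannot hang; a blocking second acquire on a plain Lock would deadlock the thread.

```python
import threading

def reacquire_inside(lock):
    """Hold `lock`, then try to acquire it again from the same thread."""
    with lock:
        # Non-blocking so this cannot hang: a blocking second acquire
        # on a plain Lock would wait forever on itself.
        got_it = lock.acquire(blocking=False)
        if got_it:
            lock.release()  # balance the extra acquire
        return got_it

print(reacquire_inside(threading.RLock()))  # True: recursion silently allowed
print(reacquire_inside(threading.Lock()))   # False: recursion detected
```

In real code, a blocking acquire with a timeout, such as lock.acquire(timeout=5), is one way to turn an unexpected recursive acquisition into a detectable failure instead of a silent hang.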
Can you only call notify() while holding the Condition object's underlying lock? The documentation makes it clear that notify_all() requires the lock. Is this also the case for notify(), or is it safe to call notify() on a Condition without holding the underlying lock?
Also, whatever the answer is, is this specific to Python, or is it true of condition variable semantics in all or most languages?
Thanks!
From the docs:
If the calling thread has not acquired the lock when this method is called, a RuntimeError is raised.
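A quick way to confirm the documented behavior, as a minimal sketch: calling notify() on a bare Condition raises RuntimeError, while the same call inside a with block succeeds even if nobody is waiting.

```python
import threading

cond = threading.Condition()

raised = False
try:
    cond.notify()          # called without holding the underlying lock
except RuntimeError:
    raised = True

with cond:                 # acquire the underlying lock first
    cond.notify()          # fine, even if no thread is waiting

print(raised)  # True
```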
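Holding the lock when notifying is a common requirement, though not a universal one: Java enforces it, while POSIX pthread_cond_signal and C++ std::condition_variable allow notifying without the mutex, but even there the shared state the waiter checks must be changed under the lock. There are several reasons for this. One is that condition variable code generally looks something like: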
with condition_variable:
    while not some_predicate:
        condition_variable.wait()
The some_predicate tells you when something you care about has happened. But if you don't hold the lock when you check whether some_predicate is true, some other thread may modify it between wait() returning and your check of the predicate. In addition, you are generally waiting on a condition to be true in order to do something with a shared resource. For example, you might be waiting for a signal that a result is available and that you should process it. But if that result isn't protected by a mutex, it isn't safe to access it. You might think that if only one thread is notified that the object is ready, then only one thread would process it, but there are some issues with that:
A mutex generally does more than just protect a resource; it also acts as a memory barrier, ensuring that values held in registers are written back to memory so that they're visible to other CPUs where other threads might be running (this isn't such an issue for Python due to the GIL, but in other languages it's very important).
Compilers and CPUs can reorder instructions for efficiency, but a mutex ensures that whatever happens before a mutex release in the source code actually executes before the release, both in the optimized code and in the CPU pipeline.
It's actually hard to guarantee that a notify wakes exactly one thread, so more than one thread may be awakened (at least in some implementations); you do, in fact, need to make sure that the mutex is held.
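The points above can be put together in a small producer/consumer sketch (names are illustrative, not from the original answer): the shared deque is only touched while the Condition's lock is held, the producer notifies while still holding it, and each consumer re-checks its predicate in a while loop because a wakeup does not guarantee that an item is still there for it.

```python
import threading
from collections import deque

items = deque()
cond = threading.Condition()
results = []

def consumer():
    with cond:
        # Re-check in a loop: another consumer may have taken the item
        # between the notify and this thread reacquiring the lock.
        while not items:
            cond.wait()
        results.append(items.popleft())

def producer(value):
    with cond:
        items.append(value)  # modify shared state under the lock...
        cond.notify()        # ...and notify while still holding it

consumers = [threading.Thread(target=consumer) for _ in range(3)]
for t in consumers:
    t.start()
for v in range(3):
    producer(v)
for t in consumers:
    t.join()

print(sorted(results))  # [0, 1, 2]
```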
From what I understand, the Global Interpreter Lock allows only a single thread to access the interpreter and execute bytecode. If that's the case, then at any given time, only a single thread will be using the interpreter and its memory.
With that, I believe it is fair to rule out race conditions, since no two threads can access the interpreter's memory at the same time, yet I still see warnings about making sure data structures are "thread safe". Perhaps those warnings are meant to cover other Python implementations, or tools like Cython that can release the GIL and allow true multithreading.
I understand the importance of thread safety in environments that do not have the GIL. However, for CPython, why is thread safety encouraged when writing multithreaded Python code? What is the worst that can happen in the CPython environment?
Of course race conditions can still take place, because a sequence of operations on a data structure is not atomic, even when each individual operation is.
Say you test for a key being present in a dictionary, then do something to add the key:
if key not in dictionary:
    # calculate new value
    value = elaborate_calculation()
    dictionary[key] = value
The thread can be switched at any point after the not in test has returned true, and another thread will also come to the conclusion that the key isn't there. Now two threads are doing the calculation, and you don't know which one will win.
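One way to close that gap, sketched under the assumption that a single module-level lock is acceptable: hold a Lock around the whole test-calculate-insert sequence so no other thread can run between the `not in` test and the store. Here elaborate_calculation is a hypothetical stand-in that just counts how often it runs.

```python
import threading

dictionary = {}
dictionary_lock = threading.Lock()
calculation_count = 0

def elaborate_calculation():
    # Hypothetical expensive computation; counts how often it actually runs.
    global calculation_count
    calculation_count += 1
    return 42

def get_or_compute(key):
    # The lock makes "test, calculate, insert" one indivisible step.
    with dictionary_lock:
        if key not in dictionary:
            dictionary[key] = elaborate_calculation()
        return dictionary[key]

threads = [threading.Thread(target=get_or_compute, args=("answer",))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(calculation_count)  # 1
```

The trade-off is that every caller waits while any calculation runs, even for a different key; that is the price of this simple design.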
All that the GIL does is protect Python's internal interpreter state. This doesn't mean that data structures used by Python code itself are now locked and protected.
An important note: the multiprocessing module lets Python code run truly in parallel despite the GIL, so data that is explicitly shared between processes (for example through multiprocessing.Value, multiprocessing.Array, or shared memory) can be accessed by different processes simultaneously.
This can corrupt your data, or at least disrupt your control flow, which is why thread safety is recommended.
As to why it happens: each process runs its own interpreter with its own GIL, so nothing stops two processes from accessing the same region of shared memory at the same time. Consider:
import multiprocessing

def my_func():
    print("hello world")

if __name__ == "__main__":
    my_process = multiprocessing.Process(target=my_func, args=())
    my_process.start()
    my_process.join()
My understanding is that the time it takes to interpret my_func (in this case) is buried in the overhead of spawning a new process.
The term "process" is apt here, because starting one involves copying data to the child and some handshaking between parent and child, so it's quite a different process (pun intended) from spawning a traditional thread.
I hope this helps.
I have two threads:
A worker thread that loops, looking for input from an SSH socket
A manager thread that processes stuff from the worker thread
They use a Queue to communicate - as stuff comes in, the worker places it on the Queue if it's important, and the manager takes it off to process.
However, I'd like the manager to also know the last time anything came in - whether important or not.
My thought was that the worker could set an integer (say), and the manager could read it. But there doesn't seem to be a threading primitive that supports this.
Is it safe for the manager to just read the worker's instance variables, providing it doesn't write to them? Or will that give some shared memory issues? Is there some way I can share this state without putting all the junk stuff in the Queue?
Is it safe for the manager to just read the worker's instance variables, providing it doesn't write to them?
Yes, this is safe in CPython. Because of the GIL, it's impossible for one thread to be reading the value of a variable while another thread is in the process of writing it. This is because each operation is a single bytecode instruction, which makes it atomic: the GIL is held for the entire instruction, so no other thread can be executing at the same time. One has to happen either before or after the other. You'll only run into issues if two different threads try to do non-atomic operations on the same object (like incrementing the integer, for example). If that were the case, you'd need to use a threading.Lock() shared between the two threads to synchronize access to the integer.
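A minimal sketch of both cases, assuming module-level globals stand in for the worker's instance variables and a plain loop stands in for the SSH socket (all names here are hypothetical). Storing last_seen is a single store that the manager can read without a lock, while the shared counter is a read-modify-write and is guarded by a Lock.

```python
import threading
import time

last_seen = None               # written by workers, only read by the manager
message_count = 0              # updated with read-modify-write
count_lock = threading.Lock()

def worker(n_messages):
    global last_seen, message_count
    for _ in range(n_messages):
        # A single store: a reader sees either the old or the new value.
        last_seen = time.monotonic()
        # `+=` is load, add, store: concurrent increments can lose
        # updates unless they are serialized with the lock.
        with count_lock:
            message_count += 1

workers = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for w in workers:
    w.start()

# Manager side: a plain read, no lock (may still be None this early).
snapshot = last_seen

for w in workers:
    w.join()

print(message_count)  # 40000
```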
Do note that the behavior of bytecode (and even the existence of the GIL) is considered an implementation detail, and is therefore subject to change:
CPython implementation detail: Bytecode is an implementation detail of the CPython interpreter! No guarantees are made that bytecode will not be added, removed, or changed between versions of Python.
So, if you want to be absolutely safe across all versions and implementations of Python, use a Lock, even though it isn't strictly necessary in today's default CPython builds (CPython 3.13 added an optional free-threaded build without the GIL, which shows that this can change).
Using a Lock to synchronize access to a variable is very straightforward:
import threading

lock = threading.Lock()
shared_int = 0

# Thread 1:
def reader():
    with lock:
        print(shared_int)  # Some read operation
    # The lock is released once we leave the with block

# Thread 2:
def writer():
    global shared_int
    with lock:
        shared_int = 55  # Some write operation
Note: My education on this topic is lacking, so I may be making some naive assumptions.
Assume you have a function performing blocking I/O. You need to run this function n times.
If you were to simply spawn n threads (using the threading module) and start them at the same time, would it work to simply use the GIL to manage the threads (based on I/O) as opposed to using the multiprocessing.pool module to manage subprocesses?
It's bad practice to use an implementation detail as a core feature of your code. The GIL is an implementation detail of CPython, and doesn't exist in other implementations.
Use things that are designed to do what you want.
How is the GIL even relevant here? What are you expecting to get out of it?
You can spawn n threads and have them all perform blocking I/O, with or without a GIL; CPython releases the GIL while a thread is blocked on I/O, so the threads don't hold each other up.
And if you want to "manage" the threads (e.g., join them all so you know when you're done), you still need to do that explicitly; the GIL doesn't help.
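A minimal sketch of doing that management explicitly, assuming time.sleep is an acceptable stand-in for a real blocking call (it releases the GIL while waiting, as blocking socket or file reads do). The function blocking_io is hypothetical.

```python
import threading
import time

def blocking_io(i, results):
    # Hypothetical stand-in for a blocking call (socket read, HTTP request).
    time.sleep(0.3)
    results[i] = i * i

n = 5
results = [None] * n
threads = [threading.Thread(target=blocking_io, args=(i, results))
           for i in range(n)]

start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()  # explicit management: wait until every thread is done
elapsed = time.perf_counter() - start

print(results)  # [0, 1, 4, 9, 16]
print(f"elapsed: {elapsed:.2f}s (expected well under {n * 0.3:.1f}s)")
```

The standard library's concurrent.futures.ThreadPoolExecutor wraps the same start-and-join bookkeeping if you prefer not to manage threads by hand.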
My main question is: does the threading module's Lock object create atomic locks? The module documentation doesn't say that the lock is atomic. Python's mutex documentation does say the mutex lock is atomic, but I seem to have read somewhere that in fact it isn't. Could someone give me a bit of insight on this matter? Which lock should I use? I am currently running my scripts using Python 2.4.
Locks of any nature would be rather useless if they weren't atomic - the whole point of the lock is to allow for higher-level atomic operations.
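All of threading's synchronization objects (locks, RLocks, semaphores, bounded semaphores) are built on the low-level lock from the thread module (_thread in Python 3), which relies on the operating system's atomic locking primitives.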
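A small sketch of what that atomicity means in practice, written for Python 3 rather than the 2.4 used in the question (threading.Barrier only exists from 3.2). Eight threads are released at once and all try a non-blocking acquire on the same Lock; because acquire is an atomic test-and-set, exactly one of them wins. The lock is deliberately never released so later attempts also fail.

```python
import threading

N = 8
lock = threading.Lock()
barrier = threading.Barrier(N)
winners = []

def try_grab(i):
    barrier.wait()                    # release all threads at once
    if lock.acquire(blocking=False):  # atomic test-and-set
        winners.append(i)             # deliberately never released

threads = [threading.Thread(target=try_grab, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(winners))  # 1
```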
You should use threading, since mutex is actually deprecated going forward (and removed in Python 3).