I have a function that must never be called with the same value simultaneously from two threads. To enforce this, I have a defaultdict that spawns new threading.Locks for a given key. Thus, my code looks similar to this:
from collections import defaultdict
import threading

lock_dict = defaultdict(threading.Lock)

def f(x):
    with lock_dict[x]:
        print("Locked for value x")
The problem is that I cannot figure out how to safely delete a lock from the defaultdict once it's no longer needed. Without doing this, my program has a memory leak that becomes noticeable when f is called with many different values of x.
I cannot simply del lock_dict[x] at the end of f, because in the scenario that another thread is waiting for the lock, then the second thread will lock a lock that's no longer associated with lock_dict[x], and thus two threads could end up simultaneously calling f with the same value of x.
I'd use a different approach:
fcond = threading.Condition()
fargs = set()

def f(x):
    with fcond:
        while x in fargs:
            fcond.wait()
        fargs.add(x)  # this thread has exclusive rights to use `x`

    # do useful stuff with x
    # any other thread trying to call f(x) will
    # block in the .wait() above

    with fcond:
        fargs.remove(x)     # we're done with x
        fcond.notify_all()  # let blocked threads (if any) proceed
Conditions have a learning curve, but once it's climbed they make it much easier to write correct thread-safe, race-free code.
Thread safety of the original code
@JimMischel asked in a comment whether the original's use of defaultdict is subject to races. Good question!
The answer is - alas - "you'll have to stare at your specific Python's implementation".
Assuming the CPython implementation: if any of the code invoked by defaultdict to supply a default invokes Python code, or C code that releases the GIL (global interpreter lock), then 2 (or more) threads could "simultaneously" invoke with lock_dict[x] with the same x not already in the dict, and:
Thread 1 sees that x isn't in the dict, gets a lock, then loses its timeslice (before setting x in the dict).
Thread 2 sees that x isn't in the dict, and also gets a lock.
One of those threads' locks ends up in the dict, but both threads execute f(x).
Staring at the source for 3.4.0a4+ (the current development head), defaultdict and threading.Lock are both implemented by C code that doesn't release the GIL. I don't recall whether earlier versions did or didn't, at various times, implement all or parts of defaultdict or threading.Lock in Python.
My suggested alternative code is full of stuff implemented in Python (all threading.Condition methods), but is race-free by design - even if you're using an old version of Python with sets also implemented in Python (the set is only accessed under the protection of the condition variable's lock).
One lock per argument
Without conditions, this seems to be much harder. In the original approach, I believe you need to keep a count of threads wanting to use x, and you need a lock to protect those counts and to protect the dictionary. The best code I've come up with for that is so long-winded that it seems sanest to put it in a context manager. To use, create an argument locker per function that needs it:
farglocker = ArgLocker() # for function `f()`
and then the body of f() can be coded simply:
def f(x):
with farglocker(x):
# only one thread at a time can run with argument `x`
Of course the condition approach could also be wrapped in a context manager. Here's the code:
import threading

class ArgLocker:
    def __init__(self):
        self.xs = dict()  # maps x to (lock, count) pair
        self.lock = threading.Lock()

    def __call__(self, x):
        return AllMine(self.xs, self.lock, x)

class AllMine:
    def __init__(self, xs, lock, x):
        self.xs = xs
        self.lock = lock
        self.x = x

    def __enter__(self):
        x = self.x
        with self.lock:
            xlock = self.xs.get(x)
            if xlock is None:
                xlock = threading.Lock()
                xlock.acquire()
                count = 0
            else:
                xlock, count = xlock
            self.xs[x] = xlock, count + 1
        if count:  # x was already known - wait for it
            xlock.acquire()
        assert xlock.locked()

    def __exit__(self, *args):
        x = self.x
        with self.lock:
            xlock, count = self.xs[x]
            assert xlock.locked()
            assert count > 0
            count -= 1
            if count:
                self.xs[x] = xlock, count
            else:
                del self.xs[x]
        xlock.release()
So which way is better? Using conditions ;-) That way is "almost obviously correct", but the lock-per-argument (LPA) approach is a bit of a head-scratcher. The LPA approach does have the advantage that when a thread is done with x, the only threads allowed to proceed are those wanting to use the same x; using conditions, the .notify_all() wakes all threads blocked waiting on any argument. But unless there's very heavy contention among threads trying to use the same arguments, this isn't going to matter much: using conditions, the threads woken up that aren't waiting on x stay awake only long enough to see that x in fargs is true, and then immediately block (.wait()) again.
Related
The GIL wiki states that:
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe.
When multiple threads try to operate on a shared variable at the same time, we need to synchronise the threads to avoid race conditions. We achieve this by acquiring a lock.
But since Python uses the GIL, only one thread at a time is allowed to execute Python bytecode, so this problem should never arise in Python programs - or so I thought :(. But then I saw an article about thread synchronisation in Python containing a code snippet that causes race conditions.
https://www.geeksforgeeks.org/multithreading-in-python-set-2-synchronization/
Can someone please explain to me how this is possible?
Code
import threading

# global variable x
x = 0

def increment():
    """
    function to increment global variable x
    """
    global x
    x += 1

def thread_task():
    """
    task for thread
    calls increment function 100000 times.
    """
    for _ in range(100000):
        increment()

def main_task():
    global x

    # setting global variable x as 0
    x = 0

    # creating threads
    t1 = threading.Thread(target=thread_task)
    t2 = threading.Thread(target=thread_task)

    # start threads
    t1.start()
    t2.start()

    # wait until threads finish their job
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(10):
        main_task()
        print("Iteration {0}: x = {1}".format(i, x))
Output:
Iteration 0: x = 175005
Iteration 1: x = 200000
Iteration 2: x = 200000
Iteration 3: x = 169432
Iteration 4: x = 153316
Iteration 5: x = 200000
Iteration 6: x = 167322
Iteration 7: x = 200000
Iteration 8: x = 169917
Iteration 9: x = 153589
Only one thread at a time can execute bytecode. That ensures memory allocation and primitive objects like lists, dicts and sets are always consistent, without the need for any explicit control on the Python side of the code.
However, += 1 is not atomic, because integers are immutable objects: it fetches the previous value from the variable, creates (or gets a reference to) a new object which is the result of the operation, and then stores that object back in the original global variable. The bytecode for that can be seen with the help of the dis module:
In [2]: import dis
In [3]: global counter
In [4]: counter = 0
In [5]: def inc():
...: global counter
...: counter += 1
...:
In [6]: dis.dis(inc)
  1           0 RESUME                   0

  3           2 LOAD_GLOBAL              0 (counter)
             14 LOAD_CONST               1 (1)
             16 BINARY_OP               13 (+=)
             20 STORE_GLOBAL             0 (counter)
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE
And the running thread can change arbitrarily between each of these bytecode instructions.
So, for this kind of concurrency, one has to resort, as in lower-level code, to a lock - the inc function should look like this:
In [7]: from threading import Lock
In [8]: inc_lock = Lock()
In [9]: def inc():
...: global counter
...: with inc_lock:
...: counter += 1
...:
So this will ensure that no other thread can perform the whole counter += 1 sequence at the same time.
(The disassembly here would be significantly lengthier, but it has to do with the semantics of the with block, not with the lock itself, so it's not related to the problem we are looking at. The lock can be acquired through other means as well - a with block is just the most convenient.)
Also, this is one of the greatest advantages of async code when compared to threaded parallelism: in async code one's Python code will always run without being interrupted unless there is an explicit deferring of the flow to the controlling loop - by using an await or one of the various async <command> patterns.
since python uses GIL only one thread is allowed to execute python's byte code
All of the threads in a Python program must be able to execute byte codes, but at any one moment in time, only one thread can have a lock on the GIL. The threads in a program continually take turns locking and unlocking it as needed.
When multiple threads try to operate on a shared variable at the same time, we need to synchronise the threads to avoid race conditions.
"Race condition" is kind of a low-level idea. There is a higher-level way to understand why we need mutexes.
Threads communicate through shared variables. Imagine we're selling seats to a show in a theater. We've got a list of seats that have already been sold, a list of seats that are still available, and some number of pending transactions that have seats "on hold." At any given instant, if we count all of the seats in all of those different places, they'd better add up to the number of seats in the theater - a constant number.
Computer scientists call that property an invariant. The number of seats in the theater never varies, and we always want all the seats that we know about to add up to that number.
The problem is, how can you sell a seat without breaking the invariant? You can't. You can't write code that moves a seat from one category to another in a single, atomic operation. Computer hardware doesn't have an operation for that. We have to use a sequence of simpler operations to move some object from one list to another. And, if one thread tries to count the seats while some other thread is half-way done performing that sequence, then the first thread will get the wrong number of seats. The invariant is "broken."
Mutexes solve the problem. If every thread that can temporarily break the invariant only ever does it while keeping a certain mutex locked, and if every other thread that cares about the invariant only ever checks it while keeping the same mutex locked, then no thread will ever see the broken invariant other than the one thread that is doing it on purpose.
You can talk about "invariants," or you can talk about "race conditions," but which feels right, depends on the complexity of the program. If it's a complicated system, then it often makes sense to describe the need for a mutex at a high level—by describing the invariant that the mutex protects. If it's a really simple problem (e.g., like incrementing a counter) then it feels better to talk about the "race condition" that the mutex averts. But they're really just two different ways of thinking about the same thing.
Is accessing/changing dictionary values thread-safe?
I have a global dictionary foo and multiple threads with ids id1, id2, ... , idn. Is it OK to access and change foo's values without allocating a lock for it if it's known that each thread will only work with its id-related value, say thread with id1 will only work with foo[id1]?
Assuming CPython: Yes and no. It is actually safe to fetch/store values from a shared dictionary in the sense that multiple concurrent read/write requests won't corrupt the dictionary. This is due to the global interpreter lock ("GIL") maintained by the implementation. That is:
Thread A running:
a = global_dict["foo"]
Thread B running:
global_dict["bar"] = "hello"
Thread C running:
global_dict["baz"] = "world"
won't corrupt the dictionary, even if all three access attempts happen at the "same" time. The interpreter will serialize them in some undefined way.
However, the results of the following sequence is undefined:
Thread A:
if "foo" not in global_dict:
    global_dict["foo"] = 1
Thread B:
global_dict["foo"] = 2
as the test/set in thread A is not atomic ("time-of-check/time-of-use" race condition). So, it is generally best, if you lock things:
from threading import RLock

lock = RLock()

def thread_A():
    with lock:
        if "foo" not in global_dict:
            global_dict["foo"] = 1

def thread_B():
    with lock:
        global_dict["foo"] = 2
The best, safest, portable way to have each thread work with independent data is:
import threading
tloc = threading.local()
Now each thread works with a totally independent tloc object even though it's a global name. The thread can get and set attributes on tloc, use tloc.__dict__ if it specifically needs a dictionary, etc.
Thread-local storage for a thread goes away at end of thread; to have threads record their final results, have them put their results, before they terminate, into a common instance of Queue.Queue (which is intrinsically thread-safe). Similarly, initial values for data a thread is to work on could be arguments passed when the thread is started, or be taken from a Queue.
Other half-baked approaches, such as hoping that operations that look atomic are indeed atomic, may happen to work for specific cases in a given version and release of Python, but could easily get broken by upgrades or ports. There's no real reason to risk such issues when a proper, clean, safe architecture is so easy to arrange, portable, handy, and fast.
Since I needed something similar, I landed here. I'll sum up your answers in this short snippet:
#!/usr/bin/env python3
import threading

class ThreadSafeDict(dict):
    def __init__(self, *p_arg, **n_arg):
        dict.__init__(self, *p_arg, **n_arg)
        self._lock = threading.Lock()

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, type, value, traceback):
        self._lock.release()

if __name__ == '__main__':
    u = ThreadSafeDict()
    with u as m:
        m[1] = 'foo'
    print(u)
This way, you can use the with construct to hold the lock while fiddling with your dict.
The GIL takes care of that, if you happen to be using CPython.
global interpreter lock
The lock used by Python threads to assure that only one thread executes in the CPython virtual machine at a time. This simplifies the CPython implementation by assuring that no two threads can access the same memory at the same time. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines. Efforts have been made in the past to create a “free-threaded” interpreter (one which locks shared data at a much finer granularity), but so far none have been successful because performance suffered in the common single-processor case.
See are-locks-unnecessary-in-multi-threaded-python-code-because-of-the-gil.
How it works:
>>> import dis
>>> demo = {}
>>> def set_dict():
... demo['name'] = 'Jatin Kumar'
...
>>> dis.dis(set_dict)
  2           0 LOAD_CONST               1 ('Jatin Kumar')
              3 LOAD_GLOBAL              0 (demo)
              6 LOAD_CONST               2 ('name')
              9 STORE_SUBSCR
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE
Each of the above instructions is executed while holding the GIL, and the STORE_SUBSCR instruction adds/updates the key+value pair in the dictionary. So you can see that the dictionary update is a single instruction and hence thread-safe.
I would like to find a mechanism to easily report the progress of a Python thread. For example, if my thread had a counter, I would like to know the value of the counter once in a while, but, importantly, I only need to know the latest value, not every value that's ever gone by.
What I imagine to be the simplest solution is a single-value Queue, where every time I put a new value on it in the thread, it replaces the old value with the new one. Then when I do a get in the main program, it would only return the latest value.
Because I don't know how to do the above, instead what I do is put every counter value in a queue, and when I get, I read all the values until there are no more, and just keep the last. But this seems far from ideal, in that I'm filling the queue with thousands of values that I don't care about.
Here's an example of what I do now:
from threading import Thread
from queue import Queue, Empty
from time import sleep

N = 1000

def fast(q):
    count = 0
    while count < N:
        sleep(.02)
        count += 1
        q.put(count)

def slow(q):
    while 1:
        sleep(5)  # sleep for a long time

        # read last item in queue
        val = None
        while 1:  # read all elements of queue, only saving last
            try:
                val = q.get(block=False)
            except Empty:
                break

        print(val)  # the last element read from the queue
        if val == N:
            break

if __name__ == "__main__":
    q = Queue()
    fast_thread = Thread(target=fast, args=(q,))
    fast_thread.start()
    slow(q)
    fast_thread.join()
My question is, is there a better approach?
Just use a global variable and a threading.Lock to protect it during assignments:
import threading
from time import sleep

N = 1000
value = 0

def fast(lock):
    global value
    count = 0
    while count < N:
        sleep(.02)
        count += 1
        with lock:
            value = count

def slow():
    while 1:
        sleep(5)  # sleep for a long time
        print(value)  # read current value
        if value == N:
            break

if __name__ == "__main__":
    lock = threading.Lock()
    fast_thread = threading.Thread(target=fast, args=(lock,))
    fast_thread.start()
    slow()
    fast_thread.join()
yields (something like)
249
498
747
997
1000
As Don Question points out, if there is only one thread modifying value, then
actually no lock is needed in the fast function. And as dano points out, if you want to
ensure that the value printed in slow is the same value used in the
if-statement, then a lock is needed in the slow function.
For more on when locks are needed, see Thread Synchronization Mechanisms in Python.
Just use a deque with a maximum length of 1. It will just keep your latest value.
So, instead of:
q = Queue()
use:
from collections import deque
q = deque(maxlen=1)
To read from the deque, there's no get method, so you'll have to do something like:
val = None
try:
    val = q[0]
except IndexError:
    pass
In your special case, you may be over-complicating the issue. If your variable is just some kind of progress indicator for a single thread, and only that thread actually changes it, then it's completely safe to use a shared object to communicate the progress, as long as all other threads only read.
I guess we've all read too many (rightful) warnings about race conditions and other pitfalls of shared state in concurrent programming, so we tend to overthink and add more precaution than is sometimes needed.
You could basically share a pre-constructed dict:
thread_progress = dict.fromkeys(list_of_threads, progress_start_value)
or manually:
thread_progress = {thread: progress_value, ...}
without further precaution, as long as no thread changes the dict's keys.
This way you can track the progress of multiple threads over one dict. The only condition is not to change the dict once the threading has started, which means the dict must contain all threads BEFORE the first child thread starts; otherwise you must use a Lock before writing to the dict. By "changing the dict" I mean all operations regarding the keys. You may change the associated value of a key, because that's at the next level of indirection.
Update:
The underlying problem is shared state, which is already a problem in linear programs, but a nightmare in concurrent ones.
For example: imagine a global (shared) variable sv and two functions G(ood) and B(ad) in a linear program. Both functions calculate a result depending on sv, but B unintentionally changes sv. Now you are wondering why the heck G doesn't do what it should, despite finding no error in G, even after testing it in isolation, where it was perfectly fine.
Now imagine the same scenario in a concurrent program, with two Threads A and B. Both Threads increment the shared state/variable sv by one.
without locking (current value of sv in parenthesis):
sv = 0
A reads sv (0)
B reads sv (0)
A inc sv (0)
B inc sv (0)
A writes sv (1)
B writes sv (1)
sv == 1 # should be 2!
Finding the source of the problem is a pure nightmare, because it can also succeed sometimes. More often than not, A would actually manage to finish before B even starts to read sv, so the problem seems to behave non-deterministically or erratically, and is even harder to find. In contrast to my linear example, both threads are "good", but nevertheless don't behave as intended.
with locking:
sv = 0
l = lock (for access on sv)
A tries to acquire lock for sv -> success (0)
B tries to acquire lock for sv -> failure, blocked by A (0)
A reads sv (0)
B blocked (0)
A inc sv (0)
B blocked (0)
A writes sv (1)
B blocked (1)
A releases lock on sv (1)
B tries to acquire lock for sv -> success (1)
...
sv == 2
I hope my little example explained the underlying problem of accessing a shared state and
why making write operations (including the read operation) atomic through locking is necessary.
Regarding my advice of a pre-initialized dict: this is a mere precaution, for two reasons:
1. If you iterate over the threads in a for-loop, the loop may raise an exception if a thread adds or removes an entry to/from the dict while you're still in the loop, because it is then unclear what the next key should be.
2. Thread A reads the dict and gets interrupted by thread B, which adds an entry and finishes. Thread A resumes, but doesn't have the dict thread B changed, and writes the pre-B state back together with its own changes. Thread B's changes are lost.
BTW, my proposed solution wouldn't work as written above, because of the immutability of the primitive types. But this can easily be fixed by making them mutable, e.g. by encapsulating them in a list or a special Progress object, or even simpler: give the thread function access to the thread_progress dict.
Explanation by example:
t = Thread()
progress = 0        # progress points to the object `0`
dict[t] = progress  # dict[t] points now to object `0`
progress = 1        # progress points to object `1`
dict[t]             # dict[t] still points to object `0`
better:
t = Thread()
t.progress = 0
dict[thread_id] = t
t.progress = 1
dict[thread_id].progress == 1
I was wondering if it would be possible to start a new thread and update its argument when this argument gets a new value in the main of the program, so something like this:
i = 0

def foo(i):
    print i
    time.sleep(5)

thread.start_new_thread(foo, (i,))

while True:
    i = i + 1
Thanks a lot for any help!
An argument is just a value, like anything else. Passing the value just makes a new reference to the same value, and if you mutate that value, every reference will see it.
The fact that both the global variable and the function parameter have the same name isn't relevant here, and is a little confusing, so I'm going to rename one of them. Also, your foo function only does that print once (possibly before you even increment the value), then sleeps for 5 seconds, then finishes. You probably wanted a loop there; otherwise, you can't actually tell whether things are working or not.
So, here's an example:
i = []

def foo(j):
    while True:
        print j
        time.sleep(5)

thread.start_new_thread(foo, (i,))

while True:
    i.append(1)
So, why doesn't your code work? Well, i = i+1 isn't mutating the value 0, it's assigning a new value, 0 + 1, to i. The foo function still has a reference to the old value, 0, which is unchanged.
Since integers are immutable, you can't directly solve this problem. But you can indirectly solve it very easily: replace the integer with some kind of wrapper that is mutable.
For example, you can write an IntegerHolder class with set and get methods; when you call i.set(i.get() + 1), and the other reference later does i.get(), it will see the new value.
Or you can just use a list as a holder. Lists are mutable, and hold zero or more elements. When you do i[0] = i[0] + 1, that replaces i[0] with a new integer value, but i is still the same list value, and that's what the other reference is pointing at. So:
i = [0]

def foo(j):
    print j[0]
    time.sleep(5)

thread.start_new_thread(foo, (i,))

while True:
    i[0] = i[0] + 1
This may seem a little hacky, but it's actually a pretty common Python idiom.
Meanwhile, the fact that foo is running in another thread creates another problem.
In theory, threads run simultaneously, and there's no ordering of any data accesses between them. Your main thread could be running on core 0, and working on a copy of i that's in core 0's cache, while your foo thread is running on core 1, and working on a different copy of i that's in core 1's cache, and there is nothing in your code to force the caches to get synchronized.
In practice, you will often get away with this, especially in CPython. But to actually know when you can get away with it, you have to learn how the Global Interpreter Lock works, and how the interpreter handles variables, and (in some cases) even how your platform's cache coherency and your C implementation's memory model and so on work. So, you shouldn't rely on it. The right thing to do is to use some kind of synchronization mechanism to guard access to i.
As a side note, you should also almost never use thread instead of threading, so I'm going to switch that as well.
i = [0]
lock = threading.Lock()

def foo(j):
    while True:
        with lock:
            print j[0]
        time.sleep(5)

t = threading.Thread(target=foo, args=(i,))
t.start()

while True:
    with lock:
        i[0] = i[0] + 1
One last thing: If you create a thread, you need to join it later, or you can't quit cleanly. But your foo thread never exits, so if you try to join it, you'll just block forever.
For simple cases like this, there's a simple solution. Before calling t.start(), do t.daemon = True. This means when your main thread quits, the background thread will be automatically killed at some arbitrary point. That's obviously a bad thing if it's, say, writing to a file or a database. But in your case, it's not doing anything persistent or dangerous.
For more realistic cases, you generally want to create some way to signal between the two threads. Often you've already got something for the thread to wait on—a Queue, a file object or collection of them (via select), etc. If not, just create a flag variable protected by a lock (or condition or whatever is appropriate).
Try globals.
i = 0

def foo():
    print i
    time.sleep(5)

thread.start_new_thread(foo, ())

while True:
    i = i + 1
You could also pass a dict holding the variables you need.
args = {'i': 0}

def foo(args):
    print args['i']
    time.sleep(5)

thread.start_new_thread(foo, (args,))

while True:
    args['i'] = args['i'] + 1
You might also want to use a thread lock.
import thread

lock = thread.allocate_lock()
args = {'i': 0}

def foo(args):
    with lock:
        print args['i']
    time.sleep(5)

thread.start_new_thread(foo, (args,))

while True:
    with lock:
        args['i'] = args['i'] + 1
Hope this helped.
This is probably a rudimentary question, but I'm new to threaded programming in Python and am not entirely sure what the correct practice is.
Should I be creating a single lock object (either global or passed around) and using it everywhere I need to lock? Or should I be creating multiple lock instances in each of the classes that employ them? Take these two rudimentary code samples; which direction is best to go? The main difference is that a single lock instance is used in both class A and B in the second, while multiple instances are used in the first.
Sample 1
class A():
    def __init__(self, theList):
        self.theList = theList
        self.lock = threading.Lock()

    def poll(self):
        while True:
            # do some stuff that eventually needs to work with theList
            self.lock.acquire()
            try:
                self.theList.append(something)
            finally:
                self.lock.release()

class B(threading.Thread):
    def __init__(self, theList):
        threading.Thread.__init__(self)
        self.theList = theList
        self.lock = threading.Lock()
        self.start()

    def run(self):
        while True:
            # do some stuff that eventually needs to work with theList
            self.lock.acquire()
            try:
                self.theList.remove(something)
            finally:
                self.lock.release()

if __name__ == "__main__":
    aList = []
    for x in range(10):
        B(aList)
    A(aList).poll()
Sample 2
class A():
    def __init__(self, theList, lock):
        self.theList = theList
        self.lock = lock

    def poll(self):
        while True:
            # do some stuff that eventually needs to work with theList
            self.lock.acquire()
            try:
                self.theList.append(something)
            finally:
                self.lock.release()

class B(threading.Thread):
    def __init__(self, theList, lock):
        threading.Thread.__init__(self)
        self.theList = theList
        self.lock = lock
        self.start()

    def run(self):
        while True:
            # do some stuff that eventually needs to work with theList
            self.lock.acquire()
            try:
                self.theList.remove(something)
            finally:
                self.lock.release()

if __name__ == "__main__":
    lock = threading.Lock()
    aList = []
    for x in range(10):
        B(aList, lock)
    A(aList, lock).poll()
If you use a separate lock object in each class then you run a risk of deadlocking, e.g. if one operation claims the lock for A and then claims the lock for B while a different operation claims B and then A.
If you use a single lock then you're forcing code to single-thread when different operations could run in parallel. That isn't always as serious in Python (which has a global lock in any case) as in other languages, but say you were to hold a global lock while writing to a file: Python would release the GIL, yet you'd have blocked everything else.
So it's a tradeoff. I'd say go for little locks as that way you maximise the chance for parallel execution, but take care never to claim more than one lock at a time, and try not to hold onto a lock for any longer than you absolutely have to.
So far as your specific examples go, the first one is just plain broken. If you lock operations on theList then you must use the same lock every time or you aren't locking anything. That may not matter here as list.append and list.remove are effectively atomic anyway, but if you do need to lock access to the list you need to be sure to use the same lock every time. The best way to do that is to hold the list and a lock as attributes of a class and force all access to the list to go through methods of the containing class. Then pass the container class around not the list or the lock.
In the general case, a single global lock is less efficient (more contention) but safer (no risk of deadlock) as long as it's a RLock (reentrant) rather than a plain Lock.
The potential problems come when a thread that's executing while holding a lock tries to acquire another (or the same) lock, for example by calling another method that contains the acquire call. If a thread that's already holding a lock tries to acquire it again, it will block forever if the lock is a plain Lock, but proceed smoothly if it's a slightly more complex RLock - that's why the latter is called reentrant, because the thread holding it can "enter" (acquire the lock) again. Essentially, an RLock keeps track of which thread holds it and how many times that thread has acquired it, while the simpler Lock does not keep such information around.
With multiple locks, the deadlock problem comes when one thread tries to acquire lock A then lock B, while another tries to acquire first lock B, then lock A. If that occurs, then sooner or later you'll be in a situation where the first thread holds A, the second holds B, and each tries to acquire the lock that the other one is holding - so both block forever.
One way to prevent multiple-lock deadlocks is to make sure that locks are always acquired in the same order, whatever thread is doing the acquiring. However, when each instance has its own lock, that's exceedingly difficult to organize with any clarity and simplicity.