Do we ever need to synchronise threads in Python?

According to the Python wiki entry on the GIL:
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe.
When multiple threads try to operate on a shared variable at the same time, we need to synchronise the threads to avoid race conditions. We achieve this by acquiring a lock.
But since Python uses the GIL, only one thread at a time is allowed to execute Python bytecode, so I thought this problem should never arise in Python programs :( . But then I saw an article about thread synchronisation in Python with a code snippet that causes race conditions.
https://www.geeksforgeeks.org/multithreading-in-python-set-2-synchronization/
Can someone please explain to me how this is possible?
Code
import threading

# global variable x
x = 0

def increment():
    """
    function to increment global variable x
    """
    global x
    x += 1

def thread_task():
    """
    task for thread
    calls increment function 100000 times.
    """
    for _ in range(100000):
        increment()

def main_task():
    global x
    # setting global variable x as 0
    x = 0
    # creating threads
    t1 = threading.Thread(target=thread_task)
    t2 = threading.Thread(target=thread_task)
    # start threads
    t1.start()
    t2.start()
    # wait until threads finish their job
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(10):
        main_task()
        print("Iteration {0}: x = {1}".format(i, x))
Output:
Iteration 0: x = 175005
Iteration 1: x = 200000
Iteration 2: x = 200000
Iteration 3: x = 169432
Iteration 4: x = 153316
Iteration 5: x = 200000
Iteration 6: x = 167322
Iteration 7: x = 200000
Iteration 8: x = 169917
Iteration 9: x = 153589

Only one thread at a time can execute bytecode. That ensures memory allocation and the built-in objects like lists, dicts and sets always stay internally consistent, without the need for any explicit control on the Python side of the code.
However, += 1 is not atomic (integers being immutable objects): it fetches the previous value of the variable, creates (or gets a reference to) a new object, which is the result of the operation, and then stores that value back in the original global variable. The bytecode for that can be seen with the help of the dis module:
In [2]: import dis
In [3]: global counter
In [4]: counter = 0
In [5]: def inc():
   ...:     global counter
   ...:     counter += 1
   ...:
In [6]: dis.dis(inc)
  1           0 RESUME                   0

  3           2 LOAD_GLOBAL              0 (counter)
             14 LOAD_CONST               1 (1)
             16 BINARY_OP               13 (+=)
             20 STORE_GLOBAL             0 (counter)
             22 LOAD_CONST               0 (None)
             24 RETURN_VALUE
And the running thread can be switched out arbitrarily between each of these bytecode instructions.
So, for this kind of concurrency, one has to resort to a lock, just as in lower-level code; the inc function should look like this:
In [7]: from threading import Lock
In [8]: inc_lock = Lock()
In [9]: def inc():
   ...:     global counter
   ...:     with inc_lock:
   ...:         counter += 1
   ...:
This ensures that no other thread will run bytecode while the whole counter += 1 operation is being performed.
(The disassembly here would be significantly lengthier, but the extra instructions have to do with the semantics of the with block, not with the lock itself, so they are not related to the problem we are looking at. The lock can be acquired through other means as well; a with block is just the most convenient.)
Also, this is one of the greatest advantages of async code when compared to threaded parallelism: in async code one's Python code always runs without being interrupted unless the flow is explicitly deferred to the controlling loop, by using an await or one of the various async <command> patterns.
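To illustrate that last point, here is a minimal sketch (my own, not from the post above) of the same counter written with asyncio; since neither coroutine ever awaits inside the loop, the increments cannot be interleaved and no lock is needed:
import asyncio

counter = 0

async def inc_many(n):
    global counter
    for _ in range(n):
        # no await inside this loop, so the event loop cannot switch
        # to another coroutine in the middle of the read-add-store
        counter += 1

async def main():
    await asyncio.gather(inc_many(100_000), inc_many(100_000))
    print(counter)  # always 200000, no lock required

asyncio.run(main())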

since python uses GIL only one thread is allowed to execute python's byte code
All of the threads in a Python program must be able to execute bytecode, but at any one moment in time, only one thread can hold the GIL. The threads in a program continually take turns locking and unlocking it as needed.
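You can even observe and tune how often those turns are taken; a small sketch using sys.getswitchinterval and sys.setswitchinterval (available since Python 3.2), which control how long a thread may keep the GIL before the interpreter asks it to yield:
import sys

print(sys.getswitchinterval())  # 0.005 by default: a thread may hold the
                                # GIL for ~5 ms before being asked to give
                                # another thread a turn
sys.setswitchinterval(0.0005)   # switch more often; this tends to make
                                # races like the one above easier to hit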
When multiple threads try to do some operation on a shared variable at the same time, we need to synchronise the threads to avoid race conditions.
"Race condition" is kind of a low-level idea. There is a higher-level way to understand why we need mutexes.
Threads communicate through shared variables. Imagine we're selling seats to a show in a theater. We've got a list of seats that already have been sold, we've got a list of seats that are still available, and we've got some number of pending transactions that have seats "on hold." At any given instant in time, if we count all of the seats in all of those different places, they'd better add up to the number of seats in the theater: a constant number.
Computer scientists call that property an invariant. The number of seats in the theater never varies, and we always want all the seats that we know about to add up to that number.
The problem is, how can you sell a seat without breaking the invariant? You can't. You can't write code that moves a seat from one category to another in a single, atomic operation. Computer hardware doesn't have an operation for that. We have to use a sequence of simpler operations to move some object from one list to another. And, if one thread tries to count the seats while some other thread is half-way done performing that sequence, then the first thread will get the wrong number of seats. The invariant is "broken."
Mutexes solve the problem. If every thread that can temporarily break the invariant only ever does it while keeping a certain mutex locked, and if every other thread that cares about the invariant only ever checks it while keeping the same mutex locked, then no thread will ever see the broken invariant other than the one thread that is doing it on purpose.
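As a sketch of the theater example (my illustration, not code from the answer), the invariant holds again as soon as the lock is released:
import threading

NUM_SEATS = 100
seats_lock = threading.Lock()
available = set(range(NUM_SEATS))  # seats still for sale
sold = set()                       # seats already sold

def sell(seat):
    # the invariant is broken between remove() and add(),
    # so the whole sequence runs under the mutex
    with seats_lock:
        available.remove(seat)
        sold.add(seat)

def total_seats():
    # readers take the same mutex, so they never see
    # a seat that is "in neither list"
    with seats_lock:
        return len(available) + len(sold)  # always NUM_SEATS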
You can talk about "invariants," or you can talk about "race conditions," but which one feels right depends on the complexity of the program. If it's a complicated system, then it often makes sense to describe the need for a mutex at a high level, by describing the invariant that the mutex protects. If it's a really simple problem (e.g., incrementing a counter), then it feels better to talk about the "race condition" that the mutex averts. But they're really just two different ways of thinking about the same thing.

Related

Python - Why doesn't multithreading increase the speed of my code?

I tried improving my code by running this with and without using two threads:
from threading import Lock
from threading import Thread
import time

start_time = time.clock()
arr_lock = Lock()
arr = range(5000)

def do_print():
    # Disable arr access to other threads; they will have to wait if they need to read
    a = 0
    while True:
        arr_lock.acquire()
        if len(arr) > 0:
            item = arr.pop(0)
            print item
            arr_lock.release()
            b = 0
            for a in range(30000):
                b = b + 1
        else:
            arr_lock.release()
            break

thread1 = Thread(target=do_print)
thread1.start()
thread1.join()
print time.clock() - start_time, "seconds"
When running 2 threads my code's run time increased. Does anyone know why this happened, or perhaps know a different way to increase the performance of my code?
The primary reason you aren't seeing any performance improvements with multiple threads is because your program only enables one thread to do anything useful at a time. The other thread is always blocked.
Two things:
Remove the print statement that's invoked inside the lock. print statements drastically impact performance and timing. Also, the I/O channel to stdout is essentially single-threaded, so you've built another implicit lock into your code. So let's just drop the print statement.
Use a proper sleep technique instead of "spin locking" and counting up from 0 to 30000. That's just going to burn a core needlessly.
Try this as your main loop
while True:
    arr_lock.acquire()
    if len(arr) > 0:
        item = arr.pop(0)
        arr_lock.release()
        time.sleep(0)
    else:
        arr_lock.release()
        break
This should run slightly better... I would even advocate getting the sleep statement out altogether so you can just let each thread have a full quantum.
However, because each thread is either doing "nothing" (sleeping or blocked on acquire) or just doing a single pop call on the array while in the lock, the majority of the time spent is going to be in the acquire/release calls instead of actually operating on the array. Hence, multiple threads aren't going to make your program run faster.
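As a side note, the same loop can be written with the lock as a context manager, which guarantees the release even if pop raises; a sketch of that variant:
while True:
    with arr_lock:
        if len(arr) > 0:
            item = arr.pop(0)
        else:
            break  # __exit__ releases the lock before the break takes effect
    time.sleep(0)  # yield the rest of this thread's timeslice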

Python 2.7 - atomically add dict entry only if it doesn't exist? [duplicate]

Is accessing/changing dictionary values thread-safe?
I have a global dictionary foo and multiple threads with ids id1, id2, ... , idn. Is it OK to access and change foo's values without allocating a lock for it if it's known that each thread will only work with its id-related value, say thread with id1 will only work with foo[id1]?
Assuming CPython: Yes and no. It is actually safe to fetch/store values from a shared dictionary in the sense that multiple concurrent read/write requests won't corrupt the dictionary. This is due to the global interpreter lock ("GIL") maintained by the implementation. That is:
Thread A running:
a = global_dict["foo"]
Thread B running:
global_dict["bar"] = "hello"
Thread C running:
global_dict["baz"] = "world"
won't corrupt the dictionary, even if all three access attempts happen at the "same" time. The interpreter will serialize them in some undefined way.
However, the results of the following sequence are undefined:
Thread A:
if "foo" not in global_dict:
global_dict["foo"] = 1
Thread B:
global_dict["foo"] = 2
as the test/set in thread A is not atomic (a "time-of-check/time-of-use" race condition). So it is generally best if you lock things:
from threading import RLock

lock = RLock()

def thread_A():
    with lock:
        if "foo" not in global_dict:
            global_dict["foo"] = 1

def thread_B():
    with lock:
        global_dict["foo"] = 2
The best, safest, portable way to have each thread work with independent data is:
import threading
tloc = threading.local()
Now each thread works with a totally independent tloc object even though it's a global name. The thread can get and set attributes on tloc, use tloc.__dict__ if it specifically needs a dictionary, etc.
Thread-local storage for a thread goes away at end of thread; to have threads record their final results, have them put their results, before they terminate, into a common instance of Queue.Queue (which is intrinsically thread-safe). Similarly, initial values for data a thread is to work on could be arguments passed when the thread is started, or be taken from a Queue.
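A minimal sketch of that pattern (mine; the worker names are hypothetical, and queue is the Python 3 spelling of the Queue module named in the answer):
import threading
import queue  # Queue.Queue on Python 2

results = queue.Queue()   # intrinsically thread-safe
tloc = threading.local()  # each thread sees its own attributes

def worker(n):
    tloc.total = 0  # private to this thread, no lock needed
    for i in range(n):
        tloc.total += i
    # hand the final result back before the thread-local storage dies
    results.put((threading.current_thread().name, tloc.total))

threads = [threading.Thread(target=worker, args=(100,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
while not results.empty():
    print(results.get())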
Other half-baked approaches, such as hoping that operations that look atomic are indeed atomic, may happen to work for specific cases in a given version and release of Python, but could easily get broken by upgrades or ports. There's no real reason to risk such issues when a proper, clean, safe architecture is so easy to arrange, portable, handy, and fast.
Since I needed something similar, I landed here. I've summed up the answers in this short snippet:
#!/usr/bin/env python3
import threading

class ThreadSafeDict(dict):
    def __init__(self, *p_arg, **n_arg):
        dict.__init__(self, *p_arg, **n_arg)
        self._lock = threading.Lock()

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, type, value, traceback):
        self._lock.release()

if __name__ == '__main__':
    u = ThreadSafeDict()
    with u as m:
        m[1] = 'foo'
    print(u)
As such, you can use the with construct to hold the lock while fiddling with your dict.
The GIL takes care of that, if you happen to be using CPython.
global interpreter lock
The lock used by Python threads to assure that only one thread executes in the CPython virtual machine at a time. This simplifies the CPython implementation by assuring that no two processes can access the same memory at the same time. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines. Efforts have been made in the past to create a “free-threaded” interpreter (one which locks shared data at a much finer granularity), but so far none have been successful because performance suffered in the common single-processor case.
See are-locks-unnecessary-in-multi-threaded-python-code-because-of-the-gil.
How it works:
>>> import dis
>>> demo = {}
>>> def set_dict():
...     demo['name'] = 'Jatin Kumar'
...
>>> dis.dis(set_dict)
  2           0 LOAD_CONST               1 ('Jatin Kumar')
              3 LOAD_GLOBAL              0 (demo)
              6 LOAD_CONST               2 ('name')
              9 STORE_SUBSCR
             10 LOAD_CONST               0 (None)
             13 RETURN_VALUE
Each of the above instructions is executed while the GIL is held, and the single STORE_SUBSCR instruction adds/updates the key/value pair in the dictionary. So you can see that the dictionary update is atomic and hence thread-safe.

An easily refreshable Queue for Python Threading

I would like to find a mechanism to easily report the progress of a Python thread. For example, if my thread had a counter, I would like to know the value of the counter once in a while, but, importantly, I only need to know the latest value, not every value that's ever gone by.
What I imagine to be the simplest solution is a single-value Queue, where every time I put a new value on it in the thread, it replaces the old value with the new one. Then when I do a get in the main program, it would only return the latest value.
Because I don't know how to do that, what I do instead is put every counter value in a queue, and when I get, I read all the values until there are no more, keeping only the last. But this seems far from ideal, in that I'm filling the queue with thousands of values that I don't care about.
Here's an example of what I do now:
from threading import Thread
from Queue import Queue, Empty
from time import sleep

N = 1000

def fast(q):
    count = 0
    while count < N:
        sleep(.02)
        count += 1
        q.put(count)

def slow(q):
    while 1:
        sleep(5)  # sleep for a long time
        # read last item in queue
        val = None
        while 1:  # read all elements of queue, only saving last
            try:
                val = q.get(block=False)
            except Empty:
                break
        print val  # the last element read from the queue
        if val == N:
            break

if __name__ == "__main__":
    q = Queue()
    fast_thread = Thread(target=fast, args=(q,))
    fast_thread.start()
    slow(q)
    fast_thread.join()
My question is, is there a better approach?
Just use a global variable and a threading.Lock to protect it during assignments:
import threading
from time import sleep

N = 1000
value = 0

def fast(lock):
    global value
    count = 0
    while count < N:
        sleep(.02)
        count += 1
        with lock:
            value = count

def slow():
    while 1:
        sleep(5)  # sleep for a long time
        print value  # read current value
        if value == N:
            break

if __name__ == "__main__":
    lock = threading.Lock()
    fast_thread = threading.Thread(target=fast, args=(lock,))
    fast_thread.start()
    slow()
    fast_thread.join()
yields (something like)
249
498
747
997
1000
As Don Question points out, if there is only one thread modifying value, then no lock is actually needed in the fast function. And as dano points out, if you want to ensure that the value printed in slow is the same value used in the if statement, then a lock is needed in the slow function.
For more on when locks are needed, see Thread Synchronization Mechanisms in Python.
Just use a deque with a maximum length of 1. It will just keep your latest value.
So, instead of:
q = Queue()
use:
from collections import deque
q = deque(maxlen=1)
To read from the deque, there's no get method, so you'll have to do something like:
val = None
try:
    val = q[0]
except IndexError:
    pass
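Putting the two pieces together, a sketch of the original fast/slow pair rewritten around a one-slot deque (my adaptation, in Python 3 syntax):
from collections import deque
from threading import Thread
from time import sleep

N = 1000
q = deque(maxlen=1)  # only ever holds the most recent value

def fast():
    count = 0
    while count < N:
        sleep(.02)
        count += 1
        q.append(count)  # silently evicts the previous value

def slow():
    while True:
        sleep(5)
        try:
            val = q[0]  # peek at the latest value
        except IndexError:
            continue  # nothing produced yet
        print(val)
        if val == N:
            break

fast_thread = Thread(target=fast)
fast_thread.start()
slow()
fast_thread.join()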
In your special case, you may be over-complicating the issue. If your variable is just some kind of progress indicator of a single thread, and only this thread actually changes the variable, then it's completely safe to use a shared object to communicate the progress, as long as all other threads only read.
I guess we've all read too many (rightful) warnings about race conditions and other pitfalls of shared state in concurrent programming, so we tend to overthink and add more precaution than is sometimes needed.
You could basically share a pre-constructed dict:
thread_progress = dict.fromkeys(list_of_threads, progress_start_value)
or manually:
thread_progress = {thread: progress_value, ...}
without further precaution as long as no thread changes the dict-keys.
This way you can track the progress of multiple threads over one dict. The only condition is not to change the dict's set of keys once the threading has started. This means the dict must contain all threads BEFORE the first child thread starts; otherwise you must use a Lock before writing to the dict. By "changing the dict" I mean all operations regarding the keys. You may change the associated values of a key, because that's in the next level of indirection.
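A sketch of that pattern (my own illustration): the key set is fixed before any thread starts, each worker writes only its own value, and the main thread only reads:
import threading
import time

names = ["worker-1", "worker-2"]
# all keys exist BEFORE any thread starts; the dict structure
# is never changed again, only the values are reassigned
thread_progress = dict.fromkeys(names, 0)

def work(name):
    for i in range(1, 101):
        time.sleep(0.01)
        thread_progress[name] = i  # only this thread writes this key

threads = [threading.Thread(target=work, args=(n,)) for n in names]
for t in threads:
    t.start()
while any(t.is_alive() for t in threads):
    time.sleep(0.2)
    print(thread_progress)  # the main thread only reads
for t in threads:
    t.join()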
Update:
The underlying problem is shared state, which is already a problem in linear programs, but a nightmare in concurrent ones.
For example: imagine a global (shared) variable sv and two functions G(ood) and B(ad) in a linear program. Both functions calculate a result depending on sv, but B unintentionally changes sv. Now you are wondering why on earth G doesn't do what it should, despite your finding no error in G, even after testing it in isolation, where it was perfectly fine.
Now imagine the same scenario in a concurrent program, with two threads A and B. Both threads increment the shared state/variable sv by one.
Without locking (current value of sv in parentheses):
sv = 0
A reads sv (0)
B reads sv (0)
A inc sv (0)
B inc sv (0)
A writes sv (1)
B writes sv (1)
sv == 1 # should be 2!
Finding the source of the problem is a pure nightmare, because it could also succeed sometimes! More often than not, A would actually finish before B even starts to read sv, so the problem seems to behave non-deterministically or erratically, which makes it even harder to find. In contrast to my linear example, both threads are "good", but they nevertheless don't behave as intended.
with locking:
sv = 0
l = lock (for access on sv)
A tries to acquire lock for sv -> success (0)
B tries to acquire lock for sv -> failure, blocked by A (0)
A reads sv (0)
B blocked (0)
A inc sv (0)
B blocked (0)
A writes sv (1)
B blocked (1)
A releases lock on sv (1)
B tries to acquire lock for sv -> success (1)
...
sv == 2
I hope my little example explained the underlying problem of accessing a shared state, and why making write operations (including the read operation) atomic through locking is necessary.
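The locked schedule in runnable form (a minimal sketch of mine):
import threading

sv = 0
lock = threading.Lock()  # guards every access to sv

def inc():
    global sv
    with lock:
        sv += 1  # read-increment-write now happens as one unit

threads = [threading.Thread(target=inc) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sv)  # always 2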
Regarding my advice of a pre-initialized dict: this is a mere precaution, for two reasons:
1. If you iterate over the threads in a for loop, the loop may raise an exception if a thread adds or removes an entry to/from the dict while the loop is still running, because it then becomes unclear what the next key should be.
2. Thread A reads the dict and gets interrupted by thread B, which adds an entry and finishes. Thread A resumes, but doesn't have the changes thread B made, and writes the pre-B state back together with its own changes. Thread B's changes are lost.
BTW, my proposed solution wouldn't work as written above, because of the immutability of the primitive types. This could be easily fixed by making them mutable, e.g. by encapsulating them in a list or a special Progress object, or even simpler: give the thread function access to the thread_progress dict.
Explanation by example:
t = Thread()
progress = 0 # progress points to the object `0`
dict[t] = progress # dict[t] points now to object `0`
progress = 1 # progress points to object `1`
dict[t] # dict[t] still points to object `0`
better:
t = Thread()
t.progress = 0
dict[thread_id] = t
t.progress = 1
dict[thread_id].progress == 1

Garbage-collect a lock once no threads are asking for it

I have a function that must never be called with the same value simultaneously from two threads. To enforce this, I have a defaultdict that spawns new threading.Locks for a given key. Thus, my code looks similar to this:
from collections import defaultdict
import threading

lock_dict = defaultdict(threading.Lock)

def f(x):
    with lock_dict[x]:
        print "Locked for value x"
The problem is that I cannot figure out how to safely delete a lock from the defaultdict once it's no longer needed. Without doing this, my program has a memory leak that becomes noticeable when f is called with many different values of x.
I cannot simply del lock_dict[x] at the end of f, because if another thread is waiting for the lock, that second thread would then lock a lock that's no longer associated with lock_dict[x], and so two threads could end up simultaneously calling f with the same value of x.
I'd use a different approach:
fcond = threading.Condition()
fargs = set()

def f(x):
    with fcond:
        while x in fargs:
            fcond.wait()
        fargs.add(x)  # this thread has exclusive rights to use `x`

    # do useful stuff with x
    # any other thread trying to call f(x) will
    # block in the .wait() above

    with fcond:
        fargs.remove(x)     # we're done with x
        fcond.notify_all()  # let blocked threads (if any) proceed
Conditions have a learning curve, but once it's climbed they make it much easier to write correct thread-safe, race-free code.
Thread safety of the original code
@JimMischel asked in a comment whether the original's use of defaultdict was subject to races. Good question!
The answer is, alas, "you'll have to stare at your specific Python's implementation".
Assuming the CPython implementation: if any of the code invoked by defaultdict to supply a default invokes Python code, or C code that releases the GIL (global interpreter lock), then two (or more) threads could "simultaneously" invoke with lock_dict[x] with the same x not already in the dict, and:
Thread 1 sees that x isn't in the dict, gets a lock, then loses its timeslice (before setting x in the dict).
Thread 2 sees that x isn't in the dict, and also gets a lock.
One of those threads' locks ends up in the dict, but both threads execute f(x).
Staring at the source for 3.4.0a4+ (the current development head), defaultdict and threading.Lock are both implemented by C code that doesn't release the GIL. I don't recall whether earlier versions did or didn't, at various times, implement all or parts of defaultdict or threading.Lock in Python.
My suggested alternative code is full of stuff implemented in Python (all threading.Condition methods), but is race-free by design - even if you're using an old version of Python with sets also implemented in Python (the set is only accessed under the protection of the condition variable's lock).
One lock per argument
Without conditions, this seems to be much harder. In the original approach, I believe you need to keep a count of threads wanting to use x, and you need a lock to protect those counts and to protect the dictionary. The best code I've come up with for that is so long-winded that it seems sanest to put it in a context manager. To use, create an argument locker per function that needs it:
farglocker = ArgLocker() # for function `f()`
and then the body of f() can be coded simply:
def f(x):
    with farglocker(x):
        # only one thread at a time can run with argument `x`
        ...
Of course the condition approach could also be wrapped in a context manager. Here's the code:
import threading

class ArgLocker:
    def __init__(self):
        self.xs = dict()  # maps x to (lock, count) pair
        self.lock = threading.Lock()

    def __call__(self, x):
        return AllMine(self.xs, self.lock, x)

class AllMine:
    def __init__(self, xs, lock, x):
        self.xs = xs
        self.lock = lock
        self.x = x

    def __enter__(self):
        x = self.x
        with self.lock:
            xlock = self.xs.get(x)
            if xlock is None:
                xlock = threading.Lock()
                xlock.acquire()
                count = 0
            else:
                xlock, count = xlock
            self.xs[x] = xlock, count + 1
        if count:  # x was already known - wait for it
            xlock.acquire()
        assert xlock.locked()

    def __exit__(self, *args):
        x = self.x
        with self.lock:
            xlock, count = self.xs[x]
            assert xlock.locked()
            assert count > 0
            count -= 1
            if count:
                self.xs[x] = xlock, count
            else:
                del self.xs[x]
            xlock.release()
So which way is better? Using conditions ;-) That way is "almost obviously correct", but the lock-per-argument (LPA) approach is a bit of a head-scratcher. The LPA approach does have the advantage that when a thread is done with x, the only threads allowed to proceed are those wanting to use the same x; using conditions, the .notify_all() wakes all threads blocked waiting on any argument. But unless there's very heavy contention among threads trying to use the same arguments, this isn't going to matter much: using conditions, the threads woken up that aren't waiting on x stay awake only long enough to see that x in fargs is true, and then immediately block (.wait()) again.

Python: accessing a function from multiple threads concurrently without a lock mechanism

When multiple threads access the same function, do we need to implement a lock mechanism explicitly?
I have a program using threads.
There are two threads, t1 and t2. t1 is for add1() and t2 is for subtract1(). Both threads concurrently access the same function myfunction(caller, num).
1. I have defined a simple lock mechanism in the given program using the variable functionLock. Is this reliable, or does it need to be modified?
import time, threading

functionLock = ''  # blank means lock is open

def myfunction(caller, num):
    global functionLock
    while functionLock != '':  # check and wait until the lock is open
        print "locked by " + str(functionLock)
        time.sleep(1)
    functionLock = caller  # apply lock
    total = 0
    if caller == 'add1':
        total += num
        print "1. addition finish with Total:" + str(total)
        time.sleep(2)
        total += num
        print "2. addition finish with Total:" + str(total)
        time.sleep(2)
        total += num
        print "3. addition finish with Total:" + str(total)
    else:
        time.sleep(1)
        total -= num
        print "\nSubtraction finish with Total:" + str(total)
    print '\n For ' + caller + '() Total: ' + str(total)
    functionLock = ''  # release the lock

def add1(arg1, arg2):
    print '\n START add'
    myfunction('add1', 10)
    print '\n END add'

def subtract1():
    print '\n START Sub'
    myfunction('sub1', 100)
    print '\n END Sub'

def main():
    t1 = threading.Thread(target=add1, args=('arg1', 'arg2'))
    t2 = threading.Thread(target=subtract1)
    t1.start()
    t2.start()

if __name__ == "__main__":
    main()
The output is as follows:
START add
START Sub
1. addition finish with Total:10
locked by add1
locked by add1
2. addition finish with Total:20
locked by add1
locked by add1
3. addition finish with Total:30
locked by add1
For add1() Total: 30
END add
Subtraction finish with Total:-100
For sub1() Total: -100
END Sub
2. Is it OK if we do not use locks?
Even if I do not use the lock mechanism defined in the above program, the result is the same from both threads t1 and t2. Does this mean that Python implements locks automatically when multiple threads access the same function?
The output of the same program without using the lock functionLock:
START add
START Sub
1. addition finish with Total:10
Subtraction finish with Total:-100
For sub1() Total: -100
END Sub
2. addition finish with Total:20
3. addition finish with Total:30
For add1() Total: 30
END add
Thanks!
In addition to the other comments on this thread about busy-waiting on a variable, I would like to point out that the fact that you are not using any kind of atomic swap may cause concurrency bugs. Even though your test execution does not surface them, given enough repetitions with different timings, the following sequence of events may come up:
Thread #1 executes while functionLock != '' and gets False. Then Thread #1 is interrupted (preempted so something else can execute), and Thread #2 executes the same line, while functionLock != '', also getting False. At this point, both threads have entered the critical section, which is clearly not what you wanted. In particular, on any line where the threads modify total, the result may not be what you expected, since both threads can be in that section at the same time. See the following example:
total is 10. For the sake of simplicity, assume num is always 1. Thread #1 executes total += num, which is composed of three operations: (i) loading the value of total, (ii) adding num to it, and (iii) storing the result in total. If, after (i), Thread #1 gets preempted and Thread #2 then executes total -= num, total is set to 9. Then Thread #1 resumes. However, it had already loaded total = 10, so it adds 1 and stores 11 into the total variable. This effectively turned the decrement operation by Thread #2 into a no-op.
Notice that in the Wikipedia article linked by @ron-klein, the code uses an xchg operation, which atomically swaps a register with a variable. This is vital for the correctness of the lock. In conclusion, if you want to steer clear of incredibly hard-to-debug concurrency bugs, never implement your own locks out of non-atomic operations.
[edit] I just noticed that total is in fact a local variable in your code, so this could never happen here. However, I believe you are not aware that this is why your code works perfectly, since you state "Does this mean that python implements locks automatically when multiple threads access the same function?", which is not true. Please try adding global total to the beginning of myfunction and executing the threads several times, and you should see errors in the output. [/edit]
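For completeness, a sketch (mine, not from the answer) of what that experiment looks like once the state really is shared; without a proper lock the final value is usually wrong:
import threading

total = 0

def add(n, reps):
    global total
    for _ in range(reps):
        total += n  # read-modify-write on shared state, not atomic

def sub(n, reps):
    global total
    for _ in range(reps):
        total -= n

t1 = threading.Thread(target=add, args=(1, 1000000))
t2 = threading.Thread(target=sub, args=(1, 1000000))
t1.start(); t2.start()
t1.join(); t2.join()
print(total)  # should be 0, but typically isn't (may take a few runs)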
Although I don't know much Python, I would say this is like in any other language: as long as there are no variables involved that were declared outside of the function and can therefore be shared between threads, there shouldn't be a need for locks. And your function doesn't seem to involve any such variables (total is local).
Output to the console might be garbled, though.
You need to lock when the code you are writing is critical-section code, i.e. when it modifies state shared between threads. If it doesn't, you don't need to worry about locking.
Whether methods should be locked or not is a design choice; ideally you should lock as close to the shared-state access as possible.
In your code you implement your own spin lock. While this is possible, I don't think it's recommended in Python, since it may lead to performance issues.
I used a well-known search engine (starts with a G), querying for "python lock". One of the first results is this one: Thread Synchronization Mechanisms in Python. It looks like a good article to start with.
For the code itself: you should lock whenever the operation(s) executed on a shared resource are not atomic. It currently looks like there's no such resource in your code.
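As a rough rule of thumb in code form (my summary; single-bytecode operations are atomic only as a CPython implementation detail, so a lock is the safe default):
import threading

lock = threading.Lock()
shared = {"hits": 0}

# a plain store compiles to a single STORE_SUBSCR bytecode, so under
# the CPython GIL it is effectively atomic:
shared["name"] = "x"

# a read-modify-write is several bytecodes, so guard it:
with lock:
    shared["hits"] += 1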
