Thread-safe Python function attribute?

I have seen two different answers on the thread safety of Python function attributes. Assuming a single process with possibly multiple threads, and given that functions are global, is there a definite problem with using a function attribute as static storage? No answers based on programming style preferences, please.

There's nothing really wrong with what you describe. A "static" global function object to hold your variables:
from threading import Thread

def static():
    pass

static.x = 0

def thread():
    static.x += 1
    print(static.x)

for nothing in range(10):
    Thread(target=thread).start()
Output:
1
2
3
4
5
6
7
8
9
10
That "works" because each thread executes and finishes quickly, and within the same amount of time. But let's say you have threads running for arbitrary lengths of time:
from threading import Thread
from time import sleep
from random import random

def static():
    pass

static.x = 0

def thread():
    static.x += 1
    x = static.x
    sleep(random())
    print(x)

for nothing in range(10):
    Thread(target=thread).start()
Output:
2
3
8
1
5
7
10
4
6
9
The behavior becomes nondeterministic: the increments and reads interleave arbitrarily, so the printed order is no longer sequential.
Amendment:
As tdelaney pointed out,
"[The] first example only works by luck and would fail randomly on
repeated runs..."
The first example is meant to be an illustration of how multithreading can appear to be functioning properly. It is by no means thread-safe code.
Amendment II: You may want to take a look at this question.
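To make the attribute-based counter actually thread-safe, here is a minimal sketch: the lock is stored as a function attribute too, which is just one illustrative choice (a module-level lock works equally well).

```python
from threading import Thread, Lock

def static():
    pass

static.x = 0
static.lock = Lock()  # the lock can live on the function object as well

def worker():
    with static.lock:  # serializes the read-modify-write on static.x
        static.x += 1

threads = [Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(static.x)  # 10
```

With the lock, the final count is 10 no matter how the threads interleave or how long each one sleeps.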

Related

Python Multiprocessing or Thread

My code
import time
from multiprocessing.pool import ThreadPool
from concurrent.futures import ThreadPoolExecutor

def print_function(tests):
    while True:
        print(tests)
        time.sleep(2)

executor = ThreadPoolExecutor(max_workers=2)
for i in range(5):
    a = executor.submit(print_function(i))
Output:
0 0 0 0 0 0 0 0...
but I want the output to be 012345, 012345, 012345...
How can I do this?
In the line
a = executor.submit(print_function(i))
                    ^^^^^^^^^^^^^^^^^
you are calling the function already. Since it has a while True, it will never finish and thus submit() will never be reached.
The solution is to pass the function as a reference and the argument separately:
a = executor.submit(print_function, i)
However, notice that you will not get the output you would like (012345), since a) the range stops at 4, b) you kick off only 2 workers, and c) the operating system chooses which thread to run, so the order will seem random (more like 310254).
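For comparison, a hedged sketch of the corrected pattern: the callable and its argument are passed to submit separately, and the body here is a finite stand-in (doubling, purely illustrative) so the futures can actually complete and return results.

```python
from concurrent.futures import ThreadPoolExecutor

def print_function(tests):
    # finite work instead of `while True`, so each future can complete
    return tests * 2

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(print_function, i) for i in range(5)]
    results = [f.result() for f in futures]

print(results)  # [0, 2, 4, 6, 8]
```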

How do I increase variable value when multithreading in python

I am trying to make a web scraper with multithreading to make it faster. I want the value to increase on every execution, but sometimes the value skips or repeats itself.
import threading

num = 0

def scan():
    global num
    while True:
        num += 1
        print(num)
        open('logs.txt', 'a').write(f'{num}\n')

for x in range(500):
    threading.Thread(target=scan).start()
Result:
2
2
5
5
7
8
10
10
12
13
13
13
16
17
19
19
22
23
24
25
26
28
29
29
31
32
33
34
Expected result:
1
2
3
4
5
6
7
8
9
10
Since the variable num is a shared resource, you need to put a lock on it. This is done as follows:
num_lock = threading.Lock()
Every time you want to update the shared variable, the thread must first acquire the lock. While one thread holds the lock, no other thread can update the value of num.
Use a with statement or a try-finally block here, to guarantee that the lock is released even if the current thread fails while updating the shared variable.
Something like this:
num_lock.acquire()
try:
    num += 1
finally:
    num_lock.release()
Or, equivalently, using with:
with num_lock:
    num += 1
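Putting it together, a minimal sketch of the counter fixed with a lock; the bounded iteration counts here are illustrative, so the program terminates and the total can be checked.

```python
import threading

num = 0
num_lock = threading.Lock()

def scan(iterations):
    global num
    for _ in range(iterations):
        with num_lock:  # acquired and released automatically
            num += 1

threads = [threading.Thread(target=scan, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(num)  # 8000: no update is lost
```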
Seems like a race condition. You could use a lock so that only one thread can get a particular number. It would make sense also to use lock for writing to the output file.
Here is an example with locks. The order in which the output is written is of course not guaranteed, but every item should be written exactly once. In this example I added a limit of 10000 so that you can more easily check that everything is eventually written; otherwise, at whatever point you interrupt it, it is harder to tell whether a number was skipped or a thread was just waiting for a lock to write its output.
my_num is not shared, so after you have claimed it inside the with num_lock block, you are free to release that lock (which protects the shared num) and continue using my_num outside the with, while other threads acquire the lock to claim their own values. This minimises the time the lock is held.
import threading

num = 0
num_lock = threading.Lock()
file_lock = threading.Lock()

def scan():
    global num
    while num < 10000:
        with num_lock:
            num += 1
            my_num = num
        # do whatever you want here using my_num
        # but do not touch num
        with file_lock:
            open('logs.txt', 'a').write(f'{my_num}\n')

threads = [threading.Thread(target=scan) for _ in range(500)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
An important callout in addition to threading.Lock:
Use join to make the parent thread wait for the forked threads to complete.
Without it, the parent thread may read the shared value before all worker threads have finished.
Suppose I'm using the num after threads complete:
import threading

lock, num = threading.Lock(), 0

def operation():
    global num
    print("Operation has started")
    with lock:
        num += 1

threads = [threading.Thread(target=operation) for x in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(num)
Without join, the result is inconsistent (sometimes 9 is printed, otherwise 10):
Operation has started
Operation has started
Operation has started
Operation has started
Operation has startedOperation has started
Operation has started
Operation has started
Operation has started
Operation has started9
With join, it's consistent:
Operation has started
Operation has started
Operation has started
Operation has started
Operation has started
Operation has started
Operation has started
Operation has started
Operation has started
Operation has started
10

Does the new implementation of the GIL in Python handle the race condition issue?

I've read an article about multithreading in Python in which they try to use synchronization to solve the race condition issue, and I ran the example code below to reproduce it:
import threading

# global variable x
x = 0

def increment():
    """
    function to increment global variable x
    """
    global x
    x += 1

def thread_task():
    """
    task for thread
    calls increment function 100000 times.
    """
    for _ in range(100000):
        increment()

def main_task():
    global x
    # setting global variable x as 0
    x = 0
    # creating threads
    t1 = threading.Thread(target=thread_task)
    t2 = threading.Thread(target=thread_task)
    # start threads
    t1.start()
    t2.start()
    # wait until threads finish their job
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(10):
        main_task()
        print("Iteration {0}: x = {1}".format(i, x))
It returns the same result as the article when I use Python 2.7.15, but not when I use Python 3.6.9 (every iteration returns the same result, x = 200000).
I wonder: did the new implementation of the GIL (since Python 3.2) handle the race condition issue? If it did, why do Lock and mutex primitives still exist in Python ≥ 3.2? If it didn't, why is there no conflict when running multiple threads that modify a shared resource, as in the example above?
These questions have been on my mind for days as I try to understand more about how Python really works under the hood.
The change you are referring to replaced the check interval with a switch interval. This means that rather than switching threads every 100 bytecodes, the interpreter now does so every 5 milliseconds.
Ref: https://pymotw.com/3/sys/threads.html https://mail.python.org/pipermail/python-dev/2009-October/093321.html
So if your code runs fast enough, it never experiences a thread switch, and the operations may appear atomic to you when they are in fact not. The race condition did not appear because there was no actual interleaving of threads. x += 1 is actually four bytecodes:
>>> dis.dis(sync.increment)
 11           0 LOAD_GLOBAL              0 (x)
              3 LOAD_CONST               1 (1)
              6 INPLACE_ADD
              7 STORE_GLOBAL             0 (x)
             10 LOAD_CONST               2 (None)
             13 RETURN_VALUE
A thread switch in the interpreter can occur between any two bytecodes.
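The same bytecode can be inspected programmatically in Python 3; the exact opcode names vary between CPython versions (e.g. INPLACE_ADD later became BINARY_OP), but the separate global load and store are always there, and that gap is where a switch can lose an update. A minimal sketch:

```python
import dis

x = 0

def increment():
    global x
    x += 1

# get_instructions yields the opcodes; a thread switch between the
# LOAD_GLOBAL and the STORE_GLOBAL makes another thread's update vanish
ops = [ins.opname for ins in dis.get_instructions(increment)]
print(ops)
```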
Consider that in 2.7 the code below always prints 200000, because the check interval is set so high that each thread completes in its entirety before the next runs. The same effect can be constructed with the switch interval.
import sys
import threading

print(sys.getcheckinterval())
sys.setcheckinterval(1000000)

# global variable x
x = 0

def increment():
    """
    function to increment global variable x
    """
    global x
    x += 1

def thread_task():
    """
    task for thread
    calls increment function 100000 times.
    """
    for _ in range(100000):
        increment()

def main_task():
    global x
    # setting global variable x as 0
    x = 0
    # creating threads
    t1 = threading.Thread(target=thread_task)
    t2 = threading.Thread(target=thread_task)
    # start threads
    t1.start()
    t2.start()
    # wait until threads finish their job
    t1.join()
    t2.join()

if __name__ == "__main__":
    for i in range(10):
        main_task()
        print("Iteration {0}: x = {1}".format(i, x))
The GIL protects individual byte code instructions. In contrast, a race condition is an incorrect ordering of instructions, which means multiple byte code instructions. As a result, the GIL cannot protect against race conditions outside of the Python VM itself.
However, by their very nature race conditions do not always trigger. Certain GIL strategies are more or less likely to trigger certain race conditions. A thread shorter than the GIL window is never interrupted, and one longer than the GIL window is always interrupted.
Your increment function is 6 bytecode instructions, as is the inner loop calling it. Of these, 4 instructions must complete without interruption, which leaves 3 possible switch points that can corrupt the result. Your entire thread_task function takes about 0.015s to 0.020s (on my system).
With the old GIL switching every 100 instructions, the loop was guaranteed to be interrupted every ~8.3 calls, i.e. roughly 12,000 times over the 100,000 calls. With the new GIL switching every 5 ms, the loop is interrupted only about 3 times.
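Conversely, in Python 3 you can shrink the switch interval to make the interleaving reappear. A hedged sketch (whether the final count actually drops below 200000 on a given run depends on timing, so no exact output is claimed):

```python
import sys
import threading

sys.setswitchinterval(1e-6)  # request very frequent thread switches

x = 0

def task():
    global x
    for _ in range(100000):
        x += 1  # still a separate load and store: a switch in between loses updates

t1 = threading.Thread(target=task)
t2 = threading.Thread(target=task)
t1.start()
t2.start()
t1.join()
t2.join()

print(x)  # at most 200000; may be lower when increments are lost to the race
```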

Trying to get two while loops to run concurrently using threading or multiprocessing

According to examples online, these two methods I've tried should be the solution to my problem (see code). These two while loops run one after the other even though they are in separate threads.
I've tried threading and multiprocessing.
from multiprocessing import Process

numberit = 0
numberg = 0

def countingit(numberit):
    while numberit < 10:
        numberit += 1
        print("counter ", numberit)
        # time.sleep(1)

def garbage(numberg):
    while numberg < 10:
        numberg += 1
        print("garbage ", numberg)
        # time.sleep(1)

# threading.Thread(target=countingit(numberit)).start()
# threading.Thread(target=garbage(numberg)).start()

if __name__ == '__main__':
    Process(target=countingit(numberit)).start()
    Process(target=garbage(numberg)).start()
    # threading.Thread(target=countingit(numberit)).start()
    # threading.Thread(target=garbage(numberg)).start()
I'm trying to get it to print:
counter 1
garbage 1
counter 2
garbage 2
... and so on.
The plan is to run the while-loop threads concurrently with a tkinter GUI with push buttons, but I can't get them to run at the same time. One process always has to complete before the other starts.
Thank you.
I've already tried what is shown in the example code I have provided.
Instead of the two while loops interleaving, they run one after the other, which is not the desired outcome. I'm trying this as a test before adding a tkinter GUI in another thread.
This is the result:
counter 1
...
counter 10
garbage 1
...
garbage 10
But would like:
counter 1
garbage 1
...
counter 10
garbage 10
I see a problem in these two lines:
threading.Thread(target=countingit(numberit)).start()
threading.Thread(target=garbage(numberg)).start()
This is a common antipattern -- instead of making a thread that calls countingit with the argument numberit, this code calls countingit right away in the main thread, and then passes the return value to the Thread initializer.
To pass arguments to a function being called by a thread, use the args parameter. Make sure to pass it as a tuple, even if there's only one argument.
threading.Thread(target=countingit, args=(numberit,)).start()
threading.Thread(target=garbage, args=(numberg,)).start()
When I run this on my machine, I get output that is interleaved as desired:
counter 1
counter 2
garbage 1
counter 3
garbage 2
counter 4
counter 5
garbage 3
counter 6
counter 7
garbage 4
counter 8
garbage 5
counter 9
garbage 6
counter 10
garbage 7
garbage 8
garbage 9
garbage 10
(all of this advice applies to your Process-based attempt, as well)
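The difference between the antipattern and the args form can be observed directly. In this illustrative sketch, the eagerly-called function runs in the main thread before the Thread object even exists, while the correctly-submitted one runs in a worker:

```python
import threading

record = []

def work(n):
    record.append(threading.current_thread() is threading.main_thread())

# Antipattern: work(1) executes right here, in the main thread,
# and its return value (None) becomes the Thread's target.
t = threading.Thread(target=work(1))
t.start()   # this thread does nothing, since its target is None
t.join()

# Correct: pass the callable and its arguments separately.
t2 = threading.Thread(target=work, args=(2,))
t2.start()  # work(2) now runs in the worker thread
t2.join()

print(record)  # [True, False]
```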

change value when python is in infinite loop

Can this be done in python?
Start by printing in an infinite loop, then change the value between iterations.
x = 5
while True:
    print(x)
x = 3  # regain control of the interpreter and set to a new value
expected output:
5
5
5
......
3
3
3
No, the code as you have written it will not work. The statement after the non-terminating loop will never get executed.
Try the following:
x = 5
while True:
    if some_condition:  # whatever condition you check each iteration
        x = 3
    print(x)
Alternatively, use threading, and change the value of x in a second thread:
def changeX():
    global x
    x = 3

x = 5
import threading
threading.Timer(3, changeX).start()  # executes changeX after 3 seconds in a second thread
while True:
    print(x)
It's unclear what you need to do this for, but you can catch the "ctrl-c" event and enter a new value:
x = 5
while True:
    try:
        print(x)
    except KeyboardInterrupt:
        x = input("Enter new value: ").strip()
I think that the best answer to this question is to use threading, but there is a way to inject code into a running interpreter thread:
https://fedorahosted.org/pyrasite/
Not exactly. Why do you want to do this? What's the underlying issue?
The "right" way to do it is probably to change the code within the while loop to occasionally actually check for your condition and then end the loop if it's time to end it (e.g., have a thread continue watching for console input)
With that said, technically you could attach to your running program with a debugger (such as winpdb or the built-in pdb) and mess with it.
But what you probably want to do, if I'm guessing right about your underlying motives, is continue to accept input despite doing some other processing simultaneously.
In that case, you want to learn how to use threads in Python. Check the threading module.
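One way to sketch that idea: a watcher thread delivers new values through a queue while the main loop keeps running. The sleep here stands in for blocking on real console input, and all the names are illustrative.

```python
import threading
import queue
import time

commands = queue.Queue()

def watcher():
    # stand-in for a thread blocking on input(); here it just waits briefly
    time.sleep(0.1)
    commands.put(3)

x = 5
threading.Thread(target=watcher).start()

# the main loop keeps working, picking up a new value when one arrives
deadline = time.time() + 2
while time.time() < deadline:
    try:
        x = commands.get(timeout=0.2)
        break
    except queue.Empty:
        pass

print(x)  # 3 once the watcher has delivered the new value
```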
