I must be missing something here but this simple example of two threads trying to modify a global variable in a function is not giving the expected result:
from threading import Thread, Lock
some_var = 0
def some_func(id):
    lo = Lock()
    with lo:
        global some_var
        print("{} here!".format(id))
        for i in range(1000000):
            some_var += 1
        print("{} leaving!".format(id))
t1 = Thread(target=some_func, args=(1,))
t2 = Thread(target=some_func, args=(2,))
t1.start()
t2.start()
t1.join()
t2.join()
print(some_var)
outputs:
1 here!
2 here!
2 leaving!
1 leaving!
1352010
As you can see, both threads enter the part that should be locked simultaneously, and the incrementing of the global variable 'some_var' gets mixed up because of that.
It looks like the Lock is just not working for some reason.
For a range up to 10000 it works, but this is probably just because the GIL is not released during such short calculations.
What is going on?
I'm using Python 3.3.2, 64-bit.
Calling Lock() creates an entirely new lock. Because your function calls it every time it runs, each thread creates and acquires its own private lock that no other thread ever sees. That's why it doesn't work: each thread is locking an entirely different lock.
Lock objects are one of the few things that you can declare as globals without any problems, because you absolutely want every thread to see the same Lock(). You should try this instead:
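from threading import Thread, Lock
some_var = 0
lo = Lock()  # created once at module level, so every thread sees the same lock
def some_func(id):
    global lo
    with lo:
        global some_var
        print("{} here!".format(id))
        for i in range(1000000):
            some_var += 1
        print("{} leaving!".format(id))
# Create, start and join t1 and t2 exactly as in the question.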
Every time your function is called, a new lock is created, so each thread ends up with a different lock. The Lock object should be created globally, because every thread needs to be able to see whether the same lock is already held by another thread. Try moving the creation of your lock object to global scope!
Alternatively, you can create the lock in your main code and pass it to the called function as an argument:
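lock = Lock()  # a single lock, shared by both threads
# some_func must now accept it as a second parameter: def some_func(id, lock): ...
t1 = Thread(target=some_func, args=(1, lock))
t2 = Thread(target=some_func, args=(2, lock))
t1.start()
t2.start()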
This way there is only one lock. It is better to avoid global variables whenever possible.
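For reference, here is a sketch of a complete, runnable version of the question's program with the lock created once and passed in as an argument. The extra parameter `n` and the `main()` wrapper are additions for illustration, not part of the original code.

```python
from threading import Thread, Lock

some_var = 0

def some_func(id, lock, n):
    global some_var
    with lock:  # every thread acquires the SAME Lock object, so the loops cannot overlap
        print("{} here!".format(id))
        for _ in range(n):
            some_var += 1
        print("{} leaving!".format(id))

def main(n=1000000):
    lock = Lock()  # created exactly once and handed to every thread
    threads = [Thread(target=some_func, args=(i, lock, n)) for i in (1, 2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return some_var

if __name__ == "__main__":
    print(main())  # 2000000: one thread finishes its whole loop before the other starts
```

With a shared lock, the "here!"/"leaving!" lines of one thread always appear as a pair before the other thread's pair.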
I got this as an interview problem a few days ago. I don't really know parallel programming, and the obvious solution I've tried isn't working.
The question is: write two functions, one printing "foo" and one printing "bar", that will be run on separate threads. How do you ensure the output is always:
foo
bar
foo
bar
...
Here's what I've tried:
from threading import Lock, Thread
class ThreadPrinting:
    def __init__(self):
        self.lock = Lock()
        self.count = 10
    def foo(self):
        for _ in range(self.count):
            with self.lock:
                print("foo")
    def bar(self):
        for _ in range(self.count):
            with self.lock:
                print("bar")
if __name__ == "__main__":
    tp = ThreadPrinting()
    t1 = Thread(target=tp.foo)
    t2 = Thread(target=tp.bar)
    t1.start()
    t2.start()
But this just produces 10 "foo"s and then 10 "bar"s. Seemingly the same thread manages to loop around and re-acquire the lock before the other. What might be the solution here? Thank you.
this just produces 10 "foo"s and then 10 "bar"s. Seemingly the same thread manages to loop around and re-acquire the lock before the other.
No surprise there. The problem with using a threading.Lock object (a.k.a., a "mutex") in this way is that, like the (default) mutexes in most programming systems, it makes no attempt to be fair.
The very next thing that either of your two threads does after it releases the lock is try to acquire the lock again. Meanwhile, the other thread is sleeping (a.k.a. "blocked"), waiting for its turn to acquire the lock.
The goal of most operating systems, when there is heavy demand for CPU time, is to maximize the amount of useful work that the CPU(s) can do. The best way to do that is to award the lock to the thread that already is running on some CPU instead of wasting time waking up some other thread that is sleeping.
That strategy works well in programs that use locks the way locks were meant to be used—that is to say, programs where the threads spend most of their time unlocked, and only briefly grab a lock, every so often, in order to examine or update some (group of) shared variables.
In order to make your threads take turns printing their messages, you are going to need to find some way to let the threads explicitly say to each other, "It's your turn now."
See my comments on your question for a hint about how you might do that.
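One way to let the threads say "It's your turn now" is a single threading.Condition guarding a shared variable that records whose turn it is. The following is a minimal sketch under that assumption; the class name TurnPrinter and its turn/output attributes are hypothetical, not from the question.

```python
import threading

class TurnPrinter:
    """Sketch: two threads alternate by waiting on one Condition until a
    shared 'turn' variable names them."""

    def __init__(self, count=10):
        self.count = count
        self.turn = "foo"                  # whose turn it is; foo goes first
        self.cond = threading.Condition()  # protects self.turn and signals changes to it
        self.output = []                   # what was printed, in order

    def _take_turns(self, word, other):
        for _ in range(self.count):
            with self.cond:
                # wait_for checks the predicate before sleeping and after every wake-up,
                # so a turn handed over before this thread started waiting is not missed.
                self.cond.wait_for(lambda: self.turn == word)
                print(word)
                self.output.append(word)
                self.turn = other          # "It's your turn now."
                self.cond.notify_all()

    def foo(self):
        self._take_turns("foo", "bar")

    def bar(self):
        self._take_turns("bar", "foo")

if __name__ == "__main__":
    tp = TurnPrinter()
    threads = [threading.Thread(target=tp.foo), threading.Thread(target=tp.bar)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because each thread checks the shared variable instead of relying on which thread the OS happens to give the lock to, the alternation holds regardless of scheduling or which thread starts first.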
@Solomon Slow provided a great explanation and pointed me in the right direction. I initially wanted some kind of "lock with a value" that can be acquired only conditionally. But this doesn't really exist, and busy-waiting in an "acquire lock, check variable, loop around" cycle is not great. Instead, I solved this with a pair of threading.Condition objects that the threads use to talk to each other. I'm sure there's a simpler solution, but here's mine:
from threading import Thread, Condition
class ThreadPrinting:
    def __init__(self):
        self.fooCondition = Condition()
        self.barCondition = Condition()
        self.count = 10
    def foo(self):
        for _ in range(self.count):
            with self.fooCondition:
                self.fooCondition.wait()
                print("foo")
            with self.barCondition:
                self.barCondition.notify()
    def bar(self):
        # Caveat: notify() only wakes a thread that is already waiting. If a notify
        # runs before the other thread reaches wait(), it is lost and both threads can hang.
        with self.fooCondition:
            self.fooCondition.notify()  # Bootstrap the cycle
        for _ in range(self.count):
            with self.barCondition:
                self.barCondition.wait()
                print("bar")
            with self.fooCondition:
                self.fooCondition.notify()
if __name__ == "__main__":
    tp = ThreadPrinting()
    t1 = Thread(target=tp.foo)
    t2 = Thread(target=tp.bar)
    t1.start()
    t2.start()
The way I did it was just to have the first thread send 'foo' then 1 second of sleep before the second sends 'bar'. Both functions sleep for 2 seconds between sends. This allows for them to always alternate, sending one word per second.
from threading import Thread
import time
def foo():
    num = 0
    while num < 10:
        print("foo")
        num = num + 1
        time.sleep(2)
def bar():
    num = 0
    while num < 10:
        print("bar")
        num = num + 1
        time.sleep(2)
t1 = Thread(target=foo)
t2 = Thread(target=bar)
t1.start()
time.sleep(1)
t2.start()
I tried this for 100 of each 'foo' and 'bar' and it still alternated.
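Sleeping makes alternation very likely but does not enforce it: a heavily loaded machine could delay one thread by more than the one-second offset. A hedged alternative that enforces the order without any timing uses two threading.Semaphore objects as turn tokens. The function name run_ping_pong and the output list are illustrative assumptions.

```python
import threading

def run_ping_pong(count=10):
    output = []
    foo_turn = threading.Semaphore(1)  # one permit: foo may print first
    bar_turn = threading.Semaphore(0)  # no permits: bar waits until foo hands over

    def foo():
        for _ in range(count):
            foo_turn.acquire()  # block until it is foo's turn
            print("foo")
            output.append("foo")
            bar_turn.release()  # hand the turn to bar

    def bar():
        for _ in range(count):
            bar_turn.acquire()  # block until it is bar's turn
            print("bar")
            output.append("bar")
            foo_turn.release()  # hand the turn back to foo

    threads = [threading.Thread(target=bar), threading.Thread(target=foo)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output

if __name__ == "__main__":
    run_ping_pong()
```

Unlike Condition.notify(), a Semaphore.release() is remembered as a permit, so a hand-over that happens before the other thread starts waiting is not lost.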
I have a class that starts multiple threads upon initialization. Originally I was using threading, but I learned the hard way how painfully slow it can get. As I researched this, it seems that multiprocessing would be faster because it actually utilizes multiple cores. The only hard part is the fact that it doesn't automatically share values. How could I make the following code share self across all processes?
Ideally, it would also share across processes outside of the class as well.
Also, I would rather share the entire class than share each individual value, if possible.
import multiprocessing as mp
from time import sleep
class ThreadedClass:
    def __init__(self):
        self.var = 0
        # Here is where I would want to tell multiprocessing to share 'self'
        change_var = mp.Process(target=self.change_var, args=())
        print_var = mp.Process(target=self.print_var, args=())
        change_var.start()
        sleep(0.5)
        print_var.start()
    def change_var(self):
        while True:
            self.var += 1
            print("Changed var to ", self.var)
            sleep(1)
    def print_var(self):
        while True:
            print("Printing var: ", self.var)
            sleep(1)
ThreadedClass()
I also included output of the above code below:
Changed var to 1
Printing var: 0
Changed var to 2
Printing var: 0
Changed var to 3
Printing var: 0
Changed var to 4
Printing var: 0
Changed var to 5
Printing var: 0
Changed var to 6
Printing var: 0
Changed var to 7
Printing var: 0
Changed var to 8
Printing var: 0
Changed var to 9
Printing var: 0
Changed var to 10
Thanks in advance.
First of all, multiprocessing means that you are creating sub-processes. In general, they have their own memory space and don't talk to each other. To be clear, when you start a new multiprocessing process, Python gives it its own copy of your global variables and then runs that process separately from everything else. So when you spawned your two processes, change_var and print_var, each of them received its own copy of self, and since there are separate copies of self, neither process sees the other's changes. One process updates its own copy of self and prints values that keep counting; the other never updates its copy. You can easily test this yourself:
import multiprocessing as mp
LIST = []  # This list is in the parent process.
def update(item):
    LIST.append(item)
if __name__ == "__main__":  # required on platforms that start children by re-importing this module
    p = mp.Process(target=update, args=(5,))  # The child gets its own copy of LIST, update, and anything else that is global.
    p.start()
    p.join()
    # The LIST in the sub-process is cleaned up in memory when the process ends.
    print(LIST)  # The LIST in the parent process is not updated: prints []
It would be very dangerous if different processes were updating each other's variables while they were working with them; hence, to isolate them (and prevent "segmentation faults"), the entire namespace is copied. If you want sub-processes to talk to each other, you need to use a mechanism designed for that, such as a Manager or a Queue.
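As one concrete example of something designed for sharing, multiprocessing.Value places a single number in shared memory that both processes can see. This sketch shares only the counter, not the whole object; the function names, the step count and the 0.1 s delay are assumptions made for illustration.

```python
import multiprocessing as mp
import time

def change_var(var, steps):
    for _ in range(steps):
        with var.get_lock():  # the Value's built-in lock guards the read-modify-write
            var.value += 1
        print("Changed var to", var.value)
        time.sleep(0.1)

def print_var(var, steps):
    for _ in range(steps):
        print("Printing var:", var.value)  # reads the same shared memory the other process writes
        time.sleep(0.1)

def run(steps=5):
    var = mp.Value("i", 0)  # "i" = a C int stored in shared memory, initialised to 0
    procs = [mp.Process(target=change_var, args=(var, steps)),
             mp.Process(target=print_var, args=(var, steps))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return var.value  # the parent sees the child's increments

if __name__ == "__main__":
    print("Final value:", run())
```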
Personally, I recommend writing your code around things like a Pool() instead. Very clean: input an array, get back an array, done. But if you want to go down the rabbit hole, here is what I read in the multiprocessing documentation:
import multiprocessing as mp
def f(queue):
    queue.put(['stuff', 15])
def g(queue):
    queue.put(['other thing'])
if __name__ == "__main__":  # needed where child processes re-import this module
    queue = mp.Queue()
    p = mp.Process(target=f, args=(queue,))
    q = mp.Process(target=g, args=(queue,))
    p.start()
    q.start()
    for _ in range(2):
        print(queue.get())
    p.join()
    q.join()
The main idea is that the queue does not get copied; instead, processes can leave things in it for each other. When you run queue.get(), it blocks and waits until some other process has put something in the queue. This means one process can read what another process left there, like:
import multiprocessing as mp
def f(queue):
    obj = queue.get()  # Blocks this sub-process until something shows up.
    if obj:
        print('Something was in the queue from some other process.')
        print(obj)
def g(queue):
    queue.put(['leaving information here in queue'])
queue = mp.Queue()
p = mp.Process(target=f, args=(queue,))
q = mp.Process(target=g, args=(queue,))
p.start()
This is kind of cool, so I recommend pausing here for a second to think about what is waiting to be processed. Next, start the q process:
q.start()
Notice that p didn't finish processing until q was started. This is because queue.get() blocked and waited for something to show up.
# clean up
p.join()
q.join()
You can read more at: https://docs.python.org/3.4/library/multiprocessing.html?highlight=process#multiprocessing.Queue
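The Pool() approach recommended earlier in this answer can be sketched as follows; square() is a placeholder for whatever per-item work you need, and the pool size of 2 is an arbitrary choice.

```python
import multiprocessing as mp

def square(x):
    # Runs in a worker process; only x and the return value cross the process boundary.
    return x * x

def main():
    with mp.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3, 4])  # results come back in input order

if __name__ == "__main__":
    print(main())  # [1, 4, 9, 16]
```

Each worker receives only the items it is given and sends back only return values, so nothing has to be shared explicitly.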
Hi, I am new to multithreaded programming in Python.
As far as I understand multithreaded programming in Python, a lock object can block a thread. For example, in the code below, lock.acquire() puts the lock object into the locked state, and every other thread that reaches lock.acquire() stops until the lock is unlocked again; only then is the code after lock.acquire() executed.
To be very precise: when you create a lock object, its default state is unlocked. When a thread reaches lock.acquire(), the lock is set to locked, and any other thread that reaches lock.acquire() waits and does nothing until the lock is unlocked with lock.release(). Then that thread's lock.acquire() sets the lock to locked again, and it continues until it reaches release().
But my question is: if data is shared between more than two threads, what exactly happens?
For example, in the code below, g is shared between three threads.
The only thing I can imagine is that it's not really multithreaded: when release() is called in add_one, the lock.acquire() statements in the other two threads, add_two and add_three, are both blocked. So what happens then? Will add_two's acquire() run first, and then add_three stays blocked, or what?
Please tell me exactly what will happen here.
from threading import Lock, Thread
lock = Lock()
g = 0
def add_one():
    """
    Just used for demonstration. It’s bad to use the ‘global’
    statement in general.
    """
    global g
    lock.acquire()
    g += 1
    lock.release()
def add_two():
    global g
    lock.acquire()
    g += 2
    lock.release()
def add_three():
    global g
    lock.acquire()
    g += 3
    lock.release()
threads = []
for func in [add_one, add_two, add_three]:
    threads.append(Thread(target=func))
    threads[-1].start()
for thread in threads:
    thread.join()
print(g)
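To see what actually happens, here is a small sketch; the single add() function, the order list and the short sleep are illustrative additions, not from the question. Only one thread can hold the lock at a time. When it is released, exactly one of the threads blocked in acquire() proceeds, and the threading documentation states that which waiting thread proceeds is not defined, so it is not necessarily the next function in your list. The threads still run concurrently; they just take turns inside the locked section.

```python
from threading import Lock, Thread
import time

lock = Lock()
g = 0
order = []  # records which thread held the lock, in acquisition order

def add(amount, name):
    global g
    with lock:  # same as lock.acquire() ... lock.release(), but also releases on error
        order.append(name)
        current = g
        time.sleep(0.01)  # widen the critical section so the other threads really queue up
        g = current + amount

threads = [Thread(target=add, args=(n, f"add_{n}")) for n in (1, 2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(order)  # each name once; the order among waiting threads is not guaranteed
print(g)      # always 6, because no update is lost while the lock is held
```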
I need to run a parallelized process over a list of inputs, but the process uses all the variables and functions defined earlier in the code. The process itself can be parallelized, because it depends on only one variable: the input from the list.
So I have two possibilities, but I don’t know how to implement either of them:
1) Use a class with a method that should be parallelized using all the functions and attributes of that class. That is: run the method in a parallelized loop, but let it read the attributes of the object without creating a copy of it.
2) Just have a big main and define global variables before running the parallelized process.
Ex:
from joblib import Parallel, delayed
def func(x, y, z):
    # do something
    a = func0(x, y)  # whatever function
    a = func1(a, z)  # whatever function
    return a
if __name__ == "__main__":
    # a lot of stuff in which you create y and z
    global y, z
    result = Parallel(n_jobs=2)(delayed(func)(i, y, z) for i in range(10))
So the problem is that when I get to the parallel function, y and z are already defined and they are just lookup data, and my question is how can I pass those values to the paralleled function, without python creating a copy for each job?
If you just need to pass a list to some parallel workers, I would use the built-in threading module. From what I can tell from your question, this is all that you need, and you are able to pass arguments to the threads.
Here is a basic threading setup:
import threading
def func(x, y):
    print(x, y)  # random example
x, y = "foo", "bar"
threads = []
for _ in range(10):  # create 10 threads
    t = threading.Thread(target=func, args=(x, y,))
    threads.append(t)
    t.start()
for t in threads:
    t.join()  # waits for the thread to complete
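If you also need the return values, as the question's Parallel(...) call does, the standard library's concurrent.futures.ThreadPoolExecutor collects them in input order. In this sketch func's body is a hypothetical stand-in for func0/func1, and y and z are treated as read-only lookup data; because threads share memory, they are passed by reference rather than copied. Note that for CPU-bound pure-Python work, the GIL prevents threads from executing Python code in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def func(x, y, z):
    # Hypothetical stand-in for func0/func1: look x up in y, then add z.
    return y[x] + z

def run(inputs, y, z, workers=2):
    # partial() binds the shared lookup data once; each task receives only its own x.
    task = partial(func, y=y, z=z)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, inputs))  # results come back in input order

if __name__ == "__main__":
    y = {i: i * 10 for i in range(10)}  # hypothetical lookup table
    z = 1
    print(run(range(10), y, z))
```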
However if you need to keep track of that list in a thread-safe way you will want to use a Queue:
import threading, queue
# build a thread-safe list
my_q = queue.Queue()
for i in range(1000):
    my_q.put(i)
# here is your worker function
def worker(q):
    while True:
        try:
            task = q.get_nowait()  # get the next value from the queue without blocking
        except queue.Empty:  # another thread may have taken the last item first
            break
        print(task)
        q.task_done()  # when you are done tell the queue that this task is complete
# spin up some threads
threads = []
for _ in range(10):
    t = threading.Thread(target=worker, args=(my_q,))
    threads.append(t)
    t.start()
my_q.join()  # waits here until task_done() has been called for every item put in the queue
Now, to answer your question about shared state: you can create an object to hold your variables. That way, instead of passing each variable separately to each thread, you pass the object itself, and every thread references the same object (threads share memory, so nothing is copied). I believe this is called a Borg, but I could be slightly wrong on that. If you plan on making any changes to the shared variables, it is important to ensure those changes are thread-safe. For example, if two threads try to increment a number at the same time, one of the increments can be lost because one thread overwrites the other's result. To prevent this we use the threading.Lock object (if you do not care about this, just ignore all of the lock stuff below).
There are other ways of doing this, but I find this method to be easy to understand and extremely flexible:
import threading
# worker function
def worker(vars, lock):
    with lock:
        vars.counter += 1
        print(f"{threading.current_thread().name}: counter = {vars.counter}")
# this holds your variables to be referenced by threads
class Vars(object):
    counter = 0
vars = Vars()
lock = threading.Lock()
# spin up some threads
threads = []
for _ in range(10):
    t = threading.Thread(target=worker, args=(vars, lock, ))
    threads.append(t)
    t.start()
for t in threads:
    t.join()
I have an implementation of a network system based on Twisted. I noticed that when I run a function (which does some mathematical operations and prints the result) in a new thread rather than the main one, the print function causes a segmentation fault. Is that possible? Is there a way to avoid it?
My approach, based on Bram Cohen's suggestion:
Define a global Lock variable
from threading import Lock
s_print_lock = Lock()
Define a function to call print with the Lock
def s_print(*a, **b):
    """Thread safe print function"""
    with s_print_lock:
        print(*a, **b)
Use s_print instead of print in your threads.
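A usage sketch, assuming several worker threads that each print a few lines (the worker function is hypothetical). Because every call goes through the same lock, each print call completes before another thread's print can start, so lines from different threads are not interleaved.

```python
from threading import Lock, Thread

s_print_lock = Lock()

def s_print(*a, **b):
    """Thread safe print function"""
    with s_print_lock:
        print(*a, **b)

def worker(n):
    # Hypothetical worker: each s_print call writes a whole line while holding the lock.
    for i in range(3):
        s_print(f"thread {n}: step {i}")

threads = [Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```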
You need to use a thread lock when you print something in a thread.
Example:
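from threading import Lock
lock = Lock()  # create this once and share it; a new Lock per call protects nothing
lock.acquire()  # will block if lock is already held
try:
    print("something")
finally:
    lock.release()  # always release, even if print raises
This way the resource (in this case, print) will not be used by multiple threads at the same time.
Holding a thread lock means that only the thread that acquired it runs the protected code; any other thread that tries to acquire the same lock waits until it is released.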