I want to share a variable among multiple processes.
I read this one: Shared variable in concurrent.futures.ProcessPoolExecutor() python, but it didn't really help with my code. I'm not an expert in this either; I only started a few weeks ago (first-year student) :)
How is it possible to share the variable x among all the worker processes as soon as it becomes available? This is what I have so far:
import concurrent.futures, time

def share():
    time.sleep(1)
    global x
    x = "hello!"

def printshare():
    while True:
        time.sleep(0.5)
        try:
            print(x)
        except Exception as e:
            print(f"printshare {e}")

def main():
    with concurrent.futures.ProcessPoolExecutor() as executor:
        executor.submit(share)
        executor.submit(printshare)

if __name__ == '__main__':
    main()
And it gives me the error:
printshare name 'x' is not defined
Got it to work: separate processes don't share globals (each worker has its own memory), so the value has to live in an explicitly shared object, and the function and its argument must be passed to submit separately (submit(foo(x)) would call foo immediately in the parent instead of submitting it):
import concurrent.futures, time
from multiprocessing import Manager

def foo(x):
    time.sleep(1)
    x.string = 'hello'

def foo2(x):
    time.sleep(1.5)
    print(x.string)

def main():
    with Manager() as manager:
        x = manager.Namespace()  # a picklable proxy whose attributes are shared
        with concurrent.futures.ProcessPoolExecutor() as executor:
            executor.submit(foo, x)
            executor.submit(foo2, x)

if __name__ == '__main__':
    main()
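For completeness, here is a minimal sketch of the same producer/consumer idea using multiprocessing.Value with plain Process objects. Note this variant rests on an assumption of mine: Value('i') holds a C int, so the payload here is an integer rather than a string.

import time
from multiprocessing import Process, Value

def producer(shared):
    time.sleep(1)
    shared.value = 42          # writes go through the shared memory block

def consumer(shared):
    time.sleep(1.5)
    print(shared.value)        # prints 42 once the producer has written

if __name__ == '__main__':
    shared = Value('i', 0)     # shared C int, initialized to 0
    procs = [Process(target=producer, args=(shared,)),
             Process(target=consumer, args=(shared,))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()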
I'm having trouble writing benchmark code in Python using threading. I was able to get my threading to work, but I can't get my object to return a value. I want to take the values and add them to a list so I can calculate the FLOPS.
Create a class to carry out the threading:
class myThread(threading.Thread):
    def calculation(self):
        n = 0
        start = time.time()
        ex_time = 0
        while ex_time < 30:
            n += 1
            end = time.time()
            ex_time = end - start
        return ex_time
    def run(self):
        t = threading.Thread(target=self.calculation)
        t.start()
Function to create the threads:
def make_threads(num):
    times = []
    calcs = []
    for i in range(num):
        print('start thread', i+1)
        thread1 = myThread()
        t = thread1.start()
        times.append(t)
        #calcs.append(n)
        #when trying to get a return value it comes back as None as seen
    print(times)
    #average out the times, add all the calculations to get the final numbers
    #to calculate flops
    time.sleep(32) #stop the menu from printing until calc finish
def main():
    answer = 1
    while answer != 0:
        answer = int(input("Please indicate how many threads to use: (Enter 0 to exit)"))
        print("\n\nBenchmark test with ", answer, "threads")
        make_threads(answer)

main()
Two ways to do this:
1. Using shared mutable state (hacky, but efficient and quick)
Pass a shared dictionary into each thread and have each thread write its result into it, e.g.:
import threading
import time

class myThread(threading.Thread):
    def calculation(self):
        n = 0
        start = time.time()
        ex_time = 0
        print("Running....")
        while ex_time < 30:
            n += 1
            end = time.time()
            ex_time = end - start
        self.myThreadValues[self.idValue] = ex_time
        print(self.myThreadValues)
        return ex_time
    def setup(self, myThreadValues=None, idValue=None):
        self.myThreadValues = myThreadValues
        self.idValue = idValue
    def run(self):
        self.calculation()

def make_threads(num):
    threads = []
    myThreadValues = {}
    for i in range(num):
        print('start thread', i+1)
        myThreadValues[i] = 0
        thread1 = myThread()
        thread1.setup(myThreadValues, i)
        thread1.start()
        threads.append(thread1)
    # Now we need to wait for all the threads to finish. There are a couple
    # of ways to do this, but the best is joining.
    print("joining all threads...")
    for thread in threads:
        thread.join()
    print("Final thread values: " + str(myThreadValues))
    print("Done")

def main():
    answer = 1
    while answer != 0:
        answer = int(input("Please indicate how many threads to use: (Enter 0 to exit)"))
        print("\n\nBenchmark test with ", answer, "threads")
        make_threads(answer)

main()
2. The proper way to do this is with Processes
Processes are designed for passing information back and forth: the multiprocessing module has built-in machinery for returning results to the parent, whereas a bare Thread's target return value is simply discarded. See the explanation here: https://docs.python.org/3/library/multiprocessing.html
See this answer: How can I recover the return value of a function passed to multiprocessing.Process?
import multiprocessing
from os import getpid

def worker(procnum):
    print('I am number %d in process %d' % (procnum, getpid()))
    return getpid()

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=3)
    print(pool.map(worker, range(5)))
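Applied to the benchmark above, a rough sketch might look like this. This is my own adaptation, not the asker's code: the calculation is lifted to a top-level function so the pool can pickle it, and it returns the loop count so the caller can sum the operations.

import time
import multiprocessing

def calculation(_):
    # Count increments for 30 seconds and report the count.
    n = 0
    start = time.time()
    while time.time() - start < 30:
        n += 1
    return n

if __name__ == '__main__':
    num_workers = 4
    with multiprocessing.Pool(processes=num_workers) as pool:
        counts = pool.map(calculation, range(num_workers))
    print('operations per worker:', counts)
    print('total operations:', sum(counts))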
I have the following code using the schedule and multiprocessing modules:
import time
import random
import schedule
from datetime import datetime
from multiprocessing import Process, Queue

def computation():
    def function1(q):
        while True:
            daydate = datetime.now()
            number = random.randrange(1, 215)
            print('Sent to function2: ({}, {})'.format(daydate, number))
            q.put((daydate, number))
            time.sleep(2)

    def function2(q):
        while True:
            date, number = q.get()
            print("Received values from function1: ({}, {})".format(date, number))
            time.sleep(2)

    if __name__ == "__main__":
        q = Queue()
        a = Process(target=function1, args=(q,))
        a.start()
        b = Process(target=function2, args=(q,))
        b.start()
        a.join()
        b.join()

schedule.every().monday.at("08:45").do(computation)
schedule.every().tuesday.at("08:45").do(computation)

while True:
    schedule.run_pending()
    time.sleep(1)
However, executing the code gives the following error:
AttributeError: Can't pickle local object 'computation.<locals>.function1'
And:
OSError: [WinError 87] The parameter is incorrect
How does one solve this problem? I've tried to solve it by defining the functions at the top level of a module, as stated in the docs (https://docs.python.org/2/library/pickle.html#what-can-be-pickled-and-unpickled), but it still gives the same error.
Nested functions are not defined at the top level of a module, which is why you get the error. You need to move the definitions of function1 and function2 out of computation.
Also, the way you wrote it, your processes would start immediately instead of on the dates you scheduled them to run. The following probably does what you intended:
import os
import time
import random
from multiprocessing import Process, Queue
from threading import Thread
from datetime import datetime
import schedule

def function1(q):
    while True:
        daydate = datetime.now()
        number = random.randrange(1, 215)
        fmt = '(pid: {}) Sent to function2: ({}, {})'
        print(fmt.format(os.getpid(), daydate, number))
        q.put((daydate, number))
        time.sleep(2)

def function2(q):
    while True:
        date, number = q.get()
        fmt = "(pid: {}) Received values from function1: ({}, {})"
        print(fmt.format(os.getpid(), date, number))
        # No need to sleep here because q.get blocks until
        # new items are available.

def computation():
    q = Queue()
    a = Process(target=function1, args=(q,))
    a.start()
    b = Process(target=function2, args=(q,))
    b.start()
    a.join()
    b.join()

if __name__ == "__main__":
    # We are spawning new threads as a launching platform for
    # computation. Without it, the next job couldn't start before the last
    # one has finished. If your jobs always end before the next one should
    # start, you don't need this construct and you can just pass
    # ...do(computation)
    schedule.every().friday.at("01:02").do(
        Thread(target=computation).start
    )
    schedule.every().friday.at("01:03").do(
        Thread(target=computation).start
    )
    while True:
        schedule.run_pending()
        time.sleep(1)
As it stands, your processes will run forever once started. If that's not what you want, you have to think about implementing some stop condition.
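One possible stop condition, as a hedged sketch (the run_seconds parameter and the None sentinel are my own additions, not part of the question): give the producer a deadline and let it tell the consumer to shut down through the queue.

import time
import random
from datetime import datetime
from multiprocessing import Process, Queue

def function1(q, run_seconds=60):
    deadline = time.time() + run_seconds
    while time.time() < deadline:
        q.put((datetime.now(), random.randrange(1, 215)))
        time.sleep(2)
    q.put(None)                # sentinel: tell the consumer to stop

def function2(q):
    while True:
        item = q.get()         # blocks until an item arrives
        if item is None:       # sentinel received, shut down
            break
        date, number = item
        print("Received: ({}, {})".format(date, number))

if __name__ == "__main__":
    q = Queue()
    a = Process(target=function1, args=(q, 10))
    b = Process(target=function2, args=(q,))
    a.start(); b.start()
    a.join(); b.join()         # both exit after roughly 10 seconds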
I am a newbie to Python. I have a function that calculates features for my data and then returns a list that should be processed and written to a file. I am using a Pool to do the calculation and the callback parameter to write to the file; however, the callback function is never called. I've put some print statements in it, and it is definitely not being called.
My code looks like this:
import multiprocessing as mp

def write_arrow_format(results):
    print("writer called")
    results[1].to_csv("../data/model_data/feature-"+results[2], sep='\t', encoding='utf-8')
    with open('../data/model_data/arow-'+results[2], 'w') as f:
        for dic in results[0]:
            feature_list = []
            print(dic)
            beginLine = True
            for key, value in dic.items():
                if beginLine:
                    feature_list.append(str(value))
                    beginLine = False
                else:
                    feature_list.append(str(key)+":"+str(value))
            feature_line = " ".join(feature_list)
            f.write(feature_line+"\n")

def generate_features(users, impressions, interactions, items, filename):
    #some processing
    return [result1, result2, filename]

if __name__ == "__main__":
    # users, impressions, interactions, items, begin and interval
    # are defined earlier in the script (omitted here)
    pool = mp.Pool(mp.cpu_count()-1)
    for i in range(interval):
        if i == interval:
            pool.apply_async(generate_features, (users[begin:], impressions, interactions, items, str(i)), callback=write_arrow_format)
        else:
            pool.apply_async(generate_features, (users[begin:begin+interval], impressions, interactions, items, str(i)), callback=write_arrow_format)
        begin = begin + interval
    pool.close()
    pool.join()
It's not obvious from your post what is contained in the list returned by generate_features. However, if any of result1, result2, or filename is not serializable, the multiprocessing lib will not call the callback function, and it fails silently. I think this is because multiprocessing attempts to pickle objects before passing them back and forth between the child processes and the parent process. If anything you're returning isn't "pickleable" (i.e., not serializable), the callback doesn't get called.
I've encountered this bug myself, and it turned out to be an instance of a logger object that was giving me trouble. Here is some sample code to reproduce my issue:
import multiprocessing as mp
import logging

def bad_test_func(ii):
    print('Calling bad function with arg %i' % ii)
    name = "file_%i.log" % ii
    logging.basicConfig(filename=name, level=logging.DEBUG)
    if ii < 4:
        log = logging.getLogger()
    else:
        log = "Test log %i" % ii
    return log

def good_test_func(ii):
    print('Calling good function with arg %i' % ii)
    instance = ('hello', 'world', ii)
    return instance

def pool_test(func):
    def callback(item):
        print('This is the callback')
        print('I have been given the following item: ')
        print(item)
    num_processes = 3
    pool = mp.Pool(processes=num_processes)
    results = []
    for i in range(5):
        res = pool.apply_async(func, (i,), callback=callback)
        results.append(res)
    pool.close()
    pool.join()

def main():
    print('#'*30)
    print('Calling pool test with bad function')
    print('#'*30)
    pool_test(bad_test_func)
    print('#'*30)
    print('Calling pool test with good function')
    print('#'*30)
    pool_test(good_test_func)

if __name__ == '__main__':
    main()
Hopefully this is helpful and points you in the right direction.
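As a debugging aid (an addition of mine, not from the answer above): on Python 3, Pool.apply_async also accepts an error_callback, which is invoked with the exception when a job fails, so problems surface instead of disappearing silently:

import multiprocessing as mp

def work(i):
    if i == 3:
        raise ValueError('job %i blew up' % i)
    return i * i

def on_result(res):
    print('result:', res)

def on_error(exc):
    print('worker failed:', exc)

if __name__ == '__main__':
    pool = mp.Pool(processes=2)
    for i in range(5):
        pool.apply_async(work, (i,), callback=on_result, error_callback=on_error)
    pool.close()
    pool.join()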
I've been looking into a way to directly change variables in a running module.
What I want to achieve is that while a load test is running, I can manually adjust the call pace or other parameters.
Below is some code I just created (untested, etc.), just to give you an idea:
class A():
    def __init__(self):
        self.value = 1
    def runForever(self):
        while(1):
            print(self.value)
    def setValue(self, value):
        self.value = value

if __name__ == '__main__':
    # Some code to create the A object and directly apply the value
    # from a human's input
    a = A()
    # Some parallelism or something has to be applied.
    a.runForever()
    a.setValue(input("New value: "))
Edit #1: Yes, I know that as written I will never reach the a.setValue() call :-)
Here is a multi-threaded example. This code will work with the Python interpreter but not with the Python shell of IDLE, because the input function is not handled the same way there.
from threading import Thread
from time import sleep

class A(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.value = 1
        self.stop_flag = False
    def run(self):
        while not self.stop_flag:
            sleep(1)
            print(self.value)
    def set_value(self, value):
        self.value = value
    def stop(self):
        self.stop_flag = True

if __name__ == '__main__':
    a = A()
    a.start()
    try:
        while 1:
            r = input()
            a.set_value(int(r))
    except (KeyboardInterrupt, EOFError, ValueError):
        # Ctrl-C, EOF, or junk input stops the thread
        a.stop()
The pseudo-code you wrote is quite similar to the way threading/multiprocessing works in Python. You will want to start a thread that (for example) runs forever, and then, instead of modifying the internal rate value directly, send a message through a Queue that gives the new value.
Check out this question.
Here is a demonstration of doing what you asked about. I prefer passing messages through Queues to making calls directly on threads/processes.
import queue  # !! warning: if you use multiprocessing, use multiprocessing.Queue
import threading
import time

def main():
    q = queue.Queue()
    tester = Tester(q)
    tester.start()
    while True:
        user_input = input("New period in seconds or (q)uit: ")
        if user_input.lower() == 'q':
            break
        try:
            new_speed = float(user_input)
        except ValueError:
            new_speed = None  # ignore junk
        if new_speed is not None:
            q.put(new_speed)
    q.put(Tester.STOP_TOKEN)

class Tester(threading.Thread):
    STOP_TOKEN = '<<stop>>'

    def __init__(self, q):
        threading.Thread.__init__(self)
        self.q = q
        self.speed = 1

    def run(self):
        while True:
            # get from the queue
            try:
                item = self.q.get(block=False)  # don't hang
            except queue.Empty:
                item = None  # do nothing
            if item is not None:
                # stop when requested
                if item == self.STOP_TOKEN:
                    break  # stop this thread loop
                # otherwise check for a new speed
                try:
                    self.speed = float(item)
                except ValueError:
                    pass  # do whatever you like with unknown input
            # do your thing
            self.main_code()

    def main_code(self):
        time.sleep(self.speed)  # or whatever you want to do

if __name__ == '__main__':
    main()
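For the multiprocessing variant hinted at in the comment at the top of that snippet, here is a minimal sketch of my own (same pattern; the worker is a Process, the queue comes from multiprocessing, and the Empty exception still lives in the queue module):

import multiprocessing as mp
import queue  # only for the Empty exception
import time

def worker(q):
    speed = 1.0
    while True:
        try:
            item = q.get(block=False)
        except queue.Empty:
            item = None
        if item == '<<stop>>':
            break                 # parent asked us to exit
        if item is not None:
            speed = float(item)   # adopt the new period
        time.sleep(speed)         # the "work"

if __name__ == '__main__':
    q = mp.Queue()
    p = mp.Process(target=worker, args=(q,))
    p.start()
    q.put(0.5)            # speed the worker up from the parent
    time.sleep(2)
    q.put('<<stop>>')     # ask it to stop
    p.join()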
I have a function that updates a global/class variable.
What should I watch out for if I regularly invoke such a function in a subthread (i.e., asynchronously)?
Or do you have any suggestions for avoiding this pattern altogether (the Pythonic way)?
import time
import threading

# through a global variable or class variable
_a = 123

def update_a():  # may be called more than once
    "slow updating process"
    time.sleep(3)
    global _a
    _a += 10
    return

if __name__ == '__main__':
    print(_a)
    th = threading.Thread(target=update_a)
    th.daemon = True
    th.start()
    print(_a)
    # updating asynchronously
    time.sleep(5)
    print(_a)
First of all, threads are generally something to avoid in Python for CPU-bound work (the global interpreter lock prevents them from executing Python bytecode in parallel), but if you really want to use them, I'd do it like this. First, create a thread-safe object with a lock:
import threading

class ThreadSafeValue(object):
    def __init__(self, init):
        self._value = init
        self._lock = threading.Lock()

    def atomic_update(self, func):
        with self._lock:
            self._value = func(self._value)

    @property
    def value(self):
        return self._value
then I'd pass that to the thread target function:
import time

def update(val):
    time.sleep(3)
    val.atomic_update(lambda v: v + 10)

def main():
    a = ThreadSafeValue(123)
    print(a.value)
    th = threading.Thread(target=update, args=(a,))
    th.daemon = True
    th.start()
    print(a.value)
    th.join()
    print(a.value)

if __name__ == '__main__':
    main()
That way you will avoid global variables and ensure thread-safety.
This demonstrates that addition is not thread-safe (see Josiah Carlson's comment; effbot.org seems to be down right now, but you can check out an archived version of the page through the Wayback Machine):
import threading

x = 0

def foo():
    global x
    for i in range(1000000):
        x += 1

threads = [threading.Thread(target=foo), threading.Thread(target=foo)]
for t in threads:
    t.daemon = True
    t.start()
for t in threads:
    t.join()
print(x)
yields some number less than 2000000. This shows that some calls to x += 1 did not properly update the variable.
The solution is to protect assignment to your global variable with a lock:
lock = threading.Lock()

def safe_foo():
    global x
    for i in range(1000000):
        with lock:
            x += 1

x = 0
threads = [threading.Thread(target=safe_foo), threading.Thread(target=safe_foo)]
for t in threads:
    t.daemon = True
    t.start()
for t in threads:
    t.join()
print(x)
yields 2000000.
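Tying this back to the question's update_a, the same lock pattern applied there would look like this (a minimal sketch; the read_a helper is my own addition so that reads also take the lock):

import threading
import time

_a = 123
_a_lock = threading.Lock()

def update_a():
    global _a
    time.sleep(3)        # slow updating process
    with _a_lock:        # serialize the read-modify-write
        _a += 10

def read_a():
    with _a_lock:        # consistent read
        return _a

if __name__ == '__main__':
    th = threading.Thread(target=update_a, daemon=True)
    th.start()
    print(read_a())      # almost certainly still 123
    th.join()
    print(read_a())      # 133 once the update has completed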