Python - Importing a class that implements multiprocessing fails on Windows 7

I have three files in a folder:
MultiProcFunctions.py
The idea is to take any function and parallelize it
import multiprocessing
from multiprocessing import Manager

def MultiProcDecorator(f, *args):
    """
    Takes a function f, and formats it so that results are saved to a shared dict
    """
    def g(procnum, return_dict, *args):
        result = f(*args)
        return_dict[procnum] = result
    g.__module__ = "__main__"
    return g

def MultiProcFunction(f, n_procs, *args):
    """
    Takes a function f, and runs it in n_procs with given args
    """
    manager = Manager()
    return_dict = manager.dict()
    jobs = []
    for i in range(n_procs):
        p = multiprocessing.Process(target=f, args=(i, return_dict) + args)
        jobs.append(p)
        p.start()
    for proc in jobs:
        proc.join()
    return dict(return_dict)
MultiProcClass.py
A file that defines a class which makes use of the above functions to parallelize the sq function:
from MultiProcFunctions import MultiProcDecorator, MultiProcFunction

def sq(x):
    return x**2

g = MultiProcDecorator(sq)

class Square:
    def __init__(self):
        pass

    def f(self, x):
        return MultiProcFunction(g, 2, x)
MultiProcTest.py
Finally, I have a third file that imports the class above and tries to call the f method:
from MultiProcClass import Square

s = Square()
print s.f(2)
However, this yields an error:
File "C:\Python27\lib\multiprocessing\managers.py", line 528, in start
self._address = reader.recv()
EOFError
I am on Windows 7, and also tried:
from MultiProcClass import Square

if __name__ == "__main__":
    s = Square()
    print s.f(2)
In this case, I got a different error:
PicklingError: Can't pickle <function g at 0x01F62530>: it's not found as __main__.g
Not sure how to make heads or tails of this. I get neither error on Ubuntu 12.04 LTS, where all of this works flawlessly; so the error definitely has to do with how Windows does things, but I can't put my finger on it. Any insight is highly appreciated!

I think you get it on Windows because under Windows a new Python process is started, whereas on Linux the process is forked. That means on Windows the function needs to be serialized and deserialized, whereas on Linux a pointer to it can be used. Finding a function requires the module and the name of the function to point to it.
g.__module__ should equal f.__module__.
These answers might also help further with how to decorate functions for picklability and usability.
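To see concretely what "finding a function by module and name" means, here is a small sketch (not from the original post) that runs on either platform. The standard pickler never serializes a function's code; it only records where to re-import it from, which is why a nested wrapper whose __module__ is forced to "__main__" cannot be found when the spawned Windows process tries to unpickle it:
import pickle

def top_level():
    return 42

# Works: pickle records "find top_level in this module", and the receiving
# process can re-import it by that name.
print(pickle.loads(pickle.dumps(top_level))())

def make_nested():
    def nested():
        return 42
    return nested

# Fails: the nested function is not an attribute of any module, so the
# module-plus-name lookup has nothing to point at.
try:
    pickle.dumps(make_nested())
except Exception as exc:
    print(exc)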

Related

multiple instance of discord.py bots with multiprocessing issue with passing object to arguments of multiprocessing.Process() [duplicate]

I am sorry that I can't reproduce the error with a simpler example, and my code is too complicated to post. If I run the program in an IPython shell instead of regular Python, things work out well.
I looked up some previous notes on this problem. They were all caused by using a pool to call a function defined within a class function. But this is not the case for me.
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib64/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 313, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
I would appreciate any help.
Update: The function I pickle is defined at the top level of the module, though it calls a function that contains a nested function. That is, f() calls g(), which calls h(), which has a nested function i(), and I am calling pool.apply_async(f). f(), g(), and h() are all defined at the top level. I tried a simpler example with this pattern, though, and it works.
Here is a list of what can be pickled. In particular, functions are only picklable if they are defined at the top-level of a module.
This piece of code:
import multiprocessing as mp

class Foo():
    @staticmethod
    def work(self):
        pass

if __name__ == '__main__':
    pool = mp.Pool()
    foo = Foo()
    pool.apply_async(foo.work)
    pool.close()
    pool.join()
yields an error almost identical to the one you posted:
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 315, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
The problem is that the pool methods all use an mp.SimpleQueue to pass tasks to the worker processes. Everything that goes through the mp.SimpleQueue must be picklable, and foo.work is not picklable since it is not defined at the top level of the module.
It can be fixed by defining a function at the top level, which calls foo.work():
def work(foo):
    foo.work()

pool.apply_async(work, args=(foo,))
Notice that foo is picklable, since Foo is defined at the top level and foo.__dict__ is picklable.
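As a quick sanity check (a sketch, not part of the original answer), you can call pickle.dumps() on whatever you intend to hand to the pool; anything the pool cannot send will fail here in the same way:
import pickle

class Foo(object):
    def work(self):
        pass

foo = Foo()

pickle.dumps(foo)  # fine: Foo lives at the top level of the module and foo.__dict__ is picklable

try:
    pickle.dumps(lambda: 0)  # fails: a lambda cannot be found by module + name
except Exception as exc:
    print(exc)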
I'd use pathos.multiprocessing instead of multiprocessing. pathos.multiprocessing is a fork of multiprocessing that uses dill. dill can serialize almost anything in Python, so you are able to send a lot more around in parallel. The pathos fork also has the ability to work directly with multiple-argument functions, as you need for class methods.
>>> from pathos.multiprocessing import ProcessingPool as Pool
>>> p = Pool(4)
>>> class Test(object):
...     def plus(self, x, y):
...         return x+y
...
>>> t = Test()
>>> p.map(t.plus, x, y)
[4, 6, 8, 10]
>>>
>>> class Foo(object):
...     @staticmethod
...     def work(self, x):
...         return x+1
...
>>> f = Foo()
>>> p.apipe(f.work, f, 100)
<processing.pool.ApplyResult object at 0x10504f8d0>
>>> res = _
>>> res.get()
101
Get pathos (and if you like, dill) here:
https://github.com/uqfoundation
When this problem comes up with multiprocessing, a simple solution is to switch from Pool to ThreadPool. This can be done with no change of code other than the import:
from multiprocessing.pool import ThreadPool as Pool
This works because ThreadPool shares memory with the main thread, rather than creating a new process; this means that pickling is not required.
The downside to this method is that Python isn't the greatest language at handling threads: it uses the Global Interpreter Lock to stay thread-safe, which can slow down some use cases here. However, if you're primarily interacting with other systems (running HTTP commands, talking to a database, writing to filesystems), then your code is likely not CPU-bound and won't take much of a hit. In fact, when writing HTTP/HTTPS benchmarks I've found that the threaded model used here has less overhead and fewer delays, as the overhead of creating new processes is much higher than the overhead of creating new threads, and the program was otherwise just waiting for HTTP responses.
So if you're processing a ton of stuff in Python userspace, this might not be the best method.
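For illustration, here is a minimal sketch of the drop-in swap described above (the Fetcher class and URLs are made up for the example). Because the workers are threads, bound methods and other unpicklable objects can be passed straight through:
from multiprocessing.pool import ThreadPool as Pool

class Fetcher(object):
    def fetch(self, url):
        # Stand-in for a blocking network call; while a thread waits on I/O
        # the GIL is released, so the other workers keep running.
        return len(url)

if __name__ == '__main__':
    fetcher = Fetcher()
    pool = Pool(10)
    # A bound method is fine here: nothing gets pickled, the threads share memory.
    results = pool.map(fetcher.fetch, ['http://example.com/%d' % i for i in range(20)])
    pool.close()
    pool.join()
    print(results)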
As others have said, multiprocessing can only transfer Python objects to worker processes that can be pickled. If you cannot reorganize your code as described by unutbu, you can use dill's extended pickling/unpickling capabilities for transferring data (especially code) as I show below.
This solution requires only the installation of dill and no other libraries such as pathos:
import os
from multiprocessing import Pool
import dill

def run_dill_encoded(payload):
    fun, args = dill.loads(payload)
    return fun(*args)

def apply_async(pool, fun, args):
    payload = dill.dumps((fun, args))
    return pool.apply_async(run_dill_encoded, (payload,))

if __name__ == "__main__":
    pool = Pool(processes=5)

    # async execution of a lambda
    jobs = []
    for i in range(10):
        job = apply_async(pool, lambda a, b: (a, b, a * b), (i, i + 1))
        jobs.append(job)

    for job in jobs:
        print job.get()
    print

    # async execution of a static method
    class O(object):

        @staticmethod
        def calc():
            return os.getpid()

    jobs = []
    for i in range(10):
        job = apply_async(pool, O.calc, ())
        jobs.append(job)

    for job in jobs:
        print job.get()
I have found that I can also generate exactly that error output on a perfectly working piece of code by attempting to use the profiler on it.
Note that this was on Windows (where the forking is a bit less elegant).
I was running:
python -m profile -o output.pstats <script>
I found that removing the profiling removed the error and putting the profiling back restored it. It was driving me batty too, because I knew the code used to work. I was checking to see whether something had updated pool.py ... then had a sinking feeling, eliminated the profiling, and that was it.
Posting here for the archives in case anybody else runs into it.
Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
This error will also occur if you have any built-in function inside the model object that was passed to the async job.
So make sure the model objects that are passed don't have built-in functions. (In our case we were using the FieldTracker() function of django-model-utils inside the model to track a certain field.) Here is the link to the relevant GitHub issue.
This solution requires only the installation of dill and no other libraries such as pathos:
import dill

def apply_packed_function_for_map((dumped_function, item, args, kwargs),):
    """
    Unpack dumped function as target function and call it with arguments.

    :param (dumped_function, item, args, kwargs):
        a tuple of dumped function and its arguments
    :return:
        result of target function
    """
    target_function = dill.loads(dumped_function)
    res = target_function(item, *args, **kwargs)
    return res

def pack_function_for_map(target_function, items, *args, **kwargs):
    """
    Pack function and arguments to an object that can be sent from one
    multiprocessing.Process to another. The main problem is:
        «multiprocessing.Pool.map*» or «apply*»
        cannot use class methods or closures.
    It solves this problem with «dill».
    It works with the target function as argument, dumps it («with dill»)
    and returns the dumped function with the arguments of the target function.
    For more performance we dump only the target function itself
    and don't dump its arguments.

    How to use (pseudo-code):

        ~>>> import multiprocessing
        ~>>> images = [...]
        ~>>> pool = multiprocessing.Pool(100500)
        ~>>> features = pool.map(
        ~...     *pack_function_for_map(
        ~...         super(Extractor, self).extract_features,
        ~...         images,
        ~...         type='png',
        ~...         **options,
        ~...     )
        ~... )
        ~>>>

    :param target_function:
        function, that you want to execute like target_function(item, *args, **kwargs).
    :param items:
        list of items for map
    :param args:
        positional arguments for target_function(item, *args, **kwargs)
    :param kwargs:
        named arguments for target_function(item, *args, **kwargs)
    :return: tuple(function_wrapper, dumped_items)
        It returns a tuple with
        * a function wrapper that unpacks and calls the target function;
        * a list of the packed target function and its arguments.
    """
    dumped_function = dill.dumps(target_function)
    dumped_items = [(dumped_function, item, args, kwargs) for item in items]
    return apply_packed_function_for_map, dumped_items
It also works for numpy arrays.
A quick fix is to make the function global
from multiprocessing import Pool

class Test:
    def __init__(self, x):
        self.x = x

    @staticmethod
    def test(x):
        return x**2

    def test_apply(self, list_):
        global r

        def r(x):
            return Test.test(x + self.x)

        with Pool() as p:
            l = p.map(r, list_)
        return l

if __name__ == '__main__':
    o = Test(2)
    print(o.test_apply(range(10)))
Building on @rocksportrocker's solution, it would make sense to use dill both when sending and when receiving the results.
import dill
import itertools

def run_dill_encoded(payload):
    fun, args = dill.loads(payload)
    res = fun(*args)
    res = dill.dumps(res)
    return res

def dill_map_async(pool, fun, args_list,
                   as_tuple=True,
                   **kw):
    if as_tuple:
        args_list = ((x,) for x in args_list)

    it = itertools.izip(
        itertools.cycle([fun]),
        args_list)
    it = itertools.imap(dill.dumps, it)
    return pool.map_async(run_dill_encoded, it, **kw)

if __name__ == '__main__':
    import multiprocessing as mp
    import sys, os
    p = mp.Pool(4)
    res = dill_map_async(p, lambda x: [sys.stdout.write('%s\n' % os.getpid()), x][-1],
                         [lambda x: x + 1] * 10,)
    res = res.get(timeout=100)
    res = map(dill.loads, res)
    print(res)
As @penky Suresh has suggested in this answer, don't use built-in keywords.
Apparently args is a built-in keyword when dealing with multiprocessing:
from concurrent.futures import ProcessPoolExecutor, as_completed  # imports needed for this snippet

class TTS:
    def __init__(self):
        pass

    def process_and_render_items(self):
        multiprocessing_args = [{"a": "b", "c": "d"}, {"e": "f", "g": "h"}]

        with ProcessPoolExecutor(max_workers=10) as executor:
            # Using args here is fine.
            future_processes = {
                executor.submit(TTS.process_and_render_item, args)
                for args in multiprocessing_args
            }

            for future in as_completed(future_processes):
                try:
                    data = future.result()
                except Exception as exc:
                    print(f"Generated an exception: {exc}")
                else:
                    print(f"Generated data for comment process: {future}")

    # Don't use 'args' here. It seems to be a built-in keyword.
    # Changing 'args' to 'arg' worked for me.
    def process_and_render_item(arg):
        print(arg)
        # This will print {"a": "b", "c": "d"} for the first process
        # and {"e": "f", "g": "h"} for the second process.
PS: The tabs/spaces may be a bit off.

Python - Limiting the Number of Threads while passing arguments

I am trying to run some threads using a thread limiter to keep the number of threads to 10. I had an example to use as a guide, but I need to pass some arguments to the function when calling the thread, and I am struggling with passing those arguments. I marked with ### the areas where I am not sure of the syntax and where I think my problem is.
I am trying to use The right way to limit maximum number of threads running at once? as a guide. Here is the sample code I am trying to follow; my example is below that. Any time I try to pass in all the arguments I get back TypeError: __init__() takes 1 to 6 arguments but 17 were passed. In my example below I cut the arguments down to 4 to make it easier to read, but I have 17 arguments in my live code; I keep them down to run, main_path, target_path, and jiranum for readability.
threadLimiter = threading.BoundedSemaphore(maximumNumberOfThreads)

class MyThread(threading.Thread):

    def run(self):
        threadLimiter.acquire()
        try:
            self.Executemycode()
        finally:
            threadLimiter.release()

    def Executemycode(self):
        print(" Hello World!")
        # <your code here>
My code
import os
import sys
import threading

threadLimiter = threading.BoundedSemaphore(10)

class MyThread(threading.Thread):

    def run(self):  ### I also tried (run, main_path, target_path, jiranum)
        threadLimiter.acquire()
        try:
            self.run_compare(run, main_path, target_path, jiranum)  #### I also tried self
        finally:
            threadLimiter.release()

    def run_compare(run, main_path, target_path, jiranum):  #### ???
        os.chdir(target_path)
        os.system(main_path + ', ' + target_path + ',' + jiranum + ',' + run)

if __name__ == '__main__':
    # set the needed variables
    threads = []
    for i in range(1, int(run)+1):
        process = threading.Thread(target=MyThread, args=(str(i), main_path, target_path, jiranum))  #### Is this defined right?
        process.start()
        threads.append(process)

    for process in threads:
        process.join()
This would probably be a simpler task with concurrent.futures but I like getting my hands dirty, so here we go. A few suggestions:
I find classes as thread targets often complicate things, so if there's no compelling reason, keep it simple
It's easier to use a with block to acquire and release a semaphore, and a regular semaphore usually suffices in that case
17 arguments can get messy; I would build a tuple of the arguments outside the call to threading.Thread() so it's easier to read, then unpack the tuple in the thread
This should work as a simple example; os.system() just echoes something and sleeps, so you can see the thread count is limited by the semaphore.
import os
import threading
from random import randint

threadLimiter = threading.Semaphore(10)

def run_config(*args):
    run, arg1, arg2 = args  # unpack the 17 args by name
    with threadLimiter:
        seconds = randint(2, 7)
        os.system(f"echo run {run}, args {arg1} {arg2} ; sleep {seconds}")

if __name__ == '__main__':
    threads = []
    run = "20"  # I guess this is a string because of below?
    for i in range(1, int(run)+1):
        thr_args = (str(i), "arg1",
                    "arg2")  # put the 17 args here
        thr = threading.Thread(target=run_config, args=thr_args)
        thr.start()
        threads.append(thr)
    for thr in threads:
        thr.join()

terminate all processes in a Pool

I have a Python script that looks as follows:
import os
import sys
import tempfile
from multiprocessing import Pool

def runReport(a, b, c):
    # do task.
    temp_dir = tempfile.gettempdir()
    if (os.path.isfile(temp_dir + "/stop_check")):
        pass  # How to terminate all processes in the pool here?

def runReports(args):
    return runReport(*args)

def main(argv):
    pool = Pool(4)
    args = []
    # Code to generate args. args is an array of tuples of form (a, b, c)
    pool.map(runReports, args)

if (__name__ == '__main__'):
    main(sys.argv[1:])
There is another Python script that creates this file /tmp/stop_check.
When this file gets created, I need to terminate the Pool. How can I achieve this?
Only the parent process can terminate the pool. You're better off having the parent run a loop that checks for the existence of that file, rather than trying to have each child do it and then signal the parent somehow:
import os
import sys
import time
import tempfile
from multiprocessing import Pool

def runReport(*args):
    pass  # do task

def runReports(args):
    return runReport(*args)

def main(argv):
    pool = Pool(4)
    args = []
    # Code to generate args. args is an array of tuples of form (a, b, c)
    result = pool.map_async(runReports, args)
    temp_dir = tempfile.gettempdir()
    while not result.ready():
        if os.path.isfile(temp_dir + "/stop_check"):
            pool.terminate()
            break
        result.wait(.5)  # Wait a bit to avoid pegging the CPU. You can tune this value as you see fit.

if (__name__ == '__main__'):
    main(sys.argv[1:])
By using map_async instead of map, you're free to have the parent use a loop to check for the existence of the file and then terminate the pool when necessary. Do note that using terminate to kill the children means that they won't get to do any cleanup at all, so you need to make sure none of them access resources that could be left in an inconsistent state if the process dies while using them.

Python threading: how to use return values of external scripts?

(I found a decent solution here for this, but unfortunately I'm using IronPython, which does not implement the multiprocessing module ...)
Driving script Threader.py will call Worker.py's single function twice, using the threading module.
Its single function just fetches a dictionary of data.
Roughly speaking:
Worker.py
def GetDict():
    :
    :
    :
    return theDict
Threader.py
import threading
from Worker import GetDict
:
:
:
def ThreadStart():
    t = threading.Thread(target=GetDict)
    t.start()
:
:
In the driver script Threader.py, I want to be able to operate on the two dictionaries outputted by the 2 instances of Worker.py.
The accepted answer here involving the Queue module seems to be what I need in terms of accessing return values, but it is written from the point of view of everything being done in a single script. How do I go about making the return values of the function called in Worker.py available to Threader.py (or any other script, for that matter)?
Many thanks
Another way to do what you want (without using a Queue) would be to use the concurrent.futures module (available since Python 3.2; for earlier versions there is a backport).
Using this, your example would work like this:
from concurrent import futures

def GetDict():
    return {'foo': 'bar'}

# imports ...
# from Worker import GetDict

def ThreadStart():
    executor = futures.ThreadPoolExecutor(max_workers=4)
    future = executor.submit(GetDict)
    print(future.result())  # blocks until GetDict finished
    # or doing more than one:
    jobs = [executor.submit(GetDict) for i in range(10)]
    for j in jobs:
        print(j.result())

if __name__ == '__main__':
    ThreadStart()
edit:
Something similar would be to use your own thread to execute the target function and save its return value, something like this:
from threading import Thread

def GetDict():
    return {'foo': 'bar'}

# imports ...
# from Worker import GetDict

class WorkerThread(Thread):
    def __init__(self, fnc, *args, **kwargs):
        super(WorkerThread, self).__init__()
        self.fnc = fnc
        self.args = args
        self.kwargs = kwargs

    def run(self):
        self.result = self.fnc(*self.args, **self.kwargs)

def ThreadStart():
    jobs = [WorkerThread(GetDict) for i in range(10)]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
        print(j.result)

if __name__ == '__main__':
    ThreadStart()
