Call method on many objects in parallel


I wanted to use concurrency in Python for the first time. So I started reading a lot about Python concurrency (GIL, threads vs processes, multiprocessing vs concurrent.futures vs ...) and saw a lot of convoluted examples, even among those using the high-level concurrent.futures library.
So I decided to just start trying stuff and was surprised with the very, very simple code I ended up with:
from concurrent.futures import ThreadPoolExecutor
class WebHostChecker(object):
    def __init__(self, websites):
        self.webhosts = []
        for website in websites:
            self.webhosts.append(WebHost(website))
    def __iter__(self):
        return iter(self.webhosts)
    def check_all(self):
        # sequential:
        #for webhost in self:
        #    webhost.check()
        # threaded:
        with ThreadPoolExecutor(max_workers=10) as executor:
            executor.map(lambda webhost: webhost.check(), self.webhosts)
class WebHost(object):
    def __init__(self, hostname):
        self.hostname = hostname
    def check(self):
        print("Checking {}".format(self.hostname))
        self.check_dns() # only modifies internal state, i.e.: sets self.dns
        self.check_http() # only modifies internal status, i.e.: sets self.http
Using the classes looks like this:
webhostchecker = WebHostChecker(["urla.com", "urlb.com"])
webhostchecker.check_all() # -> this calls .check() on all WebHost instances in parallel
The relevant multiprocessing/threading code is only 3 lines. I barely had to modify my existing code (which I hoped to be able to do when first starting to write the code for sequential execution, but started to doubt after reading the many examples online).
And... it works! :)
It perfectly distributes the IO-waiting among multiple threads and runs in less than 1/3 of the time of the original program.
So, now, my question(s):
What am I missing here?
Could I implement this differently? (Should I?)
Why are other examples so convoluted? (Although I must say I couldn't find an exact example doing a method call on multiple objects)
Will this code get me in trouble when I expand my program with features/code I cannot predict right now?
I think I already know of one potential problem and it would be nice if someone can confirm my reasoning: if WebHost.check() also becomes CPU bound I won't be able to swap ThreadPoolExecutor for ProcessPoolExecutor. Because every process will get cloned versions of the WebHost instances? And I would have to code something to sync those cloned instances back to the original?
Any insights/comments/remarks/improvements/... that can bring me to greater understanding will be much appreciated! :)
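The ProcessPoolExecutor concern raised above can be checked without starting any processes, because arguments sent to a worker process are pickled and the worker operates on the unpickled copy. The following is a minimal sketch under that assumption; FakeWebHost, check_and_report and the "resolved:"/"ok:" values are hypothetical stand-ins for the real WebHost and its checks. It shows that mutating a pickled copy leaves the original untouched, and one possible way around it: return the new state from the worker and apply it in the parent.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor


class FakeWebHost:
    """Hypothetical stand-in for WebHost; check() only sets attributes."""

    def __init__(self, hostname):
        self.hostname = hostname
        self.dns = None
        self.http = None

    def check(self):
        self.dns = "resolved:" + self.hostname   # placeholder for check_dns()
        self.http = "ok:" + self.hostname        # placeholder for check_http()


def check_and_report(host):
    """Run the check and return the resulting state instead of relying on
    in-place mutation, so the caller can copy it onto its own instance."""
    host.check()
    return host.dns, host.http


# What a worker process would see: a pickled copy, not the original object.
original = FakeWebHost("urla.com")
clone = pickle.loads(pickle.dumps(original))
clone.check()
clone_was_updated = clone.dns is not None
original_untouched = original.dns is None

# Returning the state works with either executor type. A module-level
# function is used because a lambda cannot be pickled for a process pool.
hosts = [FakeWebHost("urla.com"), FakeWebHost("urlb.com")]
with ThreadPoolExecutor(max_workers=2) as executor:
    for host, (dns, http) in zip(hosts, executor.map(check_and_report, hosts)):
        host.dns, host.http = dns, http
```

With this shape, swapping in ProcessPoolExecutor should mainly require that the instances are picklable and that the script has an if __name__ == "__main__": guard on platforms that start workers by re-importing the module.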

Ok, so I'll add my own first gotcha:
If webhost.check() raises an Exception, then the thread just ends and self.dns and/or self.http might NOT have been set. However, with the current code, you won't see the Exception, UNLESS you also access the executor.map() results! Leaving me wondering why some objects raised AttributeErrors after running check_all() :)
This can easily be fixed by just evaluating every result (which is always None, because I'm not letting .check() return anything). You can do it after all threads have run or during. I chose to let Exceptions be raised during (i.e. within the with statement), so the program stops at the first unexpected error:
def check_all(self):
    with ThreadPoolExecutor(max_workers=10) as executor:
        # this alone works, but does not raise any exceptions from the threads:
        #executor.map(lambda webhost: webhost.check(), self.webhosts)
        for i in executor.map(lambda webhost: webhost.check(), self.webhosts):
            pass
I guess I could also use list(executor.map(lambda webhost: webhost.check(), self.webhosts)) but that would unnecessarily use up memory.
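An alternative to iterating over executor.map() is to submit each call and collect the futures, which makes it possible to see which object failed instead of stopping at the first error. A minimal sketch, assuming a hypothetical FlakyHost class in place of WebHost:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class FlakyHost:
    """Hypothetical stand-in for WebHost whose check() may raise."""

    def __init__(self, hostname, fail=False):
        self.hostname = hostname
        self.fail = fail
        self.dns = None

    def check(self):
        if self.fail:
            raise RuntimeError("lookup failed for " + self.hostname)
        self.dns = "resolved"


def check_all(hosts, max_workers=10):
    """Call check() on every host in a thread pool and return a dict
    mapping hostname -> exception for the hosts whose check raised."""
    errors = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Remember which future belongs to which host.
        future_to_host = {executor.submit(h.check): h for h in hosts}
        for future in as_completed(future_to_host):
            host = future_to_host[future]
            exc = future.exception()  # None if check() finished normally
            if exc is not None:
                errors[host.hostname] = exc
    return errors


hosts = [FlakyHost("urla.com"), FlakyHost("urlb.com", fail=True)]
errors = check_all(hosts)
```

Passing the bound method h.check to submit() also avoids the lambda.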

Related

Training a model based on time rather than epochs [duplicate]

In Python, for a toy example:
for x in range(0, 3):
    # Call function A(x)
I want to continue the for loop if function A takes more than five seconds by skipping it so I won't get stuck or waste time.
By doing some search, I realized a subprocess or thread may help, but I have no idea how to implement it here.

I think creating a new process may be overkill. If you're on Mac or a Unix-based system, you should be able to use signal.SIGALRM to forcibly time out functions that take too long. This will work on functions that are idling for network or other issues that you absolutely can't handle by modifying your function. I have an example of using it in this answer:
Option for SSH to timeout after a short time? ClientAlive & ConnectTimeout don't seem to do what I need them to do
Editing my answer in here, though I'm not sure I'm supposed to do that:
import signal
class TimeoutException(Exception): # Custom exception class
    pass
def timeout_handler(signum, frame): # Custom signal handler
    raise TimeoutException
# Change the behavior of SIGALRM
signal.signal(signal.SIGALRM, timeout_handler)
for i in range(3):
    # Start the timer. Once 5 seconds are over, a SIGALRM signal is sent.
    signal.alarm(5)
    # This try/except loop ensures that
    # you'll catch TimeoutException when it's sent.
    try:
        A(i) # Whatever your function that might hang
    except TimeoutException:
        continue # continue the for loop if function A takes more than 5 second
    else:
        # Reset the alarm
        signal.alarm(0)
This basically sets a timer for 5 seconds, then tries to execute your code. If it fails to complete before time runs out, a SIGALRM is sent, which we catch and turn into a TimeoutException. That forces you to the except block, where your program can continue.

Maybe someone will find this decorator useful; it is based on TheSoundDefense's answer:
import time
import signal
class TimeoutException(Exception): # Custom exception class
    pass
def break_after(seconds=2):
    def timeout_handler(signum, frame): # Custom signal handler
        raise TimeoutException
    def function(function):
        def wrapper(*args, **kwargs):
            signal.signal(signal.SIGALRM, timeout_handler)
            signal.alarm(seconds)
            try:
                res = function(*args, **kwargs)
                signal.alarm(0) # Clear alarm
                return res
            except TimeoutException:
                print('Oops, timeout: %s sec reached.' % seconds, function.__name__, args, kwargs)
                return
        return wrapper
    return function
Test:
@break_after(3)
def test(a, b, c):
    return time.sleep(10)
>>> test(1,2,3)
Oops, timeout: 3 sec reached. test (1, 2, 3) {}

If you can break your work up and check every so often, that's almost always the best solution. But sometimes that's not possible: for example, maybe you're reading a file off a slow file share that every once in a while just hangs for 30 seconds. To deal with that internally, you'd have to restructure your whole program around an async I/O loop.
If you don't need to be cross-platform, you can use signals on *nix (including Mac and Linux), APCs on Windows, etc. But if you need to be cross-platform, that doesn't work.
So, if you really need to do it concurrently, you can, and sometimes you have to. In that case, you probably want to use a process for this, not a thread. You can't really kill a thread safely, but you can kill a process, and it can be as safe as you want it to be. Also, if the thread is taking 5+ seconds because it's CPU-bound, you don't want to fight with it over the GIL.
There are two basic options here.
First, you can put the code in another script and run it with subprocess:
subprocess.check_call([sys.executable, 'other_script.py', arg, other_arg],
                      timeout=5)
Since this is going through normal child-process channels, the only communication you can use is some argv strings, a success/failure return value (actually a small integer, but that's not much better), and optionally a hunk of text going in and a chunk of text coming out.
Alternatively, you can use multiprocessing to spawn a thread-like child process:
p = multiprocessing.Process(target=func, args=args) # the first positional parameter is group, so use keywords
p.start()
p.join(5)
if p.is_alive():
    p.terminate()
As you can see, this is a little more complicated, but it's better in a few ways:
You can pass arbitrary Python objects (at least anything that can be pickled) rather than just strings.
Instead of having to put the target code in a completely independent script, you can leave it as a function in the same script.
It's more flexible—e.g., if you later need to, say, pass progress updates, it's very easy to add a queue in either or both directions.
The big problem with any kind of parallelism is sharing mutable data—e.g., having a background task update a global dictionary as part of its work (which your comments say you're trying to do). With threads, you can sort of get away with it, but race conditions can lead to corrupted data, so you have to be very careful with locking. With child processes, you can't get away with it at all. (Yes, you can use shared memory, as Sharing state between processes explains, but this is limited to simple types like numbers, fixed arrays, and types you know how to define as C structures, and it just gets you back to the same problems as threads.)
Ideally, you arrange things so you don't need to share any data while the process is running—you pass in a dict as a parameter and get a dict back as a result. This is usually pretty easy to arrange when you have a previously-synchronous function that you want to put in the background.
But what if, say, a partial result is better than no result? In that case, the simplest solution is to pass the results over a queue. You can do this with an explicit queue, as explained in Exchanging objects between processes, but there's an easier way.
If you can break the monolithic process into separate tasks, one for each value (or group of values) you wanted to stick in the dictionary, you can schedule them on a Pool—or, even better, a concurrent.futures.Executor. (If you're on Python 2.x or 3.1, see the backport futures on PyPI.)
Let's say your slow function looked like this:
def spam():
    global d
    for meat in get_all_meats():
        count = get_meat_count(meat)
        d[meat] = d.get(meat, 0) + count
Instead, you'd do this:
def spam_one(meat):
    count = get_meat_count(meat)
    return meat, count
with concurrent.futures.ProcessPoolExecutor(max_workers=1) as executor:
    results = executor.map(spam_one, get_all_meats(), timeout=5)
    for (meat, count) in results:
        d[meat] = d.get(meat, 0) + count
As many results as you get within 5 seconds get added to the dict; if that isn't all of them, the rest are abandoned, and a TimeoutError is raised (which you can handle however you want—log it, do some quick fallback code, whatever).
And if the tasks really are independent (as they are in my stupid little example, but of course they may not be in your real code, at least not without a major redesign), you can parallelize the work for free just by removing that max_workers=1. Then, if you run it on an 8-core machine, it'll kick off 8 workers and give them each 1/8th of the work to do, and things will get done faster. (Usually not 8x as fast, but often 3-6x as fast, which is still pretty nice.)
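The timeout handling described above can be sketched as follows. This is only an illustration: slow_square and its delays are made up, and a ThreadPoolExecutor is swapped in for the ProcessPoolExecutor so the snippet runs without a __main__ guard; executor.map() accepts the timeout argument in the same way for both.

```python
import concurrent.futures
import time


def slow_square(delay):
    """Hypothetical task: sleep for `delay` seconds, then return a result."""
    time.sleep(delay)
    return delay, delay * delay


def collect_with_timeout(delays, timeout):
    """Gather as many results as arrive within `timeout` seconds.

    Returns (results, timed_out). Results are yielded in input order,
    so everything after the first late task is abandoned.
    """
    results = {}
    timed_out = False
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        try:
            for key, value in executor.map(slow_square, delays, timeout=timeout):
                results[key] = value
        except concurrent.futures.TimeoutError:
            timed_out = True  # keep the partial results gathered so far
    return results, timed_out


partial, timed_out = collect_with_timeout([0.01, 0.02, 1.0], timeout=0.4)
```

In this sketch, leaving the with block still waits for the task that is already running, so the call returns after roughly one second even though the timeout fired earlier.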

This seems like a better idea (sorry, I am not sure of the Python names of things yet):
import signal
def signal_handler(signum, frame):
    raise Exception("Timeout!")
signal.signal(signal.SIGALRM, signal_handler)
signal.alarm(3) # Three seconds
try:
    for x in range(0, 3):
        A(x) # Call function A(x)
except Exception as msg:
    print("Timeout!")
signal.alarm(0) # Reset

The comments are correct in that you should check inside. Here is a potential solution. Note that an asynchronous function (by using a thread for example) is different from this solution. This is synchronous which means it will still run in series.
import time
def someFunction():
    start = time.time()
    while (time.time() - start < 5):
        # do your normal function, one chunk at a time
        pass
    return
for x in range(0,3):
    someFunction()

multiprocessing in python using pool.map_async

Hi, I don't feel like I have quite understood multiprocessing in Python correctly.
I want to run a function called 'run_worker' (which is simply code that runs and manages a subprocess) 20 times in parallel and wait for all the functions to complete. Each run_worker should run on a separate core/thread. I don't mind what order the processes complete in, hence I used async, and I don't have a return value, so I used map.
I thought that I should use:
if __name__ == "__main__":
    num_workers = 20
    param_map = []
    for i in range(num_workers):
        param_map += [experiment_id]
    pool = mp.Pool(processes= num_workers)
    pool.map_async(run_worker, param_map)
    pool.close()
    pool.join()
However, this code exits straight away and doesn't appear to execute run_worker properly. Also, do I really have to create a param_map of the same experiment_id to pass to the worker? This seems like a hack to get the number of run_workers created. Ideally I would like to run a function with no parameters and no return value over multiple cores.
Note: I am using Windows Server 2019 in AWS.
Edit: added run_worker, which calls a subprocess that writes to a file:
def run_worker(experiment_id):
    hostname = socket.gethostname()
    experiment = conn.experiments(experiment_id).fetch()
    while experiment.progress.observation_count < experiment.observation_budget:
        suggestion = conn.experiments(experiment.id).suggestions().create()
        value = evaluate_model(suggestion.assignments)
        conn.experiments(experiment_id).observations().create(suggestion=suggestion.id,value=value,metadata=dict(hostname=hostname),)
        # Update the experiment object
        experiment = conn.experiments(experiment_id).fetch()

It seems that for this simple purpose you would be better off using pool.map instead of pool.map_async. They both run in parallel; however, pool.map blocks until all operations are finished (see also this question). pool.map_async is especially meant for situations like this:
result = pool.map_async(func, iterable)
while not result.ready():
    # do some work while map_async is running
    pass
# blocking call to get the result
out = result.get()
Regarding your question about the parameters, the fundamental idea of a map operation is to map the values of one list/array/iterable to a new list of values of the same size. As far as I can see in the docs, multiprocessing does not provide any method to run multiple functions without parameters.
If you would also share your run_worker function, that might help to get better answers to your question. That might also clear up why you would run a function without any arguments and return values using a map operation in the first place.
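One way to investigate why the original script appears to exit without doing any work is to call get() on the AsyncResult returned by map_async: get() blocks until the workers finish and re-raises any exception a worker hit, whereas close() followed by join() alone never surfaces it. A minimal sketch, assuming a hypothetical run_worker that fails for one input; multiprocessing.pool.ThreadPool is used here only because it offers the same map_async interface as mp.Pool and runs without a __main__ guard:

```python
from multiprocessing.pool import ThreadPool


def run_worker(experiment_id):
    """Hypothetical worker: fails for a negative id, otherwise returns None."""
    if experiment_id < 0:
        raise ValueError("bad experiment id: %d" % experiment_id)


def run_all(experiment_id, num_workers):
    """Run run_worker num_workers times and surface any worker error."""
    # One argument per call; the list length decides how many calls are made.
    params = [experiment_id] * num_workers
    pool = ThreadPool(processes=num_workers)
    try:
        result = pool.map_async(run_worker, params)
        result.get()  # blocks until done and re-raises a worker exception
    finally:
        pool.close()
        pool.join()


run_all(experiment_id=7, num_workers=4)   # completes without error

try:
    run_all(experiment_id=-1, num_workers=4)
    error_seen = None
except ValueError as exc:
    error_seen = str(exc)
```

The list multiplication also replaces the loop that builds param_map, although the function still has to accept one argument per call.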

How to appropriately use QThreadPool for multiple massive calculation tasks?

I am facing an issue that should be easy to resolve, yet I feel that I am groping in the dark. I was writing a simple framework which consists of the following classes:
First there is an Algorithm class which simply contains numerical procedures:
class Algorithm(object):
    .
    .
    .
    @staticmethod
    def calculate(parameters):
        #do stuff
        return result
Then, there is an item class which holds paths to files, utility and status information. A Worker class subclasses QRunnable:
class Worker(QRunnable):
    def __init__(self,item,*args,**kwargs):
        self.item = item
    def run(self,*args,**kwargs):
        result = Algorithms.calculate(items.parameter)
        item.result = result
And in a Manager class the processes are started
class Manager(object):
    def __init__(self,*args,**kwargs):
        self.pool = QThreadPool()
        self.pool.setMaxThreadCount(4)
        self.items = [item1,item2,...]
    def onEvent(self):
        for i in self.items:
            self.pooil.start(item.requestWoker()) #returns a worker
Now the problem: After doing this I notice 2 things:
The work is not done any faster (it is even a bit slower) than doing it with 1 thread
The items get assigned the same results! For example result A, which is the correct result for item A, gets assigned to all items!
I could not find much documentation on this, so where did I go wrong?
All the best
Twerp

One limitation of the most common implementation of Python (CPython) is that its interpreter uses a Global Interpreter Lock, which means that only one thread can be executing Python bytecode at a time. It is possible for multiple Python threads to be executing C-based Python subroutines simultaneously, and for multiple Python threads to be waiting on I/O simultaneously, but not for them to be executing Python code simultaneously. Because of that, it is common not to see any speedup when multithreading a CPU-bound Python program. Common workarounds are to spawn sub-processes instead of threads (since each sub-process will use its own copy of the Python interpreter, they won't interfere with each other), or to rewrite some or all of the Python program in another language that doesn't have this limitation.
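The sub-process workaround mentioned above could look roughly like the following in plain Python, outside of Qt. This is a sketch rather than a drop-in replacement for QThreadPool: count_primes is a made-up CPU-bound placeholder for the real calculation, and count_primes_parallel is a hypothetical helper name.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound placeholder task: count the primes below `limit`."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, max_workers=4):
    """Run count_primes for every limit in separate worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(count_primes, limits))


if __name__ == "__main__":
    # The guard is needed on platforms that start workers by re-importing
    # this module (for example Windows).
    print(count_primes_parallel([20000, 20000, 20000, 20000]))
```

Each task returns its result to the parent instead of writing it onto a shared object, which also avoids any question of which item a worker is updating.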

Multiprocessing errors in OS X with python2.7 on pre-El Capitan machines

The context for this is much, much too big for an SO question so the code below is an extremely simplified demonstration of the actual implementation.
Generally, I've written an extensive module for academic contexts that launches a subprocess at runtime to be used for event scheduling. When a script or program using this module closes on pre-El Capitan machines my efforts to join the child process fail, as do my last-ditch efforts to just kill the process; OS X gives a "Python unexpectedly quit" error and the orphaned process persists. I am very much a nub to multiprocessing, without a CS background; diagnosing this is beyond me.
If I am just too ignorant, I'm more than willing to go RTFM; specific directions welcome.
I'm pretty sure this example is coherent & representative, but, know that the actual project works flawlessly on El Capitan, works during runtime on everything else, but consistently crashes as described when quitting. I've tested it with absurd time-out values (30 sec+); always the same result.
One last note: I started this with python's default multiprocessing libraries, then switched to billiard as a dev friend suggested it might run smoother. To date, I've not experienced any difference.
UPDATE:
Had omitted the function that gives the @threaded decorator purpose; now present in code.
Generally, we have:
shared_queue = billiard.Queue() # or multiprocessing, have used both
class MainInstanceParent(object):
    def __init__(self):
        # ..typically init stuff..
        self.event_ob = EventClass(self) # gets a reference to parent
    def quit(self):
        try:
            self.event_ob.send("kkbai")
            started = time.time()
            while time.time() - started < 1: # or whatever
                self.event_ob.receive()
            if self.event_ob.event_p.is_alive():
                raise RuntimeError("Little bugger still kickin'")
        except RuntimeError:
            os.kill(self.event_ob.event_p.pid, SIGKILL)
class EventClass(object):
    def __init__(self, parent):
        # moar init stuff
        self.parent = parent
        self.pipe, child = Pipe()
        # single underscore: a __name used inside a class body would be name-mangled
        self.event_p = _event_process(child)
    def receive(self):
        self.pipe.poll()
        t = self.pipe.recv()
        if isinstance(t, Exception):
            raise t
        return t
    def send(self, deets):
        self.pipe.send(deets)
def threaded(func):
    def threaded_func(*args, **kwargs):
        p = billiard.Process(target=func, args=args, kwargs=kwargs)
        p.start()
        return p
    return threaded_func
@threaded
def _event_process(pipe):
    while True:
        if pipe.poll():
            inc = pipe.recv()
            # do stuff conditionally on what comes through
            if inc == "kkbai":
                return
            if inc == "meets complex condition to pass here":
                shared_queue.put("stuff inferred from inc")

Before exiting the main program, call multiprocessing.active_children() to see how many child processes are still running. This will also join the processes that have already quit.
If you need to signal the children that it's time to quit, create a multiprocessing.Event before starting the child processes. Give it a meaningful name like children_exit. The child processes should regularly call children_exit.is_set() to see if it is time for them to quit. In the main program you call children_exit.set() to signal the child processes.
Update:
Have a good look through the Programming guidelines in the multiprocessing documentation;
It is best to provide the abovementioned Event objects as argument to the target of the Process initializer for reasons mentioned in those guidelines.
If your code also needs to run on MS Windows, you have to jump through some extra hoops, since that OS doesn't do fork().
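The Event-based shutdown described above might be sketched like this. The names child_loop, poll_interval and max_iterations are hypothetical, and the sleep calls are placeholders for real work; the event is passed to the child as an argument, and the __main__ guard keeps the script usable on platforms without fork().

```python
import multiprocessing
import time


def child_loop(children_exit, poll_interval=0.05, max_iterations=1000):
    """Hypothetical child task: do a unit of work, then check the exit flag.

    Returns the number of work iterations completed before the flag was seen.
    """
    done = 0
    while done < max_iterations and not children_exit.is_set():
        time.sleep(poll_interval)  # placeholder for real work
        done += 1
    return done


def main():
    # Create the event before starting the children and pass it as an argument.
    children_exit = multiprocessing.Event()
    workers = [
        multiprocessing.Process(target=child_loop, args=(children_exit,))
        for _ in range(2)
    ]
    for w in workers:
        w.start()

    time.sleep(0.2)        # the main program does its own work here
    children_exit.set()    # tell the children it is time to quit
    for w in workers:
        w.join(timeout=5)  # give each child a chance to exit cleanly

    # active_children() also joins any children that have already finished.
    print("still running:", len(multiprocessing.active_children()))


if __name__ == "__main__":
    main()
```

Because the children poll the flag between units of work, how quickly they exit depends on how long one unit takes.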
Update 2:
On your PyEval_SaveThread error; could you modify your question to show the complete trace or alternatively could you post it somewhere?
Since multiprocessing uses threads internally, this is probably the culprit, unless you are also using threads somewhere.
If you also use threads note that GUI toolkits in general and tkinter in particular are not thread safe. Tkinter calls should therefore only be made from one thread!
How much work would it be to port your code to Python 3? If it is a bug in Python 2.7, it might be already fixed in the current (as of now) Python 3.5.1.

Is this an insane implementation of producer consumer type thing?

# file1.py
class _Producer(object):
    def __init__(self):
        self.chunksize = 6220800
        with open('/dev/zero') as f:
            self.thing = f.read(self.chunksize)
        self.n = 0
        self.start()
    def start(self):
        import subprocess
        import threading
        def produce():
            self._proc = subprocess.Popen(['producer_proc'], stdout=subprocess.PIPE)
            while True:
                self.thing = self._proc.stdout.read(self.chunksize)
                if len(self.thing) != self.chunksize:
                    msg = 'Expected {0} bytes. Read {1} bytes'.format(self.chunksize, len(self.thing))
                    raise Exception(msg)
                self.n += 1
        t = threading.Thread(target=produce)
        t.daemon = True
        t.start()
        self._thread = t
    def stop(self):
        if self._thread.is_alive():
            self._proc.terminate()
            self._thread.join(1)
producer = _Producer()
producer.start()
I have written some code more or less like the above design, and now I want to be able to consume the output of producer_proc in other files by going:
# some_other_file.py
import file1
my_thing = file1.producer.thing
Multiple other consumers might be grabbing a reference to file1.producer.thing, and they all need to read from the same producer_proc. And the producer_proc should never be blocked. Is this a sane implementation? Does the Python GIL make it thread safe, or do I need to reimplement using a Queue for getting data off the worker thread? Do consumers need to explicitly make a copy of the thing?
I guess I am trying to implement something like the Producer/Consumer pattern or the Observer pattern, but I'm not really clear on all the technical details of design patterns.
A single producer is constantly making things
Multiple consumers using things at arbitrary times
producer.thing should be replaced by a fresh thing as soon as the new one is available, most things will go unused but that's ok
It's OK for multiple consumers to read the same thing, or to read the same thing twice in succession. They only want to be sure they have got the most recent thing when asked for it, not some stale old thing.
A consumer should be able to keep using a thing as long as they have it in scope, even though the producer may have already overwritten his self.thing with a fresh new thing.

Given your (unusual!) requirements, your implementation seems correct. In particular,
If you're only updating one attribute, the Python GIL should be sufficient. Single bytecode instructions are atomic.
If you do anything more complex, add locking! It's basically harmless anyway - if you cared about performance or multicore scalability, you probably wouldn't be using Python!
In particular, be aware that self.thing and self.n in this code are updated in separate bytecode instructions. The GIL could be released/acquired between, so you can't get a consistent view of the two of them unless you add locking. If you're not going to do that, I'd suggest removing self.n as it's an "attractive nuisance" (easily misused) or at least adding a comment/docstring with this caveat.
Consumers don't need to make a copy. You're not ever mutating a particular object pointed to by self.thing (and couldn't with string objects; they're immutable) and Python is garbage-collected, so as long as a consumer grabbed a reference to it, it can keep accessing it without worrying too much about what other threads are doing. The worst that could happen is your program using a lot of memory from several generations of self.thing being kept alive.
I'm a bit curious where your requirements came from. In particular, that you don't care if a thing is never used or used many times.
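If a consistent view of both the thing and its counter is needed, the locking mentioned above could be sketched as follows. LatestValue, publish and snapshot are hypothetical names, and the byte strings stand in for the chunks read from producer_proc.

```python
import threading


class LatestValue:
    """Holds the most recent (n, thing) pair and hands it out consistently."""

    def __init__(self, thing=b""):
        self._lock = threading.Lock()
        self._thing = thing
        self._n = 0

    def publish(self, thing):
        # Update both attributes under the lock so readers never see
        # a new `thing` paired with an old `n`.
        with self._lock:
            self._thing = thing
            self._n += 1

    def snapshot(self):
        # Return both values together, taken while holding the same lock.
        with self._lock:
            return self._n, self._thing


latest = LatestValue()


def produce(count):
    for i in range(1, count + 1):
        latest.publish(b"chunk-%d" % i)


producer_thread = threading.Thread(target=produce, args=(1000,))
producer_thread.start()
# A consumer can take snapshots at any time; n and thing always match.
samples = [latest.snapshot() for _ in range(100)]
producer_thread.join()
final_n, final_thing = latest.snapshot()
```

The lock is held only for two assignments, so the producer is delayed very little; consumers keep whatever object they received even after a newer one is published.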
