How to appropriatly use QThreadPool for multiple massive calculation tasks?

How to appropriatly use QThreadPool for multiple massive calculation tasks? - python

I am facing an issue that should be easy to resolve, yet I do feel that I tap in the dark. I was writing a simple framework which consists of the following classes:
First there is an Algorithm class which simply contains numerical procedures:
class Algorithm(object):
.
.
.
#staticmethod
def calculate(parameters):
#do stuff
return result
Then, there is an item class which holds paths to files, utility and status information. A Worker class subclasses QRunnable:
class Worker(QRunnable):
def __init__(self,item,*args,**kwargs):
self.item = item
def run(self,*args,**kwargs):
result = Algorithms.calculate(items.parameter)
item.result = result
And in a Manager class the processes are started
class Manager(object):
def __init__(self,*args,**kwargs):
self.pool = QThreadPool()
self.pool.setMaxThreadCount(4)
self.items = [item1,item2,...]
def onEvent(self):
for i in self.items:
self.pooil.start(item.requestWoker()) #returns a worker
Now the problem: After doing this I notice 2 things:
The work is done not faster (even a bit slower) then doing it with 1 thread
The items get assigned the same results! For example result A, which is the correct result for item A, gets assigned to all items!
I could not find much docu on this, so where did I go wrong?
All the best
Twerp

One limitation of the most common implementation of Python (CPython) is that its parser uses a Global Interpreter Lock which means that only one thread can be parsing Python bytes at a time. It is possible for multiple Python threads to be executing C-based Python subroutines simultaneously, and for multiple Python threads to be waiting on I/O simultaneously, but not for them to be executing Python code simultaneously. Because of that, it is common not to see any speedup when multithreading a CPU-bound Python program. Common workaround are to spawn sub-processes instead of threads (since each sub-process will use its own copy of the Python interpreter, they won't interfere with each other), or to rewrite some or all of the Python program in another language that doesn't have this limitation.

Related

Call method on many objects in parallel

I wanted to use concurrency in Python for the first time. So I started reading a lot about Python concurreny (GIL, threads vs processes, multiprocessing vs concurrent.futures vs ...) and seen a lot of convoluted examples. Even in examples using the high level concurrent.futures library.
So I decided to just start trying stuff and was surprised with the very, very simple code I ended up with:
from concurrent.futures import ThreadPoolExecutor
class WebHostChecker(object):
def __init__(self, websites):
self.webhosts = []
for website in websites:
self.webhosts.append(WebHost(website))
def __iter__(self):
return iter(self.webhosts)
def check_all(self):
# sequential:
#for webhost in self:
# webhost.check()
# threaded:
with ThreadPoolExecutor(max_workers=10) as executor:
executor.map(lambda webhost: webhost.check(), self.webhosts)
class WebHost(object):
def __init__(self, hostname):
self.hostname = hostname
def check(self):
print("Checking {}".format(self.hostname))
self.check_dns() # only modifies internal state, i.e.: sets self.dns
self.check_http() # only modifies internal status, i.e.: sets self.http
Using the classes looks like this:
webhostchecker = WebHostChecker(["urla.com", "urlb.com"])
webhostchecker.check_all() # -> this calls .check() on all WebHost instances in parallel
The relevant multiprocessing/threading code is only 3 lines. I barely had to modify my existing code (which I hoped to be able to do when first starting to write the code for sequential execution, but started to doubt after reading the many examples online).
And... it works! :)
It perfectly distributes the IO-waiting among multiple threads and runs in less than 1/3 of the time of the original program.
So, now, my question(s):
What am I missing here?
Could I implement this differently? (Should I?)
Why are other examples so convoluted? (Although I must say I couldn't find an exact example doing a method call on multiple objects)
Will this code get me in trouble when I expand my program with features/code I cannot predict right now?
I think I already know of one potential problem and it would be nice if someone can confirm my reasoning: if WebHost.check() also becomes CPU bound I won't be able to swap ThreadPoolExecutor for ProcessPoolExecutor. Because every process will get cloned versions of the WebHost instances? And I would have to code something to sync those cloned instances back to the original?
Any insights/comments/remarks/improvements/... that can bring me to greater understanding will be much appreciated! :)

Ok, so I'll add my own first gotcha:
If webhost.check() raises an Exception, then the thread just ends and self.dns and/or self.http might NOT have been set. However, with the current code, you won't see the Exception, UNLESS you also access the executor.map() results! Leaving me wondering why some objects raised AttributeErrors after running check_all() :)
This can easily be fixed by just evaluating every result (which is always None, cause I'm not letting .check() return anything). You can do it after all threads have run or during. I choose to let Exceptions be raised during (ie: within the with statement), so the program stops at the first unexpected error:
def check_all(self):
with ThreadPoolExecutor(max_workers=10) as executor:
# this alone works, but does not raise any exceptions from the threads:
#executor.map(lambda webhost: webhost.check(), self.webhosts)
for i in executor.map(lambda webhost: webhost.check(), self.webhosts):
pass
I guess I could also use list(executor.map(lambda webhost: webhost.check(), self.webhosts)) but that would unnecessarily use up memory.

Multiprocessing errors in OS X with python2.7 on pre-El Capitan machines

The context for this is much, much too big for an SO question so the code below is a extremely simplified demonstration of the actual implementation.
Generally, I've written an extensive module for academic contexts that launches a subprocess at runtime to be used for event scheduling. When a script or program using this module closes on pre-El Capitan machines my efforts to join the child process fail, as do my last-ditch efforts to just kill the process; OS X gives a "Python unexpectedly quit" error and the the orphaned process persists. I am very much a nub to multiprocessing, without a CS background; diagnosing this is beyond me.
If I am just too ignorant, I'm more than willing to go RTFM; specific directions welcome.
I'm pretty sure this example is coherent & representative, but, know that the actual project works flawlessly on El Capitan, works during runtime on everything else, but consistently crashes as described when quitting. I've tested it with absurd time-out values (30 sec+); always the same result.
One last note: I started this with python's default multiprocessing libraries, then switched to billiard as a dev friend suggested it might run smoother. To date, I've not experienced any difference.
UPDATE:
Had omitted the function that gives the #threaded decorator purpose; now present in code.
Generally, we have:
shared_queue = billiard.Queue() # or multiprocessing, have used both
class MainInstanceParent(object):
def __init__(self):
# ..typically init stuff..
self.event_ob = EventClass(self) # gets a reference to parent
def quit():
try:
self.event_ob.send("kkbai")
started = time.time()
while time.time - started < 1: # or whatever
self.event_ob.recieve()
if self.event_ob.event_p.is_alive():
raise RuntimeError("Little bugger still kickin'")
except RuntimeError:
os.kill(self.event_on.event_p.pid, SIGKILL)
class EventClass(object):
def __init__(self, parent):
# moar init stuff
self.parent = parent
self.pipe, child = Pipe()
self.event_p = __event_process(child)
def receive():
self.pipe.poll()
t = self.pipe.recv()
if isinstance(t, Exception):
raise t
return t
def send(deets):
self.pipe.send(deets)
def threaded(func):
def threaded_func(*args, **kwargs):
p = billiard.Process(target=func, args=args, kwargs=kwargs)
p.start()
return p
return threaded_func
#threaded
def __event_process(pipe):
while True:
if pipe.poll():
inc = pipe.recv()
# do stuff conditionally on what comes through
if inc == "kkbai":
return
if inc == "meets complex condition to pass here":
shared_queue.put("stuff inferred from inc")

Before exiting the main program, call multiprocessing.active_children() to see how many child processes are still running. This will also join the processes that have already quit.
If you would need to signal the children that it's time to quit, create a multiprocessing.Event before starting the child processes. Give it a meaningful name like children_exit. The child processes should regularly call children_exit.is_set() to see if it is time for them to quit. In the main program you call children_exit.set() to signal the child processes.
Update:
Have a good look through the Programming guidelines in the multiprocessing documentation;
It is best to provide the abovementioned Event objects as argument to the target of the Process initializer for reasons mentioned in those guidelines.
If your code also needs to run on ms-windows, you have to jump through some extra hoop, since that OS doesn't do fork().
Update 2:
On your PyEval_SaveThread error; could you modify your question to show the complete trace or alternatively could you post it somewhere?
Since multiprocessing uses threads internally, this is probably the culprit, unless you are also using threads somewhere.
If you also use threads note that GUI toolkits in general and tkinter in particular are not thread safe. Tkinter calls should therefore only be made from one thread!
How much work would it be to port your code to Python 3? If it is a bug in Python 2.7, it might be already fixed in the current (as of now) Python 3.5.1.

Shared pool map between processes with object-oriented python

(python2.7)
I'm trying to do a kind of scanner, that has to walk through CFG nodes, and split in different processes on branching for parallelism purpose.
The scanner is represented by an object of class Scanner. This class has one method traverse that walks through the said graph and splits if necessary.
Here how it looks:
class Scanner(object):
def __init__(self, atrb1, ...):
self.attribute1 = atrb1
self.process_pool = Pool(processes=4)
def traverse(self, ...):
[...]
if branch:
self.process_pool.map(my_func, todo_list).
My problem is the following:
How do I create a instance of multiprocessing.Pool, that is shared between all of my processes ? I want it to be shared, because since a path can be splitted again, I do not want to end with a kind of fork bomb, and having the same Pool will help me to limit the number of processes running at the same time.
The above code does not work, since Pool can not be pickled. In consequence, I have tried that:
class Scanner(object):
def __getstate__(self):
self_dict = self.__dict__.copy()
def self_dict['process_pool']
return self_dict
[...]
But obviously, it results in having self.process_pool not defined in the created processes.
Then, I tried to create a Pool as a module attribute:
process_pool = Pool(processes=4)
def my_func(x):
[...]
class Scanner(object):
def __init__(self, atrb1, ...):
self.attribute1 = atrb1
def traverse(self, ...):
[...]
if branch:
process_pool.map(my_func, todo_list)
It does not work, and this answer explains why.
But here comes the thing, wherever I create my Pool, something is missing. If I create this Pool at the end of my file, it does not see self.attribute1, the same way it did not see answer and fails with an AttributeError.
I'm not even trying to share it yet, and I'm already stuck with Multiprocessing way of doing thing.
I don't know if I have not been thinking correctly the whole thing, but I can not believe it's so complicated to handle something as simple as "having a worker pool and giving them tasks".
Thank you,
EDIT:
I resolved my first problem (AttributeError), my class had a callback as its attribute, and this callback was defined in the main script file, after the import of the scanner module... But the concurrency and "do not fork bomb" thing is still a problem.

What you want to do can't be done safely. Think about if you somehow had a single shared Pool shared across parent and worker processes, with, say, two worker processes. The parent runs a map that tries to perform two tasks, and each task needs to map two more tasks. The two parent dispatched tasks go to each worker, and the parent blocks. Each worker sends two more tasks to the shared pool and blocks for them to complete. But now all workers are now occupied, waiting for a worker to become free; you've deadlocked.
A safer approach would be to have the workers return enough information to dispatch additional tasks in the parent. Then you could do something like:
class MoreWork(object):
def __init__(self, func, *args):
self.func = func
self.args = args
pool = multiprocessing.Pool()
try:
base_task = somefunc, someargs
outstanding = collections.deque([pool.apply_async(*base_task)])
while outstanding:
result = outstanding.popleft().get()
if isinstance(result, MoreWork):
outstanding.append(pool.apply_async(result.func, result.args))
else:
... do something with a "final" result, maybe breaking the loop ...
finally:
pool.terminate()
What the functions are is up to you, they'd just return information in a MoreWork when there was more to do, not launch a task directly. The point is to ensure that by having the parent be solely responsible for task dispatch, and the workers solely responsible for task completion, you can't deadlock due to all workers being blocked waiting for tasks that are in the queue, but not being processed.
This is also not at all optimized; ideally, you wouldn't block waiting on the first item in the queue if other items in the queue were complete; it's a lot easier to do this with the concurrent.futures module, specifically with concurrent.futures.wait to wait on the first available result from an arbitrary number of outstanding tasks, but you'd need a third party PyPI package to get concurrent.futures on Python 2.7.

A couple of questions about calling methods outside of QThread from within the QThread - is my design flawed?

I have an application that has a GUI thread and many different worker threads. In this application, I have a functions.py module, which contains a lot of different "utility" functions that are used all over the application.
Yesterday the application has been released and some users (a minority, but still) has reported problems with the application crashing. I looked over my code and noticed a possible design flaw, and would like to check with the lovely people of SO and see if I am right and if this is indeed a flaw.
Suppose I have this defined in my functions.py module:
class Functions:
solveComputationSignal = Signal(str)
updateStatusSignal = Signal(int, str)
text = None
#classmethod
def setResultText(self, text):
self.text = text
#classmethod
def solveComputation(cls, platform, computation, param=None):
#Not the entirety of the method is listed here
result = urllib.urlopen(COMPUTATION_URL).read()
if param is None:
cls.solveComputationSignal.emit(result)
else:
cls.solveAlternateComputation(platform, computation)
while not self.text:
time.sleep(3)
return self.text if self.text else False
#classmethod
def updateCurrentStatus(cls, platform, statusText):
cls.updateStatusSignal.emit(platform, statusText)
I think these methods in themselves are fine. The two signals defined here are connected to in the GUI thread. The first signal pops-up a dialog in which the computation is presented. The GUI thread calls the setResultText() method and sets the resulting string as entered by the user (if anyone knows of a better way to wait until the user has inputted the text other than sleeping and waiting for self.text to become True, please let me know). The solveAlternateComputation is another method in the same class that solves the computation automatically, however, it too calls the setResultText() method that sets the resulting text.
The second signal updates the statusBar text of the main GUI as well.
What's worse is that I think the above design, while perhaps flawed, is not the problem.
The problem lies, I believe, in the way I call these methods, whihch is from the worker threads (note that I have multiple similar workers, all of which are different "platforms")
Assume I have this (and I do):
class WorkerPlatform1(QThread):
#Init and other methods are here
def run(self):
#Thread does its job here, but then when it needs to present the
#computation, instead of emitting a signal, this is what I do
self.f = functions.Functions
result = self.f.solveComputation(platform, computation)
if result:
#Go on with the task
else:
self.f.updateCurrentStatus(platform, "Error grabbing computation!")
In this case I think that my flaw is that the thread itself is not emitting any signals, but rather calling callables residing outside of that thread directly. Am I right in thinking that this could cause my application to crash? Although the faulty module is reported as QtGui4.dll
One more thing: both of these methods in the Functions class are accessed by many threads almost simultaneously. Is this even advisable - have methods residing outside of a thread be accessed by many threads all at the same time? Can it so happen that I "confuse" my program? The reason I am asking is because people who say that the application is not crashing report that, very often, the solveComputation() returns the incorrect text - not all the time, but very often. Since that COMPUTATION_URL's server can take some time to respond (even 10+ seconds), is it possible that, once a thread calls that method, while the urllib library is still waiting for server response, in that time another thread can call it, causing it to use a different COMPUTATION_URL, which will result in it returning an incorrect value on some cases?
Finally, I am thinking of solutions: for my first (crashing) problem, do you think the proper solution would be to directly emit a Signal from the thread itself, and then connect it in the GUI thread? Is that the right way to go about it?
Secondly, for the solveComputation returning incorrect values, would I solve it by moving that method (and accompanying methods) to every Worker class? then I could call them directly and hopefully have the correct response - or, dozens of different responses (since I have that many threads) - for every thread?
Thank you all and I apologize for the wall of text.
EDIT: I would like to add that when running in console with some users, this error appears QObject: Cannot create children for a parent that is in a different thread.
(Parent is QLabel(0x4795500), parent's thread is QThread(0x2d3fd90), current thread is WordpressCreator(0x49f0548)

Your design is flawed if you really are using your Functions class like this with classmethods storing results on class attributes, being shared amongst multiple workers. It should be using all instance methods, and each thread should be using an instance of this class:
class Functions(QObject):
solveComputationSignal = pyqtSignal(str)
updateStatusSignal = pyqtSignal(int, str)
def __init__(self, parent=None):
super(Functions, self).__init__(parent)
self.text = ""
def setResultText(self, text):
self.text = text
def solveComputation(self, platform, computation, param=None):
result = urllib.urlopen(COMPUTATION_URL).read()
if param is None:
self.solveComputationSignal.emit(result)
else:
self.solveAlternateComputation(platform, computation)
while not self.text:
time.sleep(3)
return self.text if self.text else False
def updateCurrentStatus(self, platform, statusText):
self.updateStatusSignal.emit(platform, statusText)
# worker_A
def run(self):
...
f = Functions()
# worker_B
def run(self):
...
f = Functions()
Also, for doing your urlopen, instead of doing sleeps to check for when it is ready, you can make use of the QNetworkAccessManager to make your requests and use signals to be notified when results are ready.

Is this an insane implementation of producer consumer type thing?

# file1.py
class _Producer(self):
def __init__(self):
self.chunksize = 6220800
with open('/dev/zero') as f:
self.thing = f.read(self.chunksize)
self.n = 0
self.start()
def start(self):
import subprocess
import threading
def produce():
self._proc = subprocess.Popen(['producer_proc'], stdout=subprocess.PIPE)
while True:
self.thing = self._proc.stdout.read(self.chunksize)
if len(self.thing) != self.chunksize:
msg = 'Expected {0} bytes. Read {1} bytes'.format(self.chunksize, len(self.thing))
raise Exception(msg)
self.n += 1
t = threading.Thread(target=produce)
t.daemon = True
t.start()
self._thread = t
def stop(self):
if self._thread.is_alive():
self._proc.terminate()
self._thread.join(1)
producer = _Producer()
producer.start()
I have written some code more or less like the above design, and now I want to be able to consume the output of producer_proc in other files by going:
# some_other_file.py
import file1
my_thing = file1.producer.thing
Multiple other consumers might be grabbing a reference to file.producer.thing, they all need to use from the same producer_proc. And the producer_proc should never be blocked. Is this a sane implementation? Does the python GIL make it thread safe, or do I need to reimplement using a Queue for getting data of the worker thread? Do consumers need to explicitly make a copy of the thing?
I guess am trying to implement something like Producer/Consumer pattern or Observer pattern, but I'm not really clear on all the technical details of design patterns.
A single producer is constantly making things
Multiple consumers using things at arbitrary times
producer.thing should be replaced by a fresh thing as soon as the new one is available, most things will go unused but that's ok
It's OK for multiple consumers to read the same thing, or to read the same thing twice in succession. They only want to be sure they have got the most recent thing when asked for it, not some stale old thing.
A consumer should be able to keep using a thing as long as they have it in scope, even though the producer may have already overwritten his self.thing with a fresh new thing.

Given your (unusual!) requirements, your implementation seems correct. In particular,
If you're only updating one attribute, the Python GIL should be sufficient. Single bytecode instructions are atomic.
If you do anything more complex, add locking! It's basically harmless anyway - if you cared about performance or multicore scalability, you probably wouldn't be using Python!
In particular, be aware that self.thing and self.n in this code are updated in a separate bytecode instructions. The GIL could be released/acquired between, so you can't get a consistent view of the two of them unless you add locking. If you're not going to do that, I'd suggest removing self.n as it's an "attractive nuisance" (easily misused) or at least adding a comment/docstring with this caveat.
Consumers don't need to make a copy. You're not ever mutating a particular object pointed to by self.thing (and couldn't with string objects; they're immutable) and Python is garbage-collected, so as long as a consumer grabbed a reference to it, it can keep accessing it without worrying too much about what other threads are doing. The worst that could happen is your program using a lot of memory from several generations of self.thing being kept alive.
I'm a bit curious where your requirements came from. In particular, that you don't care if a thing is never used or used many times.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.