Best way to output concurrent task results in shell on the go - python

This is maybe a very basic question, and I can think of solutions, but I wondered if there is a more elegant one I don't know of (quick googling didn't bring up anything useful).
I wrote a script to communicate with a remote device. However, now that I have more than one device of that type, I thought I would run the communication in concurrent futures and handle them all simultaneously:
import concurrent.futures
from itertools import repeat

with concurrent.futures.ThreadPoolExecutor(20) as executor:
    executor.map(device_ctl, ids, repeat(args))
So it spawns up to 20 threads of device_ctl with the respective IDs and the same args. device_ctl prints some results, but since the threads all run in parallel, the output gets mixed up and looks messy. Ideally I would have one line per ID that shows the current state of the communication and gets updated once it changes, e.g.:
Dev1 Connecting...
Dev2 Connected! Status: Idle
Dev3 Connected! Status: Updating
However, I don't really know how to solve this nicely. I can think of a status list that gets assembled into one status string outside of the threads and reprinted frequently, but it feels like there could be a simpler method. Ideas?

Since there was no good answer, I made my own solution, which is quite compact but effective. I define a class that I access globally; each thread populates it or updates a value based on its ID, which is meant to be the same list entry that was handed to the thread. Here is a simple example of how to use it:
class collect:
    # Shared state: one entry in outs per known ID.
    ids = []
    outs = []
    LINE_UP = "\033[1A"     # ANSI: move the cursor up one line
    LINE_CLEAR = "\x1b[2K"  # ANSI: clear the current line
    printed = 0

    def init(id_list):
        # Register all IDs up front so the output lines keep a stable order.
        collect.ids = [i for i in id_list]
        collect.outs = ["" for i in id_list]
        collect.printall()

    def write(id, out):
        # Set (or create) the status for this ID without refreshing the display.
        if id not in collect.ids:
            collect.ids.append(id)
            collect.outs.append(out)
        else:
            collect.outs[collect.ids.index(id)] = out

    def writeout(id, out):
        # Set the status for this ID and refresh the display.
        if id not in collect.ids:
            collect.ids.append(str(id))
            collect.outs.append(str(out))
        else:
            collect.outs[collect.ids.index(id)] = str(out)
        collect.printall()

    def append(id, out):
        # Append to the status for this ID without refreshing the display.
        if id not in collect.ids:
            collect.ids.append(str(id))
            collect.outs.append(str(out))
        else:
            collect.outs[collect.ids.index(id)] += str(out)

    def appendout(id, out):
        # Append to the status for this ID and refresh the display.
        if id not in collect.ids:
            collect.ids.append(id)
            collect.outs.append(out)
        else:
            collect.outs[collect.ids.index(id)] += str(out)
        collect.printall()

    def read(id):
        return collect.outs[collect.ids.index(str(id))]

    def readall():
        return collect.outs, "\n".join(collect.outs)

    def printall(filter=""):
        # Move the cursor back up over the previous block, then overwrite it;
        # the trailing spaces pad over leftovers from longer earlier lines.
        if collect.printed > 0:
            print(collect.LINE_CLEAR + collect.LINE_UP * len(collect.ids), end="")
        print(
            "\n".join(
                [
                    collect.ids[i] + "\t" + collect.outs[i] + " " * 30
                    for i in range(len(collect.outs))
                    if filter in collect.ids[i]
                ]
            )
        )
        collect.printed = len(collect.ids)
def device_ctl(id, args):
    collect.writeout(id, "Connecting...")
    if args.connected:
        collect.writeout(id, "Connected")

collect.init(ids)
with concurrent.futures.ThreadPoolExecutor(20) as executor:
    executor.map(device_ctl, ids, repeat(args))
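One caveat with this solution: the threads mutate the shared ids/outs lists and write to the terminal concurrently, so two calls can interleave inside printall. A minimal sketch of how a lock could serialize the updates (the _print_lock name and the wrapper are my own additions, not part of the original class):

import threading

_print_lock = threading.Lock()  # hypothetical addition: guards ids/outs and the terminal

def writeout_locked(id, out):
    # Same behaviour as collect.writeout, but only one thread may
    # update the lists and redraw the screen at a time.
    with _print_lock:
        collect.writeout(id, out)

device_ctl would then call writeout_locked instead of collect.writeout.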

Related

multiprocessing a function with parameters that are iterated through

I'm trying to improve the speed of my program, and I decided to use multiprocessing!
The problem is I can't seem to find any way to use the pool function (I think this is what I need) with my function.
Here is the code that I am dealing with:
import os
import json

def dataLoading(output):
    name = ""
    link = ""
    upCheck = ""
    isSuccess = ""
    for i in os.listdir():
        with open(i) as currentFile:
            data = json.loads(currentFile.read())
            try:
                name = data["name"]
                link = data["link"]
                upCheck = data["upCheck"]
                isSuccess = data["isSuccess"]
            except KeyError:
                print("error in loading data from config: improper naming or formatting used")
            output[name] = [link, upCheck, isSuccess]
#working
import requests  # `headers` is assumed to be defined elsewhere in the program

def userCheck(link, user, isSuccess):
    link = link.replace("<USERNAME>", user)
    isSuccess = isSuccess.replace("<USERNAME>", user)
    html = requests.get(link, headers=headers)
    page_source = html.text
    count = page_source.count(isSuccess)
    return count > 0
I have a parent function that runs these two together, but I don't think I need to show the whole thing, just the part that collects the data iteratively:
for i in configData:
    data = configData[i]
    link = data[0]
    print(link)
    upCheck = data[1]  # just for future use
    isSuccess = data[2]
    if userCheck(link, username, isSuccess):
        good.append(i)
You can see how I feed all of the data in there. How would I be able to use multiprocessing to do this when I am iterating through the dictionary to collect multiple parameters?
I like to use mp.Pool().map. I think it is the easiest and most straightforward approach and handles most multiprocessing cases. So how does map work? To start, keep in mind that mp creates worker processes, and each worker receives a copy of the namespace (yes, the whole thing). Each worker then works on what it is assigned and returns. Hence, doing something like "updating a global variable" while they work doesn't work: each worker gets its own copy of the global variable, and the workers never communicate with each other. (If you want communicating workers, you need to use mp.Queue and the like; it gets complicated.) Anyway, here is map in action:
from multiprocessing import Pool

t = 'abcd'

def func(s):
    return t[int(s)]

results = Pool().map(func, range(4))
Each worker receives a copy of t and func, plus the portion of range(4) it is assigned. The workers are then tracked automatically, and everything is cleaned up in the end by Pool.
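To make the copied-namespace point concrete, here is a small illustration of my own (not from the original answer): mutating a global inside the workers leaves the parent's copy untouched.

from multiprocessing import Pool

counter = 0

def bump(_):
    global counter
    counter += 1  # increments this worker's private copy only
    return counter

if __name__ == '__main__':
    print(Pool(4).map(bump, range(4)))  # each worker counted on its own copy
    print(counter)                      # still 0 in the parent process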
Something like your dataLoading won't work very well as-is; we need to modify it. I also cleaned up the code a little:
def loadfromfile(file):
    data = json.loads(open(file).read())
    items = [data.get(k, "") for k in ['name', 'link', 'upCheck', 'isSuccess']]
    return items[0], items[1:]

output = dict(Pool().map(loadfromfile, os.listdir()))
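The same idea extends to the userCheck loop from the question. A sketch under the question's names (configData, username and good as defined there; untested):

from multiprocessing import Pool

if __name__ == '__main__':
    # One argument tuple per site, fanned out with starmap.
    tasks = [(configData[i][0], username, configData[i][2]) for i in configData]
    with Pool() as pool:
        flags = pool.starmap(userCheck, tasks)
    good = [name for name, ok in zip(configData, flags) if ok]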

Python: How to return values in a multiprocessing situation?

Let's say I have a collection of Process-es, a[0] through a[m].
These processes will then send a job, via a queue, to another collection of Process-es, b[0] through b[n], where m > n
Or, to diagram:
a[0], a[1], ..., a[m] ---Queue---> b[0], b[1], ..., b[n]
Now, how do I return the result of the b processes to the relevant a process?
My first guess was using multiprocessing.Pipe()
So, I've tried doing the following:
## On the 'a' side
pipe = multiprocessing.Pipe()
job['pipe'] = pipe
queue.put(job)
rslt = pipe[0].recv()
## On the 'b' side
job = queue.get()
... process the job ...
pipe = job['pipe']
pipe.send(result)
and it doesn't work with the error: Required argument 'handle' (pos 1) not found
Reading many docs, I came up with:
## On the 'a' side
pipe = multiprocessing.Pipe()
job['pipe'] = multiprocessing.reduction.reduce_connection(pipe[1])
queue.put(job)
rslt = pipe[0].recv()
## On the 'b' side
job = queue.get()
... process the job ...
pipe = multiprocessing.reduction.rebuild_connection(job['pipe'], True, True)
pipe.send(result)
Now I get a different error: ValueError: need more than 2 values to unpack.
I've tried searching and searching and still can't find how to properly use the reduce_ and rebuild_ methods.
Please help so I can return the value from b to a.
I would recommend avoiding this shuffling around of Pipes and file descriptors (the last time I tried, it was not very standard and not well documented). Having to deal with it was a pain; I do not recommend it :-/
I would suggest a different approach: let the main process manage the connections. Keep a work queue, but send the responses along a different path. This means that you need some kind of identifier for the processes. I will provide a toy implementation to illustrate my proposal:
#!/usr/bin/env python
import multiprocessing
import random

def fib(n):
    "Slow fibonacci implementation because why not"
    if n < 2:
        return n
    return fib(n-2) + fib(n-1)

def process_b(queue_in, queue_out):
    print "Starting process B"
    while True:
        j = queue_in.get()
        print "Job: %d" % j["val"]
        j["result"] = fib(j["val"])
        queue_out.put(j)

def process_a(index, pipe_end, queue):
    print "Starting process A"
    value = random.randint(5, 50)
    j = {
        "a_id": index,
        "val": value,
    }
    queue.put(j)
    r = pipe_end.recv()
    print "Process A sent value %d and received: %s" % (value, r)

def main():
    print "Starting main"
    a_pipes = list()
    jobs = multiprocessing.Queue()
    done_jobs = multiprocessing.Queue()
    for i in range(5):
        multiprocessing.Process(target=process_b, args=(jobs, done_jobs,)).start()
    for i in range(10):
        receiver, sender = multiprocessing.Pipe(duplex=False)
        a_pipes.append(sender)
        multiprocessing.Process(target=process_a, args=(i, receiver, jobs)).start()
    while True:
        j = done_jobs.get()
        a_pipes[j["a_id"]].send(j["result"])

if __name__ == "__main__":
    main()
Note that the queue of jobs is connected directly between the a and b processes. Each a process is responsible for putting its identifier into the job (which the "master" should know about). The b processes use a different queue for finished work. I used the same job dictionary, but a typical implementation should use a more tailored data structure. The response should carry the identifier of the a process so that the master can send the result to that specific process.
I assume that there is some way to make it work with your approach, which I don't dislike at all (it would have been my first approach too). But having to deal with file descriptors and the reduce_ and rebuild_ methods is not nice. Not at all.
So, as @MariusSiuram explained in this post, trying to pass a Connection object is an exercise in frustration.
I finally resorted to using a DictProxy to return values from B to A.
This is the concept:
### This is in the main process
...
jobs_queue = multiprocessing.Queue()
manager = multiprocessing.Manager()
ret_dict = manager.dict()
...
# Somewhere during Process initialization, jobs_queue and ret_dict got passed to
# the workers' constructor
...

### This is in the "A" (left-side) workers
...
self.ret_dict.pop(self.pid, None)  # Remove our identifier if it exists
self.jobs_queue.put({
    'request': parameters_to_be_used_by_B,
    'requester': self.pid
})
while self.pid not in self.ret_dict:
    time.sleep(0.1)  # Or any sane value
result = self.ret_dict[self.pid]
...

### This is in the "B" (right-side) workers
...
while True:
    job = self.jobs_queue.get()
    if job is None:
        break
    result = self.do_something(job['request'])
    self.ret_dict[job['requester']] = result
...
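For reference, here is a minimal, runnable sketch of the same pattern; every name in it is an illustrative stand-in (the squaring stands in for real work), not code from the original post.

import multiprocessing
import time

def worker_b(jobs_queue, ret_dict):
    # Consume jobs until a None sentinel arrives; publish results by requester id.
    while True:
        job = jobs_queue.get()
        if job is None:
            break
        ret_dict[job['requester']] = job['request'] ** 2

def worker_a(wid, jobs_queue, ret_dict):
    # Submit one job, then poll the shared dict for our result.
    jobs_queue.put({'request': wid + 10, 'requester': wid})
    while wid not in ret_dict:
        time.sleep(0.1)
    print('A%d got %d' % (wid, ret_dict[wid]))

if __name__ == '__main__':
    manager = multiprocessing.Manager()
    ret_dict = manager.dict()
    jobs_queue = multiprocessing.Queue()
    b_procs = [multiprocessing.Process(target=worker_b, args=(jobs_queue, ret_dict))
               for _ in range(2)]
    a_procs = [multiprocessing.Process(target=worker_a, args=(i, jobs_queue, ret_dict))
               for i in range(4)]
    for p in b_procs + a_procs:
        p.start()
    for p in a_procs:
        p.join()
    for _ in b_procs:
        jobs_queue.put(None)  # one sentinel per B worker
    for p in b_procs:
        p.join()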

Pyqt and general python, can this be considered a correct approach for coding?

I have a dialog window containing check-boxes; when each of them is checked, a particular class needs to be instantiated and a task run on a separate thread (one for each check-box). I have 14 check-boxes whose .isChecked() property needs testing, and clearly checking the returned Boolean for each of them one by one is not efficient and requires a lot more code.
Hence I decided to get all the children items corresponding to check-box elements, take just those that are checked, append their names to a list, and loop through them, matching each name against a dictionary whose keys are the check-box names and whose values are the corresponding classes to instantiate.
EXAMPLE:
# class dictionary
self.summary_runnables = {'dupStreetCheckBox': [DupStreetDesc(), 0],
                          'notStreetEsuCheckBox': [StreetsNoEsuDesc(), 1],
                          'notType3CheckBox': [Type3Desc(False), 2],
                          'incFootPathCheckBox': [Type3Desc(True), 2],
                          'dupEsuRefCheckBox': [DupEsuRef(True), 3],
                          'notEsuStreetCheckBox': [NoLinkEsuStreets(), 4],
                          'invCrossRefCheckBox': [InvalidCrossReferences()],
                          'startEndCheckBox': [CheckStartEnd(tol=10), 8],
                          'tinyEsuCheckBox': [CheckTinyEsus("esu", 1)],
                          'notMaintReinsCheckBox': [CheckMaintReins()],
                          'asdStartEndCheckBox': [CheckAsdCoords()],
                          'notMaintPolysCheckBox': [MaintNoPoly(), 16],
                          'notPolysMaintCheckBox': [PolyNoMaint()],
                          'tinyPolysCheckBox': [CheckTinyEsus("rd_poly", 1)]}
# looping through list
self.long_task = QThreadPool(None).globalInstance()
self.long_task.setMaxThreadCount(1)
start_report = StartReport(val_file_path)
end_report = EndReport()
# start_report.setAutoDelete(False)
# end_report.setAutoDelete(False)
end_report.signals.result.connect(self.log_progress)
end_report.signals.finished.connect(self.show_finished)
# end_report.setAutoDelete(False)
start_report.signals.result.connect(self.log_progress)
self.long_task.start(start_report)
# print str(self.check_boxes_names)
for check_box_name in self.check_boxes_names:
    run_class = self.summary_runnables[check_box_name]
    if run_class[0].__class__.__name__ == 'CheckStartEnd':  # '==', not 'is': identity comparison on strings is unreliable
        run_class[0].tolerance = tolerance
    runnable = run_class[0]  # the dictionary already holds instances
    runnable.signals.result.connect(self.log_progress)
    self.long_task.start(runnable)
self.long_task.start(end_report)
Here is an example of a runnable (even if some of them use different global functions).
I can't post the global functions that write content to the file, as there are too many and not all 14 tasks execute the same type of function. The arguments of these functions are int keys into other dictionaries that contain the report's static content and the SQL queries that return the report's main dynamic content.
class StartReport(QRunnable):
    def __init__(self, file_path):
        super(StartReport, self).__init__()
        # open the db connection in thread
        db.open()
        self.signals = GeneralSignals()
        # self.simple_signal = SimpleSignal()
        # print self.signals.result
        self.file_path = file_path
        self.task = "Starting Report"
        self.progress = 1
        self.org_name = org_name
        self.user = user
        self.report_title = "Validation Report"
        print "instantiation of start report "

    def run(self):
        self.signals.result.emit(self.task, self.progress)
        if self.file_path is None:
            print "I started and found file none "
            return
        else:
            global report_file
            # create the file and print the header
            report_file = open(self.file_path, 'wb')
            report_file.write(str(self.report_title) + ' for {0} \n'.format(self.org_name))
            report_file.write('Created on : {0} at {1} By : {2} \n'.format(datetime.today().strftime("%d/%m/%Y"),
                                                                           datetime.now().strftime("%H:%M"),
                                                                           str(self.user)))
            report_file.write(
                "------------------------------------------------------------------------------------------ \n \n \n \n")
            report_file.flush()
            os.fsync(report_file.fileno())

class EndReport(QRunnable):
    def __init__(self):
        super(EndReport, self).__init__()
        self.signals = GeneralSignals()
        self.task = "Finishing report"
        self.progress = 100

    def run(self):
        self.signals.result.emit(self.task, self.progress)
        if report_file is not None:
            # write footer and close file
            report_file.write("\n \n \n")
            report_file.write("---------- End of Report -----------")
            report_file.flush()
            os.fsync(report_file.fileno())
            report_file.close()
            self.signals.finished.emit()
            # TODO: checking whether opening a db connection in thread might affect the db on the GUI
            # if db.isOpen():
            #     db.close()
        else:
            return

class DupStreetDesc(QRunnable):
    """
    duplicate street description report section creation
    :return: void if the report is to text
             list[string] if the report is to screen
    """
    def __init__(self):
        super(DupStreetDesc, self).__init__()
        self.signals = GeneralSignals()
        self.task = "Checking duplicate street descriptions..."
        self.progress = 16.6

    def run(self):
        self.signals.result.emit(self.task, self.progress)
        if report_file is None:
            print "report file is none "
            # items_list = write_content(0, 0, 0, 0)
            # for item in items_list:
            #     self.signals.list.emit(item)
        else:
            write_content(0, 0, 0, 0)
Now, I used this approach before and it has always worked fine without multiprocessing. In this case it works to some extent: I can run the tasks the first time, but if I try to run them a second time I get the following Python error:
self.long_task.start(run_class[0])
RuntimeError: wrapped C/C++ object of type DupStreetDesc has been deleted
I tried to use run_class[0].setAutoDelete(False) before running them in the loop, but PyQt crashes with a minidump error (I am running the code in QGIS), and the program exits with little chance to understand what happened.
On the other hand, if I run my classes separately, checking each check-box with an if/else statement, then it works fine: I can run the tasks again and the C++ objects are not deleted. But it isn't a nice coding approach, at least from my very little experience.
Is there anyone out there who can advise a different approach to make this run smoothly without too many lines of code? Or who knows whether there is a more efficient pattern to handle this problem, which I think must be quite common?
It seems that you should create a new instance of each runnable, and allow Qt to automatically delete it. So your dictionary entries could look like this:
'dupStreetCheckBox': [lambda: DupStreetDesc(), 0],
and then you can do:
for check_box_name in self.check_boxes_names:
    run_class = self.summary_runnables[check_box_name]
    runnable = run_class[0]()
    runnable.signals.result.connect(self.log_progress)
    self.long_task.start(runnable)
I don't know why setAutoDelete does not work (assuming you are calling it before starting the threadpool). I suppose there might be a bug, but it's impossible to be sure without having a fully-working example to test.
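For completeness, the question's whole mapping could be rewritten with zero-argument factories so that every run constructs fresh instances for Qt to delete. A sketch using a few of the question's entries (the rest follow the same pattern):

self.summary_runnables = {'dupStreetCheckBox': [lambda: DupStreetDesc(), 0],
                          'notType3CheckBox': [lambda: Type3Desc(False), 2],
                          'incFootPathCheckBox': [lambda: Type3Desc(True), 2],
                          'startEndCheckBox': [lambda: CheckStartEnd(tol=10), 8]}

The tolerance override can then be baked into the CheckStartEnd factory instead of being patched onto a shared instance inside the loop.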

Conditional if in asynchronous python program with twisted

I'm creating a program that uses the Twisted module and callbacks.
However, I keep having problems because the asynchronous part goes wrong.
I have learned (also from previous questions...) that the callbacks will be executed at a certain point, but when is unpredictable.
However, I have a program that goes like this:
j = calc(a)
i = calc2(b)
f = calc3(c)
if s:
    combine(i, j, f)
Now the boolean s is set by a callback done by calc3. Obviously, this leads to an undefined-variable error, because the callback has not executed by the time s is needed.
However, I'm unsure how you SHOULD do if statements with asynchronous programming using Twisted. I've been trying many different things, but can't find anything that works.
Is there some way to use conditionals that require callback values?
Also, I'm using VIFF for secure computations (which uses Twisted): VIFF
Maybe what you're looking for is twisted.internet.defer.gatherResults:
d = gatherResults([calc(a), calc2(b), calc3(c)])

def calculated((j, i, f)):
    if s:
        return combine(i, j, f)

d.addCallback(calculated)
However, this still has the problem that s is undefined. I can't quite tell how you expect s to be defined. If it is a local variable in calc3, then you need to return it so the caller can use it.
Perhaps calc3 looks something like this:
def calc3(argument):
    s = bool(argument % 2)
    return argument + 1
So, instead, consider making it look like this:
from collections import namedtuple

Calc3Result = namedtuple("Calc3Result", "condition value")

def calc3(argument):
    s = bool(argument % 2)
    return Calc3Result(s, argument + 1)
It's sort of unclear what you're asking here; it sounds like you know what callbacks are, but if so then you should be able to arrive at this answer yourself. Now you can rewrite the calling code so it actually works:
d = gatherResults([calc(a), calc2(b), calc3(c)])

def calculated((j, i, calc3result)):
    if calc3result.condition:
        return combine(i, j, calc3result.value)

d.addCallback(calculated)
Or, based on your comment below, maybe calc3 looks more like this (this is the last guess I'm going to make; if it's wrong and you'd like more input, please share the actual definition of calc3):
def _calc3Result(result, argument):
    if result == "250":
        # SMTP success response, yay
        return Calc3Result(True, argument)
    # Anything else is bad
    return Calc3Result(False, argument)

def calc3(argument):
    d = emailObserver("The argument was %s" % (argument,))
    d.addCallback(_calc3Result, argument)  # pass `argument` through to the callback
    return d
Fortunately, this definition of calc3 will work just fine with the gatherResults / calculated code block immediately above.
You have to put the if inside the callback. You can use a Deferred to structure your callbacks.
As stated in the previous answer, the processing logic should be handled in a callback chain. Below is simple code demonstrating how this could work. C{DelayedTask} is a dummy implementation of a task that happens in the future and fires the supplied deferred.
So we first construct a special object, C{ConditionalTask}, which takes care of storing the multiple results and servicing the callbacks.
calc, calc2 and calc3 return deferreds whose callbacks point at C{ConditionalTask}.x_callback.
Every C{ConditionalTask}.x_callback calls C{ConditionalTask}.process, which checks whether all of the results have been registered and fires on a full set.
Additionally, C{ConditionalTask}.c_callback sets a flag for whether or not the data should be processed at all.
from twisted.internet import reactor, defer

class DelayedTask(object):
    """
    Delayed async task dummy implementation
    """
    def __init__(self, delay, deferred, retVal):
        self.deferred = deferred
        self.retVal = retVal
        reactor.callLater(delay, self.on_completed)

    def on_completed(self):
        self.deferred.callback(self.retVal)

class ConditionalTask(object):
    def __init__(self):
        self.resultA = None
        self.resultB = None
        self.resultC = None
        self.should_process = False

    def a_callback(self, result):
        self.resultA = result
        self.process()

    def b_callback(self, result):
        self.resultB = result
        self.process()

    def c_callback(self, result):
        self.resultC = result
        # Here is an abstraction for your "s" boolean flag; obviously the logic
        # would normally go further than just setting the flag -- you could
        # inspect the result variable and do other strange stuff.
        self.should_process = True
        self.process()

    def process(self):
        if None not in (self.resultA, self.resultB, self.resultC):
            if self.should_process:
                print 'We will now call the processor function and stop reactor'
                reactor.stop()

def calc(a):
    deferred = defer.Deferred()
    DelayedTask(5, deferred, a)
    return deferred

def calc2(a):
    deferred = defer.Deferred()
    DelayedTask(5, deferred, a*2)
    return deferred

def calc3(a):
    deferred = defer.Deferred()
    DelayedTask(5, deferred, a*3)
    return deferred

def main():
    conditional_task = ConditionalTask()
    dFA = calc(1)
    dFB = calc2(2)
    dFC = calc3(3)
    dFA.addCallback(conditional_task.a_callback)
    dFB.addCallback(conditional_task.b_callback)
    dFC.addCallback(conditional_task.c_callback)
    reactor.run()

main()

Making a python program wait until Twisted deferred returns a value

I have a program that fetches info from other pages and parses them using BeautifulSoup and Twisted's getPage. Later on in the program I print info that the deferred process creates. Currently my program tries to print it before the deferred returns the info. How can I make it wait?
def twisAmaz(contents):  # This parses the page (amazon api xml file)
    stonesoup = BeautifulStoneSoup(contents)
    if stonesoup.find("mediumimage") == None:
        imageurl.append("/images/notfound.png")
    else:
        imageurl.append(stonesoup.find("mediumimage").url.contents[0])
    usedPdata = stonesoup.find("lowestusedprice")
    newPdata = stonesoup.find("lowestnewprice")
    titledata = stonesoup.find("title")
    reviewdata = stonesoup.find("editorialreview")
    if stonesoup.find("asin") != None:
        asin.append(stonesoup.find("asin").contents[0])
    else:
        asin.append("None")
    reactor.stop()

deferred = dict()
for tmpISBN in isbn:  # Go through ISBN numbers and get Amazon API information for each
    deferred[(tmpISBN)] = getPage(fetchInfo(tmpISBN))
    deferred[(tmpISBN)].addCallback(twisAmaz)
reactor.run()

.....print info on each ISBN
It seems like you're trying to make/run multiple reactors, but everything gets attached to the same reactor. Here's how to use a DeferredList to wait for all of your callbacks to finish.
Also note that twisAmazon returns a value. That value is passed through the DeferredList's callback and comes out as value. Since a DeferredList keeps the order of the things that are put into it, you can cross-reference the index of each result with the index of your ISBNs.
from twisted.internet import defer

def twisAmazon(contents):
    stonesoup = BeautifulStoneSoup(contents)
    ret = {}
    if stonesoup.find("mediumimage") is None:
        ret['imageurl'] = "/images/notfound.png"
    else:
        ret['imageurl'] = stonesoup.find("mediumimage").url.contents[0]
    ret['usedPdata'] = stonesoup.find("lowestusedprice")
    ret['newPdata'] = stonesoup.find("lowestnewprice")
    ret['titledata'] = stonesoup.find("title")
    ret['reviewdata'] = stonesoup.find("editorialreview")
    if stonesoup.find("asin") is not None:
        ret['asin'] = stonesoup.find("asin").contents[0]
    else:
        ret['asin'] = 'None'
    return ret

callbacks = []
for tmpISBN in isbn:  # Go through ISBN numbers and get Amazon API information for each
    callbacks.append(getPage(fetchInfo(tmpISBN)).addCallback(twisAmazon))

def printResult(result):
    for e, (success, value) in enumerate(result):
        print ('[%r]:' % isbn[e]),
        if success:
            print 'Success:', value
        else:
            print 'Failure:', value.getErrorMessage()

callbacks = defer.DeferredList(callbacks)
callbacks.addCallback(printResult)
reactor.run()
Another cool way to do this is with @defer.inlineCallbacks. It lets you write asynchronous code like a regular sequential function: http://twistedmatrix.com/documents/8.1.0/api/twisted.internet.defer.html#inlineCallbacks
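For illustration, a sketch of what that could look like here, reusing getPage, fetchInfo and twisAmazon from above (my own adaptation, untested; it fetches the pages one after another rather than in parallel):

from twisted.internet import defer

@defer.inlineCallbacks
def fetchAll(isbns):
    # Reads like sequential code: each yield waits for a Deferred to fire.
    results = []
    for tmpISBN in isbns:
        contents = yield getPage(fetchInfo(tmpISBN))
        results.append(twisAmazon(contents))
    defer.returnValue(results)  # how a generator-based callback returns a value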
First, you shouldn't put a reactor.stop() in your deferred method, as it kills everything.
Now, in Twisted, "waiting" is not allowed. To print the results of your callback, just add another callback after the first one.
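A sketch of that chaining, assuming the parsing callback returns its result instead of stopping the reactor (printInfo here is a hypothetical printing callback):

d = getPage(fetchInfo(tmpISBN))
d.addCallback(twisAmaz)    # parse; whatever this returns...
d.addCallback(printInfo)   # ...is passed on to the next callback in the chain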
