I am currently playing around with multiprocessing and queues.
I have written a piece of code to export data from MongoDB, map it into a relational (flat) structure, convert all values to strings and insert them into MySQL.
Each of these steps is submitted as a process and given import/export queues, save for the MongoDB export, which is handled in the parent.
As you will see below, I use queues, and child processes terminate themselves when they read "None" from the queue. The problem I currently have is that, if a child process runs into an unhandled Exception, the parent does not notice and the rest just keeps running. What I want is that the whole shebang quits and, ideally, the child error is re-raised.
I have two questions:
How do I detect the child error in the parent?
How do I kill my child processes after detecting the error (best practice)? I realize that putting "None" to the queue to kill the child is pretty dirty.
I am using python 2.7.
Here are the essential parts of my code:
# Establish communication queues
mongo_input_result_q = multiprocessing.Queue()
mapper_result_q = multiprocessing.Queue()
converter_result_q = multiprocessing.Queue()

[...]

# create child processes
# all processes generated here are subclasses of "multiprocessing.Process"

# create mapper
mappers = [mongo_relational_mapper.MongoRelationalMapper(mongo_input_result_q, mapper_result_q, columns, 1000)
           for i in range(10)]
# create datatype converter, converts everything to str
converters = [datatype_converter.DatatypeConverter(mapper_result_q, converter_result_q, 'str', 1000)
              for i in range(10)]
# create mysql writer
# I create a list of writers. currently only one,
# but I have the option to parallelize it further
writers = [mysql_inserter.MySqlWriter(mysql_host, mysql_user, mysql_passwd, mysql_schema, converter_result_q,
                                      columns, 'w_' + mysql_table, 1000) for i in range(1)]

# starting mapper
for mapper in mappers:
    mapper.start()
time.sleep(1)

# starting converter
for converter in converters:
    converter.start()

# starting writer
for writer in writers:
    writer.start()

[... initializing mongo db connection ...]

# put each dataset read to queue for the mapper
for row in mongo_collection.find({inc_column: {"$gte": start}}):
    mongo_input_result_q.put(row)
    count += 1
    if count % log_counter == 0:
        print 'Mongo Reader' + " " + str(count)
print "MongoReader done"

# Processes are terminated when they read "None" object from queue
# now that reading is finished, put None for each mapper in the queue so they terminate themselves
# the same for all followup processes
for mapper in mappers:
    mongo_input_result_q.put(None)
for mapper in mappers:
    mapper.join()
for converter in converters:
    mapper_result_q.put(None)
for converter in converters:
    converter.join()
for writer in writers:
    converter_result_q.put(None)
for writer in writers:
    writer.join()
Why not let the Process take care of its own exceptions, like this:
from __future__ import print_function
import multiprocessing as mp
import traceback


class Process(mp.Process):

    def __init__(self, *args, **kwargs):
        mp.Process.__init__(self, *args, **kwargs)
        self._pconn, self._cconn = mp.Pipe()
        self._exception = None

    def run(self):
        try:
            mp.Process.run(self)
            self._cconn.send(None)
        except Exception as e:
            tb = traceback.format_exc()
            self._cconn.send((e, tb))
            # raise e  # You can still raise this exception if you need to

    @property
    def exception(self):
        if self._pconn.poll():
            self._exception = self._pconn.recv()
        return self._exception
Now you have both the error and the traceback at hand:
def target():
    raise ValueError('Something went wrong...')

p = Process(target=target)
p.start()
p.join()

if p.exception:
    error, traceback = p.exception
    print(traceback)
I don't know standard practice but what I've found is that to have reliable multiprocessing I design the methods/class/etc. specifically to work with multiprocessing. Otherwise you never really know what's going on on the other side (unless I've missed some mechanism for this).
Specifically what I do is:
Subclass multiprocessing.Process or make functions that specifically support multiprocessing (wrapping functions that you don't have control over if necessary)
always provide a shared error multiprocessing.Queue from the main process to each worker process
enclose the entire run code in a try: ... except Exception as e block. Then, when something unexpected happens, send an error package containing:
the process id that died
the exception with its original context (check here). The original context is really important if you want to log useful information in the main process.
of course handle expected issues as normal within the normal operation of the worker
(similar to what you said already) assuming a long-running process, wrap the running code (inside the try/catch-all) with a loop
define a stop token in the class or for functions.
When the main process wants the worker(s) to stop, just send the stop token. To stop everyone, send enough tokens for all the processes.
the wrapping loop checks the input queue for the token or whatever other input you want
The end result is worker processes that can survive for a long time and that can let you know what's happening when something goes wrong. They will die quietly since you can handle whatever you need to do after the catch-all exception and you will also know when you need to restart a worker.
Again, I've just come to this pattern through trial and error so I don't know how standard it is. Does that help with what you are asking for?
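To make the points above concrete, here is a minimal sketch of that pattern (an error queue plus a stop token); the class and variable names are mine, not from your code:
import multiprocessing
import traceback

STOP_TOKEN = "STOP"   # any sentinel all processes agree on

class Worker(multiprocessing.Process):

    def __init__(self, work_queue, error_queue):
        multiprocessing.Process.__init__(self)
        self.work_queue = work_queue
        self.error_queue = error_queue   # shared with the main process

    def run(self):
        try:
            # long-running wrapper loop
            while True:
                item = self.work_queue.get()
                if item == STOP_TOKEN:   # main process asked us to stop
                    break
                self.handle(item)        # expected errors are handled in here
        except Exception:
            # unexpected error: report who died and why, then die quietly
            self.error_queue.put((self.pid, traceback.format_exc()))

    def handle(self, item):
        raise NotImplementedError        # override with the real work
The main process can then poll the shared error queue and decide whether to restart the dead worker or shut everything down.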
@mrkwjc's solution is simple and easy to understand and implement, but it has one disadvantage. When we have several processes and we want to stop all of them as soon as any single process has an error, we need to wait until all processes have finished before we can check p.exception. Below is code which fixes this problem (i.e. when one child has an error, we also terminate the other child):
import multiprocessing
import traceback

from time import sleep


class Process(multiprocessing.Process):
    """
    Class which returns child Exceptions to Parent.
    https://stackoverflow.com/a/33599967/4992248
    """

    def __init__(self, *args, **kwargs):
        multiprocessing.Process.__init__(self, *args, **kwargs)
        self._parent_conn, self._child_conn = multiprocessing.Pipe()
        self._exception = None

    def run(self):
        try:
            multiprocessing.Process.run(self)
            self._child_conn.send(None)
        except Exception as e:
            tb = traceback.format_exc()
            self._child_conn.send((e, tb))
            # raise e  # You can still raise this exception if you need to

    @property
    def exception(self):
        if self._parent_conn.poll():
            self._exception = self._parent_conn.recv()
        return self._exception
class Task_1:
    def do_something(self, queue):
        queue.put(dict(users=2))


class Task_2:
    def do_something(self, queue):
        queue.put(dict(users=5))


def main():
    try:
        task_1 = Task_1()
        task_2 = Task_2()

        # Example of multiprocessing which is used:
        # https://eli.thegreenplace.net/2012/01/16/python-parallelizing-cpu-bound-tasks-with-multiprocessing/
        task_1_queue = multiprocessing.Queue()
        task_2_queue = multiprocessing.Queue()

        task_1_process = Process(
            target=task_1.do_something,
            kwargs=dict(queue=task_1_queue))

        task_2_process = Process(
            target=task_2.do_something,
            kwargs=dict(queue=task_2_queue))

        task_1_process.start()
        task_2_process.start()

        while task_1_process.is_alive() or task_2_process.is_alive():
            sleep(10)

            if task_1_process.exception:
                error, task_1_traceback = task_1_process.exception
                # Do not wait until task_2 is finished
                task_2_process.terminate()
                raise ChildProcessError(task_1_traceback)

            if task_2_process.exception:
                error, task_2_traceback = task_2_process.exception
                # Do not wait until task_1 is finished
                task_1_process.terminate()
                raise ChildProcessError(task_2_traceback)

        task_1_process.join()
        task_2_process.join()

        task_1_results = task_1_queue.get()
        task_2_results = task_2_queue.get()

        task_1_users = task_1_results['users']
        task_2_users = task_2_results['users']

    except Exception:
        # Here usually I send email notification with error.
        print('traceback:', traceback.format_exc())


if __name__ == "__main__":
    main()
Thanks to kobejohn I have found a solution which is nice and stable.
I created a subclass of multiprocessing.Process which implements some helper functions and overrides the run() method to wrap a new saferun method in a try/except block. This class requires a feedback_queue for initialization, which is used to report info, debug and error messages back to the parent. The log methods in the class are wrappers for the globally defined log functions of the package:
class EtlStepProcess(multiprocessing.Process):

    def __init__(self, feedback_queue):
        multiprocessing.Process.__init__(self)
        self.feedback_queue = feedback_queue

    def log_info(self, message):
        log_info(self.feedback_queue, message, self.name)

    def log_debug(self, message):
        log_debug(self.feedback_queue, message, self.name)

    def log_error(self, err):
        log_error(self.feedback_queue, err, self.name)

    def saferun(self):
        """Method to be run in sub-process; can be overridden in sub-class"""
        if self._target:
            self._target(*self._args, **self._kwargs)

    def run(self):
        try:
            self.saferun()
        except Exception as e:
            self.log_error(e)
            raise e
        return
I have subclassed all my other process steps from EtlStepProcess. The code to be run is implemented in the saferun() method rather than run(). This way I do not have to add a try/except block around it, since the run() method already does that.
Example:
class MySqlWriter(EtlStepProcess):

    def __init__(self, mysql_host, mysql_user, mysql_passwd, mysql_schema, mysql_table, columns, commit_count,
                 input_queue, feedback_queue):
        EtlStepProcess.__init__(self, feedback_queue)
        self.mysql_host = mysql_host
        self.mysql_user = mysql_user
        self.mysql_passwd = mysql_passwd
        self.mysql_schema = mysql_schema
        self.mysql_table = mysql_table
        self.columns = columns
        self.commit_count = commit_count
        self.input_queue = input_queue

    def saferun(self):
        self.log_info(self.name + " started")
        # create mysql connection
        engine = sqlalchemy.create_engine('mysql://' + self.mysql_user + ':' + self.mysql_passwd + '@' +
                                          self.mysql_host + '/' + self.mysql_schema)
        meta = sqlalchemy.MetaData()
        table = sqlalchemy.Table(self.mysql_table, meta, autoload=True, autoload_with=engine)
        connection = engine.connect()
        try:
            self.log_info("start MySQL insert")
            counter = 0
            row_list = []
            while True:
                next_row = self.input_queue.get()
                if isinstance(next_row, Terminator):
                    if counter % self.commit_count != 0:
                        connection.execute(table.insert(), row_list)
                    # Poison pill means we should exit
                    break
                row_list.append(next_row)
                counter += 1
                if counter % self.commit_count == 0:
                    connection.execute(table.insert(), row_list)
                    del row_list[:]
                    self.log_debug(self.name + ' ' + str(counter))
        finally:
            connection.close()
        return
In my main file, I submit a Process that does all the work and give it a feedback_queue. This process starts all the steps, then reads from mongoDB and puts values into the initial queue. My main process listens to the feedback queue and prints all log messages. If it receives an error log, it prints the error and terminates its child, which in turn terminates all of its children before dying.
if __name__ == '__main__':
    feedback_q = multiprocessing.Queue()
    p = multiprocessing.Process(target=mongo_python_export, args=(feedback_q,))
    p.start()

    while p.is_alive():
        fb = feedback_q.get()
        if fb["type"] == "error":
            p.terminate()
            print "ERROR in " + fb["process"] + "\n"
            for child in multiprocessing.active_children():
                child.terminate()
        else:
            print datetime.datetime.fromtimestamp(fb["timestamp"]).strftime('%Y-%m-%d %H:%M:%S') + " " + \
                  fb["process"] + ": " + fb["message"]

    p.join()
I am thinking about making a module out of it and putting it up on GitHub, but I have to do some cleaning up and commenting first.
Related
I have a Python script which calls a series of sub-processes. They need to run "forever", but they occasionally die or get killed. When this happens I need to restart the process using the same arguments as the one which died.
This is a very simplified version:
[edit: this is the less simplified version, which includes "restart" code]
import multiprocessing
import time
import random


def printNumber(number):
    print("starting :", number)
    while random.randint(0, 5) > 0:
        print(number)
        time.sleep(2)


if __name__ == '__main__':
    children = []   # list
    args = {}       # dictionary

    for processNumber in range(10, 15):
        p = multiprocessing.Process(
            target=printNumber,
            args=(processNumber,)
        )
        children.append(p)
        p.start()
        args[p.pid] = processNumber

    while True:
        time.sleep(1)
        for n, p in enumerate(children):
            if not p.is_alive():
                # get parameters dead child was started with
                pidArgs = args[p.pid]
                del args[p.pid]
                print("n,args,p: ", n, pidArgs, p)
                children.pop(n)

                # start new process with same args
                p = multiprocessing.Process(
                    target=printNumber,
                    args=(pidArgs,)
                )
                children.append(p)
                p.start()
                args[p.pid] = pidArgs
I have updated the example to illustrate how I want the processes to be restarted if one crashes/is killed/etc., keeping track of which pid was started with which args.
Is this the "best" way to do this, or is there a more "pythonic" way of doing it?
I think I would create a separate thread for each Process and use a ProcessPoolExecutor. Executors have a useful function, submit, which returns a Future. You can wait on each Future and re-launch the Executor when the Future is done. Arguments to the function are tracked as instance attributes, so restarting is just a simple loop.
import threading
from concurrent.futures import ProcessPoolExecutor
import time
import random
import traceback


def printNumber(number):
    print("starting :", number)
    while random.randint(0, 5) > 0:
        print(number)
        time.sleep(2)


class KeepRunning(threading.Thread):
    def __init__(self, func, *args, **kwds):
        self.func = func
        self.args = args
        self.kwds = kwds
        super().__init__()

    def run(self):
        while True:
            with ProcessPoolExecutor(max_workers=1) as pool:
                future = pool.submit(self.func, *self.args, **self.kwds)
                try:
                    future.result()
                except Exception:
                    traceback.print_exc()


if __name__ == '__main__':
    for process_number in range(10, 15):
        keep = KeepRunning(printNumber, process_number)
        keep.start()

    while True:
        time.sleep(1)
At the end of the program is a loop to keep the main thread running. Without that, the program will attempt to exit while your Processes are still running.
For the example you provided, I would just remove the exit condition from the while loop and change it to while True:.
As you said, though, the actual code is more complicated (why didn't you post that?). So if the process gets terminated by, let's say, an exception, just put the code inside a try/except block. You can then put that block in an infinite loop.
I hope this is what you are looking for, but it seems to be the right way to do it given the goal and the information you provided.
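To illustrate the second suggestion, here is a minimal sketch (using the printNumber function from your example) that wraps the payload in a catch-all loop so it restarts after any exception:
import multiprocessing
import random
import time
import traceback


def printNumber(number):
    print("starting :", number)
    while random.randint(0, 5) > 0:
        print(number)
        time.sleep(2)


def keep_running(number):
    # restart the payload forever, surviving any exception it raises
    while True:
        try:
            printNumber(number)
        except Exception:
            traceback.print_exc()


if __name__ == '__main__':
    children = [multiprocessing.Process(target=keep_running, args=(n,))
                for n in range(10, 15)]
    for p in children:
        p.start()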
Instead of just starting the process immediately, you can save the list of processes and their arguments, and create another process that checks they are alive.
For example:
if __name__ == '__main__':
    process_list = []
    for processNumber in range(5):
        process = multiprocessing.Process(
            target=printNumber,
            args=(processNumber,)
        )
        process_list.append((process, (processNumber,)))
        process.start()

    while True:
        # iterate over a copy, since we modify the list while looping
        for running_process, process_args in list(process_list):
            if not running_process.is_alive():
                new_process = multiprocessing.Process(target=printNumber, args=process_args)
                process_list.remove((running_process, process_args))  # Remove terminated process
                process_list.append((new_process, process_args))
                new_process.start()
I must say that I'm not sure the best way to do this is in Python; you may want to look at a scheduler service like Jenkins or something similar.
I have 4 different Python custom objects and an events queue. Each object has a method that allows it to retrieve an event from the shared events queue, process it if the type is the desired one, and then put a new event on the same events queue, allowing other processes to process it.
Here's an example.
import multiprocessing as mp


class CustomObject:

    def __init__(self, events_queue: mp.Queue) -> None:
        self.events_queue = events_queue

    def process_events_queue(self) -> None:
        event = self.events_queue.get()
        if type(event) == SpecificEventDataTypeForThisClass:
            # do something and create a new_event
            self.events_queue.put(new_event)
        else:
            self.events_queue.put(event)

    # there are other methods specific to each object
These 4 objects have specific tasks to do, but they all share this same structure. Since I need to "simulate" the production conditions, I want them all to run at the same time, independently from each other.
Here's just an example of what I want to do, if possible.
import multiprocessing as mp
import CustomObject

if __name__ == '__main__':
    events_queue = mp.Queue()

    data_provider = mp.Process(target=CustomObject, args=(events_queue,))
    portfolio = mp.Process(target=CustomObject, args=(events_queue,))
    engine = mp.Process(target=CustomObject, args=(events_queue,))
    broker = mp.Process(target=CustomObject, args=(events_queue,))

    while True:
        data_provider.process_events_queue()
        portfolio.process_events_queue()
        engine.process_events_queue()
        broker.process_events_queue()
My idea is to run each object in a separate process, allowing them to communicate with events shared through the events_queue. So my question is, how can I do that?
The problem is that obj = mp.Process(target=CustomObject, args=(events_queue,)) returns a Process instance and I can't access the CustomObject methods from it. Also, is there a smarter way to achieve what I want?
Processes require a function to run, which defines what the process is actually doing. Once this function exits (and there are no non-daemon threads) the process is done. This is similar to how Python itself always executes a __main__ script.
If you do mp.Process(target=CustomObject, args=(events_queue,)) that just tells the process to call CustomObject - which instantiates it once and then is done. This is not what you want, unless the class actually performs work when instantiated - which is a bad idea for other reasons.
Instead, you must define a main function or method that handles what you need: "communicate with events shared through the events_queue". This function should listen to the queue and take action depending on the events received.
A simple implementation looks like this:
import os, time
from multiprocessing import Queue, Process


class Worker:
    # separate input and output for simplicity
    def __init__(self, commands: Queue, results: Queue):
        self.commands = commands
        self.results = results

    # our main function to be run by a process
    def main(self):
        # each process should handle more than one command
        while True:
            value = self.commands.get()
            # pick a well-defined signal to detect "no more work"
            if value is None:
                self.results.put(None)
                break
            # do whatever needs doing
            result = self.do_stuff(value)
            print(os.getpid(), ':', self, 'got', value, 'put', result)
            time.sleep(0.2)  # pretend we do something
            # pass on more work if required
            self.results.put(result)

    # placeholder for what needs doing
    def do_stuff(self, value):
        raise NotImplementedError
This is a template for a class that just keeps on processing events. The do_stuff method must be overridden to define what actually happens.
class AddTwo(Worker):
    def do_stuff(self, value):
        return value + 2


class TimesThree(Worker):
    def do_stuff(self, value):
        return value * 3


class Printer(Worker):
    def do_stuff(self, value):
        print(value)
This already defines fully working process payloads: Process(target=TimesThree(in_queue, out_queue).main) schedules the main method in a process, listening for and responding to commands.
Running this mainly requires connecting the individual components:
if __name__ == '__main__':
    # bookkeeping of resources we create
    processes = []
    start_queue = Queue()

    # connect our workers via queues
    queue = start_queue
    for element in (AddTwo, TimesThree, Printer):
        instance = element(queue, Queue())
        # we run the main method in processes
        processes.append(Process(target=instance.main))
        queue = instance.results

    # start all processes
    for process in processes:
        process.start()

    # send input, but do not wait for output
    start_queue.put(1)
    start_queue.put(248124)
    start_queue.put(-256)

    # send shutdown signal
    start_queue.put(None)

    # wait for processes to shutdown
    for process in processes:
        process.join()
Note that you do not need classes for this. You can also compose functions for a similar effect, as long as everything is pickle-able:
import os, time
from multiprocessing import Queue, Process


def main(commands, results, do_stuff):
    while True:
        value = commands.get()
        if value is None:
            results.put(None)
            break
        result = do_stuff(value)
        print(os.getpid(), ':', do_stuff, 'got', value, 'put', result)
        time.sleep(0.2)
        results.put(result)


def times_two(value):
    return value * 2


if __name__ == '__main__':
    in_queue, out_queue = Queue(), Queue()
    worker = Process(target=main, args=(in_queue, out_queue, times_two))
    worker.start()
    for message in (1, 3, 5, None):
        in_queue.put(message)
    while True:
        reply = out_queue.get()
        if reply is None:
            break
        print('result:', reply)
I'm creating a multiprocessing.Queue in Python, adding jobs to this Queue and having multiprocessing.Process workers consume them.
I would like to add a function call that is executed after every job, which checks if a specific task has succeeded. If so, I would like to empty the Queue and terminate execution.
My Process class is:
class Worker(multiprocessing.Process):

    def __init__(self, queue, check_success=None, directory=None, permit_nonzero=False):
        super(Worker, self).__init__()
        self.check_success = check_success
        self.directory = directory
        self.permit_nonzero = permit_nonzero
        self.queue = queue

    def run(self):
        for job in iter(self.queue.get, None):
            stdout = mbkit.dispatch.cexectools.cexec([job], directory=self.directory, permit_nonzero=self.permit_nonzero)
            with open(job.rsplit('.', 1)[0] + '.log', 'w') as f_out:
                f_out.write(stdout)
            if callable(self.check_success) and self.check_success(job):
                # Terminate all remaining jobs here
                pass
And my Queue is setup here:
class LocalJobServer(object):

    @staticmethod
    def sub(command, check_success=None, directory=None, nproc=1, permit_nonzero=False, time=None, *args, **kwargs):
        if check_success and not callable(check_success):
            msg = "check_success option requires a callable function/object: {0}".format(check_success)
            raise ValueError(msg)

        # Create a new queue
        queue = multiprocessing.Queue()

        # Create workers equivalent to the number of jobs
        workers = []
        for _ in range(nproc):
            wp = Worker(queue, check_success=check_success, directory=directory, permit_nonzero=permit_nonzero)
            wp.start()
            workers.append(wp)

        # Add each command to the queue
        for cmd in command:
            queue.put(cmd, timeout=time)

        # Stop workers from exiting without completion
        for _ in range(nproc):
            queue.put(None)

        for wp in workers:
            wp.join()
The function call mbkit.dispatch.cexectools.cexec() is a wrapper around subprocess.Popen and returns p.stdout.
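For readers without mbkit: the actual implementation is not shown here, but a rough sketch of such a wrapper around subprocess.Popen might look like this (this is an assumption on my part, not mbkit's code):
import subprocess


def cexec(cmd, directory=None, permit_nonzero=False):
    # run the command in 'directory', capturing stdout (stderr merged into it)
    p = subprocess.Popen(cmd, cwd=directory, stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT)
    stdout, _ = p.communicate()
    if p.returncode != 0 and not permit_nonzero:
        raise RuntimeError("command {0} exited with code {1}".format(cmd, p.returncode))
    return stdout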
In the Worker class, I've written a conditional to check if a job succeeded, and tried emptying the remaining jobs in the Queue using a while loop, i.e. my Worker.run() method looked like this:
def run(self):
    for job in iter(self.queue.get, None):
        stdout = mbkit.dispatch.cexectools.cexec([job], directory=self.directory, permit_nonzero=self.permit_nonzero)
        with open(job.rsplit('.', 1)[0] + '.log', 'w') as f_out:
            f_out.write(stdout)
        if callable(self.check_success) and self.check_success(job):
            break
    while not self.queue.empty():
        self.queue.get()
Although this works sometimes, it usually deadlocks and my only option is to Ctrl-C. I am aware that .empty() is unreliable, thus my question.
Any advice on how I can implement such an early termination functionality?
You do not have a deadlock here. It is just linked to the behavior of multiprocessing.Queue: the get method is blocking by default. Thus, when you call get on an empty queue, the call stalls, waiting for the next element to be ready. Some of your workers will stall because, when you use your while not self.queue.empty() loop to empty the queue, you remove all the None sentinels and some of your workers will then block on the empty Queue, like in this code:
from multiprocessing import Queue

q = Queue()
for e in iter(q.get, None):
    print(e)
To be notified when the queue is empty, you need to use a non-blocking call. You can for instance use q.get_nowait, or use a timeout in q.get(timeout=1). Both raise a multiprocessing.queues.Empty exception when the queue is empty. So you should replace your Worker's for job in iter(...): loop with something like:
while not queue.empty():
    try:
        job = queue.get(timeout=.1)
    except multiprocessing.queues.Empty:
        continue
    # Do stuff with your job
That way you will not be stuck at any point.
For the synchronization part, I would recommend using a synchronization primitive such as multiprocessing.Condition or multiprocessing.Event. This is cleaner than using a Value, as they are designed for this purpose. Something like this should help:
def run(self):
    while not queue.empty():
        try:
            job = queue.get(timeout=.1)
        except multiprocessing.queues.Empty:
            continue
        if self.event.is_set():
            continue
        stdout = mbkit.dispatch.cexectools.cexec([job], directory=self.directory, permit_nonzero=self.permit_nonzero)
        with open(job.rsplit('.', 1)[0] + '.log', 'w') as f_out:
            f_out.write(stdout)
        if callable(self.check_success) and self.check_success(job):
            self.event.set()
    print("Worker {} terminated cleanly".format(self.name))
with event = multiprocessing.Event().
Note that it is also possible to use a multiprocessing.Pool to avoid dealing with the queue and the workers yourself. But as you need a synchronization primitive, it might be a bit more complicated to set up. Something like this should work:
def worker(job, success, check_success=None, directory=None, permit_nonzero=False):
    if success.is_set():
        return False
    stdout = mbkit.dispatch.cexectools.cexec([job], directory=directory, permit_nonzero=permit_nonzero)
    with open(job.rsplit('.', 1)[0] + '.log', 'w') as f_out:
        f_out.write(stdout)
    if callable(check_success) and check_success(job):
        success.set()
    return True


# ......
# In the class LocalJobServer
# .....
def sub(command, check_success=None, directory=None, nproc=1, permit_nonzero=False):
    mgr = multiprocessing.Manager()
    success = mgr.Event()
    pool = multiprocessing.Pool(nproc)
    run_args = [(cmd, success, check_success, directory, permit_nonzero) for cmd in command]
    result = pool.starmap(worker, run_args)
    pool.close()
    pool.join()
Note here that I use a Manager, as you cannot pass a multiprocessing.Event directly as a task argument. You could also use the initializer and initargs arguments of the Pool to set up a global success event in each worker and avoid relying on the Manager, but it is slightly more complicated.
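For completeness, a rough sketch of that initializer/initargs variant (untested; the function and variable names are mine, and the real job logic is omitted):
import multiprocessing

_success = None


def _init_worker(success_event):
    # runs once in every worker process; stash the shared Event in a global
    global _success
    _success = success_event


def worker(job):
    if _success.is_set():
        return None          # another job already succeeded, skip this one
    # ... run the real job here and call _success.set() when it succeeds ...
    return job


if __name__ == '__main__':
    success = multiprocessing.Event()   # no Manager needed when passed via initargs
    pool = multiprocessing.Pool(2, initializer=_init_worker, initargs=(success,))
    print(pool.map(worker, ['job1', 'job2', 'job3']))
    pool.close()
    pool.join()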
This might not be the optimal solution, and any other suggestions are much appreciated, but I managed to solve the problem like this:
class Worker(multiprocessing.Process):
    """Simple manual worker class to execute jobs in the queue"""

    def __init__(self, queue, success, check_success=None, directory=None, permit_nonzero=False):
        super(Worker, self).__init__()
        self.check_success = check_success
        self.directory = directory
        self.permit_nonzero = permit_nonzero
        self.success = success
        self.queue = queue

    def run(self):
        """Method representing the process's activity"""
        for job in iter(self.queue.get, None):
            if self.success.value:
                continue
            stdout = mbkit.dispatch.cexectools.cexec([job], directory=self.directory, permit_nonzero=self.permit_nonzero)
            with open(job.rsplit('.', 1)[0] + '.log', 'w') as f_out:
                f_out.write(stdout)
            if callable(self.check_success) and self.check_success(job):
                self.success.value = int(True)
            time.sleep(1)


class LocalJobServer(object):
    """A local server to execute jobs via the multiprocessing module"""

    @staticmethod
    def sub(command, check_success=None, directory=None, nproc=1, permit_nonzero=False, time=None, *args, **kwargs):
        if check_success and not callable(check_success):
            msg = "check_success option requires a callable function/object: {0}".format(check_success)
            raise ValueError(msg)

        # Create a new queue
        queue = multiprocessing.Queue()
        success = multiprocessing.Value('i', int(False))

        # Create workers equivalent to the number of jobs
        workers = []
        for _ in range(nproc):
            wp = Worker(queue, success, check_success=check_success, directory=directory, permit_nonzero=permit_nonzero)
            wp.start()
            workers.append(wp)

        # Add each command to the queue
        for cmd in command:
            queue.put(cmd)

        # Stop workers from exiting without completion
        for _ in range(nproc):
            queue.put(None)

        # Wait for the workers to finish
        for wp in workers:
            wp.join(time)
Basically I'm creating a Value and providing that to each Process. Once a job is marked as successful, this variable gets updated. Each Process checks in if self.success.value: continue whether we already have a success and, if so, just iterates over the remaining jobs in the Queue until it is empty.
The time.sleep(1) call is required to account for potential syncing delays among the processes. This is certainly not the most efficient approach, but it works.
The exit function of my custom context manager seemingly runs before the computation is done. My context manager is meant to simplify writing concurrent/parallel code. Here is my context manager code:
import time
from multiprocessing.dummy import Pool, cpu_count


class managed_pool:
    '''Simple context manager for multiprocessing.dummy.Pool'''

    def __init__(self, msg):
        self.msg = msg

    def __enter__(self):
        cores = cpu_count()
        print 'start concurrent ({0} cores): {1}'.format(cores, self.msg)
        self.start = time.time()
        self.pool = Pool(cores)
        return self.pool

    def __exit__(self, type_, value, traceback):
        print 'end concurrent:', self.msg
        print 'time:', time.time() - self.start
        self.pool.close()
        self.pool.join()
I've already tried this script with multiprocessing.Pool instead of multiprocessing.dummy.Pool and it seems to fail all the time.
Here is an example of using the context manager:
def read_engine_files(f):
    engine_input = engineInput()
    with open(f, 'rb') as f:
        engine_input.parse_from_string(f.read())
    return engine_input

with managed_pool('load input files') as pool:
    data = pool.map(read_engine_files, files)
So, inside of read_engine_files I print the name of the file. You'll notice in the __exit__ function that I also print out when the computation is done and how long it took. But when viewing stdout the __exit__ message appears way before the computation finished. Like, minutes before the computation is done. But htop says all of my cores are still being used. Here's an example of the output
start concurrent (4 cores): load engine input files
file1.pbin
file2.pbin
...
file16.pbin
end concurrent: load engine input files
time: 246.43829298
file17.pbin
...
file45.pbin
Why is __exit__ being called so early?
Are you sure you're just calling pool.map()? That should block until all the items have been mapped.
If you're calling one of the asynchronous methods of Pool, then you should be able to solve the problem by changing the order of things in __exit__(). Just join the pool before doing the summary.
def __exit__(self, type_, value, traceback):
    self.pool.close()
    self.pool.join()
    print 'end concurrent:', self.msg
    print 'time:', time.time() - self.start
The most likely explanation is that an exception occurred. The code sample above does not inspect the type, value or traceback arguments of __exit__. Thus, when an exception occurs (and is not caught earlier), it is handed to __exit__, which in turn does not react to it, and the processes (or some of them) continue running.
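For example, __exit__ could at least report the exception before cleaning up; here is a sketch of a drop-in replacement for the __exit__ method of the managed_pool class above (untested):
def __exit__(self, type_, value, traceback):
    if type_ is not None:
        # an exception escaped the with-block; make it visible instead of ignoring it
        print 'error during concurrent block:', type_.__name__, value
    self.pool.close()
    self.pool.join()
    print 'end concurrent:', self.msg
    print 'time:', time.time() - self.start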
I have a Python TCP client and need to send a media (.mpg) file in a loop to a 'C' TCP server.
I have the following code, where in a separate thread I read 10K blocks of the file and send them, and do it all over again in a loop. I think the leak is caused by my use of the thread module or the TCP send. I am using Queues to print the logs on my GUI (Tkinter), but after some time it runs out of memory.
UPDATE 1 - Added more code as requested
Thread class "Sendmpgthread" used to create thread to send data
.
.
def __init__(self, otherparams, MainGUI):
    .
    .
    self.MainGUI = MainGUI
    self.lock = threading.Lock()
    Thread.__init__(self)

# This is the one causing the leak; it is called inside the loop
def pushlog(self, msg):
    self.MainGUI.queuelog.put(msg)

def send(self, mysocket, block):
    size = len(block)
    pos = 0
    while size > 0:
        try:
            curpos = mysocket.send(block[pos:])
        except socket.timeout, msg:
            if self.over:
                self.pushlog('Exit Send')
                return False
        except socket.error, msg:
            print 'Exception'
            return False
        pos = pos + curpos
        size = size - curpos
    return True

def run(self):
    media_file = None
    mysocket = None
    try:
        mysocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        mysocket.connect((self.ip, string.atoi(self.port)))
        media_file = open(self.file, 'rb')
        while not self.over:
            chunk = media_file.read(10000)
            if not chunk:  # EOF Reset it
                print 'resetting stream'
                media_file.seek(0, 0)
                continue
            if not self.send(mysocket, chunk):  # If some error or thread is killed
                break
            # disabling this solves the issue
            self.pushlog('print how much data sent')
    except socket.error, msg:
        print 'print exception'
    except Exception, msg:
        print 'print exception'
    try:
        if media_file is not None:
            media_file.close()
            media_file = None
        if mysocket is not None:
            mysocket.close()
            mysocket = None
    finally:
        print 'some cleaning'

def kill(self):
    self.over = True
I figured out that it is because of a wrong implementation of the Queue, as commenting out that piece resolves the issue.
UPDATE 2 - MainGUI class which is called from the above Thread class
class MainGUI(Frame):

    def __init__(self, other args):
        # some code
        .
        .
        # from the above thread class used to send data
        self.send_mpg_status = Sendmpgthread(params)
        self.send_mpg_status.start()
        self.after(100, self.updatelog)
        self.queuelog = Queue.Queue()

    def updatelog(self):
        try:
            msg = self.queuelog.get_nowait()
            while msg is not None:
                self.printlog(msg)
                msg = self.queuelog.get_nowait()
        except Queue.Empty:
            pass
        if self.send_mpg_status:  # only continue when sending
            self.after(100, self.updatelog)

    def printlog(self, msg):
        # print in GUI
Since printlog is adding to a tkinter text control, the memory occupied by that control will grow with each message (it has to store all the log messages in order to display them).
Unless storing all the logs is critical, a common solution is to limit the maximum number of log lines displayed.
A naive implementation is to eliminate extra lines from the beginning after the control reaches a maximum number of messages. Add a function to get the number of lines in the control and then, in printlog, do something similar to:
while getnumlines(self.edit) > self.maxloglines:
    self.edit.delete('1.0', '1.end')
(above code not tested)
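For reference, getnumlines could be implemented roughly like this, assuming self.edit is a Tkinter Text widget (an assumption on my part):
def getnumlines(edit):
    # a Text widget's 'end-1c' index is '<line>.<column>' of the last character
    return int(edit.index('end-1c').split('.')[0])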
update: some general guidelines
Keep in mind that what might look like a memory leak does not always mean that a function is wrong, or that the memory is no longer accessible. Many times there is missing cleanup code for a container that is accumulating elements.
A basic general approach for this kind of problems:
form an opinion on what part of the code might be causing the problem
check it by commenting that code out (or keep commenting code until you find a candidate)
look for containers in the responsible code, add code to print their size
decide what elements can be safely removed from that container, and when to do it
test the result
I can't see anything obviously wrong with your code snippet.
To reduce memory usage a bit under Python 2.7, I'd use buffer(block, pos) instead of block[pos:]. Also I'd use mysocket.sendall(block) instead of your send method.
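A sketch of what that simplified send method could look like as a drop-in replacement (untested):
def send(self, mysocket, block):
    try:
        # sendall() loops internally until the whole block has been written,
        # so no manual position bookkeeping or slicing is needed
        mysocket.sendall(block)
        return True
    except socket.timeout:
        if self.over:
            self.pushlog('Exit Send')
        return False
    except socket.error:
        return False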
If the ideas above don't solve your problem, then the bug is most probably elsewhere in your code. Could you please post the shortest possible version of the full Python script which still grows out-of-memory (http://sscce.org/)? That increases your chance of getting useful help.
Out of memory errors are indicative of data being generated but not consumed or released. Looking through your code I would guess these two areas:
Messages are being pushed onto a Queue.Queue() instance in the pushlog method. Are they being consumed?
The MainGUI printlog method may be writing text somewhere, e.g. is it continually writing to some kind of GUI widget without any pruning of messages?
From the code you've posted, here's what I would try:
Put a print statement in updatelog. If this is not being continually called for some reason such as a failed after() call, then the queuelog will continue to grow without bound.
If updatelog is continually being called, then turn your focus to printlog. Comment the contents of this function to see if out of memory errors still occur. If they don't, then something in printlog may be holding on to the logged data, you'll need to dig deeper to find out what.
Apart from this, the code could be cleaned up a bit. self.queuelog is not created until after the thread is started which gives rise to a race condition where the thread may try to write into the queue before it has been created. Creation of queuelog should be moved to somewhere before the thread is started.
updatelog could also be refactored to remove redundancy:
def updatelog(self):
    try:
        while True:
            msg = self.queuelog.get_nowait()
            self.printlog(msg)
    except Queue.Empty:
        pass
And I assume the kill function is called from the GUI thread. To avoid thread race conditions, self.over should be a thread-safe variable such as a threading.Event object.
def __init__(...):
    self.over = threading.Event()

def kill(self):
    self.over.set()
There is no data piling up in your TCP sending loop.
The memory error is probably caused by the logging queue. As you have not posted the complete code, try using the following class for logging:
from threading import Thread, Event, Lock
from time import sleep, time as now


class LogRecord(object):
    __slots__ = ["txt", "params"]

    def __init__(self, txt, params):
        self.txt, self.params = txt, params


class AsyncLog(Thread):

    DEBUGGING_EMULATE_SLOW_IO = True

    def __init__(self, queue_max_size=15, queue_min_size=5):
        Thread.__init__(self)
        self.queue_max_size, self.queue_min_size = queue_max_size, queue_min_size
        self._queuelock = Lock()
        self._queue = []               # protected by _queuelock
        self._discarded_count = 0      # protected by _queuelock
        self._pushed_event = Event()
        self.setDaemon(True)
        self.start()

    def log(self, message, **params):
        with self._queuelock:
            self._queue.append(LogRecord(message, params))
            if len(self._queue) > self.queue_max_size:
                # empty the queue:
                self._discarded_count += len(self._queue) - self.queue_min_size
                del self._queue[self.queue_min_size:]  # empty the queue instead of creating new list (= [])
            self._pushed_event.set()

    def run(self):
        while 1:  # no reason for exit condition here
            logs, discarded_count = None, 0
            with self._queuelock:
                if len(self._queue) > 0:
                    # select buffered messages for printing, releasing lock ASAP
                    logs = self._queue[:]
                    del self._queue[:]
                    self._pushed_event.clear()
                    discarded_count = self._discarded_count
                    self._discarded_count = 0
            if not logs:
                self._pushed_event.wait()
                self._pushed_event.clear()
                continue
            else:
                # print logs
                if discarded_count:
                    print ".. {0} log records missing ..".format(discarded_count)
                for log_record in logs:
                    self.write_line(log_record)
                if self.DEBUGGING_EMULATE_SLOW_IO:
                    sleep(0.5)

    def write_line(self, log_record):
        print log_record.txt, " ".join(["{0}={1}".format(name, value) for name, value in log_record.params.items()])


if __name__ == "__main__":

    class MainGUI:
        def __init__(self):
            self._async_log = AsyncLog()
            self.log = self._async_log.log  # stored as bound method

        def do_this_test(self):
            print "I am about to log 100 times per sec, while text output frequency is 2Hz (twice per second)"

            def log_100_records_in_one_second(itteration_index):
                for i in xrange(100):
                    self.log("something happened", timestamp=now(), session=3.1415, itteration=itteration_index)
                    sleep(0.01)

            for iter_index in range(3):
                log_100_records_in_one_second(iter_index)

    test = MainGUI()
    test.do_this_test()
I have noticed that you do not sleep() anywhere in the sending loop, which means data is read and sent as fast as possible. Note that this is not desirable behavior when playing media files: container time-stamps are there to dictate the data rate.
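If you do want to pace the sending, here is a rough sketch that throttles to a fixed byte rate (a simplification on my part; a real player would honour the container timestamps instead):
import time


def paced_send(sock, media_file, chunk_size=10000, target_bytes_per_sec=512 * 1024):
    # crude pacing: sleep after each chunk so we average roughly target_bytes_per_sec
    delay = float(chunk_size) / target_bytes_per_sec
    while True:
        chunk = media_file.read(chunk_size)
        if not chunk:
            media_file.seek(0, 0)  # loop the file, as in the question
            continue
        sock.sendall(chunk)
        time.sleep(delay)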