Python multiprocessing: print() inside apply_async()

print() inside the function that is passed to multiprocessing's apply_async() does not print out anything.
I want to eventually use apply_async to process a large text file in chunks, so I want the script to print to the screen how many lines have been processed. However, I don't see any output at all.
I've attached a toy example. Each foo() call should tell me which process is being used. In my actual code, I will call foo() on each chunk, and it will tell me how many lines of text in that chunk have been processed.
import os
from multiprocessing import Pool

def foo(x, y):
    print(f'Process: {os.getpid()}')
    return x * y

def bar(x):
    p = Pool()
    result_list = []
    for i in range(30):
        p.apply_async(foo, args=(i, i*x), callback=result_list.append)
    p.close()
    p.join()
    return result_list

if __name__ == '__main__':
    print(bar(2))
I get a printout of the x*y multiplication results, but I don't see anything that tells me the process id.
Can anyone help me please?

Your sys.stdout is likely block buffered, which means a small number of prints can get buffered without filling the buffer (and therefore the buffer is never flushed to the screen/file). Normally, Python flushes the buffers on exit so this isn't an issue.
The problem is that, to avoid a number of tricky double-cleanup issues, multiprocessing workers exit via os._exit, which bypasses all cleanup (including flushing stdio buffers). If you want to be sure the output is emitted, tell print to flush the output immediately by changing:
print(f'Process: {os.getpid()}')
to:
print(f'Process: {os.getpid()}', flush=True)
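If you would rather not add flush=True at every call site, a minimal alternative sketch (assuming Python 3.7+ for sys.stdout.reconfigure) is to make each worker's stdout line-buffered via a Pool initializer:
import os
import sys
from multiprocessing import Pool

def _init_worker():
    # Reconfigure this worker's stdout to flush on every newline,
    # so print() output appears immediately without flush=True.
    sys.stdout.reconfigure(line_buffering=True)

def foo(x, y):
    print(f'Process: {os.getpid()}')
    return x * y

if __name__ == '__main__':
    with Pool(initializer=_init_worker) as p:
        results = p.starmap(foo, [(i, i * 2) for i in range(30)])
    print(results)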

Related

In Python multiprocessing.Pool, how to get the print result in the subprocess

The thing is that:
from multiprocessing import Pool

def main_fun(x):
    ...
    print(x)

if __name__ == "__main__":
    with Pool(5) as pool:
        pool.map(main_fun, range(10000))
    pool.close()
    pool.join()
My question is: when I run the code on my own computer, it outputs the subprocess print results. But when I submit it as a job to the cluster, I can't see the print results until the whole program finishes. How can I fix this? By the way, the cluster uses Slurm.
Try doing print(x, flush = True) instead of just print(x).
The flush variant flushes the buffer immediately, so the output is seen right away, while the non-flush variant may keep the string in the buffer until it is flushed much later.
Also, always call pool.close() and pool.join() inside the with block (at the end), not outside it as you did. For a plain .map() call it doesn't matter, because it is a blocking call, but in other cases it is important to close/join inside the with block.
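Put together, a minimal sketch of the suggested fix (assuming the rest of main_fun is unchanged) looks like this; running the script with python -u, or setting PYTHONUNBUFFERED=1 in the Slurm batch script, has a similar unbuffering effect:
from multiprocessing import Pool

def main_fun(x):
    # ... real work ...
    print(x, flush=True)   # flush so lines show up in the Slurm log right away

if __name__ == "__main__":
    with Pool(5) as pool:
        pool.map(main_fun, range(10000))
        pool.close()   # close/join inside the with block, as advised above
        pool.join()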

Deadlock with big object in multiprocessing.Queue

When you supply a large-enough object into multiprocessing.Queue, the program seems to hang at weird places. Consider this minimal example:
import multiprocessing

def dump_dict(queue, size):
    queue.put({x: x for x in range(size)})
    print("Dump finished")

if __name__ == '__main__':
    SIZE = int(1e5)
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=dump_dict, args=(queue, SIZE))
    print("Starting...")
    process.start()
    print("Joining...")
    process.join()
    print("Done")
    print(len(queue.get()))
If the SIZE parameter is small enough (<= 1e4, at least in my case), the whole program runs smoothly without a problem, but once SIZE is big enough, the program hangs at weird places. Now, when searching for an explanation (e.g. python multiprocessing - process hangs on join for large queue), I have always seen general answers of "you need to consume from the queue". But what seems weird is that the program actually prints Dump finished, i.e. it reaches the code line after putting the object into the queue. Furthermore, using Queue.put_nowait instead of Queue.put did not make a difference.
Finally, if you use Process.join(1) instead of Process.join(), the whole process finishes with the complete dictionary in the queue (i.e. the print(len(...)) line prints the full dictionary size).
Can somebody give me a little bit more insight into this?
You need to queue.get() in the parent before you process.join() to prevent a deadlock. The queue has spawned a feeder-thread with its first queue.put() and the MainThread in your worker-process is joining this feeder-thread before exiting. So the worker-process won't exit before the result is flushed to (OS-pipe-)buffer completely, but your result is too big to fit into the buffer and your parent doesn't read from the queue until the worker has exited, resulting in a deadlock.
You see the output of print("Dump finished") because the actual sending happens from the feeder-thread, queue.put() itself just appends to a collections.deque within the worker-process as an intermediate step.
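Applied to the example above, a minimal sketch of the fix: the essential change is calling queue.get() in the parent before process.join().
import multiprocessing

def dump_dict(queue, size):
    queue.put({x: x for x in range(size)})
    print("Dump finished", flush=True)

if __name__ == '__main__':
    SIZE = int(1e5)
    queue = multiprocessing.Queue()
    process = multiprocessing.Process(target=dump_dict, args=(queue, SIZE))
    process.start()
    result = queue.get()   # consume first, so the feeder thread can drain the pipe
    process.join()         # now the worker can exit; no deadlock
    print(len(result))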
Facing a similar problem, I solved it using @Darkonaut's answer and the following implementation:
import time

done = 0
while done < n:  # n is the number of objects you expect to get
    if not queue.empty():
        done += 1
        results = queue.get()
        # Do something with the results
    else:
        time.sleep(.5)
Doesn't feel very pythonic, but it worked!
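Note that queue.get() already blocks until an item is available, so a shorter sketch of the same idea (with queue and n as in the snippet above) needs no polling:
results = []
for _ in range(n):               # n objects are expected, as above
    results.append(queue.get())  # blocks until an item arrives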

How to subprocess a big list of files using all CPUs?

I need to convert 86,000 TEX files to XML using the LaTeXML library in the command line. I tried to write a Python script to automate this with the subprocess module, utilizing all 4 cores.
import os
import pathlib
import subprocess
import multiprocessing
import time

# preprints: list of input .tex paths, defined elsewhere in the script

def get_outpath(tex_path):
    path_parts = pathlib.Path(tex_path).parts
    arxiv_id = path_parts[2]
    outpath = 'xml/' + arxiv_id + '.xml'
    return outpath

def convert_to_xml(inpath):
    outpath = get_outpath(inpath)
    if os.path.isfile(outpath):
        message = '{}: Already converted.'.format(inpath)
        print(message)
        return
    try:
        process = subprocess.Popen(['latexml', '--dest=' + outpath, inpath],
                                   stderr=subprocess.PIPE,
                                   stdout=subprocess.PIPE)
    except Exception as error:
        process.kill()
        message = "error: {} while converting {}".format(error, inpath)
        print(message)
    message = '{}: Converted!'.format(inpath)
    print(message)

def start():
    start_time = time.time()
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count(),
                                maxtasksperchild=1)
    print('Initialized {} threads'.format(multiprocessing.cpu_count()))
    print('Beginning conversion...')
    for _ in pool.imap_unordered(convert_to_xml, preprints, chunksize=5):
        pass
    pool.close()
    pool.join()
    total_time = time.time() - start_time
    print("TIME: {}".format(total_time))

start()
The script results in Too many open files and slows down my computer. From looking at Activity Monitor, it looks like this script is trying to create 86,000 conversion subprocesses at once, and each process is trying to open a file. Maybe this is the result of pool.imap_unordered(convert_to_xml, preprints) -- maybe I need to not use map in conjunction with subprocess.Popen, since I just have too many commands to call? What would be an alternative?
I've spent all day trying to figure out the right way to approach bulk subprocessing. I'm new to this part of Python, so any tips for heading in the right direction would be much appreciated. Thanks!
In convert_to_xml, the process = subprocess.Popen(...) statement spawns a latexml subprocess.
Without a blocking call such as process.communicate(), convert_to_xml ends even while latexml continues to run in the background.
Since convert_to_xml ends, the Pool sends the associated worker process another task to run and so convert_to_xml is called again.
Once again another latexml process is spawned in the background.
Pretty soon, you are up to your eyeballs in latexml processes and the resource limit on the number of open files is reached.
The fix is easy: add process.communicate() to tell convert_to_xml to wait until the latexml process has finished.
try:
    process = subprocess.Popen(['latexml', '--dest=' + outpath, inpath],
                               stderr=subprocess.PIPE,
                               stdout=subprocess.PIPE)
    process.communicate()
except Exception as error:
    process.kill()
    message = "error: {} while converting {}".format(error, inpath)
    print(message)
else:  # use else so that this won't run if there is an Exception
    message = '{}: Converted!'.format(inpath)
    print(message)
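As an alternative sketch, not part of the original answer: subprocess.run (Python 3.5+) blocks until the command finishes and returns a CompletedProcess, so no explicit communicate() call is needed:
completed = subprocess.run(['latexml', '--dest=' + outpath, inpath],
                           stderr=subprocess.PIPE,
                           stdout=subprocess.PIPE)
if completed.returncode == 0:
    print('{}: Converted!'.format(inpath))
else:
    print('{}: latexml exited with code {}'.format(inpath, completed.returncode))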
Regarding if __name__ == '__main__':
As martineau pointed out, there is a warning in the multiprocessing docs that
code that spawns new processes should not be called at the top level of a module.
Instead, the code should be contained inside an if __name__ == '__main__' block.
In Linux, nothing terrible happens if you disregard this warning.
But in Windows, the code "fork-bombs". Or more accurately, the code
causes an unmitigated chain of subprocesses to be spawned, because on Windows fork is simulated by spawning a new Python process which then imports the calling script. Every import spawns a new Python process. Every Python process tries to import the calling script. The cycle is not broken until all resources are consumed.
So to be nice to our Windows-fork-bereft brethren, use
if __name__ == '__main__':
    start()
Sometimes processes require a lot of memory. The only reliable way to free memory is to terminate the process. maxtasksperchild=1 tells the pool to terminate each worker process after it completes 1 task. It then spawns a new worker process to handle another task (if there are any). This frees the (memory) resources the original worker may have allocated which could not otherwise have been freed.
In your situation it does not look like the worker process is going to require much memory, so you probably don't need maxtasksperchild=1.
The chunksize affects how many tasks a worker performs before sending the result back to the main process.
Sometimes this can affect performance, especially if interprocess communication is a significant portion of overall runtime.
In your situation, convert_to_xml takes a relatively long time (assuming we wait until latexml finishes) and it simply returns None. So interprocess communication probably isn't a significant portion of overall runtime. Therefore, I don't expect you would find a significant change in performance in this case (though it never hurts to experiment!).
In plain Python, map should not be used just to call a function multiple times.
For a similar stylistic reason, I would reserve using the pool.*map* methods for situations where I cared about the return values.
So instead of
for _ in pool.imap_unordered(convert_to_xml, preprints, chunksize=5):
    pass
you might consider using
for preprint in preprints:
    pool.apply_async(convert_to_xml, args=(preprint, ))
instead.
The iterable passed to any of the pool.*map* functions is consumed
immediately. It doesn't matter if the iterable is an iterator. There is no
special memory benefit to using an iterator here. imap_unordered returns an
iterator, but it does not handle its input in any especially iterator-friendly
way.
No matter what type of iterable you pass, upon calling the pool.*map* function the iterable is
consumed and turned into tasks which are put into a task queue.
Here is code which corroborates this claim:
version1.py:
import multiprocessing as mp
import time

def foo(x):
    time.sleep(0.1)
    return x * x

def gen():
    for x in range(1000):
        if x % 100 == 0:
            print('Got here')
        yield x

def start():
    pool = mp.Pool()
    for item in pool.imap_unordered(foo, gen()):
        pass
    pool.close()
    pool.join()

if __name__ == '__main__':
    start()
version2.py:
import multiprocessing as mp
import time

def foo(x):
    time.sleep(0.1)
    return x * x

def gen():
    for x in range(1000):
        if x % 100 == 0:
            print('Got here')
        yield x

def start():
    pool = mp.Pool()
    for item in gen():
        result = pool.apply_async(foo, args=(item, ))
    pool.close()
    pool.join()

if __name__ == '__main__':
    start()
Running version1.py and version2.py both produce the same result.
Got here
Got here
Got here
Got here
Got here
Got here
Got here
Got here
Got here
Got here
Crucially, you will notice that Got here is printed 10 times very quickly at
the beginning of the run, and then there is a long pause (while the calculation
is done) before the program ends.
If the generator gen() were somehow consumed slowly by pool.imap_unordered,
we should expect Got here to be printed slowly as well. Since Got here is
printed 10 times and quickly, we can see that the iterable gen() is being
completely consumed well before the tasks are completed.
Running these programs should hopefully give you confidence that
pool.imap_unordered and pool.apply_async are putting tasks in the queue
essentially in the same way: immediately after the call is made.

Python multithreaded print statements delayed until all threads complete execution

I have a piece of code below that creates a few threads to perform a task, which works perfectly well on its own. However I'm struggling to understand why the print statements I call in my function do not execute until all threads complete and the print 'finished' statement is called. I would expect them to be called as the thread executes. Is there any simple way to accomplish this, and why does this work this way in the first place?
import time
import multiprocessing

def func(param):
    time.sleep(.25)
    print param*2

if __name__ == '__main__':
    print 'starting execution'
    launchTime = time.clock()
    params = range(10)
    pool = multiprocessing.Pool(processes=100)  # use N processes to download the data
    _ = pool.map(func, params)
    print 'finished'
For Python 3 you can now use the flush param like this:
print('Your text', flush=True)
This happens due to stdout buffering. You can still flush the buffers manually:
import sys
print 'starting'
sys.stdout.flush()
You can find more info on this issue here and here.
Having run into plenty of issues around this and garbled outputs (especially under Windows when adding colours to the output), my solution has been to have an exclusive printing thread which consumes a queue.
If this still doesn't work, also add flush=True to your print statement(s) as suggested by @Or Duan.
Further, the "most correct", but heavy-handed, approach to displaying messages with threading is to use the logging library, which can wrap a queue (and write to many places asynchronously, including stdout) or write to a system-level queue (outside Python; availability depends greatly on OS support); a minimal sketch of that follows the code below.
import threading
from queue import Queue

def display_worker(display_queue):
    while True:
        line = display_queue.get()
        if line is None:  # simple termination logic, other sentinels can be used
            break
        print(line, flush=True)  # remove flush if slow or using Python2

def some_other_worker(display_queue, other_args):
    # NOTE accepts queue reference as an argument, though it could be a global
    display_queue.put("something which should be printed from this thread")

def main():
    display_queue = Queue()  # synchronizes console output
    screen_printing_thread = threading.Thread(
        target=display_worker,
        args=(display_queue,),
    )
    screen_printing_thread.start()
    ### other logic ###
    display_queue.put(None)  # end screen_printing_thread
    screen_printing_thread.join()  # threading.Thread has no .stop(); wait for it to exit
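For the logging-based approach mentioned above, here is a minimal sketch using the standard library's QueueHandler and QueueListener (the worker function and its messages are illustrative, not from the original answer):
import logging
import logging.handlers
import queue
import sys
import threading

def worker(n):
    logging.getLogger(__name__).info("message from thread %d", n)

def main():
    log_queue = queue.Queue()
    # Every thread logs through the queue; one listener thread writes to stdout.
    logging.basicConfig(level=logging.INFO,
                        handlers=[logging.handlers.QueueHandler(log_queue)])
    listener = logging.handlers.QueueListener(
        log_queue, logging.StreamHandler(sys.stdout))
    listener.start()
    threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    listener.stop()  # flushes remaining records and stops the listener thread

if __name__ == "__main__":
    main()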

