Python multithreaded print statements delayed until all threads complete execution - python

I have a piece of code below that creates a few threads to perform a task, which works perfectly well on its own. However I'm struggling to understand why the print statements I call in my function do not execute until all threads complete and the print 'finished' statement is called. I would expect them to be called as the thread executes. Is there any simple way to accomplish this, and why does this work this way in the first place?
import multiprocessing
import time

def func(param):
    time.sleep(.25)
    print param*2

if __name__ == '__main__':
    print 'starting execution'
    launchTime = time.clock()
    params = range(10)
    pool = multiprocessing.Pool(processes=100)  # use N processes to download the data
    _ = pool.map(func, params)
    print 'finished'

In Python 3 you can use the flush parameter of print():
print('Your text', flush=True)

This happens because of stdout buffering. You can flush the buffer explicitly:
import sys
print 'starting'
sys.stdout.flush()
You can find more info on this issue here and here.
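To make the buffering behaviour concrete, here is a minimal Python 3 sketch. It assumes an in-memory byte stream (the illustrative names raw and stream are not from the answers above) standing in for a block-buffered stdout such as a pipe. Text written with print() stays in the buffer until something flushes it:

```python
import io

# In-memory stand-in for a block-buffered stdout (e.g. stdout redirected to a pipe).
raw = io.BytesIO()
stream = io.TextIOWrapper(
    io.BufferedWriter(raw),
    encoding="utf-8",
    newline="\n",          # no platform-specific newline translation
    line_buffering=False,  # block buffering: a newline does not trigger a flush
)

print("hello", file=stream)              # stays in the buffer
before_flush = raw.getvalue()

print("world", file=stream, flush=True)  # flush=True pushes everything through
after_flush = raw.getvalue()

print(before_flush)  # nothing has reached the underlying stream yet
print(after_flush)   # both lines arrive together after the flush
```

Running the interpreter with the -u option, or setting the PYTHONUNBUFFERED environment variable, has a similar effect for the whole process without editing each print call.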

Having run into plenty of issues around this and garbled outputs (especially under Windows when adding colours to the output), my solution has been to have an exclusive printing thread which consumes a queue.
If this still doesn't work, also add flush=True to your print statement(s), as suggested by @Or Duan.
Further, the "most correct" but heavy-handed approach to displaying messages with threading is to use the logging library, which can wrap a queue (and write to many places asynchronously, including stdout) or write to a system-level queue (outside Python; availability depends greatly on OS support).
import threading
from queue import Queue

def display_worker(display_queue):
    while True:
        line = display_queue.get()
        if line is None:  # simple termination logic, other sentinels can be used
            break
        print(line, flush=True)  # remove flush if slow or using Python2

def some_other_worker(display_queue, other_args):
    # NOTE accepts queue reference as an argument, though it could be a global
    display_queue.put("something which should be printed from this thread")

def main():
    display_queue = Queue()  # synchronizes console output
    screen_printing_thread = threading.Thread(
        target=display_worker,
        args=(display_queue,),
    )
    screen_printing_thread.start()
    ### other logic ###
    display_queue.put(None)  # end screen_printing_thread
    screen_printing_thread.join()  # Thread has no stop(); wait for the worker to exit
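The logging-based alternative mentioned in this answer could look roughly like the following Python 3 sketch, which uses the standard library's logging.handlers.QueueHandler and QueueListener. The logger name "demo", the worker function, and the StringIO target (used here instead of sys.stdout so the output can be inspected) are assumptions for illustration only:

```python
import io
import logging
import logging.handlers
import queue
import threading

log_queue = queue.Queue()
captured = io.StringIO()  # stand-in for sys.stdout so the result can be checked

# The listener owns the only handler that writes, so output is never interleaved.
listener = logging.handlers.QueueListener(log_queue, logging.StreamHandler(captured))
listener.start()

logger = logging.getLogger("demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.handlers.QueueHandler(log_queue))

def worker(n):
    # Worker threads only enqueue records; they never touch the stream directly.
    logger.info("result %d", n * 2)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

listener.stop()  # processes the records still in the queue, then stops the listener thread
lines = sorted(captured.getvalue().splitlines())
print(lines)
```

The order in which the three lines arrive depends on thread scheduling, which is why the sketch sorts them before printing.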

Related

Python multiprocessing: print() inside apply_async()

print() inside the function that is passed to multiprocessing's apply_async() does not print out anything.
I want to eventually use apply_async to process a large text file in chunks. Therefore, I want the script to print out on the screen how many lines have been processed. However, I don't see any print out at all.
I've attached a toy code. Each foo() call should tell me what process is being used. In my actual code, I will call foo() on each chunk, and it will tell me how many lines of text in that chunk I've processed.
import os
from multiprocessing import Pool

def foo(x, y):
    print(f'Process: {os.getpid()}')
    return(x*y)

def bar(x):
    p = Pool()
    result_list = []
    for i in range(30):
        p.apply_async(foo, args=(i, i*x), callback=result_list.append)
    p.close()
    p.join()
    return(result_list)

if __name__ == '__main__':
    print(bar(2))
I got a print out of the multiplication x*y result, but I didn't see any print out that tells me the process id.
Can anyone help me please?
Your sys.stdout is likely block buffered, which means a small number of prints can get buffered without filling the buffer (and therefore the buffer is never flushed to the screen/file). Normally, Python flushes the buffers on exit so this isn't an issue.
Problem is, to avoid a bunch of tricky issues with doubled-cleanup, when using multiprocessing, the workers exit using os._exit, which bypasses all cleanup procedures (including flushing stdio buffers). If you want to be sure the output is emitted, tell print to flush the output immediately by changing:
print(f'Process: {os.getpid()}')
to:
print(f'Process: {os.getpid()}', flush=True)

Continuously running function in background in Python

I want to run a function continuously in parallel to my main process. How do I do it in Python? With multiprocessing, threading, or the thread module?
I am new to Python. Any help is much appreciated.
If the aim is to capture stderr and do some action you can simply replace sys.stderr by a custom object:
>>> import sys
>>> class MyLogger(object):
...     def __init__(self, callback):
...         self._callback = callback
...     def write(self, text):
...         if 'log' in text:
...             self._callback(text)
...         sys.__stderr__.write(text)  # continue writing to normal stderr
...
>>> def the_callback(s):
...     print('Stderr: %r' % s)
...
>>> sys.stderr = MyLogger(the_callback)
>>> sys.stderr.write('Some log message\n')
Stderr: 'Some log message\n'
Some log message
>>>
>>> sys.stderr.write('Another message\n')
Another message
If you want to handle tracebacks and exceptions you can use sys.excepthook.
If you want to capture logs created by the logging module you can implement your own Handler class similar to the above Logger but reimplementing the emit method.
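As a rough sketch of that last suggestion (Python 3), a custom handler can subclass logging.Handler and override emit. The class name CallbackHandler, the logger name, and the list used as a callback target are hypothetical and chosen only for this illustration:

```python
import logging

class CallbackHandler(logging.Handler):
    """Pass every formatted log record to a user-supplied callback."""

    def __init__(self, callback, level=logging.NOTSET):
        super().__init__(level)
        self._callback = callback

    def emit(self, record):
        # format() applies the handler's formatter (the message only, by default)
        self._callback(self.format(record))

captured = []
logger = logging.getLogger("callback_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the record away from the root logger's handlers
logger.addHandler(CallbackHandler(captured.append))

logger.warning("disk at %d%%", 91)
print(captured)
```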
A more interesting, but less practical, solution would be to use some kind of scheduler and generators to simulate parallel execution without actually creating threads (searching the internet will yield some nice results about this).
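A minimal Python 3 sketch of the scheduler-and-generators idea follows, assuming a simple round-robin policy. The names task and run_round_robin are invented for this illustration. Each generator does one step of work and then yields control back to the scheduler:

```python
from collections import deque

def task(name, steps, log):
    """A cooperative task: do one step of work, then yield to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield

def run_round_robin(tasks):
    """Advance each generator one step at a time until all are exhausted."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)
        except StopIteration:
            continue          # task finished, drop it
        ready.append(current)  # task still has work, requeue it

log = []
run_round_robin([task("a", 2, log), task("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

Nothing here runs in parallel; the tasks only interleave, and a step that blocks will stall every other task.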
It definitely depends on your aim, but I'd suggest looking at the threading module. There are many good StackOverflow questions on the use of threading and multithreading (e.g., Multiprocessing vs Threading Python).
Here's a brief skeleton from one of my projects:
import threading  # Threading module itself
import Queue  # A handy way to pass tasks to your thread (named queue in Python 3)
from time import sleep

job_queue = Queue.Queue()
job_queue.put('one job to do')

# This is the function that we want to keep running while our program does its thing
def function_to_run_in_background():
    # Do something...here is one form of flow control
    while True:
        job_to_do = job_queue.get()  # Get the task from the Queue
        print job_to_do  # Print what it was we fetched
        job_queue.task_done()  # Signal that we've finished with that queue item

# Launch the thread...
t = threading.Thread(target=function_to_run_in_background)
t.daemon = True  # YOU MAY NOT WANT THIS: Only use this line if you want the program to exit without waiting for the thread to finish
t.start()  # Starts the thread
t.setName('threadName')  # Makes it easier to interact with the thread later

# Do other stuff
sleep(5)
print "I am still here..."
job_queue.put('Here is another job for the thread...')
# Wait for everything in job_queue to finish. Since the thread is a daemon, the program will now exit, killing the thread.
job_queue.join()
If you just want to run a function in the background in the same process (Python 2; the module is named _thread in Python 3), do:
import thread

def function(a):
    pass

thread.start_new_thread(function, (1,))  # a is 1 then
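For Python 3, a comparable sketch uses threading.Thread together with a threading.Event so that the background loop can be stopped cleanly. The names background_loop, ticks, and the timing values are assumptions made for this example:

```python
import threading
import time

stop_event = threading.Event()
ticks = []

def background_loop(interval):
    # Event.wait() returns False on timeout and True once stop_event is set,
    # so it serves as both the pacing sleep and the exit check.
    while not stop_event.wait(interval):
        ticks.append(time.monotonic())  # placeholder for the real periodic work

worker = threading.Thread(target=background_loop, args=(0.01,), daemon=True)
worker.start()

time.sleep(0.2)   # the main program does its own work here
stop_event.set()  # ask the background loop to finish
worker.join()
print(len(ticks), "background iterations ran")
```

Using an event instead of a bare while True loop lets the main program decide when the background function ends, rather than relying on the daemon flag to kill it at exit.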
I found that a client-server architecture was the solution for me: running a server and spawning many clients that talk to the server and to each other directly, something like a messenger.
Communication can be achieved over the network or through a text file located in memory (to speed things up and spare the hard drive).
Bakuriu gave you a good tip about the logging module.

Why does this python program sometimes fail to exit?

I wrote a test program which has two processes. The parent process gets data from a Queue, and the child puts data into it. There is a signal handler which tells the program to exit. However, it sometimes does not exit when I send SIGTERM to the pid I printed (the child process), and it seems to be deadlocked.
import os
import sys
import multiprocessing
import time
import signal

bStop = False

def worker(que):
    signal.signal(signal.SIGTERM, sighandler)
    print 'worker:', os.getpid()
    for i in range(100000000):
        que.put(i)
    print 'STOP'

def sighandler(num, frame):
    print 'catch signal'
    q.put('STOP')
    sys.exit(0)

q = multiprocessing.Queue(100)
p = multiprocessing.Process(target=worker, args=(q,))
p.start()
for item in iter(q.get, 'STOP'):
    print 'get', item
    pass
print 'main stop'
p.join()
Unless you are running Python 3, you should use xrange instead of range for a loop that large. In Python 2, range(100000000) builds the whole list of 100 million integers in memory before the loop starts, whereas xrange produces the values lazily.
That could very well be the issue you're seeing right now.

Python: run one function until another function finishes

I have two functions, draw_ascii_spinner and findCluster(companyid).
I would like to:
Run findCluster(companyid) in the background and, while it is processing,
run draw_ascii_spinner until findCluster(companyid) finishes.
How do I begin to solve this (Python 2.7)?
Use threads:
import threading, time

def wrapper(func, args, res):
    res.append(func(*args))

res = []
t = threading.Thread(target=wrapper, args=(findCluster, (companyid,), res))
t.start()
while t.is_alive():
    # print next iteration of ASCII spinner
    t.join(0.2)
print res[0]
You can use multiprocessing. Or, if findCluster(companyid) has sensible stopping points, you can turn it into a generator along with draw_ascii_spinner, to do something like this:
for tick in findCluster(companyid):
    ascii_spinner.next()
Generally, you will use threads. Here is a simplistic approach which assumes that there are only two threads: 1) the main thread executing a task, 2) the spinner thread:
#!/usr/bin/env python
import time
import thread

def spinner():
    while True:
        print '.'
        time.sleep(1)

def task():
    time.sleep(5)

if __name__ == '__main__':
    thread.start_new_thread(spinner, ())
    # as soon as task finishes (and so the program)
    # spinner will be gone as well
    task()
This can be done with threads. FindCluster runs in a separate thread and when done, it can simply signal another thread that is polling for a reply.
You'll want to do some research on threading. The general form is going to be this:
1. Create a new thread for findCluster and some way for the program to know the method is still running. The simplest in Python is a global boolean.
2. Run draw_ascii_spinner in a while loop conditioned on whether findCluster is still running. You'll probably want this thread to sleep for a short period between iterations.
Here's a short tutorial in Python - http://linuxgazette.net/107/pai.html
Run findCluster() in a thread (the threading module makes this very easy), and then call draw_ascii_spinner until some condition is met.
Instead of using sleep() to set the pace of the spinner, you can call the thread's join() with a timeout.
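Putting these suggestions together, a Python 3 sketch might look like the following. The find_cluster function here is only a hypothetical stand-in that sleeps briefly, and run_with_spinner is a name invented for this example. Thread.join() with a timeout paces the spinner, so no separate sleep() is needed:

```python
import itertools
import sys
import threading
import time

def find_cluster(company_id):
    """Hypothetical stand-in for the real, slow findCluster()."""
    time.sleep(0.3)
    return f"cluster-for-{company_id}"

def run_with_spinner(func, *args, out=sys.stdout):
    """Run func(*args) in a thread and draw a spinner until it finishes."""
    result = []
    worker = threading.Thread(target=lambda: result.append(func(*args)))
    worker.start()
    for frame in itertools.cycle("|/-\\"):
        out.write("\r" + frame)
        out.flush()
        worker.join(0.1)  # wait up to 0.1 s; this sets the spinner's pace
        if not worker.is_alive():
            break
    out.write("\r \r")    # erase the spinner character
    out.flush()
    return result[0]      # raises IndexError if func itself raised

result = run_with_spinner(find_cluster, 42)
print(result)
```

The sketch does not propagate exceptions from the worker thread. If the real function can fail, the wrapper would need to capture the exception and re-raise it in the main thread.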
Is it possible to have a working example? I am new to Python. I have 6 tasks to run in one Python program. These 6 tasks should work in coordination, meaning that one should start when another finishes. I saw the answers, but I couldn't adapt the code you shared to my program.
I used time.sleep, but I know that is not good because I cannot know how long each task takes.
# Sending commands
for i in range(0, len(cmdList)):  # port Sending commands
    cmd = cmdList[i]
    cmdFull = convert(cmd)
    port.write(cmd.encode('ascii'))
    # s = port.read(10)
    print(cmd)
# Terminate the command + close serial port
port.write(cmdFull.encode('ascii'))
print('Termination')
port.close()
# time.sleep(1*60)
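One way to start each task only when the previous one has finished, without guessing a sleep duration, is to join each thread before starting the next. The Python 3 sketch below assumes the six tasks are plain callables. The names make_task and run_in_sequence are hypothetical, and the serial-port work from the snippet above is replaced by a short sleep:

```python
import threading
import time

def run_in_sequence(tasks):
    """Start each task in its own thread, waiting for it to finish before the next."""
    for task in tasks:
        t = threading.Thread(target=task)
        t.start()
        t.join()  # blocks exactly as long as the task takes

order = []

def make_task(number):
    # Hypothetical task factory; the real tasks would send commands to the port.
    def task():
        time.sleep(0.01 * (7 - number))  # later tasks are faster, order must still hold
        order.append(number)
    return task

run_in_sequence([make_task(n) for n in range(1, 7)])
print(order)  # [1, 2, 3, 4, 5, 6]
```

If nothing else has to run while the tasks execute, calling the six functions one after another in a plain loop gives the same ordering without any threads.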
