Using gevent monkey patching with threading makes thread work serially - python

I am using gevent and I am monkey patching everything.
It seems like the monkey patching causes the threading to work serially.
My code:
import threading
from gevent import monkey; monkey.patch_all()

class ExampleThread(threading.Thread):
    def run(self):
        do_stuff()  # takes a few minutes to finish
        print 'finished working'

if __name__ == '__main__':
    worker = ExampleThread()
    worker.start()
    print 'this should be printed before the worker finished'
So the thread is not working as expected.
But if I remove the monkey.patch_all() it is working fine.
The problem is that I need the monkey.patch_all() for using gevent (not shown in the code above).
My solution:
I changed the
monkey.patch_all()
to
monkey.patch_all(thread=False)
so that the threading module is not patched.
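For completeness, a minimal sketch of the adjusted setup (time.sleep stands in for the slow work from the question):

from gevent import monkey; monkey.patch_all(thread=False)
import threading
import time

class ExampleThread(threading.Thread):
    def run(self):
        time.sleep(5)  # stand-in for the slow work from the question
        print 'finished working'

worker = ExampleThread()
worker.start()
print 'this is printed before the worker finishes'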

When threads are monkey patched in gevent, they behave as coroutines. This means that you have to explicitly yield control to make it possible for other coroutines to execute.
The way to do this is to call a blocking operation that has been patched (which will yield automatically) or gevent.sleep:
#!/usr/bin/env python
from gevent import monkey, sleep
monkey.patch_all()
import threading

class ExampleThread(threading.Thread):
    def run(self):
        for i in xrange(10):
            print 'working'
            sleep()

if __name__ == '__main__':
    worker = ExampleThread()
    worker.start()
    print 'this will be printed after the first call to sleep'

You can leave your Thread-based class in place if you substitute the Thread with Greenlet, for example like this:
from gevent import monkey
from gevent import Greenlet
from threading import Thread

class ThreadLikeGreenlet(Greenlet):
    def __init__(self, name=None, target=None, args=(), kwargs=()):
        super().__init__(target, *args, **dict(kwargs))
        self.name = name

def is_gevent_patched():
    return monkey.is_module_patched('threading')

if is_gevent_patched():
    Thread = ThreadLikeGreenlet  # substitute Thread with Greenlet

class ExampleThread(Thread):
    ...
it will work as you wish then.
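A short usage sketch under that substitution (do_stuff is a placeholder, and ExampleThread is assumed to keep the inherited __init__; target= is accepted as a keyword by both threading.Thread and the ThreadLikeGreenlet above):

def do_stuff():
    print('working')

worker = ExampleThread(target=do_stuff)
worker.start()
worker.join()  # both threading.Thread and gevent.Greenlet provide join()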

Related

Python threading.Timer() not working in my process

import os
import sys
from multiprocessing import Process, Queue
import threading

class Test:
    def __init__(self):
        print '__init__ is called'
    def say_hello_again_and_again(self):
        print 'Hello :D'
        threading.Timer(1, self.say_hello_again_and_again).start()

test = Test()
#test.say_hello_again_and_again()
process = Process(target=test.say_hello_again_and_again)
process.start()
This is my test code.
The result:
pi#raspberrypi:~/Plant2 $ python test2.py
__init__ is called
Hello :D
If I use test.say_hello_again_and_again() , "Hello :D" is printed repeatedly.
But the process is not working as I expected. Why is "Hello :D" not being printed in my process?
What's happening in my process?
There are two problems with your code:
First: You start a process with start(). This does a fork, which means you now have two processes, the parent and the child, running side by side. The parent process then immediately exits, because after start() it reaches the end of the program. To wait until the child has finished (which in your case is never), you have to add process.join().
I did test your suggestion, but it does not work.
Indeed. There is a second issue: you start a new thread with threading.Timer(1, ...).start() but then immediately end the process. Since the underlying process dies right away, the thread never gets a chance to run. You also need to wait until the thread has stopped, with join().
Now this is how your program would look:
from multiprocessing import Process
import threading

class Test:
    def __init__(self):
        print '__init__ is called'
    def say_hello_again_and_again(self):
        print 'Hello :D'
        timer = threading.Timer(1, self.say_hello_again_and_again)
        timer.start()
        timer.join()

test = Test()
process = Process(target=test.say_hello_again_and_again)
process.start()
process.join()
But this is suboptimal at best, because you mix multiprocessing (which uses fork to start independent processes) and threading (which starts a thread within a process). While this is not really a problem, it makes debugging a lot harder (one issue with the code above, for example, is that you can't stop it with Ctrl-C, because for some reason the spawned process is kept running by the OS). Why don't you just do this?
from multiprocessing import Process, Queue
import time

class Test:
    def __init__(self):
        print '__init__ is called'
    def say_hello_again_and_again(self):
        while True:
            print 'Hello :D'
            time.sleep(1)

test = Test()
process = Process(target=test.say_hello_again_and_again)
process.start()
process.join()

running a 2nd zmq.eventloop.ioloop

I want to create a PyZMQ eventloop in a background thread, and have it work correctly with both standalone Python scripts and IPython scripts. (IPython uses PyZMQ eventloops located in the main thread, so this is causing me problems, which is why I want to start a private ioloop in a background thread.)
I want to run code in Thread A while having the PyZMQ eventloop handle received data from a socket in Thread B. There are times in Thread A where I will need to wait for an event set in Thread B.
How can I get this to work? There seems to be something wrong if I try in IPython:
from zmq.eventloop import ioloop
import threading

class IOBackgroundLoop(object):
    def __init__(self):
        self._loop = None
        self._thread = threading.Thread(target=self.run)
        self._thread.daemon = True
        self._started = threading.Event()

    @property
    def loop(self):
        return self._loop

    def run(self):
        self._loop = ioloop.IOLoop()
        self._loop.initialize()
        self._loop.make_current()
        self._started.set()
        self._loop.start()

    def start(self):
        self._thread.start()
        self._started.wait()

bkloop = IOBackgroundLoop()
bkloop.start()

for loop in [bkloop.loop, ioloop.IOLoop.instance()]:
    print "%s running: %s" % (loop, loop._running)
This prints out two separate instances of IOLoop, but if I go to use it, it doesn't seem to work. I can't think of a small example program to demonstrate this; I've tried with a timeout function:
import time

def print_timestamp(key):
    print "%s: %s" % (time.time(), key)

for loop in [bkloop.loop, ioloop.IOLoop.instance()]:
    loop.add_timeout(bkloop.loop.time() + 1.0, lambda: print_timestamp("hi from %s" % loop))
    print_timestamp("here")
    time.sleep(2.0)
    print_timestamp("there")
and I get this as a result (no "hi"):
1412889057.68: here
1412889059.68: there
1412889059.68: here
1412889061.68: there
Then when I hit another shift+Enter, I get
1412889061.68: hi from <zmq.eventloop.ioloop.ZMQIOLoop object at 0x000000000467E4E0>
which is the IOLoop object from the main thread, but my private instance IOLoop never prints hi.
What could I be doing wrong?
Argh, I just noticed this in the tornado docs:
Note that it is not safe to call add_timeout from other threads.
Instead, you must use add_callback to transfer control to the
IOLoop's thread, and then call add_timeout from there.
It also appears as though the zmq.eventloop.zmqstream needs to be set up in the same thread as the ioloop for it to work properly.
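A minimal sketch of that pattern, reusing bkloop and print_timestamp from above (add_callback is documented as thread-safe, add_timeout is not):

def schedule_timeout(loop, delay, callback):
    # Hop onto the IOLoop's own thread first, then add the timeout there.
    def _add():
        loop.add_timeout(loop.time() + delay, callback)
    loop.add_callback(_add)

schedule_timeout(bkloop.loop, 1.0, lambda: print_timestamp("hi from the background loop"))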

Deadlock with logging multiprocess/multithread python script

I am facing a problem with collecting logs from the following script.
Once I set SLEEP_TIME to too "small" a value, the LoggingThread
threads somehow block the logging module. The script freezes on a logging request
in the action function. If SLEEP_TIME is about 0.1, the script collects
all log messages as I expect.
I tried to follow this answer but it does not solve my problem.
import multiprocessing
import threading
import logging
import time

SLEEP_TIME = 0.000001

logger = logging.getLogger()
ch = logging.StreamHandler()
ch.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(funcName)s(): %(message)s'))
ch.setLevel(logging.DEBUG)
logger.setLevel(logging.DEBUG)
logger.addHandler(ch)

class LoggingThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        while True:
            logger.debug('LoggingThread: {}'.format(self))
            time.sleep(SLEEP_TIME)

def action(i):
    logger.debug('action: {}'.format(i))

def do_parallel_job():
    processes = multiprocessing.cpu_count()
    pool = multiprocessing.Pool(processes=processes)
    for i in range(20):
        pool.apply_async(action, args=(i,))
    pool.close()
    pool.join()

if __name__ == '__main__':
    logger.debug('START')
    #
    # multithread part
    #
    for _ in range(10):
        lt = LoggingThread()
        lt.setDaemon(True)
        lt.start()
    #
    # multiprocess part
    #
    do_parallel_job()
    logger.debug('FINISH')
How can I use the logging module in multiprocess and multithread scripts?
This is probably bug 6721.
The problem is common in any situation where you have locks, threads and forks. If thread 1 holds a lock while thread 2 calls fork, then in the forked process there will only be thread 2, and the lock will be held forever. In your case, that is logging.StreamHandler.lock.
A fix can be found here (permalink) for the logging module. Note that you need to take care of any other locks, too.
I ran into a similar issue just recently while using the logging module together with the Pathos multiprocessing library. I'm still not 100% sure, but it seems that in my case the problem may have been caused by the logging handler trying to reuse a lock object from within different processes.
I was able to fix it with a simple wrapper around the default logging handler:
import threading
from collections import defaultdict
from multiprocessing import current_process

import colorlog

class ProcessSafeHandler(colorlog.StreamHandler):
    def __init__(self):
        super().__init__()
        self._locks = defaultdict(lambda: threading.RLock())

    def acquire(self):
        current_process_id = current_process().pid
        self._locks[current_process_id].acquire()

    def release(self):
        current_process_id = current_process().pid
        self._locks[current_process_id].release()
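A short usage sketch, assuming the root-logger setup from the question: swap the plain StreamHandler for the wrapper above.

import logging

logger = logging.getLogger()
ch = ProcessSafeHandler()
ch.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(funcName)s(): %(message)s'))
ch.setLevel(logging.DEBUG)
logger.setLevel(logging.DEBUG)
logger.addHandler(ch)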
By default, multiprocessing will fork() the process in the pool when running on Linux. The resulting subprocess will lose all running threads except for the main one. So if you're on Linux, that's the problem.
First action item: You shouldn't ever use the fork()-based pool; see https://pythonspeed.com/articles/python-multiprocessing/ and https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods.
On Windows, and I think newer versions of Python on macOS, the "spawn"-based pool is used. This is also what you ought to use on Linux. In this setup, a new Python process is started. As you would expect, the new process doesn't have any of the threads from the parent process, because it's a new process.
Second action item: you'll want to have logging setup done in each subprocess in the pool; the logging setup for the parent process isn't sufficient to get logs from the worker processes. You do this with the initializer keyword argument to Pool, e.g. write a function called setup_logging() and then do pool = multiprocessing.Pool(initializer=setup_logging) (https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing.pool).
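A minimal sketch of both action items together (setup_logging and action are illustrative names, not from the question):

import logging
import multiprocessing

def setup_logging():
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter('%(asctime)s %(processName)s %(message)s'))
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    root.addHandler(handler)

def action(i):
    logging.getLogger().debug('action: {}'.format(i))

if __name__ == '__main__':
    ctx = multiprocessing.get_context('spawn')  # avoid the fork()-based pool
    with ctx.Pool(initializer=setup_logging) as pool:
        pool.map(action, range(20))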

Allow Greenlet to finish work at end of main module execution

I'm making a library that uses gevent to do some work asynchronously. I'd like to guarantee that the work is completed, even if the main module finishes execution.
class separate_library(object):
    def __init__(self):
        import gevent.monkey; gevent.monkey.patch_all()

    def do_work(self):
        from gevent import spawn
        spawn(self._do)

    def _do(self):
        from gevent import sleep
        sleep(1)
        print 'Done!'

if __name__ == '__main__':
    lib = separate_library()
    lib.do_work()
If you run this, you'll notice the program ends immediately, and Done! doesn't get printed.
Now, the main module doesn't know, or care, how separate_library actually accomplishes the work (or even that gevent is being used), so it's unreasonable to require joining there.
Is there any way separate_library can detect certain types of program exits, and stall until the work is done? Keyboard interrupts, SIGINTs, and sys.exit() should end the program immediately, as that is probably the expected behaviour.
Thanks!
Try spawning your gevent greenlets from a new thread that is not a daemon thread. Your program will not exit while this non-daemon thread is still running.
import gevent
import threading

class separate_library(object):
    def __init__(self):
        import gevent.monkey; gevent.monkey.patch_all()

    def do_work(self):
        t = threading.Thread(target=self.spawn_gthreads)
        t.setDaemon(False)
        t.start()

    def spawn_gthreads(self):
        from gevent import spawn
        gthreads = [spawn(self._do, x) for x in range(10)]
        gevent.joinall(gthreads)

    def _do(self, sec):
        from gevent import sleep
        sleep(sec)
        print 'Done!'

if __name__ == '__main__':
    lib = separate_library()
    lib.do_work()
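An alternative sketch, not from the answer above: have the library remember its greenlets and join them in an atexit hook, so the caller never has to know gevent is involved. Note that atexit also fires on sys.exit(), which may or may not match the desired behaviour, so treat this as an idea to test rather than a drop-in fix.

import atexit
import gevent
from gevent import sleep, spawn

class separate_library(object):
    def __init__(self):
        self._greenlets = []
        atexit.register(self._finish)  # runs at normal interpreter exit

    def do_work(self):
        self._greenlets.append(spawn(self._do))

    def _do(self):
        sleep(1)
        print 'Done!'

    def _finish(self):
        gevent.joinall(self._greenlets)

if __name__ == '__main__':
    lib = separate_library()
    lib.do_work()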

Python core application

I'm new to Python and I'm writing a script that
includes some timed routines.
My current approach is to instantiate a class
that includes those Timers (from: threading.Timer),
but I don't want the script to return when it gets to the
end of the function:
import mytimer
timer = mytimer()
Suppose I have a simple script like that one. All it
does is instantiate a mytimer object, which performs a series
of timed activities.
In order for the application not to exit, I could use Qt like this:
from PyQt4.QtCore import QCoreApplication
import mytimer
import sys

def main():
    app = QCoreApplication(sys.argv)
    timer = mytimer()
    sys.exit(app.exec_())

if __name__ == '__main__':
    main()
This way, the sys.exit() call won't return immediately, and the
timer would just keep doing its thing 'forever' in the background.
Although this is a solution I've used before, using Qt just for this doesn't
feel right to me.
So my question is, Is there any way to accomplish this using standard Python?
Thanks
Create a function in your script which tests a select or poll object to terminate a loop. Check out serve_forever in SocketServer.py from the standard library as an example.
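A minimal sketch of that idea (shutdown_requested is a hypothetical flag that your timer code or a signal handler would set; on POSIX, select() with empty fd lists acts as a timed wait):

import select

shutdown_requested = False

def serve_forever(poll_interval=0.5):
    while not shutdown_requested:
        # A real script would pass the file descriptors it watches; here the
        # call is used only as a timed wait between checks of the flag.
        select.select([], [], [], poll_interval)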
A Google search for "python timer" finds:
http://docs.python.org/library/sched.html
http://docs.python.org/release/2.5.2/lib/timer-objects.html
The sched module seems to be exactly what you need.
Example:
>>> import sched, time
>>> s = sched.scheduler(time.time, time.sleep)
>>> def print_time(): print "From print_time", time.time()
...
>>> def print_some_times():
...     print time.time()
...     s.enter(5, 1, print_time, ())
...     s.enter(10, 1, print_time, ())
...     s.run()
...     print time.time()
...
>>> print_some_times()
930343690.257
From print_time 930343695.274
From print_time 930343700.273
930343700.276
Once you have built your queue of times for things to happen, you just call the .run() method on your sched instance, and it will automatically wait until the queue is emptied, then will complete. So you can just put s.run() as the last thing in your script, and it will automatically exit only when the timed tasks are all done.
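A small non-interactive sketch of the same pattern (the tick callback is just a stand-in for the timed routines from the question):

import sched, time

s = sched.scheduler(time.time, time.sleep)

def tick(label):
    print label, time.time()

s.enter(1, 1, tick, ('first',))
s.enter(2, 1, tick, ('second',))
s.run()  # last statement of the script; returns only once the queue is empty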
import mytimer
import sys
from threading import Lock

lock = Lock()
lock.acquire()  # put lock into locked state

def main():
    timer = mytimer()
    lock.acquire()  # blocks until someone calls lock.release()

if __name__ == '__main__':
    main()
If you want a clean exit, you can just make mytimer() call lock.release() at some point.
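For illustration, a hedged sketch of what mytimer could look like if it lived next to the lock above (the question never shows mytimer, so this is only an assumption):

from threading import Timer

class mytimer(object):
    def __init__(self):
        # Pretend the timed work takes five seconds, then signal completion.
        self._timer = Timer(5.0, self._done)
        self._timer.start()

    def _done(self):
        print 'timed work finished'
        lock.release()  # unblocks the second lock.acquire() in main()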
