Downside of using patched threading vs native gevent greenlets? - python

My understanding is that once I have called gevent.monkey.patch_all(), the standard threading module is modified to use greenlets instead of python threads. So if I write my application in terms of python threads, locks, semaphores etc, and then call patch_all, am I getting the full benefit of gevent, or am I losing out on something compared with using the explicit gevent equivalents?
The motivation behind this question is that I am writing a module which uses some threads/greenlets, and I am deciding whether it is useful to have an explicit switch between using gevent and using threading, or whether I can just use threading+patch_all without losing anything.
To put it in code, is this...
from gevent import Greenlet

def myfunction():
    print 'ohai'

Greenlet.spawn(myfunction)
...any different to this?
import threading
import gevent.monkey
gevent.monkey.patch_all()

class mythread(threading.Thread):
    def run(self):
        print 'ohai'

mythread().start()

At the very least you will lose some greenlet-specific methods: link, kill, join, etc.
You also can't use threads with, for example, the gevent.pool module, which can be very useful.
And there is a little extra overhead in creating a Thread object.
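A sketch of what that means in practice, assuming gevent is installed (the work function here is a made-up stand-in):

```python
import gevent
from gevent.pool import Pool

def work(n):
    gevent.sleep(0)          # yield to the hub, so greenlets interleave
    return n * n

# Greenlet-specific API: spawn, join, value (plus link and kill)
g = gevent.spawn(work, 3)
g.join()                     # wait for completion
print(g.value)               # -> 9; a monkey-patched Thread has no .value

# gevent.pool works with greenlets, not Thread objects
pool = Pool(2)
print(pool.map(work, [1, 2, 3]))   # -> [1, 4, 9]
```

None of this is available if you spell your concurrency as threading.Thread, even after patch_all().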

Using third-party Python module that blocks Flask application

My application that uses websockets also makes use of several third-party Python modules that appear to be written in way that blocks the rest of the application when called. For example, I use xlrd to parse Excel files a user has uploaded.
I've monkey patched the builtins like this in the first lines of the application:
import os
import eventlet

if os.name == 'nt':
    eventlet.monkey_patch(os=False)
else:
    eventlet.monkey_patch()
Then I use the following to start the task that contains calls to xlrd.
socketio.start_background_task(my_background_task)
What is the appropriate way to call these other modules so that my application runs smoothly? Is using the multiprocessing module to start another process from within the green thread the right way?
First you should try a thread pool [1].
If that doesn't work as well as you want, please submit an issue [2] and go with multiprocessing as a workaround.
eventlet.tpool.execute(xlrd_read, file_path, other=arg)
Execute meth in a Python thread, blocking the current coroutine/green thread until the method completes.
The primary use case for this is to wrap an object or module that is not amenable to monkeypatching or any of the other tricks that Eventlet uses to achieve cooperative yielding. With tpool, you can force such objects to cooperate with green threads by sticking them in native threads, at the cost of some overhead.
[1] http://eventlet.net/doc/threading.html
[2] https://github.com/eventlet/eventlet/issues
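If you just want to see the shape of the pattern without eventlet, the "run blocking code in a native thread and wait for the result" idea can be sketched with the stdlib; the blocking_parse function below is a hypothetical stand-in for a call like xlrd parsing a file:

```python
from concurrent.futures import ThreadPoolExecutor

def blocking_parse(path):
    # stand-in for a blocking call such as parsing an uploaded Excel file
    return {"path": path, "rows": 42}

pool = ThreadPoolExecutor(max_workers=4)

# Submit the blocking call to a native thread; .result() waits for it.
# This is roughly what eventlet.tpool.execute does on behalf of a
# green thread, except tpool also cooperatively yields while waiting.
future = pool.submit(blocking_parse, "upload.xlsx")
print(future.result())
```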

Gevent's libev, and Twisted's reactor

I'm trying to figure out how Gevent works with respect to other asynchronous frameworks in python, like Twisted.
The key difference between Gevent and Twisted is that Gevent uses greenlets and monkey patching of the standard library to get implicit behavior and a synchronous programming model, whereas Twisted requires specific libraries and callbacks for explicit behavior. The event loop in Gevent is libev/libevent, which is written in C, and the event loop in Twisted is the reactor, which is written in Python.
Is there anything special about libev/libevent that allows for this implicit behavior? Why not use an event loop written in Python? Conversely, why isn't Twisted using libev/libevent? Is there any particular reason? Maybe it was simply a design choice and could have gone either way...
Theoretically, can Gevent's libev be replaced with another event loop, written in python, like Twisted's reactor? And can Twisted's reactor be replaced with libev?
Short answer: Twisted is a network framework. Gevent tries to act as a library that doesn't require the programmer to change the way they program. That's their focus, and not so much how it is achieved under the hood.
Long answer:
All async I/O libraries (gevent, asyncio, etc.) work pretty much the same way:
Have a main loop running endlessly on a single thread.
When an event occurs, it's captured by the main loop.
The main loop decides, based on different rules (scheduling), whether it should continue checking for events or temporarily switch and hand control to any functions subscribed to the event.
greenlet is a different library. It's very simple: it just changes the order in which Python code runs and lets you jump back and forth between functions. Gevent uses it under the hood to implement its async features.
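A minimal sketch of that jumping, assuming the greenlet package is available:

```python
from greenlet import greenlet

def ping():
    print("ping")
    gr2.switch()           # jump into pong()
    print("ping again")    # resumed when pong() switches back

def pong():
    print("pong")
    gr1.switch()           # jump back into ping(), right after its switch()

gr1 = greenlet(ping)
gr2 = greenlet(pong)
gr1.switch()               # prints: ping, pong, ping again
```

Nothing here is concurrent: each switch() is an explicit, deterministic transfer of control, and gevent's contribution is triggering those switches automatically whenever code would block on I/O.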
asyncio, which comes with Python 3, is like gevent. The big difference, again, is the interface. It requires the programmer to mark functions with async and lets them explicitly wait for another coroutine in the main loop with await.
Gevent is like asyncio. But instead of the keywords it patches existing code where appropriate. It uses greenlet under the hood to switch between main loop and subscribed functions and make it all work seamlessly.
Twisted, as mentioned, feels more like a framework than a library. It requires the programmer to follow very specific ways of achieving concurrency. Like everything else, though, it has a main loop under the hood, called the reactor.
Back to your initial question: You can in theory replace the reactor with any loop (including gevent). But that would defeat the purpose. Probably Twisted's team decided to use their own version of a main loop for optimisation reasons. All these libraries use different scheduling in their main loops to meet their needs.
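For comparison, the explicit asyncio style described above looks like this (pure stdlib):

```python
import asyncio

async def fetch(n):
    # 'await' explicitly hands control back to the main loop
    await asyncio.sleep(0)
    return n * 2

async def main():
    # the event loop interleaves these coroutines on one thread;
    # no monkey patching, but every suspension point is spelled out
    results = await asyncio.gather(fetch(1), fetch(2), fetch(3))
    print(results)     # [2, 4, 6]

asyncio.run(main())
```

Gevent produces the same single-threaded interleaving, but hides the await points by patching blocking calls and switching greenlets for you.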

Need help understanding python threading flavors

I'm into threads now and exploring thread and threading libraries. When I started with them, I wrote 2 basic programs. The following are the 2 programs with their corresponding outputs:
threading_1.py :
import threading

def main():
    t1 = threading.Thread(target=prints, args=(3,))
    t2 = threading.Thread(target=prints, args=(5,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()

def prints(i):
    while i > 0:
        print "i=" + str(i) + "\n"
        i = i - 1

if __name__ == '__main__':
    main()
output :
i=3
i=2
i=5
i=4
i=1
i=3
i=2
i=1
thread_1.py
import thread
import threading

def main():
    t1 = thread.start_new_thread(prints, (3,))
    t2 = thread.start_new_thread(prints, (5,))
    t1.start()
    t2.start()
    t1.join()
    t2.join()

def prints(i):
    while i > 0:
        print "i=" + str(i) + "\n"
        i = i - 1

if __name__ == '__main__':
    main()
output :
Traceback (most recent call last):
i=3
File "thread_1.py", line 19, in <module>
i=2
i=1
main()
i=5
i=4
i=3
i=2
i=1
File "thread_1.py", line 8, in main
t1.start()
AttributeError: 'int' object has no attribute 'start'
My desired output is that of threading_1.py, where the interleaved prints make it a convincing example of thread execution. My understanding is that threading is a higher-level library compared to thread. And the AttributeError I get in thread_1.py is because I am operating on a thread started from the thread library and not threading.
So, now my question is - how do I achieve an output similar to the output of threading_1.py using thread_1.py. Can the program be modified or tuned to produce the same result?
Short answer: ignore the thread module and just use threading.
The thread and threading module serve quite different purposes. The thread module is a low-level module written in C, designed to abstract away platform differences and provide a minimal cross-platform set of primitives (essentially, threads and simple locks) that can serve as a foundation for higher-level APIs. If you were porting Python to a new platform that didn't support existing threading APIs (like POSIX threads, for example), then you'd have to edit the thread module source so that you could wrap the appropriate OS-level calls to provide those same primitives on your new platform.
As an example, if you look at the current CPython implementation, you'll see that a Python Lock is based on unnamed POSIX semaphores on Linux, on a combination of a POSIX condition variable and a POSIX mutex on OS X (which doesn't support unnamed semaphores), and on an Event and a collection of Windows-specific library calls providing various atomic operations on Windows. As a Python user, you don't want to have to care about those details. The thread module provides the abstraction layer that lets you build higher-level code without worrying about platform-level details.
As such, the thread module is really there as a convenience for those developing Python, rather than for those using it: it's not something that normal Python users are expected to need to deal with. For that reason, the module has been renamed to _thread in Python 3: the leading underscore indicates that it's private, and that users shouldn't rely on its API or behaviour going forward.
In contrast, the threading module is a Java-inspired module written in Python. It builds on the foundations laid by the thread module to provide a convenient API for starting and joining threads, and a broad set of concurrency primitives (re-entrant locks, events, condition variables, semaphores, barriers and so on) for users. This is almost always the module that you as a Python user want to be using. If you're interested in what's going on behind the scenes, it's worth taking some time to look at the threading source: you can see how the threading module pulls in the primitives it needs from the thread module and puts everything together to provide that higher-level API.
Note that there are different tradeoffs here, from the perspective of the Python core developers. On the one hand, it should be easy to port Python to a new platform, so the thread module should be small: you should only have to implement a few basic primitives to get up and running on your new platform. In contrast, Python users want a wide variety of concurrency primitives, so the threading library needs to be extensive to support the needs of those users. Splitting the threading functionality into two separate layers is a good way of providing what the users need while not making it unnecessarily hard to maintain Python on a variety of platforms.
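To give a taste of that breadth, here is a small stdlib-only sketch using two of those higher-level primitives, an Event and a Semaphore, neither of which the low-level thread module provides:

```python
import threading

ready = threading.Event()
gate = threading.Semaphore(2)   # at most two workers in the section at once
results = []

def worker(n):
    ready.wait()                # all workers block here until released
    with gate:
        results.append(n * n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
ready.set()                     # release all workers at once
for t in threads:
    t.join()
print(sorted(results))          # [0, 1, 4, 9]
```

Every one of these conveniences is ultimately built on the couple of primitives the low-level module exposes.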
To answer your specific question: if you must use the thread library directly (despite all I've said above), you can do this:
import thread
import time

def main():
    t1 = thread.start_new_thread(prints, (3,))
    t2 = thread.start_new_thread(prints, (5,))

def prints(i):
    while i > 0:
        print "i=" + str(i) + "\n"
        i = i - 1

if __name__ == '__main__':
    main()
    # Give time for the output to show up.
    time.sleep(1.0)
But of course, using a time.sleep is a pretty shoddy way of handling things in the main thread: really, we want to wait until both child threads have done their job before exiting. So we'd need to build some functionality by which the main thread can wait for the child threads. That functionality doesn't exist directly in the thread module, but it does in threading; that's exactly the point of the threading module: it provides a rich, easy-to-use API in place of the minimal, hard-to-use thread API. So we're back to the summary line: don't use thread, use threading.
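For completeness, you can hand-roll that waiting with the one primitive the low-level module does give you, a lock: each child holds a lock until it finishes, and the main thread blocks acquiring it. This sketch uses _thread, the Python 3 name for the module (it is spelled thread in Python 2):

```python
import _thread   # 'thread' in Python 2

def prints(i, finished):
    while i > 0:
        print("i=" + str(i))
        i = i - 1
    finished.release()      # signal that this thread is done

def main():
    locks = []
    for n in (3, 5):
        finished = _thread.allocate_lock()
        finished.acquire()  # held until the child releases it
        _thread.start_new_thread(prints, (n, finished))
        locks.append(finished)
    for finished in locks:
        finished.acquire()  # blocks until released: a poor man's join

if __name__ == '__main__':
    main()
```

This is essentially what threading.Thread.join is doing for you behind the scenes, which is the whole argument for using threading in the first place.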

Twisted threads for multiple clients on server

I have implemented a server program using Twisted. I am using twisted.protocols.basic.LineReceiver along with twisted.internet.protocol.ServerFactory.
I would like to have each client that connects to the server run a set of functions in parallel (I'm thinking of multi-threading for this).
I have some confusion with using twisted.internet.threads.deferToThread for this problem.
Should I call deferToThread in the ServerFactory for this purpose?
Are twisted threads, thread-safe with respect to race conditions?
Previously, I tried using multiprocessing in my server program but it seemed not to work in combination with the Twisted reactor, while deferToThread did the job.
I'm wondering how are Twisted threads implemented? Don't they utilize multiprocessing?
Previously, I tried using multiprocessing in my server program but it seemed not to work in combination with the Twisted reactor, while deferToThread did the job. I'm wondering how are Twisted threads implemented? Don't they utilize multiprocessing?
You didn't say whether you used the multi-threaded version of multiprocessing or the multi-process version of multiprocessing.
You can read about mixing Twisted and multiprocessing on Stack Overflow, though:
Mix Python Twisted with multiprocessing?
Twisted network client with multiprocessing workers?
is twisted incompatible with multiprocessing events and queues?
(And more)
To answer the shorter part of this question - no, Twisted does not use the stdlib multiprocessing package to implement its threading APIs. It uses the stdlib threading module.
Are twisted threads, thread-safe with respect to race conditions?
The answer to this is implied by the above answer: no. "Twisted threads" aren't really a thing. Twisted's threading APIs are just a layer on top of the stdlib threading module (which is itself really just a Python API for POSIX threads, or something similar-but-different on Windows). Twisted's threading APIs don't magically eliminate the possibility of race conditions. (If there is any magic in Twisted, it is the ability to do certain things concurrently without using threads at all, which helps reduce the number of race conditions in your program, though it doesn't entirely eliminate the possibility of creating them.)
Should I call deferToThread in the ServerFactory for this purpose?
I'm not quite sure what the point of this question is. Are you wondering if a method on your ServerFactory subclass is the best place to put your calls to deferToThread? That probably depends on the details of your implementation approach. It probably doesn't make a huge difference overall, though. If you like the pattern of having the factory provide services to protocol instances - go for it.

Python asyncore & dbus

Is it possible to integrate asyncore with dbus through the same main loop?
Usually, DBus integration is done through the glib main loop: is it possible either to have asyncore integrate with this main loop, or to have dbus use asyncore's?
asyncore sucks. glib already provides async stuff, so just use glib's mainloop to do everything.
I wrote a trivial GSource wrapper for one of my own projects called AsyncoreGSource
Just attach it to an appropriate MainContext:
source = AsyncoreGSource([socket_map])
source.attach([main_context])
Naturally the defaults are asyncore.socket_map and the default MainContext respectively.
You can also try monkey-patching asyncore.socket_map, which would have been my solution had I not poked through the GLib python bindings source code for GSource.
Although you got what is probably a perfectly reasonable answer, there is another approach - you don't need to use asyncore's loop per se. Just call asyncore.loop with a zero timeout and a count of 1, which stops it iterating (and thus makes the function name completely misleading) and polls the sockets just once. Call this as often as you need.
I don't know anything about glib's async support but if it requires threads you might still get better performance by using asyncore in this way since it will use select or poll and won't need to spawn additional threads.
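That single-iteration polling trick looks like this (note that asyncore is deprecated and was removed in Python 3.12, so this assumes an older interpreter):

```python
import asyncore

def poll_asyncore_once():
    # timeout=0 makes the underlying select() return immediately, and
    # count=1 stops the "loop" after a single pass over the socket map
    asyncore.loop(timeout=0, count=1)

# Call poll_asyncore_once() from your own main loop, e.g. from a glib
# idle or timeout callback, as often as you need.
```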
