What is the core difference between asyncio and trio? - python

Today I found a library named trio, which describes itself as an asynchronous API for humans. That wording is a little similar to requests'. Since requests is a really good library, I am wondering what the advantages of trio are.
There aren't many articles about it; I only found one article discussing curio and asyncio. To my surprise, trio claims to be even better than curio (a next-generation curio).
After reading half of the article, I cannot find the core difference between these two asynchronous frameworks. It just gives some examples where curio's implementation is more convenient than asyncio's, but the underlying structure is almost the same.
So could someone give me a reason to accept that trio or curio is better than asyncio? Or explain more about why I should choose trio instead of the built-in asyncio?

Where I'm coming from: I'm the primary author of trio. I'm also one of the top contributors to curio (and wrote the article about it that you link to), and a Python core dev who's been heavily involved in discussions about how to improve asyncio.
In trio (and curio), one of the core design principles is that you never program with callbacks; it feels more like thread-based programming than callback-based programming. I guess if you open up the hood and look at how they're implemented internally, then there are places where they use callbacks, or things that are sorta equivalent to callbacks if you squint. But that's like saying that Python and C are equivalent because the Python interpreter is implemented in C. You never use callbacks.
Anyway:
Trio vs asyncio
Asyncio is more mature
The first big difference is ecosystem maturity. At the time I'm writing this in March 2018, there are many more libraries with asyncio support than trio support. For example, right now there aren't any real HTTP servers with trio support. The Framework :: AsyncIO classifier on PyPI currently has 122 libraries in it, while the Framework :: Trio classifier only has 8. I'm hoping that this part of the answer will become out of date quickly – for example, here's Kenneth Reitz experimenting with adding trio support in the next version of requests – but right now, you should expect that if you're using trio for anything complicated, then you'll run into missing pieces that you need to fill in yourself instead of grabbing a library from PyPI, or that you'll need to use the trio-asyncio package that lets you use asyncio libraries in trio programs. (The trio chat channel is useful for finding out about what's available, and what other people are working on.)
Trio makes your code simpler
In terms of the actual libraries, they're also very different. The main argument for trio is that it makes writing concurrent code much, much simpler than using asyncio. Of course, when was the last time you heard someone say that their library makes things harder to use... let me give a concrete example. In this talk (slides), I use the example of implementing RFC 8305 "Happy eyeballs", which is a simple concurrent algorithm used to efficiently establish a network connection. This is something that Glyph has been thinking about for years, and his latest version for Twisted is ~600 lines long. (Asyncio would be about the same; Twisted and asyncio are very similar architecturally.) In the talk, I teach you everything you need to know to implement it in <40 lines using trio (and we fix a bug in his version while we're at it). So in this example, using trio literally makes our code an order of magnitude simpler.
You might also find these comments from users interesting: 1, 2, 3
There are many many differences in detail
Why does this happen? That's a much longer answer :-). I'm gradually working on writing up the different pieces in blog posts and talks, and I'll try to remember to update this answer with links as they become available. Basically, it comes down to Trio having a small set of carefully designed primitives that have a few fundamental differences from any other library I know of (though of course they build on ideas from lots of places). Here are some random notes to give you some idea:
A very, very common problem in asyncio and related libraries is that you call some_function(), and it returns, so you think it's done – but actually it's still running in the background. This leads to all kinds of tricky bugs, because it makes it difficult to control the order in which things happen, or know when anything has actually finished, and it can directly hide problems because if a background task crashes with an unhandled exception, asyncio will generally just print something to the console and then keep going. In trio, the way we handle task spawning via "nurseries" means that none of these things happen: when a function returns then you know it's done, and Trio's currently the only concurrency library for Python where exceptions always propagate until you catch them.
Trio's way of managing timeouts and cancellations is novel, and I think better than previous state-of-the-art systems like C# and Golang. I actually did write a whole essay on this, so I won't go into all the details here. But asyncio's cancellation system – or really, systems, it has two of them with slightly different semantics – are based on an older set of ideas than even C# and Golang, and are difficult to use correctly. (For example, it's easy for code to accidentally "escape" a cancellation by spawning a background task; see previous paragraph.)
There's a ton of redundant stuff in asyncio, which can make it hard to tell which thing to use when. You have futures, tasks, and coroutines, which are all basically used for the same purpose but you need to know the differences between them. If you want to implement a network protocol, you have to pick whether to use the protocols/transports layer or the streams layer, and they both have tricky pitfalls (this is what the first part of the essay you linked is about).
Trio's currently the only concurrency library for Python where control-C just works the way you expect (i.e., it raises KeyboardInterrupt wherever your code is). It's a small thing, but it makes a big difference :-). For various reasons, I don't think this is fixable in asyncio.
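To make the first note above concrete (a function that returns while its work is still running), here is a minimal sketch using only asyncio from the standard library. The names background_work and some_function are made up for illustration, and the sketch shows only the asyncio side of the problem; it does not use trio's nurseries.

```python
import asyncio

events = []

async def background_work():
    # Simulates work that takes a little while to finish.
    await asyncio.sleep(0.01)
    events.append("background finished")

async def some_function():
    # Spawns a task and returns right away; the work is NOT done yet.
    task = asyncio.create_task(background_work())
    events.append("some_function returned")
    return task

async def main():
    task = await some_function()
    events.append("caller continues")
    # Only after awaiting the task do we know the background work is complete.
    await task
    return list(events)

order = asyncio.run(main())
print(order)
```

The caller keeps going before the background work has finished, and if it never awaited the returned task it would have no direct way to learn when (or whether) that work completed.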
Summing up
If you need to ship something to production next week, then you should use asyncio (or Twisted or Tornado or gevent, which are even more mature). They have large ecosystems, other people have used them in production before you, and they're not going anywhere.
If trying to use those frameworks leaves you frustrated and confused, or if you want to experiment with a different way of doing things, then definitely check out trio – we're friendly :-).
If you want to ship something to production a year from now... then I'm not sure what to tell you. Python concurrency is in flux. Trio has many advantages at the design level, but is that enough to overcome asyncio's head start? Will asyncio being in the standard library be an advantage, or a disadvantage? (Notice how these days everyone uses requests, even though the standard library has urllib.) How many of the new ideas in trio can be added to asyncio? No-one knows. I expect that there will be a lot of interesting discussions about this at PyCon this year :-).

Related

More than one process at the same time

Hey I am learning Python at the moment. I wrote a few programs. Now I have a question:
Is it possible to run more "operations" at once?
As far as I know, a script runs from top to bottom (apart from things like function definitions, if statements and so on).
For example: I want to do something, wait 5 seconds and then continue, but while my program "waits" it should do something else. (This one is very simple.)
Or: while checking for input, do something else, such as producing output.
The examples are very poor, but I cannot find anything better at the moment. (If something comes to mind, I will add it later.)
I hope you understand what my question is.
Cheers
TL;DR: Use an async approach. Raymond Hettinger is a god, and this talk explains this concept more accurately and thoroughly than I can. ;)
The behavior you are describing is called "concurrency" or "asynchronicity", where you have more than one "piece" of code executing "at the same time". This is one of the hardest problems in practical computer science, because adding the dimension of time causes scheduling problems in addition to logic problems. However, it is very much in demand these days because of multi-core processors and the inherently parallel environment of the internet.
"At the same time" is in quotes, because there are two basic ways to make this happen:
actually run the code at the same time
make it look like it is running at the same time.
The first option is called Concurrent programming, and the second is called Asynchronous programming (commonly "async").
Generally, "modern" programming seems to favor async, because it's easier to reason about and comes with fewer, less severe pitfalls. If you do it right, async programs can look a lot like the synchronous, procedural code you're already familiar with. Golang is basically built on the concept. Javascript has embraced "futures" in the form of Promises and async/await. I know it's not Python, but this talk by the creator of Go gives a good overview of the philosophy.
Python gives you three main ways to approach this, separated into three major modules: threading, multiprocessing, and asyncio
multiprocessing and threading are concurrent solutions. They do very similar things, but accomplish them in slightly different ways by delegating to the OS in different ways. This answer has a concise explanation of the difference. Concurrency is notoriously difficult to debug, because it is not deterministic: small differences in timing can result in completely different sequences of execution. You also have to deal with "race conditions" in threads, where two bits of code want to read/change the same piece of shared state at the same time.
asyncio, or "asynchronous input-output" is a more recent, async solution. You'll need at least Python 3.4. It uses event loops to allow long-running tasks to execute without "blocking" the rest of the program. Processes and threads do a similar thing, running two or more operations on even the same processor core by interrupting the running process periodically, forcing them to take turns. But with async, you decide where the turn-taking happens. It's like designing mature adults that interact cooperatively rather than designing kindergarteners that have to be baby-sat by the OS and forced to share the processor.
There are also third-party packages like gevent and eventlet that predate asyncio and work in earlier versions of Python. If you can afford to target Python >=3.4, I would recommend just using asyncio, because it's part of the Python core.
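As a rough illustration of the asyncio approach applied to the "wait, but do something else meanwhile" example from the question, here is a minimal sketch. The function names and the short delays are made up for illustration (the delays stand in for the 5 seconds in the question), and asyncio.run requires Python 3.7 or later.

```python
import asyncio

async def wait_then_report(log):
    log.append("timer started")
    # While this coroutine sleeps, the event loop is free to run others.
    await asyncio.sleep(0.2)
    log.append("timer done")

async def do_other_things(log):
    for i in range(3):
        log.append(f"other work {i}")
        # Each await is a point where this coroutine gives up its turn.
        await asyncio.sleep(0.01)

async def main():
    log = []
    # Run both coroutines concurrently and wait until both have finished.
    await asyncio.gather(wait_then_report(log), do_other_things(log))
    return log

log = asyncio.run(main())
print(log)
```

The turn-taking happens only at the await points, which is what "you decide where the turn-taking happens" means in practice.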

Why is there a need for Twisted?

I have been playing around with the twisted framework for about a week now (more out of curiosity than because I have to use it) and it's been a lot of fun doing event-driven asynchronous network programming.
However, there is something that I fail to understand. The twisted documentation starts off with
Twisted is a framework designed to be very flexible and let you write powerful servers.
My question is: Why do we need such an event-driven library to write powerful servers when there are already very efficient implementations of various servers out there?
Surely, there must have been more than a couple of concrete implementations which the twisted developers had in mind while writing this event-driven I/O library. What are those? Why exactly was twisted made?
In a comment on another answer, you say "Every library is supposed to have ...". "Supposed" by whom? Having use-cases is certainly a nice way to nail down your requirements, but it's not the only way. It also doesn't make sense to talk about the use-cases for all of Twisted at once. There is no use case that justifies every single API in Twisted. There are hundreds or thousands of different use cases, each of which justifies a lesser or greater subdivision of Twisted. These came and went over the years of Twisted's development, and no attempt has been made to keep a list of them. I can say that I worked on part of Twisted Names so that I would have a topic for a paper I was presenting at the time. I implemented the vt102 parser in Twisted Conch because I am obsessed with terminals and wanted a fun project involving them. And I implemented the IMAP4 support in Twisted Mail because I worked at a company developing a mail server which required tighter control over the mail store than any other IMAP4 server at the time offered.
So, as you can see, different parts of Twisted were written for widely differing reasons (and I've only given examples of my own reasons, not the reasons of any other developers).
The initial reason for a program being written often doesn't matter much in the long run though. Now the code is written: Twisted Names now runs the DNS for many domain names on the internet, the vt102 parser helped me get a job, and the company that drove the IMAP4 development is out of business. What really matters is what useful things you can do with the code now. As MattH points out, the resulting plethora of functionality has resulted in a library that (perhaps uniquely) addresses a wide array of interesting problems.
Why do we need such an event-driven library to write powerful servers when there are already very efficient implementations of various servers out there?
So paraphrasing: you can't imagine why anyone would need a toolkit when die-cast products already exist?
I'm guessing you've never needed to knock up a protocol gateway, e.g.
- write a daemon to md5 local files on demand over a unix socket
- interrogate a piece of software using udp and expose statistics over http.
I wrote a little proof-of-concept for the second example for a question here on SO in a handful of minutes. I couldn't do that without twisted.
Have you looked at: ProjectsUsingTwisted?
More on 'why': (disclaimer: I'm not a developer of Twisted proper), it's necessary to consider Twisted's high age (relative to Python's). When Twisted was written there was no sufficiently powerful non-blocking network/event driven library written around the reactor pattern (almost everyone was using threads back then). Twisted's initial use case was a large multiplayer game, although the specifics of this game seem to be somewhat lost in time.
Since those origins, as @MattH's link suggests, a very large number of network servers written in Python have been based on Twisted.
This PyCon talk by the creator of Twisted should give you answers.
It has changed my opinion of Twisted. Before, I viewed it as a massive piece of software with interfaces and weird names, two things that many developers dislike but that are actually just superficial, and now that I’ve seen the history behind it and the amazing number of use cases, I respect it a lot. Life is short, you need Twisted :)

Suggestion Needed - Networking in Python - A good idea?

I am considering programming the network related features of my application in Python instead of the C/C++ API. The intended use of networking is to pass text messages between two instances of my application, similar to a game passing player positions as often as possible over the network.
Although the Python socket module seems sufficient and mature, I want to check whether it has limitations that could become a problem at a later stage of development.
What do you think of the Python socket module:
Is it reliable and fast enough for production-quality software?
Are there any known limitations that could be a problem if my app needs more complex networking than regular client-server messaging?
Thanks in advance,
Paul
Check out Twisted, a Python engine for Networking. Has built-in support for TCP, UDP, SSL/TLS, multicast, Unix sockets, a large number of protocols (including HTTP, NNTP, IMAP, SSH, IRC, FTP, and others)
Python is a mature language that can do almost anything that you can do in C/C++ (even direct memory access if you really want to hurt yourself).
You'll find that you can write beautiful code in it in a very short time, that this code is readable from the start and that it will stay readable (you will still know what it does even after returning one year later).
The drawback of Python is that your code will be somewhat slow. "Somewhat" as in "might be too slow for certain cases". So the usual approach is to write as much as possible in Python because it will make your app maintainable. Eventually, you might run into speed issues. That would be the time to consider rewriting a part of your app in C.
The main advantages of this approach are:
You already have a running application. Translating the code from Python to C is much simpler than writing it from scratch.
You already have a running application. After the translation of a small part of Python to C, you just have to test that small part and you can use the rest of the app (that didn't change) to do it.
You don't pay a price upfront. If Python is fast enough for you, you'll never have to do the optional optimization.
Python is much, much more expressive than C. A single line of Python can often do the work of many lines of C.
To answer #1, I know that among other things, EVE Online (the MMO) uses a variant of Python for their server code.
The Python that EVE Online uses is Stackless Python (http://www.stackless.com/), and as far as I understand they use it for the way it implements threading through tasklets and the like. But since Python itself can handle something like an MMO with 40k people online, I think it can do anything.
This is a bad answer and not really an answer to your question, rather an addition to the previous answer.
Alan.

Is Twisted an httplib2/socket replacement?

Many python libraries, even recently written ones, use httplib2 or the socket interface to perform networking tasks.
Those are obviously easier to code with than Twisted due to their blocking nature, but I think this is a drawback when integrating them with other code, especially GUI code. If you want scalability, concurrency or GUI integration while avoiding multithreading, Twisted is then a natural choice.
So I would be interested in opinions in those matters:
Should new networking code (with the exception of small command line tools) be written with Twisted?
Would you mix Twisted, httplib2 or socket code in the same project?
Is Twisted pythonic for most libraries (it is more complex than alternatives, introduces a dependency on a non-standard package...)?
Edit: please let me phrase this in another way. Do you feel writing new library code with Twisted may add a barrier to its adoption? Twisted has obvious benefits (especially portability and scalability as stated by gimel), but the fact that it is not a core python library may be considered by some as a drawback.
See asychronous-programming-in-python-twisted; you'll have to decide whether depending on a non-standard (external) library fits your needs. Note the answer by @Glyph: he is the founder of the Twisted project, and can authoritatively answer any Twisted-related question.
At the core of libraries like Twisted, the function in the main loop is not sleep, but an operating system call like select() or poll(), as exposed by a module like the Python select module. I say "like" select, because this is an API that varies a lot between platforms, and almost every GUI toolkit has its own version. Twisted currently provides an abstract interface to 14 different variations on this theme. The common thing that such an API provides is a way to say "Here is a list of events that I'm waiting for. Go to sleep until one of them happens, then wake up and tell me which one of them it was."
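The "go to sleep until one of these events happens" idea can be sketched with the standard library's selectors module, which wraps select()/poll()-style calls. This is only an illustration of the underlying OS mechanism, not of how Twisted's reactor is implemented; the socket pair and the label attached to it are made up for the example.

```python
import selectors
import socket

sel = selectors.DefaultSelector()

# A connected pair of sockets stands in for a real network connection.
a, b = socket.socketpair()

# "Here is an event I'm waiting for": b becoming readable.
sel.register(b, selectors.EVENT_READ, data="b is readable")

a.sendall(b"ping")

# Sleep until a registered event happens (or the timeout expires),
# then find out which one it was.
ready = sel.select(timeout=1.0)
messages = [(key.data, key.fileobj.recv(1024)) for key, _mask in ready]
print(messages)

sel.unregister(b)
a.close()
b.close()
sel.close()
```

An event-driven framework runs a call like sel.select() in a loop and dispatches each ready event to the code that registered interest in it.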
Should new networking code (with the exception of small command line tools) be written with Twisted?
Maybe. It really depends. Sometimes it's just easy enough to wrap the blocking calls in their own thread. Twisted is good for large scale network code.
Would you mix Twisted, httplib2 or socket code in the same project?
Sure. But just remember that Twisted is single threaded, and that any blocking call in Twisted will block the entire engine.
Is Twisted pythonic for most libraries (it is more complex than alternatives, introduces a dependency on a non-standard package...)?
There are many Twisted zealots that will say it belongs in the Python standard library. But many people can implement decent networking code with asyncore/asynchat.

Easier concurrency building blocks for Python?

It seems that Python standard library lacks various useful concurrency-related concepts such as atomic counter, executor and others that can be found in e.g. java.util.concurrent. Are there any external libraries that would provide easier building blocks for concurrent Python applications?
Kamaelia, as already mentioned, is aimed at making concurrency easier to work with in python.
Its original use case was network systems (which are naturally concurrent), and it was developed with the viewpoint "How can we make these systems easier to develop and maintain".
Since then life has moved on and it is being used in a much wider variety of problem domains, from desktop systems (like whiteboarding applications, database modelling, tools for teaching children to read and write) through to back end systems for websites (like stuff for transcoding & converting user contributed images and video for web playback in a variety of scenarios and SMS / text messaging applications).
The core concept is essentially the same idea as Unix pipelines - except instead of processes you can have python generators, threads, or processes - which are termed components. These communicate over inboxes and outboxes - as many as you like of each, rather than just stdin/stdout/stderr. Also rather than requiring serialised file interfaces, you pass between components fully fledged python objects. Also rather than being limited to pipelines, you can have arbitrary shapes - called graphlines.
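The pipeline idea can be sketched in plain Python with generators and queues. This is not Kamaelia's or Axon's API: the producer, upper and run names and the use of deque objects as inboxes/outboxes are assumptions made purely to illustrate components that pass Python objects to each other.

```python
from collections import deque

def producer(outbox):
    # A component that emits a few Python objects, one per turn.
    for word in ["alpha", "beta", "gamma"]:
        outbox.append(word)
        yield

def upper(inbox, outbox):
    # A component that transforms whatever arrives in its inbox.
    while True:
        while inbox:
            outbox.append(inbox.popleft().upper())
        yield

def run(components, steps):
    # A toy scheduler: give every component one turn per step.
    for _ in range(steps):
        for component in list(components):
            try:
                next(component)
            except StopIteration:
                components.remove(component)

# Wire producer -> upper through a shared queue, like a Unix pipe.
pipe, results = deque(), deque()
run([producer(pipe), upper(pipe, results)], steps=5)
result = list(results)
print(result)
```

Each generator only ever sees its own inbox and outbox, which is what lets components be rearranged into other shapes without changing their code.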
You can find a full tutorial (video, slides, downloadable PDF booklet) here:
http://www.kamaelia.org/PragmaticConcurrency
Or the 5 minute version here (O'Reilly ignite talk):
http://yeoldeclue.com/cgi-bin/blog/blog.cgi?rm=viewpost&nodeid=1235690128
The focus of the library is pragmatic development, system safety and ease of maintenance, though some effort has gone in recently towards adding some syntactic sugar. Like anything, the developers (me and others :-) welcome feedback on improving it.
You can also find more information here:
- http://www.slideshare.net/kamaelian
Primarily, Kamaelia's core (Axon) was written to make my day job easier, and to wrap up best practice (message passing, software transactional memory) in a reusable fashion. I hope it makes your life easier too :-)
Although it may not be immediately obvious, itertools.count is indeed an atomic counter (the only operation on an instance x thereof, spelled next(x), is equivalent to an "atomic ++x" if C had such a concept;-). Edit: at least, this surely holds in CPython; I thought it was part of the Python standard definition but apparently IronPython and Jython disagree (not ensuring thread-safety of count.next in their current implementations) so I may well be wrong!
That is, suppose you currently have a data structure such as:
counters = dict.fromkeys(words_of_interest, 0)
...
if w in counters: counters[w] += 1
and your problem is that the latter increment is not atomic, so if two threads are at the same time dealing with the same word of interest the two increments might interfere (only one would "take", so the counter would be incremented only by one, not by two). Then:
import itertools
counters = dict((w, itertools.count()) for w in words_of_interest)
...
if w in counters: next(counters[w])
will perform the same operations, but in an atomic way.
(There is unfortunately no obvious, documented way to "extract the current value of the counter", though in fact str(x) does return a string such as 'count(3)' from which the current value can be parsed out again;-).
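Here is a small sketch of the itertools.count idea with several threads. It relies on the CPython behaviour described above (each next() call completing as a single step), which is an assumption rather than a documented guarantee; the worker function and the thread counts are made up for illustration. Instead of parsing str(x), it reads the count by calling next() one more time, which works once no further increments are expected.

```python
import itertools
import threading

counter = itertools.count()

def worker(n):
    # Each call to next() bumps the shared counter by one.
    for _ in range(n):
        next(counter)

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The value returned now equals the number of earlier next() calls.
total = next(counter)
print(total)
```

When portability across Python implementations matters, guarding an ordinary integer with threading.Lock is the documented alternative.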
Concurrency in Python (at least CPython) is wildly different from concurrency in Java, at least in part because of the Global Interpreter Lock (GIL). In general, CPU-bound parallelism in Python is achieved not with threads, but with processes. See multiprocessing for the "standard" module for this.
Also, check out "A Curious Course on Coroutines and Concurrency" for some concurrency techniques that were pretty new to me coming from Java. David Beazley (the author) is a Smart Guy™ when it comes to Python in general, and concurrency in particular.
Kamaelia provides tools for abstracting concurrency over threads, processes, etc.
P-workers creates a "Job-Worker" abstraction over the Python multiprocessing library. It simplifies concurrency with multiprocessing by starting "Workers" that have specific skills/attributes (defined functions), and providing a queue from which they receive "Jobs". It's somewhat analogous to a thread pool, only with processes instead of threads. Therefore it's better suited to CPU-heavy work. You can also use it to spawn multiple instances of a single application or even spawn "Workers" that have multiple threads.
